README.md
  1  # diff-sequences
  2  
  3  Compare items in two sequences to find a **longest common subsequence**.
  4  
  5  The items not in common are the items to delete or insert in a **shortest edit script**.
  6  
  7  To maximize flexibility and minimize memory, you write **callback** functions as configuration:
  8  
  9  **Input** function `isCommon(aIndex, bIndex)` compares items at indexes in the sequences and returns a truthy/falsey value. This package might call your function more than once for some pairs of indexes.
 10  
 11  - Because your function encapsulates **comparison**, this package can compare items according to `===` operator, `Object.is` method, or other criterion.
 12  - Because your function encapsulates **sequences**, this package can find differences in arrays, strings, or other data.
 13  
 14  **Output** function `foundSubsequence(nCommon, aCommon, bCommon)` receives the number of adjacent items and starting indexes of each common subsequence. If sequences do not have common items, then this package does not call your function.
 15  
 16  If N is the sum of lengths of sequences and L is length of a longest common subsequence, then D = N – 2L is the number of **differences** in the corresponding shortest edit script.
 17  
 18  [_An O(ND) Difference Algorithm and Its Variations_](http://xmailserver.org/diff2.pdf) by Eugene W. Myers is fast when sequences have **few** differences.
 19  
 20  This package implements the **linear space** variation with optimizations so it is fast even when sequences have **many** differences.
 21  
 22  ## Usage
 23  
 24  To add this package as a dependency of a project, do either of the following:
 25  
 26  - `npm install diff-sequences`
 27  - `yarn add diff-sequences`
 28  
 29  To use `diff` as the name of the default export from this package, do either of the following:
 30  
 31  - `var diff = require('diff-sequences').default; // CommonJS modules`
 32  - `import diff from 'diff-sequences'; // ECMAScript modules`
 33  
 34  Call `diff` with the **lengths** of sequences and your **callback** functions:
 35  
 36  ```js
 37  const a = ['a', 'b', 'c', 'a', 'b', 'b', 'a'];
 38  const b = ['c', 'b', 'a', 'b', 'a', 'c'];
 39  
 40  function isCommon(aIndex, bIndex) {
 41    return a[aIndex] === b[bIndex];
 42  }
 43  function foundSubsequence(nCommon, aCommon, bCommon) {
 44    // see examples
 45  }
 46  
 47  diff(a.length, b.length, isCommon, foundSubsequence);
 48  ```
 49  
 50  ## Example of longest common subsequence
 51  
 52  Some sequences (for example, `a` and `b` in the example of usage) have more than one longest common subsequence.
 53  
 54  This package finds the following common items:
 55  
 56  | comparisons of common items      | values     |            output arguments |
 57  | :------------------------------- | :--------- | --------------------------: |
 58  | `a[2] === b[0]`                  | `'c'`      | `foundSubsequence(1, 2, 0)` |
 59  | `a[4] === b[1]`                  | `'b'`      | `foundSubsequence(1, 4, 1)` |
 60  | `a[5] === b[3] && a[6] === b[4]` | `'b', 'a'` | `foundSubsequence(2, 5, 3)` |
 61  
 62  The “edit graph” analogy in the Myers paper shows the following common items:
 63  
 64  | comparisons of common items      | values     |
 65  | :------------------------------- | :--------- |
 66  | `a[2] === b[0]`                  | `'c'`      |
 67  | `a[3] === b[2] && a[4] === b[3]` | `'a', 'b'` |
 68  | `a[6] === b[4]`                  | `'a'`      |
 69  
 70  Various packages which implement the Myers algorithm will **always agree** on the **length** of a longest common subsequence, but might **sometimes disagree** on which **items** are in it.
 71  
 72  ## Example of callback functions to count common items
 73  
 74  ```js
 75  // Return length of longest common subsequence according to === operator.
 76  function countCommonItems(a, b) {
 77    let n = 0;
 78    function isCommon(aIndex, bIndex) {
 79      return a[aIndex] === b[bIndex];
 80    }
 81    function foundSubsequence(nCommon) {
 82      n += nCommon;
 83    }
 84  
 85    diff(a.length, b.length, isCommon, foundSubsequence);
 86  
 87    return n;
 88  }
 89  
 90  const commonLength = countCommonItems(
 91    ['a', 'b', 'c', 'a', 'b', 'b', 'a'],
 92    ['c', 'b', 'a', 'b', 'a', 'c'],
 93  );
 94  ```
 95  
 96  | category of items  |                expression | value |
 97  | :----------------- | ------------------------: | ----: |
 98  | in common          |            `commonLength` |   `4` |
 99  | to delete from `a` | `a.length - commonLength` |   `3` |
100  | to insert from `b` | `b.length - commonLength` |   `2` |
101  
102  If the length difference `b.length - a.length` is:
103  
104  - negative: its absolute value is the minimum number of items to **delete** from `a`
105  - positive: it is the minimum number of items to **insert** from `b`
106  - zero: there is an **equal** number of items to delete from `a` and insert from `b`
107  - non-zero: there is an equal number of **additional** items to delete from `a` and insert from `b`
108  
109  In this example, `6 - 7` is:
110  
111  - negative: `1` is the minimum number of items to **delete** from `a`
112  - non-zero: `2` is the number of **additional** items to delete from `a` and insert from `b`
113  
114  ## Example of callback functions to find common items
115  
116  ```js
117  // Return array of items in longest common subsequence according to Object.is method.
118  const findCommonItems = (a, b) => {
119    const array = [];
120    diff(
121      a.length,
122      b.length,
123      (aIndex, bIndex) => Object.is(a[aIndex], b[bIndex]),
124      (nCommon, aCommon) => {
125        for (; nCommon !== 0; nCommon -= 1, aCommon += 1) {
126          array.push(a[aCommon]);
127        }
128      },
129    );
130    return array;
131  };
132  
133  const commonItems = findCommonItems(
134    ['a', 'b', 'c', 'a', 'b', 'b', 'a'],
135    ['c', 'b', 'a', 'b', 'a', 'c'],
136  );
137  ```
138  
139  | `i` | `commonItems[i]` | `aIndex` |
140  | --: | :--------------- | -------: |
141  | `0` | `'c'`            |      `2` |
142  | `1` | `'b'`            |      `4` |
143  | `2` | `'b'`            |      `5` |
144  | `3` | `'a'`            |      `6` |
145  
146  ## Example of callback functions to diff index intervals
147  
148  Instead of slicing array-like objects, you can adjust indexes in your callback functions.
149  
150  ```js
151  // Diff index intervals that are half open [start, end) like array slice method.
152  const diffIndexIntervals = (a, aStart, aEnd, b, bStart, bEnd) => {
153    // Validate: 0 <= aStart and aStart <= aEnd and aEnd <= a.length
154    // Validate: 0 <= bStart and bStart <= bEnd and bEnd <= b.length
155  
156    diff(
157      aEnd - aStart,
158      bEnd - bStart,
159      (aIndex, bIndex) => Object.is(a[aStart + aIndex], b[bStart + bIndex]),
160      (nCommon, aCommon, bCommon) => {
161        // aStart + aCommon, bStart + bCommon
162      },
163    );
164  
165    // After the last common subsequence, do any remaining work.
166  };
167  ```
168  
169  ## Example of callback functions to emulate diff command
170  
171  Linux or Unix has a `diff` command to compare files line by line. Its output is a **shortest edit script**:
172  
173  - **c**hange adjacent lines from the first file to lines from the second file
174  - **d**elete lines from the first file
175  - **a**ppend or insert lines from the second file
176  
177  ```js
178  // Given zero-based half-open range [start, end) of array indexes,
179  // return one-based closed range [start + 1, end] as string.
180  const getRange = (start, end) =>
181    start + 1 === end ? `${start + 1}` : `${start + 1},${end}`;
182  
183  // Given index intervals of lines to delete or insert, or both, or neither,
184  // push formatted diff lines onto array.
185  const pushDelIns = (aLines, aIndex, aEnd, bLines, bIndex, bEnd, array) => {
186    const deleteLines = aIndex !== aEnd;
187    const insertLines = bIndex !== bEnd;
188    const changeLines = deleteLines && insertLines;
189    if (changeLines) {
190      array.push(getRange(aIndex, aEnd) + 'c' + getRange(bIndex, bEnd));
191    } else if (deleteLines) {
192      array.push(getRange(aIndex, aEnd) + 'd' + String(bIndex));
193    } else if (insertLines) {
194      array.push(String(aIndex) + 'a' + getRange(bIndex, bEnd));
195    } else {
196      return;
197    }
198  
199    for (; aIndex !== aEnd; aIndex += 1) {
200      array.push('< ' + aLines[aIndex]); // delete is less than
201    }
202  
203    if (changeLines) {
204      array.push('---');
205    }
206  
207    for (; bIndex !== bEnd; bIndex += 1) {
208      array.push('> ' + bLines[bIndex]); // insert is greater than
209    }
210  };
211  
212  // Given content of two files, return emulated output of diff utility.
213  const findShortestEditScript = (a, b) => {
214    const aLines = a.split('\n');
215    const bLines = b.split('\n');
216    const aLength = aLines.length;
217    const bLength = bLines.length;
218  
219    const isCommon = (aIndex, bIndex) => aLines[aIndex] === bLines[bIndex];
220  
221    let aIndex = 0;
222    let bIndex = 0;
223    const array = [];
224    const foundSubsequence = (nCommon, aCommon, bCommon) => {
225      pushDelIns(aLines, aIndex, aCommon, bLines, bIndex, bCommon, array);
226      aIndex = aCommon + nCommon; // number of lines compared in a
227      bIndex = bCommon + nCommon; // number of lines compared in b
228    };
229  
230    diff(aLength, bLength, isCommon, foundSubsequence);
231  
232    // After the last common subsequence, push remaining change lines.
233    pushDelIns(aLines, aIndex, aLength, bLines, bIndex, bLength, array);
234  
235    return array.length === 0 ? '' : array.join('\n') + '\n';
236  };
237  ```
238  
239  ## Example of callback functions to format diff lines
240  
241  Here is simplified code to format **changed and unchanged lines** in expected and received values after a test fails in Jest:
242  
243  ```js
244  // Format diff with minus or plus for change lines and space for common lines.
245  const formatDiffLines = (a, b) => {
246    // Jest depends on pretty-format package to serialize objects as strings.
247    // Unindented for comparison to avoid distracting differences:
248    const aLinesUn = format(a, {indent: 0 /*, other options*/}).split('\n');
249    const bLinesUn = format(b, {indent: 0 /*, other options*/}).split('\n');
250    // Indented to display changed and unchanged lines:
251    const aLinesIn = format(a, {indent: 2 /*, other options*/}).split('\n');
252    const bLinesIn = format(b, {indent: 2 /*, other options*/}).split('\n');
253  
254    const aLength = aLinesIn.length; // Validate: aLinesUn.length === aLength
255    const bLength = bLinesIn.length; // Validate: bLinesUn.length === bLength
256  
257    const isCommon = (aIndex, bIndex) => aLinesUn[aIndex] === bLinesUn[bIndex];
258  
259    // Only because the GitHub Flavored Markdown doc collapses adjacent spaces,
260    // this example code and the following table represent spaces as middle dots.
261    let aIndex = 0;
262    let bIndex = 0;
263    const array = [];
264    const foundSubsequence = (nCommon, aCommon, bCommon) => {
265      for (; aIndex !== aCommon; aIndex += 1) {
266        array.push('-·' + aLinesIn[aIndex]); // delete is minus
267      }
268      for (; bIndex !== bCommon; bIndex += 1) {
269        array.push('+·' + bLinesIn[bIndex]); // insert is plus
270      }
271      for (; nCommon !== 0; nCommon -= 1, aIndex += 1, bIndex += 1) {
272        // For common lines, received indentation seems more intuitive.
273        array.push('··' + bLinesIn[bIndex]); // common is space
274      }
275    };
276  
277    diff(aLength, bLength, isCommon, foundSubsequence);
278  
279    // After the last common subsequence, push remaining change lines.
280    for (; aIndex !== aLength; aIndex += 1) {
281      array.push('-·' + aLinesIn[aIndex]);
282    }
283    for (; bIndex !== bLength; bIndex += 1) {
284      array.push('+·' + bLinesIn[bIndex]);
285    }
286  
287    return array;
288  };
289  
290  const expected = {
291    searching: '',
292    sorting: {
293      ascending: true,
294      fieldKey: 'what',
295    },
296  };
297  const received = {
298    searching: '',
299    sorting: [
300      {
301        descending: false,
302        fieldKey: 'what',
303      },
304    ],
305  };
306  
307  const diffLines = formatDiffLines(expected, received);
308  ```
309  
310  If N is the sum of lengths of sequences and L is length of a longest common subsequence, then N – L is length of an array of diff lines. In this example, N is 7 + 9, L is 5, and N – L is 11.
311  
312  |  `i` | `diffLines[i]`                     | `aIndex` | `bIndex` |
313  | ---: | :--------------------------------- | -------: | -------: |
314  |  `0` | `'··Object {'`                     |      `0` |      `0` |
315  |  `1` | `'····"searching": "",'`           |      `1` |      `1` |
316  |  `2` | `'-···"sorting": Object {'`        |      `2` |          |
317  |  `3` | `'-·····"ascending": true,'`       |      `3` |          |
318  |  `4` | `'+·····"sorting": Array ['`       |          |      `2` |
319  |  `5` | `'+·······Object {'`               |          |      `3` |
320  |  `6` | `'+·········"descending": false,'` |          |      `4` |
321  |  `7` | `'··········"fieldKey": "what",'`  |      `4` |      `5` |
322  |  `8` | `'········},'`                     |      `5` |      `6` |
323  |  `9` | `'+·····],'`                       |          |      `7` |
324  | `10` | `'··}'`                            |      `6` |      `8` |
325  
326  ## Example of callback functions to find diff items
327  
328  Here is simplified code to find changed and unchanged substrings **within adjacent changed lines** in expected and received values after a test fails in Jest:
329  
330  ```js
331  // Return diff items for strings (compatible with diff-match-patch package).
332  const findDiffItems = (a, b) => {
333    const isCommon = (aIndex, bIndex) => a[aIndex] === b[bIndex];
334  
335    let aIndex = 0;
336    let bIndex = 0;
337    const array = [];
338    const foundSubsequence = (nCommon, aCommon, bCommon) => {
339      if (aIndex !== aCommon) {
340        array.push([-1, a.slice(aIndex, aCommon)]); // delete is -1
341      }
342      if (bIndex !== bCommon) {
343        array.push([1, b.slice(bIndex, bCommon)]); // insert is 1
344      }
345  
346      aIndex = aCommon + nCommon; // number of characters compared in a
347      bIndex = bCommon + nCommon; // number of characters compared in b
348      array.push([0, a.slice(aCommon, aIndex)]); // common is 0
349    };
350  
351    diff(a.length, b.length, isCommon, foundSubsequence);
352  
353    // After the last common subsequence, push remaining change items.
354    if (aIndex !== a.length) {
355      array.push([-1, a.slice(aIndex)]);
356    }
357    if (bIndex !== b.length) {
358      array.push([1, b.slice(bIndex)]);
359    }
360  
361    return array;
362  };
363  
364  const expectedDeleted = ['"sorting": Object {', '"ascending": true,'].join(
365    '\n',
366  );
367  const receivedInserted = [
368    '"sorting": Array [',
369    'Object {',
370    '"descending": false,',
371  ].join('\n');
372  
373  const diffItems = findDiffItems(expectedDeleted, receivedInserted);
374  ```
375  
376  | `i` | `diffItems[i][0]` | `diffItems[i][1]` |
377  | --: | ----------------: | :---------------- |
378  | `0` |               `0` | `'"sorting": '`   |
379  | `1` |               `1` | `'Array [\n'`     |
380  | `2` |               `0` | `'Object {\n"'`   |
381  | `3` |              `-1` | `'a'`             |
382  | `4` |               `1` | `'de'`            |
383  | `5` |               `0` | `'scending": '`   |
384  | `6` |              `-1` | `'tru'`           |
385  | `7` |               `1` | `'fals'`          |
386  | `8` |               `0` | `'e,'`            |
387  
388  The length difference `b.length - a.length` is equal to the sum of `diffItems[i][0]` values times `diffItems[i][1]` lengths. In this example, the difference `48 - 38` is equal to the sum `10`.
389  
390  | category of diff item | `[0]` |      `[1]` lengths | subtotal |
391  | :-------------------- | ----: | -----------------: | -------: |
392  | in common             |   `0` | `11 + 10 + 11 + 2` |      `0` |
393  | to delete from `a`    |  `–1` |            `1 + 3` |     `-4` |
394  | to insert from `b`    |   `1` |        `8 + 2 + 4` |     `14` |
395  
396  Instead of formatting the changed substrings with escape codes for colors in the `foundSubsequence` function to save memory, this example spends memory to **gain flexibility** before formatting, so a separate heuristic algorithm might modify the generic array of diff items to show changes more clearly:
397  
398  | `i` | `diffItems[i][0]` | `diffItems[i][1]` |
399  | --: | ----------------: | :---------------- |
400  | `6` |              `-1` | `'true'`          |
401  | `7` |               `1` | `'false'`         |
402  | `8` |               `0` | `','`             |
403  
404  For expected and received strings of serialized data, the result of finding changed **lines**, and then finding changed **substrings** within adjacent changed lines (as in the preceding two examples) sometimes displays the changes in a more intuitive way than the result of finding changed substrings, and then splitting them into changed and unchanged lines.