README.md
1 # diff-sequences 2 3 Compare items in two sequences to find a **longest common subsequence**. 4 5 The items not in common are the items to delete or insert in a **shortest edit script**. 6 7 To maximize flexibility and minimize memory, you write **callback** functions as configuration: 8 9 **Input** function `isCommon(aIndex, bIndex)` compares items at indexes in the sequences and returns a truthy/falsey value. This package might call your function more than once for some pairs of indexes. 10 11 - Because your function encapsulates **comparison**, this package can compare items according to `===` operator, `Object.is` method, or other criterion. 12 - Because your function encapsulates **sequences**, this package can find differences in arrays, strings, or other data. 13 14 **Output** function `foundSubsequence(nCommon, aCommon, bCommon)` receives the number of adjacent items and starting indexes of each common subsequence. If sequences do not have common items, then this package does not call your function. 15 16 If N is the sum of lengths of sequences and L is length of a longest common subsequence, then D = N – 2L is the number of **differences** in the corresponding shortest edit script. 17 18 [_An O(ND) Difference Algorithm and Its Variations_](http://xmailserver.org/diff2.pdf) by Eugene W. Myers is fast when sequences have **few** differences. 19 20 This package implements the **linear space** variation with optimizations so it is fast even when sequences have **many** differences. 21 22 ## Usage 23 24 To add this package as a dependency of a project, do either of the following: 25 26 - `npm install diff-sequences` 27 - `yarn add diff-sequences` 28 29 To use `diff` as the name of the default export from this package, do either of the following: 30 31 - `var diff = require('diff-sequences').default; // CommonJS modules` 32 - `import diff from 'diff-sequences'; // ECMAScript modules` 33 34 Call `diff` with the **lengths** of sequences and your **callback** functions: 35 36 ```js 37 const a = ['a', 'b', 'c', 'a', 'b', 'b', 'a']; 38 const b = ['c', 'b', 'a', 'b', 'a', 'c']; 39 40 function isCommon(aIndex, bIndex) { 41 return a[aIndex] === b[bIndex]; 42 } 43 function foundSubsequence(nCommon, aCommon, bCommon) { 44 // see examples 45 } 46 47 diff(a.length, b.length, isCommon, foundSubsequence); 48 ``` 49 50 ## Example of longest common subsequence 51 52 Some sequences (for example, `a` and `b` in the example of usage) have more than one longest common subsequence. 53 54 This package finds the following common items: 55 56 | comparisons of common items | values | output arguments | 57 | :------------------------------- | :--------- | --------------------------: | 58 | `a[2] === b[0]` | `'c'` | `foundSubsequence(1, 2, 0)` | 59 | `a[4] === b[1]` | `'b'` | `foundSubsequence(1, 4, 1)` | 60 | `a[5] === b[3] && a[6] === b[4]` | `'b', 'a'` | `foundSubsequence(2, 5, 3)` | 61 62 The “edit graph” analogy in the Myers paper shows the following common items: 63 64 | comparisons of common items | values | 65 | :------------------------------- | :--------- | 66 | `a[2] === b[0]` | `'c'` | 67 | `a[3] === b[2] && a[4] === b[3]` | `'a', 'b'` | 68 | `a[6] === b[4]` | `'a'` | 69 70 Various packages which implement the Myers algorithm will **always agree** on the **length** of a longest common subsequence, but might **sometimes disagree** on which **items** are in it. 71 72 ## Example of callback functions to count common items 73 74 ```js 75 // Return length of longest common subsequence according to === operator. 76 function countCommonItems(a, b) { 77 let n = 0; 78 function isCommon(aIndex, bIndex) { 79 return a[aIndex] === b[bIndex]; 80 } 81 function foundSubsequence(nCommon) { 82 n += nCommon; 83 } 84 85 diff(a.length, b.length, isCommon, foundSubsequence); 86 87 return n; 88 } 89 90 const commonLength = countCommonItems( 91 ['a', 'b', 'c', 'a', 'b', 'b', 'a'], 92 ['c', 'b', 'a', 'b', 'a', 'c'], 93 ); 94 ``` 95 96 | category of items | expression | value | 97 | :----------------- | ------------------------: | ----: | 98 | in common | `commonLength` | `4` | 99 | to delete from `a` | `a.length - commonLength` | `3` | 100 | to insert from `b` | `b.length - commonLength` | `2` | 101 102 If the length difference `b.length - a.length` is: 103 104 - negative: its absolute value is the minimum number of items to **delete** from `a` 105 - positive: it is the minimum number of items to **insert** from `b` 106 - zero: there is an **equal** number of items to delete from `a` and insert from `b` 107 - non-zero: there is an equal number of **additional** items to delete from `a` and insert from `b` 108 109 In this example, `6 - 7` is: 110 111 - negative: `1` is the minimum number of items to **delete** from `a` 112 - non-zero: `2` is the number of **additional** items to delete from `a` and insert from `b` 113 114 ## Example of callback functions to find common items 115 116 ```js 117 // Return array of items in longest common subsequence according to Object.is method. 118 const findCommonItems = (a, b) => { 119 const array = []; 120 diff( 121 a.length, 122 b.length, 123 (aIndex, bIndex) => Object.is(a[aIndex], b[bIndex]), 124 (nCommon, aCommon) => { 125 for (; nCommon !== 0; nCommon -= 1, aCommon += 1) { 126 array.push(a[aCommon]); 127 } 128 }, 129 ); 130 return array; 131 }; 132 133 const commonItems = findCommonItems( 134 ['a', 'b', 'c', 'a', 'b', 'b', 'a'], 135 ['c', 'b', 'a', 'b', 'a', 'c'], 136 ); 137 ``` 138 139 | `i` | `commonItems[i]` | `aIndex` | 140 | --: | :--------------- | -------: | 141 | `0` | `'c'` | `2` | 142 | `1` | `'b'` | `4` | 143 | `2` | `'b'` | `5` | 144 | `3` | `'a'` | `6` | 145 146 ## Example of callback functions to diff index intervals 147 148 Instead of slicing array-like objects, you can adjust indexes in your callback functions. 149 150 ```js 151 // Diff index intervals that are half open [start, end) like array slice method. 152 const diffIndexIntervals = (a, aStart, aEnd, b, bStart, bEnd) => { 153 // Validate: 0 <= aStart and aStart <= aEnd and aEnd <= a.length 154 // Validate: 0 <= bStart and bStart <= bEnd and bEnd <= b.length 155 156 diff( 157 aEnd - aStart, 158 bEnd - bStart, 159 (aIndex, bIndex) => Object.is(a[aStart + aIndex], b[bStart + bIndex]), 160 (nCommon, aCommon, bCommon) => { 161 // aStart + aCommon, bStart + bCommon 162 }, 163 ); 164 165 // After the last common subsequence, do any remaining work. 166 }; 167 ``` 168 169 ## Example of callback functions to emulate diff command 170 171 Linux or Unix has a `diff` command to compare files line by line. Its output is a **shortest edit script**: 172 173 - **c**hange adjacent lines from the first file to lines from the second file 174 - **d**elete lines from the first file 175 - **a**ppend or insert lines from the second file 176 177 ```js 178 // Given zero-based half-open range [start, end) of array indexes, 179 // return one-based closed range [start + 1, end] as string. 180 const getRange = (start, end) => 181 start + 1 === end ? `${start + 1}` : `${start + 1},${end}`; 182 183 // Given index intervals of lines to delete or insert, or both, or neither, 184 // push formatted diff lines onto array. 185 const pushDelIns = (aLines, aIndex, aEnd, bLines, bIndex, bEnd, array) => { 186 const deleteLines = aIndex !== aEnd; 187 const insertLines = bIndex !== bEnd; 188 const changeLines = deleteLines && insertLines; 189 if (changeLines) { 190 array.push(getRange(aIndex, aEnd) + 'c' + getRange(bIndex, bEnd)); 191 } else if (deleteLines) { 192 array.push(getRange(aIndex, aEnd) + 'd' + String(bIndex)); 193 } else if (insertLines) { 194 array.push(String(aIndex) + 'a' + getRange(bIndex, bEnd)); 195 } else { 196 return; 197 } 198 199 for (; aIndex !== aEnd; aIndex += 1) { 200 array.push('< ' + aLines[aIndex]); // delete is less than 201 } 202 203 if (changeLines) { 204 array.push('---'); 205 } 206 207 for (; bIndex !== bEnd; bIndex += 1) { 208 array.push('> ' + bLines[bIndex]); // insert is greater than 209 } 210 }; 211 212 // Given content of two files, return emulated output of diff utility. 213 const findShortestEditScript = (a, b) => { 214 const aLines = a.split('\n'); 215 const bLines = b.split('\n'); 216 const aLength = aLines.length; 217 const bLength = bLines.length; 218 219 const isCommon = (aIndex, bIndex) => aLines[aIndex] === bLines[bIndex]; 220 221 let aIndex = 0; 222 let bIndex = 0; 223 const array = []; 224 const foundSubsequence = (nCommon, aCommon, bCommon) => { 225 pushDelIns(aLines, aIndex, aCommon, bLines, bIndex, bCommon, array); 226 aIndex = aCommon + nCommon; // number of lines compared in a 227 bIndex = bCommon + nCommon; // number of lines compared in b 228 }; 229 230 diff(aLength, bLength, isCommon, foundSubsequence); 231 232 // After the last common subsequence, push remaining change lines. 233 pushDelIns(aLines, aIndex, aLength, bLines, bIndex, bLength, array); 234 235 return array.length === 0 ? '' : array.join('\n') + '\n'; 236 }; 237 ``` 238 239 ## Example of callback functions to format diff lines 240 241 Here is simplified code to format **changed and unchanged lines** in expected and received values after a test fails in Jest: 242 243 ```js 244 // Format diff with minus or plus for change lines and space for common lines. 245 const formatDiffLines = (a, b) => { 246 // Jest depends on pretty-format package to serialize objects as strings. 247 // Unindented for comparison to avoid distracting differences: 248 const aLinesUn = format(a, {indent: 0 /*, other options*/}).split('\n'); 249 const bLinesUn = format(b, {indent: 0 /*, other options*/}).split('\n'); 250 // Indented to display changed and unchanged lines: 251 const aLinesIn = format(a, {indent: 2 /*, other options*/}).split('\n'); 252 const bLinesIn = format(b, {indent: 2 /*, other options*/}).split('\n'); 253 254 const aLength = aLinesIn.length; // Validate: aLinesUn.length === aLength 255 const bLength = bLinesIn.length; // Validate: bLinesUn.length === bLength 256 257 const isCommon = (aIndex, bIndex) => aLinesUn[aIndex] === bLinesUn[bIndex]; 258 259 // Only because the GitHub Flavored Markdown doc collapses adjacent spaces, 260 // this example code and the following table represent spaces as middle dots. 261 let aIndex = 0; 262 let bIndex = 0; 263 const array = []; 264 const foundSubsequence = (nCommon, aCommon, bCommon) => { 265 for (; aIndex !== aCommon; aIndex += 1) { 266 array.push('-·' + aLinesIn[aIndex]); // delete is minus 267 } 268 for (; bIndex !== bCommon; bIndex += 1) { 269 array.push('+·' + bLinesIn[bIndex]); // insert is plus 270 } 271 for (; nCommon !== 0; nCommon -= 1, aIndex += 1, bIndex += 1) { 272 // For common lines, received indentation seems more intuitive. 273 array.push('··' + bLinesIn[bIndex]); // common is space 274 } 275 }; 276 277 diff(aLength, bLength, isCommon, foundSubsequence); 278 279 // After the last common subsequence, push remaining change lines. 280 for (; aIndex !== aLength; aIndex += 1) { 281 array.push('-·' + aLinesIn[aIndex]); 282 } 283 for (; bIndex !== bLength; bIndex += 1) { 284 array.push('+·' + bLinesIn[bIndex]); 285 } 286 287 return array; 288 }; 289 290 const expected = { 291 searching: '', 292 sorting: { 293 ascending: true, 294 fieldKey: 'what', 295 }, 296 }; 297 const received = { 298 searching: '', 299 sorting: [ 300 { 301 descending: false, 302 fieldKey: 'what', 303 }, 304 ], 305 }; 306 307 const diffLines = formatDiffLines(expected, received); 308 ``` 309 310 If N is the sum of lengths of sequences and L is length of a longest common subsequence, then N – L is length of an array of diff lines. In this example, N is 7 + 9, L is 5, and N – L is 11. 311 312 | `i` | `diffLines[i]` | `aIndex` | `bIndex` | 313 | ---: | :--------------------------------- | -------: | -------: | 314 | `0` | `'··Object {'` | `0` | `0` | 315 | `1` | `'····"searching": "",'` | `1` | `1` | 316 | `2` | `'-···"sorting": Object {'` | `2` | | 317 | `3` | `'-·····"ascending": true,'` | `3` | | 318 | `4` | `'+·····"sorting": Array ['` | | `2` | 319 | `5` | `'+·······Object {'` | | `3` | 320 | `6` | `'+·········"descending": false,'` | | `4` | 321 | `7` | `'··········"fieldKey": "what",'` | `4` | `5` | 322 | `8` | `'········},'` | `5` | `6` | 323 | `9` | `'+·····],'` | | `7` | 324 | `10` | `'··}'` | `6` | `8` | 325 326 ## Example of callback functions to find diff items 327 328 Here is simplified code to find changed and unchanged substrings **within adjacent changed lines** in expected and received values after a test fails in Jest: 329 330 ```js 331 // Return diff items for strings (compatible with diff-match-patch package). 332 const findDiffItems = (a, b) => { 333 const isCommon = (aIndex, bIndex) => a[aIndex] === b[bIndex]; 334 335 let aIndex = 0; 336 let bIndex = 0; 337 const array = []; 338 const foundSubsequence = (nCommon, aCommon, bCommon) => { 339 if (aIndex !== aCommon) { 340 array.push([-1, a.slice(aIndex, aCommon)]); // delete is -1 341 } 342 if (bIndex !== bCommon) { 343 array.push([1, b.slice(bIndex, bCommon)]); // insert is 1 344 } 345 346 aIndex = aCommon + nCommon; // number of characters compared in a 347 bIndex = bCommon + nCommon; // number of characters compared in b 348 array.push([0, a.slice(aCommon, aIndex)]); // common is 0 349 }; 350 351 diff(a.length, b.length, isCommon, foundSubsequence); 352 353 // After the last common subsequence, push remaining change items. 354 if (aIndex !== a.length) { 355 array.push([-1, a.slice(aIndex)]); 356 } 357 if (bIndex !== b.length) { 358 array.push([1, b.slice(bIndex)]); 359 } 360 361 return array; 362 }; 363 364 const expectedDeleted = ['"sorting": Object {', '"ascending": true,'].join( 365 '\n', 366 ); 367 const receivedInserted = [ 368 '"sorting": Array [', 369 'Object {', 370 '"descending": false,', 371 ].join('\n'); 372 373 const diffItems = findDiffItems(expectedDeleted, receivedInserted); 374 ``` 375 376 | `i` | `diffItems[i][0]` | `diffItems[i][1]` | 377 | --: | ----------------: | :---------------- | 378 | `0` | `0` | `'"sorting": '` | 379 | `1` | `1` | `'Array [\n'` | 380 | `2` | `0` | `'Object {\n"'` | 381 | `3` | `-1` | `'a'` | 382 | `4` | `1` | `'de'` | 383 | `5` | `0` | `'scending": '` | 384 | `6` | `-1` | `'tru'` | 385 | `7` | `1` | `'fals'` | 386 | `8` | `0` | `'e,'` | 387 388 The length difference `b.length - a.length` is equal to the sum of `diffItems[i][0]` values times `diffItems[i][1]` lengths. In this example, the difference `48 - 38` is equal to the sum `10`. 389 390 | category of diff item | `[0]` | `[1]` lengths | subtotal | 391 | :-------------------- | ----: | -----------------: | -------: | 392 | in common | `0` | `11 + 10 + 11 + 2` | `0` | 393 | to delete from `a` | `–1` | `1 + 3` | `-4` | 394 | to insert from `b` | `1` | `8 + 2 + 4` | `14` | 395 396 Instead of formatting the changed substrings with escape codes for colors in the `foundSubsequence` function to save memory, this example spends memory to **gain flexibility** before formatting, so a separate heuristic algorithm might modify the generic array of diff items to show changes more clearly: 397 398 | `i` | `diffItems[i][0]` | `diffItems[i][1]` | 399 | --: | ----------------: | :---------------- | 400 | `6` | `-1` | `'true'` | 401 | `7` | `1` | `'false'` | 402 | `8` | `0` | `','` | 403 404 For expected and received strings of serialized data, the result of finding changed **lines**, and then finding changed **substrings** within adjacent changed lines (as in the preceding two examples) sometimes displays the changes in a more intuitive way than the result of finding changed substrings, and then splitting them into changed and unchanged lines.