README.md
1 Difflib.js 2 ========== 3 4 A JavaScript module which provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including context and unified diffs. Ported from Python's [difflib](http://docs.python.org/library/difflib.html) module. 5 6 Installation 7 ------------ 8 9 #### Browser 10 11 To use it in the browser, you may download the [minified js file](https://github.com/qiao/difflib.js/raw/master/dist/difflib-browser.js) and include it in your webpage. 12 13 ```html 14 <script type="text/javascript" src="./difflib-browser.js"></script> 15 ``` 16 17 #### Node.js 18 19 For Node.js, you can install it using Node Package Manager (npm): 20 21 ```bash 22 npm install difflib 23 ``` 24 25 Then, in your script: 26 27 ```js 28 var difflib = require('difflib'); 29 ``` 30 31 Quick Examples 32 -------------- 33 34 1. contextDiff 35 36 ```js 37 >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 38 >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 39 >>> difflib.contextDiff(s1, s2, {fromfile:'before.py', tofile:'after.py'}) 40 [ '*** before.py\n', 41 '--- after.py\n', 42 '***************\n', 43 '*** 1,4 ****\n', 44 '! bacon\n', 45 '! eggs\n', 46 '! ham\n', 47 ' guido\n', 48 '--- 1,4 ----\n', 49 '! python\n', 50 '! eggy\n', 51 '! hamster\n', 52 ' guido\n' ] 53 ``` 54 55 2. unifiedDiff 56 57 ```js 58 >>> difflib.unifiedDiff('one two three four'.split(' '), 59 ... 'zero one tree four'.split(' '), { 60 ... fromfile: 'Original' 61 ... tofile: 'Current', 62 ... fromfiledate: '2005-01-26 23:30:50', 63 ... tofiledate: '2010-04-02 10:20:52', 64 ... lineterm: '' 65 ... }) 66 [ '--- Original\t2005-01-26 23:30:50', 67 '+++ Current\t2010-04-02 10:20:52', 68 '@@ -1,4 +1,4 @@', 69 '+zero', 70 ' one', 71 '-two', 72 '-three', 73 '+tree', 74 ' four' ] 75 ``` 76 77 78 3. ndiff 79 80 ```js 81 >>> a = ['one\n', 'two\n', 'three\n'] 82 >>> b = ['ore\n', 'tree\n', 'emu\n'] 83 >>> difflib.ndiff(a, b) 84 [ '- one\n', 85 '? ^\n', 86 '+ ore\n', 87 '? ^\n', 88 '- two\n', 89 '- three\n', 90 '? -\n', 91 '+ tree\n', 92 '+ emu\n' ] 93 ``` 94 95 4. ratio 96 97 ```js 98 >>> s = new difflib.SequenceMatcher(null, 'abcd', 'bcde'); 99 >>> s.ratio(); 100 0.75 101 >>> s.quickRatio(); 102 0.75 103 >>> s.realQuickRatio(); 104 1.0 105 ``` 106 107 5. getOpcodes 108 109 ```js 110 >>> s = new difflib.SequenceMatcher(null, 'qabxcd', 'abycdf'); 111 >>> s.getOpcodes(); 112 [ [ 'delete' , 0 , 1 , 0 , 0 ] , 113 [ 'equal' , 1 , 3 , 0 , 2 ] , 114 [ 'replace' , 3 , 4 , 2 , 3 ] , 115 [ 'equal' , 4 , 6 , 3 , 5 ] , 116 [ 'insert' , 6 , 6 , 5 , 6 ] ] 117 ``` 118 119 6. getCloseMatches 120 121 ```js 122 >>> difflib.getCloseMatches('appel', ['ape', 'apple', 'peach', 'puppy']) 123 ['apple', 'ape'] 124 ``` 125 126 Documentation 127 ------------- 128 129 * [SequenceMatcher](#SequenceMatcher) 130 131 * [setSeqs](#setSeqs) 132 * [setSeq1](#setSeq1) 133 * [setSeq2](#setSeq2) 134 * [findLongestMatch](#findLongestMatch) 135 * [getMatchingBlocks](#getMatchingBlocks) 136 * [getOpcodes](#getOpcodes) 137 * [getGroupedOpcodes](#getGroupedOpcodes) 138 * [ratio](#ratio) 139 * [quickRatio](#quickRatio) 140 * [realQuickRatio](#realQuickRatio) 141 142 * [Differ](#Differ) 143 144 * [compare](#compare) 145 146 * [contextDiff](#contextDiff) 147 * [getCloseMatches](#getCloseMatches) 148 * [ndiff](#ndiff) 149 * [restore](#restore) 150 * [unifiedDiff](#unifiedDiff) 151 * [IS_LINE_JUNK](#IS_LINE_JUNK) 152 * [IS_CHARACTER_JUNK](#IS_CHARACTER_JUNK) 153 154 155 <a name="SequenceMatcher" /> 156 ### *class* difflib.**SequenceMatcher**([isjunk[, a[, b[, autojunk=true]]]]) 157 158 This is a flexible class for comparing pairs of sequences of any type. 159 160 Optional argument *isjunk* must be **null** (the default) or a one-argument function 161 that takes a sequence element and returns true if and only if the element is 162 "junk" and should be ignored. 163 164 Passing **null** for *isjunk* is equivalent to passing 165 166 ```js 167 function(x) { return false; }; 168 ``` 169 170 in other words, no elements are ignored. 171 172 For example, pass: 173 174 ```js 175 function(x) { return x == ' ' || x == '\t'; } 176 ``` 177 178 if you're comparing lines as sequences of characters, 179 and don’t want to synch up on blanks or hard tabs. 180 181 The optional arguments *a* and *b* are sequences to be compared; 182 both default to empty strings. 183 184 The optional argument *autojunk* can be used to disable the 185 automatic junk heuristic, which automatically treats certain sequence items as junk. 186 187 188 <a name="setSeqs" /> 189 #### setSeqs(a, b) 190 191 Set the two sequences to be compared. 192 193 SequenceMatcher computes and caches detailed information about the second 194 sequence, so if you want to compare one sequence against many sequences, 195 use [setSeq2()](#setSeq2) to set the commonly used sequence once and call 196 [setSeq1()](#setSeq1) repeatedly, once for each of the other sequences. 197 198 <a name="setSeq1" /> 199 #### setSeq1(a) 200 201 Set the first sequence to be compared. The second sequence to be compared is not changed. 202 203 <a name="setSeq2" /> 204 #### setSeq2(a) 205 206 Set the second sequence to be compared. The first sequence to be compared is not changed. 207 208 <a name="findLongestMatch" /> 209 #### findLongestMatch(alo, ahi, blo, bhi) 210 211 Find longest matching block in `a[alo:ahi]` and `b[blo:bhi]`. 212 213 If *isjunk* was omitted or null, *findLongestMatch()* returns `[i, j, k]` such that 214 `a[i:i+k]` is equal to `b[j:j+k]`, where `alo <= i <= i+k <= ahi` and 215 `blo <= j <= j+k <= bhi`. 216 For all `[i', j', k']` meeting those conditions, the additional conditions `k >= k'`, 217 `i <= i'`, and if `i == i'`, `j <= j'` are also met. 218 In other words, of all maximal matching blocks, return one that starts earliest in *a*, 219 and of all those maximal matching blocks that start earliest in *a*, 220 return the one that starts earliest in *b*. 221 222 ```js 223 >>> s = new difflib.SequenceMatcher(null, " abcd", "abcd abcd"); 224 >>> s.findLongestMatch(0, 5, 0, 9); 225 [0, 4, 5] 226 ``` 227 228 If *isjunk* was provided, first the longest matching block is determined 229 as above, but with the additional restriction that no junk element appears 230 in the block. 231 Then that block is extended as far as possible by matching (only) junk 232 elements on both sides. So the resulting block never matches on junk 233 except as identical junk happens to be adjacent to an interesting match. 234 235 Here's the same example as before, but considering blanks to be junk. 236 That prevents `' abcd'` from matching the `' abcd'` at the tail end of 237 the second sequence directly. 238 Instead only the `'abcd'` can match, and matches the leftmost `'abcd'` 239 in the second sequence: 240 241 ```js 242 >>> s = new difflib.SequenceMatcher(function(x) {return x == ' ';}, " abcd", "abcd abcd") 243 >>> s.findLongestMatch(0, 5, 0, 9) 244 [1, 0, 4] 245 ``` 246 247 If no blocks match, this returns `[alo, blo, 0]`. 248 249 250 <a name="getMatchingBlocks" /> 251 #### getMatchingBlocks() 252 253 Return list of triples describing matching subsequences. 254 Each triple is of the form `[i, j, n]`, and means that `a[i:i+n] == b[j:j+n]`. 255 The triples are monotonically increasing in *i* and *j*. 256 257 The last triple is a dummy, and has the value `[a.length, b.length, 0]`. 258 It is the only triple with `n == 0`. If `[i, j, n]` and `[i', j', n']` 259 are adjacent triples in the list, and the second is not the last triple 260 in the list, then `i+n != i'` or `j+n != j'`; 261 in other words, adjacent triples always describe non-adjacent equal blocks. 262 263 ```js 264 >>> s = new difflib.SequenceMatcher(null, "abxcd", "abcd") 265 >>> s.getMatchingBlocks() 266 [ [0, 0, 2], [3, 2, 2], [5, 4, 0] ] 267 ``` 268 269 <a name="getOpcodes" /> 270 #### getOpcodes() 271 272 Return list of 5-tuples describing how to turn a into b. 273 Each tuple is of the form `[tag, i1, i2, j1, j2]`. 274 The first tuple has `i1 == j1 == 0`, and remaining tuples 275 have *i1* equal to the *i2* from the preceding tuple, 276 and, likewise, *j1* equal to the previous *j2*. 277 278 The tag values are strings, with these meanings: 279 280 Value Meaning 281 282 'replace' a[i1:i2] should be replaced by b[j1:j2]. 283 'delete' a[i1:i2] should be deleted. Note that j1 == j2 in this case. 284 'insert' b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case. 285 'equal' a[i1:i2] == b[j1:j2] (the sub-sequences are equal). 286 287 ```js 288 >>> s = new difflib.SequenceMatcher(null, 'qabxcd', 'abycdf'); 289 >>> s.getOpcodes(); 290 [ [ 'delete' , 0 , 1 , 0 , 0 ] , 291 [ 'equal' , 1 , 3 , 0 , 2 ] , 292 [ 'replace' , 3 , 4 , 2 , 3 ] , 293 [ 'equal' , 4 , 6 , 3 , 5 ] , 294 [ 'insert' , 6 , 6 , 5 , 6 ] ] 295 ``` 296 297 <a name="getGroupedOpcodes" /> 298 #### getGroupedOpcodes([n]) 299 300 Return a list groups with upto n (default is 3) lines of context. 301 Each group is in the same format as returned by [getOpcodes()](#getOpcodes). 302 303 <a name="ratio" /> 304 #### ratio() 305 306 Return a measure of the sequences’ similarity as a float in the range [0, 1]. 307 308 Where T is the total number of elements in both sequences, 309 and M is the number of matches, this is 2.0*M / T. 310 Note that this is `1.0` if the sequences are identical, 311 and `0.0` if they have nothing in common. 312 313 This is expensive to compute if [getMatchingBlocks()](#getMatchingBlocks) or 314 [getOpcodes()](#getOpcodes) hasn’t already been called, in which case 315 you may want to try [quickRatio()](#quickRatio) or 316 [realQuickRatio()](#realQuickRatio) first to get an upper bound. 317 318 <a name="quickRatio" /> 319 #### quickRatio() 320 321 Return an upper bound on ratio() relatively quickly. 322 323 <a name="realQuickRatio" /> 324 #### realQuickRatio() 325 326 Return an upper bound on ratio() very quickly. 327 328 ```js 329 >>> s = new difflib.SequenceMatcher(null, 'abcd', 'bcde'); 330 >>> s.ratio(); 331 0.75 332 >>> s.quickRatio(); 333 0.75 334 >>> s.realQuickRatio(); 335 1.0 336 ``` 337 338 <a name="Differ" /> 339 ### *class* difflib.**Differ**([linejunk[, charjunk]]) 340 341 This is a class for comparing sequences of lines of text, 342 and producing human-readable differences or deltas. 343 Differ uses [SequenceMatcher](#SequenceMatcher) both to compare 344 sequences of lines, and to compare sequences of characters within 345 similar (near-matching) lines. 346 347 Each line of a Differ delta begins with a two-letter code: 348 349 Code Meaning 350 '- ' line unique to sequence 1 351 '+ ' line unique to sequence 2 352 ' ' line common to both sequences 353 '? ' line not present in either input sequence 354 355 Lines beginning with `?` attempt to guide the eye to intraline differences, 356 and were not present in either input sequence. 357 These lines can be confusing if the sequences contain tab characters. 358 359 Optional parameters *linejunk* and *charjunk* are for filter functions (or **null**): 360 361 *linejunk*: A function that accepts a single string argument, 362 and returns true if the string is junk. 363 The default is **null**, meaning that no line is considered junk. 364 365 *charjunk*: A function that accepts a single character argument 366 (a string of length 1), and returns true if the character is junk. 367 The default is *null*, meaning that no character is considered junk. 368 369 <a name="compare" /> 370 #### compare(a, b) 371 372 Compare two sequences of lines, and generate the delta (a sequence of lines). 373 374 Each sequence must contain individual single-line strings ending with newlines. 375 376 ```js 377 >>> d = new difflib.Differ() 378 >>> d.compare(['one\n', 'two\n', 'three\n'], 379 ... ['ore\n', 'tree\n', 'emu\n']) 380 [ '- one\n', 381 '? ^\n', 382 '+ ore\n', 383 '? ^\n', 384 '- two\n', 385 '- three\n', 386 '? -\n', 387 '+ tree\n', 388 '+ emu\n' ] 389 ``` 390 391 <a name="contextDiff" /> 392 ### difflib.**contextDiff**(a, b, options) 393 394 Compare *a* and *b* (lists of strings); 395 return the delta lines in context diff format. 396 397 options: 398 399 * fromfile 400 * tofile 401 * fromfiledate 402 * tofiledate 403 * n 404 * lineterm 405 406 Context diffs are a compact way of showing just the lines that 407 have changed plus a few lines of context. The changes are shown in a 408 before/after style. 409 The number of context lines is set by n which defaults to three. 410 411 By default, the diff control lines (those with `***` or `---`) are created 412 with a trailing newline. 413 414 For inputs that do not have trailing newlines, set the lineterm argument 415 to `""` so that the output will be uniformly newline free. 416 417 The context diff format normally has a header for filenames and modification 418 times. Any or all of these may be specified using strings for *fromfile*, 419 *tofile*, *fromfiledate*, and *tofiledate*. 420 The modification times are normally expressed in the ISO 8601 format. 421 If not specified, the strings default to blanks. 422 423 ```js 424 >>> var s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 425 >>> var s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 426 >>> difflib.contextDiff(s1, s2, {fromfile:'before.py', tofile:'after.py'}) 427 [ '*** before.py\n', 428 '--- after.py\n', 429 '***************\n', 430 '*** 1,4 ****\n', 431 '! bacon\n', 432 '! eggs\n', 433 '! ham\n', 434 ' guido\n', 435 '--- 1,4 ----\n', 436 '! python\n', 437 '! eggy\n', 438 '! hamster\n', 439 ' guido\n' ] 440 ``` 441 442 <a name="getCloseMatches" /> 443 ### difflib.*getCloseMatches*(word, possibilities\[, n\]\[, cutoff\]) 444 445 Return a list of the best “good enough” matches. 446 *word* is a sequence for which close matches are desired 447 (typically a string), and *possibilities* is a list of sequences against 448 which to match word (typically a list of strings). 449 450 Optional argument *n* (default 3) is the maximum number of close 451 matches to return; *n* must be greater than 0. 452 453 Optional argument *cutoff* (default 0.6) is a float in the range 454 [0, 1]. 455 Possibilities that don’t score at least that similar to word are ignored. 456 457 The best (no more than n) matches among the possibilities are 458 returned in a list, sorted by similarity score, most similar first. 459 460 ```js 461 >>> difflib.getCloseMatches('appel', ['ape', 'apple', 'peach', 'puppy']) 462 ['apple', 'ape'] 463 ``` 464 465 <a name="ndiff" /> 466 ### difflib.**ndiff**(a, b\[, linejunk\]\[, charjunk\]) 467 468 Compare *a* and b (lists of strings); 469 return Differ-style delta lines 470 471 Optional keyword parameters *linejunk* and *charjunk* are for 472 filter functions (or **null**): 473 474 *linejunk*: A function that accepts a single string argument, 475 and returns true if the string is junk, or false if not. 476 The default is (*null*). 477 478 *charjunk*: A function that accepts a character (a string of length 1), 479 and returns if the character is junk, or false if not. The default is 480 module-level function [IS_CHARACTER_JUNK()](#IS_CHARACTER_JUNK), 481 which filters out whitespace characters (a blank or tab; note: 482 bad idea to include newline in this!). 483 484 ```js 485 >>> a = ['one\n', 'two\n', 'three\n'] 486 >>> b = ['ore\n', 'tree\n', 'emu\n'] 487 >>> difflib.ndiff(a, b) 488 [ '- one\n', 489 '? ^\n', 490 '+ ore\n', 491 '? ^\n', 492 '- two\n', 493 '- three\n', 494 '? -\n', 495 '+ tree\n', 496 '+ emu\n' ] 497 ``` 498 499 <a name="restore" /> 500 ### difflib.**restore**(sequence, which) 501 502 Return one of the two sequences that generated a delta. 503 504 Given a sequence produced by Differ.compare() or ndiff(), 505 extract lines originating from file 1 or 2 (parameter which), stripping off line prefixes. 506 507 ```js 508 >>> a = ['one\n', 'two\n', 'three\n'] 509 >>> b = ['ore\n', 'tree\n', 'emu\n'] 510 >>> diff = difflib.ndiff(a, b) 511 >>> difflib.restore(diff, 1) 512 [ 'one\n', 513 'two\n', 514 'three\n' ] 515 >>> restore(diff, 2) 516 [ 'ore\n', 517 'tree\n', 518 'emu\n' ] 519 ``` 520 521 <a name="unifiedDiff" /> 522 ### difflib.**unifiedDiff**(a, b, options) 523 524 Compare a and b (lists of strings); 525 return delta lines in unified diff format. 526 527 options: 528 529 * fromfile 530 * tofile 531 * fromfiledate 532 * tofiledate 533 * n 534 * lineterm 535 536 Unified diffs are a compact way of showing just the lines that have 537 changed plus a few lines of context. 538 The changes are shown in a inline style (instead of separate before/after 539 blocks). 540 The number of context lines is set by n which defaults to three. 541 542 By default, the diff control lines (those with `---`, `+++`, or `@@`) are 543 created with a trailing newline. 544 545 For inputs that do not have trailing newlines, set the lineterm argument 546 to `""` so that the output will be uniformly newline free. 547 548 The context diff format normally has a header for filenames and modification 549 times. Any or all of these may be specified using strings for *fromfile*, 550 *tofile*, *fromfiledate*, and *tofiledate*. 551 The modification times are normally expressed in the ISO 8601 format. 552 If not specified, the strings default to blanks. 553 554 ```js 555 >>> difflib.unifiedDiff('one two three four'.split(' '), 556 ... 'zero one tree four'.split(' '), { 557 ... fromfile: 'Original' 558 ... tofile: 'Current', 559 ... fromfiledate: '2005-01-26 23:30:50', 560 ... tofiledate: '2010-04-02 10:20:52', 561 ... lineterm: '' 562 ... }) 563 [ '--- Original\t2005-01-26 23:30:50', 564 '+++ Current\t2010-04-02 10:20:52', 565 '@@ -1,4 +1,4 @@', 566 '+zero', 567 ' one', 568 '-two', 569 '-three', 570 '+tree', 571 ' four' ] 572 ``` 573 574 575 <a name="IS_LINE_JUNK" /> 576 ### difflib.**IS\_LINE\_JUNK**(line) 577 578 Return true for ignorable lines. The line line is ignorable if *line* is 579 blank or contains a single `'#'`, otherwise it is not ignorable. 580 581 <a name="IS_CHARACTER_JUNK" /> 582 ### difflib.**IS\_CHARACTER\_JUNK**(ch) 583 584 Return true for ignorable characters. The character *ch* is ignorable if ch 585 is a space or tab, otherwise it is not ignorable. 586 Used as a default for parameter charjunk in [ndiff()](#ndiff). 587 588 589 License 590 ------- 591 592 Ported by Xueqiao Xu <xueqiaoxu@gmail.com> 593 594 PSF LICENSE AGREEMENT FOR PYTHON 2.7.2 595 596 1. This LICENSE AGREEMENT is between the Python Software Foundation (“PSF”), and the Individual or Organization (“Licensee”) accessing and otherwise using Python 2.7.2 software in source or binary form and its associated documentation. 597 2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 2.7.2 alone or in any derivative version, provided, however, that PSF’s License Agreement and PSF’s notice of copyright, i.e., “Copyright © 2001-2012 Python Software Foundation; All Rights Reserved” are retained in Python 2.7.2 alone or in any derivative version prepared by Licensee. 598 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 2.7.2 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 2.7.2. 599 4. PSF is making Python 2.7.2 available to Licensee on an “AS IS” basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 2.7.2 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 600 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 2.7.2 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 2.7.2, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 601 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 602 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 603 8. By copying, installing or otherwise using Python 2.7.2, Licensee agrees to be bound by the terms and conditions of this License Agreement.