README.md
  1  Difflib.js
  2  ==========
  3  
  4  A JavaScript module which provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including context and unified diffs. Ported from Python's [difflib](http://docs.python.org/library/difflib.html) module.
  5  
  6  Installation
  7  ------------
  8  
  9  #### Browser
 10  
 11  To use it in the browser, you may download the [minified js file](https://github.com/qiao/difflib.js/raw/master/dist/difflib-browser.js) and include it in your webpage.
 12  
 13  ```html
 14  <script type="text/javascript" src="./difflib-browser.js"></script>
 15  ```
 16  
 17  #### Node.js
 18  
 19  For Node.js, you can install it using Node Package Manager (npm):
 20  
 21  ```bash
 22  npm install difflib
 23  ```
 24  
 25  Then, in your script:
 26  
 27  ```js
 28  var difflib = require('difflib');
 29  ```
 30  
 31  Quick Examples
 32  --------------
 33  
 34  1. contextDiff
 35  
 36      ```js
 37      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
 38      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
 39      >>> difflib.contextDiff(s1, s2, {fromfile:'before.py', tofile:'after.py'})
 40      [ '*** before.py\n',
 41        '--- after.py\n',
 42        '***************\n',
 43        '*** 1,4 ****\n',
 44        '! bacon\n',
 45        '! eggs\n',
 46        '! ham\n',
 47        '  guido\n',
 48        '--- 1,4 ----\n',
 49        '! python\n',
 50        '! eggy\n',
 51        '! hamster\n',
 52        '  guido\n' ]
 53      ```
 54  
 55  2. unifiedDiff
 56  
 57      ```js
 58      >>> difflib.unifiedDiff('one two three four'.split(' '),
 59      ...                     'zero one tree four'.split(' '), {
 60      ...                       fromfile: 'Original'
 61      ...                       tofile: 'Current',
 62      ...                       fromfiledate: '2005-01-26 23:30:50',
 63      ...                       tofiledate: '2010-04-02 10:20:52',
 64      ...                       lineterm: ''
 65      ...                     })
 66      [ '--- Original\t2005-01-26 23:30:50',
 67        '+++ Current\t2010-04-02 10:20:52',
 68        '@@ -1,4 +1,4 @@',
 69        '+zero',
 70        ' one',
 71        '-two',
 72        '-three',
 73        '+tree',
 74        ' four' ]
 75      ```
 76  
 77  
 78  3. ndiff
 79  
 80      ```js
 81      >>> a = ['one\n', 'two\n', 'three\n']
 82      >>> b = ['ore\n', 'tree\n', 'emu\n']
 83      >>> difflib.ndiff(a, b)
 84      [ '- one\n',
 85        '?  ^\n',
 86        '+ ore\n',
 87        '?  ^\n',
 88        '- two\n',
 89        '- three\n',
 90        '?  -\n',
 91        '+ tree\n',
 92        '+ emu\n' ]
 93      ```
 94  
 95  4. ratio
 96  
 97      ```js
 98      >>> s = new difflib.SequenceMatcher(null, 'abcd', 'bcde');
 99      >>> s.ratio();
100      0.75
101      >>> s.quickRatio();
102      0.75
103      >>> s.realQuickRatio();
104      1.0
105      ```
106  
107  5. getOpcodes
108  
109      ```js
110      >>> s = new difflib.SequenceMatcher(null, 'qabxcd', 'abycdf');
111      >>> s.getOpcodes();
112      [ [ 'delete'  , 0 , 1 , 0 , 0 ] ,
113        [ 'equal'   , 1 , 3 , 0 , 2 ] ,
114        [ 'replace' , 3 , 4 , 2 , 3 ] ,
115        [ 'equal'   , 4 , 6 , 3 , 5 ] ,
116        [ 'insert'  , 6 , 6 , 5 , 6 ] ]
117      ```
118  
119  6. getCloseMatches
120  
121      ```js
122      >>> difflib.getCloseMatches('appel', ['ape', 'apple', 'peach', 'puppy'])
123      ['apple', 'ape']
124      ```
125  
126  Documentation
127  -------------
128  
129  * [SequenceMatcher](#SequenceMatcher)
130  
131      * [setSeqs](#setSeqs)
132      * [setSeq1](#setSeq1)
133      * [setSeq2](#setSeq2)
134      * [findLongestMatch](#findLongestMatch)
135      * [getMatchingBlocks](#getMatchingBlocks)
136      * [getOpcodes](#getOpcodes)
137      * [getGroupedOpcodes](#getGroupedOpcodes)
138      * [ratio](#ratio)
139      * [quickRatio](#quickRatio)
140      * [realQuickRatio](#realQuickRatio)
141  
142  * [Differ](#Differ)
143  
144      * [compare](#compare)
145  
146  * [contextDiff](#contextDiff)
147  * [getCloseMatches](#getCloseMatches)
148  * [ndiff](#ndiff)
149  * [restore](#restore)
150  * [unifiedDiff](#unifiedDiff)
151  * [IS_LINE_JUNK](#IS_LINE_JUNK)
152  * [IS_CHARACTER_JUNK](#IS_CHARACTER_JUNK)
153  
154  
155  <a name="SequenceMatcher" />
156  ### *class* difflib.**SequenceMatcher**([isjunk[, a[, b[, autojunk=true]]]])
157  
158  This is a flexible class for comparing pairs of sequences of any type.
159  
160  Optional argument *isjunk* must be **null** (the default) or a one-argument function
161  that takes a sequence element and returns true if and only if the element is
162  "junk" and should be ignored. 
163  
164  Passing **null** for *isjunk* is equivalent to passing
165  
166  ```js
167  function(x) { return false; }; 
168  ```
169  
170  in other words, no elements are ignored. 
171  
172  For example, pass:
173  
174  ```js
175  function(x) { return x == ' ' || x == '\t'; }
176  ```
177  
178  if you're comparing lines as sequences of characters, 
179  and don’t want to synch up on blanks or hard tabs.
180  
181  The optional arguments *a* and *b* are sequences to be compared;
182  both default to empty strings.
183  
184  The optional argument *autojunk* can be used to disable the 
185  automatic junk heuristic, which automatically treats certain sequence items as junk.
186  
187  
188  <a name="setSeqs" />
189  #### setSeqs(a, b)
190  
191  Set the two sequences to be compared.
192  
193  SequenceMatcher computes and caches detailed information about the second
194  sequence, so if you want to compare one sequence against many sequences,
195  use [setSeq2()](#setSeq2) to set the commonly used sequence once and call 
196  [setSeq1()](#setSeq1) repeatedly, once for each of the other sequences.
197  
198  <a name="setSeq1" />
199  #### setSeq1(a)
200  
201  Set the first sequence to be compared. The second sequence to be compared is not changed.
202  
203  <a name="setSeq2" />
204  #### setSeq2(a)
205  
206  Set the second sequence to be compared. The first sequence to be compared is not changed.
207  
208  <a name="findLongestMatch" />
209  #### findLongestMatch(alo, ahi, blo, bhi)
210  
211  Find longest matching block in `a[alo:ahi]` and `b[blo:bhi]`.
212  
213  If *isjunk* was omitted or null, *findLongestMatch()* returns `[i, j, k]` such that 
214  `a[i:i+k]` is equal to `b[j:j+k]`, where `alo <= i <= i+k <= ahi` and 
215  `blo <= j <= j+k <= bhi`. 
216  For all `[i', j', k']` meeting those conditions, the additional conditions `k >= k'`, 
217  `i <= i'`, and if `i == i'`, `j <= j'` are also met. 
218  In other words, of all maximal matching blocks, return one that starts earliest in *a*,
219  and of all those maximal matching blocks that start earliest in *a*, 
220  return the one that starts earliest in *b*.
221  
222  ```js
223  >>> s = new difflib.SequenceMatcher(null, " abcd", "abcd abcd");
224  >>> s.findLongestMatch(0, 5, 0, 9);
225  [0, 4, 5]
226  ```
227  
228  If *isjunk* was provided, first the longest matching block is determined
229  as above, but with the additional restriction that no junk element appears
230  in the block. 
231  Then that block is extended as far as possible by matching (only) junk 
232  elements on both sides. So the resulting block never matches on junk 
233  except as identical junk happens to be adjacent to an interesting match.
234  
235  Here's the same example as before, but considering blanks to be junk. 
236  That prevents `' abcd'` from matching the `' abcd'` at the tail end of 
237  the second sequence directly. 
238  Instead only the `'abcd'` can match, and matches the leftmost `'abcd'` 
239  in the second sequence:
240  
241  ```js
242  >>> s = new difflib.SequenceMatcher(function(x) {return x == ' ';}, " abcd", "abcd abcd")
243  >>> s.findLongestMatch(0, 5, 0, 9)
244  [1, 0, 4]
245  ```
246  
247  If no blocks match, this returns `[alo, blo, 0]`.
248  
249  
250  <a name="getMatchingBlocks" />
251  #### getMatchingBlocks()
252  
253  Return list of triples describing matching subsequences. 
254  Each triple is of the form `[i, j, n]`, and means that `a[i:i+n] == b[j:j+n]`. 
255  The triples are monotonically increasing in *i* and *j*.
256  
257  The last triple is a dummy, and has the value `[a.length, b.length, 0]`.
258  It is the only triple with `n == 0`. If `[i, j, n]` and `[i', j', n']` 
259  are adjacent triples in the list, and the second is not the last triple 
260  in the list, then `i+n != i'` or `j+n != j'`; 
261  in other words, adjacent triples always describe non-adjacent equal blocks.
262  
263  ```js
264  >>> s = new difflib.SequenceMatcher(null, "abxcd", "abcd")
265  >>> s.getMatchingBlocks()
266  [ [0, 0, 2], [3, 2, 2], [5, 4, 0] ]
267  ```
268  
269  <a name="getOpcodes" />
270  #### getOpcodes()
271  
272  Return list of 5-tuples describing how to turn a into b. 
273  Each tuple is of the form `[tag, i1, i2, j1, j2]`. 
274  The first tuple has `i1 == j1 == 0`, and remaining tuples 
275  have *i1* equal to the *i2* from the preceding tuple, 
276  and, likewise, *j1* equal to the previous *j2*.
277  
278  The tag values are strings, with these meanings:
279  
280      Value       Meaning
281  
282      'replace'   a[i1:i2] should be replaced by b[j1:j2].
283      'delete'    a[i1:i2] should be deleted. Note that j1 == j2 in this case.
284      'insert'    b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
285      'equal'     a[i1:i2] == b[j1:j2] (the sub-sequences are equal).
286  
287  ```js
288  >>> s = new difflib.SequenceMatcher(null, 'qabxcd', 'abycdf');
289  >>> s.getOpcodes();
290  [ [ 'delete'  , 0 , 1 , 0 , 0 ] ,
291    [ 'equal'   , 1 , 3 , 0 , 2 ] ,
292    [ 'replace' , 3 , 4 , 2 , 3 ] ,
293    [ 'equal'   , 4 , 6 , 3 , 5 ] ,
294    [ 'insert'  , 6 , 6 , 5 , 6 ] ]
295  ```
296  
297  <a name="getGroupedOpcodes" />
298  #### getGroupedOpcodes([n])
299  
300  Return a list groups with upto n (default is 3) lines of context.
301  Each group is in the same format as returned by [getOpcodes()](#getOpcodes).
302  
303  <a name="ratio" />
304  #### ratio()
305  
306  Return a measure of the sequences’ similarity as a float in the range [0, 1].
307  
308  Where T is the total number of elements in both sequences, 
309  and M is the number of matches, this is 2.0*M / T. 
310  Note that this is `1.0` if the sequences are identical, 
311  and `0.0` if they have nothing in common.
312  
313  This is expensive to compute if [getMatchingBlocks()](#getMatchingBlocks) or 
314  [getOpcodes()](#getOpcodes) hasn’t already been called, in which case 
315  you may want to try [quickRatio()](#quickRatio) or 
316  [realQuickRatio()](#realQuickRatio) first to get an upper bound.
317  
318  <a name="quickRatio" />
319  #### quickRatio()
320  
321  Return an upper bound on ratio() relatively quickly.
322  
323  <a name="realQuickRatio" />
324  #### realQuickRatio()
325  
326  Return an upper bound on ratio() very quickly.
327  
328  ```js
329  >>> s = new difflib.SequenceMatcher(null, 'abcd', 'bcde');
330  >>> s.ratio();
331  0.75
332  >>> s.quickRatio();
333  0.75
334  >>> s.realQuickRatio();
335  1.0
336  ```
337  
338  <a name="Differ" />
339  ### *class* difflib.**Differ**([linejunk[, charjunk]])
340  
341  This is a class for comparing sequences of lines of text, 
342  and producing human-readable differences or deltas. 
343  Differ uses [SequenceMatcher](#SequenceMatcher) both to compare 
344  sequences of lines, and to compare sequences of characters within 
345  similar (near-matching) lines.
346  
347  Each line of a Differ delta begins with a two-letter code:
348  
349      Code    Meaning
350      '- '    line unique to sequence 1
351      '+ '    line unique to sequence 2
352      '  '    line common to both sequences
353      '? '    line not present in either input sequence
354  
355  Lines beginning with `?` attempt to guide the eye to intraline differences, 
356  and were not present in either input sequence. 
357  These lines can be confusing if the sequences contain tab characters.
358  
359  Optional parameters *linejunk* and *charjunk* are for filter functions (or **null**):
360  
361  *linejunk*: A function that accepts a single string argument, 
362  and returns true if the string is junk. 
363  The default is **null**, meaning that no line is considered junk.
364  
365  *charjunk*: A function that accepts a single character argument 
366  (a string of length 1), and returns true if the character is junk. 
367  The default is *null*, meaning that no character is considered junk.
368  
369  <a name="compare" />
370  #### compare(a, b)
371  
372  Compare two sequences of lines, and generate the delta (a sequence of lines).
373  
374  Each sequence must contain individual single-line strings ending with newlines.
375  
376  ```js
377  >>> d = new difflib.Differ()
378  >>> d.compare(['one\n', 'two\n', 'three\n'],
379  ...           ['ore\n', 'tree\n', 'emu\n'])
380  [ '- one\n',
381    '?  ^\n',
382    '+ ore\n',
383    '?  ^\n',
384    '- two\n',
385    '- three\n',
386    '?  -\n',
387    '+ tree\n',
388    '+ emu\n' ]
389  ```
390  
391  <a name="contextDiff" />
392  ### difflib.**contextDiff**(a, b, options)
393  
394  Compare *a* and *b* (lists of strings); 
395  return the delta lines in context diff format.
396  
397  options:
398  
399  * fromfile
400  * tofile
401  * fromfiledate
402  * tofiledate
403  * n
404  * lineterm
405  
406  Context diffs are a compact way of showing just the lines that 
407  have changed plus a few lines of context. The changes are shown in a 
408  before/after style. 
409  The number of context lines is set by n which defaults to three.
410  
411  By default, the diff control lines (those with `***` or `---`) are created 
412  with a trailing newline. 
413  
414  For inputs that do not have trailing newlines, set the lineterm argument 
415  to `""` so that the output will be uniformly newline free.
416  
417  The context diff format normally has a header for filenames and modification
418  times. Any or all of these may be specified using strings for *fromfile*, 
419  *tofile*, *fromfiledate*, and *tofiledate*. 
420  The modification times are normally expressed in the ISO 8601 format. 
421  If not specified, the strings default to blanks.
422  
423  ```js
424  >>> var s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
425  >>> var s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
426  >>> difflib.contextDiff(s1, s2, {fromfile:'before.py', tofile:'after.py'})
427  [ '*** before.py\n',
428    '--- after.py\n',
429    '***************\n',
430    '*** 1,4 ****\n',
431    '! bacon\n',
432    '! eggs\n',
433    '! ham\n',
434    '  guido\n',
435    '--- 1,4 ----\n',
436    '! python\n',
437    '! eggy\n',
438    '! hamster\n',
439    '  guido\n' ]
440  ```
441  
442  <a name="getCloseMatches" />
443  ### difflib.*getCloseMatches*(word, possibilities\[, n\]\[, cutoff\])
444  
445  Return a list of the best “good enough” matches. 
446  *word* is a sequence for which close matches are desired 
447  (typically a string), and *possibilities* is a list of sequences against 
448  which to match word (typically a list of strings).
449  
450  Optional argument *n* (default 3) is the maximum number of close 
451  matches to return; *n* must be greater than 0.
452  
453  Optional argument *cutoff* (default 0.6) is a float in the range 
454  [0, 1]. 
455  Possibilities that don’t score at least that similar to word are ignored.
456  
457  The best (no more than n) matches among the possibilities are 
458  returned in a list, sorted by similarity score, most similar first.
459  
460  ```js
461  >>> difflib.getCloseMatches('appel', ['ape', 'apple', 'peach', 'puppy'])
462  ['apple', 'ape']
463  ```
464  
465  <a name="ndiff" />
466  ### difflib.**ndiff**(a, b\[, linejunk\]\[, charjunk\])
467  
468  Compare *a* and b (lists of strings); 
469  return Differ-style delta lines
470  
471  Optional keyword parameters *linejunk* and *charjunk* are for 
472  filter functions (or **null**):
473  
474  *linejunk*: A function that accepts a single string argument, 
475  and returns true if the string is junk, or false if not. 
476  The default is (*null*).
477  
478  *charjunk*: A function that accepts a character (a string of length 1),
479  and returns if the character is junk, or false if not. The default is 
480  module-level function [IS_CHARACTER_JUNK()](#IS_CHARACTER_JUNK), 
481  which filters out whitespace characters (a blank or tab; note: 
482  bad idea to include newline in this!).
483  
484  ```js
485  >>> a = ['one\n', 'two\n', 'three\n']
486  >>> b = ['ore\n', 'tree\n', 'emu\n']
487  >>> difflib.ndiff(a, b)
488  [ '- one\n',
489    '?  ^\n',
490    '+ ore\n',
491    '?  ^\n',
492    '- two\n',
493    '- three\n',
494    '?  -\n',
495    '+ tree\n',
496    '+ emu\n' ]
497  ```
498  
499  <a name="restore" />
500  ### difflib.**restore**(sequence, which)
501  
502  Return one of the two sequences that generated a delta.
503  
504  Given a sequence produced by Differ.compare() or ndiff(), 
505  extract lines originating from file 1 or 2 (parameter which), stripping off line prefixes.
506  
507  ```js
508  >>> a = ['one\n', 'two\n', 'three\n']
509  >>> b = ['ore\n', 'tree\n', 'emu\n']
510  >>> diff = difflib.ndiff(a, b)
511  >>> difflib.restore(diff, 1)
512  [ 'one\n',
513    'two\n',
514    'three\n' ]
515  >>> restore(diff, 2)
516  [ 'ore\n',
517    'tree\n',
518    'emu\n' ]
519  ```
520  
521  <a name="unifiedDiff" />
522  ### difflib.**unifiedDiff**(a, b, options)
523  
524  Compare a and b (lists of strings); 
525  return delta lines in unified diff format.
526  
527  options:
528  
529  * fromfile
530  * tofile
531  * fromfiledate
532  * tofiledate
533  * n
534  * lineterm
535  
536  Unified diffs are a compact way of showing just the lines that have 
537  changed plus a few lines of context. 
538  The changes are shown in a inline style (instead of separate before/after 
539  blocks). 
540  The number of context lines is set by n which defaults to three.
541  
542  By default, the diff control lines (those with `---`, `+++`, or `@@`) are 
543  created with a trailing newline. 
544  
545  For inputs that do not have trailing newlines, set the lineterm argument 
546  to `""` so that the output will be uniformly newline free.
547  
548  The context diff format normally has a header for filenames and modification
549  times. Any or all of these may be specified using strings for *fromfile*, 
550  *tofile*, *fromfiledate*, and *tofiledate*. 
551  The modification times are normally expressed in the ISO 8601 format.
552  If not specified, the strings default to blanks.
553  
554  ```js
555  >>> difflib.unifiedDiff('one two three four'.split(' '),
556  ...                     'zero one tree four'.split(' '), {
557  ...                       fromfile: 'Original'
558  ...                       tofile: 'Current',
559  ...                       fromfiledate: '2005-01-26 23:30:50',
560  ...                       tofiledate: '2010-04-02 10:20:52',
561  ...                       lineterm: ''
562  ...                     })
563  [ '--- Original\t2005-01-26 23:30:50',
564    '+++ Current\t2010-04-02 10:20:52',
565    '@@ -1,4 +1,4 @@',
566    '+zero',
567    ' one',
568    '-two',
569    '-three',
570    '+tree',
571    ' four' ]
572  ```
573  
574  
575  <a name="IS_LINE_JUNK" />
576  ### difflib.**IS\_LINE\_JUNK**(line)
577  
578  Return true for ignorable lines. The line line is ignorable if *line* is 
579  blank or contains a single `'#'`, otherwise it is not ignorable.
580  
581  <a name="IS_CHARACTER_JUNK" />
582  ### difflib.**IS\_CHARACTER\_JUNK**(ch)
583  
584  Return true for ignorable characters. The character *ch* is ignorable if ch
585  is a space or tab, otherwise it is not ignorable. 
586  Used as a default for parameter charjunk in [ndiff()](#ndiff).
587  
588  
589  License
590  -------
591  
592  Ported by Xueqiao Xu &lt;xueqiaoxu@gmail.com&gt;
593  
594  PSF LICENSE AGREEMENT FOR PYTHON 2.7.2
595  
596  1. This LICENSE AGREEMENT is between the Python Software Foundation (“PSF”), and the Individual or Organization (“Licensee”) accessing and otherwise using Python 2.7.2 software in source or binary form and its associated documentation.
597  2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 2.7.2 alone or in any derivative version, provided, however, that PSF’s License Agreement and PSF’s notice of copyright, i.e., “Copyright © 2001-2012 Python Software Foundation; All Rights Reserved” are retained in Python 2.7.2 alone or in any derivative version prepared by Licensee.
598  3. In the event Licensee prepares a derivative work that is based on or incorporates Python 2.7.2 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 2.7.2.
599  4. PSF is making Python 2.7.2 available to Licensee on an “AS IS” basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 2.7.2 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
600  5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 2.7.2 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 2.7.2, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
601  6. This License Agreement will automatically terminate upon a material breach of its terms and conditions.
602  7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party.
603  8. By copying, installing or otherwise using Python 2.7.2, Licensee agrees to be bound by the terms and conditions of this License Agreement.