idnits 2.17.1 

draft-cheney-simplediff-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 23 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 2017) is 2385 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Missing reference section? '1' on line 970 looks like a reference

  -- Missing reference section? '0' on line 970 looks like a reference

  -- Missing reference section? '2' on line 812 looks like a reference

  -- Missing reference section? '4' on line 813 looks like a reference

  -- Missing reference section? '3' on line 811 looks like a reference


     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET-DRAFT
2	Independent Submission                                         A. Cheney
3	Category: Informational                                   prettydiff.com
4	                                                            October 2017

6	                         Simple Diff Algorithm
7	                     draft-cheney-simplediff-00.txt

9	Abstract

11	   This document describes a simplified algorithm for performing
12	   computational comparisons.  The design goal is to achieve an approach
13	   that uses the smallest amount of logic.  Necessary secondary goals
14	   include an approach that is language agnostic, fast, extensible, and
15	   more predictable than other similar algorithms.

17	Copyright Notice

19	   Copyright (c) 2017 IETF Trust and the persons identified as the
20	   document authors.  All rights reserved.

22	   This document is subject to BCP 78 and the IETF Trust's Legal
23	   Provisions Relating to IETF Documents
24	   (http://trustee.ietf.org/license-info) in effect on the date of
25	   publication of this document.  Please review these documents
26	   carefully, as they describe your rights and restrictions with respect
27	   to this document.

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.  Internet-Drafts are working
31	   documents of the Internet Engineering Task Force (IETF).  Note that
32	   other groups may also distribute working documents as
33	   Internet-Drafts.  The list of current Internet-Drafts is at
34	   http://datatracker.ietf.org/drafts/current.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	Comments

43	   Comments are solicited and should be addressed to
44	   info@prettydiff.com.

46	RFC XXXX                Simple Diff Algorithm               October 2017

48	Table of Contents

50	   1. Introduction ....................................................3
51	   2. Understanding how to compare ....................................3
52	      2.1. It is more about the equality than the differences .........3
53	      2.2. Algorithm quality ..........................................4
54	      2.3. Speed ......................................................4
55	      2.4. Output format ..............................................6
56	   3. How the algorithm works at a high level .........................7
57	      3.1. Getting by with a hash map and some counts .................7
58	      3.2. The third and final pass through the data ..................8
59	      3.3. The "fix" function ........................................11
60	      3.4. Done ......................................................16
61	   4. Extensibility ..................................................23
62	   5. Security considerations ........................................23
63	   6. IANA considerations ............................................23
64	   7. References .....................................................23
65	   8. Author's address ...............................................23

67	RFC XXXX                Simple Diff Algorithm               October 2017

69	1. Introduction

71	   There are many comparison algorithms available, which are commonly
72	   referred to as "diff" algorithms after the similarly named Unix
73	   application.  While there are many algorithms already available none
74	   achieve speed, precision, and simplicity in an easily predictable
75	   manner.

77	   The most commonly fault of many diff algorithms is that they solve
78	   for the incorrect problem.  Instead of accomplishing the goal, to
79	   find how two samples differ, they instead will try to solve for a
80	   common computer science problem called "Longest Common Subsequence
81	   (LCS)".  The LCS problem seeks to find the longest blocks of equality
82	   in samples of comparison, which sounds ideal.  Unfortunately, the
83	   execution of this approach is not predictable as it is bound to the
84	   commonality of the compared samples.  Furthermore, it isn't simple or
85	   fast.  Worse still is that this approach is a precision failure in
86	   situations where differences are interspersed across frequent
87	   repetition.

89	2. Understanding how to compare

91	2.1. It is more about the equality than the differences

93	   A good diff algorithm will attempt to identify as much equality as
94	   possible.  Everything else qualifies as differences.  The metric for
95	   quality and precision is a smaller count of captured differences.
96	   The smaller this number the better, presuming differences aren't
97	   escaping undetected.

99	   False negatives, which is allowing differences through without
100	   detection, is really bad.  This is absolute failure in a diff
101	   algorithm.  False positives, which is identifying more differences
102	   than there actually are is also bad, but a false positive is much
103	   better than a false negative.  This means it is safer to report more
104	   differences than are actually present, which is why a higher number
105	   of reported differences is less precise.  Maximum precision is
106	   reporting differences without any false negatives or false positives.

108	   One way to achieve higher precision is to first capture as much
109	   equality as possible.  For everything else that is left over
110	   prioritize unique items first and report these items as differences.
111	   Finally determine the next bit of equality or uniqueness and
112	   everything in between is either a change, an insertion, or a
113	   deletion.

115	RFC XXXX                Simple Diff Algorithm               October 2017

117	2.2. Algorithm quality

119	   The primary priorities when writing this kind of code are execution
120	   speed, precision (as previous described), and code simplicity.  In
121	   most cases precision is the most important factor in code design,
122	   but in some cases speed is more important when processing large input
123	   or a batch of thousands of files.  Simplicity is necessary so that
124	   other people can understand the code and modify it as necessary to
125	   improve upon the design and additional features.  No algorithm is
126	   ever capable of meeting all needs for all use cases, so it is
127	   important that other people can understand the code with minimal
128	   effort.

130	2.3. Speed

132	   Faster execution is the result of a couple of things.  The most
133	   important consideration for making an algorithm faster is to reduce
134	   the number of passes through data where possible.  After that
135	   eliminate all costly operations.  In most programming languages
136	   simple arithmetic and static string comparisons are cheap to execute
137	   particularly if the examined strings aren't changing.  The
138	   theoretical minimum number of data passes is two as you have to read
139	   the contents of each submitted sample.  This proposed algorithm
140	   achieves speed by only taking three complete passes through data and
141	   making all possible effort to never repeat a step or loop iteration.

143	   This approach is linear and predictable where the number of
144	   interations passing through data is computed from the sum of:

146	      * number of iterations from the first sample
147	      * number of iterations from the second sample
148	      * the number of iterations from the smallest of those samples.

150	   Performance of the mentioned approach is heavily based upon key
151	   access in a hash map, which will likely vary by programming language.

153	   Until a way to perform a comparison without reading from the samples
154	   is discovered 2 data passes can be safely assumed as the fastest
155	   possible approach.  Between that and the Simple Diff approach there
156	   are two possibilities for increased performance. The first
157	   possibility is to make decisions and write output immediately upon
158	   reading from the second sample so as to have only two data passes.
159	   The challenge with this approach is that analysis occurs in the
160	   moment of the current loop interation without knowledge of what comes
161	   later in the second sample.  A more practical second performance
162	   possibility is to write a smaller hash map.  Writing a smaller hash
163	   map means moving some of the decision tree up front before a separate
164	   formal analysis phase.  In order for this approach to be viable this
165	   early step in logic must be tiny, non-replicating, and a different
166	   means of iteration must be devised to account for a data store of
167	   unpredictable size.

169	RFC XXXX                Simple Diff Algorithm               October 2017

171	   The proposed algorithm could be faster than the popular Myers' O(ND)
172	   approach.  That may or may not be true and no evidence was provided
173	   either way (conjecture is not evidence).  In terms of experimental
174	   criteria algorithms are themselves largely irrelevant.  More
175	   important is the implementation of those algorithms.  If exercising
176	   the Myers' approach makes fewer decisions and has fewer total data
177	   passes, as described in the previous paragraph, then it likely is
178	   faster.  Pronounced emphasis is applied to the word "total" because
179	   this makes all the difference.

181	   Many prior diff applications don't provide a complete third data
182	   pass.  Instead they provide the minimum two complete data passes over
183	   the samples and various smaller passes as calculated by block moves
184	   and edit distances.  If these various fractional passes are non-
185	   linear, which means starting and stopping in different places
186	   respectively, their performance is less predictable.  If these
187	   fractional passes are non-linear and touch any given data index more
188	   than once they have repetitive logic, and likely are not as fast.  To
189	   affirmatively guarantee superior performance over this Simple Diff
190	   approach there needs to be fewer passes over data, which means no
191	   repetition and a smaller number of iterations.  Predictability
192	   ensures the performance of an application scales roughly
193	   proportionately to the size of the provided samples, where an
194	   unpredictable approach would scale disproportionately (slower over
195	   time).

197	   The approach taken here is fast.  It cannot be argued,
198	   scientifically, it is the fastest ever (or slowest) approach for its
199	   level of accuracy without also writing alternate algorithms into
200	   applications with identical application constraints.  One could argue
201	   this approach is the fastest ever comparative algorithm for its level
202	   of predictability and precision.

204	RFC XXXX                Simple Diff Algorithm               October 2017

206	2.4. Output format

208	   This Simple Diff algorithm inherits its output format from the
209	   application jsdifflib, https://github.com/cemerick/jsdifflib.  The
210	   output is a big array containing child arrays at each index.  The
211	   child arrays each contain 5 indexes in the format:

213	      * type
214	      * start in first sample
215	      * end in first sample
216	      * start in second sample
217	      * end in second sample

219	   There are 4 types defined in the output:

221	      * "equal"   - both array index items are identical at the indexes
222	         compared
223	      * "replace" - changes are present at indexes compared
224	      * "insert"  - the index of the second sample is unique to the
225	         second sample
226	      * "delete"  - the index of the first sample is unique to the first
227	         sample

229	RFC XXXX                Simple Diff Algorithm               October 2017

231	3. How the algorithm works at a high level

233	3.1. Getting by with a hash map and some counts

235	   This algorithm takes after the Paul Heckel algorithm.  First thing is
236	   to transform the string input into arrays.  The most primative way to
237	   do this is to split the input by lines where each line of input is an
238	   index in an array.  Once the two input samples are converted to
239	   arrays they will each have to be examined.

241	   The first loop will walk through an array representing one of the
242	   code samples and make a decision for a hash map which is named
243	   "table" in the following code sample.  The two arrays are simply
244	   named "one" and "two".  Each index of the array, by default is a line
245	   of code, which will serve as a key in the hash map named "table".  If
246	   this key does not exist then it will be added.  The value for each
247	   key is an object with two properites named "one" and "two".  These
248	   properties are simply a count indicating the number of times the key
249	   is encountered in the two arrays. If the key is already present then
250	   we will simply increase the value of properties "one" or "two"
251	   respective of the current array.

253	   Here is an example of this step in code:

255	   do {
256	       if (table[two[b]] === undefined) {
257	           table[two[b]] = {
258	               one: 0,
259	               two: 1
260	           };
261	       } else {
262	           table[two[b]].two += 1;
263	       }
264	       b += 1;
265	   } while (b < lenb);

267	RFC XXXX                Simple Diff Algorithm               October 2017

269	3.2. The third and final pass through the data

271	   Once the table is fully populated with all indexes from both arrays,
272	   "one" and "two",  we need a third and final loop to walk across the
273	   data and make some simple decisions.  Here is this complete step in
274	   code.

276	   do {
277	       c = a;
278	       d = b;
279	       if (one[a] === two[b]) {
280	           equality();
281	       } else if (table[one[a]][1] < 1 && table[two[b]][0] < 1) {
282	           replaceUniques();
283	       } else if (table[one[a]][1] < 1 && one[a + 1] !== two[b + 2]) {
284	           deletion();
285	       } else if (table[two[b]][0] < 1 && one[a + 2] !== two[b + 1]) {
286	           insertion();
287	       } else if (
288	           table[one[a]][0] - table[one[a]][1] === 1 &&
289	           one[a + 1] !== two[b + 2]
290	       ) {
291	           deletionStatic();
292	       } else if (
293	           table[two[b]][1] - table[two[b]][0] === 1 &&
294	           one[a + 2] !== two[b + 1]
295	       ) {
296	           insertionStatic();
297	       } else if (one[a + 1] === two[b]) {
298	           deletion();
299	       } else if (one[a] === two[b + 1]) {
300	           insertion();
301	       } else {
302	           replacement();
303	       }
304	       a = a + 1;
305	       b = b + 1;
306	   } while (a < lena && b < lenb);

308	   Before moving any further please understand this logic is fast and
309	   simple, but it isn't precise at all.  We will fix this later with a
310	   child function named "fix".  Corrections for precision are a
311	   secondary step so as to not disrupt performance and also isolate
312	   complexity into a separate single location.

314	   In the above code references "a" and "b" are simply positive integer
315	   incrementors.  The "a" reference is always associated with the "one"
316	   array while the "b" reference is always associated with the "two"
317	   array.  This is necessary so that each array may increment
318	   independently without collision.

320	RFC XXXX                Simple Diff Algorithm               October 2017

322	   The "c" and "d" references are secondary incrementors used as
323	   closures in the child functions. These secondary incrementors allow
324	   for child loops to occur without mutation to the primary
325	   incrementors.

327	   The first thing that happens in this loop is an attempt to discover
328	   equality.  Everything else is less important and so happens later in
329	   the decision tree.

331	   The second rule is to identify items that occur only one more time in
332	   each array.  If the next item to occur in the arrays is not equal but
333	   occurs just once more then we know the item is a unique change.

335	   The third rule is to determine if the current item is a deletion.  If
336	   the current item, and possibly additional following items,
337	   is present in the first sample but no longer exists in the second
338	   sample then it is a dynamic deletion.

340	   The fourth rule is the same as the third rule but in reverse to
341	   determine dynamic insertions.

343	   The fifth rule is to determine if the current index occurs exactly
344	   one more time in the first sample than in the second sample and yet
345	   is not a match for the next item in the first sample with two items
346	   up in the second sample. This is a static deletion, or rather a
347	   deletion of a fixed number of items.

349	   The sixth rule is the same as the fifth rule but in reverse to
350	   determine static insertions.

352	   The seventh and final rule determines that the current items in the
353	   samples are non-unique changes.

355	   The child functions are as follows:

357	   equality = function simpleDiff_equality() {
358	       do {
359	           table[one[c]].one -= 1;
360	           table[one[c]].two -= 1;
361	           c                 += 1;
362	           d                 += 1;
363	       } while (
364	           c < lena &&
365	           d < lenb &&
366	           one[c] === two[d]
367	        );
368	       fix(["equal", a, c, b, d]);
369	       b = d - 1;
370	       a = c - 1;
371	   },

373	RFC XXXX                Simple Diff Algorithm               October 2017

375	   deletion = function simpleDiff_deletion() {
376	       do {
377	           table[one[c]].one -= 1;
378	           c                 += 1;
379	       } while (
380	           c < lena &&
381	           table[one[c]].two < 1
382	        );
383	       fix(["delete", a, c, -1, -1]);
384	       a = c - 1;
385	       b = d - 1;
386	   },
387	   deletionStatic = function simpleDiff_deletionStatic() {
388	       table[one[a]].one -= 1;
389	       fix([
390	           "delete", a, a + 1,
391	           -1,
392	           -1
393	       ]);
394	       a = c;
395	       b = d - 1;
396	   },
397	   insertion = function simpleDiff_insertion() {
398	       do {
399	           table[two[d]].two -= 1;
400	           d                 += 1;
401	       } while (
402	           d < lenb &&
403	           table[two[d]].one < 1
404	        );
405	       fix(["insert", -1, -1, b, d]);
406	       a = c - 1;
407	       b = d - 1;
408	   },
409	   insertionStatic = function simpleDiff_insertionStatic() {
410	       table[two[b]].two -= 1;
411	       fix([
412	           "insert", -1, -1, b, b + 1
413	       ]);
414	       a = c - 1;
415	       b = d;
416	   },

418	RFC XXXX                Simple Diff Algorithm               October 2017

420	   replacement = function simpleDiff_replacement() {
421	       do {
422	           table[one[c]].one -= 1;
423	           table[two[d]].two -= 1;
424	           c                 += 1;
425	           d                 += 1;
426	       } while (
427	           c < lena &&
428	           d < lenb &&
429	           table[one[c]].two > 0 &&
430	           table[two[d]].one > 0
431	       );
432	       fix(["replace", a, c, b, d]);
433	       a = c - 1;
434	       b = d - 1;
435	   },
436	   replaceUniques = function simpleDiff_replaceUniques() {
437	       do {
438	           table[one[c]].one -= 1;
439	           c                 += 1;
440	           d                 += 1;
441	       } while (
442	           c < lena &&
443	           d < lenb &&
444	           table[one[c]].two < 1 &&
445	           table[two[d]].one < 1
446	       );
447	       fix(["replace", a, c, b, d]);
448	       a = c - 1;
449	       b = d - 1;
450	   }

452	   It must be noted that most of the these child functions do contain
453	   their own loops, but these loops are not duplicates.  The primary
454	   parent loop, the so called third and final loop, is adjusted forward
455	   by the distance these smaller loops increment so that there is no
456	   repetition or lose of execution speed.

458	3.3. The "fix" function

460	   We can see from the child function code that arrays are generated
461	   which appear to be the format described as the child arrays of the
462	   output.  Instead of pushing this data to the output array directly
463	   it will be passed through a function named "fix" which serves as a
464	   sort of quality control filter.  This function checks for things
465	   like:

467	RFC XXXX                Simple Diff Algorithm               October 2017

469	   * A diff type immediately following the same diff type, which can be
470	     combined into a single child array for the output
471	   * An insertion immediately following a deletion, or vise versa, which
472	     should likely be a replacement
473	   * Whether false positives generated from replacements immediately
474	     following an insertion or deletion can be eliminated

476	   The general idea is to only accept a child array of the output as an
477	   argument and make a decision after comparing it against the previous
478	   child array of the output.  Aside from two narrow edge cases there is
479	   no analysis of the original data.  Think of it like an appeals court
480	   in that no new evidence is allowed, because only the laws are under
481	   scrutiny.

483	   The code for the fix function is as follows:

485	   fix = function simpleDiff_fix(code) {
486	       var prior = codes[codes.length - 1];
487	       if (prior !== undefined) {
488	           if (prior[0] === code[0]) {
489	               if (code[0] === "replace" || code[0] === "equal") {
490	                   prior[2] = code[2];
491	                   prior[4] = code[4];
492	               } else if (code[0] === "delete") {
493	                   prior[2] = code[2];
494	               } else if (code[0] === "insert") {
495	                   prior[4] = code[4];
496	               }
497	               return;
498	           }
499	           if (prior[0] === "insert" && prior[4] - prior[3] === 1) {
500	               if (code[2] - code[1] === 1) {
501	                   if (code[0] === "replace") {
502	                       prior[0] = "replace";
503	                       prior[1] = code[1];
504	                       prior[2] = code[2];
505	                       code[0]  = "insert";
506	                       code[1]  = -1;
507	                       code[2]  = -1;
508	                   } else if (code[0] === "delete") {
509	                       code[0] = "replace";
510	                       code[3] = prior[3];
511	                       code[4] = prior[4];
512	                       codes.pop();
513	                       prior = codes[codes.length - 1];
514	                       if (prior[0] === "replace") {
515	                           prior[2] = code[2];
516	                           prior[4] = code[4];
517	                           return;
518	                       }
519	                   }

521	RFC XXXX                Simple Diff Algorithm               October 2017

523	               } else if (code[0] === "delete") {
524	                   prior[0] = "replace";
525	                   prior[1] = code[1];
526	                   prior[2] = code[1] + 1;
527	                   code[1]  = code[1] + 1;
528	               } else if (code[0] === "replace") {
529	                   prior[0] = "replace";
530	                   prior[1] = code[1];
531	                   prior[2] = code[1] + 1;
532	                   c        = prior[2];
533	                   d        = prior[4];
534	                   return;
535	               }
536	           } else if (
537	               prior[0] === "insert" &&
538	               code[0] === "delete" &&
539	               code[2] - code[1] === 1
540	           ) {
541	               prior[4] = prior[4] - 1;
542	               code[0]  = "replace";
543	               code[3]  = prior[4];
544	               code[4]  = prior[4] + 1;
545	           } else if (
546	               prior[0] === "delete" &&
547	               prior[2] - prior[1] === 1
548	           ) {
549	               if (code[4] - code[3] === 1) {
550	                   if (code[0] === "replace") {
551	                       prior[0] = "replace";
552	                       prior[3] = code[3];
553	                       prior[4] = code[4];
554	                       code[0]  = "delete";
555	                       code[3]  = -1;
556	                       code[4]  = -1;
557	                   } else if (code[0] === "insert") {
558	                       code[0] = "replace";
559	                       code[1] = prior[1];
560	                       code[2] = prior[2];
561	                       codes.pop();
562	                       prior = codes[codes.length - 1];
563	                       if (prior[0] === "replace") {
564	                           prior[2] = code[2];
565	                           prior[4] = code[4];
566	                           return;
567	                       }
568	                   }
569	               } else if (code[0] === "insert") {
570	                   prior[0] = "replace";
571	                   prior[3] = code[3];
572	                   prior[4] = code[3] + 1;
573	                   code[3]  = code[3] + 1;

575	RFC XXXX                Simple Diff Algorithm               October 2017

577	               } else if (code[0] === "replace") {
578	                   prior[0] = "replace";
579	                   prior[3] = code[3];
580	                   prior[4] = code[4] + 1;
581	                   c        = prior[2];
582	                   d        = prior[4];
583	                   return;
584	               }
585	           } else if (
586	               prior[0] === "delete" &&
587	               code[0] === "insert" &&
588	               code[4] - code[3] === 1
589	           ) {
590	               prior[2] = prior[2] - 1;
591	               code[0]  = "replace";
592	               code[1]  = prior[2];
593	               code[2]  = prior[2] + 1;
594	           } else if (prior[0] === "replace") {
595	               if (code[0] === "delete") {
596	                   if (one[code[2] - 1] === two[prior[4] - 1]) {
597	                       if (prior[2] - prior[1] > 1) {
598	                           prior[4] = prior[4] - 1;
599	                       }
600	                       c = c - 1;
601	                       d = d - 1;
602	                       return;
603	                   }
604	                   if (one[code[2]] === two[prior[4] - 1]) {
605	                       if (prior[2] - prior[1] > 1) {
606	                           prior[2]             = prior[2] - 1;
607	                           prior[4]             = prior[4] - 11;
608	                           table[one[c - 1]][0] =
609	                               table[one[c - 1]][0] - 1;
610	                       }
611	                   }
612	               } else if (code[0] === "insert") {
613	                   if (one[prior[2] - 1] === two[code[4] - 1]) {
614	                       if (prior[2] - prior[1] > 1) {
615	                           prior[2] = prior[2] - 1;
616	                       }
617	                       c = c - 1;
618	                       d = d - 1;
619	                       return;
620	                   }
621	                   if (one[code[2] - 1] === two[prior[4]]) {
622	                       if (prior[4] - prior[3] > 1) {
623	                           prior[2]             = prior[2] - 1;
624	                           prior[4]             = prior[4] - 1;
625	                           table[two[d - 1]][1] =
626	                               table[two[d - 1]][1] - 1;
627	                       }

629	RFC XXXX                Simple Diff Algorithm               October 2017

631	                   }
632	               }
633	           }
634	       }
635	       codes.push(code);
636	   };

638	RFC XXXX                Simple Diff Algorithm               October 2017

640	3.4. Done!

642	   The following is the complete algorithm with all parts assembled:

644	   function simpleDiff () {
645	       var table           = {},
646	           lines           = function simpleDiff_lines(str) {
647	               str = str
648	                   .replace(/\r\n/g, "\n")
649	                   .replace(/\r/g, "\n");
650	               return str.split("\n");
651	           },
652	           one             = (typeof options.source === "string")
653	               ? lines(options.source)
654	               : options.source,
655	           two             = (typeof options.diff === "string")
656	               ? lines(options.diff)
657	               : options.diff,
658	           lena            = one.length,
659	           lenb            = two.length,
660	           a               = 0,
661	           b               = 0,
662	           c               = 0,
663	           d               = 0,
664	           codes           = [],
665	           fix             = function simpleDiff_fix(code) {
666	               var prior = codes[codes.length - 1];
667	               if (prior !== undefined) {
668	                   if (prior[0] === code[0]) {
669	                       if (
670	                           code[0] === "replace" ||
671	                           code[0] === "equal"
672	                       ) {
673	                           prior[2] = code[2];
674	                           prior[4] = code[4];
675	                       } else if (code[0] === "delete") {
676	                           prior[2] = code[2];
677	                       } else if (code[0] === "insert") {
678	                           prior[4] = code[4];
679	                       }
680	                       return;
681	                   }
682	                   if (
683	                       prior[0] === "insert" &&
684	                       prior[4] - prior[3] === 1
685	                   ) {
686	                       if (code[2] - code[1] === 1) {
687	                           if (code[0] === "replace") {
688	                               prior[0] = "replace";
689	                               prior[1] = code[1];
690	                               prior[2] = code[2];

692	RFC XXXX                Simple Diff Algorithm               October 2017

694	                               code[0]  = "insert";
695	                               code[1]  = -1;
696	                               code[2]  = -1;
697	                           } else if (code[0] === "delete") {
698	                               code[0] = "replace";
699	                               code[3] = prior[3];
700	                               code[4] = prior[4];
701	                               codes.pop();
702	                               prior = codes[codes.length - 1];
703	                               if (prior[0] === "replace") {
704	                                   prior[2] = code[2];
705	                                   prior[4] = code[4];
706	                                   return;
707	                               }
708	                           }
709	                       } else if (code[0] === "delete") {
710	                           prior[0] = "replace";
711	                           prior[1] = code[1];
712	                           prior[2] = code[1] + 1;
713	                           code[1]  = code[1] + 1;
714	                       } else if (code[0] === "replace") {
715	                           prior[0] = "replace";
716	                           prior[1] = code[1];
717	                           prior[2] = code[1] + 1;
718	                           c        = prior[2];
719	                           d        = prior[4];
720	                           return;
721	                       }
722	                   } else if (
723	                       prior[0] === "insert" &&
724	                       code[0] === "delete" &&
725	                       code[2] - code[1] === 1
726	                   ) {
727	                       prior[4] = prior[4] - 1;
728	                       code[0]  = "replace";
729	                       code[3]  = prior[4];
730	                       code[4]  = prior[4] + 1;
731	                   } else if (
732	                       prior[0] === "delete" &&
733	                       prior[2] - prior[1] === 1
734	                   ) {
735	                       if (code[4] - code[3] === 1) {
736	                           if (code[0] === "replace") {
737	                               prior[0] = "replace";
738	                               prior[3] = code[3];
739	                               prior[4] = code[4];
740	                               code[0]  = "delete";
741	                               code[3]  = -1;
742	                               code[4]  = -1;
743	                           } else if (code[0] === "insert") {
744	                               code[0] = "replace";

746	RFC XXXX                Simple Diff Algorithm               October 2017

748	                               code[1] = prior[1];
749	                               code[2] = prior[2];
750	                               codes.pop();
751	                               prior = codes[codes.length - 1];
752	                               if (prior[0] === "replace") {
753	                                   prior[2] = code[2];
754	                                   prior[4] = code[4];
755	                                   return;
756	                               }
757	                           }
758	                       } else if (code[0] === "insert") {
759	                           prior[0] = "replace";
760	                           prior[3] = code[3];
761	                           prior[4] = code[3] + 1;
762	                           code[3]  = code[3] + 1;
763	                       } else if (code[0] === "replace") {
764	                           prior[0] = "replace";
765	                           prior[3] = code[3];
766	                           prior[4] = code[4] + 1;
767	                           c        = prior[2];
768	                           d        = prior[4];
769	                           return;
770	                       }
771	                   } else if (
772	                       prior[0] === "delete" &&
773	                       code[0] === "insert" &&
774	                       code[4] - code[3] === 1
775	                   ) {
776	                       prior[2] = prior[2] - 1;
777	                       code[0]  = "replace";
778	                       code[1]  = prior[2];
779	                       code[2]  = prior[2] + 1;
780	                   } else if (prior[0] === "replace") {
781	                       if (code[0] === "delete") {
782	                           if (one[code[2] - 1] === two[prior[4] - 1]) {
783	                               if (prior[2] - prior[1] > 1) {
784	                                   prior[4] = prior[4] - 1;
785	                               }
786	                               c = c - 1;
787	                               d = d - 1;
788	                               return;
789	                           }
790	                           if (one[code[2]] === two[prior[4] - 1]) {
791	                               if (prior[2] - prior[1] > 1) {
792	                                   prior[2]             = prior[2] - 1;
793	                                   prior[4]             = prior[4] - 11;
794	                                   table[one[c - 1]][0] =
795	                                       table[one[c - 1]][0] - 1;
796	                               }
797	                           }
798	                       } else if (code[0] === "insert") {

800	RFC XXXX                Simple Diff Algorithm               October 2017

802	                           if (one[prior[2] - 1] === two[code[4] - 1]) {
803	                               if (prior[2] - prior[1] > 1) {
804	                                   prior[2] = prior[2] - 1;
805	                               }
806	                               c = c - 1;
807	                               d = d - 1;
808	                               return;
809	                           }
810	                           if (one[code[2] - 1] === two[prior[4]]) {
811	                               if (prior[4] - prior[3] > 1) {
812	                                   prior[2]             = prior[2] - 1;
813	                                   prior[4]             = prior[4] - 1;
814	                                   table[two[d - 1]][1] =
815	                                       table[two[d - 1]][1] - 1;
816	                               }
817	                           }
818	                       }
819	                   }
820	               }
821	               codes.push(code);
822	           },
823	           equality        = function simpleDiff_equality() {
824	               do {
825	                   table[one[c]][0] = table[one[c]][0] - 1;
826	                   table[one[c]][1] = table[one[c]][1] - 1;
827	                   c                = c + 1;
828	                   d                = d + 1;
829	               } while (c < lena && d < lenb && one[c] === two[d]);
830	               fix(["equal", a, c, b, d]);
831	               b = d - 1;
832	               a = c - 1;
833	           },
834	           deletion        = function simpleDiff_deletion() {
835	               do {
836	                   table[one[c]][0] = table[one[c]][0] - 1;
837	                   c                = c + 1;
838	               } while (c < lena && table[one[c]][1] < 1);
839	               fix(["delete", a, c, -1, -1]);
840	               a = c - 1;
841	               b = d - 1;
842	           },
843	           deletionStatic  = function simpleDiff_deletionStatic() {
844	               table[one[a]][0] = table[one[a]][0] - 1;
845	               fix([
846	                   "delete", a, a + 1,
847	                   -1,
848	                   -1
849	               ]);
850	               a = c;
851	               b = d - 1;
852	           },

854	RFC XXXX                Simple Diff Algorithm               October 2017

856	           insertion       = function simpleDiff_insertion() {
857	               do {
858	                   table[two[d]][1] = table[two[d]][1] - 1;
859	                   d                = d + 1;
860	               } while (d < lenb && table[two[d]][0] < 1);
861	               fix(["insert", -1, -1, b, d]);
862	               a = c - 1;
863	               b = d - 1;
864	           },
865	           insertionStatic = function simpleDiff_insertionStatic() {
866	               table[two[b]][1] = table[two[b]][1] - 1;
867	               fix([
868	                   "insert", -1, -1, b, b + 1
869	               ]);
870	               a = c - 1;
871	               b = d;
872	           },
873	           replacement     = function simpleDiff_replacement() {
874	               do {
875	                   table[one[c]][0] = table[one[c]][0] - 1;
876	                   table[two[d]][1] = table[two[d]][1] - 1;
877	                   c                = c + 1;
878	                   d                = d + 1;
879	               } while (
880	                   c < lena &&
881	                   d < lenb &&
882	                   table[one[c]][1] > 0 &&
883	                   table[two[d]][0] > 0
884	               );
885	               fix(["replace", a, c, b, d]);
886	               a = c - 1;
887	               b = d - 1;
888	           },
889	           replaceUniques  = function simpleDiff_replaceUniques() {
890	               do {
891	                   table[one[c]][0] = table[one[c]][0] - 1;
892	                   c                = c + 1;
893	                   d                = d + 1;
894	               } while (
895	                   c < lena &&
896	                   d < lenb &&
897	                   table[one[c]][1] < 1 &&
898	                   table[two[d]][0] < 1
899	               );
900	               fix(["replace", a, c, b, d]);
901	               a = c - 1;
902	               b = d - 1;
903	           };

905	       // * First Pass, account for lines from first file
906	       // * build the table from the second file

908	RFC XXXX                Simple Diff Algorithm               October 2017

910	       do {
911	           if (options.diffspaceignore === true) {
912	               two[b] = two[b].replace(/\s+/g, "");
913	           }
914	           if (table[two[b]] === undefined) {
915	               table[two[b]] = [0, 1];
916	           } else {
917	               table[two[b]][1] = table[two[b]][1] + 1;
918	           }
919	           b = b + 1;
920	       } while (b < lenb);

922	       // * Second Pass, account for lines from second file
923	       // * build the table from the first file
924	       lena = one.length;
925	       a    = 0;
926	       do {
927	           if (options.diffspaceignore === true) {
928	               one[a] = one[a].replace(/\s+/g, "");
929	           }
930	           if (table[one[a]] === undefined) {
931	               table[one[a]] = [1, 0];
932	           } else {
933	               table[one[a]][0] = table[one[a]][0] + 1;
934	           }
935	           a = a + 1;
936	       } while (a < lena);
937	       a = 0;
938	       b = 0;

940	       // find all equality... differences are what's left over solve
941	       // only for the second set test removing reverse test removing
942	       // undefined checks for table refs

944	       do {
945	           c = a;
946	           d = b;
947	           if (one[a] === two[b]) {
948	               equality();
949	           } else if (table[one[a]][1] < 1 && table[two[b]][0] < 1) {
950	               replaceUniques();
951	           } else if (
952	               table[one[a]][1] < 1 &&
953	               one[a + 1] !== two[b + 2]
954	           ) {
955	               deletion();
956	           } else if (
957	               table[two[b]][0] < 1 &&
958	               one[a + 2] !== two[b + 1]
959	           ) {
960	               insertion();

962	RFC XXXX                Simple Diff Algorithm               October 2017

964	           } else if (
965	               table[one[a]][0] - table[one[a]][1] === 1 &&
966	               one[a + 1] !== two[b + 2]
967	           ) {
968	               deletionStatic();
969	           } else if (
970	               table[two[b]][1] - table[two[b]][0] === 1 &&
971	               one[a + 2] !== two[b + 1]
972	           ) {
973	               insertionStatic();
974	           } else if (one[a + 1] === two[b]) {
975	               deletion();
976	           } else if (one[a] === two[b + 1]) {
977	               insertion();
978	           } else {
979	               replacement();
980	           }
981	           a = a + 1;
982	           b = b + 1;
983	       } while (a < lena && b < lenb);
984	       if (lena - a === lenb - b) {
985	           if (one[a] === two[b]) {
986	               fix(["equal", a, lena, b, lenb]);
987	           } else {
988	               fix(["replace", a, lena, b, lenb]);
989	           }
990	       } else if (a < lena) {
991	           fix(["delete", a, lena, -1, -1]);
992	       } else if (b < lenb) {
993	           fix(["insert", -1, -1, b, lenb]);
994	       }
995	       return codes;
996	   }

998	RFC XXXX                Simple Diff Algorithm               October 2017

1000	4. Extensibility

1002	   Since the code is simple and its execution is predictable the
1003	   approach is open to modification.  In different contexts a most
1004	   precise comparison of text is not the most desirable objective.  One
1005	   example is to ignore certain particular differences that might be
1006	   considered as unimportant.  This degree of flexibility and
1007	   extensibility allows for custom precision.

1009	   Custom flexibility allows for a variety of advanced concerns.  Such
1010	   concerns may include nuances in accounting for language specific
1011	   criteria whether programming, spoken, or written.  Due to flexible
1012	   rules, whether colloquialisms or syntax variations, differences of
1013	   literal text may commonly reside that are not represented as
1014	   differences in the parsed, or interpreted, result.  In artificially
1015	   intelligent systems it is necessary to account for these variations
1016	   as comparisons occur.

1018	5. Security considerations

1020	   None.

1022	6. IANA considerations

1024	   None.

1026	7. References

1028	   Heckel, P. (April 1978) "A technique for isolating differences
1029	   between files".  Communications of the ACM. 21 (4): 264-268.
1030	   doi:10.1145/359460.359467

1032	   Myers, E. (1986) "An O(ND) Difference Algorithm and Its Variations".
1033	   Retrieved from http://xmailserver.org/diff2.pdf on 3 October 2017.

1035	8. Author's address

1037	   Austin Cheney
1038	   http://prettydiff.com
1039	   info@prettydiff.com

1041	EXPIRATION 6 April 2017