idnits 2.17.1 draft-heikkila-ip-checksum-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. -- The draft header indicates that this document updates RFC1071, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC1624, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC1141, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 233 has weird spacing: '... and c ='...' (Using the creation date from RFC1071, updated by this document, for RFC5378 checks: 1988-09-01) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 5, 2012) is 4190 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'A' is mentioned on line 296, but not defined

  == Missing Reference: 'B' is mentioned on line 296, but not defined

  == Missing Reference: 'C' is mentioned on line 296, but not defined

  == Missing Reference: 'D' is mentioned on line 296, but not defined

  == Missing Reference: 'E' is mentioned on line 284, but not defined

  == Missing Reference: 'F' is mentioned on line 284, but not defined

  == Missing Reference: 'G' is mentioned on line 284, but not defined

  == Missing Reference: 'H' is mentioned on line 284, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 746

  -- Looks like a reference, but probably isn't: '255' on line 333

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)


     Summary: 3 errors (**), 0 flaws (~~), 10 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                                               H. Heikkila
3	
4	Intended Status: Informational
5	Updates: 1071, 1141, 1624 (once approved)
6	Expires: May 9, 2013                                    November 5, 2012

8	           Practical Math and Algorithms for TCP/IP checksum
9	                  

11	Abstract

13	   This document reformulates the definition of TCP/IP checksum.
14	   The new formulation is equivalent to the traditional one, but
15	   it uses much simpler mathematics, avoiding concepts like "one's
16	   complement sum".  This document attempts to be helpful for both
17	   newbies and seasoned engineers when considering checksum
18	   problems.  Practical calculation and software examples are included.

20	Status of this Memo

22	   Distribution of this memo is unlimited.

24	   This Internet-Draft is submitted in full
25	   conformance with the provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-
30	   Drafts.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   The list of current Internet-Drafts can be accessed at
38	   http://www.ietf.org/ietf/1id-abstracts.txt

40	   The list of Internet-Draft Shadow Directories can be accessed at
41	   http://www.ietf.org/shadow.html

43	Copyright and License Notice

45	   Copyright (c) 2012 IETF Trust and the persons identified as the
46	   document authors. All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document. Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document. Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1. Introduction
61	   2. Old Definition of Checksum
62	   3. Mathematical Observations
63	      3.1. Basics
64	      3.2. The Shift Property
65	      3.3. Negation and Subtraction
66	      3.4. Byte Order Independence
67	   4. New Definition of Checksum
68	   5. Examples of Algorithms
69	      5.1. Basic Calculations
70	      5.2. Integer Division versus Folding
71	      5.3. Choice of Data Types
72	      5.4. Efficient Calculation of Sum
73	      5.5. Byte Order Issues
74	      5.6. Incremental Update of Checksum
75	      5.7. The Last Octet
76	   6. Security Considerations
77	   7. IANA Considerations
78	   8. References
79	      8.1. Normative References
80	      8.2. Informative References

82	1. Introduction

84	   RFC 793 defines the TCP checksum briefly as "the 16 bit one's
85	   complement of the one's complement sum of all 16 bit words in the
86	   header and text" [RFC0793].  RFC 1071 contains helpful details and
87	   examples that make the checksum understandable, and mentions the
88	   useful method of "end around carry".  However, it can be argued that
89	   the presentation of RFC 1071 is still more complicated than necessary
90	   [RFC1071].

92	   This document argues that the checksum is best described in terms of
93	   more elementary mathematics, namely remainders modulo 65,535.  It may
94	   be easier to avoid the concept of one's complement sum, which is
95	   nowadays not widely used for computer arithmetic.  This document also
96	   attempts to show that the proposed new description is, in general,
97	   sufficient for practical purposes.  In what follows, the equivalence
98	   of the old and new definition is proved, example calculations are
99	   included to show how the known mathematical properties of the
100	   checksum can be derived from the new definition, and software code
101	   examples are included.

103	2. Old Definition of Checksum

105	   RFC 1071 uses the notation +' for one's complement sum.  Here is a
106	   definition of it:

108	         (Definition of binary operation +')
109	         For any two integers a and b such that 0 <= a,b <= 65,535,

111	         a +' b  =  a+b,           if a+b <= 65,535;            [Eq. 1]

113	                 =  a+b-65,535,    otherwise.

115	   Textually, this definition appears to differ from that in RFC 1071;
116	   but if the reader understands the +' operation in RFC 1071, he or she
117	   can easily verify that the two formulations of +' are equivalent.
118	   RFC 1071 calls this operation "1's complement addition".  But in the
119	   definition of this document, the operands a,b and the result are
120	   definitely non-negative integers (actual 1's complement calculations
121	   are not involved).

123	   Note that this definition uses elementary school mathematics: the
124	   right-hand side of the definition uses just the well-known addition
125	   and subtraction operators.  Overflow or wraparound properties are not
126	   used as such and wraparound of +' is defined explicitly.  Also, the
127	   decimal constant 65,535 is chosen (instead of the hexadecimal
128	   constant ffff) to emphasize the elementary character of the proposed
129	   mathematical operations.

131	   Operation +' wraps around to prevent results larger than 65,535, but
132	   the wraparound is slightly different from ordinary addition modulo
133	   65,535.  The following figure illustrates the wraparound properties
134	   of operator +' and compares it to the normal wraparound modulo 65,535
135	   (wrap-to-one versus wrap-to-zero).  Of course, both wrap methods
136	   differ from the familiar 16-bit register addition modulo 2^16, since
137	   65,535 = 2^16-1.

139	                +----------  wraparound of +'  ----------+
140	                |                                        |
141	                v                                        |
142	           -> 0 -> 1 -> 2 -> . . . -> 65,534 -> 65,535 ->+     [Fig. 2]
143	           ^                                  |
144	           |                                  |
145	           +---  wraparound modulo 65,535  ---+

147	   Following RFC 1071 very closely, we define IP checksum as follows:

149	     (1) Data consists of octets, which are paired to form 16-bit
150	         unsigned integers, with most significant octet first (network
151	         byte order).  Within the data, there is a field for 16-bit
152	         checksum, in an even offset location.  If the total number of
153	         octets is odd, an additional zeroed octet is added to the end
154	         of the octet string for checksum calculation and removed after
155	         the calculation.

157	     (2) To generate a checksum, the checksum field itself is cleared,
158	         the sum B is computed over the octets concerned using operation
159	         +', and the value (65,535-B) is placed in the checksum field.
160	         (Note that 65,535-B means just the elementary subtraction.)

162	     (3) To check a checksum, the sum B is computed over the octets
163	         concerned using operation +'. If the result is 65,535, the
164	         check succeeds.

166	   Note some differences to RFC 1071:  We have 16-bit *unsigned*
167	   integers, while RFC 1071 has just "integers" (probably to make it
168	   possible to interpret these 16 bits as a signed integer with one's
169	   complement representation).  Instead of (65,535-B), RFC 1071 uses bit
170	   complementation, but the reader can easily see that the results are
171	   the same.  Thus, the definition presented above is equivalent to that
172	   in RFC 1071.

174	   For completeness, we add a fourth rule, mandated by [RFC0768] and
175	   [RFC2460]:

177	     (4) To generate a checksum for UDP header, calculate as above, but
178	         if the resulting checksum is 0, place 65,535 instead in the
179	         checksum field.

181	   One observation is still in order.  If some data consists of nothing
182	   but octets where all bits are cleared, we call the data "zeroed".
183	   Real protocols never send such data; for example, IPv4 header starts
184	   with version number 4 and so IPv4 header checksum is never calculated
185	   over zeroed data [RFC0791].  However, the algorithm may still face
186	   zeroed data, as errors are possible.  The following facts are
187	   notable.

189	   o  If checksum is generated over zeroed data, the checksum will be
190	      65,535.  After that checksum has been placed to the checksum
191	      field, the resulting data will be acceptable to the checking
192	      algorithm.

194	   o  If checksum checking is done over zeroed data (meaning that even
195	      the checksum is zeroed), the checking will reject the data.  (This
196	      is appropriate, because no reasonable protocol would send such
197	      data anyway.)

199	   In what follows, we occasionally need to consider three special cases
200	   for checksum calculation: UDP data, zeroed data with valid checksum
201	   65,535, and zeroed data with invalid checksum 0.  (This is slightly
202	   annoying, because without these cases the checksum analysis would be
203	   easier.)

205	3. Mathematical Observations

207	   This section contains basic mathematical analysis of the checksum
208	   algorithm.  While this section is not really difficult, the reader
209	   may anyway choose to skip it.  The later part of this document uses
210	   only simple and well-known concepts like division with remainder and
211	   bit shift.

213	3.1. Basics

215	   We use the symbol % for remainder division, in accordance with the C
216	   programming language.  While our divisors are positive, we allow the
217	   dividends to be negative.  The remainder is always positive; for
218	   example, (-2) % 65,535 = 65,533.  Computers are known to be
219	   unpredictable with division of negative integers; so we avoid such
220	   divisions in algorithms.

222	   As expected, we need the concept of "congruence modulo M"; we use
223	   symbol =' for the congruence relation.  Thus
224	                              a =' b   (mod M)

226	   means simply that a and b have the same remainder when divided by M.
227	   Equivalently, this means that the difference a-b is an integral
228	   multiple of M.  For checksum, the modulus M is always 65,535, and we
229	   omit the (mod M) from the notation, from now on.

231	   One well-known property of congruence is this.  If

233	                      a =' b    and   c =' d

235	   then

237	      a+c =' b+d     and     a-c =' b-d     and     ac =' bd.   [Eq. 3]

239	   This fact makes it possible to calculate with congruences almost as
240	   easily as with equalities.

242	   Obviously a+b =' a+b-65,535, so that definition [Eq. 1] yields this:

244	      Fact.  a+'b =' a+b.

246	      Corollary.  If B is the sum calculated with +' over some 16-bit
247	      unsigned integers and S is the sum calculated with ordinary
248	      addition over the same data, then B =' S.

250	   According to definition item (3), the sum with +' yields 65,535 when
251	   calculated over correctly checksummed data; since 65,535 =' 0, we
252	   see:

254	      Corollary.  If S is the sum calculated with ordinary addition over
255	      some data that has correct checksum, S =' 0; equivalently, S is
256	      divisible by 65,535.
257	                                                                 [Eq. 4]

259	   Now we see that [Eq. 4] is an almost complete definition of the
260	   checksum, without referring to one's complement sum.  From it, we can
261	   derive both generation and checking algorithms for checksum, with the
262	   following open issues: for generating a checksum, [Eq. 4] does not
263	   tell whether 0 or 65,535 should be chosen if both are possible; for
264	   checking a checksum, [Eq. 4] considers both 0 and 65,535 equal as
265	   checksums, while sometimes they are not.

267	3.2. The Shift Property

269	   Obviously, 65,536 =' 1 and 65,536 = 2^16.  From this we see that for
270	   any integer a, we have a*2^16 =' a.  But for non-negative integers,
271	   multiplication by 2^16 means bit shifting to the left by 16 bits.

273	   Using the C notation for bit shift (left shift is <<, right shift is
274	   >>), we have the following basic property for any non-negative
275	   integer a:

277	                                a << 16  =' a.                  [Eq. 5]

279	   As a first application of this property, consider four-octet
280	   integers, using the notation of RFC 1071, sec. 2(C).

282	       [A,B,C,D] + [E,F,G,H] =  [A,B,0,0] + [C,D] + [E,F,0,0] + [G,H]
283	                             =  ([A,B]<<16) + [C,D] + ([E,F]<<16) + [G,H]
284	                             =' [A,B] + [C,D] + [E,F] + [G,H]

286	   This is what RFC 1071 calls "parallel summation": we can accumulate
287	   the sum in 32-bit pieces.  The resulting block sum is not the same as
288	   the real block sum, but still congruent to it.  It is fairly easy to
289	   see that we can do parallel summation even for eight bytes, if we
290	   have 64-bit arithmetic support in hardware.

292	   The next trick is what RFC1071 calls "folding"; long integers can be
293	   shortened with bit shifting.

295	       [A,B,C,D]  =  [A,B,0,0] + [C,D]  =  ([A,B]<<16) + [C,D]
296	                                        =' [A,B] + [C,D]

298	3.3. Negation and Subtraction

300	   Negation and subtraction are easy to do with unsigned (non-negative)
301	   numbers if the modulus 65,535 is first added to the operands
302	   sufficiently many times.  To manipulate the negative of A, first find
303	   some N such that N*65,535 >= A; then (-A) =' N*65,535-A, and the
304	   latter value is non-negative but still congruent to the
305	   mathematically correct value.

307	   Similarly, to do subtraction A-B, substitute (N*65,535+A-B), where N
308	   is chosen so that the result is positive and calculations do not
309	   overflow.

311	3.4. Byte Order Independence

313	   We use the notation [a,b] = a*256+b for a 16-bit integer that
314	   consists of two octets a and b, as RFC 1071.  If we multiply [a,b] by
315	   256, we can calculate:

317	         256*[a,b]  =  256*(a*256+b) = a*256*256 + b*256
318	                    =  a*2^16 + b*256 = b*256 + (a << 16)
319	                    =' b*256 + a.

321	   From this we see: 256*[a,b] =' [b,a] and 256*[b,a] =' [a,b].  Byte
322	   swapping is thus equivalent to multiplying by 256, modulo M.  But
323	   multiplication is distributive over addition, so that

325	       256*([a,b]+[c,d]+ ... + [y,z])
326	             =  256*[a,b]+256*[c,d]+ ... +256*[y,z]
327	             =' [b,a] + [d,c] + ... + [z,y]

329	   From this we see that if we deliberately swap the bytes of terms of a
330	   sum, calculate the sum, and then swap the bytes back, we get a result
331	   that is (of course) in general not identical to the correct sum but
332	   at least congruent to it modulo 65,535.  And in checksum calculation
333	   the critical results 0 = [0,0] and 65,535 = [255,255] are immune to
334	   swapping.  So the congruence method yields rather elegantly the well
335	   known property of IP checksum: it can be calculated with inverted
336	   byte order.

338	4. New Definition of Checksum

340	   Here is a modified checksum definition.  The mathematical
341	   considerations above show that it is equivalent to the traditional
342	   one.

344	   We define the *blocksum* over a set of 16-bit unsigned integers to be
345	   the arithmetic sum of these integers.

347	     (1) Data consists of octets, which are paired to form 16-bit
348	         unsigned integers, with most significant octet first (network
349	         byte order).  Within the data, there is a field for 16-bit
350	         checksum, in an even offset location.  If the total number of
351	         octets is odd, an additional zeroed octet is added to the end
352	         of the octet string for checksum calculation and removed after
353	         the calculation.

355	     (2) To generate a checksum, the checksum field itself is cleared,
356	         the blocksum B is computed over the octets concerned, then B is
357	         divided by 65,535, yielding remainder R.  The value C is placed
358	         in the checksum field, computed as follows.

360	          (2a) If the blocksum B is zero, let C = 65,535.  (This is the
361	               theoretical case of zeroed data.)

363	          (2b) Otherwise, if R = 0 and the checksum is for UDP header,
364	               let C = 65,535 [RFC0768] [RFC2460].

366	          (2c) Otherwise, if R = 0, let C = 0.

368	          (2d) Otherwise, let C = 65,535 - R.

370	     (3) To check a checksum, the blocksum B is computed over the octets
371	         concerned.  If the result B is non-zero and divisible by
372	         65,535, the check succeeds.

374	   Note for the UDP case: UDP protocol examines the checksum field
375	   before applying algorithm (3); if the checksum is zero, (3) is not
376	   applied at all, as there is no checksum.  (Instead, the zero-checksum
377	   data is either accepted as in [RFC0768] or rejected as in [RFC2460].)

379	   Note also that the definition above is formulated to be exactly
380	   identical to that in RFC 1071.  For example, if IPv4 header contains
381	   checksum 65,535 (hex-ffff), algorithm (3) accepts it (as RFC 1071
382	   would do) although algorithm (2) would not produce such a checksum
383	   except in case (2b).  Also, in algorithm (3) the condition "B is non-
384	   zero" is usually unnecessary, because B=0 implies that all data is
385	   zeroed, and reasonable protocols reject such data on other grounds
386	   (for example, IPv4 header is rejected if its protocol version is not
387	   4).

389	5. Examples of Algorithms

391	   All examples use the C programming language.  Naturally, identical
392	   algorithms can be implemented in other software languages and even in
393	   hardware.

395	5.1. Basic Calculations

397	   When calculating checksum, the basic method is just to add together
398	   the 16-bit unsigned integers over the data, accumulating the sum in a
399	   32-bit variable.  The following example is taken from RFC 1071 (where
400	   it appears in a slightly more complicated form):

402	         register unsigned long sum = 0;
403	         unsigned short *addr = . .;
404	         int count = . . /* octet count */

406	         while (count > 1) {
407	            sum += *addr++;
408	            count -= 2;
409	         }

411	   If checksum is to be generated, the blocksum is calculated as above,
412	   after ensuring that the checksum field is cleared.  The basic method
413	   to calculate checksum is then:

415	         checksum = ((N*65535) - sum) % 65535;

417	   (This calculates the remainder of the negative of sum, but the
418	   addition of N*65535 with suitably large N guarantees that dividend is
419	   positive without changing the result.  For example, N can be 65537.)
420	   But if the checksum is calculated for UDP header, the algorithm is
421	   this:

423	         udp_checksum = 65535 - (sum % 65535);

425	   To check the checksum of some received data, blocksum is calculated
426	   as above, and verification can be done like this:

428	         if (sum % 65535 != 0) goto checksum_error;

430	   (Pedantically, the check should be "if (sum == 0 || sum % 65535 !=
431	   0)", but usually the sum is not zero, and if it is, other parts of
432	   the software reject the packet anyway.)

434	   In practice, many alterations of the algorithm are possible and often
435	   necessary, as follows.

437	      o  If hardware support for efficient remainder division is not
438	         available, some alternative algorithm is needed.

440	      o  The summing algorithm usually needs optimizations (as
441	         emphasized already in RFC 1071).

443	      o  The summing can be done in pieces (usually pieces of even
444	         offset and, except for the last piece, even size), and pieces
445	         can be summed in any order.

447	      o  Computer byte order needs to be taken into account (little-
448	         endian machines tend to reverse the byte order when doing
449	         arithmetic operations).

451	      o  The last odd octet, if any, needs special attention.

453	      o  We need algorithms for incremental update of checksum.

455	   The following sections consider these points.  We start with some
456	   preliminary remarks.

458	   The checksum algorithm is such that usually only the remainder of the
459	   sum is significant (when divided by 65,535).  There are several
460	   operations that change the mathematical value but preserve the
461	   remainder.  Such operations can be used during the course of
462	   calculation.  (The mathematical section above proves the validity of
463	   these operations.)

465	      o  Any multiple of 65,535 can be added or subtracted, as long as
466	         there is no overflow or underflow.  In particular, 0xffffffff
467	         (the largest unsigned 32-bit integer) is a multiple of 65,535.

469	      o  Remainder division can be done:  sum = sum % 65535.

471	      o  A non-negative value can be bit-shifted left by 16 bits: sum =
472	         sum << 16.

474	      o  A non-negative value can be folded by shifting the most-
475	         significant bits right by 16 and adding to the rest of the
476	         number, for example:

478	               0x123456789ab =  0x12345670000 + 0x89ab
479	                             =  (0x1234567<<16) + 0x89ab
480	                             -> change to 0x1234567 + 0x89ab

482	   In C language, we can define folding essentially as in RFC 1071:

484	               #define FOLD(sum)  (((sum) & 0xffff) + ((sum) >> 16))

486	      o  In assembly programming, the carry bit that overflows after
487	         unsigned 16-, 32-, or 64-bit addition can be shifted to the
488	         right and added back to the sum ("end around carry" [RFC1071]).

490	5.2. Integer Division versus Folding

492	   The algorithms, as defined, use the % operator, the remainder
493	   division.  Of course it is not available in all hardware
494	   implementations; on the other hand, even relatively simple
495	   microprocessors like the 16-bit Intel 8086 can divide a 32-bit
496	   integer by 65,535 and find the remainder with one single machine
497	   instruction (DIV).  Regarding efficiency, consider the example of
498	   Intel386SX (developed in 1980's); in 32-bit mode it can execute DIV
499	   instruction, nominally, in 38 clock cycles [386SX].  This should be
500	   contrasted with the masking, shifting and addition operations that
501	   would be the division-free alternative (folding); they might together
502	   take some ten clock cycles.

504	   Also, divisions would not be frequently repeated anyway.  Processing
505	   a typical incoming TCP segment might require exactly two division
506	   operations, one to the check the IPv4 header, the other to check the
507	   TCP data.  So, compared with other necessary processing, division is
508	   likely to be a feasible alternative in some applications.

510	   But there is an alternative to division, folding (defined above).
511	   This algorithm reduces any positive integer to 16 bits without
512	   changing the remainder (again, it is found in RFC 1071):

514	         while (sum>>16)
515	            sum = FOLD(sum);

517	   But usually sum has no more than 32 bits, in which case only two
518	   foldings are needed:

520	         sum = FOLD(FOLD(sum));

522	   Note that folding never produces zero result (unless its argument is
523	   zero), so folding has wrap-to-one behaviour; see [Fig. 2].  This
524	   changes the algorithms slighly, as folding tends to produce 65,535
525	   when true remainder is zero.  So here are the algorithms for checksum
526	   generation (UDP considered separately) and checking, without
527	   division:

529	         checksum = 65535 - FOLD(FOLD(sum));

531	         udp_checksum = FOLD(FOLD(0xffffffff - sum));

533	         if (FOLD(FOLD(sum) != 65535) goto checksum_error;

535	   In many systems, operations (65535-x) and (0xffffffff-x) can be
536	   easily implemented with bit complement.  Such optimizations are left
537	   to the reader as exercise.

539	5.3. Choice of Data Types

541	   We usually prefer unsigned integers, because division of signed
542	   integers is unpredictable in computers.  (It is worth noting,
543	   however, that signed division can be faster than unsigned division in
544	   some implementations.)

546	   Usually 32-bit integers are suitable for blocksum calculations.  For
547	   example, the maximum amount of data in a TCP/IPv4 segment is less
548	   than 65,535 octets (32,768 octet pairs), which sets an upper limit of
549	   block sum to 32,768*65,535 = 2,147,450,880; this means that blocksums
550	   fit in 32-bit integers, or even in signed 32-bit integers.

552	   However, 16-bit values can be calculated in parallel as 32-bit
553	   values, and then it may be advantageous to use 64-bit values for
554	   blocksums.  See below for parallel calculation.

556	5.4. Efficient Calculation of Sum

558	   As RFC 1071 correctly points out, it is usually appropriate to
559	   optimize the checksum routine; and obviously the reading and summing
560	   loop is *the* most important point to optimize.  Note that while the
561	   examples shown here are in the C programming language, the most
562	   optimal solution is usually achieved with assembly language.

564	   Data sum should be calculated in the natural byte order, as in
565	   example above: "sum += *addr".  This means that little-endian
566	   machines swap bytes, e.g., data that the network signifies as 0x1234
567	   is summed as 0x3412.  As known, and shown in the mathematical section
568	   above, this leads to correct results, if the calculated checksum is
569	   written to its field in the same natural manner: "*addr = checksum".
570	   Also, the important values 0 and 65,535 are immune to byte order.
571	   However, see section 5.5 below.

573	   It may be advantageous to read data in bigger pieces, 32 bits (or
574	   even 64 bits) as follows.  However, addition must not overflow, which
575	   is why folding is used in this example:

577	         register unsigned long sum = 0;
578	         unsigned long *addr = . .; /* long assumed to have 32 bits */
579	         int count = . . /* octet count */

581	         while (count > 3) {
582	            sum += FOLD(*addr++);
583	            count -= 4;
584	         }

586	   But if the sum counter has 64 bits, folding as above is not
587	   necessary.

589	   As a curiosity, consider the following case, which assumes that sum
590	   is a 17-bit (sic!) integer and addr is a pointer to 16-bit integer:

592	         while ( . . . ) {
593	            sum += *addr++;
594	            sum = FOLD(sum);
595	            count -= 2;
596	        }

598	   The case of a "17-bit" integer may sound theoretical, but actually
599	   this is precisely what 16-bit processors often do: the sum is
600	   accumulated to a 16-bit register, which, together with the "carry"
601	   bit, is effectively a 17-bit counter.  The FOLD operation corresponds
602	   to what RFC 1071 calls "end around carry".

604	   Often blocksum is calculated in pieces, and sometimes the start of
605	   some piece is not at an even offset from the beginning of data.  Such
606	   calculation swaps the octets in the sum; but this can be corrected so
607	   that the blocksum is swapped again before incorporating to the final
608	   sum (as noted in RFC 1071).

610	5.5. Byte Order Issues

612	   As explained above, byte order should usually be ignored in software
613	   algorithms, permitting the processor to move data from memory to
614	   calculation as efficiently as possible.  However, this means that
615	   checksum data is in network byte order while ordinary data is in host
616	   byte order.  Then the standard operations "htons()" (host-to-network
617	   swap, short variable) and "ntohs()" (network-to-host) should be used
618	   as appropriate [POSIX].  Consider the case of IPv4 header, where the
619	   TTL field is decremented by 1 at every router, and this means that
620	   one element in the blocksum is decremented by 0x0100.  In order not
621	   to recalculate the whole header checksum, the checksum is simply
622	   incremented by 0x0100, but this value must be modified as follows:

624	      new_chksum = ((unsigned long)old_chksum + htons(0x0100)) % 65535;

626	   Here, "htons()" is necessary because computers keep arithmetic
627	   constants in host byte order.  (The pedantic typecast "(unsigned
628	   long)" is a C technicality to force the machine to use at least 32
629	   bits during calculation to prevent overflow, even if old_chksum is a
630	   16-bit integer.  Here, the typecast is shown also with a
631	   documentation purpose, to emphasize that more than 16 bits are indeed
632	   necessary during the calculation.)

634	   Note: RFC 1141 and RFC 1624 consider a similar problem, but they
635	   emphasize hardware optimization (use bit operations instead of
636	   division), and do not mention byte order [RFC1141] [RFC1624].

638	5.6. Incremental Update of Checksum

640	   Incremental update of checksum is an old idea; it is addressed in
641	   [RFC1141] and [RFC1624], and also section 5.5 above uses one case of
642	   it as an example.  This section contains a new example, ICMPv6 Echo
643	   Reply message, to illustrate the method of incremental update.

645	   RFC 4443 defines the important Echo Request message, to which a host
646	   replies with Echo Reply (ping6).  Suppose a host has received a
647	   Request message; naturally, it has to verify the checksum of that
648	   message before accepting it.  Before sending the Reply message, the
649	   host has to calculate also the checksum over the reply.  But since
650	   the Request and Reply are almost identical, some CPU time can be
651	   saved so that the Reply checksum is calculated from the Request
652	   checksum.  Inspection of RFC 4443 shows that the checksums of Request
653	   and Reply differ only with respect to IPv6 addresses (in pseudo-
654	   header) and ICMPv6 type (and the checksum itself).  So the methods of
655	   incremental update are well applicable to the Echo Reply case.  We
656	   limit our consideration to the IPv6 address case; the update of
657	   ICMPv6 type from 128 to 129 is an easy exercise to the reader.

659	   In some cases, the IPv6 addresses of Request and Reply are the same,
660	   just so that the source and destination are swapped; and the order of
661	   addresses does not, of course, change the checksum.  But if the
662	   destination address of the Request is a multicast or anycast address,
663	   the source address of the Reply must be a unicast address; see
664	   [RFC4443], Sec. 4.2.

666	   In what follows, we imitate the notation of RFC 1624, using ordinary
667	   addition and subtraction instead of +' and bit complement.  We
668	   occasionally use the congruence operation =' (modulo 65,535).

670	             HC  - old checksum in ICMPv6 header
671	             C   - blocksum of old data
672	             HC' - new checksum in ICMPv6 header (ICMPv6 type still
673	                   unchanged)
674	             C'  - blocksum of new data (ICMPv6 type still unchanged)
675	             m   - blocksum over the 16-octet old address (multicast
676	                   or anycast)
677	             m'  - blocksum over the 16-octet new address (unicast)

679	   Then, if the blocksum of the unchanged part of the data is A, we
680	   have:

682	             A + m  +  HC  = C
683	             A + m' +  HC' = C'

685	   Subtracting the two equations we get this:

687	            HC' - HC + m' - m = C' - C

689	   Although C and C' need not be identical, they are the same modulo
690	   65,535 (in fact, C =' C' =' 0, if the checksum is correctly
691	   calculated), so that C' - C =' 0.  Then, doing elementary
692	   manipulations, we get these facts, analogously to RFC 1642:

694	            HC' =' HC + m - m'
695	            HC' =' -(m' - m - HC)

697	   Both of these formulas can be used.  The former formula is suitable
698	   if we use division, and the latter one if we use folding.  (RFC 1642
699	   does not consider division and so prefers the latter formula.)  We
700	   obtain the following C code (two alternatives):

702	      new_chk = (9*65535 + old_chk
703	                 + oldaddr_sum - newaddr_sum) % 65535;

705	      new_chk = 65535 - FOLD(FOLD(9*65535 + newaddr_sum
706	                                  - oldaddr_sum - old_chk));

708	   The constant 9*65535 is chosen small enough so that 32-bit addition
709	   does not overflow but large enough so that subtraction cannot make
710	   the result negative.  Note that a sum over a 16-bit IPv6 address
711	   cannot exceed 8*65,535 and a checksum cannot exceed 65,535, hence the
712	   constant 9.

714	   A final note: if we were to update UDP checksum incrementally, we
715	   would need a checksum in the range 1...65,535, and so the following C
716	   code would be appropriate (two alternatives, where old_ds and new_ds
717	   are sum over old data and sum over new data, respectively):

719	      n_udp_chk = 65535 -
720	                  (N*65535 + new_ds - old_ds - o_udp_chk) % 65535;

722	      n_udp_chk = FOLD(FOLD(N*65535 + o_udp_chk + old_ds - new_ds));

724	   Also in this case, N must be chosen according to circumstances so
725	   that addition and subtraction cannot lead to overflow or underflow in
726	   32-bit arithmetic.

728	   Note that byte order is not an issue in this example, provided that
729	   all calculations are done with the same byte order (the octets in
730	   IPv6 addresses are calculated with the same algorithm as all other
731	   octets in the data).

733	5.7. The Last Octet

735	   If checksum is calculated over an odd number of octets, the last
736	   octet is alone and needs a zeroed octet to form a 16-bit unsigned
737	   integer.  This is easy, if byte order is taken into account.  Here
738	   are two ways to add the last octet to a sum:

740	      {
741	         unsigned short tmp = 0;
742	         ((unsigned char*)&tmp)[0] = ((unsigned char*)last_addr)[0];
743	         sum += tmp;
744	      }

746	      sum += htons(((unsigned char*)last_addr)[0] << 8);

748	6.  Security Considerations

750	   There are no security considerations relevant to this document.

752	7.  IANA Considerations

754	   No actions are required from IANA as result of the publication of
755	   this document.

757	8.  References

759	8.1.  Normative References

761	   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
762	              August 1980.

764	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September
765	              1981.

767	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
768	              793, September 1981.

770	   [RFC1071]  Braden, R., Borman, D., and C. Partridge, "Computing the
771	              Internet checksum", RFC 1071, September 1988.

773	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
774	              (IPv6) Specification", RFC 2460, December 1998.

776	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
777	              Control Message Protocol (ICMPv6) for the Internet
778	              Protocol Version 6 (IPv6) Specification", RFC 4443, March
779	              2006.

781	8.2.  Informative References

783	   [POSIX]    The Open Group Base Specifications Issue 7.  IEEE Std
784	              1003.1-2008.

786	   [RFC1141]  Mallory, T. and A. Kullberg, "Incremental updating of the
787	              Internet checksum", RFC 1141, January 1990.

789	   [RFC1624]  Rijsinghani, A., Ed., "Computation of the Internet
790	              Checksum via Incremental Update", RFC 1624, May 1994.

792	   [386SX]    Intel386(TM) SX Microprocessor Programmers's Reference
793	              Manual.  Inter Order No. 240331-002, ISBN 1-55512-154-3,
794	              1991.

796	Authors' Addresses

798	   Heikki Heikkila
799	   Tellabs Oy
800	   Sinimaentie 6 C
801	   02630 Espoo
802	   Finland
803	   EMail: heikki.heikkila@tellabs.com