idnits 2.17.1 draft-krovetz-umac-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1489 has weird spacing: '...dom tag pro...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 2000) is 8595 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'Decimal' on line 490

  -- Looks like a reference, but probably isn't: 'Hexadecimal' on line 490

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  ** Downref: Normative reference to an Informational RFC: RFC 2104 (ref. '9')

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'

  -- Possible downref: Non-RFC (?) normative reference: ref. '11'

  -- Possible downref: Non-RFC (?) normative reference: ref. '12'


     Summary: 5 errors (**), 0 flaws (~~), 2 warnings (==), 16 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IPSec Working Group                                    T. Krovetz, Intel
3	INTERNET-DRAFT                                             J. Black, UNR
4	Expires April 2001                                        S. Halevi, IBM
5	                                                A. Hevia, U.C. San Diego
6	                                                   H. Krawczyk, Technion
7	                                                  P. Rogaway, U.C. Davis
8	                                                            October 2000

10	       UMAC: Message Authentication Code using Universal Hashing
11	                      

13	Status of this Memo

15	   This document is an Internet-Draft and is in full conformance with
16	   all provisions of Section 10 of RFC2026.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	Abstract

33	   This specification describes how to generate an authentication tag
34	   (also called a "MAC") using the UMAC message authentication code.
35	   UMAC is designed to be very fast to compute, in software, on
36	   contemporary processors.  Measured speeds are as low as 1.0 cycles
37	   per byte.  The heart of UMAC is a universal hash function, UHASH,
38	   which relies on addition and multiplication of 16-bit, 32-bit, or
39	   64-bit numbers, operations well-supported by contemporary machines.

41	   To generate the authentication tag on a given message, UHASH is
42	   applied to the message and key to produce a short, fixed-length, hash
43	   value, and this hash value is then XOR-ed with a key-derived
44	   pseudorandom pad.  UMAC enjoys a rigorous security analysis and its
45	   only "cryptographic" use is a block cipher, AES, to generate the
46	   pseudorandom pads and internal key material.

48	                           Table of Contents

50	1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . .   2
51	   1.1  Organization . . . . . . . . . . . . . . . . . . . . . . . .   5
52	2  Named parameter sets: UMAC16 and UMAC32 . . . . . . . . . . . . .   5
53	   2.1  Named parameters . . . . . . . . . . . . . . . . . . . . . .   5
54	   2.2  Alternative instantiations . . . . . . . . . . . . . . . . .   6
55	   2.3  Naming convention  . . . . . . . . . . . . . . . . . . . . .   7
56	3  Notation and basic operations . . . . . . . . . . . . . . . . . .   7
57	   3.1  Operations on strings  . . . . . . . . . . . . . . . . . . .   8
58	   3.2  Operations on integers . . . . . . . . . . . . . . . . . . .  10
59	   3.3  String-Integer conversion operations . . . . . . . . . . . .  10
60	   3.4  Mathematical operations on strings . . . . . . . . . . . . .  11
61	4  Key and pad derivation functions  . . . . . . . . . . . . . . . .  12
62	   4.1  KDF: Key derivation function . . . . . . . . . . . . . . . .  12
63	   4.2  PDF: Pad-derivation function . . . . . . . . . . . . . . . .  13
64	5  UHASH-32: Universal hash function for a 32-bit word size  . . . .  15
65	   5.1  NH-32: NH hashing with a 32-bit word size  . . . . . . . . .  16
66	   5.2  L1-HASH-32: First-layer hash . . . . . . . . . . . . . . . .  17
67	   5.3  POLY: Polynomial hash  . . . . . . . . . . . . . . . . . . .  19
68	   5.4  L2-HASH-32: Second-layer hash  . . . . . . . . . . . . . . .  20
69	   5.5  L3-HASH-32: Third-layer hash . . . . . . . . . . . . . . . .  22
70	   5.6  UHASH-32: Three-layer universal hash . . . . . . . . . . . .  23
71	6  UHASH-16: Universal hash function for a 16-bit word size  . . . .  24
72	   6.1  NH-16: NH hashing with a 16-bit word size  . . . . . . . . .  24
73	   6.2  L1-HASH-16: First-layer hash . . . . . . . . . . . . . . . .  25
74	   6.3  L2-HASH-16: Second-layer hash  . . . . . . . . . . . . . . .  27
75	   6.4  L3-HASH-16: Third-layer hash . . . . . . . . . . . . . . . .  28
76	   6.5  UHASH-16: Three-layer universal hash . . . . . . . . . . . .  29
77	7  UMAC tag generation . . . . . . . . . . . . . . . . . . . . . . .  30
78	   7.1  Interface  . . . . . . . . . . . . . . . . . . . . . . . . .  30
79	   7.2  Algorithm  . . . . . . . . . . . . . . . . . . . . . . . . .  30
80	8  Security considerations . . . . . . . . . . . . . . . . . . . . .  31
81	   8.1  Resistance to cryptanalysis  . . . . . . . . . . . . . . . .  31
82	   8.2  Tag lengths and forging probability  . . . . . . . . . . . .  31
83	   8.3  Selective-assurance authentication . . . . . . . . . . . . .  33
84	   8.4  Nonce considerations . . . . . . . . . . . . . . . . . . . .  34
85	   8.5  Guarding against replay attacks  . . . . . . . . . . . . . .  35
86	9  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .  36
87	10 References  . . . . . . . . . . . . . . . . . . . . . . . . . . .  36
88	11 Author contact information  . . . . . . . . . . . . . . . . . . .  37
89	A  Suggested application programming interface (API) . . . . . . . .  38
90	B  Reference code and test vectors . . . . . . . . . . . . . . . . .  39
91	1  Introduction

93	   This specification describes how to generate an authentication tag
94	   (also called a "MAC") using the UMAC message authentication code.
95	   Typically the authentication tag will be transmitted along with a
96	   message and a nonce to allow the receiver of the message to verify
97	   the message's authenticity.  Generation and verification of the
98	   authentication tag depends on the message, the nonce, and on a secret
99	   key (typically, shared by sender and receiver).

101	   UMAC is designed to be very fast to compute, in software, on
102	   contemporary processors.  The heart of UMAC is a universal hash
103	   function, UHASH, which relies on addition and multiplication of
104	   16-bit, 32-bit, and 64-bit numbers.  These operations are supported
105	   well by contemporary machines.

107	   For many applications, especially ones with short-lived
108	   authentication needs, sufficient speed is already obtained by
109	   algorithms such as HMAC-SHA1 [2, 9] or the CBC-MAC of a block cipher
110	   [1, 8].  But for the most speed-demanding applications, UMAC may be a
111	   better choice:  An optimized implementation of UMAC can achieve peak
112	   performance which is about an order of magnitude faster than what can
113	   be achieved with HMAC or CBC-MAC.  Moreover, UMAC offers a tradeoff
114	   between forging probability and speed (it is possible to trade
115	   forging probability for speed).  UMAC has been designed so that
116	   computing the prefix of a tag can be done faster than computing the
117	   entire tag.  This feature allows for a receiver to verify the
118	   authenticity of a message to various levels of assurance depending on
119	   its needs and resources.  Finally, UMAC enjoys better analytical
120	   security properties than many other constructions.

122	   Closely associated to this specification are the papers [3, 4, 10,
123	   11].  See those papers for descriptions of the ideas which underlie
124	   this algorithm, for performance data, and for proofs of the
125	   correctness and maximal forging probability of UMAC.

127	   The UMAC algorithms described in the papers [3, 4] are
128	   "parameterized".  This means that various low-level choices, like the
129	   endian convention and the underlying cryptographic primitive, have
130	   not been fixed.  One must choose values for these parameters before
131	   the authentication tag generated by UMAC (for a given message, key,
132	   and nonce) becomes fully-defined.  In this document we provide two
133	   collections of parameter settings, and have named the sets UMAC16 and
134	   UMAC32. The parameter sets have been chosen based on experimentation
135	   and provide good performance on a wide variety of processors.  UMAC16
136	   is designed to excel on processors which provide small-scale SIMD
137	   parallelism of the type found in Intel's MMX and Motorola's AltiVec
138	   instruction sets, while UMAC32 is designed to do well on processors
139	   with good 32- and 64- bit support.  UMAC32 may take advantage of SIMD
140	   parallelism in future processors.

142	   UMAC has been designed to allow implementations which accommodate
143	   "on-line" authentication.  This means that pieces of the message may
144	   be presented to UMAC at different times (but in correct order) and an
145	   on-line implementation will be able to process the message correctly
146	   without the need to buffer more than a few dozen bytes of the
147	   message.  For simplicity, the algorithms in this specification are
148	   presented as if the entire message being authenticated were available
149	   at once.

151	   The ideas which underlie UMAC go back to Wegman and Carter [12].  The
152	   sender and receiver share a secret key (the MAC key) which
153	   determines:

155	   * The key for a "universal hash function".  This hash function is
156	     "non-cryptographic", in the sense that it does not need to have any
157	     cryptographic "hardness" property.  Rather, it needs to satisfy
158	     some combinatorial property, which can be proven to hold without
159	     relying on unproven hardness assumptions.  The concept of a
160	     universal hash function (family) is due to [5].

162	   * The key for a pseudorandom function.  This is where one needs a
163	     cryptographic hardness assumption.  The pseudorandom function may
164	     be obtained (for example) from a block cipher or cryptographic hash
165	     function.  The concept of a pseudorandom function (family) is due
166	     to [6].

168	   To authenticate a message, Msg, one first applies the universal hash
169	   function, resulting in a string which is typically much shorter than
170	   the original message.  The pseudorandom function is applied to a
171	   nonce, and the result is used in the manner of a Vernam cipher: the
172	   authentication tag is the xor of the output from the hash function
173	   and the output from the pseudorandom function.  Thus, an
174	   authentication tag is generated as

176	        AuthTag = f(Nonce) xor h(Msg).

178	   Here f is the pseudorandom function shared between the sender and the
179	   receiver, and h is a universal hash function shared by the sender and
180	   the receiver.  In UMAC, a shared key is used to key the pseudorandom
181	   function f, and then f is used for both tag generation and internally
182	   to generate all of the bits needed by the universal hash function.
183	   For a general discussion of the speed and assurance advantages of
184	   this approach see, for example, [3, 7].

186	   The universal hash function that we use is called UHASH.  It combines
187	   several software-optimized algorithms into a multi-layered structure.
188	   The algorithm is moderately complex.  Some of this complexity comes
189	   from extensive speed optimizations.

191	   For the pseudorandom function we use the block cipher of the Advanced
192	   Encryption Standard (AES).  (At the time of this working draft, the
193	   AES definition process is still in progress.  Here AES refers to the
194	   final blok cipher defined by this process.)  Any block cipher with
195	   the same block-length (128 bits) and key-length (128 bits) could
196	   trivially be substituted in place of what we call AES.  With slightly
197	   more effort one can define UMAC using a pseudorandom function other
198	   than a block cipher.

200	   One unusual feature of UMAC is that authentication-tag generation
201	   depends on a nonce (in addition to depending on the message and key)
202	   It is imperative that the nonce not be reused when generating
203	   authentication tags under the same key.  Thus the nonce will normally
204	   be implemented by a counter, though any other way to achieve a non-
205	   repeating value (or almost certainly non-repeating value) is
206	   acceptable.

208	   This document specifies the procedure for generating the
209	   authentication tag from the message, key and nonce.  The exact way in
210	   which the message, nonce and authentication tag are transmitted
211	   between sender and receiver is not specified here.  It is the
212	   responsibility of the particular applications using UMAC to specify
213	   how the message, nonce and tag are transmitted.  For example, an
214	   application may choose to send the three values concatenated by some
215	   encoding scheme while others may choose not to transmit the nonce at
216	   all if it is known to both parties (e.g., when the nonce is a shared
217	   state used to detect replay of messages), or to send only part of the
218	   bits of the nonce.

220	   Section 8 discusses security considerations that are important for
221	   the proper understanding and use of UMAC.

223	   To the authors' knowledge no ideas utilized in UMAC have been or will
224	   be patented.  To the best of the authors' knowledge, it should be
225	   possible to use UMAC immediately, without any intellectual property
226	   concerns.

228	   Public-domain reference code for UMAC is available from the UMAC
229	   homepage: http://www.cs.ucdavis.edu/~rogaway/umac/ Other information,
230	   like timing data and papers, are distributed from the same URL.

232	1.1  Organization

234	   The rest of this document is organized as follows:  In Section 2
235	   parameters of the named parameter sets UMAC16 and UMAC32 are
236	   described.  In Section 3 we introduce the basic notations used
237	   throughout the rest of the document.  Section 4 describes the methods
238	   used for generating the Vernam pad and the pseudorandom strings
239	   needed internally for hashing.  In Sections 5 and 6 the universal
240	   hash function is described.  Finally, in Section 7 we describe how
241	   all these components fit together in the UMAC construction.  Some
242	   readers may prefer to read sections 4-7 backwards, in order to get a
243	   top-down description.  Section 8 describes some security
244	   considerations in the use of UMAC.

246	2  Named parameter sets: UMAC16 and UMAC32

248	   As described in [3, 4], a concrete instantiation of UMAC requires the
249	   setting of many parameters.  We have chosen two sets of values for
250	   all of these parameters which allow for good performance on a wide
251	   variety of processors.  For maximum performance we offer UMAC16 which
252	   is designed to exploit the vector-parallel instructions on the Intel
253	   MMX and Motorola AltiVec instruction sets.  For good performance on
254	   processors which support 32- and 64-bit quantities well, we offer
255	   UMAC32.

257	2.1  Named parameters

259	   Throughout the algorithms described in this document, we have
260	   integrated most of the parameters required for a concrete UMAC
261	   instantiation as unnamed numeric constants.  However, we have named
262	   six parameters and assign them the following values depending on
263	   whether one wishes to use UMAC16 or UMAC32.

265	                                 UMAC16                 UMAC32
266	                                 ------                 ------
267	      WORD-LEN                        2                      4
268	      UMAC-OUTPUT-LEN                 8                      8
269	      L1-KEY-LEN                   1024                   1024
270	      UMAC-KEY-LEN                   16                     16
271	      ENDIAN-FAVORITE            LITTLE                 LITTLE
272	      L1-OPERATIONS-SIGN         SIGNED               UNSIGNED

274	   Here we give a brief explanation of the role each named parameter
275	   plays.

277	     WORD-LEN:           Specifies the size in bytes of a "word".  UMAC
278	                         will be significantly faster in execution if
279	                         the executing machine supports well certain
280	                         operations on datatypes of this size.  Note
281	                         that WORD-LEN is not necessarily the native
282	                         wordsize of the target machine (and on some
283	                         machines a smaller value turns out to be
284	                         preferable).

286	     UMAC-OUTPUT-LEN:    Specifies the length of the authentication tag
287	                         generated by UMAC, in bytes.

289	     L1-KEY-LEN:         Specifies the "block length," in bytes, on
290	                         which the hash-function initially operates.
291	                         This much storage (and then some) will be
292	                         needed in the run-time environment for UMAC's
293	                         internal keys.

295	     UMAC-KEY-LEN:       Specifies the length in bytes of the user-sup-
296	                         plied UMAC key.

298	     ENDIAN-FAVORITE:    Specifies which endian-orientation will be fol-
299	                         lowed in the reading of data to be hashed.
300	                         This need not be equal to the native endianess
301	                         of any specific machine running UMAC.

303	     L1-OPERATIONS-SIGN: Specifies whether the strings manipulated in
304	                         the hash-function are to be initially consid-
305	                         ered as signed or unsigned integers.

307	2.2  Alternative instantiations

309	   Although this document only specifies two named parameter sets, the
310	   named parameters could be altered to suit specific authentication
311	   needs which are not adequately served by either UMAC16 or UMAC32.
312	   Below, we list alternatives that are supported by this specification
313	   for each of the named parameters.

315	     WORD-LEN           ::= 2 | 4
316	     UMAC-OUTPUT-LEN    ::= 1 | 2 | ... | 31 | 32
317	     L1-KEY-LEN         ::= 32 | 64 | 128 | 256 | ... | 2^28
318	     UMAC-KEY-LEN       ::= 16 | 32
319	     ENDIAN-FAVORITE    ::= BIG | LITTLE
320	     L1-OPERATIONS-SIGN ::= SIGNED | UNSIGNED

322	   Roughly speaking, doubling UMAC-OUTPUT-LEN approximately doubles
323	   execution time and squares (ie. decreases) the probability of MAC
324	   forgery.  Setting ENDIAN-FAVORITE to BIG causes UMAC to perform
325	   better on big-endian processors rather than little-endian processors.
326	   Setting L1-OPERATIONS-SIGN to UNSIGNED slightly increases UMAC
327	   security at the expense of complicating implementations on systems
328	   which do not support unsigned integers well.  This effectively
329	   disallows the use of Intel's MMX instructions which only support
330	   signed integers.  Finally, increasing L1-KEY-LEN tends to speed tag
331	   generation on large messages, but requires more memory for processing
332	   and could potentially slow the processor by overflowing its cache.

334	2.3  Naming convention

336	   A concise shorthand may be used to specify an instance of UMAC.  The
337	   word "UMAC" followed by up to six parameters specifies unambiguously
338	   an instance of UMAC.  If only a prefix of the six parameters are
339	   written, it is implicitly specified that those missing parameters
340	   take on default values listed below. The format of the shorthand is
341	   "UMAC-w/l/n/k/s/e", and the meaning of the letters (and their
342	   defaults) is as follows:

344	       w = WORD-LEN            (4)
345	       l = UMAC-OUTPUT-LEN     (8)
346	       n = L1-KEY-LEN          (1024)
347	       k = UMAC-KEY-LEN        (16)
348	       s = L1-OPERATIONS-SIGN  (UNSIGNED)
349	       e = ENDIAN-FAVORITE     (LITTLE)

351	   Some examples

353	       UMAC-4/8/1024/16/UNSIGNED/LITTLE  (Same as named set "UMAC32" )
354	       UMAC-2/8/1024/16/SIGNED/LITTLE    (Same as named set "UMAC16" )
355	       UMAC-4/12                         ("UMAC32" with 96-bit output)
356	       UMAC-2/8/4096                     ("UMAC16" with 4K L1-key and)
357	                                         (unsigned L1-OPERATIONS     )

359	3  Notation and basic operations

361	   The specification of UMAC involves the manipulation of both strings
362	   and numbers.  String variables are denoted with initial capitals
363	   (upper-case), whereas numeric variables are denoted in all lower-
364	   case.  Global parameters are denoted in all capital letters.  Simple
365	   functions, like those for string-length and string-xor, are written
366	   with all lower-case, while the algorithms of UMAC are named in all
367	   upper-case.

369	   Whenever a variable is followed by an underscore ("_"), the
370	   underscore is intended to denote a subscript, with the subscripted
371	   expression needing to be evaluated to resolve the meaning of the
372	   variable.  For example, if i=2, then M_{2 * i} refers to the variable
373	   M_4.

375	   We now define some basic operations for manipulating strings and
376	   numbers, and for converting between the two.

378	3.1  Operations on strings

380	   In this specification, we view the messages to be hashed (as well as
381	   the keys used for hashing) as strings of bytes.  A "byte" is an 8-bit
382	   string.  The algorithms have been designed so that they are easily
383	   extendable to allow arbitrary bit-strings, if necessary.  We use the
384	   following notation to manipulate these strings.

386	     length(S):    The length of string S in bytes.

388	     zeroes(n):    The string made of n zero-bytes.

390	     S xor T:      The string which is the bitwise exclusive-or of S and
391	                   T.  Strings S and T must have the same length.

393	     S and T:      The string which is the bitwise conjunction of S and
394	                   T.  Strings S and T must have the same length.

396	     S[i]:         The i-th byte of the string S (indices begin at 1).

398	     S[i..j]:      The substring of S consisting of bytes i through j.

400	     S || T:       The string S concatenated with string T.

402	     zeropad(S,n): The string S, padded with zero-bytes to the nearest
403	                   non-zero multiple of n bytes.  Formally, zeropad(S,n)
404	                   = S || zeroes(i), where i is the smallest nonnegative
405	                   integer such that S || zeroes(i) is non-empty and n
406	                   divides length(S)+i.

408	3.1.1  ENDIAN-SWAP: Adjusting endian orientation

410	   This routine is used to make the data input to UMAC conform to the
411	   ENDIAN-FAVORITE global parameter.

413	3.1.1.1  Discussion

415	   The most time consuming portion of many UMAC computations involves
416	   the reading of key and message data from memory.  Because big- and
417	   little-endian computers will read these bytes differently, specifying
418	   a particular endian-orientation for UMAC could have significant
419	   performance ramifications.  If necessary, the key-bytes can be
420	   preprocessed once during key setup to eliminate the need for their
421	   reorientation during performance-critical tag generation.  But,
422	   message data presumably cannot be preprocessed.  Any reorientation
423	   needed for each message must be done during tag generation,
424	   introducing a significant penalty to computers whose native endian-
425	   orientation is opposite to that specified for UMAC.  Therefore, UMAC
426	   defines a parameter, ENDIAN-FAVORITE, which allows UMAC to be
427	   specified to favor big- or little-endian memory conventions.  If the
428	   parameter is set to favor little-endian computers, then we specify
429	   the reversal of the bytes of every word in the input message using
430	   the following support function.  By reversing the data in the
431	   specification, an implementation on a little-endian machine would in
432	   fact do nothing but read the input data using native-endian word
433	   loads.  The loads would automatically reverse the bytes within each
434	   word, fulfilling the requirements of the specification.  Any other
435	   endian reorientation needed to comply with the specification requires
436	   an insignificant amount of time during each tag calculation.

438	3.1.1.2  Interface

440	   Function Name:
441	     ENDIAN-SWAP
442	   Input:
443	     S, string with length divisible by WORD-LEN bytes.
444	   Output:
445	     T, string S with each word endian-reversed.

447	3.1.1.3  Algorithm

449	   Compute T using the following algorithm.

451	     //
452	     // Break S into word-size chunks
453	     //
454	     n = length(S) / WORD-LEN
455	     Let S_1, S_2, ..., S_n be strings of length WORD-LEN bytes
456	        so that S_1 || S_2 || .. || S_n = S.

458	     //
459	     // Byte-reverse each chunk, and build-up T
460	     //
461	     T = 
462	     for i = 1 to n do
463	       Let W_1, W_2, ..., W_{WORD-LEN}  be bytes
464	          so that W_1 || W_2 || ... || W_{WORD-LEN} = S_i
465	       SReversed_i = W_{WORD-LEN} || W_{WORD-LEN - 1} || ... || W_1
466	       T = T || SReversed_i

468	     Return T

470	3.2  Operations on integers

472	   In this specification, we generally use standard notation for
473	   mathematical operations, such as "*" for multiplication, "+" for
474	   addition and "mod" for modular reduction.  Some less standard
475	   notations are defined here.

477	     a^i:      The integer a raised to the integer i-th power.

479	     lg a:     The base-2 logarithm of integer a.

481	     floor(x): The largest integer less than or equal to x.

483	     ceil(x):  The smallest integer greater than or equal to x.

485	     prime(n): The largest prime number less than 2^n.

487	   The prime numbers used in UMAC are:

489	    +-----+--------------------+---------------------------------------+
490	    |  x  | prime(x) [Decimal] | prime(x) [Hexadecimal]                |
491	    +-----+--------------------+---------------------------------------+
492	    | 19  | 2^19  - 1          | 0x0007FFFF                            |
493	    | 32  | 2^32  - 5          | 0xFFFFFFFB                            |
494	    | 36  | 2^36  - 5          | 0x0000000F FFFFFFFB                   |
495	    | 64  | 2^64  - 59         | 0xFFFFFFFF FFFFFFC5                   |
496	    | 128 | 2^128 - 159        | 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFF61 |
497	    +-----+--------------------+---------------------------------------+

499	3.3  String-Integer conversion operations

501	   We convert between strings and integers using the following
502	   functions.  Each function treats initial bits as more significant
503	   than later ones.

505	     bit(S,n):      Returns the integer 1 if the n-th bit of the string
506	                    S is 1, otherwise returns the integer 0 (indices
507	                    begin at 1).  Here n must be between 1 and the bit-
508	                    length of S.

510	     str2uint(S):   The non-negative integer whose binary representation
511	                    is the string S.  More formally, if S is t bits long
512	                    then str2uint(S) = 2^{t-1} * bit(S,1) + 2^{t-2} *
513	                    bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t).

515	     uint2str(n,i): The i-byte string S such that str2uint(S) = n.  If
516	                    no such string exists then uint2str(n,i) is unde-
517	                    fined.

519	     str2sint(S):   The integer whose binary representation in two's-
520	                    complement is the string S.  More formally, if S is
521	                    t bits long then str2sint(S) = -2^{t-1} * bit(S,1) +
522	                    2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) +
523	                    bit(S,t).

525	     sint2str(n,i): The i-byte string S such that str2sint(S) = n.  If
526	                    no such string exists then sint2str(n,i) is unde-
527	                    fined.

529	3.4  Mathematical operations on strings

531	   One of the primary operations in the universal hashing part of UMAC
532	   is repeated application of addition and multiplication on strings.
533	   We use "+_n" and "*_n" to denote the string operations which are
534	   equivalent to addition and multiplication modulo 2^n, respectively.
535	   These operations correspond exactly with the addition and
536	   multiplication operations which are performed efficiently on
537	   registers by modern computers.  So, when n is 16, 32 or 64, these
538	   operations can be preformed by computers very quickly.

540	   There are two interpretations of the operators depending on whether
541	   the strings are interpreted as signed or unsigned integers.  The
542	   global parameter L1-OPERATIONS-SIGN determines which interpretation
543	   is made.

545	   If strings S and T are interpreted as signed integers (that is,
546	   L1-OPERATIONS-SIGN == SIGNED) then

548	     "S *_n T" as uint2str(str2sint(S) * str2sint(T) mod 2^n, n/8), and

550	     "S +_n T" as uint2str(str2sint(S) + str2sint(T) mod 2^n, n/8).

552	   If strings S and T are interpreted as unsigned integers (that is,
553	   L1-OPERATIONS-SIGN == UNSIGNED) then we define

555	     "S *_n T" as uint2str(str2uint(S) * str2uint(T) mod 2^n, n/8), and

557	     "S +_n T" as uint2str(str2uint(S) + str2uint(T) mod 2^n, n/8).

559	   In any case, the number n must be divisible by 8.  In this document
560	   we use S *_16 T, S *_32 T, S *_64 T, S +_32 T and S +_64 T,
561	   corresponding to multiplication of 2, 4 and 8 byte numbers, and the
562	   addition of 4 and 8 byte numbers.

564	4  Key and pad derivation functions

566	   UMAC, as described in this document, requires either a 16- or 32-byte
567	   key which is used with a key-derivation function (KDF) to produce
568	   pseudorandom bits needed within the universal hash function.

570	4.1  KDF: Key derivation function

572	   Stretching the user-supplied key into pseudorandom bits used
573	   internally by UMAC is done with a key-derivation function (KDF).  In
574	   this section we define a KDF which is efficiently instantiated with a
575	   block cipher.  The Advanced Encryption Standard (AES) is used in
576	   output-feedback mode to produce the required bits.  If UMAC-KEY-LEN
577	   is 16, then the 128-bit key/128-bit block-length variant of AES is
578	   used, and if UMAC-KEY-LEN is 32, then the 256-bit key/128-bit block-
579	   length variant is used.  The KDF requires an "index" parameter.
580	   Using the same key, but different indices, generates different
581	   pseudorandom outputs.

583	4.1.1  Interface

585	   Function Name:
586	     KDF
587	   Input:
588	     K, string of length UMAC-KEY-LEN bytes  // key to AES
589	     index, a non-negative integer less than 256.
590	     numbytes, a positive integer.
591	   Output:
592	     Y, string of length numbytes bytes.

594	4.1.2  Algorithm

596	   Compute Y using the following algorithm.

598	     //
599	     // Calculate number of AES iterations, set indexed starting point
600	     //
601	     n = ceil(numbytes / 16)
602	     T = zeroes(15) || uint2str(index, 1)
603	     Y = 

605	     //
606	     // Build Y using AES in a feedback mode
607	     //
608	     for i = 1 to n do
609	       T = AES(K, T)
610	       Y = Y || T

612	     Y = Y[1..numbytes]

614	     Return Y

616	4.2  PDF: Pad-derivation function

618	   The Wegman-Carter MAC scheme used in UMAC requires the exclusive-or
619	   of a pseudorandom string with the output from the universal hash
620	   function.  The pseudorandom string is obtained by applying a pad-
621	   derivation function (PDF) to a nonce which, for security reasons,
622	   must change with each authentication-tag computation.  Nonces may be
623	   any number of bytes from 1 to 16, but all nonces in a single
624	   authentication session must be of equal length.  In this section we
625	   define a PDF which is efficiently instantiated with a block cipher.
626	   Again we use AES with either 16- or 32-bytes keys depending on the
627	   value of UMAC-KEY-LEN.

629	4.2.1  Discussion

631	   The PDF output is exclusive-or'd with the result of the universal
632	   hash function.  AES, however, may provide more or fewer bits per
633	   invocation than are needed for this purpose.  For example, UMAC-
634	   OUTPUT-LEN is normally 8 bytes and AES produces an output of 16
635	   bytes.  It would save processing time if half of the AES output bits
636	   could be used to generate one tag, and then the second half of the
637	   same AES output could be used for the tag of the next message.  For
638	   this reason, we include an optimization which allows the use of
639	   different substrings of the same AES output.  This optimization is
640	   effective only when nonces are sequential.  We do so by using the low
641	   bits of the nonce as an index into the AES output, which is generated
642	   using the higher bits of the nonce which are not used for indexing.
643	   This speeds message authentication by reducing the average time spent
644	   by AES for each authentication.  Note that if a counter-variable is
645	   used to exploit this optimization, and the variable is stored in
646	   memory, then the variable must be treated as big-endian.  If UMAC-
647	   OUTPUT-LEN is larger than 16, then two AES invocations are required
648	   to produce a sufficient number of bits.

650	4.2.2  Interface

652	   Function Name:
653	     PDF
654	   Input:
655	     K, string of length UMAC-KEY-LEN bytes   // key for AES
656	     Nonce, string of length 1 to 16 bytes.
657	   Output:
658	     Y, string of length UMAC-OUTPUT-LEN bytes.

660	4.2.3  Algorithm

662	   Compute Y using the following algorithm.

664	      //
665	      // Make Nonce 16 bytes by prepending zeroes
666	      //
667	      Nonce = Nonce || zeroes(16 - length(Nonce))

669	      //
670	      // If one AES invocation is enough for more than one
671	      // PDF invocation.
672	      //
673	      if (UMAC-OUTPUT-LEN <= 8) then

675	         //
676	         // Compute number of index bits needed
677	         //
678	         i = floor(16 / UMAC-OUTPUT-LEN)
679	         numlowbits = floor(lg(i))

681	         //
682	         // Extract index bits and zero low bits of Nonce
683	         //
684	         nlowbitsnum = str2uint(Nonce) mod 2^numlowbits
685	         Nonce = Nonce xor uint2str(nlowbitsnum, 16)
686	         //
687	         // Generate subkey, AES and extract indexed substring
688	         //
689	         K' = KDF(K, 128, UMAC-KEY-LEN)
690	         T = AES(K', Nonce)
691	         Y = T[ nlowbitsnum      * UMAC-OUTPUT-LEN + 1 ..
692	               (nlowbitsnum + 1) * UMAC-OUTPUT-LEN]

694	      else

696	         //
697	         // Repeated AES calls to build length
698	         //
699	         K_1 = KDF(K, 128, UMAC-KEY-LEN)
700	         K_2 = KDF(K, 129, UMAC-KEY-LEN)
701	         if (UMAC-OUTPUT-LEN <= 16)
702	            Y = AES(K_1, Nonce)
703	         else
704	            Y = AES(K_1, Nonce) || AES(K_2, Nonce)
705	         Y = Y[1..UMAC-OUTPUT-LEN]

707	      Return Y

709	5  UHASH-32: Universal hash function for a 32-bit word size

711	   UHASH is a keyed hash function, which takes as input a string of
712	   arbitrary length, and produces as output a string of fixed length
713	   (such as 8 bytes).  The actual output length depends on the parameter
714	   UMAC-OUTPUT-LEN.

716	   UHASH has been shown to be epsilon-ASU ("Almost Strongly Universal"),
717	   where epsilon is a small (parameter-dependent) real number.
718	   Informally, saying that a keyed hash function is epsilon-ASU means
719	   that for any two distinct fixed input strings, the two outputs of the
720	   hash function with a random key "look almost like a pair of random
721	   strings".  The number epsilon measures how non-random the output
722	   strings may be.  For details, see [3, 4, 11].

724	   UHASH has been designed to be fast by exploiting several
725	   architectural features of modern commodity processors.  It was
726	   specifically designed for use in UMAC.  But UHASH is useful beyond
727	   that domain, and can be easily adopted for other purposes.

729	   UHASH does its work in three layers.  First, a hash function called
730	   NH [3] is used to compress input messages into strings which are
731	   typically many times smaller than the input message.  Second, the
732	   compressed message is hashed with an optimized "polynomial hash
733	   function" into a fixed-length 16-byte string.  Finally, the 16-byte
734	   string is hashed using an "inner-product hash" into a string of
735	   length WORD-LEN bytes.  These three layers are repeated (with a
736	   modified key) until the outputs total UMAC-OUTPUT-LEN bytes.

738	   Note: Because the repetitions of the three-layer scheme are
739	   independent (aside from sharing some internal key), it follows that
740	   each "word" of the final output can be computed independently.
741	   Hence, to compute a prefix of a UMAC tag, one can simply repeat the
742	   three-layer scheme fewer times.  Thus, computing a prefix of the tag
743	   can be done significantly faster than computing the whole tag.

745	5.1  NH-32: NH hashing with a 32-bit word size

747	   The first of the three hash-layers that UHASH uses is the NH hash
748	   function [3].  More than any other part of UHASH, NH is designed to
749	   be fast on modern processors, because it is where the bulk of the
750	   UHASH work is done.  The NH universal hash function hashes an input
751	   string M using a key K by considering M and K to be arrays of
752	   integers, each WORD-LEN bytes in length, and performing a sequence of
753	   arithmetic operations on them.  See [3] for definitions, proofs and
754	   rationale relating to NH.

756	   The NH-32 algorithm is designed to perform well on processors which
757	   support well multiplications of 32-bit operands into 64-bit results.
758	   NH-32 is also designed to exploit the recent trend of including
759	   instructions for small-scale vector parallelism in uniprocessor CPUs.
760	   Intel's Streaming SIMD 2 instruction set is a good example of this
761	   trend.  It supports an instruction, which multiplies two pairs of
762	   32-bit operands into two 64-bit results, which can be used by
763	   UHASH-32 for accelerated hashing.  To accommodate this parallelism,
764	   NH-32 accesses data-words in pairs which are 4 words (16 bytes)
765	   apart.

767	5.1.1  Interface

769	   Function Name:
770	     NH-32
771	   Input:
772	     K, string of length L1-KEY-LEN bytes.
773	     M, string with length divisible by 32 bytes.
774	   Output:
775	     Y, string of length 8 bytes.

777	5.1.2  Algorithm

779	   Compute Y using the following algorithm.

781	     //
782	     // Break M and K into 4-byte chunks
783	     //
784	     t = length(M) / 4
785	     Let M_1, M_2, ..., M_t be 4-byte strings
786	       so that M = M_1 || M_2 || .. || M_t.
787	     Let K_1, K_2, ..., K_t be 4-byte strings
788	       so that K_1 || K_2 || .. || K_t  is a prefix of K.

790	     //
791	     // Perform NH hash on the chunks, pairing words for multiplication
792	     // which are 4 apart to accommodate vector-parallelism.
793	     //
794	     Y = zeroes(8)
795	     i = 1
796	     while (i < t) do
797	       Y = Y +_64 ((M_{i+0} +_32 K_{i+0}) *_64 (M_{i+4} +_32 K_{i+4}))
798	       Y = Y +_64 ((M_{i+1} +_32 K_{i+1}) *_64 (M_{i+5} +_32 K_{i+5}))
799	       Y = Y +_64 ((M_{i+2} +_32 K_{i+2}) *_64 (M_{i+6} +_32 K_{i+6}))
800	       Y = Y +_64 ((M_{i+3} +_32 K_{i+3}) *_64 (M_{i+7} +_32 K_{i+7}))
801	       i = i + 8

803	     Return Y

805	5.2  L1-HASH-32: First-layer hash

807	   To limit the length of key required in the first layer of hashing,
808	   L1-HASH-32 breaks the input message into chunks no longer than
809	   L1-KEY-LEN and NH hashes each with a key of the same length.

811	5.2.1  Discussion

813	   The NH hash function requires a key which is just as long as the
814	   message being hashed.  To limit the amount of key used in the NH
815	   hashing layer, we use a key of fixed length (defined by the parameter
816	   L1-KEY-LEN), and process the message in chunks of this length (or
817	   less).  The L1-HASH-32 algorithm takes an input message and breaks it
818	   into chunks of L1-KEY-LEN bytes (except the last chuck, which may be
819	   shorter and may need to be zero-padded to an appropriate length).
820	   Each chunk is hashed with NH-32, and the outputs from all the NH
821	   invocations are annotated with some length information and
822	   concatenated to produce the final L1-HASH-32 result.

824	   If ENDIAN-FAVORITE is LITTLE, then each word in the input message is
825	   required to be endian reversed.

827	5.2.2  Interface

829	   Function Name:
830	     L1-HASH-32
831	   Input:
832	     K, string of length L1-KEY-LEN bytes.
833	     M, string of length less than 2^64 bytes.
834	   Output:
835	     Y, string of length (8 * ceil(length(M)/L1-KEY-LEN)) bytes.

837	5.2.3  Algorithm

839	   Compute Y using the following algorithm.

841	     //
842	     // Break M into L1-KEY-LEN byte chunks (final chunk may be shorter)
843	     //
844	     t = ceil(length(M) / L1-KEY-LEN)
845	     Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. ||
846	        M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t.

848	     //
849	     // For each chunk, except the last: endian-adjust, NH hash
850	     // and add bit-length.  Use results to build Y.
851	     //
852	     Len = uint2str(L1-KEY-LEN * 8, 8)
853	     Y = 
854	     for i = 1 to t-1 do
855	       if (ENDIAN-FAVORITE == LITTLE) then  // See endian discussion
856	          ENDIAN-SWAP(M_i)                  // in section 3.1.1
857	       Y = Y || (NH-32(K, M_i) +_64 Len)

859	     //
860	     // For the last chunk: pad to 32-byte boundary, endian-adjust,
861	     // NH hash and add bit-length.  Concatenate the result to Y.
862	     //
863	     Len = uint2str(length(M_t) * 8, 8)
864	     M_t = zeropad(M_t, 32)
865	     if (ENDIAN-FAVORITE == LITTLE) then
866	          ENDIAN-SWAP(M_t)
867	     Y = Y || (NH-32(K, M_t) +_64 Len)

869	     return Y

871	5.3  POLY: Polynomial hash

873	   The output from L1-HASH is a string which is shorter than, but still
874	   proportional to, that of its input.  The POLY hash algorithm takes an
875	   arbitrary message and hashes it to a fixed length.

877	5.3.1  Discussion

879	   Polynomial hashing treats an input message as a sequence of
880	   coefficients of a polynomial, and the hash-key is the point at which
881	   this polynomial is evaluated.  The security guarantee assured by
882	   polynomial hashing degrades linearly in the length of the message
883	   being hashed.  If two messages of n words are hashed, then the
884	   probability they collide when hashed by POLY with a prime modulus of
885	   p is no more than n / p.  For more information on the polynomial
886	   hashing schemes used in UMAC see [10].

888	   The parameter 'wordbits' specifies the prime modulus used in the
889	   polynomial as well as the granularity (length of words) in which the
890	   input message should be broken.  Because some strings of length
891	   wordbits are greater than prime(wordbits), a mechanism is needed to
892	   fix words which are not in the range 0 .. prime(wordbits) - 1.  To
893	   this end, any word larger than 'maxwordrange' is split into two words
894	   guaranteed to be in range, and each is hashed by the polynomial hash.

896	5.3.2  Interface

898	   Function Name:
899	     POLY
900	   Input:
901	     wordbits, positive integer divisible by 8.
902	     maxwordrange, positive integer less than 2^wordbits.
903	     k, integer in the range 0 .. prime(wordbits) - 1.
904	     M, string with length divisible by (wordbits / 8) bytes.
905	   Output:
906	     y, integer in the range 0 .. prime(wordbits) - 1.

908	5.3.3  Algorithm

910	   Compute y using the following algorithm.

912	     //
913	     // Define constants used for fixing out-of-range words
914	     //
915	     wordbytes = wordbits / 8
916	     p = prime(wordbits)
917	     offset = 2^wordbits - p
918	     marker = p - 1

920	     //
921	     // Break M into chunks of length wordbytes bytes
922	     //
923	     n = length(M) / wordbytes
924	     Let M_1, M_2, ..., M_n be strings of length wordbytes bytes
925	       so that M = M_1 || M_2 || .. || M_n

927	     //
928	     // For each input word, compare it with maxwordrange.  If larger
929	     // then hash the words 'marker' and (m - offset), both in range.
930	     //
931	     y = 1
932	     for i = 1 to n do
933	        m = str2uint(M_i)
934	        if (m >= maxwordrange) then
935	           y = (k * y + marker) mod p
936	           y = (k * y + (m - offset)) mod p
937	        else
938	           y = (k * y + m) mod p

940	     Return y

942	5.4  L2-HASH-32: Second-layer hash

944	   Because L1-HASH may produce quite long strings, and POLY's security
945	   guarantee degrades linearly, a scheme is required to allow long
946	   strings while ensuring that the collision probability never grows
947	   beyond a certain pre-set bound.  This is accomplished by dynamically
948	   increasing the prime modulus used in the polynomial hashing as the
949	   collision probability bound is approached.

951	5.4.1  Discussion

953	   The probability of two n-word messages hashing to the same result
954	   when polynomially hashed with prime modulus p is as much as (n / p).
955	   To maintain a limit on the maximum collision probability, a scheme is
956	   needed to disallow (n / p) growing too large.  The scheme used here
957	   hashes a number of words n_1 under modulus p_1 until (n_1 / p_1)
958	   reaches a critical point.  The result of the hash-so-far is prepended
959	   to the remaining message needing to be hashed, and the hashing
960	   continues, but under a prime modulus p_2 which is substantially
961	   larger than p_1.  Hashing continues for n_2 more words until (n_2 /
962	   p_2) also reaches a critical point, at which time a new larger prime
963	   p_3 could be used.

965	   Because polynomial hashing under a small prime modulus is often
966	   faster than hashing under a large one, this dynamic ramping-up of the
967	   polynomial's modulus provides a hash function which is faster on
968	   short messages, but still accommodates long ones.

970	   The keys used for polynomial hashing are restricted to particular
971	   subsets to allow for faster implementations on 32-bit architectures.
972	   The restrictions allow an implementor to disregard some potential
973	   arithmetic carries during computation.

975	   For more information see [10].

977	5.4.2  Interface

979	   Function Name:
980	     L2-HASH-32
981	   Input:
982	     K, string of length 24 bytes.
983	     M, string of length less than 2^64 bytes.
984	   Output:
985	     Y, string of length 16 bytes.

987	5.4.3  Algorithm

989	   Compute y using the following algorithm.

991	     //
992	     //  Extract keys and restrict to special key-sets
993	     //
994	     Mask64  = uint2str(0x01ffffff01ffffff, 8)
995	     Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16)
996	     k64    = str2uint(K[1..8]  and Mask64)
997	     k128   = str2uint(K[9..24] and Mask128)

999	     //
1000	     // If M no more than 2^17 bytes, hash under 64-bit prime,
1001	     // otherwise, hash first 2^17 bytes under 64-bit prime and
1002	     // remainder under 128-bit prime.
1003	     //
1004	     if (length(M) <= 2^17) then             // 2^14 64-bit words

1006	        //
1007	        // View M as an array of 64-bit words, and use POLY modulo
1008	        // prime(64) (and with bound 2^64 - 2^32) to hash it.
1009	        //
1010	        y = POLY(64, 2^64 - 2^32,  k64, M)

1012	     else

1014	        M_1 = M[1 .. 2^17]
1015	        M_2 = M[2^17 + 1 .. length(M)]
1016	        M_2 = zeropad(M_2 || uint2str(0x80,1), 16)
1017	        y = POLY(64, 2^64 - 2^32, k64, M_1)
1018	        y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_2)

1020	     Y = uint2str(y, 16)

1022	     Return Y

1024	5.5  L3-HASH-32: Third-layer hash

1026	   The output from L2-HASH-32 is 16 bytes long.  This final hash
1027	   function hashes the 16-byte string to a fixed length of 4 bytes using
1028	   a simple inner-product hash with affine translation.  A 36-bit prime
1029	   modulus is used to improve security.

1031	5.5.1  Interface

1033	   Function Name:
1034	     L3-HASH-32
1035	   Input:
1036	     K1, string of length 64 bytes.
1037	     K2, string of length 4 bytes.
1038	     M, string of length 16 bytes.
1039	   Output:
1040	     Y, string of length 4 bytes.

1042	5.5.2  Algorithm

1044	   Compute Y using the following algorithm.

1046	     y = 0

1048	     //
1049	     // Break M and K1 into 8 chunks and convert to integers
1050	     //
1051	     for i = 1 to 8 do
1052	       M_i = M [(i - 1) * 2 + 1 .. i * 2]
1053	       K_i = K1[(i - 1) * 8 + 1 .. i * 8]
1054	       m_i = str2uint(M_i)
1055	       k_i = str2uint(K_i) mod prime(36)

1057	     //
1058	     // Inner-product hash, extract last 32 bits and affine-translate
1059	     //
1060	     y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(36)
1061	     y = y mod 2^32
1062	     Y = uint2str(y, 4)
1063	     Y = Y xor K2

1065	     Return Y

1067	5.6  UHASH-32: Three-layer universal hash

1069	   The hash functions L1-HASH, L2-HASH and L3-HASH are used together in
1070	   a straightforward manner.  A message is first hashed by L1-HASH, its
1071	   output is then hashed by L2-HASH, whose output is then hashed by
1072	   L3-HASH.  If the message being hashed is no longer than L1-KEY-LEN
1073	   bytes, then L2-HASH is skipped as an optimization.  Because L3-HASH
1074	   outputs a string whose length is only WORD-LEN bytes long, multiple
1075	   iterations of this three-layer hash are used, with different keys
1076	   each time, until UMAC-OUTPUT-LEN have been generated.  To reduce
1077	   memory requirements, L1-HASH and L3-HASH both reuse most of their
1078	   key-material between iterations.

1080	5.6.1  Interface

1082	   Function Name:
1083	     UHASH-32
1084	   Input:
1085	     K, string of length UMAC-KEY-LEN bytes.
1086	     M, string of length less than 2^64 bytes.
1087	   Output:
1088	     Y, string of length UMAC-OUTPUT-LEN bytes.

1090	5.6.2  Algorithm

1092	   Compute Y using the following algorithm.

1094	     //
1095	     // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes
1096	     //
1097	     streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN)

1099	     //
1100	     // Define total key needed for all iterations using KDF.
1101	     // L1Key and L3Key1 both reuse most key between iterations.
1102	     //
1103	     L1Key  = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16)
1104	     L2Key  = KDF(K, 1, streams * 24)
1105	     L3Key1 = KDF(K, 2, streams * 64)
1106	     L3Key2 = KDF(K, 3, streams * 4)

1108	     //
1109	     // For each iteration, extract key and three-layer hash.
1110	     // If length(M) <= L1-KEY-LEN, then skip L2-HASH.
1111	     //
1112	     Y = 
1113	     for i = 1 to streams do
1114	       L1Key_i  = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN]
1115	       L2Key_i  = L2Key [(i-1) * 24 + 1 .. i * 24]
1116	       L3Key1_i = L3Key1[(i-1) * 64 + 1 .. i * 64]
1117	       L3Key2_i = L3Key2[(i-1) * 4  + 1 .. i * 4]

1119	       A = L1-HASH-32(L1Key_i, M)
1120	       if (length(M) <= L1-KEY-LEN) then
1121	         B = zeroes(8) || A
1122	       else
1123	         B = L2-HASH-32(L2Key_i, A)
1124	       C = L3-HASH-32(L3Key1_i, L3Key2_i, B)
1125	       Y = Y || C
1126	     Y = Y[1 .. UMAC-OUTPUT-LEN]

1128	     Return Y

1130	6  UHASH-16: Universal hash function for a 16-bit word size

1132	   See Section 5 (UHASH-32) for general discussion of the UHASH
1133	   algorithm.  Each sub-section of Section 6 will note only differences
1134	   between UHASH-32 and UHASH-16.

1136	6.1  NH-16: NH hashing with a 16-bit word size

1138	   The NH-16 algorithm is designed to exploit the recent trend of
1139	   including instructions for small-scale vector parallelism in
1140	   uniprocessor CPUs.  Intel's MMX and Mororola's AltiVec instruction
1141	   sets are good examples of this trend.  Both support single-
1142	   instruction multiply-add instructions on vectors of 16-bit words
1143	   which can be used by UHASH-16 for accelerated hashing.  To
1144	   accommodate this parallelism, NH-16 accesses data-words in pairs
1145	   which are 8 words (16 bytes) apart.

1147	6.1.1  Interface

1149	   Function Name:
1150	     NH-16
1151	   Input:
1152	     K, string of length L1-KEY-LEN bytes.
1153	     M, string with length divisible by 32 bytes.
1154	   Output:
1155	     Y, string of length 4 bytes.

1157	6.1.2  Algorithm

1159	   Compute Y using the following algorithm.

1161	     //
1162	     // Break M and K into 2-byte chunks
1163	     //
1164	     t = length(M) / 2
1165	     Let M_1, M_2, ..., M_t be 2-byte strings
1166	       so that M = M_1 || M_2 || .. || M_t.
1167	     Let K_1, K_2, ..., K_t be 2-byte strings
1168	       so that K_1 || K_2 || .. || K_t  is a prefix of K.

1170	     //
1171	     // Perform NH hash on the chunks, pairing words for multiplication
1172	     // which are 8 apart to accommodate vector-parallelism.
1173	     //
1174	     Y = zeroes(4)
1175	     i = 1
1176	     while (i < t) do
1177	       Y = Y +_32 ((M_{i+0} +_16 K_{i+0}) *_32 (M_{i+ 8} +_16 K_{i+ 8}))
1178	       Y = Y +_32 ((M_{i+1} +_16 K_{i+1}) *_32 (M_{i+ 9} +_16 K_{i+ 9}))
1179	       Y = Y +_32 ((M_{i+2} +_16 K_{i+2}) *_32 (M_{i+10} +_16 K_{i+10}))
1180	       Y = Y +_32 ((M_{i+3} +_16 K_{i+3}) *_32 (M_{i+11} +_16 K_{i+11}))
1181	       Y = Y +_32 ((M_{i+4} +_16 K_{i+4}) *_32 (M_{i+12} +_16 K_{i+12}))
1182	       Y = Y +_32 ((M_{i+5} +_16 K_{i+5}) *_32 (M_{i+13} +_16 K_{i+13}))
1183	       Y = Y +_32 ((M_{i+6} +_16 K_{i+6}) *_32 (M_{i+14} +_16 K_{i+14}))
1184	       Y = Y +_32 ((M_{i+7} +_16 K_{i+7}) *_32 (M_{i+15} +_16 K_{i+15}))
1185	       i = i + 16

1187	     Return Y

1189	6.2  L1-HASH-16: First-layer hash

1191	   To limit the length of key required in the first layer of hashing,
1192	   L1-HASH-16 breaks the input message into chunks no longer than
1193	   L1-KEY-LEN bytes and NH hashes each with a key of that same length.

1195	6.2.1  Interface

1197	   Function Name:
1198	     L1-HASH-16
1199	   Input:
1200	     K, string of length L1-KEY-LEN bytes.
1201	     M, string of length less than 2^64 bytes.
1202	   Output:
1203	     Y, string of length (4 * ceil(length(M)/L1-KEY-LEN)) bytes.

1205	6.2.2  Algorithm

1207	   Compute Y using the following algorithm.

1209	     //
1210	     // Break M into L1-KEY-LEN byte chunks (final chunk may be shorter)
1211	     //
1212	     t = ceil(length(M) / L1-KEY-LEN)
1213	     Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. ||
1214	       M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t.

1216	     //
1217	     // For each chunk, except the last: endian-adjust, NH hash
1218	     // and add bit-length.  Use results to build Y.
1219	     //
1220	     Len = uint2str(L1-KEY-LEN * 8, 4)
1221	     Y = 
1222	     for i = 1 to t-1 do
1223	       if (ENDIAN-FAVORITE == LITTLE) then  // See endian discussion
1224	          ENDIAN-SWAP(M_i)                  // in section 3.1.1
1225	       Y = Y || (NH-16(K, M_i) +_32 Len)

1227	     //
1228	     // For the last chunk: pad to 32-byte boundary, endian-adjust,
1229	     // NH hash and add bit-length.  Concatenate the result to Y.
1230	     //
1231	     Len = uint2str(length(M_t) * 8, 4)
1232	     M_t = zeropad(M_t, 32)
1233	     if (ENDIAN-FAVORITE == LITTLE) then
1234	        ENDIAN-SWAP(M_t)
1235	     Y = Y || (NH-16(K, M_t) +_32 Len)

1237	     return Y

1239	6.3  L2-HASH-16: Second-layer hash

1241	   L2-HASH-16 differs from L2-HASH-32 by beginning the ramped hash with
1242	   a smaller prime modulus.  See Section 5.3 for the definition of POLY.

1244	6.3.1  Interface

1246	   Function Name:
1247	     L2-HASH-16
1248	   Input:
1249	     K, string of length 28 bytes.
1250	     M, string of length less than 2^64 bytes.
1251	   Output:
1252	     Y, string of length 16 bytes.

1254	6.3.2  Algorithm

1256	   Compute Y using the following algorithm.

1258	     //
1259	     //  Extract keys and restrict to special key-sets
1260	     //
1261	     Mask32  = uint2str(0x1fffffff, 4)
1262	     Mask64  = uint2str(0x01ffffff01ffffff, 8)
1263	     Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16)
1264	     k_32    = str2uint(K[1..4]   and Mask32)
1265	     k64    = str2uint(K[5..12]  and Mask64)
1266	     k128   = str2uint(K[13..28] and Mask128)

1268	     //
1269	     // If M no more than 2^11 bytes, hash under 32-bit prime.
1270	     // Otherwise, hash under increasingly long primes.
1271	     //
1272	     if (length(M) <= 2^11) then         // 2^9 32-bit words

1274	        y = POLY(32, 2^32 - 6, k_32, M)

1276	     else if (length(M) <= 2^33) then    // 2^31 32-bit words

1278	        M_1 = M[1 .. 2^11]
1279	        M_2 = M[2^11 + 1 .. length(M)]
1280	        M_2 = zeropad(M_2 || uint2str(0x80,1), 8)
1281	        y = POLY(32, 2^32 - 6,    k_32, M_1)
1282	        y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2)

1284	     else
1285	        M_1 = M[1 .. 2^11]
1286	        M_2 = M[2^11 + 1 .. 2^33]
1287	        M_3 = M[2^33 + 1 .. length(M)]
1288	        M_3 = zeropad(M || uint2str(0x80,1), 16)
1289	        y = POLY(32,  2^32 - 6,     k_32,  M_1)
1290	        y = POLY(64,  2^64 - 2^32,  k64,  uint2str(y,  8) || M_2)
1291	        y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_3)

1293	     Y = uint2str(y, 16)

1295	     Return Y

1297	6.4  L3-HASH-16: Third-layer hash

1299	   The L3-HASH-16 algorithm differs from L3-HASH-32 by hashing under a
1300	   19-bit prime modulus (instead of a 36-bit prime modulus) and then
1301	   returning a 2-byte result (instead of a 4-byte result).

1303	6.4.1  Interface

1305	   Function Name:
1306	     L3-HASH-16
1307	   Input:
1308	     K1, string of length 32 bytes.
1309	     K2, a string of length 2 bytes.
1310	     M, a string of length 16 bytes.
1311	   Output:
1312	     Y, a string of length 2 bytes.

1314	6.4.2  Algorithm

1316	   Compute Y using the following algorithm.

1318	     y = 0

1320	     //
1321	     // Break M and K1 into 8 chunks and convert to integers
1322	     //
1323	     for i = 1 to 8 do
1324	       M_i = M[(i - 1) * 2 + 1 .. i * 2]
1325	       K_i = K1[(i - 1) * 4 + 1 .. i * 4]
1326	       m_i = str2uint(M_i)
1327	       k_i = str2uint(K_i) mod prime(19)

1329	     //
1330	     // Inner-product hash, extract last 32 bits and affine-translate
1331	     //
1332	     y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(19)
1333	     y = y mod 2^16
1334	     Y = uint2str(y, 2)
1335	     Y = Y xor K2

1337	     Return Y

1339	6.5  UHASH-16: Three-layer universal hash

1341	   The algorithm UHASH-16 differs from UHASH-32 only in the size of its
1342	   keys generated, and in that it refers to the 16-bit variants of the
1343	   three-layer hash functions.

1345	6.5.1  Interface

1347	   Function Name:
1348	     UHASH-16
1349	   Input:
1350	     K, string of length UMAC-KEY-LEN bytes.
1351	     M, string of length less than 2^64 bytes.
1352	   Output:
1353	     Y, string of length UMAC-OUTPUT-LEN bytes.

1355	6.5.2  Algorithm

1357	   Compute Y using the following algorithm.

1359	     //
1360	     // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes
1361	     //
1362	     streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN)

1364	     //
1365	     // Define total key needed for all iterations using KDF.
1366	     // L1Key and L3Key1 both reuse most key between iterations.
1367	     //
1368	     L1Key  = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16)
1369	     L2Key  = KDF(K, 1, streams * 28)
1370	     L3Key1 = KDF(K, 2, streams * 32)
1371	     L3Key2 = KDF(K, 3, streams * 2)

1373	     //
1374	     // For each iteration, extract key and three-layer hash.

1376	     // If length(M) <= L1-KEY-LEN, then skip L2-HASH.
1377	     //
1378	     Y = 
1379	     for i = 1 to streams do
1380	       L1Key_i  = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN]
1381	       L2Key_i  = L2Key [(i-1) * 28 + 1 .. i * 28]
1382	       L3Key1_i = L3Key1[(i-1) * 32 + 1 .. i * 32]
1383	       L3Key2_i = L3Key2[(i-1) * 2 + 1  .. i * 2]

1385	       A = L1-HASH-16(L1Key_i, M)
1386	       if (length(M) <= L1-KEY-LEN) then
1387	         B = zeroes(12) || A
1388	       else
1389	         B = L2-HASH-16(L2Key_i, A)
1390	       C = L3-HASH-16(L3Key1_i, L3Key2_i, B)
1391	       Y = Y || C
1392	     Y = Y[1 .. UMAC-OUTPUT-LEN]

1394	     Return Y

1396	7  UMAC tag generation

1398	   Tag generation for UMAC proceeds as follows.  Use UHASH to hash the
1399	   message and apply the PDF to the nonce to produce a string to xor
1400	   with the UHASH output.  The resulting string is the authentication-
1401	   tag.

1403	7.1  Interface

1405	   Function Name:
1406	     UMAC
1407	   Input:
1408	     K, string of length UMAC-KEY-LEN bytes.
1409	     M, string of length less than 2^64 bytes.
1410	     Nonce, string of length 1 to 16 bytes.
1411	   Output:
1412	     AuthTag, string of length UMAC-OUTPUT-LEN bytes.

1414	7.2  Algorithm

1416	   Compute AuthTag using the following algorithm.

1418	     if (WORD-LEN == 2) then
1419	        HashedMessage = UHASH-16(K, M)
1420	     else
1421	        HashedMessage = UHASH-32(K, M)

1423	     Pad              = PDF(K, Nonce)
1424	     AuthTag          = Pad xor HashedMessage

1426	     Return AuthTag

1428	8  Security considerations

1430	   As a specification of a message authentication code, this entire
1431	   document is about security.  Here we describe some security
1432	   considerations important for the proper understanding and use of
1433	   UMAC.

1435	8.1  Resistance to cryptanalysis

1437	   The strength of UMAC depends on the strength of its underlying
1438	   cryptographic functions: the key-derivation function (KDF) and the
1439	   pad-derivation function (PDF).  In this specification it is assumed
1440	   that both operations are implemented using the Advanced Encryption
1441	   Standard (AES).  However, the full design and specification allow for
1442	   the replacement of these components.  Indeed, it is straightforward
1443	   to use other block ciphers or other cryptographic objects, such as
1444	   SHA-1 or HMAC for the realization of the KDF or PDF.

1446	   The core of the UMAC design, the UHASH function, does not depend on
1447	   any "cryptographic assumptions": its strength is specified by a
1448	   purely mathematical property stated in terms of collision
1449	   probability, and this property is proven in an absolute sense.  In
1450	   this way, the strength of UHASH is guaranteed regardless of future
1451	   advances in cryptanalysis.

1453	   The analysis of UMAC [3, 4] shows this scheme to have "provable
1454	   security", in the sense of modern cryptography, by way of tight
1455	   reductions.  What this means is that an adversarial attack on UMAC
1456	   which forges with probability significantly exceeding the established
1457	   collision probability will give rise to an attack of comparable
1458	   complexity which breaks the AES, in the sense of distinguishing AES
1459	   from a family of random permutations.  This design approach
1460	   essentially obviates the need for cryptanalysis on UMAC:
1461	   cryptanalytic efforts might as well focus on AES, the results imply.

1463	8.2  Tag lengths and forging probability

1465	   A MAC algorithm is used between two parties that share a secret MAC
1466	   key, K.  Messages transmitted between these parties are accompanied
1467	   by authentication tags computed using K and a given nonce.  Breaking
1468	   the MAC means that the attacker is able to generate, on its own, a
1469	   new message M (i.e. one not previously transmitted between the
1470	   legitimate parties) and to compute on M a correct authentication tag
1471	   under the key K.  This is called a forgery.  Note that if the
1472	   authentication tag is specified to be of length t then the attacker
1473	   can trivially break the MAC with probability 1/2^t.  For this the
1474	   attacker can just generate any message of its choice and try a random
1475	   tag; obviously, the tag is correct with probability 1/2^t.  By
1476	   repeated guesses the attacker can increase linearly its probability
1477	   of success.

1479	   UMAC is designed to make this guessing strategy the best possible
1480	   attack against UMAC as long as the attacker does not invest the
1481	   computational effort needed to break the underlying cipher, e.g. AES,
1482	   used to produce the one time pads used in UMAC computation.  More
1483	   precisely, under the assumed strength of this cipher UMAC provides
1484	   for close-to-optimal security with regards to forgery probability as
1485	   represented in the next table.

1487	    --------------------------------------------------------------------
1488	     UHASH-OUTPUT-LEN  Forging probability   Approximate actual forging
1489	         (bytes)       using a random tag    probability in UMAC
1490	                                             (using a clever tag)

1492	            2              2^-16                    2^-15
1493	            4              2^-32                    2^-30
1494	            8              2^-64                    2^-60
1495	           16              2^-128                   2^-120
1496	    --------------------------------------------------------------------

1498	   Recall that the parameter UHASH-OUTPUT-LEN specifies the length of
1499	   the UMAC authentication tag.  The above table states, for example,
1500	   for the case of an 8-byte tag that the ideal forging probability
1501	   would be 2^-64 while UMAC would withstand an actual forging
1502	   probability of 2^-60.  Note that under this tag length (which is the
1503	   default length in UMAC) the probability of forging a message is well
1504	   under the chance that a randomly guessed DES key is correct.  DES is
1505	   now widely seen as vulnerable, but the problem has never been that
1506	   some extraordinarily lucky attacker might, in a single guess, find
1507	   the right key.  Instead, the problem is that large amounts of
1508	   computation can be thrown at the problem until, off-line, the
1509	   attacker finds the right key.

1511	   With  UMAC, off-line computation aimed at exceeding the forging
1512	   probability is hopeless, regardless of tag length, as long as the
1513	   underlying cipher is not broken.  The only way to forge is to
1514	   interact with the entity that verifies the MAC and to try a huge
1515	   amount of forgeries before one is likely to succeed.  The system
1516	   architecture will determine the extent to which this is possible.  In
1517	   a well-architected system there should not be any high-bandwidth
1518	   capability for presenting forged MACs and determining if they are
1519	   valid.  In particular, the number of authentication failures at the
1520	   verifying party should be limited.  If a large number of such
1521	   attempts are detected the session key in use should be dropped and
1522	   the event be recorded in an audit log.

1524	   Let us reemphasize: a forging probability of 1 / 2^60 does not mean
1525	   that there is an attack that runs in 2^60 time - as long as AES
1526	   maintains its believed security there is no such attack for UMAC.
1527	   Instead, a 1 / 2^60 forging probability means that if an attacker
1528	   could try out 2^60 MACs, then the attacker would probably get one
1529	   right.  But the architecture of any real system should render this
1530	   infeasible.  One can certainly imagine an attacker having a high
1531	   bandwidth channel (e.g., 1 Gbit/second or more) over which it can
1532	   continually present attempted forgeries, the attacker being signaled
1533	   when a correct tag is found, but realizing such a scenario in a real
1534	   system would represent a major lapse in the security architecture.

1536	   It should be pointed out that once an attempted forgery is
1537	   successful, it is entirely possible that all subsequent messages
1538	   under this key may be forged, too.  This is important to
1539	   understanding in gauging the severity of a successful forgery.

1541	   In conclusion, the default 64-bit tags seem appropriate for most
1542	   security architectures and applications.  In cases where when the
1543	   consequences of an authentication failure are not extremely severe,
1544	   and when the system architecture is designed to conscientiously limit
1545	   the number of forgery attempts before a session is torn down, 32-bit
1546	   authentication tags may be adequate.  For the paranoid, or if an
1547	   attacker is allowed a fantastic number of forgery tests, 96- or
1548	   128-bits may be utilized.

1550	8.3  Selective-assurance authentication

1552	   We have already remarked about the flexibility built into UMAC to use
1553	   authentication tags of various lengths:  shorter tags are faster to
1554	   compute and one needs to transmit fewer bits, but the forging
1555	   probability is higher.  There is an additional degree of flexibility
1556	   built into the design of UMAC: even if the sender generates and
1557	   transmits a tag of 8 bytes, say, a receiver may elect to verify only
1558	   the first 4 bytes of the tag, and computing that 4-byte prefix by the
1559	   receiver will be substantially faster than computing what the full
1560	   8-byte tag would be.  Indeed when WORD-LEN is 2 one can more quickly
1561	   check the 2-byte prefix of the tag than the 4-byte prefix of the tag,
1562	   one can more quickly check the 4-byte prefix of the tag than the
1563	   6-byte prefix of the tag, and so forth.   When WORD-LEN is 4 one can
1564	   more quickly check the 4-byte prefix of the tag than an entire 8-byte
1565	   tag, and so forth.  This type of flexibility allows different parties
1566	   who receive a MAC (as in a multicast setting) to authenticate the
1567	   transmission to the extent deemed necessary and to the extent
1568	   consistent with any computational limits of the receiver.

1570	   In a scenario where receivers are allowed to verify short prefixes of
1571	   longer tags, it is even more important that conservative policies are
1572	   followed when a bad tag is presented to the receiver.  Because short
1573	   prefixes are easier to forge than are long ones, an attacker may
1574	   attempt to forge short prefixes and then leverage information gained
1575	   from these attacks to forge longer tags.  If the attacker can learn
1576	   which short tags are good and which are bad, the attacker may be able
1577	   to learn enough to allow longer forgeries.

1579	   One salient feature of the security-performance trade-off offered by
1580	   UMAC is its usability in contexts where performance is severely
1581	   constrained.  In such cases, using a mild-security authentication tag
1582	   can be of significant value especially if the alternative would be
1583	   not to use authentication at all (a possible such scenario could be
1584	   the high-speed transmission of real-time multimedia streams).
1585	   Another potential scenario where short and fast-to-compute tags can
1586	   be useful is for fast detection of data forgery intended as a denial
1587	   of service attack.  In this case, even a moderate increase in the
1588	   attacker's difficulty to produce forgeries may suffice to make the
1589	   attack worthless for the attacker.  Moreover, being able to detect
1590	   just a portion of attempted forgeries may be enough to identify the
1591	   attack.

1593	8.4  Nonce considerations

1595	   The function UMAC (Section 7) requires a nonce with length in the
1596	   range 1 to 16 bytes.  All nonces in an authentication session must be
1597	   equal in length.  For secure operation, no nonce value should be
1598	   repeated within the life of a single UMAC session-key.

1600	   To authenticate messages over a duplex channel (where two parties
1601	   send messages to each other), a different key could be used for each
1602	   direction.  If the same key is used in both directions, then it is
1603	   crucial that all nonces be distinct.  For example, one party can use
1604	   even nonces while the other party uses odd ones.  The receiving party
1605	   must verify that the sender is using a nonce of the correct form.

1607	   This specification does not indicate how nonce values are created,
1608	   updated, or communicated between the entity producing a tag and the
1609	   entity verifying a tag.  The following exemplify some of the
1610	   possibilities:

1612	   1.  The nonce is an eight-byte [resp., four-byte] unsigned number,
1613	       Counter, which is initialized to zero, which is incremented by
1614	       one following the generation of each authentication tag, and
1615	       which is always communicated along with the message and the
1616	       authentication tag.  An error occurs at the sender if there is an
1617	       attempt to authenticate more than 2^64 [resp., 2^32] messages
1618	       within a session.

1620	   2.  The nonce is a 16-byte unsigned number, Counter, which is
1621	       initialized to zero and which is incremented by one following the
1622	       generation of each authentication tag.  The Counter is not
1623	       explicitly communicated between the sender and receiver.
1624	       Instead, the two are assumed to communicate over a reliable
1625	       transport, and each maintains its own counter so as to keep track
1626	       of what the current nonce value is.

1628	   3.  The nonce is a 16-byte random value.  (Because repetitions in a
1629	       random n-bit value are expected at around 2^(n/2) trials, the
1630	       number of messages to be communicated in a session using n-bit
1631	       nonces should not be allowed to approach 2^(n/2).)

1633	   We emphasize that the value of the nonce need not be kept secret.

1635	   When UMAC is used within a higher-level protocol there may already be
1636	   a field, such as a sequence number, which can be co-opted so as to
1637	   specify the nonce needed by UMAC.  The application will then specify
1638	   how to construct the nonce from this already-existing field.

1640	   Note that if the nonce starts at zero and is incremented with each
1641	   message then an attacker can easily ascertain the number of messages
1642	   which have been sent during a session.  If this is information which
1643	   one wishes to deny the attacker then one might have the sender
1644	   initialize the nonce to a random value, rather than to zero.
1645	   Inspecting the current nonce will no longer reveal to the attacker
1646	   the number of messages which have been sent during this session.
1647	   This is a computationally cheaper approach than enciphering the
1648	   nonce.

1650	8.5  Guarding against replay attacks

1652	   A replay attack entails the attacker repeating a message, nonce, and
1653	   authentication tag.  In systems, replay attacks may be quite
1654	   damaging, and many applications will want to guard against them.  In
1655	   UMAC, this would normally be done at the receiver by having the
1656	   receiver check that no nonce value is used twice.  On a reliable
1657	   connection, when the nonce is a counter, this is trivial.  On an
1658	   unreliable connection, when the nonce is a counter, one would
1659	   normally cache some "window" of recent nonces.  Out-of-order message
1660	   delivery in excess of what the window allows will result in rejecting
1661	   otherwise valid authentication tags.

1663	   We emphasize that it is up to the receiver when a given message,
1664	   nonce and tag will be deemed authentic.  Certainly the tag should be
1665	   valid for the message and nonce, as determined by UMAC, but the
1666	   message may still be deemed inauthentic because the nonce is detected
1667	   to be a replay.

1669	9  Acknowledgments

1671	   Thanks are due to David Balenson and David Carman of NAI Labs, who
1672	   suggested the advantages of allowing a receiver to verify
1673	   authentication tags to various forgery probabilities.  Thanks are
1674	   also due to David McGrew and Scott Fluhrer of Cisco Systems for
1675	   encouraging us to improve UMAC performance on short messages.

1677	   Phillip Rogaway, John Black, and Ted Krovetz were supported in this
1678	   work under Rogaway's NSF CAREER Award CCR-962540, and under MICRO
1679	   grants 97-150, 98-129, and 99-103 funded by RSA Data Security, Inc.,
1680	   and ORINCON Corporation.  Much of Rogaway's work was carried out
1681	   during two sabbatical visits to Chiang Mai University. Special thanks
1682	   to Prof. Darunee Smawatakul for helping to arrange these visits.

1684	10 References

1686	   [1]   ANSI X9.9, "American National Standard for Financial
1687	         Institution Message Authentication (Wholesale)", American
1688	         Bankers Association, 1981.  Revised 1986.

1690	   [2]   M. Bellare, R. Canetti, and H. Krawczyk, "Keyed hash functions
1691	         and message authentication", Advances in Cryptology - CRYPTO
1692	         '96, LNCS vol. 1109, pp. 1-15. Full version available from
1693	         http://www.research.ibm.com/security/keyed-md5.html/

1695	   [3]   J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway,
1696	         "UMAC: Fast and provably secure message authentication",
1697	         Advances in Cryptology - CRYPTO '99, LNCS vol. 1666, pp.
1698	         216-233. Full version available from
1699	         http://www.cs.ucdavis.edu/~rogaway/umac

1701	   [4]   J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway,
1702	         "The UMAC message authentication code", work in progress, 2000.

1704	         To be available from http://www.cs.ucdavis.edu/~rogaway/umac

1706	   [5]   L. Carter and M. Wegman, "Universal classes of hash functions",
1707	         Journal of Computer and System Sciences, 18 (1979), pp.
1708	         143-154.

1710	   [6]   O. Goldreich, S. Goldwasser and  S. Micali, "How to construct
1711	         random functions", Journal of the ACM, 33, No. 4 (1986), pp.
1712	         210-217.

1714	   [7]   S. Halevi and H. Krawczyk,  "MMH: Software message
1715	         authentication in the Gbit/second rates", Fast Software
1716	         Encryption, LNCS Vol. 1267, Springer-Verlag, pp. 172-189, 1997.

1718	   [8]   ISO/IEC 9797-1, "Information technology - Security techniques -
1719	         Data integrity mechanism using a cryptographic check function
1720	         employing a block cipher algorithm", International Organization
1721	         for Standardization, 1999.

1723	   [9]   H. Krawczyk, M. Bellare, and R. Canetti, "HMAC: Keyed-hashing
1724	         for message authentication", RFC-2104, February 1997.

1726	   [10]  T. Krovetz, and P. Rogaway, "Fast universal hashing with small
1727	         keys and no preprocessing", work in progress, 2000.  To be
1728	         available from http://www.cs.ucdavis.edu/~rogaway/umac

1730	   [11]  T. Krovetz, and P. Rogaway, "Variationally universal hashing",
1731	         work in progress, 2000.  To be available from
1732	         http://www.cs.ucdavis.edu/~rogaway/umac

1734	   [12]  M. Wegman and L. Carter, "New hash functions and their use in
1735	         authentication and set equality", Journal of Computer and
1736	         System Sciences, 22 (1981), pp. 265-279.

1738	11 Author contact information

1740	   Authors' Addresses

1742	     John Black
1743	     Department of Computer Science
1744	     University of Nevada
1745	     Reno NV 89557
1746	     USA

1748	     EMail: jrb@cs.unr.edu

1750	     Shai Halevi
1751	     IBM T.J. Watson Research Center
1752	     P.O. Box 704
1753	     Yorktown Heights NY 10598
1754	     USA

1756	     EMail: shaih@watson.ibm.com

1758	     Alejandro Hevia
1759	     Department of Computer Science & Engineering
1760	     University of California at San Diego
1761	     La Jolla CA 92093
1762	     USA

1764	     EMail: ahevia@cs.ucsd.edu

1766	     Hugo Krawczyk
1767	     Deprtment of Electrical Engineering
1768	     Technion
1769	     Haifa 32000
1770	     ISRAEL

1772	     EMail: hugo@ee.technion.ac.il

1774	     Ted Krovetz
1775	     Intel Corporation
1776	     1900 Prairie City Road
1777	     Folsom CA 95630
1778	     USA

1780	     EMail: tdk@acm.org

1782	     Phillip Rogaway
1783	     Department of Computer Science
1784	     University of California
1785	     Davis CA 95616
1786	     USA

1788	     EMail: rogaway@cs.ucdavis.edu

1790	A  Suggested application programming interface (API)

1792	     /* umac.h */

1794	     typedef struct UMAC_CTX *umac_ctx_t;

1796	     umac_ctx_t umac_alloc(char key[]);
1797	       /* Dynamically allocate UMAC_CTX struct */
1798	       /* initialize variables and generate   */
1799	       /* subkeys for default parameters.     */

1801	     int umac_free(umac_ctx_t ctx);
1802	       /* Deallocate the context structure.   */

1804	     int umac_set_params(umac_ctx_t ctx, void *params);
1805	       /* After default initialization,      */
1806	       /* optionally set parameters to       */
1807	       /* different values and reset for     */
1808	       /* new message.                       */

1810	     int umac_update(umac_ctx_t ctx, char *input, long len);
1811	       /* Incorporate len bytes pointed to by  */
1812	       /* input into context ctx.             */

1814	     int umac_final(umac_ctx_t ctx, char tag[], char nonce[]);
1815	       /* Incorporate nonce value and return  */
1816	       /* tag.  Reset ctx for next message.   */

1818	     int umac(umac_ctx_t ctx, char *input,  long len,
1819	              char tag[], char nonce[]);
1820	       /* All-in-one (non-incremental)        */
1821	       /* implementation of the functions     */
1822	       /* umac_update() and umac_final().     */

1824	   Each routine returns zero if unsuccessful.

1826	B  Reference code and test vectors

1828	   See the UMAC World Wide Web homepage for reference code and test
1829	   vectors.

1831	       http://www.cs.ucdavis.edu/~rogaway/umac/