idnits 2.17.1 

draft-hallambaker-udf-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 367 has weird spacing: '...a 47 cc  ab fe...'

  == Line 370 has weird spacing: '...9 e0 bd  ea 47...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     Since PKIX certificates and CLRs contain security policy
     information, UDF fingerprints used to identify certificates or CRLs
     SHOULD be presented with a minimum of 200 bits of precision.  PKIX
     applications MUST not accept UDF fingerprints specified with less than
     200 bits of precision for purposes of identifying trust anchors.

  -- The document date (August 14, 2017) is 2440 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC2119' is mentioned on line 279, but not defined


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    P. Hallam-Baker
3	Internet-Draft                                         Comodo Group Inc.
4	Intended status: Informational                           August 14, 2017
5	Expires: February 15, 2018

7	                     Uniform Data Fingerprint (UDF)
8	                        draft-hallambaker-udf-06

10	Abstract

12	   This document is also available online at
13	   http://prismproof.org/Documents/draft-hallambaker-udf.html .

15	Status of This Memo

17	   This Internet-Draft is submitted in full conformance with the
18	   provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF).  Note that other groups may also distribute
22	   working documents as Internet-Drafts.  The list of current Internet-
23	   Drafts is at http://datatracker.ietf.org/drafts/current/.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   This Internet-Draft will expire on February 15, 2018.

32	Copyright Notice

34	   Copyright (c) 2017 IETF Trust and the persons identified as the
35	   document authors.  All rights reserved.

37	   This document is subject to BCP 78 and the IETF Trust's Legal
38	   Provisions Relating to IETF Documents
39	   (http://trustee.ietf.org/license-info) in effect on the date of
40	   publication of this document.  Please review these documents
41	   carefully, as they describe your rights and restrictions with respect
42	   to this document.  Code Components extracted from this document must
43	   include Simplified BSD License text as described in Section 4.e of
44	   the Trust Legal Provisions and are provided without warranty as
45	   described in the Simplified BSD License.

47	Table of Contents

49	   1.  Abstract
50	   2.  Introduction
51	     2.1.  Algorithm Identifier
52	     2.2.  Content Type Identifier
53	     2.3.  Representation
54	     2.4.  Truncation
55	   3.  Definitions
56	     3.1.  Requirements Language
57	   4.  Encoding
58	     4.1.  Binary Fingerprint Value
59	       4.1.1.  Version ID
60	     4.2.  Truncation
61	     4.3.  Base32 Representation
62	     4.4.  Examples
63	       4.4.1.  Using SHA-2-512 Digest
64	     4.5.  Fingerprint Improvement
65	     4.6.  Compressed Presentation
66	     4.7.  Identifiers formed using UDFs
67	       4.7.1.  URI Representation
68	       4.7.2.  DNS Name
69	   5.  Content Types
70	     5.1.  PKIX Certificates and Keys
71	     5.2.  OpenPGP Key
72	     5.3.  DNSSEC
73	   6.  Additional UDF Renderings
74	     6.1.  Machine Readable Rendering
75	     6.2.  Word Lists
76	     6.3.  Image List
77	   7.  Security Considerations
78	     7.1.  Work Factor and Precision
79	     7.2.  Semantic Substitution
80	   8.  IANA Considerations
81	     8.1.  URI Registration
82	     8.2.  Content Type Registration
83	     8.3.  Version Registry
84	   9.  References
85	     9.1.  Normative References
86	     9.2.  Informative References
87	   Author's Address

89	1.  Abstract

91	   This document describes means of generating Uniform Data Fingerprint
92	   (UDF) values and their presentation as text sequences and as URIs.

94	   Cryptographic digests provide a means of uniquely identifying static
95	   data without the need for a registration authority.  A fingerprint is
96	   a form of presenting a cryptographic digest that makes it suitable
97	   for use in applications where human readability is required.  The UDF
98	   fingerprint format improves over existing formats through the
99	   introduction of a compact algorithm identifier affording an
100	   intentionally limited choice of digest algorithm and the inclusion of
101	   an IANA registered MIME Content-Type identifier within the scope of
102	   the digest input to allow the use of a single fingerprint format in
103	   multiple application domains.

105	   Alternative means of rendering fingerprint values are considered
106	   including machine-readable codes, word and image lists.

108	2.  Introduction

110	   The use of cryptographic digest functions to produce identifiers is
111	   well established as a means of generating a unique identifier for
112	   fixed data without the need for a registration authority.

114	   While the use of fingerprints of public keys was popularized by PGP,
115	   they are employed in many other applications including OpenPGP, SSH,
116	   Bitcoin and PKIX.

118	   A cryptographic digest is a particular form of hash function that has
119	   the properties:

121	   o  It is easy to compute the digest value for any given message

123	   o  It is infeasible to generate a message from its digest value

125	   o  It is infeasible to modify a message without changing the digest
126	      value

128	   o  It is infeasible to find two different messages with the same
129	      digest value.

131	   If these properties are met, the only way that two data objects can
132	   map to the same digest value is by random chance.  If the number of
133	   possible digest values is sufficiently large (i.e. is a sufficiently
134	   large number of bits in length), this chance is reduced to an
135	   arbitrarily infinitesimal probability.  Such values are described as
136	   being probabilistically unique.

138	   A fingerprint is a representation of a cryptographic digest value
139	   optimized for purposes of verification and in some cases data entry.

141	2.1.  Algorithm Identifier

143	   Although a secure cryptographic digest algorithm has properties that
144	   make it ideal for certain types of identifier use, several
145	   cryptographic digest algorithms have found widespread use, some of
146	   which have been demonstrated to be insecure.

148	   For example, the MD5 message digest algorithm [RFC1321] was widely
149	   used in IETF protocols until it was demonstrated to be vulnerable to
150	   collision attacks [Dobertin95].

152	   The secure use of a fingerprint scheme therefore requires the digest
153	   algorithm to either be fixed or otherwise determined by the
154	   fingerprint value itself.  Otherwise an attacker may be able to use a
155	   weak, broken digest algorithm to generate a data object matching a
156	   fingerprint value generated using a strong digest algorithm.

158	   The two digest algorithms currently used in the UDF scheme are both
159	   believed to be strong.  These are SHA-2-512 [SHA-2] and SHA-3-512
160	   [SHA-3].  The most secure, 512 bit version of the algorithm is used
161	   in both cases although the output is almost invariably truncated to
162	   a shorter length.  Use of the strongest version of the algorithm in
163	   every circumstance eliminates the need to negotiate the algorithm
164	   strength.

166	2.2.  Content Type Identifier

168	   A secure cryptographic digest algorithm provides a digest value
169	   that is probabilistically unique for a particular byte sequence
170	   but does not fix the context in which a byte sequence is interpreted.
171	   While such ambiguity may be tolerated in a fingerprint format
172	   designed for a single specific field of use, it is not acceptable in
173	   a general purpose format.

175	   For example, the SSH and OpenPGP applications both make use of
176	   fingerprints as identifiers for the public keys used but using
177	   different digest algorithms and data formats for representing the
178	   public key data.  While no such vulnerability has been demonstrated
179	   to date, it is certainly conceivable that a crafty attacker might
180	   construct an SSH key in such a fashion that OpenPGP interprets the
181	   data in an insecure fashion.  If the number of applications making
182	   use of a fingerprint format that permits such substitutions is
183	   sufficiently large, the probability of a semantic substitution
184	   vulnerability being possible becomes unacceptably large.

186	   A simple control that defeats such attacks is to incorporate a
187	   content type identifier within the scope of the data input to the
188	   hash function.

190	2.3.  Representation

192	   The representation of a fingerprint is the format in which it is
193	   presented to either an application or the user.

195	   Base32 encoding is used to produce the preferred text representation
196	   of a UDF fingerprint.  This encoding uses only the letters of the
197	   Latin alphabet with numbers chosen to minimize the risk of ambiguity
198	   between numbers and letters (2, 3, 4, 5, 6 and 7).

200	   To enhance readability and improve data entry, characters are grouped
201	   into groups of five.

203	2.4.  Truncation

205	   Different applications of fingerprints demand different tradeoffs
206	   between compactness of the representation and the number of
207	   significant bits.  A larger number of significant bits reduces
208	   the risk of collision but at a cost to convenience.

210	   Modern cryptographic digest functions such as SHA-2 produce output
211	   values of at least 256 bits in length.  This is considerably larger
212	   than most uses of fingerprints require and certainly greater than can
213	   be represented in human readable form on a business card.

215	   Since a strong cryptographic digest function produces an output value
216	   in which every bit in the input value affects every bit in the output
217	   value with equal probability, it follows that truncating the digest
218	   value to produce a fingerprint is at least as strong as any other
219	   mechanism if the digest algorithm used is strong.

221	   Using truncation to reduce the precision of the digest function has
222	   the advantage that a lower precision fingerprint of some data content
223	   is always a prefix of a higher precision fingerprint of the same
224	   content.  This allows higher precision fingerprints to be converted
225	   to a lower precision without the need for special tools.
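
The prefix property can be illustrated with a minimal sketch.  It
assumes the text form described in this document (groups of five
Base32 characters separated by dashes, each group carrying 25 bits).
The function name reduce_precision is hypothetical and not part of
the specification.

```python
def reduce_precision(fingerprint_text: str, bits: int) -> str:
    """Shorten a dash-grouped fingerprint to the requested precision.

    Assumes each group holds five Base32 characters (25 bits), so
    reducing precision is just dropping trailing groups.
    """
    if bits <= 0 or bits % 25 != 0:
        raise ValueError("precision must be a positive multiple of 25 bits")
    groups = fingerprint_text.split("-")
    wanted = bits // 25
    if wanted > len(groups):
        raise ValueError("fingerprint has fewer bits than requested")
    return "-".join(groups[:wanted])


# A 125 bit presentation reduced to 100 bits.
print(reduce_precision("MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ", 100))
```

No digest needs to be recomputed: the shorter form is obtained by
plain string manipulation.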

227	3.  Definitions

229	   Cryptographic Digest Function

231	   A hash function that has the properties required for use as a
232	   cryptographic hash function.  These include collision resistance,
233	   first pre-image resistance and second pre-image resistance.

235	      An identifier indicating how a Data Value is to be interpreted as
236	      specified in the IANA registry Media Types.

238	      The binary octet stream that is the input to the digest function
239	      used to calculate a digest value.

241	      A Data Value and its associated Content Type

243	      A synonym for Cryptographic Digest Function

245	      The output of a Cryptographic Digest Function

247	      The output of a Cryptographic Digest Function for a given Data
248	      Value input.

250	      A presentation of the digest value of a data value or data object.

252	      The representation of at least some part of a fingerprint value in
253	      human or machine readable form.

255	      The practice of recording a higher precision presentation of a
256	      fingerprint on successful validation.

258	      The practice of generating a sequence of fingerprints until one is
259	      found that matches criteria that permit a compressed presentation
260	      form to be used.  The compressed fingerprint thus being shorter
261	      than but presenting the same work factor as an uncompressed one.

263	      A function which takes an input and returns a fixed-size output.
264	      Ideally, the output of a hash function is unbiased and not
265	      correlated to the outputs returned to similar inputs in any
266	      predictable fashion.

268	      The number of significant bits provided by a Fingerprint
269	      Presentation.

271	      A measure of the computational effort required to perform an
272	      attack against some security property.

274	3.1.  Requirements Language

276	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
277	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
278	   document are to be interpreted as described in RFC 2119 [RFC2119].

280	4.  Encoding

282	   A UDF fingerprint for a given data object is generated by calculating
283	   the Binary Fingerprint Value for the given data object and type
284	   identifier, truncating it to obtain the desired degree of precision
285	   and then converting the truncated value to a representation.

287	4.1.  Binary Fingerprint Value

289	   The binary encoding of a fingerprint is calculated using the formula:

291	   Fingerprint = <Version-ID> + H (<Content-ID> + ':' + H(<Data>))

293	                                 Figure 1

295	   Where

297	   H(x) is the cryptographic digest function
298	   <Version-ID> is the fingerprint version and algorithm identifier.
299	   <Content-ID> is the MIME Content-Type of the data.
300	   <Data> is the binary data.

302	                                 Figure 2

304	   The use of the nested hash function permits a fingerprint to be taken
305	   of data for which a digest value is already known without the need to
306	   calculate a new digest over the data.

308	   The inclusion of a MIME content type prevents message substitution
309	   attacks in which one content type is substituted for another.
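
The nested construction can be sketched as follows.  This is an
illustration under stated assumptions rather than a reference
implementation: it assumes the separator between the content type
and the inner digest is a single colon octet, that the Content-ID is
UTF-8 encoded, and that "+" denotes octet concatenation.  The name
binary_fingerprint is hypothetical.

```python
import hashlib

SHA2_512_VERSION_ID = 96  # Version ID for SHA-2-512 (see Table 1)


def binary_fingerprint(content_id: str, data: bytes) -> bytes:
    """Return <Version-ID> + H(<Content-ID> + ':' + H(<Data>)).

    Assumption: ':' is a single 0x3A octet and '+' is concatenation.
    """
    inner = hashlib.sha512(data).digest()               # H(<Data>)
    outer_input = content_id.encode("utf-8") + b":" + inner
    outer = hashlib.sha512(outer_input).digest()        # outer digest
    return bytes([SHA2_512_VERSION_ID]) + outer


fingerprint = binary_fingerprint("text/plain", b"UDF Data Value")
print(len(fingerprint), fingerprint[0])
```

Because the content type is inside the outer digest, the same data
presented under two different content types yields unrelated
fingerprint values.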

311	4.1.1.  Version ID

313	   A Version Identifier consists of a single byte.  The following digest
314	   algorithm identifiers are specified in this document:

316	      +-----------------+------------------------+-----------------+
317	      | Version ID      | Algorithm              | Reference       |
318	      +-----------------+------------------------+-----------------+
319	      | 96              | SHA-2-512              | [SHA-2]         |
320	      | 97, 98, 99, 100 | SHA-2-512 (compressed) | [SHA-2]         |
321	      | 144             | SHA-3-512              | [SHA-3]         |
322	      +-----------------+------------------------+-----------------+

324	                                  Table 1

326	   These algorithm identifiers have been chosen so that the first
327	   character in a SHA-2-512 fingerprint will always be "M" and the first
328	   character in a SHA-3-512 fingerprint will always be "S".  These
329	   provide mnemonics for "Merkle-Damgard" and "Sponge" respectively.
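
The choice of identifiers can be checked with a short sketch.  It
assumes the standard RFC 4648 Base32 alphabet, in which the first
character is determined by the top five bits of the Version ID
octet.  The helper name first_character is hypothetical.

```python
import base64


def first_character(version_id: int) -> str:
    """Return the first Base32 character produced by a Version ID octet."""
    # Pad to 5 octets (40 bits) so the encoder emits no '=' padding.
    encoded = base64.b32encode(bytes([version_id]) + bytes(4))
    return encoded.decode("ascii")[0]


for version_id in (96, 97, 98, 99, 100, 144):
    print(version_id, first_character(version_id))
```

All of 96 through 100 share the top five bits 01100, which is why
the compressed SHA-2-512 forms also begin with the same letter.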

331	4.2.  Truncation

333	   The Binary Fingerprint Value is truncated to an integer multiple of
334	   25 bits regardless of the intended output presentation.

336	   The output of the hash function is truncated to a sequence of n bits
337	   by first selecting the first floor(n/8) bytes of the output.  If n
338	   is an integer multiple of 8, no additional bits are required and this
339	   is the result.  Otherwise the remaining bits are taken from the most
340	   significant bits of the next byte and any unused bits set to 0.

342	   For example, to truncate the byte sequence [a0, b1, c2, d3, e4] to 25
343	   bits: 25/8 = 3 bytes with 1 bit remaining, so the first three bytes
344	   of the truncated sequence are [a0, b1, c2] and the final byte is d3
345	   AND 80 = 80, which we append to the previous result to obtain the
346	   final truncated sequence of [a0, b1, c2, 80].
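
The truncation rule can be sketched as below.  The function name
truncate is hypothetical; the logic follows the description above:
whole bytes are copied and the leftover bits are taken from the most
significant bits of the next byte.

```python
def truncate(digest: bytes, n: int) -> bytes:
    """Keep the first n bits of digest, zeroing unused trailing bits."""
    if n < 0 or n > 8 * len(digest):
        raise ValueError("n is out of range for this digest")
    full_bytes, extra_bits = divmod(n, 8)
    result = bytearray(digest[:full_bytes])
    if extra_bits:
        # Mask that keeps only the top extra_bits bits of the next byte.
        mask = (0xFF << (8 - extra_bits)) & 0xFF
        result.append(digest[full_bytes] & mask)
    return bytes(result)


print(truncate(bytes.fromhex("a0b1c2d3e4"), 25).hex())  # a0b1c280
```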

348	4.3.  Base32 Representation

350	   A modified version of Base32 [RFC4648] encoding is used to present
351	   the fingerprint in text form, grouping the output text into
352	   groups of five characters separated by a dash "-".  This
353	   representation improves the accuracy of both data entry and
354	   verification.
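
A possible rendering routine is sketched below.  The exact
modifications to Base32 are not spelled out here, so the sketch
assumes the standard RFC 4648 alphabet with padding removed and a
precision that is a multiple of 25 bits.  The name present is
hypothetical.

```python
import base64


def present(fingerprint: bytes, precision_bits: int) -> str:
    """Render the first precision_bits bits as dash-separated Base32."""
    if precision_bits <= 0 or precision_bits % 25 != 0:
        raise ValueError("precision must be a positive multiple of 25 bits")
    chars = precision_bits // 5          # each Base32 character is 5 bits
    needed = -(-precision_bits // 8)     # bytes required, rounded up
    if needed > len(fingerprint):
        raise ValueError("fingerprint is too short for this precision")
    # Slicing drops both '=' padding and any bits beyond the precision.
    text = base64.b32encode(fingerprint[:needed]).decode("ascii")[:chars]
    return "-".join(text[i:i + 5] for i in range(0, chars, 5))


print(present(bytes([96]) + bytes(7), 50))  # MAAAA-AAAAA
```

Because characters are emitted in bit order, a presentation at a
lower precision is always a leading substring of one at a higher
precision.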

356	4.4.  Examples

358	   In the following examples, <Content-ID> is the UTF8 encoding of the
359	   string "text/plain" and <Data> is the UTF8 encoding of the string
360	   "UDF Data Value"

362	   Data = 55 44 46 20 44 61 74 61 20 56 61 6c 75 65

364	4.4.1.  Using SHA-2-512 Digest

366	   H( <Data> ) =
367	       48 da 47 cc  ab fe a4 5c  76 61 d3 21  ba 34 3e 58
368	       10 87 2a 03  b4 02 9d ab  84 7c ce d2  22 b6 9c ab
369	       02 38 d4 e9  1e 2f 6b 36  a0 9e ed 11  09 8a ea ac
370	       99 d9 e0 bd  ea 47 93 15  bd 7a e9 e1  2e ad c4 15
371	   H(<Content-ID> + ':' + H( <Data> )) =
372	       45 e0 59 e0  39 34 ea b7  f6 5d 83 b2  d8 f9 b1 6d
373	       2a 6b 08 63  d9 3c c1 02  86 7b 83 49  f2 d9 f0 8f
374	       fe 07 87 30  c7 c9 05 74  ac a1 38 2b  b3 14 4d c6
375	       39 f9 8c 12  c0 4a 3e b5  05 0b 3e 67  df 52 4b 57

377	                                 Figure 3

379	      MB2GK-6DUF5-YGYYL-JNY5E

381	      MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ

383	      MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J

385	      MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J-C4OZQ-5GIN2-GQ7FQ-EEHFI

387	4.5.  Fingerprint Improvement

389	   Since an application must always calculate the full fingerprint value
390	   as part of the verification process, an application MAY record a

392	   Applications are encouraged to make use of the practice of
393	   fingerprint improvement wherever possible.

395	4.6.  Compressed Presentation

397	   Fingerprint compression permits the use of shorter fingerprint
398	   presentation without a reduction in the attacker work factor by
399	   requiring the fingerprint value to match a particular pattern.

401	   UDF fingerprints MUST use compression if possible.  A compressed
402	   fingerprint uses a version identifier that specifies the form of
403	   compression used as follows:

405	                 +------------+-------------------------+
406	                 | Version ID | Compression             |
407	                 +------------+-------------------------+
408	                 | 96         | None                    |
409	                 | 97         | First 25 bits are zeros |
410	                 | 98         | First 40 bits are zeros |
411	                 | 99         | First 50 bits are zeros |
412	                 | 100        | First 55 bits are zeros |
413	                 +------------+-------------------------+

415	                                  Table 2

417	   Thus, the fingerprint that would be represented in uncompressed form
418	   as MAAAA-AAWIY-LTMFTG-CZTRO is instead represented as MIWIY-LTMFTG-
419	   CZTRO.
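
Selection of the Version ID from Table 2 might look like the sketch
below.  It assumes that the leading zero bits are counted on the
digest output (before the Version ID is prepended) and that the
strongest applicable compression is chosen; neither detail is stated
explicitly above.  The helper names are hypothetical.

```python
# (required leading zero bits, Version ID), strongest compression first.
COMPRESSION_LEVELS = [(55, 100), (50, 99), (40, 98), (25, 97)]
UNCOMPRESSED_VERSION_ID = 96


def leading_zero_bits(data: bytes) -> int:
    """Count the zero bits before the first one bit in data."""
    count = 0
    for byte in data:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def select_version_id(digest: bytes) -> int:
    """Pick the Table 2 Version ID matching the digest's leading zeros."""
    zeros = leading_zero_bits(digest)
    for required, version_id in COMPRESSION_LEVELS:
        if zeros >= required:
            return version_id
    return UNCOMPRESSED_VERSION_ID


print(select_version_id(bytes(5) + b"\xff"))  # 40 leading zero bits
```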

421	4.7.  Identifiers formed using UDFs

423	   UDF fingerprints MAY be used to form a part of another protocol
424	   identifier.  Such practice carries the implicit semantic that the
425	   interpretation of the identifier formed is bound to the document
426	   identified by the fingerprint.

428	4.7.1.  URI Representation

430	   Any UDF fingerprint MAY be encoded as a URI by prefixing the Base32
431	   text representation of the fingerprint with the string "udf:".

433	4.7.2.  DNS Name

435	   A UDF fingerprint MAY be encoded as a DNS label by prefixing the
436	   Base32 text representation with the string "mm--".
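
Both identifier forms amount to simple string prefixing, as the
following sketch shows.  Lowercasing the DNS label is an assumption
made here because DNS names are case-insensitive, and the 63 octet
limit is the general DNS label limit rather than a rule stated in
this document.  The function names are hypothetical.

```python
def to_uri(fingerprint_text: str) -> str:
    """Form a udf: URI from the Base32 text of a fingerprint."""
    return "udf:" + fingerprint_text


def to_dns_label(fingerprint_text: str) -> str:
    """Form a DNS label from the Base32 text of a fingerprint."""
    label = "mm--" + fingerprint_text.lower()  # lowercasing is an assumption
    if len(label) > 63:
        raise ValueError("DNS labels are limited to 63 octets")
    return label


print(to_uri("MB2GK-6DUF5-YGYYL-JNY5E"))
print(to_dns_label("MB2GK-6DUF5-YGYYL-JNY5E"))
```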

438	   A DNS name that includes a UDF fingerprint as a DNS label carries the
439	   implicit assertion that the interpretation of the address MUST be
440	   authorized by a security policy that is validated under a key that
441	   matches the corresponding fingerprint.

443	   Placing such a DNS label as the top level (rightmost) label in a DNS
444	   address creates an address that is not legal and thus cannot be
445	   resolved by the Internet DNS infrastructure.  This ensures that the
446	   address is rejected by applications that are not capable of
447	   performing the associated validation steps.

449	   For example, Alice has the email security key with fingerprint MB2GK-
450	   6DUF5-YGYYL-JNY5E.  She uses the following email addresses:

452	      Alice publishes this email address when she does not want the
453	      other party to use the secure email system.

455	      Alice publishes this email address when she wants to give the
456	      other party the option of using secure email if their system
457	      supports it.

459	      The DNS server for example.com has been configured to redirect
460	      requests to resolve mm--mb2gk-6duf5-ygyyl-jny5e.example.com to the
461	      mail server example.com.

463	      Alice uses this email address when she wants the other party to be
464	      able to send her email if and only if their client supports use of
465	      the secure messaging system.

467	   While there should never be a DNS label of the form mm--* in the
468	   authoritative DNS root, such labels MAY be introduced by a trusted
469	   local resolver.  This would allow attempts at making an untrusted
470	   communication request to be transparently redirected through a
471	   locally trusted security enhancing proxy.

473	5.  Content Types

475	   While a UDF fingerprint MAY be used to identify any form of static
476	   data, the use of a UDF fingerprint to identify a public key signature
477	   key provides a level of indirection and thus the ability to identify
478	   dynamic data.  The content types used to identify public keys are
479	   thus of particular interest.

481	   As described in the security considerations section, the use of
482	   fingerprints to identify a bare public key and the use of
483	   fingerprints to identify a public key and associated security policy
484	   information are very different.

486	5.1.  PKIX Certificates and Keys

488	   UDF fingerprints MAY be used to identify PKIX certificates, CRLs and
489	   public keys in the ASN.1 encoding used in PKIX certificates.

491	   Since PKIX certificates and CRLs contain security policy information,
492	   UDF fingerprints used to identify certificates or CRLs SHOULD be
493	   presented with a minimum of 200 bits of precision.  PKIX applications
494	   MUST NOT accept UDF fingerprints specified with less than 200 bits of
495	   precision for purposes of identifying trust anchors.
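
An application could enforce the 200 bit floor with a check along
the following lines.  The sketch assumes that precision is counted
as five bits per Base32 character of the presentation, including the
characters that carry the Version ID.  The function names are
hypothetical.

```python
MIN_TRUST_ANCHOR_BITS = 200


def precision_bits(fingerprint_text: str) -> int:
    """Bits of precision in a dash-grouped Base32 presentation."""
    return 5 * len(fingerprint_text.replace("-", ""))


def acceptable_for_trust_anchor(fingerprint_text: str) -> bool:
    """True if the presentation meets the 200 bit minimum."""
    return precision_bits(fingerprint_text) >= MIN_TRUST_ANCHOR_BITS


print(acceptable_for_trust_anchor("MB2GK-6DUF5-YGYYL-JNY5E"))  # False
```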

497	   PKIX certificates, keys and related content data are identified by
498	   the following content types:

500	      A PKIX Certificate

502	      A PKIX CRL

504	      The KeyInfo structure defined in the PKIX certificate
505	      specification

507	5.2.  OpenPGP Key

509	   OpenPGPv5 keys and key set content data are identified by the
510	   following content types:

512	      An OpenPGP key

514	      An OpenPGP key set.

516	5.3.  DNSSEC

518	   DNSSEC record data consists of DNS records which are identified by
519	   the following content type:

521	      A DNS resource record in binary format

523	6.  Additional UDF Renderings

525	   By default, a UDF fingerprint is rendered in the Base32 encoding
526	   described in this document.  Additional renderings MAY be employed to
527	   facilitate entry and/or verification of fingerprint values.

529	6.1.  Machine Readable Rendering

531	   The use of a machine-readable rendering such as a QR Code allows a
532	   UDF value to be input directly using a smartphone or other device
533	   equipped with a camera.

535	   A QR code fixed to a network capable device might contain the
536	   fingerprint of a machine readable description of the device.

538	6.2.  Word Lists

540	   The use of a Word List to encode fingerprint values was introduced by
541	   Patrick Juola and Philip Zimmermann for the PGPfone application.  The
542	   PGP Word List is designed to facilitate exchange and verification of
543	   fingerprint values in a voice application.  To minimize the risk of
544	   misinterpretation, two word lists of 256 values each are used to
545	   encode alternative fingerprint bytes.  The compact size of the lists
546	   used allowed the compilers to curate them so as to maximize the
547	   phonetic distance of the words selected.

549	   The PGP Word List is designed to achieve a balance between ease of
550	   entry and verification.  Applications where only verification is
551	   required may be better served by a much larger word list, permitting
552	   shorter fingerprint encodings.

554	   For example, a word list with 16384 entries permits 14 bits of the
555	   fingerprint to be encoded at once, 65536 entries permits 16.  These
556	   encodings allow a 125 bit fingerprint to be encoded in 9 and 8 words
557	   respectively.
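
The word counts quoted above follow from dividing the fingerprint
length by the bits carried per word and rounding up.  A small sketch
of that arithmetic, assuming the list size is a power of two (the
function name words_needed is hypothetical):

```python
def words_needed(fingerprint_bits: int, list_size: int) -> int:
    """Words required to encode fingerprint_bits with a given word list."""
    bits_per_word = list_size.bit_length() - 1   # log2 for powers of two
    return -(-fingerprint_bits // bits_per_word)  # ceiling division


for size in (256, 16384, 65536):
    print(size, words_needed(125, size))
```

With the 256 entry PGP lists, the same 125 bit fingerprint needs 16
words, which illustrates the saving offered by larger lists.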

559	6.3.  Image List

561	   An image list is used in the same manner as a word list affording
562	   rapid visual verification of a fingerprint value.  For obvious
563	   reasons, this approach is not generally suited to data entry.

565	7.  Security Considerations

567	7.1.  Work Factor and Precision

569	   A given UDF data object has a single fingerprint value that may be
570	   presented at different precisions.  The shortest legitimate precision
571	   with which a UDF fingerprint may be presented has 96 significant bits.

573	   A UDF fingerprint presents the same work factor as any other
574	   cryptographic digest function.  The difficulty of finding a second
575	   data item that matches a given fingerprint is 2^n and the difficulty
576	   of finding two data items that have the same fingerprint is 2^(n/2),
577	   where n is the precision of the fingerprint.

579	   For the algorithms specified in this document, n = 512 and thus the
580	   work factor for finding collisions is 2^256, a value that is
581	   generally considered to be computationally infeasible.
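
The relationship between precision and work factor can be tabulated
with a short sketch.  The function name work_factors is
hypothetical, and odd precisions are rounded down when halved.

```python
def work_factors(precision_bits: int) -> tuple[int, int]:
    """Return (second pre-image work, collision work) for n bits."""
    second_preimage = 2 ** precision_bits
    collision = 2 ** (precision_bits // 2)  # birthday bound, rounded down
    return second_preimage, collision


for n in (200, 512):
    second_preimage, collision = work_factors(n)
    print(n, second_preimage.bit_length() - 1, collision.bit_length() - 1)
```

For n = 512 this prints exponents of 512 and 256, matching the
figures given above.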

583	   Since the use of 512 bit fingerprints is impractical in the type of
584	   applications where fingerprints are generally used, truncation is a
585	   practical necessity.  The longer a fingerprint is, the less likely it
586	   is that a user will check every character.  It is therefore important
587	   to consider carefully whether the security of an application depends
588	   on second pre-image resistance or collision resistance.

590	   In most fingerprint applications, such as the use of fingerprints to
591	   identify public keys, the fact that a malicious party might generate
592	   two keys that have the same fingerprint value is a minor concern.
593	   Combined with a flawed protocol architecture, however, such a
594	   vulnerability may permit an attacker to construct a document such
595	   that the signature will be accepted as valid by some parties but not
596	   by others.

598	   For example, Alice generates keypairs until two are generated that
599	   have the same 100 bit UDF presentation (typically 2^48 attempts).
600	   She registers one keypair with a merchant and the other with her
601	   bank.  This allows Alice to create a payment instrument that will be
602	   accepted as valid by one and rejected by the other.

604	   The ability to generate two PKIX certificates with the same
605	   fingerprint and different certificate attributes raises very
606	   different and more serious security concerns.  For example, an
607	   attacker might generate two certificates with the same key and
608	   different use constraints.  This might allow an attacker to present a
609	   highly constrained certificate that does not present a security risk
610	   to an application for purposes of gaining approval and an
611	   unconstrained certificate to request a malicious action.

613	   In general, any use of fingerprints to identify data that has
614	   security policy semantics requires the risk of collision attacks to
615	   be considered.  For this reason short, "user friendly" fingerprint
616	   presentations (less than 200 bits) SHOULD only be used for public
617	   key values.

619	7.2.  Semantic Substitution

621	   Many applications record the fact that a data item is trusted; rather
622	   fewer record the circumstances in which the data item is trusted.
623	   This results in a semantic substitution vulnerability which an
624	   attacker may exploit by presenting the trusted data item in the wrong
625	   context.

627	   The UDF format provides protection against high level semantic
628	   substitution attacks by incorporating the content type into the input
629	   to the outermost fingerprint digest function.  The work factor for
630	   generating a UDF fingerprint that is valid in both contexts is thus
631	   the same as the work factor for finding a second preimage in the
632	   digest function (2^512 for the specified digest algorithms).

634	   It is thus infeasible to generate a data item such that some
635	   applications will interpret it as a PKIX key and others will accept
636	   it as an OpenPGP key.  While attempting to parse a PKIX key as an
637	   OpenPGP key is virtually certain to fail to return the correct key
638	   parameters, it cannot be assumed that the attempt is guaranteed to
639	   fail with an error message.

641	   The UDF format does not provide protection against semantic
642	   substitution attacks that do not affect the content type.

644	8.  IANA Considerations

646	   [This will be extended later]

648	8.1.  URI Registration

650	   [Here a URI registration for the udf: scheme]

652	8.2.  Content Type Registration

654	   [application/pkix-keyinfo]

656	   [application/pgp-key]

658	8.3.  Version Registry

660	   96 = SHA-2-512
661	   97 = SHA-2-512 with 25 leading zeros
662	   98 = SHA-2-512 with 40 leading zeros
663	   99 = SHA-2-512 with 50 leading zeros
664	   100 = SHA-2-512 with 55 leading zeros
665	   144 = SHA-3-512

667	                                 Figure 4

669	9.  References

671	9.1.  Normative References

673	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
674	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

676	   [SHA-2]    "[Reference Not Found!]".

678	   [SHA-3]    "[Reference Not Found!]".

680	9.2.  Informative References

682	   [Dobertin95]
683	              "[Reference Not Found!]".

685	   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
686	              DOI 10.17487/RFC1321, April 1992.

688	Author's Address

690	   Phillip Hallam-Baker
691	   Comodo Group Inc.

693	   Email: philliph@comodo.com