idnits 2.17.1 draft-ietf-conneg-feature-hash-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([2], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Unrecognized Status in 'Category: Work-in-progress', assuming Proposed Standard (Expected one of 'Standards Track', 'Full Standard', 'Draft Standard', 'Proposed Standard', 'Best Current Practice', 'Informational', 'Experimental', 'Informational', 'Historic'.) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (16 June 1999) is 9074 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '8' is defined on line 635, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2234 (ref. '3') (Obsoleted by RFC 4234) ** Obsolete normative reference: RFC 2396 (ref. '5') (Obsoleted by RFC 3986) ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. '6') -- Possible downref: Non-RFC (?) normative reference: ref. '7' -- Possible downref: Non-RFC (?) normative reference: ref. '8' -- Possible downref: Non-RFC (?) normative reference: ref. '9' Summary: 9 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IETF conneg working group Graham Klyne, editor 2 Internet draft 5GM/Content Technologies 3 Category: Work-in-progress Larry Masinter 4 Xerox Corporation 5 16 June 1999 6 Expires: December 1999 8 Identifying composite media features 9 11 Status of this memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC 2026. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months and may be updated, replaced, or obsoleted by other 23 documents at any time. It is inappropriate to use Internet- Drafts 24 as reference material or to cite them other than as "work in 25 progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Copyright Notice 35 Copyright (C) The Internet Society 1999. All Rights Reserved. 37 Abstract 39 In RFC 2533 [1], an expression format is presented for describing 40 media feature capabilities as a combination of simple media feature 41 tags [2]. 43 This document describes an abbreviated format for a composite media 44 feature set, based upon a hash of the feature expression describing 45 that composite, or the URI of a resource containing the feature 46 expression. 48 Internet Draft Identifying composite media features 50 Table of contents 52 1. Introduction ............................................2 53 1.1 Organization of this document 2 54 1.2 Terminology and document conventions 3 55 1.3 Discussion of this document 3 56 2. Motivation and goals ....................................4 57 3. Composite feature representation ........................5 58 3.1 Feature set hashed reference format 5 59 3.1.1 Hash value calculation 6 60 3.1.2 Base-32 value representation 7 61 3.2 Feature set URI reference format 8 62 3.3 Resolving feature set references 9 63 3.3.1 URI reference 9 64 3.3.2 Inline feature set details 10 65 4. Examples ................................................11 66 5. Internationalization considerations .....................11 67 6. Security considerations .................................12 68 7. Full copyright statement ................................12 69 8. Acknowledgements ........................................13 70 9. References ..............................................13 71 10. Authors' addresses .....................................14 72 Appendix A The birthday problem ............................14 73 Appendix B: Revision history ...............................16 75 1. Introduction 77 In "A syntax for describing media feature sets" [1], an expression 78 format is presented for describing media feature capabilities as a 79 combination of simple media feature tags [2]. 81 This document proposes an abbreviated format for a composite media 82 feature set, based upon a hash of the feature expression describing 83 that composite. 85 1.1 Organization of this document 87 Section 2 sets out some of the background and goals for feature set 88 references. 90 Section 3 presents a syntax for feature set references, and 91 describes how they are related to feature set expressions. 93 Internet Draft Identifying composite media features 95 1.2 Terminology and document conventions 97 This section defines a number of terms and other document 98 conventions, which are used with specific meaning in this memo. 99 The terms are listed in alphabetical order. 101 dereference 102 the act of replacing a feature set reference with its 103 corresponding feature set expression. Also called 104 "resolution". 106 feature set 107 some set of media features described by a media feature 108 assertion, as described in "A syntax for describing media 109 feature sets" [1]. (See that memo for a more formal 110 definition of this term.) 112 feature set expression 113 a string that describes some feature set, formulated 114 according to the rules in "A syntax for describing media 115 feature sets" [1] (and possibly extended by other 116 specifications). 118 feature set reference 119 a brief construct that references some feature set. 120 (See also: "dereference".) 122 feature set tag 123 a name that conforms to the syntax of a feature tag [1] 124 that is used to denote a feature set rather than a single 125 feature. 127 resolution 128 (See "dereference"). 130 This specification uses syntax notation and conventions described 131 in RFC2234 "Augmented BNF for Syntax Specifications: ABNF" [3]. 133 NOTE: Comments like this provide additional nonessential 134 information about the rationale behind this document. 135 Such information is not needed for building a conformant 136 implementation, but may help those who wish to understand 137 the design in greater depth. 139 1.3 Discussion of this document 141 Discussion of this document should take place on the content 142 negotiation and media feature registration mailing list hosted by 143 the Internet Mail Consortium (IMC). 145 Internet Draft Identifying composite media features 146 Please send comments regarding this document to: 148 ietf-medfree@imc.org 150 To subscribe to this list, send a message with the body 'subscribe' 151 to "ietf-medfree-request@imc.org". 153 To see what has gone on before you subscribed, please see the 154 mailing list archive at: 156 http://www.imc.org/ietf-medfree/ 158 2. Motivation and goals 160 The range of media feature capabilities of a message handling 161 system can be quite extensive, and the corresponding feature set 162 expression [1] can reach a significant size. 164 A requirement has been identified to allow recurring feature sets 165 to be identified by a single reference value, which can be combined 166 with other elements in a feature set expression. It is anticipated 167 that mechanisms will be provided that allow the recipient of such a 168 feature set reference to discover the corresponding feature set 169 expression. 171 Thus, the goals for this proposal are: 173 o to provide an abbreviated form for referencing an arbitrary 174 feature set expression. 176 o the meaning of (i.e. the corresponding feature set expression) a 177 feature set reference should be independent of any particular 178 mechanism that may be used to dereference it. 180 o to be able to verify whether a given feature set expression 181 corresponds to some feature set reference without having to 182 perform an explicit dereferencing operation (i.e. without 183 incurring additional network traffic). 185 o for protocol processors that conform to [1] to be able to 186 sensibly handle a feature set reference without explicit 187 knowledge of its meaning (i.e. the introduction of feature set 188 references should not break existing feature expression 189 processors). 191 o to allow, but not require, some indication of how to dereference 192 a feature set reference to be included in a feature set 193 expression. 195 Internet Draft Identifying composite media features 196 NOTE: This proposal does not attempt to address the 197 "override" or "default" problem. (Also called 198 "delegation", where a feature set may be referenced and 199 selectively modified.) 201 3. Composite feature representation 203 This specification hinges on three central ideas: 205 o the use of auxiliary predicates (introduced in [1]) to form the 206 basis of a feature set reference, and 208 o the use of a token based on a hash function computed over the 209 referenced feature set expression. 211 o the use of an expression containing a URI to indicate an 212 identifier and possibly a service for resolution of a composite 213 feature set. 215 A key reason to use a hash function to generate an identifier is to 216 define a global name space without requiring a central naming 217 authority. New feature set tags can be introduced by any party 218 following the appropriate rules of formulation, without reference 219 to any centralized authority. 221 Local resolution services may be needed to map feature set tags to 222 their corresponding feature set expressions, but these are not able 223 to vary the meaning of any given tag. Failure of a resolution 224 service to return the correct expression is detectable by a calling 225 application, which should reject any incorrect value supplied. 227 This memo also suggests that an expression containing a URI in the 228 format '' may be used to indicate a mechanism and location of 229 a service to perform feature set resolution. 231 3.1 Feature set hashed reference format 233 This specification introduces a special form of auxiliary predicate 234 name with the following syntax: 236 fname = "h." 1*BASE32DIGIT 237 BASE32DIGIT = DIGIT 238 / "A" / "B" / "C" / "D" / "E" / "F" / "G" / "H" 239 / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" 240 / "Q" / "R" / "S" / "T" / "U" / "V" 242 The sequence of base-32 digits represents the value of a hash 243 function calculated over the corresponding feature set expression 245 Internet Draft Identifying composite media features 246 (see following sections). Note that the above syntax allows upper- 247 or lower- case letters for base-32 digits (per RFC 2234 [3]). 249 Thus, within a feature set expression, a hashed feature set 250 reference would have the following form: 252 (h.123456789abcdefghijklmnopq) 254 3.1.1 Hash value calculation 256 The hash value is calculated using the MD5 algorithm [6] over the 257 text of the referenced feature set expression subjected to certain 258 normalizations. The feature expression must conform to the syntax 259 given for 'filter' in RFC 2533 [1]: 261 filter = "(" filtercomp ")" *( ";" parameter ) 263 The steps for calculating a hash value are: 265 1. Whitespace normalization: all spaces, CR, LF, TAB and any other 266 layout control characters that may be embedded in the feature 267 expression string are removed (or ignored for the purpose of hash 268 value computation). 270 2. Case normalization: all lower case letters in the feature 271 expression, other than those contained within quoted strings, are 272 converted to upper case. That is, unquoted characters with 273 values 97 to 122 (decimal) are changed to corresponding 274 characters in the range 65 to 90. 276 3. Hash computation: the MD5 algorithm, described in RFC 1321 [6], 277 is applied to the normalized feature expression string. 279 The result obtained in step 3 is a 128-bit (16 octet) value that is 280 converted to a base-32 representation to form the feature set 281 reference. 283 NOTE: under some circumstances, removal of ALL 284 whitespace may result in an invalid feature expression 285 string. This should not be a problem as this is done 286 only for the purpose of calculating a hash value, and 287 significantly different feature expressions are expected 288 to differ in ways other than their whitespace. 290 NOTE: case normalization is deemed appropriate since 291 feature tag and token matching is case insensitive. 293 Internet Draft Identifying composite media features 295 3.1.2 Base-32 value representation 297 RFC 1321 [6] describes how to calculate an MD5 hash value that is a 298 sequence of 16 octets. This is then required to be coded as a 299 base-32 value, which is a sequence of base-32 digit characters. 301 Each successive character in a base-32 value represents 5 302 successive bits of the underlying octet sequence. Thus, each group 303 of 8 characters represents a sequence of 5 octets (40 bits): 305 1 2 3 306 01234567 89012345 67890123 45678901 23456789 307 +--------+--------+--------+--------+--------+ 308 |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >| 309 +--------+--------+--------+--------+--------+ 310 <===> 8th character 311 <====> 7th character 312 <===> 6th character 313 <====> 5th character 314 <====> 4th character 315 <===> 3rd character 316 <====> 2nd character 317 <===> 1st character 319 The value (i.e. sequence of bits) represented by each base-32 digit 320 character is indicated by the following table: 321 "0" 0 "A" 10 "K" 20 "U" 30 322 "1" 1 "B" 11 "L" 21 "V" 31 323 "2" 2 "C" 12 "M" 22 324 "3" 3 "D" 13 "N" 23 325 "4" 4 "E" 14 "O" 24 326 "5" 5 "F" 15 "P" 25 327 "6" 6 "G" 16 "Q" 26 328 "7" 7 "H" 17 "R" 27 329 "8" 8 "I" 18 "S" 28 330 "9" 9 "J" 19 "T" 29 332 When encoding a base-32 value, each full group of 5 octets is 333 represented by a sequence of 8 characters indicated above. If a 334 group of less than 5 octets remain after this, they are encoded 335 using as many additional characters as may be needed: 1, 2, 3 or 4 336 octets are encoded by 2, 4, 5 or 7 characters respectively. Any 337 spare bits represented by the base-32 digit characters are selected 338 to be zero. 340 When decoding a base-32 value, the reverse mapping is applied: 341 each full group of 8 characters codes a sequence of 5 octets. A 342 final group of 2, 4, 5 or 7 characters codes a sequence of 1, 2, 3 343 or 4 octets respectively. Any spare bits represented by the final 344 group of characters are discarded. 346 Internet Draft Identifying composite media features 347 Thus, for a 128-bit (16 octet) MD5 hash value, the first 15 octets 348 are coded as 24 base 32 digit characters, and the final octet is 349 coded by two characters. 351 NOTE: Base64 representation (per MIME [4]) would be more 352 compact (21 rather than 26 characters for the MD5 128-bit 353 hash value), but an auxiliary predicate name is defined 354 (by [1]) to have the same syntax as a feature tag, and 355 the feature tag matching rules (per [2]) state that 356 feature tag matching is case insensitive. 358 Base36 representation was considered (i.e. using all 359 letters "A"-"Z") but was not used because this would 360 require extended precision multiplication and division 361 operations to encode and decode the hash values. 363 3.2 Feature set URI reference format 365 This section introduces a new form of feature set predicate by 366 extending the feature set syntax [1] as follows: 368 filter =/ "<" URI ">" *( ";" parameter ) 370 where 'URI' is described by "Uniform Resource Identifiers (URI): 371 Generic Syntax" [5]. 373 The meaning of this construct is defined to be the meaning of the 374 expression in which '' is replaced by a copy of the resource 375 indicated by 'URI'. The indicated resource is required to be a 376 text value containing a valid feature set expression, NOT itself 377 containing a '' reference. 379 If a '' reference is used within a feature expression that 380 defines a hash reference, then the hash value is calculated over 381 the expression obtained after the resource has been subsituted. 383 Thus, the following are examples of feature set expressions using 384 URI references: 386 388 (& (dpi=100) ) 390 This specification does not indicate: 392 o any specific URI schemes to be supported, 394 o any meaning if the resource cannot be accessed, of if the value 395 obtained does not correspond to some recognized format. 397 Internet Draft Identifying composite media features 399 These details must be indicated by the specification of any 400 application or protocol that relies upon this form of a feature 401 predicate. 403 If the URI uses characters other than a designated subset of US- 404 ASCII then those additional characters should be represented by a 405 sequence of US-ASCII characters allowed by RFC 2396 [5]. 407 NOTE: the syntax above allows a '' reference to 408 appear almost anywhere in a feature set expression. In 409 some contexts, the appearance of these references may be 410 restricted to specific locations (e.g. the right hand 411 side of a auxiliary feature predicate, as indicated in 412 section 3.3.1). The specification of any application or 413 protocol should fully describe any such restriction that 414 it may impose. 416 3.3 Resolving feature set references 418 This memo does not mandate any particular mechanism for 419 defeferencing a feature set reference. It is expected that 420 specific dereferencing mechanisms will be specified for any 421 application or protocol that uses them. 423 The following sections describe some ways that feature set 424 dereferencing information may be incorporated into a feature set 425 expression. Both of these mechanisms are based on auxiliary 426 predicate definitions within a "where" clause [1]. 428 When a hashed feature set reference is used, conformance to the 429 hashing rules takes precedence over any other determination of the 430 feature expression. Any expression, however obtained, may not be 431 substituted for the hash-based reference unless it yields the 432 correct hash value. 434 3.3.1 URI reference 436 The two formats for feature set references described above may be 437 combined by defining the meaning of a hashed reference to be a URI- 438 based reference. For example: 440 (& (dpi=100) (h.1234567890) ) 441 where 442 (h.1234567890) :- 443 end 445 This indicates that the meaning of the hash-based form is contained 446 in the resource whose URI is given. In this case, an HTTP resource 447 retrieval is suggested. 449 Internet Draft Identifying composite media features 450 The hash value used is calculated over the feature set expression 451 obtained by defererencing the URI form expression. 453 IMPORTANT: How a calling application processes a URI is 454 not specified here. 456 For URIs that are URLs, one reasonable approach would be 457 to use the URL scheme protocol to access the 458 corresponding feature set expression. But other 459 mechanisms might be used; e.g. protocols developed by 460 the IETF resource capability (RESCAP) working group. In 461 any case, any mechanism used must be specified by an 462 application or protocol that uses URI references in this 463 way. 465 When a hashed feature set reference is resolved using a URI value, 466 the retrieving program should use the feature expression thus 467 obtained only if it hashes to the correct value. 469 3.3.2 Inline feature set details 471 In this case, a reference is resolved by including its definition 472 inline in an expression. 474 The feature set expression associated with a reference value may be 475 specified directly in a "where" clause, using the auxiliary 476 predicate definition syntax [1]; e.g. 478 (& (dpi=100) (h.1234567890) ) 479 where 480 (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) 481 end 483 This form might be used on request (where the request mechanism is 484 defined by the invoking application protocol), or when the 485 originator believes the recipient may not understand the reference. 487 It is an error if the inline feature expression does not yield the 488 hash value contained in auxiliary predicate name. 490 NOTE: viewed in isolation, this format does not have any 491 obvious value, in that the (h.xxx) form of auxiliary 492 predicate could be replaced by any arbitrary name. 494 Internet Draft Identifying composite media features 495 It is anticipated that this form might be used as a 496 follow-up response in a sequence along the lines of: 497 A> Capabilities are: 498 (& (dpi=100) (h.1234567890) ) 499 B> Do not understand: 500 (h.1234567890) 501 A> Capabilities are: 502 (& (dpi=100) (h.1234567890) ) 503 where 504 (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) 505 end 507 4. Examples 509 The following are some examples of feature set expressions 510 containing feature set references: 512 (& (dpi=100) (h.1234567890abcdefghijklmnop) ) 514 (& (dpi=100) 515 ) 517 (& (dpi=100) (h.1234567890abcdefghijklmnop) ) 518 where 519 (h.1234567890abcdefghijklmnop) :- 520 521 end 523 5. Internationalization considerations 525 Feature set expressions and URI strings are currently defined to 526 consist of only characters from the US-ASCII repertoire [1,5]; 527 under these circumstances this specification is not impacted by 528 internationalization considerations (other than any already 529 applicable to URIs [5]). 531 But, if future revisions of the feature set syntax permit non-US- 532 ASCII characters (e.g. within quoted strings), then some canonical 533 representation must be defined for the purposes of calculating hash 534 values. One choice might be to use a UTF-8 equivalent 535 representation as the basis for calculating the feature set hash. 536 Another choice might be to leave this as an application protocol 537 issue (but this could lead to non-interoperable feature sets 538 between different protocols). 540 Internet Draft Identifying composite media features 541 Another conceivable issue is that of up-casing the feature 542 expression in preparation for computing a hash value. This does 543 not apply to the content of strings so is not likely to be an 544 issue. But if changes are made that do permit non-US-ASCII 545 characters in feature tags or token strings, consideration must be 546 given to properly defining how case conversion is to be performed. 548 6. Security considerations 550 For the most part, security considerations are the same as those 551 that apply for capability identification in general [1,2,9]. 553 A possible added consideration is that use of a specific feature 554 set tag may reveal more information about a system than is 555 necessary for a transaction at hand. 557 7. Full copyright statement 559 Copyright (C) The Internet Society 1999. All Rights Reserved. 561 This document and translations of it may be copied and furnished to 562 others, and derivative works that comment on or otherwise explain 563 it or assist in its implementation may be prepared, copied, 564 published and distributed, in whole or in part, without restriction 565 of any kind, provided that the above copyright notice and this 566 paragraph are included on all such copies and derivative works. 567 However, this document itself may not be modified in any way, such 568 as by removing the copyright notice or references to the Internet 569 Society or other Internet organizations, except as needed for the 570 purpose of developing Internet standards in which case the 571 procedures for copyrights defined in the Internet Standards process 572 must be followed, or as required to translate it into languages 573 other than English. 575 The limited permissions granted above are perpetual and will not be 576 revoked by the Internet Society or its successors or assigns. 578 This document and the information contained herein is provided on 579 an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET 580 ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR 581 IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 582 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 583 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 585 Internet Draft Identifying composite media features 587 8. Acknowledgements 589 Much of the initial work for URI references to feature sets was 590 provided by Bill Newman. Some of the ideas here have been improved 591 by early discussions with Martin Duerst, Al Gilman and Ted Hardie. 593 9. References 595 [1] RFC 2533, "A syntax for describing media feature sets" 596 Graham Klyne, 5GM/Content Technologies 597 March 1999. 599 [2] RFC 2506, "Media Feature Tag Registration Procedure" 600 Koen Holtman, TUE 601 Andrew Mutz, Hewlett-Packard 602 Ted Hardie, Equinix 603 March 1999. 605 [3] RFC 2234, "Augmented BNF for Syntax Specifications: ABNF" 606 D. Crocker (editor), Internet Mail Consortium 607 P. Overell, Demon Internet Ltd. 608 November 1997. 610 [4] RFC 2045, "Multipurpose Internet Mail Extensions (MIME) 611 Part 1: Format of Internet message bodies" 612 N. Freed, Innosoft 613 N. Borenstein, First Virtual 614 November 1996. 616 [5] RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", 617 Tim Berners-Lee, World Wide Web Consortium/MIT 618 Roy T. Fielding, University of California, Irvine 619 Larry Masinter, Xerox PARC 620 August 1998. 622 [6] RFC 1321, "The MD5 Message-Digest Algorithm", 623 R. Rivest, MIT Laboratory for Computer Science and RSA Data 624 Security, Inc., 625 April 1992. 627 [7] "Applied Cryptography" 628 Bruce Schneier 629 John Wiley and Sons, 1996 (second edition) 630 ISBN 0-471-12845-7 (cloth) 631 ISBN 0-471-11709-9 (paper) 633 Internet Draft Identifying composite media features 635 [8] "Protocol-independent content negotiation framework" 636 Graham Klyne, 5GM/Content Technologies 637 Internet draft: 638 Work in progress, March 1999. 640 [9] "Numerical Recipes" 641 William H Press, Brian P Flannery, Saul A Teukolski and William T 642 Vetterling 643 Cambridge University Press (1986) 644 ISBN 0 521 30811 9 645 (The Gamma function approximation is presented in chapter 6 on 646 "Special Functions". There have been several later editions of 647 this book published, so the chapter reference may change.) 649 10. Authors' addresses 651 Graham Klyne 652 5th Generation Messaging Ltd. Content Technologies Ltd. 653 5 Watlington Street Forum 1, Station Road 654 Nettlebed Theale 655 Henley-on-Thames, RG9 5AB Reading, RG7 4RA 656 United Kingdom United Kingdom. 657 Telephone: +44 1491 641 641 +44 118 930 1300 658 Facsimile: +44 1491 641 611 +44 118 930 1301 659 E-mail: GK@ACM.ORG 661 Larry Masinter 662 Xerox Corporation 663 3333 Coyote Hill Road 664 Palo Alto, CA 94304 665 Facsimile: +1 650 812 4333 666 EMail: masinter@parc.xerox.com 667 http://www.parc.xerox.com/masinter 669 Appendix A The birthday problem 671 NOTE: this entire section is commentary, and does not 672 affect the feature set reference specification in any 673 way. 675 The use of a hash value to represent an arbitrary feature set is 676 based on a presumption that no two distinct feature sets will yield 677 the same hash value. 679 There is a small but distinct possibility that two different 680 feature sets will indeed yield the same hash value. 682 Internet Draft Identifying composite media features 683 We assume that the 128-bit hash function distributes hash values 684 for feature sets, even those with very small differences, randomly 685 and evenly through the range of 2^128 (approximately 3*10^38) 686 possible values. This is a fundamental property of a good digest 687 algorithm like MD5. Thus, the chance that any two distinct feature 688 set expressions yield the same hash is less than 1 in 10^38. This 689 is negligible when compared with, say, the probability that a 690 receiving system will fail having received data conforming to a 691 negotiated feature set. 693 But when the number of distinct feature sets in circulation 694 increases, the probability of repeating a hash value increases 695 surprisingly. This is illustrated by the "birthday paradox": 696 given a random collection of just 23 people, there is a greater 697 than even chance that there exists some pair with the same birthay. 698 This topic is discussed further in sections 7.4 and 7.5 of Bruce 699 Schneier's "Applied Cryptography" [7]. 701 The table below shows the "birthday paradox" probabilities that at 702 least one pair of feature sets has the same hash value for 703 different numbers of feature sets in use. 705 Number of feature Probability of two 706 sets in use sets with the same 707 hash value 708 1 0 709 2 3E-39 710 10 1E-37 711 1E3 1E-33 712 1E6 1E-27 713 1E9 1E-21 714 1E12 1E-15 715 1E15 1E-9 716 1E18 1E-3 718 The above probability computations are approximate, being 719 performed using logarithms of a Gamma function 720 approximation by Lanczos [9]. The probability formula is 721 'P=1-(m!/((m-n)! m^n))', where 'm' is the total number of 722 possible hash values (2^128) and 'n' is the number of 723 feature sets in use. 725 If original feature set expressions are generated manually, or only 726 in response to some manually constrained process, the total number 727 of feature sets in circulation is likely to remain very small in 728 relation to the total number of possible hash values. 730 Internet Draft Identifying composite media features 731 The outcome of all this is: assuming that the feature sets are 732 manually generated, even taking account of the birthday paradox 733 effect, the probability of incorrectly identifying a feature set 734 using a hash value is still negligibly small when compared with 735 other possible failure modes. 737 Appendix B: Revision history 739 [[[RFC editor: please remove this section on publication]]] 741 00a 10-Feb-1999 Initial draft. 743 01a 16-Feb-1999 Added pointers to mailing list for discussion. 745 01b 25-Mar-1999 Name all authors. Add some terms to the glossary. 746 Expand on meaning of URI tag used as auxiliary 747 predicate name. Update references. Rework 748 section 3 to deal more evenly with both hash and 749 URI based feature set references. State absolute 750 requirement for hash-based references to be 751 resolved to expressions that yield the correct 752 hash value. 754 01c 06-Apr-1999 Define form of URI reference using new '<...>' 755 syntax, and adjust other text accordingly. 757 01d 06-Apr-1999 Editorial revisions. Include values in table of 758 probabilities for hash value clashes. Remove 759 discussion of algebraic simplification of hash 760 references. Correct syntax of some examples. 762 02a 16-Jun-1999 Move birthday problem to an appendix. Remove 763 RESCAP citation. Use base-32 to represent feature 764 hashes; describe base-32 encoding. 766 02b 16-Jun-1999 Add note that the form of feature reference 767 may not be allowed at arbitrary locations in all 768 contexts.