idnits 2.17.1 draft-ietf-conneg-feature-hash-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([2], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Unrecognized Status in 'Category: Work-in-progress', assuming Proposed Standard (Expected one of 'Standards Track', 'Full Standard', 'Draft Standard', 'Proposed Standard', 'Best Current Practice', 'Informational', 'Experimental', 'Informational', 'Historic'.) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (19 July 1999) is 9045 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '8' is defined on line 665, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2234 (ref. '3') (Obsoleted by RFC 4234) ** Obsolete normative reference: RFC 2396 (ref. '5') (Obsoleted by RFC 3986) ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. '6') -- Possible downref: Non-RFC (?) normative reference: ref. '7' -- Possible downref: Non-RFC (?) normative reference: ref. '8' -- Possible downref: Non-RFC (?) normative reference: ref. '9' Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IETF conneg working group Graham Klyne, editor 2 Internet draft 5GM/Content Technologies 3 Category: Work-in-progress Larry Masinter 4 Xerox Corporation 5 19 July 1999 6 Expires: January 2000 8 Identifying composite media features 9 11 Status of this memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC 2026. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months and may be updated, replaced, or obsoleted by other 23 documents at any time. It is inappropriate to use Internet- Drafts 24 as reference material or to cite them other than as "work in 25 progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Copyright Notice 35 Copyright (C) The Internet Society 1999. All Rights Reserved. 37 Abstract 39 In RFC 2533 [1], an expression format is presented for describing 40 media feature capabilities as a combination of simple media feature 41 tags [2]. 43 This document describes an abbreviated format for a composite media 44 feature set, based upon a hash of the feature expression describing 45 that composite, or the URI of a resource containing the feature 46 expression. 48 Internet Draft Identifying composite media features 50 Table of contents 52 1. Introduction ............................................2 53 1.1 Organization of this document 2 54 1.2 Terminology and document conventions 3 55 1.3 Discussion of this document 3 56 2. Motivation and goals ....................................4 57 3. Composite feature representation ........................5 58 3.1 Feature set hashed reference format 6 59 3.1.1 Hash value calculation 6 60 3.1.2 Base-32 value representation 7 61 3.2 Feature set URI reference format 9 62 3.3 Resolving feature set references 10 63 3.3.1 URI reference 10 64 3.3.2 Inline feature set details 11 65 4. Examples ................................................12 66 5. Internationalization considerations .....................12 67 6. Security considerations .................................12 68 7. Full copyright statement ................................13 69 8. Acknowledgements ........................................13 70 9. References ..............................................13 71 10. Authors' addresses .....................................15 72 Appendix A The birthday problem ............................15 73 Appendix B: Revision history ...............................17 75 1. Introduction 77 In "A syntax for describing media feature sets" [1], an expression 78 format is presented for describing media feature capabilities as a 79 combination of simple media feature tags [2]. 81 This document proposes an abbreviated format for a composite media 82 feature set, based upon a hash of the feature expression describing 83 that composite. 85 This memo extends and builds upon the expression syntax described 86 in RFC 2533 [1], and it is assumed that the reader is familiar with 87 the interpretation and processing of feature set expressions 88 described there. 90 1.1 Organization of this document 92 Section 2 sets out some of the background and goals for feature set 93 references. 95 Section 3 presents a syntax for feature set references, and 96 describes how they are related to feature set expressions. 98 Internet Draft Identifying composite media features 100 1.2 Terminology and document conventions 102 This section defines a number of terms and other document 103 conventions, which are used with specific meaning in this memo. 104 The terms are listed in alphabetical order. 106 dereference 107 the act of replacing a feature set reference with its 108 corresponding feature set expression. Also called 109 "resolution". 111 feature set 112 some set of media features described by a media feature 113 assertion, as described in "A syntax for describing media 114 feature sets" [1]. (See that memo for a more formal 115 definition of this term.) 117 feature set expression 118 a string that describes some feature set, formulated 119 according to the rules in "A syntax for describing media 120 feature sets" [1] (and possibly extended by other 121 specifications). 123 feature set reference 124 a brief construct that references some feature set. 125 (See also: "dereference".) 127 feature set tag 128 a name that conforms to the syntax of a feature tag [1] 129 that is used to denote a feature set rather than a single 130 feature. 132 resolution 133 (See "dereference"). 135 This specification uses syntax notation and conventions described 136 in RFC2234 "Augmented BNF for Syntax Specifications: ABNF" [3]. 138 NOTE: Comments like this provide additional nonessential 139 information about the rationale behind this document. 140 Such information is not needed for building a conformant 141 implementation, but may help those who wish to understand 142 the design in greater depth. 144 1.3 Discussion of this document 146 Discussion of this document should take place on the content 147 negotiation and media feature registration mailing list hosted by 148 the Internet Mail Consortium (IMC). 150 Internet Draft Identifying composite media features 151 Please send comments regarding this document to: 153 ietf-medfree@imc.org 155 To subscribe to this list, send a message with the body 'subscribe' 156 to "ietf-medfree-request@imc.org". 158 To see what has gone on before you subscribed, please see the 159 mailing list archive at: 161 http://www.imc.org/ietf-medfree/ 163 2. Motivation and goals 165 The range of media feature capabilities of a message handling 166 system can be quite extensive, and the corresponding feature set 167 expression [1] can reach a significant size. 169 A requirement has been identified to allow recurring feature sets 170 to be identified by a single reference value, which can be combined 171 with other elements in a feature set expression. It is anticipated 172 that mechanisms will be provided that allow the recipient of such a 173 feature set reference to discover the corresponding feature set 174 expression. 176 Thus, the goals for this proposal are: 178 o to provide an abbreviated form for referencing an arbitrary 179 feature set expression. 181 o the meaning of (i.e. the corresponding feature set expression) a 182 feature set reference should be independent of any particular 183 mechanism that may be used to dereference it. 185 o to be able to verify whether a given feature set expression 186 corresponds to some feature set reference without having to 187 perform an explicit dereferencing operation (i.e. without 188 incurring additional network traffic). 190 o for protocol processors that conform to RFC 2533 [1] to be able 191 to sensibly handle a feature set reference without explicit 192 knowledge of its meaning (i.e. the introduction of feature set 193 references should not break existing feature expression 194 processors). That is, the applicable interpretation and 195 processing rules of RFC 2533 [1] apply equally to expressions 196 containing feature set references. 198 Internet Draft Identifying composite media features 200 o to allow, but not require, some indication of how to dereference 201 a feature set reference to be included in a feature set 202 expression. 204 NOTE: This proposal does not attempt to address the 205 "override" or "default" problem. (Also called 206 "delegation", where a feature set may be referenced and 207 selectively modified.) 209 3. Composite feature representation 211 This specification hinges on three central ideas: 213 o the use of auxiliary predicates (introduced in RFC 2533 [1]) to 214 form the basis of a feature set reference, and 216 o the use of a token based on a hash function computed over the 217 referenced feature set expression. 219 o the use of an expression containing a URI to indicate an 220 identifier and possibly a service for resolution of a composite 221 feature set. 223 A key reason to use a hash function to generate an identifier is to 224 define a global name space without requiring a central naming 225 authority. New feature set tags can be introduced by any party 226 following the appropriate rules of formulation, without reference 227 to any centralized authority. 229 Local resolution services may be needed to map feature set tags to 230 their corresponding feature set expressions, but these are not able 231 to vary the meaning of any given tag. Failure of a resolution 232 service to return the correct expression is detectable by a calling 233 application, which should reject any incorrect value supplied. 235 This memo also suggests that an expression containing a URI in the 236 format '' may be used to indicate a mechanism and location of 237 a service to perform feature set resolution. 239 NOTE: where a feature set reference is used, its meaning 240 is defined by substitution of the referenced feature 241 expression into the referencing expression. When all 242 references have been thus replaced, the result is 243 interpreted as a normal feature expression. 245 Internet Draft Identifying composite media features 246 In particular, if a referenced feature expression 247 contains some feature tag that is also constrained by the 248 referencing expression, the constraints are interpreted 249 per RFC 2533 [1], without regard for their origin. e.g. 250 (using some notation introduced below): 251 (& (pix-x=100) (pix-y<=300) (h.1234567890) ) 252 where (h.1234567890) resolves to: 253 (& (pix-x<=200) (pix-y<=150) ) 254 yields a result equivalent to: 255 (& (pix-x=100) (pix-y<=150) ) 257 3.1 Feature set hashed reference format 259 This specification introduces a special form of auxiliary predicate 260 name with the following syntax: 262 fname = "h." 1*BASE32DIGIT 263 BASE32DIGIT = DIGIT 264 / "A" / "B" / "C" / "D" / "E" / "F" / "G" / "H" 265 / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" 266 / "Q" / "R" / "S" / "T" / "U" / "V" 268 The sequence of base-32 digits represents the value of a hash 269 function calculated over the corresponding feature set expression 270 (see following sections). Note that the above syntax allows upper- 271 or lower- case letters for base-32 digits (per RFC 2234 [3]). 273 Thus, within a feature set expression, a hashed feature set 274 reference would have the following form: 276 (h.123456789abcdefghijklmnopq) 278 3.1.1 Hash value calculation 280 The hash value is calculated using the MD5 algorithm [6] over the 281 text of the referenced feature set expression subjected to certain 282 normalizations. The feature expression must conform to the syntax 283 given for 'filter' in RFC 2533 [1]: 285 filter = "(" filtercomp ")" *( ";" parameter ) 287 The steps for calculating a hash value are: 289 1. Whitespace normalization: all spaces, CR, LF, TAB and any other 290 layout control characters that may be embedded in the feature 291 expression string are removed (or ignored for the purpose of hash 292 value computation). 294 Internet Draft Identifying composite media features 296 2. Case normalization: all lower case letters in the feature 297 expression, other than those contained within quoted strings, are 298 converted to upper case. That is, unquoted characters with 299 values 97 to 122 (decimal) are changed to corresponding 300 characters in the range 65 to 90. 302 3. Hash computation: the MD5 algorithm, described in RFC 1321 [6], 303 is applied to the normalized feature expression string. 305 The result obtained in step 3 is a 128-bit (16 octet) value that is 306 converted to a base-32 representation to form the feature set 307 reference. 309 NOTE: under some circumstances, removal of ALL 310 whitespace may result in an invalid feature expression 311 string. This should not be a problem as this is done 312 only for the purpose of calculating a hash value, and 313 significantly different feature expressions are expected 314 to differ in ways other than their whitespace. 316 NOTE: case normalization is deemed appropriate since 317 feature tag and token matching is case insensitive. 319 3.1.2 Base-32 value representation 321 RFC 1321 [6] describes how to calculate an MD5 hash value that is a 322 sequence of 16 octets. This is then required to be coded as a 323 base-32 value, which is a sequence of base-32 digit characters. 325 Each successive character in a base-32 value represents 5 326 successive bits of the underlying octet sequence. Thus, each group 327 of 8 characters represents a sequence of 5 octets (40 bits): 329 1 2 3 330 01234567 89012345 67890123 45678901 23456789 331 +--------+--------+--------+--------+--------+ 332 |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >| 333 +--------+--------+--------+--------+--------+ 334 <===> 8th character 335 <====> 7th character 336 <===> 6th character 337 <====> 5th character 338 <====> 4th character 339 <===> 3rd character 340 <====> 2nd character 341 <===> 1st character 343 Internet Draft Identifying composite media features 345 The value (i.e. sequence of bits) represented by each base-32 digit 346 character is indicated by the following table: 347 "0" 0 "A" 10 "K" 20 "U" 30 348 "1" 1 "B" 11 "L" 21 "V" 31 349 "2" 2 "C" 12 "M" 22 350 "3" 3 "D" 13 "N" 23 351 "4" 4 "E" 14 "O" 24 352 "5" 5 "F" 15 "P" 25 353 "6" 6 "G" 16 "Q" 26 354 "7" 7 "H" 17 "R" 27 355 "8" 8 "I" 18 "S" 28 356 "9" 9 "J" 19 "T" 29 358 When encoding a base-32 value, each full group of 5 octets is 359 represented by a sequence of 8 characters indicated above. If a 360 group of less than 5 octets remain after this, they are encoded 361 using as many additional characters as may be needed: 1, 2, 3 or 4 362 octets are encoded by 2, 4, 5 or 7 characters respectively. Any 363 spare bits represented by the base-32 digit characters are selected 364 to be zero. 366 When decoding a base-32 value, the reverse mapping is applied: 367 each full group of 8 characters codes a sequence of 5 octets. A 368 final group of 2, 4, 5 or 7 characters codes a sequence of 1, 2, 3 369 or 4 octets respectively. Any spare bits represented by the final 370 group of characters are discarded. 372 Thus, for a 128-bit (16 octet) MD5 hash value, the first 15 octets 373 are coded as 24 base 32 digit characters, and the final octet is 374 coded by two characters. 376 NOTE: Base64 representation (per MIME [4]) would be more 377 compact (21 rather than 26 characters for the MD5 128-bit 378 hash value), but an auxiliary predicate name is defined 379 (by [1]) to have the same syntax as a feature tag, and 380 the feature tag matching rules (per [2]) state that 381 feature tag matching is case insensitive. 383 Base36 representation was considered (i.e. using all 384 letters "A"-"Z") but was not used because this would 385 require extended precision multiplication and division 386 operations to encode and decode the hash values. 388 Internet Draft Identifying composite media features 390 3.2 Feature set URI reference format 392 This section introduces a new form of feature set predicate by 393 extending the feature set syntax [1] as follows: 395 filter =/ "<" URI ">" *( ";" parameter ) 397 where 'URI' is described by "Uniform Resource Identifiers (URI): 398 Generic Syntax" [5]. 400 The meaning of this construct is defined to be the meaning of the 401 expression in which '' is replaced by a copy of the resource 402 indicated by 'URI'. The indicated resource is required to be a 403 text value containing a valid feature set expression, NOT itself 404 containing a '' reference. 406 If a '' reference is used within a feature expression that 407 defines a hash reference, then the hash value is calculated over 408 the expression obtained after the resource has been substituted. 410 Thus, the following are examples of feature set expressions using 411 URI references: 413 415 (& (dpi=100) ) 417 This specification does not indicate: 419 o any specific URI schemes to be supported, 421 o any meaning if the resource cannot be accessed, of if the value 422 obtained does not correspond to some recognized format. 424 These details must be indicated by the specification of any 425 application or protocol that relies upon this form of a feature 426 predicate. 428 If the URI uses characters other than a designated subset of US- 429 ASCII then those additional characters should be represented by a 430 sequence of US-ASCII characters allowed by RFC 2396 [5]. 432 NOTE: the syntax above allows a '' reference to 433 appear almost anywhere in a feature set expression. In 434 some contexts, the appearance of these references may be 435 restricted to specific locations (e.g. the right hand 436 side of a auxiliary feature predicate, as indicated in 437 section 3.3.1). The specification of any application or 438 protocol should fully describe any such restriction that 439 it may impose. 441 Internet Draft Identifying composite media features 443 3.3 Resolving feature set references 445 This memo does not mandate any particular mechanism for 446 defeferencing a feature set reference. It is expected that 447 specific dereferencing mechanisms will be specified for any 448 application or protocol that uses them. 450 The following sections describe some ways that feature set 451 dereferencing information may be incorporated into a feature set 452 expression. Both of these mechanisms are based on auxiliary 453 predicate definitions within a "where" clause [1]. 455 When a hashed feature set reference is used, conformance to the 456 hashing rules takes precedence over any other determination of the 457 feature expression. Any expression, however obtained, may not be 458 substituted for the hash-based reference unless it yields the 459 correct hash value. 461 3.3.1 URI reference 463 The two formats for feature set references described above may be 464 combined by defining the meaning of a hashed reference to be a URI- 465 based reference. For example: 467 (& (dpi=100) (h.1234567890) ) 468 where 469 (h.1234567890) :- 470 end 472 This indicates that the meaning of the hash-based form is contained 473 in the resource whose URI is given. In this case, an HTTP resource 474 retrieval is suggested. 476 The hash value used is calculated over the feature set expression 477 obtained by defererencing the URI form expression. 479 IMPORTANT: How a calling application processes a URI is 480 not specified here. 482 For URIs that are URLs, one reasonable approach would be 483 to use the URL scheme protocol to access the 484 corresponding feature set expression. But other 485 mechanisms might be used; e.g. protocols developed by 486 the IETF resource capability (RESCAP) working group. In 487 any case, any mechanism used must be specified by an 488 application or protocol that uses URI references in this 489 way. 491 Internet Draft Identifying composite media features 493 When a hashed feature set reference is resolved using a URI value, 494 the retrieving program should use the feature expression thus 495 obtained only if it hashes to the correct value. 497 3.3.2 Inline feature set details 499 In this case, a reference is resolved by including its definition 500 inline in an expression. 502 The feature set expression associated with a reference value may be 503 specified directly in a "where" clause, using the auxiliary 504 predicate definition syntax [1]; e.g. 506 (& (dpi=100) (h.1234567890) ) 507 where 508 (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) 509 end 511 This form might be used on request (where the request mechanism is 512 defined by the invoking application protocol), or when the 513 originator believes the recipient may not understand the reference. 515 It is an error if the inline feature expression does not yield the 516 hash value contained in auxiliary predicate name. 518 NOTE: viewed in isolation, this format does not have any 519 obvious value, in that the (h.xxx) form of auxiliary 520 predicate could be replaced by any arbitrary name. 522 It is anticipated that this form might be used as a 523 follow-up response in a sequence along the lines of: 524 A> Capabilities are: 525 (& (dpi=100) (h.1234567890) ) 526 B> Do not understand: 527 (h.1234567890) 528 A> Capabilities are: 529 (& (dpi=100) (h.1234567890) ) 530 where 531 (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) ) 532 end 534 Internet Draft Identifying composite media features 536 4. Examples 538 The following are some examples of feature set expressions 539 containing feature set references: 541 (& (dpi=100) (h.1234567890abcdefghijklmnop) ) 543 (& (dpi=100) 544 ) 546 (& (dpi=100) (h.1234567890abcdefghijklmnop) ) 547 where 548 (h.1234567890abcdefghijklmnop) :- 549 550 end 552 5. Internationalization considerations 554 Feature set expressions and URI strings are currently defined to 555 consist of only characters from the US-ASCII repertoire [1,5]; 556 under these circumstances this specification is not impacted by 557 internationalization considerations (other than any already 558 applicable to URIs [5]). 560 But, if future revisions of the feature set syntax permit non-US- 561 ASCII characters (e.g. within quoted strings), then some canonical 562 representation must be defined for the purposes of calculating hash 563 values. One choice might be to use a UTF-8 equivalent 564 representation as the basis for calculating the feature set hash. 565 Another choice might be to leave this as an application protocol 566 issue (but this could lead to non-interoperable feature sets 567 between different protocols). 569 Another conceivable issue is that of up-casing the feature 570 expression in preparation for computing a hash value. This does 571 not apply to the content of strings so is not likely to be an 572 issue. But if changes are made that do permit non-US-ASCII 573 characters in feature tags or token strings, consideration must be 574 given to properly defining how case conversion is to be performed. 576 6. Security considerations 578 For the most part, security considerations are the same as those 579 that apply for capability identification in general [1,2,9]. 581 A possible added consideration is that use of a specific feature 582 set tag may reveal more information about a system than is 583 necessary for a transaction at hand. 585 Internet Draft Identifying composite media features 587 7. Full copyright statement 589 Copyright (C) The Internet Society 1999. All Rights Reserved. 591 This document and translations of it may be copied and furnished to 592 others, and derivative works that comment on or otherwise explain 593 it or assist in its implementation may be prepared, copied, 594 published and distributed, in whole or in part, without restriction 595 of any kind, provided that the above copyright notice and this 596 paragraph are included on all such copies and derivative works. 597 However, this document itself may not be modified in any way, such 598 as by removing the copyright notice or references to the Internet 599 Society or other Internet organizations, except as needed for the 600 purpose of developing Internet standards in which case the 601 procedures for copyrights defined in the Internet Standards process 602 must be followed, or as required to translate it into languages 603 other than English. 605 The limited permissions granted above are perpetual and will not be 606 revoked by the Internet Society or its successors or assigns. 608 This document and the information contained herein is provided on 609 an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET 610 ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR 611 IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 612 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 613 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 615 8. Acknowledgements 617 Much of the initial work for URI references to feature sets was 618 provided by Bill Newman. Some of the ideas here have been improved 619 by early discussions with Martin Duerst, Al Gilman and Ted Hardie. 620 Useful suggestions for improvement were provided by Maurizio 621 Codogno. 623 9. References 625 [1] RFC 2533, "A syntax for describing media feature sets" 626 Graham Klyne, 5GM/Content Technologies 627 March 1999. 629 [2] RFC 2506, "Media Feature Tag Registration Procedure" 630 Koen Holtman, TUE 631 Andrew Mutz, Hewlett-Packard 632 Ted Hardie, Equinix 633 March 1999. 635 Internet Draft Identifying composite media features 637 [3] RFC 2234, "Augmented BNF for Syntax Specifications: ABNF" 638 D. Crocker (editor), Internet Mail Consortium 639 P. Overell, Demon Internet Ltd. 640 November 1997. 642 [4] RFC 2045, "Multipurpose Internet Mail Extensions (MIME) 643 Part 1: Format of Internet message bodies" 644 N. Freed, Innosoft 645 N. Borenstein, First Virtual 646 November 1996. 648 [5] RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", 649 Tim Berners-Lee, World Wide Web Consortium/MIT 650 Roy T. Fielding, University of California, Irvine 651 Larry Masinter, Xerox PARC 652 August 1998. 654 [6] RFC 1321, "The MD5 Message-Digest Algorithm", 655 R. Rivest, MIT Laboratory for Computer Science and RSA Data 656 Security, Inc., 657 April 1992. 659 [7] "Applied Cryptography" 660 Bruce Schneier 661 John Wiley and Sons, 1996 (second edition) 662 ISBN 0-471-12845-7 (cloth) 663 ISBN 0-471-11709-9 (paper) 665 [8] "Protocol-independent content negotiation framework" 666 Graham Klyne, 5GM/Content Technologies 667 Internet draft: 668 Work in progress, March 1999. 670 [9] "Numerical Recipes" 671 William H Press, Brian P Flannery, Saul A Teukolski and William T 672 Vetterling 673 Cambridge University Press (1986) 674 ISBN 0 521 30811 9 675 (The Gamma function approximation is presented in chapter 6 on 676 "Special Functions". There have been several later editions of 677 this book published, so the chapter reference may change.) 679 Internet Draft Identifying composite media features 681 10. Authors' addresses 683 Graham Klyne 684 5th Generation Messaging Ltd. Content Technologies Ltd. 685 5 Watlington Street Forum 1, Station Road 686 Nettlebed Theale 687 Henley-on-Thames, RG9 5AB Reading, RG7 4RA 688 United Kingdom United Kingdom. 689 Telephone: +44 1491 641 641 +44 118 930 1300 690 Facsimile: +44 1491 641 611 +44 118 930 1301 691 E-mail: GK@ACM.ORG 693 Larry Masinter 694 Xerox Corporation 695 3333 Coyote Hill Road 696 Palo Alto, CA 94304 697 Facsimile: +1 650 812 4333 698 EMail: masinter@parc.xerox.com 699 http://www.parc.xerox.com/masinter 701 Appendix A The birthday problem 703 NOTE: this entire section is commentary, and does not 704 affect the feature set reference specification in any 705 way. 707 The use of a hash value to represent an arbitrary feature set is 708 based on a presumption that no two distinct feature sets will yield 709 the same hash value. 711 There is a small but distinct possibility that two different 712 feature sets will indeed yield the same hash value. 714 We assume that the 128-bit hash function distributes hash values 715 for feature sets, even those with very small differences, randomly 716 and evenly through the range of 2^128 (approximately 3*10^38) 717 possible values. This is a fundamental property of a good digest 718 algorithm like MD5. Thus, the chance that any two distinct feature 719 set expressions yield the same hash is less than 1 in 10^38. This 720 is negligible when compared with, say, the probability that a 721 receiving system will fail having received data conforming to a 722 negotiated feature set. 724 Internet Draft Identifying composite media features 725 But when the number of distinct feature sets in circulation 726 increases, the probability of repeating a hash value increases 727 surprisingly. This is illustrated by the "birthday paradox": 728 given a random collection of just 23 people, there is a greater 729 than even chance that there exists some pair with the same birthay. 730 This topic is discussed further in sections 7.4 and 7.5 of Bruce 731 Schneier's "Applied Cryptography" [7]. 733 The table below shows the "birthday paradox" probabilities that at 734 least one pair of feature sets has the same hash value for 735 different numbers of feature sets in use. 737 Number of feature Probability of two 738 sets in use sets with the same 739 hash value 740 1 0 741 2 3E-39 742 10 1E-37 743 1E3 1E-33 744 1E6 1E-27 745 1E9 1E-21 746 1E12 1E-15 747 1E15 1E-9 748 1E18 1E-3 750 The above probability computations are approximate, being 751 performed using logarithms of a Gamma function 752 approximation by Lanczos [9]. The probability formula is 753 'P=1-(m!/((m-n)! m^n))', where 'm' is the total number of 754 possible hash values (2^128) and 'n' is the number of 755 feature sets in use. 757 If original feature set expressions are generated manually, or only 758 in response to some manually constrained process, the total number 759 of feature sets in circulation is likely to remain very small in 760 relation to the total number of possible hash values. 762 The outcome of all this is: assuming that the feature sets are 763 manually generated, even taking account of the birthday paradox 764 effect, the probability of incorrectly identifying a feature set 765 using a hash value is still negligibly small when compared with 766 other possible failure modes. 768 Internet Draft Identifying composite media features 770 Appendix B: Revision history 772 [[[RFC editor: please remove this section on publication]]] 774 00a 10-Feb-1999 Initial draft. 776 01a 16-Feb-1999 Added pointers to mailing list for discussion. 778 01b 25-Mar-1999 Name all authors. Add some terms to the glossary. 779 Expand on meaning of URI tag used as auxiliary 780 predicate name. Update references. Rework 781 section 3 to deal more evenly with both hash and 782 URI based feature set references. State absolute 783 requirement for hash-based references to be 784 resolved to expressions that yield the correct 785 hash value. 787 01c 06-Apr-1999 Define form of URI reference using new '<...>' 788 syntax, and adjust other text accordingly. 790 01d 06-Apr-1999 Editorial revisions. Include values in table of 791 probabilities for hash value clashes. Remove 792 discussion of algebraic simplification of hash 793 references. Correct syntax of some examples. 795 02a 16-Jun-1999 Move birthday problem to an appendix. Remove 796 RESCAP citation. Use base-32 to represent feature 797 hashes; describe base-32 encoding. 799 02b 16-Jun-1999 Add note that the form of feature reference 800 may not be allowed at arbitrary locations in all 801 contexts. 803 03a 19-Jul-1999 Incorporate review comments. Added some text to 804 emphasize applicability of RFC 2533 rules for 805 interpretation and processing of feature 806 expressions.