idnits 2.17.1 draft-ietf-iri-comparison-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 14, 2011) is 4611 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2119' is defined on line 515, but no explicit reference was found in the text == Unused Reference: 'RFC3490' is defined on line 518, but no explicit reference was found in the text == Unused Reference: 'RFC3491' is defined on line 522, but no explicit reference was found in the text == Unused Reference: 'RFC3629' is defined on line 526, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891) -- Possible downref: Non-RFC (?) normative reference: ref. 'UNIV6' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15' -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Duplicate reference: RFC3987, mentioned in 'RFC3987', was also mentioned in 'RFC3987bis'. Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internationalized Resource Identifiers L. Masinter 3 (iri) Adobe 4 Internet-Draft M. Duerst 5 Intended status: Standards Track Aoyama Gakuin University 6 Expires: February 15, 2012 August 14, 2011 8 Equivalence and Canonicalization of Internationalized Resource 9 Identifiers (IRIs) 10 draft-ietf-iri-comparison-00 12 Abstract 14 Internationalized Resource Identifiers (IRIs) are unicode strings 15 used to identify resources on the Internet. 
Applications that use 16 IRIs often define a means of comparing two IRIs to determine when two 17 IRIs are equivalent for the purpose of that application. Some 18 applications also define a method for 'canonicalizing' or 19 'normalizing' an IRI -- translating one IRI into another which is 20 equivalent under the comparison method used. 22 This document gives guidelines and best practices for defining and 23 using IRI comparison, equivalence, normalization and canonicalization 24 methods. 26 Status of this Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on February 15, 2012. 43 Copyright Notice 45 Copyright (c) 2011 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 
58 This document may contain material from IETF Documents or IETF 59 Contributions published or made publicly available before November 60 10, 2008. The person(s) controlling the copyright in some of this 61 material may not have granted the IETF Trust the right to allow 62 modifications of such material outside the IETF Standards Process. 63 Without obtaining an adequate license from the person(s) controlling 64 the copyright in such materials, this document may not be modified 65 outside the IETF Standards Process, and derivative works of it may 66 not be created outside the IETF Standards Process, except to format 67 it for publication as an RFC or to translate it into languages other 68 than English. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 73 2. Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 5 74 3. Preparation for Comparison . . . . . . . . . . . . . . . . . . 6 75 4. Comparison Ladder . . . . . . . . . . . . . . . . . . . . . . 6 76 4.1. Simple String Comparison . . . . . . . . . . . . . . . . . 7 77 4.2. Syntax-Based Normalization . . . . . . . . . . . . . . . . 8 78 4.2.1. Case Normalization . . . . . . . . . . . . . . . . . . 8 79 4.2.2. Character Normalization . . . . . . . . . . . . . . . 8 80 4.2.3. Percent-Encoding Normalization . . . . . . . . . . . . 10 81 4.2.4. Path Segment Normalization . . . . . . . . . . . . . . 10 82 4.3. Scheme-Based Normalization . . . . . . . . . . . . . . . . 10 83 4.4. Protocol-Based Normalization . . . . . . . . . . . . . . . 12 84 5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 85 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12 86 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 87 7.1. Normative References . . . . . . . . . . . . . . . . . . . 13 88 7.2. Informative References . . . . . . . . . . . . . . . . . . 13 89 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 14 91 1. 
Introduction 93 Internationalized Resource Identifiers (IRIs) are Unicode strings 94 used to identify resources on the Internet. Applications that use 95 IRIs often define a means of comparing two IRIs to determine when two 96 IRIs are equivalent for the purpose of that application. Some 97 applications also define a method for 'canonicalizing' or 98 'normalizing' an IRI -- translating one IRI into another which is 99 equivalent under the comparison method used. 101 This document gives guidelines and best practices for defining and 102 using IRI comparison, equivalence, normalization and canonicalization 103 methods. 105 Things to do: 107 o Introductory section on comparison, equivalence, normalization and 108 canonicalization. 110 o Verify acknowledgements for this component. 112 o Verify cross-references from other documents. 114 o Consider making 4395bis reference this document and recommend 115 scheme definitions describe equivalence specifically. 117 o Consider making this document 'update' 3986 in order to resolve 118 which one is normative if there are conflicts. 120 o alternatively? Consider making this document BCP rather than 121 standards track, since it basically gives guidance for protocols 122 and applications needing equivalence, and doesn't directly have a 123 scope of application? 125 o Distinguish between IRIs as sequence-of-Unicode characters and 126 presentations of IRIs. 128 o Should we insist that percent-hex encoding equivalence of non- 129 reserved characters MUST be always used if there is any 130 equivalence at all? 132 o Update security considerations to describe security concerns 133 specific to comparison. 135 o Consider making sections talk about 'equivalent' rather than 136 'normalization' where appropriate. 138 One of the most common operations on IRIs is simple comparison: 140 Determining whether two IRIs are equivalent, without using the IRIs 141 to access their respective resource(s). 
A comparison is performed 142 whenever a response cache is accessed, a browser checks its history 143 to color a link, or an XML parser processes tags within a namespace. 144 Extensive normalization prior to comparison of IRIs may be used by 145 spiders and indexing engines to prune a search space or reduce 146 duplication of request actions and response storage. 148 IRI comparison is performed for some particular purpose. Protocols 149 or implementations that compare IRIs for different purposes will 150 often be subject to differing design trade-offs in regards to how 151 much effort should be spent in reducing aliased identifiers. This 152 document describes various methods that may be used to compare IRIs, 153 the trade-offs between them, and the types of applications that might 154 use them. 156 2. Equivalence 158 Because IRIs exist to identify resources, presumably they should be 159 considered equivalent when they identify the same resource. However, 160 this definition of equivalence is not of much practical use, as there 161 is no way for an implementation to compare two resources to determine 162 if they are "the same" unless it has full knowledge or control of 163 them. For this reason, determination of equivalence or difference of 164 IRIs is based on string comparison, perhaps augmented by reference to 165 additional rules provided by URI scheme definitions. We use the 166 terms "different" and "equivalent" to describe the possible outcomes 167 of such comparisons, but there are many application-dependent 168 versions of equivalence. 170 Even when it is possible to determine that two IRIs are equivalent, 171 IRI comparison is not sufficient to determine whether two IRIs 172 identify different resources. For example, an owner of two different 173 domain names could decide to serve the same resource from both, 174 resulting in two different IRIs. 
Therefore, comparison methods are 175 designed to minimize false negatives while strictly avoiding false 176 positives. 178 In testing for equivalence, applications should not directly compare 179 relative references; the references should be converted to their 180 respective target IRIs before comparison. When IRIs are compared to 181 select (or avoid) a network action, such as retrieval of a 182 representation, fragment components (if any) MUST be excluded from 183 the comparison. 185 Applications using IRIs as identity tokens with no relationship to a 186 protocol MUST use the Simple String Comparison (see Section 4.1). 188 All other applications MUST select one of the comparison practices 189 from the Comparison Ladder (see Section 4). 191 3. Preparation for Comparison 193 Any kind of IRI comparison REQUIRES that any additional contextual 194 processing is first performed, including undoing higher-level 195 escapings or encodings in the protocol or format that carries an IRI. 196 This preprocessing is usually done when the protocol or format is 197 parsed. 199 Examples of such escapings or encodings are entities and numeric 200 character references in [HTML4] and [XML1]. As an example, 201 "http://example.org/ros&eacute;" (in HTML), 202 "http://example.org/ros&#233;" (in HTML or XML), and 203 "http://example.org/ros&#xE9;" (in HTML or XML) are all resolved into 204 what is denoted in this document (see 'Notation' section of 205 [RFC3987bis]) as "http://example.org/ros&#xE9;" (the "&#xE9;" here 206 standing for the actual e-acute character, to compensate for the fact 207 that this document cannot contain non-ASCII characters). 
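The preprocessing step described in Section 3 can be sketched as follows. This is a minimal illustration only, assuming the IRI was carried in an HTML attribute value and that Python's standard html.unescape is an acceptable stand-in for the carrier format's own parser. The function name iri_from_html_attribute is hypothetical and is not defined by this document.

```python
import html


def iri_from_html_attribute(raw: str) -> str:
    """Undo HTML entity and numeric character reference escaping.

    Hypothetical helper: a real HTML or XML parser normally performs
    this step while parsing the carrier document.
    """
    return html.unescape(raw)


# Three escaped spellings of the same IRI, as they might appear in HTML source.
escaped_forms = [
    "http://example.org/ros&eacute;",  # named entity (HTML only)
    "http://example.org/ros&#233;",    # decimal character reference
    "http://example.org/ros&#xE9;",    # hexadecimal character reference
]

# After unescaping, all three are the same character string, so even
# Simple String Comparison treats them as equivalent.
resolved = {iri_from_html_attribute(form) for form in escaped_forms}
```

Comparing the escaped spellings directly would report three different strings; comparing after this step reports one.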
209 Similar considerations apply to encodings such as Transfer Codings in 210 HTTP (see [RFC2616]) and Content Transfer Encodings in MIME 211 ([RFC2045]), although in these cases, the encoding is based not on 212 characters but on octets, and additional care is required to make 213 sure that characters, and not just arbitrary octets, are compared 214 (see Section 4.1). 216 4. Comparison Ladder 218 In practice, a variety of methods are used to test IRI equivalence. 219 These methods fall into a range distinguished by the amount of 220 processing required and the degree to which the probability of false 221 negatives is reduced. As noted above, false negatives cannot be 222 eliminated. In practice, their probability can be reduced, but this 223 reduction requires more processing and is not cost-effective for all 224 applications. 226 If this range of comparison practices is considered as a ladder, the 227 following discussion will climb the ladder, starting with practices 228 that are cheap but have a relatively higher chance of producing false 229 negatives, and proceeding to those that have higher computational 230 cost and lower risk of false negatives. 232 4.1. Simple String Comparison 234 If two IRIs, when considered as character strings, are identical, 235 then it is safe to conclude that they are equivalent. This type of 236 equivalence test has very low computational cost and is in wide use 237 in a variety of applications, particularly in the domain of parsing. 238 It is also used when a definitive answer to the question of IRI 239 equivalence is needed that is independent of the scheme used and that 240 can be calculated quickly and without accessing a network. An 241 example of such a case is XML Namespaces ([XMLNamespace]). 243 Testing strings for equivalence requires some basic precautions. 244 This procedure is often referred to as "bit-for-bit" or "byte-for- 245 byte" comparison, which is potentially misleading. 
Testing strings 246 for equality is normally based on pair comparison of the characters 247 that make up the strings, starting from the first and proceeding 248 until both strings are exhausted and all characters are found to be 249 equal, until a pair of characters compares unequal, or until one of 250 the strings is exhausted before the other. 252 This character comparison requires that each pair of characters be 253 put in comparable encoding form. For example, should one IRI be 254 stored in a byte array in UTF-8 encoding form and the second in a 255 UTF-16 encoding form, bit-for-bit comparisons applied naively will 256 produce errors. It is better to speak of equality on a character- 257 for-character rather than on a byte-for-byte or bit-for-bit basis. 258 In practical terms, character-by-character comparisons should be done 259 codepoint by codepoint after conversion to a common character 260 encoding form. When comparing character by character, the comparison 261 function MUST NOT map IRIs to URIs, because such a mapping would 262 create additional spurious equivalences. It follows that an IRI 263 SHOULD NOT be modified when being transported if there is any chance 264 that this IRI might be used in a context that uses Simple String 265 Comparison. 267 False negatives are caused by the production and use of IRI aliases. 268 Unnecessary aliases can be reduced, regardless of the comparison 269 method, by consistently providing IRI references in an already 270 normalized form (i.e., a form identical to what would be produced 271 after normalization is applied, as described below). Protocols and 272 data formats often limit some IRI comparisons to simple string 273 comparison, based on the theory that people and implementations will, 274 in their own best interest, be consistent in providing IRI 275 references, or at least be consistent enough to negate any efficiency 276 that might be obtained from further normalization. 278 4.2. 
Syntax-Based Normalization 280 Implementations may use logic based on the definitions provided by 281 this specification to reduce the probability of false negatives. 282 This processing is moderately higher in cost than character-for- 283 character string comparison. For example, an application using this 284 approach could reasonably consider the following two IRIs equivalent: 286 example://a/b/c/%7Bfoo%7D/rosé 287 eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9 289 Web user agents, such as browsers, typically apply this type of IRI 290 normalization when determining whether a cached response is 291 available. Syntax-based normalization includes such techniques as 292 case normalization, character normalization, percent-encoding 293 normalization, and removal of dot-segments. 295 4.2.1. Case Normalization 297 For all IRIs, the hexadecimal digits within a percent-encoding 298 triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore 299 should be normalized to use uppercase letters for the digits A-F. 301 When an IRI uses components of the generic syntax, the component 302 syntax equivalence rules always apply; namely, that the scheme and 303 US-ASCII-only host are case insensitive and therefore should be 304 normalized to lowercase. For example, the URI 305 "HTTP://www.EXAMPLE.com/" is equivalent to "http://www.example.com/". 306 Case equivalence for non-ASCII characters in IRI components that are 307 IDNs is discussed in Section 4.3. The other generic syntax 308 components are assumed to be case sensitive unless specifically 309 defined otherwise by the scheme. 311 Creating schemes that allow case-insensitive syntax components 312 containing non-ASCII characters should be avoided. Case 313 normalization of non-ASCII characters can be culturally dependent and 314 is always a complex operation. The only exception concerns non-ASCII 315 host names for which the character normalization includes a mapping 316 step derived from case folding. 318 4.2.2. 
Character Normalization 320 The Unicode Standard [UNIV6] defines various equivalences between 321 sequences of characters for various purposes. Unicode Standard Annex 322 #15 [UTR15] defines various Normalization Forms for these 323 equivalences, in particular Normalization Form C (NFC, Canonical 324 Decomposition, followed by Canonical Composition) and Normalization 325 Form KC (NFKC, Compatibility Decomposition, followed by Canonical 326 Composition). 328 IRIs already in Unicode MUST NOT be normalized before parsing or 329 interpreting. In many non-Unicode character encodings, some text 330 cannot be represented directly. For example, the word "Vietnam" is 331 natively written "Vi&#x1EC7;t Nam" (containing a LATIN SMALL LETTER E 332 WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct transcoding from 333 the windows-1258 character encoding leads to "Vi&#xEA;&#x323;t Nam" 334 (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX followed by a 335 COMBINING DOT BELOW). Direct transcoding of other 8-bit encodings of 336 Vietnamese may lead to other representations. 338 Equivalence of IRIs MUST rely on the assumption that IRIs are 339 appropriately pre-character-normalized rather than apply character 340 normalization when comparing two IRIs. The exceptions are conversion 341 from a non-digital form, and conversion from a non-UCS-based 342 character encoding to a UCS-based character encoding. In these 343 cases, NFC or a normalizing transcoder using NFC MUST be used for 344 interoperability. To avoid false negatives and problems with 345 transcoding, IRIs SHOULD be created by using NFC. Using NFKC may 346 avoid even more problems; for example, by choosing half-width Latin 347 letters instead of full-width ones, and full-width instead of half- 348 width Katakana. 350 As an example, "http://www.example.org/r&#xE9;sum&#xE9;.html" (in XML 351 Notation) is in NFC. On the other hand, 352 "http://www.example.org/re&#x301;sume&#x301;.html" is not in NFC. 
354 The former uses precombined e-acute characters, and the latter uses 355 "e" characters followed by combining acute accents. Both usages are 356 defined as canonically equivalent in [UNIV6]. 358 Note: Because it is unknown how a particular sequence of characters 359 is being treated with respect to character normalization, it would 360 be inappropriate to allow third parties to normalize an IRI 361 arbitrarily. This does not contradict the recommendation that 362 when a resource is created, its IRI should be as character 363 normalized as possible (i.e., NFC or even NFKC). This is similar 364 to the uppercase/lowercase problems. Some parts of a URI are case 365 insensitive (for example, the domain name). For others, it is 366 unclear whether they are case sensitive, case insensitive, or 367 something in between (e.g., case sensitive, but with a multiple 368 choice selection if the wrong case is used, instead of a direct 369 negative result). The best recipe is that the creator use a 370 reasonable capitalization and, when transferring the URI, 371 capitalization never be changed. 373 Various IRI schemes may allow the usage of Internationalized Domain 374 Names (IDN) [RFC5890] either in the ireg-name part or elsewhere. 375 Character Normalization also applies to IDNs, as discussed in 376 Section 4.3. 378 4.2.3. Percent-Encoding Normalization 380 The percent-encoding mechanism (Section 2.1 of [RFC3986]) is a 381 frequent source of variance among otherwise identical IRIs. In 382 addition to the case normalization issue noted above, some IRI 383 producers percent-encode octets that do not require percent-encoding, 384 resulting in IRIs that are equivalent to their nonencoded 385 counterparts. These IRIs should be normalized by decoding any 386 percent-encoded octet sequence that corresponds to an unreserved 387 character, as described in section 2.3 of [RFC3986]. 
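The decoding rule just described, together with the hexadecimal case rule of Section 4.2.1, can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation. The function name normalize_percent_encoding and the constant names are hypothetical, and the sketch assumes the input is a syntactically valid IRI held as a character string.

```python
import re

# Unreserved characters per Section 2.3 of RFC 3986:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# One percent-encoding triplet: "%" followed by two hexadecimal digits.
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(iri: str) -> str:
    """Decode triplets that encode unreserved characters and
    uppercase the hexadecimal digits of every other triplet."""

    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # "%7e" becomes "~"
        return "%" + match.group(1).upper()   # "%7b" becomes "%7B"

    # A single pass, so "%2541" is never decoded twice.
    return PCT_TRIPLET.sub(fix, iri)


example = normalize_percent_encoding("http://example.org/%7euser")
```

Triplets for reserved characters and for octets of non-ASCII characters are left encoded on purpose: decoding them could change how the IRI is parsed or create equivalences this rule does not cover.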
389 For actual resolution, differences in percent-encoding (except for 390 the percent-encoding of reserved characters) MUST always result in 391 the same resource. For example, "http://example.org/~user", 392 "http://example.org/%7euser", and "http://example.org/%7Euser", must 393 resolve to the same resource. 395 If this kind of equivalence is to be tested, the percent-encoding of 396 both IRIs to be compared has to be aligned; for example, by 397 converting both IRIs to URIs (see Section 3.1), eliminating escape 398 differences in the resulting URIs, and making sure that the case of 399 the hexadecimal characters in the percent-encoding is always the same 400 (preferably upper case). If the IRI is to be passed to another 401 application or used further in some other way, its original form MUST 402 be preserved. The conversion described here should be performed only 403 for local comparison. 405 4.2.4. Path Segment Normalization 407 The complete path segments "." and ".." are intended only for use 408 within relative references (Section 4.1 of [RFC3986]) and are removed 409 as part of the reference resolution process (Section 5.2 of 410 [RFC3986]). However, some implementations may incorrectly assume 411 that reference resolution is not necessary when the reference is 412 already an IRI, and thus fail to remove dot-segments when they occur 413 in non-relative paths. IRI normalizers should remove dot-segments by 414 applying the remove_dot_segments algorithm to the path, as described 415 in Section 5.2.4 of [RFC3986]. 417 4.3. Scheme-Based Normalization 419 The syntax and semantics of IRIs vary from scheme to scheme, as 420 described by the defining specification for each scheme. 421 Implementations may use scheme-specific rules, at further processing 422 cost, to reduce the probability of false negatives. 
For example, 423 because the "http" scheme makes use of an authority component, has a 424 default port of "80", and defines an empty path to be equivalent to 425 "/", the following four IRIs are equivalent: 427 http://example.com 428 http://example.com/ 429 http://example.com:/ 430 http://example.com:80/ 432 In general, an IRI that uses the generic syntax for authority with an 433 empty path should be normalized to a path of "/". Likewise, an 434 explicit ":port", for which the port is empty or the default for the 435 scheme, is equivalent to one where the port and its ":" delimiter are 436 elided and thus should be removed by scheme-based normalization. For 437 example, the second IRI above is the normal form for the "http" 438 scheme. 440 Another case where normalization varies by scheme is in the handling 441 of an empty authority component or empty host subcomponent. For many 442 scheme specifications, an empty authority or host is considered an 443 error; for others, it is considered equivalent to "localhost" or the 444 end-user's host. When a scheme defines a default for authority and 445 an IRI reference to that default is desired, the reference should be 446 normalized to an empty authority for the sake of uniformity, brevity, 447 and internationalization. If, however, either the userinfo or port 448 subcomponents are non-empty, then the host should be given explicitly 449 even if it matches the default. 451 Normalization should not remove delimiters when their associated 452 component is empty unless it is licensed to do so by the scheme 453 specification. For example, the IRI "http://example.com/?" cannot be 454 assumed to be equivalent to any of the examples above. Likewise, the 455 presence or absence of delimiters within a userinfo subcomponent is 456 usually significant to its interpretation. 
The fragment component is 457 not subject to any scheme-based normalization; thus, two IRIs that 458 differ only by the suffix "#" are considered different regardless of 459 the scheme. 461 Some IRI schemes allow the usage of Internationalized Domain Names 462 (IDN) [RFC5890] either in their ireg-name part or elsewhere. When in 463 use in IRIs, those names SHOULD conform to the definition of U-Label 464 in [RFC5890]. An IRI containing an invalid IDN cannot successfully 465 be resolved. For legibility purposes, they SHOULD NOT be converted 466 into ASCII Compatible Encoding (ACE). 468 Scheme-based normalization may also consider IDN components and their 469 conversions to Punycode as equivalent. As an example, 470 "http://résumé.example.org" may be considered equivalent to 471 "http://xn--rsum-bpad.example.org". 473 Other scheme-specific normalizations are possible. 475 4.4. Protocol-Based Normalization 477 Substantial effort to reduce the incidence of false negatives is 478 often cost-effective for web spiders. Consequently, they implement 479 even more aggressive techniques in IRI comparison. For example, if 480 they observe that an IRI such as 482 http://example.com/data 484 redirects to an IRI differing only in the trailing slash 486 http://example.com/data/ 488 they will likely regard the two as equivalent in the future. This 489 kind of technique is only appropriate when equivalence is clearly 490 indicated by both the result of accessing the resources and the 491 common conventions of their scheme's dereference algorithm (in this 492 case, use of redirection by HTTP origin servers to avoid problems 493 with relative references). 495 5. Security Considerations 497 The primary security difficulty comes from applications choosing the 498 wrong equivalence relationship, or two different parties disagreeing 499 on equivalence. This is especially a problem when IRIs are used in 500 security protocols. 
502 Besides the large character repertoire of Unicode, reasons for 503 confusion include different forms of normalization and different 504 normalization expectations, use of percent-encoding with various 505 legacy encodings, and bidirectionality issues. See also [UTR36]. 507 6. Acknowledgements 509 This document was originally derived from [RFC3986] and [RFC3987], 510 based on text contributed by Tim Bray. 512 7. References 513 7.1. Normative References 515 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 516 Requirement Levels", BCP 14, RFC 2119, March 1997. 518 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 519 "Internationalizing Domain Names in Applications (IDNA)", 520 RFC 3490, March 2003. 522 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 523 Profile for Internationalized Domain Names (IDN)", 524 RFC 3491, March 2003. 526 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 527 10646", STD 63, RFC 3629, November 2003. 529 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 530 Resource Identifier (URI): Generic Syntax", STD 66, 531 RFC 3986, January 2005. 533 [RFC3987bis] 534 Duerst, M., Masinter, L., and M. Suignard, 535 "Internationalized Resource Identifiers (IRIs)", 2011, 536 . 538 [RFC5890] Klensin, J., "Internationalized Domain Names for 539 Applications (IDNA): Definitions and Document Framework", 540 RFC 5890, August 2010. 542 [UNIV6] The Unicode Consortium, "The Unicode Standard, Version 543 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, 544 ISBN 978-1-936213-01-6)", October 2010. 546 [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", 547 Unicode Standard Annex #15, March 2008, 548 . 551 7.2. Informative References 553 [HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 554 Specification", World Wide Web Consortium Recommendation, 555 December 1999, 556 . 558 [RFC2045] Freed, N. and N. 
Borenstein, "Multipurpose Internet Mail 559 Extensions (MIME) Part One: Format of Internet Message 560 Bodies", RFC 2045, November 1996. 562 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 563 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 564 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 566 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 567 Identifiers (IRIs)", RFC 3987, January 2005. 569 [UTR36] Davis, M. and M. Suignard, "Unicode Security 570 Considerations", Unicode Technical Report #36, 571 August 2010, . 573 [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and 574 F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fourth 575 Edition)", World Wide Web Consortium Recommendation, 576 August 2006, . 578 [XMLNamespace] 579 Bray, T., Hollander, D., Layman, A., and R. Tobin, 580 "Namespaces in XML (Second Edition)", World Wide Web 581 Consortium Recommendation, August 2006, 582 . 584 Authors' Addresses 586 Larry Masinter 587 Adobe 588 345 Park Ave 589 San Jose, CA 95110 590 U.S.A. 592 Phone: +1-408-536-3024 593 Email: masinter@adobe.com 594 URI: http://larry.masinter.net 596 Martin Duerst 597 Aoyama Gakuin University 598 5-10-1 Fuchinobe 599 Sagamihara, Kanagawa 229-8558 600 Japan 602 Phone: +81 42 759 6329 603 Fax: +81 42 759 6495 604 Email: duerst@it.aoyama.ac.jp 605 URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/