idnits 2.17.1 draft-ietf-iri-comparison-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) -- The draft header indicates that this document updates RFC3986, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). (Using the creation date from RFC3986, updated by this document, for RFC5378 checks: 2002-11-01) -- The document seems to contain a disclaimer for pre-RFC5378 work, and may have content which was first submitted before 10 November 2008. The disclaimer is necessary when there are original authors that you have been unable to contact, or if some do not wish to grant the BCP78 rights to the IETF Trust. If you are able to get all authors (current and original) to grant those rights, you can and should remove the disclaimer; otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) 
-- The document date (October 23, 2012) is 4196 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2119' is defined on line 505, but no explicit reference was found in the text == Unused Reference: 'RFC3490' is defined on line 508, but no explicit reference was found in the text == Unused Reference: 'RFC3491' is defined on line 512, but no explicit reference was found in the text == Unused Reference: 'RFC3629' is defined on line 516, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891) -- Possible downref: Non-RFC (?) normative reference: ref. 'UNIV6' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15' -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Duplicate reference: RFC3987, mentioned in 'RFC3987', was also mentioned in 'RFC3987bis'. Summary: 3 errors (**), 0 flaws (~~), 6 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internationalized Resource Identifiers L. Masinter 3 (iri) Adobe 4 Internet-Draft M. 
Duerst 5 Updates: 3986 (if approved) Aoyama Gakuin University 6 Intended status: Standards Track October 23, 2012 7 Expires: April 26, 2013 9 Comparison, Equivalence and Canonicalization of Internationalized 10 Resource Identifiers 11 draft-ietf-iri-comparison-02 13 Abstract 15 Internationalized Resource Identifiers (IRIs) are Unicode strings 16 used to identify resources on the Internet. Applications that use 17 IRIs often define a means of comparing IRIs to determine when two 18 IRIs are equivalent for the purpose of that application. Some 19 applications also define a method for canonicalizing an IRI -- 20 translating one IRI into another which is equivalent under the 21 comparison method used. 23 This document gives guidelines and best practices for defining and 24 using IRI comparison and canonicalization methods. 26 Comparison methods are used to determine equivalence. As URIs are a 27 subset of IRIs, the guidelines apply to URI comparison as well. 29 Status of this Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at http://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on April 26, 2013. 46 Copyright Notice 48 Copyright (c) 2012 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 
51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (http://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 61 This document may contain material from IETF Documents or IETF 62 Contributions published or made publicly available before November 63 10, 2008. The person(s) controlling the copyright in some of this 64 material may not have granted the IETF Trust the right to allow 65 modifications of such material outside the IETF Standards Process. 66 Without obtaining an adequate license from the person(s) controlling 67 the copyright in such materials, this document may not be modified 68 outside the IETF Standards Process, and derivative works of it may 69 not be created outside the IETF Standards Process, except to format 70 it for publication as an RFC or to translate it into languages other 71 than English. 73 Table of Contents 75 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 76 2. General guidelines . . . . . . . . . . . . . . . . . . . . . . 4 77 3. Preparation for Comparison . . . . . . . . . . . . . . . . . . 5 78 4. Comparison Hierarchy . . . . . . . . . . . . . . . . . . . . . 6 79 4.1. Simple String Comparison . . . . . . . . . . . . . . . . . 6 80 4.2. Syntax-Based Equivalence . . . . . . . . . . . . . . . . . 7 81 4.2.1. Case Equivalence . . . . . . . . . . . . . . . . . . . 8 82 4.2.2. Unicode Character Normalization . . . . . . . . . . . 8 83 4.2.3. Percent-Encoding Equivalence . . . . . . . . . . . . . 9 84 4.2.4. Path Segment Equivalence . . . . . . . . . . . . . . . 10 85 4.3. 
Scheme-Based Comparison . . . . . . . . . . . . . . . . . 10 86 4.4. Protocol-Based Comparison . . . . . . . . . . . . . . . . 11 87 5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 88 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12 89 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 90 7.1. Normative References . . . . . . . . . . . . . . . . . . . 12 91 7.2. Informative References . . . . . . . . . . . . . . . . . . 13 92 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 14 94 1. Introduction 96 Internationalized Resource Identifiers (IRIs) are Unicode strings 97 used to identify resources on the Internet. Applications that use 98 IRIs often define a means of comparing IRIs to determine when two 99 IRIs are equivalent for the purpose of that application. Some 100 applications also define a method for canonicalizing an IRI -- 101 translating one IRI into another which is equivalent under the 102 comparison method used. 104 This document gives guidelines and best practices for defining and 105 using IRI comparison and canonicalization methods. 107 As every URI is also an IRI, the comparison and canonicalization 108 methods also apply to URIs. 110 IRI comparison is expected to determine whether two IRIs are 111 equivalent without using the IRIs to access their respective 112 resource(s). For example, comparisons are performed whenever a 113 response cache is accessed, a browser checks its history to color a 114 link, or an XML parser processes tags within a namespace. 116 Comparison for equivalence is often accomplished by canonicalization 117 (sometimes called normalization): a process for converting data that 118 has more than one possible representation into a "standard", 119 "normal", or "canonical" form. 
Extensive canonicalization prior to 120 comparison of IRIs may be used by spiders and indexing engines to 121 prune a search space or reduce duplication of request actions and 122 response storage. 124 IRI comparison is performed for some particular purpose. Protocols 125 or implementations that compare IRIs for different purposes will 126 often be subject to differing design trade-offs with regard to how 127 much effort should be spent in reducing aliased identifiers. This 128 document describes various methods that may be used to compare IRIs, 129 the trade-offs between them, and the types of applications that might 130 use them. 132 2. General guidelines 134 Because IRIs exist to identify resources, one might expect two IRIs 135 to be considered equivalent when they identify the same resource. 136 However, this definition of equivalence is not of much practical use, 137 as there is in general no way for an implementation to compare two 138 resources to determine if they are "the same" unless it has full 139 knowledge or control of them. Comparison methods for IRIs are 140 generally based strictly on examining the characters that make up the 141 IRI, without performing any network access. 143 We use the terms "different" and "equivalent" to describe the 144 possible outcomes of such comparisons, but there are many 145 application-dependent versions of equivalence. 147 Even when it is possible to determine that two IRIs are equivalent, 148 IRI comparison is not sufficient to determine whether two IRIs 149 identify different resources. For example, an owner of two different 150 domain names could decide to serve the same resource from both, 151 resulting in two different IRIs. For this reason, false negatives 152 (e.g., returning "different" even when the resources are "the same") 153 cannot be completely avoided. Comparison methods often try to 154 minimize false negatives while strictly avoiding false positives. 
155 However, in some cases (such as cache invalidation), false negatives 156 are more harmful than false positives. 158 A comparison method for determining equivalence might have multiple 159 values, for example, returning "equivalent", "different", or 160 "equivalence cannot be determined". 162 Multiple canonicalization (normalization) methods might be defined, 163 where sequential application of each results in greater sets of 164 equivalent values. 166 In testing for equivalence, applications should not directly compare 167 relative references; the references should be converted to their 168 respective target IRIs before comparison. [[ref 3987bis]] 170 Some IRIs contain fragment identifiers. In general, the equivalence 171 of two IRIs is determined first by comparing the IRIs without any 172 fragment identifiers, and then (if appropriate) by comparing the fragment 173 components (if any). 175 Some applications (such as XML namespaces) use IRIs as identity 176 tokens without any relationship to accessing the resources. Those 177 applications use the Simple String Comparison (see Section 4.1). 179 3. Preparation for Comparison 181 Any kind of IRI comparison REQUIRES that any additional contextual 182 processing is first performed, including undoing higher-level 183 escapings or encodings in the protocol or format that carries an IRI. 184 This preprocessing is usually done when the protocol or format is 185 parsed. 187 NOTE: This document has not yet been updated to use in-line Unicode 188 examples. 190 Examples of such escapings or encodings are entities and numeric 191 character references in [HTML4] and [XML1]. 
As an example, 192 "http://example.org/ros&eacute;" (in HTML), 193 "http://example.org/ros&#233;" (in HTML or XML), and 194 "http://example.org/ros&#xE9;" (in HTML or XML) are all resolved into 195 what is denoted in this document (see 'Notation' section of 196 [RFC3987bis]) as "http://example.org/ros&#xE9;" (the "&#xE9;" here 197 standing for the actual e-acute character, to compensate for the fact 198 that this document cannot contain non-ASCII characters). 200 An IRI is a sequence of Unicode characters. IRIs are sometimes 201 represented in documents as sequences of bytes in a charset, either 202 Unicode-based (UTF-8) or using some other character encoding (e.g., 203 ISO-8859-1). Before comparing two such sequences, they must both be 204 converted into sequences of Unicode characters. 206 Similarly, encodings such as Transfer Codings in HTTP (see [RFC2616]) 207 and Content Transfer Encodings in MIME ([RFC2045]) must be decoded. 208 In these cases, the encoding is based not on characters but on 209 octets, and additional care is required to make sure that characters, 210 and not just arbitrary octets, are compared (see Section 4.1). 212 4. Comparison Hierarchy 214 In practice, a variety of methods are used to test IRI equivalence. 215 These methods generally fall into a range distinguished by the amount 216 of processing required and the degree to which the probability of 217 false negatives is reduced. As noted above, false negatives cannot 218 be eliminated. In practice, their probability can be reduced, but 219 this reduction requires more processing and is not cost-effective for 220 all applications. 222 The following discussion starts with comparison methods that are 223 cheap but have a relatively higher chance of producing false 224 negatives, and proceeds to those that have higher computational 225 cost and lower risk of false negatives. 227 4.1. 
Simple String Comparison 229 If two IRIs (when considered as strings of Unicode characters) are 230 identical, then it is safe to conclude that they are equivalent. 231 This type of equivalence test has very low computational cost and is 232 in wide use in a variety of applications, particularly in the domain 233 of parsing. It is also used when a definitive answer to the question 234 of IRI equivalence is needed that is independent of the scheme used 235 and that can be calculated quickly and without accessing a network. 236 An example of such a case is XML Namespaces ([XMLNamespace]). 238 Testing strings for equivalence requires some basic precautions. 239 This procedure is often referred to as "bit-for-bit" or "byte-for- 240 byte" comparison, which is potentially misleading. Testing strings 241 for equality is normally based on pair comparison of the characters 242 that make up the strings, starting from the first and proceeding 243 until both strings are exhausted and all characters are found to be 244 equal, until a pair of characters compares unequal, or until one of 245 the strings is exhausted before the other. 247 This character comparison requires that each pair of characters be 248 put in comparable encoding form. For example, should one IRI be 249 stored in a byte array in UTF-8 encoding form and the second in a 250 UTF-16 encoding form, bit-for-bit comparisons applied naively will 251 produce errors. It is better to speak of equality on a character- 252 for-character rather than on a byte-for-byte or bit-for-bit basis. 253 In practical terms, character-by-character comparisons should be done 254 codepoint by codepoint after conversion to a common character 255 encoding form. When comparing character by character, the comparison 256 function MUST NOT map IRIs to URIs, because such a mapping would 257 create additional spurious equivalences. 
It follows that an IRI 258 SHOULD NOT be modified when being transported if there is any chance 259 that this IRI might be used in a context that uses Simple String 260 Comparison. 262 False negatives are caused by the production and use of IRI aliases. 263 Unnecessary aliases can be reduced, regardless of the comparison 264 method, by consistently providing IRI references in a canonical form 265 (after canonicalization is applied). 267 Protocols and data formats might limit some IRI comparisons to simple 268 string comparison, based on the theory that people and 269 implementations will, in their own best interest, be consistent in 270 providing IRI references, or at least be consistent enough to negate 271 any efficiency that might be obtained from further canonicalization. 273 4.2. Syntax-Based Equivalence 275 Implementations may use logic based on the definitions provided by 276 this specification to reduce the probability of false negatives. 277 This processing is moderately higher in cost than character-for- 278 character string comparison. For example, an application using this 279 approach could reasonably consider the following two IRIs equivalent: 281 example://a/b/c/%7Bfoo%7D/rosé 282 eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9 284 Web user agents, such as browsers, typically apply this type of IRI 285 equivalence when determining whether a cached response is available. 286 Syntax-based equivalence includes such techniques as case 287 equivalence, Unicode character normalization, percent-encoding 288 equivalence, and removal of dot-segments. 290 4.2.1. Case Equivalence 292 For all IRIs, the hexadecimal digits within a percent-encoding 293 triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore 294 should be considered equivalent to forms which use uppercase letters 295 for the digits A-F. 
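The triplet rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the specification: the helper names uppercase_percent_triplets and triplet_case_equivalent are hypothetical, and the sketch changes nothing except the hexadecimal digits of well-formed percent-encoding triplets.

```python
import re

# One percent-encoding triplet: "%" followed by exactly two hex digits.
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_percent_triplets(iri: str) -> str:
    """Return a copy of iri with the hex digits of each triplet uppercased.

    Hypothetical helper: every character outside a well-formed triplet
    is left exactly as it was, including a stray "%".
    """
    return _TRIPLET.sub(lambda m: m.group(0).upper(), iri)


def triplet_case_equivalent(a: str, b: str) -> bool:
    """Compare two IRIs, ignoring only the case of triplet hex digits."""
    return uppercase_percent_triplets(a) == uppercase_percent_triplets(b)
```

With these helpers, "http://example.org/%3a" and "http://example.org/%3A" compare as equivalent, while IRIs that differ in the case of any other character still compare as different. The original string is not modified, so it can still be passed on unchanged.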
297 When an IRI uses components of the generic syntax, the component 298 syntax equivalence rules always apply; namely, that the scheme and 299 US-ASCII only host are case insensitive and therefore should be 300 treated as equivalent to lowercase. For example, the URI 301 "HTTP://www.EXAMPLE.com/" is equivalent to "http://www.example.com/". 302 Case equivalence for non-ASCII characters in IRI components that are 303 IDNs is discussed in Section 4.3. The other generic syntax 304 components are assumed to be case sensitive unless specifically 305 defined otherwise by the scheme. 307 Creating schemes that allow case-insensitive syntax components 308 containing non-ASCII characters should be avoided. Case equivalence 309 of non-ASCII characters can be culturally dependent and is always a 310 complex operation. The only exception concerns non-ASCII host names 311 for which the character normalization includes a mapping step derived 312 from case folding. 314 4.2.2. Unicode Character Normalization 316 The Unicode Standard [UNIV6] defines various equivalences between 317 sequences of characters for various purposes. Unicode Standard Annex 318 #15 [UTR15] defines various Normalization Forms for these 319 equivalences, in particular Normalization Form C (NFC, Canonical 320 Decomposition, followed by Canonical Composition) and Normalization 321 Form KC (NFKC, Compatibility Decomposition, followed by Canonical 322 Composition). 324 IRIs already in Unicode MUST NOT be normalized before parsing or 325 interpreting. In many non-Unicode character encodings, some text 326 cannot be represented directly. For example, the word "Vietnam" is 327 natively written "Vi&#x1EC7;t Nam" (containing a LATIN SMALL LETTER E 328 WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct transcoding from 329 the windows-1258 character encoding leads to "Vi&#xEA;&#x323;t Nam" 330 (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX followed by a 331 COMBINING DOT BELOW). 
Direct transcoding of other 8-bit encodings of 332 Vietnamese may lead to other representations. 334 Equivalence of IRIs MUST rely on the assumption that IRIs are 335 appropriately pre-character-normalized rather than applying character 336 normalization when comparing two IRIs. The exceptions are conversion 337 from a non-digital form, and conversion from a non-UCS-based 338 character encoding to a UCS-based character encoding. In these 339 cases, NFC or a normalizing transcoder using NFC MUST be used for 340 interoperability. To avoid false negatives and problems with 341 transcoding, IRIs SHOULD be created by using NFC. Using NFKC may 342 avoid even more problems; for example, by choosing half-width Latin 343 letters instead of full-width ones, and full-width instead of half- 344 width Katakana. 346 As an example, "http://www.example.org/r&#xE9;sum&#xE9;.html" (in XML 347 Notation) is in NFC. On the other hand, 348 "http://www.example.org/re&#x301;sume&#x301;.html" is not in NFC. 350 The former uses precombined e-acute characters, and the latter uses 351 "e" characters followed by combining acute accents. Both usages are 352 defined as canonically equivalent in [UNIV6]. 354 Note: Because it is unknown how a particular sequence of characters 355 is being treated with respect to character normalization, it would 356 be inappropriate to allow third parties to normalize an IRI 357 arbitrarily. This does not contradict the recommendation that 358 when a resource is created, its IRI should be as character 359 normalized as possible (i.e., NFC or even NFKC). This is similar 360 to the uppercase/lowercase problems. Some parts of a URI are case 361 insensitive (for example, the domain name). For others, it is 362 unclear whether they are case sensitive, case insensitive, or 363 something in between (e.g., case sensitive, but with a multiple 364 choice selection if the wrong case is used, instead of a direct 365 negative result). 
The best recipe is that the creator use a 366 reasonable capitalization and, when transferring the URI, 367 capitalization never be changed. 369 Various IRI schemes may allow the usage of Internationalized Domain 370 Names (IDN) [RFC5890] either in the ireg-name part or elsewhere. 371 Character Normalization also applies to IDNs, as discussed in 372 Section 4.3. 374 4.2.3. Percent-Encoding Equivalence 376 The percent-encoding mechanism (Section 2.1 of [RFC3986]) is a 377 frequent source of variance among otherwise identical IRIs. In 378 addition to the case equivalence issue noted above, some IRI 379 producers percent-encode octets that do not require percent-encoding, 380 resulting in IRIs that are equivalent to their nonencoded 381 counterparts. These IRIs should be compared by first decoding any 382 percent-encoded octet sequence that corresponds to an unreserved 383 character, as described in section 2.3 of [RFC3986]. 385 For actual resolution, differences in percent-encoding (except for 386 the percent-encoding of reserved characters) SHOULD always result in 387 the same resource. For example, "http://example.org/~user", 388 "http://example.org/%7euser", and "http://example.org/%7Euser", 389 SHOULD resolve to the same resource. 391 If this kind of equivalence is to be tested, the percent-encoding of 392 both IRIs to be compared first needs to be aligned; for example, by 393 converting both IRIs to URIs, eliminating escape differences in the 394 resulting URIs, and making sure that the case of the hexadecimal 395 characters in the percent-encoding is always the same (preferably 396 upper case). If the IRI is to be passed to another application or 397 used further in some other way, its original form MUST be preserved. 398 The conversion described here should be performed only for local 399 comparison. 401 4.2.4. Path Segment Equivalence 403 The complete path segments "." and ".." 
are intended only for use 404 within relative references (Section 4.1 of [RFC3986]) and are removed 405 as part of the reference resolution process (Section 5.2 of 406 [RFC3986]). However, some implementations may incorrectly assume 407 that reference resolution is not necessary when the reference is 408 already an IRI, and thus fail to remove dot-segments when they occur 409 in non-relative paths. IRI comparison SHOULD remove dot-segments by 410 applying the remove_dot_segments algorithm to the path, as described 411 in Section 5.2.4 of [RFC3986]. 413 4.3. Scheme-Based Comparison 415 The syntax and semantics of IRIs vary from scheme to scheme, as 416 described by the defining specification for each scheme. 417 Implementations may use scheme-specific rules, at further processing 418 cost, to reduce the probability of false negatives. For example, 419 because the "http" scheme makes use of an authority component, has a 420 default port of "80", and defines an empty path to be equivalent to 421 "/", the following four IRIs are equivalent: 423 http://example.com 424 http://example.com/ 425 http://example.com:/ 426 http://example.com:80/ 428 In general, an IRI that uses the generic syntax for authority with an 429 empty path should be equivalent to a path of "/". Likewise, an 430 explicit ":port", for which the port is empty or the default for the 431 scheme, is equivalent to one where the port and its ":" delimiter are 432 elided. 434 Another case where equivalence varies by scheme is in the handling of 435 an empty authority component or empty host subcomponent. For many 436 scheme specifications, an empty authority or host is considered an 437 error; for others, it is considered equivalent to "localhost" or the 438 end-user's host. 440 The presence of a missing component vs. one with an empty string 441 component in an IRI SHOULD NOT be treated as equivalent unless 442 explicitly defined as such by the scheme definition. 
For example, 443 the IRI "http://example.com/?" cannot be assumed to be equivalent to 444 any of the examples above; an empty query component is NOT equivalent 445 to a missing one. Likewise, the presence or absence of delimiters 446 within a userinfo subcomponent is usually significant to its 447 interpretation. The fragment component is not subject to any scheme- 448 based equivalence; thus, two IRIs that differ only by the suffix "#" 449 are considered different regardless of the scheme. 451 Some IRI schemes allow the usage of Internationalized Domain Names 452 (IDN) [RFC5890] either in their ireg-name part or elsewhere. When in 453 use in IRIs, those names SHOULD conform to the definition of U-Label 454 in [RFC5890]. An IRI containing an invalid IDN cannot successfully 455 be resolved. For legibility purposes, they SHOULD NOT be converted 456 into ASCII Compatible Encoding (ACE). 458 Scheme-based comparison may also consider IDN components and their 459 conversions to punycode as equivalent. As an example, 460 "http://résumé.example.org" may be considered equivalent to 461 "http://xn--rsum-bpad.example.org". 463 Other scheme-specific equivalence rules are possible. 465 4.4. Protocol-Based Comparison 467 Substantial effort to reduce the incidence of false negatives is 468 often cost-effective for web spiders. Consequently, they implement 469 even more aggressive techniques in IRI comparison. For example, if 470 they observe that an IRI such as 472 http://example.com/data 474 redirects to an IRI differing only in the trailing slash 475 http://example.com/data/ 477 they will likely regard the two as equivalent in the future. This 478 kind of technique is only appropriate when equivalence is clearly 479 indicated by both the result of accessing the resources and the 480 common conventions of their scheme's dereference algorithm (in this 481 case, use of redirection by HTTP origin servers to avoid problems 482 with relative references). 484 5. 
Security Considerations 486 The primary security difficulty comes from applications choosing the 487 wrong equivalence relationship, or two different parties disagreeing 488 on equivalence. This is especially a problem when IRIs are used in 489 security protocols. 491 Besides the large character repertoire of Unicode, reasons for 492 confusion include different forms of normalization and different 493 normalization expectations, use of percent-encoding with various 494 legacy encodings, and bidirectionality issues. See also [UTR36]. 496 6. Acknowledgements 498 This document was originally derived from [RFC3986] and [RFC3987], 499 based on text contributed by Tim Bray. 501 7. References 503 7.1. Normative References 505 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 506 Requirement Levels", BCP 14, RFC 2119, March 1997. 508 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 509 "Internationalizing Domain Names in Applications (IDNA)", 510 RFC 3490, March 2003. 512 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 513 Profile for Internationalized Domain Names (IDN)", 514 RFC 3491, March 2003. 516 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 517 10646", STD 63, RFC 3629, November 2003. 519 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 520 Resource Identifier (URI): Generic Syntax", STD 66, 521 RFC 3986, January 2005. 523 [RFC3987bis] 524 Duerst, M., Masinter, L., and M. Suignard, 525 "Internationalized Resource Identifiers (IRIs)", 2012, 526 . 528 [RFC5890] Klensin, J., "Internationalized Domain Names for 529 Applications (IDNA): Definitions and Document Framework", 530 RFC 5890, August 2010. 532 [UNIV6] The Unicode Consortium, "The Unicode Standard, Version 533 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, 534 ISBN 978-1-936213-01-6)", October 2010. 536 [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", 537 Unicode Standard Annex #15, March 2008, 538 . 541 7.2. 
Informative References 543 [HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 544 Specification", World Wide Web Consortium Recommendation, 545 December 1999, 546 . 548 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 549 Extensions (MIME) Part One: Format of Internet Message 550 Bodies", RFC 2045, November 1996. 552 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 553 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 554 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 556 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 557 Identifiers (IRIs)", RFC 3987, January 2005. 559 [UTR36] Davis, M. and M. Suignard, "Unicode Security 560 Considerations", Unicode Technical Report #36, 561 August 2010, . 563 [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and 564 F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fourth 565 Edition)", World Wide Web Consortium Recommendation, 566 August 2006, . 568 [XMLNamespace] 569 Bray, T., Hollander, D., Layman, A., and R. Tobin, 570 "Namespaces in XML (Second Edition)", World Wide Web 571 Consortium Recommendation, August 2006, 572 . 574 Authors' Addresses 576 Larry Masinter 577 Adobe 578 345 Park Ave 579 San Jose, CA 95110 580 U.S.A. 582 Phone: +1-408-536-3024 583 Email: masinter@adobe.com 584 URI: http://larry.masinter.net 586 Martin Duerst 587 Aoyama Gakuin University 588 5-10-1 Fuchinobe 589 Sagamihara, Kanagawa 229-8558 590 Japan 592 Phone: +81 42 759 6329 593 Fax: +81 42 759 6495 594 Email: duerst@it.aoyama.ac.jp 595 URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/