idnits 2.17.1 draft-ietf-idn-uri-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 8 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 25 instances of too long lines in the document, the longest one being 3 characters in excess of 72. ** There are 2 instances of lines with control characters in the document. ** The abstract seems to contain references ([RFC2396]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 3, 2002) is 7846 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 2279' is mentioned on line 110, but not defined ** Obsolete undefined reference: RFC 2279 (Obsoleted by RFC 3629) == Unused Reference: 'ISO10646' is defined on line 212, but no explicit reference was found in the text == Unused Reference: 'RFC2119' is defined on line 223, but no explicit reference was found in the text == Unused Reference: 'RFC2279' is defined on line 233, but no explicit reference was found in the text == Unused Reference: 'RFC2640' is defined on line 242, but no explicit reference was found in the text -- Unexpected draft version: The latest known version of draft-ietf-idn-idna is -13, but you're referring to -14. -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNWG' == Outdated reference: A later version (-11) exists of draft-duerst-iri-02 -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' -- Unexpected draft version: The latest known version of draft-ietf-idn-nameprep is -10, but you're referring to -11. 
** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141) ** Obsolete normative reference: RFC 2192 (Obsoleted by RFC 5092) ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629) ** Obsolete normative reference: RFC 2396 (Obsoleted by RFC 3986) ** Obsolete normative reference: RFC 2718 (Obsoleted by RFC 4395) ** Obsolete normative reference: RFC 2732 (Obsoleted by RFC 3986) Summary: 13 errors (**), 0 flaws (~~), 10 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Duerst 3 Internet-Draft W3C 4 Expires: May 4, 2003 November 3, 2002 6 Internationalized Domain Names in URIs 7 draft-ietf-idn-uri-03 9 Status of this Memo 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at http:// 25 www.ietf.org/ietf/1id-abstracts.txt. 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 This Internet-Draft will expire on May 4, 2003. 32 Copyright Notice 34 Copyright (C) The Internet Society (2002). All Rights Reserved. 36 Abstract 38 This document proposes to upgrade the definition of URIs (RFC 2396) 39 [RFC2396] to work consistently with internationalized domain names. 41 Table of Contents 43 1. Introduction . . . . . . . . . . . . . . . . . 
. . . . . . . . 3 44 2. URI syntax changes . . . . . . . . . . . . . . . . . . . . . . 3 45 3. Security considerations . . . . . . . . . . . . . . . . . . . 5 46 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 5 47 5. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 5 48 5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03 . 5 49 5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02 . 5 50 5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01 . 5 51 References . . . . . . . . . . . . . . . . . . . . . . . . . . 6 52 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7 53 Full Copyright Statement . . . . . . . . . . . . . . . . . . . 8 55 1. Introduction 57 Internet domain names serve to identify hosts and services on the 58 Internet in a convenient way. The IETF IDN working group [IDNWG] has 59 been working on extending the character repertoire usable in domain 60 names beyond a subset of US-ASCII. 62 One of the most important places where domain names appear is 63 Uniform Resource Identifiers (URIs, [RFC2396], as modified by 64 [RFC2732]). However, in the current definition of the generic URI 65 syntax, the restrictions on domain names are 'hard-coded'. In 66 Section 2, this document relaxes these restrictions by updating the 67 syntax, and defines how internationalized domain names are encoded in 68 URIs. 70 The syntax in this document has been chosen to further increase the 71 uniformity of URI syntax, which is a very important principle of 72 URIs. 74 In practice, escaped domain names should be used as rarely as 75 possible. Wherever possible, the actual characters in 76 Internationalized Domain Names should be preserved as long as 77 possible by using IRIs [IRI] rather than URIs, and only converting to 78 URIs and then to ACE-encoded [IDNA] domain names (or ideally directly 79 to ACE-encoding without even using URIs) when resolving the IRI. 
80 Also, this document does not exclude the use of ACE encoding directly 81 in a URI domain name part. ACE encoding may be used directly in a 82 URI domain name part if this is considered necessary for 83 interoperability. 85 Please note that even with the definition of URIs in [RFC2396], some 86 URIs can already contain host names with escaped characters. For 87 example, mailto:example@w%33.org is legal per [RFC2396] because the 88 mailto: URI scheme does not follow the generic syntax of [RFC2396]. 90 2. URI syntax changes 92 The syntax of URIs [RFC2396] currently contains the following rules 93 relevant to domain names:
95 hostname = *( domainlabel "." ) toplabel [ "." ]
96 domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
97 toplabel = alpha | alpha *( alphanum | "-" ) alphanum
99 The latter two rules are changed as follows:
101 domainlabel = anchar | anchar *( anchar | "-" ) anchar
102 toplabel = achar | achar *( anchar | "-" ) anchar
104 and the following rules are added:
106 anchar = alphanum | escaped
107 achar = alpha | escaped
109 Characters outside the repertoire (alphanum) are encoded by first 110 encoding the characters in UTF-8 [RFC 2279], resulting in a sequence 111 of octets, and then escaping these octets according to the rules 112 defined in [RFC2396]. 114 Using UTF-8 assures that this encoding interoperates with IRIs [IRI]. 115 It is also aligned with the recommendations in [RFC2277] and 116 [RFC2718], and is consistent with the URN syntax [RFC2141] as well as 117 recent URL scheme definitions that define encodings of non-ASCII 118 characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs 119 [RFC2384]). 121 The above syntax rules permit domain names that are neither 122 permitted as US-ASCII only domain names nor as internationalized 123 domain names. However, such domain names should never be used, and 124 will never be resolved because no such domains will be registered. 
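As an illustration of the encoding described in Section 2 (UTF-8 first, then %HH escaping of the resulting octets), the following Python sketch escapes a host name label by label. It is not part of the draft: the names escape_label, escape_hostname and unescape_hostname are hypothetical, and the sketch assumes the input is already normalized. It performs no Nameprep or ToASCII processing and does not check that the result is a registrable domain name.

```python
import string
from urllib.parse import unquote

# The 'alphanum' repertoire of RFC 2396: US-ASCII letters and digits only.
ALPHANUM = set(string.ascii_letters + string.digits)


def escape_label(label: str) -> str:
    """Escape one domain label (hypothetical helper, not from the draft)."""
    out = []
    for ch in label:
        if ch in ALPHANUM or ch == "-":
            # US-ASCII alphanumerics and "-" are left unescaped.
            out.append(ch)
        else:
            # Everything else: UTF-8 octets, each written as %HH.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def escape_hostname(hostname: str) -> str:
    """Escape every label of a dot-separated host name."""
    return ".".join(escape_label(label) for label in hostname.split("."))


def unescape_hostname(hostname: str) -> str:
    """Reverse the escaping, interpreting %HH octets as UTF-8."""
    return unquote(hostname, encoding="utf-8", errors="strict")


print(escape_hostname("dürst.example.org"))   # d%C3%BCrst.example.org
print(unescape_hostname("www.w%33.org"))      # www.w3.org
```

The sketch never escapes US-ASCII letters or digits when encoding, but unescape_hostname still accepts them in escaped form, as in w%33.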
125 For US-ASCII only domain names, the syntax rules in [RFC2396] are 126 relevant. For example, http://www.w%33.org is legal, because the 127 corresponding 'w3' is a legal 'domainlabel' according to [RFC2396]. 128 However, http://%2a.example.org is illegal because the corresponding 129 '*' is not a legal 'domainlabel' according to [RFC2396]. 131 For domain names containing non-ASCII characters, the legal domain 132 names are those for which the ToASCII operation ([IDNA], [Nameprep]; 133 using the unescaped UTF-8 values as input), with the flags 134 "UseSTD3ASCIIRules" and "AllowUnassigned" set, is successful. The 135 URI resolver MUST apply any steps required as part of domain name 136 resolution by [IDNA], in particular the ToASCII operation, with the 137 above-mentioned flags set. URIs where the ToASCII operation results 138 in an error should be treated as unresolvable. 140 For domain names containing non-ASCII characters, the Nameprep 141 specification ([Nameprep]) defines some mappings, which mainly 142 include normalization to NFKC and folding to lower case. When 143 encoding an internationalized domain name in a URI, these mappings 144 SHOULD NOT be applied. It should be assumed that the domain name is 145 already normalized as far as appropriate. 147 For consistency in comparison operations and for interoperability 148 with older software, the following should be noted: 1) US-ASCII 149 characters in domain names should not be escaped. 2) Because of the 150 principle of syntax uniformity for URIs, it is always more prudent to 151 take into account the possibility that US-ASCII characters are 152 escaped. 154 3. Security considerations 156 The security considerations of [RFC2396] and those applying to 157 internationalized domain names apply. There may be an increased 158 potential to smuggle escaped US-ASCII-based domain names across 159 firewalls, although because of the uniform syntax principle for URIs, 160 such a potential already exists. 162 4. 
Acknowledgements 164 Erik Nordmark 166 5. Change Log 168 5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03 170 Clarified expectations on name checking. 172 5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02 174 Moved change log to back 176 Changed to only change URIs; IRI syntax updated directly in IRI 177 draft. 179 Removed syntax restriction on %hh in the US-ASCII part, but made 180 clear that restrictions to domain names apply. 182 Made clear that escaped domain names in URIs should only be an 183 intermediate representation. 185 Gave example of mailto: as already allowing escaped host names. 187 Corrected some typos. 189 5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01 191 Changed requirement for URI/IRI resolvers from MUST to SHOULD 193 Changed IRI syntax slightly (ichar -> idchar, based on changes in 194 [IRI]) 195 Various wording changes 197 References 199 [IDNA] Faltstrom, P., Hoffman, P. and A. Costello, 200 "Internationalizing Domain Names in Applications (IDNA)", 201 draft-ietf-idn-idna-14.txt (work in progress), October 202 2002, . 205 [IDNWG] "IETF Internationalized Domain Name (idn) Working Group". 207 [IRI] Duerst, M. and M. Suignard, "Internationalized Resource 208 Identifiers (IRI)", draft-duerst-iri-02.txt (work in 209 progress), November 2002, . 212 [ISO10646] International Organization for Standardization, 213 "Information Technology - Universal Multiple-Octet Coded 214 Character Set (UCS) - Part 1: Architecture and Basic 215 Multilingual Plane", ISO Standard 10646-1, October 2000. 217 [Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 218 Profile for Internationalized Domain Names", draft-ietf- 219 idn-nameprep-11.txt (work in progress), June 2002, 220 . 223 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 224 Requirement Levels", BCP 14, RFC 2119, March 1997. 226 [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. 
228 [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. 230 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and 231 Languages", BCP 18, RFC 2277, January 1998. 233 [RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO 234 10646", RFC 2279, January 1998. 236 [RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998. 238 [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform 239 Resource Identifiers (URI): Generic Syntax", RFC 2396, 240 August 1998. 242 [RFC2640] Curtin, B., "Internationalization of the File Transfer 243 Protocol", RFC 2640, July 1999. 245 [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, 246 "Guidelines for new URL Schemes", RFC 2718, November 247 1999. 249 [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for 250 Literal IPv6 Addresses in URL's", RFC 2732, December 251 1999. 253 Author's Address 255 Martin Duerst 256 World Wide Web Consortium 257 200 Technology Square 258 Cambridge, MA 02139 259 U.S.A. 261 Phone: +1 617 253 5509 262 Fax: +1 617 258 5999 263 EMail: duerst@w3.org 264 URI: http://www.w3.org/People/D%C3%BCrst/ 266 Full Copyright Statement 268 Copyright (C) The Internet Society (2002). All Rights Reserved. 270 This document and translations of it may be copied and furnished to 271 others, and derivative works that comment on or otherwise explain it 272 or assist in its implementation may be prepared, copied, published 273 and distributed, in whole or in part, without restriction of any 274 kind, provided that the above copyright notice and this paragraph are 275 included on all such copies and derivative works. 
However, this 276 document itself may not be modified in any way, such as by removing 277 the copyright notice or references to the Internet Society or other 278 Internet organizations, except as needed for the purpose of 279 developing Internet standards in which case the procedures for 280 copyrights defined in the Internet Standards process must be 281 followed, or as required to translate it into languages other than 282 English. 284 The limited permissions granted above are perpetual and will not be 285 revoked by the Internet Society or its successors or assigns. 287 This document and the information contained herein is provided on an 288 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 289 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 290 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 291 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 292 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 294 Acknowledgement 296 Funding for the RFC Editor function is currently provided by the 297 Internet Society.