idnits 2.17.1

draft-duerst-idn-uri-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There is 1 instance of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  ** There are 2 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([RFC2396]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and you
     can ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 24, 2002) is 7976 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2279' is mentioned on line 122, but not defined

  ** Obsolete undefined reference: RFC 2279 (Obsoleted by RFC 3629)

  == Unused Reference: 'ISO10646' is defined on line 175, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2119' is defined on line 180, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2279' is defined on line 190, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2640' is defined on line 199, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNWG'

  == Outdated reference: A later version (-11) exists of draft-duerst-iri-00

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141)

  ** Obsolete normative reference: RFC 2192 (Obsoleted by RFC 5092)

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2396 (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2718 (Obsoleted by RFC 4395)

  ** Obsolete normative reference: RFC 2732 (Obsoleted by RFC 3986)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-idn-idna-09

  Summary: 13 errors (**), 0 flaws (~~), 10 warnings (==), 4 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                          M. Duerst
3    Internet-Draft                                       W3C/Keio University
4    Expires: December 23, 2002                                 June 24, 2002

6                     Internationalized Domain Names in URIs
7                            draft-duerst-idn-uri-00

9    Status of this Memo

11      This document is an Internet-Draft and is in full conformance with
12      all provisions of Section 10 of RFC2026.

14      Internet-Drafts are working documents of the Internet Engineering
15      Task Force (IETF), its areas, and its working groups. Note that
16      other groups may also distribute working documents as Internet-
17      Drafts.

19      Internet-Drafts are draft documents valid for a maximum of six months
20      and may be updated, replaced, or obsoleted by other documents at any
21      time. It is inappropriate to use Internet-Drafts as reference
22      material or to cite them other than as "work in progress."

24      The list of current Internet-Drafts can be accessed at http://
25      www.ietf.org/ietf/1id-abstracts.txt.

27      The list of Internet-Draft Shadow Directories can be accessed at
28      http://www.ietf.org/shadow.html.

30      This Internet-Draft will expire on December 23, 2002.

32   Copyright Notice

34      Copyright (C) The Internet Society (2002). All Rights Reserved.

36   Abstract

38      This document proposes to upgrade the definition of URIs (RFC 2396)
39      [RFC2396] to work consistently with internationalized domain names.

41   Table of Contents

43      1. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 3
44      1.1 Changes from draft-ietf-idn-uri--01 to draft-duerst-idn-uri-00 3
45      1.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01 . 3
46      2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
47      3. URI syntax changes . . . . . . . . . . . . . . . . . . . . . . 4
48      4. Security considerations . . . . . . . . . . . . . . . . . . . 5
49         References . . . . . . . . . . . . . . . . . . . . . . . . . . 5
50         Author's Address . . . . . . . . . . . . . . . . . . . . . . . 6
51         Full Copyright Statement . . . . . . . . . . . . . . . . . . . 7

53   1. Change Log

55   1.1 Changes from draft-ietf-idn-uri--01 to draft-duerst-idn-uri-00

57      Changed to only change URIs; IRI syntax updated directly in IRI
58      draft.

60      Removed syntax restriction on %hh in the US-ASCII part, but made
61      clear that restrictions to domain names apply.

63      Made clear that escaped domain names in URIs should only be an
64      intermediate representation.

66   1.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01

68      Changed requirement for URI/IRI resolvers from MUST to SHOULD

70      Changed IRI syntax slightly (ichar -> idchar, based on changes in
71      [IRI])

73      Various wording changes

75   2. Introduction

77      Internet domain names serve to identify hosts and services on the
78      Internet in a convenient way. The IETF IDN working group [IDNWG] has
79      been working on extending the character repertoire usable in domain
80      names beyond a subset of US-ASCII.

82      One of the most important places where domain names appear is
83      Uniform Resource Identifiers (URIs, [RFC2396], as modified by
84      [RFC2732]). However, in the current definition of the generic URI
85      syntax, the restrictions on domain names are 'hard-coded'. In
86      Section 3, this document relaxes these restrictions by updating the
87      syntax, and defines how internationalized domain names are encoded in
88      URIs.

90      The syntax in this document is defined for consistency. Uniformity
91      of syntax is a very important principle of URIs. In practice,
92      escaped domain names should be used as rarely as possible. Wherever
93      possible, the actual characters in Internationalized Domain Names
94      should be preserved as long as possible by using IRIs [IRI] rather
95      than URIs, and only converting to URIs and then to ACE-encoded domain
96      names (or directly to ACE-encoding without even using URIs) when
97      resolving the IRI.
        Also, this document in no way excludes the
98      use of ACE encoding directly in a URI domain name part. ACE
99      encoding may be used directly in a URI domain name part if it is
100     considered necessary for interoperability.

102  3. URI syntax changes

104     The syntax of URIs [RFC2396] currently contains the following rules
105     relevant to domain names:

107        hostname    = *( domainlabel "." ) toplabel [ "." ]
108        domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
109        toplabel    = alpha | alpha *( alphanum | "-" ) alphanum

111     The latter two rules are changed as follows:

113        domainlabel = anchar | anchar *( anchar | "-" ) anchar
114        toplabel    = achar | achar *( anchar | "-" ) anchar

116     and the following rules are added:

118        anchar      = alphanum | escaped
119        achar       = alpha | escaped

121     Characters outside the repertoire are encoded by first encoding the
122     characters in UTF-8 [RFC 2279], resulting in a sequence of octets,
123     and then escaping these octets according to the rules defined in
124     [RFC2396].

126     Using UTF-8 ensures that this encoding interoperates with IRIs [IRI].
127     It is also aligned with the recommendations in [RFC2277] and
128     [RFC2718], and is consistent with the URN syntax [RFC2141] as well as
129     recent URL scheme definitions that define encodings of non-ASCII
130     characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
131     [RFC2384]).

133     The above syntax rules permit domain names that are neither
134     permitted as US-ASCII only domain names nor as internationalized
135     domain names. However, such syntax should never be used, and must
136     always be rejected by resolvers. For US-ASCII only domain names, the
137     syntax rules in [RFC2396] are relevant. For example, http://
138     www.w%33.org is legal, because the corresponding 'w3' is a legal
139     'domainlabel' according to [RFC2396]. However, http://
140     %2a.example.org is illegal because the corresponding '*' is not a
141     legal 'domainlabel' according to [RFC2396].
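
The encoding step and the US-ASCII legality check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an implementation taken from this document: the helper names escape_label and is_legal_ascii_label are hypothetical, the sketch works on single labels only, and it ignores the stricter 'toplabel' rule. Labels that contain non-ASCII characters are reported as not legal by the second helper, since they need an IDNA check instead.

```python
import re
from urllib.parse import unquote

# 'domainlabel' from RFC 2396:
#   alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def escape_label(label: str) -> str:
    """Encode a label as UTF-8, then %HH-escape every octet that is not
    a US-ASCII letter, digit, or hyphen (hypothetical helper)."""
    out = []
    for octet in label.encode("utf-8"):
        ch = chr(octet)
        if octet < 0x80 and (ch.isalnum() or ch == "-"):
            out.append(ch)
        else:
            out.append(f"%{octet:02X}")
    return "".join(out)


def is_legal_ascii_label(escaped: str) -> bool:
    """Undo %HH escaping and test the result against 'domainlabel'.
    Returns False for labels containing non-ASCII characters, which are
    outside the scope of this check (hypothetical helper)."""
    label = unquote(escaped)
    return label.isascii() and DOMAINLABEL.fullmatch(label) is not None


print(escape_label("d\u00fcrst"))    # d%C3%BCrst
print(is_legal_ascii_label("w%33"))  # True: unescapes to 'w3'
print(is_legal_ascii_label("%2a"))   # False: unescapes to '*'
```

The check is applied to the unescaped form because the uniform syntax allows any US-ASCII character to appear escaped, so the two spellings of a label have to be judged identically.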
        For domain names
142     containing non-ASCII characters, the legal domain names are those for
143     which the ToASCII operation ([IDNA], [Nameprep]; using the unescaped
144     UTF-8 values as input) is successful.

146     For consistency in comparison operations and for interoperability
147     with older software, the following should be noted: 1) US-ASCII
148     characters in domain names should never be escaped. 2) Because of
149     the principle of syntax uniformity for URIs, it is always more
150     prudent to take into account the possibility that US-ASCII characters
151     are escaped.

153     The work of the IDN WG includes some procedures for name preparation
154     [Nameprep]. Before encoding an internationalized domain name in a
155     URI, this preparation step SHOULD be applied. However, the URI
156     resolver MUST also apply any steps required by [IDNA] as part of
157     domain name resolution.

159  4. Security considerations

161     The security considerations of [RFC2396] and those applying to
162     internationalized domain names apply. There may be an increased
163     potential to smuggle escaped US-ASCII-based domain names across
164     firewalls, although because of the uniform syntax principle for URIs,
165     such a potential already exists.

167  References

169     [IDNWG]     "IETF Internationalized Domain Name (idn) Working Group".

171     [IRI]       Duerst, M. and M. Suignard, "Internationalized Resource
172                 Identifiers (IRI)", draft-duerst-iri-00 (work in
173                 progress), April 2002.

175     [ISO10646]  International Organization for Standardization,
176                 "Information Technology - Universal Multiple-Octet Coded
177                 Character Set (UCS) - Part 1: Architecture and Basic
178                 Multilingual Plane", ISO Standard 10646-1, October 2000.

180     [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
181                 Requirement Levels", BCP 14, RFC 2119, March 1997.

183     [RFC2141]   Moats, R., "URN Syntax", RFC 2141, May 1997.

185     [RFC2192]   Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

187     [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
188                 Languages", BCP 18, RFC 2277, January 1998.

190     [RFC2279]   Yergeau, F., "UTF-8, a transformation format of ISO
191                 10646", RFC 2279, January 1998.

193     [RFC2384]   Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

195     [RFC2396]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
196                 Resource Identifiers (URI): Generic Syntax", RFC 2396,
197                 August 1998.

199     [RFC2640]   Curtin, B., "Internationalization of the File Transfer
200                 Protocol", RFC 2640, July 1999.

202     [RFC2718]   Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
203                 "Guidelines for new URL Schemes", RFC 2718, November
204                 1999.

206     [RFC2732]   Hinden, R., Carpenter, B. and L. Masinter, "Format for
207                 Literal IPv6 Addresses in URL's", RFC 2732, December
208                 1999.

210     [IDNA]      Faltstrom, P., Hoffman, P. and A. Costello,
211                 "Internationalizing Domain Names in Applications (IDNA)",
212                 draft-ietf-idn-idna-09.txt (work in progress), May 2002.

216     [Nameprep]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
217                 Profile for Internationalized Domain Names", draft-ietf-
218                 idn-nameprep-10.txt (work in progress), May 2002.

222  Author's Address

224     Martin Duerst
225     W3C/Keio University
226     5322 Endo
227     Fujisawa 252-8520
228     Japan

230     Phone: +81 466 49 1170
231     Fax: +81 466 49 1171
232     EMail: duerst@w3.org
233     URI: http://www.w3.org/People/D%C3%BCrst/

235  Full Copyright Statement

237     Copyright (C) The Internet Society (2002). All Rights Reserved.

239     This document and translations of it may be copied and furnished to
240     others, and derivative works that comment on or otherwise explain it
241     or assist in its implementation may be prepared, copied, published
242     and distributed, in whole or in part, without restriction of any
243     kind, provided that the above copyright notice and this paragraph are
244     included on all such copies and derivative works. However, this
245     document itself may not be modified in any way, such as by removing
246     the copyright notice or references to the Internet Society or other
247     Internet organizations, except as needed for the purpose of
248     developing Internet standards in which case the procedures for
249     copyrights defined in the Internet Standards process must be
250     followed, or as required to translate it into languages other than
251     English.

253     The limited permissions granted above are perpetual and will not be
254     revoked by the Internet Society or its successors or assigns.

256     This document and the information contained herein is provided on an
257     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
258     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
259     BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
260     HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
261     MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

263  Acknowledgement

265     Funding for the RFC Editor function is currently provided by the
266     Internet Society.