idnits 2.17.1

draft-ietf-idn-uri-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 26 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** There are 2 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([RFC2396]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 1, 2002) is 7969 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2279' is mentioned on line 108, but not defined

  ** Obsolete undefined reference: RFC 2279 (Obsoleted by RFC 3629)

  == Unused Reference: 'ISO10646' is defined on line 193, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2119' is defined on line 204, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2279' is defined on line 214, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2640' is defined on line 223, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-13) exists of
     draft-ietf-idn-idna-09

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNWG'

  == Outdated reference: A later version (-11) exists of draft-duerst-iri-01

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141)

  ** Obsolete normative reference: RFC 2192 (Obsoleted by RFC 5092)

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2396 (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2718 (Obsoleted by RFC 4395)

  ** Obsolete normative reference: RFC 2732 (Obsoleted by RFC 3986)


     Summary: 13 errors (**), 0 flaws (~~), 10 warnings (==), 4 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     Network Working Group                                          M. Duerst
3     Internet-Draft                                       W3C/Keio University
4     Expires: December 30, 2002                                  July 1, 2002

6                      Internationalized Domain Names in URIs
7                              draft-ietf-idn-uri-02

9     Status of this Memo

11       This document is an Internet-Draft and is in full conformance with
12       all provisions of Section 10 of RFC2026.

14       Internet-Drafts are working documents of the Internet Engineering
15       Task Force (IETF), its areas, and its working groups.  Note that
16       other groups may also distribute working documents as Internet-
17       Drafts.

19       Internet-Drafts are draft documents valid for a maximum of six months
20       and may be updated, replaced, or obsoleted by other documents at any
21       time.  It is inappropriate to use Internet-Drafts as reference
22       material or to cite them other than as "work in progress."

24       The list of current Internet-Drafts can be accessed at http://
25       www.ietf.org/ietf/1id-abstracts.txt.

27       The list of Internet-Draft Shadow Directories can be accessed at
28       http://www.ietf.org/shadow.html.

30       This Internet-Draft will expire on December 30, 2002.

32    Copyright Notice

34       Copyright (C) The Internet Society (2002).  All Rights Reserved.

36    Abstract

38       This document proposes to upgrade the definition of URIs (RFC 2396)
39       [RFC2396] to work consistently with internationalized domain names.

41    Table of Contents

43       1.  Introduction
44       2.  URI syntax changes
45       3.  Security considerations
46       4.  Change Log
47       4.1 Changes from draft-ietf-idn-uri--01 to draft-ietf-idn-uri-02
48       4.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01
49           References
50           Author's Address
51           Full Copyright Statement

53    1. Introduction

55       Internet domain names serve to identify hosts and services on the
56       Internet in a convenient way.  The IETF IDN working group [IDNWG] has
57       been working on extending the character repertoire usable in domain
58       names beyond a subset of US-ASCII.

60       One of the most important places where domain names appear is
61       Uniform Resource Identifiers (URIs, [RFC2396], as modified by
62       [RFC2732]).  However, in the current definition of the generic URI
63       syntax, the restrictions on domain names are 'hard-coded'.  In
64       Section 2, this document relaxes these restrictions by updating the
65       syntax, and defines how internationalized domain names are encoded in
66       URIs.

68       The syntax in this document has been chosen to further increase the
69       uniformity of URI syntax, which is a very important principle of
70       URIs.

72       In practice, escaped domain names should be used as rarely as
73       possible.  Wherever possible, the actual characters in
74       Internationalized Domain Names should be preserved as long as
75       possible by using IRIs [IRI] rather than URIs, and only converting to
76       URIs and then to ACE-encoded [IDNA] domain names (or ideally directly
77       to ACE-encoding without even using URIs) when resolving the IRI.
78       Also, this document in no way excludes the use of ACE encoding
79       directly in a URI domain name part.  ACE encoding may be used
80       directly in a URI domain name part if this is considered necessary
81       for interoperability.

83       Please note that even with the definition of URIs in [RFC2396], some
84       URIs can already contain host names with escaped characters.  For
85       example, mailto:example@w%33.org is legal per [RFC2396] because the
86       mailto: URI scheme does not follow the generic syntax of [RFC2396].

88    2. URI syntax changes

90       The syntax of URIs [RFC2396] currently contains the following rules
91       relevant to domain names:

93          hostname      = *( domainlabel "." ) toplabel [ "." ]
94          domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
95          toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

97       The latter two rules are changed as follows:

99          domainlabel   = anchar | anchar *( anchar | "-" ) anchar
100         toplabel      = achar | achar *( anchar | "-" ) anchar

102      and the following rules are added:

104         anchar        = alphanum | escaped
105         achar         = alpha | escaped

107      Characters outside the repertoire (alphanum) are encoded by first
108      encoding the characters in UTF-8 [RFC 2279], resulting in a sequence
109      of octets, and then escaping these octets according to the rules
110      defined in [RFC2396].

112      Using UTF-8 assures that this encoding interoperates with IRIs [IRI].
113      It is also aligned with the recommendations in [RFC2277] and
114      [RFC2718], and is consistent with the URN syntax [RFC2141] as well as
115      recent URL scheme definitions that define encodings of non-ASCII
116      characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
117      [RFC2384]).

119      The above syntax rules permit domain names that are neither
120      permitted as US-ASCII only domain names nor as internationalized
121      domain names.  However, such syntax should never be used, and will
122      always be rejected by resolvers.  For US-ASCII only domain names, the
123      syntax rules in [RFC2396] are relevant.  For example, http://
124      www.w%33.org is legal, because the corresponding 'w3' is a legal
125      'domainlabel' according to [RFC2396].  However, http://
126      %2a.example.org is illegal because the corresponding '*' is not a
127      legal 'domainlabel' according to [RFC2396].  For domain names
128      containing non-ASCII characters, the legal domain names are those for
129      which the ToASCII operation ([IDNA], [Nameprep]; using the unescaped
130      UTF-8 values as input) is successful.
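
The encoding rule and the two US-ASCII examples in this section can be illustrated with a small Python sketch. It is not part of the draft: the function names escape_hostname, unescape_hostname and is_ascii_domainlabel are hypothetical, and the sketch only covers the UTF-8 escaping step and the [RFC2396] 'domainlabel' check. The ToASCII validation required for names containing non-ASCII characters is deliberately left out.

```python
import re
import string
from urllib.parse import unquote

# Assumption: the RFC 2396 "alphanum" repertoire plus the "-" and "."
# separators is copied through unescaped; everything else is escaped.
_UNESCAPED = set(string.ascii_letters + string.digits + "-.")

# RFC 2396: domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
_ASCII_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def escape_hostname(hostname: str) -> str:
    """Encode characters outside the repertoire as UTF-8, then %HH-escape."""
    out = []
    for ch in hostname:
        if ch in _UNESCAPED:
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 octet of the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def unescape_hostname(escaped: str) -> str:
    """Reverse the escaping; raises UnicodeDecodeError on invalid UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


def is_ascii_domainlabel(label: str) -> bool:
    """Check one unescaped label against the RFC 2396 'domainlabel' rule."""
    return _ASCII_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    print(escape_hostname("dürst.example.org"))            # d%C3%BCrst.example.org
    print(unescape_hostname("www.w%33.org"))               # www.w3.org
    print(is_ascii_domainlabel(unescape_hostname("w%33"))) # True
    print(is_ascii_domainlabel(unescape_hostname("%2a")))  # False
```

The check is applied to the unescaped label, which mirrors the reasoning in the text: 'w%33' is acceptable because it corresponds to 'w3', while '%2a' is not because it corresponds to '*'.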

132      For consistency in comparison operations and for interoperability
133      with older software, the following should be noted: 1) US-ASCII
134      characters in domain names should not be escaped.  2) Because of the
135      principle of syntax uniformity for URIs, it is always more prudent to
136      take into account the possibility that US-ASCII characters are
137      escaped.

139      The work of the IDN WG includes some procedures for name preparation
140      [Nameprep].  Before encoding an internationalized domain name in a
141      URI, this preparation step SHOULD be applied.  However, the URI
142      resolver MUST also apply any steps required as part of domain name
143      resolution by [IDNA].

145   3. Security considerations

147      The security considerations of [RFC2396] and those applying to
148      internationalized domain names apply.  There may be an increased
149      potential to smuggle escaped US-ASCII-based domain names across
150      firewalls, although because of the uniform syntax principle for URIs,
151      such a potential already exists.

153   4. Change Log

155   4.1 Changes from draft-ietf-idn-uri--01 to draft-ietf-idn-uri-02

157         Moved change log to back

159         Changed to only change URIs; IRI syntax updated directly in IRI
160         draft.

162         Removed syntax restriction on %hh in the US-ASCII part, but made
163         clear that restrictions to domain names apply.

165         Made clear that escaped domain names in URIs should only be an
166         intermediate representation.

168         Gave example of mailto: as already allowing escaped host names.

170   4.2 Changes from draft-ietf-idn-uri--00 to draft-ietf-idn-uri-01

172         Changed requirement for URI/IRI resolvers from MUST to SHOULD

174         Changed IRI syntax slightly (ichar -> idchar, based on changes in
175         [IRI])

177         Various wording changes

179   References

181      [IDNA]      Faltstrom, P., Hoffman, P. and A. Costello,
182                  "Internationalizing Domain Names in Applications (IDNA)",
183                  draft-ietf-idn-idna-09.txt (work in progress), May 2002.

187      [IDNWG]     "IETF Internationalized Domain Name (idn) Working Group".

189      [IRI]       Duerst, M. and M. Suignard, "Internationalized Resource
190                  Identifiers (IRI)", draft-duerst-iri-01 (work in
191                  progress), July 2002.

193      [ISO10646]  International Organization for Standardization,
194                  "Information Technology - Universal Multiple-Octet Coded
195                  Character Set (UCS) - Part 1: Architecture and Basic
196                  Multilingual Plane", ISO Standard 10646-1, October 2000.

198      [Nameprep]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
199                  Profile for Internationalized Domain Names", draft-ietf-
200                  idn-nameprep-10.txt (work in progress), May 2002.

204      [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
205                  Requirement Levels", BCP 14, RFC 2119, March 1997.

207      [RFC2141]   Moats, R., "URN Syntax", RFC 2141, May 1997.

209      [RFC2192]   Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

211      [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
212                  Languages", BCP 18, RFC 2277, January 1998.

214      [RFC2279]   Yergeau, F., "UTF-8, a transformation format of ISO
215                  10646", RFC 2279, January 1998.

217      [RFC2384]   Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

219      [RFC2396]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
220                  Resource Identifiers (URI): Generic Syntax", RFC 2396,
221                  August 1998.

223      [RFC2640]   Curtin, B., "Internationalization of the File Transfer
224                  Protocol", RFC 2640, July 1999.

226      [RFC2718]   Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
227                  "Guidelines for new URL Schemes", RFC 2718, November
228                  1999.

230      [RFC2732]   Hinden, R., Carpenter, B. and L. Masinter, "Format for
231                  Literal IPv6 Addresses in URL's", RFC 2732, December
232                  1999.

234   Author's Address

236      Martin Duerst
237      W3C/Keio University
238      5322 Endo
239      Fujisawa  252-8520
240      Japan

242      Phone: +81 466 49 1170
243      Fax:   +81 466 49 1171
244      EMail: duerst@w3.org
245      URI:   http://www.w3.org/People/D%C3%BCrst/

247   Full Copyright Statement

249      Copyright (C) The Internet Society (2002).  All Rights Reserved.

251      This document and translations of it may be copied and furnished to
252      others, and derivative works that comment on or otherwise explain it
253      or assist in its implementation may be prepared, copied, published
254      and distributed, in whole or in part, without restriction of any
255      kind, provided that the above copyright notice and this paragraph are
256      included on all such copies and derivative works.  However, this
257      document itself may not be modified in any way, such as by removing
258      the copyright notice or references to the Internet Society or other
259      Internet organizations, except as needed for the purpose of
260      developing Internet standards in which case the procedures for
261      copyrights defined in the Internet Standards process must be
262      followed, or as required to translate it into languages other than
263      English.

265      The limited permissions granted above are perpetual and will not be
266      revoked by the Internet Society or its successors or assigns.

268      This document and the information contained herein is provided on an
269      "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
270      TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
271      BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
272      HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
273      MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

275   Acknowledgement

277      Funding for the RFC Editor function is currently provided by the
278      Internet Society.