idnits 2.17.1

draft-ietf-idn-uri-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 283 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 5 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  ** The abstract seems to contain references ([RFC2396], [IRI]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 20, 2001) is 8186 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2732' is mentioned on line 51, but not defined

  ** Obsolete undefined reference: RFC 2732 (Obsoleted by RFC 3986)

  == Missing Reference: 'RFC2326' is mentioned on line 70, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2396' is mentioned on line 94, but not defined

  ** Obsolete undefined reference: RFC 2396 (Obsoleted by RFC 3986)

  == Missing Reference: 'RFC2141' is mentioned on line 98, but not defined

  ** Obsolete undefined reference: RFC 2141 (Obsoleted by RFC 8141)

  == Unused Reference: 'RFC 2119' is defined on line 244, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 2141' is defined on line 247, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 2640' is defined on line 262, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 2732' is defined on line 268, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNWG'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IRI'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'ISO10646'

  ** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141)

  ** Obsolete normative reference: RFC 2192 (Obsoleted by RFC 5092)

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2396 (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2718 (Obsoleted by RFC 4395)

  ** Obsolete normative reference: RFC 2732 (Obsoleted by RFC 3986)


     Summary: 16 errors (**), 0 flaws (~~), 11 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   INTERNET-DRAFT                                          Martin Duerst
2   draft-ietf-idn-uri-01                             W3C/Keio University
3   Expires May 2002                                    November 20, 2001

5            Internationalized Domain Names in URIs and IRIs

7   Status of this Memo

9   This document is an Internet-Draft and is in full conformance with all
10  provisions of Section 10 of RFC2026.

12  Internet-Drafts are working documents of the Internet Engineering Task
13  Force (IETF), its areas, and its working groups. Note that other
14  groups may also distribute working documents as Internet-Drafts.

16  Internet-Drafts are draft documents valid for a maximum of six months
17  and may be updated, replaced, or obsoleted by other documents at any
18  time. It is inappropriate to use Internet-Drafts as reference
19  material or to cite them other than as "work in progress."

21  The list of current Internet-Drafts can be accessed at
22  http://www.ietf.org/ietf/1id-abstracts.txt.

24  The list of Internet-Draft Shadow Directories can be accessed at
25  http://www.ietf.org/shadow.html.

27  Abstract

29  This document proposes to upgrade the definitions of URIs [RFC 2396]
30  and IRIs (Internationalized Resource Identifiers, [IRI]) to work
31  consistently with internationalized domain names.

33  0.
    Change Log

35  0.1 Changes from -00 to -01

37  - Changed requirement for URI/IRI resolvers from MUST to SHOULD
38  - Changed IRI syntax slightly (ichar -> idchar, based on changes
39    in [IRI])
40  - Various wording changes

42  1. Introduction

44  Internet domain names serve to identify hosts and services on the
45  Internet in a convenient way. The IETF IDN working group is currently
46  working on extending the character repertoire usable in domain names
47  beyond a subset of US-ASCII.

49  Among the most important places where domain names appear are
50  Uniform Resource Identifiers (URIs, [RFC 2396], as modified by
51  [RFC2732]). However, in the current definition of the generic URI
52  syntax, the restrictions on domain names are 'hard-coded'. In
53  Section 2, this document relaxes these restrictions by updating
54  the syntax, and defines how internationalized domain names are
55  encoded in URIs.

57  URIs are restricted to a subset of US-ASCII. However, IRIs
58  (Internationalized Resource Identifiers, [IRI]) in general allow
59  non-ASCII characters. But the syntax of IRIs has the same 'hard-coded'
60  restrictions on domain names as the syntax of URIs. In Section 3,
61  this document relaxes these restrictions by updating the IRI syntax.
62  This is done in a way that is compatible with the new syntax for URIs.
63  This means that encoding an internationalized domain name in a URI
64  and encoding the same domain name in an IRI will produce a URI and an
65  IRI that can be converted into each other using the procedures defined
66  in [IRI] for these conversions.

68  2. URI syntax changes

70  The syntax of URIs [RFC2396] currently contains the following rules
71  relevant to domain names:

73     hostname      = *( domainlabel "." ) toplabel [ "."
                                                         ]
74     domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
75     toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

77  The latter two rules are changed as follows:

79     domainlabel   = escalphanum | escalphanum *( escalphanum | "-" )
80                     escalphanum
81     toplabel      = escalpha | escalpha *( escalphanum | "-" )
82                     escalphanum

84  and the following rules are added:

86     escalphanum   = escaped8 | alphanum
87     escalpha      = escaped8 | alpha
88     escaped8      = "%" hexdig8 HEXDIG
89     hexdig8       = << any HEXDIG with a value of 8 or above >>

91  The %HH escaping is used to encode characters outside the repertoire
92  of US-ASCII. This is done by first encoding the characters in UTF-8
93  [RFC 2279], resulting in a sequence of octets, and then escaping these
94  octets according to the rules defined in [RFC2396].

96  Using UTF-8 ensures that this encoding interoperates with IRIs (see
97  Section 3). It is also aligned with the recommendations in [RFC 2277]
98  and [RFC 2718], and is consistent with the URN syntax [RFC2141] as
99  well as recent URL scheme definitions that define encodings of
100 non-ASCII characters based on UTF-8 (e.g., IMAP URLs [RFC 2192] and
101 POP URLs [RFC 2384]).

103 Please note that the use of UTF-8 for encoding internationalized
104 domain names in URIs is independent of the encoding chosen
105 for these names in the DNS protocol. Depending on the choice of
106 encoding for the DNS protocol, an appropriate conversion is necessary.

108 The above syntax rules do not extend the possible domain names based
109 on US-ASCII characters. This is in accordance with the current direction
110 of the IDN WG [IDNWG].

112 The above rules also do not allow escaping of US-ASCII characters,
113 although this is allowed in the other parts of a URI (except for the
114 special provisions in case of reserved characters).
    Allowing such
115 escaping would make the syntax rules quite a bit more complicated,
116 would mean that the restrictions on US-ASCII characters can be
117 circumvented by using escaping, or would lead to much simpler syntax
118 rules that don't express these restrictions anymore.

120 Whether escaping of US-ASCII characters is allowed or not, two things
121 should be noted: 1) It is always better not to escape US-ASCII characters
122 in domain names because of the possibility that a resolver does not unescape
123 them. At least purely US-ASCII domain names would then always be resolved
124 by such a processor. 2) Because of the principle of syntax uniformity for
125 URIs, it is always more prudent to take into account the possibility that
126 US-ASCII characters are escaped.

128 Only the restrictions on US-ASCII characters are expressed in the
129 rules above. However, all the other restrictions on internationalized
130 domain names that are defined by the IDN WG [IDNWG] MUST be respected.

132 The work of the IDN WG currently includes some procedures for name
133 preparation. Before encoding an internationalized domain name in a
134 URI, this preparation step SHOULD be applied. However, the URI resolver
135 SHOULD also apply name preparation.

137 3. IRI syntax changes

139 The syntax of IRIs [IRI] currently contains the following rules
140 relevant to domain names:

142    hostname      = *( domainlabel "." ) toplabel [ "."
                                                         ]
143    domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
144    toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

146 The latter two rules are changed as follows:

148    domainlabel   = intalphanum | intalphanum *( intalphanum | "-" )
149                    intalphanum
150    toplabel      = intalpha | intalpha *( intalphanum | "-" )
151                    intalphanum

153 and the following rules are added:

155    intalphanum   = idchar | alphanum | escaped8
156    intalpha      = idchar | alpha | escaped8
157    escaped8      = "%" hexdig8 HEXDIG
158    hexdig8       = << any HEXDIG with a value of 8 or above >>
159    idchar        = << any character of the UCS [ISO10646] of U+00A0
160                       and beyond, subject to limitations in Section
161                       3.1. of [IRI] >>

163 With respect to the allowed domain names based on US-ASCII characters,
164 the same considerations as in Section 2 apply.

166 As in Section 2, all the other restrictions on internationalized
167 domain names that will be defined by the IDN WG MUST be respected.
168 Also, before encoding an internationalized domain name in an IRI,
169 name preparation SHOULD be applied. However, the IRI resolver SHOULD
170 also apply name preparation.

172 It is expected that the rules in Section 3.1 of [IRI] will be less
173 restrictive than the rules for internationalized domain names, so that
174 no escaping is necessary. Nevertheless, escaping is allowed for cases
175 where not all characters can be directly represented.

177 4. Security Considerations

179 The security considerations of [RFC 2396] and [IRI] and those applying
180 to internationalized domain names apply. There may be an increased
181 potential to smuggle escaped US-ASCII-based domain names across
182 firewalls, although because of the uniform syntax principle for
183 URIs, such a potential already exists.

185 Acknowledgements

187 Looking forward to comments. Will acknowledge them here!

189 Copyright

191 Copyright (C) The Internet Society, 1997. All Rights Reserved.

193 This document and translations of it may be copied and furnished to
194 others, and derivative works that comment on or otherwise explain it
195 or assist in its implementation may be prepared, copied, published
196 and distributed, in whole or in part, without restriction of any
197 kind, provided that the above copyright notice and this paragraph
198 are included on all such copies and derivative works. However, this
199 document itself may not be modified in any way, such as by removing
200 the copyright notice or references to the Internet Society or other
201 Internet organizations, except as needed for the purpose of
202 developing Internet standards in which case the procedures for
203 copyrights defined in the Internet Standards process must be
204 followed, or as required to translate it into languages other
205 than English.

207 The limited permissions granted above are perpetual and will not be
208 revoked by the Internet Society or its successors or assigns.

210 This document and the information contained herein is provided on an
211 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
212 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
213 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
214 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
215 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

217 Author's address

219    Martin J. Duerst
220    W3C/Keio University
221    5322 Endo, Fujisawa
222    252-8520 Japan
223    duerst@w3.org
224    http://www.w3.org/People/D%C3%BCrst/
225    Tel/Fax: +81 466 49 1170

227    Note: Please write "Duerst" with u-umlaut wherever
228          possible, e.g. as "Dürst" in XML and HTML.

230 References

232 [IDNWG]    IETF Internationalized Domain Name (idn) Working Group.
233            Information at http://www.ietf.org/html.charters/idn-charter.html.

235 [IRI]      L. Masinter, M. Duerst, "Internationalized Resource Identifiers
236            (IRI)", Internet Draft, November 2001,
237            ,
238            work in progress.

240 [ISO10646] ISO/IEC, Information Technology - Universal Multiple-Octet
241            Coded Character Set (UCS) - Part 1: Architecture and Basic
242            Multilingual Plane, Oct. 2000, with amendments.

244 [RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate
245            Requirement Levels", March 1997.

247 [RFC 2141] R. Moats, "URN Syntax", May 1997.

249 [RFC 2192] C. Newman, "IMAP URL Scheme", September 1997.

251 [RFC 2277] H. Alvestrand, "IETF Policy on Character Sets and
252            Languages", January 1998.

254 [RFC 2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
255            January 1998.

257 [RFC 2384] R. Gellens, "POP URL Scheme", August 1998.

259 [RFC 2396] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource
260            Identifiers (URI): Generic Syntax", August 1998.

262 [RFC 2640] B. Curtis, "Internationalization of the File Transfer
263            Protocol", July 1999.

265 [RFC 2718] L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
266            "Guidelines for new URL Schemes", November 1999.

268 [RFC 2732] R. Hinden, B. Carpenter, L. Masinter, "Format for Literal
269            IPv6 Addresses in URL's", December 1999.
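
The encoding described in Section 2 (UTF-8 first, then %HH escaping of
the resulting octets, with US-ASCII characters left unescaped) can be
illustrated with a short Python sketch. It is not part of the draft.
The function names escape_hostname, unescape_hostname and
is_uri_hostname are hypothetical, the sketch assumes that hexdig8
covers the hex digits "8" through "F", and it performs no name
preparation, which a real implementation SHOULD apply first.

```python
import re

# Assumption: hexdig8 is a hex digit "8".."F", so escaped8 can only express
# octets 0x80-0xFF, i.e. the octets UTF-8 uses for non-ASCII characters.
_ESCAPED8 = r"%[89A-Fa-f][0-9A-Fa-f]"
_ESCAPED8_RUN = re.compile(rf"(?:{_ESCAPED8})+")

# URI-side rules from Section 2 (escalphanum, escalpha, domainlabel, toplabel).
_ESCALPHANUM = rf"(?:{_ESCAPED8}|[A-Za-z0-9])"
_ESCALPHA = rf"(?:{_ESCAPED8}|[A-Za-z])"
_DOMAINLABEL = rf"{_ESCALPHANUM}(?:(?:{_ESCALPHANUM}|-)*{_ESCALPHANUM})?"
_TOPLABEL = rf"{_ESCALPHA}(?:(?:{_ESCALPHANUM}|-)*{_ESCALPHANUM})?"
_HOSTNAME = re.compile(rf"(?:{_DOMAINLABEL}\.)*{_TOPLABEL}\.?")


def escape_hostname(hostname):
    """IRI form -> URI form: escape every non-ASCII character as UTF-8 octets."""
    out = []
    for ch in hostname:
        if ord(ch) < 0x80:
            out.append(ch)  # US-ASCII characters are never escaped
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def unescape_hostname(escaped):
    """URI form -> IRI form: decode runs of escaped8 as UTF-8.

    Escapes of US-ASCII octets (such as %41) are left untouched, and
    UnicodeDecodeError is raised if a run is not valid UTF-8.
    """
    def decode_run(match):
        octets = bytes.fromhex(match.group(0).replace("%", ""))
        return octets.decode("utf-8")

    return _ESCAPED8_RUN.sub(decode_run, escaped)


def is_uri_hostname(candidate):
    """Check a string against the modified URI hostname syntax only."""
    return _HOSTNAME.fullmatch(candidate) is not None


print(escape_hostname("dürst.example"))         # d%C3%BCrst.example
print(unescape_hostname("d%C3%BCrst.example"))  # dürst.example
print(is_uri_hostname("%41bc.example"))         # False
```

The syntax check covers only the restrictions on US-ASCII characters
that the rules express. The further restrictions defined by the IDN WG
are outside the scope of this sketch.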