idnits 2.17.1

draft-ietf-idn-udns-03.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

** Looks like you're using RFC 2026 boilerplate. This must be updated to
   follow RFC 3978/3979, as updated by RFC 4748.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

== No 'Intended status' indicated for this document; assuming Proposed
   Standard

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

** The document seems to lack an IANA Considerations section. (See Section
   2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
   when there are no actions for IANA.)

** The document seems to lack separate sections for Informative/Normative
   References. All references will be assumed normative when checking for
   downward references.

** The abstract seems to contain references ([RFC1035], [ISO10646]), which
   it shouldn't. Please replace those with straight textual mentions of the
   documents in question.

== The 'Updates: ' line in the draft header should list only the _numbers_
   of the RFCs which will be updated by this document (if approved); it
   should not include the word 'RFC' in the list.

-- The draft header indicates that this document updates RFC19, but the
   abstract doesn't seem to mention this, which it should.

-- The draft header indicates that this document updates RFC2181, but the
   abstract doesn't seem to mention this, which it should.

-- The draft header indicates that this document updates RFC1034, but the
   abstract doesn't seem to mention this, which it should.

-- The draft header indicates that this document updates RFC1035, but the
   abstract doesn't seem to directly say this. It does mention RFC1035
   though, so this could be OK.
-- The draft header indicates that this document updates RFC2535, but the
   abstract doesn't seem to mention this, which it should.

Miscellaneous warnings:
----------------------------------------------------------------------------

== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
   or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
   use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
   mean).

   Found 'MUST not' in this paragraph:

   As long labels are not understood by older software, a response MUST not
   include a long label unless the query did. At a later date, IETF may
   change this.

-- No information found for rfc19 - is the name correct?

   (Using the creation date from RFC1034, updated by this document, for
   RFC5378 checks: 1987-11-01)

-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
   have content which was first submitted before 10 November 2008. If you
   have contacted all the original authors and they are all willing to grant
   the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
   this comment. If not, you may need to add the pre-RFC5378 disclaimer.
   (See the Legal Provisions document at
   https://trustee.ietf.org/license-info for more information.)

-- The document date (19 August 2001) is 8279 days in the past. Is this
   intentional?

-- Found something which looks like a code comment -- if you have code
   sections in the document, please surround them with '<CODE BEGINS>' and
   '<CODE ENDS>' lines.
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)

== Unused Reference: 'RFC1034' is defined on line 348, but no explicit
   reference was found in the text

== Unused Reference: 'RFC2181' is defined on line 357, but no explicit
   reference was found in the text

== Unused Reference: 'Unicode' is defined on line 373, but no explicit
   reference was found in the text

== Unused Reference: 'UTR21' is defined on line 382, but no explicit
   reference was found in the text

== Unused Reference: 'IANADNS' is defined on line 394, but no explicit
   reference was found in the text

== Unused Reference: 'IDNE' is defined on line 397, but no explicit
   reference was found in the text

== Unused Reference: 'IDNCOMP' is defined on line 403, but no explicit
   reference was found in the text

** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034,
   RFC 4035)

** Obsolete normative reference: RFC 2671 (Obsoleted by RFC 6891)

-- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

-- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

-- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21'

-- Possible downref: Non-RFC (?) normative reference: ref. 'UDATA'

-- No information found for draft-ietf-idn-requirement - is the name
   correct?

-- Possible downref: Normative reference to a draft: ref. 'IDNREQ'

-- Possible downref: Normative reference to a draft: ref. 'IDNE'

-- Possible downref: Normative reference to a draft: ref. 'CHNORM'

-- Possible downref: Normative reference to a draft: ref. 'IDNCOMP'

-- Duplicate reference: draft-ietf-idn-compare, mentioned in 'NAMEPREP',
   was also mentioned in 'IDNCOMP'.
-- Possible downref: Normative reference to a draft: ref. 'NAMEPREP'

-- Possible downref: Normative reference to a draft: ref. 'SACE'

-- Possible downref: Normative reference to a draft: ref. 'RACE'

Summary: 7 errors (**), 0 flaws (~~), 10 warnings (==), 23 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

1 Internet Draft                                            Dan Oscarsson
2 draft-ietf-idn-udns-03.txt                                Telia ProSoft
3 Updates: RFC 2181, 1035, 1034, 2535                      19 August 2001
4 Expires: 19 February 2002

6 Using the Universal Character Set in the Domain Name System (UDNS)

8 Status of this memo

10 This document is an Internet-Draft and is in full conformance with
11 all provisions of Section 10 of RFC2026.

13 Internet-Drafts are working documents of the Internet Engineering
14 Task Force (IETF), its areas, and its working groups. Note that other
15 groups may also distribute working documents as Internet-Drafts.

17 Internet-Drafts are draft documents valid for a maximum of six months
18 and may be updated, replaced, or obsoleted by other documents at any
19 time. It is inappropriate to use Internet-Drafts as reference
20 material or to cite them other than as "work in progress."

22 The list of current Internet-Drafts can be accessed at
23 http://www.ietf.org/ietf/1id-abstracts.txt

25 The list of Internet-Draft Shadow Directories can be accessed at
26 http://www.ietf.org/shadow.html.

28 Abstract

30 Since the Domain Name System (DNS) [RFC1035] was created there has
31 been a desire to use characters other than ASCII in domain names.
32 Lately this desire has grown very strong and several groups have
33 started to experiment with non-ASCII names. This document defines
34 how the Universal Character Set (UCS) [ISO10646] is to be used in
35 DNS.
It includes both a transition scheme for older software
36 supporting non-ASCII handling in applications only, and a way to
37 use UCS in labels and to have more than 63 octets in a label.

39 1. Introduction

41 While the need for non-ASCII domain names has existed since the
42 creation of the DNS, the need has increased considerably during the
43 last few years. Currently at least two implementations using
44 UTF-8 are in use, and others use other methods.

46 To avoid several different implementations of non-ASCII names in DNS
47 that do not work together, and to avoid breaking the current ASCII
48 only DNS, there is an immediate need to standardise how DNS shall
49 handle non-ASCII names.

51 While the DNS protocol allows any octet in character data, so far the
52 octets are only defined for the ASCII code points. Octets outside the
53 ASCII range have no defined interpretation. This document defines how
54 all octets are to be used in character data, allowing a standardised
55 way to use non-ASCII in DNS.

57 The specification here conforms to the IDN requirements [IDNREQ].

59 1.1 Terminology

61 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
62 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
63 document are to be interpreted as described in [RFC2119].

65 IDN: Internationalised Domain Name, here used to mean a domain name
66 containing non-ASCII characters.

68 ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
69 compatible with the ASCII host name syntax.

71 1.2 Previous versions of this document

73 This version contains just minor corrections to the fourth version.

75 The third version of this document included a way to return both
76 ASCII and non-ASCII versions of a name. As this could not be
77 guaranteed to work it has been removed.

79 The second version of this document was available as draft-ietf-idn-
80 udns-00.txt. It included a lot of possibilities as well as a flag bit
81 that is now removed.
83 The first version of this document was available as draft-oscarsson-
84 i18ndns-00.txt.

86 2. The DNS Protocol

88 The DNS protocol is used when communicating between DNS servers and
89 other DNS servers or DNS clients. User interface issues like the
90 format of zone files or how to enter or display domain names are not
91 part of the protocol.

93 The update of the protocol defined here can be used immediately as it
94 is fully compatible with the DNS of today.

96 For a long time there will be software understanding UCS in DNS and
97 software only understanding ASCII in DNS. It is therefore necessary
98 to support a mix of both types. In the following text, software
99 understanding UCS in DNS is called UDNS aware.

101 This specification supports the following scenarios:

103 - UDNS unaware client, UDNS aware DNS server
104 - UDNS aware client, UDNS unaware DNS server
105 - UDNS aware client, UDNS aware DNS server

107 2.1 Fundamentals

109 2.1.1 Standard Character Encoding (SCE)

111 Character data needs to be able to represent as many as possible of
112 the characters in the world while remaining compatible with ASCII.
113 Character data is used in labels and in text fields in the RDATA part
114 of an RR.

116 The Standard Character Encoding of character data used in the DNS
117 protocol MUST:
118 - Use ISO 10646 (UCS) [ISO10646] as coded character set.
119 - Be normalised using form C as defined in Unicode technical report
120   #15 [UTR15]. See also [CHNORM].
121 - Be encoded using the UTF-8 [RFC2279] character encoding scheme.

123 2.1.2 Binary Comparison Format (BCF)

125 RFC 1035 states that the labels of a name are matched case-
126 insensitively. When using UCS this is no longer enough as there are
127 forms other than case that need to match as equivalent. Form-
128 insensitive matching of UCS includes:
129 - Letters of different case are compared as the same character.
130 - Code points of primary typographical variations of the same
131   character are compared as the same character. Examples are double
132   width/normal width characters or presentation forms of a
133   character.
134 - Some characters are represented with multiple code points in UCS.
135   All code points of one character must compare as the same. For
136   example the Kelvin sign is the same as the letter K.

138 The original definition is now extended to be: labels must be
139 compared form-insensitively.

141 To handle form-insensitivity, the Binary Comparison Format (BCF) is
142 defined here, to which strings can be mapped. After strings are mapped
143 to BCF they can be compared using binary string comparison.
144 Implementors may implement the form-insensitive comparison without
145 using BCF, as long as the results are the same.

147 Mapping of a label to BCF is typically done by steps like: changing
148 all upper case letters to lower case, mapping different forms to one
149 form and changing different code points of one character into a
150 single code point.

152 For the UCS character code range 0-255 (ASCII and ISO 8859-1) the BCF
153 MUST be done by mapping all upper case characters to lower case
154 following the one to one mapping as defined in the Unicode 3.0
155 Character Database [UDATA].

157 The Binary Comparison Format (BCF) for the rest of UCS will be
158 defined in a separate document. The nearest definition today is
159 [NAMEPREP].

161 2.1.3 Backward Compatibility Encoding (BCE)

163 To support older software expecting only ASCII and to support
164 downgrading from 8-bit to 7-bit ASCII in other protocols (like SMTP)
165 a Backward Compatibility Encoding (BCE) is available. It is a
166 transition mechanism, and support for it will end at a future time
167 yet to be decided.

169 The Backward Compatibility Encoding (BCE) of a label is defined as
170 the BCF of the label encoded using an ASCII Compatible Encoding
171 (ACE).
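The BCF rule for the code range 0-255 in section 2.1.2 can be sketched as
follows. This is only an illustration under stated assumptions: the names
bcf_latin1 and labels_match are hypothetical, Python's built-in case mapping
stands in for the Unicode 3.0 Character Database (the two should agree for
code points 0-255), and code points above 255 are left untouched because
their BCF is not defined in this document.

```python
def bcf_latin1(label: str) -> str:
    """Map a label to a BCF-like form for code points 0-255 only.

    Hypothetical helper: upper case characters in the range 0-255 are
    mapped one to one to lower case; everything else is left as is.
    """
    mapped = []
    for ch in label:
        if ord(ch) <= 0xFF:
            lower = ch.lower()
            # Accept only one-to-one mappings that stay inside the range.
            if len(lower) == 1 and ord(lower) <= 0xFF:
                ch = lower
        mapped.append(ch)
    return "".join(mapped)


def labels_match(a: str, b: str) -> bool:
    """Compare two labels by binary comparison of their mapped forms."""
    return bcf_latin1(a).encode("utf-8") == bcf_latin1(b).encode("utf-8")
```

With this sketch, labels_match("MALM\u00d6", "malm\u00f6") holds, while
"malmo" and "malm\u00f6" remain different labels. A BCE would then be the
result of applying an ACE to the mapped form; the ACE itself is not shown
because this document leaves its definition to a separate document.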
173 The ACE to be used is defined in a separate
174 document. Typical definitions that are suitable are [SACE] and
175 [RACE].

177 The BCF of the label is used to support
178 solutions where only applications know about non-ASCII labels. By
179 using BCF the server need not know about UCS and can just do binary
180 matching, so it can be handled in old servers. However, because
181 BCF destroys information contained in the original form of a
182 label, it is impossible to return the original form to a client using
183 BCE.

185 2.1.4 Long names

187 The current DNS protocol limits a label to 63 octets. As UTF-8 takes
188 more than one octet for some characters, a UTF-8 name cannot have 63
189 characters in a label like an ASCII name can. For example a name
190 using Hangul would have a maximum of 21 characters.

192 The limits imposed by RFC 1035 are 63 octets per label and 255 octets
193 for the full name. The 255 limit is not a protocol limit but one
194 intended to simplify implementations.

196 To support longer names a long label type is defined using [RFC2671]
197 as extended label 0b000011 (the label type will be assigned by IANA
198 and may not be the number used here).

200                      1 1 1 1 1 1 1 1 1 1
201  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
202 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
203 |0 1 0 0 0 0 1 1|     length    | label data ...
204 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

206 length: length of label in octets
207 label data: the label

209 The long label MUST be handled by all software following this
210 specification. Such software MUST also support a UDP packet size of
211 up to 1280 bytes.

213 The limits for labels are updated since RFC 1035 as follows:
214 A label is limited to a maximum of 63 character code points in UCS
215 normalised using Unicode form C. The full name is limited to a
216 maximum of 255 character code points normalised as for a label.
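The long label layout in section 2.1.4 can be illustrated with a short
encoder. This is a minimal sketch, not a definitive implementation: the
function and constant names are hypothetical, the first octet 0x43 uses the
provisional label type shown in the figure (the text notes that IANA may
assign a different number), and the length field is assumed to be a single
octet, as the figure suggests.

```python
import unicodedata

# Provisional value from the figure: extended label prefix 0b01 followed
# by label type 0b000011. The final number is left to IANA.
LONG_LABEL_TYPE = 0b01000011
MAX_LABEL_CODE_POINTS = 63


def encode_long_label(label: str) -> bytes:
    """Encode one label as: type octet, length octet, UTF-8 label data."""
    # Normalise to form C first; the 63 limit counts normalised code points.
    normalised = unicodedata.normalize("NFC", label)
    if len(normalised) > MAX_LABEL_CODE_POINTS:
        raise ValueError("label exceeds 63 code points")
    data = normalised.encode("utf-8")
    # Assumption: the length field is one octet, so it cannot exceed 255.
    if len(data) > 0xFF:
        raise ValueError("label data does not fit a one-octet length")
    return bytes([LONG_LABEL_TYPE, len(data)]) + data
```

A label of 22 Hangul syllables takes 66 octets in UTF-8, so it cannot be
sent as a classic 63-octet label, but it fits here because the limit is
counted in code points rather than octets.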
218 A long label MUST always use the Standard Character Encoding (SCE).

220 As long labels are not understood by older software, a response MUST
221 NOT include a long label unless the query did. At a later date, IETF
222 may change this.

224 2.2 Rules for matching of domain names in UDNS aware DNS servers

226 To match domain names correctly in lookups, the following rules
227 MUST be followed by DNS servers:
228 - Do matching on authoritative data using form-insensitive matching
229   for the characters used in the data (for example a zone using only
230   ASCII need only handle matching of ASCII characters).
231 - On non-authoritative data, either do binary matching or case-
232   insensitive matching on ASCII letters and binary matching on all
233   others.

235 The effect of the above is:

237 - Only servers handling authoritative data must implement form-
238   insensitive matching of names, and they need only implement the
239   subset needed for the subset of UCS characters they support in
240   their authoritative zones.
241 - It normally gives fast lookup because data is usually sent like:
242   resolver <-> server <-> authoritative server.
243   While form-insensitive matching can be complex and CPU consuming,
244   the server in the middle will do caching with only simple and fast
245   binary matching. So complex matching rules should not slow down
246   DNS very much.

248 2.3 Mixing of UDNS aware and non-UDNS aware clients and servers

250 To handle the mixing of UDNS aware and non-UDNS aware clients and
251 servers, the following MUST be followed by clients and servers.

253 2.3.1 Native UDNS aware client

255 A native UDNS aware client is a client supporting everything in this
256 document.

258 When doing a query it MUST:
259 - Use the long label in the QNAME.
260 - If the server rejected the query due to the long label, retry the
261   query using the normal short label. If the QNAME contains non-ASCII
262   it must be encoded using BCE.
263 - Handle answers containing BCE.
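The query steps for a native UDNS aware client can be sketched as below.
Everything named here is an assumption made for illustration: send_long,
send_short and bce_encode are hypothetical callables supplied by the
caller, a rejected long label query is modelled as a None result, and the
BCE encoder is expected to apply BCF and the ACE itself. Decoding BCE in
answers is left out of the sketch.

```python
from typing import Callable, List, Optional

Answer = List[str]  # hypothetical stand-in for a parsed DNS answer


def query_native(
    labels: List[str],
    send_long: Callable[[List[str]], Optional[Answer]],
    send_short: Callable[[List[str]], Answer],
    bce_encode: Callable[[str], str],
    server_known_unaware: bool = False,
) -> Answer:
    """Try the long label form first, then fall back to short labels."""
    if not server_known_unaware:
        # Step 1: use the long label in the QNAME.
        answer = send_long(labels)
        if answer is not None:
            return answer
    # Step 2: retry with normal short labels; any label containing
    # non-ASCII is sent in its BCE form, ASCII labels are sent unchanged.
    short_labels = [
        label if label.isascii() else bce_encode(label) for label in labels
    ]
    return send_short(short_labels)
```

The server_known_unaware flag lets a caller go straight to the short label
query when it already knows the server does not understand long labels.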
265 The client may skip trying a query using the long label if it knows
266 the server does not understand it.

268 2.3.2 Application based UDNS aware client

270 An application based UDNS aware client is a client supporting UDNS
271 through BCE handling in the application.

273 It only understands BCE and needs only a non-UDNS aware resolver to
274 work. All encoding and decoding of BCE is handled in the
275 application.

277 Due to BCE being an ACE of BCF the names returned in an answer need
278 not contain the real form of the name. Instead it may contain the
279 simplified form used in name matching. As this is a transition
280 mechanism to support non-ASCII in names before the DNS servers have
281 been upgraded, it is acceptable and will give people a reason to
282 upgrade.

284 2.3.3 non-UDNS aware client
285 A non-UDNS aware client will send ASCII or whatever is sent from an
286 application. It can be BCE which will for the client just be ASCII
287 text.

289 2.3.4 UDNS aware server

291 A UDNS aware server MUST handle everything in this document and follow:
292 - If an incoming query contains a long label the answer may contain
293   a long label and the client is identified as being UDNS aware.
294 - If the query comes from a non-UDNS aware client and the answer
295   contains non-ASCII, the non-ASCII labels must be encoded using
296   BCE.
297 - If a short label is used in a query and the QNAME contains non-
298   ASCII, an authoritative server must handle the query if the
299   character encoding can be recognised. It must recognise SCE and
300   should recognise common encodings used for the labels in the
301   domain it is authoritative for. Answers will use BCE for all labels
302   except the one matching QNAME. This will allow clients using the
303   local character set to work in many cases before the resolver code
304   is upgraded.

306 2.3.5 non-UDNS aware server

308 A non-UDNS server can only handle ASCII matching when comparing
309 names.
It can support the transition mechanism with BCE. The
310 authoritative zones will then have to be loaded with manually BCE
311 encoded names.

313 2.4 DNSSEC

315 As labels now can have non-ASCII in them, DNSSEC [RFC2535] needs to be
316 revised so that it also can handle that.

318 3. Effect on other protocols

320 As a domain name may now include non-ASCII, many other protocols that
321 include domain names need to be updated, for example SMTP, HTTP and
322 URIs. The BCE format can be used when interfacing with ASCII only
323 software or protocols. Protocols like SMTP could be extended using
324 ESMTP and a UTF8 option that defines that all headers are in UTF-8.

326 It is recommended that protocols updated to handle i18n do this by
327 encoding character data in the same standard format as defined for
328 DNS in this document (UCS normalised form C). Encoding it
329 in ASCII or with tagged character sets should be avoided.

331 DNS data does not contain only domain names; e-mail
332 addresses, for example, are also included. So an e-mail address would
333 be expected to be changed to include non-ASCII both before and after
   the @-sign.

335 Software needs to be updated to follow the user interface
336 recommendations given above, so that a human will see the characters
337 in their local character set, if possible.

339 4. Security Considerations

341 As always with data, if software does not check for data that can be
342 a problem, security may be affected. As more characters than ASCII are
343 allowed, software only expecting ASCII and with no checks may now get
344 security problems.

346 5. References

348 [RFC1034]  P. Mockapetris, "Domain Names - Concepts and Facilities",
349            STD 13, RFC 1034, November 1987.

351 [RFC1035]  P. Mockapetris, "Domain Names - Implementation and
352            Specification", STD 13, RFC 1035, November 1987.

354 [RFC2119]  Scott Bradner, "Key words for use in RFCs to Indicate
355            Requirement Levels", RFC 2119, March 1997.

357 [RFC2181]  R. Elz and R.
Bush, "Clarifications to the DNS
358            Specification", RFC 2181, July 1997.

360 [RFC2279]  F. Yergeau, "UTF-8, a transformation format of ISO 10646",
361            RFC 2279, January 1998.

363 [RFC2535]  D. Eastlake, "Domain Name System Security Extensions",
364            RFC 2535, March 1999.

366 [RFC2671]  P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
367            2671, August 1999.

369 [ISO10646] ISO/IEC 10646-1:2000. International Standard --
370            Information technology -- Universal Multiple-Octet Coded
371            Character Set (UCS)

373 [Unicode]  The Unicode Consortium, "The Unicode Standard -- Version
374            3.0", ISBN 0-201-61633-5. Described at
375            http://www.unicode.org/unicode/standard/versions/
376            Unicode3.0.html

378 [UTR15]    M. Davis and M. Duerst, "Unicode Normalization Forms",
379            Unicode Technical Report #15, Nov 1999,
380            http://www.unicode.org/unicode/reports/tr15/.

382 [UTR21]    M. Davis, "Case Mappings", Unicode Technical Report #21,
383            Dec 1999, http://www.unicode.org/unicode/reports/tr21/.

385 [UDATA]    The Unicode Character Database,
386            ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
387            The database is described in
388            ftp://ftp.unicode.org/Public/UNIDATA/
389            UnicodeCharacterDatabase.html.

391 [IDNREQ]   James Seng, "Requirements of Internationalized Domain
392            Names", draft-ietf-idn-requirement.

394 [IANADNS]  Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
395            System (DNS) IANA Considerations", draft-ietf-dnsext-iana-dns.

397 [IDNE]     Marc Blanchet, Paul Hoffman, "Internationalized domain
398            names using EDNS (IDNE)", draft-ietf-idn-idne.

400 [CHNORM]   M. Duerst, M. Davis, "Character Normalization in IETF
401            Protocols", draft-duerst-i18n-norm.

403 [IDNCOMP]  Paul Hoffman, "Comparison of Internationalized Domain Name
404            Proposals", draft-ietf-idn-compare.

406 [NAMEPREP] Paul Hoffman, "Comparison of Internationalized Domain Name
407            Proposals", draft-ietf-idn-compare.

409 [SACE]     Dan Oscarsson, "Simple ASCII Compatible Encoding", draft-
410            ietf-idn-sace.
412 [RACE]     Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
413            for IDN", draft-ietf-idn-race.

415 6. Acknowledgements

417 Paul Hoffman, for giving many comments in our e-mail discussions.

419 Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
420 Karlsson.

422 Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper, for
423 comments on my draft.

425 Discussions and comments by the members of the IDN working group.

427 Author's Address

429 Dan Oscarsson
430 Telia ProSoft AB
431 Box 85
432 201 20 Malmo
433 Sweden

435 E-mail: Dan.Oscarsson@trab.se