idnits 2.17.1 

draft-ietf-idn-udns-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([RFC1035], [ISO10646]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == The 'Updates: ' line in the draft header should list only the _numbers_
     of the RFCs which will be updated by this document (if approved); it
     should not include the word 'RFC' in the list.

  -- The draft header indicates that this document updates RFC19, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC2181, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC1034, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC1035, but the
     abstract doesn't seem to directly say this.  It does mention RFC1035
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC2535, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     As long labels are not understood by older software, a response
     MUST not include a long label unless the query did. At a later date, IETF
     may change this.

  -- No information found for rfc19 - is the name correct?

     (Using the creation date from RFC1034, updated by this document, for
     RFC5378 checks: 1987-11-01)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (19 August 2001) is 8279 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC1034' is defined on line 348, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2181' is defined on line 357, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode' is defined on line 373, but no explicit
     reference was found in the text

  == Unused Reference: 'UTR21' is defined on line 382, but no explicit
     reference was found in the text

  == Unused Reference: 'IANADNS' is defined on line 394, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNE' is defined on line 397, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNCOMP' is defined on line 403, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034,
     RFC 4035)

  ** Obsolete normative reference: RFC 2671 (Obsoleted by RFC 6891)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UDATA'

  -- No information found for draft-ietf-idn-requirement - is the name
     correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNREQ' 

  -- Possible downref: Normative reference to a draft: ref. 'IDNE' 

  -- Possible downref: Normative reference to a draft: ref. 'CHNORM' 

  -- Possible downref: Normative reference to a draft: ref. 'IDNCOMP' 

  -- Duplicate reference: draft-ietf-idn-compare, mentioned in 'NAMEPREP',
     was also mentioned in 'IDNCOMP'.

  -- Possible downref: Normative reference to a draft: ref. 'NAMEPREP' 

  -- Possible downref: Normative reference to a draft: ref. 'SACE' 

  -- Possible downref: Normative reference to a draft: ref. 'RACE' 


     Summary: 7 errors (**), 0 flaws (~~), 10 warnings (==), 23 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                            Dan Oscarsson
2	draft-ietf-idn-udns-03.txt                                Telia ProSoft
3	Updates: RFC 2181, 1035, 1034, 2535                       19 August 2001
4	Expires: 19 February 2002

6	   Using the Universal Character Set in the Domain Name System (UDNS)

8	Status of this memo

10	   This document is an Internet-Draft and is in full conformance with
11	   all provisions of Section 10 of RFC2026.

13	   Internet-Drafts are working documents of the Internet Engineering
14	   Task Force (IETF), its areas, and its working groups. Note that other
15	   groups may also distribute working documents as Internet-Drafts.

17	   Internet-Drafts are draft documents valid for a maximum of six months
18	   and may be updated, replaced, or obsoleted by other documents at any
19	   time. It is inappropriate to use Internet-Drafts as reference
20	   material or to cite them other than as "work in progress."

22	     The list of current Internet-Drafts can be accessed at
23	     http://www.ietf.org/ietf/1id-abstracts.txt

25	     The list of Internet-Draft Shadow Directories can be accessed at
26	     http://www.ietf.org/shadow.html.

28	Abstract

30	   Since the Domain Name System (DNS) [RFC1035] was created, there has
31	   been a desire to use characters other than ASCII in domain names.
32	   Lately this desire has grown very strong, and several groups have
33	   started to experiment with non-ASCII names.  This document defines
34	   how the Universal Character Set (UCS) [ISO10646] is to be used in
35	   DNS.  It includes both a transition scheme for older software, with
36	   non-ASCII handling done in applications only, and a way to use UCS
37	   in labels, including labels longer than 63 octets.

39	1. Introduction

41	   While the need for non-ASCII domain names has existed since the
42	   creation of the DNS, it has grown considerably during the last few
43	   years. Currently at least two implementations using UTF-8 are in
44	   use, as well as others using other methods.

46	   To avoid several different implementations of non-ASCII names in DNS
47	   that do not work together, and to avoid breaking the current ASCII
48	   only DNS, there is an immediate need to standardise how DNS shall
49	   handle non-ASCII names.

51	   While the DNS protocol allows any octet in character data, so far
52	   octets have only been defined for the ASCII code points. Octets
53	   outside the ASCII range have no defined interpretation. This document
54	   defines how all octets are to be used in character data, providing a
55	   standardised way to use non-ASCII characters in DNS.

57	   The specification here conforms to the IDN requirements [IDNREQ].

59	1.1 Terminology

61	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
62	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
63	   document are to be interpreted as described in [RFC2119].

65	    IDN: Internationalised Domain Name, here used to mean a domain name
66	      containing non-ASCII characters.

68	    ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
69	      compatible with the ASCII host name syntax.

71	1.2 Previous versions of this document

73	   This version contains only minor corrections to the fourth version.

75	   The third version of this document included a way to return both
76	   ASCII and non-ASCII versions of a name. As this could not be
77	   guaranteed to work, it has been removed.

79	   The second version of this document was available as
80	   draft-ietf-idn-udns-00.txt. It included many options as well as a
81	   flag bit that has since been removed.

83	   The first version of this document was available as draft-oscarsson-
84	   i18ndns-00.txt.

86	2. The DNS Protocol

88	   The DNS protocol is used when communicating between DNS servers and
89	   other DNS servers or DNS clients. User interface issues like the
90	   format of zone files or how to enter or display domain names are not
91	   part of the protocol.

93	   The protocol update defined here can be deployed immediately, as it
94	   is fully compatible with the current DNS.

96	   For a long time there will be software that understands UCS in DNS
97	   and software that understands only ASCII in DNS. It is therefore
98	   necessary to support a mix of both types. In the following text,
99	   software that understands UCS in DNS is called UDNS aware.

101	   This specification supports the following scenarios:

103	    - UDNS unaware client, UDNS aware DNS server
104	    - UDNS aware client, UDNS unaware DNS server
105	    - UDNS aware client, UDNS aware DNS server

107	2.1 Fundamentals

109	2.1.1 Standard Character Encoding (SCE)

111	   Character data needs to be able to represent as many of the world's
112	   characters as possible while remaining compatible with ASCII.
113	   Character data is used in labels and in text fields in the RDATA part
114	   of an RR.

116	   The Standard Character Encoding of character data used in the DNS
117	   protocol MUST:
118	    - Use ISO 10646 (UCS) [ISO10646] as coded character set.
119	    - Be normalised using form C as defined in Unicode technical report
120	      #15 [UTR15]. See also [CHNORM].
121	    - Be encoded using the UTF-8 [RFC2279] character encoding scheme.
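
   A minimal sketch of the SCE steps, assuming Python's standard
   unicodedata module (which implements a newer Unicode version than
   the 3.0 referenced here). The function name to_sce is illustrative,
   not part of this specification.

```python
import unicodedata


def to_sce(label: str) -> bytes:
    """Return a label in Standard Character Encoding (sketch).

    Step 1: normalise to Unicode form C (NFC).
    Step 2: encode the result as UTF-8.
    """
    return unicodedata.normalize("NFC", label).encode("utf-8")


# "e" + COMBINING ACUTE ACCENT (U+0301) composes to U+00E9 under NFC,
# so both spellings of "cafe" with an acute accent give the same octets.
decomposed = "caf" + "e\u0301"
precomposed = "caf\u00e9"
assert to_sce(decomposed) == to_sce(precomposed)
print(to_sce(decomposed))  # b'caf\xc3\xa9'
```

   Both spellings end up as identical octets, which is why normalisation
   is part of the SCE and not left to each implementation.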

123	2.1.2 Binary Comparison Format (BCF)

125	   RFC 1035 states that the labels of a name are matched
126	   case-insensitively.  When using UCS this is no longer enough, as
127	   there are forms other than case that need to match as equivalent.
128	   Form-insensitive matching of UCS includes:
129	    - Letters of different case are compared as the same character.
130	    - Code points of primary typographical variations of the same
131	      character are compared as the same character. Examples are
132	      double-width and normal-width forms of a character, or its
133	      presentation forms.
134	    - Some characters are represented with multiple code points in UCS.
135	      All code points of one character must compare as the same.  For
136	      example, the Kelvin sign (U+212A) is the same as the letter K.

138	   The original rule is therefore extended: labels must be compared
139	   using form-insensitive matching.

141	   To handle form-insensitivity, this document defines the Binary
142	   Comparison Format (BCF), to which strings can be mapped.  Once
143	   strings are mapped to BCF, they can be compared using binary string
144	   comparison.  Implementors may implement form-insensitive comparison
145	   without using BCF, as long as the results are the same.

147	   Mapping a label to BCF typically involves steps such as changing all
148	   upper case letters to lower case, mapping different forms to one
149	   form, and changing the different code points of one character into
150	   a single code point.
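
   The mapping steps can be sketched as follows. This is an
   approximation only, assuming NFKC normalisation plus Python's
   str.casefold() as a stand-in for the BCF that is defined elsewhere;
   the names to_bcf_approx and labels_match are hypothetical. It does
   not follow the rule for the range 0-255 given below exactly, since
   NFKC and case folding also change some characters there (for
   example, casefold() maps U+00DF to "ss").

```python
import unicodedata


def to_bcf_approx(label: str) -> bytes:
    """Approximate BCF for illustration; NOT the normative mapping.

    NFKC folds compatibility variants (full-width letters, the Kelvin
    sign, presentation forms), casefold() removes case differences, and
    a second NFKC pass re-normalises anything case folding produced.
    """
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded).encode("utf-8")


def labels_match(a: str, b: str) -> bool:
    # After mapping to (approximate) BCF, plain binary comparison suffices.
    return to_bcf_approx(a) == to_bcf_approx(b)


print(labels_match("Example", "example"))     # True: case only
print(labels_match("\uff21bc", "abc"))        # True: full-width A
print(labels_match("\u212aelvin", "kelvin"))  # True: Kelvin sign vs K
print(labels_match("example", "exampel"))     # False
```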

152	   For the UCS code point range 0-255 (ASCII and ISO 8859-1), the BCF
153	   MUST be produced by mapping all upper case characters to lower case,
154	   following the one-to-one mapping defined in the Unicode 3.0
155	   Character Database [UDATA].
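
   The rule for the range 0-255 is small enough to write out in full.
   Within that range, the one-to-one upper to lower case mappings cover
   A-Z (U+0041-U+005A) and U+00C0-U+00DE except U+00D7 MULTIPLICATION
   SIGN, and each maps to the code point 0x20 higher. Code points above
   255 are left unchanged in this sketch, because their BCF is left to
   a separate document; the function name bcf_latin1 is hypothetical.

```python
def bcf_latin1(label: str) -> str:
    """Map upper case letters in U+0000..U+00FF to lower case.

    Only A-Z and U+00C0..U+00DE (excluding U+00D7) have upper-to-lower
    mappings in this range; each maps to the code point 0x20 higher.
    All other code points, including those above U+00FF, are unchanged.
    """
    out = []
    for ch in label:
        cp = ord(ch)
        if 0x41 <= cp <= 0x5A or (0xC0 <= cp <= 0xDE and cp != 0xD7):
            out.append(chr(cp + 0x20))
        else:
            out.append(ch)
    return "".join(out)


print(bcf_latin1("MALM\u00d6.SE"))  # malmö.se
print(bcf_latin1("STRA\u00dfE"))    # straße: U+00DF has no upper case form here
```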

157	   The Binary Comparison Format (BCF) for the rest of UCS will be
158	   defined in a separate document. The closest existing proposal is
159	   [NAMEPREP].

161	2.1.3 Backward Compatibility Encoding (BCE)

163	   To support older software expecting only ASCII, and to support
164	   downgrading from 8-bit to 7-bit ASCII in other protocols such as
165	   SMTP, a Backward Compatibility Encoding (BCE) is available. It is a
166	   transition mechanism and will cease to be supported at some future
167	   time, when so decided.

169	   The Backward Compatibility Encoding (BCE) of a label is defined as
170	   the BCF of the label encoded using an ASCII Compatible Encoding
171	   (ACE).

173	   The ACE to be used is defined in a separate document.  Examples of
174	   suitable definitions are [SACE] and [RACE].

177	   The BCF form of the label is used in order to support solutions
178	   where only applications know about non-ASCII labels. With BCF, the
179	   server need not know about UCS and can simply do binary matching,
180	   so BCE names can be handled by old servers. However, because BCF
181	   destroys information contained in the original form of a label, it
182	   is impossible to return the original form to a client that uses
183	   BCE.
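
   To illustrate why the original form is lost, the sketch below builds
   BCE as an ACE of an approximate BCF. The ACE is a stand-in: it uses
   Python's built-in "punycode" codec (RFC 3492) and the "xn--" prefix
   from the later IDNA standard, not SACE or RACE. BCF is approximated
   with NFKC plus case folding, which is not the normative mapping, and
   passing all-ASCII labels through unchanged is an assumption.

```python
import unicodedata

# Hypothetical stand-in prefix; this specification leaves the ACE
# (and therefore its prefix) to a separate document.
ACE_PREFIX = "xn--"


def to_bcf_approx(label: str) -> str:
    # Approximate BCF: NFKC, case folding, then NFKC again.
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


def to_bce(label: str) -> str:
    bcf = to_bcf_approx(label)
    if bcf.isascii():
        return bcf  # assumption: all-ASCII labels need no ACE wrapping
    return ACE_PREFIX + bcf.encode("punycode").decode("ascii")


def from_bce(label: str) -> str:
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


original = "Malm\u00f6"
bce = to_bce(original)
print(bce)            # an all-ASCII label starting with "xn--"
print(from_bce(bce))  # malmö
```

   Decoding returns "malmö", not the original "Malmö": the capital
   letter is lost in BCF, so it cannot be restored from the BCE form.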

185	2.1.4 Long names

187	   The current DNS protocol limits a label to 63 octets. As UTF-8 takes
188	   more than one octet for some characters, a UTF-8 label cannot always
189	   hold 63 characters the way an ASCII label can. For example, a label
190	   using Hangul would have a maximum of 21 characters.
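
   The arithmetic behind the Hangul example: precomposed Hangul
   syllables (U+AC00 to U+D7A3) each take 3 octets in UTF-8, so a
   63-octet label holds at most 63 / 3 = 21 of them. A quick check,
   using a hypothetical helper fits_short_label:

```python
MAX_LABEL_OCTETS = 63  # RFC 1035 limit for a normal (short) label


def fits_short_label(label: str) -> bool:
    # The limit is in octets, so measure the UTF-8 encoding, not len(label).
    return len(label.encode("utf-8")) <= MAX_LABEL_OCTETS


hangul = "\ud55c"  # HANGUL SYLLABLE HAN, 3 octets in UTF-8
print(len(hangul.encode("utf-8")))    # 3
print(fits_short_label(hangul * 21))  # True  (63 octets)
print(fits_short_label(hangul * 22))  # False (66 octets)
```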

192	   The limits imposed by RFC 1035 are 63 octets per label and 255
193	   octets for the full name. The 255-octet limit is not a protocol
194	   limit but one intended to simplify implementations.

196	   To support longer names, a long label type is defined using
197	   [RFC2671] as extended label type 0b000011 (the label type will be
198	   assigned by IANA and may differ from the value used here).

200	                                 1 1 1 1 1 1 1 1 1 1
201	             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
202	            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
203	            |0 1 0 0 0 0 1 1|  length       |  label data ...
204	            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

206	   length: length of label in octets
207	   label data: the label
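
   A minimal sketch of encoding and decoding the long label shown
   above. The first octet 0x43 (binary 01000011) combines the extended
   label marker 01 from [RFC2671] with the provisional type 000011; as
   noted, the final value would be assigned by IANA. Because the length
   is a single octet, label data is limited to 255 octets. The function
   names are illustrative.

```python
from typing import Tuple

# Provisional: 01 = extended label (EDNS0), 000011 = long label type.
LONG_LABEL_TYPE = 0b01000011  # 0x43


def encode_long_label(data: bytes) -> bytes:
    """Encode SCE label octets as a long label: type, length, data."""
    if len(data) > 255:
        raise ValueError("label data does not fit in a one-octet length")
    return bytes([LONG_LABEL_TYPE, len(data)]) + data


def decode_long_label(wire: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (label data, offset of the octet following the label)."""
    if wire[offset] != LONG_LABEL_TYPE:
        raise ValueError("not a long label")
    length = wire[offset + 1]
    start = offset + 2
    end = start + length
    if end > len(wire):
        raise ValueError("truncated long label")
    return wire[start:end], end


label = "\ud55c" * 30  # 30 Hangul syllables = 90 octets, too long for 63
wire = encode_long_label(label.encode("utf-8"))
data, next_offset = decode_long_label(wire)
assert data.decode("utf-8") == label and next_offset == len(wire)
```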

209	   The long label MUST be handled by all software following this
210	   specification.  Such software MUST also support UDP packet sizes of
211	   up to 1280 bytes.
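
   Under [RFC2671], a client advertises the UDP payload size it can
   receive in an OPT pseudo-RR in the additional section, with the size
   carried in the CLASS field. A minimal sketch of that record for the
   1280-byte size required here; the rest of the DNS message, including
   incrementing ARCOUNT, is omitted, and opt_record is a hypothetical
   name.

```python
import struct

OPT_TYPE = 41  # OPT pseudo-RR type code from RFC 2671


def opt_record(udp_payload_size: int = 1280) -> bytes:
    """Build a minimal OPT pseudo-RR with no options.

    NAME is the root (one zero octet); CLASS carries the UDP payload
    size; TTL holds the extended RCODE, version and flag bits (all
    zero here); RDLENGTH is 0 because no options are included.
    """
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, 0, 0)


print(opt_record().hex())  # 0000290500000000000000
```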

213	   The limits from RFC 1035 are updated as follows: a label is limited
214	   to a maximum of 63 UCS code points, normalised using Unicode form C.
215	   The full name is limited to a maximum of 255 code points, normalised
216	   in the same way as a label.
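
   A sketch of the updated limits, counting code points after
   normalisation to form C. Two details are assumptions not settled by
   the text: a trailing root dot is ignored, and the separating dots
   count towards the 255 code point limit for the full name. Since a
   UTF-8 code point takes at most 4 octets, a 63 code point label needs
   at most 252 octets, which fits the one-octet length field of the
   long label. The helper name is hypothetical.

```python
import unicodedata

MAX_LABEL_CODE_POINTS = 63
MAX_NAME_CODE_POINTS = 255


def within_udns_limits(name: str) -> bool:
    """Check the updated length limits, counted in NFC code points."""
    # Python's len() on str counts code points, not octets.
    nfc = unicodedata.normalize("NFC", name).rstrip(".")
    if any(len(label) > MAX_LABEL_CODE_POINTS for label in nfc.split(".")):
        return False
    # Assumption: the full-name limit includes the separating dots.
    return len(nfc) <= MAX_NAME_CODE_POINTS


print(within_udns_limits("\ud55c" * 63 + ".se"))   # True: 63 code points, 189 octets
print(within_udns_limits("\ud55c" * 64 + ".se"))   # False
print(within_udns_limits("e\u0301" * 63 + ".se"))  # True: NFC gives 63 code points
```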

218	   A long label MUST always use the Standard Character Encoding (SCE).

220	   As long labels are not understood by older software, a response MUST
221	   NOT include a long label unless the query did. At a later date, the
222	   IETF may change this.

224	2.2 Rules for matching of domain names in UDNS aware DNS servers

226	   To ensure correct domain name matching in lookups, DNS servers MUST
227	   follow these rules:
228	    - Do matching on authoritative data using form-insensitive matching
229	      for the characters used in the data (for example, a zone using
230	      only ASCII need only handle matching of ASCII characters).
231	    - On non-authoritative data, either do binary matching or do
232	      case-insensitive matching on ASCII letters and binary matching on
233	      all others.

235	   The effect of the above is:

237	    - Only servers handling authoritative data must implement
238	      form-insensitive matching of names, and they need only implement
239	      the subset needed for the characters of UCS they support in
240	      their authoritative zones.
241	    - Lookups are normally fast, because data usually flows as
242	      resolver <-> server <-> authoritative server.
243	      While form-insensitive matching can be complex and CPU-intensive,
244	      the server in the middle does its caching with only simple and
245	      fast binary matching, so the complex matching rules should not
246	      slow down DNS very much.
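
   A sketch of the two matching regimes, assuming the server knows
   whether the data being searched is authoritative. Form-insensitive
   matching is approximated with NFKC plus case folding as a stand-in
   for BCF, not the normative mapping; the function names are
   hypothetical.

```python
import unicodedata


def form_insensitive_key(label: str) -> str:
    # Stand-in for BCF: NFKC, case folding, then NFKC again.
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


def ascii_case_fold(label: str) -> str:
    # Lower-case ASCII letters only; every other code point is unchanged.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in label)


def server_labels_match(query: str, stored: str, authoritative: bool) -> bool:
    if authoritative:
        # Authoritative data: full form-insensitive matching.
        return form_insensitive_key(query) == form_insensitive_key(stored)
    # Cached (non-authoritative) data: ASCII case-insensitive, binary otherwise.
    return ascii_case_fold(query) == ascii_case_fold(stored)


print(server_labels_match("\uff21bc", "abc", authoritative=True))   # True
print(server_labels_match("\uff21bc", "abc", authoritative=False))  # False
```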

248	2.3 Mixing of UDNS aware and non-UDNS aware clients and servers

250	   To handle the mixing of UDNS aware and non-UDNS aware clients and
251	   servers, clients and servers MUST follow the rules below.

253	2.3.1 Native UDNS aware client

255	   A native UDNS aware client is a client that supports everything in
256	   this document.

258	   When doing a query it MUST:
259	    - Use the long label in the QNAME.
260	    - If the server rejects the query because of the long label, retry
261	      the query using normal short labels. If the QNAME contains
262	      non-ASCII, it must be encoded using BCE.
263	    - Handle answers containing BCE.

265	   The client may skip trying a query using the long label if it knows
266	   the server does not understand it.

268	2.3.2 Application based UDNS aware client

270	   An application based UDNS aware client is a client supporting UDNS
271	   through BCE handling in the application.

273	   It understands only BCE and needs only a non-UDNS aware resolver to
274	   work.  All encoding and decoding of BCE is handled in the
275	   application.

277	   Because BCE is an ACE of BCF, the names returned in an answer need
278	   not be in the original form of the name. Instead they may contain
279	   the simplified form used in name matching. As this is a transition
280	   mechanism to support non-ASCII in names before the DNS servers have
281	   been upgraded, this is acceptable and will give people a reason to
282	   upgrade.

284	2.3.3 Non-UDNS aware client

285	   A non-UDNS aware client will send ASCII, or whatever is passed to it
286	   by an application. This can be BCE, which for the client is just
287	   ASCII text.

289	2.3.4 UDNS aware server

291	   A UDNS aware server MUST support all of this document, including:
292	    - If an incoming query contains a long label, the answer may contain
293	      a long label, and the client is identified as being UDNS aware.
294	    - If the query comes from a non-UDNS aware client and the answer
295	      contains non-ASCII, the non-ASCII labels must be encoded using
296	      BCE.
297	    - If a short label is used in a query and the QNAME contains
298	      non-ASCII, an authoritative server must handle the query if the
299	      character encoding can be recognised. It must recognise SCE and
300	      should recognise common encodings used for the labels in the
301	      domain it is authoritative for. Answers will use BCE for all
302	      labels except the one matching QNAME.  This will allow clients
303	      using the local character set to work in many cases before the
304	      resolver code is upgraded.
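
   The label-encoding decisions for answers, together with the rule in
   section 2.1.4 that a response MUST NOT include a long label unless
   the query did, can be sketched as one function. The encode_bce hook
   is hypothetical, and reading "except the one matching QNAME" as
   returning that name in the form the client sent it is an
   interpretation, not part of the specification.

```python
from typing import Callable, List, Tuple


def answer_labels(labels: List[str],
                  query_used_long_label: bool,
                  is_qname: bool,
                  encode_bce: Callable[[str], str]) -> Tuple[List[str], bool]:
    """Return (labels to put in the answer, whether long labels may be used)."""
    if query_used_long_label:
        # The client has shown it is UDNS aware: SCE in long labels is fine.
        return labels, True
    if is_qname:
        # Short-label query with a recognised non-ASCII QNAME: the name that
        # matched the QNAME is returned as the client sent it (interpretation).
        return labels, False
    # Client not known to be UDNS aware: no long labels, no raw non-ASCII.
    return [lab if lab.isascii() else encode_bce(lab) for lab in labels], False
```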

306	2.3.5 Non-UDNS aware server

308	   A non-UDNS aware server can only handle ASCII matching when comparing
309	   names.  It can support the transition mechanism using BCE; its
310	   authoritative zones must then be loaded with names manually encoded
311	   in BCE.

313	2.4 DNSSEC

315	   As labels can now contain non-ASCII characters, DNSSEC [RFC2535]
316	   needs to be revised so that it can handle them as well.

318	3. Effect on other protocols

320	   As a domain name may now include non-ASCII, many other protocols
321	   that include domain names need to be updated, for example SMTP, HTTP
322	   and URIs. The BCE format can be used when interfacing with ASCII only
323	   software or protocols.  Protocols like SMTP could be extended using
324	   ESMTP and a UTF8 option that defines that all headers are in UTF-8.

326	   It is recommended that protocols updated to handle i18n do so by
327	   encoding character data in the same standard format as defined for
328	   DNS in this document (UCS normalised using form C). Encoding it in
329	   ASCII or using tagged character sets should be avoided.

331	   DNS data does not contain only domain names; e-mail addresses, for
332	   example, are also included. An e-mail address would therefore be
333	   expected to be changed to include non-ASCII both before and after
334	   the @-sign.

335	   Software needs to be updated to follow user interface
336	   recommendations, so that a human will see the characters in their
337	   local character set, if possible.

339	4. Security Considerations

341	   As always with data, if software does not check for data that can
342	   cause problems, security may be affected. As characters other than
343	   ASCII are now allowed, software that expects only ASCII and performs
344	   no checks may now have security problems.

346	5. References

348	   [RFC1034]  P. Mockapetris, "Domain Names - Concepts and Facilities",
349	              STD 13, RFC 1034, November 1987.

351	   [RFC1035]  P. Mockapetris, "Domain Names - Implementation and
352	              Specification", STD 13, RFC 1035, November 1987.

354	   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
355	              Requirement Levels", BCP 14, RFC 2119, March 1997.

357	   [RFC2181]  R. Elz and R. Bush, "Clarifications to the DNS
358	              Specification", RFC 2181, July 1997.

360	   [RFC2279]  F. Yergeau, "UTF-8, a transformation format of ISO 10646",
361	              RFC 2279, January 1998.

363	   [RFC2535]  D. Eastlake, "Domain Name System Security Extensions",
364	              RFC 2535, March 1999.

366	   [RFC2671]  P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
367	              2671, August 1999.

369	   [ISO10646] ISO/IEC 10646-1:2000. International Standard --
370	              Information technology -- Universal Multiple-Octet Coded
371	              Character Set (UCS)

373	   [Unicode]  The Unicode Consortium, "The Unicode Standard -- Version
374	              3.0", ISBN 0-201-61633-5. Described at
375	              http://www.unicode.org/unicode/standard/versions/
376	              Unicode3.0.html

378	   [UTR15]    M. Davis and M. Duerst, "Unicode Normalization Forms",
379	              Unicode Technical Report #15, Nov 1999,
380	              http://www.unicode.org/unicode/reports/tr15/.

382	   [UTR21]    M. Davis, "Case Mappings", Unicode Technical Report #21,
383	              Dec 1999, http://www.unicode.org/unicode/reports/tr21/.

385	   [UDATA]    The Unicode Character Database,
386	              ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
387	              The database is described in
388	              ftp://ftp.unicode.org/Public/UNIDATA/
389	              UnicodeCharacterDatabase.html.

391	   [IDNREQ]   J. Seng, "Requirements of Internationalized Domain
392	              Names", draft-ietf-idn-requirement.

394	   [IANADNS]  D. Eastlake, E. Brunner, B. Manning, "Domain Name System
395	              (DNS) IANA Considerations", draft-ietf-dnsext-iana-dns.

397	   [IDNE]     M. Blanchet, P. Hoffman, "Internationalized domain names
398	              using EDNS (IDNE)", draft-ietf-idn-idne.

400	   [CHNORM]   M. Duerst, M. Davis, "Character Normalization in IETF
401	              Protocols", draft-duerst-i18n-norm.

403	   [IDNCOMP]  P. Hoffman, "Comparison of Internationalized Domain Name
404	              Proposals", draft-ietf-idn-compare.

406	   [NAMEPREP] P. Hoffman, "Comparison of Internationalized Domain Name
407	              Proposals", draft-ietf-idn-compare.

409	   [SACE]     D. Oscarsson, "Simple ASCII Compatible Encoding",
410	              draft-ietf-idn-sace.

412	   [RACE]     P. Hoffman, "RACE: Row-based ASCII Compatible Encoding for
413	              IDN", draft-ietf-idn-race.

415	6. Acknowledgements

417	   Paul Hoffman, for many comments in our e-mail discussions.

419	   Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
420	   Karlsson.

422	   Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper for
423	   comments on my draft.

425	   Discussions and comments by the members of the IDN working group.

427	Author's Address

429	   Dan Oscarsson
430	   Telia ProSoft AB
431	   Box 85
432	   201 20 Malmo
433	   Sweden

435	   E-mail: Dan.Oscarsson@trab.se