idnits 2.17.1
draft-ietf-idnabis-protocol-09.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** The document seems to lack a License Notice according IETF Trust
Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
Section 6.b -- however, there's a paragraph with a matching beginning.
Boilerplate error?
-- It seems you're using the 'non-IETF stream' Licence Notice instead
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
-- The draft header indicates that this document obsoletes RFC3490, but the
abstract doesn't seem to mention this, which it should.
-- The draft header indicates that this document updates RFC3492, but the
abstract doesn't seem to mention this, which it should.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
(Using the creation date from RFC3492, updated by this document, for
RFC5378 checks: 2002-01-10)
-- The document seems to contain a disclaimer for pre-RFC5378 work, and may
have content which was first submitted before 10 November 2008. The
disclaimer is necessary when there are original authors that you have
been unable to contact, or if some do not wish to grant the BCP78 rights
to the IETF Trust. If you are able to get all authors (current and
original) to grant those rights, you can and should remove the
disclaimer; otherwise, the disclaimer is needed and you can ignore this
comment. (See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- The document date (February 20, 2009) is 5538 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Unused Reference: 'RFC1123' is defined on line 676, but no explicit
reference was found in the text
== Unused Reference: 'Unicode-PropertyValueAliases' is defined on line 686,
but no explicit reference was found in the text
== Unused Reference: 'Unicode-RegEx' is defined on line 691, but no
explicit reference was found in the text
== Unused Reference: 'Unicode-Scripts' is defined on line 696, but no
explicit reference was found in the text
== Unused Reference: 'ASCII' is defined on line 708, but no explicit
reference was found in the text
== Unused Reference: 'RFC2136' is defined on line 722, but no explicit
reference was found in the text
== Unused Reference: 'RFC2181' is defined on line 726, but no explicit
reference was found in the text
== Unused Reference: 'RFC2535' is defined on line 729, but no explicit
reference was found in the text
== Unused Reference: 'RFC2671' is defined on line 732, but no explicit
reference was found in the text
-- Possible downref: Non-RFC (?) normative reference: ref. 'IDNA2008-BIDI'
-- Possible downref: Non-RFC (?) normative reference: ref.
'Unicode-PropertyValueAliases'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-RegEx'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-Scripts'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-UAX15'
-- Obsolete informational reference (is this intentional?): RFC 2535
(Obsoleted by RFC 4033, RFC 4034, RFC 4035)
-- Obsolete informational reference (is this intentional?): RFC 2671
(Obsoleted by RFC 6891)
-- Obsolete informational reference (is this intentional?): RFC 3490
(Obsoleted by RFC 5890, RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 3491
(Obsoleted by RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 4952
(Obsoleted by RFC 6530)
Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 15 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group J. Klensin
3 Internet-Draft February 20, 2009
4 Obsoletes: 3490, 3491
5 (if approved)
6 Updates: 3492 (if approved)
7 Intended status: Standards Track
8 Expires: August 24, 2009
10 Internationalized Domain Names in Applications (IDNA): Protocol
11 draft-ietf-idnabis-protocol-09.txt
13 Status of this Memo
15 This Internet-Draft is submitted to IETF in full conformance with the
16 provisions of BCP 78 and BCP 79.
18 Internet-Drafts are working documents of the Internet Engineering
19 Task Force (IETF), its areas, and its working groups. Note that
20 other groups may also distribute working documents as Internet-
21 Drafts.
23 Internet-Drafts are draft documents valid for a maximum of six months
24 and may be updated, replaced, or obsoleted by other documents at any
25 time. It is inappropriate to use Internet-Drafts as reference
26 material or to cite them other than as "work in progress."
28 The list of current Internet-Drafts can be accessed at
29 http://www.ietf.org/ietf/1id-abstracts.txt.
31 The list of Internet-Draft Shadow Directories can be accessed at
32 http://www.ietf.org/shadow.html.
34 This Internet-Draft will expire on August 24, 2009.
36 Copyright Notice
38 Copyright (c) 2009 IETF Trust and the persons identified as the
39 document authors. All rights reserved.
41 This document is subject to BCP 78 and the IETF Trust's Legal
42 Provisions Relating to IETF Documents
43 (http://trustee.ietf.org/license-info) in effect on the date of
44 publication of this document. Please review these documents
45 carefully, as they describe your rights and restrictions with respect
46 to this document.
48 This document may contain material from IETF Documents or IETF
49 Contributions published or made publicly available before November
50 10, 2008. The person(s) controlling the copyright in some of this
51 material may not have granted the IETF Trust the right to allow
52 modifications of such material outside the IETF Standards Process.
53 Without obtaining an adequate license from the person(s) controlling
54 the copyright in such materials, this document may not be modified
55 outside the IETF Standards Process, and derivative works of it may
56 not be created outside the IETF Standards Process, except to format
57 it for publication as an RFC or to translate it into languages other
58 than English.
60 Abstract
62 This document supplies the protocol definition for a revised and
63 updated specification for internationalized domain names (IDNs). The
64 rationale for these changes, the relationship to the older
65 specification, and important terminology are provided in other
66 documents. This document specifies the protocol mechanism, called
67 Internationalizing Domain Names in Applications (IDNA), for
68 registering and looking up IDNs in a way that does not require
69 changes to the DNS itself. IDNA is only meant for processing domain
70 names, not free text.
72 Table of Contents
74 1. Introduction
75 1.1. Discussion Forum
76 2. Terminology
77 3. Requirements and Applicability
78 3.1. Requirements
79 3.2. Applicability
80 3.2.1. DNS Resource Records
81 3.2.2. Non-domain-name Data Types Stored in the DNS
82 4. Registration Protocol
83 4.1. Input to IDNA Registration Process
84 4.2. Permitted Character and Label Validation
85 4.2.1. Input Format
86 4.2.2. Rejection of Characters that are not Permitted
87 4.2.3. Label Validation
88 4.2.4. Registration Validation Summary
89 4.3. Registry Restrictions
90 4.4. Punycode Conversion
91 4.5. Insertion in the Zone
92 5. Domain Name Lookup Protocol
93 5.1. Label String Input
94 5.2. Conversion to Unicode
95 5.3. Character Changes in Preprocessing or the User
96 Interface
97 5.4. A-label Input
98 5.5. Validation and Character List Testing
99 5.6. Punycode Conversion
100 5.7. DNS Name Resolution
101 6. Security Considerations
102 7. IANA Considerations
103 8. Contributors
104 9. Acknowledgments
105 10. References
106 10.1. Normative References
107 10.2. Informative References
108 Appendix A. Summary of Major Changes from IDNA2003
109 Appendix B. Change Log
110 B.1. Changes between Version -00 and -01 of
111 draft-ietf-idnabis-protocol
112 B.2. Version -02
113 B.3. Version -03
114 B.4. Version -04
115 B.5. Version -05
116 B.6. Version -06
117 B.7. Version -07
118 B.8. Version -08
119 B.9. Version -09
121 Author's Address
123 1. Introduction
125 This document supplies the protocol definition for a revised and
126 updated specification for internationalized domain names. Essential
127 definitions and terminology for understanding this document and a
128 road map of the collection of documents that make up IDNA2008 appear
129 in [IDNA2008-Defs]. Appendix A discusses the relationship between
130 this specification and the earlier version of IDNA (referred to here
131 as "IDNA2003"). The rationale for these changes, along with
132 considerable explanatory material and advice to zone administrators
133 who support IDNs, is provided in other documents, notably
134 [IDNA2008-Rationale].
136 IDNA works by allowing applications to use certain ASCII string
137 labels (beginning with a special prefix) to represent non-ASCII name
138 labels. Lower-layer protocols need not be aware of this; therefore
139 IDNA does not depend on changes to any infrastructure. In
140 particular, IDNA does not depend on any changes to DNS servers,
141 resolvers, or protocol elements, because the ASCII name service
142 provided by the existing DNS is entirely sufficient for IDNA.
144 IDNA is applied only to DNS labels. Standards for combining labels
145 into fully-qualified domain names and parsing labels out of those
146 names are covered in the base DNS standards [RFC1034] [RFC1035] and
147 their various updates. An application may, of course, apply locally-
148 appropriate conventions to the presentation forms of domain names as
149 discussed in [IDNA2008-Rationale].
151 While they share terminology, reference data, and some operations,
152 this document describes two separate protocols, one for IDN
153 registration (Section 4) and one for IDN lookup (Section 5).
155 1.1. Discussion Forum
157 [[anchor3: RFC Editor: please remove this section.]]
159 This work is being discussed in the IETF IDNABIS WG and on the
160 mailing list idna-update@alvestrand.no
162 2. Terminology
164 General terminology applicable to IDNA, but with meanings familiar to
165 those who have worked with Unicode or other character set standards
166 and the DNS, appears in [IDNA2008-Defs]. Terminology that is an
167 integral, normative, part of the IDNA definition, including the
168 definitions of "ACE", appears in that document as well. Familiarity
169 with the terminology materials in that document is assumed for
170 reading this one. The reader of this document is assumed to be
171 familiar with DNS-specific terminology as defined in RFC 1034
172 [RFC1034].
174 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
175 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
176 document are to be interpreted as described in BCP 14, RFC 2119
177 [RFC2119].
179 3. Requirements and Applicability
181 3.1. Requirements
183 IDNA conformance means adherence to the following requirements:
185 1. Whenever a domain name is put into an IDN-unaware domain name
186 slot (see Section 2 and [IDNA2008-Defs]), it MUST contain only
187 ASCII characters (i.e., must be either an A-label or an NR-LDH-
188 label), or must be a label associated with a DNS application that
189 is not subject to either IDNA or the historical recommendations
190 for "hostname"-style names [RFC1034].
192 2. Comparison of labels MUST be done on equivalent forms: either
193 both A-label forms or both U-label forms. Because A-labels and
194 U-labels can be transformed into each other without loss of
195 information, these comparisons are equivalent. However, when a
196 pair of putative A-labels are compared, the comparison MUST use
197 an ASCII case-insensitive comparison (as with all comparisons of
198 ASCII DNS labels). Comparisons on putative U-labels must test
199 that the two strings are identical, without case-folding or other
200 intermediate steps. Note that it is not necessary to verify that
201 labels are valid in order to compare them. In many cases,
202 verification of validity (that the strings actually are A-labels
203 or U-labels) may be important for other reasons and SHOULD be
204 performed.
206 3. Labels being registered MUST conform to the requirements of
207 Section 4. Labels being looked up and the lookup process MUST
208 conform to the requirements of Section 5.
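
The comparison rule in requirement 2 can be sketched in Python as
follows. This is a minimal illustration, assuming the caller already
knows which form both labels are in; the function names
a_labels_equal and u_labels_equal are hypothetical and are not
defined by this specification.

```python
def a_labels_equal(a: str, b: str) -> bool:
    """Compare two putative A-labels with ASCII case-insensitive matching."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("putative A-labels must be ASCII")
    # On pure-ASCII input, str.lower() folds only A-Z to a-z.
    return a.lower() == b.lower()


def u_labels_equal(a: str, b: str) -> bool:
    """Compare two putative U-labels: identical code points, no case folding."""
    return a == b
```

Neither function validates its input, which mirrors the note that
labels need not be verified as valid in order to be compared.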
210 3.2. Applicability
212 IDNA is applicable to all domain names in all domain name slots
213 except where it is explicitly excluded. It is not applicable to
214 domain name slots which do not use the LDH syntax rules.
216 This implies that IDNA is applicable to many protocols that predate
217 IDNA. Note that IDNs occupying domain name slots in those older
218 protocols MUST be in A-label form until and unless those protocols
219 and implementations of them are upgraded to be IDN-aware. IDNs
220 actually appearing in DNS queries or responses MUST be A-labels.
222 3.2.1. DNS Resource Records
224 IDNA applies only to domain names in the NAME and RDATA fields of DNS
225 resource records whose CLASS is IN.
227 There are currently no other exclusions on the applicability of IDNA
228 to DNS resource records. Applicability depends entirely on the
229 CLASS, and not on the TYPE except as noted below. This will remain
230 true, even as new types are defined, unless there is a compelling
231 reason for a new type that requires type-specific rules. The special
232 naming conventions applicable to SRV records are examples of type-
233 specific rules that are incompatible with IDNA coding. Hence the
234 first two labels (the ones required to start in "_") on a record with
235 TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible
236 to write a non-ASCII string with a leading underscore, conversion to
237 an A-label would be impossible without loss of information because
238 the underscore is not a letter, digit, or hyphen and is consequently
239 DISALLOWED in IDNs). Of course, those labels may be part of a domain
240 that uses IDN labels at higher levels in the tree.
242 3.2.2. Non-domain-name Data Types Stored in the DNS
244 Although IDNA enables the representation of non-ASCII characters in
245 domain names, that does not imply that IDNA enables the
246 representation of non-ASCII characters in other data types that are
247 stored in domain names, specifically in the RDATA field for types
248 that have structured RDATA format. For example, an email address
249 local part is stored in a domain name in the RNAME field as part of
250 the RDATA of an SOA record (hostmaster@example.com would be
251 represented as hostmaster.example.com). IDNA specifically does not
252 update the existing email standards, which allow only ASCII
253 characters in local parts. Even though work is in progress to define
254 internationalization for email addresses [RFC4952], changes to the
255 email address part of the SOA RDATA would require action in, or
256 updates to, other standards, specifically those that specify the
257 format of the SOA RR.
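
The mailbox-to-domain-name representation mentioned above can be
sketched as follows. This is an illustration only: the function name
mailbox_to_rname is hypothetical, and the backslash escaping of dots
in the local part follows the usual master-file convention rather
than anything specified in this document.

```python
def mailbox_to_rname(address: str) -> str:
    """Represent an ASCII mailbox as a domain name, as done for SOA RNAME."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    if not local.isascii():
        # IDNA does not extend the email standards to non-ASCII local parts.
        raise ValueError("local part must be ASCII")
    # A literal dot inside the local part is escaped so that it is not
    # read as a label separator.
    return local.replace(".", "\\.") + "." + domain
```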
259 4. Registration Protocol
261 This section defines the procedure for registering an IDN. The
262 procedure is implementation independent; any sequence of steps that
263 produces exactly the same result for all labels is considered a valid
264 implementation.
266 Note that, while the registration and lookup protocols (Section 5)
267 are very similar in most respects, they are different and
268 implementers should carefully follow the steps they are implementing.
270 4.1. Input to IDNA Registration Process
272 [[anchor8: Note in Draft: This subsection is new in -08, based on
273 comments on the mailing list in January and February 2009. It
274 replaces the previous first two subsections of this section and
275 completely eliminates the discussion of local mapping for
276 registration.]]
278 Registration processes are outside the scope of these protocols and
279 may differ significantly depending on local needs. By the time a
280 string enters the IDNA registration process as described in this
281 specification, it is expected to be in Unicode and MUST be in Unicode
282 Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
283 zone files ("registries") are expected to accept only the exact
284 string for which registration is requested, free of any mappings or
285 local adjustments. They SHOULD avoid any possible ambiguity by
286 accepting registrations only for A-labels, possibly paired with the
287 relevant U-labels so that they can verify the correspondence.
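
A registry could test the NFC requirement along these lines. This is
a minimal sketch using Python's standard unicodedata module; the
function name is_nfc is an assumption for illustration. The check
rejects rather than repairs, since the registry is expected to accept
only the exact string requested.

```python
import unicodedata


def is_nfc(label: str) -> bool:
    """Return True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", label) == label
```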
289 4.2. Permitted Character and Label Validation
291 4.2.1. Input Format
293 The registry MAY permit submission of labels in A-label form. If it
294 does so, it MUST perform a conversion to a U-label, perform the steps
295 and tests described below, and verify that the A-label produced by
296 the step in Section 4.4 matches the one provided as input. If, for
297 some reason, it does not, the registration MUST be rejected. If the
298 conversion to a U-label is not performed, the registry MUST verify
299 that the A-label is superficially valid, i.e., that it does not
300 violate any of the rules of Punycode [RFC3492] encoding such as the
301 prohibition on trailing hyphen-minus, appearance of non-basic
302 characters before the delimiter, and so on. Invalid strings that
303 appear to be A-labels MUST NOT be placed in DNS zones.
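
The round-trip check described above might look like the following
sketch. It relies on Python's built-in "punycode" codec; the name
a_label_round_trips and the choice to compare in lower case are
assumptions made for illustration. A successful round trip alone
does not make the label valid: the character and label tests of
Section 4.2 still have to be applied to the decoded string.

```python
ACE_PREFIX = "xn--"


def a_label_round_trips(a_label: str) -> bool:
    """Decode a putative A-label and check that re-encoding reproduces it."""
    candidate = a_label.lower()  # assumption: compare in lower case
    if not candidate.startswith(ACE_PREFIX):
        return False
    try:
        # Decode the part after the prefix with the Punycode algorithm.
        u_label = candidate[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # The tests of Section 4.2 would be run on u_label at this point.
        regenerated = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    return regenerated == candidate
```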
305 4.2.2. Rejection of Characters that are not Permitted
307 The candidate Unicode string is checked to verify that characters
308 that IDNA does not permit do not appear in it. Those characters are
309 identified in the "DISALLOWED" and "UNASSIGNED" lists that are
310 specified in [IDNA2008-Tables] and described informally in
311 [IDNA2008-Rationale]. Characters that are either DISALLOWED or
312 UNASSIGNED MUST NOT be part of labels to be processed for
313 registration in the DNS.
315 4.2.3. Label Validation
317 The proposed label (in the form of a Unicode string, i.e., a putative
318 U-label) is then examined, performing tests that require examination
319 of more than one character.
321 4.2.3.1. Rejection of Hyphen Sequences in U-labels
323 The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
324 the third and fourth character positions when the label is considered
325 in "on the wire" order.
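
As a minimal sketch (the function name is hypothetical), the test
amounts to a slice comparison on the string in "on the wire" order:

```python
def has_reserved_hyphens(label: str) -> bool:
    """True if the third and fourth characters of the label are both hyphens."""
    return label[2:4] == "--"
```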
327 4.2.3.2. Leading Combining Marks
329 The first character of the string (when the label is considered in
330 "on the wire" order) is examined to verify that it is not a combining
331 mark (or combining character) (see The Unicode Standard, Section 2.11
332 [Unicode] for an exact definition). If it is a combining mark, the
333 string MUST NOT be registered.
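
One way to sketch this test uses Python's unicodedata module. The
sketch assumes that "combining mark" can be approximated by the
Unicode General_Category values beginning with "M" (Mn, Mc, Me); the
exact definition is the one in The Unicode Standard, and the function
name is illustrative.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character has General_Category Mn, Mc, or Me."""
    # Assumption: General_Category M* stands in for "combining mark".
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```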
335 4.2.3.3. Contextual Rules
337 Each code point is checked for its identification as a character
338 requiring contextual processing for registration (the list of
339 characters appears as the combination of CONTEXTJ and CONTEXTO in
340 [IDNA2008-Tables] as do the contextual rules themselves). If that
341 indication appears, the table of contextual rules is checked for a
342 rule for that character. If no rule is found, the proposed label is
343 rejected and MUST NOT be installed in a zone file. If one is found,
344 it is applied (typically as a test on the entire label or on adjacent
345 characters within the label). If the application of the rule does
346 not conclude that the character is valid in context, the proposed
347 label MUST be rejected. (See the IANA Considerations: IDNA Context
348 Registry section of [IDNA2008-Tables].)
350 These contextual rules are required to support the use of characters
351 that could be used, under other conditions, to produce misleading
352 labels or to cause unacceptable ambiguity in label matching and
353 interpretation. For example, labels containing invisible ("zero-
354 width") characters may be permitted in context with characters whose
355 presentation forms are significantly changed by the presence or
356 absence of the zero-width characters, while other labels in which
357 zero-width characters appear may be rejected.
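
The lookup-then-apply procedure above can be sketched as follows.
The code point set, the rule table, and the single rule shown are
placeholders for illustration only; the actual lists and rules are
those in [IDNA2008-Tables]. The names CONTEXTUAL, RULES, zwj_rule,
and passes_contextual_rules are assumptions of this sketch.

```python
import unicodedata
from typing import Callable, Dict, Set

ZWJ = 0x200D
VIRAMA_CCC = 9  # Canonical_Combining_Class value for virama characters

# Placeholder data: code points treated here as needing contextual rules.
CONTEXTUAL: Set[int] = {0x200C, ZWJ}


def zwj_rule(label: str, pos: int) -> bool:
    """Illustrative rule: ZWJ is accepted only directly after a virama."""
    return pos > 0 and unicodedata.combining(label[pos - 1]) == VIRAMA_CCC


# Placeholder rule table; U+200C deliberately has no rule in this sketch.
RULES: Dict[int, Callable[[str, int], bool]] = {ZWJ: zwj_rule}


def passes_contextual_rules(label: str) -> bool:
    """Reject if a contextual character has no rule or fails its rule."""
    for pos, ch in enumerate(label):
        cp = ord(ch)
        if cp not in CONTEXTUAL:
            continue
        rule = RULES.get(cp)
        if rule is None or not rule(label, pos):
            return False
    return True
```

A missing rule and a failed rule lead to the same outcome, which
matches the requirement that a label be rejected in both cases.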
359 4.2.3.4. Labels Containing Characters Written Right to Left
361 Special tests are required for strings containing characters that are
362 normally written from right to left. The criteria for classifying
363 characters in terms of directionality are identified in the "Bidi"
364 document [IDNA2008-BIDI] in this series. That document also
365 describes conditions for strings that contain one or more of those
366 characters to be U-labels. The tests for those conditions, specified
367 there, are applied. Strings that contain right to left characters
368 that do not conform to the IDNA Bidi rules MUST NOT be inserted as
369 labels in zone files.
371 4.2.4. Registration Validation Summary
373 Strings that contain at least one non-ASCII character, have been
374 produced by the steps above, whose contents pass the above tests, and
375 are 63 or fewer characters long in ACE form (see Section 4.4), are
376 U-labels.
378 To summarize, tests are made in Section 4.2 for invalid characters,
379 invalid combinations of characters, and for labels that are invalid
380 even if the characters they contain are valid individually.
382 4.3. Registry Restrictions
384 Registries at all levels of the DNS, not just the top level, are
385 expected to establish policies about the labels that may be
386 registered, and for the processes associated with that action. While
387 exact policies are not specified as part of IDNA2008 and it is
388 expected that different registries may specify different policies,
389 there SHOULD be policies. Even a trivial policy (e.g., "anything can
390 be registered in this zone that can be represented as an A-label -
391 U-label pair") has value because it provides notice to users and
392 applications implementers that the registry cannot be relied upon to
393 provide even minimal user-protection restrictions. These per-
394 registry policies and restrictions are an essential element of the
395 IDNA registration protocol even for registries (and corresponding
396 zone files) deep in the DNS hierarchy. As discussed in
397 [IDNA2008-Rationale], such restrictions have always existed in the
398 DNS. That document also contains a discussion and recommendations
399 about possible types of rules.
401 The string produced by the above steps is checked and processed as
402 appropriate to local registry restrictions. Application of those
403 registry restrictions may result in the rejection of some labels or
404 the application of special restrictions to others.
406 4.4. Punycode Conversion
408 The resulting U-label is converted to an A-label. The A-label, more
409 precisely defined elsewhere, is the encoding of the U-label according
410 to the Punycode algorithm [RFC3492] with the ACE prefix "xn--" added
411 at the beginning of the string. The resulting string must, of
412 course, conform to the length limits imposed by the DNS. This
413 document updates RFC 3492 only to the extent of replacing the
414 reference to the discussion of the ACE prefix. The ACE prefix is now
415 specified in this document rather than as part of RFC 3490 or
416 Nameprep [RFC3491] but is the same in both sets of documents.
418 The failure conditions identified in the Punycode encoding procedure
419 cannot occur if the input is a U-label as determined by the steps
420 above.
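
A minimal sketch of this step, using Python's built-in "punycode"
codec and assuming the input has already passed the tests of
Section 4.2. The function name to_a_label and the constant names
are illustrative.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def to_a_label(u_label: str) -> str:
    """Encode a validated U-label as an A-label."""
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_OCTETS:
        raise ValueError("A-label exceeds the 63-octet DNS label limit")
    return a_label
```

For example, to_a_label("bücher") yields "xn--bcher-kva".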
422 4.5. Insertion in the Zone
424 The A-label is registered in the DNS by insertion into a zone.
426 5. Domain Name Lookup Protocol
428 Lookup is conceptually different from registration and different
429 tests are applied on the client. Although some validity checks are
430 necessary to avoid serious problems with the protocol (see
431 Section 5.5ff.), the lookup-side tests are more permissive and rely
432 on the assumption that names that are present in the DNS are valid.
433 That assumption is, however, a weak one because the presence of wild
434 cards in the DNS might cause a string that is not actually registered
435 in the DNS to be successfully looked up.
437 5.1. Label String Input
439 The user supplies a string in the local character set, typically by
440 typing it or clicking on, or copying and pasting, a resource
441 identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
442 domain name is extracted. Alternately, some process not directly
443 involving the user may read the string from a file or obtain it in
444 some other way. Processing in this step and the next two are local
445 matters, to be accomplished prior to actual invocation of IDNA, but
446 at least the two steps in Section 5.2 and Section 5.3 must be
447 accomplished in some way.
449 5.2. Conversion to Unicode
451 The string is converted from the local character set into Unicode, if
452 it is not already Unicode. The exact nature of this conversion is
453 beyond the scope of this document, but may involve normalization
454 identical to that discussed in Section 4.1. The result MUST be a
455 Unicode string in NFC form.
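
As an illustration, a lookup application might combine charset
decoding and normalization in one helper. The function name and the
local_charset parameter are assumptions; how the local character set
is determined is outside the scope of this sketch.

```python
import unicodedata


def to_unicode_nfc(raw: bytes, local_charset: str) -> str:
    """Decode bytes from the local character set and normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(local_charset))
```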
457 5.3. Character Changes in Preprocessing or the User Interface
459 The Unicode string MAY then be processed to prevent confounding of
460 user expectations. For instance, it might be reasonable, at this
461 step, to convert all upper case characters to lower case, if this
462 makes sense in the user's environment, but even this should be
463 approached with caution due to some edge cases: in the long term, it
464 is probably better for users to understand IDNs strictly in lower-
465 case, U-label, form. More generally, preprocessing may be useful to
466 smooth the transition from IDNA2003, especially for direct user
467 input, but with similar cautions. In general, IDNs appearing in
468 files and those transmitted across the network as part of protocols
469 are expected to be in either ASCII form (including A-labels) or to
470 contain U-labels, rather than being in forms requiring mapping or
471 other conversions.
473 Other examples of processing for localization might be applied,
474 especially to direct user input, at this point. They include
475 interpreting various characters as separating domain name components
476 from each other (label separators) because they either look like
477 periods or are used to separate sentences, mapping halfwidth or
478 fullwidth East Asian characters to the common form permitted in
479 labels, or giving special treatment to characters whose presentation
480 forms are dependent only on placement in the label. Such
481 localization changes are also outside the scope of this
482 specification.
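
One such local convention, treating dot-like characters as label
separators, might be sketched as follows. The set of characters is
an assumption: it is the set that IDNA2003 recognized as label
separators, and a local implementation may choose a different one.
The names DOT_LIKE and split_labels are illustrative.

```python
import re
from typing import List

# Assumption: U+002E plus the three ideographic and width variants of
# the full stop that IDNA2003 treated as label separators.
DOT_LIKE = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> List[str]:
    """Split user input into labels on any of the dot-like characters."""
    return DOT_LIKE.split(name)
```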
484 Recommendations for preprocessing for global contexts (i.e., when
485 local considerations do not apply or cannot be used) and for maximum
486 interoperability with labels that might have been specified under
487 liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. It
488 is important to note that the intent of these specifications is that
489 labels in application protocols, files, or links be in U-label or
490 A-label form. Preprocessing MUST NOT map a character
491 that is valid in a label as specified elsewhere in this document or
492 in [IDNA2008-Tables] into another character. Excessively liberal use
493 of preprocessing, especially to strings stored in files, poses a
494 threat to consistent and predictable behavior for the user even if
495 not to actual interoperability.
497 Because these transformations are local, it is important that domain
498 names that might be passed between systems (e.g., in IRIs) be
499 U-labels or A-labels and not forms that might be accepted locally as
500 a consequence of this step. This step is not standardized as part of
501 IDNA, and is not further specified here.
503 5.4. A-label Input
505 If the input to this procedure appears to be an A-label (i.e., it
506 starts in "xn--"), the lookup application MAY attempt to convert it
507 to a U-label and apply the tests of Section 5.5 and the conversion of
508 Section 5.6 to that form. If the label is converted to Unicode
509 (i.e., to U-label form) using the Punycode decoding algorithm, then
510 the processing specified in those two sections MUST be performed, and
511 the label MUST be rejected if the resulting label is not identical to
512 the original. See the Name Server Considerations section of
513 [IDNA2008-Rationale] for additional discussion on this topic.
515 That conversion and testing SHOULD be performed if the domain name
516 will later be presented to the user in native character form (this
517 requires that the lookup application be IDNA-aware). If those steps
518 are not performed, the lookup process SHOULD at least make tests to
519 determine that the string is actually an A-label, examining it for
520 the invalid formats specified in the Punycode decoding specification.
521 Applications that are not IDNA-aware will obviously omit that
522 testing; others MAY treat the string as opaque to avoid the
523 additional processing at the expense of providing less protection and
524 information to users.
526 5.5. Validation and Character List Testing
528 As with the registration procedure described in Section 4, the
529 Unicode string is checked to verify that all characters that appear
530 in it are valid as input to IDNA lookup processing. As discussed
531 above and in [IDNA2008-Rationale], the lookup check is more liberal
532 than the registration one. Putative labels with any of the following
533 characteristics MUST be rejected prior to DNS lookup:
535 o Labels containing code points that are unassigned in the version
536 of Unicode being used by the application, i.e., in the UNASSIGNED
537 category of [IDNA2008-Tables].
539 o Labels that are not in NFC form as defined in [Unicode-UAX15].
541 o Labels containing prohibited code points, i.e., those that are
542 assigned to the "DISALLOWED" category in the permitted character
543 table [IDNA2008-Tables].
545 o Labels containing code points that are identified in
546 [IDNA2008-Tables] as "CONTEXTJ", i.e., requiring exceptional
547 contextual rule processing on lookup, but that do not conform to
548 that rule. Note that this implies that a rule must be defined,
549 not null: a character that requires a contextual rule but for
550 which the rule is null is treated in this step as having failed to
551 conform to the rule.
553 o Labels containing code points that are identified in
554 [IDNA2008-Tables] as "CONTEXTO", but for which no such rule
555 appears in the table of rules. Applications resolving DNS names
556 or carrying out equivalent operations are not required to test
557 contextual rules for "CONTEXTO" characters, only to verify that a
558 rule is defined (although they MAY make such tests to give better
559 information to the user).
561 o Labels whose first character is a combining mark (see
562 Section 4.2.3.2).
564 In addition, the application SHOULD apply the following test. The
565 test may be omitted in special circumstances, such as when the lookup
566 application knows that the conditions are enforced elsewhere, because
567 an attempt to look up and resolve such strings will almost certainly
568 lead to a DNS lookup failure except when wildcards are present in the
569 zone. However, applying the test is likely to give much better
570 information about the reason for a lookup failure -- information that
571 may be usefully passed to the user when that is feasible -- than DNS
572 resolution failure information alone. In any event, lookup
573 applications should avoid attempting to resolve labels that are
574 invalid under that test.
576 o Verification that the string is compliant with the requirements
577 for right to left characters, specified in [IDNA2008-BIDI].
579 For all other strings, the lookup application MUST rely on the
580 presence or absence of labels in the DNS to determine the validity of
581 those labels and the validity of the characters they contain. If
582 they are registered, they are presumed to be valid; if they are not,
583 their possible validity is not relevant. A lookup application that
584 declines to process a string that conforms to the rules above and
585 does not look it up in the DNS is not in conformance with this
586 protocol.
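A few of the checks listed in this section can be approximated with
Python's standard unicodedata module, as in the sketch below. It
covers only the NFC test, the unassigned code point test, and the
leading combining mark test. The function name and the failure
strings are hypothetical. Treating General_Category "Cn" as
UNASSIGNED is an approximation that depends on the Unicode version
shipped with the interpreter. The DISALLOWED, CONTEXTJ, CONTEXTO, and
right-to-left tests need the data in [IDNA2008-Tables] and
[IDNA2008-BIDI] and are not attempted here.

```python
import unicodedata


def lookup_check_failures(label: str) -> list:
    """Return the reasons a putative U-label fails this partial check.

    Hypothetical helper: an empty list means only that the three
    checks implemented here passed, not that the label is valid.
    """
    if not label:
        return ["empty label"]
    failures = []
    # Labels must already be in Normalization Form C.
    if unicodedata.normalize("NFC", label) != label:
        failures.append("not NFC")
    # Approximation: "Cn" is the category Python reports for code
    # points that are unassigned in its bundled Unicode version.
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        failures.append("unassigned code point")
    # Categories Mn, Mc, and Me are the combining marks.
    if unicodedata.category(label[0]).startswith("M"):
        failures.append("leading combining mark")
    return failures
```

Returning a list of reasons rather than a single boolean follows the
observation above that telling the user why a lookup was refused is
more useful than a bare failure.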
588 5.6. Punycode Conversion
590 The validated string, a U-label, is converted to an A-label using the
591 Punycode algorithm with the ACE prefix added.
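The conversion can be sketched in Python with the standard library's
"punycode" codec, as below. The function name to_a_label is
hypothetical, the input is assumed to have already passed the tests
of Section 5.5, and the 63-octet length check reflects the general
DNS label limit rather than anything stated in this section.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # general DNS limit on the length of one label


def to_a_label(u_label: str) -> str:
    """Convert an already validated U-label to its A-label form.

    Hypothetical helper: no validation of the input is done here.
    """
    # Punycode-encode the label, then prepend the ACE prefix.
    encoded = u_label.encode("punycode").decode("ascii")
    a_label = ACE_PREFIX + encoded
    if len(a_label) > MAX_LABEL_OCTETS:
        raise ValueError("A-label exceeds the DNS label length limit")
    return a_label
```

For example, to_a_label("bücher") produces "xn--bcher-kva".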
593 5.7. DNS Name Resolution
595 The A-label is looked up in the DNS, using normal DNS resolver
596 procedures.
598 6. Security Considerations
600 Security Considerations for this version of IDNA, except for the
601 special issues associated with right to left scripts and characters,
602 are described in [IDNA2008-Defs]. Specific issues for labels
603 containing characters associated with scripts written right to left
604 appear in [IDNA2008-BIDI].
606 7. IANA Considerations
608 IANA actions for this version of IDNA are specified in
609 [IDNA2008-Tables] and discussed informally in [IDNA2008-Rationale].
610 The components of IDNA described in this document do not require any
611 IANA actions.
613 8. Contributors
615 While the listed editor held the pen, the original versions of this
616 document represent the joint work and conclusions of an ad hoc design
617 team consisting of the editor and, in alphabetic order, Harald
618 Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
619 draws significantly on the original version of IDNA [RFC3490] both
620 conceptually and for specific text. This second-generation version
621 would not have been possible without the work that went into that
622 first version and its authors, Patrik Faltstrom, Paul Hoffman, and
623 Adam Costello. While Faltstrom was actively involved in the creation
624 of this version, Hoffman and Costello were not and should not be held
625 responsible for any errors or omissions.
627 9. Acknowledgments
629 This revision to IDNA would have been impossible without the
630 accumulated experience since RFC 3490 was published and resulting
631 comments and complaints of many people in the IETF, ICANN, and other
632 communities, too many people to list here. Nor would it have been
633 possible without RFC 3490 itself and the efforts of the Working Group
634 that defined it. Those people whose contributions are acknowledged
635 in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly
636 important.
638 Specific textual changes were incorporated into this document after
639 suggestions from the other contributors, Stephane Bortzmeyer, Vint
640 Cerf, Mark Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel,
641 Marcos Sanz, Andrew Sullivan, Ken Whistler, and other WG
642 participants. Special thanks are due to Paul Hoffman for permission
643 to extract material from his Internet-Draft to form the basis for
644 Appendix A.
646 10. References
648 10.1. Normative References
650 [IDNA2008-BIDI]
651 Alvestrand, H. and C. Karp, "An updated IDNA criterion for
652 right-to-left scripts", July 2008.
655 [IDNA2008-Defs]
656 Klensin, J., "Internationalized Domain Names for
657 Applications (IDNA): Definitions and Document Framework",
658 February 2009.
661 [IDNA2008-Tables]
662 Faltstrom, P., "The Unicode Codepoints and IDNA",
663 July 2008.
666 A version of this document is available in HTML format at
667 http://stupid.domain.name/idnabis/
668 draft-ietf-idnabis-tables-02.html
670 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
671 STD 13, RFC 1034, November 1987.
673 [RFC1035] Mockapetris, P., "Domain names - implementation and
674 specification", STD 13, RFC 1035, November 1987.
676 [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
677 and Support", STD 3, RFC 1123, October 1989.
679 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
680 Requirement Levels", BCP 14, RFC 2119, March 1997.
682 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
683 for Internationalized Domain Names in Applications
684 (IDNA)", RFC 3492, March 2003.
686 [Unicode-PropertyValueAliases]
687 The Unicode Consortium, "Unicode Character Database:
688 PropertyValueAliases", March 2008.
691 [Unicode-RegEx]
692 The Unicode Consortium, "Unicode Technical Standard #18:
693 Unicode Regular Expressions", May 2005.
696 [Unicode-Scripts]
697 The Unicode Consortium, "Unicode Standard Annex #24:
698 Unicode Script Property", February 2008.
701 [Unicode-UAX15]
702 The Unicode Consortium, "Unicode Standard Annex #15:
703 Unicode Normalization Forms", 2006.
706 10.2. Informative References
708 [ASCII] American National Standards Institute (formerly United
709 States of America Standards Institute), "USA Code for
710 Information Interchange", ANSI X3.4-1968, 1968.
712 ANSI X3.4-1968 has been replaced by newer versions with
713 slight modifications, but the 1968 version remains
714 definitive for the Internet.
716 [IDNA2008-Rationale]
717 Klensin, J., Ed., "Internationalizing Domain Names for
718 Applications (IDNA): Issues, Explanation, and Rationale",
719 February 2009.
722 [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
723 "Dynamic Updates in the Domain Name System (DNS UPDATE)",
724 RFC 2136, April 1997.
726 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
727 Specification", RFC 2181, July 1997.
729 [RFC2535] Eastlake, D., "Domain Name System Security Extensions",
730 RFC 2535, March 1999.
732 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
733 RFC 2671, August 1999.
735 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
736 "Internationalizing Domain Names in Applications (IDNA)",
737 RFC 3490, March 2003.
739 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
740 Profile for Internationalized Domain Names (IDN)",
741 RFC 3491, March 2003.
743 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
744 Resource Identifier (URI): Generic Syntax", STD 66,
745 RFC 3986, January 2005.
747 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
748 Identifiers (IRIs)", RFC 3987, January 2005.
750 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
751 Recommendations for Internationalized Domain Names
752 (IDNs)", RFC 4690, September 2006.
754 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
755 Internationalized Email", RFC 4952, July 2007.
757 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
758 5.0", 2007.
760 Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0
762 Appendix A. Summary of Major Changes from IDNA2003
764 1. Update base character set from Unicode 3.2 to Unicode version-
765 agnostic.
767 2. Separate the definitions for the "registration" and "lookup"
768 activities.
770 3. Disallow symbol and punctuation characters except where special
771 exceptions are necessary.
773 4. Remove the mapping and normalization steps from the protocol and
774 have them instead done by the applications themselves, possibly
775 in a local fashion, before invoking the protocol.
777 5. Change the way that the protocol specifies which characters are
778 allowed in labels from "humans decide what the table of
779 codepoints contains" to "decisions about codepoints are based on
780 Unicode properties plus a small exclusion list created by
781 humans".
783 6. Introduce the new concept of characters that can be used only in
784 specific contexts.
786 7. Allow typical words and names in languages such as Dhivehi and
787 Yiddish to be expressed.
789 8. Make bidirectional domain names (delimited strings of labels,
790 not just labels standing on their own) display in a less
791 surprising fashion whether they appear in obvious domain name
792 contexts or as part of running text in paragraphs.
794 9. Remove the dot separator from the mandatory part of the
795 protocol.
797 10. Make some currently-valid labels that are not actually IDNA
798 labels invalid.
800 Appendix B. Change Log
802 [[anchor20: RFC Editor: Please remove this appendix.]]
804 B.1. Changes between Version -00 and -01 of draft-ietf-idnabis-protocol
806 o Corrected discussion of SRV records.
808 o Several small corrections for clarity.
810 o Inserted more "open issue" placeholders.
812 B.2. Version -02
814 o Rewrote the "conversion to Unicode" text in Section 5.2 as
815 requested on-list.
817 o Added a comment (and reference) about EDNS0 to the "DNS Server
818 Conventions" section, which was also retitled.
820 o Made several editorial corrections and improvements in response to
821 various comments.
823 o Added several new discussion placeholder anchors and updated some
824 older ones.
826 B.3. Version -03
828 o Trimmed change log, removing information about pre-WG drafts.
830 o Incorporated a number of changes suggested by Marcos Sanz in his
831 note of 2008.07.17 and added several more placeholder anchors.
833 o Several minor editorial corrections and improvements.
835 o "Editor" designation temporarily removed because the automatic
836 posting machinery does not accept it.
838 B.4. Version -04
840 o Removed Contextual Rule appendices for transfer to Tables.
842 o Several changes, including removal of discussion anchors, based on
843 discussions at IETF 72 (Dublin).
845 o Rewrote the preprocessing material (Section 5.3) somewhat.
847 B.5. Version -05
849 o Updated part of the A-label input explanation (Section 5.4) per
850 note from Erik van der Poel.
852 B.6. Version -06
854 o Corrected a few typographical errors.
856 o Incorporated the material (formerly in Rationale) on the
857 relationship between IDNA2003 and IDNA2008 as an appendix and
858 pointed to the new definitions document.
860 o Text modified in several places to recognize the dangers of
861 interaction between DNS wildcards and IDNs.
863 o Text added to be explicit about the handling of edge and failure
864 cases in Punycode encoding and decoding.
866 o Revised for consistency with the new Definitions document and to
867 make the text read more smoothly.
869 B.7. Version -07
871 o Multiple small textual and editorial changes and clarifications.
873 o Requirement for normalization clarified to apply to all cases and
874 conditions for preprocessing further clarified.
876 o Substantive change to Section 4.2.1, turning a SHOULD to a MUST
877 (see note from Mark Davis, 19 November, 2008 18:14 -0800).
879 B.8. Version -08
881 o Added some references and altered text to improve clarity.
883 o Changed the description of CONTEXTJ/CONTEXTO to conform to that in
884 Tables. In other words, these are now treated as distinct
885 categories (again), rather than as specially-flagged subsets of
886 PROTOCOL VALID.
888 o The discussion of label comparisons has been rewritten to make it
889 more precise and to clarify that one does not need to verify that
890 a string is a [valid] A-label or U-label in order to test it for
891 equality with another string. The WG should verify that the
892 current text is what is desired.
894 o Other changes to reflect post-IETF discussions or editorial
895 improvements.
897 B.9. Version -09
899 o Removed Security Considerations material to Defs document.
901 o Removed the Name Server Considerations material to Rationale.
902 That material is not normative and not needed to implement the
903 protocol itself.
905 o Adjusted terminology to match new version of Defs.
907 o Removed all discussion of local mapping and option for it from
908 registration protocol.
910 o Removed some old placeholders and inquiries because no comments
911 have been received.
913 o Small editorial corrections.
915 Author's Address
917 John C Klensin
918 1770 Massachusetts Ave, Ste 322
919 Cambridge, MA 02140
920 USA
922 Phone: +1 617 245 1457
923 Email: john+ietf@jck.com