idnits 2.17.1
draft-ietf-idnabis-protocol-10.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** The document seems to lack a License Notice according IETF Trust
Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
Section 6.b -- however, there's a paragraph with a matching beginning.
Boilerplate error?
(You're using the IETF Trust Provisions' Section 6.b License Notice from
12 Feb 2009 rather than one of the newer Notices. See
https://trustee.ietf.org/license-info/.)
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
-- The draft header indicates that this document obsoletes RFC3490, but the
abstract doesn't seem to mention this, which it should.
-- The draft header indicates that this document updates RFC3492, but the
abstract doesn't seem to mention this, which it should.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
(Using the creation date from RFC3492, updated by this document, for
RFC5378 checks: 2002-01-10)
-- The document seems to contain a disclaimer for pre-RFC5378 work, and may
have content which was first submitted before 10 November 2008. The
disclaimer is necessary when there are original authors that you have
been unable to contact, or if some do not wish to grant the BCP78 rights
to the IETF Trust. If you are able to get all authors (current and
original) to grant those rights, you can and should remove the
disclaimer; otherwise, the disclaimer is needed and you can ignore this
comment. (See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- The document date (March 5, 2009) is 5529 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Unused Reference: 'RFC1123' is defined on line 716, but no explicit
reference was found in the text
== Unused Reference: 'Unicode-PropertyValueAliases' is defined on line 726,
but no explicit reference was found in the text
== Unused Reference: 'Unicode-RegEx' is defined on line 731, but no
explicit reference was found in the text
== Unused Reference: 'Unicode-Scripts' is defined on line 736, but no
explicit reference was found in the text
== Unused Reference: 'ASCII' is defined on line 748, but no explicit
reference was found in the text
== Unused Reference: 'RFC2136' is defined on line 762, but no explicit
reference was found in the text
== Unused Reference: 'RFC2181' is defined on line 766, but no explicit
reference was found in the text
== Unused Reference: 'RFC2535' is defined on line 769, but no explicit
reference was found in the text
== Unused Reference: 'RFC2671' is defined on line 772, but no explicit
reference was found in the text
-- Possible downref: Non-RFC (?) normative reference: ref. 'IDNA2008-BIDI'
-- Possible downref: Non-RFC (?) normative reference: ref.
'Unicode-PropertyValueAliases'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-RegEx'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-Scripts'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-UAX15'
-- Obsolete informational reference (is this intentional?): RFC 2535
(Obsoleted by RFC 4033, RFC 4034, RFC 4035)
-- Obsolete informational reference (is this intentional?): RFC 2671
(Obsoleted by RFC 6891)
-- Obsolete informational reference (is this intentional?): RFC 3490
(Obsoleted by RFC 5890, RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 3491
(Obsoleted by RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 4952
(Obsoleted by RFC 6530)
Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 14 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group J. Klensin
3 Internet-Draft March 5, 2009
4 Obsoletes: 3490, 3491
5 (if approved)
6 Updates: 3492 (if approved)
7 Intended status: Standards Track
8 Expires: September 6, 2009
10 Internationalized Domain Names in Applications (IDNA): Protocol
11 draft-ietf-idnabis-protocol-10.txt
13 Status of this Memo
15 This Internet-Draft is submitted to IETF in full conformance with the
16 provisions of BCP 78 and BCP 79. This document may contain material
17 from IETF Documents or IETF Contributions published or made publicly
18 available before November 10, 2008. The person(s) controlling the
19 copyright in some of this material may not have granted the IETF
20 Trust the right to allow modifications of such material outside the
21 IETF Standards Process. Without obtaining an adequate license from
22 the person(s) controlling the copyright in such materials, this
23 document may not be modified outside the IETF Standards Process, and
24 derivative works of it may not be created outside the IETF Standards
25 Process, except to format it for publication as an RFC or to
26 translate it into languages other than English.
28 Internet-Drafts are working documents of the Internet Engineering
29 Task Force (IETF), its areas, and its working groups. Note that
30 other groups may also distribute working documents as Internet-
31 Drafts.
33 Internet-Drafts are draft documents valid for a maximum of six months
34 and may be updated, replaced, or obsoleted by other documents at any
35 time. It is inappropriate to use Internet-Drafts as reference
36 material or to cite them other than as "work in progress."
38 The list of current Internet-Drafts can be accessed at
39 http://www.ietf.org/ietf/1id-abstracts.txt.
41 The list of Internet-Draft Shadow Directories can be accessed at
42 http://www.ietf.org/shadow.html.
44 This Internet-Draft will expire on September 6, 2009.
46 Copyright Notice
48 Copyright (c) 2009 IETF Trust and the persons identified as the
49 document authors. All rights reserved.
51 This document is subject to BCP 78 and the IETF Trust's Legal
52 Provisions Relating to IETF Documents in effect on the date of
53 publication of this document (http://trustee.ietf.org/license-info).
54 Please review these documents carefully, as they describe your rights
55 and restrictions with respect to this document.
57 Abstract
59 This document supplies the protocol definition for a revised and
60 updated specification for internationalized domain names (IDNs). The
61 rationale for these changes, the relationship to the older
62 specification, and important terminology are provided in other
63 documents. This document specifies the protocol mechanism, called
64 Internationalizing Domain Names in Applications (IDNA), for
65 registering and looking up IDNs in a way that does not require
66 changes to the DNS itself. IDNA is only meant for processing domain
67 names, not free text.
69 Table of Contents
71 1. Introduction
72 1.1. Discussion Forum
73 2. Terminology
74 3. Requirements and Applicability
75 3.1. Requirements
76 3.2. Applicability
77 3.2.1. DNS Resource Records
78 3.2.2. Non-domain-name Data Types Stored in the DNS
79 4. Registration Protocol
80 4.1. Input to IDNA Registration Process
81 4.2. Permitted Character and Label Validation
82 4.2.1. Input Format
83 4.2.2. Rejection of Characters that are not Permitted
84 4.2.3. Label Validation
85 4.2.4. Registration Validation Summary
86 4.3. Registry Restrictions
87 4.4. Punycode Conversion
88 4.5. Insertion in the Zone
89 5. Domain Name Lookup Protocol
90 5.1. Label String Input
91 5.2. Conversion to Unicode
92 5.3. Character Changes in Preprocessing or the User
93 Interface
94 5.4. A-label Input
95 5.5. Validation and Character List Testing
96 5.6. Punycode Conversion
97 5.7. DNS Name Resolution
98 6. Security Considerations
99 7. IANA Considerations
100 8. Contributors
101 9. Acknowledgments
102 10. References
103 10.1. Normative References
104 10.2. Informative References
105 Appendix A. Local Mapping Alternatives
106 A.1. Transitional Mapping Model
107 A.2. Internationalized Resource Identifier (IRI) Mapping
108 Model
109 Appendix B. Summary of Major Changes from IDNA2003
110 Appendix C. Change Log
111 C.1. Changes between Version -00 and -01 of
112 draft-ietf-idnabis-protocol
113 C.2. Version -02
114 C.3. Version -03
115 C.4. Version -04
116 C.5. Version -05
117 C.6. Version -06
118 C.7. Version -07
119 C.8. Version -08
120 C.9. Version -09
121 C.10. Version -10
122 Author's Address
124 1. Introduction
126 This document supplies the protocol definition for a revised and
127 updated specification for internationalized domain names. Essential
128 definitions and terminology for understanding this document and a
129 road map of the collection of documents that make up IDNA2008 appear
130 in [IDNA2008-Defs]. Appendix B discusses the relationship between
131 this specification and the earlier version of IDNA (referred to here
132 as "IDNA2003"). The rationale for these changes, along with
133 considerable explanatory material and advice to zone administrators
134 who support IDNs, is provided in other documents, notably
135 [IDNA2008-Rationale].
137 IDNA works by allowing applications to use certain ASCII string
138 labels (beginning with a special prefix) to represent non-ASCII name
139 labels. Lower-layer protocols need not be aware of this; therefore
140 IDNA does not depend on changes to any infrastructure. In
141 particular, IDNA does not depend on any changes to DNS servers,
142 resolvers, or protocol elements, because the ASCII name service
143 provided by the existing DNS is entirely sufficient for IDNA.
145 IDNA is applied only to DNS labels. Standards for combining labels
146 into fully-qualified domain names and parsing labels out of those
147 names are covered in the base DNS standards [RFC1034] [RFC1035] and
148 their various updates. An application may, of course, apply locally-
149 appropriate conventions to the presentation forms of domain names as
150 discussed in [IDNA2008-Rationale].
152 While they share terminology, reference data, and some operations,
153 this document describes two separate protocols, one for IDN
154 registration (Section 4) and one for IDN lookup (Section 5).
156 1.1. Discussion Forum
158 [[anchor3: RFC Editor: please remove this section.]]
160 This work is being discussed in the IETF IDNABIS WG and on the
161 mailing list idna-update@alvestrand.no
163 2. Terminology
165 General terminology applicable to IDNA, but with meanings familiar to
166 those who have worked with Unicode or other character set standards
167 and the DNS, appears in [IDNA2008-Defs]. Terminology that is an
168 integral, normative part of the IDNA definition, including the
169 definitions of "ACE", appears in that document as well. Familiarity
170 with the terminology materials in that document is assumed for
171 reading this one. The reader of this document is assumed to be
172 familiar with DNS-specific terminology as defined in RFC 1034
173 [RFC1034].
175 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
176 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
177 document are to be interpreted as described in BCP 14, RFC 2119
178 [RFC2119].
180 3. Requirements and Applicability
182 3.1. Requirements
184 IDNA conformance means adherence to the following requirements:
186 1. Whenever a domain name is put into an IDN-unaware domain name
187 slot (see Section 2 and [IDNA2008-Defs]), it MUST contain only
188 ASCII characters (i.e., must be either an A-label or an NR-LDH-
189 label), or must be a label associated with a DNS application that
190 is not subject to either IDNA or the historical recommendations
191 for "hostname"-style names [RFC1034].
193 2. Comparison of labels MUST be done on equivalent forms: either
194 both A-label forms or both U-label forms. Because A-labels and
195 U-labels can be transformed into each other without loss of
196 information, these comparisons are equivalent. However, when a
197 pair of putative A-labels are compared, the comparison MUST use
198 an ASCII case-insensitive comparison (as with all comparisons of
199 ASCII DNS labels). Comparisons on putative U-labels must test
200 that the two strings are identical, without case-folding or other
201 intermediate steps. Note that it is not necessary to verify that
202 labels are valid in order to compare them. In many cases,
203 verification of validity (that the strings actually are A-labels
204 or U-labels) may be important for other reasons and SHOULD be
205 performed.
207 3. Labels being registered MUST conform to the requirements of
208 Section 4. Labels being looked up and the lookup process MUST
209 conform to the requirements of Section 5.
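
As a non-normative illustration of requirement 2, the following Python sketch compares two labels that are already in equivalent forms. The function names are hypothetical and not part of IDNA. The sketch assumes the caller knows which form both labels are in and, as the text above permits, does not verify that they are valid.

```python
def a_labels_equal(a: str, b: str) -> bool:
    """Compare two putative A-labels using ASCII case-insensitive matching."""
    # bytes.lower() folds only the ASCII letters A-Z, which is the
    # comparison used for all ASCII DNS labels. encode("ascii") raises
    # UnicodeEncodeError if either string is not pure ASCII.
    return a.encode("ascii").lower() == b.encode("ascii").lower()


def u_labels_equal(a: str, b: str) -> bool:
    """Compare two putative U-labels: identical strings, no case folding."""
    return a == b
```

Keeping the two comparisons in separate functions makes it harder to apply case folding to U-labels by accident.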
211 3.2. Applicability
213 IDNA is applicable to all domain names in all domain name slots
214 except where it is explicitly excluded. It is not applicable to
215 domain name slots which do not use the LDH syntax rules.
217 This implies that IDNA is applicable to many protocols that predate
218 IDNA. Note that IDNs occupying domain name slots in those older
219 protocols MUST be in A-label form until and unless those protocols
220 and implementations of them are upgraded to be IDN-aware. IDNs
221 actually appearing in DNS queries or responses MUST be A-labels.
223 3.2.1. DNS Resource Records
225 IDNA applies only to domain names in the NAME and RDATA fields of DNS
226 resource records whose CLASS is IN.
228 There are currently no other exclusions on the applicability of IDNA
229 to DNS resource records. Applicability depends entirely on the
230 CLASS, and not on the TYPE except as noted below. This will remain
231 true, even as new types are defined, unless there is a compelling
232 reason for a new type that requires type-specific rules. The special
233 naming conventions applicable to SRV records are examples of type-
234 specific rules that are incompatible with IDNA coding. Hence the
235 first two labels (the ones required to start in "_") on a record with
236 TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible
237 to write a non-ASCII string with a leading underscore, conversion to
238 an A-label would be impossible without loss of information because
239 the underscore is not a letter, digit, or hyphen and is consequently
240 DISALLOWED in IDNs). Of course, those labels may be part of a domain
241 that uses IDN labels at higher levels in the tree.
243 3.2.2. Non-domain-name Data Types Stored in the DNS
245 Although IDNA enables the representation of non-ASCII characters in
246 domain names, that does not imply that IDNA enables the
247 representation of non-ASCII characters in other data types that are
248 stored in domain names, specifically in the RDATA field for types
249 that have structured RDATA format. For example, an email address
250 local part is stored in a domain name in the RNAME field as part of
251 the RDATA of an SOA record (hostmaster@example.com would be
252 represented as hostmaster.example.com). IDNA specifically does not
253 update the existing email standards, which allow only ASCII
254 characters in local parts. Even though work is in progress to define
255 internationalization for email addresses [RFC4952], changes to the
256 email address part of the SOA RDATA would require action in, or
257 updates to, other standards, specifically those that specify the
258 format of the SOA RR.
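
The mailbox-to-domain-name mapping mentioned above can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and escaping dots in the local part with a backslash is the usual zone file presentation convention, assumed here rather than specified by this section.

```python
def mailbox_to_rname(address: str) -> str:
    """Map local-part@domain to the domain-name form used in SOA RNAME."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    # IDNA does not change the email standards: the local part stays ASCII.
    if not local.isascii():
        raise ValueError("local part must contain only ASCII characters")
    # Assumption: dots inside the local part are escaped so that they are
    # not mistaken for label separators (zone file presentation format).
    return local.replace(".", "\\.") + "." + domain
```

For example, mailbox_to_rname("hostmaster@example.com") yields "hostmaster.example.com", matching the example in the text.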
260 4. Registration Protocol
262 This section defines the procedure for registering an IDN. The
263 procedure is implementation independent; any sequence of steps that
264 produces exactly the same result for all labels is considered a valid
265 implementation.
267 Note that, while the registration and lookup protocols (Section 5)
268 are very similar in most respects, they are different and
269 implementers should carefully follow the steps they are implementing.
271 4.1. Input to IDNA Registration Process
273 [[anchor8: Note in Draft: This subsection is new in -09, based on
274 comments on the mailing list in January and February 2009. It
275 replaces the previous first two subsections of this section and
276 completely eliminates the discussion of local mapping for
277 registration.]]
279 Registration processes are outside the scope of these protocols and
280 may differ significantly depending on local needs. By the time a
281 string enters the IDNA registration process as described in this
282 specification, it is expected to be in Unicode and MUST be in Unicode
283 Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
284 zone files ("registries") are expected to accept only the exact
285 string for which registration is requested, free of any mappings or
286 local adjustments. They SHOULD avoid any possible ambiguity by
287 accepting registrations only for A-labels, possibly paired with the
288 relevant U-labels so that they can verify the correspondence.
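
A registry-side check for the NFC requirement might look like the following Python sketch. It is illustrative only; the function name is hypothetical, and the check reflects whatever Unicode version the Python runtime ships with (unicodedata.is_normalized requires Python 3.8 or later).

```python
import unicodedata


def require_nfc(label: str) -> str:
    """Return the label unchanged if it is in NFC, otherwise raise ValueError."""
    # The string is rejected rather than normalized, because the registry is
    # expected to accept only the exact string requested, free of mappings.
    if not unicodedata.is_normalized("NFC", label):
        raise ValueError("label is not in Unicode Normalization Form C")
    return label
```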
290 4.2. Permitted Character and Label Validation
292 4.2.1. Input Format
294 The registry MAY permit submission of labels in A-label form and is
295 encouraged to accept both the A-label form and the U-label one. If
296 it does so, it MUST perform a conversion to a U-label, perform the
297 steps and tests described below, and verify that the A-label produced
298 by the step in Section 4.4 matches the one provided as input. In
299 addition, if a U-label was provided, that U-label and the one
300 obtained by conversion of the A-label MUST match exactly. If, for
301 some reason, these tests fail, the registration MUST be rejected. If
302 the conversion to a U-label is not performed, the registry MUST still
303 verify that the A-label is superficially valid, i.e., that it does
304 not violate any of the rules of Punycode [RFC3492] encoding such as
305 the prohibition on trailing hyphen-minus, appearance of non-basic
306 characters before the delimiter, and so on. Fake A-labels, i.e.,
307 invalid strings that appear to be A-labels but are not, MUST NOT be
308 placed in DNS zones that support IDNA.
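
The round-trip verification described above can be sketched in Python using the standard library's "punycode" codec. This is a non-normative sketch: the function name is hypothetical, the character and label tests of Sections 4.2.2 and 4.2.3 are only marked by a comment, and the sketch assumes that lowercasing the submitted A-label before comparison is acceptable to the registry.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def check_a_label(a_label: str, u_label: Optional[str] = None) -> str:
    """Return the U-label for a_label; raise ValueError if the input is inconsistent."""
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("missing ACE prefix")
    try:
        decoded = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError as exc:
        raise ValueError("not valid Punycode") from exc
    # A string that decodes to pure ASCII (for example, one with a trailing
    # hyphen-minus) is a fake A-label.
    if decoded.isascii():
        raise ValueError("fake A-label: decodes to an all-ASCII string")
    # The tests of Sections 4.2.2 and 4.2.3 would be applied to `decoded` here.
    reencoded = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    if reencoded != lowered:
        raise ValueError("A-label does not survive the round trip")
    if u_label is not None and u_label != decoded:
        raise ValueError("supplied U-label does not match the A-label")
    return decoded
```

Any failure is reported as a ValueError so that a caller can reject the registration in one place.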
310 4.2.2. Rejection of Characters that are not Permitted
312 The candidate Unicode string is checked to verify that characters
313 that IDNA does not permit do not appear in it. Those characters are
314 identified in the "DISALLOWED" and "UNASSIGNED" lists that are
315 specified in [IDNA2008-Tables] and described informally in
316 [IDNA2008-Rationale]. Characters that are either DISALLOWED or
317 UNASSIGNED MUST NOT be part of labels to be processed for
318 registration in the DNS.
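
A minimal Python sketch of this per-character check follows. It is illustrative and makes two assumptions: unassigned code points are approximated by General_Category "Cn" in the Unicode version bundled with the Python runtime, and the DISALLOWED list is passed in as a hypothetical set of code points that a real implementation would derive from [IDNA2008-Tables].

```python
import unicodedata
from typing import FrozenSet, List


def find_unpermitted(label: str,
                     disallowed: FrozenSet[int] = frozenset()) -> List[int]:
    """Return the code points in label that are unassigned or disallowed."""
    bad = []
    for ch in label:
        # "Cn" marks code points with no assignment in this Unicode version.
        if unicodedata.category(ch) == "Cn" or ord(ch) in disallowed:
            bad.append(ord(ch))
    return bad
```

A label is acceptable at this step only when the returned list is empty.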
320 4.2.3. Label Validation
322 The proposed label (in the form of a Unicode string, i.e., a putative
323 U-label) is then examined, performing tests that require examination
324 of more than one character.
326 4.2.3.1. Rejection of Hyphen Sequences in U-labels
328 The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
329 the third and fourth character positions when the label is considered
330 in "on the wire" order.
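
As an illustration, this test reduces to a two-character comparison when the label is held as a string in "on the wire" order. The function name below is hypothetical.

```python
def has_reserved_hyphens(label: str) -> bool:
    """True if the third and fourth characters are both hyphen-minus (U+002D)."""
    # Slicing is safe for short labels: it simply yields a shorter string.
    return label[2:4] == "--"
```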
332 4.2.3.2. Leading Combining Marks
334 The first character of the string (when the label is considered in
335 "on the wire" order) is examined to verify that it is not a combining
336 mark (or combining character) (see The Unicode Standard, Section 2.11
337 [Unicode] for an exact definition). If it is a combining mark, the
338 string MUST NOT be registered.
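
A sketch of this test in Python is shown below. It assumes that "combining mark" corresponds to the Unicode General_Category values Mn, Mc, and Me as reported by the runtime's Unicode database; the function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character of the label is a combining mark."""
    # General_Category values beginning with "M" are Mn, Mc, and Me.
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```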
340 4.2.3.3. Contextual Rules
342 Each code point is checked for its identification as a character
343 requiring contextual processing for registration (the list of
344 characters appears as the combination of CONTEXTJ and CONTEXTO in
345 [IDNA2008-Tables] as do the contextual rules themselves). If that
346 indication appears, the table of contextual rules is checked for a
347 rule for that character. If no rule is found, the proposed label is
348 rejected and MUST NOT be installed in a zone file. If one is found,
349 it is applied (typically as a test on the entire label or on adjacent
350 characters within the label). If the application of the rule does
351 not conclude that the character is valid in context, the proposed
352 label MUST be rejected. (See the IANA Considerations: IDNA Context
353 Registry section of [IDNA2008-Tables].)
355 These contextual rules are required to support the use of characters
356 that could be used, under other conditions, to produce misleading
357 labels or to cause unacceptable ambiguity in label matching and
358 interpretation. For example, labels containing invisible ("zero-
359 width") characters may be permitted in context with characters whose
360 presentation forms are significantly changed by the presence or
361 absence of the zero-width characters, while other labels in which
362 zero-width characters appear may be rejected.
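
The lookup-then-apply procedure for contextual rules can be sketched as follows. Everything named here is a hypothetical stand-in: the set of code points and the rule table would come from [IDNA2008-Tables], and the single rule shown (ZERO WIDTH JOINER allowed only directly after a virama) is a simplified illustration rather than the normative rule text.

```python
import unicodedata
from typing import Callable, Dict

ContextRule = Callable[[str, int], bool]


def zwj_after_virama(label: str, pos: int) -> bool:
    """Illustrative rule: valid only if the preceding character is a virama."""
    # Canonical_Combining_Class 9 is the Virama class.
    return pos > 0 and unicodedata.combining(label[pos - 1]) == 9


# Hypothetical stand-ins for the CONTEXTJ/CONTEXTO lists and the rule table.
CONTEXT_CODE_POINTS = {0x200C, 0x200D}
CONTEXT_RULES: Dict[int, ContextRule] = {0x200D: zwj_after_virama}


def passes_contextual_rules(label: str) -> bool:
    """Apply the contextual rule, if any, to each character that requires one."""
    for pos, ch in enumerate(label):
        cp = ord(ch)
        if cp not in CONTEXT_CODE_POINTS:
            continue
        rule = CONTEXT_RULES.get(cp)
        # No rule found, or the rule does not accept the context: reject.
        if rule is None or not rule(label, pos):
            return False
    return True
```

Treating a missing rule as a failure mirrors the requirement that a label be rejected when no rule exists for a character that needs one.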
364 4.2.3.4. Labels Containing Characters Written Right to Left
366 Special tests are required for strings containing characters that are
367 normally written from right to left. The criteria for classifying
368 characters in terms of directionality are identified in the "Bidi"
369 document [IDNA2008-BIDI] in this series. That document also
370 describes conditions for strings that contain one or more of those
371 characters to be U-labels. The tests for those conditions, specified
372 there, are applied. Strings that contain right to left characters
373 but that do not conform to the IDNA Bidi rules MUST NOT be inserted
374 as labels in zone files.
376 4.2.4. Registration Validation Summary
378 Strings that contain at least one non-ASCII character, have been
379 produced by the steps above, whose contents pass all of the tests in
380 Section 4.2, and are 63 or fewer characters long in ACE form (see
381 Section 4.4), are U-labels.
383 To summarize, tests are made in Section 4.2 for invalid characters,
384 for invalid combinations of characters, for labels that are invalid even
385 if the characters they contain are valid individually, and for labels
386 that do not conform to the restrictions for strings containing right
387 to left characters.
389 4.3. Registry Restrictions
391 Registries at all levels of the DNS, not just the top level, are
392 expected to establish policies about the labels that may be
393 registered, and for the processes associated with that action. While
394 exact policies are not specified as part of IDNA2008 and it is
395 expected that different registries may specify different policies,
396 there SHOULD be policies. Even a trivial policy (e.g., "anything can
397 be registered in this zone that can be represented as an A-label -
398 U-label pair") has value because it provides notice to users and
399 applications implementers that the registry cannot be relied upon to
400 provide even minimal user-protection restrictions. These per-
401 registry policies and restrictions are an essential element of the
402 IDNA registration protocol even for registries (and corresponding
403 zone files) deep in the DNS hierarchy. As discussed in
404 [IDNA2008-Rationale], such restrictions have always existed in the
405 DNS. That document also contains a discussion and recommendations
406 about possible types of rules.
408 The string produced by the above steps is checked and processed as
409 appropriate to local registry restrictions. Application of those
410 registry restrictions may result in the rejection of some labels or
411 the application of special restrictions to others.
413 4.4. Punycode Conversion
415 The resulting U-label is converted to an A-label. The A-label, more
416 precisely defined elsewhere, is the encoding of the U-label according
417 to the Punycode algorithm [RFC3492] with the ACE prefix "xn--" added
418 at the beginning of the string. The resulting string must, of
419 course, conform to the length limits imposed by the DNS. This
420 document updates RFC 3492 only to the extent of replacing the
421 reference to the discussion of the ACE prefix. The ACE prefix is now
422 specified in this document rather than as part of RFC 3490 or
423 Nameprep [RFC3491] but is the same in both sets of documents.
425 The failure conditions identified in the Punycode encoding procedure
426 cannot occur if the input is a U-label as determined by the steps
427 above.
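
A minimal Python sketch of this conversion follows, using the standard library's "punycode" codec. The function name and constants are hypothetical. The sketch deliberately avoids Python's built-in "idna" codec, which implements the older IDNA2003 processing, and it assumes the input has already passed the tests of Section 4.2.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_a_label(u_label: str) -> str:
    """Encode a validated U-label as an A-label."""
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_LENGTH:
        raise ValueError("A-label exceeds the DNS label length limit")
    return a_label


print(to_a_label("b\u00fccher"))  # prints xn--bcher-kva
```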
429 4.5. Insertion in the Zone
431 The A-label is registered in the DNS by insertion into a zone.
433 5. Domain Name Lookup Protocol
435 Lookup is conceptually different from registration and different
436 tests are applied on the client. Although some validity checks are
437 necessary to avoid serious problems with the protocol (see
438 Section 5.5ff.), the lookup-side tests are more permissive and rely
439 on the assumption that names that are present in the DNS are valid.
440 That assumption is, however, a weak one because the presence of wild
441 cards in the DNS might cause a string that is not actually registered
442 in the DNS to be successfully looked up.
444 For convenience in description, we introduce an extra concept, a
445 "C-label", to describe a string that has the same appearance as an
446 A-label but that has been verified only to meet the somewhat more
447 flexible lookup requirements.
449 [[anchor14: Note in Draft: Try to reorganize and renumber Section 5
450 (Lookup) so that it exactly parallels Section 4 (Registration). This
451 has not been done in draft -10 because the task will be much easier if
452 the local mapping material is pulled from here (and there is no point
453 trying to align the section numbers twice).]]
455 5.1. Label String Input
457 The user supplies a string in the local character set, typically by
458 typing it or clicking on, or copying and pasting, a resource
459 identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
460 domain name is extracted. Alternatively, some process not directly
461 involving the user may read the string from a file or obtain it in
462 some other way. Processing in this step and the next two is a local
463 matter, to be accomplished prior to actual invocation of IDNA, but
464 at least the two steps in Section 5.2 and Section 5.3 must be
465 accomplished in some way.
467 5.2. Conversion to Unicode
469 The string is converted from the local character set into Unicode, if
470 it is not already Unicode. The exact nature of this conversion is
471 beyond the scope of this document, but may involve normalization
472 identical to that discussed in Section 4.1. The result MUST be a
473 Unicode string in NFC form.
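
As one possible illustration, a lookup application written in Python could perform this step as shown below. The function name is hypothetical, and the name of the local character set is assumed to be known to the caller.

```python
import unicodedata


def to_unicode_nfc(raw: bytes, local_encoding: str) -> str:
    """Decode input from the local character set and normalize it to NFC."""
    text = raw.decode(local_encoding)
    return unicodedata.normalize("NFC", text)
```

Unlike the registration side, where a non-NFC string is refused, a lookup application may normalize here because this step precedes the invocation of IDNA.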
475 5.3. Character Changes in Preprocessing or the User Interface
477 [[anchor15: Note in Draft -10. As of the time this draft was posted,
478 the WG was continuing to discuss various alternatives to this
479 section, which is pragmatic relative to various options and behavior
480 but seems to make no one happy from a predictability or
481 transition standpoint. Please see the (temporary) first appendix to
482 this document for a first cut at possible alternate formulations.]]
484 The Unicode string MAY then be processed to prevent confounding of
485 user expectations. For instance, it might be reasonable, at this
486 step, to convert all upper case characters to lower case, if this
487 makes sense in the user's environment, but even this should be
488 approached with caution due to some edge cases: in the long term, it
489 is probably better for users to understand IDNs strictly in lower-
490 case, U-label, form. More generally, preprocessing may be useful to
491 smooth the transition from IDNA2003, especially for direct user
492 input, but with similar cautions. In general, IDNs appearing in
493 files and those transmitted across the network as part of protocols
494 are expected to be in either ASCII form (including A-labels) or to
495 contain U-labels, rather than being in forms requiring mapping or
496 other conversions.
498 Other examples of processing for localization might be applied,
499 especially to direct user input, at this point. They include
500 interpreting various characters as separating domain name components
501 from each other (label separators) because they either look like
502 periods or are used to separate sentences, mapping halfwidth or
503 fullwidth East Asian characters to the common form permitted in
504 labels, or giving special treatment to characters whose presentation
505 forms are dependent only on placement in the label. Such
506 localization changes are also outside the scope of this
507 specification.
509 Recommendations for preprocessing for global contexts (i.e., when
510 local considerations do not apply or cannot be used) and for maximum
511 interoperability with labels that might have been specified under
512 liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. It
513 is important to note that the intent of these specifications is that
514 labels in application protocols, files, or links be
515 in U-label or A-label form. Preprocessing MUST NOT map a character
516 that is valid in a label as specified elsewhere in this document or
517 in [IDNA2008-Tables] into another character. Excessively liberal use
518 of preprocessing, especially to strings stored in files, poses a
519 threat to consistent and predictable behavior for the user even if
520 not to actual interoperability.
522 Because these transformations are local, it is important that domain
523 names that might be passed between systems (e.g., in IRIs) be
524 U-labels or A-labels and not forms that might be accepted locally as
525 a consequence of this step. This step is not standardized as part of
526 IDNA, and is not further specified here.
528 5.4. A-label Input
530 If the input to this procedure appears to be an A-label (i.e., it
531 starts with "xn--"), the lookup application MAY attempt to convert it
532 to a U-label and apply the tests of Section 5.5 and the conversion of
533 Section 5.6 to that form. If the label is converted to Unicode
534 (i.e., to U-label form) using the Punycode decoding algorithm, then
535 the processing specified in those two sections MUST be performed, and
536 the label MUST be rejected if the resulting label is not identical to
537 the original. See the Name Server Considerations section of
538 [IDNA2008-Rationale] for additional discussion on this topic.
540 That conversion and testing SHOULD be performed if the domain name
541 will later be presented to the user in native character form (this
542 requires that the lookup application be IDNA-aware). If those steps
543 are not performed, the lookup process SHOULD at least make tests to
544 determine that the string is actually an A-label, examining it for
545 the invalid formats specified in the Punycode decoding specification.
546 Applications that are not IDNA-aware will obviously omit that
547 testing; others MAY treat the string as opaque to avoid the
548 additional processing at the expense of providing less protection and
549 information to users.
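The decode, validate, re-encode, and compare sequence described in this section might be sketched as follows, using the Punycode codec from the Python standard library. This is a minimal sketch, not a complete implementation: the function name and the `validate` hook (a stand-in for the tests of Section 5.5) are hypothetical, and the comparison is an exact string comparison.

```python
ACE_PREFIX = "xn--"


def check_a_label(label, validate=lambda u_label: True):
    """Decode an apparent A-label, validate it, re-encode, and compare.

    `validate` is a hypothetical hook standing in for the Section 5.5 tests.
    Returns the apparent U-label; raises ValueError if the label is rejected.
    """
    if not label.startswith(ACE_PREFIX):
        raise ValueError("not an apparent A-label")
    try:
        # Punycode decoding; malformed input raises a UnicodeError,
        # which is a subclass of ValueError.
        u_label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError as exc:
        raise ValueError("invalid Punycode in label") from exc
    if not validate(u_label):
        raise ValueError("label failed validation")
    # Section 5.6 conversion, applied to the decoded form.
    round_trip = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if round_trip != label:
        raise ValueError("re-encoded label differs from the original")
    return u_label
```

An application that treats the string as opaque would skip this function entirely, at the cost noted above.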
551 5.5. Validation and Character List Testing
553 As with the registration procedure described in Section 4, the
554 Unicode string is checked to verify that all characters that appear
555 in it are valid as input to IDNA lookup processing. As discussed
556 above and in [IDNA2008-Rationale], the lookup check is more liberal
557 than the registration one. Putative labels with any of the following
558 characteristics MUST be rejected prior to DNS lookup:
560 o Labels containing code points that are unassigned in the version
561 of Unicode being used by the application, i.e., in the UNASSIGNED
562 category of [IDNA2008-Tables].
564 o Labels that are not in NFC form as defined in [Unicode-UAX15].
566 o Labels containing prohibited code points, i.e., those that are
567 assigned to the "DISALLOWED" category in the permitted character
568 table [IDNA2008-Tables].
570 o Labels containing code points that are identified in
571 [IDNA2008-Tables] as "CONTEXTJ", i.e., requiring exceptional
572 contextual rule processing on lookup, but that do not conform to
573 that rule. Note that this implies that a rule must be defined,
574 not null: a character that requires a contextual rule but for
575 which the rule is null is treated in this step as having failed to
576 conform to the rule.
578 o Labels containing code points that are identified in
579 [IDNA2008-Tables] as "CONTEXTO", but for which no such rule
580 appears in the table of rules. Applications resolving DNS names
581 or carrying out equivalent operations are not required to test
582 contextual rules for "CONTEXTO" characters, only to verify that a
583 rule is defined (although they MAY make such tests to give better
584 information to the user).
586 o Labels whose first character is a combining mark (see
587 Section 4.2.3.2).
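Three of the tests listed above (unassigned code points, NFC form, and a leading combining mark) can be approximated with the Python standard library alone, as in the sketch below. It is only a partial illustration under stated assumptions: the function name is hypothetical, the general category "Cn" is used as a rough stand-in for the UNASSIGNED category (it reflects the Unicode version bundled with the Python interpreter and also covers noncharacters), and the DISALLOWED, CONTEXTJ, and CONTEXTO tests are omitted because they require the data in [IDNA2008-Tables].

```python
import unicodedata


def lookup_rejection_reasons(label):
    """Return a list of reasons to reject a putative label (partial check)."""
    reasons = []
    # "Cn" approximates UNASSIGNED for the interpreter's Unicode version.
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        reasons.append("unassigned code point")
    # A label is in NFC form if normalizing it changes nothing.
    if unicodedata.normalize("NFC", label) != label:
        reasons.append("not in NFC form")
    # General categories Mn, Mc, and Me are the combining marks.
    if label and unicodedata.category(label[0]).startswith("M"):
        reasons.append("leading combining mark")
    return reasons
```

An empty list means only that none of these three tests failed, not that the label is acceptable for lookup.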
589 In addition, the application SHOULD apply the following test. The
590 test may be omitted in special circumstances, such as when the lookup
591 application knows that the conditions are enforced elsewhere, because
592 an attempt to look up and resolve such strings will almost certainly
593 lead to a DNS lookup failure except when wildcards are present in the
594 zone. However, applying the test is likely to give much better
595 information about the reason for a lookup failure -- information that
596 may be usefully passed to the user when that is feasible -- than DNS
597 resolution failure information alone. In any event, lookup
598 applications should avoid attempting to resolve labels that are
599 invalid under that test.
601 o Verification that the string is compliant with the requirements
602 for right to left characters, specified in [IDNA2008-BIDI].
604 For all other strings, the lookup application MUST rely on the
605 presence or absence of labels in the DNS to determine the validity of
606 those labels and the validity of the characters they contain. If
607 they are registered, they are presumed to be valid; if they are not,
608 their possible validity is not relevant. A lookup application that
609 declines to process a string that conforms to the rules above and
610 does not look it up in the DNS is not in conformance with this
611 protocol.
613 5.6. Punycode Conversion
615 The validated string, an apparent U-label, is converted to an
616 apparent A-label using the Punycode algorithm with the ACE prefix
617 added. These label forms are "apparent" U-labels and A-labels
618 because not all of the tests used in the Registration procedure
619 (Section 4) to effectively define those terms precisely are applied
620 in this lookup procedure.
621 [[anchor16: Note in Draft: As of -10, we are back to "apparent" (or
622 "putative" if the WG prefers) label forms. The previous text
623 asserted that these strings were A-labels and U-labels, which was
624 clearly wrong, since those terms are defined in terms of complete
625 validity and all of the registration tests. Mark suggested an
626 alternative, which was to introduce a new term, C-label, which was a
627 superset of A-labels but with fewer test conditions. I like the
628 idea, but could not figure out how to make it work without also
629 introducing a near-U-label term, and that started to become much too
630 terminology heavy to be followed easily. Suggestions of ways out of
631 this, preferably with specific text for this document and Defs, would
632 be welcome.]]
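The conversion described in this section can be sketched with the Punycode codec in the Python standard library. The function name is hypothetical, and passing all-ASCII labels through unchanged is an assumption of the example, since such labels need no conversion.

```python
ACE_PREFIX = "xn--"


def to_apparent_a_label(u_label):
    """Convert a validated apparent U-label to an apparent A-label."""
    if u_label.isascii():
        # Assumption: all-ASCII labels are looked up as they are.
        return u_label
    # Punycode-encode the label and prepend the ACE prefix.
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")
```

For example, the label "bücher" would be converted to "xn--bcher-kva".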
634 5.7. DNS Name Resolution
636 The resulting string (the apparent A-label) is looked up in the DNS,
637 using normal DNS resolver procedures.
639 6. Security Considerations
641 Security Considerations for this version of IDNA, except for the
642 special issues associated with right to left scripts and characters,
643 are described in [IDNA2008-Defs]. Specific issues for labels
644 containing characters associated with scripts written right to left
645 appear in [IDNA2008-BIDI].
647 7. IANA Considerations
649 IANA actions for this version of IDNA are specified in
650 [IDNA2008-Tables] and discussed informally in [IDNA2008-Rationale].
651 The components of IDNA described in this document do not require any
652 IANA actions.
654 8. Contributors
656 While the listed editor held the pen, the original versions of this
657 document represent the joint work and conclusions of an ad hoc design
658 team consisting of the editor and, in alphabetic order, Harald
659 Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
660 draws significantly on the original version of IDNA [RFC3490] both
661 conceptually and for specific text. This second-generation version
662 would not have been possible without the work that went into that
663 first version and its authors, Patrik Faltstrom, Paul Hoffman, and
664 Adam Costello. While Faltstrom was actively involved in the creation
665 of this version, Hoffman and Costello were not and should not be held
666 responsible for any errors or omissions.
668 9. Acknowledgments
670 This revision to IDNA would have been impossible without the
671 accumulated experience since RFC 3490 was published and resulting
672 comments and complaints of many people in the IETF, ICANN, and other
673 communities, too many people to list here. Nor would it have been
674 possible without RFC 3490 itself and the efforts of the Working Group
675 that defined it. Those people whose contributions are acknowledged
676 in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly
677 important.
679 Specific textual changes were incorporated into this document after
680 suggestions from the other contributors, Stephane Bortzmeyer, Vint
681 Cerf, Mark Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel,
682 Marcos Sanz, Andrew Sullivan, Ken Whistler, and other WG
683 participants. Special thanks are due to Paul Hoffman for permission
684 to extract material from his Internet-Draft to form the basis for
685 Appendix B.
687 10. References
688 10.1. Normative References
690 [IDNA2008-BIDI]
691 Alvestrand, H. and C. Karp, "An updated IDNA criterion for
692 right-to-left scripts", July 2008.
695 [IDNA2008-Defs]
696 Klensin, J., "Internationalized Domain Names for
697 Applications (IDNA): Definitions and Document Framework",
698 February 2009.
701 [IDNA2008-Tables]
702 Faltstrom, P., "The Unicode Codepoints and IDNA",
703 July 2008.
706 A version of this document is available in HTML format at
707 http://stupid.domain.name/idnabis/
708 draft-ietf-idnabis-tables-02.html
710 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
711 STD 13, RFC 1034, November 1987.
713 [RFC1035] Mockapetris, P., "Domain names - implementation and
714 specification", STD 13, RFC 1035, November 1987.
716 [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
717 and Support", STD 3, RFC 1123, October 1989.
719 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
720 Requirement Levels", BCP 14, RFC 2119, March 1997.
722 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
723 for Internationalized Domain Names in Applications
724 (IDNA)", RFC 3492, March 2003.
726 [Unicode-PropertyValueAliases]
727 The Unicode Consortium, "Unicode Character Database:
728 PropertyValueAliases", March 2008.
731 [Unicode-RegEx]
732 The Unicode Consortium, "Unicode Technical Standard #18:
733 Unicode Regular Expressions", May 2005.
736 [Unicode-Scripts]
737 The Unicode Consortium, "Unicode Standard Annex #24:
738 Unicode Script Property", February 2008.
741 [Unicode-UAX15]
742 The Unicode Consortium, "Unicode Standard Annex #15:
743 Unicode Normalization Forms", 2006.
746 10.2. Informative References
748 [ASCII] American National Standards Institute (formerly United
749 States of America Standards Institute), "USA Code for
750 Information Interchange", ANSI X3.4-1968, 1968.
752 ANSI X3.4-1968 has been replaced by newer versions with
753 slight modifications, but the 1968 version remains
754 definitive for the Internet.
756 [IDNA2008-Rationale]
757 Klensin, J., Ed., "Internationalizing Domain Names for
758 Applications (IDNA): Issues, Explanation, and Rationale",
759 February 2009.
762 [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
763 "Dynamic Updates in the Domain Name System (DNS UPDATE)",
764 RFC 2136, April 1997.
766 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
767 Specification", RFC 2181, July 1997.
769 [RFC2535] Eastlake, D., "Domain Name System Security Extensions",
770 RFC 2535, March 1999.
772 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
773 RFC 2671, August 1999.
775 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
776 "Internationalizing Domain Names in Applications (IDNA)",
777 RFC 3490, March 2003.
779 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
780 Profile for Internationalized Domain Names (IDN)",
781 RFC 3491, March 2003.
783 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
784 Resource Identifier (URI): Generic Syntax", STD 66,
785 RFC 3986, January 2005.
787 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
788 Identifiers (IRIs)", RFC 3987, January 2005.
790 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
791 Recommendations for Internationalized Domain Names
792 (IDNs)", RFC 4690, September 2006.
794 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
795 Internationalized Email", RFC 4952, July 2007.
797 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
798 5.0", 2007.
800 Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0
802 Appendix A. Local Mapping Alternatives
804 The subsections of this appendix are temporary and represent
805 different sketches of possible replacements for Section 5.3. They do
806 not represent an assertion of WG consensus or any assertion about the
807 possibility of including one of them as part of the WG's work
808 program. Instead, they are supplied only for purposes of comparison,
809 discussion, and, should it be relevant, refinement.
811 The first paragraph of each subsection describes how the material
812 would be placed relative to the existing main document text.
813 Subsequent paragraphs are the actual suggestions, although in
814 incomplete sketch form.
816 A.1. Transitional Mapping Model
818 If this subsection were adopted, Section 5.3 would be deleted and
819 this one would be inserted after, or integrated with, Section 5.7.
821 This specification does not support the extensive mappings from one
822 character to another, including Unicode Case Folding and
823 Compatibility Character mapping, of IDNA2003. It also changes the
824 interpretations of a small number of characters relative to IDNA2003.
825 Most applications, especially those with which IDNs have been used
826 for some time, will need to maintain reasonable compatibility with
827 files created under IDNA2003 and user interfaces designed for it.
828 This section specifies additional steps to be taken to provide
829 maximum IDNA2003 compatibility.
831 If an application requires IDNA2003 backward compatibility, it MUST
832 execute one of the two bulleted steps below.
834 o If the resolution attempt in Section 5.7 fails, the apparent
835 U-label is processed through the ToASCII operation specified in
836 IDNA2003 [RFC3490] and, if the two apparent A-labels are not
837 identical, the result is looked up. If it is found, the relevant
838 values are handled as if the resolution attempt in Section 5.7 had
839 succeeded with that value. If the resolution attempt in
840 Section 5.7 is successful, this step simply produces that value.
842 o Once the resolution attempt in Section 5.7 is completed, the
843 apparent U-label is processed through the ToASCII operation
844 specified in IDNA2003 [RFC3490]. The two apparent A-labels are
845 compared to each other. If they are not identical, the second one
846 is looked up as well. If one of the two lookups is successful and
847 the other is not, that value is used as the result of the lookup.
848 If both are successful, the user is presented with a choice. If
849 neither is successful, the IDNA lookup fails.
851 Note that, if both interpretations of the name return values, the
852 lookup application has no practical way to tell whether the relevant
853 registry has applied "variant" or "bundling" techniques to ensure
854 that both domain names are under the same control or not. From that
855 perspective, the first of these approaches assumes that has been done
856 (if the IDNA2003-interpretation label is present at all) while the
857 second assumes that such bundling is unlikely to have occurred.
858 [[anchor24: Note in Draft: If this appendix is used, RFC3490 must be
859 moved from Informative to Normative.]]
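The two bulleted strategies in this subsection might be sketched as follows. This is an illustration under stated assumptions rather than a proposed implementation: `resolve` is a hypothetical callable that returns a value for a name or None if the lookup fails, the function names are invented for the example, and the IDNA2003 form is produced with the `ToASCII` function from Python's `encodings.idna` module, which implements RFC 3490 and may raise an error for input that Nameprep prohibits. The label "faß" is used in the accompanying note because its interpretation differs between the two forms ("fass" versus "xn--fa-hia").

```python
from encodings.idna import ToASCII  # IDNA2003 (RFC 3490) ToASCII


def a_label_2008(u_label):
    """Apparent A-label as produced by Section 5.6 (no mapping)."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def a_label_2003(u_label):
    """Label as produced by the IDNA2003 ToASCII operation."""
    return ToASCII(u_label).decode("ascii")


def lookup_fallback(u_label, resolve):
    """First bullet: use the IDNA2003 form only if the first lookup fails."""
    new, old = a_label_2008(u_label), a_label_2003(u_label)
    result = resolve(new)
    if result is None and old != new:
        result = resolve(old)
    return result


def lookup_both(u_label, resolve):
    """Second bullet: look up both forms and return every successful result."""
    new, old = a_label_2008(u_label), a_label_2003(u_label)
    found = {new: resolve(new)}
    if old != new:
        found[old] = resolve(old)
    return {name: value for name, value in found.items() if value is not None}
```

With `lookup_both`, an empty result corresponds to a failed IDNA lookup, a single entry is used directly, and two entries correspond to the case in which the user is presented with a choice.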
861 A.2. Internationalized Resource Identifier (IRI) Mapping Model
863 This subsection is intended to be descriptive of an approach that
864 lies outside IDNA, rather than a normative component of it. If it
865 were adopted, Section 5.3 would be deleted and the material below
866 would be referenced, either as a non-normative Appendix in Protocol
867 or, more reasonably, as a section of Rationale.
869 IDNA2003 supported extensive mappings from one character to another,
870 including Unicode Case Folding and Compatibility Character mapping.
871 Those mappings are no longer supported on registration and are
872 inconsistent with the "exact match" lookups that people expect from
873 the DNS. Some mapping should still be supported, both for
874 compatibility with applications that assume IDNA2003 and to avoid
875 confounding user expectations. The specific mappings involved are
876 not part of IDNA, but are expected to be specified as part of a
877 revision to the IRI specification [RFC3987] and the conversion from
878 IRI form to URI form. That change leaves mapping unspecified and
879 prohibited for actual domain names; in practice, however, most domain
880 names, especially in the web applications that appear to have been
881 most important for IDNs between the publication of IDNA2003 and the
882 release of this specification, are not interpreted as themselves but
883 as abbreviated forms of URIs or IRIs and hence subject to the
884 transformation rules of the latter.
886 Appendix B. Summary of Major Changes from IDNA2003
888 1. Update base character set from Unicode 3.2 to Unicode version-
889 agnostic.
891 2. Separate the definitions for the "registration" and "lookup"
892 activities.
894 3. Disallow symbol and punctuation characters except where special
895 exceptions are necessary.
897 4. Remove the mapping and normalization steps from the protocol and
898 have them instead done by the applications themselves, possibly
899 in a local fashion, before invoking the protocol.
901 5. Change the way that the protocol specifies which characters are
902 allowed in labels from "humans decide what the table of
903 codepoints contains" to "decisions about codepoints are based on
904 Unicode properties plus a small exclusion list created by
905 humans".
907 6. Introduce the new concept of characters that can be used only in
908 specific contexts.
910 7. Allow typical words and names in languages such as Dhivehi and
911 Yiddish to be expressed.
913 8. Make bidirectional domain names (delimited strings of labels,
914 not just labels standing on their own) display in a less
915 surprising fashion whether they appear in obvious domain name
916 contexts or as part of running text in paragraphs.
918 9. Remove the dot separator from the mandatory part of the
919 protocol.
921 10. Make some currently-valid labels that are not actually IDNA
922 labels invalid.
924 Appendix C. Change Log
926 [[anchor27: RFC Editor: Please remove this appendix.]]
928 C.1. Changes between Version -00 and -01 of draft-ietf-idnabis-protocol
930 o Corrected discussion of SRV records.
932 o Several small corrections for clarity.
934 o Inserted more "open issue" placeholders.
936 C.2. Version -02
938 o Rewrote the "conversion to Unicode" text in Section 5.2 as
939 requested on-list.
941 o Added a comment (and reference) about EDNS0 to the "DNS Server
942 Conventions" section, which was also retitled.
944 o Made several editorial corrections and improvements in response to
945 various comments.
947 o Added several new discussion placeholder anchors and updated some
948 older ones.
950 C.3. Version -03
952 o Trimmed change log, removing information about pre-WG drafts.
954 o Incorporated a number of changes suggested by Marcos Sanz in his
955 note of 2008.07.17 and added several more placeholder anchors.
957 o Several minor editorial corrections and improvements.
959 o "Editor" designation temporarily removed because the automatic
960 posting machinery does not accept it.
962 C.4. Version -04
964 o Removed Contextual Rule appendices for transfer to Tables.
966 o Several changes, including removal of discussion anchors, based on
967 discussions at IETF 72 (Dublin).
969 o Rewrote the preprocessing material (Section 5.3) somewhat.
971 C.5. Version -05
973 o Updated part of the A-label input explanation (Section 5.4) per
974 note from Erik van der Poel.
976 C.6. Version -06
978 o Corrected a few typographical errors.
980 o Incorporated the material (formerly in Rationale) on the
981 relationship between IDNA2003 and IDNA2008 as an appendix and
982 pointed to the new definitions document.
984 o Text modified in several places to recognize the dangers of
985 interaction between DNS wildcards and IDNs.
987 o Text added to be explicit about the handling of edge and failure
988 cases in Punycode encoding and decoding.
990 o Revised for consistency with the new Definitions document and to
991 make the text read more smoothly.
993 C.7. Version -07
995 o Multiple small textual and editorial changes and clarifications.
997 o Requirement for normalization clarified to apply to all cases and
998 conditions for preprocessing further clarified.
1000 o Substantive change to Section 4.2.1, turning a SHOULD to a MUST
1001 (see note from Mark Davis, 19 November, 2008 18:14 -0800).
1003 C.8. Version -08
1005 o Added some references and altered text to improve clarity.
1007 o Changed the description of CONTEXTJ/CONTEXTO to conform to that in
1008 Tables. In other words, these are now treated as distinct
1009 categories (again), rather than as specially-flagged subsets of
1010 PROTOCOL VALID.
1012 o The discussion of label comparisons has been rewritten to make it
1013 more precise and to clarify that one does not need to verify that
1014 a string is a [valid] A-label or U-label in order to test it for
1015 equality with another string. The WG should verify that the
1016 current text is what is desired.
1018 o Other changes to reflect post-IETF discussions or editorial
1019 improvements.
1021 C.9. Version -09
1023 o Removed Security Considerations material to Defs document.
1025 o Removed the Name Server Considerations material to Rationale.
1026 That material is not normative and not needed to implement the
1027 protocol itself.
1029 o Adjusted terminology to match new version of Defs.
1031 o Removed all discussion of local mapping and option for it from
1032 registration protocol.
1034 o Removed some old placeholders and inquiries because no comments
1035 have been received.
1037 o Small editorial corrections.
1039 C.10. Version -10
1041 o Rewrote the registration input material slightly to further
1042 clarify the "no mapping on registration" principle.
1044 o Added placeholder notes about several tasks, notably reorganizing
1045 Section 4 and Section 5 so that subsection numbers are parallel.
1047 o Cleaned up an incorrect use of the terms "A-label" and "U-label"
1048 in the lookup phase that was spotted by Mark Davis. Inserted a
1049 note there about alternate ways to deal with the resulting
1050 terminology problem.
1052 o Added a temporary appendix (above) to document alternate
1053 strategies for possible replacements for Section 5.3.
1055 Author's Address
1057 John C Klensin
1058 1770 Massachusetts Ave, Ste 322
1059 Cambridge, MA 02140
1060 USA
1062 Phone: +1 617 245 1457
1063 Email: john+ietf@jck.com