idnits 2.17.1
draft-ietf-idnabis-protocol-06.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** It looks like you're using RFC 3978 boilerplate. You should update this
to the boilerplate described in the IETF Trust License Policy document
(see https://trustee.ietf.org/license-info), which is required now.
-- Found old boilerplate from RFC 3978, Section 5.1 on line 18.
-- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
line 965.
-- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 976.
-- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 983.
-- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 989.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
-- The draft header indicates that this document obsoletes RFC3490, but the
abstract doesn't seem to mention this, which it should.
-- The draft header indicates that this document updates RFC3492, but the
abstract doesn't seem to mention this, which it should.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust Copyright Line does not match the
current year
(Using the creation date from RFC3492, updated by this document, for
RFC5378 checks: 2002-01-10)
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
have content which was first submitted before 10 November 2008. If you
have contacted all the original authors and they are all willing to grant
the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
this comment. If not, you may need to add the pre-RFC5378 disclaimer.
(See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- The document date (November 2, 2008) is 5654 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Unused Reference: 'Unicode-PropertyValueAliases' is defined on line 758,
but no explicit reference was found in the text
== Unused Reference: 'Unicode-RegEx' is defined on line 763, but no
explicit reference was found in the text
== Unused Reference: 'Unicode-Scripts' is defined on line 768, but no
explicit reference was found in the text
== Unused Reference: 'ASCII' is defined on line 780, but no explicit
reference was found in the text
== Unused Reference: 'Unicode' is defined on line 829, but no explicit
reference was found in the text
-- Possible downref: Non-RFC (?) normative reference: ref. 'IDNA2008-BIDI'
-- Possible downref: Non-RFC (?) normative reference: ref.
'Unicode-PropertyValueAliases'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-RegEx'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-Scripts'
-- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-UAX15'
-- Obsolete informational reference (is this intentional?): RFC 2535
(Obsoleted by RFC 4033, RFC 4034, RFC 4035)
-- Obsolete informational reference (is this intentional?): RFC 2671
(Obsoleted by RFC 6891)
-- Obsolete informational reference (is this intentional?): RFC 3490
(Obsoleted by RFC 5890, RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 3491
(Obsoleted by RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 4952
(Obsoleted by RFC 6530)
Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 19 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group J. Klensin
3 Internet-Draft November 2, 2008
4 Obsoletes: 3490, 3491
5 (if approved)
6 Updates: 3492 (if approved)
7 Intended status: Standards Track
8 Expires: May 6, 2009
10 Internationalized Domain Names in Applications (IDNA): Protocol
11 draft-ietf-idnabis-protocol-06.txt
13 Status of this Memo
15 By submitting this Internet-Draft, each author represents that any
16 applicable patent or other IPR claims of which he or she is aware
17 have been or will be disclosed, and any of which he or she becomes
18 aware will be disclosed, in accordance with Section 6 of BCP 79.
20 Internet-Drafts are working documents of the Internet Engineering
21 Task Force (IETF), its areas, and its working groups. Note that
22 other groups may also distribute working documents as Internet-
23 Drafts.
25 Internet-Drafts are draft documents valid for a maximum of six months
26 and may be updated, replaced, or obsoleted by other documents at any
27 time. It is inappropriate to use Internet-Drafts as reference
28 material or to cite them other than as "work in progress."
30 The list of current Internet-Drafts can be accessed at
31 http://www.ietf.org/ietf/1id-abstracts.txt.
33 The list of Internet-Draft Shadow Directories can be accessed at
34 http://www.ietf.org/shadow.html.
36 This Internet-Draft will expire on May 6, 2009.
38 Abstract
40 This document supplies the protocol definition for a revised and
41 updated specification for internationalized domain names (IDNs). The
42 rationale for these changes, the relationship to the older
43 specification, and important terminology are provided in other
44 documents. This document specifies the protocol mechanism, called
45 Internationalizing Domain Names in Applications (IDNA), for
46 registering and looking up IDNs in a way that does not require
47 changes to the DNS itself. IDNA is only meant for processing domain
48 names, not free text.
50 Table of Contents
52 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
53 1.1. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4
54 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5
55 3. Requirements and Applicability . . . . . . . . . . . . . . . . 5
56 3.1. Requirements . . . . . . . . . . . . . . . . . . . . . . . 5
57 3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 5
58 3.2.1. DNS Resource Records . . . . . . . . . . . . . . . . . 6
59 3.2.2. Non-domain-name Data Types Stored in the DNS . . . . . 6
60 4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 6
61 4.1. Proposed label . . . . . . . . . . . . . . . . . . . . . . 7
62 4.2. Conversion to Unicode and Normalization . . . . . . . . . 7
63 4.3. Permitted Character and Label Validation . . . . . . . . . 8
64 4.3.1. Rejection of Characters that are not Permitted . . . . 8
65 4.3.2. Label Validation . . . . . . . . . . . . . . . . . . . 8
66 4.3.3. Registration Validation Summary . . . . . . . . . . . 9
67 4.4. Registry Restrictions . . . . . . . . . . . . . . . . . . 9
68 4.5. Punycode Conversion . . . . . . . . . . . . . . . . . . . 10
69 4.6. Insertion in the Zone . . . . . . . . . . . . . . . . . . 10
70 5. Domain Name Lookup Protocol . . . . . . . . . . . . . . . . . 10
71 5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 10
72 5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 10
73 5.3. Character Changes in Preprocessing or the User
74 Interface . . . . . . . . . . . . . . . . . . . . . . . . 11
75 5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 12
76 5.5. Validation and Character List Testing . . . . . . . . . . 12
77 5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 13
78 5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 13
79 6. Name Server Considerations . . . . . . . . . . . . . . . . . . 13
80 6.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 14
81 6.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 14
82 6.3. Root and other DNS Server Considerations . . . . . . . . . 15
83 7. Security Considerations . . . . . . . . . . . . . . . . . . . 15
84 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
85 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 16
86 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16
87 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
88 11.1. Normative References . . . . . . . . . . . . . . . . . . . 17
89 11.2. Informative References . . . . . . . . . . . . . . . . . . 18
90 Appendix A. Summary of Major Changes from IDNA2003 . . . . . . . 19
91 Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 20
92 B.1. Changes between Version -00 and -01 of
93 draft-ietf-idnabis-protocol . . . . . . . . . . . . . . . 20
94 B.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 20
95 B.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 20
96 B.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 21
97 B.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 21
98 B.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 21
99 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 21
100 Intellectual Property and Copyright Statements . . . . . . . . . . 22
102 1. Introduction
104 This document supplies the protocol definition for a revised and
105 updated specification for internationalized domain names. Essential
106 definitions and terminology for understanding this document and a
107 road map of the collection of documents that make up IDNA2008 appear
108 in [IDNA2008-Defs]. Appendix A discusses the relationship between
109 this specification and the earlier version of IDNA (referred to here
110 as "IDNA2003"). The rationale for these changes, along with
111 considerable explanatory material and advice to zone administrators
112 who support IDNs, is provided in other documents, notably
113 [IDNA2008-Rationale].
115 IDNA works by allowing applications to use certain ASCII string
116 labels (beginning with a special prefix) to represent non-ASCII name
117 labels. Lower-layer protocols need not be aware of this; therefore
118 IDNA does not depend on changes to any infrastructure. In
119 particular, IDNA does not depend on any changes to DNS servers,
120 resolvers, or protocol elements, because the ASCII name service
121 provided by the existing DNS is entirely sufficient for IDNA.
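As an informal illustration of the paragraph above, the following Python sketch shows that a name carrying an IDN label is plain ASCII on the wire, and recovers the non-ASCII label it stands for. The function name native_form is hypothetical, the sketch relies on the "punycode" codec in the Python standard library, and it performs no IDNA validation of any kind.

```python
ACE_PREFIX = "xn--"  # the ACE prefix named in Section 4.5


def native_form(label: str) -> str:
    """Return the non-ASCII label that an ACE-prefixed ASCII label stands for.

    Labels without the prefix are returned unchanged. This only reverses
    the Punycode encoding; it is not a validity check.
    """
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


name = "xn--bcher-kva.example"
# What resolvers and servers see is ordinary ASCII.
assert name.isascii()
# What an IDNA-aware application may show to the user.
assert [native_form(l) for l in name.split(".")] == ["bücher", "example"]
```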
123 IDNA is applied only to DNS labels. Standards for combining labels
124 into fully-qualified domain names and parsing labels out of those
125 names are covered in the base DNS standards [RFC1034] [RFC1035] and
126 their various updates. An application may, of course, apply locally-
127 appropriate conventions to the presentation forms of domain names as
128 discussed in [IDNA2008-Rationale].
130 While they share terminology, reference data, and some operations,
131 this document describes two separate protocols, one for IDN
132 registration (Section 4) and one for IDN lookup (Section 5).
134 A good deal of the background material that appeared in RFC 3490
135 [RFC3490] has been removed from this update. That material is either
136 of historical interest only or has been covered from a more recent
137 perspective in RFC 4690 [RFC4690] and [IDNA2008-Rationale].
138 [[anchor2: This paragraph is not normative and not required to
139 understand this spec. It will be removed in version -07 unless
140 someone provides a convincing rationale for retaining it.]]
142 1.1. Discussion Forum
144 [[anchor4: RFC Editor: please remove this section.]]
146 This work is being discussed in the IETF IDNABIS WG and on the
147 mailing list idna-update@alvestrand.no
149 2. Terminology
151 General terminology applicable to IDNA, but with meanings familiar to
152 those who have worked with Unicode or other character set standards
153 and the DNS, appears in [IDNA2008-Defs]. Terminology that is an
154 integral, normative, part of the IDNA definition, including the
155 definitions of "ACE", appears in that document as well. Familiarity
156 with the terminology materials in that document is assumed for
157 reading this one. The reader of this document is assumed to be
158 familiar with DNS-specific terminology as defined in RFC 1034
159 [RFC1034].
161 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
162 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
163 document are to be interpreted as described in BCP 14, RFC 2119
164 [RFC2119].
166 3. Requirements and Applicability
168 3.1. Requirements
170 IDNA conformance means adherence to the following requirements:
172 1. Whenever a domain name is put into an IDN-unaware domain name
173 slot (see Section 2 and [IDNA2008-Rationale]), it MUST contain
174 only ASCII characters (i.e., must be either an A-label or an LDH-
175 label), or must be a label associated with a DNS application that
176 is not subject to either IDNA or the historical recommendations
177 for "hostname"-style names [RFC1034].
179 2. Comparison of labels SHOULD be done on the A-label form, using an
180 ASCII case-insensitive comparison as with all comparisons of DNS
181 labels. Because A-labels and U-labels can be transformed into
182 each other without loss of information, comparison of native
183 character labels is possible if the application first carefully
184 verifies that the strings are U-labels.
186 3. Labels being registered MUST conform to the requirements of
187 Section 4. Labels being looked up and the lookup process MUST
188 conform to the requirements of Section 5.
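Requirement 2 might be sketched as follows. This is an illustration only: the helper names are hypothetical, and the code assumes that any non-ASCII input has already been verified to be a U-label, which the sketch itself does not do.

```python
def to_a_label(label: str) -> str:
    """Return the A-label form, assuming the input is an LDH-label,
    an A-label, or an already validated U-label (assumption)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def labels_equal(a: str, b: str) -> bool:
    """Compare two labels on their A-label form, ignoring ASCII case."""
    # Both strings are pure ASCII at this point, so lower() is an
    # ASCII-only case fold.
    return to_a_label(a).lower() == to_a_label(b).lower()
```

Comparing on the A-label form keeps the comparison identical to the one the DNS itself performs.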
190 3.2. Applicability
192 IDNA is applicable to all domain names in all domain name slots
193 except where it is explicitly excluded. It is not applicable to
194 domain name slots which do not use the LDH syntax rules.
196 This implies that IDNA is applicable to many protocols that predate
197 IDNA. Note that IDNs occupying domain name slots in those older
198 protocols MUST be in A-label form until and unless those protocols
199 and implementations of them are upgraded and that IDNs actually
200 appearing in DNS queries or responses MUST be in A-label form.
202 3.2.1. DNS Resource Records
204 IDNA applies only to domain names in the NAME and RDATA fields of DNS
205 resource records whose CLASS is IN.
207 There are currently no other exclusions on the applicability of IDNA
208 to DNS resource records. Applicability depends entirely on the
209 CLASS, and not on the TYPE except as noted below. This will remain
210 true, even as new types are defined, unless there is a compelling
211 reason for a new type that requires type-specific rules. The special
212 naming conventions applicable to SRV records are examples of type-
213 specific rules that are incompatible with IDNA coding. Hence the
214 first two labels (the ones required to start in "_") on a record with
215 TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible
216 to write a non-ASCII string with a leading underscore, conversion to
217 an A-label would be impossible without loss of information because
218 the underscore is not a letter, digit, or hyphen and is consequently
219 DISALLOWED in IDNs). Of course, those labels may be part of a domain
220 that uses IDN labels at higher levels in the tree.
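A minimal sketch of the SRV restriction, under the assumption that the owner name is available as a dotted string; the function name is hypothetical. A label that starts with "_" cannot carry the ACE prefix, so the check reduces to requiring the underscore and rejecting non-ASCII characters in the first two labels.

```python
def srv_service_labels_ok(owner_name: str) -> bool:
    """Check that the first two labels of an SRV owner name are plain
    ASCII underscore labels, i.e. neither A-labels nor U-labels."""
    labels = owner_name.rstrip(".").split(".")
    if len(labels) < 2:
        return False
    return all(l.startswith("_") and l.isascii() for l in labels[:2])
```

Labels after the first two are not examined, so they may be IDN labels as the text allows.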
222 3.2.2. Non-domain-name Data Types Stored in the DNS
224 Although IDNA enables the representation of non-ASCII characters in
225 domain names, that does not imply that IDNA enables the
226 representation of non-ASCII characters in other data types that are
227 stored in domain names, specifically in the RDATA field for types
228 that have structured RDATA format. For example, an email address
229 local part is stored in a domain name in the RNAME field as part of
230 the RDATA of an SOA record (hostmaster@example.com would be
231 represented as hostmaster.example.com). IDNA specifically does not
232 update the existing email standards, which allow only ASCII
233 characters in local parts. Even though work is in progress to define
234 internationalization for email addresses [RFC4952], changes to the
235 email address part of the SOA RDATA would require action in, or
236 updates to, other standards, specifically those that specify the
237 format of the SOA RR.
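The RNAME mapping mentioned above can be sketched as follows. The function name is hypothetical. The sketch assumes the domain part is already in ASCII form and uses the master-file convention of RFC 1035 in which a dot inside the local part is written as "\.".

```python
def mailbox_to_rname(address: str) -> str:
    """Map local-part@domain to the domain-name form used in SOA RNAME."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("expected local-part@domain")
    if not address.isascii():
        # IDNA does not extend the email standards: local parts stay ASCII.
        raise ValueError("non-ASCII mailbox is outside the scope of IDNA")
    # The local part becomes a single label, so its dots are escaped.
    return local.replace(".", "\\.") + "." + domain
```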
239 4. Registration Protocol
241 This section defines the procedure for registering an IDN. The
242 procedure is implementation independent; any sequence of steps that
243 produces exactly the same result for all labels is considered a valid
244 implementation.
246 Note that, while the registration and lookup protocols (Section 5)
247 are very similar in most respects, they are different and
248 implementers should carefully follow the steps they are implementing.
250 4.1. Proposed label
252 The registrant submits a request for an IDN. The user typically
253 produces the request string by the keyboard entry of a character
254 sequence in the local native character set (which might, of course,
255 be Unicode).
257 The registry MAY permit submission of labels in A-label form. If it
258 does so, it SHOULD perform a conversion to a U-label, perform the
259 steps and tests described below, and verify that the A-label produced
260 by the step in Section 4.5 matches the one provided as input. If,
261 for some reason, it does not, the registration MUST be rejected. If
262 the conversion to a U-label is not performed, the registry MUST
263 verify that the A-label is superficially valid, i.e., that it does
264 not violate any of the rules of Punycode [RFC3492] encoding such as
265 the prohibition on trailing hyphen-minus, appearance of non-basic
266 characters before the delimiter, and so on. Invalid strings that
267 appear to be A-labels MUST NOT be placed in DNS zones.
268 [[anchor9: Editorial: Should the sentences starting with "The
269 registry" be moved to 4.3? I.e., would they be more in sequence
270 there? Note that A-labels are, by definition, in ASCII, so section
271 4.2 does not apply to them. The tone of this recommendation also
272 seems slightly at odds with the statements at the end of 4.2.
273 Suggested text for cleaning this up, harmonizing it, and reducing
274 redundancy would be appreciated.]]
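The round-trip check described in Section 4.1 for labels submitted in A-label form might look like the sketch below. It is illustrative only: the function name is hypothetical, the "punycode" codec of the Python standard library stands in for the RFC 3492 algorithm, and the character and label tests of Sections 4.2 through 4.4 are represented by a comment rather than implemented.

```python
ACE_PREFIX = "xn--"


def check_submitted_a_label(a_label: str) -> str:
    """Return the decoded Unicode label, or raise ValueError on rejection."""
    candidate = a_label.lower()
    if not (candidate.isascii() and candidate.startswith(ACE_PREFIX)):
        raise ValueError("not in A-label form")
    try:
        decoded = candidate[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError as exc:
        raise ValueError("violates the Punycode encoding rules") from exc
    if decoded.isascii():
        # Covers, for example, input that ends in a hyphen-minus.
        raise ValueError("does not encode any non-ASCII character")
    # The tests of Sections 4.2 to 4.4 would be applied to `decoded` here.
    regenerated = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    if regenerated != candidate:
        raise ValueError("regenerated A-label does not match the input")
    return decoded
```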
276 4.2. Conversion to Unicode and Normalization
278 Some system routine, or a localized front-end to the IDNA process,
279 ensures that the proposed label is a Unicode string or converts it to
280 one as appropriate. That string MUST be in Unicode Normalization
281 Form C (NFC [Unicode-UAX15]).
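A short sketch of the NFC requirement, using the unicodedata module of the Python standard library. The function name is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata


def is_nfc(label: str) -> bool:
    """True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", label) == label


composed = "b\u00fccher"      # u with diaeresis as one code point
decomposed = "bu\u0308cher"   # u followed by COMBINING DIAERESIS
assert is_nfc(composed)
assert not is_nfc(decomposed)
assert unicodedata.normalize("NFC", decomposed) == composed
```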
283 As a local implementation choice, the implementation MAY choose to
284 map some forbidden characters to permitted characters (for instance
285 mapping uppercase characters to lowercase ones), displaying the
286 result to the user, and allowing processing to continue. However, it
287 is strongly recommended that, to avoid any possible ambiguity,
288 entities responsible for zone files ("registries") accept
289 registrations only for A-labels (to be converted to U-labels by the
290 registry as discussed above) or U-labels actually produced from
291 A-labels, not forms expected to be converted by some other process.
293 4.3. Permitted Character and Label Validation
295 4.3.1. Rejection of Characters that are not Permitted
297 The Unicode string is checked to verify that no characters that IDNA
298 does not permit in input appear in it. Those characters are
299 identified in the "DISALLOWED" and "UNASSIGNED" lists that are
300 specified in [IDNA2008-Tables] and described informally in
301 [IDNA2008-Rationale]. Characters that are either DISALLOWED or
302 UNASSIGNED MUST NOT be part of labels to be processed for
303 registration in the DNS.
305 4.3.2. Label Validation
307 The proposed label (in the form of a Unicode string, i.e., a putative
308 U-label) is then examined, performing tests that require examination
309 of more than one character.
311 4.3.2.1. Rejection of Confusing or Hostile Sequences in U-labels
313 The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
314 the third and fourth character positions.
316 4.3.2.2. Leading Combining Marks
318 The first character of the string is examined to verify that it is
319 not a combining mark. If it is a combining mark, the string MUST NOT
320 be registered.
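The two tests of Sections 4.3.2.1 and 4.3.2.2 can be sketched as below. The function names are hypothetical, and "combining mark" is taken here to mean a code point whose Unicode General_Category begins with "M", which is an assumption of this sketch.

```python
import unicodedata


def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """True if the third and fourth characters are both hyphen-minus."""
    return label[2:4] == "--"


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character has General_Category Mn, Mc, or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

A putative U-label for which either function returns True would not be registered.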
322 4.3.2.3. Contextual Rules
324 Each code point is checked for its identification as a character
325 requiring contextual processing for registration (the list of
326 characters appears as the combination of CONTEXTJ and CONTEXTO in
327 [IDNA2008-Tables] as do the contextual rules themselves). If that
328 indication appears, the table of contextual rules is checked for a
329 rule for that character. If no rule is found, the proposed label is
330 rejected and MUST NOT be installed in a zone file. If one is found,
331 it is applied (typically as a test on the entire label or on adjacent
332 characters within the label). If the application of the rule does
333 not conclude that the character is valid in context, the proposed
334 label MUST be rejected. (See the IANA Considerations: IDNA Context
335 Registry section of [IDNA2008-Tables].)
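The control flow of the contextual-rule procedure might be sketched as follows. Everything named here is hypothetical: the set of contextual code points and the rule table are passed in by the caller rather than taken from [IDNA2008-Tables], and the single rule shown (ZERO WIDTH JOINER accepted only directly after a character of canonical combining class 9) is an illustrative placeholder, not a statement of what the Tables document specifies.

```python
import unicodedata
from typing import Callable, Dict, Set

# A rule receives the whole label and the index of the code point in question.
Rule = Callable[[str, int], bool]


def zwj_after_virama(label: str, i: int) -> bool:
    """Placeholder rule: accept U+200D only directly after a virama."""
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


def contextual_rules_pass(label: str, contextual: Set[int],
                          rules: Dict[int, Rule]) -> bool:
    """Apply the registration-side procedure of Section 4.3.2.3."""
    for i, ch in enumerate(label):
        cp = ord(ch)
        if cp not in contextual:
            continue
        rule = rules.get(cp)
        if rule is None:
            return False   # contextual code point with no rule: reject
        if not rule(label, i):
            return False   # rule does not find the character valid here
    return True
```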
337 These contextual rules are required to permit the use of characters
338 that would otherwise risk causing considerable harm. For example,
339 labels containing invisible ("zero-width") characters may be
340 permitted in context with characters whose presentation forms are
341 significantly changed by the presence or absence of the zero-width
342 characters, while other labels in which zero-width characters appear
343 may be rejected.
344 [[anchor14: Should this paragraph be removed? Note that I've been
345 strongly encouraged to supply specific examples to reduce abstraction
346 and questions about the appropriateness of the text. -JcK]]
348 4.3.2.4. Labels Containing Characters Written Right to Left
350 Additional special tests for right-to-left strings are applied (See
351 [IDNA2008-BIDI]). Strings that contain right to left characters that
352 do not conform to the rule(s) identified there MUST NOT be inserted
353 as labels in zone files.
355 4.3.3. Registration Validation Summary
357 Strings that have been produced by the steps above, and whose
358 contents pass the above tests, are U-labels.
360 To summarize, tests are made in Section 4.3 for invalid characters,
361 invalid combinations of characters, and for labels that are invalid
362 even if the characters they contain are valid individually.
364 4.4. Registry Restrictions
366 Registries at all levels of the DNS, not just the top level, are
367 expected to establish policies about the labels that may be
368 registered, and for the processes associated with that action. While
369 exact policies are not specified as part of IDNA2008 and it is
370 expected that different registries may specify different policies,
371 there SHOULD be policies. Even a trivial policy (e.g., "anything can
372 be registered in this zone that can be represented as an A-label -
373 U-label pair") has value because it provides notice to users and
374 applications implementers that the registry cannot be relied upon to
375 provide even minimal user-protection restrictions. These per-
376 registry policies and restrictions are an essential element of the
377 IDNA registration protocol even for registries (and corresponding
378 zone files) deep in the DNS hierarchy. As discussed in
379 [IDNA2008-Rationale], such restrictions have always existed in the
380 DNS. That document also contains a discussion and recommendations
381 about possible types of rules.
383 The string produced by the above steps is checked and processed as
384 appropriate to local registry restrictions. Application of those
385 registry restrictions may result in the rejection of some labels or
386 the application of special restrictions to others.
388 4.5. Punycode Conversion
390 The resulting U-label is converted to an A-label. The A-label, more
391 precisely defined elsewhere, is the encoding of the U-label according
392 to the Punycode algorithm [RFC3492] with the ACE prefix "xn--" added
393 at the beginning of the string. This document updates RFC 3492
394 only to the extent of replacing the reference to the discussion of
395 the ACE prefix. The ACE prefix is now specified in this document
396 rather than as part of RFC 3490 or Nameprep [RFC3491].
398 The failure conditions identified in the Punycode encoding procedure
399 cannot occur if the input is a U-label as determined by the steps
400 above.
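As a sketch of this step, the conversion can be written with the "punycode" codec of the Python standard library. The function name is hypothetical, and the input is assumed to be a U-label that has already passed the tests of Section 4.3. Python's built-in "idna" codec is deliberately not used, since it implements the earlier IDNA2003 processing.

```python
ACE_PREFIX = "xn--"


def u_label_to_a_label(u_label: str) -> str:
    """Encode a validated U-label with Punycode and add the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


assert u_label_to_a_label("bücher") == "xn--bcher-kva"
assert u_label_to_a_label("münchen") == "xn--mnchen-3ya"
```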
402 4.6. Insertion in the Zone
404 The A-label is registered in the DNS by insertion into a zone.
406 5. Domain Name Lookup Protocol
408 Lookup is conceptually different from registration and different
409 tests are applied on the client. Although some validity checks are
410 necessary to avoid serious problems with the protocol (see
411 Section 5.5ff.), the lookup-side tests are more permissive and rely
412 on the assumption that names that are present in the DNS are valid.
413 That assumption is, however, a weak one because the presence of wild
414 cards in the DNS might cause a string that is not actually registered
415 in the DNS to be successfully looked up.
417 5.1. Label String Input
419 The user supplies a string in the local character set, typically by
420 typing it or clicking on, or copying and pasting, a resource
421 identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
422 domain name is extracted. Alternatively, some process not directly
423 involving the user may read the string from a file or obtain it in
424 some other way. Processing in this step and the next two is a local
425 matter, to be accomplished prior to actual invocation of IDNA, but
426 at least the two steps in Section 5.2 and Section 5.3 must be
427 accomplished in some way.
429 5.2. Conversion to Unicode
431 The string is converted from the local character set into Unicode, if
432 it is not already Unicode. The exact nature of this conversion is
433 beyond the scope of this document, but may involve normalization as
434 described in Section 4.2. The result MUST be a Unicode string in NFC
435 form.
437 5.3. Character Changes in Preprocessing or the User Interface
439 The Unicode string MAY then be processed to prevent confounding of
440 user expectations. For instance, it might be reasonable, at this
441 step, to convert all upper case characters to lower case, if this
442 makes sense in the user's environment, but even this should be
443 approached with caution due to some edge cases: in the long term, it
444 is probably better for users to understand IDNs strictly in lower-
445 case, U-label, form. More generally, preprocessing may be useful to
446 smooth the transition from IDNA2003, especially for direct user
447 input, but with similar cautions. In general, IDNs appearing in
448 files and those transmitted across the network as part of protocols
449 are expected to be in either ASCII form (including A-labels) or to
450 contain U-labels, rather than being in forms requiring mapping or
451 other conversions.
453 Other examples of processing for localization might be applied,
454 especially to direct user input, at this point. They include
455 interpreting various characters as separating domain name components
456 from each other (label separators) because they either look like
457 periods or are used to separate sentences, mapping halfwidth or
458 fullwidth East Asian characters to the common form permitted in
459 labels, or giving special treatment to characters whose presentation
460 forms are dependent only on placement in the label. Such
461 localization changes are also outside the scope of this
462 specification.
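One example of such local processing, splitting direct user input on characters that look like periods, might be sketched as follows. This is a local user-interface choice and not part of the protocol. The function name is hypothetical, and the separator set is the one listed in RFC 3490 (U+002E, U+3002, U+FF0E, U+FF61), chosen here only as an assumption about what a given locale might want.

```python
import re
from typing import List

# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP
LABEL_SEPARATORS = re.compile("[\u002e\u3002\uff0e\uff61]")


def split_user_input(name: str) -> List[str]:
    """Split a user-typed name into putative labels."""
    return LABEL_SEPARATORS.split(name)
```

The resulting strings are only candidates; each would still go through the steps of Sections 5.4 to 5.6.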
464 Recommendations for preprocessing for global contexts (i.e., when
465 local considerations do not apply or cannot be used) and for maximum
466 interoperability with labels that might have been specified under
467 liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. It
468 is important to note that the intent of these specifications is that
469 labels in application protocols, files, or links be
470 in U-label or A-label form. Preprocessing MUST NOT map a character
471 that is valid in a label as specified elsewhere in this document or
472 in [IDNA2008-Tables] into another character. Excessively liberal use
473 of preprocessing, especially to strings stored in files, poses a
474 threat to consistent and predictable behavior for the user even if
475 not to actual interoperability.
477 Because these transformations are local, it is important that domain
478 names that might be passed between systems (e.g., in IRIs) be
479 U-labels or A-labels and not forms that might be accepted locally as
480 a consequence of this step. This step is not standardized as part of
481 IDNA, and is not further specified here.
483 5.4. A-label Input
485 If the input to this procedure appears to be an A-label (i.e., it
486 starts in "xn--"), the lookup application MAY attempt to convert it
487 to a U-label and apply the tests of Section 5.5 and the conversion of
488 Section 5.6 to that form. If the label is converted to Unicode
489 (i.e., to U-label form) using the Punycode decoding algorithm, then
490 the processing specified in those two sections MUST be performed, and
491 the label MUST be rejected if the resulting label is not identical to
492 the original. See also Section 6.1.
494 That conversion and testing SHOULD be performed if the domain name
495 will later be presented to the user in native character form (this
496 requires that the lookup application be IDNA-aware). If those steps
497 are not performed, the lookup process SHOULD at least make tests to
498 determine that the string is actually an A-label, examining it for
499 the invalid formats specified in the Punycode decoding specification.
500 Applications that are not IDNA-aware will obviously omit that
501 testing; others MAY treat the string as opaque to avoid the
502 additional processing at the expense of providing less protection and
503 information to users.
505 5.5. Validation and Character List Testing
507 As with the registration procedure, the Unicode string is checked to
508 verify that all characters that appear in it are valid as input to
509 IDNA lookup processing. As discussed above and in
510 [IDNA2008-Rationale], the lookup check is more liberal than the
511 registration one. Putative labels with any of the following
512 characteristics MUST be rejected prior to DNS lookup:
514 o Labels containing code points that are unassigned in the version
515 of Unicode being used by the application, i.e., in the
516 "Unassigned" Unicode category or the UNASSIGNED category of
517 [IDNA2008-Tables].
519 o Labels that are not in NFC form.
521 o Labels containing prohibited code points, i.e., those that are
522 assigned to the "DISALLOWED" category in the permitted character
523 table [IDNA2008-Tables].
525 o Labels containing code points that are shown in the permitted
526 character table as requiring a contextual rule and that are
527 flagged as requiring exceptional special processing on lookup
528 ("CONTEXTJ" in the Tables) but do not conform to that rule.
530 o Labels containing other code points that are shown in the
531 permitted character table as requiring a contextual rule
532 ("CONTEXTO" in the tables), but for which no such rule appears in
533 the table of rules. With the exception in the rule immediately
534 above, applications resolving DNS names or carrying out equivalent
535 operations are not required to test contextual rules, only to
536 verify that a rule exists.
538 o Labels whose first character is a combining mark.
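Several of the lookup-side tests listed above can be sketched with the Python standard library alone. The function name is hypothetical. The set of DISALLOWED code points is supplied by the caller, since it comes from [IDNA2008-Tables] and is not reproduced here. The contextual-rule tests are omitted, and "unassigned" is judged against the Unicode version bundled with the interpreter.

```python
import unicodedata
from typing import FrozenSet, Optional


def lookup_rejection_reason(
        label: str,
        disallowed: FrozenSet[int] = frozenset()) -> Optional[str]:
    """Return a reason to reject the putative label, or None if none found."""
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        return "unassigned code point"
    if unicodedata.normalize("NFC", label) != label:
        return "not in NFC"
    if any(ord(ch) in disallowed for ch in label):
        return "DISALLOWED code point"
    if label and unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    return None
```

Returning the reason rather than a bare boolean lets the application pass a more useful message to the user than a DNS resolution failure would.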
540 In addition, the application SHOULD apply the following test. The
541 test may be omitted in special circumstances, such as when the lookup
542 application knows that the conditions are enforced elsewhere, because
543 an attempt to look up and resolve such strings will almost certainly
544 lead to a DNS lookup failure except when wildcards are present in the
545 zone. However, applying the test is likely to give much better
546 information about the reason for a lookup failure -- information that
547 may be usefully passed to the user when that is feasible -- than DNS
548 resolution failure information alone. In any event, lookup
549 applications should avoid attempting to resolve labels that are
550 invalid under that test.
552 o Verification that the string is compliant with the requirements
553 for right to left characters, specified in [IDNA2008-BIDI].
555 For all other strings, the lookup application MUST rely on the
556 presence or absence of labels in the DNS to determine the validity of
557 those labels and the validity of the characters they contain. If
558 they are registered, they are presumed to be valid; if they are not,
559 their possible validity is not relevant. A lookup application that
560 declines to process a string that conforms to the above rules and
561 look it up in the DNS is not in conformance with this protocol.
563 5.6. Punycode Conversion
565 The validated string, a U-label, is converted to an A-label using the
566 Punycode algorithm with the ACE prefix added.
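A minimal sketch of this step, assuming Python's built-in "punycode" codec as the implementation of the algorithm in [RFC3492] and the ACE prefix "xn--" established by RFC 3490. The function name is hypothetical, and the input is assumed to have already passed the validation in Section 5.5.

```python
ACE_PREFIX = "xn--"  # ACE prefix used by IDNA


def to_a_label(u_label):
    """Convert an already validated U-label to its A-label form."""
    # A label containing only ASCII code points has no A-label form.
    if u_label.isascii():
        raise ValueError("all-ASCII label cannot be converted to an A-label")
    # The codec returns the Punycode output as ASCII bytes.
    encoded = u_label.encode("punycode").decode("ascii")
    return ACE_PREFIX + encoded
```

For example, `to_a_label("b\u00fccher")` yields `"xn--bcher-kva"`. The sketch does not enforce DNS label length limits.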
568 5.7. DNS Name Resolution
570 The A-label is looked up in the DNS, using normal DNS resolver
571 procedures.
573 6. Name Server Considerations
575 [[anchor18: Note in draft: If we really want this document to contain
576 only information that is necessary to proper implementation of IDNA
577 by implementers who are familiar with the DNS, the material in this
578 section is either tutorial, explanatory, or totally unnecessary.
579 Should some or all of it be moved back to Rationale?]]
581 6.1. Processing Non-ASCII Strings
583 Existing DNS servers do not know the IDNA rules for handling non-
584 ASCII forms of IDNs, and therefore need to be shielded from them.
585 All existing channels through which names can enter a DNS server
586 database (for example, master files (as described in RFC 1034) and
587 DNS update messages [RFC2136]) are IDN-unaware because they predate
588 IDNA. Other sections of this document provide the needed shielding
589 by ensuring that internationalized domain names entering DNS server
590 databases through such channels have already been converted to their
591 equivalent ASCII A-label forms.
593 Because of the design of the algorithms in Section 4 and Section 5 (a
594 domain name containing only ASCII codepoints cannot be converted to
595 an A-label), there cannot be more than one A-label form for any
596 given U-label.
598 As specified in RFC 2181 [RFC2181], the DNS protocol explicitly
599 allows domain labels to contain octets beyond the ASCII range
600 (0000..007F), and this document does not change that. Note, however,
601 that there is no defined interpretation of octets 0080..00FF as
602 characters. If labels containing these octets are returned to
603 applications, unpredictable behavior could result. The A-label form,
604 which cannot contain those characters, is the only standard
605 representation for internationalized labels in the DNS protocol.
607 6.2. DNSSEC Authentication of IDN Domain Names
609 DNS Security [RFC2535] is a method for supplying cryptographic
610 verification information along with DNS messages. Public Key
611 Cryptography is used in conjunction with digital signatures to
612 provide a means for a requester of domain information to authenticate
613 the source of the data. This ensures that it can be traced back to a
614 trusted source, either directly or via a chain of trust linking the
615 source of the information to the top of the DNS hierarchy.
617 IDNA specifies that all internationalized domain names served by DNS
618 servers that cannot be represented directly in ASCII must use the
619 A-label form. Conversion to A-labels must be performed prior to a
620 zone being signed by the private key for that zone. Because of this
621 ordering, it is important to recognize that DNSSEC authenticates a
622 domain name containing A-labels or conventional LDH-labels, not
623 U-labels. In the presence of DNSSEC, no form of a zone file or query
624 response that contains a U-label may be signed or the signature
625 validated.
627 One consequence of this for sites deploying IDNA in the presence of
628 DNSSEC is that any special purpose proxies or forwarders used to
629 transform user input into IDNs must be earlier in the lookup flow
630 than DNSSEC authenticating nameservers for DNSSEC to work.
632 6.3. Root and other DNS Server Considerations
634 IDNs in A-label form will generally be somewhat longer than current
635 domain names, so the bandwidth needed by the root servers is likely
636 to go up by a small amount. Also, queries and responses for IDNs
637 will probably be somewhat longer than typical queries historically,
638 so EDNS0 [RFC2671] support may be more important (otherwise, queries
639 and responses may be forced to go to TCP instead of UDP).
641 7. Security Considerations
643 The general security principles and issues for IDNA appear in
644 [IDNA2008-Defs] with additional explanation in [IDNA2008-Rationale].
645 The comments below are specific to the registration and lookup
646 protocols specified in this document, but should be read in the
647 context of the material in the first of those documents and the
648 definitions and specifications, identified there, on which this one
649 depends.
651 This memo describes procedures for registering and looking up labels
652 that are not compatible with the preferred syntax described in the
653 base DNS specifications (STD13 [RFC1034] [RFC1035] and Host
654 Requirements [RFC1123]) because they contain non-ASCII characters.
655 These procedures depend on the use of a special ASCII-compatible
656 encoding form that contains only characters permitted in host names
657 by those earlier specifications. The encoding used is Punycode
658 [RFC3492]. No security issues such as string length increases or new
659 allowed values are introduced by the encoding process or the use of
660 these encoded values, apart from those introduced by the ACE encoding
661 itself.
663 Domain names (or portions of them) are sometimes compared against a
664 set of domains to be given special treatment if a match occurs, e.g.,
665 treated as more privileged than others or blocked in some way. In
666 such situations, it is especially important that the comparisons be
667 done properly, as specified in Requirement 2 of Section 3.1. For
668 labels already in ASCII form (i.e., LDH-labels or A-labels), the
669 proper comparison reduces to the same case-insensitive ASCII
670 comparison that has always been used for ASCII labels.
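The ASCII case described above can be sketched as follows. This is an illustration under stated assumptions, not a required implementation: the function names are hypothetical, and the sketch folds only the letters A-Z so that no locale-dependent or Unicode case mapping is involved.

```python
def ascii_lower(label):
    """Lowercase only the ASCII letters A-Z, leaving all else untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in label
    )


def ascii_labels_match(a, b):
    """Compare two labels that are already LDH-labels or A-labels."""
    if not (a.isascii() and b.isascii()):
        # U-labels need conversion first; that is outside this sketch.
        raise ValueError("expected labels in ASCII form")
    return ascii_lower(a) == ascii_lower(b)
```

A caller matching names against a set of specially treated domains would apply `ascii_labels_match` label by label after converting any U-labels to A-labels.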
672 The introduction of IDNA means that any existing labels that start
673 with the ACE prefix would be construed as A-labels, at least until
674 they failed one of the relevant tests, whether or not that was the
675 intent of the zone administrator or registrant. There is no evidence
676 that this has caused any practical problems since RFC 3490 was
677 adopted, but the risk still exists in principle.
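One of the "relevant tests" can be illustrated with a round-trip check: a label that starts with the ACE prefix but does not decode to a non-ASCII string, or does not re-encode to itself, is not an A-label. The sketch below assumes Python's built-in "punycode" codec and the prefix "xn--"; the function name is hypothetical, and passing this check alone does not make a label valid, since the character tests of Section 5.5 still apply to the decoded string.

```python
ACE_PREFIX = "xn--"


def survives_ace_round_trip(label):
    """Check only the encoding layer of a label that claims the ACE prefix."""
    if not label.isascii():
        return False
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return False
    payload = lowered[len(ACE_PREFIX):]
    try:
        decoded = payload.encode("ascii").decode("punycode")
    except UnicodeError:
        # The payload is not well-formed Punycode.
        return False
    # An all-ASCII result means the label was never a real A-label.
    if decoded.isascii():
        return False
    # Re-encoding must reproduce the original payload exactly.
    return decoded.encode("punycode").decode("ascii") == payload
```

Under this sketch a pre-existing label such as "xn--abc-" would be reported as failing, because its payload decodes to the all-ASCII string "abc".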
679 8. IANA Considerations
681 IANA actions for this version of IDNA are specified in
682 [IDNA2008-Tables] and discussed informally in [IDNA2008-Rationale].
683 The components of IDNA described in this document do not require any
684 IANA actions.
686 9. Contributors
688 While the listed editor held the pen, the original versions of this
689 document represent the joint work and conclusions of an ad hoc design
690 team consisting of the editor and, in alphabetic order, Harald
691 Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
692 draws significantly on the original version of IDNA [RFC3490] both
693 conceptually and for specific text. This second-generation version
694 would not have been possible without the work that went into that
695 first version and its authors, Patrik Faltstrom, Paul Hoffman, and
696 Adam Costello. While Faltstrom was actively involved in the creation
697 of this version, Hoffman and Costello were not and should not be held
698 responsible for any errors or omissions.
700 10. Acknowledgements
702 This revision to IDNA would have been impossible without the
703 accumulated experience since RFC 3490 was published and resulting
704 comments and complaints of many people in the IETF, ICANN, and other
705 communities, too many people to list here. Nor would it have been
706 possible without RFC 3490 itself and the efforts of the Working Group
707 that defined it. Those people whose contributions are acknowledged
708 in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly
709 important.
711 Specific textual changes were incorporated into this document after
712 suggestions from the other contributors, Stephane Bortzmeyer, Mark
713 Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel, Marcos Sanz,
714 Andrew Sullivan, Ken Whistler, and other WG participants. Special
715 thanks are due to Paul Hoffman for permission to extract material
716 from his Internet-Draft to form the basis for Appendix A.
718 11. References
720 11.1. Normative References
722 [IDNA2008-BIDI]
723 Alvestrand, H. and C. Karp, "An updated IDNA criterion for
724 right-to-left scripts", July 2008.
727 [IDNA2008-Defs]
728 Klensin, J., "Internationalized Domain Names for
729 Applications (IDNA): Definitions and Document Framework",
730 November 2008.
733 [IDNA2008-Tables]
734 Faltstrom, P., "The Unicode Codepoints and IDNA",
735 July 2008.
738 A version of this document is available in HTML format at
739 http://stupid.domain.name/idnabis/
740 draft-ietf-idnabis-tables-02.html
742 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
743 STD 13, RFC 1034, November 1987.
745 [RFC1035] Mockapetris, P., "Domain names - implementation and
746 specification", STD 13, RFC 1035, November 1987.
748 [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
749 and Support", STD 3, RFC 1123, October 1989.
751 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
752 Requirement Levels", BCP 14, RFC 2119, March 1997.
754 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
755 for Internationalized Domain Names in Applications
756 (IDNA)", RFC 3492, March 2003.
758 [Unicode-PropertyValueAliases]
759 The Unicode Consortium, "Unicode Character Database:
760 PropertyValueAliases", March 2008.
763 [Unicode-RegEx]
764 The Unicode Consortium, "Unicode Technical Standard #18:
765 Unicode Regular Expressions", May 2005.
768 [Unicode-Scripts]
769 The Unicode Consortium, "Unicode Standard Annex #24:
770 Unicode Script Property", February 2008.
773 [Unicode-UAX15]
774 The Unicode Consortium, "Unicode Standard Annex #15:
775 Unicode Normalization Forms", 2006.
778 11.2. Informative References
780 [ASCII] American National Standards Institute (formerly United
781 States of America Standards Institute), "USA Code for
782 Information Interchange", ANSI X3.4-1968, 1968.
784 ANSI X3.4-1968 has been replaced by newer versions with
785 slight modifications, but the 1968 version remains
786 definitive for the Internet.
788 [IDNA2008-Rationale]
789 Klensin, J., Ed., "Internationalizing Domain Names for
790 Applications (IDNA): Issues, Explanation, and Rationale",
791 November 2008.
794 [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
795 "Dynamic Updates in the Domain Name System (DNS UPDATE)",
796 RFC 2136, April 1997.
798 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
799 Specification", RFC 2181, July 1997.
801 [RFC2535] Eastlake, D., "Domain Name System Security Extensions",
802 RFC 2535, March 1999.
804 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
805 RFC 2671, August 1999.
807 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
808 "Internationalizing Domain Names in Applications (IDNA)",
809 RFC 3490, March 2003.
811 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
812 Profile for Internationalized Domain Names (IDN)",
813 RFC 3491, March 2003.
815 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
816 Resource Identifier (URI): Generic Syntax", STD 66,
817 RFC 3986, January 2005.
819 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
820 Identifiers (IRIs)", RFC 3987, January 2005.
822 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
823 Recommendations for Internationalized Domain Names
824 (IDNs)", RFC 4690, September 2006.
826 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
827 Internationalized Email", RFC 4952, July 2007.
829 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
830 5.0", 2007.
832 Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0
834 Appendix A. Summary of Major Changes from IDNA2003
836 1. Update base character set from Unicode 3.2 to Unicode version-
837 agnostic.
839 2. Separate the definitions for the "registration" and "lookup"
840 activities.
842 3. Disallow symbol and punctuation characters except where special
843 exceptions are necessary.
845 4. Remove the mapping and normalization steps from the protocol and
846 have them instead done by the applications themselves, possibly
847 in a local fashion, before invoking the protocol.
849 5. Change the way that the protocol specifies which characters are
850 allowed in labels from "humans decide what the table of
851 codepoints contains" to "decisions about codepoints are based on
852 Unicode properties plus a small exclusion list created by
853 humans".
855 6. Introduce the new concept of characters that can be used only in
856 specific contexts.
858 7. Allow typical words and names in languages such as Dhivehi and
859 Yiddish to be expressed.
861 8. Make bidirectional domain names (delimited strings of labels,
862 not just labels standing on their own) display in a non-
863 surprising fashion whether they appear in obvious domain name
864 contexts or as part of running text in paragraphs.
866 9. Remove the dot separator from the mandatory part of the
867 protocol.
869 10. Make some currently-valid labels that are not actually IDNA
870 labels invalid.
872 Appendix B. Change Log
874 [[anchor27: RFC Editor: Please remove this appendix.]]
876 B.1. Changes between Version -00 and -01 of draft-ietf-idnabis-protocol
878 o Corrected discussion of SRV records.
880 o Several small corrections for clarity.
882 o Inserted more "open issue" placeholders.
884 B.2. Version -02
886 o Rewrote the "conversion to Unicode" text in Section 5.2 as
887 requested on-list.
889 o Added a comment (and reference) about EDNS0 to the "DNS Server
890 Conventions" section, which was also retitled.
892 o Made several editorial corrections and improvements in response to
893 various comments.
895 o Added several new discussion placeholder anchors and updated some
896 older ones.
898 B.3. Version -03
900 o Trimmed change log, removing information about pre-WG drafts.
902 o Incorporated a number of changes suggested by Marcos Sanz in his
903 note of 2008.07.17 and added several more placeholder anchors.
905 o Several minor editorial corrections and improvements.
907 o "Editor" designation temporarily removed because the automatic
908 posting machinery does not accept it.
910 B.4. Version -04
912 o Removed Contextual Rule appendices for transfer to Tables.
914 o Several changes, including removal of discussion anchors, based on
915 discussions at IETF 72 (Dublin).
917 o Rewrote the preprocessing material (Section 5.3) somewhat.
919 B.5. Version -05
921 o Updated part of the A-label input explanation (Section 5.4) per
922 note from Erik van der Poel.
924 B.6. Version -06
926 o Corrected a few typographical errors.
928 o Incorporated the material (formerly in Rationale) on the
929 relationship between IDNA2003 and IDNA2008 as an appendix and
930 pointed to the new definitions document.
932 o Text modified in several places to recognize the dangers of
933 interaction between DNS wildcards and IDNs.
935 o Text added to be explicit about the handling of edge and failure
936 cases in Punycode encoding and decoding.
938 o Revised for consistency with the new Definitions document and to
939 make the text read more smoothly.
941 Author's Address
943 John C Klensin
944 1770 Massachusetts Ave, Ste 322
945 Cambridge, MA 02140
946 USA
948 Phone: +1 617 245 1457
949 Email: john+ietf@jck.com
951 Full Copyright Statement
953 Copyright (C) The IETF Trust (2008).
955 This document is subject to the rights, licenses and restrictions
956 contained in BCP 78, and except as set forth therein, the authors
957 retain all their rights.
959 This document and the information contained herein are provided on an
960 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
961 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
962 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
963 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
964 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
965 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
967 Intellectual Property
969 The IETF takes no position regarding the validity or scope of any
970 Intellectual Property Rights or other rights that might be claimed to
971 pertain to the implementation or use of the technology described in
972 this document or the extent to which any license under such rights
973 might or might not be available; nor does it represent that it has
974 made any independent effort to identify any such rights. Information
975 on the procedures with respect to rights in RFC documents can be
976 found in BCP 78 and BCP 79.
978 Copies of IPR disclosures made to the IETF Secretariat and any
979 assurances of licenses to be made available, or the result of an
980 attempt made to obtain a general license or permission for the use of
981 such proprietary rights by implementers or users of this
982 specification can be obtained from the IETF on-line IPR repository at
983 http://www.ietf.org/ipr.
985 The IETF invites any interested party to bring to its attention any
986 copyrights, patents or patent applications, or other proprietary
987 rights that may cover technology that may be required to implement
988 this standard. Please address the information to the IETF at
989 ietf-ipr@ietf.org.