idnits 2.17.1
draft-hoffman-rfc3490bis-02.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
this document.
Expected boilerplate is as follows today (2024-04-27) according to
https://trustee.ietf.org/license-info :
IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
This Internet-Draft is submitted in full conformance with the provisions
of BCP 78 and BCP 79.
IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
Copyright (c) 2024 IETF Trust and the persons identified as the document
authors. All rights reserved.
IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
This document is subject to BCP 78 and the IETF Trust's Legal Provisions
Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand
corner of the first page
** Missing expiration date. The document expiration date should appear on
the first and last page.
** The document seems to lack a 1id_guidelines paragraph about
Internet-Drafts being working documents.
** The document seems to lack a 1id_guidelines paragraph about 6 months
document validity.
** The document seems to lack a 1id_guidelines paragraph about the list of
current Internet-Drafts.
** The document seems to lack a 1id_guidelines paragraph about the list of
Shadow Directories.
** The document is more than 15 pages and seems to lack a Table of Contents.
== No 'Intended status' indicated for this document; assuming Proposed
Standard
== The page length should not exceed 58 lines per page, but there was 1
longer page, the longest (page 1) being 946 lines
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** There are 2 instances of too long lines in the document, the longest one
being 2 characters in excess of 72.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The document seems to lack the recommended RFC 2119 boilerplate, even if
it appears to use RFC 2119 keywords.
(The document does seem to have the reference to RFC 2119 which the
ID-Checklist requires).
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
have content which was first submitted before 10 November 2008. If you
have contacted all the original authors and they are all willing to grant
the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
this comment. If not, you may need to add the pre-RFC5378 disclaimer.
(See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- Couldn't find a document date in the document -- date freshness check
skipped.
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Missing Reference: 'RFC2136' is mentioned on line 737, but not defined
-- Obsolete informational reference (is this intentional?): RFC 2535
(Obsoleted by RFC 4033, RFC 4034, RFC 4035)
Summary: 9 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
1 draft-hoffman-rfc3490bis-02.txt P. Faltstrom
2 April 14, 2004 Cisco
3 Expires in six months P. Hoffman
4 IMC & VPNC
5 A. Costello
6 UC Berkeley
8 Internationalizing Domain Names in Applications (IDNA)
10 Abstract
12 Until now, there has been no standard method for domain names to use
13 characters outside the ASCII repertoire. This document defines
14 internationalized domain names (IDNs) and a mechanism called
15 Internationalizing Domain Names in Applications (IDNA) for handling
16 them in a standard fashion. IDNs use characters drawn from a large
17 repertoire (Unicode), but IDNA allows the non-ASCII characters to be
18 represented using only the ASCII characters already allowed in so-
19 called host names today. This backward-compatible representation is
20 required in existing protocols like DNS, so that IDNs can be
21 introduced with no changes to the existing infrastructure. IDNA is
22 only meant for processing domain names, not free text.
24 1. Introduction
26 IDNA works by allowing applications to use certain ASCII name labels
27 (beginning with a special prefix) to represent non-ASCII name labels.
28 Lower-layer protocols need not be aware of this; therefore IDNA does
29 not depend on changes to any infrastructure. In particular, IDNA
30 does not depend on any changes to DNS servers, resolvers, or protocol
31 elements, because the ASCII name service provided by the existing DNS
32 is entirely sufficient for IDNA.
34 This document does not require any applications to conform to IDNA,
35 but applications can elect to use IDNA in order to support IDN while
36 maintaining interoperability with existing infrastructure. If an
37 application wants to use non-ASCII characters in domain names, IDNA
38 is the only currently-defined option. Adding IDNA support to an
39 existing application entails changes to the application only, and
40 leaves room for flexibility in the user interface.
42 A great deal of the discussion of IDN solutions has focused on
43 transition issues and how IDN will work in a world where not all of
44 the components have been updated. Proposals that were not chosen by
45 the IDN Working Group would depend on user applications, resolvers,
46 and DNS servers being updated in order for a user to use an
47 internationalized domain name. Rather than rely on widespread
48 updating of all components, IDNA depends on updates to user
49 applications only; no changes are needed to the DNS protocol or any
50 DNS servers or the resolvers on users' computers.
52 The IESG issued a statement on IDNA [IESG-STATEMENT].
54 1.1 Problem Statement
56 The IDNA specification solves the problem of extending the repertoire
57 of characters that can be used in domain names to include the Unicode
58 repertoire (with some restrictions).
60 IDNA does not extend the service offered by DNS to the applications.
61 Instead, the applications (and, by implication, the users) continue
62 to see an exact-match lookup service. Either there is a single
63 exactly-matching name or there is no match. This model has served
64 the existing applications well, but it requires, with or without
65 internationalized domain names, that users know the exact spelling of
66 the domain names that the users type into applications such as web
67 browsers and mail user agents. The introduction of the larger
68 repertoire of characters potentially makes the set of misspellings
69 larger, especially given that in some cases the same appearance, for
70 example on a business card, might visually match several Unicode code
71 points or several sequences of code points.
73 IDNA allows the graceful introduction of IDNs not only by avoiding
74 upgrades to existing infrastructure (such as DNS servers and mail
75 transport agents), but also by allowing some rudimentary use of IDNs
76 in applications by using the ASCII representation of the non-ASCII
77 name labels. While such names are very user-unfriendly to read and
78 type, and hence are not suitable for user input, they allow (for
79 instance) replying to email and clicking on URLs even though the
80 domain name displayed is incomprehensible to the user. In order to
81 allow user-friendly input and output of the IDNs, the applications
82 need to be modified to conform to this specification.
84 IDNA uses the Unicode character repertoire, which avoids the
85 significant delays that would be inherent in waiting for a different
86 and specific character set to be defined for IDN purposes by some other
87 standards developing organization.
89 1.2 Limitations of IDNA
91 The IDNA protocol does not solve all linguistic issues with users
92 inputting names in different scripts. Many important language-based
93 and script-based mappings are not covered in IDNA and need to be
94 handled outside the protocol. For example, names that are entered in
95 a mix of traditional and simplified Chinese characters will not be
96 mapped to a single canonical name. As another example, Scandinavian
97 names that are entered with U+00F6 (LATIN SMALL LETTER O WITH
98 DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
99 STROKE).
101 An example of an important issue that is not considered in detail in
102 IDNA is how to provide a high probability that a user who is entering
103 a domain name based on visual information (such as from a business
104 card or billboard) or aural information (such as from a telephone or
105 radio) would correctly enter the IDN. Similar issues exist for ASCII
106 domain names, for example the possible visual confusion between the
107 letter 'O' and the digit zero, but the introduction of the larger
108 repertoire of characters creates more opportunities for
109 similar-looking and similar-sounding names. Note that this is a complex
110 issue relating to languages, input methods on computers, and so on.
111 Furthermore, the kind of matching and searching necessary for a high
112 probability of success would not fit the role of the DNS and its
113 exact matching function.
115 1.3 Brief overview for application developers
117 Applications can use IDNA to support internationalized domain names
118 anywhere that ASCII domain names are already supported, including DNS
119 master files and resolver interfaces. (Applications can also define
120 protocols and interfaces that support IDNs directly using non-ASCII
121 representations. IDNA does not prescribe any particular
122 representation for new protocols, but it still defines which names
123 are valid and how they are compared.)
125 The IDNA protocol is contained completely within applications. It is
126 not a client-server or peer-to-peer protocol: everything is done
127 inside the application itself. When used with a DNS resolver
128 library, IDNA is inserted as a "shim" between the application and the
129 resolver library. When used for writing names into a DNS zone, IDNA
130 is used just before the name is committed to the zone.
132 There are two operations described in section 4 of this document:
134 - The ToASCII operation is used before sending an IDN to something
135 that expects ASCII names (such as a resolver) or writing an IDN
136 into a place that expects ASCII names (such as a DNS master file).
138 - The ToUnicode operation is used when displaying names to users,
139 for example names obtained from a DNS zone.
141 It is important to note that the ToASCII operation can fail. If it
142 fails when processing a domain name, that domain name cannot be used
143 as an internationalized domain name and the application has to have
144 some method of dealing with this failure.
146 IDNA requires that implementations process input strings with
147 Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
148 and then with Punycode [PUNYCODE]. Implementations of IDNA MUST
149 fully implement Nameprep and Punycode; neither Nameprep nor Punycode
150 is optional.
152 2. Terminology
154 The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
155 and "MAY" in this document are to be interpreted as described in BCP
156 14, RFC 2119 [RFC2119].
158 A code point is an integer value associated with a character in a
159 coded character set.
161 Unicode [UNICODE] is a coded character set containing tens of
162 thousands of characters. A single Unicode code point is denoted by
163 "U+" followed by four to six hexadecimal digits, while a range of
164 Unicode code points is denoted by two hexadecimal numbers separated
165 by "..", with no prefixes.
167 ASCII means US-ASCII [USASCII], a coded character set containing 128
168 characters associated with code points in the range 0..7F. Unicode
169 is an extension of ASCII: it includes all the ASCII characters and
170 associates them with the same code points.
172 The term "LDH code points" is defined in this document to mean the
173 code points associated with ASCII letters, digits, and the hyphen-
174 minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an
175 abbreviation for "letters, digits, hyphen".
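
As an illustration only, the LDH definition above can be written as a small membership test. This is a minimal sketch in Python; the names LDH_CODE_POINTS and is_ldh are hypothetical and are not defined by this document.

```python
# Hypothetical helper names; only the code point ranges come from the text.
LDH_CODE_POINTS = frozenset(
    [0x2D]                        # hyphen-minus
    + list(range(0x30, 0x3A))     # digits 30..39
    + list(range(0x41, 0x5B))     # upper-case letters 41..5A
    + list(range(0x61, 0x7B))     # lower-case letters 61..7A
)


def is_ldh(label: str) -> bool:
    """Return True if every code point in the label is an LDH code point."""
    return all(ord(ch) in LDH_CODE_POINTS for ch in label)
```

For example, is_ldh("www-1") is true, while is_ldh("a_b") is false because U+005F is not an LDH code point.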
177 [STD13] talks about "domain names" and "host names", but many people
178 use the terms interchangeably. Further, because [STD13] was not
179 terribly clear, many people who are sure they know the exact
180 definitions of each of these terms disagree on the definitions. In
181 this document the term "domain name" is used in general. This
182 document explicitly cites [STD3] whenever referring to the host name
183 syntax restrictions defined therein.
185 A label is an individual part of a domain name. Labels are usually
186 shown separated by dots; for example, the domain name
187 "www.example.com" is composed of three labels: "www", "example", and
188 "com". (The zero-length root label described in [STD13], which can
189 be explicit as in "www.example.com." or implicit as in
190 "www.example.com", is not considered a label in this specification.)
191 IDNA extends the set of usable characters in labels that are text.
192 For the rest of this document, the term "label" is shorthand for
193 "text label", and "every label" means "every text label".
195 An "internationalized label" is a label to which the ToASCII
196 operation (see section 4) can be applied without failing (with the
197 UseSTD3ASCIIRules flag unset). This implies that every ASCII label
198 that satisfies the [STD13] length restriction is an internationalized
199 label. Therefore the term "internationalized label" is a
200 generalization, embracing both old ASCII labels and new non-ASCII
201 labels. Although most Unicode characters can appear in
202 internationalized labels, ToASCII will fail for some input strings,
203 and such strings are not valid internationalized labels.
205 An "internationalized domain name" (IDN) is a domain name in which
206 every label is an internationalized label. This implies that every
207 ASCII domain name is an IDN (which implies that it is possible for a
208 name to be an IDN without it containing any non-ASCII characters).
209 This document does not attempt to define an "internationalized host
210 name". Just as has been the case with ASCII names, some DNS zone
211 administrators may impose restrictions, beyond those imposed by DNS
212 or IDNA, on the characters or strings that may be registered as
213 labels in their zones. Such restrictions have no impact on the
214 syntax or semantics of DNS protocol messages; a query for a name that
215 matches no records will yield the same response regardless of the
216 reason why it is not in the zone. Clients issuing queries or
217 interpreting responses cannot be assumed to have any knowledge of
218 zone-specific restrictions or conventions.
220 In IDNA, equivalence of labels is defined in terms of the ToASCII
221 operation, which constructs an ASCII form for a given label, whether
222 or not the label was already an ASCII label. Labels are defined to
223 be equivalent if and only if their ASCII forms produced by ToASCII
224 match using a case-insensitive ASCII comparison. ASCII labels
225 already have a notion of equivalence: upper case and lower case are
226 considered equivalent. The IDNA notion of equivalence is an
227 extension of that older notion. Equivalent labels in IDNA are
228 treated as alternate forms of the same label, just as "foo" and "Foo"
229 are treated as alternate forms of the same label.
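
The equivalence rule above can be sketched as follows. The example assumes that the ToASCII function in Python's standard-library encodings.idna module is an acceptable stand-in for the ToASCII operation with UseSTD3ASCIIRules unset; the function name labels_equivalent is hypothetical.

```python
from encodings import idna


def labels_equivalent(a: str, b: str) -> bool:
    """Compare two labels by their ASCII forms, ignoring ASCII case.

    idna.ToASCII returns bytes and raises UnicodeError when the
    operation fails; that error is left to propagate to the caller.
    """
    return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()
```

Under this sketch, "foo" and "Foo" are equivalent, and a non-ASCII label is equivalent to its own ACE form.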
231 To allow internationalized labels to be handled by existing
232 applications, IDNA uses an "ACE label" (ACE stands for ASCII
233 Compatible Encoding). An ACE label is an internationalized label
234 that can be rendered in ASCII and is equivalent to an
235 internationalized label that cannot be rendered in ASCII. Given any
236 internationalized label that cannot be rendered in ASCII, the ToASCII
237 operation will convert it to an equivalent ACE label (whereas an
238 ASCII label will be left unaltered by ToASCII). ACE labels are
239 unsuitable for display to users. The ToUnicode operation will
240 convert any label to an equivalent non-ACE label. In fact, an ACE
241 label is formally defined to be any label that the ToUnicode
242 operation would alter (whereas non-ACE labels are left unaltered by
243 ToUnicode). Every ACE label begins with the ACE prefix specified in
244 section 5. The ToASCII and ToUnicode operations are specified in
245 section 4.
247 The "ACE prefix" is defined in this document to be a string of ASCII
248 characters that appears at the beginning of every ACE label. It is
249 specified in section 5.
251 A "domain name slot" is defined in this document to be a protocol
252 element or a function argument or a return value (and so on)
253 explicitly designated for carrying a domain name. Examples of domain
254 name slots include: the QNAME field of a DNS query; the name argument
255 of the gethostbyname() library function; the part of an email address
256 following the at-sign (@) in the From: field of an email message
257 header; and the host portion of the URI in the src attribute of an
258 HTML <IMG> tag. General text that just happens to contain a domain
259 name is not a domain name slot; for example, a domain name appearing
260 in the plain text body of an email message is not occupying a domain
261 name slot.
263 An "IDN-aware domain name slot" is defined in this document to be a
264 domain name slot explicitly designated for carrying an
265 internationalized domain name as defined in this document. The
266 designation may be static (for example, in the specification of the
267 protocol or interface) or dynamic (for example, as a result of
268 negotiation in an interactive session).
270 An "IDN-unaware domain name slot" is defined in this document to be
271 any domain name slot that is not an IDN-aware domain name slot.
272 Obviously, this includes any domain name slot whose specification
273 predates IDNA.
275 3. Requirements and applicability
277 3.1 Requirements
279 IDNA conformance means adherence to the following four requirements:
281 1) Whenever dots are used as label separators, the following
282 characters MUST be recognized as dots: U+002E (full stop), U+3002
283 (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
284 (halfwidth ideographic full stop).
286 2) Whenever a domain name is put into an IDN-unaware domain name slot
287 (see section 2), it MUST contain only ASCII characters. Given an
288 internationalized domain name (IDN), an equivalent domain name
289 satisfying this requirement can be obtained by applying the
290 ToASCII operation (see section 4) to each label and, if dots are
291 used as label separators, changing all the label separators to
292 U+002E.
294 3) ACE labels obtained from domain name slots SHOULD be hidden from
295 users when it is known that the environment can handle the non-ACE
296 form, except when the ACE form is explicitly requested. When it
297 is not known whether or not the environment can handle the non-ACE
298 form, the application MAY use the non-ACE form (which might fail,
299 such as by not being displayed properly), or it MAY use the ACE
300 form (which will look unintelligible to the user). Given an
301 internationalized domain name, an equivalent domain name
302 containing no ACE labels can be obtained by applying the ToUnicode
303 operation (see section 4) to each label. When requirements 2 and
304 3 both apply, requirement 2 takes precedence.
306 4) Whenever two labels are compared, they MUST be considered to match
307 if and only if they are equivalent, that is, their ASCII forms
308 (obtained by applying ToASCII) match using a case-insensitive
309 ASCII comparison. Whenever two names are compared, they MUST be
310 considered to match if and only if their corresponding labels
311 match, regardless of whether the names use the same forms of label
312 separators.
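
Requirements 1 and 4 above can be illustrated together. This is a minimal sketch, not a conforming implementation: the names split_labels and names_match are hypothetical, and the ToASCII function from Python's standard-library encodings.idna module is assumed to stand in for the ToASCII operation.

```python
from encodings import idna

# Requirement 1: the three non-ASCII dots are treated like U+002E.
_DOT_TABLE = str.maketrans({"\u3002": ".", "\uFF0E": ".", "\uFF61": "."})


def split_labels(name: str) -> list:
    """Split a domain name into labels on any of the four dot characters."""
    labels = name.translate(_DOT_TABLE).split(".")
    # A trailing dot denotes the root label, which is not treated as a label.
    if len(labels) > 1 and labels[-1] == "":
        labels.pop()
    return labels


def names_match(a: str, b: str) -> bool:
    """Requirement 4: names match iff corresponding labels are equivalent."""
    labels_a, labels_b = split_labels(a), split_labels(b)
    if len(labels_a) != len(labels_b):
        return False
    return all(
        idna.ToASCII(x).lower() == idna.ToASCII(y).lower()
        for x, y in zip(labels_a, labels_b)
    )
```

In this sketch a ToASCII failure surfaces as a UnicodeError rather than as a non-match; an application would choose its own handling.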
314 3.2 Applicability
316 IDNA is applicable to all domain names in all domain name slots
317 except where it is explicitly excluded.
319 This implies that IDNA is applicable to many protocols that predate
320 IDNA. Note that IDNs occupying domain name slots in those protocols
321 MUST be in ASCII form (see section 3.1, requirement 2).
323 3.2.1. DNS resource records
325 IDNA does not apply to domain names in the NAME and RDATA fields of
326 DNS resource records whose CLASS is not IN. This exclusion applies
327 to every non-IN class, present and future, except where future
328 standards override this exclusion by explicitly inviting the use of
329 IDNA.
331 There are currently no other exclusions on the applicability of IDNA
332 to DNS resource records; it depends entirely on the CLASS, and not on
333 the TYPE. This will remain true, even as new types are defined,
334 unless there is a compelling reason for a new type to complicate
335 matters by imposing type-specific rules.
337 3.2.2. Non-domain-name data types stored in domain names
339 Although IDNA enables the representation of non-ASCII characters in
340 domain names, that does not imply that IDNA enables the
341 representation of non-ASCII characters in other data types that are
342 stored in domain names. For example, an email address local part is
343 sometimes stored in a domain label (hostmaster@example.com would be
344 represented as hostmaster.example.com in the RDATA field of an SOA
345 record). IDNA does not update the existing email standards, which
346 allow only ASCII characters in local parts. Therefore, unless the
347 email standards are revised to invite the use of IDNA for local
348 parts, a domain label that holds the local part of an email address
349 SHOULD NOT begin with the ACE prefix, and even if it does, it is to
350 be interpreted literally as a local part that happens to begin with
351 the ACE prefix.
353 4. Conversion operations
355 An application converts a domain name when it is put into an IDN-unaware
356 slot or displayed to a user. This section specifies the steps to perform
357 in the conversion, and the ToASCII and ToUnicode operations.
359 The input to ToASCII or ToUnicode is a single label that is a
360 sequence of Unicode code points (remember that all ASCII code points
361 are also Unicode code points). If a domain name is represented using
362 a character set other than Unicode or US-ASCII, it will first need to
363 be transcoded to Unicode.
365 Starting from a whole domain name, the steps that an application
366 takes to do the conversions are:
368 1) Decide whether the domain name is a "stored string" or a "query
369 string" as described in [STRINGPREP]. If this conversion follows
370 the "queries" rule from [STRINGPREP], set the flag called
371 "AllowUnassigned".
373 2) Split the domain name into individual labels as described in
374 section 3.1. The labels do not include the separator.
376 3) For each label, decide whether or not to enforce the restrictions
377 on ASCII characters in host names [STD3]. (Applications already
378 faced this choice before the introduction of IDNA, and can
379 continue to make the decision the same way they always have; IDNA
380 makes no new recommendations regarding this choice.) If the
381 restrictions are to be enforced, set the flag called
382 "UseSTD3ASCIIRules" for that label.
384 4) Process each label with either the ToASCII or the ToUnicode
385 operation as appropriate. Typically, you use the ToASCII
386 operation if you are about to put the name into an IDN-unaware
387 slot, and you use the ToUnicode operation if you are displaying
388 the name to a user; section 3.1 gives greater detail on the
389 applicable requirements.
391 5) If ToASCII was applied in step 4 and dots are used as label
392 separators, change all the label separators to U+002E (full stop).
394 The following two subsections define the ToASCII and ToUnicode
395 operations that are used in step 4.
397 This description of the protocol uses specific procedure names, names
398 of flags, and so on, in order to facilitate the specification of the
399 protocol. These names, as well as the actual steps of the
400 procedures, are not required of an implementation. In fact, any
401 implementation which has the same external behavior as specified in
402 this document conforms to this specification.
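
The whole-name steps listed above might be sketched as shown below. The function names domain_to_ascii and domain_to_unicode are hypothetical. The sketch relies on Python's standard-library encodings.idna module, whose ToASCII and ToUnicode functions take no AllowUnassigned or UseSTD3ASCIIRules parameters, so steps 1 and 3 are not modeled here.

```python
from encodings import idna

# Map the alternative dot characters to U+002E before splitting (step 2).
_DOTS = str.maketrans({"\u3002": ".", "\uFF0E": ".", "\uFF61": "."})


def _split(name: str):
    labels = name.translate(_DOTS).split(".")
    trailing_root = len(labels) > 1 and labels[-1] == ""
    if trailing_root:
        labels = labels[:-1]
    return labels, trailing_root


def domain_to_ascii(name: str) -> str:
    """Steps 2, 4 and 5: split, apply ToASCII per label, join with U+002E."""
    labels, trailing_root = _split(name)
    # idna.ToASCII raises UnicodeError if the operation fails on a label.
    converted = [idna.ToASCII(label).decode("ascii") for label in labels]
    result = ".".join(converted)
    return result + "." if trailing_root else result


def domain_to_unicode(name: str) -> str:
    """Steps 2 and 4 for display: apply ToUnicode per label."""
    labels, trailing_root = _split(name)
    converted = []
    for label in labels:
        try:
            converted.append(idna.ToUnicode(label))
        except UnicodeError:
            # The library function can raise; keep the label unchanged.
            converted.append(label)
    result = ".".join(converted)
    return result + "." if trailing_root else result
```

A caller about to place a name into an IDN-unaware slot would use domain_to_ascii; a caller displaying a name would use domain_to_unicode.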
404 4.1 ToASCII
406 The ToASCII operation takes a sequence of Unicode code points that
407 make up one label and transforms it into a sequence of code points in
408 the ASCII range (0..7F). If ToASCII succeeds, the original sequence
409 and the resulting sequence are equivalent labels.
411 It is important to note that the ToASCII operation can fail. ToASCII
412 fails if any step of it fails. If any step of the ToASCII operation
413 fails on any label in a domain name, that domain name MUST NOT be
414 used as an internationalized domain name. The method for dealing
415 with this failure is application-specific.
417 The inputs to ToASCII are a sequence of code points, the
418 AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
419 ToASCII is either a sequence of ASCII code points or a failure
420 condition.
422 ToASCII never alters a sequence of code points that are all in the
423 ASCII range to begin with (although it could fail). Applying the
424 ToASCII operation multiple times has exactly the same effect as
425 applying it just once.
427 ToASCII consists of the following steps:
429 1. If the sequence contains any code points outside the ASCII range
430 (0..7F) then proceed to step 2, otherwise skip to step 3.
432 2. Perform the steps specified in [NAMEPREP] and fail if there is an
433 error. The AllowUnassigned flag is used in [NAMEPREP].
435 3. If the UseSTD3ASCIIRules flag is set, then perform these checks:
437 (a) Verify the absence of non-LDH ASCII code points; that is, the
438 absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
440 (b) Verify the absence of leading and trailing hyphen-minus; that
441 is, the absence of U+002D at the beginning and end of the
442 sequence.
444 4. If the sequence contains any code points outside the ASCII range
445 (0..7F) then proceed to step 5, otherwise skip to step 8.
447 5. Verify that the sequence does NOT begin with the ACE prefix.
449 6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
450 fail if there is an error.
452 7. Prepend the ACE prefix.
454 8. Verify that the number of code points is in the range 1 to 63
455 inclusive (0 is excluded).
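
The eight steps above can be followed one by one in code. The sketch below is illustrative only. The names to_ascii and ToASCIIError are hypothetical, the nameprep function from Python's standard-library encodings.idna module is assumed to implement [NAMEPREP], and the unassigned-code-point test using stringprep.in_table_a1 on the input is an approximation of how AllowUnassigned is used there.

```python
import stringprep
from encodings import idna

ACE_PREFIX = "xn--"


class ToASCIIError(ValueError):
    """Hypothetical error type signalling that ToASCII failed."""


def to_ascii(label, allow_unassigned=False, use_std3_ascii_rules=False):
    # Step 1: only non-ASCII input goes through Nameprep.
    if any(ord(c) > 0x7F for c in label):
        # Step 2: Nameprep (the unassigned check is an approximation).
        if not allow_unassigned and any(stringprep.in_table_a1(c) for c in label):
            raise ToASCIIError("unassigned code point")
        try:
            label = idna.nameprep(label)
        except UnicodeError as exc:
            raise ToASCIIError(str(exc)) from exc
    # Step 3: optional host name checks.
    if use_std3_ascii_rules:
        for c in label:
            if ord(c) <= 0x7F and not (c.isalnum() or c == "-"):
                raise ToASCIIError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise ToASCIIError("leading or trailing hyphen-minus")
    # Step 4: pure ASCII skips to step 8.
    if any(ord(c) > 0x7F for c in label):
        # Step 5: the sequence must not already begin with the ACE prefix.
        if label.lower().startswith(ACE_PREFIX):
            raise ToASCIIError("label begins with the ACE prefix")
        # Step 6: Punycode encoding.
        try:
            encoded = label.encode("punycode").decode("ascii")
        except UnicodeError as exc:
            raise ToASCIIError(str(exc)) from exc
        # Step 7: prepend the ACE prefix.
        label = ACE_PREFIX + encoded
    # Step 8: length check.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("label length outside 1..63")
    return label
```

Note that an all-ASCII label is returned unaltered unless a check fails, which matches the statement that ToASCII never alters a sequence that is already in the ASCII range.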
457 4.2 ToUnicode
459 The ToUnicode operation takes a sequence of Unicode code points that
460 make up one label and returns a sequence of Unicode code points. If
461 the input sequence is a label in ACE form, then the result is an
462 equivalent internationalized label that is not in ACE form, otherwise
463 the original sequence is returned unaltered.
465 ToUnicode never fails. If any step fails, then the original input
466 sequence is returned immediately in that step.
468 The Punycode decoder can never output more code points than it
469 inputs, but Nameprep can, and therefore ToUnicode can.
470 Note that the number of octets needed to represent a sequence of code
471 points depends on the particular character encoding used.
473 The inputs to ToUnicode are a sequence of code points, the
474 AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
475 ToUnicode is always a sequence of Unicode code points.
477 ToUnicode consists of the following steps:
479 1. If the sequence contains any code points outside the ASCII range
480 (0..7F) then proceed to step 2, otherwise skip to step 3.
482 2. Perform the steps specified in [NAMEPREP] and fail if there is an
483 error. (If step 3 of ToASCII is also performed here, it will not
484 affect the overall behavior of ToUnicode, but it is not
485 necessary.) The AllowUnassigned flag is used in [NAMEPREP].
487 3. Verify that the sequence begins with the ACE prefix, and save a
488 copy of the sequence.
490 4. Remove the ACE prefix.
492 5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
493 fail if there is an error. Save a copy of the result of this
494 step.
496 6. Apply ToASCII.
498 7. Verify that the result of step 6 matches the saved copy from step
499 3, using a case-insensitive ASCII comparison.
501 8. Return the saved copy from step 5.
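
The ToUnicode steps above, including the rule that the original input is returned on any failure, might be sketched as follows. The name to_unicode is hypothetical. The nameprep and ToASCII functions from Python's standard-library encodings.idna module are assumed to stand in for [NAMEPREP] and for the ToASCII operation, and the flags are not modeled.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    original = label
    try:
        # Steps 1-2: Nameprep is applied only to non-ASCII input.
        if any(ord(c) > 0x7F for c in label):
            label = idna.nameprep(label)
        # Step 3: verify the ACE prefix (case-insensitively) and save a copy.
        if not label.lower().startswith(ACE_PREFIX):
            return original
        saved_ace = label
        # Step 4: remove the ACE prefix.
        stripped = label[len(ACE_PREFIX):]
        # Step 5: Punycode decoding; save the result.
        decoded = stripped.encode("ascii").decode("punycode")
        # Step 6: apply ToASCII to the decoded form.
        round_trip = idna.ToASCII(decoded).decode("ascii")
        # Step 7: the round trip must match the saved copy from step 3.
        if round_trip.lower() != saved_ace.lower():
            return original
        # Step 8: return the decoded form.
        return decoded
    except UnicodeError:
        # ToUnicode never fails: any failing step yields the original input.
        return original
```

The round trip through ToASCII in steps 6 and 7 is what rejects labels that carry the ACE prefix but are not the ASCII form of any valid internationalized label.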
503 5. ACE prefix
505 The ACE prefix, used in the conversion operations (section 4), is two
506 alphanumeric ASCII characters followed by two hyphen-minuses. It
507 cannot be any of the prefixes already used in earlier documents,
508 which include the following: "bl--", "bq--", "dq--", "lq--", "mq--",
509 "ra--", "wq--" and "zq--". The ToASCII and ToUnicode operations MUST
510 recognize the ACE prefix in a case-insensitive manner.
512 The ACE prefix for IDNA is "xn--" or any capitalization thereof.
514 This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
515 "de-jg4avhby1noc0d" is the part of the ACE label that is generated by
516 the encoding steps in [PUNYCODE].
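
A case-insensitive prefix test, as required above, can be sketched in a few lines. The names has_ace_prefix and strip_ace_prefix are hypothetical.

```python
ACE_PREFIX = "xn--"


def has_ace_prefix(label: str) -> bool:
    """Recognize the ACE prefix in any capitalization."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX


def strip_ace_prefix(label: str) -> str:
    """Return the part of the label that follows the ACE prefix."""
    if not has_ace_prefix(label):
        raise ValueError("label does not begin with the ACE prefix")
    return label[len(ACE_PREFIX):]
```

A true result from has_ace_prefix only shows that the prefix is present; it does not by itself establish that the label is an ACE label.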
518 While all ACE labels begin with the ACE prefix, not all labels
519 beginning with the ACE prefix are necessarily ACE labels. Non-ACE
520 labels that begin with the ACE prefix will confuse users and SHOULD
521 NOT be allowed in DNS zones.
523 6. Implications for typical applications using DNS
525 In IDNA, applications perform the processing needed to input
526 internationalized domain names from users, display internationalized
527 domain names to users, and process the inputs and outputs from DNS
528 and other protocols that carry domain names.
530 The components and interfaces between them can be represented
531 pictorially as:
533                    +------+
534                    | User |
535                    +------+
536                       ^
537                       | Input and display: local interface methods
538                       | (pen, keyboard, glowing phosphorus, ...)
539   +-------------------|-------------------------------+
540   |                   v                               |
541   |          +-----------------------------+          |
542   |          |        Application          |          |
543   |          |   (ToASCII and ToUnicode    |          |
544   |          |      operations may be      |          |
545   |          |        called here)         |          |
546   |          +-----------------------------+          |
547   |                   ^        ^                      | End system
548   |                   |        |                      |
549   | Call to resolver: |        | Application-specific |
550   |              ACE  |        | protocol:            |
551   |                   v        | ACE unless the       |
552   |           +----------+     | protocol is updated  |
553   |           | Resolver |     | to handle other      |
554   |           +----------+     | encodings            |
555   |                 ^          |                      |
556   +-----------------|----------|----------------------+
557       DNS protocol: |          |
558                 ACE |          |
559                     v          v
560          +-------------+    +---------------------+
561          | DNS servers |    | Application servers |
562          +-------------+    +---------------------+
564 The box labeled "Application" is where the application splits a
565 domain name into labels, sets the appropriate flags, and performs the
566 ToASCII and ToUnicode operations. This is described in section 4.
568 6.1 Entry and display in applications
570 Applications can accept domain names using any character set or sets
571 desired by the application developer, and can display domain names in
572 any charset. That is, the IDNA protocol does not affect the
573 interface between users and applications.
575 An IDNA-aware application can accept and display internationalized
576 domain names in two formats: the internationalized character set(s)
577 supported by the application, and as an ACE label. ACE labels that
578 are displayed or input MUST always include the ACE prefix.
579 Applications MAY allow input and display of ACE labels, but are not
580 encouraged to do so except as an interface for special purposes,
581 possibly for debugging, or to cope with display limitations as
582 described in section 6.4. ACE encoding is opaque and ugly, and
583 should thus only be exposed to users who absolutely need it. Because
584 name labels encoded as ACE name labels can be rendered either as the
585 encoded ASCII characters or the proper decoded characters, the
586 application MAY have an option for the user to select the preferred
587 method of display; if it does, rendering the ACE SHOULD NOT be the
588 default.
590 Domain names are often stored and transported in many places. For
591 example, they are part of documents such as mail messages and web
592 pages. They are transported in many parts of many protocols, such as
593 both the control commands and the RFC 2822 body parts of SMTP, and
594 the headers and the body content in HTTP. It is important to
595 remember that domain names appear both in domain name slots and in
596 the content that is passed over protocols.
598 In protocols and document formats that define how to handle
599 specification or negotiation of charsets, labels can be encoded in
600 any charset allowed by the protocol or document format. If a
601 protocol or document format only allows one charset, the labels MUST
602 be given in that charset.
604 In any place where a protocol or document format allows transmission
605 of the characters in internationalized labels, internationalized
606 labels SHOULD be transmitted using whatever character encoding and
607 escape mechanism the protocol or document format uses at that
608 place.
610 All protocols that use domain name slots already have the capacity
611 for handling domain names in the ASCII charset. Thus, ACE labels
612 (internationalized labels that have been processed with the ToASCII
613 operation) can inherently be handled by those protocols.
615 Displaying internationalized characters can be tricky for
616 applications regardless of whether the characters appear in free
617 text, in domain names, or in other protocol elements. The Unicode
618 standard encompasses many types of text that can cause display
619 problems, such as formatting characters, characters that combine with
620 one or more surrounding characters, characters whose direction of
621 display can change, strings whose logical order cannot be uniquely
622 inferred from their display order, and so on. IDNA requires the use
623 of Nameprep, which mitigates some of these issues, both in individual
624 domain labels and to a lesser extent in full domain names, but does
625 not eliminate all the issues (and does nothing to mitigate them in
626 text outside of domain names).
628 6.2 Applications and resolver libraries
630 Applications normally use functions in the operating system when they
631 resolve DNS queries. Those functions in the operating system are
632 often called "the resolver library", and the applications communicate
633 with the resolver libraries through a programming interface (API).
635 Because these resolver libraries today expect only domain names in
636 ASCII, applications MUST prepare labels that are passed to the
637 resolver library using the ToASCII operation. Labels received from
638 the resolver library contain only ASCII characters; internationalized
639 labels that cannot be represented directly in ASCII use the ACE form.
640 ACE labels always include the ACE prefix.
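
A minimal sketch of this boundary, assuming Python and its built-in
"idna" codec (which implements the RFC 3490 conversion for whole
names): the application converts the name to ACE before it reaches
an ASCII-only resolver interface.  The resolve function and its
resolver parameter are hypothetical; socket.getaddrinfo merely stands
in for "the resolver library".

```python
import socket


def resolve(name: str, resolver=socket.getaddrinfo):
    """Convert a domain name to its ACE form, then call the resolver.

    `resolver` is any callable taking (host, port); it defaults to
    socket.getaddrinfo as a stand-in for an ASCII-only resolver API.
    """
    # The "idna" codec applies ToASCII to each label of the name.
    ace_name = name.encode("idna").decode("ascii")
    return resolver(ace_name, None)
```

Passing the resolver as a parameter is only a convenience for this
sketch; it makes it possible to observe which name would be handed
to the resolver without sending a query.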
642 An operating system might have a set of libraries for performing the
643 ToASCII operation. The input to such a library might be in one or
644 more charsets that are used in applications (UTF-8 and UTF-16 are
645 likely candidates for almost any operating system, and script-
646 specific charsets are likely for localized operating systems).
648 IDNA-aware applications MUST be able to work with both non-
649 internationalized labels (those that conform to [STD13] and [STD3])
650 and internationalized labels.
652 It is expected that new versions of the resolver libraries in the
653 future will be able to accept domain names in other charsets than
654 ASCII, and application developers might one day pass not only domain
655 names in Unicode, but also in local script to a new API for the
656 resolver libraries in the operating system. Thus the ToASCII and
657 ToUnicode operations might be performed inside these new versions of
658 the resolver libraries.
660 Domain names passed to resolvers or put into the question section of
661 DNS requests follow the rules for "queries" from [STRINGPREP].
663 6.3 DNS servers
665 Domain names stored in zones follow the rules for "stored strings"
666 from [STRINGPREP].
668 For internationalized labels that cannot be represented directly in
669 ASCII, DNS servers MUST use the ACE form produced by the ToASCII
670 operation. All IDNs served by DNS servers MUST contain only ASCII
671 characters.
673 If a signaling system which makes negotiation possible between old
674 and new DNS clients and servers is standardized in the future, the
675 encoding of the query in the DNS protocol itself can be changed from
676 ACE to something else, such as UTF-8. The question whether or not
677 this should be used is, however, a separate problem and is not
678 discussed in this memo.
680 6.4 Avoiding exposing users to the raw ACE encoding
682 Any application that might show the user a domain name obtained from
683 a domain name slot, such as from gethostbyaddr or part of a mail
684 header, will need to be updated if it is to prevent users from seeing
685 the ACE.
687 If an application decodes an ACE name using ToUnicode but cannot show
688 all of the characters in the decoded name, such as if the name
689 contains characters that the output system cannot display, the
690 application SHOULD show the name in ACE format (which always includes
691 the ACE prefix) instead of displaying the name with the replacement
692 character (U+FFFD). This is to make it easier for the user to
693 transfer the name correctly to other programs. Programs that by
694 default show the ACE form when they cannot show all the characters in
695 a name label SHOULD also have a mechanism to show the name that is
696 produced by the ToUnicode operation with as many characters as
697 possible and replacement characters in the positions where characters
698 cannot be displayed.
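
The fallback described above might look like the following Python
sketch.  It assumes encodings.idna from the standard library for
ToUnicode; label_for_display and the can_render callback are
hypothetical names, and how an application decides whether a
character can be rendered is outside the scope of the sketch.

```python
from encodings import idna


def label_for_display(ace_label: str, can_render) -> str:
    """Return the decoded label, or the ACE form if it cannot be shown fully.

    `can_render` is a hypothetical callback: it takes one character and
    returns True if the output system is able to display it.
    """
    try:
        decoded = idna.ToUnicode(ace_label)
    except UnicodeError:
        # Python's ToUnicode raises where this document says the input
        # is returned unchanged; fall back to the original label.
        return ace_label
    if all(can_render(ch) for ch in decoded):
        return decoded
    # Prefer the ACE form over showing U+FFFD replacement characters.
    return ace_label
```

With a display limited to ASCII, for example
label_for_display("xn--bcher-kva", str.isascii), the ACE form is
returned; with a display that can show "ü", the decoded label is
returned.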
700 The ToUnicode operation does not alter labels that are not valid ACE
701 labels, even if they begin with the ACE prefix. After ToUnicode has
702 been applied, if a label still begins with the ACE prefix, then it is
703 not a valid ACE label, and is not equivalent to any of the
704 intermediate Unicode strings constructed by ToUnicode.
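
As an illustration of this rule, the Python sketch below wraps the
standard library's encodings.idna.ToUnicode, which raises an
exception on failure, so that the input is returned unchanged
instead.  The names to_unicode and is_valid_ace are hypothetical, and
for simplicity only a lower-case prefix is considered.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    """ToUnicode that returns its input unchanged when decoding fails."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


def is_valid_ace(label: str) -> bool:
    """True if the label has the ACE prefix and ToUnicode removes it.

    Simplification: only the lower-case form of the prefix is checked.
    """
    return (label.startswith(ACE_PREFIX)
            and not to_unicode(label).startswith(ACE_PREFIX))
```

Here "xn--bcher-kva" is a valid ACE label, whereas "xn--abc-" is not:
it would decode to the plain ASCII string "abc", which does not
convert back to the same ACE form, so the label is left unaltered.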
706 6.5 DNSSEC authentication of IDN domain names
708 DNS Security [RFC2535] is a method for supplying cryptographic
709 verification information along with DNS messages. Public Key
710 Cryptography is used in conjunction with digital signatures to
711 provide a means for a requester of domain information to authenticate
712 the source of the data. This ensures that it can be traced back to a
713 trusted source, either directly, or via a chain of trust linking the
714 source of the information to the top of the DNS hierarchy.
716 IDNA specifies that all internationalized domain names served by DNS
717 servers that cannot be represented directly in ASCII must use the ACE
718 form produced by the ToASCII operation. This operation must be
719 performed prior to a zone being signed by the private key for that
720 zone. Because of this ordering, it is important to recognize that
721 DNSSEC authenticates the ASCII domain name, not the Unicode form or
722 the mapping between the Unicode form and the ASCII form. In the
723 presence of DNSSEC, this is the name that MUST be signed in the zone
724 and MUST be validated against.
726 One consequence of this for sites deploying IDNA in the presence of
727 DNSSEC is that any special purpose proxies or forwarders used to
728 transform user input into IDNs must be earlier in the resolution flow
729 than DNSSEC authenticating nameservers for DNSSEC to work.
731 7. Name server considerations
733 Existing DNS servers do not know the IDNA rules for handling non-
734 ASCII forms of IDNs, and therefore need to be shielded from them.
735 All existing channels through which names can enter a DNS server
736 database (for example, master files [STD13] and DNS update messages
737 [RFC2136]) are IDN-unaware because they predate IDNA, and therefore
738 requirement 2 of section 3.1 of this document provides the needed
739 shielding, by ensuring that internationalized domain names entering
740 DNS server databases through such channels have already been
741 converted to their equivalent ASCII forms.
743 It is imperative that there be only one ASCII encoding for a
744 particular domain name. Because of the design of the ToASCII and
745 ToUnicode operations, there are no ACE labels that decode to ASCII
746 labels, and therefore name servers cannot contain multiple ASCII
747 encodings of the same domain name.
749 [RFC2181] explicitly allows domain labels to contain octets beyond
750 the ASCII range (0..7F), and this document does not change that.
751 Note, however, that there is no defined interpretation of octets
752 80..FF as characters. If labels containing these octets are returned
753 to applications, unpredictable behavior could result. The ASCII form
754 defined by ToASCII is the only standard representation for
755 internationalized labels in the current DNS protocol.
757 8. Root server considerations
759 IDNs are likely to be somewhat longer than current domain names, so
760 the bandwidth needed by the root servers is likely to go up by a
761 small amount. Also, queries and responses for IDNs will probably be
762 somewhat longer than typical queries today, so more queries and
763 responses may be forced to go to TCP instead of UDP.
765 9. References
767 9.1 Normative References
769 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
770 Requirement Levels", BCP 14, RFC 2119, March 1997.
772 [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
773 Internationalized Strings ("stringprep")",
774 draft-hoffman-rfc3454bis.
776 [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
777 Profile for Internationalized Domain Names (IDN)",
778 draft-hoffman-rfc3491bis.
780 [PUNYCODE] Costello, A., "Punycode: A Bootstring encoding of
781 Unicode for use with Internationalized Domain Names in
782 Applications (IDNA)", draft-costello-rfc3492bis.
784 [STD3] Braden, R., "Requirements for Internet Hosts --
785 Communication Layers", STD 3, RFC 1122, and
786 "Requirements for Internet Hosts -- Application and
787 Support", STD 3, RFC 1123, October 1989.
789 [STD13] Mockapetris, P., "Domain names - concepts and
790 facilities", STD 13, RFC 1034 and "Domain names -
791 implementation and specification", STD 13, RFC 1035,
792 November 1987.
794 9.2 Informative References
796 [IESG-STATEMENT] "IESG Statement on IDN", February 2003,
797 .
799 [RFC2535] Eastlake, D., "Domain Name System Security Extensions",
800 RFC 2535, March 1999.
802 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
803 Specification", RFC 2181, July 1997.
805 [UNICODE] The Unicode Consortium. The Unicode Standard, Version
806 3.2.0 is defined by The Unicode Standard, Version 3.0
807 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
808 as amended by the Unicode Standard Annex #27: Unicode
809 3.1 (http://www.unicode.org/reports/tr27/) and by the
810 Unicode Standard Annex #28: Unicode 3.2
811 (http://www.unicode.org/reports/tr28/).
813 [USASCII] Cerf, V., "ASCII format for Network Interchange", RFC
814 20, October 1969.
816 10. Security Considerations
818 Security on the Internet partly relies on the DNS. Thus, any change
819 to the characteristics of the DNS can change the security of much of
820 the Internet.
822 This memo describes an algorithm which encodes characters that are
823 not valid according to STD3 and STD13 into octet values that are
824 valid. No security issues such as string length increases or new
825 allowed values are introduced by the encoding process or the use of
826 these encoded values, apart from those introduced by the ACE encoding
827 itself.
829 Domain names are used by users to identify and connect to Internet
830 servers. The security of the Internet is compromised if a user
831 entering a single internationalized name is connected to different
832 servers based on different interpretations of the internationalized
833 domain name.
835 When systems use local character sets other than ASCII and Unicode,
836 this specification leaves the problem of transcoding between the
837 local character set and Unicode up to the application. If different
838 applications (or different versions of one application) implement
839 different transcoding rules, they could interpret the same name
840 differently and contact different servers. This problem is not
841 solved by security protocols like TLS that do not take local
842 character sets into account.
844 Because this document normatively refers to [NAMEPREP], [PUNYCODE],
845 and [STRINGPREP], it includes the security considerations from those
846 documents as well.
848 If or when this specification is updated to use a more recent Unicode
849 normalization table, the new normalization table will need to be
850 compared with the old to spot backwards incompatible changes. If
851 there are such changes, they will need to be handled somehow, or
852 there will be security as well as operational implications. Methods
853 to handle the conflicts could include keeping the old normalization,
854 or taking care of the conflicting characters by operational means, or
855 some other method.
857 Implementations MUST NOT use more recent normalization tables than
858 the one referenced from this document, even though more recent tables
859 may be provided by operating systems. If an application is unsure of
860 which version of the normalization tables is in the operating
861 system, the application needs to include the normalization tables
862 itself. Using normalization tables other than the one referenced
863 from this specification could have security and operational
864 implications.
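
One way an implementation might pin its normalization tables is
sketched below in Python.  It relies on unicodedata.ucd_3_2_0, a
frozen copy of the Unicode 3.2.0 database that ships with CPython;
whether other environments offer something similar is not assumed
here, and the function name nfkc_unicode_3_2 is hypothetical.

```python
import unicodedata

# Frozen Unicode 3.2.0 database shipped with CPython; it does not change
# when the interpreter's main Unicode tables are updated.
ucd32 = unicodedata.ucd_3_2_0


def nfkc_unicode_3_2(text: str) -> str:
    """Apply NFKC using the Unicode 3.2 tables rather than the current ones."""
    return ucd32.normalize("NFKC", text)
```

The point of the sketch is only that the table version is chosen
explicitly by the application instead of being inherited from the
platform.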
866 To help prevent confusion between characters that are visually
867 similar, it is suggested that implementations provide visual
868 indications where a domain name contains multiple scripts. Such
869 mechanisms can also be used to show when a name contains a mixture of
870 simplified and traditional Chinese characters, or to distinguish zero
871 and one from O and l. DNS zone administrators may impose restrictions
872 (subject to the limitations in section 2) that try to minimize
873 homographs.
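
A rough heuristic for detecting labels that mix scripts is sketched
below in Python.  It is an assumption-laden approximation: the
standard library has no script property, so the first word of each
letter's Unicode character name is used as a stand-in for its script.
This misclassifies some legitimate combinations (for example, names
that mix kana and ideographs).  The function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts in a label.

    Heuristic: the first word of a letter's Unicode name (LATIN,
    CYRILLIC, GREEK, ...) is treated as its script.  Digits, hyphens,
    and other non-letters are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def needs_visual_warning(label: str) -> bool:
    """True if the label appears to contain letters from several scripts."""
    return len(scripts_in_label(label)) > 1
```

A label such as "p\u0430ypal", where the second letter is CYRILLIC
SMALL LETTER A, would be flagged, while the all-Latin "paypal" would
not.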
875 Domain names (or portions of them) are sometimes compared against a
876 set of privileged or anti-privileged domains. In such situations it
877 is especially important that the comparisons be done properly, as
878 specified in section 3.1 requirement 4. For labels already in ASCII
879 form, the proper comparison reduces to the same case-insensitive
880 ASCII comparison that has always been used for ASCII labels.
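
The comparison rule can be sketched in Python as follows, assuming
the standard library's encodings.idna.ToASCII as the conversion.  The
function name labels_match is hypothetical, and error handling for
labels that ToASCII rejects is left out.

```python
from encodings import idna


def labels_match(a: str, b: str) -> bool:
    """Compare two labels by their ASCII forms, ignoring ASCII case.

    Both labels are passed through ToASCII first; labels that are
    already ASCII are returned by ToASCII unchanged, so the check
    reduces to the usual case-insensitive comparison for them.
    """
    return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()
```

For example, "Bücher" and "xn--bcher-kva" match, as do "EXAMPLE" and
"example", while "bücher" and "bucher" do not.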
882 The introduction of IDNA means that any existing labels that start
883 with the ACE prefix and would be altered by ToUnicode will
884 automatically be ACE labels, and will be considered equivalent to
885 non-ASCII labels, whether or not that was the intent of the zone
886 administrator or registrant.
888 11. IANA Considerations
890 IANA has assigned the ACE prefix "xn--" in consultation with the
891 IESG.
893 12. Authors' Addresses
895 Patrik Faltstrom
896 Cisco Systems
897 Arstaangsvagen 31 J
898 S-117 43 Stockholm Sweden
900 EMail: paf@cisco.com
902 Paul Hoffman
903 Internet Mail Consortium and VPN Consortium
904 127 Segre Place
905 Santa Cruz, CA 95060 USA
907 EMail: phoffman@imc.org
909 Adam M. Costello
910 University of California, Berkeley
912 URL: http://www.nicemice.net/amc/
914 A. Changes from RFC 3490
916 This document is a revision of RFC 3490. None of the changes affect the
917 protocol described in RFC 3490; that is, all implementations of RFC 3490
918 will be identical with implementations of the specification in this
919 document. The items that have changed from RFC 3490 are:
921 - The last line of section 1 has a grammatical fix (user's -> users').
923 - Added a note in section 1 about the IESG statement on IDNA, and
924 added a reference to it.
926 - In section 3.1 rule 3, fixed spelling of "unintelligle" to
927 "unintelligible".
929 - In step 8 of section 4.1, added "(0 is excluded)" to clarify.
931 - In section 4.2, the first sentence of the third paragraph was
932 incorrect. It has been replaced with a sentence that is both
933 correct and more descriptive.
935 - Added "ToUnicode consists of the following steps:" before the steps
936 in section 4.2.
938 - Changed wording of step 1 of section 4.2 to match the wording in section
939 4.1 (the result is identical).
941 - Added the last paragraph in section 6.1 to acknowledge that some Unicode
942 display issues are tricky, but they are not specific to IDNA.
944 - The sentence in section 11 now says the sequence that was chosen.