< draft-ietf-idnabis-rationale-13.txt   draft-ietf-idnabis-rationale-14.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft September 13, 2009 Internet-Draft October 25, 2009
Intended status: Informational Intended status: Informational
Expires: March 17, 2010 Expires: April 28, 2010
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-13.txt draft-ietf-idnabis-rationale-14.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. This document may contain material provisions of BCP 78 and BCP 79. This document may contain material
from IETF Documents or IETF Contributions published or made publicly from IETF Documents or IETF Contributions published or made publicly
available before November 10, 2008. The person(s) controlling the available before November 10, 2008. The person(s) controlling the
copyright in some of this material may not have granted the IETF copyright in some of this material may not have granted the IETF
Trust the right to allow modifications of such material outside the Trust the right to allow modifications of such material outside the
IETF Standards Process. Without obtaining an adequate license from IETF Standards Process. Without obtaining an adequate license from
skipping to change at page 1, line 43 skipping to change at page 1, line 43
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 17, 2010. This Internet-Draft will expire on April 28, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 23 skipping to change at page 2, line 23
Several years have passed since the original protocol for Several years have passed since the original protocol for
Internationalized Domain Names (IDNs) was completed and deployed. Internationalized Domain Names (IDNs) was completed and deployed.
During that time, a number of issues have arisen, including the need During that time, a number of issues have arisen, including the need
to update the system to deal with newer versions of Unicode. Some of to update the system to deal with newer versions of Unicode. Some of
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 5
1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 6
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 1.3.1. DNS "Name" Terminology . . . . . . . . . . . . . . . . 6
1.3.2. New Terminology and Restrictions . . . . . . . . . . . 6 1.3.2. New Terminology and Restrictions . . . . . . . . . . . 7
1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 8
1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 9
2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 10
3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 10
3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 3.1. A Tiered Model of Permitted Characters and Labels . . . . 11
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 11
3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 12
3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 3.1.2.1. Contextual Restrictions . . . . . . . . . . . . . 12
3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.2.2. Rules and Their Application . . . . . . . . . . . 13
3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 13
3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 14
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 14 Applications . . . . . . . . . . . . . . . . . . . . . . . 15
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 16
4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 4.1. Display and Network Order . . . . . . . . . . . . . . . . 16
4.2. Entry and Display in Applications . . . . . . . . . . . . 16 4.2. Entry and Display in Applications . . . . . . . . . . . . 17
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 18 Alternate Character Forms . . . . . . . . . . . . . . . . 19
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 21
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 22
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 22
6. Front-end and User Interface Processing for Lookup . . . . . . 22 6. Front-end and User Interface Processing for Lookup . . . . . . 23
7. Migration from IDNA2003 and Unicode Version Synchronization . 24 7. Migration from IDNA2003 and Unicode Version Synchronization . 25
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 25
7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 25 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 25
7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26
7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27
7.2. Changes in Character Interpretations . . . . . . . . . . . 28 7.2. Changes in Character Interpretations . . . . . . . . . . . 29
7.3. Character Mapping . . . . . . . . . . . . . . . . . . . . 29 7.3. Character Mapping . . . . . . . . . . . . . . . . . . . . 30
7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 29 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30
7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 29 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30
7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 30 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31
7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 30 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31
7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32
7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32
7.7. Migration Between Unicode Versions: Unassigned Code 7.7. Migration Between Unicode Versions: Unassigned Code
Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 36
8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 36
8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 35 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 36
8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 36 8.2. Root and other DNS Server Considerations . . . . . . . . . 37
8.3. Root and other DNS Server Considerations . . . . . . . . . 36 9. Internationalization Considerations . . . . . . . . . . . . . 37
9. Internationalization Considerations . . . . . . . . . . . . . 36
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37
10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38
10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38
11. Security Considerations . . . . . . . . . . . . . . . . . . . 38 11. Security Considerations . . . . . . . . . . . . . . . . . . . 38
11.1. General Security Issues with IDNA . . . . . . . . . . . . 38 11.1. General Security Issues with IDNA . . . . . . . . . . . . 38
12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 38 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 38
13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40
14.1. Normative References . . . . . . . . . . . . . . . . . . . 39 14.1. Normative References . . . . . . . . . . . . . . . . . . . 40
14.2. Informative References . . . . . . . . . . . . . . . . . . 40 14.2. Informative References . . . . . . . . . . . . . . . . . . 41
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 42 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 43
A.1. Changes between Version -00 and Version -01 of A.1. Changes between Version -00 and Version -01 of
draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43 draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43
A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 43 A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 44
A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 43 A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 44
A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44 A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44
A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 44 A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 45
A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 45 A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 45
A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 45 A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46
A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 45 A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46
A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46 A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46
A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 46 A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47
A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 46 A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 47
A.12. Version -12 . . . . . . . . . . . . . . . . . . . . . . . 47 A.12. Version -12 . . . . . . . . . . . . . . . . . . . . . . . 47
A.13. Version -13 . . . . . . . . . . . . . . . . . . . . . . . 47 A.13. Version -13 . . . . . . . . . . . . . . . . . . . . . . . 48
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 A.14. Version -14 . . . . . . . . . . . . . . . . . . . . . . . 48
A.15. Version -14 . . . . . . . . . . . . . . . . . . . . . . . 48
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 48
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
Internationalized Domain Names in Applications (IDNA) is a collection Internationalized Domain Names in Applications (IDNA) is a collection
of standards that allow client applications to convert some Unicode of standards that allow client applications to convert some Unicode
mnemonics to an ASCII-compatible encoding form ("ACE") which is a mnemonics to an ASCII-compatible encoding form ("ACE") which is a
valid DNS label containing only letters, digits, and hyphens. The valid DNS label containing only letters, digits, and hyphens. The
specific form of ACE label used by IDNA is called an "A-label". A specific form of ACE label used by IDNA is called an "A-label". A
skipping to change at page 6, line 24 skipping to change at page 7, line 24
label that may appear to meet certain definitional constraints but label that may appear to meet certain definitional constraints but
has not yet been sufficiently tested for validity. has not yet been sufficiently tested for validity.
These definitions are also illustrated in Figure 1 of the Definitions These definitions are also illustrated in Figure 1 of the Definitions
Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and
fourth character from the beginning of the label. In IDNA-aware fourth character from the beginning of the label. In IDNA-aware
applications, only a subset of these reserved labels is permitted to applications, only a subset of these reserved labels is permitted to
be used, namely the A-label subset. A-labels are a subset of the be used, namely the A-label subset. A-labels are a subset of the
R-LDH-labels that begin with the case-insensitive string "xn--". R-LDH-labels that begin with the case-insensitive string "xn--".
Labels that bear this prefix but which are not otherwise valid fall Labels that bear this prefix but which are not otherwise valid fall
into the "Fake-A-label" category. The non-reserved labels (NR-LDH- into the "Fake A-label" category. The non-reserved labels (NR-LDH-
labels) are implicitly valid since they do not trigger any labels) are implicitly valid since they do not bear any resemblance
resemblance to IDNA-landr NR-LDH-labels. to the labels specified by IDNA.
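The label categories described above can be sketched roughly as follows. This is a minimal illustration, not the normative definition from [IDNA2008-Defs]: the function name and the returned strings are hypothetical, and the sketch deliberately does not decide whether a label with the "xn--" prefix is a real A-label or a Fake A-label, since that requires the full validity tests.

```python
import re

# LDH syntax: ASCII letters, digits, and hyphens; 1 to 63 characters;
# no leading or trailing hyphen.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_ldh_label(label: str) -> str:
    """Hypothetical helper: sort a label into the categories named in the text."""
    if not _LDH.fullmatch(label):
        return "not an LDH label"
    # Reserved LDH labels have "--" in the third and fourth positions.
    if label[2:4] == "--":
        # The "xn--" prefix is matched case-insensitively.
        if label[:4].lower() == "xn--":
            return "A-label or Fake A-label (needs validation)"
        return "R-LDH-label (other reserved)"
    return "NR-LDH-label"
```

For instance, `classify_ldh_label("ab--cd")` falls into the reserved category even though it does not carry the IDNA prefix.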
The creation of the Reserved-LDH category is required for three The creation of the Reserved-LDH category is required for three
reasons: reasons:
o to prevent confusion with pre-IDNA coding forms; o to prevent confusion with pre-IDNA coding forms;
o to permit future extensions that would require changing the o to permit future extensions that would require changing the
prefix, no matter how unlikely those might be (see Section 7.4); prefix, no matter how unlikely those might be (see Section 7.4);
and and
skipping to change at page 7, line 24 skipping to change at page 8, line 24
algorithms. algorithms.
1.5. Applicability and Function of IDNA 1.5. Applicability and Function of IDNA
The IDNA specification solves the problem of extending the repertoire The IDNA specification solves the problem of extending the repertoire
of characters that can be used in domain names to include a large of characters that can be used in domain names to include a large
subset of the Unicode repertoire. subset of the Unicode repertoire.
IDNA does not extend DNS. Instead, the applications (and, by IDNA does not extend DNS. Instead, the applications (and, by
implication, the users) continue to see an exact-match lookup implication, the users) continue to see an exact-match lookup
service. Either there is a single exactly-matching (subject to the service. Either there is a single exactly-matching name (subject to
base DNS requirement of case-insensitive ASCII matching) name or the base DNS requirement of case-insensitive ASCII matching) or there
there is no match. This model has served the existing applications is no match. This model has served the existing applications well,
well, but it requires, with or without internationalized domain but it requires, with or without internationalized domain names, that
names, that users know the exact spelling of the domain names that users know the exact spelling of the domain names that are to be
are to be typed into applications such as web browsers and mail user typed into applications such as web browsers and mail user agents.
agents. The introduction of the larger repertoire of characters The introduction of the larger repertoire of characters potentially
potentially makes the set of misspellings larger, especially given makes the set of misspellings larger, especially given that in some
that in some cases the same appearance, for example on a business cases the same appearance, for example on a business card, might
card, might visually match several Unicode code points or several visually match several Unicode code points or several sequences of
sequences of code points. code points.
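The exact-match model with case-insensitive ASCII matching can be illustrated with a short sketch. The function names are hypothetical. The point is that only the ASCII letters A-Z are folded, so two labels that differ in any other way, including the case of a non-ASCII letter, do not match.

```python
def ascii_fold(label: str) -> str:
    """Lowercase only ASCII A-Z; every other character is left unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in label)


def dns_labels_match(a: str, b: str) -> bool:
    """Hypothetical helper: exact match after ASCII-only case folding."""
    return ascii_fold(a) == ascii_fold(b)
```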
The IDNA standard does not require any applications to conform to it, The IDNA standard does not require any applications to conform to it,
nor does it retroactively change those applications. An application nor does it retroactively change those applications. An application
can elect to use IDNA in order to support IDN while maintaining can elect to use IDNA in order to support IDN while maintaining
interoperability with existing infrastructure. If an application interoperability with existing infrastructure. If an application
wants to use non-ASCII characters in public DNS domain names, IDNA is wants to use non-ASCII characters in public DNS domain names, IDNA is
the only currently-defined option. Adding IDNA support to an the only currently-defined option. Adding IDNA support to an
existing application entails changes to the application only, and existing application entails changes to the application only, and
leaves room for flexibility in front-end processing and more leaves room for flexibility in front-end processing and more
specifically in the user interface (see Section 6). specifically in the user interface (see Section 6).
A great deal of the discussion of IDN solutions has focused on A great deal of the discussion of IDN solutions has focused on
transition issues and how IDNs will work in a world where not all of transition issues and how IDNs will work in a world where not all of
the components have been updated. Proposals that were not chosen by the components have been updated. Proposals that were not chosen by
the original IDN Working Group would have depended on updating of the original IDN Working Group would have depended on updating of
user applications, DNS resolvers, and DNS servers in order for a user user applications, DNS resolvers, and DNS servers in order for a user
to apply an internationalized domain name in any form or coding to apply an internationalized domain name in any form or coding
acceptable under that method. While processing must be performed acceptable under that method. While processing must be performed
prior to or after access to the DNS, IDNA requires no changes to the prior to or after access to the DNS, IDNA requires no changes to the
DNS protocol or any DNS servers or the resolvers on user's computers. DNS protocol, any DNS servers, or the resolvers on users' computers.
IDNA allows the graceful introduction of IDNs not only by avoiding IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing infrastructure (such as DNS servers and mail upgrades to existing infrastructure (such as DNS servers and mail
transport agents), but also by allowing some limited use of IDNs in transport agents), but also by allowing some limited use of IDNs in
applications by using the ASCII-encoded representation of the labels applications by using the ASCII-encoded representation of the labels
containing non-ASCII characters. While such names are user- containing non-ASCII characters. While such names are user-
unfriendly to read and type, and hence not optimal for user input, unfriendly to read and type, and hence not optimal for user input,
they can be used as a last resort to allow rudimentary IDN usage. they can be used as a last resort to allow rudimentary IDN usage.
For example, they might be the best choice for display if it were For example, they might be the best choice for display if it were
known that relevant fonts were not available on the user's computer. known that relevant fonts were not available on the user's computer.
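As an illustration of the ASCII-encoded representation mentioned above, the following sketch converts between a Unicode label and its "xn--" form using Python's built-in "punycode" codec. It is a sketch under stated assumptions: the function names are hypothetical, and no IDNA validity checks (permitted characters, contextual rules, normalization) are performed, so the output is not guaranteed to be a valid A-label.

```python
ACE_PREFIX = "xn--"


def to_ace_unchecked(label: str) -> str:
    """Encode a non-ASCII label with the ACE prefix; no IDNA validation."""
    if label.isascii():
        return label  # already representable in the DNS as-is
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_unchecked(label: str) -> str:
    """Decode a label carrying the ACE prefix; no IDNA validation."""
    if label[:4].lower() == ACE_PREFIX:
        return label[4:].encode("ascii").decode("punycode")
    return label
```

An application that cannot render the native characters could display the output of `to_ace_unchecked` as a fallback, as the paragraph above suggests.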
skipping to change at page 11, line 41 skipping to change at page 12, line 41
rule itself is to be applied on lookup as well as registration. rule itself is to be applied on lookup as well as registration.
A distinction is made between characters that indicate or prohibit A distinction is made between characters that indicate or prohibit
joining and ones similar to them (known as "CONTEXT-JOINER" or joining and ones similar to them (known as "CONTEXT-JOINER" or
"CONTEXTJ") and other characters requiring contextual treatment "CONTEXTJ") and other characters requiring contextual treatment
("CONTEXT-OTHER" or "CONTEXTO"). Only the former require full ("CONTEXT-OTHER" or "CONTEXTO"). Only the former require full
testing at lookup time. testing at lookup time.
It is important to note that these contextual rules cannot prevent It is important to note that these contextual rules cannot prevent
all uses of the relevant characters that might be confusing or all uses of the relevant characters that might be confusing or
problematic. What they are expected do is to confine applicability problematic. What they are expected to do is to confine
of the characters to scripts (and narrower contexts) where zone applicability of the characters to scripts (and narrower contexts)
administrators are knowledgeable enough about the use of those where zone administrators are knowledgeable enough about the use of
characters to be prepared to deal with them appropriately. those characters to be prepared to deal with them appropriately.
For example, a registry dealing with an Indic script that requires For example, a registry dealing with an Indic script that requires
ZWJ and/or ZWNJ as part of the writing system is expected to ZWJ and/or ZWNJ as part of the writing system is expected to
understand where the characters have visible effect and where they do understand where the characters have visible effect and where they do
not and to make registration rules accordingly. By contrast, a not and to make registration rules accordingly. By contrast, a
registry dealing primarily with Latin or Cyrillic script might not be registry dealing primarily with Latin or Cyrillic script might not be
actively aware that the characters exist, much less about the actively aware that the characters exist, much less about the
consequences of embedding them in labels drawn from those scripts and consequences of embedding them in labels drawn from those scripts and
therefore should avoid accepting registrations containing those therefore should avoid accepting registrations containing those
characters, at least in Latin or Cyrillic-script labels. characters, at least in Latin or Cyrillic-script labels.
skipping to change at page 12, line 19 skipping to change at page 13, line 19
Rules have descriptions such as "Must follow a character from Script Rules have descriptions such as "Must follow a character from Script
XYZ", "Must occur only if the entire label is in Script ABC", or XYZ", "Must occur only if the entire label is in Script ABC", or
"Must occur only if the previous and subsequent characters have the "Must occur only if the previous and subsequent characters have the
DFG property". The actual rules may be DEFINED or NULL. If present, DFG property". The actual rules may be DEFINED or NULL. If present,
they may have values of "True" (character may be used in any position they may have values of "True" (character may be used in any position
in any label), "False" (character may not be used in any label), or in any label), "False" (character may not be used in any label), or
may be a set of procedural rules that specify the context in which may be a set of procedural rules that specify the context in which
the character is permitted. the character is permitted.
Examples of descriptions of typical rules, stated informally and in
English, include "Must follow a character from Script XYZ", "Must
occur only if the entire label is in Script ABC", "Must occur only if
the previous and subsequent characters have the DFG property".
Because it is easier to identify these characters than to know that Because it is easier to identify these characters than to know that
they are actually needed in IDNs or how to establish exactly the they are actually needed in IDNs or how to establish exactly the
right rules for each one, a rule may have a null value in a given right rules for each one, a rule may have a null value in a given
version of the tables. Characters associated with null rules are not version of the tables. Characters associated with null rules are not
permitted to appear in putative labels for either registration or permitted to appear in putative labels for either registration or
lookup. Of course, a later version of the tables might contain a lookup. Of course, a later version of the tables might contain a
non-null rule. non-null rule.
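The possible rule values described above (null, "True", "False", or a procedural rule) can be sketched as a small lookup table. Everything in this sketch is hypothetical: the table contents are invented for illustration and do not reproduce any entry of [IDNA2008-Tables], and the procedural rule is a deliberately simplified stand-in.

```python
import unicodedata

ZWNJ = "\u200c"
ZWJ = "\u200d"


def follows_virama(label: str, i: int) -> bool:
    """Simplified procedural rule: previous character has combining class 9."""
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


# Illustrative table only. None stands for a null rule; a bool stands for
# "True"/"False"; a callable stands for a set of procedural rules.
EXAMPLE_RULES = {
    ZWNJ: follows_virama,
    ZWJ: None,  # pretend this character has a null rule in this table version
}


def passes_context_rules(label: str, rules=EXAMPLE_RULES) -> bool:
    """Check every character that has an entry in the rule table."""
    for i, ch in enumerate(label):
        if ch not in rules:
            continue  # not a character requiring a contextual rule
        rule = rules[ch]
        if rule is None:
            return False  # null rule: not permitted for registration or lookup
        allowed = rule if isinstance(rule, bool) else rule(label, i)
        if not allowed:
            return False
    return True
```

Replacing a `None` entry with a callable in a later table version models the case where a null rule is superseded by a non-null one.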
The actual rules and their descriptions are in Sections 2 and 3 of The actual rules and their descriptions are in Sections 2 and 3 of
[IDNA2008-Tables]. That document also specifies the creation of a [IDNA2008-Tables]. That document also specifies the creation of a
skipping to change at page 13, line 43 skipping to change at page 14, line 37
not have assigned values in a given version of Unicode are treated as not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points are belonging to a special UNASSIGNED category. Such code points are
prohibited in labels to be registered or looked up. The category prohibited in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories (at which point, they are classified into one of the other categories
as appropriate). as appropriate).
The rationale for restricting the processing of UNASSIGNED characters The rationale for restricting the processing of UNASSIGNED characters
is simply that the properties of such code points cannot be is simply that the properties of such code points cannot be
completely known until actual characters are assigned to them. If, completely known until actual characters are assigned to them. For
for example, such a code point was permitted to be included in a example, assume that an UNASSIGNED code point were included in a
label to be looked up, and the code point was later to be assigned to label to be looked up. Assume that the code point was later assigned
a character that required some set of contextual rules, un-updated to a character that required some set of contextual rules. With that
instances of IDNA-aware software might permit lookup of labels combination, un-updated instances of IDNA-aware software might permit
containing the previously-unassigned characters while updated lookup of labels containing the previously-unassigned characters
versions of IDNA-aware software might restrict their use in lookup, while updated versions of the software might restrict use of the same
depending on the contextual rules. It should be clear that under no label in lookup, depending on the contextual rules. It should be
circumstance should an UNASSIGNED character be permitted in a label clear that under no circumstance should an UNASSIGNED character be
to be registered as part of a domain name. permitted in a label to be registered as part of a domain name.
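A rough way to see the version dependence described above is to test code points against the Unicode data shipped with a given runtime. The sketch below uses Python's `unicodedata` module; the function names are hypothetical, and the result depends on the Unicode version of the Python build (available as `unicodedata.unidata_version`). Noncharacter code points are excluded here on the assumption that they are classified elsewhere rather than as UNASSIGNED.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF and the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """True if this runtime's Unicode data has no character at cp."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)


def label_has_unassigned(label: str) -> bool:
    """Labels containing such code points should be neither registered nor looked up."""
    return any(is_unassigned(ord(ch)) for ch in label)
```

Two installations with different Unicode versions can disagree on `is_unassigned` for the same code point, which is exactly the interoperability hazard the paragraph describes.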
3.2. Registration Policy 3.2. Registration Policy
While these recommendations cannot and should not define registry While these recommendations cannot and should not define registry
policies, registries should develop and apply additional restrictions policies, registries should develop and apply additional restrictions
as needed to reduce confusion and other problems. For example, it is as needed to reduce confusion and other problems. For example, it is
generally believed that labels containing characters from more than generally believed that labels containing characters from more than
one script are a bad practice although there may be some important one script are a bad practice although there may be some important
exceptions to that principle. Some registries may choose to restrict exceptions to that principle. Some registries may choose to restrict
registrations to characters drawn from a very small number of registrations to characters drawn from a very small number of
scripts. For many scripts, the use of variant techniques such as scripts. For many scripts, the use of variant techniques such as
those as described in RFC 3843 [RFC3743] and RFC 4290 [RFC4290], and those as described in RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and
illustrated for Chinese by the tables described in RFC 4713 [RFC4713] illustrated for Chinese by the tables described in RFC 4713 [RFC4713]
may be helpful in reducing problems that might be perceived by users. may be helpful in reducing problems that might be perceived by users.
In general, users will benefit if registries only permit characters In general, users will benefit if registries only permit characters
from scripts that are well-understood by the registry or its from scripts that are well-understood by the registry or its
advisers. If a registry decides to reduce opportunities for advisers. If a registry decides to reduce opportunities for
confusion by constructing policies that disallow characters used in confusion by constructing policies that disallow characters used in
historic writing systems or characters whose use is restricted to historic writing systems or characters whose use is restricted to
specialized, highly technical contexts, some relevant information may specialized, highly technical contexts, some relevant information may
be found in Section 2.4 "Specific Character Adjustments", Table 4 be found in Section 2.4 "Specific Character Adjustments", Table 4
"Candidate Characters for Exclusion from Identifiers" of "Candidate Characters for Exclusion from Identifiers" of
[Unicode-UAX31] and Section 3.1. "General Security Profile for [Unicode-UAX31] and Section 3.1. "General Security Profile for
Identifiers" in [Unicode-Security]. Identifiers" in [Unicode-Security].
The requirement (in Section 4.1 of [IDNA2008-Protocol]) that The requirement (in Section 4.1 of [IDNA2008-Protocol]) that
registration procedures use only U-labels and/or A-labels is intended registration procedures use only U-labels and/or A-labels is intended
to ensure that registrants are fully aware of exactly what is being to ensure that registrants are fully aware of exactly what is being
registered as well as encouraging use of those canonical forms. That registered as well as encouraging use of those canonical forms. That
provision should not be interpreted as requiring that registrant need provision should not be interpreted as requiring that registrants
to provide characters in a particular code sequence. Registrant need to provide characters in a particular code sequence. Registrant
input conventions and management are part of registrant-registrar input conventions and management are part of registrant-registrar
interactions and relationships between registries and registrars and interactions and relationships between registries and registrars and
are outside the scope of these standards. are outside the scope of these standards.
It is worth stressing that these principles of policy development and It is worth stressing that these principles of policy development and
application apply at all levels of the DNS, not only, e.g., TLD or application apply at all levels of the DNS, not only, e.g., TLD or
SLD registrations. Even a trivial, "anything is permitted that is SLD registrations. Even a trivial, "anything is permitted that is
valid under the protocol" policy is helpful in that it helps users valid under the protocol" policy is helpful in that it helps users
and application developers know what to expect. and application developers know what to expect.
skipping to change at page 16, line 47 skipping to change at page 17, line 42
A-labels or U-labels, the application may reasonably have an option A-labels or U-labels, the application may reasonably have an option
for the user to select the preferred method of display. Rendering for the user to select the preferred method of display. Rendering
the U-label should normally be the default. the U-label should normally be the default.
Domain names are often stored and transported in many places. For Domain names are often stored and transported in many places. For
example, they are part of documents such as mail messages and web example, they are part of documents such as mail messages and web
pages. They are transported in many parts of many protocols, such as pages. They are transported in many parts of many protocols, such as
both the control commands of SMTP and associated message body parts, both the control commands of SMTP and associated message body parts,
and in the headers and the body content in HTTP. It is important to and in the headers and the body content in HTTP. It is important to
remember that domain names appear both in domain name slots and in remember that domain names appear both in domain name slots and in
the content that is passed over protocols. the content that is passed over protocols and it would be helpful if
protocols explicitly define what their domain name slots are.
In protocols and document formats that define how to handle In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels must protocol or document format only allows one charset, the labels must
be given in that charset. Of course, not all charsets can properly be given in that charset. Of course, not all charsets can properly
represent all labels. If a U-label cannot be displayed in its represent all labels. If a U-label cannot be displayed in its
entirety, the only choice (without loss of information) may be to entirety, the only choice (without loss of information) may be to
display the A-label. display the A-label.
skipping to change at page 21, line 6 skipping to change at page 21, line 50
Unicode case folding operation maps Greek Final Form Sigma (U+03C2) Unicode case folding operation maps Greek Final Form Sigma (U+03C2)
to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF)
to "ss". Neither of these mappings is reversible because the upper to "ss". Neither of these mappings is reversible because the upper
case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII
string. IDNA2008 permits, at the risk of some incompatibility, string. IDNA2008 permits, at the risk of some incompatibility,
slightly more flexibility in this area by avoiding case folding and slightly more flexibility in this area by avoiding case folding and
treating these characters as themselves. Approaches to handling one- treating these characters as themselves. Approaches to handling one-
way mappings are discussed in Section 7.2. way mappings are discussed in Section 7.2.
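
The one-way nature of these two mappings can be observed with a short
Python sketch. It relies only on the standard str.casefold() and
str.upper() methods; the constant names are illustrative and are not
drawn from any IDNA specification.

```python
# Code points named in the paragraph above.
FINAL_SIGMA = "\u03c2"    # Greek small letter final sigma
MEDIAL_SIGMA = "\u03c3"   # Greek small letter sigma
CAPITAL_SIGMA = "\u03a3"  # Greek capital letter sigma
ESZETT = "\u00df"         # Latin small letter sharp s

# Unicode case folding, as exposed by Python's str.casefold().
folded_sigma = FINAL_SIGMA.casefold()
folded_eszett = ESZETT.casefold()

# Going "back" through upper case never recovers the original character.
upper_of_medial = MEDIAL_SIGMA.upper()
upper_of_ss = folded_eszett.upper()

print(folded_sigma == MEDIAL_SIGMA)      # final sigma folds to medial sigma
print(folded_eszett)                     # eszett folds to two ASCII letters
print(upper_of_medial == CAPITAL_SIGMA)  # upper case is capital sigma
print(upper_of_ss)                       # plain ASCII, no eszett
```

Once a string has been folded, nothing in it records whether it
originally contained a Final Sigma or an Eszett, which is why the
mappings cannot be reversed.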
Because IDNA2003 maps Final Sigma and Eszett to other characters, and Because IDNA2003 maps Final Sigma and Eszett to other characters, and
the reverse mapping is never possible, that in some sense means that the reverse mapping is never possible, neither Final Sigma nor Eszett
neither Final Sigma nor Eszett can be represented in a IDNA2003 IDN. can be represented in the ACE form of IDNA2003 IDN nor in the native
With IDNA2008, both characters can be used in an IDN and so the character (U-label) form derived from it. With IDNA2008, both
A-label used for lookup for any U-label containing those characters characters can be used in an IDN and so the A-label used for lookup
is now different. See Section 7.1 for a discussion of what kinds of for any U-label containing those characters is now different. See
changes might require the IDNA prefix to change; after extended Section 7.1 for a discussion of what kinds of changes might require
discussions, the WG came to consensus that the change for these the IDNA prefix to change; after extended discussions, the WG came to
characters did not justify a prefix change. consensus that the change for these characters did not justify a
prefix change.
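
As a hedged illustration of how the A-label differs, the sketch below
compares the two treatments of a label containing Eszett. It assumes
that Python's built-in "idna" codec (which implements RFC 3490, i.e.,
IDNA2003) is an acceptable stand-in for IDNA2003 processing, and it
approximates the IDNA2008 A-label by applying the "punycode" codec and
the "xn--" prefix directly, with none of the validity checks required
by [IDNA2008-Protocol]. The helper name is hypothetical.

```python
def approximate_idna2008_a_label(u_label: str) -> str:
    """Punycode-encode a label as-is and add the ACE prefix.

    Simplification: no IDNA2008 validity, contextual, or Bidi checks.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


label = "stra\u00dfe"  # "strasse" written with Eszett (U+00DF)

# IDNA2003: Nameprep maps Eszett to "ss", so the result is pure ASCII.
idna2003_form = label.encode("idna").decode("ascii")

# IDNA2008: Eszett is kept as itself, so the label needs an ACE form.
idna2008_form = approximate_idna2008_a_label(label)

print(idna2003_form)
print(idna2008_form)
```

The two results are different strings in the DNS, which is the
compatibility issue the working group weighed against a prefix change.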
4.5. Right to Left Text 4.5. Right to Left Text
In order to be sure that the directionality of right to left text is In order to be sure that the directionality of right to left text is
unambiguous, IDNA2003 required that any label in which right to left unambiguous, IDNA2003 required that any label in which right to left
characters appear both starts and ends with them and that it not characters appear both starts and ends with them and that it not
include any characters with strong left to right properties (that include any characters with strong left to right properties (that
excludes other alphabetic characters but permits European digits). excludes other alphabetic characters but permits European digits).
Any other string that contains a right to left character and does not Any other string that contains a right to left character and does not
meet those requirements is rejected. This is one of the few places meet those requirements is rejected. This is one of the few places
skipping to change at page 25, line 8 skipping to change at page 25, line 49
scripts, and other character collections as they are incorporated scripts, and other character collections as they are incorporated
into Unicode, doing so without disruption and, in the long term, into Unicode, doing so without disruption and, in the long term,
without "heavy" processes (an IETF consensus process is required without "heavy" processes (an IETF consensus process is required
by the IDNA2008 specifications and is expected to be required and by the IDNA2008 specifications and is expected to be required and
used until significant experience accumulates with IDNA operations used until significant experience accumulates with IDNA operations
and new versions of Unicode). and new versions of Unicode).
7.1.1. Summary and Discussion of IDNA Validity Criteria 7.1.1. Summary and Discussion of IDNA Validity Criteria
The general criteria for a label to be considered valid under IDNA The general criteria for a label to be considered valid under IDNA
are (the actual rules are rigorously defined in the "Protocol" and are (the actual rules are rigorously defined in [IDNA2008-Protocol]
"Tables" documents): and [IDNA2008-Tables]):
o The characters are "letters", marks needed to form letters, o The characters are "letters", marks needed to form letters,
numerals, or other code points used to write words in some numerals, or other code points used to write words in some
language. Symbols, drawing characters, and various notational language. Symbols, drawing characters, and various notational
characters are intended to be permanently excluded. There is no characters are intended to be permanently excluded. There is no
evidence that they are important enough to Internet operations or evidence that they are important enough to Internet operations or
internationalization to justify expansion of domain names beyond internationalization to justify expansion of domain names beyond
the general principle of "letters, digits, and hyphen". the general principle of "letters, digits, and hyphen".
(Additional discussion and rationale for the symbol decision (Additional discussion and rationale for the symbol decision
appears in Section 7.6). appears in Section 7.6).
skipping to change at page 29, line 37 skipping to change at page 30, line 32
mappings no longer exist as requirements in IDNA2008. These mappings no longer exist as requirements in IDNA2008. These
specifications strongly prefer that only A-labels or U-labels be used specifications strongly prefer that only A-labels or U-labels be used
in protocol contexts and as much as practical more generally. in protocol contexts and as much as practical more generally.
IDNA2008 does anticipate situations in which some mapping at the time IDNA2008 does anticipate situations in which some mapping at the time
of user input into lookup applications is appropriate and desirable. of user input into lookup applications is appropriate and desirable.
The issues are discussed in Section 6 and specific recommendations The issues are discussed in Section 6 and specific recommendations
are made in [IDNA2008-Mapping]. are made in [IDNA2008-Mapping].
7.4. The Question of Prefix Changes 7.4. The Question of Prefix Changes
The conditions that would require a change in the IDNA ACE prefix The conditions that would have required a change in the IDNA ACE
("xn--" for the version of IDNA specified in [RFC3490]) have been a prefix ("xn--" for the version of IDNA specified in [RFC3490]) were
great concern to the community. A prefix change would clearly be of great concern to the community. A prefix change would have
necessary if the algorithms were modified in a manner that would clearly been necessary if the algorithms were modified in a manner
create serious ambiguities during subsequent transition in that would have created serious ambiguities during subsequent
registrations. This section summarizes our conclusions about the transition in registrations. This section summarizes the working
conditions under which changes in prefix would be necessary and the group's conclusions about the conditions under which a change in the
implications of such a change. prefix would have been necessary and the implications of such a
change.
7.4.1. Conditions Requiring a Prefix Change 7.4.1. Conditions Requiring a Prefix Change
An IDN prefix change is needed if a given string would be looked up An IDN prefix change would have been needed if a given string would
or otherwise interpreted differently depending on the version of the be looked up or otherwise interpreted differently depending on the
protocol or tables being used. An IDNA upgrade would require a version of the protocol or tables being used. This IDNA upgrade
prefix change if, and only if, one of the following four conditions would have required a prefix change if, and only if, one of the
were met: following four conditions were met:
1. The conversion of an A-label to Unicode (i.e., a U-label) yields 1. The conversion of an A-label to Unicode (i.e., a U-label) would
one string under IDNA2003 (RFC3490) and a different string under have yielded one string under IDNA2003 (RFC3490) and a different
IDNA2008. string under IDNA2008.
2. In a significant number of cases, an input string that is valid 2. In a significant number of cases, an input string that was valid
under IDNA2003 and also valid under IDNA2008 yields two different under IDNA2003 and also valid under IDNA2008 would have yielded
A-labels with the different versions. This condition is believed two different A-labels with the different versions. This
to be essentially equivalent to the one above except for a very condition is believed to be essentially equivalent to the one
small number of edge cases which may not justify a prefix change above except for a very small number of edge cases that were not
(See Section 7.2). found to justify a prefix change (See Section 7.2).
Note that if the input string is valid under one version and not Note that if the input string was valid under one version and not
valid under the other, this condition does not apply. See the valid under the other, this condition would not apply. See the
first item in Section 7.4.2, below. first item in Section 7.4.2, below.
3. A fundamental change is made to the semantics of the string that 3. A fundamental change was made to the semantics of the string that
is inserted in the DNS, e.g., if a decision were made to try to would be inserted in the DNS, e.g., if a decision were made to
include language or script information in the encoding in try to include language or script information in the encoding in
addition to the string itself. addition to the string itself.
4. A sufficiently large number of characters is added to Unicode so 4. A sufficiently large number of characters were added to Unicode
that the Punycode mechanism for block offsets can no longer so that the Punycode mechanism for block offsets would no longer
reference the higher-numbered planes and blocks. This condition reference the higher-numbered planes and blocks. This condition
is unlikely even in the long term and certain not to arise in the is unlikely even in the long term and certain not to arise in the
next several years. next several years.
7.4.2. Conditions Not Requiring a Prefix Change 7.4.2. Conditions Not Requiring a Prefix Change
As a result of the principles described above, none of the following As a result of the principles described above, none of the following
changes require a new prefix: changes required a new prefix:
1. Prohibition of some characters as input to IDNA. This may make 1. Prohibition of some characters as input to IDNA. Such a
names that are now registered inaccessible, but does not change prohibition might make names that were previously registered
those names. inaccessible, but did not change those names.
2. Adjustments in IDNA tables or actions, including normalization 2. Adjustments in IDNA tables or actions, including normalization
definitions, that affect characters that were already invalid definitions, that affected characters that were already invalid
under IDNA2003. under IDNA2003.
3. Changes in the style of the IDNA definition that does not alter 3. Changes in the style of the IDNA definition that did not alter
the actions performed by IDNA. the actions performed by IDNA.
7.4.3. Implications of Prefix Changes 7.4.3. Implications of Prefix Changes
While it might be possible to make a prefix change, the costs of such While it might have been possible to make a prefix change, the costs
a change are considerable. Registries could not convert all IDNA2003 of such a change are considerable. Registries could not have
("xn--") registrations to a new form at the same time and synchronize converted all IDNA2003 ("xn--") registrations to a new form at the
that change with applications supporting lookup. Unless all existing same time and synchronize that change with applications supporting
registrations were simply to be declared invalid (and perhaps even lookup. Unless all existing registrations were simply to be declared
then) systems that needed to support both labels with old prefixes invalid (and perhaps even then) systems that needed to support both
and labels with new ones would first process a putative label under labels with old prefixes and labels with new ones would be required
the IDNA2008 rules and try to look it up and then, if it were not to first process a putative label under the IDNA2008 rules and try to
found, would process the label under IDNA2003 rules and look it up look it up and then, if it were not found, would be required to
again. That process could significantly slow down all processing process the label under IDNA2003 rules and look it up again. That
process would probably have significantly slowed down all processing
that involved IDNs in the DNS especially since a fully-qualified name that involved IDNs in the DNS especially since a fully-qualified name
might contain a mixture of labels that were registered with the old might contain a mixture of labels that were registered with the old
and new prefixes. That would make DNS caching very difficult. In and new prefixes. That would have made DNS caching very difficult.
addition, looking up the same input string as two separate A-labels In addition, looking up the same input string as two separate
creates some potential for confusion and attacks, since the labels A-labels would have created some potential for confusion and attacks,
could map to different targets and then resolve to different entries since the labels could map to different targets and then resolve to
in the DNS. different entries in the DNS.
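
The two-step lookup described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not a description
of any real resolver: the second prefix "zq--" is purely hypothetical,
the zone is a dictionary standing in for the DNS, and both conversions
simply apply Punycode without any IDNA validity checks.

```python
from typing import Callable, Optional, Tuple

OLD_PREFIX = "xn--"  # the IDNA2003 ACE prefix
NEW_PREFIX = "zq--"  # hypothetical replacement prefix, for illustration


def convert_new(label: str) -> str:
    """Stand-in for processing under the newer rules (simplified)."""
    return NEW_PREFIX + label.encode("punycode").decode("ascii")


def convert_old(label: str) -> str:
    """Stand-in for processing under the older rules (simplified)."""
    return OLD_PREFIX + label.encode("punycode").decode("ascii")


def lookup_with_fallback(
    label: str,
    resolve: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], int]:
    """Try the new form first, then the old one; count the queries."""
    queries = 0
    for convert in (convert_new, convert_old):
        queries += 1
        answer = resolve(convert(label))
        if answer is not None:
            return answer, queries
    return None, queries


# A toy zone that still holds only an old-prefix registration.
zone = {"xn--bcher-kva": "192.0.2.1"}
result = lookup_with_fallback("b\u00fccher", zone.get)
print(result)
```

Every name registered only under the old prefix costs two queries in
this model, and a name present under both prefixes could resolve to
two unrelated targets.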
Consequently, a prefix change is to be avoided if at all possible, Consequently, a prefix change should have been, and was, avoided if
even if it means accepting some IDNA2003 decisions about character at all possible, even if it means accepting some IDNA2003 decisions
distinctions as irreversible and/or giving special treatment to edge about character distinctions as irreversible and/or giving special
cases. treatment to edge cases.
7.5. Stringprep Changes and Compatibility 7.5. Stringprep Changes and Compatibility
The Nameprep [RFC3491] specification, a key part of IDNA2003, is a The Nameprep [RFC3491] specification, a key part of IDNA2003, is a
profile of Stringprep [RFC3454]. While Nameprep is a Stringprep profile of Stringprep [RFC3454]. While Nameprep is a Stringprep
profile specific to IDNA, Stringprep is used by a number of other profile specific to IDNA, Stringprep is used by a number of other
protocols. Were Stringprep to be modified by IDNA2008, those changes protocols. Were Stringprep to have been modified by IDNA2008, those
to improve the handling of IDNs could cause problems for non-DNS changes to improve the handling of IDNs could cause problems for non-
uses, most notably if they affected identification and authentication DNS uses, most notably if they affected identification and
protocols. Several elements of IDNA2008 give interpretations to authentication protocols. Several elements of IDNA2008 give
strings prohibited under IDNA2003 or prohibit strings that IDNA2003 interpretations to strings prohibited under IDNA2003 or prohibit
permitted. Those elements include the proposed new inclusion tables strings that IDNA2003 permitted. Those elements include the proposed
[IDNA2008-Tables], the reduction in the number of characters new inclusion tables [IDNA2008-Tables], the reduction in the number
permitted as input for registration or lookup (Section 3), and even of characters permitted as input for registration or lookup
the proposed changes in handling of right to left strings (Section 3), and even the proposed changes in handling of right to
[IDNA2008-Bidi]. IDNA2008 does not use Nameprep or Stringprep at left strings [IDNA2008-Bidi]. IDNA2008 does not use Nameprep or
all, so there are no side-effect changes to other protocols. Stringprep at all, so there are no side-effect changes to other
protocols.
It is particularly important to keep IDNA processing separate from It is particularly important to keep IDNA processing separate from
processing for various security protocols because some of the processing for various security protocols because some of the
constraints that are necessary for smooth and comprehensible use of constraints that are necessary for smooth and comprehensible use of
IDNs may be unwanted or undesirable in other contexts. For example, IDNs may be unwanted or undesirable in other contexts. For example,
the criteria for good passwords or passphrases are very different the criteria for good passwords or passphrases are very different
from those for desirable IDNs: passwords should be hard to guess, from those for desirable IDNs: passwords should be hard to guess,
while domain names should normally be easily memorable. Similarly, while domain names should normally be easily memorable. Similarly,
internationalized SCSI identifiers and other protocol components are internationalized SCSI identifiers and other protocol components are
likely to have different requirements than IDNs. likely to have different requirements than IDNs.
skipping to change at page 32, line 32 skipping to change at page 33, line 27
than an ASCII base. than an ASCII base.
2. Symbol names are more problematic than letters because there may 2. Symbol names are more problematic than letters because there may
be no general agreement on whether a particular glyph matches a be no general agreement on whether a particular glyph matches a
symbol; there are no uniform conventions for naming; variations symbol; there are no uniform conventions for naming; variations
such as outline, solid, and shaded forms may or may not exist; such as outline, solid, and shaded forms may or may not exist;
and so on. As just one example, consider a "heart" symbol as it and so on. As just one example, consider a "heart" symbol as it
might appear in a logo that might be read as "I love...". While might appear in a logo that might be read as "I love...". While
the user might read such a logo as "I love..." or "I heart...", the user might read such a logo as "I love..." or "I heart...",
considerable knowledge of the coding distinctions made in Unicode considerable knowledge of the coding distinctions made in Unicode
is needed to know that there more than one "heart" character is needed to know that there is more than one "heart" character
(e.g., U+2665, U+2661, and U+2765) and how to describe it. These (e.g., U+2665, U+2661, and U+2765) and how to describe it. These
issues are of particular importance if strings are expected to be issues are of particular importance if strings are expected to be
understood or transcribed by the listener after being read out understood or transcribed by the listener after being read out
loud. loud.
3. Design of a screen reader used by blind Internet users who must 3. Design of a screen reader used by blind Internet users who must
listen to renderings of IDN domain names and possibly reproduce listen to renderings of IDN domain names and possibly reproduce
them on the keyboard becomes considerably more complicated when them on the keyboard becomes considerably more complicated when
the names of characters are not obvious and intuitive to anyone the names of characters are not obvious and intuitive to anyone
familiar with the language in question. familiar with the language in question.
skipping to change at page 33, line 9 skipping to change at page 34, line 5
would-be registrant has no way to know -- absent careful study of would-be registrant has no way to know -- absent careful study of
the code tables -- whether it is ambiguous (e.g., where there are the code tables -- whether it is ambiguous (e.g., where there are
multiple "heart" characters) or not. Conversely, the user seeing multiple "heart" characters) or not. Conversely, the user seeing
the hypothetical label doesn't know whether to read it -- try to the hypothetical label doesn't know whether to read it -- try to
transmit it to a colleague by voice -- as "heart", as "love", as transmit it to a colleague by voice -- as "heart", as "love", as
"black heart", or as any of the other examples below. "black heart", or as any of the other examples below.
5. The actual situation is even worse than this. There is no 5. The actual situation is even worse than this. There is no
possible way for a normal, casual, user to tell the difference possible way for a normal, casual, user to tell the difference
between the hearts of U+2665 and U+2765 and the stars of U+2606 between the hearts of U+2665 and U+2765 and the stars of U+2606
and U+2729 or the without somehow knowing to look for a and U+2729 without somehow knowing to look for a distinction. We
distinction. We have a white heart (U+2661) and few black have a white heart (U+2661) and few black hearts. Consequently,
hearts. Consequently, describing a label as containing a heart describing a label as containing a heart is hopelessly ambiguous:
is hopelessly ambiguous: we can only know that it contains one of we can only know that it contains one of several characters that
several characters that look like hearts or have "heart" in their look like hearts or have "heart" in their names. In cities where
names. In cities where "Square" is a popular part of a location "Square" is a popular part of a location name, one might well
name, one might well want to use a square symbol in a label as want to use a square symbol in a label as well and there are far
well and there are far more squares of various flavors in Unicode more squares of various flavors in Unicode than there are hearts
than there are hearts or stars. or stars.
The consequence of these ambiguities is that symbols are a very poor The consequence of these ambiguities is that symbols are a very poor
basis for reliable communication. Consistent with this conclusion, basis for reliable communication. Consistent with this conclusion,
the Unicode standard recommends that strings used in identifiers not the Unicode standard recommends that strings used in identifiers not
contain symbols or punctuation [Unicode-UAX31]. Of course, these contain symbols or punctuation [Unicode-UAX31]. Of course, these
difficulties with symbols do not arise with actual pictographic difficulties with symbols do not arise with actual pictographic
languages and scripts which would be treated like any other language languages and scripts which would be treated like any other language
characters; the two should not be confused. characters; the two should not be confused.
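
The ambiguity can be made concrete by asking the Unicode Character
Database how it describes the code points mentioned above. The sketch
below uses Python's standard unicodedata module; the variable names
are illustrative, and the exact names printed depend on the Unicode
version bundled with the interpreter.

```python
import unicodedata

# Code points cited in the discussion of hearts and stars.
CANDIDATES = ["\u2665", "\u2661", "\u2765", "\u2606", "\u2729"]

# Map "U+XXXX" to (character name, general category).
descriptions = {
    f"U+{ord(ch):04X}": (unicodedata.name(ch), unicodedata.category(ch))
    for ch in CANDIDATES
}

for code_point, (name, category) in descriptions.items():
    print(code_point, category, name)
```

All five are symbols rather than letters, and a spoken description
such as "heart" or "star" does not select a single one of them.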
7.7. Migration Between Unicode Versions: Unassigned Code Points 7.7. Migration Between Unicode Versions: Unassigned Code Points
skipping to change at page 34, line 41 skipping to change at page 35, line 37
Unicode. The reality is that a script that is obscure to much of the Unicode. The reality is that a script that is obscure to much of the
world may still be very important to those who use it. Cultural and world may still be very important to those who use it. Cultural and
linguistic preservation principles make it inappropriate to declare linguistic preservation principles make it inappropriate to declare
the script of no importance in IDNs. Second, we already have the script of no importance in IDNs. Second, we already have
counterexamples in, e.g., the relationships associated with new Han counterexamples in, e.g., the relationships associated with new Han
characters being added (whether in the BMP or in Unicode Plane 2). characters being added (whether in the BMP or in Unicode Plane 2).
Independent of the technical transition issues identified above, it Independent of the technical transition issues identified above, it
can be observed that any addition of characters to an existing script can be observed that any addition of characters to an existing script
to make it easier to use or to better accommodate particular to make it easier to use or to better accommodate particular
languages may lead to transition issues. Such changes may change the languages may lead to transition issues. Such additions may change
preferred form for writing a particular string, changes that may be the preferred form for writing a particular string, changes that may
reflected, e.g., in keyboard transition modules that would be reflected, e.g., in keyboard transition modules that would
necessarily be different from those for earlier versions of Unicode necessarily be different from those for earlier versions of Unicode
where the newer characters may not exist. This creates an inherent where the newer characters may not exist. This creates an inherent
transition problem because attempts to access labels may use either transition problem because attempts to access labels may use either
the old or the new conventions, requiring registry action whether the the old or the new conventions, requiring registry action whether the
older conventions were used in labels or not. The need to consider older conventions were used in labels or not. The need to consider
transition mechanisms is inherent to evolution of Unicode to better transition mechanisms is inherent to evolution of Unicode to better
accommodate writing systems and is independent of how IDNs are accommodate writing systems and is independent of how IDNs are
represented in the DNS or how transitions among versions of those represented in the DNS or how transitions among versions of those
mechanisms occur. The requirement for transitions of this type is mechanisms occur. The requirement for transitions of this type is
illustrated by the addition of Malayalam Chillu in Unicode 5.1.0. illustrated by the addition of Malayalam Chillu in Unicode 5.1.0.
skipping to change at page 35, line 42 skipping to change at page 36, line 40
All existing channels through which names can enter a DNS server All existing channels through which names can enter a DNS server
database (for example, master files (as described in RFC 1034) and database (for example, master files (as described in RFC 1034) and
DNS update messages [RFC2136]) are IDN-unaware because they predate DNS update messages [RFC2136]) are IDN-unaware because they predate
IDNA. Other sections of this document provide the needed shielding IDNA. Other sections of this document provide the needed shielding
by ensuring that internationalized domain names entering DNS server by ensuring that internationalized domain names entering DNS server
databases through such channels have already been converted to their databases through such channels have already been converted to their
equivalent ASCII A-label forms. equivalent ASCII A-label forms.
Because of the distinction made between the algorithms for Because of the distinction made between the algorithms for
Registration and Lookup in [IDNA2008-Protocol] (a domain name Registration and Lookup in [IDNA2008-Protocol] (a domain name
containing only ASCII codepoints can not be converted to an A-label), containing only ASCII codepoints cannot be converted to an A-label),
there can not be more than one A-label form for any given U-label. there cannot be more than one A-label form for any given U-label.
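The relationship between U-labels and A-labels can be sketched as follows. This is a minimal illustration, not the Registration or Lookup procedure of [IDNA2008-Protocol]: it uses Python's built-in `punycode` codec, assumes the input is already a valid, normalized U-label, and performs none of the IDNA2008 validity checks. The function names `to_a_label` and `to_u_label` are hypothetical.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a (presumed valid) U-label as its A-label form."""
    if u_label.isascii():
        # A label containing only ASCII code points has no A-label form.
        raise ValueError("all-ASCII label cannot be converted to an A-label")
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to the corresponding U-label."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    a = to_a_label("bücher")
    print(a)
    print(to_u_label(a))
```

Because the encoding is deterministic and all-ASCII input is rejected, each U-label maps to exactly one A-label in this sketch, and decoding returns the original string.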
As specified in RFC 2181 [RFC2181], the DNS protocol explicitly As specified in RFC 2181 [RFC2181], the DNS protocol explicitly
allows domain labels to contain octets beyond the ASCII range allows domain labels to contain octets beyond the ASCII range
(0000..007F), and this document does not change that. However, (0000..007F), and this document does not change that. However,
although the interpretation of octets 0080..00FF is well-defined in although the interpretation of octets 0080..00FF is well-defined in
the DNS, many application protocols support only ASCII labels and the DNS, many application protocols support only ASCII labels and
there is no defined interpretation of these non-ASCII octets as there is no defined interpretation of these non-ASCII octets as
characters and, in particular, no interpretation of case-independent characters and, in particular, no interpretation of case-independent
matching for them (see, e.g., [RFC4343]). If labels containing these matching for them (see, e.g., [RFC4343]). If labels containing these
octets are returned to applications, unpredictable behavior could octets are returned to applications, unpredictable behavior could
result. The A-label form, which cannot contain those characters, is result. The A-label form, which cannot contain those characters, is
the only standard representation for internationalized labels in the the only standard representation for internationalized labels in the
DNS protocol. DNS protocol.
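The point about case-independent matching can be illustrated with a short sketch. It assumes labels are handled as raw octet strings and folds only the ASCII letters A-Z, in the spirit of [RFC4343]; octets above 007F are compared as-is because no character interpretation is defined for them. The helper names are hypothetical.

```python
def has_non_ascii_octets(label: bytes) -> bool:
    """Return True if any octet of the label lies outside 00..7F."""
    return any(octet > 0x7F for octet in label)


def fold_ascii(label: bytes) -> bytes:
    """Lowercase only the ASCII letters A-Z; leave all other octets alone."""
    return bytes(o + 0x20 if 0x41 <= o <= 0x5A else o for o in label)


def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels using ASCII-only case folding."""
    return fold_ascii(a) == fold_ascii(b)


if __name__ == "__main__":
    print(dns_labels_equal(b"Example", b"eXAMPLE"))
    # 0xC9 and 0xE9 are E-acute in upper and lower case in ISO 8859-1,
    # but the DNS assigns them no such meaning, so they do not match.
    print(dns_labels_equal(b"\xc9", b"\xe9"))
    print(has_non_ascii_octets(b"xn--bcher-kva"))
```

An A-label never triggers `has_non_ascii_octets`, so it can be matched by the ASCII-only folding rule.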
8.2. DNSSEC Authentication of IDN Domain Names 8.2. Root and other DNS Server Considerations
DNS Security (DNSSEC) [RFC2535] is a method for supplying
cryptographic verification information along with DNS messages.
Public Key Cryptography is used in conjunction with digital
signatures to provide a means for a requester of domain information
to authenticate the source of the data. This ensures that it can be
traced back to a trusted source, either directly or via a chain of
trust linking the source of the information to the top of the DNS
hierarchy.
IDNA specifies that all internationalized domain names served by DNS
servers that cannot be represented directly in ASCII MUST use the
A-label form. Conversion to A-labels MUST be performed prior to a
zone being signed by the private key for that zone. Because of this
ordering, it is important to recognize that DNSSEC authenticates a
domain name containing A-labels or conventional LDH-labels, not
U-labels. In the presence of DNSSEC, no form of a zone file or query
response that contains a U-label may be signed or the signature
validated.
One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the lookup flow
than DNSSEC authenticating nameservers for DNSSEC to work.
8.3. Root and other DNS Server Considerations
IDNs in A-label form will generally be somewhat longer than current IDNs in A-label form will generally be somewhat longer than current
domain names, so the bandwidth needed by the root servers is likely domain names, so the bandwidth needed by the root servers is likely
to go up by a small amount. Also, queries and responses for IDNs to go up by a small amount. Also, queries and responses for IDNs
will probably be somewhat longer than typical queries historically, will probably be somewhat longer than typical queries historically,
so EDNS0 [RFC2671] support may be more important (otherwise, queries so EDNS0 [RFC2671] support may be more important (otherwise, queries
and responses may be forced to go to TCP instead of UDP). and responses may be forced to go to TCP instead of UDP).
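The size concern can be sketched as a simple check. It assumes the classic 512-octet limit on DNS messages carried over UDP and an EDNS0 requester that advertises a larger payload size; advertised values below 512 are treated as 512. The function name `fits_in_udp` and its parameters are hypothetical, and real resolvers apply further policy not shown here.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # octets, the limit without EDNS0


def fits_in_udp(message_size: int,
                edns0_payload_size: Optional[int] = None) -> bool:
    """Return True if a message of this size can be carried over UDP.

    edns0_payload_size is the payload size advertised by the requester,
    or None when the requester does not use EDNS0.
    """
    if edns0_payload_size is None:
        limit = CLASSIC_UDP_LIMIT
    else:
        # Advertised sizes below 512 are treated as 512.
        limit = max(CLASSIC_UDP_LIMIT, edns0_payload_size)
    return message_size <= limit


if __name__ == "__main__":
    print(fits_in_udp(600))        # no EDNS0: falls back to TCP
    print(fits_in_udp(600, 4096))  # EDNS0 advertised: stays on UDP
```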
9. Internationalization Considerations 9. Internationalization Considerations
skipping to change at page 38, line 35 skipping to change at page 39, line 8
The editor and contributors would like to express their thanks to The editor and contributors would like to express their thanks to
those who contributed significant early (pre-WG) review comments, those who contributed significant early (pre-WG) review comments,
sometimes accompanied by text, Paul Hoffman, Simon Josefsson, and Sam sometimes accompanied by text, Paul Hoffman, Simon Josefsson, and Sam
Weiler. In addition, some specific ideas were incorporated from Weiler. In addition, some specific ideas were incorporated from
suggestions, text, or comments about sections that were unclear suggestions, text, or comments about sections that were unclear
supplied by Vint Cerf, Frank Ellerman, Michael Everson, Asmus supplied by Vint Cerf, Frank Ellerman, Michael Everson, Asmus
Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler. Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler.
Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie Garside, and Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie Garside, and
Jefsey Morfin for conversations that led to considerable improvements Jefsey Morfin for conversations that led to considerable improvements
in the content of this document. in the content of this document and to several others, including Ben
Campbell, Martin Duerst, Subramanian Moonesamy, Peter Saint-Andre,
and Dan Winship, for catching specific errors and recommending
corrections.
A meeting was held on 30 January 2008 to attempt to reconcile A meeting was held on 30 January 2008 to attempt to reconcile
differences in perspective and terminology about this set of differences in perspective and terminology about this set of
specifications between the design team and members of the Unicode specifications between the design team and members of the Unicode
Technical Consortium. The discussions at and subsequent to that Technical Consortium. The discussions at and subsequent to that
meeting were very helpful in focusing the issues and in refining the meeting were very helpful in focusing the issues and in refining the
specifications. The active participants at that meeting were (in specifications. The active participants at that meeting were (in
alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam,
Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary
Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel,
skipping to change at page 41, line 37 skipping to change at page 42, line 11
[RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
"Dynamic Updates in the Domain Name System (DNS UPDATE)", "Dynamic Updates in the Domain Name System (DNS UPDATE)",
RFC 2136, April 1997. RFC 2136, April 1997.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997. Specification", RFC 2181, July 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998. Languages", BCP 18, RFC 2277, January 1998.
[RFC2535] Eastlake, D., "Domain Name System Security Extensions",
RFC 2535, March 1999.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
RFC 2671, August 1999. RFC 2671, August 1999.
[RFC2673] Crawford, M., "Binary Labels in the Domain Name System", [RFC2673] Crawford, M., "Binary Labels in the Domain Name System",
RFC 2673, August 1999. RFC 2673, August 1999.
[RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)", RFC 2782, specifying the location of services (DNS SRV)", RFC 2782,
February 2000. February 2000.
skipping to change at page 47, line 41 skipping to change at page 48, line 17
o Incorporated other changes from WG Last Call. o Incorporated other changes from WG Last Call.
o Small typographical and editorial corrections. o Small typographical and editorial corrections.
A.13. Version -13 A.13. Version -13
o Substituted in Section numbers to references to other IDNA2008 o Substituted in Section numbers to references to other IDNA2008
documents. documents.
A.14. Version -14
A.15. Version -14
This is the version of the document produced to reflect comments on
IETF Last Call. For the convenience of those who made comments and
of the IESG in evaluating them, this section therefore identifies
non-editorial changes made in response to Last Call comments in
somewhat more detail than may be usual.
o Removed the discussion of DNSSEC after extensive discussion on the
IETF and IDNABIS lists.
o Modified the discussion of prefix changes to make it clear that
the decisions have been made, rather than still representing open
issues. (Dan Winship review, 20091013)
o Suggested explicit identification of domain name slots in
protocols that use IDNA. Peter Saint-Andre, 20091019.
o Several other clarifications as suggested by Peter Saint-Andre,
20091019.
o Several minor editorial corrections per suggestions in Ben
Campbell's Gen-ART review 20091013.
o Typo corrections.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
 End of changes. 50 change blocks. 
222 lines changed or deleted 226 lines changed or added
