< draft-ietf-idnabis-rationale-06.txt   draft-ietf-idnabis-rationale-07.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft December 15, 2008 Internet-Draft February 24, 2009
Intended status: Informational Intended status: Informational
Expires: June 18, 2009 Expires: August 28, 2009
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-06.txt draft-ietf-idnabis-rationale-07.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any This Internet-Draft is submitted to IETF in full conformance with the
applicable patent or other IPR claims of which he or she is aware provisions of BCP 78 and BCP 79.
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 18, 2009. This Internet-Draft will expire on August 28, 2009.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Abstract Abstract
Several years have passed since the original protocol for Several years have passed since the original protocol for
Internationalized Domain Names (IDNs) was completed and deployed. Internationalized Domain Names (IDNs) was completed and deployed.
During that time, a number of issues have arisen, including the need During that time, a number of issues have arisen, including the need
to update the system to deal with newer versions of Unicode. Some of to update the system to deal with newer versions of Unicode. Some of
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 5
1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 6
1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 6
1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 7
1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 8
1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 9
2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 10
3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 11
3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 3.1. A Tiered Model of Permitted Characters and Labels . . . . 11
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 12
3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 12
3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 3.1.1.2. Rules and Their Application . . . . . . . . . . . 13
3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 13
3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 14
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 13 Applications . . . . . . . . . . . . . . . . . . . . . . . 15
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15
4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 4.1. Display and Network Order . . . . . . . . . . . . . . . . 16
4.2. Entry and Display in Applications . . . . . . . . . . . . 15 4.2. Entry and Display in Applications . . . . . . . . . . . . 17
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 16 Alternate Character Forms . . . . . . . . . . . . . . . . 18
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 19 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 20 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 22
6. Front-end and User Interface Processing . . . . . . . . . . . 21 6. Front-end and User Interface Processing for Lookup . . . . . . 23
7. Migration from IDNA2003 and Unicode Version Synchronization . 24 7. Migration from IDNA2003 and Unicode Version Synchronization . 26
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 26
7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 26
7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 27
7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 28
7.2. Changes in Character Interpretations . . . . . . . . . . . 28 7.2. Changes in Character Interpretations . . . . . . . . . . . 29
7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 31
7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 32
7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 31 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 32
7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 33
7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 32 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 33
7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 34
7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 33 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 34
7.7. Migration Between Unicode Versions: Unassigned Code 7.7. Migration Between Unicode Versions: Unassigned Code
Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 37
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 36 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 38
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 37 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 38
10. Internationalization Considerations . . . . . . . . . . . . . 37 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 38
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 8.3. Root and other DNS Server Considerations . . . . . . . . . 39
11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 9. Internationalization Considerations . . . . . . . . . . . . . 39
11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 40
12. Security Considerations . . . . . . . . . . . . . . . . . . . 38 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 40
12.1. General Security Issues with IDNA . . . . . . . . . . . . 38 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 40
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 11. Security Considerations . . . . . . . . . . . . . . . . . . . 40
13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 11.1. General Security Issues with IDNA . . . . . . . . . . . . 40
13.2. Informative References . . . . . . . . . . . . . . . . . . 40 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 41
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 42
14.1. Normative References . . . . . . . . . . . . . . . . . . . 42
14.2. Informative References . . . . . . . . . . . . . . . . . . 43
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 45
A.1. Changes between Version -00 and Version -01 of A.1. Changes between Version -00 and Version -01 of
draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 45
A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 45
A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 46
A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 46
A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 47
A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 43 A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 47
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 44 A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 47
Intellectual Property and Copyright Statements . . . . . . . . . . 45 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 48
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
The original standards for Internationalized Domain Names (IDNs) were The original standards for Internationalized Domain Names (IDNs) were
completed and deployed starting in 2003. Those standards are known completed and deployed starting in 2003. Those standards are known
as Internationalized Domain Names in Applications (IDNA), taken from as Internationalized Domain Names in Applications (IDNA), taken from
the name of the highest level standard within the group, RFC 3490 the name of the highest level standard within the group, RFC 3490
[RFC3490]. After those standards were deployed, a number of issues [RFC3490]. After those standards were deployed, a number of issues
arose that led to a call for a new version of the IDNA protocol and arose that led to a call for a new version of the IDNA protocol and
the associated tables, including a subset of those described in a the associated tables, including a subset of those described in a
recent IAB report [RFC4690] and the need to update the system to deal recent IAB report [RFC4690] and the need to update the system to deal
with newer versions of Unicode. This document further explains the with newer versions of Unicode. This document further explains the
issues that have been encountered when they are important to issues that have been encountered when they are important to
understanding of the revised protocols. It also provides an overview understanding of the revised protocols. It also provides an overview
of the new IDNA model and explanatory material for it. Additional of the new IDNA model and explanatory material for it. Additional
explanatory material for the specific components of the proposals explanatory material for the specific components of the proposals
appears with the associated documents. appears with the associated documents.
This document and the associated ones are written from the
perspective of an IDNA-aware user, application, or implementation.
While they may reiterate fundamental DNS rules and requirements for
the convenience of the reader, they make no attempt to be
comprehensive about DNS principles and should not be considered as a
substitute for a thorough understanding of the DNS protocols and
specifications.
A good deal of the background material that appeared in RFC 3490 A good deal of the background material that appeared in RFC 3490
[RFC3490] has been removed from this update. That material is either [RFC3490] has been removed from this update. That material is either
of historical interest only or has been covered from a more recent of historical interest only or has been covered from a more recent
perspective in RFC 4690 [RFC4690]. perspective in RFC 4690 [RFC4690].
This document is not normative. The information it provides is This document is not normative. The information it provides is
intended to make the rules, tables, and protocol easier to understand intended to make the rules, tables, and protocol easier to understand
and to provide overview information and suggestions for zone and to provide overview information and suggestions for zone
administrators and others who need to make policy, deployment, and administrators and others who need to make policy, deployment, and
similar decisions about IDNs. similar decisions about IDNs.
skipping to change at page 5, line 51 skipping to change at page 7, line 10
This distinction is important because the reasonable goal of an IDN This distinction is important because the reasonable goal of an IDN
effort is not to be able to write the great Klingon (or language of effort is not to be able to write the great Klingon (or language of
one's choice) novel in DNS labels but to be able to form a usefully one's choice) novel in DNS labels but to be able to form a usefully
broad range of mnemonics in ways that are as natural as possible in a broad range of mnemonics in ways that are as natural as possible in a
very broad range of scripts. very broad range of scripts.
1.3.3. New Terminology and Restrictions 1.3.3. New Terminology and Restrictions
These documents introduce new terminology, and precise definitions, These documents introduce new terminology, and precise definitions,
for the terms "U-labels", "A-labels", labels that are "IDNA-valid", for the terms "U-label", "A-Label", LDH-label (to which all valid
and an "LDH-label" (differing from an LDH-conformant label or fully- pre-IDNA host names conformed), Reserved-LDH-label (R-LDH-label), XN-
qualified domain name). They also introduce a restriction, for IDNA- label, Fake-A-Label, and Non-Reserved-LDH-label (NR-LDH-label).
conformant applications and DNS zones in which IDNA is used, on
strings used as labels that contain "--" in the third and fourth In addition, the term "putative label" has been adopted to refer to a
positions, essentially requiring that such strings be IDNA-valid. label that may appear to meet certain definitional constraints but
This restriction on strings containing "--" is required for three has not yet been sufficiently tested for validity.
These definitions are illustrated in Figure 1 of the Definitions
Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and
fourth character positions from the beginning of the label. In
IDNA-aware applications, only a subset of these reserved labels is
permitted to be used, namely the A-label subset. A-labels are a
subset of the R-LDH-labels that begin with the case-insensitive
string "xn--". Labels that bear this prefix but which are not
otherwise valid fall into the "Fake-A-label" category. The
non-reserved labels (NR-LDH-labels) are implicitly valid since they
do not bear any resemblance to the labels specified by IDNA.
The creation of the Reserved-LDH category is required for three
reasons: reasons:
o to prevent confusion with pre-IDNA coding forms; o to prevent confusion with pre-IDNA coding forms;
o to permit future extensions that would require changing the o to permit future extensions that would require changing the
prefix, no matter how unlikely those might be (see Section 7.4); prefix, no matter how unlikely those might be (see Section 7.4);
and and
o to reduce the opportunities for attacks via the Punycode encoding o to reduce the opportunities for attacks via the Punycode encoding
algorithm itself. algorithm itself.
Figure 1 of the Definitions Document [IDNA2008-Defs] illustrates the
terminology used by IDNA for various types of labels and strings and
their relationship.
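As a rough illustration of the label categories described in this
section, the following Python sketch classifies an ASCII label by the
position of "--" and by the "xn--" prefix. The helper name
classify_ldh_label and the strings it returns are hypothetical, and
the LDH syntax check (letters, digits, and hyphen, no hyphen at either
end, at most 63 characters) is assumed from conventional host name
rules rather than taken from this document. The sketch makes no
attempt to separate a valid A-label from a Fake-A-label, since that
requires the full validity tests of the protocol.

```python
import re

# Assumed LDH syntax: ASCII letters, digits, and hyphen, 1 to 63
# characters, with no hyphen at either end.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_ldh_label(label: str) -> str:
    """Return a coarse category name for one label (hypothetical helper)."""
    if not _LDH.fullmatch(label):
        return "not-LDH"
    # Reserved labels carry "--" in the third and fourth positions.
    if label[2:4] == "--":
        # The "xn" prefix is compared case-insensitively.
        if label[:2].lower() == "xn":
            return "XN-label"  # either an A-label or a Fake-A-label
        return "R-LDH-label"  # reserved, but not an XN-label
    return "NR-LDH-label"
```

A caller would still need to decode and validate an "XN-label" result
before treating it as an A-label.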
1.4. Objectives 1.4. Objectives
The intent of the IDNA revision effort, and hence of this document The intent of the IDNA revision effort, and hence of this document
and the associated ones, is to increase the usability and and the associated ones, is to increase the usability and
effectiveness of internationalized domain names (IDNs) while effectiveness of internationalized domain names (IDNs) while
preserving or strengthening the integrity of references that use preserving or strengthening the integrity of references that use
them. The original "hostname" character definitions (see, e.g., them. The original "hostname" character definitions (see, e.g.,
[RFC0810]) struck a balance between the creation of useful mnemonics [RFC0810]) struck a balance between the creation of useful mnemonics
and the introduction of parsing problems or general confusion in the and the introduction of parsing problems or general confusion in the
contexts in which domain names are used. The objective of IDNA2008 contexts in which domain names are used. The objective of IDNA2008
skipping to change at page 8, line 4 skipping to change at page 9, line 18
IDNA allows the graceful introduction of IDNs not only by avoiding IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing infrastructure (such as DNS servers and mail upgrades to existing infrastructure (such as DNS servers and mail
transport agents), but also by allowing some rudimentary use of IDNs transport agents), but also by allowing some rudimentary use of IDNs
in applications by using the ASCII-encoded representation of the in applications by using the ASCII-encoded representation of the
labels containing non-ASCII characters. While such names are user- labels containing non-ASCII characters. While such names are user-
unfriendly to read and type, and hence not optimal for user input, unfriendly to read and type, and hence not optimal for user input,
they can be used as a last resort to allow rudimentary IDN usage. they can be used as a last resort to allow rudimentary IDN usage.
For example, they might be the best choice for display if it were For example, they might be the best choice for display if it were
known that relevant fonts were not available on the user's computer. known that relevant fonts were not available on the user's computer.
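The ASCII-encoded representation mentioned above can be sketched with
the Punycode codec that ships with Python. This is only an
illustration of the encoding step: the function names are
hypothetical, and no IDNA validity checking, mapping, or
normalization is performed, so the output should not be taken as a
conforming A-label.

```python
ACE_PREFIX = "xn--"


def to_ascii_form(label: str) -> str:
    """Encode one label with Punycode and add the prefix (no validation)."""
    if label.isascii():
        return label  # already ASCII; nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode_form(label: str) -> str:
    """Reverse of to_ascii_form for labels that carry the prefix."""
    if label[:4].lower() != ACE_PREFIX:
        return label
    return label[4:].encode("ascii").decode("punycode")


print(to_ascii_form("b\u00fccher"))  # prints "xn--bcher-kva"
```

An application that cannot render the native characters could fall
back to displaying the first form, as the paragraph above suggests.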
In order to allow user-friendly input and output of the IDNs and In order to allow user-friendly input and output of the IDNs and
acceptance of some characters as equivalent to those to be processed acceptance of some characters as equivalent to those to be processed
according to the protocol, the applications need to be modified to according to the protocol, the applications need to be modified to
conform to this specification. conform to this specification.
IDNA uses the Unicode character repertoire, for continuity with the This version of IDNA uses the Unicode character repertoire, for
original version of IDNA. continuity with the original version of IDNA.
1.6. Comprehensibility of IDNA Mechanisms and Processing 1.6. Comprehensibility of IDNA Mechanisms and Processing
One of the major goals of this work is to improve the general One of the major goals of this work is to improve the general
understanding of how IDNA works and what characters are permitted and understanding of how IDNA works and what characters are permitted and
what happens to them. Comprehensibility and predictability to users what happens to them. Comprehensibility and predictability to users
and registrants are themselves important motivations and design goals and registrants are themselves important motivations and design goals
for this effort. The effort includes some new terminology and a for this effort. The effort includes some new terminology and a
revised and extended model, both covered in this section, and some revised and extended model, both covered in this section, and some
more specific protocol, processing, and table modifications. Details more specific protocol, processing, and table modifications. Details
skipping to change at page 9, line 26 skipping to change at page 10, line 40
the mapping but in accurately identifying the incoming character set the mapping but in accurately identifying the incoming character set
and then applying the correct conversion routine. If a local and then applying the correct conversion routine. If a local
operating system uses one of the ISO 8859 character sets or an operating system uses one of the ISO 8859 character sets or an
extensive national or industrial system such as GB18030 [GB18030] or extensive national or industrial system such as GB18030 [GB18030] or
BIG5 [BIG5], one must correctly identify the character set in use BIG5 [BIG5], one must correctly identify the character set in use
before converting to Unicode even though those character coding before converting to Unicode even though those character coding
systems are substantially or completely Unicode-compatible (i.e., all systems are substantially or completely Unicode-compatible (i.e., all
of the code points in them have an exact and unique mapping to of the code points in them have an exact and unique mapping to
Unicode code points). It may be even more difficult when the Unicode code points). It may be even more difficult when the
character coding system in local use is based on conceptually character coding system in local use is based on conceptually
different assumptions than those used by Unicode about, e.g., about different assumptions than those used by Unicode about, e.g., font
font encodings used for publications in some Indic scripts. Those encodings used for publications in some Indic scripts. Those
differences may not easily yield unambiguous conversions or differences may not easily yield unambiguous conversions or
interpretations even if each coding system is internally consistent interpretations even if each coding system is internally consistent
and adequate to represent the local language and script. and adequate to represent the local language and script.
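The importance of identifying the incoming character set can be seen
in a small Python sketch. The same octet is decoded under three ISO
8859 character sets, chosen here purely for illustration, and yields
three different Unicode code points. Each conversion is exact and
unambiguous on its own; the error arises only from picking the wrong
one.

```python
RAW = b"\xe9"  # one octet as it might arrive from a local system

# Candidate character sets (assumed for illustration only).
CANDIDATES = ["iso8859-1", "iso8859-5", "iso8859-7"]


def decode_candidates(data: bytes) -> dict:
    """Decode the same octets under each candidate character set."""
    return {name: data.decode(name) for name in CANDIDATES}


results = decode_candidates(RAW)
for name, text in results.items():
    # Show code points so the difference is visible without fonts.
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```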
2. Processing in IDNA2008 2. Processing in IDNA2008
These specifications separate Domain Name Registration and Lookup in These specifications separate Domain Name Registration and Lookup in
the protocol specification. Doing so reflects current practice in the protocol specification. Doing so reflects current practice in
which per-registry restrictions and special processing are applied at which per-registry restrictions and special processing are applied at
registration time but not during lookup. Even more important in the registration time but not during lookup. Even more important in the
skipping to change at page 10, line 24 skipping to change at page 11, line 37
3.1. A Tiered Model of Permitted Characters and Labels 3.1. A Tiered Model of Permitted Characters and Labels
Moving to an inclusion model requires respecifying the list of Moving to an inclusion model requires respecifying the list of
characters that are permitted in IDNs. In IDNA2003, the role and characters that are permitted in IDNs. In IDNA2003, the role and
utility of characters are independent of context and fixed forever utility of characters are independent of context and fixed forever
(or until the standard is replaced). Making completely context- (or until the standard is replaced). Making completely context-
independent rules globally has proven impractical because some independent rules globally has proven impractical because some
characters, especially those that are called "Join_Controls" in characters, especially those that are called "Join_Controls" in
Unicode, are needed to make reasonable use of some scripts but have Unicode, are needed to make reasonable use of some scripts but have
no visible effect(s) in others. IDNA2003 prohibited those types of no visible effect(s) in others. IDNA2003 prohibited those types of
characters entirely. But the restrictions were much too severe to characters entirely. But the restrictions led to a consensus that
permit an adequate range of mnemonics for identifiers based on some under some conditions, these "joiner" characters were legitimately
languages. The requirement to support those characters but limit needed to allow useful mnemonics for some languages and scripts. The
their use to very specific contexts was reinforced by the observation requirement to support those characters but limit their use to very
that handling of particular characters across the languages that use specific contexts was reinforced by the observation that handling of
a script, or the use of similar or identical-looking characters in particular characters across the languages that use a script, or the
different scripts, is less well understood than many people believed use of similar or identical-looking characters in different scripts,
it was several years ago. is more complex than many people believed it was several years ago.
Independently of the characters chosen (see next subsection), the Independently of the characters chosen (see next subsection), the
approach is to divide the characters that appear in Unicode into approach is to divide the characters that appear in Unicode into
three categories: three categories:
3.1.1. PROTOCOL-VALID 3.1.1. PROTOCOL-VALID
Characters identified as "PROTOCOL-VALID" (often abbreviated Characters identified as "PROTOCOL-VALID" (often abbreviated
"PVALID") are, in general, permitted by IDNA for all uses in IDNs. "PVALID") are, in general, permitted by IDNA for all uses in IDNs.
Their use may be restricted by rules about the context in which they Their use may be restricted by rules about the context in which they
skipping to change at page 11, line 39 skipping to change at page 13, line 7
"CONTEXTUAL RULE REQUIRED" and, when adequately understood, "CONTEXTUAL RULE REQUIRED" and, when adequately understood,
associated with a rule. In addition, the rule will define whether it associated with a rule. In addition, the rule will define whether it
is to be applied on lookup as well as registration. A distinction is is to be applied on lookup as well as registration. A distinction is
made between characters that indicate or prohibit joining (known as made between characters that indicate or prohibit joining (known as
"CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring
contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the
former require full testing at lookup time. former require full testing at lookup time.
3.1.1.2. Rules and Their Application 3.1.1.2. Rules and Their Application
The actual rules may be present or absent. If present, they may have The actual rules may be DEFINED or NULL. If present, they may have
values of "True" (character may be used in any position in any values of "True" (character may be used in any position in any
label), "False" (character may not be used in any label), or may be a label), "False" (character may not be used in any label), or may be a
set of procedural rules that specify the context in which the set of procedural rules that specify the context in which the
character is permitted. character is permitted.
Examples of descriptions of typical rules, stated informally and in Examples of descriptions of typical rules, stated informally and in
English, include "Must follow a character from Script XYZ", "Must English, include "Must follow a character from Script XYZ", "Must
occur only if the entire label is in Script ABC", "Must occur only if occur only if the entire label is in Script ABC", "Must occur only if
the previous and subsequent characters have the DFG property". the previous and subsequent characters have the DFG property".
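One way to picture the three kinds of rule values is the Python
sketch below. Everything in it is an assumption made for
illustration: the rule table, the function names, the decision to
reject a character whose rule is still NULL, and the sample
procedural rule itself, which is loosely modeled on the "must follow
a character ..." style of rule and permits ZERO WIDTH JOINER only
directly after a virama (canonical combining class 9).

```python
import unicodedata
from typing import Callable, Optional, Union

# A rule is NULL (None), a fixed True/False, or a procedure that
# inspects the label and the position of the character.
Rule = Optional[Union[bool, Callable[[str, int], bool]]]

ZWJ = "\u200d"  # ZERO WIDTH JOINER, a Join_Control character
VIRAMA_CLASS = 9  # canonical combining class of virama characters


def zwj_after_virama(label: str, pos: int) -> bool:
    """Illustrative rule: the preceding character must be a virama."""
    return pos > 0 and unicodedata.combining(label[pos - 1]) == VIRAMA_CLASS


# Hypothetical rule table keyed by character.
RULES: dict = {ZWJ: zwj_after_virama}


def contextual_rule_ok(label: str, pos: int) -> bool:
    """Apply the rule for label[pos]; call only for characters that
    are flagged as requiring a contextual rule."""
    rule: Rule = RULES.get(label[pos])
    if rule is None:
        # Assumption: reject the character while its rule is NULL.
        return False
    if isinstance(rule, bool):
        return rule
    return rule(label, pos)
```

With this table, a joiner that follows U+094D (DEVANAGARI SIGN
VIRAMA) is accepted, while a joiner between two ASCII letters is not.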
skipping to change at page 13, line 17 skipping to change at page 14, line 32
For convenience in processing and table-building, code points that do For convenience in processing and table-building, code points that do
not have assigned values in a given version of Unicode are treated as not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points are belonging to a special UNASSIGNED category. Such code points are
prohibited in labels to be registered or looked up. The category prohibited in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories (at which point, they are classified into one of the other categories
as appropriate). as appropriate).
The rationale for restricting the processing of UNASSIGNED characters
is simply that if such characters were permitted to be looked up, for
example, and were later assigned, but subject to some set of
contextual rules, un-updated instances of IDNA-aware software might
permit lookup of labels containing the previously-unassigned
characters while updated versions of IDNA-aware software might
restrict their use in lookup, depending on the contextual rules. It
should be clear that under no circumstance should an UNASSIGNED
character be permitted in a label to be registered as part of a
domain name.
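
As a rough illustration of the UNASSIGNED test, the Python sketch below uses the standard unicodedata module and treats General_Category "Cn" as a stand-in for UNASSIGNED. That is an approximation made for this example (the normative definition is in the "Tables" document), and the function names are hypothetical. The answer depends on the Unicode version compiled into the running interpreter, which is exactly the version-skew situation described above.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    """True if ch has no assignment in this interpreter's Unicode version.

    Approximation: General_Category 'Cn' is used as a stand-in for the
    UNASSIGNED category.
    """
    return unicodedata.category(ch) == "Cn"


def label_has_unassigned(label: str) -> bool:
    """True if any code point in the label would be rejected as UNASSIGNED."""
    return any(is_unassigned(ch) for ch in label)


# The Unicode version this check is relative to.
UNICODE_VERSION = unicodedata.unidata_version
```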
3.2. Registration Policy 3.2. Registration Policy
While these recommendations cannot and should not define registry While these recommendations cannot and should not define registry
policies, registries should develop and apply additional restrictions policies, registries should develop and apply additional restrictions
to reduce confusion and other problems. For example, it is generally as needed to reduce confusion and other problems. For example, it is
believed that labels containing characters from more than one script generally believed that labels containing characters from more than
are a bad practice although there may be some important exceptions to one script are a bad practice although there may be some important
that principle. Some registries may choose to restrict registrations exceptions to that principle. Some registries may choose to restrict
to characters drawn from a very small number of scripts. For many registrations to characters drawn from a very small number of
scripts, the use of variant techniques such as those described in scripts. For many scripts, the use of variant techniques such as
RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for those described in RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and
Chinese by the tables described in RFC 4713 [RFC4713] may be helpful illustrated for Chinese by the tables described in RFC 4713 [RFC4713]
in reducing problems that might be perceived by users. may be helpful in reducing problems that might be perceived by users.
In general, users will benefit if registries only permit characters In general, users will benefit if registries only permit characters
from scripts that are well-understood by the registry or its from scripts that are well-understood by the registry or its
advisers. If a registry decides to reduce opportunities for advisers. If a registry decides to reduce opportunities for
confusion by constructing policies that disallow characters used in confusion by constructing policies that disallow characters used in
historic writing systems or characters whose use is restricted to historic writing systems or characters whose use is restricted to
specialized, highly technical contexts, some relevant information may specialized, highly technical contexts, some relevant information may
be found in Section 2.4 "Specific Character Adjustments", Table 4 be found in Section 2.4 "Specific Character Adjustments", Table 4
"Candidate Characters for Exclusion from Identifiers" of "Candidate Characters for Exclusion from Identifiers" of
[Unicode-UAX31] and Section 3.1. "General Security Profile for [Unicode-UAX31] and Section 3.1. "General Security Profile for
Identifiers" in [Unicode-Security]. Identifiers" in [Unicode-Security].
It is worth stressing that these principles of policy development and It is worth stressing that these principles of policy development and
application apply at all levels of the DNS, not only, e.g., TLD application apply at all levels of the DNS, not only, e.g., TLD or
registrations and that even a trivial, "anything permitted that is SLD registrations and that even a trivial, "anything permitted that
valid under the protocol" policy is helpful in that it helps users is valid under the protocol" policy is helpful in that it helps users
and application developers know what to expect. and application developers know what to expect.
3.3. Layered Restrictions: Tables, Context, Registration, Applications 3.3. Layered Restrictions: Tables, Context, Registration, Applications
The essence of the character rules in IDNA2008 is based on the The essence of the character rules in IDNA2008 is based on the
realization that there is no single magic bullet for any of the realization that there is no single magic bullet for any of the
issues associated with a multiscript DNS. Instead, the issues associated with a multiscript DNS. Instead, the
specifications define a variety of approaches that, together, specifications define a variety of approaches that, together,
constitute multiple lines of defense against ambiguity in identifiers constitute multiple lines of defense against ambiguity in identifiers
and loss of referential integrity. The actual character tables are and loss of referential integrity. The actual character tables are
the first mechanism, protocol rules about how those characters are the first mechanism, protocol rules about how those characters are
applied or restricted in context are the second, and those two in applied or restricted in context are the second, and those two in
combination constitute the limits of what can be done by a protocol combination constitute the limits of what can be done by a protocol
alone. As discussed in the previous section (Section 3.2), alone. As discussed in the previous section (Section 3.2),
registries are expected to restrict what they permit to be registries are expected to restrict what they permit to be
registered, devising and using rules that are designed to optimize registered, devising and using rules that are designed to optimize
the balance between confusion and risk on the one hand and maximum the balance between confusion and risk on the one hand and maximum
expressiveness in mnemonics on the other. expressiveness in mnemonics on the other.
In addition, there is an important role for user agents in warning In addition, there is an important role for user agents in warning
against label forms that appear unreasonable given their knowledge of against label forms that appear problematic given their knowledge of
local contexts and conventions. Of course, no approach based on local contexts and conventions. Of course, no approach based on
naming or identifiers alone can protect against all threats. naming or identifiers alone can protect against all threats.
4. Issues that Constrain Possible Solutions 4. Issues that Constrain Possible Solutions
4.1. Display and Network Order 4.1. Display and Network Order
The correct treatment of domain names requires a clear distinction The correct treatment of domain names requires a clear distinction
between Network Order (the order in which the code points are sent in between Network Order (the order in which the code points are sent in
protocols) and Display Order (the order in which the code points are protocols) and Display Order (the order in which the code points are
displayed on a screen or paper). The order of labels in a domain displayed on a screen or paper). The order of labels in a domain
name that contains characters that are normally written right to left name that contains characters that are normally written right to left
is discussed in [IDNA2008-Bidi]. In particular, there are questions is discussed in [IDNA2008-Bidi]. In particular, there are questions
about the order in which labels are displayed if left to right and about the order in which labels are displayed if left to right and
right to left labels are adjacent to each other, especially if there right to left labels are adjacent to each other, especially if there
skipping to change at page 16, line 41 skipping to change at page 18, line 22
display the A-label. display the A-label.
In any place where a protocol or document format allows transmission In any place where a protocol or document format allows transmission
of the characters in internationalized labels, labels should be of the characters in internationalized labels, labels should be
transmitted using whatever character encoding and escape mechanism transmitted using whatever character encoding and escape mechanism
the protocol or document format uses at that place. This provision the protocol or document format uses at that place. This provision
is intended to prevent situations in which, e.g., UTF-8 domain names is intended to prevent situations in which, e.g., UTF-8 domain names
appear embedded in text that is otherwise in some other character appear embedded in text that is otherwise in some other character
coding. coding.
All protocols that use domain name slots already have the capacity All protocols that use domain name slots (See Section 2.3.1.6
for handling domain names in the ASCII charset. Thus, A-labels can [[anchor12: ?? Verify this]] in [IDNA2008-Defs]) already have the
inherently be handled by those protocols. capacity for handling domain names in the ASCII charset. Thus,
A-labels can inherently be handled by those protocols.
4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate
Character Forms Character Forms
[[anchor13: There is some internal redundancy and repetition in the [[anchor13: There is some internal redundancy and repetition in the
material in this section. Specific suggestions about how to reduce or material in this section. Specific suggestions about how to reduce or
eliminate redundant text would be appreciated. If no such eliminate redundant text would be appreciated. If no such
suggestions are received before -07 is posted, this not will be suggestions are received before -07 is posted, this note will be
removed.]] removed.]]
Users often have expectations about character matching or equivalence Users often have expectations about character matching or equivalence
that are based on their own languages and the orthography of those that are based on their own languages and the orthography of those
languages. These expectations may not be consistent with forms or languages. These expectations may not be consistent with forms or
actions that can be naturally accommodated in a character coding actions that can be naturally accommodated in a character coding
system, especially if multiple languages are written using the same system, especially if multiple languages are written using the same
script but using different conventions. A Norwegian user might script but using different conventions. A Norwegian user might
expect a label with the ae-ligature to be treated as the same label expect a label with the ae-ligature to be treated as the same label
as one using the Swedish spelling with a-diaeresis even though as one using the Swedish spelling with a-diaeresis even though
skipping to change at page 19, line 4 skipping to change at page 20, line 33
these situations in a system such as IDNA -- or with Unicode these situations in a system such as IDNA -- or with Unicode
normalization generally -- since determining what to do requires normalization generally -- since determining what to do requires
information about the language being used, context, or both. information about the language being used, context, or both.
Consequently, these specifications make no attempt to treat these Consequently, these specifications make no attempt to treat these
combined characters in any special way. However, their existence combined characters in any special way. However, their existence
provides a prime example of a situation in which a registry that is provides a prime example of a situation in which a registry that is
aware of the language context in which labels are to be registered, aware of the language context in which labels are to be registered,
and where that language sometimes (or always) treats the two- and where that language sometimes (or always) treats the two-
character sequences as equivalent to the combined form, should give character sequences as equivalent to the combined form, should give
serious consideration to applying a "variant" model [RFC3743] serious consideration to applying a "variant" model [RFC3743]
[RFC4290], or to prohibiting registration of one of the forms entirely, [RFC4290], or to prohibiting registration of one of the forms entirely,
to reduce the opportunities for user confusion and fraud that would to reduce the opportunities for user confusion and fraud that would
result from the related strings being registered to different result from the related strings being registered to different
parties. parties.
[[anchor14: Placeholder: A discussion of the Arabic digit issue [[anchor14: Placeholder: A discussion of the Arabic digit issue
shoudl go here once it is resolved in some appropriate way.]] should go here once it is resolved in some appropriate way.]]
4.4. Case Mapping and Related Issues 4.4. Case Mapping and Related Issues
In the DNS, ASCII letters are stored with their case preserved. In the DNS, ASCII letters are stored with their case preserved.
Matching during the query process is case-independent, but none of Matching during the query process is case-independent, but none of
the information that might be represented by choices of case has been the information that might be represented by choices of case has been
lost. That model has been accidentally helpful because, as people lost. That model has been accidentally helpful because, as people
have created DNS labels by catenating words (or parts of words) to have created DNS labels by catenating words (or parts of words) to
form labels, case has often been used to distinguish among components form labels, case has often been used to distinguish among components
and make the labels more memorable. and make the labels more memorable.
skipping to change at page 21, line 24 skipping to change at page 23, line 5
If lookup applications, as a user interface (UI) or other local If lookup applications, as a user interface (UI) or other local
matter, decide to warn about some strings that are valid under the matter, decide to warn about some strings that are valid under the
global rules but that they perceive as dangerous, that is their global rules but that they perceive as dangerous, that is their
prerogative and we can only hope that the market (and maybe prerogative and we can only hope that the market (and maybe
regulators) will reinforce the good choices and discourage the poor regulators) will reinforce the good choices and discourage the poor
ones. In this context, a lookup application that decides a string ones. In this context, a lookup application that decides a string
that is valid under the protocol is dangerous and refuses to look it that is valid under the protocol is dangerous and refuses to look it
up is in violation of the protocols; one that is willing to look up is in violation of the protocols; one that is willing to look
something up, but warns against it, is exercising a local choice. something up, but warns against it, is exercising a local choice.
6. Front-end and User Interface Processing 6. Front-end and User Interface Processing for Lookup
Domain names may be identified and processed in many contexts. They Domain names may be identified and processed in many contexts. They
may be typed in by users either by themselves or embedded in an may be typed in by users either by themselves or embedded in an
identifier structured for a particular protocol or class of protocols identifier structured for a particular protocol or class of protocols
such as email addresses, URIs, or IRIs. They may occur in running such as email addresses, URIs, or IRIs. They may occur in running
text or be processed by one system after being provided in another. text or be processed by one system after being provided in another.
Systems may wish to try to normalize URLs so as to determine (or Systems may wish to try to normalize URLs so as to determine (or
guess) whether a reference is valid or two references point to the guess) whether a reference is valid or two references point to the
same object without actually looking the objects up and comparing same object without actually looking the objects up and comparing
them (that is necessary, not just a choice, for URI types that are them (that is necessary, not just a choice, for URI types that are
not intended to be resolved). Some of these goals may be more easily not intended to be resolved). Some of these goals may be more easily
and reliably satisfied than others. While there are strong arguments and reliably satisfied than others. While there are strong arguments
for any domain name that is placed "on the wire" -- transmitted for any domain name that is placed "on the wire" -- transmitted
between systems -- to be in the minimum-ambiguity forms of A-labels, between systems -- to be in the zero-ambiguity forms of A-labels, it
U-labels, or LDH-labels, it is inevitable that programs that process is inevitable that programs that process domain names will encounter
domain names will encounter variant forms. U-labels or variant forms.
One source of such forms will be labels created under IDNA2003 One source of such forms will be labels created under IDNA2003
because that protocol allowed labels that were transformed before because that protocol allowed labels that were transformed from
they were turned from native-character into ACE ("xn--...") format by native-character format by mapping some characters into others before
mapping some characters into other. One consequence of the conversion into ACE ("xn--...") format. One consequence of the
transformations was that, when the ToUnicode and ToASCII operations transformations was that, when the ToUnicode and ToASCII operations
of IDNA2003 were applied, ToUnicode(ToASCII(original-label)) often of IDNA2003 were applied, ToUnicode(ToASCII(original-label)) often
did not produce the original label. IDNA2008 explicitly defines did not produce the original label. IDNA2008 explicitly defines
A-labels and U-labels as different forms of the same abstract label, A-labels and U-labels as different forms of the same abstract label,
forms that are stable when conversions are performed between them, forms that are stable when conversions are performed between them
without mappings. A different way of explaining this is that there (without mappings). A different way of explaining this is that there
are, today, domain names in files on the Internet that use characters are, today, domain names in files on the Internet that use characters
that cannot be represented directly in, or recovered from, (A-label) that cannot be represented directly in, or recovered from, (A-label)
domain names but for which interpretations are provided by IDNA2003. domain names but for which interpretations are provided by IDNA2003.
There are two major categories of such characters, those that are There are two major categories of such characters, those that are
removed by NFKC normalization and those upper-case characters that removed by NFKC normalization and those upper-case characters that
are mapped to lower-case (there are also a few characters that are are mapped to lower-case (there are also a few characters that are
given special-case mapping treatment in Stringprep including lower- given special-case mapping treatment in Stringprep, including lower-
case characters that are case-folded into other lower-case characters case characters that are case-folded into other lower-case characters
or strings). or strings).
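
The loss of the original form under IDNA2003 can be observed with Python's built-in "idna" codec, which implements the IDNA2003 ToASCII and ToUnicode operations (with Nameprep) rather than IDNA2008. The label used here is just an example chosen for this sketch.

```python
# Python's standard "idna" codec follows IDNA2003 (RFC 3490 with Nameprep),
# so it maps upper case to lower case before the Punycode conversion.
original = "B\u00fccher"             # "Bücher", with an upper-case B

a_label = original.encode("idna")    # IDNA2003 ToASCII
round_trip = a_label.decode("idna")  # IDNA2003 ToUnicode

# The mapping step is not reversible: the upper-case B is gone.
lost_original_form = round_trip != original
```

Here a_label is b"xn--bcher-kva" and round_trip is the lower-case "bücher", so ToUnicode(ToASCII(original-label)) does not reproduce the original label.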
Other issues in domain name identification and processing arise Other issues in domain name identification and processing arise
because IDNA2003 specified that several other characters be treated because IDNA2003 specified that several other characters be treated
as equivalent to the ASCII period (dot, full stop) character used as as equivalent to the ASCII period (dot, full stop) character used as
a label separator. If a string that might be a domain name appears a label separator. If a string that might be a domain name appears
in an arbitrary context (such as running text), it is difficult, even in an arbitrary context (such as running text), it is difficult, even
with only ASCII characters, to know whether an actual domain name (or with only ASCII characters, to know whether an actual domain name (or
a protocol parameter like a URI) is present and where it starts and a protocol parameter like a URI) is present and where it starts and
skipping to change at page 23, line 22 skipping to change at page 24, line 48
o Highly Localized Preprocessing. o Highly Localized Preprocessing.
Unlike the case above, there will be some situations in which Unlike the case above, there will be some situations in which
software will be highly localized for a particular environment and software will be highly localized for a particular environment and
carefully adapted to the expectations of users in that carefully adapted to the expectations of users in that
environment. The many discussions about using the Internet to environment. The many discussions about using the Internet to
preserve and support local cultures suggest that these cases may preserve and support local cultures suggest that these cases may
be more common in the future than they have been so far. be more common in the future than they have been so far.
In these cases, we should avoid trying to tell implementers what In these cases, we should avoid trying to tell implementers what
they should do, if only because they are quite likely (and for they should accept, if only because they are quite likely (and for
good reason) to ignore us. We would assume that they would map good reason) to ignore us. We would assume that they would map
characters that the intuitions of their users would suggest be characters that the intuitions of their users would suggest be
mapped and would hope that they would do that mapping as early as mapped and would hope that they would do that mapping as early as
possible, storing A-label or U-label forms in files and possible, storing A-label or U-label forms in files and
transporting only those forms between systems. One can imagine transporting only those forms between systems. One can imagine
switches about whether some sorts of mappings occur, warnings switches about whether some sorts of mappings occur, warnings
before applying them or, in a slightly more extreme version of the before applying them or, in a slightly more extreme version of the
approach taken in Internet Explorer version 7 (IE7), systems that approach taken in Internet Explorer version 7 (IE7), systems that
utterly refuse to handle "strange" characters at all if they utterly refuse to handle "strange" characters at all if they
appear in U-label form. None of those local decisions are a appear in U-label form. None of those local decisions are a
skipping to change at page 24, line 8 skipping to change at page 25, line 34
globally or compare equal when crude methods (i.e., those not globally or compare equal when crude methods (i.e., those not
conforming to the strict definition of label equivalence given in conforming to the strict definition of label equivalence given in
[IDNA2008-Defs]) are used are those in which all native-script labels [IDNA2008-Defs]) are used are those in which all native-script labels
are in U-label form. Forms that assume mapping will occur, are in U-label form. Forms that assume mapping will occur,
especially forms that were not valid under IDNA2003, may or may not especially forms that were not valid under IDNA2003, may or may not
function in predictable ways across all implementations. function in predictable ways across all implementations.
User interfaces involving Latin-based scripts should take special User interfaces involving Latin-based scripts should take special
care when considering how to handle case mapping because small care when considering how to handle case mapping because small
differences in label strings may cause behavior that is astonishing differences in label strings may cause behavior that is astonishing
to users. Because case-insensitive mapping is done for ASCII strings to users. Because case-insensitive comparison is done for ASCII
by DNS-servers, an all-ASCII label is treated as case-insensitive. strings by DNS-servers, an all-ASCII label is treated as case-
However, if even one of the characters of that string is replaced by insensitive. However, if even one of the characters of that string
one that requires the label to be given IDN treatment (e.g., by is replaced by one that requires the label to be given IDN treatment
adding a diacritical mark), then the label immediately becomes case- (e.g., by adding a diacritical mark), then the label effectively
sensitive. This suggests that case mapping for Latin-based scripts becomes case-sensitive because only lower-case characters are
(and possibly other scripts with case distinctions) as a permitted in IDNs. This suggests that case mapping for Latin-based
scripts (and possibly other scripts with case distinctions) as a
preprocessing matter in applications may be wise to prevent user preprocessing matter in applications may be wise to prevent user
astonishment, but, since all applications may not do this and astonishment, but, since all applications may not do this and
ambiguity in transport is not desirable, case-dependent ambiguity in transport is not desirable, case-dependent
forms should not be stored in files. forms should not be stored in files.
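
The effect can be sketched with Python's standard "punycode" and "idna" codecs. This is an illustration under stated assumptions: the example strings are arbitrary, the raw "punycode" codec is used to show what happens when no case mapping is applied, and the "idna" codec (which implements IDNA2003 mapping) stands in for an application that case-maps as a preprocessing step.

```python
lower = "\u00fcbung"   # "übung"
upper = "\u00dcbung"   # "Übung"

# Without any mapping, the two spellings encode to different ACE strings,
# and DNS case-insensitive comparison of ASCII cannot reconcile them.
raw_lower = lower.encode("punycode")
raw_upper = upper.encode("punycode")
differ_without_mapping = raw_lower.lower() != raw_upper.lower()

# With case mapping applied before conversion (here via the IDNA2003
# "idna" codec), both spellings yield the same A-label.
mapped_lower = lower.encode("idna")
mapped_upper = upper.encode("idna")
same_with_mapping = mapped_lower == mapped_upper

# All-ASCII labels need no such step for comparison purposes.
ascii_equal = b"EXAMPLE".lower() == b"example".lower()
```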
The comments above apply only in operations that look up names or
interpret files. There are several reasons why registration
activities should require final names and verification of those names
by the would-be registrant.
7. Migration from IDNA2003 and Unicode Version Synchronization 7. Migration from IDNA2003 and Unicode Version Synchronization
7.1. Design Criteria 7.1. Design Criteria
As mentioned above and in RFC 4690, two key goals of the IDNA2008 As mentioned above and in RFC 4690, two key goals of the IDNA2008
design are to enable applications to be agnostic about whether they design are to enable applications to be agnostic about whether they
are being run in environments supporting any Unicode version from 3.2 are being run in environments supporting any Unicode version from 3.2
onward and to permit incrementally adding new characters, character onward and to permit incrementally adding new characters, character
groups, scripts, and other character collections as they are groups, scripts, and other character collections as they are
incorporated into Unicode, without disruption and, in the long term, incorporated into Unicode, without disruption and, in the long term,
skipping to change at page 24, line 49 skipping to change at page 26, line 34
7.1.1. General IDNA Validity Criteria 7.1.1. General IDNA Validity Criteria
The general criteria for a putative label, and the collection of The general criteria for a putative label, and the collection of
characters that make it up, to be considered IDNA-valid are (the characters that make it up, to be considered IDNA-valid are (the
actual rules are rigorously defined in the "Protocol" and "Tables" actual rules are rigorously defined in the "Protocol" and "Tables"
documents): documents):
o The characters are "letters", marks needed to form letters, o The characters are "letters", marks needed to form letters,
numerals, or other code points used to write words in some numerals, or other code points used to write words in some
language. Symbols, drawing characters, and various notational language. Symbols, drawing characters, and various notational
characters are permanently excluded -- some because they are characters are intended to be permanently excluded -- some because
actively dangerous in URI, IRI, or similar contexts and others they are harmful in URI, IRI, or similar contexts (e.g.,
because there is no evidence that they are important enough to characters that appear to be slashes or other reserved URI
Internet operations or internationalization to justify expansion punctuation) and others because there is no evidence that they are
of domain names beyond the general principle of "letters, digits, important enough to Internet operations or internationalization to
and hyphen" and the complexities that would come with it justify expansion of domain names beyond the general principle of
(additional discussion and rationale for the symbol decision "letters, digits, and hyphen" and the complexities that would come
appears in Section 7.6). with it (additional discussion and rationale for the symbol
decision appears in Section 7.6).
o Other than in very exceptional cases, e.g., where they are needed o Other than in very exceptional cases, e.g., where they are needed
to write substantially any word of a given language, punctuation to write substantially any word of a given language, punctuation
characters are excluded as well. The fact that a word exists is characters are excluded as well. The fact that a word exists is
not proof that it should be usable in a DNS label and DNS labels not proof that it should be usable in a DNS label and DNS labels
are not expected to be usable for multiple-word phrases (although are not expected to be usable for multiple-word phrases (although
they are certainly not prohibited if the conventions and they are certainly not prohibited if the conventions and
orthography of a particular language cause that to be possible). orthography of a particular language cause that to be possible).
Even for English, very common constructions -- contractions like Even for English, very common constructions -- contractions like
"don't" or "it's", names that are written with apostrophes such as "don't" or "it's", names that are written with apostrophes such as
"O'Reilly", or characters for which apostrophes are common "O'Reilly", or characters for which apostrophes are common
substitutes -- cannot be represented in DNS labels. Words in English substitutes -- cannot be represented in DNS labels. Words in English
whose usually-preferred spellings include diacritical marks cannot whose usually-preferred spellings include diacritical marks cannot
be represented under the original hostname rules, but most can be be represented under the original hostname rules, but most can be
represented if treated as IDNs. represented if treated as IDNs.
o Characters that are unassigned (have no character assignment at o Characters that are unassigned (have no character assignment at
all) in the version of Unicode being used by the registry or all) in the version of Unicode being used by the registry or
application are not permitted, even on lookup. There are at least application are not permitted, even on lookup. The issues
two reasons for this. involved in this decision are discussed in Section 7.7.
* Tests involving the context of characters (e.g., some
characters being permitted only adjacent to ones of specific
types but otherwise invisible or very problematic for other
reasons) and integrity tests on complete labels are needed.
Unassigned code points cannot be permitted because one cannot
determine whether particular code points will require
contextual rules (and what those rules should be) before
characters are assigned to them and the properties of those
characters fully understood.
* Unicode specifies that an unassigned code point normalizes (and
case folds) to itself. If the code point is later assigned to
a character, and particularly if the newly-assigned code point
has a combining class that determines its placement relative to
other combining characters, it could normalize to some other
code point or sequence, creating confusion and/or violating
other rules listed here.
o Any character that is mapped to another character by a current o Any character that is mapped to another character by a current
version of NFKC is prohibited as input to IDNA (for either version of NFKC is prohibited as input to IDNA (for either
registration or lookup). With a few exceptions, this principle registration or lookup). With a few exceptions, this principle
excludes any character mapped to another by Nameprep [RFC3491]. excludes any character mapped to another by Nameprep [RFC3491].
Tables used to identify the characters that are IDNA-valid are Tables used to identify the characters that are IDNA-valid are
expected to be driven by the principles above, principles that are expected to be driven by the principles above, principles that are
specified exactly in [IDNA2008-Tables]. The rules given there are specified exactly in [IDNA2008-Tables]. The rules given there are
normative, rather than being just an interpretation of the tables. normative, rather than being just an interpretation of the tables.
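The NFKC principle in the list above can be illustrated with a minimal sketch. It assumes Python's standard unicodedata module, whose tables follow whatever Unicode version ships with the interpreter, and the helper name is_nfkc_stable is hypothetical. The check is only a necessary condition: the normative rules are those in [IDNA2008-Tables], and a string can pass this test and still be invalid.

```python
import unicodedata


def is_nfkc_stable(label: str) -> bool:
    """Return True if NFKC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFKC", label) == label


# Illustrative inputs only; these samples are not taken from the draft.
samples = {
    "plain ASCII": "example",
    "precomposed e-acute": "caf\u00e9",
    "fullwidth Latin A": "\uff21",
    "mathematical bold A": "\U0001d400",
}

for description, text in samples.items():
    print(f"{description}: stable={is_nfkc_stable(text)}")
```

Characters such as the fullwidth and mathematical forms are mapped to other characters by NFKC, so a string containing them fails the check.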
skipping to change at page 29, line 38 skipping to change at page 31, line 7
In principle, lookup applications could also compensate for the In principle, lookup applications could also compensate for the
difference in interpretation by looking up the string according to difference in interpretation by looking up the string according to
the interpretation specified in these documents and then, if that the interpretation specified in these documents and then, if that
failed, doing the lookup with the mapping, simulating the IDNA2003 failed, doing the lookup with the mapping, simulating the IDNA2003
interpretation. The risk of false positives is such that this is interpretation. The risk of false positives is such that this is
generally to be discouraged unless the application is able to engage generally to be discouraged unless the application is able to engage
in an "is this what you meant" dialogue with the end user. in an "is this what you meant" dialogue with the end user.
7.3. More Flexibility in User Agents 7.3. More Flexibility in User Agents
These specifications do not include mappings between one character or These documents do not specify mappings between one character or code
code point and others for any reason. Instead, they prohibit the point and others for any reason. Instead, they prohibit the
characters that would be mapped to others by normalization, upper characters that would be mapped to others by normalization, upper
case to lower case changes, or other rules. As examples, while case to lower case changes, or other rules. As examples, while
mathematical characters based on Latin ones are accepted as input to mathematical characters based on Latin ones are accepted as input to
IDNA2003, they are prohibited in IDNA2008. Similarly, double-width IDNA2003, they are prohibited in IDNA2008. Similarly, double-width
characters and other variations are prohibited as IDNA input. characters and other variations are prohibited as IDNA input.
Since the rules in [IDNA2008-Tables] have the effect that only Since the rules in [IDNA2008-Tables] have the effect that only
strings that are not transformed by NFKC are valid, if an application strings that are not transformed by NFKC are valid, if an application
chooses to perform NFKC normalization before lookup, that operation chooses to perform NFKC normalization before lookup, that operation
is safe since this will never make the application unable to look up is safe since this will never make the application unable to look up
skipping to change at page 32, line 24 skipping to change at page 33, line 39
2. Adjustments in IDNA tables or actions, including normalization 2. Adjustments in IDNA tables or actions, including normalization
definitions, that affect characters that were already invalid definitions, that affect characters that were already invalid
under IDNA2003. under IDNA2003.
3. Changes in the style of the IDNA definition that do not alter 3. Changes in the style of the IDNA definition that do not alter
the actions performed by IDNA. the actions performed by IDNA.
7.4.3. Implications of Prefix Changes 7.4.3. Implications of Prefix Changes
While it might be possible to make a prefix change, the costs of such While it might be possible to make a prefix change, the costs of such
a change are considerable. Even if they wanted to do so, all a change are considerable. Even if they wanted to do so, registries
registries could not convert all IDNA2003 ("xn--") registrations to a could not convert all IDNA2003 ("xn--") registrations to a new form
new form at the same time and synchronize that change with at the same time and synchronize that change with applications
applications supporting lookup. Unless all existing registrations supporting lookup. Unless all existing registrations were simply to
were simply to be declared invalid (and perhaps even then) systems be declared invalid (and perhaps even then) systems that needed to
that needed to support both labels with old prefixes and labels with support both labels with old prefixes and labels with new ones would
new ones would first process a putative label under the IDNA2008 first process a putative label under the IDNA2008 rules and try to
rules and try to look it up and then, if it were not found, would look it up and then, if it were not found, would process the label
process the label under IDNA2003 rules and look it up again. That under IDNA2003 rules and look it up again. That process could
process could significantly slow down all processing that involved significantly slow down all processing that involved IDNs in the DNS
IDNs in the DNS especially since, in principle, a fully-qualified especially since, in principle, a fully-qualified name could contain
name could contain a mixture of labels that were registered with the a mixture of labels that were registered with the old and new
old and new prefixes, a situation that would make the use of DNS prefixes, a situation that would make the use of DNS caching very
caching very difficult. In addition, looking up the same input difficult. In addition, looking up the same input string as two
string as two separate A-labels would create some potential for separate A-labels would create some potential for confusion and
confusion and attacks, since they could, in principle, map to attacks, since they could, in principle, map to different targets and
different targets and then resolve to different entries in the DNS. then resolve to different entries in the DNS.
Consequently, a prefix change is to be avoided if at all possible, Consequently, a prefix change is to be avoided if at all possible,
even if it means accepting some IDNA2003 decisions about character even if it means accepting some IDNA2003 decisions about character
distinctions as irreversible and/or giving special treatment to edge distinctions as irreversible and/or giving special treatment to edge
cases. cases.
7.5. Stringprep Changes and Compatibility 7.5. Stringprep Changes and Compatibility
The Nameprep [RFC3491] specification, a key part of IDNA2003, is a The Nameprep [RFC3491] specification, a key part of IDNA2003, is a
profile of Stringprep [RFC3454]. While Nameprep is a Stringprep profile of Stringprep [RFC3454]. While Nameprep is a Stringprep
skipping to change at page 34, line 13 skipping to change at page 35, line 30
read such a logo as "I love..." or "I heart...", considerable read such a logo as "I love..." or "I heart...", considerable
knowledge of the coding distinctions made in Unicode is needed to knowledge of the coding distinctions made in Unicode is needed to
know that there is more than one "heart" character (e.g., U+2665, know that there is more than one "heart" character (e.g., U+2665,
U+2661, and U+2765) and how to describe it. These issues are of U+2661, and U+2765) and how to describe it. These issues are of
particular importance if strings are expected to be understood or particular importance if strings are expected to be understood or
transcribed by the listener after being read out loud. transcribed by the listener after being read out loud.
[[anchor20: The above paragraph remains controversial as to [[anchor20: The above paragraph remains controversial as to
whether it is valid. The WG will need to make a decision if this whether it is valid. The WG will need to make a decision if this
section is not dropped entirely.]] section is not dropped entirely.]]
o Consider the case of a screen reader used by blind Internet users
who must listen to renderings of IDN domain names and possibly
reproduce them on the keyboard.
o As a simplified example of this, assume one wanted to use a o As a simplified example of this, assume one wanted to use a
"heart" or "star" symbol in a label. This is problematic because "heart" or "star" symbol in a label. This is problematic because
those names are ambiguous in the Unicode system of naming (the those names are ambiguous in the Unicode system of naming (the
actual Unicode names require far more qualification). A user or actual Unicode names require far more qualification). A user or
would-be registrant has no way to know -- absent careful study of would-be registrant has no way to know -- absent careful study of
the code tables -- whether it is ambiguous (e.g., where there are the code tables -- whether it is ambiguous (e.g., where there are
multiple "heart" characters) or not. Conversely, the user seeing multiple "heart" characters) or not. Conversely, the user seeing
the hypothetical label doesn't know whether to read it -- try to the hypothetical label doesn't know whether to read it -- try to
transmit it to a colleague by voice -- as "heart", as "love", as transmit it to a colleague by voice -- as "heart", as "love", as
"black heart", or as any of the other examples below. "black heart", or as any of the other examples below.
skipping to change at page 35, line 11 skipping to change at page 36, line 32
In IDNA2003, labels containing unassigned code points are looked up In IDNA2003, labels containing unassigned code points are looked up
on the assumption that, if they appear in labels and can be mapped on the assumption that, if they appear in labels and can be mapped
and then resolved, the relevant standards must have changed and the and then resolved, the relevant standards must have changed and the
registry has properly allocated only assigned values. registry has properly allocated only assigned values.
In the protocol as described in these documents, strings containing In the protocol as described in these documents, strings containing
unassigned code points must not be either looked up or registered. unassigned code points must not be either looked up or registered.
There are several reasons for this, with the most important ones There are several reasons for this, with the most important ones
being: being:
o It cannot be known with sufficient reliability in advance that a o It cannot be known in advance, and with sufficient reliability,
code point that was not previously assigned will not be assigned that a code point that was not previously assigned will not be
to a compatibility character or one that would be otherwise assigned to a compatibility character or one that would be
disallowed by the rules in [IDNA2008-Tables]. In IDNA2003, since otherwise disallowed by the rules in [IDNA2008-Tables]. In
there is no direct dependency on NFKC (Stringprep's tables are IDNA2003, since there is no direct dependency on NFKC
based on NFKC, but IDNA2003 depends only on Stringprep), (Stringprep's tables are based on NFKC, but IDNA2003 depends only
allocation of a compatibility character might produce some odd on Stringprep), allocation of a compatibility character might
situations, but it would not be a problem. In IDNA2008, where produce some odd situations, but it would not be a problem. In
compatibility characters are generally assigned to DISALLOWED, IDNA2008, where compatibility characters are assigned to
DISALLOWED unless character-specific exceptions are made,
permitting strings containing unassigned characters to be looked permitting strings containing unassigned characters to be looked
up would permit violating the principle that characters in up would permit violating the principle that characters in
DISALLOWED are not looked up. DISALLOWED are not looked up.
o The Unicode Standard specifies that an unassigned code point
normalizes (and, where relevant, case folds) to itself. If the
code point is later assigned to a character, and particularly if
the newly-assigned code point has a combining class that
determines its placement relative to other combining characters,
it could normalize to some other code point or sequence, creating
confusion and/or violating other rules listed here.
o Tests involving the context of characters (e.g., some characters
being permitted only adjacent to ones of specific types but
otherwise invisible or very problematic for other reasons) and
integrity tests on complete labels are needed. Unassigned code
points cannot be permitted because one cannot determine whether
particular code points will require contextual rules (and what
those rules should be) before characters are assigned to them and
the properties of those characters fully understood.
o More generally, the status of an unassigned character with regard o More generally, the status of an unassigned character with regard
to the DISALLOWED and PROTOCOL-VALID categories, and whether to the DISALLOWED and PROTOCOL-VALID categories, and whether
contextual rules are required with the latter, cannot be evaluated contextual rules are required with the latter, cannot be evaluated
until a character is actually assigned and known. By contrast, until a character is actually assigned and known. By contrast,
characters that are actually DISALLOWED are placed in that characters that are actually DISALLOWED are placed in that
category only as a consequence of rules applied to known category only as a consequence of rules applied to known
properties or per-character evaluation. properties or per-character evaluation.
Another way to look at this is that permitting an unassigned
character to be looked up is nearly equivalent to reclassifying a
character from DISALLOWED to PROTOCOL-VALID since different systems
will interpret the character in different ways.
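The reasoning about unassigned code points can be sketched as follows. This is an approximation under stated assumptions: it uses Python's standard unicodedata module, it treats general category "Cn" as "unassigned" (which also covers noncharacters, so it is coarser than the categories in [IDNA2008-Tables]), and it assumes U+0378 is unassigned in the Unicode version bundled with the interpreter. The function names are hypothetical.

```python
import unicodedata


def is_unassigned(code_point: int) -> bool:
    """Coarse test: general category Cn (unassigned or noncharacter)."""
    return unicodedata.category(chr(code_point)) == "Cn"


def lookup_permitted(label: str) -> bool:
    """Refuse lookup if any code point is unassigned for this Unicode version."""
    return not any(is_unassigned(ord(ch)) for ch in label)


# Assumption: U+0378 has no character assigned in the bundled Unicode data.
candidate = 0x0378
print("Unicode version:", unicodedata.unidata_version)
print("unassigned:", is_unassigned(candidate))

# An unassigned code point normalizes to itself; that could change once
# a character with its own normalization behavior is assigned to it.
ch = chr(candidate)
print("NFKC leaves it unchanged:", unicodedata.normalize("NFKC", ch) == ch)
print("lookup permitted:", lookup_permitted("a" + ch))
```

Two systems running different Unicode versions can give different answers for the same label, which is the interpretation problem described above.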
It is possible to argue that the issues above are not important and It is possible to argue that the issues above are not important and
that, as a consequence, it is better to retain the principle of that, as a consequence, it is better to retain the principle of
looking up labels even if they contain unassigned characters because looking up labels even if they contain unassigned characters because
all of the important scripts and characters have been coded as of all of the important scripts and characters have been coded as of
Unicode 5.1 and hence unassigned code points will be assigned only to Unicode 5.1 and hence unassigned code points will be assigned only to
obscure characters or archaic scripts. Unfortunately, that does not obscure characters or archaic scripts. Unfortunately, that does not
appear to be a safe assumption for at least two reasons. First, much appear to be a safe assumption for at least two reasons. First, much
the same claim of completeness has been made for earlier versions of the same claim of completeness has been made for earlier versions of
Unicode. The reality is that a script that is obscure to much of the Unicode. The reality is that a script that is obscure to much of the
world may still be very important to those who use it. Cultural and world may still be very important to those who use it. Cultural and
skipping to change at page 36, line 18 skipping to change at page 38, line 12
containing that character but that is otherwise in ASCII is not containing that character but that is otherwise in ASCII is not
really an IDN (in the U-label sense defined above) at all. After really an IDN (in the U-label sense defined above) at all. After
Nameprep maps the Eszett out, the result is an ASCII string and so Nameprep maps the Eszett out, the result is an ASCII string and so
does not get an xn-- prefix, but the string that can be displayed to does not get an xn-- prefix, but the string that can be displayed to
a user appears to be an IDN. The newer version of the protocol a user appears to be an IDN. The newer version of the protocol
eliminates this artifact. A character is either permitted as itself eliminates this artifact. A character is either permitted as itself
or it is prohibited; special cases that make sense only in a or it is prohibited; special cases that make sense only in a
particular linguistic or cultural context can be dealt with as particular linguistic or cultural context can be dealt with as
localization matters where appropriate. localization matters where appropriate.
8. Acknowledgments 8. Name Server Considerations
The editor and contributors would like to express their thanks to 8.1. Processing Non-ASCII Strings
those who contributed significant early (pre-WG) review comments,
sometimes accompanied by text, especially Mark Davis, Paul Hoffman,
Simon Josefsson, and Sam Weiler. In addition, some specific ideas
were incorporated from suggestions, text, or comments about sections
that were unclear supplied by Frank Ellerman, Michael Everson, Asmus
Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler,
although, as usual, they bear little or no responsibility for the
conclusions the editor and contributors reached after receiving their
suggestions. Thanks are also due to Vint Cerf, Debbie Garside, and
Jefsey Morphin for conversations that led to considerable
improvements in the content of this document.
A meeting was held on 30 January 2008 to attempt to reconcile Existing DNS servers do not know the IDNA rules for handling non-
differences in perspective and terminology about this set of ASCII forms of IDNs, and therefore need to be shielded from them.
specifications between the design team and members of the Unicode All existing channels through which names can enter a DNS server
Technical Consortium. The discussions at and subsequent to that database (for example, master files (as described in RFC 1034) and
meeting were very helpful in focusing the issues and in refining the DNS update messages [RFC2136]) are IDN-unaware because they predate
specifications. The active participants at that meeting were (in IDNA. Other sections of this document provide the needed shielding
alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, by ensuring that internationalized domain names entering DNS server
Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary databases through such channels have already been converted to their
Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, equivalent ASCII A-label forms.
Michel Suignard, and Ken Whistler. We express our thanks to Google
for support of that meeting and to the participants for their
contributions.
Useful comments and text on the WG versions of the draft were Because of the distinction made between the algorithms for
received from many participants in the IETF "IDNABIS" WG and a number Registration and Lookup in [IDNA2008-Protocol] (a domain name
of document changes resulted from mailing list discussions made by containing only ASCII codepoints cannot be converted to an A-label),
that group. Marcos Sanz provided specific analysis and suggestions there cannot be more than one A-label form for any given U-label.
that were exceptionally helpful in refining the text, as did Vint
Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan.
9. Contributors As specified in RFC 2181 [RFC2181], the DNS protocol explicitly
allows domain labels to contain octets beyond the ASCII range
(0000..007F), and this document does not change that. Note, however,
that there is no defined interpretation of octets 0080..00FF as
characters. If labels containing these octets are returned to
applications, unpredictable behavior could result. The A-label form,
which cannot contain those characters, is the only standard
representation for internationalized labels in the DNS protocol.
While the listed editor held the pen, the core of this document and 8.2. DNSSEC Authentication of IDN Domain Names
the initial WG version represent the joint work and conclusions of
an ad hoc design team consisting of the editor and, in alphabetic
order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.
In addition, there were many specific contributions and helpful
comments from those listed in the Acknowledgments section and others
who have contributed to the development and use of the IDNA
protocols.
10. Internationalization Considerations DNS Security (DNSSEC) [RFC2535] is a method for supplying
cryptographic verification information along with DNS messages.
Public Key Cryptography is used in conjunction with digital
signatures to provide a means for a requester of domain information
to authenticate the source of the data. This ensures that it can be
traced back to a trusted source, either directly or via a chain of
trust linking the source of the information to the top of the DNS
hierarchy.
IDNA specifies that all internationalized domain names served by DNS
servers that cannot be represented directly in ASCII MUST use the
A-label form. Conversion to A-labels MUST be performed prior to a
zone being signed by the private key for that zone. Because of this
ordering, it is important to recognize that DNSSEC authenticates a
domain name containing A-labels or conventional LDH-labels, not
U-labels. In the presence of DNSSEC, no form of a zone file or query
response that contains a U-label may be signed or the signature
validated.
One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the lookup flow
than DNSSEC authenticating nameservers for DNSSEC to work.
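The ordering requirement (conversion to A-labels before the zone is signed or a query is sent) can be illustrated with a minimal sketch. It assumes Python's built-in "punycode" codec and uses hypothetical helper names. It deliberately omits the validity, contextual, and length checks that the IDNA2008 specifications require, so it shows only the shape of the conversion, not a conforming implementation.

```python
ACE_PREFIX = "xn--"


def to_a_label(label: str) -> str:
    """Convert one label to its ASCII form; ASCII labels pass through unchanged.

    No IDNA2008 validation is performed here (an assumption made for brevity).
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_zone_name(domain: str) -> str:
    """Apply the per-label conversion to every label of a dotted name."""
    return ".".join(to_a_label(part) for part in domain.split("."))


# The name placed in a zone file, and hence the name that gets signed,
# is the all-ASCII form rather than the U-label form.
print(to_zone_name("b\u00fccher.example"))
```

Any proxy or forwarder performing this step has to run before the name reaches a DNSSEC-validating component, because only the ASCII form is covered by signatures.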
8.3. Root and other DNS Server Considerations
IDNs in A-label form will generally be somewhat longer than current
domain names, so the bandwidth needed by the root servers is likely
to go up by a small amount. Also, queries and responses for IDNs
will probably be somewhat longer than typical queries historically,
so EDNS0 [RFC2671] support may be more important (otherwise, queries
and responses may be forced to go to TCP instead of UDP).
9. Internationalization Considerations
DNS labels and fully-qualified domain names provide mnemonics that DNS labels and fully-qualified domain names provide mnemonics that
assist in identifying and referring to resources on the Internet. assist in identifying and referring to resources on the Internet.
IDNs expand the range of those mnemonics to include those based on IDNs expand the range of those mnemonics to include those based on
languages and character sets other than Western European and Roman- languages and character sets other than Western European and Roman-
derived ones. But domain "names" are not, in general, words in any derived ones. But domain "names" are not, in general, words in any
language. The recommendations of the IETF policy on character sets language. The recommendations of the IETF policy on character sets
and languages, BCP 18 [RFC2277] are applicable to situations in which and languages, BCP 18 [RFC2277] are applicable to situations in which
language identification is used to provide language-specific language identification is used to provide language-specific
contexts. The DNS is, by contrast, global and international and contexts. The DNS is, by contrast, global and international and
ultimately has nothing to do with languages. Adding languages (or ultimately has nothing to do with languages. Adding languages (or
similar context) to IDNs generally, or to DNS matching in particular, similar context) to IDNs generally, or to DNS matching in particular,
would imply context dependent matching in DNS, which would be a very would imply context dependent matching in DNS, which would be a very
significant change to the DNS protocol itself. It would also imply significant change to the DNS protocol itself. It would also imply
that users would need to identify the language associated with a that users would need to identify the language associated with a
particular label in order to look that label up, a decision that particular label in order to look that label up, a decision that
would be impossible in many or most cases. would be impossible in many or most cases.
11. IANA Considerations 10. IANA Considerations
This section gives an overview of registries required for IDNA. The This section gives an overview of registries required for IDNA. The
actual definitions of the first two appear in [IDNA2008-Tables]. actual definitions of the first two appear in [IDNA2008-Tables].
11.1. IDNA Character Registry 10.1. IDNA Character Registry
The distinction among the three major categories "UNASSIGNED", The distinction among the three major categories "UNASSIGNED",
"DISALLOWED", and "PROTOCOL-VALID" is made by special categories and "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and
rules that are integral elements of [IDNA2008-Tables]. Convenience rules that are integral elements of [IDNA2008-Tables]. Convenience
in programming and validation requires a registry of characters and in programming and validation requires a registry of characters and
scripts and their categories, updated for each new version of Unicode scripts and their categories, updated for each new version of Unicode
and the characters it contains. The details of this registry are and the characters it contains. The details of this registry are
specified in [IDNA2008-Tables]. specified in [IDNA2008-Tables].
11.2. IDNA Context Registry 10.2. IDNA Context Registry
For characters that are defined in the IDNA Character Registry list For characters that are defined in the IDNA Character Registry list
as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of
rule described in Section 3.1.1.1), IANA will create and maintain a rule described in Section 3.1.1.1), IANA will create and maintain a
list of approved contextual rules. The details for those rules list of approved contextual rules. The details for those rules
appear in [IDNA2008-Tables]. appear in [IDNA2008-Tables].
11.3. IANA Repository of IDN Practices of TLDs 10.3. IANA Repository of IDN Practices of TLDs
This registry, historically described as the "IANA Language Character This registry, historically described as the "IANA Language Character
Set Registry" or "IANA Script Registry" (both somewhat misleading Set Registry" or "IANA Script Registry" (both somewhat misleading
terms) is maintained by IANA at the request of ICANN. It is used to terms) is maintained by IANA at the request of ICANN. It is used to
provide a central documentation repository of the IDN policies used provide a central documentation repository of the IDN policies used
by top level domain (TLD) registries who volunteer to contribute to by top level domain (TLD) registries who volunteer to contribute to
it and is used in conjunction with ICANN Guidelines for IDN use. it and is used in conjunction with ICANN Guidelines for IDN use.
It is not an IETF-managed registry and, while the protocol changes It is not an IETF-managed registry and, while the protocol changes
specified here may call for some revisions to the tables, these specified here may call for some revisions to the tables, these
specifications have no direct effect on that registry and no IANA specifications have no direct effect on that registry and no IANA
action is required as a result. action is required as a result.
12. Security Considerations 11. Security Considerations
12.1. General Security Issues with IDNA 11.1. General Security Issues with IDNA
This document in the IDNA2008 series is purely explanatory and This document in the IDNA2008 series is purely explanatory and
informational and consequently introduces no new security issues. It informational and consequently introduces no new security issues. It
would, of course, be a poor idea for someone to try to implement from would, of course, be a poor idea for someone to try to implement from
it; such an attempt would almost certainly lead to interoperability it; such an attempt would almost certainly lead to interoperability
problems and might lead to security ones. A discussion of security problems and might lead to security ones. A discussion of security
issues with IDNA, including some relevant history, appears in issues with IDNA, including some relevant history, appears in
[IDNA2008-Defs]. [IDNA2008-Defs].
13. References 12. Acknowledgments
13.1. Normative References The editor and contributors would like to express their thanks to
those who contributed significant early (pre-WG) review comments,
sometimes accompanied by text, especially Mark Davis, Paul Hoffman,
Simon Josefsson, and Sam Weiler. In addition, some specific ideas
were incorporated from suggestions, text, or comments about sections
that were unclear supplied by Vint Cerf, Frank Ellerman, Michael
Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken
Whistler, although, as usual, they bear little or no responsibility
for the conclusions the editor and contributors reached after
receiving their suggestions. Thanks are also due to Vint Cerf,
Debbie Garside, and Jefsey Morfin for conversations that led to
considerable improvements in the content of this document.
A meeting was held on 30 January 2008 to attempt to reconcile
differences in perspective and terminology about this set of
specifications between the design team and members of the Unicode
Technical Consortium. The discussions at and subsequent to that
meeting were very helpful in focusing the issues and in refining the
specifications. The active participants at that meeting were (in
alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam,
Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary
Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel,
Michel Suignard, and Ken Whistler. We express our thanks to Google
for support of that meeting and to the participants for their
contributions.
Useful comments and text on the WG versions of the draft were
received from many participants in the IETF "IDNABIS" WG and a number
of document changes resulted from mailing list discussions made by
that group. Marcos Sanz provided specific analysis and suggestions
that were exceptionally helpful in refining the text, as did Vint
Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan.
13. Contributors
While the listed editor held the pen, the core of this document and
the initial WG version represent the joint work and conclusions of
an ad hoc design team consisting of the editor and, in alphabetic
order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.
In addition, there were many specific contributions and helpful
comments from those listed in the Acknowledgments section and others
who have contributed to the development and use of the IDNA
protocols.
14. References
14.1. Normative References
[ASCII] American National Standards Institute (formerly United [ASCII] American National Standards Institute (formerly United
States of America Standards Institute), "USA Code for States of America Standards Institute), "USA Code for
Information Interchange", ANSI X3.4-1968, 1968. Information Interchange", ANSI X3.4-1968, 1968.
ANSI X3.4-1968 has been replaced by newer versions with ANSI X3.4-1968 has been replaced by newer versions with
slight modifications, but the 1968 version remains slight modifications, but the 1968 version remains
definitive for the Internet. definitive for the Internet.
[IDNA2008-Bidi] [IDNA2008-Bidi]
skipping to change at page 40, line 5 skipping to change at page 43, line 15
[Unicode51] [Unicode51]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
5.1.0", 2008. 5.1.0", 2008.
defined by: The Unicode Standard, Version 5.0, Boston, MA, defined by: The Unicode Standard, Version 5.0, Boston, MA,
Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by
Unicode 5.1.0 Unicode 5.1.0
(http://www.unicode.org/versions/Unicode5.1.0/). (http://www.unicode.org/versions/Unicode5.1.0/).
13.2. Informative References 14.2. Informative References
[BIG5] Institute for Information Industry of Taiwan, "Computer [BIG5] Institute for Information Industry of Taiwan, "Computer
Chinese Glyph and Character Code Mapping Table, Technical Chinese Glyph and Character Code Mapping Table, Technical
Report C-26", 1984. Report C-26", 1984.
There are several forms and variations and a closely- There are several forms and variations and a closely-
related standard, CNS 11643. See the discussion in related standard, CNS 11643. See the discussion in
Chapter 3 of Lunde, K., CJKV Information Processing, Chapter 3 of Lunde, K., CJKV Information Processing,
O'Reilly & Associates, 1999 O'Reilly & Associates, 1999
skipping to change at page 40, line 36 skipping to change at page 43, line 46
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987. STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and [RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987. specification", STD 13, RFC 1035, November 1987.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", STD 3, RFC 1123, October 1989. and Support", STD 3, RFC 1123, October 1989.
[RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
"Dynamic Updates in the Domain Name System (DNS UPDATE)",
RFC 2136, April 1997.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997. Specification", RFC 2181, July 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998. Languages", BCP 18, RFC 2277, January 1998.
[RFC2535] Eastlake, D., "Domain Name System Security Extensions",
RFC 2535, March 1999.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
RFC 2671, August 1999.
[RFC2673] Crawford, M., "Binary Labels in the Domain Name System", [RFC2673] Crawford, M., "Binary Labels in the Domain Name System",
RFC 2673, August 1999. RFC 2673, August 1999.
[RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)", RFC 2782, specifying the location of services (DNS SRV)", RFC 2782,
February 2000. February 2000.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454, Internationalized Strings ("stringprep")", RFC 3454,
December 2002. December 2002.
skipping to change at page 44, line 18 skipping to change at page 47, line 38
may be a more appropriate reference than one containing a year. may be a more appropriate reference than one containing a year.
As discussed on the mailing list, we can and should discuss how to As discussed on the mailing list, we can and should discuss how to
refer to these documents at an appropriate time (e.g., when we refer to these documents at an appropriate time (e.g., when we
know when we will be finished) but, in the interim, it seems know when we will be finished) but, in the interim, it seems
appropriate to simply start getting rid of the version-specific appropriate to simply start getting rid of the version-specific
terminology where it can naturally be removed. terminology where it can naturally be removed.
o Additional discussion of mappings, etc., especially for case- o Additional discussion of mappings, etc., especially for case-
sensitivity. sensitivity.
o Clarified relationship to base DNS specifications.
o Consolidated discussion of lookup of unassigned characters.
o More editorial fine-tuning. o More editorial fine-tuning.
A.7. Version -07
o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label
(or False-A-label), R-LDH-label, valid IDNA-label in
Section 1.3.3.
o Moved the "name server considerations" material to this document
from Protocol because it is non-normative and not part of the
protocol itself.
o To improve clarity, redid discussion of the reasons why looking up
unassigned code points is prohibited.
o Editorial and other non-substantive changes to correct earlier
errors and to reflect new definitions and terminology.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
 End of changes. 65 change blocks. 
241 lines changed or deleted 401 lines changed or added
