< draft-ietf-idnabis-rationale-10.txt   draft-ietf-idnabis-rationale-11.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft June 18, 2009 Internet-Draft August 13, 2009
Intended status: Informational Intended status: Informational
Expires: December 20, 2009 Expires: February 14, 2010
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-10.txt draft-ietf-idnabis-rationale-11.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. This document may contain material provisions of BCP 78 and BCP 79. This document may contain material
from IETF Documents or IETF Contributions published or made publicly from IETF Documents or IETF Contributions published or made publicly
available before November 10, 2008. The person(s) controlling the available before November 10, 2008. The person(s) controlling the
copyright in some of this material may not have granted the IETF copyright in some of this material may not have granted the IETF
Trust the right to allow modifications of such material outside the Trust the right to allow modifications of such material outside the
IETF Standards Process. Without obtaining an adequate license from IETF Standards Process. Without obtaining an adequate license from
skipping to change at page 1, line 43 skipping to change at page 1, line 43
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 20, 2009. This Internet-Draft will expire on February 14, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 27 skipping to change at page 2, line 27
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4
1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 1.3.1. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5
1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 1.3.2. New Terminology and Restrictions . . . . . . . . . . . 6
1.3.3. New Terminology and Restrictions . . . . . . . . . . . 6 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7
1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8
2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9
3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9
3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10
3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11
3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12
3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12
3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 14 Applications . . . . . . . . . . . . . . . . . . . . . . . 14
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15
4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15
4.2. Entry and Display in Applications . . . . . . . . . . . . 16 4.2. Entry and Display in Applications . . . . . . . . . . . . 16
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 17 Alternate Character Forms . . . . . . . . . . . . . . . . 18
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21
6. Front-end and User Interface Processing for Lookup . . . . . . 20 6. Front-end and User Interface Processing for Lookup . . . . . . 22
7. Migration from IDNA2003 and Unicode Version Synchronization . 24 7. Migration from IDNA2003 and Unicode Version Synchronization . 24
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24
7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 24 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 25
7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25
7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26
7.2. Changes in Character Interpretations . . . . . . . . . . . 27 7.2. Changes in Character Interpretations . . . . . . . . . . . 27
7.3. More Flexibility in User Agents . . . . . . . . . . . . . 28 7.3. Character Mapping . . . . . . . . . . . . . . . . . . . . 29
7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 29
7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 29
7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 30
7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 30
7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31
7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 31
7.7. Migration Between Unicode Versions: Unassigned Code 7.7. Migration Between Unicode Versions: Unassigned Code
Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33
7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 34
8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35
8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 36 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 35
8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 36 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 35
8.3. Root and other DNS Server Considerations . . . . . . . . . 37 8.3. Root and other DNS Server Considerations . . . . . . . . . 36
9. Internationalization Considerations . . . . . . . . . . . . . 37 9. Internationalization Considerations . . . . . . . . . . . . . 36
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36
10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 38 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37
10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37
10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37
11. Security Considerations . . . . . . . . . . . . . . . . . . . 38 11. Security Considerations . . . . . . . . . . . . . . . . . . . 37
11.1. General Security Issues with IDNA . . . . . . . . . . . . 38 11.1. General Security Issues with IDNA . . . . . . . . . . . . 37
12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 38
13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 38
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39
14.1. Normative References . . . . . . . . . . . . . . . . . . . 40 14.1. Normative References . . . . . . . . . . . . . . . . . . . 39
14.2. Informative References . . . . . . . . . . . . . . . . . . 41 14.2. Informative References . . . . . . . . . . . . . . . . . . 40
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 43 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 42
A.1. Changes between Version -00 and Version -01 of A.1. Changes between Version -00 and Version -01 of
draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43 draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 42
A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 44 A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 43
A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 44 A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 43
A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44 A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44
A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 45 A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 44
A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 45 A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 44
A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46 A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 45
A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46 A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 45
A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46 A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 45
A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47 A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 46
A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 46
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
Internationalized Domain Names in Applications (IDNA) is a collection Internationalized Domain Names in Applications (IDNA) is a collection
of standards that allow client applications to convert some Unicode of standards that allow client applications to convert some Unicode
mnemonics to an ASCII-compatible encoding form ("ACE") which is a mnemonics to an ASCII-compatible encoding form ("ACE") which is a
valid DNS label containing only letters, digits, and hyphens. The valid DNS label containing only letters, digits, and hyphens. The
specific form of ACE label used by IDNA is called an "A-label". A specific form of ACE label used by IDNA is called an "A-label". A
client can look up an exact A-label in the existing DNS, so A-labels client can look up an exact A-label in the existing DNS, so A-labels
do not require any extensions to DNS, upgrades of DNS servers or do not require any extensions to DNS, upgrades of DNS servers or
updates to low-level client libraries. An A-label is recognizable updates to low-level client libraries. An A-label is recognizable
from the prefix "xn--" before the characters produced by the Punycode from the prefix "xn--" before the characters produced by the Punycode
algorithm [RFC3492]; thus, a user application can identify an A-label algorithm [RFC3492]; thus, a user application can identify an A-label
and convert it into Unicode (or some local coded character set) for and convert it into Unicode (or some local coded character set) for
display. display.
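As a rough illustration of the recognition and conversion step just described, the Python sketch below checks for the "xn--" prefix and decodes the remainder with the standard library's "punycode" codec, which implements the RFC 3492 algorithm. The function name decode_a_label is hypothetical, and the sketch deliberately omits the IDNA2008 validity tests, so anything it accepts is at most a putative A-label.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def decode_a_label(label: str) -> Optional[str]:
    """Return the Unicode form of an apparent A-label, or None.

    Hypothetical helper: checks only the "xn--" prefix (case-insensitively)
    and runs Punycode decoding; no IDNA2008 validity tests are applied.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return None  # no ACE prefix, so not an A-label candidate
    tail = label[len(ACE_PREFIX):].lower()
    try:
        # The standard "punycode" codec implements RFC 3492.
        return tail.encode("ascii").decode("punycode")
    except UnicodeError:
        return None  # non-ASCII input or malformed Punycode


print(decode_a_label("xn--bcher-kva"))  # bücher
print(decode_a_label("example"))        # None
```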
[[anchor3: Note in draft: The above discussion, and the rest of the
text in this section, are very informal. In particular, the term
"A-label" is used to refer to some things that don't meet all of the
tests for A-labels. I have tightened it somewhat from the suggested
text I received, but not very much. Is the current form ok with
everyone???]]
On the registry side, IDNA allows a registry to offer On the registry side, IDNA allows a registry to offer
Internationalized Domain Names (IDNs) for registration as A-labels. Internationalized Domain Names (IDNs) for registration as A-labels.
A registry may offer any subset of valid IDNs, and may apply any A registry may offer any subset of valid IDNs, and may apply any
restrictions or bundling (grouping of similar labels together in one restrictions or bundling (grouping of similar labels together in one
registration) appropriate for the context of that registry. registration) appropriate for the context of that registry.
Registration of labels is sometimes discussed separately from lookup, Registration of labels is sometimes discussed separately from lookup,
and is subject to a few specific requirements that do not apply to and is subject to a few specific requirements that do not apply to
lookup. lookup.
DNS clients and registries are subject to some differences in DNS clients and registries are subject to some differences in
requirements for handling IDNs. In particular, registries are urged requirements for handling IDNs. In particular, registries are urged
to register only exact, valid A-labels, while clients might do some to register only exact, valid A-labels, while clients might do some
mapping to get from otherwise-invalid user input to a valid A-label. mapping to get from otherwise-invalid user input to a valid A-label.
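The asymmetry between registries and clients can be sketched as follows. This is a hypothetical illustration that operates on the Unicode form of a label before conversion to an A-label: toy_is_valid is a stand-in for the real IDNA2008 validity rules, and client_map (lower-casing followed by NFC normalization) is only an assumed example of client-side mapping, not the mapping specified by these documents.

```python
import unicodedata


def toy_is_valid(label: str) -> bool:
    # Hypothetical stand-in for the IDNA2008 validity rules: a label
    # counts as "valid" here only if it is already lower case and in NFC.
    return label == label.lower() and label == unicodedata.normalize("NFC", label)


def registry_accepts(label: str) -> bool:
    # Registry side: no mapping at all; the submitted label must already
    # be valid exactly as given.
    return toy_is_valid(label)


def client_map(user_input: str) -> str:
    # Client side (assumed mapping): lower-case, then NFC-normalize, so
    # that some otherwise-invalid input becomes a valid label.
    return unicodedata.normalize("NFC", user_input.lower())


typed = "Bu\u0308cher"                      # capital B, u + combining diaeresis
print(registry_accepts(typed))              # False
print(registry_accepts(client_map(typed)))  # True
```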
The first version of IDNA was published in 2003 and is referred to The first version of IDNA was published in 2003 and is referred to
here as IDNA2003 to contrast it with the current version, which is here as IDNA2003 to contrast it with the current version, which is
known as IDNA2008. The documents that made up both versions are known as IDNA2008 (after the year in which IETF work started on it).
listed in Section 1.3.1. The characters that are valid in A-labels IDNA2003 consists of four documents: the IDNA base specification
are identified from rules listed in the Tables document [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep
[IDNA2008-Tables], but validity can be derived from the Unicode [RFC3454]. The current set of documents, IDNA2008, are not dependent
properties of those characters with a very few exceptions. on any of the IDNA2003 specifications other than the one for Punycode
encoding. References to "these specifications" or "these documents"
are to the entire IDNA2008 set listed in [IDNA2008-Defs]. The
characters that are valid in A-labels are identified from rules
listed in the Tables document [IDNA2008-Tables], but validity can be
derived from the Unicode properties of those characters with a very
few exceptions.
Traditionally, DNS labels are case-insensitive [RFC1034][RFC1035]. Traditionally, DNS labels are matched case-insensitively
That pattern was preserved in IDNA2003, but if case rules are [RFC1034][RFC1035]. That convention was preserved in IDNA2003 by a
enforced from one language, another language sometimes loses the case-folding operation that generally maps capital letters into
ability to treat two characters separately. Case-sensitivity is lower-case ones. However, if case rules are enforced from one
treated slightly differently in IDNA2008. language, another language sometimes loses the ability to treat two
characters separately. Case-sensitivity is treated slightly
differently in IDNA2008.
IDNA2003 used Unicode version 3.2 only. In order to keep up with new IDNA2003 used Unicode version 3.2 only. In order to keep up with new
characters added in new versions of Unicode, IDNA2008 decouples its characters added in new versions of Unicode, IDNA2008 decouples its
rules from any particular version of Unicode. Instead, the rules from any particular version of Unicode. Instead, the
attributes of new characters in Unicode determines how and whether attributes of new characters in Unicode, supplemented by a small
the characters can be used in IDNA labels. number of exception cases, determine how and whether the characters
can be used in IDNA labels.
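To show how a character's Unicode attributes, plus a short exception list, can determine its status, the Python sketch below derives a property for one code point. It is a simplified subset of the derivation in [IDNA2008-Tables]: it covers only the exceptions, unassigned, LDH, stability, and letter/digit tests, and omits the join-control, ignorable, old Hangul jamo, and backward-compatibility rules; only two exception entries are shown. Because it relies on Python's unicodedata module, its results track whatever Unicode version the interpreter ships with, which mirrors in a small way the version independence described above.

```python
import unicodedata

# Simplified subset of the [IDNA2008-Tables] derivation (illustrative only).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")
# Two of the entries in the Tables exception list.
EXCEPTIONS = {0x00DF: "PVALID", 0x03C2: "PVALID"}


def is_noncharacter(cp: int) -> bool:
    # Noncharacters have category Cn but are not "unassigned".
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unstable(ch: str) -> bool:
    # The character changes under NFKC, case folding, and NFKC again.
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def derive_property(ch: str) -> str:
    cp = ord(ch)
    category = unicodedata.category(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if category == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if ch in LDH:
        return "PVALID"
    if is_unstable(ch):
        return "DISALLOWED"
    if category in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


for c in ["a", "A", "\u00fc", "\u00df", "!", "\uff21"]:
    print(f"U+{ord(c):04X} {derive_property(c)}")
```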
This document provides informational context for IDNA2008, including This document provides informational context for IDNA2008, including
terminology, background, and policy discussions. terminology, background, and policy discussions.
1.2. Discussion Forum 1.2. Discussion Forum
[[ RFC Editor: please remove this section. ]] [[ RFC Editor: please remove this section. ]]
IDNA2008 is being discussed in the IETF "idnabis" Working Group and IDNA2008 is being discussed in the IETF "idnabis" Working Group and
on the mailing list idna-update@alvestrand.no on the mailing list idna-update@alvestrand.no
1.3. Terminology 1.3. Terminology
Terminology for IDNA2008 appears in [IDNA2008-Defs]. That document Terminology for IDNA2008 appears in [IDNA2008-Defs]. That document
also contains a roadmap to the IDNA2008 document collection. No also contains a roadmap to the IDNA2008 document collection. No
attempt should be made to understand this document without the attempt should be made to understand this document without the
definitions and concepts that appear there. definitions and concepts that appear there.
1.3.1. Documents and Standards 1.3.1. DNS "Name" Terminology
This document uses the term "IDNA2003" to refer to the set of
standards published in 2003 to define IDNA: the IDNA base
specification [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and
Stringprep [RFC3454].
The term "IDNA2008" is used to refer to a new version of IDNA.
IDNA2008 is not dependent on any of the IDNA2003 specifications other
than the one for Punycode encoding. References to "these
specifications" or "these documents" are to the entire IDNA2008 set
listed in [IDNA2008-Defs].
1.3.2. DNS "Name" Terminology
In the context of IDNs, the DNS term 'name' has introduced some In the context of IDNs, the DNS term "name" has introduced some
confusion as people speak of DNS labels in terms of the words or confusion as people speak of DNS labels in terms of the words or
phrases of various natural languages. Historically, many of the phrases of various natural languages. Historically, many of the
"names" in the DNS have been mnemonics to identify some particular "names" in the DNS have been mnemonics to identify some particular
concept, object, or organization. They are typically rooted in some concept, object, or organization. They are typically rooted in some
language because most people think in language-based ways. But, language because most people think in language-based ways. But,
because they are mnemonics, they need not obey the orthographic because they are mnemonics, they need not obey the orthographic
conventions of any language: it is not a requirement that it be conventions of any language: it is not a requirement that it be
possible for them to be "words". possible for them to be "words".
This distinction is important because the reasonable goal of an IDN This distinction is important because the reasonable goal of an IDN
effort is not to be able to write the great Klingon (or language of effort is not to be able to write the great Klingon (or language of
one's choice) novel in DNS labels but to be able to form a usefully one's choice) novel in DNS labels but to be able to form a usefully
broad range of mnemonics in ways that are as natural as possible in a broad range of mnemonics in ways that are as natural as possible in a
very broad range of scripts. very broad range of scripts.
1.3.3. New Terminology and Restrictions 1.3.2. New Terminology and Restrictions
These documents introduce new terminology, and precise definitions, These documents introduce new terminology, and precise definitions
for the terms "U-label", "A-label", LDH-label (to which all valid (in [IDNA2008-Defs]), for the terms "U-label", "A-label", LDH-label
pre-IDNA host names conformed), Reserved-LDH-label (R-LDH-label), XN- (to which all valid pre-IDNA host names conformed), Reserved-LDH-
label, Fake-A-label, and Non-Reserved-LDH-label (NR-LDH-label). label (R-LDH-label), XN-label, Fake-A-label, and Non-Reserved-LDH-
label (NR-LDH-label).
In addition, the term "putative label" has been adopted to refer to a In addition, the term "putative label" has been adopted to refer to a
label that may appear to meet certain definitional constraints but label that may appear to meet certain definitional constraints but
has not yet been sufficiently tested for validity. has not yet been sufficiently tested for validity.
These definitions are illustrated in Figure 1 of the Definitions These definitions are also illustrated in Figure 1 of the Definitions
Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and
fourth character positions from the beginning of the label. In IDNA-aware fourth character positions from the beginning of the label. In IDNA-aware
applications, only a subset of these reserved labels is permitted to applications, only a subset of these reserved labels is permitted to
be used, namely the A-label subset. A-labels are a subset of the be used, namely the A-label subset. A-labels are a subset of the
R-LDH-labels that begin with the case-insensitive string "xn--". R-LDH-labels that begin with the case-insensitive string "xn--".
Labels that bear this prefix but which are not otherwise valid fall Labels that bear this prefix but which are not otherwise valid fall
into the "Fake-A-label" category. The non-reserved labels (NR-LDH- into the "Fake-A-label" category. The non-reserved labels (NR-LDH-
labels) are implicitly valid since they do not trigger any labels) are implicitly valid since they do not trigger any
resemblance to IDNA-landr NR-LDH-labels. resemblance to IDNA-landr NR-LDH-labels.
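The label categories above can be approximated by a classifier like the Python sketch below. It is illustrative only: classify is a hypothetical name, it assumes labels are compared case-insensitively, and it stops short of the full IDNA2008 tests (in particular, it does not check that the decoded string is a valid U-label), so its best answer is a "putative A-label" in the sense defined above.

```python
import re

# LDH-label: ASCII letters, digits, hyphens; 1-63 characters; no
# leading or trailing hyphen.
LDH_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def classify(label: str) -> str:
    if not label.isascii():
        return "not an LDH-label"
    lab = label.lower()  # assumption: compare case-insensitively
    if not LDH_RE.fullmatch(lab):
        return "not an LDH-label"
    if lab[2:4] != "--":  # third and fourth character positions
        return "NR-LDH-label"
    if not lab.startswith("xn--"):
        return "R-LDH-label (not an XN-label)"
    try:
        u = lab[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return "Fake-A-label"  # Punycode decoding failed
    # A genuine A-label must decode to non-ASCII text and round-trip
    # exactly; U-label validity itself is NOT checked here.
    round_trip = "xn--" + u.encode("punycode").decode("ascii")
    if u.isascii() or round_trip != lab:
        return "Fake-A-label"
    return "putative A-label"


for name in ["example", "ab--cd", "xn--bcher-kva", "xn--9", "-abc"]:
    print(name, "->", classify(name))
```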
skipping to change at page 7, line 5 skipping to change at page 6, line 40
o to prevent confusion with pre-IDNA coding forms; o to prevent confusion with pre-IDNA coding forms;
o to permit future extensions that would require changing the o to permit future extensions that would require changing the
prefix, no matter how unlikely those might be (see Section 7.4); prefix, no matter how unlikely those might be (see Section 7.4);
and and
o to reduce the opportunities for attacks via the Punycode encoding o to reduce the opportunities for attacks via the Punycode encoding
algorithm itself. algorithm itself.
As with other documents in the IDNA2008 set, this document uses the
term "registry" to describe any zone in the DNS. That term, and the
terms "zone" or "zone administration", are interchangeable.
1.4. Objectives 1.4. Objectives
These are the main objectives in revising IDNA. These are the main objectives in revising IDNA.
o Use a more recent version of Unicode, and allow IDNA to be o Use a more recent version of Unicode, and allow IDNA to be
independent of Unicode versions, so that IDNA2008 need not be independent of Unicode versions, so that IDNA2008 need not be
update for implementations to adopt codepoints from new Unicode updated for implementations to adopt codepoints from new Unicode
versions. versions.
o Fix a very small number of code-point categorizations that have o Fix a very small number of code-point categorizations that have
turned out to cause problems in the communities that use those turned out to cause problems in the communities that use those
code-points. code-points.
o Reduce the dependency on mapping, in order that the pre-mapped o Reduce the dependency on mapping, in order that the pre-mapped
forms (which are not valid IDNA labels) tend to appear less often forms (which are not valid IDNA labels) tend to appear less often
in various contexts, in favor of valid A-labels. in various contexts, in favor of valid A-labels.
skipping to change at page 7, line 49 skipping to change at page 7, line 40
agents. The introduction of the larger repertoire of characters agents. The introduction of the larger repertoire of characters
potentially makes the set of misspellings larger, especially given potentially makes the set of misspellings larger, especially given
that in some cases the same appearance, for example on a business that in some cases the same appearance, for example on a business
card, might visually match several Unicode code points or several card, might visually match several Unicode code points or several
sequences of code points. sequences of code points.
The IDNA standard does not require any applications to conform to it, The IDNA standard does not require any applications to conform to it,
nor does it retroactively change those applications. An application nor does it retroactively change those applications. An application
can elect to use IDNA in order to support IDN while maintaining can elect to use IDNA in order to support IDN while maintaining
interoperability with existing infrastructure. If an application interoperability with existing infrastructure. If an application
wants to use non-ASCII characters in domain names, IDNA is the only wants to use non-ASCII characters in public DNS domain names, IDNA is
currently-defined option. Adding IDNA support to an existing the only currently-defined option. Adding IDNA support to an
application entails changes to the application only, and leaves room existing application entails changes to the application only, and
for flexibility in front-end processing and more specifically in the leaves room for flexibility in front-end processing and more
user interface (see Section 6). specifically in the user interface (see Section 6).
A great deal of the discussion of IDN solutions has focused on A great deal of the discussion of IDN solutions has focused on
transition issues and how IDNs will work in a world where not all of transition issues and how IDNs will work in a world where not all of
the components have been updated. Proposals that were not chosen by the components have been updated. Proposals that were not chosen by
the original IDN Working Group would have depended on updating of the original IDN Working Group would have depended on updating of
user applications, DNS resolvers, and DNS servers in order for a user user applications, DNS resolvers, and DNS servers in order for a user
to apply an internationalized domain name in any form or coding to apply an internationalized domain name in any form or coding
acceptable under that method. While processing must be performed acceptable under that method. While processing must be performed
prior to or after access to the DNS, IDNA requires no changes to the prior to or after access to the DNS, IDNA requires no changes to the
DNS protocol or any DNS servers or the resolvers on users' computers. DNS protocol or any DNS servers or the resolvers on users' computers.
IDNA allows the graceful introduction of IDNs not only by avoiding IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing infrastructure (such as DNS servers and mail upgrades to existing infrastructure (such as DNS servers and mail
transport agents), but also by allowing some rudimentary use of IDNs transport agents), but also by allowing some limited use of IDNs in
in applications by using the ASCII-encoded representation of the applications by using the ASCII-encoded representation of the labels
labels containing non-ASCII characters. While such names are user- containing non-ASCII characters. While such names are user-
unfriendly to read and type, and hence not optimal for user input, unfriendly to read and type, and hence not optimal for user input,
they can be used as a last resort to allow rudimentary IDN usage. they can be used as a last resort to allow rudimentary IDN usage.
For example, they might be the best choice for display if it were For example, they might be the best choice for display if it were
known that relevant fonts were not available on the user's computer. known that relevant fonts were not available on the user's computer.
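The reverse conversion, producing the ASCII-encoded form for display when a U-label cannot be rendered, can be sketched as follows. The helper names and the can_render predicate are hypothetical, and the sketch assumes its input is already a valid U-label; it performs no validation of its own.

```python
from typing import Callable


def to_a_label(u_label: str) -> str:
    # Assumes u_label is already a valid U-label (no checks here).
    if u_label.isascii():
        return u_label  # nothing to encode
    return "xn--" + u_label.encode("punycode").decode("ascii")


def display_form(u_label: str, can_render: Callable[[str], bool]) -> str:
    # Show the Unicode form when it can be rendered; otherwise fall back
    # to the ASCII-encoded form as a last resort.
    return u_label if can_render(u_label) else to_a_label(u_label)


print(display_form("bücher", lambda s: True))   # bücher
print(display_form("bücher", lambda s: False))  # xn--bcher-kva
```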
In order to allow user-friendly input and output of the IDNs and In order to allow user-friendly input and output of the IDNs and
acceptance of some characters as equivalent to those to be processed acceptance of some characters as equivalent to those to be processed
according to the protocol, the applications need to be modified to according to the protocol, the applications need to be modified to
conform to this specification. conform to this specification.
This version of IDNA uses the Unicode character repertoire, for This version of IDNA uses the Unicode character repertoire, for
continuity with the original version of IDNA. continuity with the original version of IDNA.
1.6. Comprehensibility of IDNA Mechanisms and Processing 1.6. Comprehensibility of IDNA Mechanisms and Processing
One goal of IDNA2008, which is aided by the main goal of reducing the One goal of IDNA2008, which is aided by the main goal of reducing the
dependency on mapping, is to improve the general understanding of how dependency on mapping, is to improve the general understanding of how
to users and registrants are important design goals for this effort. IDNA works and what characters are permitted and what happens to
End-user applications have an important role to play in increasing them. Comprehensibility and predictability to users and registrants
this comprehensibility. are important design goals for this effort. End-user applications
have an important role to play in increasing this comprehensibility.
Any system that tries to handle international characters encounters Any system that tries to handle international characters encounters
some common problems. For example, a UI cannot display a character some common problems. For example, a UI cannot display a character
if no font for that character is available. In some cases, if no font for that character is available. In some cases,
internationalization enables effective localization while maintaining internationalization enables effective localization while maintaining
some global uniformity but losing some universality. some global uniformity but losing some universality.
It is difficult to even make suggestions for end-user applications to It is difficult to even make suggestions for end-user applications to
cope when characters and fonts are not available. Because display cope when characters and fonts are not available. Because display
functions are rarely controlled by the types of applications that functions are rarely controlled by the types of applications that
would call upon IDNA, such suggestions will rarely be very effective. would call upon IDNA, such suggestions will rarely be very effective.
Converting between local character sets and normalized Unicode, if Converting between local character sets and normalized Unicode, if
needed, is part of this set of user agent issues. This conversion needed, is part of this set of user agent issues. This conversion
introduces complexity in a system that is not Unicode-native. If a introduces complexity in a system that is not Unicode-native. If a
label is converted to a local character set that does not have all label is converted to a local character set that does not have all
the needed characters, the user agent may have to add special logic the needed characters, or that uses different character-coding
to avoid or reduce loss of information. principles, the user agent may have to add special logic to avoid or
reduce loss of information.
The major difficulty may lie in accurately identifying the incoming The major difficulty may lie in accurately identifying the incoming
character set and applying the correct conversion routine. Even more character set and applying the correct conversion routine. Even more
difficult, the local character coding system could be based on difficult, the local character coding system could be based on
conceptually different assumptions than those used by Unicode (e.g., conceptually different assumptions than those used by Unicode (e.g.,
choice of font encodings used for publications in some Indic choice of font encodings used for publications in some Indic
scripts). Those differences may not easily yield unambiguous scripts). Those differences may not easily yield unambiguous
conversions or interpretations even if each coding system is conversions or interpretations even if each coding system is
internally consistent and adequate to represent the local language internally consistent and adequate to represent the local language
and script. and script.
IDNA2008 shifts responsibility for character mapping and other IDNA2008 shifts responsibility for character mapping and other
adjustments from the protocol (where it was located in IDNA2003) to adjustments from the protocol (where it was located in IDNA2003) to
pre-processing before invoking IDNA. The intent is that this change pre-processing before invoking IDNA itself. The intent is that this
leads to greater usage of fully-valid A-labels in display, transit change will lead to greater usage of fully-valid A-labels or U-labels
and storage, which should aid comprehensibility. A careful look at in display, transit and storage, which should aid comprehensibility
pre-processing raises issues about what that pre-processing should do and predictability. A careful look at pre-processing raises issues
and at what point pre-processing becomes harmful, how universally about what that pre-processing should do and at what point pre-
consistent pre-processing algorithms can be, and how to be compatible processing becomes harmful, how universally consistent pre-processing
with labels prepared in an IDNA2003 context. Those issues are algorithms can be, and how to be compatible with labels prepared in an
discussed in Section 6. [[anchor9: Fix section reference.]] IDNA2003 context. Those issues are discussed in Section 6 and in the
separate document [IDNA2008-Mapping].
2. Processing in IDNA2008 2. Processing in IDNA2008
These specifications separate Domain Name Registration and Lookup in These specifications separate Domain Name Registration and Lookup in
the protocol specification. This separation reflects current the protocol specification. Although most steps in the two processes
practice in which per-registry restrictions and special processing are similar, the separation reflects current practice in which per-
are applied at registration time but not during lookup. Another registry (DNS zone) restrictions and special processing are applied
significant benefit is that separation facilitates incremental at registration time but not during lookup. Another significant
addition of permitted character groups to avoid freezing on one benefit is that separation facilitates incremental addition of
particular version of Unicode. permitted character groups to avoid freezing on one particular
version of Unicode.
The actual registration and lookup protocols for IDNA2008 are The actual registration and lookup protocols for IDNA2008 are
specified in [IDNA2008-Protocol]. specified in [IDNA2008-Protocol].
3. Permitted Characters: An Inclusion List 3. Permitted Characters: An Inclusion List
IDNA2008 adopts the inclusion model. A code-point is assumed to be IDNA2008 adopts the inclusion model. A code-point is assumed to be
invalid, unless it is included as part of a Unicode property-based invalid for IDN use unless it is included as part of a Unicode
rule or in rare cases included individually by an exception. When an property-based rule or, in rare cases, included individually by an
implementation moves to a new version of Unicode, the rules may exception. When an implementation moves to a new version of Unicode,
indicate new valid code-points. the rules may indicate new valid code-points.
This section provides an overview of the model used to establish the This section provides an overview of the model used to establish the
algorithm and character lists of [IDNA2008-Tables] and describes the algorithm and character lists of [IDNA2008-Tables] and describes the
names and applicability of the categories used there. Note that the names and applicability of the categories used there. Note that the
inclusion of a character in the first category group (Section 3.1.1) inclusion of a character in the first category group (Section 3.1.1)
does not imply that it can be used indiscriminately; some characters does not imply that it can be used indiscriminately; some characters
are associated with contextual rules that must be applied as well. are associated with contextual rules that must be applied as well.
The information given in this section is provided to make the rules, The information given in this section is provided to make the rules,
tables, and protocol easier to understand. The normative generating tables, and protocol easier to understand. The normative generating
skipping to change at page 10, line 33 skipping to change at page 10, line 28
list of characters that are permitted in IDNs. In IDNA2003, list of characters that are permitted in IDNs. In IDNA2003,
character validity is independent of context and fixed forever (or character validity is independent of context and fixed forever (or
until the standard is replaced). However, globally context- until the standard is replaced). However, globally context-
independent rules have proved to be impractical because some independent rules have proved to be impractical because some
characters, especially those that are called "Join_Controls" in characters, especially those that are called "Join_Controls" in
Unicode, are needed to make reasonable use of some scripts but have Unicode, are needed to make reasonable use of some scripts but have
no visible effect in others. IDNA2003 prohibited those types of no visible effect in others. IDNA2003 prohibited those types of
characters entirely by discarding them. We now have a consensus that characters entirely by discarding them. We now have a consensus that
under some conditions, these "joiner" characters are legitimately under some conditions, these "joiner" characters are legitimately
needed to allow useful mnemonics for some languages and scripts. In needed to allow useful mnemonics for some languages and scripts. In
general, context-dependent rules help deal with characters that are general, context-dependent rules help deal with characters (generally
used differently across different scripts, and allow the standard to characters that would otherwise be prohibited entirely) that are used
be applied more appropriately in cases where a string is not differently or perceived differently across different scripts, and
universally handled the same way. allow the standard to be applied more appropriately in cases where a
string is not universally handled the same way.
IDNA2008 divides all possible Unicode code-points into four IDNA2008 divides all possible Unicode code-points into four
categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and
UNASSIGNED. UNASSIGNED.
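The inclusion model and the four categories can be approximated in a few lines of Python. This is a simplified, non-normative sketch. The authoritative derivation and exception lists are in [IDNA2008-Tables]. The names `Category` and `classify` are hypothetical, and a case-fold-plus-NFKC comparison stands in for the stability test. The two exceptions shown (U+00DF and U+03C2) are only an illustrative subset. Results also depend on the Unicode version bundled with the local Python (`unicodedata.unidata_version`). Note that any code point not explicitly included falls through to DISALLOWED.

```python
import unicodedata
from enum import Enum


class Category(Enum):
    PVALID = "PROTOCOL-VALID"
    CONTEXTUAL = "CONTEXTUAL RULE REQUIRED"
    DISALLOWED = "DISALLOWED"
    UNASSIGNED = "UNASSIGNED"


# Illustrative subset of per-character exceptions (not the full list).
EXCEPTIONS = {
    "\u00df": Category.PVALID,  # LATIN SMALL LETTER SHARP S
    "\u03c2": Category.PVALID,  # GREEK SMALL LETTER FINAL SIGMA
}
JOINERS = {"\u200c", "\u200d"}  # ZWNJ, ZWJ
# Letters, marks and digits; upper case (Lu) is later excluded by the
# stability test because case folding changes it.
LETTER_DIGIT = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(ch: str) -> Category:
    """Simplified approximation of the category for one code point."""
    if ch in EXCEPTIONS:
        return EXCEPTIONS[ch]
    if unicodedata.category(ch) == "Cn":
        # Unassigned in this Python's Unicode version (the real rules
        # treat noncharacters, also "Cn", separately).
        return Category.UNASSIGNED
    if ch in JOINERS:
        return Category.CONTEXTUAL
    if ch == "-" or "0" <= ch <= "9" or "a" <= ch <= "z":
        return Category.PVALID  # ASCII lower-case letters, digits, hyphen
    # Characters altered by case folding or NFKC are not included.
    if unicodedata.normalize("NFKC", ch.casefold()) != ch:
        return Category.DISALLOWED
    if unicodedata.category(ch) in LETTER_DIGIT:
        return Category.PVALID
    # Inclusion model: everything not explicitly included is DISALLOWED.
    return Category.DISALLOWED
```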
3.1.1. PROTOCOL-VALID 3.1.1. PROTOCOL-VALID
Characters identified as "PROTOCOL-VALID" (often abbreviated Characters identified as "PROTOCOL-VALID" (often abbreviated
"PVALID") are permitted in IDNs. Their use may be restricted by "PVALID") are permitted in IDNs. Their use may be restricted by
rules about the context in which they appear or by other rules that rules about the context in which they appear or by other rules that
skipping to change at page 11, line 23 skipping to change at page 11, line 18
expected to never be removed from it or reclassified. While expected to never be removed from it or reclassified. While
theoretically characters could be removed from Unicode, such removal theoretically characters could be removed from Unicode, such removal
would be inconsistent with the Unicode stability principles (see would be inconsistent with the Unicode stability principles (see
[Unicode51], Appendix F) and hence should never occur. [Unicode51], Appendix F) and hence should never occur.
3.1.2. CONTEXTUAL RULE REQUIRED 3.1.2. CONTEXTUAL RULE REQUIRED
Some characters may be unsuitable for general use in IDNs but Some characters may be unsuitable for general use in IDNs but
necessary for the plausible support of some scripts. The two most necessary for the plausible support of some scripts. The two most
commonly-cited examples are the zero-width joiner and non-joiner commonly-cited examples are the zero-width joiner and non-joiner
characters (ZWJ, U+200D and ZWNJ, U+200C). characters (ZWJ, U+200D and ZWNJ, U+200C) but other characters may
require special treatment because they would otherwise be DISALLOWED
(typically because Unicode considers them punctuation or special
symbols) but need to be permitted in limited contexts. Other
characters are given this special treatment because they pose
exceptional danger of being used to produce misleading labels or to
cause unacceptable ambiguity in label matching and interpretation.
3.1.2.1. Contextual Restrictions 3.1.2.1. Contextual Restrictions
Characters with contextual restrictions are identified as "CONTEXTUAL Characters with contextual restrictions are identified as "CONTEXTUAL
RULE REQUIRED" and associated with a rule. The rule defines whether RULE REQUIRED" and associated with a rule. The rule defines whether
the character is valid in a particular string, and also whether the the character is valid in a particular string, and also whether the
rule itself is to be applied on lookup as well as registration. rule itself is to be applied on lookup as well as registration.
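As an illustration of what such a rule can look like, the Python sketch below checks a ZWJ against a rule of the form "must immediately follow a character whose Canonical_Combining_Class is Virama (9)". That form is assumed here for illustration; the actual rules, and whether each one applies at lookup, are defined in [IDNA2008-Tables]. The function name `zwj_allowed` is hypothetical. The ZWNJ rule also involves joining-type conditions that are not shown.

```python
import unicodedata

VIRAMA_CCC = 9  # Canonical_Combining_Class value for Virama


def zwj_allowed(label: str, index: int) -> bool:
    """Sketch of a joiner rule: the ZWJ at `index` must follow a virama."""
    if label[index] != "\u200d":
        raise ValueError("character at index is not ZERO WIDTH JOINER")
    if index == 0:
        return False  # nothing precedes the joiner
    return unicodedata.combining(label[index - 1]) == VIRAMA_CCC


# Devanagari KA + VIRAMA + ZWJ + SSA: the joiner follows a virama.
print(zwj_allowed("\u0915\u094d\u200d\u0937", 2))  # True
print(zwj_allowed("a\u200db", 1))                  # False
```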
A distinction is made between characters that indicate or prohibit A distinction is made between characters that indicate or prohibit
joining and ones similar to them (known as "CONTEXT-JOINER" or joining and ones similar to them (known as "CONTEXT-JOINER" or
skipping to change at page 11, line 48 skipping to change at page 11, line 49
It is important to note that these contextual rules cannot prevent It is important to note that these contextual rules cannot prevent
all uses of the relevant characters that might be confusing or all uses of the relevant characters that might be confusing or
problematic. What they are expected to do is to confine applicability problematic. What they are expected to do is to confine applicability
of the characters to scripts (and narrower contexts) where zone of the characters to scripts (and narrower contexts) where zone
administrators are knowledgeable enough about the use of those administrators are knowledgeable enough about the use of those
characters to be prepared to deal with them appropriately. For characters to be prepared to deal with them appropriately. For
example, a registry dealing with an Indic script that requires ZWJ example, a registry dealing with an Indic script that requires ZWJ
and/or ZWNJ as part of the writing system is expected to understand and/or ZWNJ as part of the writing system is expected to understand
where the characters have visible effect and where they do not and to where the characters have visible effect and where they do not and to
make registration rules accordingly. By contrast, a registry dealing make registration rules accordingly. By contrast, a registry dealing
with Latin or Cyrillic script might not be actively aware that the primarily with Latin or Cyrillic script might not be actively aware
characters exist, much less about the consequences of embedding them that the characters exist, much less about the consequences of
in labels drawn from those scripts. embedding them in labels drawn from those scripts.
3.1.2.2. Rules and Their Application 3.1.2.2. Rules and Their Application
Rules have descriptions such as "Must follow a character from Script Rules have descriptions such as "Must follow a character from Script
XYZ", "Must occur only if the entire label is in Script ABC", or XYZ", "Must occur only if the entire label is in Script ABC", or
"Must occur only if the previous and subsequent characters have the "Must occur only if the previous and subsequent characters have the
DFG property". The actual rules may be DEFINED or NULL. If present, DFG property". The actual rules may be DEFINED or NULL. If present,
they may have values of "True" (character may be used in any position they may have values of "True" (character may be used in any position
in any label), "False" (character may not be used in any label), or in any label), "False" (character may not be used in any label), or
may be a set of procedural rules that specify the context in which may be a set of procedural rules that specify the context in which
skipping to change at page 12, line 30 skipping to change at page 12, line 30
Because it is easier to identify these characters than to know that Because it is easier to identify these characters than to know that
they are actually needed in IDNs or how to establish exactly the they are actually needed in IDNs or how to establish exactly the
right rules for each one, a rule may have a null value in a given right rules for each one, a rule may have a null value in a given
version of the tables. Characters associated with null rules are not version of the tables. Characters associated with null rules are not
permitted to appear in putative labels for either registration or permitted to appear in putative labels for either registration or
lookup. Of course, a later version of the tables might contain a lookup. Of course, a later version of the tables might contain a
non-null rule. non-null rule.
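The treatment of null rules can be sketched as a small dispatcher. Each contextual character maps to a rule function, and a value of `None` stands for a null rule, which rejects the label for both registration and lookup. The function name `passes_contextual_rules` and the `demo_rules` table are hypothetical. The entries in `demo_rules` are for illustration only and do not reflect the real tables.

```python
from typing import Callable, Dict, Optional

# A contextual rule receives the whole label and the character's index.
Rule = Callable[[str, int], bool]


def passes_contextual_rules(label: str,
                            rules: Dict[str, Optional[Rule]]) -> bool:
    """Check every contextual character in `label` against its rule.

    A rule value of None is a null rule: the character is not yet
    permitted, so the label is rejected.
    """
    for i, ch in enumerate(label):
        if ch not in rules:
            continue  # not contextual; other checks apply elsewhere
        rule = rules[ch]
        if rule is None or not rule(label, i):
            return False
    return True


# Hypothetical table: one defined rule and one null rule.
demo_rules: Dict[str, Optional[Rule]] = {
    "\u200d": lambda s, i: i > 0 and s[i - 1] == "\u094d",
    "\u200c": None,  # hypothetical null rule, illustration only
}
print(passes_contextual_rules("\u0915\u094d\u200d\u0937", demo_rules))  # True
print(passes_contextual_rules("ab\u200ccd", demo_rules))               # False
```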
The actual rules and their descriptions are in [IDNA2008-Tables]. The actual rules and their descriptions are in [IDNA2008-Tables].
[[anchor12: ??? Section number would be good here.]] That document [[anchor9: ??? Section number would be good here.]] That document
also creates a registry for future rules. also specifies the creation of a registry for future rules.
3.1.3. DISALLOWED 3.1.3. DISALLOWED
Some characters are inappropriate for use in IDNs and are thus Some characters are inappropriate for use in IDNs and are thus
excluded for both registration and lookup (i.e., IDNA-conforming excluded for both registration and lookup (i.e., IDNA-conforming
applications performing name lookup should verify that these applications performing name lookup should verify that these
characters are absent; if they are present, the label strings should characters are absent; if they are present, the label strings should
be rejected rather than converted to A-labels and looked up). Some of be rejected rather than converted to A-labels and looked up). Some of
these characters are problematic for use in IDNs (such as the these characters are problematic for use in IDNs (such as the
FRACTION SLASH character, U+2044), while some of them (such as the FRACTION SLASH character, U+2044), while some of them (such as the
skipping to change at page 13, line 37 skipping to change at page 13, line 37
For convenience in processing and table-building, code points that do For convenience in processing and table-building, code points that do
not have assigned values in a given version of Unicode are treated as not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points are belonging to a special UNASSIGNED category. Such code points are
prohibited in labels to be registered or looked up. The category prohibited in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories (at which point, they are classified into one of the other categories
as appropriate). as appropriate).
The rationale for restricting the processing of UNASSIGNED characters The rationale for restricting the processing of UNASSIGNED characters
is simply that if such characters were permitted to be looked up, for is simply that the properties of such code points cannot be
example, and were later assigned, but subject to some set of completely known until actual characters are assigned to them. If,
contextual rules, un-updated instances of IDNA-aware software might for example, such a code point was permitted to be included in a
permit lookup of labels containing the previously-unassigned label to be looked up, and the code point was later to be assigned to
characters while updated versions of IDNA-aware software might a character that required some set of contextual rules, un-updated
restrict their use in lookup, depending on the contextual rules. It instances of IDNA-aware software might permit lookup of labels
should be clear that under no circumstance should an UNASSIGNED containing the previously-unassigned characters while updated
character be permitted in a label to be registered as part of a versions of IDNA-aware software might restrict their use in lookup,
domain name. depending on the contextual rules. It should be clear that under no
circumstance should an UNASSIGNED character be permitted in a label
to be registered as part of a domain name.
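The version dependence described above is easy to observe in practice. Python's `unicodedata` module reflects one specific Unicode version (`unicodedata.unidata_version`), so "unassigned" is always relative to that version. The sketch below rejects a label at lookup time, rather than converting it, if it contains code points with general category `Cn`. The helper names `find_unassigned` and `check_for_lookup` are hypothetical. Noncharacters are also `Cn`; the real tables classify them differently, but here they are rejected either way.

```python
import unicodedata
from typing import List


def find_unassigned(label: str) -> List[str]:
    """Return the code points in `label` that are unassigned in the
    Unicode version bundled with this Python."""
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cn"]


def check_for_lookup(label: str) -> str:
    """Reject, rather than look up, a label with unassigned code points."""
    bad = find_unassigned(label)
    if bad:
        raise ValueError(f"unassigned code point(s) {', '.join(bad)} "
                         f"under Unicode {unicodedata.unidata_version}")
    return label
```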
3.2. Registration Policy 3.2. Registration Policy
While these recommendations cannot and should not define registry While these recommendations cannot and should not define registry
policies, registries should develop and apply additional restrictions policies, registries should develop and apply additional restrictions
as needed to reduce confusion and other problems. For example, it is as needed to reduce confusion and other problems. For example, it is
generally believed that labels containing characters from more than generally believed that labels containing characters from more than
one script are a bad practice although there may be some important one script are a bad practice although there may be some important
exceptions to that principle. Some registries may choose to restrict exceptions to that principle. Some registries may choose to restrict
registrations to characters drawn from a very small number of registrations to characters drawn from a very small number of
skipping to change at page 14, line 24 skipping to change at page 14, line 30
from scripts that are well-understood by the registry or its from scripts that are well-understood by the registry or its
advisers. If a registry decides to reduce opportunities for advisers. If a registry decides to reduce opportunities for
confusion by constructing policies that disallow characters used in confusion by constructing policies that disallow characters used in
historic writing systems or characters whose use is restricted to historic writing systems or characters whose use is restricted to
specialized, highly technical contexts, some relevant information may specialized, highly technical contexts, some relevant information may
be found in Section 2.4 "Specific Character Adjustments", Table 4 be found in Section 2.4 "Specific Character Adjustments", Table 4
"Candidate Characters for Exclusion from Identifiers" of "Candidate Characters for Exclusion from Identifiers" of
[Unicode-UAX31] and Section 3.1. "General Security Profile for [Unicode-UAX31] and Section 3.1. "General Security Profile for
Identifiers" in [Unicode-Security]. Identifiers" in [Unicode-Security].
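A registry that chooses a single-script policy might start from a check like the one below. This is a crude, hypothetical sketch. Python's `unicodedata` does not expose the Unicode Script property, so the sketch guesses each letter's script from the first word of its character name (for example "LATIN" or "CYRILLIC"). A production policy would use real Script data. A naive check like this would also reject legitimate exceptions, such as Japanese labels that mix Han, Hiragana, and Katakana. The names `scripts_used` and `single_script_policy` are hypothetical.

```python
import unicodedata
from typing import Set


def scripts_used(label: str) -> Set[str]:
    """Crude script guess: first word of each letter's Unicode name.

    Approximation only; digits and hyphens are treated as neutral.
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def single_script_policy(label: str) -> bool:
    """Hypothetical registry policy: accept only single-script labels."""
    return len(scripts_used(label)) <= 1


# U+0430 is CYRILLIC SMALL LETTER A mixed into Latin letters.
print(single_script_policy("p\u0430ypal"))  # False
```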
The requirement (in [IDNA2008-Protocol] [[anchor10: ?? Section
number]]) that registration procedures use only U-labels and/or
A-labels is intended to ensure that registrants are fully aware of
exactly what is being registered as well as encouraging use of those
canonical forms. That provision should not be interpreted as
requiring that registrants provide characters in a particular
code sequence. Registrant input conventions and management are part
of registrant-registrar interactions and relationships between
registries and registrars and are outside the scope of these
standards.
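One way a registration system might show a registrant exactly what is being registered is to display the A-label and U-label together after confirming that they correspond. The Python sketch below decodes an A-label with the standard `punycode` codec, checks that re-encoding reproduces it, and checks that the result is in NFC. It is a hypothetical helper (`u_label_from_a_label`), not the procedure in [IDNA2008-Protocol], and it omits the table and contextual-rule checks.

```python
import unicodedata


def u_label_from_a_label(a_label: str) -> str:
    """Decode an A-label and confirm it round-trips (sketch only)."""
    a_label = a_label.lower()
    if not a_label.startswith("xn--"):
        raise ValueError("not an A-label")
    u_label = a_label[4:].encode("ascii").decode("punycode")
    # The decoded form must re-encode to exactly the same A-label.
    if "xn--" + u_label.encode("punycode").decode("ascii") != a_label:
        raise ValueError("A-label does not round-trip")
    if unicodedata.normalize("NFC", u_label) != u_label:
        raise ValueError("decoded label is not in NFC")
    return u_label


print(u_label_from_a_label("xn--bcher-kva"))  # bücher
```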
It is worth stressing that these principles of policy development and It is worth stressing that these principles of policy development and
application apply at all levels of the DNS, not only, e.g., TLD or application apply at all levels of the DNS, not only, e.g., TLD or
SLD registrations and that even a trivial, "anything permitted that SLD registrations and that even a trivial, "anything permitted that
is valid under the protocol" policy is helpful in that it helps users is valid under the protocol" policy is helpful in that it helps users
and application developers know what to expect. and application developers know what to expect.
3.3. Layered Restrictions: Tables, Context, Registration, Applications 3.3. Layered Restrictions: Tables, Context, Registration, Applications
The character rules in IDNA2008 are based on the realization that The character rules in IDNA2008 are based on the realization that
there is no single magic bullet for any of the issues associated with there is no single magic bullet for any of the security,
IDNs. Instead, the specifications define a variety of approaches. confusability, or other issues associated with IDNs. Instead, the
The character tables are the first mechanism, protocol rules about specifications define a variety of approaches. The character tables
how those characters are applied or restricted in context are the are the first mechanism, protocol rules about how those characters
second, and those two in combination constitute the limits of what are applied or restricted in context are the second, and those two in
can be done in the protocol. As discussed in the previous section combination constitute the limits of what can be done in the
(Section 3.2), registries are expected to restrict what they permit protocol. As discussed in the previous section (Section 3.2),
to be registered, devising and using rules that are designed to registries are expected to restrict what they permit to be
optimize the balance between confusion and risk on the one hand and registered, devising and using rules that are designed to optimize
maximum expressiveness in mnemonics on the other. the balance between confusion and risk on the one hand and maximum
expressiveness in mnemonics on the other.
In addition, there is an important role for user agents in warning In addition, there is an important role for user agents in warning
against label forms that appear problematic given their knowledge of against label forms that appear problematic given their knowledge of
local contexts and conventions. Of course, no approach based on local contexts and conventions. Of course, no approach based on
naming or identifiers alone can protect against all threats. naming or identifiers alone can protect against all threats.
4. Issues that Constrain Possible Solutions 4. Issues that Constrain Possible Solutions
4.1. Display and Network Order 4.1. Display and Network Order
skipping to change at page 16, line 9 skipping to change at page 16, line 24
If each implementation of each application makes its own decisions on If each implementation of each application makes its own decisions on
these issues, users will develop heuristics that will sometimes fail these issues, users will develop heuristics that will sometimes fail
when switching applications. However, while some display order when switching applications. However, while some display order
conventions, voluntarily adopted, would be desirable to reduce conventions, voluntarily adopted, would be desirable to reduce
confusion, such suggestions are beyond the scope of these confusion, such suggestions are beyond the scope of these
specifications. specifications.
4.2. Entry and Display in Applications 4.2. Entry and Display in Applications
Applications can accept and display domain names using any character Applications can accept and display domain names using any character
set or character coding system. That is, the IDNA protocol does not set or character coding system. The IDNA protocol does not
necessarily affect the interface between users and applications. An necessarily affect the interface between users and applications. An
IDNA-aware application can accept and display internationalized IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s) domain names in two formats: the internationalized character set(s)
supported by the application (i.e., an appropriate local supported by the application (i.e., an appropriate local
representation of a U-label), and as an A-label. Applications may representation of a U-label), and as an A-label. Applications may
allow the display of A-labels, but are encouraged to not do so except allow the display of A-labels, but are encouraged to not do so except
as an interface for special purposes, possibly for debugging, or to as an interface for special purposes, possibly for debugging, or to
cope with display limitations. In general, they should allow, but cope with display limitations. In general, they should allow, but
not encourage, user input of A-labels. A-labels are opaque and ugly not encourage, user input of A-labels. A-labels are opaque, ugly,
and malicious variations on them are not easily detected by users. and malicious variations on them are not easily detected by users.
Where possible, they should thus only be exposed when they are Where possible, they should thus only be exposed when they are
absolutely needed. Because IDN labels can be rendered either as absolutely needed. Because IDN labels can be rendered either as
A-labels or U-labels, the application may reasonably have an option A-labels or U-labels, the application may reasonably have an option
for the user to select the preferred method of display. Rendering for the user to select the preferred method of display. Rendering
the U-label should normally be the default. the U-label should normally be the default.
Domain names are often stored and transported in many places. For Domain names are often stored and transported in many places. For
example, they are part of documents such as mail messages and web example, they are part of documents such as mail messages and web
pages. They are transported in many parts of many protocols, such as pages. They are transported in many parts of many protocols, such as
both the control commands of SMTP and associated the message body both the control commands of SMTP and associated message body parts,
parts, and in the headers and the body content in HTTP. It is and in the headers and the body content in HTTP. It is important to
important to remember that domain names appear both in domain name remember that domain names appear both in domain name slots and in
slots and in the content that is passed over protocols. the content that is passed over protocols.
In protocols and document formats that define how to handle In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels must protocol or document format only allows one charset, the labels must
be given in that charset. Of course, not all charsets can properly be given in that charset. Of course, not all charsets can properly
represent all labels. If a U-label cannot be displayed in its represent all labels. If a U-label cannot be displayed in its
entirety, the only choice (without loss of information) may be to entirety, the only choice (without loss of information) may be to
display the A-label. display the A-label.
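The display choice described above can be sketched as follows. The function prefers the U-label and falls back to the A-label only when the target charset cannot represent the U-label without loss. The name `display_form` and the use of Python codec names as "charsets" are assumptions made for this illustration.

```python
def display_form(u_label: str, charset: str) -> str:
    """Prefer the U-label; fall back to the A-label when `charset`
    cannot represent the U-label without loss of information."""
    try:
        u_label.encode(charset)  # strict: raises if any character is lost
        return u_label
    except UnicodeEncodeError:
        return "xn--" + u_label.encode("punycode").decode("ascii")


print(display_form("bücher", "latin-1"))  # bücher
print(display_form("bücher", "ascii"))    # xn--bcher-kva
```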
Where a protocol or document format allows IDNs, labels should be in Where a protocol or document format allows IDNs, labels should be in
whatever character encoding and escape mechanism the protocol or whatever character encoding and escape mechanism the protocol or
document format uses at that place. This provision is intended to document format uses at that place. This provision is intended to
prevent situations in which, e.g., UTF-8 domain names appear embedded prevent situations in which, e.g., UTF-8 domain names appear embedded
in text that is otherwise in some other character coding. in text that is otherwise in some other character coding.
< All protocols that use domain name slots (See Section 2.3.1.6
< [[anchor15: ?? Verify this]] in [IDNA2008-Defs]) already have the
< capacity for handling domain names in the ASCII charset. Thus,
< A-labels can inherently be handled by those protocols.
> All protocols that use domain name slots (See Section 2.3.1.6 in
> [IDNA2008-Defs]) already have the capacity for handling domain names
> in the ASCII charset. Thus, A-labels can inherently be handled by
> those protocols.
These documents do not specify required mappings between one
character or code point and others. An extended discussion of
mapping issues occurs in Section 6 and specific recommendations
appear in [IDNA2008-Mapping]. In general, IDNA2008 prohibits
characters that would be mapped to others by normalization or other
rules. As examples, while mathematical characters based on Latin
ones are accepted as input to IDNA2003, they are prohibited in
IDNA2008. Similarly, upper-case characters, double-width characters,
and other variations are prohibited as IDNA input although mapping
them as needed in user interfaces is strongly encouraged.
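Very roughly, the prohibition can be approximated by asking whether compatibility normalization or case folding would change a code point. The Python sketch below uses only the standard unicodedata module and str.casefold(); the helper name is hypothetical, and the test is a simplification, not the rule set defined in [IDNA2008-Tables] (among other things, it ignores that document's exceptions, such as U+00DF and U+03C2).

```python
import unicodedata


def is_mapping_stable(ch: str) -> bool:
    """Rough approximation: True if NFKC followed by case folding leaves
    `ch` unchanged. Hypothetical helper; NOT the IDNA2008 derived-property
    algorithm, and it ignores that algorithm's exception list."""
    return unicodedata.normalize("NFKC", ch).casefold() == ch


# LATIN SMALL A, LATIN CAPITAL A, FULLWIDTH a, MATHEMATICAL BOLD CAPITAL A,
# and LATIN SMALL LETTER U WITH DIAERESIS.
for ch in ["a", "A", "\uff41", "\U0001d400", "\u00fc"]:
    status = "unchanged" if is_mapping_stable(ch) else "mapped"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```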
Since the rules in [IDNA2008-Tables] have the effect that only
strings that are not transformed by NFKC are valid, if an application
chooses to perform NFKC normalization before lookup, that operation
is safe since this will never make the application unable to look up
any valid string. However, as discussed above, the application
cannot guarantee that any other application will perform that
mapping, so it should be used only with caution and for informed
users.
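A short Python illustration of why such preprocessing cannot hide a valid name, assuming the local step is plain NFKC via unicodedata.normalize() (the function name is hypothetical): a string that is already NFKC-stable comes back unchanged, and only strings that could not have been valid are rewritten.

```python
import unicodedata


def preprocess_for_lookup(name: str) -> str:
    """Hypothetical local step: NFKC-normalize a name before IDNA lookup."""
    return unicodedata.normalize("NFKC", name)


valid = "b\u00fccher"             # "bücher": already NFKC-stable
fullwidth = "\uff42\u00fccher"    # begins with U+FF42 FULLWIDTH LATIN SMALL LETTER B

print(preprocess_for_lookup(valid) == valid)  # True: a valid string is not altered
print(preprocess_for_lookup(fullwidth))       # bücher
```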
In many cases these prohibitions should have no effect on what the
user can type as input to the lookup process. It is perfectly
reasonable for systems that support user interfaces to perform some
character mapping that is appropriate to the local environment. This
would normally be done prior to actual invocation of IDNA. At least
conceptually, the mapping would be part of the Unicode conversions
discussed above and in [IDNA2008-Protocol]. However, those changes
will be local ones only -- local to environments in which users will
clearly understand that the character forms are equivalent. For use
in interchange among systems, it appears to be much more important
that U-labels and A-labels can be mapped back and forth without loss
of information.
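The division of labor described above can be sketched in Python as follows. The hypothetical local_ui_mapping() stands in for whatever locally appropriate mapping a user interface chooses (lower-casing plus NFKC here, an arbitrary choice). The hypothetical u_to_a()/a_to_u() pair, built on Python's standard "punycode" codec, shows that the interchange forms convert back and forth without loss. No IDNA2008 validity checks are performed.

```python
import unicodedata


def local_ui_mapping(user_input: str) -> str:
    """Hypothetical, locally chosen UI mapping applied before IDNA proper."""
    return unicodedata.normalize("NFKC", user_input.lower())


def u_to_a(u_label: str) -> str:
    """Hypothetical helper: U-label -> A-label ("xn--" + Punycode).
    Performs no IDNA2008 validity checks; ASCII labels pass through."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def a_to_u(a_label: str) -> str:
    """Hypothetical helper: A-label -> U-label, the inverse of u_to_a()."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


typed = "B\u00dcCHER"              # what the user typed: "BÜCHER"
u_label = local_ui_mapping(typed)  # local mapping only: "bücher"
a_label = u_to_a(u_label)          # form used in interchange
print(u_label, a_label)            # bücher xn--bcher-kva
print(a_to_u(a_label) == u_label)  # True: no information lost
```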
One specific, and very important, instance of this strategy arises
with case-folding. In the ASCII-only DNS, names are looked up and
matched in a case-independent way, but no actual case-folding occurs.
Names can be placed in the DNS in either upper or lower case form (or
any mixture of them) and that form is preserved, returned in queries,
and so on. IDNA2003 approximated that behavior for non-ASCII strings
by performing case-folding at registration time (resulting in only
lower-case IDNs in the DNS) and when names were looked up.
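The ASCII behavior can be modeled with a toy, case-preserving store in Python. Only A-Z are folded, following the DNS rule for ASCII letters (see RFC 4343, "Domain Name System (DNS) Case Insensitivity Clarification"); the class and function names are hypothetical.

```python
import string
from typing import Dict, Optional

# DNS matching ignores case for ASCII letters only; in this sketch every
# other character is compared exactly.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def dns_fold(name: str) -> str:
    return name.translate(_ASCII_FOLD)


class ToyZone:
    """Hypothetical in-memory zone: case-insensitive match, case-preserving data."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def add(self, name: str) -> None:
        self._names[dns_fold(name)] = name  # stored exactly as given

    def lookup(self, query: str) -> Optional[str]:
        return self._names.get(dns_fold(query))


zone = ToyZone()
zone.add("MyExample.COM")
print(zone.lookup("myexample.com"))    # MyExample.COM: original case returned
print(dns_fold("\u00dc") == "\u00dc")  # True: non-ASCII letters are not folded
```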
As suggested earlier in this section, it appears to be desirable to
do as little character mapping as possible, as long as Unicode works
correctly, in order to make the mapping between A-labels and U-labels
idempotent. (NFC mapping to resolve different codings for the same
character is still necessary, although the specifications require
that it be performed before the protocol is invoked.) Case-mapping is
not an exception to this principle. If only lower-case characters
can be registered in the DNS (i.e., be present in a U-label), then
IDNA2008 should prohibit upper-case characters as input, even though
user interfaces to applications should probably map those characters.
Some other considerations reinforce this conclusion. For example, in
ASCII case-mapping for individual characters, uppercase(character)
must be equal to uppercase(lowercase(character)). That may not be
true with IDNs. In some scripts that use case distinctions, there
are a few characters that do not have counterparts in one case or the
other. The relationship between upper case and lower case may even
be language-dependent, with different languages (or even the same
language in different areas) expecting different mappings. User
agents can meet the expectations of users who are accustomed to the
case-insensitive DNS environment by performing case folding prior to
IDNA processing, but the IDNA procedures themselves should neither
require such mapping nor expect it when it is not natural to the
localized environment.
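The ASCII identity, and its failure elsewhere, can be checked with Python's built-in str.upper() and str.lower(), which apply the default, locale-independent Unicode case mappings. Language-dependent behavior, such as the Turkish dotted and dotless "i", is therefore not visible in this sketch; the helper name is hypothetical.

```python
def upper_roundtrip_holds(ch: str) -> bool:
    """Check the ASCII-era identity uppercase(c) == uppercase(lowercase(c))."""
    return ch.upper() == ch.lower().upper()


# The identity holds for every ASCII character...
print(all(upper_roundtrip_holds(chr(cp)) for cp in range(128)))  # True

# ...but not for some non-ASCII characters: U+212A KELVIN SIGN lower-cases
# to "k" (whose upper case is U+004B), and U+1E9E LATIN CAPITAL LETTER
# SHARP S lower-cases to U+00DF (whose upper case is "SS").
for ch in ["\u212a", "\u1e9e"]:
    print(f"U+{ord(ch):04X}: upper={ch.upper()!r} "
          f"upper(lower)={ch.lower().upper()!r}")
```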
4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate
Character Forms
Users have expectations about character matching or equivalence that
are based on their own languages and the orthography of those
languages. These expectations may not always be met in a global
system, especially if multiple languages are written using the same
script but using different conventions. Some examples:
skipping to change at page 17, line 44 skipping to change at page 19, line 29
appear consecutively without forming a digraph, as in "tophat".)
Certain digraphs may be indicated typographically by setting the two
characters closer together than they would be if used consecutively
to represent different phonemes. Some digraphs are fully joined as
ligatures. For example, the word "encyclopaedia" is sometimes set
with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph
forms have the same interpretation across all languages that use a
given script, application of Unicode normalization generally resolves
the differences and causes them to match. When they have different
interpretations, matching must utilize other methods, presumably
< chosen at the registry completely optional typographic convenience
< for representing a digraph in one language (as in the above example
< with some spelling conventions), while in another language it is a
< single character that may not always be correctly representable by a
< two-letter sequence (as in the above example with different spelling
< conventions). This can be illustrated by many words in the Norwegian
< language, where the "ae" ligature is the 27th letter of a 29-letter
< extended Latin alphabet. It is equivalent to the 28th letter of the
< Swedish alphabet (also containing 29 letters), U+00E4 LATIN SMALL
< LETTER A WITH DIAERESIS, for which an "ae" cannot be substituted
< according to current orthographic standards.
< That character (U+00E4) is also part of the German alphabet where,
< unlike in the Nordic languages, the two-character sequence "ae" is
< usually treated as a fully acceptable alternate orthography for the
< "umlauted a" character. The inverse is however not true, and those
< two characters cannot necessarily be combined into an "umlauted a".
< This also applies to another German character, the "umlauted o"
< (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example,
< cannot be used for writing the name of the author "Goethe". It is
< also a letter in the Swedish alphabet where, like the "a with
< diaeresis", it cannot be correctly represented as "oe" and in the
< Norwegian alphabet, where it is represented, not as "o with
< diaeresis", but as "slashed o", U+00F8.
> chosen at the registry level, or users must be educated to understand
> that matching will not occur.
> The nature of the problem can be illustrated by many words in the
> Norwegian language, where the "ae" ligature is the 27th letter of a
> 29-letter extended Latin alphabet. It is equivalent to the 28th
> letter of the Swedish alphabet (also containing 29 letters), U+00E4
> LATIN SMALL LETTER A WITH DIAERESIS, for which an "ae" cannot be
> substituted according to current orthographic standards. That
> character (U+00E4) is also part of the German alphabet where, unlike
> in the Nordic languages, the two-character sequence "ae" is usually
> treated as a fully acceptable alternate orthography for the "umlauted
> a" character. The inverse is however not true, and those two
> characters cannot necessarily be combined into an "umlauted a". This
> also applies to another German character, the "umlauted o" (U+00F6
> LATIN SMALL LETTER O WITH DIAERESIS) which, for example, cannot be
> used for writing the name of the author "Goethe". It is also a
> letter in the Swedish alphabet where, like the "a with diaeresis", it
> cannot be correctly represented as "oe" and in the Norwegian
> alphabet, where it is represented, not as "o with diaeresis", but as
> "slashed o", U+00F8.
Some of the ligatures that have explicit code points in Unicode were
given special handling in IDNA2003 and now pose additional problems
in transition. See Section 7.2.
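The difference between the kinds of ligature discussed in this section can be seen with Python's unicodedata module. A ligature with a compatibility decomposition, such as U+FB01 LATIN SMALL LIGATURE FI, is folded by NFKC. U+00E6 has no decomposition, and U+00E4 is never related to "ae" by normalization at all. A minimal sketch:

```python
import unicodedata


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


# U+FB01 has a compatibility decomposition, so NFKC restores "fi".
print(nfkc("\ufb01") == "fi")                              # True

# U+00E6 has no decomposition: normalization will never make
# "encyclopædia" match "encyclopaedia".
print(nfkc("encyclop\u00e6dia") == nfkc("encyclopaedia"))  # False

# Nothing in normalization relates U+00E4 to "ae"; that equivalence is
# language-specific (German), not a property of the script.
print(nfkc("\u00e4") == "ae")                              # False
```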
Additional cases with alphabets written right to left are described
in Section 4.5.
Matching and comparison algorithm selection often requires
information about the language being used, context, or both --
information that is not available to IDNA or the DNS. Consequently,
these specifications make no attempt to treat combined characters in
any special way. A registry that is aware of the language context in
which labels are to be registered, and where that language sometimes
(or always) treats the two-character sequences as equivalent to the
combined form, should give serious consideration to applying a
< "variant" model [RFC3743] [RFC4290], or to prohibiting registration
< of one the forms entirely, to reduce the opportunities for user
> "variant" model [RFC3743][RFC4290], or to prohibiting registration of
> one of the forms entirely, to reduce the opportunities for user
confusion and fraud that would result from the related strings being
registered to different parties.
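A minimal Python sketch of the "variant" idea, assuming a registry operating in a German-language context. Labels are collapsed to a key under a language-specific table, and a label is refused if another party already holds a label with the same key. The table, class, and method names are hypothetical and do not reproduce the table formats of [RFC3743] or [RFC4290].

```python
from typing import Dict

# Hypothetical variant table for a German-language registry context.
GERMAN_VARIANTS = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"}


def variant_key(label: str) -> str:
    """Collapse a label to a canonical key under the variant table."""
    return "".join(GERMAN_VARIANTS.get(ch, ch) for ch in label)


class VariantRegistry:
    """Hypothetical registry keeping each variant bundle with one registrant."""

    def __init__(self) -> None:
        self._bundles: Dict[str, str] = {}  # variant key -> registrant

    def register(self, label: str, registrant: str) -> bool:
        owner = self._bundles.setdefault(variant_key(label), registrant)
        return owner == registrant  # refused if another party holds a variant


registry = VariantRegistry()
print(registry.register("m\u00e4rchen", "alice"))  # True
print(registry.register("maerchen", "bob"))        # False: variant of alice's label
print(registry.register("maerchen", "alice"))      # True: same bundle, same holder
```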
[[anchor16: Placeholder: A discussion of the Arabic digit issue
should go here once it is resolved in some appropriate way.]]
4.4. Case Mapping and Related Issues
In the DNS, ASCII letters are stored with their case preserved.
Matching during the query process is case-independent, but none of
the information that might be represented by choices of case has been
lost. That model has been accidentally helpful because, as people
have created DNS labels by catenating words (or parts of words) to
form labels, case has often been used to distinguish among components
and make the labels more memorable.
skipping to change at page 19, line 22 skipping to change at page 20, line 48
nothing in these specifications fundamentally changes it or could do
so. In IDNA2003, all characters are case-folded and mapped by
clients in a standardized step.
Some characters do not have upper case forms. For example, the
Unicode case folding operation maps Greek Final Form Sigma (U+03C2)
to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF)
to "ss". Neither of these mappings is reversible because the upper
case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII
string. IDNA2008 permits, at the risk of some incompatibility,
< slightly more flexibility in this area by avoid case folding and
> slightly more flexibility in this area by avoiding case folding and
treating these characters as themselves. Approaches to handling
one-way mappings are discussed in Section 7.2.
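The one-way nature of these mappings can be checked with Python's str.casefold(), which implements Unicode full case folding:

```python
final_sigma, eszett = "\u03c2", "\u00df"

print(final_sigma.casefold() == "\u03c3")        # True: final sigma -> medial sigma
print(eszett.casefold() == "ss")                 # True: Eszett -> two ASCII letters

# Neither mapping can be undone with case operations: U+03C3 upper-cases to
# U+03A3, which lower-cases back to U+03C3 (not U+03C2), and "ss" is ASCII.
print("\u03c3".upper() == "\u03a3")              # True
print("\u03c3".upper().lower() == final_sigma)   # False
print("ss".upper().lower() == eszett)            # False
```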
Because IDNA2003 maps Final Sigma and Eszett to other characters, and
the reverse mapping is never possible, that in some sense means that
neither Final Sigma nor Eszett can be represented in an IDNA2003 IDN.
With IDNA2008, both characters can be used in an IDN, and so the
A-label used for lookup for any U-label containing those characters
is now different. See Section 7.1 for a discussion of what kinds of
< changes might require the IDNA prefix to change; this case is clearly
< worth discussing but the WG came to consensus not to make a prefix
< change anyway.
> changes might require the IDNA prefix to change; after extended
> discussions, the WG came to consensus that the change for these
> characters did not justify a prefix change.
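The changed A-label can be seen in Python. The built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep), so it maps Eszett to "ss" before encoding. The IDNA2008-style form below only applies the Punycode step to the unmapped label; it is a sketch, not a full IDNA2008 implementation, and it skips all validity checks.

```python
label = "stra\u00dfe"  # "straße"

# IDNA2003 via Python's built-in codec: Nameprep maps U+00DF to "ss" first,
# and the result is plain ASCII, so no "xn--" prefix is needed.
idna2003 = label.encode("idna")
print(idna2003)                 # b'strasse'
print(idna2003.decode("idna"))  # strasse: the original label is not recovered

# IDNA2008-style A-label: U+00DF is kept as itself and Punycode-encoded.
idna2008_style = b"xn--" + label.encode("punycode")
print(idna2008_style)
print(idna2008_style[4:].decode("punycode") == label)  # True: lossless round trip
```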
4.5. Right to Left Text
In order to be sure that the directionality of right to left text is
unambiguous, IDNA2003 required that any label in which right to left
characters appear both starts and ends with them and that it not
include any characters with strong left to right properties (that
excludes other alphabetic characters but permits European digits).
Any other string that contains a right to left character and does not
meet those requirements is rejected. This is one of the few places
skipping to change at page 20, line 47 skipping to change at page 22, line 26
If a string cannot be successfully found in the DNS after the lookup
processing described here, it makes no difference whether it simply
wasn't registered or was prohibited by some rule at the registry.
Application implementors should be aware that where DNS wildcards are
used, the ability to successfully resolve a name does not guarantee
that it was actually registered.
6. Front-end and User Interface Processing for Lookup
< [[anchor18: Note in Draft: While this section has been revised in
< version -10 to improve clarity, a significant revision is expected
< once the discussions of mapping stabilize.]]
Domain names may be identified and processed in many contexts. They
may be typed in by users either by themselves or embedded in an
identifier such as email addresses, URIs, or IRIs. They may occur in
running text or be processed by one system after being provided in
another. Systems may try to normalize URLs to determine (or guess)
whether a reference is valid or two references point to the same
object without actually looking the objects up (comparison without
lookup is necessary for URI types that are not intended to be
resolved). Some of these goals may be more easily and reliably
satisfied than others. While there are strong arguments for any
domain name that is placed "on the wire" -- transmitted between
systems -- to be in the zero-ambiguity forms of A-labels, it is
inevitable that programs that process domain names will encounter
U-labels or variant forms.
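One concrete source of difficulty when guessing where a name starts and ends is that IDNA2003 (RFC 3490, Section 3.1) recognized four different characters as label separators. A minimal Python sketch contrasting that rule with splitting only on U+002E; both function names are hypothetical:

```python
import re
from typing import List

# IDNA2003 label separators: FULL STOP, IDEOGRAPHIC FULL STOP,
# FULLWIDTH FULL STOP, and HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def split_labels_idna2003(name: str) -> List[str]:
    return IDNA2003_DOTS.split(name)


def split_labels_ascii_dot(name: str) -> List[str]:
    # Only U+002E separates labels under this stricter reading.
    return name.split(".")


candidate = "example\u3002com"  # written with U+3002 IDEOGRAPHIC FULL STOP
print(split_labels_idna2003(candidate))   # ['example', 'com']
print(len(split_labels_ascii_dot(candidate)))  # 1: a single, unsplit string
```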
< This section discusses these mapping and transformation issues among
< names, contrasting IDNA2003 and IDNA2008 behavior. The discussion
< applies only in operations that look up names or interpret files.
< There are several reasons why registration activities should require
< final names and verification of those names by the would-be
< registrant.
< One source of label forms that are neither A-labels nor U-labels will
< be labels created under IDNA2003. That protocol allowed labels that
< were transformed from native-character format by mapping some
< characters into others before conversion into A-label format. One
< consequence of the transformations was that conversion from the
< A-label format back to native characters often did not produce the
< original label. IDNA2008 explicitly defines A-labels and U-labels as
< different forms of the same abstract label, forms that are stable
< when conversions are performed between them (without mappings).
< A different way of explaining this is that there are, today, domain
< names in files on the Internet that use characters that cannot be
< represented directly in, or recovered from, (A-label) domain names
< but for which interpretations were provided by IDNA2003). There are
< two major categories of characters irreversibly remapped by
< Stringprep, those that are removed by NFKC normalization and those
< upper-case characters that are mapped to lower-case (there are also a
< few characters that are given special-case mapping treatment,
< including lower-case characters that are case-folded into other
< lower-case characters or strings and those that are simply
< eliminated).
< Other issues in domain name identification and processing arise
< because IDNA2003 specified that several other characters be treated
< as equivalent to the ASCII period (dot, full stop) character used as
< a label separator. If a string that might be a domain name appears
< in an arbitrary context (such as running text), it is difficult, even
< with only ASCII characters, to know whether an actual domain name (or
< a protocol parameter like a URI) is present and where it starts and
< ends. When using Unicode, this gets even more difficult if treatment
< of certain special characters (like the dot that separates labels in
< a domain name) depends on context (e.g., prior knowledge of whether
< the string represents a domain name or not). That knowledge is not
< available if the primary heuristic for identifying the presence of
< domain names in strings depends on the presence of dots separating
< groups of characters with no intervening spaces.
< [[anchor19: Placeholder: In serial efforts to move the mapping model
< out of the protocol and leave it unspecified here, this paragraph has
< become a complete botch. Rewrite when the mapping plan stabilizes.]]
< The IDNA2008 model removes all of these mappings and interpretations,
< including the equivalence of different forms of dots, from the
< protocol, discouraging such mappings and leaving them, when
< necessary, to local processing. This should not be taken to imply
< that local processing is optional or can be avoided entirely, even if
< doing so might have been desirable in a world without IDNA2003 IDNs
< in files and archives. Instead, unless the program context is such
< that it is known that any IDNs that appear will contain either
< U-label or A-label forms, or that other forms can safely be rejected,
< some local processing of apparent domain name strings will be
< required, both to maintain compatibility with IDNA2003 and to prevent
< user astonishment. Such local processing, while not specified in
< this document or the associated ones, will generally take one of two
< forms:
< o Generic Preprocessing.
< When the context in which the program or system that processes
< domain names operates is global, a reasonable balance must be
< found that is sensitive to the broad range of local needs and
< assumptions while, at the same time, not sacrificing the needs of
< one language, script, or user population to those of another.
< For this case, the best practice will usually be to apply NFKC and
< case-mapping (or, perhaps better yet, Stringprep itself), plus
< dot-mapping where appropriate, to the domain name string prior to
< applying IDNA. That practice will not only yield a reasonable
< compromise of user experience with protocol requirements but will
< be almost completely compatible with the various forms permitted
< by IDNA2003.
< o Highly Localized Preprocessing.
< Unlike the case above, there will be some situations in which
< software will be highly localized for a particular environment and
< carefully adapted to the expectations of users in that
< environment. The many discussions about using the Internet to
< preserve and support local cultures suggest that these cases may
< be more common in the future than they have been so far.
< In these cases, we should avoid trying to tell implementers what
< they should accept, if only because they are quite likely (and for
< good reason) to ignore us. We would assume that they would map
< characters that the intuitions of their users would suggest be
< mapped and would hope that they would do that mapping as early as
< possible, storing A-label or U-label forms in files and
< transporting only those forms between systems. One can imagine
< switches about whether some sorts of mappings occur, warnings
< before applying them or, in a slightly more extreme version of the
< approach taken in Internet Explorer version 7 (IE7), systems that
< utterly refuse to handle "strange" characters at all if they
< appear in U-label form. None of those local decisions are a
< threat to interoperability as long as (i) only U-labels and
< A-labels are used in interchange with systems outside the local
< environment, (ii) no character that would be valid in a U-label as
< itself is mapped to something else, (iii) any local mappings are
< applied as a preprocessing step (or, for conversions from U-labels
< or A-labels to presentation forms, postprocessing), not as part of
< IDNA processing proper, and (iv) appropriate consideration is
< given to labels that might have entered the environment in
< conformance to IDNA2003.
< In either case, it is vital that user interface designs and, where
< the interfaces are not sufficient, users, be aware that the only
< forms of domain names that this protocol anticipates will resolve
< globally or compare equal when crude methods (i.e., those not
< conforming to the strict definition of label equivalence given in
< [IDNA2008-Defs]) are used are those in which all native-script labels
< are in U-label form. Forms that assume mapping will occur,
< especially forms that were not valid under IDNA2003, may or may not
< function in predictable ways across all implementations.
< User interfaces involving Latin-based scripts should take special
< care when considering how to handle case mapping because small
< differences in label strings may cause behavior that is astonishing
< to users. Because case-insensitive comparison is done for ASCII
< strings by DNS-servers, an all-ASCII label is treated as case-
< insensitive. However, if even one of the characters of that string
< is replaced by one that requires the label to be given IDN treatment
< (e.g., by adding a diacritical mark), then the label effectively
< becomes case-sensitive because only lower-case characters are
< permitted in IDNs. Preprocessing in applications to handle case
< mapping for Latin-based scripts (and possibly other scripts with case
< distinctions) may be wise to prevent user astonishment. However, all
< applications may not do this and ambiguity in transport is not
< desirable. Consequently the case-dependent forms should not be
< stored in files.
> An application that implements the IDNA protocol [IDNA2008-Protocol]
> will always take any user input and convert it to a set of Unicode
> code points. That user input may be acquired by any of several
> different input methods, all with differing conversion processes to
> be taken into consideration (e.g., typed on a keyboard, written by
> hand onto some sort of digitizer, spoken into a microphone and
> interpreted by a speech-to-text engine, etc.). The process of taking
> any particular user input and mapping it into a Unicode code point
> may be a simple one: If a user strikes the "A" key on a US English
> keyboard, without any modifiers such as the "Shift" key held down, in
> order to draw a Latin small letter A ("a"), many (perhaps most)
> modern operating system input methods will produce to the calling
> application the code point U+0061, encoded in a single octet.
> Sometimes the process is somewhat more complicated: a user might
> strike a particular set of keys to represent a combining macron
> followed by striking the "A" key in order to draw a Latin small
> letter A with a macron above it. Depending on the operating system,
> the input method chosen by the user, and even the parameters with
> which the application communicates with the input method, the result
> might be the code point U+0101 (encoded as two octets in UTF-8 or
> UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed
> by the code point U+0304 (again, encoded in three or more octets,
> depending upon the encoding used) or even the code point U+FF41
> followed by the code point U+0304 (and encoded in some form). And
> these examples leave aside the issue of operating systems and input
> methods that do not use Unicode code points for their character set.
> In every case, applications (with the help of the operating systems
> on which they run and the input methods used) need to perform a
> mapping from user input into Unicode code points.
> The original version of the IDNA protocol [RFC3490] used a model
> whereby input was taken from the user, mapped (via whatever input
> method mechanisms were used) to a set of Unicode code points, and
> then further mapped to a set of Unicode code points using the
> Nameprep profile specified in [RFC3491]. In this procedure, there
> are two separate mapping steps: First, a mapping done by the input
> method (which might be controlled by the operating system, the
> application, or some combination) and then a second mapping performed
> by the Nameprep portion of the IDNA protocol. The mapping done in
> Nameprep includes a particular mapping table to re-map some
> characters to other characters, a particular normalization, and a set
> of prohibited characters.
> Note that the result of the two step mapping process means that the
> mapping chosen by the operating system or application in the first
> step might differ significantly from the mapping supplied by the
> Nameprep profile in the second step. This has advantages and
> disadvantages. Of course, the second mapping regularizes what gets
> looked up in the DNS, making for better interoperability between
> implementations which use the Nameprep mapping. However, the
> application or operating system may choose mappings in their input
> methods, which when passed through the second (Nameprep) mapping
> result in characters that are "surprising" to the end user.
> The other important feature of the original version of the IDNA
> protocol is that, with very few exceptions, it assumes that any set
> of Unicode code points provided to the Nameprep mapping can be mapped
> into a string of Unicode code points that are "sensible", even if
> that means mapping some code points to nothing (that is, removing the
> code points from the string). This allowed maximum flexibility in
> input strings.
> The present version of IDNA differs significantly in approach from
> the original version. First and foremost, it does not provide
> explicit mapping instructions. Instead, it assumes that the
> application (perhaps via an operating system input method) will do
> whatever mapping it requires to convert input into Unicode code
> points. This has the advantage of giving flexibility to the
> application to choose a mapping that is suitable for its user given
> specific user requirements, and avoids the two-step mapping of the
> original protocol. Instead of a mapping, the current version of IDNA
> provides a set of categories that can be used to specify the valid
> code points allowed in a domain name.
> In principle, an application ought to take user input of a domain
> name and convert it to the set of Unicode code points that represent
> the domain name the user intends. As a practical matter, of course,
> determining user intent is a tricky business, so an application needs
> to choose a reasonable mapping from user input. That may differ
> based on the particular circumstances of a user, depending on locale,
> language, type of input method, etc. It is up to the application to
> make a reasonable choice.
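The input-method example of a Latin small letter A with macron (U+0101; U+0061 U+0304; U+FF41 U+0304) can be reproduced with Python's unicodedata module. This sketch is an illustration, not part of any IDNA specification. NFC unifies the first two sequences, while only the compatibility form NFKC also folds U+FF41 FULLWIDTH LATIN SMALL LETTER A, so the choice of normalization is itself one of the local decisions discussed in this section.

```python
import unicodedata

A_MACRON = "\u0101"  # LATIN SMALL LETTER A WITH MACRON

# Sequences an input method might deliver for the same intended character.
inputs = {
    "U+0101": "\u0101",
    "U+0061 U+0304": "a\u0304",
    "U+FF41 U+0304": "\uff41\u0304",
}

for desc, s in inputs.items():
    nfc_ok = unicodedata.normalize("NFC", s) == A_MACRON
    nfkc_ok = unicodedata.normalize("NFKC", s) == A_MACRON
    octets = len(s.encode("utf-8"))
    print(f"{desc:15} UTF-8 octets: {octets}  NFC -> U+0101: {nfc_ok}  "
          f"NFKC -> U+0101: {nfkc_ok}")
```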
7. Migration from IDNA2003 and Unicode Version Synchronization
7.1. Design Criteria
As mentioned above and in RFC 4690, two key goals of the IDNA2008
design are
o to enable applications to be agnostic about whether they are being
run in environments supporting any Unicode version from 3.2
skipping to change at page 26, line 38 skipping to change at page 27, line 16
whole-label rules. In particular, it must verify that whole-label rules. In particular, it must verify that
* there are no leading combining marks, * there are no leading combining marks,
* the "bidi" conditions are met if right to left characters * the "bidi" conditions are met if right to left characters
appear, appear,
* any required contextual rules are available, and * any required contextual rules are available, and
* any contextual rules that are associated with Joiner Controls * any contextual rules that are associated with Joiner Controls
are tested. (and "CONTEXTJ" characters more generally) are tested.
o Do not reject labels based on other contextual rules about o Do not reject labels based on other contextual rules about
characters, including mixed-script label prohibitions. Such rules characters, including mixed-script label prohibitions. Such rules
may be used to influence presentation decisions in the user may be used to influence presentation decisions in the user
interface, but not to avoid looking up domain names. interface, but not to avoid looking up domain names.
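A partial Python sketch of the lookup-side whole-label checks follows. It implements only the "no leading combining mark" test, using the assumption that a combining mark is any code point whose Unicode General Category begins with "M" (Mn, Mc, Me); the bidi and contextual tests are left as placeholders, and the function name is hypothetical. It deliberately contains no mixed-script test, reflecting the rule that such policies may affect presentation but must not block lookup.

```python
import unicodedata


def lookup_whole_label_problems(label: str) -> list:
    """Return reasons a lookup application should refuse to look up `label`.

    Sketch only: the bidi and CONTEXTJ tests are not implemented here.
    """
    problems = []
    # Combining marks have a General Category starting with "M".
    if label and unicodedata.category(label[0]).startswith("M"):
        problems.append("leading combining mark")
    # Placeholder: bidi conditions and CONTEXTJ rules would be tested here.
    # Intentionally absent: any mixed-script prohibition.
    return problems


print(lookup_whole_label_problems("\u0301abc"))    # ['leading combining mark']
print(lookup_whole_label_problems("p\u0430ypal"))  # [] (Latin/Cyrillic mix is looked up)
```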
Lookup applications that follow these rules, rather than having Lookup applications that follow these rules, rather than having
their own criteria for rejecting lookup attempts, are not sensitive their own criteria for rejecting lookup attempts, are not sensitive
to version incompatibilities with the particular zone registry to version incompatibilities with the particular zone registry
associated with the domain name except for labels containing associated with the domain name except for labels containing
characters recently added to Unicode. characters recently added to Unicode.
An application or client that processes names according to this An application or client that processes names according to this
protocol and then resolves them in the DNS will be able to locate any protocol and then resolves them in the DNS will be able to locate any
name that is registered, as long as those registrations are IDNA- name that is registered, as long as those registrations are IDNA-
value and its version of the IDNA tables is sufficiently up-to-date valid and its version of the IDNA tables is sufficiently up-to-date
to interpret all of the characters in the label. Messages to users to interpret all of the characters in the label. Messages to users
should distinguish between "label contains an unallocated code point" should distinguish between "label contains an unallocated code point"
and other types of lookup failures. A failure on the basis of an old and other types of lookup failures. A failure on the basis of an old
version of Unicode may lead the user to want to upgrade to a version of Unicode may lead the user to want to upgrade to a
newer version, but will have no other ill effects (this is consistent newer version, but will have no other ill effects (this is consistent
with behavior in the transition to the DNS when some hosts could not with behavior in the transition to the DNS when some hosts could not
yet handle some forms of names or record types). yet handle some forms of names or record types).
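One way an application might keep "label contains an unallocated code point" distinct from other lookup failures is sketched below in Python. It assumes that the Unicode version bundled with the Python runtime (unicodedata.unidata_version) stands in for the application's IDNA tables, and it treats General Category "Cn" as unallocated; the error class and function names are hypothetical.

```python
import unicodedata


class UnallocatedCodePointError(Exception):
    """Hypothetical error type for 'label contains an unallocated code point'."""


def check_code_points_allocated(label: str) -> None:
    for ch in label:
        # "Cn" marks code points not assigned in this Unicode version. It also
        # covers the 66 noncharacters, which a fuller version would report
        # separately.
        if unicodedata.category(ch) == "Cn":
            raise UnallocatedCodePointError(
                f"U+{ord(ch):04X} is not assigned in Unicode "
                f"{unicodedata.unidata_version}; newer IDNA tables may be needed")


def user_message(label: str) -> str:
    try:
        check_code_points_allocated(label)
    except UnallocatedCodePointError as err:
        return f"label contains an unallocated code point ({err})"
    return "ok to attempt lookup"


print(user_message("a\u0378"))  # U+0378 is unassigned
```

Reporting the version in the message gives the user the hint suggested above: the failure may disappear after an upgrade, rather than indicating that the name does not exist.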
7.2. Changes in Character Interpretations 7.2. Changes in Character Interpretations
[[anchor22: This subsection will need to be rewritten when the
mapping decisions stabilize.]]
In those scripts that make case distinctions, there are a few In those scripts that make case distinctions, there are a few
characters for which an obvious and unique upper case character has characters for which an obvious and unique upper case character has
not historically been available to match a lower case one or vice not historically been available to match a lower case one or vice
versa. For those characters, the mappings used in constructing the versa. For those characters, the mappings used in constructing the
Stringprep tables for IDNA2003, performed using the Unicode CaseFold Stringprep tables for IDNA2003, performed using the Unicode CaseFold
operation (See Section 5.8 of the Unicode Standard [Unicode51]), operation (See Section 5.8 of the Unicode Standard [Unicode51]),
generate different characters or sets of characters. Those generate different characters or sets of characters. Those
operations are not reversible and lose even more information than operations are not reversible and lose even more information than
traditional upper case or lower case transformations, but are more traditional upper case or lower case transformations, but are more
useful than those transformations for comparison purposes. Two useful than those transformations for comparison purposes. Two
notable characters of this type are the German character Eszett notable characters of this type are the German character Eszett
(Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The
former is case-folded to the ASCII string "ss", the latter to a former is case-folded to the ASCII string "ss", the latter to a
medial (Lower Case) Sigma (U+03C3). medial (Lower Case) Sigma (U+03C3).
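The case-folding behavior described above can be reproduced with Python's str.casefold(), which applies Unicode full case folding. This demonstrates only the CaseFold operation; it is not an implementation of the IDNA2003 Stringprep tables, which include other mappings as well.

```python
ESZETT = "\u00df"        # LATIN SMALL LETTER SHARP S
FINAL_SIGMA = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA
MEDIAL_SIGMA = "\u03c3"  # GREEK SMALL LETTER SIGMA

# Full case folding produces the mappings described in the text.
assert ESZETT.casefold() == "ss"
assert FINAL_SIGMA.casefold() == MEDIAL_SIGMA

# Ordinary lowercasing leaves both alone: each is already lower case.
assert ESZETT.lower() == ESZETT and FINAL_SIGMA.lower() == FINAL_SIGMA

# The fold is not reversible: distinct strings collapse to one result,
# so the original spelling cannot be recovered from the folded form.
assert "stra\u00dfe".casefold() == "strasse".casefold() == "strasse"
word_final = "\u03bf\u03b4\u03bf\u03c2"   # ends in final sigma
word_medial = "\u03bf\u03b4\u03bf\u03c3"  # same letters, medial sigma
assert word_final.casefold() == word_medial.casefold()
```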
The decision to eliminate mappings, including case folding, from the The decision to eliminate mandatory and standardized mappings,
IDNA2008 protocol in order to make A-labels and U-labels idempotent including case folding, from the IDNA2008 protocol in order to make
made these characters problematic. If they were to be disallowed, A-labels and U-labels idempotent made these characters problematic.
important words and mnemonics could not be written in If they were to be disallowed, important words and mnemonics could
orthographically reasonable ways. If they were to be permitted as not be written in orthographically reasonable ways. If they were to
distinct characters, there would be no information loss and be permitted as distinct characters, there would be no information
registries would have more flexibility, but IDNA2003 and IDNA2008 loss and registries would have more flexibility, but IDNA2003 and
lookups might result in different A-labels. IDNA2008 lookups might result in different A-labels.
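The resulting incompatibility can be sketched in Python. Python's built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep), so it maps Eszett to "ss" and Final Sigma to medial Sigma before encoding. The second helper is a hypothetical, simplified stand-in for IDNA2008 behavior: it Punycode-encodes the label unchanged and performs none of the IDNA2008 validity checks.

```python
def idna2003_to_ascii(label: str) -> str:
    """IDNA2003 ToASCII via Python's built-in "idna" codec (Nameprep applied)."""
    return label.encode("idna").decode("ascii")


def naive_idna2008_alabel(label: str) -> str:
    """Hypothetical IDNA2008-style A-label: Punycode the label as-is.

    No IDNA2008 validation is performed; this is for illustration only.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


for word in ("stra\u00dfe", "\u03bf\u03b4\u03bf\u03c2"):
    # IDNA2003 and the IDNA2008-style encoding yield different labels.
    print(word, idna2003_to_ascii(word), naive_idna2008_alabel(word))
```

Under IDNA2003, "straße" looks up the plain ASCII label "strasse"; with Eszett treated as a distinct character, the same input produces a separate A-label, so the two lookups reach different names.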
With the understanding that there would be incompatibility either way With the understanding that there would be incompatibility either way
but a judgment that the incompatibility was not significant enough to but a judgment that the incompatibility was not significant enough to
justify a prefix change, the WG concluded that Eszett and Final Form justify a prefix change, the WG concluded that Eszett and Final Form
Sigma should be treated as distinct and Protocol-Valid characters. Sigma should be treated as distinct and Protocol-Valid characters.
Registries, especially those maintaining zones for third parties, Registries, especially those maintaining zones for third parties,
must decide how to introduce a new service in a way that does not must decide how to introduce a new service in a way that does not
create confusion or significantly weaken or invalidate existing create confusion or significantly weaken or invalidate existing
identifiers. This is not a new problem; registries were faced with identifiers. This is not a new problem; registries were faced with
skipping to change at page 28, line 30 skipping to change at page 29, line 5
corresponding string containing Eszett or Final Sigma corresponding string containing Eszett or Final Sigma
respectively. respectively.
o Adopt some sort of "variant" approach in which registrants obtain o Adopt some sort of "variant" approach in which registrants obtain
labels with both character forms. labels with both character forms.
o Adopt a different form of "variant" approach in which registration o Adopt a different form of "variant" approach in which registration
of additional names is either not permitted at all or permitted of additional names is either not permitted at all or permitted
only by the registrant who already has one of the names. only by the registrant who already has one of the names.
7.3. More Flexibility in User Agents 7.3. Character Mapping
[[anchor23: Note in Draft: This section is mapping-related and may
need to be revised after that issue settles down.]] Also, it is
closely related to Section 4.2 and may need to be cross-referenced
from it or consolidated into it.
These documents do not specify mappings between one character or code
point and others. Instead, IDNA2008 prohibits characters that would
be mapped to others by normalization, upper case to lower case
changes, or other rules. As examples, while mathematical characters
based on Latin ones are accepted as input to IDNA2003, they are
prohibited in IDNA2008. Similarly, double-width characters and other
variations are prohibited as IDNA input.
Since the rules in [IDNA2008-Tables] have the effect that only
strings that are not transformed by NFKC are valid, if an application
chooses to perform NFKC normalization before lookup, that operation
is safe since this will never make the application unable to look up
any valid string. However, as discussed above, the application
cannot guarantee that any other application will perform that
mapping, so it should be used only with caution and for informed
users.
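The safety argument can be illustrated in Python with unicodedata.normalize. The sketch assumes, as the text states, that every valid string is unchanged by NFKC. Under that assumption, an optional NFKC step before lookup leaves valid input untouched and only rewrites input that could not have been valid. "NFKC-stable" is treated here as a necessary, not a sufficient, condition for validity, and the helper names are hypothetical.

```python
import unicodedata


def nfkc_stable(label: str) -> bool:
    """True if NFKC leaves the label unchanged."""
    return unicodedata.normalize("NFKC", label) == label


def prelookup_nfkc(user_input: str) -> str:
    """Optional, local pre-lookup step: NFKC-normalize the user's input."""
    return unicodedata.normalize("NFKC", user_input)


valid = "caf\u00e9"
assert nfkc_stable(valid) and prelookup_nfkc(valid) == valid  # passes through

fullwidth = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"  # FULLWIDTH "example"
print(prelookup_nfkc(fullwidth))     # example
print(prelookup_nfkc("\U0001d41a"))  # a (MATHEMATICAL BOLD SMALL A)
```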
In many cases these prohibitions should have no effect on what the
user can type as input to the lookup process. It is perfectly
reasonable for systems that support user interfaces to perform some
character mapping that is appropriate to the local environment. This
would normally be done prior to actual invocation of IDNA. At least
conceptually, the mapping would be part of the Unicode conversions
discussed above and in [IDNA2008-Protocol]. However, those changes
will be local ones only -- local to environments in which users will
clearly understand that the character forms are equivalent. For use
in interchange among systems, it appears to be much more important
that U-labels and A-labels can be mapped back and forth without loss
of information.
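The lossless U-label/A-label conversion can be sketched with Python's built-in "punycode" codec. The helpers are hypothetical and do no IDNA2008 validation; they show only that the encoding round-trips exactly. Any local, one-way mapping of user input would have to happen before u_to_a is called.

```python
ACE_PREFIX = "xn--"


def u_to_a(ulabel: str) -> str:
    """U-label -> A-label (Punycode only; no IDNA2008 validation)."""
    return ACE_PREFIX + ulabel.encode("punycode").decode("ascii")


def a_to_u(alabel: str) -> str:
    """A-label -> U-label; rejects strings without the ACE prefix."""
    if not alabel.startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing ACE prefix")
    return alabel[len(ACE_PREFIX):].encode("ascii").decode("punycode")


for u in ("caf\u00e9", "stra\u00dfe", "\u03bf\u03b4\u03bf\u03c2"):
    a = u_to_a(u)
    assert a_to_u(a) == u, (u, a)  # nothing lost in either direction
    print(u, "<->", a)
```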
One specific, and very important, instance of this strategy arises
with case-folding. In the ASCII-only DNS, names are looked up and
matched in a case-independent way, but no actual case-folding occurs.
Names can be placed in the DNS in either upper or lower case form (or
any mixture of them) and that form is preserved, returned in queries,
and so on. IDNA2003 approximated that behavior for non-ASCII strings
by performing case-folding at registration time (resulting in only
lower-case IDNs in the DNS) and when names were looked up.
As suggested earlier in this section, it appears to be desirable to As discussed at length in Section 6, IDNA2003, via Nameprep (see
do as little character mapping as possible as long as Unicode works Section 7.5), mapped many characters into related ones. Those
correctly (e.g., NFC mapping to resolve different codings for the mappings no longer exist as requirements in IDNA2008. These
same character is still necessary although the specifications require specifications strongly prefer that only A-labels or U-labels be used
that it be performed prior to invoking the protocol) in order to make in protocol contexts and, as much as practical, more generally.
the mapping between A-labels and U-labels idempotent. Case-mapping IDNA2008 does anticipate situations in which some mapping at the time
is not an exception to this principle. If only lower case characters of user input into lookup applications is appropriate and desirable.
can be registered in the DNS (i.e., be present in a U-label), then The issues are discussed in Section 6 and specific recommendations
IDNA2008 should prohibit upper-case characters as input. Some other are made in [IDNA2008-Mapping].
considerations reinforce this conclusion. For example, in ASCII
case-mapping for individual characters, uppercase(character) must be
equal to uppercase(lowercase(character)). That may not be true with
IDNs. In some scripts that use case distinctions, there are a few
characters that do not have counterparts in one case or the other.
The relationship between upper case and lower case may even be
language-dependent, with different languages (or even the same
language in different areas) expecting different mappings. User
agents can meet the expectations of users who are accustomed to the
case-insensitive DNS environment by performing case folding prior to
IDNA processing, but the IDNA procedures themselves should neither
require such mapping nor expect it when it is not natural to the
localized environment.
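The case-mapping points above can be checked in Python. The identity uppercase(c) == uppercase(lowercase(c)) holds for every ASCII letter, but it fails for LATIN CAPITAL LETTER SHARP S (U+1E9E), whose lower-case form upper-cases to "SS". The hypothetical user-agent step shown uses str.lower() rather than str.casefold(). This is a design choice assumed for the sketch: lowercasing keeps Eszett distinct, which matches its treatment as a Protocol-Valid character, while full case folding would turn it into "ss".

```python
import string

CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def ascii_case_identity_holds() -> bool:
    """uppercase(c) == uppercase(lowercase(c)) for every ASCII letter."""
    return all(c.upper() == c.lower().upper() for c in string.ascii_letters)


def local_case_map(user_input: str) -> str:
    """Hypothetical user-agent step run before IDNA: lowercase, not casefold."""
    return user_input.lower()


assert ascii_case_identity_holds()

# Outside ASCII the identity can fail:
print(CAPITAL_SHARP_S.upper())          # unchanged: U+1E9E
print(CAPITAL_SHARP_S.lower().upper())  # SS (via lower-case Eszett)

print(local_case_map("STRA\u1e9eE"))    # straße
print("STRA\u1e9eE".casefold())         # strasse (full folding loses Eszett)
```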
7.4. The Question of Prefix Changes 7.4. The Question of Prefix Changes
The conditions that would require a change in the IDNA ACE prefix The conditions that would require a change in the IDNA ACE prefix
("xn--" for the version of IDNA specified in [RFC3490]) have been a ("xn--" for the version of IDNA specified in [RFC3490]) have been a
great concern to the community. A prefix change would clearly be great concern to the community. A prefix change would clearly be
necessary if the algorithms were modified in a manner that would necessary if the algorithms were modified in a manner that would
create serious ambiguities during subsequent transition in create serious ambiguities during subsequent transition in
registrations. This section summarizes our conclusions about the registrations. This section summarizes our conclusions about the
conditions under which changes in prefix would be necessary and the conditions under which changes in prefix would be necessary and the
skipping to change at page 33, line 8 skipping to change at page 32, line 18
such as outline, solid, and shaded forms may or may not exist; such as outline, solid, and shaded forms may or may not exist;
and so on. As just one example, consider a "heart" symbol as it and so on. As just one example, consider a "heart" symbol as it
might appear in a logo that might be read as "I love...". While might appear in a logo that might be read as "I love...". While
the user might read such a logo as "I love..." or "I heart...", the user might read such a logo as "I love..." or "I heart...",
considerable knowledge of the coding distinctions made in Unicode considerable knowledge of the coding distinctions made in Unicode
is needed to know that there is more than one "heart" character is needed to know that there is more than one "heart" character
(e.g., U+2665, U+2661, and U+2765) and how to describe it. These (e.g., U+2665, U+2661, and U+2765) and how to describe it. These
issues are of particular importance if strings are expected to be issues are of particular importance if strings are expected to be
understood or transcribed by the listener after being read out understood or transcribed by the listener after being read out
loud. loud.
[[anchor24: The above paragraph remains controversial as to
whether it is valid. The WG will need to make a decision if this
section is not dropped entirely.]]
3. Consider the case of a screen reader used by blind Internet users 3. Design of a screen reader used by blind Internet users who must
who must listen to renderings of IDN domain names and possibly listen to renderings of IDN domain names and possibly reproduce
reproduce them on the keyboard. them on the keyboard becomes considerably more complicated when
the names of characters are not obvious and intuitive to anyone
familiar with the language in question.
4. As a simplified example of this, assume one wanted to use a 4. As a simplified example of this, assume one wanted to use a
"heart" or "star" symbol in a label. This is problematic because "heart" or "star" symbol in a label. This is problematic because
those names are ambiguous in the Unicode system of naming (the those names are ambiguous in the Unicode system of naming (the
actual Unicode names require far more qualification). A user or actual Unicode names require far more qualification). A user or
would-be registrant has no way to know -- absent careful study of would-be registrant has no way to know -- absent careful study of
the code tables -- whether it is ambiguous (e.g., where there are the code tables -- whether it is ambiguous (e.g., where there are
multiple "heart" characters) or not. Conversely, the user seeing multiple "heart" characters) or not. Conversely, the user seeing
the hypothetical label doesn't know whether to read it -- try to the hypothetical label doesn't know whether to read it -- try to
transmit it to a colleague by voice -- as "heart", as "love", as transmit it to a colleague by voice -- as "heart", as "love", as
skipping to change at page 34, line 24 skipping to change at page 33, line 32
o Tests involving the context of characters (e.g., some characters o Tests involving the context of characters (e.g., some characters
being permitted only adjacent to others of specific types) and being permitted only adjacent to others of specific types) and
integrity tests on complete labels are needed. Unassigned code integrity tests on complete labels are needed. Unassigned code
points cannot be permitted because one cannot determine whether points cannot be permitted because one cannot determine whether
particular code points will require contextual rules (and what particular code points will require contextual rules (and what
those rules should be) before characters are assigned to them and those rules should be) before characters are assigned to them and
the properties of those characters fully understood. the properties of those characters fully understood.
o It cannot be known in advance, and with sufficient reliability, o It cannot be known in advance, and with sufficient reliability,
that a no newly-assigned code point will associated with a whether a newly-assigned code point will be associated with a
character that would be disallowed by the rules in character that would be disallowed by the rules in
[IDNA2008-Tables] (such as a compatibility character). In [IDNA2008-Tables] (such as a compatibility character). In
IDNA2003, since there is no direct dependency on NFKC (many of the IDNA2003, since there is no direct dependency on NFKC (many of the
entries in Stringprep's tables are based on NFKC, but IDNA2003 entries in Stringprep's tables are based on NFKC, but IDNA2003
depends only on Stringprep), allocation of a compatibility depends only on Stringprep), allocation of a compatibility
character might produce some odd situations, but it would not be a character might produce some odd situations, but it would not be a
problem. In IDNA2008, where compatibility characters are problem. In IDNA2008, where compatibility characters are
DISALLOWED unless character-specific exceptions are made, DISALLOWED unless character-specific exceptions are made,
permitting strings containing unassigned characters to be looked permitting strings containing unassigned characters to be looked
up would violate the principle that characters in DISALLOWED are up would violate the principle that characters in DISALLOWED are
skipping to change at page 39, line 14 skipping to change at page 38, line 15
12. Acknowledgments 12. Acknowledgments
The editor and contributors would like to express their thanks to The editor and contributors would like to express their thanks to
those who contributed significant early (pre-WG) review comments, those who contributed significant early (pre-WG) review comments,
sometimes accompanied by text, especially Mark Davis, Paul Hoffman, sometimes accompanied by text, especially Mark Davis, Paul Hoffman,
Simon Josefsson, and Sam Weiler. In addition, some specific ideas Simon Josefsson, and Sam Weiler. In addition, some specific ideas
were incorporated from suggestions, text, or comments about sections were incorporated from suggestions, text, or comments about sections
that were unclear supplied by Vint Cerf, Frank Ellerman, Michael that were unclear supplied by Vint Cerf, Frank Ellerman, Michael
Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken
Whistler, although, as usual, they bear little or no responsibility Whistler. Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie
for the conclusions the editor and contributors reached after Garside, and Jefsey Morfin for conversations that led to considerable
receiving their suggestions. Thanks are also due to Vint Cerf, Lisa improvements in the content of this document.
Dusseault, Debbie Garside, and Jefsey Morfin for conversations that
led to considerable improvements in the content of this document.
A meeting was held on 30 January 2008 to attempt to reconcile A meeting was held on 30 January 2008 to attempt to reconcile
differences in perspective and terminology about this set of differences in perspective and terminology about this set of
specifications between the design team and members of the Unicode specifications between the design team and members of the Unicode
Technical Consortium. The discussions at and subsequent to that Technical Consortium. The discussions at and subsequent to that
meeting were very helpful in focusing the issues and in refining the meeting were very helpful in focusing the issues and in refining the
specifications. The active participants at that meeting were (in specifications. The active participants at that meeting were (in
alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam,
Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary
Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel,
skipping to change at page 39, line 42 skipping to change at page 38, line 41
Useful comments and text on the WG versions of the draft were Useful comments and text on the WG versions of the draft were
received from many participants in the IETF "IDNABIS" WG and a number received from many participants in the IETF "IDNABIS" WG and a number
of document changes resulted from mailing list discussions made by of document changes resulted from mailing list discussions made by
that group. Marcos Sanz provided specific analysis and suggestions that group. Marcos Sanz provided specific analysis and suggestions
that were exceptionally helpful in refining the text, as did Vint that were exceptionally helpful in refining the text, as did Vint
Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler. Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler.
Lisa Dusseault provided extensive editorial suggestions during the Lisa Dusseault provided extensive editorial suggestions during the
spring of 2009, most of which were incorporated. spring of 2009, most of which were incorporated.
As is usual with IETF specifications, while the document represents
rough consensus, it should not be assumed that all participants and
contributors agree with all provisions.
13. Contributors 13. Contributors
While the listed editor held the pen, the core of this document and While the listed editor held the pen, the core of this document and
the initial WG version represents the joint work and conclusions of the initial WG version represents the joint work and conclusions of
an ad hoc design team consisting of the editor and, in alphabetic an ad hoc design team consisting of the editor and, in alphabetic
order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.
In addition, there were many specific contributions and helpful Considerable material describing mapping principles has been
comments from those listed in the Acknowledgments section and others incorporated from a draft of [IDNA2008-Mapping] by Pete Resnick and
who have contributed to the development and use of the IDNA Paul Hoffman. In addition, there were many specific contributions
protocols. and helpful comments from those listed in the Acknowledgments section
and others who have contributed to the development and use of the
IDNA protocols.
14. References 14. References
14.1. Normative References 14.1. Normative References
[ASCII] American National Standards Institute (formerly United [ASCII] American National Standards Institute (formerly United
States of America Standards Institute), "USA Code for States of America Standards Institute), "USA Code for
Information Interchange", ANSI X3.4-1968, 1968. Information Interchange", ANSI X3.4-1968, 1968.
ANSI X3.4-1968 has been replaced by newer versions with ANSI X3.4-1968 has been replaced by newer versions with
slight modifications, but the 1968 version remains slight modifications, but the 1968 version remains
definitive for the Internet. definitive for the Internet.
[IDNA2008-Bidi] [IDNA2008-Bidi]
Alvestrand, H. and C. Karp, "An updated IDNA criterion for Alvestrand, H. and C. Karp, "An updated IDNA criterion for
right to left scripts", July 2008, <https:// right to left scripts", August 2009, <https://
datatracker.ietf.org/drafts/draft-ietf-idnabis-bidi/>. datatracker.ietf.org/drafts/draft-ietf-idnabis-bidi/>.
[IDNA2008-Defs] [IDNA2008-Defs]
Klensin, J., "Internationalized Domain Names for Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Definitions and Document Framework", Applications (IDNA): Definitions and Document Framework",
November 2008, <https://datatracker.ietf.org/drafts/ August 2009, <https://datatracker.ietf.org/drafts/
draft-ietf-idnabis-defs/>. draft-ietf-idnabis-defs/>.
[IDNA2008-Protocol] [IDNA2008-Protocol]
Klensin, J., "Internationalized Domain Names in Klensin, J., "Internationalized Domain Names in
Applications (IDNA): Protocol", November 2008, <https:// Applications (IDNA): Protocol", August 2009, <https://
datatracker.ietf.org/drafts/draft-ietf-idnabis-protocol/>. datatracker.ietf.org/drafts/draft-ietf-idnabis-protocol/>.
[IDNA2008-Tables] [IDNA2008-Tables]
Faltstrom, P., "The Unicode Code Points and IDNA", Faltstrom, P., "The Unicode Code Points and IDNA",
July 2008, <https://datatracker.ietf.org/drafts/ August 2009, <https://datatracker.ietf.org/drafts/
draft-ietf-idnabis-tables/>. draft-ietf-idnabis-tables/>.
A version of this document is available in HTML format at A version of this document is available in HTML format at
http://stupid.domain.name/idnabis/ http://stupid.domain.name/idnabis/
draft-ietf-idnabis-tables-02.html draft-ietf-idnabis-tables-06.html
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003. RFC 3490, March 2003.
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
for Internationalized Domain Names in Applications for Internationalized Domain Names in Applications
(IDNA)", RFC 3492, March 2003. (IDNA)", RFC 3492, March 2003.
[Unicode-UAX15] [Unicode-UAX15]
skipping to change at page 41, line 31 skipping to change at page 40, line 40
There are several forms and variations and a closely- There are several forms and variations and a closely-
related standard, CNS 11643. See the discussion in related standard, CNS 11643. See the discussion in
Chapter 3 of Lunde, K., CJKV Information Processing, Chapter 3 of Lunde, K., CJKV Information Processing,
O'Reilly & Associates, 1999 O'Reilly & Associates, 1999
[GB18030] "Chinese National Standard GB 18030-2000: Information [GB18030] "Chinese National Standard GB 18030-2000: Information
Technology -- Chinese ideograms coded character set for Technology -- Chinese ideograms coded character set for
information interchange -- Extension for the basic set.", information interchange -- Extension for the basic set.",
2000. 2000.
[IDNA2008-Mapping]
Resnick, P., "Mapping Characters in IDNA", August 2009,
<https://datatracker.ietf.org/drafts/
draft-ietf-idnabis-mapping/>.
[RFC0810] Feinler, E., Harrenstien, K., Su, Z., and V. White, "DoD [RFC0810] Feinler, E., Harrenstien, K., Su, Z., and V. White, "DoD
Internet host table specification", RFC 810, March 1982. Internet host table specification", RFC 810, March 1982.
[RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet
host table specification", RFC 952, October 1985. host table specification", RFC 952, October 1985.
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987. STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and [RFC1035] Mockapetris, P., "Domain names - implementation and
skipping to change at page 46, line 9 skipping to change at page 45, line 18
o Clarified relationship to base DNS specifications. o Clarified relationship to base DNS specifications.
o Consolidated discussion of lookup of unassigned characters. o Consolidated discussion of lookup of unassigned characters.
o More editorial fine-tuning. o More editorial fine-tuning.
A.7. Version -07 A.7. Version -07
o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label
(or False-A-label), R-LDH-label, valid IDNA-label in (or False-A-label), R-LDH-label, valid IDNA-label in
Section 1.3.3. Section 1.3.2.
o Moved the "name server considerations" material to this document o Moved the "name server considerations" material to this document
from Protocol because it is non-normative and not part of the from Protocol because it is non-normative and not part of the
protocol itself. protocol itself.
o To improve clarity, redid discussion of the reasons why looking up o To improve clarity, redid discussion of the reasons why looking up
unassigned code points is prohibited. unassigned code points is prohibited.
o Editorial and other non-substantive corrections to reflect earlier o Editorial and other non-substantive corrections to reflect earlier
errors as well as new definitions and terminology. errors as well as new definitions and terminology.
skipping to change at page 47, line 16 skipping to change at page 46, line 23
o Extensive editorial improvements, mostly due to suggestions from o Extensive editorial improvements, mostly due to suggestions from
Lisa Dusseault. Lisa Dusseault.
o Changes required for the new "mapping" approach and document have, o Changes required for the new "mapping" approach and document have,
in general, not been incorporated despite several suggestions. in general, not been incorporated despite several suggestions.
The editor intends to wait until the mapping model is stable, or The editor intends to wait until the mapping model is stable, or
at least until -11 of this document, before trying to incorporate at least until -11 of this document, before trying to incorporate
those suggestions. those suggestions.
A.11. Version -11
o Several placeholders for additional material or editing have been
removed since no comments have been received.
o Updated references.
o Corrected an apparent patching error in Section 1.6 and another
one in Section 4.3.
o Adjusted several sections that had not properly reflected removal
of the material that is now in the Definitions document and
removed an unnecessary one.
o New material added to Section 3.2 about registration policy issues
to reflect discussions on the mailing list.
o Incorporated mapping material from the former "Architectural
Principles" of version -01 of the Mapping draft into Section 6 and
removed most of the prior mapping material and explanations.
o Eliminated the former Section 7.3 ("More Flexibility in User
Agents"), moving its material into Section 4.2. The replacement
section is basically a placeholder to retain the mapping issues as
one of the migration topics. Note that this item and the previous
one involve considerable text, so people should check things
carefully.
o Corrected several typographical and editorial errors that don't
fall into any of the above categories.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
End of changes. 78 change blocks.
420 lines changed or deleted 427 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/