< draft-ietf-idnabis-rationale-04.txt   draft-ietf-idnabis-rationale-05.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft November 2, 2008 Internet-Draft November 28, 2008
Intended status: Informational Intended status: Informational
Expires: May 6, 2009 Expires: June 1, 2009
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-04.txt draft-ietf-idnabis-rationale-05.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 6, 2009. This Internet-Draft will expire on June 1, 2009.
Abstract Abstract
Several years have passed since the original protocol for Several years have passed since the original protocol for
Internationalized Domain Names (IDNs) was completed and deployed. Internationalized Domain Names (IDNs) was completed and deployed.
During that time, a number of issues have arisen, including the need During that time, a number of issues have arisen, including the need
to update the system to deal with newer versions of Unicode. Some of to update the system to deal with newer versions of Unicode. Some of
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4
1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1. Documents and Standards . . . . . . . . . . . . . . . 4 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5
1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5
1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5
1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6
1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 7 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8
2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9
3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9
3.1. A Tiered Model of Permitted Characters and Labels . . . . 9 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10
3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 11 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11
3.1.1.2. Rules and Their Application . . . . . . . . . . . 11
3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 12 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 13 Applications . . . . . . . . . . . . . . . . . . . . . . . 13
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 13 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14
4.1. Display and Network Order . . . . . . . . . . . . . . . . 13 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14
4.2. Entry and Display in Applications . . . . . . . . . . . . 15 4.2. Entry and Display in Applications . . . . . . . . . . . . 15
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 16 Alternate Character Forms . . . . . . . . . . . . . . . . 16
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 19 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20
6. Front-end and User Interface Processing . . . . . . . . . . . 20 6. Front-end and User Interface Processing . . . . . . . . . . . 21
7. Migration from IDNA2003 and Unicode Version Synchronization . 23 7. Migration from IDNA2003 and Unicode Version Synchronization . 23
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23
7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 23 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24
7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25
7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26
7.2. Changes in Character Interpretations . . . . . . . . . . . 27 7.2. Changes in Character Interpretations . . . . . . . . . . . 27
7.3. More Flexibility in User Agents . . . . . . . . . . . . . 28 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29
7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30
7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30
7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31
7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31
7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32
7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32
7.7. Migration Between Unicode Versions: Unassigned Code 7.7. Migration Between Unicode Versions: Unassigned Code
Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 34 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36
10. Internationalization Considerations . . . . . . . . . . . . . 36 10. Internationalization Considerations . . . . . . . . . . . . . 36
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 36 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37
11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37
11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37
12. Security Considerations . . . . . . . . . . . . . . . . . . . 37 12. Security Considerations . . . . . . . . . . . . . . . . . . . 37
12.1. General Security Issues with IDNA . . . . . . . . . . . . 37 12.1. General Security Issues with IDNA . . . . . . . . . . . . 37
12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 37 12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 38
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38
13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38
13.2. Informative References . . . . . . . . . . . . . . . . . . 39 13.2. Informative References . . . . . . . . . . . . . . . . . . 39
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41
A.1. Changes between Version -00 and Version -01 of A.1. Changes between Version -00 and Version -01 of
draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41
A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42
A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42
A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 42 A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43
A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43
Intellectual Property and Copyright Statements . . . . . . . . . . 44 Intellectual Property and Copyright Statements . . . . . . . . . . 45
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
The original standards for Internationalized Domain Names (IDNs) were The original standards for Internationalized Domain Names (IDNs) were
completed and deployed starting in 2003. Those standards are known completed and deployed starting in 2003. Those standards are known
as Internationalized Domain Names in Applications (IDNA), taken from as Internationalized Domain Names in Applications (IDNA), taken from
the name of the highest level standard within the group, RFC 3490 the name of the highest level standard within the group, RFC 3490
[RFC3490]. After those standards were deployed, a number of issues [RFC3490]. After those standards were deployed, a number of issues
arose that called for a new version of the IDNA protocol and the arose that called for a new version of the IDNA protocol and the
associated tables, including a subset of those described in a recent associated tables, including a subset of those described in a recent
IAB report [RFC4690] and the need to update the system to deal with IAB report [RFC4690] and the need to update the system to deal with
newer versions of Unicode. This document further explains the issues newer versions of Unicode. This document further explains the issues
that have been encountered when they are important to understanding that have been encountered when they are important to understanding
the revised protocols. It also provides an overview of the new the revised protocols. It also provides an overview of the new
IDNA model and explanatory material for it. Additional explanatory IDNA model and explanatory material for it. Additional explanatory
material for the specific components of the proposals appears with material for the specific components of the proposals appears with
the associated documents. the associated documents.
A good deal of the background material that appeared in RFC 3490
[RFC3490] has been removed from this update. That material is either
of historical interest only or has been covered from a more recent
perspective in RFC 4690 [RFC4690].
This document is not normative. The information it provides is
intended to make the rules, tables, and protocol easier to understand
and to provide overview information and suggestions for zone
administrators and others who need to make policy, deployment, and
similar decisions about IDNs.
1.2. Discussion Forum 1.2. Discussion Forum
[[ RFC Editor: please remove this section. ]] [[ RFC Editor: please remove this section. ]]
IDNA2008 is being discussed in the IETF "idnabis" Working Group and IDNA2008 is being discussed in the IETF "idnabis" Working Group and
on the mailing list idna-update@alvestrand.no on the mailing list idna-update@alvestrand.no
1.3. Terminology 1.3. Terminology
Terminology that is critical for understanding this document and the Terminology that is critical for understanding this document and the
skipping to change at page 6, line 5 skipping to change at page 6, line 17
the third and fourth positions, essentially requiring that such the third and fourth positions, essentially requiring that such
strings be IDNA-valid. This restriction on strings containing "--" strings be IDNA-valid. This restriction on strings containing "--"
is required for three reasons: is required for three reasons:
o to prevent confusion with pre-IDNA coding forms; o to prevent confusion with pre-IDNA coding forms;
o to permit future extensions that would require changing the o to permit future extensions that would require changing the
prefix, no matter how unlikely those might be (see Section 7.4); prefix, no matter how unlikely those might be (see Section 7.4);
and and
o to reduce the opportunities for attacks via the encoding system. o to reduce the opportunities for attacks via the Punycode encoding
algorithm itself.
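The restriction on "--" in the third and fourth positions can be illustrated with a short sketch. It is not taken from the IDNA2008 documents: the function names are hypothetical, and the only outside fact assumed is that the current ACE prefix is "xn--" (as defined for IDNA2003 in RFC 3490).

```python
def has_reserved_hyphens(label: str) -> bool:
    """Return True if the label has "--" in its third and fourth positions."""
    return len(label) >= 4 and label[2:4] == "--"


def needs_idna_validation(label: str) -> bool:
    """Hypothetical helper: a label with "--" in positions three and four is
    only plausible if it carries the ACE prefix, in which case it still has
    to be validated as an A-label by the real protocol."""
    return has_reserved_hyphens(label) and label[:4].lower() == "xn--"


for candidate in ("xn--bcher-kva", "ab--cd", "a--b", "example"):
    print(candidate, has_reserved_hyphens(candidate), needs_idna_validation(candidate))
```

A label such as "ab--cd" is caught by the first check but does not carry the prefix, so under this sketch an application would reject it outright rather than try to decode it.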
1.4. Objectives 1.4. Objectives
The intent of the IDNA revision effort, and hence of this document The intent of the IDNA revision effort, and hence of this document
and the associated ones, is to increase the usability and and the associated ones, is to increase the usability and
effectiveness of internationalized domain names (IDNs) while effectiveness of internationalized domain names (IDNs) while
preserving or strengthening the integrity of references that use preserving or strengthening the integrity of references that use
them. The original "hostname" character definitions (see, e.g., them. The original "hostname" character definitions (see, e.g.,
[RFC0810]) struck a balance between the creation of useful mnemonics [RFC0810]) struck a balance between the creation of useful mnemonics
and the introduction of parsing problems or general confusion in the and the introduction of parsing problems or general confusion in the
skipping to change at page 8, line 31 skipping to change at page 8, line 45
However, shifting responsibility for character mapping and other However, shifting responsibility for character mapping and other
adjustments from the protocol (where it was located in IDNA2003) to adjustments from the protocol (where it was located in IDNA2003) to
the user interface or processing before invoking IDNA raises issues the user interface or processing before invoking IDNA raises issues
about both what that processing should do and about compatibility for about both what that processing should do and about compatibility for
references prepared in an IDNA2003 context. Those issues are references prepared in an IDNA2003 context. Those issues are
discussed in Section 6. discussed in Section 6.
Operations for converting between local character sets and normalized Operations for converting between local character sets and normalized
Unicode are part of this general set of user interface issues. The Unicode are part of this general set of user interface issues. The
conversion is obviously not required at all in a Unicode-native conversion is obviously not required at all in a Unicode-native
system that maintains all strings in Normalization Form C (NFC). It system that maintains all strings in Normalization Form C (NFC).
may, however, involve some complexity in a system that is not (See [Unicode-UAX15] for precise definitions of NFC and NFKC if
Unicode-native, especially if the elements of the local character set needed.) It may, however, involve some complexity in a system that
do not map exactly and unambiguously into Unicode characters or do so is not Unicode-native, especially if the elements of the local
in a way that is not completely stable over time. Perhaps more character set do not map exactly and unambiguously into Unicode
important, if a label being converted to a local character set characters or do so in a way that is not completely stable over time.
contains Unicode characters that have no correspondence in that Perhaps more important, if a label being converted to a local
character set, the application may have to apply special, locally- character set contains Unicode characters that have no correspondence
appropriate, methods to avoid or reduce loss of information. in that character set, the application may have to apply special,
locally-appropriate, methods to avoid or reduce loss of information.
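The conversion step described above might look roughly like the following sketch, which uses Python's standard codecs and unicodedata modules. The function name and the choice of ISO 8859-1 as the local character set are assumptions made for illustration; a real front end would first have to determine the character set actually in use.

```python
import unicodedata


def local_bytes_to_nfc(raw: bytes, charset: str) -> str:
    """Decode bytes from a (correctly identified) local character set and
    return the result in Normalization Form C."""
    # errors="strict" makes an unmappable byte sequence fail loudly
    # instead of silently losing information.
    text = raw.decode(charset, errors="strict")
    return unicodedata.normalize("NFC", text)


# A decomposed sequence (e + COMBINING ACUTE ACCENT) composes under NFC.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")

# Bytes from an ISO 8859-1 system map directly to a precomposed character.
print(local_bytes_to_nfc(b"caf\xe9", "iso-8859-1") == "caf\u00e9")
```

In a Unicode-native system that already keeps strings in NFC, the normalization call is a no-op and the decoding step does not arise.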
Depending on the system involved, the major difficulty may not lie in Depending on the system involved, the major difficulty may not lie in
the mapping but in accurately identifying the incoming character set the mapping but in accurately identifying the incoming character set
and then applying the correct conversion routine. If a local and then applying the correct conversion routine. If a local
operating system uses one of the ISO 8859 character sets or an operating system uses one of the ISO 8859 character sets or an
extensive national or industrial system such as GB18030 [GB18030] or extensive national or industrial system such as GB18030 [GB18030] or
BIG5 [BIG5], one must correctly identify the character set in use BIG5 [BIG5], one must correctly identify the character set in use
before converting to Unicode even though those character coding before converting to Unicode even though those character coding
systems are substantially or completely Unicode-compatible (i.e., all systems are substantially or completely Unicode-compatible (i.e., all
of the code points in them have an exact and unique mapping to of the code points in them have an exact and unique mapping to
skipping to change at page 9, line 33 skipping to change at page 9, line 47
3. Permitted Characters: An Inclusion List 3. Permitted Characters: An Inclusion List
This section provides an overview of the model used to establish the This section provides an overview of the model used to establish the
algorithm and character lists of [IDNA2008-Tables] and describes the algorithm and character lists of [IDNA2008-Tables] and describes the
names and applicability of the categories used there. Note that the names and applicability of the categories used there. Note that the
inclusion of a character in the first category group does not imply inclusion of a character in the first category group does not imply
that it can be used indiscriminately; some characters are associated that it can be used indiscriminately; some characters are associated
with contextual rules that must be applied as well. with contextual rules that must be applied as well.
The information given in this section is provided to make the rules, The information given in this section is provided to make the rules,
tables, and protocol easier to understand. It is not normative. The tables, and protocol easier to understand. The normative generating
normative generating rules appear in [IDNA2008-Tables] and the rules rules that correspond to this informal discussion appear in
that actually determine what labels can be registered or looked up [IDNA2008-Tables] and the rules that actually determine what labels
are in [IDNA2008-Protocol]. can be registered or looked up are in [IDNA2008-Protocol].
3.1. A Tiered Model of Permitted Characters and Labels 3.1. A Tiered Model of Permitted Characters and Labels
Moving to an inclusion model requires respecifying the list of Moving to an inclusion model requires respecifying the list of
characters that are permitted in IDNs. In IDNA2003, the role and characters that are permitted in IDNs. In IDNA2003, the role and
utility of characters are independent of context and fixed forever utility of characters are independent of context and fixed forever
(or until the standard is replaced). Making completely context- (or until the standard is replaced). Making completely context-
independent rules globally has proven impractical because some independent rules globally has proven impractical because some
characters, especially those that are called "Join_Controls" in characters, especially those that are called "Join_Controls" in
Unicode, are needed to make reasonable use of some scripts but have Unicode, are needed to make reasonable use of some scripts but have
skipping to change at page 10, line 12 skipping to change at page 10, line 26
characters entirely. But the restrictions were much too severe to characters entirely. But the restrictions were much too severe to
permit an adequate range of mnemonics for terminology based on some permit an adequate range of mnemonics for terminology based on some
languages. The requirement to support those characters but limit languages. The requirement to support those characters but limit
their use to very specific contexts was reinforced by the observation their use to very specific contexts was reinforced by the observation
that handling of particular characters across the languages that use that handling of particular characters across the languages that use
a script, or the use of similar or identical-looking characters in a script, or the use of similar or identical-looking characters in
different scripts, is less well understood than many people believed different scripts, is less well understood than many people believed
it was several years ago. it was several years ago.
Independently of the characters chosen (see next subsection), the Independently of the characters chosen (see next subsection), the
theory is to divide the characters that appear in Unicode into three approach is to divide the characters that appear in Unicode into
categories: three categories:
3.1.1. PROTOCOL-VALID 3.1.1. PROTOCOL-VALID
Characters identified as "PROTOCOL-VALID" (often abbreviated Characters identified as "PROTOCOL-VALID" (often abbreviated
"PVALID") are, in general, permitted by IDNA for all uses in IDNs. "PVALID") are, in general, permitted by IDNA for all uses in IDNs.
Their use may be restricted by rules about the context in which they Their use may be restricted by rules about the context in which they
appear or by other rules that apply to the entire label in which they appear or by other rules that apply to the entire label in which they
are to be embedded. For example, any label that contains a character are to be embedded. For example, any label that contains a character
in this category that has a "right-to-left" property must be used in in this category that has a "right-to-left" property must be used in
context with the "Bidi" rules (see [IDNA2008-Bidi]). context with the "Bidi" rules (see [IDNA2008-Bidi]).
The term "PROTOCOL-VALID" is used to stress the fact that the The term "PROTOCOL-VALID" is used to stress the fact that the
presence of a character in this category does not imply that a given presence of a character in this category does not imply that a given
registry need accept registrations containing any of the characters registry need accept registrations containing any of the characters
in the category. Registries are still expected to apply judgment in the category. Registries are still expected to apply judgment
about labels they will accept and to maintain rules consistent with about labels they will accept and to maintain rules consistent with
those judgments (see [IDNA2008-Protocol] and Section 3.3). those judgments (see [IDNA2008-Protocol] and Section 3.3).
Characters that are placed in the "PROTOCOL-VALID" category are never Characters that are placed in the "PROTOCOL-VALID" category are
removed from it unless the code points themselves are removed from expected to never be removed from it or reclassified. While
Unicode (such removal would be inconsistent with the Unicode theoretically characters could be removed from Unicode, such removal
stability principles (see [Unicode51], Appendix F) and hence should would be inconsistent with the Unicode stability principles (see
never occur). [Unicode51], Appendix F) and hence should never occur.
3.1.1.1. Contextual Rules 3.1.1.1. Contextual Rules
Some characters may be unsuitable for general use in IDNs but Some characters may be unsuitable for general use in IDNs but
necessary for the plausible support of some scripts. The two most necessary for the plausible support of some scripts. The two most
commonly-cited examples are the zero-width joiner and non-joiner commonly-cited examples are the zero-width joiner and non-joiner
characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for
unambiguous labels may require that other characters be restricted to unambiguous labels may require that other characters be restricted to
particular contexts. For example, the ASCII hyphen is not permitted particular contexts. For example, the ASCII hyphen is not permitted
to start or end a label, whether that label contains non-ASCII to start or end a label, whether that label contains non-ASCII
skipping to change at page 11, line 13 skipping to change at page 11, line 28
most scripts but affect format or presentation in a few others or most scripts but affect format or presentation in a few others or
because they are combining characters that are safe for use only in because they are combining characters that are safe for use only in
conjunction with particular characters or scripts. In order to conjunction with particular characters or scripts. In order to
permit them to be used at all, they are specially identified as permit them to be used at all, they are specially identified as
"CONTEXTUAL RULE REQUIRED" and, when adequately understood, "CONTEXTUAL RULE REQUIRED" and, when adequately understood,
associated with a rule. In addition, the rule will define whether it associated with a rule. In addition, the rule will define whether it
is to be applied on lookup as well as registration. A distinction is is to be applied on lookup as well as registration. A distinction is
made between characters that indicate or prohibit joining (known as made between characters that indicate or prohibit joining (known as
"CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring
contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the
former are fully tested at lookup time. former require full testing at lookup time.
3.1.1.2. Rules and Their Application 3.1.1.2. Rules and Their Application
The actual rules may be present or absent. If present, they may have The actual rules may be present or absent. If present, they may have
values of "True" (character may be used in any position in any values of "True" (character may be used in any position in any
label), "False" (character may not be used in any label), or may be a label), "False" (character may not be used in any label), or may be a
set of procedural rules that specify the context in which the set of procedural rules that specify the context in which the
character is permitted. character is permitted.
Examples of descriptions of typical rules, stated informally and in Examples of descriptions of typical rules, stated informally and in
skipping to change at page 11, line 41 skipping to change at page 12, line 7
version of the tables. Characters associated with null rules are not version of the tables. Characters associated with null rules are not
permitted to appear in putative labels for either registration or permitted to appear in putative labels for either registration or
lookup. Of course, a later version of the tables might contain a lookup. Of course, a later version of the tables might contain a
non-null rule. non-null rule.
The description of the syntax of the rules, and the rules themselves, The description of the syntax of the rules, and the rules themselves,
appears in [IDNA2008-Tables]. appears in [IDNA2008-Tables].
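The possible states of a contextual rule (absent or null, "True", "False", or procedural) can be modeled as in the sketch below. This is an illustration only: the data structure, the function names, and the sample table are hypothetical, and the actual rule syntax and rules are those of [IDNA2008-Tables]. The hyphen example borrows the start-or-end restriction mentioned in Section 3.1.1.1 purely to show what a procedural rule looks like.

```python
from typing import Callable, Dict, Optional, Union

# A rule is either a fixed truth value or a procedure that inspects the
# label and the position of the character within it.
Rule = Union[bool, Callable[[str, int], bool]]


def not_at_edges(label: str, pos: int) -> bool:
    """Illustrative procedural rule: character may not start or end a label."""
    return 0 < pos < len(label) - 1


def rule_permits(rules: Dict[str, Optional[Rule]], label: str, pos: int) -> bool:
    """Apply the rule, if any, for the character at label[pos]."""
    rule = rules.get(label[pos])
    if rule is None:
        # Absent or null rule: the character is not permitted for
        # registration or lookup under this version of the tables.
        return False
    if isinstance(rule, bool):
        return rule
    return rule(label, pos)


# Hypothetical table; real entries come from the IANA registry.
sample_rules: Dict[str, Optional[Rule]] = {"-": not_at_edges, "\u200c": None}

print(rule_permits(sample_rules, "a-b", 1))
print(rule_permits(sample_rules, "-ab", 0))
print(rule_permits(sample_rules, "a\u200cb", 1))
```

Treating a null rule as "not permitted" keeps older implementations safe: a later version of the tables can supply a non-null rule without changing the meaning of labels that were already valid.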
3.1.2. DISALLOWED 3.1.2. DISALLOWED
Some characters are sufficiently problematic for use in IDNs that Some characters are inappropriate for use in IDNs and are thus
they should be excluded for both registration and lookup (i.e., IDNA- excluded for both registration and lookup (i.e., IDNA-conforming
conforming applications performing name lookup should verify that applications performing name lookup should verify that these
these characters are absent; if they are present, the label strings characters are absent; if they are present, the label strings should
should be rejected rather than converted to A-labels and looked up). be rejected rather than converted to A-labels and looked up). Some of
these characters are problematic for use in IDNs (such as the
FRACTION SLASH character, U+2044), while some of them (such as the
various HEART symbols, e.g., U+2665, U+2661, and U+2765, see
Section 7.6) simply fall outside the conventions for typical
identifiers (basically letters and numbers).
Of course, this category would include code points that had been Of course, this category would include code points that had been
removed entirely from Unicode should such removals ever occur. removed entirely from Unicode should such removals ever occur.
Characters that are placed in the "DISALLOWED" category are expected Characters that are placed in the "DISALLOWED" category are expected
to never be removed from it or reclassified. If a character is to never be removed from it or reclassified. If a character is
classified as "DISALLOWED" in error and the error is sufficiently classified as "DISALLOWED" in error and the error is sufficiently
problematic, the only recourse would be either to introduce a new problematic, the only recourse would be either to introduce a new
code point into Unicode and classify it as "PROTOCOL-VALID" or for code point into Unicode and classify it as "PROTOCOL-VALID" or for
the IETF to accept the considerable costs of an incompatible change the IETF to accept the considerable costs of an incompatible change
skipping to change at page 12, line 31 skipping to change at page 12, line 50
mapped to another character by Unicode casefolding. mapped to another character by Unicode casefolding.
o The character is a symbol or punctuation form or, more generally, o The character is a symbol or punctuation form or, more generally,
something that is not a letter, digit, or a mark that is used to something that is not a letter, digit, or a mark that is used to
form a letter or digit. form a letter or digit.
3.1.3. UNASSIGNED 3.1.3. UNASSIGNED
For convenience in processing and table-building, code points that do For convenience in processing and table-building, code points that do
not have assigned values in a given version of Unicode are treated as not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points MUST belonging to a special UNASSIGNED category. Such code points are
NOT appear in labels to be registered or looked up. The category prohibited in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories (at which point, they are classified into one of the other categories
as appropriate). as appropriate).
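As an illustration only, the UNASSIGNED test described above can be approximated in Python with the standard unicodedata module. This is a sketch under stated assumptions, not the procedure defined in [IDNA2008-Tables]: it assumes that Unicode general category "Cn" is an adequate stand-in for UNASSIGNED, the function names are hypothetical, and the answer depends on the Unicode version bundled with the interpreter. Note that "Cn" also covers noncharacters, which the IDNA tables classify separately.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    """Approximate the UNASSIGNED category (assumption: general category Cn)."""
    return unicodedata.category(ch) == "Cn"


def first_unassigned(label: str):
    """Return the first code point in the label that looks unassigned, or None."""
    for ch in label:
        if is_unassigned(ch):
            return ch
    return None
```

A caller that finds such a code point would reject the label for both registration and lookup. Because the outcome changes when the local Unicode tables are upgraded, the same label may be accepted later, which is the property that distinguishes UNASSIGNED from DISALLOWED.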
3.2. Registration Policy 3.2. Registration Policy
While these recommendations cannot and should not define registry While these recommendations cannot and should not define registry
policies, registries SHOULD develop and apply additional restrictions policies, registries should develop and apply additional restrictions
to reduce confusion and other problems. For example, it is generally to reduce confusion and other problems. For example, it is generally
believed that labels containing characters from more than one script believed that labels containing characters from more than one script
are a bad practice although there may be some important exceptions to are a bad practice although there may be some important exceptions to
that principle. Some registries may choose to restrict registrations that principle. Some registries may choose to restrict registrations
to characters drawn from a very small number of scripts. For many to characters drawn from a very small number of scripts. For many
scripts, the use of variant techniques such as those described in scripts, the use of variant techniques such as those described in
RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for
Chinese by the tables described in RFC 4713 [RFC4713] may be helpful Chinese by the tables described in RFC 4713 [RFC4713] may be helpful
in reducing problems that might be perceived by users. in reducing problems that might be perceived by users.
skipping to change at page 15, line 26 skipping to change at page 15, line 43
Applications can accept domain names using any character set or sets Applications can accept domain names using any character set or sets
desired by the application developer, specified by the operating desired by the application developer, specified by the operating
system, or dictated by other constraints, and can display domain system, or dictated by other constraints, and can display domain
names in any character set or character coding system. That is, the names in any character set or character coding system. That is, the
IDNA protocol does not affect the interface between users and IDNA protocol does not affect the interface between users and
applications. applications.
An IDNA-aware application can accept and display internationalized An IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s) domain names in two formats: the internationalized character set(s)
supported by the application (i.e., an appropriate local supported by the application (i.e., an appropriate local
representation of a U-label), and as an A-label. Applications MAY representation of a U-label), and as an A-label. Applications may
allow the display of A-labels, but are encouraged to not do so except allow the display of A-labels, but are encouraged to not do so except
as an interface for special purposes, possibly for debugging, or to as an interface for special purposes, possibly for debugging, or to
cope with display limitations. In general, they SHOULD allow, but cope with display limitations. In general, they should allow, but
not encourage, user input of that label form. A-labels are opaque not encourage, user input of that label form. A-labels are opaque
and ugly and malicious variations on them are not easily detected by and ugly and malicious variations on them are not easily detected by
users. Where possible, they should thus only be exposed to users and users. Where possible, they should thus only be exposed to users and
in contexts in which they are absolutely needed. Because IDN labels in contexts in which they are absolutely needed. Because IDN labels
can be rendered either as A-labels or U-labels, the application may can be rendered either as A-labels or U-labels, the application may
reasonably have an option for the user to select the preferred method reasonably have an option for the user to select the preferred method
of display; if it does, rendering the U-label should normally be the of display; if it does, rendering the U-label should normally be the
default. default.
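The relationship between the two display forms can be sketched with the "punycode" codec from the Python standard library. This is a minimal illustration, not an IDNA2008 implementation: the helper names and the ACE_PREFIX constant are hypothetical, and the codec performs only the Punycode transformation, with none of the character or whole-label validation that the protocol requires.

```python
ACE_PREFIX = "xn--"  # prefix that marks an A-label


def to_a_label(u_label: str) -> str:
    """Convert a Unicode label to its ASCII-compatible form (no validation)."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are left untouched
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an 'xn--' label back to Unicode for display (no validation)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
```

An application following the guidance above would normally show the result of to_u_label() to the user and fall back to the A-label only when the local character set cannot represent the U-label in its entirety.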
Domain names are often stored and transported in many places. For Domain names are often stored and transported in many places. For
example, they are part of documents such as mail messages and web example, they are part of documents such as mail messages and web
pages. They are transported in many parts of many protocols, such as pages. They are transported in many parts of many protocols, such as
both the control commands of SMTP and the associated message body both the control commands of SMTP and the associated message body
parts, and in the headers and the body content in HTTP. It is parts, and in the headers and the body content in HTTP. It is
important to remember that domain names appear both in domain name important to remember that domain names appear both in domain name
slots and in the content that is passed over protocols. slots and in the content that is passed over protocols.
In protocols and document formats that define how to handle In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels MUST protocol or document format only allows one charset, the labels must
be given in that charset. Of course, not all charsets can properly be given in that charset. Of course, not all charsets can properly
represent all labels. If a U-label cannot be displayed in its represent all labels. If a U-label cannot be displayed in its
entirety, the only choice (without loss of information) may be to entirety, the only choice (without loss of information) may be to
display the A-label. display the A-label.
In any place where a protocol or document format allows transmission In any place where a protocol or document format allows transmission
of the characters in internationalized labels, labels SHOULD be of the characters in internationalized labels, labels should be
transmitted using whatever character encoding and escape mechanism transmitted using whatever character encoding and escape mechanism
the protocol or document format uses at that place. This provision the protocol or document format uses at that place. This provision
is intended to prevent situations in which, e.g., UTF-8 domain names is intended to prevent situations in which, e.g., UTF-8 domain names
appear embedded in text that is otherwise in some other character appear embedded in text that is otherwise in some other character
coding. coding.
All protocols that use domain name slots already have the capacity All protocols that use domain name slots already have the capacity
for handling domain names in the ASCII charset. Thus, A-labels can for handling domain names in the ASCII charset. Thus, A-labels can
inherently be handled by those protocols. inherently be handled by those protocols.
skipping to change at page 22, line 8 skipping to change at page 22, line 26
welcome.]] welcome.]]
As discussed elsewhere in this document, the IDNA2008 model removes As discussed elsewhere in this document, the IDNA2008 model removes
all of these mappings and interpretations, including the equivalence all of these mappings and interpretations, including the equivalence
of different forms of dots, from the protocol, discouraging such of different forms of dots, from the protocol, discouraging such
mappings and leaving them, when necessary, to local processing. This mappings and leaving them, when necessary, to local processing. This
should not be taken to imply that local processing is optional or can should not be taken to imply that local processing is optional or can
be avoided entirely, even if doing so might have been desirable in a be avoided entirely, even if doing so might have been desirable in a
world without IDNA2003 IDNs in files and archives. Instead, unless world without IDNA2003 IDNs in files and archives. Instead, unless
the program context is such that it is known that any IDNs that the program context is such that it is known that any IDNs that
appear will be either U-labels or A-labels, or that other forms can appear will contain either U-label or A-label forms, or that other
safely be rejected, some local processing of apparent domain name forms can safely be rejected, some local processing of apparent
strings will be required, both to maintain compatibility with domain name strings will be required, both to maintain compatibility
IDNA2003 and to prevent user astonishment. Such local processing, with IDNA2003 and to prevent user astonishment. Such local
while not specified in this document or the associated ones, will processing, while not specified in this document or the associated
generally take one of two forms: ones, will generally take one of two forms:
o Generic Preprocessing. o Generic Preprocessing.
When the context in which the program or system that processes When the context in which the program or system that processes
domain names operates is global, a reasonable balance must be domain names operates is global, a reasonable balance must be
found that is sensitive to the broad range of local needs and found that is sensitive to the broad range of local needs and
assumptions while, at the same time, not sacrificing the needs of assumptions while, at the same time, not sacrificing the needs of
one language, script, or user population to those of another. one language, script, or user population to those of another.
For this case, the best practice will usually be to apply NFKC and For this case, the best practice will usually be to apply NFKC and
case-mapping (or, perhaps better yet, Stringprep itself), plus case-mapping (or, perhaps better yet, Stringprep itself), plus
skipping to change at page 25, line 31 skipping to change at page 25, line 49
administrators have been expected to verify that names meet administrators have been expected to verify that names meet
"hostname" [RFC0952] where necessary for the expected applications. "hostname" [RFC0952] where necessary for the expected applications.
Later addition of special service location formats [RFC2782] imposed Later addition of special service location formats [RFC2782] imposed
new requirements on zone administrators for the use of labels that new requirements on zone administrators for the use of labels that
conform to the requirements of those formats. For zones that will conform to the requirements of those formats. For zones that will
contain IDNs, support for Unicode version-independence requires contain IDNs, support for Unicode version-independence requires
restrictions on all strings placed in the zone. In particular, for restrictions on all strings placed in the zone. In particular, for
such zones: such zones:
o Any label that appears to be an A-label, i.e., any label that o Any label that appears to be an A-label, i.e., any label that
starts with "xn--", MUST be IDNA-valid, i.e., it MUST be a valid starts with "xn--", must be IDNA-valid, i.e., it must be a valid
A-label, as discussed in Section 2 above. A-label, as discussed in Section 2 above.
o The Unicode tables (i.e., tables of code points, character o The Unicode tables (i.e., tables of code points, character
classes, and properties) and IDNA tables (i.e., tables of classes, and properties) and IDNA tables (i.e., tables of
contextual rules such as those that appear in the Tables contextual rules such as those that appear in the Tables
document), MUST be consistent on the systems performing or document), must be consistent on the systems performing or
validating labels to be registered. Note that this does not validating labels to be registered. Note that this does not
require that tables reflect the latest version of Unicode, only require that tables reflect the latest version of Unicode, only
that all tables used on a given system are consistent with each that all tables used on a given system are consistent with each
other. other.
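One necessary (but far from sufficient) part of the first requirement can be sketched as a round-trip test: a label that starts with "xn--" should decode to a string containing at least one non-ASCII character and re-encode to exactly the same label. The function name is hypothetical, and the sketch deliberately omits the per-character and contextual checks that full IDNA validity requires; it relies only on the "punycode" codec in the Python standard library.

```python
def passes_round_trip(label: str) -> bool:
    """Partial A-label check: 'xn--' prefix plus a clean Punycode round trip."""
    lowered = label.lower()
    if not lowered.startswith("xn--"):
        return False
    try:
        decoded = lowered[4:].encode("ascii").decode("punycode")
        if decoded.isascii():
            # An A-label must encode at least one non-ASCII character.
            return False
        reencoded = "xn--" + decoded.encode("punycode").decode("ascii")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    return reencoded == lowered
```

A zone administrator would treat a False result as grounds for rejecting the label, and a True result only as permission to continue with the remaining checks.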
Under this model, a registry (or entity communicating with a registry Under this model, a registry (or entity communicating with a registry
to accomplish name registrations) will need to update its tables -- to accomplish name registrations) will need to update its tables --
both the Unicode-associated tables and the tables of permitted IDN both the Unicode-associated tables and the tables of permitted IDN
characters -- to enable a new script or other set of new characters. characters -- to enable a new script or other set of new characters.
It will not be affected by newer versions of Unicode, or newly- It will not be affected by newer versions of Unicode, or newly-
skipping to change at page 26, line 10 skipping to change at page 26, line 30
registrations. The zone administrator is also responsible -- under registrations. The zone administrator is also responsible -- under
the protocol and to registrants and users -- for both checking as the protocol and to registrants and users -- for both checking as
required by the protocol and verification that whatever policies it required by the protocol and verification that whatever policies it
develops are complied with, whether those policies are for minimizing develops are complied with, whether those policies are for minimizing
risks due to confusable characters and sequences, for preserving risks due to confusable characters and sequences, for preserving
language or script integrity, or for other purposes. Those checking language or script integrity, or for other purposes. Those checking
and verification procedures are more extensive than those that are and verification procedures are more extensive than those that are
expected of application systems that look names up. expected of application systems that look names up.
Systems looking up or resolving DNS labels, especially IDN DNS Systems looking up or resolving DNS labels, especially IDN DNS
labels, MUST be able to assume that applicable registration rules labels, must be able to assume that applicable registration rules
were followed for names entered into the DNS. were followed for names entered into the DNS.
7.1.3. Labels in Lookup 7.1.3. Labels in Lookup
Anyone looking up a label in a DNS zone is required to Anyone looking up a label in a DNS zone is required to
o Maintain a consistent set of tables, as discussed above. As with o Maintain a consistent set of tables, as discussed above. As with
registration, the tables need not reflect the latest version of registration, the tables need not reflect the latest version of
Unicode but they must be consistent. Unicode but they must be consistent.
skipping to change at page 26, line 36 skipping to change at page 27, line 8
o Validate the label itself for conformance with a small number of o Validate the label itself for conformance with a small number of
whole-label rules, notably verifying that there are no leading whole-label rules, notably verifying that there are no leading
combining marks, that the "bidi" conditions are met if right to combining marks, that the "bidi" conditions are met if right to
left characters appear, that any required contextual rules are left characters appear, that any required contextual rules are
available and that, if such rules are associated with Joiner available and that, if such rules are associated with Joiner
Controls, they are tested. Controls, they are tested.
o Avoid validating other contextual rules about characters, o Avoid validating other contextual rules about characters,
including mixed-script label prohibitions, although such rules may including mixed-script label prohibitions, although such rules may
be used to influence presentation decisions in the user interface. be used to influence presentation decisions in the user interface.
[[anchor19: Check this, and all similar statements, against
Protocol when that is finished.]]
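As a small illustration of one of the whole-label rules listed above, the leading-combining-mark test can be written with the Python unicodedata module. The function name is hypothetical, and the sketch assumes that "combining mark" means any code point whose general category begins with "M" (Mn, Mc, or Me); the bidi and contextual-rule checks are not shown.

```python
import unicodedata


def has_leading_combining_mark(label: str) -> bool:
    """True if the first code point of the label is a combining mark."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

A lookup application would refuse to look up a label for which this returns True, while leaving registry-level policies such as mixed-script restrictions to the zone.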
By avoiding applying its own interpretation of which labels are valid By avoiding applying its own interpretation of which labels are valid
as a means of rejecting lookup attempts, the lookup application as a means of rejecting lookup attempts, the lookup application
becomes less sensitive to version incompatibilities with the becomes less sensitive to version incompatibilities with the
particular zone registry associated with the domain name. particular zone registry associated with the domain name.
An application or client that processes names according to this An application or client that processes names according to this
protocol and then resolves them in the DNS will be able to locate any protocol and then resolves them in the DNS will be able to locate any
name that is validly registered, as long as its version of the name that is validly registered, as long as its version of the
Unicode-associated tables is sufficiently up-to-date to interpret all Unicode-associated tables is sufficiently up-to-date to interpret all
of the characters in the label. Messages to users should distinguish of the characters in the label. Messages to users should distinguish
between "label contains an unallocated code point" and other types of between "label contains an unallocated code point" and other types of
lookup failures. A failure on the basis of an old version of Unicode lookup failures. A failure on the basis of an old version of Unicode
may lead the user to a desire to upgrade to a newer version, but will may lead the user to a desire to upgrade to a newer version, but will
have no other ill effects (this is consistent with behavior in the have no other ill effects (this is consistent with behavior in the
transition to the DNS when some hosts could not yet handle some forms transition to the DNS when some hosts could not yet handle some forms
of names or record types). of names or record types).
7.2. Changes in Character Interpretations 7.2. Changes in Character Interpretations
[[anchor19: Note in Draft: This subsection is completely new in [[anchor20: Note in Draft: This subsection is completely new in
version -04 of this document. It could almost certainly use version -04 of this document. It could almost certainly use
improvement. It also contains some material that is redundant with improvement. It also contains some material that is redundant with
material in other sections. I have not tried to remove that material material in other sections. I have not tried to remove that material
and will not do so until the WG concludes that this section is and will not do so until the WG concludes that this section is
relatively stable, but would appreciate help in identifying what relatively stable, but would appreciate help in identifying what
should be removed or how this might be enhanced to contain more of should be removed or how this might be enhanced to contain more of
that other material. --JcK]] that other material. --JcK]]
In those scripts that make case distinctions, there are a few In those scripts that make case distinctions, there are a few
characters for which an obvious and unique upper case character has characters for which an obvious and unique upper case character has
skipping to change at page 31, line 40 skipping to change at page 32, line 11
new ones would first process a putative label under the IDNA2008 new ones would first process a putative label under the IDNA2008
rules and try to look it up and then, if it were not found, would rules and try to look it up and then, if it were not found, would
process the label under IDNA2003 rules and look it up again. That process the label under IDNA2003 rules and look it up again. That
process could significantly slow down all processing that involved process could significantly slow down all processing that involved
IDNs in the DNS especially since, in principle, a fully-qualified IDNs in the DNS especially since, in principle, a fully-qualified
name could contain a mixture of labels that were registered with the name could contain a mixture of labels that were registered with the
old and new prefixes, a situation that would make the use of DNS old and new prefixes, a situation that would make the use of DNS
caching very difficult. In addition, looking up the same input caching very difficult. In addition, looking up the same input
string as two separate A-labels would create some potential for string as two separate A-labels would create some potential for
confusion and attacks, since they could, in principle, map to confusion and attacks, since they could, in principle, map to
different targets and then resolve to different DNS label nodes. different targets and then resolve to different entries in the DNS.
Consequently, a prefix change is to be avoided if at all possible, Consequently, a prefix change is to be avoided if at all possible,
even if it means accepting some IDNA2003 decisions about character even if it means accepting some IDNA2003 decisions about character
distinctions as irreversible and/or giving special treatment to edge distinctions as irreversible and/or giving special treatment to edge
cases. cases.
7.5. Stringprep Changes and Compatibility 7.5. Stringprep Changes and Compatibility
The Nameprep [RFC3491] specification, a key part of IDNA2003, is a The Nameprep [RFC3491] specification, a key part of IDNA2003, is a
profile of Stringprep [RFC3454]. While Nameprep is a Stringprep profile of Stringprep [RFC3454]. While Nameprep is a Stringprep
skipping to change at page 33, line 9 skipping to change at page 33, line 29
there are no uniform conventions for naming; variations such as there are no uniform conventions for naming; variations such as
outline, solid, and shaded forms may or may not exist; and so on. outline, solid, and shaded forms may or may not exist; and so on.
As just one example, consider a "heart" symbol as it might appear As just one example, consider a "heart" symbol as it might appear
in a logo that might be read as "I love...". While the user might in a logo that might be read as "I love...". While the user might
read such a logo as "I love..." or "I heart...", considerable read such a logo as "I love..." or "I heart...", considerable
knowledge of the coding distinctions made in Unicode is needed to knowledge of the coding distinctions made in Unicode is needed to
know that there is more than one "heart" character (e.g., U+2665, know that there is more than one "heart" character (e.g., U+2665,
U+2661, and U+2765) and how to describe it. These issues are of U+2661, and U+2765) and how to describe it. These issues are of
particular importance if strings are expected to be understood or particular importance if strings are expected to be understood or
transcribed by the listener after being read out loud. transcribed by the listener after being read out loud.
[[anchor20: The above paragraph remains controversial as to [[anchor21: The above paragraph remains controversial as to
whether it is valid. The WG will need to make a decision if this whether it is valid. The WG will need to make a decision if this
section is not dropped entirely.]] section is not dropped entirely.]]
o As a simplified example of this, assume one wanted to use a o As a simplified example of this, assume one wanted to use a
"heart" or "star" symbol in a label. This is problematic because "heart" or "star" symbol in a label. This is problematic because
those names are ambiguous in the Unicode system of naming (the those names are ambiguous in the Unicode system of naming (the
actual Unicode names require far more qualification). A user or actual Unicode names require far more qualification). A user or
would-be registrant has no way to know -- absent careful study of would-be registrant has no way to know -- absent careful study of
the code tables -- whether it is ambiguous (e.g., where there are the code tables -- whether it is ambiguous (e.g., where there are
multiple "heart" characters) or not. Conversely, the user seeing multiple "heart" characters) or not. Conversely, the user seeing
skipping to change at page 33, line 32 skipping to change at page 34, line 4
"black heart", or as any of the other examples below. "black heart", or as any of the other examples below.
o The actual situation is even worse than this. There is no o The actual situation is even worse than this. There is no
possible way for a normal, casual, user to tell the difference possible way for a normal, casual, user to tell the difference
between the hearts of U+2665 and U+2765 and the stars of U+2606 between the hearts of U+2665 and U+2765 and the stars of U+2606
and U+2729 without somehow knowing to look for a and U+2729 without somehow knowing to look for a
distinction. We have a white heart (U+2661) and a few black hearts. distinction. We have a white heart (U+2661) and a few black hearts.
Consequently, describing a label as containing a heart is hopelessly Consequently, describing a label as containing a heart is hopelessly
ambiguous: we can only know that it contains one of several ambiguous: we can only know that it contains one of several
characters that look like hearts or have "heart" in their names. characters that look like hearts or have "heart" in their names.
In cities where "Square" is a popular part of a location name, one In cities where "Square" is a popular part of a location name, one
might well want to use a square symbol in a label as well and might well want to use a square symbol in a label as well and
there are far more squares of various flavors in Unicode than there are far more squares of various flavors in Unicode than
there are hearts or stars. there are hearts or stars.
o The consequence of these ambiguities of description and o The consequence of these ambiguities of description and
dependencies on distinctions that were, or were not, made in dependencies on distinctions that were, or were not, made in
Unicode codings is that symbols are a very poor basis for reliable Unicode codings is that symbols are a very poor basis for reliable
communication. Consistent with this conclusion, the Unicode communication. Consistent with this conclusion, the Unicode
standard recommends that strings used in identifiers not contain standard recommends that strings used in identifiers not contain
symbols or punctuation [Unicode-UAX31]. Of course, these symbols or punctuation [Unicode-UAX31]. Of course, these
difficulties with symbols do not arise with actual pictographic difficulties with symbols do not arise with actual pictographic
languages and scripts which would be treated like any other languages and scripts which would be treated like any other
language characters; the two should not be confused. language characters; the two should not be confused.
7.7. Migration Between Unicode Versions: Unassigned Code Points 7.7. Migration Between Unicode Versions: Unassigned Code Points
In IDNA2003, labels containing unassigned code points are looked up In IDNA2003, labels containing unassigned code points are looked up
on the theory that, if they appear in labels and can be mapped and on the assumption that, if they appear in labels and can be mapped
then resolved, the relevant standards must have changed and the and then resolved, the relevant standards must have changed and the
registry has properly allocated only assigned values. registry has properly allocated only assigned values.
In IDNA2008, strings containing unassigned code points MUST NOT be In IDNA2008, strings containing unassigned code points must not be
either looked up or registered. There are several reasons for this, either looked up or registered. There are several reasons for this,
with the most important ones being: with the most important ones being:
o It cannot be known with sufficient reliability in advance that a o It cannot be known with sufficient reliability in advance that a
code point that was not previously assigned will not be assigned code point that was not previously assigned will not be assigned
to a compatibility character. In IDNA2003, since there is no to a compatibility character. In IDNA2003, since there is no
direct dependency on NFKC (Stringprep's tables are based on NFKC, direct dependency on NFKC (Stringprep's tables are based on NFKC,
but IDNA2003 depends only on Stringprep), allocation of a but IDNA2003 depends only on Stringprep), allocation of a
compatibility character might produce some odd situations, but it compatibility character might produce some odd situations, but it
would not be a problem. In IDNA2008, where compatibility would not be a problem. In IDNA2008, where compatibility
skipping to change at page 35, line 45 skipping to change at page 36, line 18
Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary
Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel,
Michel Suignard, and Ken Whistler. We express our thanks to Google Michel Suignard, and Ken Whistler. We express our thanks to Google
for support of that meeting and to the participants for their for support of that meeting and to the participants for their
contributions. contributions.
Useful comments and text on the WG versions of the draft were Useful comments and text on the WG versions of the draft were
received from many participants in the IETF "IDNABIS" WG and a number received from many participants in the IETF "IDNABIS" WG and a number
of document changes resulted from mailing list discussions made by of document changes resulted from mailing list discussions made by
that group. Marcos Sanz provided specific analysis and suggestions that group. Marcos Sanz provided specific analysis and suggestions
that were exceptionally helpful in refining the text, as did Mark that were exceptionally helpful in refining the text, as did Vint
Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan. Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan.
9. Contributors 9. Contributors
While the listed editor held the pen, the core of this document and While the listed editor held the pen, the core of this document and
the initial WG version represents the joint work and conclusions of the initial WG version represents the joint work and conclusions of
an ad hoc design team consisting of the editor and, in alphabetic an ad hoc design team consisting of the editor and, in alphabetic
order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.
In addition, there were many specific contributions and helpful In addition, there were many specific contributions and helpful
comments from those listed in the Acknowledgments section and others comments from those listed in the Acknowledgments section and others
who have contributed to the development and use of the IDNA who have contributed to the development and use of the IDNA
skipping to change at page 42, line 35 skipping to change at page 43, line 4
the IETF does not normally annotate individual sections of documents the IETF does not normally annotate individual sections of documents
with whether they are normative or not, concerns that we don't know with whether they are normative or not, concerns that we don't know
which is which, claims that some material is normative that would be which is which, claims that some material is normative that would be
problematic if so classified, etc., argue that we should at least be problematic if so classified, etc., argue that we should at least be
able to have a clear discussion on the subject. able to have a clear discussion on the subject.
Two annotations have been applied to sections that might reasonably Two annotations have been applied to sections that might reasonably
be considered normative. One annotation is based on the list of be considered normative. One annotation is based on the list of
sections in Mark Davis's note of 29 September (http:// sections in Mark Davis's note of 29 September (http://
www.alvestrand.no/pipermail/idna-update/2008-September/002667.html). www.alvestrand.no/pipermail/idna-update/2008-September/002667.html).
The other is based on an elaboration of John Klensin's response on 7 The other is based on an elaboration of John Klensin's response on 7
October (http://www.alvestrand.no/pipermail/idna-update/2008-October/ October (http://www.alvestrand.no/pipermail/idna-update/2008-October/
002691.html). These should just be considered two suggestions to 002691.html). These should just be considered two suggestions to
illuminate and, one hopes, advance the Working Group's discussions. illuminate and, one hopes, advance the Working Group's discussions.
Some additional editorial changes have been made, but they are Some additional editorial changes have been made, but they are
basically trivial. In the editor's judgment, it is not possible to basically trivial. In the editor's judgment, it is not possible to
make significantly more progress with this document until the matter make significantly more progress with this document until the matter
of document organization is settled. of document organization is settled.
A.4. Version -04 A.4. Version -04
o Definitional and other normative material moved to new document o Definitional and other normative material moved to new document
(draft-ietf-idnabis-defs). Version -03 annotations removed. (draft-ietf-idnabis-defs). Version -03 annotations removed.
o Material on differences between IDNA2003 and IDNA2003 moved to an o Material on differences between IDNA2003 and IDNA2008 moved to an
appendix in Protocol. appendix in Protocol.
o Material left over from the origins of this document as a o Material left over from the origins of this document as a
preliminary proposal has been removed or rewritten. preliminary proposal has been removed or rewritten.
o Changes made to reflect consensus call results, including removing o Changes made to reflect consensus call results, including removing
several placeholder notes for discussion. several placeholder notes for discussion.
o Added more material, including discussion of historic scripts, to o Added more material, including discussion of historic scripts, to
Section 3.2 on registration policies. Section 3.2 on registration policies.
o Added a new section (Section 7.2) to contain specific discussion o Added a new section (Section 7.2) to contain specific discussion
of handling of characters that are interpreted differently in of handling of characters that are interpreted differently in
input to IDNA2003 and 2008. input to IDNA2003 and 2008.
o Some material, including this section/appendix, rearranged. o Some material, including this section/appendix, rearranged.
A.5. Version -05
o Many small editorial changes, including changes to eliminate the
last vestiges of what appeared to be 2119 language (upper-case
MUST, SHOULD, or MAY) and small adjustments to terminology.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
 End of changes. 48 change blocks. 
75 lines changed or deleted 107 lines changed or added
