< draft-ietf-idnabis-rationale-05.txt   draft-ietf-idnabis-rationale-06.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft November 28, 2008 Internet-Draft December 15, 2008
Intended status: Informational Intended status: Informational
Expires: June 1, 2009 Expires: June 18, 2009
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-05.txt draft-ietf-idnabis-rationale-06.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 1, 2009. This Internet-Draft will expire on June 18, 2009.
Abstract Abstract
Several years have passed since the original protocol for Several years have passed since the original protocol for
Internationalized Domain Names (IDNs) was completed and deployed. Internationalized Domain Names (IDNs) was completed and deployed.
During that time, a number of issues have arisen, including the need During that time, a number of issues have arisen, including the need
to update the system to deal with newer versions of Unicode. Some of to update the system to deal with newer versions of Unicode. Some of
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
skipping to change at page 2, line 15 skipping to change at page 2, line 15
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4
1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5
1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5
1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5
1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7
1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8
2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9
3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9
3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10
3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11
3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 3.1.1.2. Rules and Their Application . . . . . . . . . . . 11
3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 13 Applications . . . . . . . . . . . . . . . . . . . . . . . 13
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14
4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14
4.2. Entry and Display in Applications . . . . . . . . . . . . 15 4.2. Entry and Display in Applications . . . . . . . . . . . . 15
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 16 Alternate Character Forms . . . . . . . . . . . . . . . . 16
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 19
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 20
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20
6. Front-end and User Interface Processing . . . . . . . . . . . 21 6. Front-end and User Interface Processing . . . . . . . . . . . 21
7. Migration from IDNA2003 and Unicode Version Synchronization . 23 7. Migration from IDNA2003 and Unicode Version Synchronization . 24
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24
7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24
7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26
7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27
7.2. Changes in Character Interpretations . . . . . . . . . . . 27 7.2. Changes in Character Interpretations . . . . . . . . . . . 28
7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29
7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31
7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 31
7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32
7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 32
7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32
7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 33
7.7. Migration Between Unicode Versions: Unassigned Code 7.7. Migration Between Unicode Versions: Unassigned Code
Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 36
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 37
10. Internationalization Considerations . . . . . . . . . . . . . 36 10. Internationalization Considerations . . . . . . . . . . . . . 37
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37
11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38
11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38
12. Security Considerations . . . . . . . . . . . . . . . . . . . 37 12. Security Considerations . . . . . . . . . . . . . . . . . . . 38
12.1. General Security Issues with IDNA . . . . . . . . . . . . 37 12.1. General Security Issues with IDNA . . . . . . . . . . . . 38
12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 38
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38
13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38
13.2. Informative References . . . . . . . . . . . . . . . . . . 39 13.2. Informative References . . . . . . . . . . . . . . . . . . 40
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41
A.1. Changes between Version -00 and Version -01 of A.1. Changes between Version -00 and Version -01 of
draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41
A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42
A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42
A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43
A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43 A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 43
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 44
Intellectual Property and Copyright Statements . . . . . . . . . . 45 Intellectual Property and Copyright Statements . . . . . . . . . . 45
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
The original standards for Internationalized Domain Names (IDNs) were The original standards for Internationalized Domain Names (IDNs) were
completed and deployed starting in 2003. Those standards are known completed and deployed starting in 2003. Those standards are known
as Internationalized Domain Names in Applications (IDNA), taken from as Internationalized Domain Names in Applications (IDNA), taken from
the name of the highest level standard within the group, RFC 3490 the name of the highest level standard within the group, RFC 3490
[RFC3490]. After those standards were deployed, a number of issues [RFC3490]. After those standards were deployed, a number of issues
arose that called for a new version of the IDNA protocol and the arose that led to a call for a new version of the IDNA protocol and
associated tables, including a subset of those described in a recent the associated tables, including a subset of those described in a
IAB report [RFC4690] and the need to update the system to deal with recent IAB report [RFC4690] and the need to update the system to deal
newer versions of Unicode. This document further explains the issues with newer versions of Unicode. This document further explains the
that have been encountered when they are important to understanding issues that have been encountered when they are important to
of the revised protocols. It also provides an overview of the new understanding of the revised protocols. It also provides an overview
IDNA model and explanatory material for it. Additional explanatory of the new IDNA model and explanatory material for it. Additional
material for the specific components of the proposals appears with explanatory material for the specific components of the proposals
the associated documents. appears with the associated documents.
A good deal of the background material that appeared in RFC 3490 A good deal of the background material that appeared in RFC 3490
[RFC3490] has been removed from this update. That material is either [RFC3490] has been removed from this update. That material is either
of historical interest only or has been covered from a more recent of historical interest only or has been covered from a more recent
perspective in RFC 4690 [RFC4690]. perspective in RFC 4690 [RFC4690].
This document is not normative. The information it provides is This document is not normative. The information it provides is
intended to make the rules, tables, and protocol easier to understand intended to make the rules, tables, and protocol easier to understand
and to provide overview information and suggestions for zone and to provide overview information and suggestions for zone
administrators and others who need to make policy, deployment, and administrators and others who need to make policy, deployment, and
skipping to change at page 5, line 19 skipping to change at page 5, line 19
2003, i.e., those commonly known as the IDNA base specification 2003, i.e., those commonly known as the IDNA base specification
[RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep
[RFC3454]. In this document, those names are used to refer, [RFC3454]. In this document, those names are used to refer,
conceptually, to the individual documents, with the base IDNA conceptually, to the individual documents, with the base IDNA
specification called just "IDNA". specification called just "IDNA".
The term "IDNA2008" is used to refer to a new version of IDNA as The term "IDNA2008" is used to refer to a new version of IDNA as
described in this document and in the documents described in the described in this document and in the documents described in the
document listing of [IDNA2008-Defs]. IDNA2008 is not dependent on document listing of [IDNA2008-Defs]. IDNA2008 is not dependent on
any of the IDNA2003 specifications other than the one for Punycode any of the IDNA2003 specifications other than the one for Punycode
encoding. References to "these specifications" are to the entire encoding. References to "these specifications" or "these documents"
set. are to the entire IDNA2008 set.
1.3.2. DNS "Name" Terminology 1.3.2. DNS "Name" Terminology
These documents depart from historical DNS terminology and usage in These documents depart from historical DNS terminology and usage in
one important respect. Over the years, the community has talked very one important respect. Over the years, the community has talked very
casually about "names" in the DNS, beginning with calling it "the casually about "names" in the DNS, beginning with calling it "the
domain name system". That terminology is fine in the very precise domain name system". That terminology is fine in the very precise
sense that the identifiers of the DNS do provide names for objects sense that the identifiers of the DNS do provide names for objects
and addresses. But, in the context of IDNs, the term has introduced and addresses. But, in the context of IDNs, the term has introduced
some confusion, confusion that has increased further as people have some confusion, confusion that has increased further as people have
skipping to change at page 5, line 50 skipping to change at page 5, line 50
possible for them to be "words". possible for them to be "words".
This distinction is important because the reasonable goal of an IDN This distinction is important because the reasonable goal of an IDN
effort is not to be able to write the great Klingon (or language of effort is not to be able to write the great Klingon (or language of
one's choice) novel in DNS labels but to be able to form a usefully one's choice) novel in DNS labels but to be able to form a usefully
broad range of mnemonics in ways that are as natural as possible in a broad range of mnemonics in ways that are as natural as possible in a
very broad range of scripts. very broad range of scripts.
1.3.3. New Terminology and Restrictions 1.3.3. New Terminology and Restrictions
These documents [IDNA2008-Defs] introduce new terminology, and These documents introduce new terminology, and precise definitions,
precise definitions, for the terms "U-labels", "A-labels", labels for the terms "U-labels", "A-labels", labels that are "IDNA-valid",
that are "IDNA-valid", and an "LDH-label" (differing from an LDH- and an "LDH-label" (differing from an LDH-conformant label or fully-
conformant label or fully-qualified domain name). The also introduce qualified domain name). They also introduce a restriction, for IDNA-
a restriction, for IDNA-conformant applications and DNS zones in conformant applications and DNS zones in which IDNA is used, on
which IDNA is used, on strings used as labels that contain "--" in strings used as labels that contain "--" in the third and fourth
the third and fourth positions, essentially requiring that such positions, essentially requiring that such strings be IDNA-valid.
strings be IDNA-valid. This restriction on strings containing "--" This restriction on strings containing "--" is required for three
is required for three reasons: reasons:
o to prevent confusion with pre-IDNA coding forms; o to prevent confusion with pre-IDNA coding forms;
o to permit future extensions that would require changing the o to permit future extensions that would require changing the
prefix, no matter how unlikely those might be (see Section 7.4); prefix, no matter how unlikely those might be (see Section 7.4);
and and
o to reduce the opportunities for attacks via the Punycode encoding o to reduce the opportunities for attacks via the Punycode encoding
algorithm itself. algorithm itself.
Figure 1 of the Definitions Document [IDNA2008-Defs] illustrates the
terminology used by IDNA for various types of labels and strings and
their relationship.
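The restriction on labels with "--" in the third and fourth positions can be illustrated with a minimal sketch. The function names below are hypothetical and appear in none of the IDNA2008 documents. The check for the "xn--" prefix only identifies a candidate A-label; the full IDNA-validity test defined in the protocol documents is not attempted here.

```python
def has_reserved_hyphens(label: str) -> bool:
    """Return True if the label has "--" in the third and fourth positions."""
    return len(label) >= 4 and label[2:4] == "--"


def classify_label(label: str) -> str:
    """Roughly sort a label according to the "--" restriction.

    Hypothetical helper: the returned strings are illustrative only.
    """
    if not has_reserved_hyphens(label):
        # The restriction does not apply to this label.
        return "unrestricted"
    if label[:4].lower() == "xn--":
        # Carries the IDNA prefix; it would still need the full
        # IDNA-validity check, which this sketch does not perform.
        return "candidate-a-label"
    # "--" in positions three and four without the IDNA prefix.
    return "rejected"


print(classify_label("example"))
print(classify_label("xn--bcher-kva"))
print(classify_label("ab--cd"))
```

A label such as "ab--cd" is rejected in this sketch because it occupies the space reserved for pre-IDNA coding forms and possible future prefixes.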
1.4. Objectives 1.4. Objectives
The intent of the IDNA revision effort, and hence of this document The intent of the IDNA revision effort, and hence of this document
and the associated ones, is to increase the usability and and the associated ones, is to increase the usability and
effectiveness of internationalized domain names (IDNs) while effectiveness of internationalized domain names (IDNs) while
preserving or strengthening the integrity of references that use preserving or strengthening the integrity of references that use
them. The original "hostname" character definitions (see, e.g., them. The original "hostname" character definitions (see, e.g.,
[RFC0810]) struck a balance between the creation of useful mnemonics [RFC0810]) struck a balance between the creation of useful mnemonics
and the introduction of parsing problems or general confusion in the and the introduction of parsing problems or general confusion in the
contexts in which domain names are used. The objective of IDNA2008 contexts in which domain names are used. The objective of IDNA2008
skipping to change at page 9, line 42 skipping to change at page 10, line 4
Unicode. Unicode.
The actual registration and lookup protocols for IDNA2008 are The actual registration and lookup protocols for IDNA2008 are
specified in [IDNA2008-Protocol]. specified in [IDNA2008-Protocol].
3. Permitted Characters: An Inclusion List 3. Permitted Characters: An Inclusion List
This section provides an overview of the model used to establish the This section provides an overview of the model used to establish the
algorithm and character lists of [IDNA2008-Tables] and describes the algorithm and character lists of [IDNA2008-Tables] and describes the
names and applicability of the categories used there. Note that the names and applicability of the categories used there. Note that the
inclusion of a character in the first category group does not imply inclusion of a character in the first category group (Section 3.1.1)
that it can be used indiscriminately; some characters are associated does not imply that it can be used indiscriminately; some characters
with contextual rules that must be applied as well. are associated with contextual rules that must be applied as well.
The information given in this section is provided to make the rules, The information given in this section is provided to make the rules,
tables, and protocol easier to understand. The normative generating tables, and protocol easier to understand. The normative generating
rules that correspond to this informal discussion appear in rules that correspond to this informal discussion appear in
[IDNA2008-Tables] and the rules that actually determine what labels [IDNA2008-Tables] and the rules that actually determine what labels
can be registered or looked up are in [IDNA2008-Protocol]. can be registered or looked up are in [IDNA2008-Protocol].
3.1. A Tiered Model of Permitted Characters and Labels 3.1. A Tiered Model of Permitted Characters and Labels
Moving to an inclusion model requires respecifying the list of Moving to an inclusion model requires respecifying the list of
characters that are permitted in IDNs. In IDNA2003, the role and characters that are permitted in IDNs. In IDNA2003, the role and
utility of characters are independent of context and fixed forever utility of characters are independent of context and fixed forever
(or until the standard is replaced). Making completely context- (or until the standard is replaced). Making completely context-
independent rules globally has proven impractical because some independent rules globally has proven impractical because some
characters, especially those that are called "Join_Controls" in characters, especially those that are called "Join_Controls" in
Unicode, are needed to make reasonable use of some scripts but have Unicode, are needed to make reasonable use of some scripts but have
no visible effect(s) in others. IDNA2003 prohibited those types of no visible effect(s) in others. IDNA2003 prohibited those types of
characters entirely. But the restrictions were much too severe to characters entirely. But the restrictions were much too severe to
permit an adequate range of mnemonics for terminology based on some permit an adequate range of mnemonics for identifiers based on some
languages. The requirement to support those characters but limit languages. The requirement to support those characters but limit
their use to very specific contexts was reinforced by the observation their use to very specific contexts was reinforced by the observation
that handling of particular characters across the languages that use that handling of particular characters across the languages that use
a script, or the use of similar or identical-looking characters in a script, or the use of similar or identical-looking characters in
different scripts, is less well understood than many people believed different scripts, is less well understood than many people believed
it was several years ago. it was several years ago.
Independently of the characters chosen (see next subsection), the Independently of the characters chosen (see next subsection), the
approach is to divide the characters that appear in Unicode into approach is to divide the characters that appear in Unicode into
three categories: three categories:
skipping to change at page 16, line 40 skipping to change at page 16, line 48
appear embedded in text that is otherwise in some other character appear embedded in text that is otherwise in some other character
coding. coding.
All protocols that use domain name slots already have the capacity All protocols that use domain name slots already have the capacity
for handling domain names in the ASCII charset. Thus, A-labels can for handling domain names in the ASCII charset. Thus, A-labels can
inherently be handled by those protocols. inherently be handled by those protocols.
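As an illustration of why A-labels fit into ASCII-only domain name slots, the following sketch converts a Unicode label to its ASCII form using Python's built-in "punycode" codec (the RFC 3492 algorithm). The helper name to_a_label is hypothetical, and the sketch assumes the input is already a valid, lower-case label; it performs none of the IDNA2008 validity checks.

```python
def to_a_label(u_label: str) -> str:
    """Encode a Unicode label with Punycode and add the IDNA prefix.

    Assumption: the caller has already validated the label.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def from_a_label(a_label: str) -> str:
    """Reverse the encoding for a label produced by to_a_label()."""
    return a_label[4:].encode("ascii").decode("punycode")


a_label = to_a_label("b\u00fccher")  # "bücher"
print(a_label)
print(a_label.isascii())
```

The result consists only of ASCII letters, digits, and hyphens, so a protocol that handles ordinary ASCII domain names can carry it unchanged.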
4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate
Character Forms Character Forms
[[anchor14: There is some internal redundancy and repetition in the [[anchor13: There is some internal redundancy and repetition in the
material in this section. Specific suggestions about how to reduce or material in this section. Specific suggestions about how to reduce or
eliminate redundant text for -05 would be appreciated.]] eliminate redundant text would be appreciated. If no such
suggestions are received before -07 is posted, this note will be
removed.]]
Users often have expectations about character matching or equivalence Users often have expectations about character matching or equivalence
that are based on their own languages and the orthography of those that are based on their own languages and the orthography of those
languages. These expectations may not be consistent with forms or languages. These expectations may not be consistent with forms or
actions that can be naturally accommodated in a character coding actions that can be naturally accommodated in a character coding
system, especially if multiple languages are written using the same system, especially if multiple languages are written using the same
script but using different conventions. A Norwegian user might script but using different conventions. A Norwegian user might
expect a label with the ae-ligature to be treated as the same label expect a label with the ae-ligature to be treated as the same label
as one using the Swedish spelling with a-diaeresis even though as one using the Swedish spelling with a-diaeresis even though
applying that mapping to English would be astonishing to users. A applying that mapping to English would be astonishing to users. A
skipping to change at page 18, line 42 skipping to change at page 19, line 4
these situations in a system such as IDNA -- or with Unicode these situations in a system such as IDNA -- or with Unicode
normalization generally -- since determining what to do requires normalization generally -- since determining what to do requires
information about the language being used, context, or both. information about the language being used, context, or both.
Consequently, these specifications make no attempt to treat these Consequently, these specifications make no attempt to treat these
combined characters in any special way. However, their existence combined characters in any special way. However, their existence
provides a prime example of a situation in which a registry that is provides a prime example of a situation in which a registry that is
aware of the language context in which labels are to be registered, aware of the language context in which labels are to be registered,
and where that language sometimes (or always) treats the two- and where that language sometimes (or always) treats the two-
character sequences as equivalent to the combined form, should give character sequences as equivalent to the combined form, should give
serious consideration to applying a "variant" model [RFC3743] serious consideration to applying a "variant" model [RFC3743]
[RFC4290], or to prohibiting registration of one of the forms entirely, [RFC4290], or to prohibiting registration of one of the forms entirely,
to reduce the opportunities for user confusion and fraud that would to reduce the opportunities for user confusion and fraud that would
result from the related strings being registered to different result from the related strings being registered to different
parties. parties.
[[anchor14: Placeholder: A discussion of the Arabic digit issue
should go here once it is resolved in some appropriate way.]]
4.4. Case Mapping and Related Issues 4.4. Case Mapping and Related Issues
In the DNS, ASCII letters are stored with their case preserved. In the DNS, ASCII letters are stored with their case preserved.
Matching during the query process is case-independent, but none of Matching during the query process is case-independent, but none of
the information that might be represented by choices of case has been the information that might be represented by choices of case has been
lost. That model has been accidentally helpful because, as people lost. That model has been accidentally helpful because, as people
have created DNS labels by catenating words (or parts of words) to have created DNS labels by catenating words (or parts of words) to
form labels, case has often been used to distinguish among components form labels, case has often been used to distinguish among components
and make the labels more memorable. and make the labels more memorable.
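The behavior described above, in which case is preserved in storage but ignored during matching, can be sketched as follows. The function names are hypothetical, and only the ASCII letters A-Z are folded; all other characters are compared as they are.

```python
def ascii_lower(label: str) -> str:
    """Lower-case only the ASCII letters A-Z, leaving other characters alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in label
    )


def labels_match(stored: str, queried: str) -> bool:
    """Compare two labels without regard to ASCII case."""
    return ascii_lower(stored) == ascii_lower(queried)


stored = "BigBlueWidgets"  # stored with the registrant's choice of case
print(labels_match(stored, "bigbluewidgets"))
print(stored)  # the stored form is unchanged by matching
```

Because the comparison never rewrites the stored label, the word boundaries that the mixed case conveys remain available for display.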
The solution of keeping the characters separate but doing matching The solution of keeping the characters separate but doing matching
independent of case is not feasible with IDNA or any IDNA-like model independent of case is not feasible with IDNA or any IDNA-like model
because the matching would then have to be done on the server rather because the matching would then have to be done on the server rather
than have characters mapped on the client. That situation was than have characters mapped on the client. That situation was
recognized in IDNA2003 and nothing in IDNA2008 fundamentally changes recognized in IDNA2003 and nothing in these specifications
it or could do so. In IDNA2003, all characters are case-folded and fundamentally changes it or could do so. In IDNA2003, all characters
mapped. That results in upper-case characters being mapped to lower- are case-folded and mapped. That results in upper-case characters
case ones and in some other transformations of alternate forms of being mapped to lower-case ones and in some other transformations of
characters, especially those that do not have (or did not have) alternate forms of characters, especially those that do not have (or
upper-case forms. For example, Greek Final Form Sigma (U+03C2) is did not have) upper-case forms. For example, Greek Final Form Sigma
mapped to the medial form (U+03C3) and Eszett (German Sharp S, (U+03C2) is mapped to the medial form (U+03C3) and Eszett (German
U+00DF) is mapped to "ss". Neither of these mappings is reversible Sharp S, U+00DF) is mapped to "ss". Neither of these mappings is
because the upper case of U+03C3 is the Upper Case Sigma (U+03A3) and reversible because the upper case of U+03C3 is the Upper Case Sigma
"ss" is an ASCII string. IDNA2008 permits, at the risk of some (U+03A3) and "ss" is an ASCII string. IDNA2008 permits, at the risk
incompatibility, slightly more flexibility in this area by avoiding case of some incompatibility, slightly more flexibility in this area by
folding and treating these characters as themselves. Approaches to avoiding case folding and treating these characters as themselves.
handling the incompatibility are discussed in Section 7.2. Although Approaches to handling that incompatibility are discussed in
information is lost in IDNA2003's ToASCII operation so that, in some Section 7.2. Although information is lost in IDNA2003's ToASCII
sense, Final Sigma Eszett cannot be represented in an IDN at all, its operation so that, in some sense, neither Final Sigma nor Eszett can
guarantee of mapping when those characters are used as input can be be represented in an IDN at all, its guarantee of mapping when those
interpreted as violating one of the conditions discussed in characters are used as input can be interpreted as violating one of
Section 7.4.1 and hence requiring a prefix change. The consensus was the conditions discussed in Section 7.4.1 and hence requiring a
to not make a prefix change in spite of this issue. Of course, had a prefix change. The consensus was to not make a prefix change in
prefix change been made (at the costs discussed in Section 7.4.3) spite of this issue. Of course, had a prefix change been made (at
there would have been several options, including, if desired, the costs discussed in Section 7.4.3) there would have been several
assignment of the character to the CONTEXTUAL RULE REQUIRED category options, including, if desired, assignment of the character to the
and requiring that it only be used in carefully-selected contexts. CONTEXTUAL RULE REQUIRED category and requiring that it only be used
in carefully-selected contexts.
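The loss of information described above can be illustrated with a short sketch. It assumes that Python's str.casefold(), which applies Unicode full case folding, is an adequate stand-in for the IDNA2003 mapping of these two characters; it is not the Stringprep implementation itself, and the helper name collide is hypothetical.

```python
# Illustration only: str.casefold() applies Unicode full case folding,
# used here as a stand-in for the IDNA2003 (Stringprep) case mapping.
FINAL_SIGMA = "\u03c2"    # Greek Final Form Sigma
MEDIAL_SIGMA = "\u03c3"   # Greek medial (lower case) Sigma
CAPITAL_SIGMA = "\u03a3"  # Greek Upper Case Sigma
ESZETT = "\u00df"         # German Sharp S


def collide(a: str, b: str) -> bool:
    """Return True if two distinct strings fold to the same string.

    A collision means the folded form cannot say which input was typed,
    which is why the mapping is not reversible.
    """
    return a != b and a.casefold() == b.casefold()


if __name__ == "__main__":
    print(collide(FINAL_SIGMA, MEDIAL_SIGMA))     # both fold to U+03C3
    print(collide(ESZETT, "ss"))                  # both fold to "ss"
    print(MEDIAL_SIGMA.upper() == CAPITAL_SIGMA)  # upper case is U+03A3
```

Once folded, nothing in the result records whether the original input contained Final Sigma or Eszett.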
4.5. Right to Left Text 4.5. Right to Left Text
In order to be sure that the directionality of right to left text is In order to be sure that the directionality of right to left text is
unambiguous, IDNA2003 required that any label in which right to left unambiguous, IDNA2003 required that any label in which right to left
characters appear both starts and ends with them, not include any characters appear both starts and ends with them, not include any
characters with strong left to right properties (which excludes other characters with strong left to right properties (which excludes other
alphabetic characters but permits European digits), and rejects any alphabetic characters but permits European digits), and rejects any
other string that contains a right to left character. This is one of other string that contains a right to left character. This is one of
the few places where the IDNA algorithms (both in IDNA2003 and in the few places where the IDNA algorithms (both in IDNA2003 and in
skipping to change at page 22, line 14 skipping to change at page 22, line 29
in an arbitrary context (such as running text), it is difficult, even in an arbitrary context (such as running text), it is difficult, even
with only ASCII characters, to know whether an actual domain name (or with only ASCII characters, to know whether an actual domain name (or
a protocol parameter like a URI) is present and where it starts and a protocol parameter like a URI) is present and where it starts and
ends. When using Unicode, this gets even more difficult if treatment ends. When using Unicode, this gets even more difficult if treatment
of certain special characters (like the dot that separates labels in of certain special characters (like the dot that separates labels in
a domain name) depends on context (e.g., prior knowledge of whether a domain name) depends on context (e.g., prior knowledge of whether
the string represents a domain name or not). That knowledge is not the string represents a domain name or not). That knowledge is not
available if the primary heuristic for identifying the presence of available if the primary heuristic for identifying the presence of
domain names in strings depends on the presence of dots separating domain names in strings depends on the presence of dots separating
groups of characters with no intervening spaces. groups of characters with no intervening spaces.
[[anchor16: Above text is a substitute for an earlier (pre -01)
version and is hoped to be more clear. Comments and improvements
welcome.]]
As discussed elsewhere in this document, the IDNA2008 model removes As discussed elsewhere in this document, the IDNA2008 model removes
all of these mappings and interpretations, including the equivalence all of these mappings and interpretations, including the equivalence
of different forms of dots, from the protocol, discouraging such of different forms of dots, from the protocol, discouraging such
mappings and leaving them, when necessary, to local processing. This mappings and leaving them, when necessary, to local processing. This
should not be taken to imply that local processing is optional or can should not be taken to imply that local processing is optional or can
be avoided entirely, even if doing so might have been desirable in a be avoided entirely, even if doing so might have been desirable in a
world without IDNA2003 IDNs in files and archives. Instead, unless world without IDNA2003 IDNs in files and archives. Instead, unless
the program context is such that it is known that any IDNs that the program context is such that it is known that any IDNs that
appear will contain either U-label or A-label forms, or that other appear will contain either U-label or A-label forms, or that other
skipping to change at page 23, line 40 skipping to change at page 24, line 5
In either case, it is vital that user interface designs and, where In either case, it is vital that user interface designs and, where
the interfaces are not sufficient, users, be aware that the only the interfaces are not sufficient, users, be aware that the only
forms of domain names that this protocol anticipates will resolve forms of domain names that this protocol anticipates will resolve
globally or compare equal when crude methods (i.e., those not globally or compare equal when crude methods (i.e., those not
conforming to the strict definition of label equivalence given in conforming to the strict definition of label equivalence given in
[IDNA2008-Defs]) are used are those in which all native-script labels [IDNA2008-Defs]) are used are those in which all native-script labels
are in U-label form. Forms that assume mapping will occur, are in U-label form. Forms that assume mapping will occur,
especially forms that were not valid under IDNA2003, may or may not especially forms that were not valid under IDNA2003, may or may not
function in predictable ways across all implementations. function in predictable ways across all implementations.
User interfaces involving Latin-based scripts should take special
care when considering how to handle case mapping because small
differences in label strings may cause behavior that is astonishing
to users. Because case-insensitive mapping is done for ASCII strings
by DNS-servers, an all-ASCII label is treated as case-insensitive.
However, if even one of the characters of that string is replaced by
one that requires the label to be given IDN treatment (e.g., by
adding a diacritical mark), then the label immediately becomes case-
sensitive. This suggests that case mapping for Latin-based scripts
(and possibly other scripts with case distinctions) as a
preprocessing matter in applications may be wise to prevent user
astonishment, but, since all applications may not do this and
ambiguity in transport is not desirable, case-dependent
forms should not be stored in files.
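The astonishing behavior described above can be sketched as follows. This is a minimal illustration, assuming that no case mapping is applied before encoding; the helper names wire_form and dns_equal are hypothetical, and Python's built-in "punycode" codec is used only to produce the encoded part of the label.

```python
def wire_form(label: str) -> str:
    """Return the form of a label that would be sent to the DNS,
    assuming no case mapping is performed beforehand."""
    if label.isascii():
        return label  # all-ASCII labels are sent as they are
    # Non-ASCII labels are Punycode-encoded and given the ACE prefix.
    return "xn--" + label.encode("punycode").decode("ascii")


def dns_equal(a: str, b: str) -> bool:
    """Compare two labels as a DNS server would: ASCII letters are
    matched case-insensitively, and nothing else is interpreted."""
    return wire_form(a).lower() == wire_form(b).lower()


if __name__ == "__main__":
    print(dns_equal("Example", "example"))            # all-ASCII
    print(dns_equal("b\u00fccher", "b\u00dccher"))    # differ in case of u-umlaut
    print(wire_form("b\u00fccher"))
```

In this sketch the all-ASCII pair matches, while the pair that differs only in the case of a non-ASCII letter produces two different encoded labels and does not.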
7. Migration from IDNA2003 and Unicode Version Synchronization 7. Migration from IDNA2003 and Unicode Version Synchronization
7.1. Design Criteria 7.1. Design Criteria
As mentioned above and in RFC 4690, two key goals of the IDNA2008 As mentioned above and in RFC 4690, two key goals of the IDNA2008
design are to enable applications to be agnostic about whether they design are to enable applications to be agnostic about whether they
are being run in environments supporting any Unicode version from 3.2 are being run in environments supporting any Unicode version from 3.2
onward and to permit incrementally adding new characters, character onward and to permit incrementally adding new characters, character
groups, scripts, and other character collections as they are groups, scripts, and other character collections as they are
incorporated into Unicode, without disruption and, in the long term, incorporated into Unicode, without disruption and, in the long term,
skipping to change at page 27, line 8 skipping to change at page 27, line 33
o Validate the label itself for conformance with a small number of o Validate the label itself for conformance with a small number of
whole-label rules, notably verifying that there are no leading whole-label rules, notably verifying that there are no leading
combining marks, that the "bidi" conditions are met if right to combining marks, that the "bidi" conditions are met if right to
left characters appear, that any required contextual rules are left characters appear, that any required contextual rules are
available and that, if such rules are associated with Joiner available and that, if such rules are associated with Joiner
Controls, they are tested. Controls, they are tested.
o Avoid validating other contextual rules about characters, o Avoid validating other contextual rules about characters,
including mixed-script label prohibitions, although such rules may including mixed-script label prohibitions, although such rules may
be used to influence presentation decisions in the user interface. be used to influence presentation decisions in the user interface.
[[anchor19: Check this, and all similar statements, against [[anchor18: Check this, and all similar statements, against
Protocol when that is finished.]] Protocol when that is finished.]]
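One of the whole-label rules listed above, the prohibition on leading combining marks, might be checked roughly as follows. This is a sketch only: it assumes that a combining mark is any code point whose Unicode general category begins with "M", the function name is hypothetical, and the bidi and contextual-rule checks are not covered.

```python
import unicodedata


def has_leading_combining_mark(label: str) -> bool:
    """Return True if the first code point of the label is a combining
    mark (general category Mn, Mc, or Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


if __name__ == "__main__":
    print(has_leading_combining_mark("\u0301abc"))  # starts with COMBINING ACUTE ACCENT
    print(has_leading_combining_mark("a\u0301bc"))  # mark follows a base letter
```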
By avoiding applying its own interpretation of which labels are valid By avoiding applying its own interpretation of which labels are valid
as a means of rejecting lookup attempts, the lookup application as a means of rejecting lookup attempts, the lookup application
becomes less sensitive to version incompatibilities with the becomes less sensitive to version incompatibilities with the
particular zone registry associated with the domain name. particular zone registry associated with the domain name.
An application or client that processes names according to this An application or client that processes names according to this
protocol and then resolves them in the DNS will be able to locate any protocol and then resolves them in the DNS will be able to locate any
name that is validly registered, as long as its version of the name that is validly registered, as long as its version of the
skipping to change at page 27, line 30 skipping to change at page 28, line 7
of the characters in the label. Messages to users should distinguish of the characters in the label. Messages to users should distinguish
between "label contains an unallocated code point" and other types of between "label contains an unallocated code point" and other types of
lookup failures. A failure on the basis of an old version of Unicode lookup failures. A failure on the basis of an old version of Unicode
may lead the user to a desire to upgrade to a newer version, but will may lead the user to a desire to upgrade to a newer version, but will
have no other ill effects (this is consistent with behavior in the have no other ill effects (this is consistent with behavior in the
transition to the DNS when some hosts could not yet handle some forms transition to the DNS when some hosts could not yet handle some forms
of names or record types). of names or record types).
7.2. Changes in Character Interpretations 7.2. Changes in Character Interpretations
[[anchor20: Note in Draft: This subsection is completely new in [[anchor19: Note in Draft: This subsection is completely new in
version -04 of this document. It could almost certainly use version -04 and has been further tuned in -05 and -06 of this
improvement. It also contains some material that is redundant with document. It could almost certainly use improvement, although this
material in other sections. I have not tried to remove that material note will be removed if there are not significant suggestions about
and will not do so until the WG concludes that this section is the -06 version. It also contains some material that is redundant
relatively stable, but would appreciate help in identifying what with material in other sections. I have not tried to remove that
material and will not do so until the WG concludes that this section
is relatively stable, but would appreciate help in identifying what
should be removed or how this might be enhanced to contain more of should be removed or how this might be enhanced to contain more of
that other material. --JcK]] that other material. --JcK]]
In those scripts that make case distinctions, there are a few In those scripts that make case distinctions, there are a few
characters for which an obvious and unique upper case character has characters for which an obvious and unique upper case character has
not historically been available to match a lower case one or vice not historically been available to match a lower case one or vice
versa. For those characters, the mappings used in constructing the versa. For those characters, the mappings used in constructing the
Stringprep tables for IDNA2003, performed using the Unicode CaseFold Stringprep tables for IDNA2003, performed using the Unicode CaseFold
operation (See Section 5.8 of the Unicode Standard [Unicode51]), operation (See Section 5.8 of the Unicode Standard [Unicode51]),
generate different characters or sets of characters. Those generate different characters or sets of characters. Those
skipping to change at page 28, line 29 skipping to change at page 29, line 8
but a judgment that the incompatibility was not significant enough to but a judgment that the incompatibility was not significant enough to
justify a prefix change, the WG concluded that Eszett and Final Form justify a prefix change, the WG concluded that Eszett and Final Form
Sigma should be treated as distinct and Protocol-Valid characters. Sigma should be treated as distinct and Protocol-Valid characters.
The decision faces registries, especially registries maintaining The decision faces registries, especially registries maintaining
zones for third parties, with a variation on what has become a zones for third parties, with a variation on what has become a
familiar problem: how to introduce a new service in a way that does familiar problem: how to introduce a new service in a way that does
not create confusion or significantly weaken or invalidate existing not create confusion or significantly weaken or invalidate existing
identifiers. identifiers.
While it is beyond the scope of these documents to specify a There have traditionally been several approaches to problems of this
preference for any of them, or to suggest that there are not other type. Without any preference or claim to completeness, these are:
possibilities, there have traditionally been several approaches to
problems of this type:
o Do not permit use of the newly-available character at the registry o Do not permit use of the newly-available character at the registry
level. This might cause lookup failures if a domain name were level. This might cause lookup failures if a domain name were to
written with the expectation of the IDNA2003 mapping behavior, but be written with the expectation of the IDNA2003 mapping behavior,
would eliminate any possibility of false matches. but would eliminate any possibility of false matches.
o Hold a "sunrise" arrangement in which holders of the previously- o Hold a "sunrise"-like arrangement in which holders of the
mapped labels (labels containing "ss" in the Eszett case or ones previously-mapped labels (labels containing "ss" in the Eszett
containing Lower Case Sigma in the Final Sigma case) are given case or ones containing Lower Case Sigma in the Final Sigma case)
priority (and perhaps other benefits) for registering the are given priority (and perhaps other benefits) for registering
corresponding string containing the newly-available characters. the corresponding string containing the newly-available
characters.
o Adopt some sort of "variant" approach in which registrants either o Adopt some sort of "variant" approach in which registrants either
obtained labels with both character forms or one of them was obtained labels with both character forms or one of them was
blocked from registration by anyone but the registrant of the blocked from registration by anyone but the registrant of the
other form. other form.
In principle, lookup applications could also compensate for the In principle, lookup applications could also compensate for the
difference in interpretation by looking up the string according to difference in interpretation by looking up the string according to
the IDNA2008 interpretation and then, if that failed, doing the lookup the interpretation specified in these documents and then, if that
with the mapping, simulating the IDNA2003 interpretation. The risk failed, doing the lookup with the mapping, simulating the IDNA2003
of false positives is such that this is generally to be discouraged interpretation. The risk of false positives is such that this is
unless the application is able to engage in a "did you really mean" generally to be discouraged unless the application is able to engage
dialogue with the end user. in a "is this what you meant" dialogue with the end user.
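The compensating lookup described in the preceding paragraph could be sketched as follows. Everything here is hypothetical: resolve stands for whatever lookup routine the application uses (returning None on failure), confirm stands for the dialogue with the end user, and only the two mappings discussed in this section are simulated.

```python
from typing import Callable, Optional

ESZETT = "\u00df"
FINAL_SIGMA = "\u03c2"
MEDIAL_SIGMA = "\u03c3"


def lookup_with_fallback(
    label: str,
    resolve: Callable[[str], Optional[str]],
    confirm: Callable[[str, str], bool],
) -> Optional[str]:
    """Try the label as written; if that fails, offer the IDNA2003-style
    mapped form, but only after the user has confirmed it."""
    result = resolve(label)
    if result is not None:
        return result
    # Simulate only the IDNA2003 mappings for Eszett and Final Sigma.
    mapped = label.replace(ESZETT, "ss").replace(FINAL_SIGMA, MEDIAL_SIGMA)
    if mapped == label:
        return None  # nothing to map, so the failure stands
    if not confirm(label, mapped):
        return None  # user rejected the substitute; avoid a false positive
    return resolve(mapped)


if __name__ == "__main__":
    zone = {"strasse": "192.0.2.1"}  # toy stand-in for the DNS
    print(lookup_with_fallback("stra\u00dfe", zone.get, lambda o, m: True))
```

The second lookup is placed behind the confirmation step because of the risk of false positives noted above.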
7.3. More Flexibility in User Agents 7.3. More Flexibility in User Agents
These specifications do not perform mappings between one character or These specifications do not include mappings between one character or
code point and others for any reason. Instead, they prohibit the code point and others for any reason. Instead, they prohibit the
characters that would be mapped to others by normalization, case characters that would be mapped to others by normalization, upper
folding (with exceptions for lower case characters that have no upper case to lower case changes, or other rules. As examples, while
case form, which are retained), or other rules. As examples, while
mathematical characters based on Latin ones are accepted as input to mathematical characters based on Latin ones are accepted as input to
IDNA2003, they are prohibited in IDNA2008. Similarly, double-width IDNA2003, they are prohibited in IDNA2008. Similarly, double-width
characters and other variations are prohibited as IDNA input. characters and other variations are prohibited as IDNA input.
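As an illustration of the characters mentioned above, the sketch below tests whether a string is changed by compatibility normalization (NFKC). It assumes the Unicode data shipped with the Python runtime; the function name is hypothetical, and stability under NFKC is only a necessary condition for validity, not a sufficient one.

```python
import unicodedata


def is_nfkc_stable(s: str) -> bool:
    """Return True if NFKC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFKC", s) == s


if __name__ == "__main__":
    math_bold_a = "\U0001d41a"  # MATHEMATICAL BOLD SMALL A
    fullwidth_a = "\uff41"      # FULLWIDTH LATIN SMALL LETTER A
    print(is_nfkc_stable(math_bold_a))  # maps to "a", so not stable
    print(is_nfkc_stable(fullwidth_a))  # maps to "a", so not stable
    print(is_nfkc_stable("abc"))        # unchanged
```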
Since the rules in [IDNA2008-Tables] have the effect that only Since the rules in [IDNA2008-Tables] have the effect that only
strings that are not transformed by NFKC are valid, if an application strings that are not transformed by NFKC are valid, if an application
chooses to perform NFKC normalization before lookup, that operation chooses to perform NFKC normalization before lookup, that operation
is safe since this will never make the application unable to look up is safe since this will never make the application unable to look up
any valid string. However, as discussed above, the application any valid string. However, as discussed above, the application
cannot guarantee that any other application will perform that cannot guarantee that any other application will perform that
skipping to change at page 30, line 12 skipping to change at page 30, line 38
As suggested earlier in this section, it appears to be desirable to As suggested earlier in this section, it appears to be desirable to
do as little character mapping as possible consistent with having do as little character mapping as possible consistent with having
Unicode work correctly (e.g., NFC mapping to resolve different Unicode work correctly (e.g., NFC mapping to resolve different
codings for the same character is still necessary although the codings for the same character is still necessary although the
specifications require that it be performed prior to invoking the specifications require that it be performed prior to invoking the
protocol) and to make the mapping between A-labels and U-labels protocol) and to make the mapping between A-labels and U-labels
idempotent. Case-mapping is not an exception to this principle. If idempotent. Case-mapping is not an exception to this principle. If
only lower case characters can be registered in the DNS (i.e., be only lower case characters can be registered in the DNS (i.e., be
present in a U-label), then IDNA2008 should prohibit upper-case present in a U-label), then IDNA2008 should prohibit upper-case
characters as input. Some other considerations reinforce this characters as input (and therefore does so). Some other
conclusion. For example, an essential element of the ASCII case- considerations reinforce this conclusion. For example, an essential
mapping functions is that uppercase(character) must be equal to element of the ASCII case-mapping functions is that, for individual
characters, uppercase(character) must be equal to
uppercase(lowercase(character)). That requirement may not be uppercase(lowercase(character)). That requirement may not be
satisfied with IDNs. For example, there are some characters in satisfied with IDNs. For example, there are some characters in
scripts that use case distinction that do not have counterparts in scripts that use case distinction that do not have counterparts in
one case or the other. The relationship between upper case and lower one case or the other. The relationship between upper case and lower
case may even be language-dependent, with different languages (or case may even be language-dependent, with different languages (or
even the same language in different areas) expecting different even the same language in different areas) expecting different
mappings. Of course, the expectations of users who are accustomed to mappings. Of course, the expectations of users who are accustomed to
a case-insensitive DNS environment will probably be well-served if a case-insensitive DNS environment will probably be well-served if
user agents perform case folding prior to IDNA processing, but the user agents perform case folding prior to IDNA processing, but the
IDNA procedures themselves should neither require such mapping nor IDNA procedures themselves should neither require such mapping nor
skipping to change at page 33, line 29 skipping to change at page 34, line 9
there are no uniform conventions for naming; variations such as there are no uniform conventions for naming; variations such as
outline, solid, and shaded forms may or may not exist; and so on. outline, solid, and shaded forms may or may not exist; and so on.
As just one example, consider a "heart" symbol as it might appear As just one example, consider a "heart" symbol as it might appear
in a logo that might be read as "I love...". While the user might in a logo that might be read as "I love...". While the user might
read such a logo as "I love..." or "I heart...", considerable read such a logo as "I love..." or "I heart...", considerable
knowledge of the coding distinctions made in Unicode is needed to knowledge of the coding distinctions made in Unicode is needed to
know that there is more than one "heart" character (e.g., U+2665, know that there is more than one "heart" character (e.g., U+2665,
U+2661, and U+2765) and how to describe it. These issues are of U+2661, and U+2765) and how to describe it. These issues are of
particular importance if strings are expected to be understood or particular importance if strings are expected to be understood or
transcribed by the listener after being read out loud. transcribed by the listener after being read out loud.
[[anchor21: The above paragraph remains controversial as to [[anchor20: The above paragraph remains controversial as to
whether it is valid. The WG will need to make a decision if this whether it is valid. The WG will need to make a decision if this
section is not dropped entirely.]] section is not dropped entirely.]]
o As a simplified example of this, assume one wanted to use a o As a simplified example of this, assume one wanted to use a
"heart" or "star" symbol in a label. This is problematic because "heart" or "star" symbol in a label. This is problematic because
those names are ambiguous in the Unicode system of naming (the those names are ambiguous in the Unicode system of naming (the
actual Unicode names require far more qualification). A user or actual Unicode names require far more qualification). A user or
would-be registrant has no way to know -- absent careful study of would-be registrant has no way to know -- absent careful study of
the code tables -- whether it is ambiguous (e.g., where there are the code tables -- whether it is ambiguous (e.g., where there are
multiple "heart" characters) or not. Conversely, the user seeing multiple "heart" characters) or not. Conversely, the user seeing
skipping to change at page 34, line 27 skipping to change at page 35, line 6
languages and scripts which would be treated like any other languages and scripts which would be treated like any other
language characters; the two should not be confused. language characters; the two should not be confused.
7.7. Migration Between Unicode Versions: Unassigned Code Points 7.7. Migration Between Unicode Versions: Unassigned Code Points
In IDNA2003, labels containing unassigned code points are looked up In IDNA2003, labels containing unassigned code points are looked up
on the assumption that, if they appear in labels and can be mapped on the assumption that, if they appear in labels and can be mapped
and then resolved, the relevant standards must have changed and the and then resolved, the relevant standards must have changed and the
registry has properly allocated only assigned values. registry has properly allocated only assigned values.
In IDNA2008, strings containing unassigned code points must not be In the protocol as described in these documents, strings containing
either looked up or registered. There are several reasons for this, unassigned code points must not be either looked up or registered.
with the most important ones being: There are several reasons for this, with the most important ones
being:
o It cannot be known with sufficient reliability in advance that a o It cannot be known with sufficient reliability in advance that a
code point that was not previously assigned will not be assigned code point that was not previously assigned will not be assigned
to a compatibility character. In IDNA2003, since there is no to a compatibility character or one that would be otherwise
direct dependency on NFKC (Stringprep's tables are based on NFKC, disallowed by the rules in [IDNA2008-Tables]. In IDNA2003, since
but IDNA2003 depends only on Stringprep), allocation of a there is no direct dependency on NFKC (Stringprep's tables are
compatibility character might produce some odd situations, but it based on NFKC, but IDNA2003 depends only on Stringprep),
would not be a problem. In IDNA2008, where compatibility allocation of a compatibility character might produce some odd
characters are generally assigned to DISALLOWED, permitting situations, but it would not be a problem. In IDNA2008, where
strings containing unassigned characters to be looked up would compatibility characters are generally assigned to DISALLOWED,
permit violating the principle that characters in DISALLOWED are permitting strings containing unassigned characters to be looked
not looked up. up would permit violating the principle that characters in
DISALLOWED are not looked up.
o More generally, the status of an unassigned character with regard o More generally, the status of an unassigned character with regard
to the DISALLOWED and PROTOCOL-VALID categories, and whether to the DISALLOWED and PROTOCOL-VALID categories, and whether
contextual rules are required with the latter, cannot be evaluated contextual rules are required with the latter, cannot be evaluated
until a character is actually assigned and known. until a character is actually assigned and known. By contrast,
characters that are actually DISALLOWED are placed in that
category only as a consequence of rules applied to known
properties or per-character evaluation.
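A lookup application might detect unassigned code points along the following lines. This is a sketch under the assumption that general category "Cn" in the runtime's Unicode database is an acceptable approximation of "unassigned"; the answer therefore depends on the Unicode version that the runtime supports, which is exactly the version sensitivity discussed in this section. The function name is hypothetical.

```python
import unicodedata


def unassigned_code_points(label: str) -> list:
    """List the code points in the label that the local Unicode database
    reports as unassigned (general category Cn)."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) == "Cn"
    ]


if __name__ == "__main__":
    # U+0378 is a reserved position in the Greek block.
    print(unassigned_code_points("a\u0378"))
    print(unassigned_code_points("abc"))
```

A non-empty result would be reported to the user as "label contains an unallocated code point" rather than as an ordinary lookup failure.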
It is possible to argue that the issues above are not important and It is possible to argue that the issues above are not important and
that, as a consequence, it is better to retain the principle of that, as a consequence, it is better to retain the principle of
looking up labels even if they contain unassigned characters because looking up labels even if they contain unassigned characters because
all of the important scripts and characters have been coded as of all of the important scripts and characters have been coded as of
Unicode 5.1 and hence unassigned code points will be assigned only to Unicode 5.1 and hence unassigned code points will be assigned only to
obscure characters or archaic scripts. Unfortunately, that does not obscure characters or archaic scripts. Unfortunately, that does not
appear to be a safe assumption for at least two reasons. First, much appear to be a safe assumption for at least two reasons. First, much
the same claim of completeness has been made for earlier versions of the same claim of completeness has been made for earlier versions of
Unicode. The reality is that a script that is obscure to much of the Unicode. The reality is that a script that is obscure to much of the
world may still be very important to those who use it. Cultural and world may still be very important to those who use it. Cultural and
linguistic preservation principles make it inappropriate to declare linguistic preservation principles make it inappropriate to declare
the script of no importance in IDNs. Second, we already have the script of no importance in IDNs. Second, we already have
counterexamples in, e.g., the relationships associated with new Han counterexamples in, e.g., the relationships associated with new Han
characters being added (whether in the BMP or in Unicode Plane 2). characters being added (whether in the BMP or in Unicode Plane 2).
7.8. Other Compatibility Issues 7.8. Other Compatibility Issues
The existing (2003) IDNA model includes several odd artifacts of the The 2003 IDNA model includes several odd artifacts of the context in
context in which it was developed. Many, if not all, of these are which it was developed. Many, if not all, of these are potential
potential avenues for exploits, especially if the registration avenues for exploits, especially if the registration process permits
process permits "source" names (names that have not been processed "source" names (names that have not been processed through IDNA and
through IDNA and Nameprep) to be registered. As one example, since Nameprep) to be registered. As one example, since the character
the character Eszett, used in German, is mapped by IDNA2003 into the Eszett, used in German, is mapped by IDNA2003 into the sequence "ss"
sequence "ss" rather than being retained as itself or prohibited, a rather than being retained as itself or prohibited, a string
string containing that character but that is otherwise in ASCII is containing that character but that is otherwise in ASCII is not
not really an IDN (in the U-label sense defined above) at all. After really an IDN (in the U-label sense defined above) at all. After
Nameprep maps the Eszett out, the result is an ASCII string and so Nameprep maps the Eszett out, the result is an ASCII string and so
does not get an xn-- prefix, but the string that can be displayed to does not get an xn-- prefix, but the string that can be displayed to
a user appears to be an IDN. The proposed IDNA2008 eliminates this a user appears to be an IDN. The newer version of the protocol
artifact. A character is either permitted as itself or it is eliminates this artifact. A character is either permitted as itself
prohibited; special cases that make sense only in a particular or it is prohibited; special cases that make sense only in a
linguistic or cultural context can be dealt with as localization particular linguistic or cultural context can be dealt with as
matters where appropriate. localization matters where appropriate.
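
The Eszett artifact described above can be reproduced with a short sketch. It assumes Python's built-in "idna" codec, which follows the IDNA2003 rules (RFC 3490 with Nameprep); the helper name to_ascii_2003 and the sample labels are illustrative and do not come from the draft.

```python
# Sketch only: Python's built-in "idna" codec follows the IDNA2003 rules
# (RFC 3490 with Nameprep), so it shows the mapping behaviour discussed
# in Section 7.8. The helper name and sample labels are hypothetical.

def to_ascii_2003(label: str) -> str:
    """Return the IDNA2003 ToASCII form of a single label."""
    return label.encode("idna").decode("ascii")


eszett_label = "stra\u00dfe"   # "strasse" spelled with Eszett (U+00DF)
umlaut_label = "b\u00fccher"   # contains U+00FC, which Nameprep keeps

mapped = to_ascii_2003(eszett_label)
encoded = to_ascii_2003(umlaut_label)

# The Eszett is mapped to "ss", leaving a pure ASCII label with no prefix.
print(mapped, mapped.startswith("xn--"))
# The umlaut survives Nameprep, so the label is Punycode-encoded.
print(encoded, encoded.startswith("xn--"))
```

The first label is what the section calls "not really an IDN": nothing that is looked up or stored carries the xn-- prefix, although the string shown to the user contains a non-ASCII character.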
8. Acknowledgments 8. Acknowledgments
The editor and contributors would like to express their thanks to The editor and contributors would like to express their thanks to
those who contributed significant early (pre-WG) review comments, those who contributed significant early (pre-WG) review comments,
sometimes accompanied by text, especially Mark Davis, Paul Hoffman, sometimes accompanied by text, especially Mark Davis, Paul Hoffman,
Simon Josefsson, and Sam Weiler. In addition, some specific ideas Simon Josefsson, and Sam Weiler. In addition, some specific ideas
were incorporated from suggestions, text, or comments about sections were incorporated from suggestions, text, or comments about sections
that were unclear supplied by Frank Ellerman, Michael Everson, Asmus that were unclear supplied by Frank Ellerman, Michael Everson, Asmus
Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler,
skipping to change at page 37, line 51 skipping to change at page 38, line 37
12. Security Considerations 12. Security Considerations
12.1. General Security Issues with IDNA 12.1. General Security Issues with IDNA
This document in the IDNA2008 series is purely explanatory and This document in the IDNA2008 series is purely explanatory and
informational and consequently introduces no new security issues. It informational and consequently introduces no new security issues. It
would, of course, be a poor idea for someone to try to implement from would, of course, be a poor idea for someone to try to implement from
it; such an attempt would almost certainly lead to interoperability it; such an attempt would almost certainly lead to interoperability
problems and might lead to security ones. A discussion of security problems and might lead to security ones. A discussion of security
issues with IDNA2008, and IDNA generally, appears in [IDNA2008-Defs]. issues with IDNA, including some relevant history, appears in
[IDNA2008-Defs].
12.2. Security Differences from IDNA2003
The registration and lookup models described in this set of documents
change the mechanisms available for lookup applications to determine
the validity of labels they encounter. In some respects, the ability
to test is strengthened. For example, putative labels that contain
unassigned code points will now be rejected, while IDNA2003 permitted
them (something that is now recognized as a considerable source of
risk). On the other hand, the protocol specification no longer
assumes that the application that looks up a name will be able to
determine, and apply, information about the protocol version used in
registration. In theory, that may increase risk since the
application will be able to do less pre-lookup validation. In
practice, the protection afforded by that test has been largely
illusory for reasons explained in RFC 4690 and above.
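
As a rough illustration of the unassigned-code-point test mentioned above, the following sketch flags characters whose Unicode general category is Cn. This is an approximation and not the procedure defined by the protocol documents: Cn also covers noncharacters, and the result depends on the Unicode version bundled with the Python build. The function names are hypothetical.

```python
import unicodedata


def unassigned_code_points(label: str) -> list[str]:
    """List code points in the label whose general category is Cn.

    Assumption: Cn is used as a stand-in for "unassigned"; it also
    includes noncharacters, and the answer depends on the Unicode
    version shipped with this Python build.
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) == "Cn"
    ]


def passes_unassigned_check(label: str) -> bool:
    """Return True when the label contains no Cn code points."""
    return not unassigned_code_points(label)


print(passes_unassigned_check("example"))
# U+0378 is used here as a sample code point that is currently unassigned.
print(unassigned_code_points("ab\u0378"))
```

A lookup application that applies a check of this kind rejects the putative label outright, where IDNA2003 would have permitted it.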
Any change to Stringprep or, more broadly, the IETF's model of the
use of internationalized character strings in different protocols,
creates some risk of inadvertent changes to those protocols,
invalidating deployed applications or databases, and so on. The same
considerations that would require changing the IDN prefix (see the
discussion of prefix changes in Section 7.4) are the ones that would,
e.g., invalidate certificates or hashes that depend on Stringprep,
but those cases require careful consideration and evaluation. More
important, it is not necessary to change Stringprep at all in order
to create a definition or implementation of IDNA as specified in this
set of documents. Because these documents do not depend on
Stringprep at all, the question of upgrading other protocols that do
depend on Stringprep can be left to experts on those protocols: there
is no dependency between IDNA changes and possible upgrades to
security protocols or conventions.
13. References 13. References
13.1. Normative References 13.1. Normative References
[ASCII] American National Standards Institute (formerly United [ASCII] American National Standards Institute (formerly United
States of America Standards Institute), "USA Code for States of America Standards Institute), "USA Code for
Information Interchange", ANSI X3.4-1968, 1968. Information Interchange", ANSI X3.4-1968, 1968.
ANSI X3.4-1968 has been replaced by newer versions with ANSI X3.4-1968 has been replaced by newer versions with
skipping to change at page 44, line 5 skipping to change at page 43, line 48
input to IDNA2003 and 2008. input to IDNA2003 and 2008.
o Some material, including this section/appendix, rearranged. o Some material, including this section/appendix, rearranged.
A.5. Version -05 A.5. Version -05
o Many small editorial changes, including changes to eliminate the o Many small editorial changes, including changes to eliminate the
last vestiges of what appeared to be 2119 language (upper-case last vestiges of what appeared to be 2119 language (upper-case
MUST, SHOULD, or MAY) and small adjustments to terminology. MUST, SHOULD, or MAY) and small adjustments to terminology.
A.6. Version -06
o Removed Security Considerations material and pointed to Defs,
where it now appears as of version 05.
o Started changing uses of "IDNA2008" in running text to "in these
specifications" or the equivalent. These documents are titled
simply "IDNA"; once they are standardized, "the current version"
may be a more appropriate reference than one containing a year.
As discussed on the mailing list, we can and should discuss how to
refer to these documents at an appropriate time (e.g., when we
know when we will be finished) but, in the interim, it seems
appropriate to simply start getting rid of the version-specific
terminology where it can naturally be removed.
o Additional discussion of mappings, etc., especially for case-
sensitivity.
o More editorial fine-tuning.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
 End of changes. 45 change blocks. 
172 lines changed or deleted 190 lines changed or added
