< draft-ietf-idnabis-rationale-16.txt   draft-ietf-idnabis-rationale-17.txt >
Network Working Group J. Klensin Network Working Group J. Klensin
Internet-Draft January 7, 2010 Internet-Draft January 11, 2010
Intended status: Informational Intended status: Informational
Expires: July 11, 2010 Expires: July 15, 2010
Internationalized Domain Names for Applications (IDNA): Background, Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale Explanation, and Rationale
draft-ietf-idnabis-rationale-16.txt draft-ietf-idnabis-rationale-17.txt
Abstract Abstract
Several years have passed since the original protocol for Several years have passed since the original protocol for
Internationalized Domain Names (IDNs) was completed and deployed. Internationalized Domain Names (IDNs) was completed and deployed.
During that time, a number of issues have arisen, including the need During that time, a number of issues have arisen, including the need
to update the system to deal with newer versions of Unicode. Some of to update the system to deal with newer versions of Unicode. Some of
these issues require tuning of the existing protocols and the tables these issues require tuning of the existing protocols and the tables
on which they depend. This document provides an overview of a on which they depend. This document provides an overview of a
revised system and provides explanatory material for its components. revised system and provides explanatory material for its components.
skipping to change at page 1, line 43 skipping to change at page 1, line 43
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 11, 2010. This Internet-Draft will expire on July 15, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 28 skipping to change at page 3, line 28
3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10
3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10
3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11
3.1.2.1. Contextual Restrictions . . . . . . . . . . . . . 11 3.1.2.1. Contextual Restrictions . . . . . . . . . . . . . 11
3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12
3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12
3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13
3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13
3.3. Layered Restrictions: Tables, Context, Registration, 3.3. Layered Restrictions: Tables, Context, Registration,
Applications . . . . . . . . . . . . . . . . . . . . . . . 14 Applications . . . . . . . . . . . . . . . . . . . . . . . 14
4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 4. Application-Related Issues . . . . . . . . . . . . . . . . . . 15
4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15
4.2. Entry and Display in Applications . . . . . . . . . . . . 16 4.2. Entry and Display in Applications . . . . . . . . . . . . 16
4.3. Linguistic Expectations: Ligatures, Digraphs, and 4.3. Linguistic Expectations: Ligatures, Digraphs, and
Alternate Character Forms . . . . . . . . . . . . . . . . 18 Alternate Character Forms . . . . . . . . . . . . . . . . 18
4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20
4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21
5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21
6. Front-end and User Interface Processing for Lookup . . . . . . 22 6. Front-end and User Interface Processing for Lookup . . . . . . 22
7. Migration from IDNA2003 and Unicode Version Synchronization . 24 7. Migration from IDNA2003 and Unicode Version Synchronization . 24
7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24
skipping to change at page 4, line 41 skipping to change at page 4, line 41
A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46 A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46
A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46 A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46
A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46 A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46
A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47 A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47
A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 47 A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 47
A.12. Version -12 . . . . . . . . . . . . . . . . . . . . . . . 48 A.12. Version -12 . . . . . . . . . . . . . . . . . . . . . . . 48
A.13. Version -13 . . . . . . . . . . . . . . . . . . . . . . . 48 A.13. Version -13 . . . . . . . . . . . . . . . . . . . . . . . 48
A.14. Version -14 . . . . . . . . . . . . . . . . . . . . . . . 48 A.14. Version -14 . . . . . . . . . . . . . . . . . . . . . . . 48
A.15. Version -15 . . . . . . . . . . . . . . . . . . . . . . . 49 A.15. Version -15 . . . . . . . . . . . . . . . . . . . . . . . 49
A.16. Version -16 . . . . . . . . . . . . . . . . . . . . . . . 49 A.16. Version -16 . . . . . . . . . . . . . . . . . . . . . . . 49
A.17. Version -17 . . . . . . . . . . . . . . . . . . . . . . . 49
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 49 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 49
1. Introduction 1. Introduction
1.1. Context and Overview 1.1. Context and Overview
Internationalized Domain Names in Applications (IDNA) is a collection Internationalized Domain Names in Applications (IDNA) is a collection
of standards that allow client applications to convert some Unicode of standards that allow client applications to convert some mnemonic
mnemonics to an ASCII-compatible encoding form ("ACE") which is a strings expressed in Unicode to an ASCII-compatible encoding form
valid DNS label containing only letters, digits, and hyphens. The ("ACE") which is a valid DNS label containing only letters, digits,
specific form of ACE label used by IDNA is called an "A-label". A and hyphens. The specific form of ACE label used by IDNA is called
client can look up an exact A-label in the existing DNS, so A-labels an "A-label". A client can look up an exact A-label in the existing
do not require any extensions to DNS, upgrades of DNS servers or DNS, so A-labels do not require any extensions to DNS, upgrades of
updates to low-level client libraries. An A-label is recognizable DNS servers or updates to low-level client libraries. An A-label is
from the prefix "xn--" before the characters produced by the Punycode recognizable from the prefix "xn--" before the characters produced by
algorithm [RFC3492], thus a user application can identify an A-label the Punycode algorithm [RFC3492], thus a user application can
and convert it into Unicode (or some local coded character set) for identify an A-label and convert it into Unicode (or some local coded
display. character set) for display.
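As an illustration of the paragraph above, the following minimal Python sketch recognizes a label by its "xn--" prefix and converts between the ACE form and Unicode. It assumes Python's built-in "punycode" codec as the RFC 3492 implementation; the function names are hypothetical. It performs none of the validity checks that IDNA2008 requires, so it is a sketch of the encoding step only, not a conforming implementation.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ACE label, per the text above


def looks_like_a_label(label: str) -> bool:
    """Return True if the label carries the ACE prefix (case-insensitive)."""
    return label.lower().startswith(ACE_PREFIX)


def ace_to_unicode(label: str) -> str:
    """Decode the Punycode part of a prefixed label for display.

    Labels without the prefix are returned unchanged.
    No IDNA2008 validity checks are applied here.
    """
    if not looks_like_a_label(label):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def unicode_to_ace(label: str) -> str:
    """Encode a non-ASCII label as prefix + Punycode.

    ASCII-only labels are returned unchanged, since they have no A-label form.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    print(ace_to_unicode("xn--bcher-kva"))
    print(unicode_to_ace("b\u00fccher"))
```

Because the prefixed form contains only letters, digits, and hyphens, it can be passed to an existing resolver without any change to the DNS itself.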
On the registry side, IDNA allows a registry to offer On the registry side, IDNA allows a registry to offer
Internationalized Domain Names (IDNs) for registration as A-labels. Internationalized Domain Names (IDNs) for registration as A-labels.
A registry may offer any subset of valid IDNs, and may apply any A registry may offer any subset of valid IDNs, and may apply any
restrictions or bundling (grouping of similar labels together in one restrictions or bundling (grouping of similar labels together in one
registration) appropriate for the context of that registry. registration) appropriate for the context of that registry.
Registration of labels is sometimes discussed separately from lookup, Registration of labels is sometimes discussed separately from lookup,
and is subject to a few specific requirements that do not apply to and is subject to a few specific requirements that do not apply to
lookup. lookup.
skipping to change at page 7, line 50 skipping to change at page 7, line 50
As with other documents in the IDNA2008 set, this document uses the As with other documents in the IDNA2008 set, this document uses the
term "registry" to describe any zone in the DNS. That term, and the term "registry" to describe any zone in the DNS. That term, and the
terms "zone" or "zone administration", are interchangeable. terms "zone" or "zone administration", are interchangeable.
1.4. Objectives 1.4. Objectives
These are the main objectives in revising IDNA. These are the main objectives in revising IDNA.
o Use a more recent version of Unicode, and allow IDNA to be o Use a more recent version of Unicode, and allow IDNA to be
independent of Unicode versions, so that IDNA2008 need not be independent of Unicode versions, so that IDNA2008 need not be
updated for implementations to adopt codepoints from new Unicode updated for implementations to adopt code points from new Unicode
versions. versions.
o Fix a very small number of code-point categorizations that have o Fix a very small number of code point categorizations that have
turned out to cause problems in the communities that use those turned out to cause problems in the communities that use those
code-points. code points.
o Reduce the dependency on mapping, in order that the pre-mapped o Reduce the dependency on mapping, in order that the pre-mapped
forms (which are not valid IDNA labels) tend to appear less often forms (which are not valid IDNA labels) tend to appear less often
in various contexts, in favor of valid A-labels. in various contexts, in favor of valid A-labels.
o Fix some details in the bidirectional codepoint handling o Fix some details in the bidirectional code point handling
algorithms. algorithms.
1.5. Applicability and Function of IDNA 1.5. Applicability and Function of IDNA
The IDNA specification solves the problem of extending the repertoire The IDNA specification solves the problem of extending the repertoire
of characters that can be used in domain names to include a large of characters that can be used in domain names to include a large
subset of the Unicode repertoire. subset of the Unicode repertoire.
IDNA does not extend DNS. Instead, the applications (and, by IDNA does not extend DNS. Instead, the applications (and, by
implication, the users) continue to see an exact-match lookup implication, the users) continue to see an exact-match lookup
skipping to change at page 10, line 43 skipping to change at page 10, line 43
at registration time but not during lookup. Another significant at registration time but not during lookup. Another significant
benefit is that separation facilitates incremental addition of benefit is that separation facilitates incremental addition of
permitted character groups to avoid freezing on one particular permitted character groups to avoid freezing on one particular
version of Unicode. version of Unicode.
The actual registration and lookup protocols for IDNA2008 are The actual registration and lookup protocols for IDNA2008 are
specified in [IDNA2008-Protocol]. specified in [IDNA2008-Protocol].
3. Permitted Characters: An Inclusion List 3. Permitted Characters: An Inclusion List
IDNA2008 adopts the inclusion model. A code-point is assumed to be IDNA2008 adopts the inclusion model. A code point is assumed to be
invalid for IDN use unless it is included as part of a Unicode invalid for IDN use unless it is included as part of a Unicode
property-based rule or, in rare cases, included individually by an property-based rule or, in rare cases, included individually by an
exception. When an implementation moves to a new version of Unicode, exception. When an implementation moves to a new version of Unicode,
the rules may indicate new valid code-points. the rules may indicate new valid code points.
This section provides an overview of the model used to establish the This section provides an overview of the model used to establish the
algorithm and character lists of [IDNA2008-Tables] and describes the algorithm and character lists of [IDNA2008-Tables] and describes the
names and applicability of the categories used there. Note that the names and applicability of the categories used there. Note that the
inclusion of a character in the first category group (Section 3.1.1) inclusion of a character in the first category group (Section 3.1.1)
does not imply that it can be used indiscriminately; some characters does not imply that it can be used indiscriminately; some characters
are associated with contextual rules that must be applied as well. are associated with contextual rules that must be applied as well.
The information given in this section is provided to make the rules, The information given in this section is provided to make the rules,
tables, and protocol easier to understand. The normative generating tables, and protocol easier to understand. The normative generating
skipping to change at page 11, line 34 skipping to change at page 11, line 34
no visible effect in others. IDNA2003 prohibited those types of no visible effect in others. IDNA2003 prohibited those types of
characters entirely by discarding them. We now have a consensus that characters entirely by discarding them. We now have a consensus that
under some conditions, these "joiner" characters are legitimately under some conditions, these "joiner" characters are legitimately
needed to allow useful mnemonics for some languages and scripts. In needed to allow useful mnemonics for some languages and scripts. In
general, context-dependent rules help deal with characters (generally general, context-dependent rules help deal with characters (generally
characters that would otherwise be prohibited entirely) that are used characters that would otherwise be prohibited entirely) that are used
differently or perceived differently across different scripts, and differently or perceived differently across different scripts, and
allow the standard to be applied more appropriately in cases where a allow the standard to be applied more appropriately in cases where a
string is not universally handled the same way. string is not universally handled the same way.
IDNA2008 divides all possible Unicode code-points into four IDNA2008 divides all possible Unicode code points into four
categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and
UNASSIGNED. UNASSIGNED.
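The inclusion model and the four categories can be sketched roughly as follows. This is a heavily simplified illustration, not the normative algorithm of [IDNA2008-Tables]: the set of general categories, the exception table, and all names below are assumptions chosen for the example, and the result depends on the Unicode version shipped with the Python runtime.

```python
import unicodedata
from enum import Enum


class Category(Enum):
    PVALID = "PROTOCOL-VALID"
    CONTEXT = "CONTEXTUAL RULE REQUIRED"
    DISALLOWED = "DISALLOWED"
    UNASSIGNED = "UNASSIGNED"


# Illustrative per-code-point exceptions (assumed for this sketch only).
EXCEPTIONS = {
    0x002D: Category.PVALID,   # hyphen-minus
    0x200C: Category.CONTEXT,  # zero width non-joiner
    0x200D: Category.CONTEXT,  # zero width joiner
}

# Illustrative property-based inclusion rule: lowercase and caseless
# letters, combining marks, and decimal digits (assumed, simplified).
INCLUDED_GENERAL_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(code_point: int) -> Category:
    """Classify a code point; anything not explicitly included is invalid."""
    if code_point in EXCEPTIONS:
        return EXCEPTIONS[code_point]
    general_category = unicodedata.category(chr(code_point))
    if general_category == "Cn":
        # Not assigned in the Unicode version this runtime knows about.
        return Category.UNASSIGNED
    if general_category in INCLUDED_GENERAL_CATEGORIES:
        return Category.PVALID
    return Category.DISALLOWED


if __name__ == "__main__":
    for cp in (0x0061, 0x0041, 0x200D, 0x002D):
        print(f"U+{cp:04X} -> {classify(cp).value}")
```

The default branch returns DISALLOWED, which mirrors the inclusion model: a code point is rejected unless a rule or an exception admits it. Moving to a newer Unicode version changes only the property data consulted, not the rule itself.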
3.1.1. PROTOCOL-VALID 3.1.1. PROTOCOL-VALID
Characters identified as "PROTOCOL-VALID" (often abbreviated Characters identified as "PROTOCOL-VALID" (often abbreviated
"PVALID") are permitted in IDNs. Their use may be restricted by "PVALID") are permitted in IDNs. Their use may be restricted by
rules about the context in which they appear or by other rules that rules about the context in which they appear or by other rules that
apply to the entire label in which they are to be embedded. For apply to the entire label in which they are to be embedded. For
example, any label that contains a character in this category that example, any label that contains a character in this category that
skipping to change at page 16, line 13 skipping to change at page 16, line 13
registries are expected to restrict what they permit to be registries are expected to restrict what they permit to be
registered, devising and using rules that are designed to optimize registered, devising and using rules that are designed to optimize
the balance between confusion and risk on the one hand and maximum the balance between confusion and risk on the one hand and maximum
expressiveness in mnemonics on the other. expressiveness in mnemonics on the other.
In addition, there is an important role for user agents in warning In addition, there is an important role for user agents in warning
against label forms that appear problematic given their knowledge of against label forms that appear problematic given their knowledge of
local contexts and conventions. Of course, no approach based on local contexts and conventions. Of course, no approach based on
naming or identifiers alone can protect against all threats. naming or identifiers alone can protect against all threats.
4. Issues that Constrain Possible Solutions 4. Application-Related Issues
4.1. Display and Network Order 4.1. Display and Network Order
Domain names are always transmitted in network order (the order in Domain names are always transmitted in network order (the order in
which the code points are sent in protocols), but may have a which the code points are sent in protocols), but may have a
different display order (the order in which the code points are different display order (the order in which the code points are
displayed on a screen or paper). When a domain name contains displayed on a screen or paper). When a domain name contains
characters that are normally written right to left, display order may characters that are normally written right to left, display order may
be affected although network order is not. It gets even more be affected although network order is not. It gets even more
complicated if left to right and right to left labels are adjacent to complicated if left to right and right to left labels are adjacent to
skipping to change at page 38, line 5 skipping to change at page 38, line 5
All existing channels through which names can enter a DNS server All existing channels through which names can enter a DNS server
database (for example, master files (as described in RFC 1034) and database (for example, master files (as described in RFC 1034) and
DNS update messages [RFC2136]) are IDN-unaware because they predate DNS update messages [RFC2136]) are IDN-unaware because they predate
IDNA. Other sections of this document provide the needed shielding IDNA. Other sections of this document provide the needed shielding
by ensuring that internationalized domain names entering DNS server by ensuring that internationalized domain names entering DNS server
databases through such channels have already been converted to their databases through such channels have already been converted to their
equivalent ASCII A-label forms. equivalent ASCII A-label forms.
Because of the distinction made between the algorithms for Because of the distinction made between the algorithms for
Registration and Lookup in [IDNA2008-Protocol] (a domain name Registration and Lookup in [IDNA2008-Protocol] (a domain name
containing only ASCII codepoints cannot be converted to an A-label), containing only ASCII code points cannot be converted to an A-label),
there cannot be more than one A-label form for any given U-label. there cannot be more than one A-label form for any given U-label.
As specified in RFC 2181 [RFC2181], the DNS protocol explicitly As specified in RFC 2181 [RFC2181], the DNS protocol explicitly
allows domain labels to contain octets beyond the ASCII range allows domain labels to contain octets beyond the ASCII range
(0000..007F), and this document does not change that. However, (0000..007F), and this document does not change that. However,
although the interpretation of octets 0080..00FF is well-defined in although the interpretation of octets 0080..00FF is well-defined in
the DNS, many application protocols support only ASCII labels and the DNS, many application protocols support only ASCII labels and
there is no defined interpretation of these non-ASCII octets as there is no defined interpretation of these non-ASCII octets as
characters and, in particular, no interpretation of case-independent characters and, in particular, no interpretation of case-independent
matching for them (see, e.g., [RFC4343]). If labels containing these matching for them (see, e.g., [RFC4343]). If labels containing these
skipping to change at page 50, line 24 skipping to change at page 50, line 24
I-D version. I-D version.
o Altered use of "these documents" and "these specifications" back o Altered use of "these documents" and "these specifications" back
to "IDNA2008", undoing the change made in Appendix A.6. The to "IDNA2008", undoing the change made in Appendix A.6. The
convolutions became ambiguous in places. convolutions became ambiguous in places.
o Added a sentence to the Introduction to make the non-normative o Added a sentence to the Introduction to make the non-normative
status of this document even more clear and added references to status of this document even more clear and added references to
7.1.2 and 7.1.3 to point to the more formal definitions. 7.1.2 and 7.1.3 to point to the more formal definitions.
A.17. Version -17
o Final IESG comments picked up and included. A few more editorial/
typographic errors caught and fixed.
o Section 4 title adjusted to better match its content.
Author's Address Author's Address
John C Klensin John C Klensin
1770 Massachusetts Ave, Ste 322 1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140 Cambridge, MA 02140
USA USA
Phone: +1 617 245 1457 Phone: +1 617 245 1457
Email: john+ietf@jck.com Email: john+ietf@jck.com
 End of changes. 17 change blocks. 
25 lines changed or deleted 33 lines changed or added
