| draft-ietf-idnabis-rationale-09.txt | draft-ietf-idnabis-rationale-10.txt | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft March 9, 2009 | Internet-Draft June 18, 2009 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: September 10, 2009 | Expires: December 20, 2009 | |||
| Internationalized Domain Names for Applications (IDNA): Background, | Internationalized Domain Names for Applications (IDNA): Background, | |||
| Explanation, and Rationale | Explanation, and Rationale | |||
| draft-ietf-idnabis-rationale-09.txt | draft-ietf-idnabis-rationale-10.txt | |||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
| provisions of BCP 78 and BCP 79. This document may contain material | provisions of BCP 78 and BCP 79. This document may contain material | |||
| from IETF Documents or IETF Contributions published or made publicly | from IETF Documents or IETF Contributions published or made publicly | |||
| available before November 10, 2008. The person(s) controlling the | available before November 10, 2008. The person(s) controlling the | |||
| copyright in some of this material may not have granted the IETF | copyright in some of this material may not have granted the IETF | |||
| Trust the right to allow modifications of such material outside the | Trust the right to allow modifications of such material outside the | |||
| IETF Standards Process. Without obtaining an adequate license from | IETF Standards Process. Without obtaining an adequate license from | |||
| skipping to change at page 1, line 43 | skipping to change at page 1, line 43 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on September 10, 2009. | This Internet-Draft will expire on December 20, 2009. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents in effect on the date of | Provisions Relating to IETF Documents in effect on the date of | |||
| publication of this document (http://trustee.ietf.org/license-info). | publication of this document (http://trustee.ietf.org/license-info). | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| skipping to change at page 2, line 25 | skipping to change at page 2, line 25 | |||
| During that time, a number of issues have arisen, including the need | During that time, a number of issues have arisen, including the need | |||
| to update the system to deal with newer versions of Unicode. Some of | to update the system to deal with newer versions of Unicode. Some of | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | |||
| 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | |||
| 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 6 | 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 6 | |||
| 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | |||
| 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 10 | 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | |||
| 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 11 | 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.1.2. Characters Valid Only in Context With Others . . . . . 11 | 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 | |||
| 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 | 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 | |||
| 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14 | 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, | 3.3. Layered Restrictions: Tables, Context, Registration, | |||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 14 | Applications . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 | 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 | |||
| 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 | 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 | |||
| 4.2. Entry and Display in Applications . . . . . . . . . . . . 16 | 4.2. Entry and Display in Applications . . . . . . . . . . . . 16 | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and | 4.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 17 | Alternate Character Forms . . . . . . . . . . . . . . . . 17 | |||
| 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20 | 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 | |||
| 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 20 | 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 | |||
| 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21 | 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | |||
| 6. Front-end and User Interface Processing for Lookup . . . . . . 22 | 6. Front-end and User Interface Processing for Lookup . . . . . . 20 | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization . 25 | 7. Migration from IDNA2003 and Unicode Version Synchronization . 24 | |||
| 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 25 | 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 25 | 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 24 | |||
| 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26 | 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | |||
| 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27 | 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | |||
| 7.2. Changes in Character Interpretations . . . . . . . . . . . 28 | 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | |||
| 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 30 | 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 28 | |||
| 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31 | 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 | |||
| 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 32 | 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 | |||
| 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32 | 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 | |||
| 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 33 | 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 | |||
| 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 33 | 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 | |||
| 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 34 | 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code | 7.7. Migration Between Unicode Versions: Unassigned Code | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 37 | 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | |||
| 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 38 | 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35 | |||
| 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 38 | 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 36 | |||
| 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 38 | 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 36 | |||
| 8.3. Root and other DNS Server Considerations . . . . . . . . . 39 | 8.3. Root and other DNS Server Considerations . . . . . . . . . 37 | |||
| 9. Internationalization Considerations . . . . . . . . . . . . . 39 | 9. Internationalization Considerations . . . . . . . . . . . . . 37 | |||
| 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 40 | 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 38 | |||
| 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 40 | 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 | |||
| 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 40 | 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 | |||
| 11. Security Considerations . . . . . . . . . . . . . . . . . . . 40 | 11. Security Considerations . . . . . . . . . . . . . . . . . . . 38 | |||
| 11.1. General Security Issues with IDNA . . . . . . . . . . . . 40 | 11.1. General Security Issues with IDNA . . . . . . . . . . . . 38 | |||
| 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41 | 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 41 | 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 42 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
| 14.1. Normative References . . . . . . . . . . . . . . . . . . . 42 | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 40 | |||
| 14.2. Informative References . . . . . . . . . . . . . . . . . . 43 | 14.2. Informative References . . . . . . . . . . . . . . . . . . 41 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 45 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.1. Changes between Version -00 and Version -01 of | A.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 45 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43 | |||
| A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 47 | A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 47 | A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 48 | A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 48 | A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 48 | A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 49 | A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| The original standards for Internationalized Domain Names (IDNs) were | Internationalized Domain Names in Applications (IDNA) is a collection | |||
| completed and deployed starting in 2003. Those standards are known | of standards that allow client applications to convert some Unicode | |||
| as Internationalized Domain Names in Applications (IDNA), taken from | mnemonics to an ASCII-compatible encoding form ("ACE") which is a | |||
| the name of the highest level standard within the group, RFC 3490 | valid DNS label containing only letters, digits, and hyphens. The | |||
| [RFC3490]. After those standards were deployed, a number of issues | specific form of ACE label used by IDNA is called an "A-label". A | |||
| arose that led to a call for a new version of the IDNA protocol and | client can look up an exact A-label in the existing DNS, so A-labels | |||
| the associated tables, including a subset of those described in a | do not require any extensions to DNS, upgrades of DNS servers or | |||
| recent IAB report [RFC4690] and the need to update the system to deal | updates to low-level client libraries. An A-label is recognizable | |||
| with newer versions of Unicode. This document further explains the | from the prefix "xn--" before the characters produced by the Punycode | |||
| issues that have been encountered when they are important to | algorithm [RFC3492], thus a user application can identify an A-label | |||
| understanding of the revised protocols. It also provides an overview | and convert it into Unicode (or some local coded character set) for | |||
| of the new IDNA model and explanatory material for it. Additional | display. | |||
| explanatory material for the specific components of the proposals | ||||
| appears with the associated documents. | ||||
| This document and the associated ones are written from the | [[anchor3: Note in draft: The above discussion, and the rest of the | |||
| perspective of an IDNA-aware user, application, or implementation. | text in this section, are very informal. In particular, the term | |||
| While they may reiterate fundamental DNS rules and requirements for | "A-label" is used to refer to some things that don't meet all of the | |||
| the convenience of the reader, they make no attempt to be | tests for A-labels. I have tightened it somewhat from the suggested | |||
| comprehensive about DNS principles and should not be considered as a | text I received, but not very much. Is the current form ok with | |||
| substitute for a thorough understanding of the DNS protocols and | everyone???]] | |||
| specifications. | ||||
| A good deal of the background material that appeared in RFC 3490 | On the registry side, IDNA allows a registry to offer | |||
| [RFC3490] has been removed from this update. That material is either | Internationalized Domain Names (IDNs) for registration as A-labels. | |||
| of historical interest only or has been covered from a more recent | A registry may offer any subset of valid IDNs, and may apply any | |||
| perspective in RFC 4690 [RFC4690]. | restrictions or bundling (grouping of similar labels together in one | |||
| | registration) appropriate for the context of that registry. | |||
| | Registration of labels is sometimes discussed separately from lookup, | |||
| | and is subject to a few specific requirements that do not apply to | |||
| | lookup. | |||
| This document is not normative. The information it provides is | DNS clients and registries are subject to some differences in | |||
| intended to make the rules, tables, and protocol easier to understand | requirements for handling IDNs. In particular, registries are urged | |||
| and to provide overview information and suggestions for zone | to register only exact, valid A-labels, while clients might do some | |||
| administrators and others who need to make policy, deployment, and | mapping to get from otherwise-invalid user input to a valid A-label. | |||
| similar decisions about IDNs. | ||||
| | The first version of IDNA was published in 2003 and is referred to | |||
| | here as IDNA2003 to contrast it with the current version, which is | |||
| | known as IDNA2008. The documents that made up both versions are | |||
| | listed in Section 1.3.1. The characters that are valid in A-labels | |||
| | are identified from rules listed in the Tables document | |||
| | [IDNA2008-Tables], but validity can be derived from the Unicode | |||
| | properties of those characters with a very few exceptions. | |||
| | Traditionally, DNS labels are case-insensitive [RFC1034][RFC1035]. | |||
| | That pattern was preserved in IDNA2003, but if case rules are | |||
| | enforced from one language, another language sometimes loses the | |||
| | ability to treat two characters separately. Case-sensitivity is | |||
| | treated slightly differently in IDNA2008. | |||
| | IDNA2003 used Unicode version 3.2 only. In order to keep up with new | |||
| | characters added in new versions of Unicode, IDNA2008 decouples its | |||
| | rules from any particular version of Unicode. Instead, the | |||
| | attributes of new characters in Unicode determine how and whether | |||
| | the characters can be used in IDNA labels. | |||
| | This document provides informational context for IDNA2008, including | |||
| | terminology, background, and policy discussions. | |||
| 1.2. Discussion Forum | 1.2. Discussion Forum | |||
| [[ RFC Editor: please remove this section. ]] | [[ RFC Editor: please remove this section. ]] | |||
| IDNA2008 is being discussed in the IETF "idnabis" Working Group and | IDNA2008 is being discussed in the IETF "idnabis" Working Group and | |||
| on the mailing list idna-update@alvestrand.no | on the mailing list idna-update@alvestrand.no | |||
| 1.3. Terminology | 1.3. Terminology | |||
| Terminology that is critical for understanding this document and the | Terminology for IDNA2008 appears in [IDNA2008-Defs]. That document | |||
| rest of the documents that make up IDNA2008, appears in | also contains a roadmap to the IDNA2008 document collection. No | |||
| [IDNA2008-Defs]. That document also contains roadmap to the IDNA2008 | attempt should be made to understand this document without the | |||
| document collection. No attempt should be made to understand this | definitions and concepts that appear there. | |||
| document without the definitions and concepts that appear there. | ||||
| 1.3.1. Documents and Standards | 1.3.1. Documents and Standards | |||
| This document uses the term "IDNA2003" to refer to the set of | This document uses the term "IDNA2003" to refer to the set of | |||
| standards that make up and support the version of IDNA published in | standards published in 2003 to define IDNA: the IDNA base | |||
| 2003, i.e., those commonly known as the IDNA base specification | specification [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and | |||
| [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | Stringprep [RFC3454]. | |||
| [RFC3454]. In this document, those names are used to refer, | ||||
| conceptually, to the individual documents, with the base IDNA | ||||
| specification called just "IDNA". | ||||
| The term "IDNA2008" is used to refer to a new version of IDNA as | The term "IDNA2008" is used to refer to a new version of IDNA. | |||
| described in this document and in the documents described in the | IDNA2008 is not dependent on any of the IDNA2003 specifications other | |||
| document listing of [IDNA2008-Defs]. IDNA2008 is not dependent on | than the one for Punycode encoding. References to "these | |||
| any of the IDNA2003 specifications other than the one for Punycode | specifications" or "these documents" are to the entire IDNA2008 set | |||
| encoding. References to "these specifications" or "these documents" | listed in [IDNA2008-Defs]. | |||
| are to the entire IDNA2008 set. | ||||
| 1.3.2. DNS "Name" Terminology | 1.3.2. DNS "Name" Terminology | |||
| These documents depart from historical DNS terminology and usage in | In the context of IDNs, the DNS term 'name' has introduced some | |||
| one important respect. Over the years, the community has talked very | confusion as people speak of DNS labels in terms of the words or | |||
| casually about "names" in the DNS, beginning with calling it "the | phrases of various natural languages. Historically, many of the | |||
| domain name system". That terminology is fine in the very precise | "names" in the DNS have been mnemonics to identify some particular | |||
| sense that the identifiers of the DNS do provide names for objects | concept, object, or organization. They are typically rooted in some | |||
| and addresses. But, in the context of IDNs, the term has introduced | ||||
| some confusion, confusion that has increased further as people have | ||||
| begun to speak of DNS labels in terms of the words or phrases of | ||||
| various natural languages. | ||||
| Historically, many, perhaps most, of the "names" in the DNS have been | ||||
| mnemonics to identify some particular concept, object, or | ||||
| organization. They are typically derived from, or rooted in, some | ||||
| language because most people think in language-based ways. But, | language because most people think in language-based ways. But, | |||
| because they are mnemonics, they need not obey the orthographic | because they are mnemonics, they need not obey the orthographic | |||
| conventions of any language: it is not a requirement that it be | conventions of any language: it is not a requirement that it be | |||
| possible for them to be "words". | possible for them to be "words". | |||
| This distinction is important because the reasonable goal of an IDN | This distinction is important because the reasonable goal of an IDN | |||
| effort is not to be able to write the great Klingon (or language of | effort is not to be able to write the great Klingon (or language of | |||
| one's choice) novel in DNS labels but to be able to form a usefully | one's choice) novel in DNS labels but to be able to form a usefully | |||
| broad range of mnemonics in ways that are as natural as possible in a | broad range of mnemonics in ways that are as natural as possible in a | |||
| very broad range of scripts. | very broad range of scripts. | |||
| skipping to change at page 6, line 43 | skipping to change at page 7, line 7 | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 7.4); | prefix, no matter how unlikely those might be (see Section 7.4); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the Punycode encoding | o to reduce the opportunities for attacks via the Punycode encoding | |||
| algorithm itself. | algorithm itself. | |||
| 1.4. Objectives | 1.4. Objectives | |||
| The intent of the IDNA revision effort, and hence of this document | These are the main objectives in revising IDNA. | |||
| and the associated ones, is to increase the usability and | ||||
| effectiveness of internationalized domain names (IDNs) while | o Use a more recent version of Unicode, and allow IDNA to be | |||
| preserving or strengthening the integrity of references that use | independent of Unicode versions, so that IDNA2008 need not be | |||
| them. The original "hostname" character definitions (see, e.g., | updated for implementations to adopt codepoints from new Unicode | |||
| [RFC0810]) struck a balance between the creation of useful mnemonics | versions. | |||
| and the introduction of parsing problems or general confusion in the | ||||
| contexts in which domain names are used. The objective of IDNA2008 | o Fix a very small number of code-point categorizations that have | |||
| is to preserve that balance while expanding the character repertoire | turned out to cause problems in the communities that use those | |||
| to include extended versions of Roman-derived scripts and scripts | code-points. | |||
| that are not Roman in origin. No work of this sort is able to | ||||
| completely eliminate sources of visual or textual confusion: such | o Reduce the dependency on mapping, in order that the pre-mapped | |||
| confusion is possible even under the original host naming rules where | forms (which are not valid IDNA labels) tend to appear less often | |||
| only ASCII characters were permitted. However, through the | in various contexts, in favor of valid A-labels. | |||
| application of different techniques at different points (see | ||||
| Section 3.3), it should be possible to keep problems to an acceptable | o Fix some details in the bidirectional codepoint handling | |||
| minimum. One consequence of this general objective is that the | algorithms. | |||
| desire of some user or marketing community to use a particular string | ||||
| --whether the reason is to try to write sentences of particular | ||||
| languages in the DNS, to express a facsimile of the symbol for a | ||||
| brand, or for some other purpose-- is not a primary goal within the | ||||
| context of applications in the domain name space. | ||||
| 1.5. Applicability and Function of IDNA | 1.5. Applicability and Function of IDNA | |||
| The IDNA specification solves the problem of extending the repertoire | The IDNA specification solves the problem of extending the repertoire | |||
| of characters that can be used in domain names to include a large | of characters that can be used in domain names to include a large | |||
| subset of the Unicode repertoire. | subset of the Unicode repertoire. | |||
| IDNA does not extend the service offered by DNS to the applications. | IDNA does not extend DNS. Instead, the applications (and, by | |||
| Instead, the applications (and, by implication, the users) continue | implication, the users) continue to see an exact-match lookup | |||
| to see an exact-match lookup service. Either there is a single | service. Either there is a single exactly-matching (subject to the | |||
| exactly-matching (subject to the base DNS requirement of case- | base DNS requirement of case-insensitive ASCII matching) name or | |||
| insensitive ASCII matching) name or there is no match. This model | there is no match. This model has served the existing applications | |||
| has served the existing applications well, but it requires, with or | well, but it requires, with or without internationalized domain | |||
| without internationalized domain names, that users know the exact | names, that users know the exact spelling of the domain names that | |||
| spelling of the domain names that are to be typed into applications | are to be typed into applications such as web browsers and mail user | |||
| such as web browsers and mail user agents. The introduction of the | agents. The introduction of the larger repertoire of characters | |||
| larger repertoire of characters potentially makes the set of | potentially makes the set of misspellings larger, especially given | |||
| misspellings larger, especially given that in some cases the same | that in some cases the same appearance, for example on a business | |||
| appearance, for example on a business card, might visually match | card, might visually match several Unicode code points or several | |||
| several Unicode code points or several sequences of code points. | sequences of code points. | |||
| The IDNA standard does not require any applications to conform to it, | The IDNA standard does not require any applications to conform to it, | |||
| nor does it retroactively change those applications. An application | nor does it retroactively change those applications. An application | |||
| can elect to use IDNA in order to support IDN while maintaining | can elect to use IDNA in order to support IDN while maintaining | |||
| interoperability with existing infrastructure. If an application | interoperability with existing infrastructure. If an application | |||
| wants to use non-ASCII characters in domain names, IDNA is the only | wants to use non-ASCII characters in domain names, IDNA is the only | |||
| currently-defined option. Adding IDNA support to an existing | currently-defined option. Adding IDNA support to an existing | |||
| application entails changes to the application only, and leaves room | application entails changes to the application only, and leaves room | |||
| for flexibility in front-end processing and more specifically in the | for flexibility in front-end processing and more specifically in the | |||
| user interface (see Section 6). | user interface (see Section 6). | |||
| skipping to change at page 8, line 29 | skipping to change at page 8, line 35 | |||
| In order to allow user-friendly input and output of the IDNs and | In order to allow user-friendly input and output of the IDNs and | |||
| acceptance of some characters as equivalent to those to be processed | acceptance of some characters as equivalent to those to be processed | |||
| according to the protocol, the applications need to be modified to | according to the protocol, the applications need to be modified to | |||
| conform to this specification. | conform to this specification. | |||
| This version of IDNA uses the Unicode character repertoire, for | This version of IDNA uses the Unicode character repertoire, for | |||
| continuity with the original version of IDNA. | continuity with the original version of IDNA. | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing | 1.6. Comprehensibility of IDNA Mechanisms and Processing | |||
| One of the major goals of this work is to improve the general | One goal of IDNA2008, which is aided by the main goal of reducing the | |||
| understanding of how IDNA works and what characters are permitted and | dependency on mapping, is to improve the general understanding of how | |||
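| what happens to them. Comprehensibility and predictability to users | IDNA works and what characters are permitted and what happens to them. Comprehensibility and predictability to users and registrants are important design goals for this effort. | |||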
| and registrants are themselves important motivations and design goals | End-user applications have an important role to play in increasing | |||
| for this effort. The effort includes some new terminology and a | this comprehensibility. | |||
| revised and extended model, both covered in this section, and some | ||||
| more specific protocol, processing, and table modifications. Details | ||||
| of the latter appear in other documents (see [IDNA2008-Defs]). | ||||
| Several issues are inherent in the application of IDNs and, indeed, | Any system that tries to handle international characters encounters | |||
| almost any other system that tries to handle international characters | some common problems. For example, a UI cannot display a character | |||
| and concepts. They range from the apparently trivial --e.g., one | if no font for that character is available. In some cases, | |||
| cannot display a character for which one does not have a font | internationalization enables effective localization while maintaining | |||
| available locally-- to the more complex and subtle. Many people have | some global uniformity but losing some universality. | |||
| observed that internationalization is just a tool to enable effective | ||||
| localization while permitting some global uniformity. Issues of | ||||
| display, of exactly how various strings and characters are entered, | ||||
| and so on are inherently issues about localization and user interface | ||||
| design. | ||||
| A protocol such as IDNA can only assume that such operations as data | It is difficult to even make suggestions for end-user applications to | |||
| entry and reconciliation of differences in character forms are | cope when characters and fonts are not available. Because display | |||
| possible. It may make some recommendations about how display might | functions are rarely controlled by the types of applications that | |||
| work when characters and fonts are not available, but they can only | would call upon IDNA, such suggestions will rarely be very effective. | |||
| be general recommendations and, because display functions are rarely | ||||
| controlled by the types of applications that would call upon IDNA, | ||||
| will rarely be very effective. | ||||
| However, shifting responsibility for character mapping and other | Converting between local character sets and normalized Unicode, if | |||
| adjustments from the protocol (where it was located in IDNA2003) to | needed, is part of this set of user agent issues. This conversion | |||
| the user interface or processing before invoking IDNA raises issues | introduces complexity in a system that is not Unicode-native. If a | |||
| about both what that processing should do and about compatibility for | label is converted to a local character set that does not have all | |||
| references prepared in an IDNA2003 context. Those issues are | the needed characters, the user agent may have to add special logic | |||
| discussed in Section 6. | to avoid or reduce loss of information. | |||
| Operations for converting between local character sets and normalized | The major difficulty may lie in accurately identifying the incoming | |||
| Unicode are part of this general set of user interface issues. The | character set and applying the correct conversion routine. Even more | |||
| conversion is obviously not required at all in a Unicode-native | difficult, the local character coding system could be based on | |||
| system that maintains all strings in Normalization Form C (NFC). | conceptually different assumptions than those used by Unicode (e.g., | |||
| (See [Unicode-UAX15] for precise definitions of NFC and NFKC if | choice of font encodings used for publications in some Indic | |||
| needed.) It may, however, involve some complexity in a system that | scripts). Those differences may not easily yield unambiguous | |||
| is not Unicode-native, especially if the elements of the local | conversions or interpretations even if each coding system is | |||
| character set do not map exactly and unambiguously into Unicode | internally consistent and adequate to represent the local language | |||
| characters or do so in a way that is not completely stable over time. | and script. | |||
| Perhaps more important, if a label being converted to a local | ||||
| character set contains Unicode characters that have no correspondence | ||||
| in that character set, the application may have to apply special, | ||||
| locally-appropriate, methods to avoid or reduce loss of information. | ||||
| Depending on the system involved, the major difficulty may not lie in | IDNA2008 shifts responsibility for character mapping and other | |||
| the mapping but in accurately identifying the incoming character set | adjustments from the protocol (where it was located in IDNA2003) to | |||
| and then applying the correct conversion routine. If a local | pre-processing before invoking IDNA. The intent is that this change | |||
| operating system uses one of the ISO 8859 character sets or an | leads to greater usage of fully-valid A-Labels in display, transit | |||
| extensive national or industrial system such as GB18030 [GB18030] or | and storage, which should aid comprehensibility. A careful look at | |||
| BIG5 [BIG5], one must correctly identify the character set in use | pre-processing raises issues about what that pre-processing should do | |||
| before converting to Unicode even though those character coding | and at what point pre-processing becomes harmful, how universally | |||
| systems are substantially or completely Unicode-compatible (i.e., all | consistent pre-processing algorithms can be, and how to be compatible | |||
| of the code points in them have an exact and unique mapping to | with labels prepared in an IDNA2003 context. Those issues are | |||
| Unicode code points). It may be even more difficult when the | discussed in Section 6. [[anchor9: Fix section reference.]] | |||
| character coding system in local use is based on conceptually | ||||
| different assumptions than those used by Unicode about, e.g., font | ||||
| encodings used for publications in some Indic scripts. Those | ||||
| differences may not easily yield unambiguous conversions or | ||||
| interpretations even if each coding system is internally consistent | ||||
| and adequate to represent the local language and script. | ||||
| 2. Processing in IDNA2008 | 2. Processing in IDNA2008 | |||
| These specifications separate Domain Name Registration and Lookup in | These specifications separate Domain Name Registration and Lookup in | |||
| the protocol specification. Doing so reflects current practice in | the protocol specification. This separation reflects current | |||
| which per-registry restrictions and special processing are applied at | practice in which per-registry restrictions and special processing | |||
| registration time but not during lookup. Even more important in the | are applied at registration time but not during lookup. Another | |||
| longer term, it facilitates incremental addition of permitted | significant benefit is that separation facilitates incremental | |||
| character groups to avoid freezing on one particular version of | addition of permitted character groups to avoid freezing on one | |||
| Unicode. | particular version of Unicode. | |||
| The actual registration and lookup protocols for IDNA2008 are | The actual registration and lookup protocols for IDNA2008 are | |||
| specified in [IDNA2008-Protocol]. | specified in [IDNA2008-Protocol]. | |||
| 3. Permitted Characters: An Inclusion List | 3. Permitted Characters: An Inclusion List | |||
| IDNA2008 adopts the inclusion model. A code-point is assumed to be | ||||
| invalid, unless it is included as part of a Unicode property-based | ||||
| rule or in rare cases included individually by an exception. When an | ||||
| implementation moves to a new version of Unicode, the rules may | ||||
| indicate new valid code-points. | ||||
| This section provides an overview of the model used to establish the | This section provides an overview of the model used to establish the | |||
| algorithm and character lists of [IDNA2008-Tables] and describes the | algorithm and character lists of [IDNA2008-Tables] and describes the | |||
| names and applicability of the categories used there. Note that the | names and applicability of the categories used there. Note that the | |||
| inclusion of a character in the first category group (Section 3.1.1) | inclusion of a character in the first category group (Section 3.1.1) | |||
| does not imply that it can be used indiscriminately; some characters | does not imply that it can be used indiscriminately; some characters | |||
| are associated with contextual rules that must be applied as well. | are associated with contextual rules that must be applied as well. | |||
| The information given in this section is provided to make the rules, | The information given in this section is provided to make the rules, | |||
| tables, and protocol easier to understand. The normative generating | tables, and protocol easier to understand. The normative generating | |||
| rules that correspond to this informal discussion appear in | rules that correspond to this informal discussion appear in | |||
| [IDNA2008-Tables] and the rules that actually determine what labels | [IDNA2008-Tables] and the rules that actually determine what labels | |||
| can be registered or looked up are in [IDNA2008-Protocol]. | can be registered or looked up are in [IDNA2008-Protocol]. | |||
| 3.1. A Tiered Model of Permitted Characters and Labels | 3.1. A Tiered Model of Permitted Characters and Labels | |||
| Moving to an inclusion model requires respecifying the list of | Moving to an inclusion model involves a new specification for the | |||
| characters that are permitted in IDNs. In IDNA2003, the role and | list of characters that are permitted in IDNs. In IDNA2003, | |||
| utility of characters are independent of context and fixed forever | character validity is independent of context and fixed forever (or | |||
| (or until the standard is replaced). Making completely context- | until the standard is replaced). However, globally context- | |||
| independent rules globally has proven impractical because some | independent rules have proved to be impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| no visible effect(s) in others. IDNA2003 prohibited those types of | no visible effect in others. IDNA2003 prohibited those types of | |||
| characters entirely. But the restrictions led to a consensus that | characters entirely by discarding them. We now have a consensus that | |||
| under some conditions, these "joiner" characters were legitimately | under some conditions, these "joiner" characters are legitimately | |||
| needed to allow useful mnemonics for some languages and scripts. The | needed to allow useful mnemonics for some languages and scripts. In | |||
| requirement to support those characters but limit their use to very | general, context-dependent rules help deal with characters that are | |||
| specific contexts was reinforced by the observation that handling of | used differently across different scripts, and allow the standard to | |||
| particular characters across the languages that use a script, or the | be applied more appropriately in cases where a string is not | |||
| use of similar or identical-looking characters in different scripts, | universally handled the same way. | |||
| is more complex than many people believed it was several years ago. | ||||
| Independently of the characters chosen (see next subsection), the | IDNA2008 divides all possible Unicode code-points into four | |||
| approach is to divide the characters that appear in Unicode into | categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and | |||
| three categories: | UNASSIGNED. | |||
| 3.1.1. PROTOCOL-VALID | 3.1.1. PROTOCOL-VALID | |||
| Characters identified as "PROTOCOL-VALID" (often abbreviated | Characters identified as "PROTOCOL-VALID" (often abbreviated | |||
| "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | "PVALID") are permitted in IDNs. Their use may be restricted by | |||
| Their use may be restricted by rules about the context in which they | rules about the context in which they appear or by other rules that | |||
| appear or by other rules that apply to the entire label in which they | apply to the entire label in which they are to be embedded. For | |||
| are to be embedded. For example, any label that contains a character | example, any label that contains a character in this category that | |||
| in this category that has a "right-to-left" property must be used in | has a "right-to-left" property must be used in context with the | |||
| context with the "Bidi" rules (see [IDNA2008-Bidi]). | "Bidi" rules (see [IDNA2008-Bidi]). | |||
| The term "PROTOCOL-VALID" is used to stress the fact that the | The term "PROTOCOL-VALID" is used to stress the fact that the | |||
| presence of a character in this category does not imply that a given | presence of a character in this category does not imply that a given | |||
| registry need accept registrations containing any of the characters | registry need accept registrations containing any of the characters | |||
| in the category. Registries are still expected to apply judgment | in the category. Registries are still expected to apply judgment | |||
| about labels they will accept and to maintain rules consistent with | about labels they will accept and to maintain rules consistent with | |||
| those judgments (see [IDNA2008-Protocol] and Section 3.3). | those judgments (see [IDNA2008-Protocol] and Section 3.3). | |||
| Characters that are placed in the "PROTOCOL-VALID" category are | Characters that are placed in the "PROTOCOL-VALID" category are | |||
| expected to never be removed from it or reclassified. While | expected to never be removed from it or reclassified. While | |||
| theoretically characters could be removed from Unicode, such removal | theoretically characters could be removed from Unicode, such removal | |||
| would be inconsistent with the Unicode stability principles (see | would be inconsistent with the Unicode stability principles (see | |||
| [Unicode51], Appendix F) and hence should never occur. | [Unicode51], Appendix F) and hence should never occur. | |||
| 3.1.2. Characters Valid Only in Context With Others | 3.1.2. CONTEXTUAL RULE REQUIRED | |||
| Some characters may be unsuitable for general use in IDNs but | Some characters may be unsuitable for general use in IDNs but | |||
| necessary for the plausible support of some scripts. The two most | necessary for the plausible support of some scripts. The two most | |||
| commonly-cited examples are the zero-width joiner and non-joiner | commonly-cited examples are the zero-width joiner and non-joiner | |||
| characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for | characters (ZWJ, U+200D and ZWNJ, U+200C). | |||
| unambiguous labels may require that other characters be restricted to | ||||
| particular contexts. For example, the ASCII hyphen is not permitted | ||||
| to start or end a label, whether that label contains non-ASCII | ||||
| characters or not. | ||||
| 3.1.2.1. Contextual Restrictions | 3.1.2.1. Contextual Restrictions | |||
| These characters must not appear in IDNs without additional | Characters with contextual restrictions are identified as "CONTEXTUAL | |||
| restrictions, typically because they have no visible consequences in | RULE REQUIRED" and associated with a rule. The rule defines whether | |||
| most scripts but affect format or presentation in a few others or | the character is valid in a particular string, and also whether the | |||
| because they are combining characters that are safe for use only in | rule itself is to be applied on lookup as well as registration. | |||
| conjunction with particular characters or scripts. In order to | ||||
| permit them to be used at all, they are specially identified as | A distinction is made between characters that indicate or prohibit | |||
| "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | joining and ones similar to them (known as "CONTEXT-JOINER" or | |||
| associated with a rule. In addition, the rule will define whether it | "CONTEXTJ") and other characters requiring contextual treatment | |||
| is to be applied on lookup as well as registration. A distinction is | ("CONTEXT-OTHER" or "CONTEXTO"). Only the former require full | |||
| made between characters that indicate or prohibit joining (known as | testing at lookup time. | |||
| "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | ||||
| contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | ||||
| former require full testing at lookup time. | ||||
| It is important to note that these contextual rules cannot prevent | It is important to note that these contextual rules cannot prevent | |||
| all uses of the relevant characters that might be confusing or | all uses of the relevant characters that might be confusing or | |||
| problematic. What they are expected to do is to confine applicability | problematic. What they are expected to do is to confine applicability | |||
| of the characters to scripts (and narrower contexts) where zone | of the characters to scripts (and narrower contexts) where zone | |||
| administrators are knowledgeable enough about the use of those | administrators are knowledgeable enough about the use of those | |||
| characters to be prepared to deal with them appropriately. For | characters to be prepared to deal with them appropriately. For | |||
| example, a registry dealing with an Indic script that requires ZWJ | example, a registry dealing with an Indic script that requires ZWJ | |||
| and/or ZWNJ as part of the writing system is expected to understand | and/or ZWNJ as part of the writing system is expected to understand | |||
| where the characters have visible effect and where they do not and to | where the characters have visible effect and where they do not and to | |||
| make registration rules accordingly. By contrast, a registry dealing | make registration rules accordingly. By contrast, a registry dealing | |||
| with Latin or Cyrillic script might not be actively aware that the | with Latin or Cyrillic script might not be actively aware that the | |||
| characters exist, much less about the consequences of embedding them | characters exist, much less about the consequences of embedding them | |||
| in labels drawn from those scripts. | in labels drawn from those scripts. | |||
| 3.1.2.2. Rules and Their Application | 3.1.2.2. Rules and Their Application | |||
| The actual rules may be DEFINED or NULL. If present, they may have | Rules have descriptions such as "Must follow a character from Script | |||
| values of "True" (character may be used in any position in any | XYZ", "Must occur only if the entire label is in Script ABC", or | |||
| label), "False" (character may not be used in any label), or may be a | "Must occur only if the previous and subsequent characters have the | |||
| set of procedural rules that specify the context in which the | DFG property". The actual rules may be DEFINED or NULL. If present, | |||
| character is permitted. | they may have values of "True" (character may be used in any position | |||
| in any label), "False" (character may not be used in any label), or | ||||
| may be a set of procedural rules that specify the context in which | ||||
| the character is permitted. | ||||
| Examples of descriptions of typical rules, stated informally and in | Examples of descriptions of typical rules, stated informally and in | |||
| English, include "Must follow a character from Script XYZ", "Must | English, include "Must follow a character from Script XYZ", "Must | |||
| occur only if the entire label is in Script ABC", "Must occur only if | occur only if the entire label is in Script ABC", "Must occur only if | |||
| the previous and subsequent characters have the DFG property". | the previous and subsequent characters have the DFG property". | |||
| Because it is easier to identify these characters than to know that | Because it is easier to identify these characters than to know that | |||
| they are actually needed in IDNs or how to establish exactly the | they are actually needed in IDNs or how to establish exactly the | |||
| right rules for each one, a rule may have a null value in a given | right rules for each one, a rule may have a null value in a given | |||
| version of the tables. Characters associated with null rules are not | version of the tables. Characters associated with null rules are not | |||
| permitted to appear in putative labels for either registration or | permitted to appear in putative labels for either registration or | |||
| lookup. Of course, a later version of the tables might contain a | lookup. Of course, a later version of the tables might contain a | |||
| non-null rule. | non-null rule. | |||
| The description of the syntax of the rules, and the rules themselves, | The actual rules and their descriptions are in [IDNA2008-Tables]. | |||
| appears in [IDNA2008-Tables]. [[anchor11: ??? Section number would | [[anchor12: ??? Section number would be good here.]] That document | |||
| be good here.]] | also creates a registry for future rules. | |||
| 3.1.3. DISALLOWED | 3.1.3. DISALLOWED | |||
| Some characters are inappropriate for use in IDNs and are thus | Some characters are inappropriate for use in IDNs and are thus | |||
| excluded for both registration and lookup (i.e., IDNA-conforming | excluded for both registration and lookup (i.e., IDNA-conforming | |||
| applications performing name lookup should verify that these | applications performing name lookup should verify that these | |||
| characters are absent; if they are present, the label strings should | characters are absent; if they are present, the label strings should | |||
| be rejected rather than converted to A-labels and looked up). Some of | be rejected rather than converted to A-labels and looked up). Some of | |||
| these characters are problematic for use in IDNs (such as the | these characters are problematic for use in IDNs (such as the | |||
| FRACTION SLASH character, U+2044), while some of them (such as the | FRACTION SLASH character, U+2044), while some of them (such as the | |||
| skipping to change at page 14, line 45 ¶ | skipping to change at page 14, line 32 ¶ | |||
| Identifiers" in [Unicode-Security]. | Identifiers" in [Unicode-Security]. | |||
| It is worth stressing that these principles of policy development and | It is worth stressing that these principles of policy development and | |||
| application apply at all levels of the DNS, not only, e.g., TLD or | application apply at all levels of the DNS, not only, e.g., TLD or | |||
| SLD registrations and that even a trivial, "anything permitted that | SLD registrations and that even a trivial, "anything permitted that | |||
| is valid under the protocol" policy is helpful in that it helps users | is valid under the protocol" policy is helpful in that it helps users | |||
| and application developers know what to expect. | and application developers know what to expect. | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, Applications | 3.3. Layered Restrictions: Tables, Context, Registration, Applications | |||
| The essence of the character rules in IDNA2008 is based on the | The character rules in IDNA2008 are based on the realization that | |||
| realization that there is no single magic bullet for any of the | there is no single magic bullet for any of the issues associated with | |||
| issues associated with a multiscript DNS. Instead, the | IDNs. Instead, the specifications define a variety of approaches. | |||
| specifications define a variety of approaches that, together, | The character tables are the first mechanism, protocol rules about | |||
| constitute multiple lines of defense against ambiguity in identifiers | how those characters are applied or restricted in context are the | |||
| and loss of referential integrity. The actual character tables are | second, and those two in combination constitute the limits of what | |||
| the first mechanism, protocol rules about how those characters are | can be done in the protocol. As discussed in the previous section | |||
| applied or restricted in context are the second, and those two in | (Section 3.2), registries are expected to restrict what they permit | |||
| combination constitute the limits of what can be done by a protocol | to be registered, devising and using rules that are designed to | |||
| alone. As discussed in the previous section (Section 3.2), | optimize the balance between confusion and risk on the one hand and | |||
| registries are expected to restrict what they permit to be | maximum expressiveness in mnemonics on the other. | |||
| registered, devising and using rules that are designed to optimize | ||||
| the balance between confusion and risk on the one hand and maximum | ||||
| expressiveness in mnemonics on the other. | ||||
| In addition, there is an important role for user agents in warning | In addition, there is an important role for user agents in warning | |||
| against label forms that appear problematic given their knowledge of | against label forms that appear problematic given their knowledge of | |||
| local contexts and conventions. Of course, no approach based on | local contexts and conventions. Of course, no approach based on | |||
| naming or identifiers alone can protect against all threats. | naming or identifiers alone can protect against all threats. | |||
| 4. Issues that Constrain Possible Solutions | 4. Issues that Constrain Possible Solutions | |||
| 4.1. Display and Network Order | 4.1. Display and Network Order | |||
| The correct treatment of domain names requires a clear distinction | Domain names are always transmitted in network order (the order in | |||
| between Network Order (the order in which the code points are sent in | which the code points are sent in protocols), but may have a | |||
| protocols) and Display Order (the order in which the code points are | different display order (the order in which the code points are | |||
| displayed on a screen or paper). The order of labels in a domain | displayed on a screen or paper). When a domain name contains | |||
| name that contains characters that are normally written right to left | characters that are normally written right to left, display order may | |||
| is discussed in [IDNA2008-Bidi]. In particular, there are questions | be affected although network order is not. It gets even more | |||
| about the order in which labels are displayed if left to right and | complicated if left to right and right to left labels are adjacent to | |||
| right to left labels are adjacent to each other, especially if there | each other within a domain name. The decision about the display | |||
| are also multiple consecutive appearances of one of the types. The | order is ultimately under the control of user agents --including Web | |||
| decision about the display order is ultimately under the control of | browsers, mail clients, hosted Web applications and many more -- | |||
| user agents --including web browsers, mail clients, and the like-- | which may be highly localized. Should a domain name abc.def, in | |||
| which may be highly localized. Even when formats are specified by | which both labels are represented in scripts that are written right | |||
| protocols, the full composition of an Internationalized Resource | to left, be displayed as fed.cba or cba.fed? Applications that are | |||
| Identifier (IRI) [RFC3987] or Internationalized Email address | in deployment today are already diverse, and one can find examples of | |||
| contains elements other than the domain name. For example, IRIs | either choice. | |||
| contain protocol identifiers and field delimiter syntax such as | ||||
| "http://" or "mailto:" while email addresses contain the "@" to | ||||
| separate local parts from domain names. User agents are not required | ||||
| to use those protocol-based forms directly but often do so. While | ||||
| display, parsing, and processing within a label is specified by the | ||||
| normative documents in the IDNA2008 collection, the relationship | ||||
| between fully-qualified domain names and internationalized labels is | ||||
| unchanged from the base DNS specifications. Comments in this | ||||
| document about such full domain names are explanatory or examples of | ||||
| what might be done and must not be considered normative. | ||||
| Questions remain about protocol constraints implying that the overall | ||||
| direction of these strings will always be left to right (or right to | ||||
| left) for an IRI or email address, or if they even should conform to | ||||
| such rules. These questions also have several possible answers. | ||||
| Should a domain name abc.def, in which both labels are represented in | The picture changes once again when an IDN appears in an | |||
| scripts that are written right to left, be displayed as fed.cba or | Internationalized Resource Identifier (IRI) [RFC3987]. An IRI or | |||
| cba.fed? An IRI for clear text web access would, in network order, | Internationalized Email address contains elements other than the | |||
| begin with "http://" and the characters will appear as | domain name. For example, IRIs contain protocol identifiers and | |||
| "http://abc.def" -- but what does this suggest about the display | field delimiter syntax such as "http://" or "mailto:" while email | |||
| order? When entering a URI to many browsers, it may be possible to | addresses contain the "@" to separate local parts from domain names. | |||
| provide only the domain name and leave the "http://" to be filled in | An IRI in network order begins with "http://" followed by domain | |||
| by default, assuming no tail (an approach that does not work for | labels in network order, thus "http://abc.def". | |||
| protocols other than HTTP or whatever is chosen as the default). The | ||||
| natural display order for the typed domain name on a right to left | ||||
| system is fed.cba. Does this change if a protocol identifier, tail, | ||||
| and the corresponding delimiters are specified? | ||||
| While logic, precedent, and reality suggest that these are questions | User agents are not required to display and allow input of IRIs | |||
| for user interface design, not IETF protocol specifications, | directly but often do so. Implementors have to choose whether the | |||
| experience in the 1980s and 1990s with mixing systems in which domain | overall direction of these strings will always be left to right (or | |||
| name labels were read in network order (left to right) and those in | right to left) for an IRI or email address. The natural order for a | |||
| which those labels were read right to left would predict a great deal | user typing a domain name on a right to left system is fed.cba. | |||
| of confusion, and heuristics that sometimes fail, if each | Should the R2L user agent reverse the entire domain name each time a | |||
| implementation of each application makes its own decisions on these | domain name is typed? Does this change if the user types "http://" | |||
| issues. | right before typing a domain name, thus implying that the user is | |||
| beginning at the beginning of the network order IRI? Experience in | ||||
| the 1980s and 1990s with mixing systems in which domain name labels | ||||
| were read in network order (left to right) and those in which those | ||||
| labels were read right to left would predict a great deal of | ||||
| confusion. | ||||
| Any version of IDNA, including the current one, must be written in | If each implementation of each application makes its own decisions on | |||
| terms of the network (transmission on the wire) order of characters | these issues, users will develop heuristics that will sometimes fail | |||
| in labels and for the labels in complete (fully-qualified) domain | when switching applications. However, while some display order | |||
| names and must be quite precise about those relationships. While | conventions, voluntarily adopted, would be desirable to reduce | |||
| some strong suggestions about display order would be desirable to | confusion, such suggestions are beyond the scope of these | |||
| reduce the chances for inconsistent transcription of domain names | ||||
| from printed form, such suggestions are beyond the scope of these | ||||
| specifications. | specifications. | |||
| 4.2. Entry and Display in Applications | 4.2. Entry and Display in Applications | |||
| Applications can accept domain names using any character set or sets | Applications can accept and display domain names using any character | |||
| desired by the application developer, specified by the operating | set or character coding system. That is, the IDNA protocol does not | |||
| system, or dictated by other constraints, and can display domain | necessarily affect the interface between users and applications. An | |||
| names in any character set or character coding system. That is, the | IDNA-aware application can accept and display internationalized | |||
| IDNA protocol does not affect the interface between users and | ||||
| applications. | ||||
| An IDNA-aware application can accept and display internationalized | ||||
| domain names in two formats: the internationalized character set(s) | domain names in two formats: the internationalized character set(s) | |||
| supported by the application (i.e., an appropriate local | supported by the application (i.e., an appropriate local | |||
| representation of a U-label), and as an A-label. Applications may | representation of a U-label), and as an A-label. Applications may | |||
| allow the display of A-labels, but are encouraged to not do so except | allow the display of A-labels, but are encouraged to not do so except | |||
| as an interface for special purposes, possibly for debugging, or to | as an interface for special purposes, possibly for debugging, or to | |||
| cope with display limitations. In general, they should allow, but | cope with display limitations. In general, they should allow, but | |||
| not encourage, user input of that label form. A-labels are opaque | not encourage, user input of A-labels. A-labels are opaque and ugly | |||
| and ugly and malicious variations on them are not easily detected by | and malicious variations on them are not easily detected by users. | |||
| users. Where possible, they should thus only be exposed to users and | Where possible, they should thus only be exposed when they are | |||
| in contexts in which they are absolutely needed. Because IDN labels | absolutely needed. Because IDN labels can be rendered either as | |||
| can be rendered either as A-labels or U-labels, the application may | A-labels or U-labels, the application may reasonably have an option | |||
| reasonably have an option for the user to select the preferred method | for the user to select the preferred method of display. Rendering | |||
| of display; if it does, rendering the U-label should normally be the | the U-label should normally be the default. | |||
| default. | ||||
| Domain names are often stored and transported in many places. For | Domain names are often stored and transported in many places. For | |||
| example, they are part of documents such as mail messages and web | example, they are part of documents such as mail messages and web | |||
| pages. They are transported in many parts of many protocols, such as | pages. They are transported in many parts of many protocols, such as | |||
| both the control commands of SMTP and the associated message body | both the control commands of SMTP and the associated message body | |||
| parts, and in the headers and the body content in HTTP. It is | parts, and in the headers and the body content in HTTP. It is | |||
| important to remember that domain names appear both in domain name | important to remember that domain names appear both in domain name | |||
| slots and in the content that is passed over protocols. | slots and in the content that is passed over protocols. | |||
| In protocols and document formats that define how to handle | In protocols and document formats that define how to handle | |||
| specification or negotiation of charsets, labels can be encoded in | specification or negotiation of charsets, labels can be encoded in | |||
| any charset allowed by the protocol or document format. If a | any charset allowed by the protocol or document format. If a | |||
| protocol or document format only allows one charset, the labels must | protocol or document format only allows one charset, the labels must | |||
| be given in that charset. Of course, not all charsets can properly | be given in that charset. Of course, not all charsets can properly | |||
| represent all labels. If a U-label cannot be displayed in its | represent all labels. If a U-label cannot be displayed in its | |||
| entirety, the only choice (without loss of information) may be to | entirety, the only choice (without loss of information) may be to | |||
| display the A-label. | display the A-label. | |||
| In any place where a protocol or document format allows transmission | Where a protocol or document format allows IDNs, labels should be in | |||
| of the characters in internationalized labels, labels should be | whatever character encoding and escape mechanism the protocol or | |||
| transmitted using whatever character encoding and escape mechanism | document format uses at that place. This provision is intended to | |||
| the protocol or document format uses at that place. This provision | prevent situations in which, e.g., UTF-8 domain names appear embedded | |||
| is intended to prevent situations in which, e.g., UTF-8 domain names | in text that is otherwise in some other character coding. | |||
| appear embedded in text that is otherwise in some other character | ||||
| coding. | ||||
| All protocols that use domain name slots (See Section 2.3.1.6 | All protocols that use domain name slots (See Section 2.3.1.6 | |||
| [[anchor14: ?? Verify this]] in [IDNA2008-Defs]) already have the | [[anchor15: ?? Verify this]] in [IDNA2008-Defs]) already have the | |||
| capacity for handling domain names in the ASCII charset. Thus, | capacity for handling domain names in the ASCII charset. Thus, | |||
| A-labels can inherently be handled by those protocols. | A-labels can inherently be handled by those protocols. | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | |||
| Character Forms | Character Forms | |||
| [[anchor15: There is some internal redundancy and repetition in the | Users have expectations about character matching or equivalence that | |||
| material in this section. Specific suggestions about to reduce or | are based on their own languages and the orthography of those | |||
| eliminate redundant text would be appreciated. If no such | languages. These expectations may not always be met in a global | |||
| suggestions are received before -07 is posted, this note will be | ||||
| removed.]] | ||||
| Users often have expectations about character matching or equivalence | ||||
| that are based on their own languages and the orthography of those | ||||
| languages. These expectations may not be consistent with forms or | ||||
| actions that can be naturally accommodated in a character coding | ||||
| system, especially if multiple languages are written using the same | system, especially if multiple languages are written using the same | |||
| script but using different conventions. A Norwegian user might | script but using different conventions. Some examples: | |||
| expect a label with the ae-ligature to be treated as the same label | ||||
| as one using the Swedish spelling with a-diaeresis even though | ||||
| applying that mapping to English would be astonishing to users. A | ||||
| user in German might expect a label with an o-umlaut and a label that | ||||
| had "oe" substituted, but was otherwise the same, treated as | ||||
| equivalent even though that substitution would be a clear error in | ||||
| Swedish. A Chinese user might expect automatic matching of | ||||
| Simplified and Traditional Chinese characters, but applying that | ||||
| matching for Korean or Japanese text would create considerable | ||||
| confusion. For that matter, an English user might expect "theater" | ||||
| and "theatre" to match. | ||||
| Related issues arise because there are a number of languages written | o A Norwegian user might expect a label with the ae-ligature to be | |||
| with alphabetic scripts in which single phonemes are written using | treated as the same label as one using the Swedish spelling with | |||
| two characters, termed a "digraph", for example, the "ph" in | a-diaeresis even though applying that mapping to English would be | |||
| "pharmacy" and "telephone". (Note that characters paired in this | astonishing to users. | |||
| manner can also appear consecutively without forming a digraph, as in | ||||
| "tophat".) Certain digraphs are normally indicated typographically | ||||
| by setting the two characters closer together than they would be if | ||||
| used consecutively to represent different phonemes. Some digraphs | ||||
| are fully joined as ligatures (strictly designating setting totally | ||||
| without intervening white space, although the term is sometimes | ||||
| applied to close set pairs). An example of this may be seen when the | ||||
| word "encyclopaedia" is set with a U+00E6 LATIN SMALL LIGATURE AE | ||||
| (and some would not consider that word correctly spelled unless the | ||||
| ligature form was used or the "a" was dropped entirely). When these | ||||
| ligature and digraph forms have the same interpretation across all | ||||
| languages that use a given script, application of Unicode | ||||
| normalization generally resolves the differences and causes them to | ||||
| match. When they have different interpretations, any requirements | ||||
| for matching must utilize other methods, presumably at the registry | ||||
| level, or users must be educated to understand that matching will not | ||||
| occur. | ||||
| Difficulties arise from the fact that a given ligature may be a | o A user in German might expect a label with an o-umlaut and a label | |||
| completely optional typographic convenience for representing a | that had "oe" substituted, but was otherwise the same, treated as | |||
| digraph in one language (as in the above example with some spelling | equivalent even though that substitution would be a clear error in | |||
| conventions), while in another language it is a single character that | Swedish. | |||
| may not always be correctly representable by a two-letter sequence | ||||
| (as in the above example with different spelling conventions). This | o A Chinese user might expect automatic matching of Simplified and | |||
| can be illustrated by many words in the Norwegian language, where the | Traditional Chinese characters, but applying that matching for | |||
| "ae" ligature is the 27th letter of a 29-letter extended Latin | Korean or Japanese text would create considerable confusion. | |||
| alphabet. It is equivalent to the 28th letter of the Swedish | ||||
| alphabet (also containing 29 letters), U+00E4 LATIN SMALL LETTER A | o An English user might expect "theater" and "theatre" to match. | |||
| WITH DIAERESIS, for which an "ae" cannot be substituted according to | ||||
| current orthographic standards. | A number of languages use alphabetic scripts in which single phonemes | |||
| are written using two characters, termed a "digraph", for example, | ||||
| the "ph" in "pharmacy" and "telephone". (Such characters can also | ||||
| appear consecutively without forming a digraph, as in "tophat".) | ||||
| Certain digraphs may be indicated typographically by setting the two | ||||
| characters closer together than they would be if used consecutively | ||||
| to represent different phonemes. Some digraphs are fully joined as | ||||
| ligatures. For example, the word "encyclopaedia" is sometimes set | ||||
| with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph | ||||
| forms have the same interpretation across all languages that use a | ||||
| given script, application of Unicode normalization generally resolves | ||||
| the differences and causes them to match. When they have different | ||||
| interpretations, matching must utilize other methods, presumably | ||||
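| chosen at the registry level, or users must be educated to understand | ||||
| that matching will not occur. Difficulties arise from the fact that a | ||||
| given ligature may be a completely optional typographic convenience | ||||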
| for representing a digraph in one language (as in the above example | ||||
| with some spelling conventions), while in another language it is a | ||||
| single character that may not always be correctly representable by a | ||||
| two-letter sequence (as in the above example with different spelling | ||||
| conventions). This can be illustrated by many words in the Norwegian | ||||
| language, where the "ae" ligature is the 27th letter of a 29-letter | ||||
| extended Latin alphabet. It is equivalent to the 28th letter of the | ||||
| Swedish alphabet (also containing 29 letters), U+00E4 LATIN SMALL | ||||
| LETTER A WITH DIAERESIS, for which an "ae" cannot be substituted | ||||
| according to current orthographic standards. | ||||
| That character (U+00E4) is also part of the German alphabet where, | That character (U+00E4) is also part of the German alphabet where, | |||
| unlike in the Nordic languages, the two-character sequence "ae" is | unlike in the Nordic languages, the two-character sequence "ae" is | |||
| usually treated as a fully acceptable alternate orthography for the | usually treated as a fully acceptable alternate orthography for the | |||
| "umlauted a" character. The inverse is however not true, and those | "umlauted a" character. The inverse is however not true, and those | |||
| two characters cannot necessarily be combined into an "umlauted a". | two characters cannot necessarily be combined into an "umlauted a". | |||
| This also applies to another German character, the "umlauted o" | This also applies to another German character, the "umlauted o" | |||
| (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example, | (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example, | |||
| cannot be used for writing the name of the author "Goethe". It is | cannot be used for writing the name of the author "Goethe". It is | |||
| also a letter in the Swedish alphabet where, like the "a with | also a letter in the Swedish alphabet where, like the "a with | |||
| skipping to change at page 19, line 28 ¶ | skipping to change at page 18, line 28 ¶ | |||
| Norwegian alphabet, where it is represented, not as "o with | Norwegian alphabet, where it is represented, not as "o with | |||
| diaeresis", but as "slashed o", U+00F8. | diaeresis", but as "slashed o", U+00F8. | |||
| Some of the ligatures that have explicit code points in Unicode were | Some of the ligatures that have explicit code points in Unicode were | |||
| given special handling in IDNA2003 and now pose additional problems | given special handling in IDNA2003 and now pose additional problems | |||
| in transition. See Section 7.2. | in transition. See Section 7.2. | |||
| Additional cases with alphabets written right to left are described | Additional cases with alphabets written right to left are described | |||
| in Section 4.5. | in Section 4.5. | |||
| Whether ligatures and digraphs are to be treated as a sequence of | Matching and comparison algorithm selection often requires | |||
| characters or as a single standalone one constitute a problem that | information about the language being used, context, or both -- | |||
| cannot be resolved solely by operating on scripts. They are, | information that is not available to IDNA or the DNS. Consequently, | |||
| however, a key concern in the IDN context. Their satisfactory | these specifications make no attempt to treat combined characters in | |||
| resolution will require support in policies set by registries, which | any special way. A registry that is aware of the language context in | |||
| therefore need to be particularly mindful not just of this specific | which labels are to be registered, and where that language sometimes | |||
| issue, but of all other related matters that cannot be dealt with on | (or always) treats the two-character sequences as equivalent to the | |||
| an exclusively algorithmic and global basis. | combined form, should give serious consideration to applying a | |||
| "variant" model [RFC3743] [RFC4290], or to prohibiting registration | ||||
| Just as with the examples of different-looking characters that may be | of one of the forms entirely, to reduce the opportunities for user | |||
| assumed to be the same, it is in general impossible to deal with | confusion and fraud that would result from the related strings being | |||
| these situations in a system such as IDNA -- or with Unicode | registered to different parties. | |||
| normalization generally -- since determining what to do requires | ||||
| information about the language being used, context, or both. | ||||
| Consequently, these specifications make no attempt to treat these | ||||
| combined characters in any special way. However, their existence | ||||
| provides a prime example of a situation in which a registry that is | ||||
| aware of the language context in which labels are to be registered, | ||||
| and where that language sometimes (or always) treats the two- | ||||
| character sequences as equivalent to the combined form, should give | ||||
| serious consideration to applying a "variant" model [RFC3743] | ||||
| [RFC4290], or to prohibiting registration of one of the forms entirely, | ||||
| to reduce the opportunities for user confusion and fraud that would | ||||
| result from the related strings being registered to different | ||||
| parties. | ||||
| [[anchor16: Placeholder: A discussion of the Arabic digit issue | [[anchor16: Placeholder: A discussion of the Arabic digit issue | |||
| should go here once it is resolved in some appropriate way.]] | should go here once it is resolved in some appropriate way.]] | |||
| 4.4. Case Mapping and Related Issues | 4.4. Case Mapping and Related Issues | |||
| In the DNS, ASCII letters are stored with their case preserved. | In the DNS, ASCII letters are stored with their case preserved. | |||
| Matching during the query process is case-independent, but none of | Matching during the query process is case-independent, but none of | |||
| the information that might be represented by choices of case has been | the information that might be represented by choices of case has been | |||
| lost. That model has been accidentally helpful because, as people | lost. That model has been accidentally helpful because, as people | |||
| have created DNS labels by catenating words (or parts of words) to | have created DNS labels by catenating words (or parts of words) to | |||
| form labels, case has often been used to distinguish among components | form labels, case has often been used to distinguish among components | |||
| and make the labels more memorable. | and make the labels more memorable. | |||
| The solution of keeping the characters separate but doing matching | Since DNS servers do not get involved in parsing IDNs, they cannot do | |||
| independent of case is not feasible with IDNA or any IDNA-like model | case-independent matching. Thus, keeping the cases separate in | |||
| because the matching would then have to be done on the server rather | lookup or registration, and doing matching at the server, is not | |||
| than have characters mapped on the client. That situation was | feasible with IDNA or any similar approach. Case-matching must be | |||
| recognized in IDNA2003 and nothing in these specifications | done, if desired, by IDN clients even though it wasn't done by ASCII- | |||
| fundamentally changes it or could do so. In IDNA2003, all characters | only DNS clients. That situation was recognized in IDNA2003 and | |||
| are case-folded and mapped. That results in upper-case characters | nothing in these specifications fundamentally changes it or could do | |||
| being mapped to lower-case ones and in some other transformations of | so. In IDNA2003, all characters are case-folded and mapped by | |||
| alternate forms of characters, especially those that do not have (or | clients in a standardized step. | |||
| did not have) upper-case forms. For example, Greek Final Form Sigma | ||||
| (U+03C2) is mapped to the medial form (U+03C3) and Eszett (German | Some characters do not have upper case forms. For example, the | |||
| Sharp S, U+00DF) is mapped to "ss". Neither of these mappings is | Unicode case folding operation maps Greek Final Form Sigma (U+03C2) | |||
| reversible because the upper case of U+03C3 is the Upper Case Sigma | to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) | |||
| (U+03A3) and "ss" is an ASCII string. IDNA2008 permits, at the risk | to "ss". Neither of these mappings is reversible because the upper | |||
| of some incompatibility, slightly more flexibility in this area by | case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII | |||
| avoiding case folding and treating these characters as themselves. | string. IDNA2008 permits, at the risk of some incompatibility, | |||
| Approaches to handling that incompatibility are discussed in | slightly more flexibility in this area by avoiding case folding and | |||
| Section 7.2. Although information is lost in IDNA2003's ToASCII | treating these characters as themselves. Approaches to handling one- | |||
| operation so that, in some sense, neither Final Sigma nor Eszett can | way mappings are discussed in Section 7.2. | |||
| be represented in an IDN at all, its guarantee of mapping when those | ||||
| characters are used as input can be interpreted as violating one of | Because IDNA2003 maps Final Sigma and Eszett to other characters, and | |||
| the conditions discussed in Section 7.4.1 and hence requiring a | the reverse mapping is never possible, that in some sense means that | |||
| prefix change. The consensus was to not make a prefix change in | neither Final Sigma nor Eszett can be represented in an IDNA2003 IDN. | |||
| spite of this issue. Of course, had a prefix change been made (at | With IDNA2008, both characters can be used in an IDN and so the | |||
| the costs discussed in Section 7.4.3) there would have been several | A-label used for lookup for any U-label containing those characters | |||
| options, including, if desired, assignment of the character to the | is now different. See Section 7.1 for a discussion of what kinds of | |||
| CONTEXTUAL RULE REQUIRED category and requiring that it only be used | changes might require the IDNA prefix to change; this case is clearly | |||
| in carefully-selected contexts. | worth discussing but the WG came to consensus not to make a prefix | |||
| change anyway. | ||||
| 4.5. Right to Left Text | 4.5. Right to Left Text | |||
| In order to be sure that the directionality of right to left text is | In order to be sure that the directionality of right to left text is | |||
| unambiguous, IDNA2003 required that any label in which right to left | unambiguous, IDNA2003 required that any label in which right to left | |||
| characters appear both starts and ends with them, not include any | characters appear both starts and ends with them and that it not | |||
| characters with strong left to right properties (which excludes other | include any characters with strong left to right properties (that | |||
| alphabetic characters but permits European digits), and rejects any | excludes other alphabetic characters but permits European digits). | |||
| other string that contains a right to left character. This is one of | Any other string that contains a right to left character and does not | |||
| the few places where the IDNA algorithms (both in IDNA2003 and in | meet those requirements is rejected. This is one of the few places | |||
| IDNA2008) are required to examine an entire label, not just | where the IDNA algorithms (both in IDNA2003 and in IDNA2008) examine | |||
| individual characters. The algorithmic model used in IDNA2003 | an entire label, not just individual characters. The algorithmic | |||
| rejects the label when the final character in a right to left string | model used in IDNA2003 rejects the label when the final character in | |||
| requires a combining mark in order to be correctly represented. | a right to left string requires a combining mark in order to be | |||
| correctly represented. | ||||
| That prohibition is not acceptable for writing systems for languages | That prohibition is not acceptable for writing systems for languages | |||
| written with consonantal alphabets to which diacritical vocalic | written with consonantal alphabets to which diacritical vocalic | |||
| systems are applied, and for languages with orthographies derived | systems are applied, and for languages with orthographies derived | |||
| from them where the combining marks may have different functionality. | from them where the combining marks may have different functionality. | |||
| In both cases the combining marks can be essential components of the | In both cases the combining marks can be essential components of the | |||
| orthography. Examples of this are Yiddish, written with an extended | orthography. Examples of this are Yiddish, written with an extended | |||
| Hebrew script, and Dhivehi (the official language of Maldives) which | Hebrew script, and Dhivehi (the official language of Maldives) which | |||
| is written in the Thaana script (which is, in turn, derived from the | is written in the Thaana script (which is, in turn, derived from the | |||
| Arabic script). IDNA2008 removes the restriction on final combining | Arabic script). IDNA2008 removes the restriction on final combining | |||
| characters with a new set of rules for right to left scripts and | characters with a new set of rules for right to left scripts and | |||
| their characters. Those new rules are specified in [IDNA2008-Bidi]. | their characters. Those new rules are specified in [IDNA2008-Bidi]. | |||
| 5. IDNs and the Robustness Principle | 5. IDNs and the Robustness Principle | |||
| The model of IDNs described in this document can be seen as a | The "Robustness Principle" is often stated as "Be conservative about | |||
| particular instance of the "Robustness Principle" that has been so | what you send and liberal in what you accept" (See, e.g., Section | |||
| important to other aspects of Internet protocol design. This | 1.2.2 of the applications-layer Host Requirements specification | |||
| principle is often stated as "Be conservative about what you send and | [RFC1123]). This principle applies to IDNA. In applying the principle | |||
| liberal in what you accept" (See, e.g., Section 1.2.2 of the | to registries as the source ("sender") of all registered and useful | |||
| applications-layer Host Requirements specification [RFC1123]). For | IDNs, registries are responsible for being conservative about what | |||
| IDNs to work well, not only must the protocol be carefully designed | they register and put out in the Internet. For IDNs to work well, | |||
| and implemented, but zone administrators (registries) must have and | zone administrators (registries) must have and require sensible | |||
| require sensible policies about what is registered -- conservative | policies about what is registered -- conservative policies -- and | |||
| policies -- and implement and enforce them. | implement and enforce them. | |||
| Conversely, lookup applications are expected to reject labels that | Conversely, lookup applications are expected to reject labels that | |||
| clearly violate global (protocol) rules (no one has ever seriously | clearly violate global (protocol) rules (no one has ever seriously | |||
| claimed that being liberal in what is accepted requires being | claimed that being liberal in what is accepted requires being | |||
| stupid). However, once one gets past such global rules and deals | stupid). However, once one gets past such global rules and deals | |||
| with anything sensitive to script or locale, it is necessary to | with anything sensitive to script or locale, it is necessary to | |||
| assume that garbage has not been placed into the DNS, i.e., one must | assume that garbage has not been placed into the DNS, i.e., one must | |||
| be liberal about what one is willing to look up in the DNS rather | be liberal about what one is willing to look up in the DNS rather | |||
| than guessing about whether it should have been permitted to be | than guessing about whether it should have been permitted to be | |||
| registered. | registered. | |||
| As mentioned elsewhere, if a string cannot be successfully found in | If a string cannot be successfully found in the DNS after the lookup | |||
| the DNS after the lookup processing described here, it makes no | processing described here, it makes no difference whether it simply | |||
| difference whether it simply wasn't registered or was prohibited by | wasn't registered or was prohibited by some rule at the registry. | |||
| some rule at the registry. Applications should, however, be | Application implementors should be aware that where DNS wildcards are | |||
| sensitive to the fact that, because of the possibility of DNS | used, the ability to successfully resolve a name does not guarantee | |||
| wildcards, the ability to successfully resolve a name does not | that it was actually registered. | |||
| guarantee that it was actually registered. | ||||
| If lookup applications, as a user interface (UI) or other local | ||||
| matter, decide to warn about some strings that are valid under the | ||||
| global rules but that they perceive as dangerous, that is their | ||||
| prerogative and we can only hope that the market (and maybe | ||||
| regulators) will reinforce the good choices and discourage the poor | ||||
| ones. In this context, a lookup application that decides a string | ||||
| that is valid under the protocol is dangerous and refuses to look it | ||||
| up is in violation of the protocols; one that is willing to look | ||||
| something up, but warns against it, is exercising a local choice. | ||||
| 6. Front-end and User Interface Processing for Lookup | 6. Front-end and User Interface Processing for Lookup | |||
| [[anchor18: Note in Draft: While this section has been revised in | ||||
| version -10 to improve clarity, a significant revision is expected | ||||
| once the discussions of mapping stabilize.]] | ||||
| Domain names may be identified and processed in many contexts. They | Domain names may be identified and processed in many contexts. They | |||
| may be typed in by users either by themselves or embedded in an | may be typed in by users either by themselves or embedded in an | |||
| identifier structured for a particular protocol or class of protocols | identifier such as email addresses, URIs, or IRIs. They may occur in | |||
| such as email addresses, URIs, or IRIs. They may occur in running | running text or be processed by one system after being provided in | |||
| text or be processed by one system after being provided in another. | another. Systems may try to normalize URLs to determine (or guess) | |||
| Systems may wish to try to normalize URLs so as to determine (or | whether a reference is valid or two references point to the same | |||
| guess) whether a reference is valid or two references point to the | object without actually looking the objects up (comparison without | |||
| same object without actually looking the objects up and comparing | lookup is necessary for URI types that are not intended to be | |||
| them (that is necessary, not just a choice, for URI types that are | resolved). Some of these goals may be more easily and reliably | |||
| not intended to be resolved). Some of these goals may be more easily | satisfied than others. While there are strong arguments for any | |||
| and reliably satisfied than others. While there are strong arguments | domain name that is placed "on the wire" -- transmitted between | |||
| for any domain name that is placed "on the wire" -- transmitted | systems -- to be in the zero-ambiguity forms of A-labels, it is | |||
| between systems -- to be in the zero-ambiguity forms of A-labels, it | inevitable that programs that process domain names will encounter | |||
| is inevitable that programs that process domain names will encounter | ||||
| U-labels or variant forms. | U-labels or variant forms. | |||
| One source of such forms will be labels created under IDNA2003 | This section discusses these mapping and transformation issues among | |||
| because that protocol allowed labels that were transformed from | names, contrasting IDNA2003 and IDNA2008 behavior. The discussion | |||
| native-character format by mapping some characters into others before | applies only in operations that look up names or interpret files. | |||
| conversion into ACE ("xn--...") format. One consequence of the | There are several reasons why registration activities should require | |||
| transformations was that, when the ToUnicode and ToASCII operations | final names and verification of those names by the would-be | |||
| of IDNA2003 were applied, ToUnicode(ToASCII(original-label)) often | registrant. | |||
| did not produce the original label. IDNA2008 explicitly defines | ||||
| A-labels and U-labels as different forms of the same abstract label, | ||||
| forms that are stable when conversions are performed between them | ||||
| (without mappings). A different way of explaining this is that there | ||||
| are, today, domain names in files on the Internet that use characters | ||||
| that cannot be represented directly in, or recovered from, (A-label) | ||||
| domain names but for which interpretations are provided by IDNA2003. | ||||
| There are two major categories of such characters, those that are | One source of label forms that are neither A-labels nor U-labels will | |||
| removed by NFKC normalization and those upper-case characters that | be labels created under IDNA2003. That protocol allowed labels that | |||
| are mapped to lower-case (there are also a few characters that are | were transformed from native-character format by mapping some | |||
| given special-case mapping treatment in Stringprep, including lower- | characters into others before conversion into A-label format. One | |||
| case characters that are case-folded into other lower-case characters | consequence of the transformations was that conversion from the | |||
| or strings). | A-label format back to native characters often did not produce the | |||
| original label. IDNA2008 explicitly defines A-labels and U-labels as | ||||
| different forms of the same abstract label, forms that are stable | ||||
| when conversions are performed between them (without mappings). | ||||
| A different way of explaining this is that there are, today, domain | ||||
| names in files on the Internet that use characters that cannot be | ||||
| represented directly in, or recovered from, (A-label) domain names | ||||
| but for which interpretations were provided by IDNA2003. There are | ||||
| two major categories of characters irreversibly remapped by | ||||
| Stringprep, those that are removed by NFKC normalization and those | ||||
| upper-case characters that are mapped to lower-case (there are also a | ||||
| few characters that are given special-case mapping treatment, | ||||
| including lower-case characters that are case-folded into other | ||||
| lower-case characters or strings and those that are simply | ||||
| eliminated). | ||||
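The round-trip instability described above can be observed with Python's standard-library "idna" codec, which implements the IDNA2003 operations (RFC 3490 with Nameprep). This is only an illustrative sketch; the helper name `idna2003_round_trip` is hypothetical and is not taken from any IDNA document.

```python
def idna2003_round_trip(label: str) -> str:
    """Apply IDNA2003 ToASCII, then ToUnicode, via the stdlib "idna" codec."""
    ace = label.encode("idna")   # Nameprep mapping + Punycode + "xn--" prefix
    return ace.decode("idna")    # conversion back to native characters


# Labels with upper-case letters or Eszett do not survive the round trip;
# a label that is already in its mapped form does.
for original in ("Bücher", "straße", "bücher"):
    restored = idna2003_round_trip(original)
    status = "stable" if restored == original else "changed"
    print(f"{original!r} -> {original.encode('idna')!r} -> {restored!r} ({status})")
```

Under IDNA2008 only the stable case is meant to exist: a U-label and its A-label convert to each other without any mapping step.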
| Other issues in domain name identification and processing arise | Other issues in domain name identification and processing arise | |||
| because IDNA2003 specified that several other characters be treated | because IDNA2003 specified that several other characters be treated | |||
| as equivalent to the ASCII period (dot, full stop) character used as | as equivalent to the ASCII period (dot, full stop) character used as | |||
| a label separator. If a string that might be a domain name appears | a label separator. If a string that might be a domain name appears | |||
| in an arbitrary context (such as running text), it is difficult, even | in an arbitrary context (such as running text), it is difficult, even | |||
| with only ASCII characters, to know whether an actual domain name (or | with only ASCII characters, to know whether an actual domain name (or | |||
| a protocol parameter like a URI) is present and where it starts and | a protocol parameter like a URI) is present and where it starts and | |||
| ends. When using Unicode, this gets even more difficult if treatment | ends. When using Unicode, this gets even more difficult if treatment | |||
| of certain special characters (like the dot that separates labels in | of certain special characters (like the dot that separates labels in | |||
| a domain name) depends on context (e.g., prior knowledge of whether | a domain name) depends on context (e.g., prior knowledge of whether | |||
| the string represents a domain name or not). That knowledge is not | the string represents a domain name or not). That knowledge is not | |||
| available if the primary heuristic for identifying the presence of | available if the primary heuristic for identifying the presence of | |||
| domain names in strings depends on the presence of dots separating | domain names in strings depends on the presence of dots separating | |||
| groups of characters with no intervening spaces. | groups of characters with no intervening spaces. | |||
| As discussed elsewhere in this document, the IDNA2008 model removes | [[anchor19: Placeholder: In serial efforts to move the mapping model | |||
| all of these mappings and interpretations, including the equivalence | out of the protocol and leave it unspecified here, this paragraph has | |||
| of different forms of dots, from the protocol, discouraging such | become a complete botch. Rewrite when the mapping plan stabilizes.]] | |||
| mappings and leaving them, when necessary, to local processing. This | The IDNA2008 model removes all of these mappings and interpretations, | |||
| should not be taken to imply that local processing is optional or can | including the equivalence of different forms of dots, from the | |||
| be avoided entirely, even if doing so might have been desirable in a | protocol, discouraging such mappings and leaving them, when | |||
| world without IDNA2003 IDNs in files and archives. Instead, unless | necessary, to local processing. This should not be taken to imply | |||
| the program context is such that it is known that any IDNs that | that local processing is optional or can be avoided entirely, even if | |||
| appear will contain either U-label or A-label forms, or that other | doing so might have been desirable in a world without IDNA2003 IDNs | |||
| forms can safely be rejected, some local processing of apparent | in files and archives. Instead, unless the program context is such | |||
| domain name strings will be required, both to maintain compatibility | that it is known that any IDNs that appear will contain either | |||
| with IDNA2003 and to prevent user astonishment. Such local | U-label or A-label forms, or that other forms can safely be rejected, | |||
| processing, while not specified in this document or the associated | some local processing of apparent domain name strings will be | |||
| ones, will generally take one of two forms: | required, both to maintain compatibility with IDNA2003 and to prevent | |||
| user astonishment. Such local processing, while not specified in | ||||
| this document or the associated ones, will generally take one of two | ||||
| forms: | ||||
| o Generic Preprocessing. | o Generic Preprocessing. | |||
| When the context in which the program or system that processes | When the context in which the program or system that processes | |||
| domain names operates is global, a reasonable balance must be | domain names operates is global, a reasonable balance must be | |||
| found that is sensitive to the broad range of local needs and | found that is sensitive to the broad range of local needs and | |||
| assumptions while, at the same time, not sacrificing the needs of | assumptions while, at the same time, not sacrificing the needs of | |||
| one language, script, or user population to those of another. | one language, script, or user population to those of another. | |||
| For this case, the best practice will usually be to apply NFKC and | For this case, the best practice will usually be to apply NFKC and | |||
| case-mapping (or, perhaps better yet, Stringprep itself), plus | case-mapping (or, perhaps better yet, Stringprep itself), plus | |||
| skipping to change at page 25, line 9 ¶ | skipping to change at page 23, line 49 ¶ | |||
| User interfaces involving Latin-based scripts should take special | User interfaces involving Latin-based scripts should take special | |||
| care when considering how to handle case mapping because small | care when considering how to handle case mapping because small | |||
| differences in label strings may cause behavior that is astonishing | differences in label strings may cause behavior that is astonishing | |||
| to users. Because case-insensitive comparison is done for ASCII | to users. Because case-insensitive comparison is done for ASCII | |||
| strings by DNS-servers, an all-ASCII label is treated as case- | strings by DNS-servers, an all-ASCII label is treated as case- | |||
| insensitive. However, if even one of the characters of that string | insensitive. However, if even one of the characters of that string | |||
| is replaced by one that requires the label to be given IDN treatment | is replaced by one that requires the label to be given IDN treatment | |||
| (e.g., by adding a diacritical mark), then the label effectively | (e.g., by adding a diacritical mark), then the label effectively | |||
| becomes case-sensitive because only lower-case characters are | becomes case-sensitive because only lower-case characters are | |||
| permitted in IDNs. This suggests that case mapping for Latin-based | permitted in IDNs. Preprocessing in applications to handle case | |||
| scripts (and possibly other scripts with case distinctions) as a | mapping for Latin-based scripts (and possibly other scripts with case | |||
| preprocessing matter in applications may be wise to prevent user | distinctions) may be wise to prevent user astonishment. However, all | |||
| astonishment, but, since all applications may not do this and | applications may not do this and ambiguity in transport is not | |||
| ambiguity in transport is not desirable, the case-dependent | desirable. Consequently the case-dependent forms should not be | |||
| forms should not be stored in files. | stored in files. | |||
| The comments above apply only in operations that look up names or | ||||
| interpret files. There are several reasons why registration | ||||
| activities should require final names and verification of those names | ||||
| by the would-be registrant. | ||||
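As a rough illustration of the generic local preprocessing discussed in this section, the sketch below maps the dot variants that IDNA2003 (RFC 3490) treated as label separators, applies NFKC, and lower-cases the result. The function name `preprocess_for_lookup` is hypothetical, and the use of simple lower-casing instead of full Stringprep case folding is an assumption made for brevity. None of this is specified by IDNA2008.

```python
import unicodedata

# Code points that IDNA2003 (RFC 3490) treated as equivalent to the ASCII dot.
IDNA2003_DOTS = "\u3002\uff0e\uff61"
_DOT_TABLE = str.maketrans({ch: "." for ch in IDNA2003_DOTS})


def preprocess_for_lookup(name: str) -> str:
    """Hypothetical local preprocessing applied before an IDNA2008 lookup."""
    name = name.translate(_DOT_TABLE)            # unify label separators
    name = unicodedata.normalize("NFKC", name)   # compatibility normalization
    return name.lower()                          # simple case mapping (assumption)


print(preprocess_for_lookup("BÜCHER\u3002EXAMPLE"))
```

Because the output of such a step depends on local choices, it is better suited to user input than to names that are stored in files or sent on the wire.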
| 7. Migration from IDNA2003 and Unicode Version Synchronization | 7. Migration from IDNA2003 and Unicode Version Synchronization | |||
| 7.1. Design Criteria | 7.1. Design Criteria | |||
| As mentioned above and in RFC 4690, two key goals of the IDNA2008 | As mentioned above and in RFC 4690, two key goals of the IDNA2008 | |||
| design are to enable applications to be agnostic about whether they | design are | |||
| are being run in environments supporting any Unicode version from 3.2 | ||||
| onward and to permit incrementally adding new characters, character | ||||
| groups, scripts, and other character collections as they are | ||||
| incorporated into Unicode, without disruption and, in the long term, | ||||
| without "heavy" processes such as those involving IETF consensus. | ||||
| (An IETF consensus process is required by the IDNA2008 specifications | ||||
| and is expected to be required and used until significant experience | ||||
| accumulates with IDNA operations and new versions of Unicode.) The | ||||
| mechanisms that support this are outlined above and elsewhere in the | ||||
| IDNA2008 document set, but this section reviews them in a context | ||||
| that may be more helpful to those who need to understand the approach | ||||
| and make plans for it. | ||||
| 7.1.1. General IDNA Validity Criteria | o to enable applications to be agnostic about whether they are being | |||
| run in environments supporting any Unicode version from 3.2 | ||||
| onward, | ||||
| The general criteria for a putative label, and the collection of | o to permit incrementally adding new characters, character groups, | |||
| characters that make it up, to be considered IDNA-valid are (the | scripts, and other character collections as they are incorporated | |||
| into Unicode, doing so without disruption and, in the long term, | ||||
| without "heavy" processes (an IETF consensus process is required | ||||
| by the IDNA2008 specifications and is expected to be required and | ||||
| used until significant experience accumulates with IDNA operations | ||||
| and new versions of Unicode). | ||||
| 7.1.1. Summary and Discussion of IDNA Validity Criteria | ||||
| The general criteria for a label to be considered IDNA-valid are (the | ||||
| actual rules are rigorously defined in the "Protocol" and "Tables" | actual rules are rigorously defined in the "Protocol" and "Tables" | |||
| documents): | documents): | |||
| o The characters are "letters", marks needed to form letters, | o The characters are "letters", marks needed to form letters, | |||
| numerals, or other code points used to write words in some | numerals, or other code points used to write words in some | |||
| language. Symbols, drawing characters, and various notational | language. Symbols, drawing characters, and various notational | |||
| characters are intended to be permanently excluded -- some because | characters are intended to be permanently excluded. There is no | |||
| they are harmful in URI, IRI, or similar contexts (e.g., | evidence that they are important enough to Internet operations or | |||
| characters that appear to be slashes or other reserved URI | internationalization to justify expansion of domain names beyond | |||
| punctuation) and others because there is no evidence that they are | the general principle of "letters, digits, and hyphen". | |||
| important enough to Internet operations or internationalization to | (Additional discussion and rationale for the symbol decision | |||
| justify expansion of domain names beyond the general principle of | appears in Section 7.6). | |||
| "letters, digits, and hyphen" and the complexities that would come | ||||
| with it (additional discussion and rationale for the symbol | ||||
| decision appears in Section 7.6). | ||||
| o Other than in very exceptional cases, e.g., where they are needed | o Other than in very exceptional cases, e.g., where they are needed | |||
| to write substantially any word of a given language, punctuation | to write substantially any word of a given language, punctuation | |||
| characters are excluded as well. The fact that a word exists is | characters are excluded. The fact that a word exists is not proof | |||
| not proof that it should be usable in a DNS label and DNS labels | that it should be usable in a DNS label and DNS labels are not | |||
| are not expected to be usable for multiple-word phrases (although | expected to be usable for multiple-word phrases (although they are | |||
| they are certainly not prohibited if the conventions and | certainly not prohibited if the conventions and orthography of a | |||
| orthography of a particular language cause that to be possible). | particular language cause that to be possible). | |||
| Even for English, very common constructions -- contractions like | ||||
| "don't" or "it's", names that are written with apostrophes such as | ||||
| "O'Reilly", or characters for which apostrophes are common | ||||
| substitutes cannot be represented in DNS labels. Words in English | ||||
| whose usually-preferred spellings include diacritical marks cannot | ||||
| be represented under the original hostname rules, but most can be | ||||
| represented if treated as IDNs. | ||||
| o Characters that are unassigned (have no character assignment at | o Characters that are unassigned (have no character assignment at | |||
| all) in the version of Unicode being used by the registry or | all) in the version of Unicode being used by the registry or | |||
| application are not permitted, even on lookup. The issues | application are not permitted, even on lookup. The issues | |||
| involved in this decision are discussed in Section 7.7. | involved in this decision are discussed in Section 7.7. | |||
| o Any character that is mapped to another character by a current | o Any character that is mapped to another character by a current | |||
| version of NFKC is prohibited as input to IDNA (for either | version of NFKC is prohibited as input to IDNA (for either | |||
| registration or lookup). With a few exceptions, this principle | registration or lookup). With a few exceptions, this principle | |||
| excludes any character mapped to another by Nameprep [RFC3491]. | excludes any character mapped to another by Nameprep [RFC3491]. | |||
| Tables used to identify the characters that are IDNA-valid are | The principles above drive the design of rules that are specified | |||
| expected to be driven by the principles above, principles that are | exactly in [IDNA2008-Tables]. Those rules identify the characters | |||
| specified exactly in [IDNA2008-Tables]. The rules given there are | that are IDNA-valid. The rules themselves are normative, and the | |||
| normative, rather than being just an interpretation of the tables. | tables are derived from them, rather than vice versa. | |||
| 7.1.2. Labels in Registration | 7.1.2. Labels in Registration | |||
| Anyone entering a label into a DNS zone must properly validate that | Any label registered in a DNS zone must be validated -- i.e., the | |||
| label -- i.e., be sure that the criteria for that label are met -- in | criteria for that label must be met -- in order for applications to | |||
| order for applications to work as intended. This principle is not | work as intended. This principle is not new. For example, since the | |||
| new. For example, since the DNS was first deployed, zone | DNS was first deployed, zone administrators have been expected to | |||
| administrators have been expected to verify that names meet | verify that names meet "hostname" requirements [RFC0952] where those | |||
| "hostname" [RFC0952] where necessary for the expected applications. | requirements are imposed by the expected applications. Other | |||
| Later addition of special service location formats [RFC2782] imposed | application contexts, such as the later addition of special service | |||
| new requirements on zone administrators for the use of labels that | location formats [RFC2782], imposed new requirements on zone | |||
| conform to the requirements of those formats. For zones that will | administrators. For zones that will contain IDNs, support for | |||
| contain IDNs, support for Unicode version-independence requires | Unicode version-independence requires restrictions on all strings | |||
| restrictions on all strings placed in the zone. In particular, for | placed in the zone. In particular, for such zones: | |||
| such zones: | ||||
| o Any label that appears to be an A-label, i.e., any label that | o Any label that appears to be an A-label, i.e., any label that | |||
| starts in "xn--", must be IDNA-valid, i.e., they must be valid | starts in "xn--", must be IDNA-valid, i.e., they must be valid | |||
| A-labels, as discussed in Section 2 above. | A-labels, as discussed in Section 2 above. | |||
| o The Unicode tables (i.e., tables of code points, character | o The Unicode tables (i.e., tables of code points, character | |||
| classes, and properties) and IDNA tables (i.e., tables of | classes, and properties) and IDNA tables (i.e., tables of | |||
| contextual rules such as those that appear in the Tables | contextual rules such as those that appear in the Tables | |||
| document), must be consistent on the systems performing or | document), must be consistent on the systems performing or | |||
| validating labels to be registered. Note that this does not | validating labels to be registered. Note that this does not | |||
| require that tables reflect the latest version of Unicode, only | require that tables reflect the latest version of Unicode, only | |||
| that all tables used on a given system are consistent with each | that all tables used on a given system are consistent with each | |||
| other. | other. | |||
| Under this model, a registry (or entity communicating with a registry | Under this model, registry tables will need to be updated (both the | |||
| to accomplish name registrations) will need to update its tables -- | Unicode-associated tables and the tables of permitted IDN characters) | |||
| both the Unicode-associated tables and the tables of permitted IDN | to enable a new script or other set of new characters. The registry | |||
| characters -- to enable a new script or other set of new characters. | will not be affected by newer versions of Unicode, or newly- | |||
| It will not be affected by newer versions of Unicode, or newly- | authorized characters, until and unless it wishes to support them. | |||
| authorized characters, until and unless it wishes to make those | The zone administrator is responsible for verifying IDNA-validity as | |||
| registrations. The zone administrator is also responsible -- under | well as its local policies -- a more extensive set of checks than are | |||
| the protocol and to registrants and users -- for both checking as | required for looking up the labels. Systems looking up or resolving | |||
| required by the protocol and verification that whatever policies it | DNS labels, especially IDN DNS labels, must be able to assume that | |||
| develops are complied with, whether those policies are for minimizing | applicable registration rules were followed for names entered into | |||
| risks due to confusable characters and sequences, for preserving | the DNS. | |||
| language or script integrity, or for other purposes. Those checking | ||||
| and verification procedures are more extensive than those that are | ||||
| expected of application systems that look names up. | ||||
| Systems looking up or resolving DNS labels, especially IDN DNS | ||||
| labels, must be able to assume that applicable registration rules | ||||
| were followed for names entered into the DNS. | ||||
| 7.1.3. Labels in Lookup | 7.1.3. Labels in Lookup | |||
| Anyone looking up a label in a DNS zone is required to | Anyone looking up a label in a DNS zone is required to | |||
| o Maintain a consistent set of tables, as discussed above. As with | o Maintain IDNA and Unicode tables that are consistent with regard | |||
| registration, the tables need not reflect the latest version of | to versions, i.e., unless the application actually executes the | |||
| Unicode but they must be consistent. | classification rules in [IDNA2008-Tables], its IDNA tables must be | |||
| derived from the version of Unicode that is supported more | ||||
| generally on the system. As with registration, the tables need | ||||
| not reflect the latest version of Unicode but they must be | ||||
| consistent. | ||||
| o Validate the characters in labels to be looked up only to the | o Validate the characters in labels to be looked up only to the | |||
| extent of determining that the U-label does not contain either | extent of determining that the U-label does not contain | |||
| code points prohibited by IDNA (categorized as "DISALLOWED") or | "DISALLOWED" code points or code points that are unassigned in its | |||
| code points that are unassigned in its version of Unicode. | version of Unicode. | |||
| o Validate the label itself for conformance with a small number of | o Validate the label itself for conformance with a small number of | |||
| whole-label rules, notably verifying that there are no leading | whole-label rules. In particular, it must verify that | |||
| combining marks, that the "bidi" conditions are met if right to | ||||
| left characters appear, that any required contextual rules are | ||||
| available and that, if such rules are associated with Joiner | ||||
| Controls, they are tested. | ||||
| o Avoid validating other contextual rules about characters, | * there are no leading combining marks, | |||
| including mixed-script label prohibitions, although such rules may | ||||
| be used to influence presentation decisions in the user interface. | ||||
| [[anchor20: Check this, and all similar statements, against | ||||
| Protocol when that is finished.]] | ||||
| By avoiding applying its own interpretation of which labels are valid | * the "bidi" conditions are met if right to left characters | |||
| as a means of rejecting lookup attempts, the lookup application | appear, | |||
| becomes less sensitive to version incompatibilities with the | ||||
| particular zone registry associated with the domain name. | * any required contextual rules are available, and | |||
| * any contextual rules that are associated with Joiner Controls | ||||
| are tested. | ||||
| o Do not reject labels based on other contextual rules about | ||||
| characters, including mixed-script label prohibitions. Such rules | ||||
| may be used to influence presentation decisions in the user | ||||
| interface, but not to avoid looking up domain names. | ||||
| Lookup applications that follow these rules, rather than having | ||||
| their own criteria for rejecting lookup attempts, are not sensitive | ||||
| to version incompatibilities with the particular zone registry | ||||
| associated with the domain name except for labels containing | ||||
| characters recently added to Unicode. | ||||
| An application or client that processes names according to this | An application or client that processes names according to this | |||
| protocol and then resolves them in the DNS will be able to locate any | protocol and then resolves them in the DNS will be able to locate any | |||
| name that is validly registered, as long as its version of the | name that is registered, as long as those registrations are IDNA- | |||
| Unicode-associated tables is sufficiently up-to-date to interpret all | valid and its version of the IDNA tables is sufficiently up-to-date | |||
| of the characters in the label. Messages to users should distinguish | to interpret all of the characters in the label. Messages to users | |||
| between "label contains an unallocated code point" and other types of | should distinguish between "label contains an unallocated code point" | |||
| lookup failures. A failure on the basis of an old version of Unicode | and other types of lookup failures. A failure on the basis of an old | |||
| may lead the user to a desire to upgrade to a newer version, but will | version of Unicode may lead the user to a desire to upgrade to a | |||
| have no other ill effects (this is consistent with behavior in the | newer version, but will have no other ill effects (this is consistent | |||
| transition to the DNS when some hosts could not yet handle some forms | with behavior in the transition to the DNS when some hosts could not | |||
| of names or record types). | yet handle some forms of names or record types). | |||
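A minimal sketch of part of the lookup-side checking described in this subsection might look like the following. It covers only unassigned code points, DISALLOWED code points, and leading combining marks. The bidi and contextual-rule checks are omitted, and the empty-label guard is an addition of this sketch. The function name `lookup_precheck` and the `is_disallowed` callback are hypothetical; a real implementation would derive the DISALLOWED classification from the rules in [IDNA2008-Tables], and the result of the unassigned test depends on the Unicode version built into the Python runtime.

```python
import unicodedata
from typing import Callable, Optional


def lookup_precheck(u_label: str,
                    is_disallowed: Callable[[str], bool]) -> Optional[str]:
    """Return None if the label may be looked up, otherwise a reason string."""
    if not u_label:
        return "empty label"
    for ch in u_label:
        # General category "Cn" means unassigned in this runtime's Unicode version.
        if unicodedata.category(ch) == "Cn":
            return "label contains an unallocated code point"
        if is_disallowed(ch):
            return "label contains a DISALLOWED code point"
    # Whole-label rule: no leading combining mark (categories Mn, Mc, Me).
    if unicodedata.category(u_label[0]).startswith("M"):
        return "label begins with a combining mark"
    return None


# U+0378 is used here as an example of an unassigned code point.
print(lookup_precheck("\u0378abc", lambda ch: False))
```

Returning a separate reason for the unallocated case lets the user interface distinguish that failure from other lookup failures, as recommended above.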
| 7.2. Changes in Character Interpretations | 7.2. Changes in Character Interpretations | |||
| [[anchor21: Note in Draft: This subsection is completely new in | [[anchor22: This subsection will need to be rewritten when the | |||
| version -04 and has been further tuned in -05 and -06 of this | mapping decisions stabilize.]] | |||
| document. It could almost certainly use improvement, although this | ||||
| note will be removed if there are not significant suggestions about | ||||
| the -06 version. It also contains some material that is redundant | ||||
| with material in other sections. I have not tried to remove that | ||||
| material and will not do so until the WG concludes that this section | ||||
| is relatively stable, but would appreciate help in identifying what | ||||
| should be removed or how this might be enhanced to contain more of | ||||
| that other material. --JcK]] | ||||
| In those scripts that make case distinctions, there are a few | In those scripts that make case distinctions, there are a few | |||
| characters for which an obvious and unique upper case character has | characters for which an obvious and unique upper case character has | |||
| not historically been available to match a lower case one or vice | not historically been available to match a lower case one or vice | |||
| versa. For those characters, the mappings used in constructing the | versa. For those characters, the mappings used in constructing the | |||
| Stringprep tables for IDNA2003, performed using the Unicode CaseFold | Stringprep tables for IDNA2003, performed using the Unicode CaseFold | |||
| operation (See Section 5.8 of the Unicode Standard [Unicode51]), | operation (See Section 5.8 of the Unicode Standard [Unicode51]), | |||
| generate different characters or sets of characters. Those | generate different characters or sets of characters. Those | |||
| operations are not reversible and lose even more information than | operations are not reversible and lose even more information than | |||
| traditional upper case or lower case transformations, but are more | traditional upper case or lower case transformations, but are more | |||
| skipping to change at page 29, line 22 ¶ | skipping to change at page 27, line 40 ¶ | |||
| notable characters of this type are the German character Eszett | notable characters of this type are the German character Eszett | |||
| (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The | (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The | |||
| former is case-folded to the ASCII string "ss", the latter to a | former is case-folded to the ASCII string "ss", the latter to a | |||
| medial (Lower Case) Sigma (U+03C3). | medial (Lower Case) Sigma (U+03C3). | |||
| The decision to eliminate mappings, including case folding, from the | The decision to eliminate mappings, including case folding, from the | |||
| IDNA2008 protocol in order to make A-labels and U-labels idempotent | IDNA2008 protocol in order to make A-labels and U-labels idempotent | |||
| made these characters problematic. If they were to be disallowed, | made these characters problematic. If they were to be disallowed, | |||
| important words and mnemonics could not be written in | important words and mnemonics could not be written in | |||
| orthographically reasonable ways. If they were to be permitted as | orthographically reasonable ways. If they were to be permitted as | |||
| characters distinct from the forms produced by case folding, there | distinct characters, there would be no information loss and | |||
| would be no information loss and registries would have maximum | registries would have more flexibility, but IDNA2003 and IDNA2008 | |||
| flexibility, but labels using those characters that were looked up | lookups might result in different A-labels. | |||
| according to IDNA2003 rules would be transformed into A-labels using | ||||
| their case-mapped variations while lookup according to IDNA2008 rules | ||||
| would be based on different A-labels that represented the actual | ||||
| characters. | ||||
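Illustration (not part of either draft): the following minimal Python sketch shows the case folding described above and why the two protocol versions can produce different A-labels for the same input. It relies on Python's built-in "idna" codec, which is documented as implementing RFC 3490 (IDNA2003). The helper names are hypothetical, and a_label_unmapped() is only an approximation of IDNA2008 behavior under the assumption that the label is encoded exactly as written, with no validity checks.

```python
# Sketch only; helper names are hypothetical and do not come from the drafts.
ACE_PREFIX = b"xn--"


def a_label_idna2003(label: str) -> bytes:
    """Encode one label with Python's built-in codec (RFC 3490, with mapping)."""
    return label.encode("idna")


def a_label_unmapped(label: str) -> bytes:
    """Assumption: stand-in for IDNA2008, which performs no mapping.

    The label is Punycode-encoded as written; no IDNA2008 validity
    rules are applied here.
    """
    if label.isascii():
        return label.encode("ascii")
    return ACE_PREFIX + label.encode("punycode")


# Unicode case folding of the two characters singled out in the text.
folded_eszett = "\u00df".casefold()        # Eszett (Sharp S)
folded_final_sigma = "\u03c2".casefold()   # Greek Final Form Sigma

# The same input string yields different A-labels under the two treatments.
old = a_label_idna2003("stra\u00dfe")
new = a_label_unmapped("stra\u00dfe")
print(folded_eszett, folded_final_sigma, old, new)
```

Because folding is not reversible, a label containing Eszett and the same label written with "ss" are indistinguishable after the IDNA2003 mapping, while the unmapped treatment keeps them as separate names.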
| With the understanding that there would be incompatibility either way | With the understanding that there would be incompatibility either way | |||
| but a judgment that the incompatibility was not significant enough to | but a judgment that the incompatibility was not significant enough to | |||
| justify a prefix change, the WG concluded that Eszett and Final Form | justify a prefix change, the WG concluded that Eszett and Final Form | |||
| Sigma should be treated as distinct and Protocol-Valid characters. | Sigma should be treated as distinct and Protocol-Valid characters. | |||
| The decision faces registries, especially registries maintaining | Registries, especially those maintaining zones for third parties, | |||
| zones for third parties, with a variation on what has become a | must decide how to introduce a new service in a way that does not | |||
| familiar problem: how to introduce a new service in a way that does | create confusion or significantly weaken or invalidate existing | |||
| not create confusion or significantly weaken or invalidate existing | identifiers. This is not a new problem; registries were faced with | |||
| identifiers. | similar issues when IDNs were introduced and when other new forms of | |||
| strings have been permitted as labels. | ||||
| There have traditionally been several approaches to problems of this | There are several approaches to problems of this type. Without any | |||
| type. Without any preference or claim to completeness, these are: | preference or claim to completeness, some of these, all of which have | |||
| been used by registries in the past for similar transitions, are: | ||||
| o Do not permit use of the newly-available character at the registry | o Do not permit use of the newly-available character at the registry | |||
| level. This might cause lookup failures if a domain name were to | level. This might cause lookup failures if a domain name were to | |||
| be written with the expectation of the IDNA2003 mapping behavior, | be written with the expectation of the IDNA2003 mapping behavior, | |||
| but would eliminate any possibility of false matches. | but would eliminate any possibility of false matches. | |||
| o Hold a "sunrise"-like arrangement in which holders of labels that | o Hold a "sunrise"-like arrangement in which holders of labels | |||
| might have resulted from previous mapping (labels containing "ss" | containing "ss" in the Eszett case or Lower Case Sigma are given | |||
| in the Eszett case or ones containing Lower Case Sigma in the | priority (and perhaps other benefits) for registering the | |||
| Final Sigma case) are given priority (and perhaps other benefits) | corresponding string containing Eszett or Final Sigma | |||
| for registering the corresponding string containing the newly- | respectively. | |||
| available characters. | ||||
| o Adopt some sort of "variant" approach in which registrants either | o Adopt some sort of "variant" approach in which registrants obtain | |||
| obtained labels with both character forms or one of them was | labels with both character forms. | |||
| blocked from registration by anyone but the registrant of the | ||||
| other form. | ||||
| In principle, lookup applications could also compensate for the | o Adopt a different form of "variant" approach in which registration | |||
| difference in interpretation by looking up the string according to | of additional names is either not permitted at all or permitted | |||
| the interpretation specified in these documents and then, if that | only by the registrant who already has one of the names. | |||
| failed, doing the lookup with the mapping, simulating the IDNA2003 | ||||
| interpretation. The risk of false positives is such that this is | ||||
| generally to be discouraged unless the application is able to engage | ||||
| in a "is this what you meant" dialogue with the end user. | ||||
| 7.3. More Flexibility in User Agents | 7.3. More Flexibility in User Agents | |||
| [[anchor23: Note in Draft: This section is mapping-related and may | ||||
| need to be revised after that issue settles down.]] Also, it is | ||||
| closely related to Section 4.2 and may need to be cross-referenced | ||||
| from it or consolidated into it. | ||||
| These documents do not specify mappings between one character or code | These documents do not specify mappings between one character or code | |||
| point and others for any reason. Instead, they prohibit the | point and others. Instead, IDNA2008 prohibits characters that would | |||
| characters that would be mapped to others by normalization, upper | be mapped to others by normalization, upper case to lower case | |||
| case to lower case changes, or other rules. As examples, while | changes, or other rules. As examples, while mathematical characters | |||
| mathematical characters based on Latin ones are accepted as input to | based on Latin ones are accepted as input to IDNA2003, they are | |||
| IDNA2003, they are prohibited in IDNA2008. Similarly, double-width | prohibited in IDNA2008. Similarly, double-width characters and other | |||
| characters and other variations are prohibited as IDNA input. | variations are prohibited as IDNA input. | |||
| Since the rules in [IDNA2008-Tables] have the effect that only | Since the rules in [IDNA2008-Tables] have the effect that only | |||
| strings that are not transformed by NFKC are valid, if an application | strings that are not transformed by NFKC are valid, if an application | |||
| chooses to perform NFKC normalization before lookup, that operation | chooses to perform NFKC normalization before lookup, that operation | |||
| is safe since this will never make the application unable to look up | is safe since this will never make the application unable to look up | |||
| any valid string. However, as discussed above, the application | any valid string. However, as discussed above, the application | |||
| cannot guarantee that any other application will perform that | cannot guarantee that any other application will perform that | |||
| mapping, so it should be used only with caution and for informed | mapping, so it should be used only with caution and for informed | |||
| users. | users. | |||
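Illustration (not part of either draft): a minimal Python sketch of the NFKC property discussed above, using the standard unicodedata module. The function name survives_nfkc() is hypothetical. The check is necessary but not sufficient for IDNA2008 validity; it only shows that compatibility characters are changed by NFKC while ordinary lower-case letters are not.

```python
import unicodedata


def survives_nfkc(label: str) -> bool:
    """Return True if NFKC normalization leaves the label unchanged."""
    return unicodedata.normalize("NFKC", label) == label


math_bold_a = "\U0001d41a"  # MATHEMATICAL BOLD SMALL A
fullwidth_a = "\uff41"      # FULLWIDTH LATIN SMALL LETTER A

for ch in (math_bold_a, fullwidth_a, "a"):
    mapped = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} -> U+{ord(mapped):04X}, unchanged={survives_nfkc(ch)}")
```

An application that normalizes before lookup would turn both compatibility forms into plain "a"; a string that is already unchanged by NFKC is unaffected by the extra step.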
| skipping to change at page 31, line 10 ¶ | skipping to change at page 29, line 24 ¶ | |||
| clearly understand that the character forms are equivalent. For use | clearly understand that the character forms are equivalent. For use | |||
| in interchange among systems, it appears to be much more important | in interchange among systems, it appears to be much more important | |||
| that U-labels and A-labels can be mapped back and forth without loss | that U-labels and A-labels can be mapped back and forth without loss | |||
| of information. | of information. | |||
| One specific, and very important, instance of this strategy arises | One specific, and very important, instance of this strategy arises | |||
| with case-folding. In the ASCII-only DNS, names are looked up and | with case-folding. In the ASCII-only DNS, names are looked up and | |||
| matched in a case-independent way, but no actual case-folding occurs. | matched in a case-independent way, but no actual case-folding occurs. | |||
| Names can be placed in the DNS in either upper or lower case form (or | Names can be placed in the DNS in either upper or lower case form (or | |||
| any mixture of them) and that form is preserved, returned in queries, | any mixture of them) and that form is preserved, returned in queries, | |||
| and so on. IDNA2003 simulated that behavior for non-ASCII strings by | and so on. IDNA2003 approximated that behavior for non-ASCII strings | |||
| performing case-folding at registration time (resulting in only | by performing case-folding at registration time (resulting in only | |||
| lower-case IDNs in the DNS) and when names were looked up. | lower-case IDNs in the DNS) and when names were looked up. | |||
| As suggested earlier in this section, it appears to be desirable to | As suggested earlier in this section, it appears to be desirable to | |||
| do as little character mapping as possible consistent with having | do as little character mapping as possible as long as Unicode works | |||
| Unicode work correctly (e.g., NFC mapping to resolve different | correctly (e.g., NFC mapping to resolve different codings for the | |||
| codings for the same character is still necessary although the | same character is still necessary although the specifications require | |||
| specifications require that it be performed prior to invoking the | that it be performed prior to invoking the protocol) in order to make | |||
| protocol) and to make the mapping between A-labels and U-labels | the mapping between A-labels and U-labels idempotent. Case-mapping | |||
| idempotent. Case-mapping is not an exception to this principle. If | is not an exception to this principle. If only lower case characters | |||
| only lower case characters can be registered in the DNS (i.e., be | can be registered in the DNS (i.e., be present in a U-label), then | |||
| present in a U-label), then IDNA2008 should prohibit upper-case | IDNA2008 should prohibit upper-case characters as input. Some other | |||
| characters as input (and therefore does so). Some other | considerations reinforce this conclusion. For example, in ASCII | |||
| considerations reinforce this conclusion. For example, an essential | case-mapping for individual characters, uppercase(character) must be | |||
| element of the ASCII case-mapping functions is that, for individual | equal to uppercase(lowercase(character)). That may not be true with | |||
| characters, uppercase(character) must be equal to | IDNs. In some scripts that use case distinctions, there are a few | |||
| uppercase(lowercase(character)). That requirement may not be | characters that do not have counterparts in one case or the other. | |||
| satisfied with IDNs. For example, there are some characters in | The relationship between upper case and lower case may even be | |||
| scripts that use case distinction that do not have counterparts in | language-dependent, with different languages (or even the same | |||
| one case or the other. The relationship between upper case and lower | language in different areas) expecting different mappings. User | |||
| case may even be language-dependent, with different languages (or | agents can meet the expectations of users who are accustomed to the | |||
| even the same language in different areas) expecting different | case-insensitive DNS environment by performing case folding prior to | |||
| mappings. Of course, the expectations of users who are accustomed to | IDNA processing, but the IDNA procedures themselves should neither | |||
| a case-insensitive DNS environment will probably be well-served if | require such mappings nor expect them when they are not natural to the | |||
| user agents perform case folding prior to IDNA processing, but the | localized environment. | |||
| IDNA procedures themselves should neither require such mappings nor | ||||
| expect them when they are not natural to the localized environment. | ||||
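Illustration (not part of either draft): a minimal Python sketch of the property uppercase(character) = uppercase(lowercase(character)) mentioned above. The function name round_trips() is hypothetical. Python's str.upper() and str.lower() apply the default, locale-independent Unicode case mappings, so language-specific expectations (for example, Turkish dotted and dotless i) are not modeled here.

```python
def round_trips(ch: str) -> bool:
    """Check whether upper(ch) equals upper(lower(ch)) for one character."""
    return ch.upper() == ch.lower().upper()


KELVIN_SIGN = "\u212a"   # lower-cases to ASCII "k", which upper-cases to ASCII "K"
DOTTED_CAP_I = "\u0130"  # lower-cases to "i" followed by COMBINING DOT ABOVE

for ch in ("a", "Z", KELVIN_SIGN, DOTTED_CAP_I):
    print(f"U+{ord(ch):04X} round-trips: {round_trips(ch)}")
```

The ASCII letters satisfy the property; the two non-ASCII examples do not, because lowering and then raising them produces a different code point sequence than raising them directly.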
| 7.4. The Question of Prefix Changes | 7.4. The Question of Prefix Changes | |||
| The conditions that would require a change in the IDNA ACE prefix | The conditions that would require a change in the IDNA ACE prefix | |||
| ("xn--" for the version of IDNA specified in [RFC3490]) have been a | ("xn--" for the version of IDNA specified in [RFC3490]) have been a | |||
| great concern to the community. A prefix change would clearly be | great concern to the community. A prefix change would clearly be | |||
| necessary if the algorithms were modified in a manner that would | necessary if the algorithms were modified in a manner that would | |||
| create serious ambiguities during subsequent transition in | create serious ambiguities during subsequent transition in | |||
| registrations. This section summarizes our conclusions about the | registrations. This section summarizes our conclusions about the | |||
| conditions under which changes in prefix would be necessary and the | conditions under which changes in prefix would be necessary and the | |||
| implications of such a change. | implications of such a change. | |||
| 7.4.1. Conditions Requiring a Prefix Change | 7.4.1. Conditions Requiring a Prefix Change | |||
| An IDN prefix change is needed if a given string would be looked up | An IDN prefix change is needed if a given string would be looked up | |||
| or otherwise interpreted differently depending on the version of the | or otherwise interpreted differently depending on the version of the | |||
| protocol or tables being used. Consequently, work to update IDNs | protocol or tables being used. An IDNA upgrade would require a | |||
| would require a prefix change if, and only if, one of the following | prefix change if, and only if, one of the following four conditions | |||
| four conditions were met: | were met: | |||
| 1. The conversion of an A-label to Unicode (i.e., a U-label) yields | 1. The conversion of an A-label to Unicode (i.e., a U-label) yields | |||
| one string under IDNA2003 (RFC3490) and a different string under | one string under IDNA2003 (RFC3490) and a different string under | |||
| IDNA2008. | IDNA2008. | |||
| 2. An input string that is valid under IDNA2003 and also valid under | 2. In a significant number of cases, an input string that is valid | |||
| IDNA2008 yields two different A-labels with the different | under IDNA2003 and also valid under IDNA2008 yields two different | |||
| versions of IDNA. This condition is believed to be essentially | A-labels with the different versions. This condition is believed | |||
| equivalent to the one above except for a very small number of | to be essentially equivalent to the one above except for a very | |||
| edge cases which may not, pragmatically, justify a prefix change | small number of edge cases which may not justify a prefix change | |||
| (See Section 7.2). | (See Section 7.2). | |||
| Note, however, that if the input string is valid under one | Note that if the input string is valid under one version and not | |||
| version and not valid under the other, this condition does not | valid under the other, this condition does not apply. See the | |||
| apply. See the first item in Section 7.4.2, below. | first item in Section 7.4.2, below. | |||
| 3. A fundamental change is made to the semantics of the string that | 3. A fundamental change is made to the semantics of the string that | |||
| is inserted in the DNS, e.g., if a decision were made to try to | is inserted in the DNS, e.g., if a decision were made to try to | |||
| include language or specific script information in that string, | include language or script information in the encoding in | |||
| rather than having it be just a string of characters. | addition to the string itself. | |||
| 4. A sufficiently large number of characters is added to Unicode so | 4. A sufficiently large number of characters is added to Unicode so | |||
| that the Punycode mechanism for block offsets no longer has | that the Punycode mechanism for block offsets can no longer | |||
| enough capacity to reference the higher-numbered planes and | reference the higher-numbered planes and blocks. This condition | |||
| blocks. This condition is unlikely even in the long term and | is unlikely even in the long term and certain not to arise in the | |||
| certain not to arise in the next few years. | next several years. | |||
| 7.4.2. Conditions Not Requiring a Prefix Change | 7.4.2. Conditions Not Requiring a Prefix Change | |||
| In particular, as a result of the principles described above, none of | As a result of the principles described above, none of the following | |||
| the following changes require a new prefix: | changes require a new prefix: | |||
| 1. Prohibition of some characters as input to IDNA. This may make | 1. Prohibition of some characters as input to IDNA. This may make | |||
| names that are now registered inaccessible, but does not require | names that are now registered inaccessible, but does not change | |||
| a prefix change. | those names. | |||
| 2. Adjustments in IDNA tables or actions, including normalization | 2. Adjustments in IDNA tables or actions, including normalization | |||
| definitions, that affect characters that were already invalid | definitions, that affect characters that were already invalid | |||
| under IDNA2003. | under IDNA2003. | |||
| 3. Changes in the style of the IDNA definition that do not alter | 3. Changes in the style of the IDNA definition that do not alter | |||
| the actions performed by IDNA. | the actions performed by IDNA. | |||
| 7.4.3. Implications of Prefix Changes | 7.4.3. Implications of Prefix Changes | |||
| While it might be possible to make a prefix change, the costs of such | While it might be possible to make a prefix change, the costs of such | |||
| a change are considerable. Even if they wanted to do so, registries | a change are considerable. Registries could not convert all IDNA2003 | |||
| could not convert all IDNA2003 ("xn--") registrations to a new form | ("xn--") registrations to a new form at the same time and synchronize | |||
| at the same time and synchronize that change with applications | that change with applications supporting lookup. Unless all existing | |||
| supporting lookup. Unless all existing registrations were simply to | registrations were simply to be declared invalid (and perhaps even | |||
| be declared invalid (and perhaps even then) systems that needed to | then) systems that needed to support both labels with old prefixes | |||
| support both labels with old prefixes and labels with new ones would | and labels with new ones would first process a putative label under | |||
| first process a putative label under the IDNA2008 rules and try to | the IDNA2008 rules and try to look it up and then, if it were not | |||
| look it up and then, if it were not found, would process the label | found, would process the label under IDNA2003 rules and look it up | |||
| under IDNA2003 rules and look it up again. That process could | again. That process could significantly slow down all processing | |||
| significantly slow down all processing that involved IDNs in the DNS | that involved IDNs in the DNS especially since a fully-qualified name | |||
| especially since, in principle, a fully-qualified name could contain | might contain a mixture of labels that were registered with the old | |||
| a mixture of labels that were registered with the old and new | and new prefixes. That would make DNS caching very difficult. In | |||
| prefixes, a situation that would make the use of DNS caching very | addition, looking up the same input string as two separate A-labels | |||
| difficult. In addition, looking up the same input string as two | creates some potential for confusion and attacks, since the labels | |||
| separate A-labels would create some potential for confusion and | could map to different targets and then resolve to different entries | |||
| attacks, since they could, in principle, map to different targets and | in the DNS. | |||
| then resolve to different entries in the DNS. | ||||
| Consequently, a prefix change is to be avoided if at all possible, | Consequently, a prefix change is to be avoided if at all possible, | |||
| even if it means accepting some IDNA2003 decisions about character | even if it means accepting some IDNA2003 decisions about character | |||
| distinctions as irreversible and/or giving special treatment to edge | distinctions as irreversible and/or giving special treatment to edge | |||
| cases. | cases. | |||
| 7.5. Stringprep Changes and Compatibility | 7.5. Stringprep Changes and Compatibility | |||
| The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | |||
| profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | |||
| profile specific to IDNA, Stringprep is used by a number of other | profile specific to IDNA, Stringprep is used by a number of other | |||
| protocols. Concerns have been expressed about problems for non-DNS | protocols. Were Stringprep to be modified by IDNA2008, those changes | |||
| uses of Stringprep being caused by changes to the specification | to improve the handling of IDNs could cause problems for non-DNS | |||
| intended to improve the handling of IDNs, most notably as this might | uses, most notably if they affected identification and authentication | |||
| affect identification and authentication protocols. The proposed new | protocols. Several elements of IDNA2008 give interpretations to | |||
| inclusion tables [IDNA2008-Tables], the reduction in the number of | strings prohibited under IDNA2003 or prohibit strings that IDNA2003 | |||
| characters permitted as input for registration or lookup (Section 3), | permitted. Those elements include the proposed new inclusion tables | |||
| and even the proposed changes in handling of right to left strings | [IDNA2008-Tables], the reduction in the number of characters | |||
| [IDNA2008-Bidi] either give interpretations to strings prohibited | permitted as input for registration or lookup (Section 3), and even | |||
| under IDNA2003 or prohibit strings that IDNA2003 permitted. The | the proposed changes in handling of right to left strings | |||
| IDNA2008 protocol does not use either Nameprep or Stringprep at all, | [IDNA2008-Bidi]. IDNA2008 does not use Nameprep or Stringprep at | |||
| so there are no side-effect changes to other protocols. | all, so there are no side-effect changes to other protocols. | |||
| It is particularly important to keep IDNA processing separate from | It is particularly important to keep IDNA processing separate from | |||
| processing for various security protocols because some of the | processing for various security protocols because some of the | |||
| constraints that are necessary for smooth and comprehensible use of | constraints that are necessary for smooth and comprehensible use of | |||
| IDNs may be unwanted or undesirable in other contexts. For example, | IDNs may be unwanted or undesirable in other contexts. For example, | |||
| the criteria for good passwords or passphrases are very different | the criteria for good passwords or passphrases are very different | |||
| from those for desirable IDNs: passwords should be hard to guess, | from those for desirable IDNs: passwords should be hard to guess, | |||
| while domain names should normally be easily memorable. Similarly, | while domain names should normally be easily memorable. Similarly, | |||
| internationalized SCSI identifiers and other protocol components are | internationalized SCSI identifiers and other protocol components are | |||
| likely to have different requirements than IDNs. | likely to have different requirements than IDNs. | |||
| skipping to change at page 34, line 23 ¶ | skipping to change at page 32, line 35 ¶ | |||
| One of the major differences between this specification and the | One of the major differences between this specification and the | |||
| original version of IDNA is that the original version permitted non- | original version of IDNA is that the original version permitted non- | |||
| letter symbols of various sorts, including punctuation and line- | letter symbols of various sorts, including punctuation and line- | |||
| drawing symbols, in the protocol. They were always discouraged in | drawing symbols, in the protocol. They were always discouraged in | |||
| practice. In particular, both the "IESG Statement" about IDNA and | practice. In particular, both the "IESG Statement" about IDNA and | |||
| all versions of the ICANN Guidelines specify that only language | all versions of the ICANN Guidelines specify that only language | |||
| characters be used in labels. This specification disallows symbols | characters be used in labels. This specification disallows symbols | |||
| entirely. There are several reasons for this, which include: | entirely. There are several reasons for this, which include: | |||
| o As discussed elsewhere, the original IDNA specification assumed | 1. As discussed elsewhere, the original IDNA specification assumed | |||
| that as many Unicode characters as possible should be permitted, | that as many Unicode characters as possible should be permitted, | |||
| directly or via mapping to other characters, in IDNs. This | directly or via mapping to other characters, in IDNs. This | |||
| specification operates on an inclusion model, extrapolating from | specification operates on an inclusion model, extrapolating from | |||
| the LDH rules -- which have served the Internet very well -- to a | the original "hostname" rules (LDH, see [IDNA2008-Defs]) -- which | |||
| Unicode base rather than an ASCII base. | have served the Internet very well -- to a Unicode base rather | |||
| than an ASCII base. | ||||
| o Unicode names for letters are, in most cases, fairly | 2. Symbol names are more problematic than letters because there may | |||
| intuitive, unambiguous and recognizable to users of the relevant | be no general agreement on whether a particular glyph matches a | |||
| script. Symbol names are more problematic because there may be no | symbol; there are no uniform conventions for naming; variations | |||
| general agreement on whether a particular glyph matches a symbol; | such as outline, solid, and shaded forms may or may not exist; | |||
| there are no uniform conventions for naming; variations such as | and so on. As just one example, consider a "heart" symbol as it | |||
| outline, solid, and shaded forms may or may not exist; and so on. | might appear in a logo that might be read as "I love...". While | |||
| As just one example, consider a "heart" symbol as it might appear | the user might read such a logo as "I love..." or "I heart...", | |||
| in a logo that might be read as "I love...". While the user might | considerable knowledge of the coding distinctions made in Unicode | |||
| read such a logo as "I love..." or "I heart...", considerable | is needed to know that there is more than one "heart" character | |||
| knowledge of the coding distinctions made in Unicode is needed to | (e.g., U+2665, U+2661, and U+2765) and how to describe it. These | |||
| know that there is more than one "heart" character (e.g., U+2665, | issues are of particular importance if strings are expected to be | |||
| U+2661, and U+2765) and how to describe it. These issues are of | understood or transcribed by the listener after being read out | |||
| particular importance if strings are expected to be understood or | loud. | |||
| transcribed by the listener after being read out loud. | [[anchor24: The above paragraph remains controversial as to | |||
| [[anchor22: The above paragraph remains controversial as to | whether it is valid. The WG will need to make a decision if this | |||
| whether it is valid. The WG will need to make a decision if this | section is not dropped entirely.]] | |||
| section is not dropped entirely.]] | ||||
| o Consider the case of a screen reader used by blind Internet users | 3. Consider the case of a screen reader used by blind Internet users | |||
| who must listen to renderings of IDN domain names and possibly | who must listen to renderings of IDN domain names and possibly | |||
| reproduce them on the keyboard. | reproduce them on the keyboard. | |||
| o As a simplified example of this, assume one wanted to use a | 4. As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know -- absent careful study of | would-be registrant has no way to know -- absent careful study of | |||
| the code tables -- whether it is ambiguous (e.g., where there are | the code tables -- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| the hypothetical label doesn't know whether to read it -- try to | the hypothetical label doesn't know whether to read it -- try to | |||
| transmit it to a colleague by voice -- as "heart", as "love", as | transmit it to a colleague by voice -- as "heart", as "love", as | |||
| "black heart", or as any of the other examples below. | "black heart", or as any of the other examples below. | |||
| o The actual situation is even worse than this. There is no | 5. The actual situation is even worse than this. There is no | |||
| possible way for a normal, casual, user to tell the difference | possible way for a normal, casual, user to tell the difference | |||
| between the hearts of U+2665 and U+2765 and the stars of U+2606 | between the hearts of U+2665 and U+2765 and the stars of U+2606 | |||
| and U+2729 without somehow knowing to look for a | and U+2729 without somehow knowing to look for a | |||
| distinction. We have a white heart (U+2661) and a few black hearts. | distinction. We have a white heart (U+2661) and a few black | |||
| Consequently, describing a label as containing a heart is hopelessly | hearts. Consequently, describing a label as containing a heart | |||
| ambiguous: we can only know that it contains one of several | is hopelessly ambiguous: we can only know that it contains one of | |||
| characters that look like hearts or have "heart" in their names. | several characters that look like hearts or have "heart" in their | |||
| In cities where "Square" is a popular part of a location name, one | names. In cities where "Square" is a popular part of a location | |||
| might well want to use a square symbol in a label as well and | name, one might well want to use a square symbol in a label as | |||
| there are far more squares of various flavors in Unicode than | well and there are far more squares of various flavors in Unicode | |||
| there are hearts or stars. | than there are hearts or stars. | |||
| o The consequence of these ambiguities of description and | The consequence of these ambiguities is that symbols are a very poor | |||
| dependencies on distinctions that were, or were not, made in | basis for reliable communication. Consistent with this conclusion, | |||
| Unicode codings is that symbols are a very poor basis for reliable | the Unicode standard recommends that strings used in identifiers not | |||
| communication. Consistent with this conclusion, the Unicode | contain symbols or punctuation [Unicode-UAX31]. Of course, these | |||
| standard recommends that strings used in identifiers not contain | difficulties with symbols do not arise with actual pictographic | |||
| symbols or punctuation [Unicode-UAX31]. Of course, these | languages and scripts which would be treated like any other language | |||
| difficulties with symbols do not arise with actual pictographic | characters; the two should not be confused. | |||
| languages and scripts which would be treated like any other | ||||
| language characters; the two should not be confused. | ||||
| 7.7. Migration Between Unicode Versions: Unassigned Code Points | 7.7. Migration Between Unicode Versions: Unassigned Code Points | |||
| In IDNA2003, labels containing unassigned code points are looked up | In IDNA2003, labels containing unassigned code points are looked up | |||
| on the assumption that, if they appear in labels and can be mapped | on the assumption that, if they appear in labels and can be mapped | |||
| and then resolved, the relevant standards must have changed and the | and then resolved, the relevant standards must have changed and the | |||
| registry has properly allocated only assigned values. | registry has properly allocated only assigned values. | |||
| In the protocol as described in these documents, strings containing | In the protocol described in these documents, strings containing | |||
| unassigned code points must not be either looked up or registered. | unassigned code points must not be either looked up or registered. | |||
| In summary, the status of an unassigned character with regard to the | ||||
| DISALLOWED, PROTOCOL-VALID, and CONTEXTUAL RULE REQUIRED categories | ||||
| cannot be evaluated until a character is actually assigned and known. | ||||
| There are several reasons for this, with the most important ones | There are several reasons for this, with the most important ones | |||
| being: | being: | |||
| o Tests involving the context of characters (e.g., some characters | ||||
| being permitted only adjacent to others of specific types) and | ||||
| integrity tests on complete labels are needed. Unassigned code | ||||
| points cannot be permitted because one cannot determine whether | ||||
| particular code points will require contextual rules (and what | ||||
| those rules should be) before characters are assigned to them and | ||||
| the properties of those characters fully understood. | ||||
| o It cannot be known in advance, and with sufficient reliability, | o It cannot be known in advance, and with sufficient reliability, | |||
| that a code point that was not previously assigned will not be | that no newly-assigned code point will be associated with a | |||
| assigned to a compatibility character or one that would be | character that would be disallowed by the rules in | |||
| otherwise disallowed by the rules in [IDNA2008-Tables]. In | [IDNA2008-Tables] (such as a compatibility character). In | |||
| IDNA2003, since there is no direct dependency on NFKC (many of the | IDNA2003, since there is no direct dependency on NFKC (many of the | |||
| entries in Stringprep's tables are based on NFKC, but IDNA2003 | entries in Stringprep's tables are based on NFKC, but IDNA2003 | |||
| depends only on Stringprep), allocation of a compatibility | depends only on Stringprep), allocation of a compatibility | |||
| character might produce some odd situations, but it would not be a | character might produce some odd situations, but it would not be a | |||
| problem. In IDNA2008, where compatibility characters are assigned | problem. In IDNA2008, where compatibility characters are | |||
| to DISALLOWED unless character-specific exceptions are made, | DISALLOWED unless character-specific exceptions are made, | |||
| permitting strings containing unassigned characters to be looked | permitting strings containing unassigned characters to be looked | |||
| up would permit violating the principle that characters in | up would violate the principle that characters in DISALLOWED are | |||
| DISALLOWED are not looked up. | not looked up. | |||
| o The Unicode Standard specifies that an unassigned code point | o The Unicode Standard specifies that an unassigned code point | |||
| normalizes (and, where relevant, case folds) to itself. If the | normalizes (and, where relevant, case folds) to itself. If the | |||
| code point is later assigned to a character, and particularly if | code point is later assigned to a character, and particularly if | |||
| the newly-assigned code point has a combining class that | the newly-assigned code point has a combining class that | |||
| determines its placement relative to other combining characters, | determines its placement relative to other combining characters, | |||
| it could normalize to some other code point or sequence, creating | it could normalize to some other code point or sequence. | |||
| confusion and/or violating other rules listed here. | ||||
| o Tests involving the context of characters (e.g., some characters | ||||
| being permitted only adjacent to ones of specific types but | ||||
| otherwise invisible or very problematic for other reasons) and | ||||
| integrity tests on complete labels are needed. Unassigned code | ||||
| points cannot be permitted because one cannot determine whether | ||||
| particular code points will require contextual rules (and what | ||||
| those rules should be) before characters are assigned to them and | ||||
| the properties of those characters fully understood. | ||||
| o More generally, the status of an unassigned character with regard | ||||
| to the DISALLOWED and PROTOCOL-VALID categories, and whether | ||||
| contextual rules are required with the latter, cannot be evaluated | ||||
| until a character is actually assigned and known. By contrast, | ||||
| characters that are actually DISALLOWED are placed in that | ||||
| category only as a consequence of rules applied to known | ||||
| properties or per-character evaluation. | ||||
| Another way to look at this is that permitting an unassigned | ||||
| character to be looked up is nearly equivalent to reclassifying a | ||||
| character from DISALLOWED to PROTOCOL-VALID since different systems | ||||
| will interpret the character in different ways. | ||||
| It is possible to argue that the issues above are not important and | It is possible to argue that the issues above are not important and | |||
| that, as a consequence, it is better to retain the principle of | that, as a consequence, it is better to retain the principle of | |||
| looking up labels even if they contain unassigned characters because | looking up labels even if they contain unassigned characters because | |||
| all of the important scripts and characters have been coded as of | all of the important scripts and characters have been coded as of | |||
| Unicode 5.1 and hence unassigned code points will be assigned only to | Unicode 5.1 and hence unassigned code points will be assigned only to | |||
| obscure characters or archaic scripts. Unfortunately, that does not | obscure characters or archaic scripts. Unfortunately, that does not | |||
| appear to be a safe assumption for at least two reasons. First, much | appear to be a safe assumption for at least two reasons. First, much | |||
| the same claim of completeness has been made for earlier versions of | the same claim of completeness has been made for earlier versions of | |||
| Unicode. The reality is that a script that is obscure to much of the | Unicode. The reality is that a script that is obscure to much of the | |||
| skipping to change at page 39, line 31 ¶ | skipping to change at page 37, line 29 ¶ | |||
| and responses may be forced to go to TCP instead of UDP). | and responses may be forced to go to TCP instead of UDP). | |||
| 9. Internationalization Considerations | 9. Internationalization Considerations | |||
| DNS labels and fully-qualified domain names provide mnemonics that | DNS labels and fully-qualified domain names provide mnemonics that | |||
| assist in identifying and referring to resources on the Internet. | assist in identifying and referring to resources on the Internet. | |||
| IDNs expand the range of those mnemonics to include those based on | IDNs expand the range of those mnemonics to include those based on | |||
| languages and character sets other than Western European and Roman- | languages and character sets other than Western European and Roman- | |||
| derived ones. But domain "names" are not, in general, words in any | derived ones. But domain "names" are not, in general, words in any | |||
| language. The recommendations of the IETF policy on character sets | language. The recommendations of the IETF policy on character sets | |||
| and languages, BCP 18 [RFC2277], are applicable to situations in which | and languages (BCP 18 [RFC2277]) are applicable to situations in | |||
| language identification is used to provide language-specific | which language identification is used to provide language-specific | |||
| contexts. The DNS is, by contrast, global and international and | contexts. The DNS is, by contrast, global and international and | |||
| ultimately has nothing to do with languages. Adding languages (or | ultimately has nothing to do with languages. Adding languages (or | |||
| similar context) to IDNs generally, or to DNS matching in particular, | similar context) to IDNs generally, or to DNS matching in particular, | |||
| would imply context-dependent matching in DNS, which would be a very | would imply context-dependent matching in DNS, which would be a very | |||
| significant change to the DNS protocol itself. It would also imply | significant change to the DNS protocol itself. It would also imply | |||
| that users would need to identify the language associated with a | that users would need to identify the language associated with a | |||
| particular label in order to look that label up, a decision that | particular label in order to look that label up. That knowledge is | |||
| would be impossible in many or most cases. | generally not available because many labels are not words in any | |||
| language and some may be words in more than one. | ||||
| 10. IANA Considerations | 10. IANA Considerations | |||
| This section gives an overview of registries required for IDNA. The | This section gives an overview of IANA registries required for IDNA. | |||
| actual definitions of the first two appear in [IDNA2008-Tables]. | The actual definitions of, and specifications for, the first two, | |||
| which must be newly-created for IDNA2008, appear in | ||||
| [IDNA2008-Tables]. This document describes the registries but does | ||||
| not specify any IANA actions. | ||||
| 10.1. IDNA Character Registry | 10.1. IDNA Character Registry | |||
| The distinction among the three major categories "UNASSIGNED", | The distinction among the major categories "UNASSIGNED", | |||
| "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and | "DISALLOWED", "PROTOCOL-VALID", and "CONTEXTUAL RULE REQUIRED" is | |||
| rules that are integral elements of [IDNA2008-Tables]. Convenience | made by special categories and rules that are integral elements of | |||
| in programming and validation requires a registry of characters and | [IDNA2008-Tables]. While not normative, an IANA registry of | |||
| scripts and their categories, updated for each new version of Unicode | characters and scripts and their categories, updated for each new | |||
| and the characters it contains. The details of this registry are | version of Unicode and the characters it contains, will be convenient | |||
| specified in [IDNA2008-Tables]. | for programming and validation purposes. The details of this | |||
| registry are specified in [IDNA2008-Tables]. | ||||
| 10.2. IDNA Context Registry | 10.2. IDNA Context Registry | |||
| For characters that are defined in the IDNA Character Registry list | IANA will create and maintain a list of approved contextual rules for | |||
| as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of | characters that are defined in the IDNA Character Registry list as | |||
| rule described in Section 3.1.2), IANA will create and maintain a | requiring a Contextual Rule (i.e., the types of rule described in | |||
| list of approved contextual rules. The details for those rules | Section 3.1.2). The details for those rules appear in | |||
| appear in [IDNA2008-Tables]. | [IDNA2008-Tables]. | |||
| 10.3. IANA Repository of IDN Practices of TLDs | 10.3. IANA Repository of IDN Practices of TLDs | |||
| This registry, historically described as the "IANA Language Character | This registry, historically described as the "IANA Language Character | |||
| Set Registry" or "IANA Script Registry" (both somewhat misleading | Set Registry" or "IANA Script Registry" (both somewhat misleading | |||
| terms), is maintained by IANA at the request of ICANN. It is used to | terms), is maintained by IANA at the request of ICANN. It is used to | |||
| provide a central documentation repository of the IDN policies used | provide a central documentation repository of the IDN policies used | |||
| by top-level domain (TLD) registries who volunteer to contribute to | by top-level domain (TLD) registries who volunteer to contribute to | |||
| it and is used in conjunction with ICANN Guidelines for IDN use. | it and is used in conjunction with ICANN Guidelines for IDN use. | |||
| It is not an IETF-managed registry and, while the protocol changes | It is not an IETF-managed registry and, while the protocol changes | |||
| specified here may call for some revisions to the tables, these | specified here may call for some revisions to the tables, these | |||
| specifications have no direct effect on that registry and no IANA | specifications have no direct effect on that registry and no IANA | |||
| action is required as a result. | action is required as a result. | |||
| 11. Security Considerations | 11. Security Considerations | |||
| 11.1. General Security Issues with IDNA | 11.1. General Security Issues with IDNA | |||
| This document in the IDNA2008 series is purely explanatory and | This document is purely explanatory and informational and | |||
| informational and consequently introduces no new security issues. It | consequently introduces no new security issues. It would, of course, | |||
| would, of course, be a poor idea for someone to try to implement from | be a poor idea for someone to try to implement from it; such an | |||
| it; such an attempt would almost certainly lead to interoperability | attempt would almost certainly lead to interoperability problems and | |||
| problems and might lead to security ones. A discussion of security | might lead to security ones. A discussion of security issues with | |||
| issues with IDNA, including some relevant history, appears in | IDNA, including some relevant history, appears in [IDNA2008-Defs]. | |||
| [IDNA2008-Defs]. | ||||
| 12. Acknowledgments | 12. Acknowledgments | |||
| The editor and contributors would like to express their thanks to | The editor and contributors would like to express their thanks to | |||
| those who contributed significant early (pre-WG) review comments, | those who contributed significant early (pre-WG) review comments, | |||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | |||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | Simon Josefsson, and Sam Weiler. In addition, some specific ideas | |||
| were incorporated from suggestions, text, or comments about sections | were incorporated from suggestions, text, or comments about sections | |||
| that were unclear supplied by Vint Cerf, Frank Ellerman, Michael | that were unclear supplied by Vint Cerf, Frank Ellerman, Michael | |||
| Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken | Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken | |||
| Whistler, although, as usual, they bear little or no responsibility | Whistler, although, as usual, they bear little or no responsibility | |||
| for the conclusions the editor and contributors reached after | for the conclusions the editor and contributors reached after | |||
| receiving their suggestions. Thanks are also due to Vint Cerf, | receiving their suggestions. Thanks are also due to Vint Cerf, Lisa | |||
| Debbie Garside, and Jefsey Morfin for conversations that led to | Dusseault, Debbie Garside, and Jefsey Morfin for conversations that | |||
| considerable improvements in the content of this document. | led to considerable improvements in the content of this document. | |||
| A meeting was held on 30 January 2008 to attempt to reconcile | A meeting was held on 30 January 2008 to attempt to reconcile | |||
| differences in perspective and terminology about this set of | differences in perspective and terminology about this set of | |||
| specifications between the design team and members of the Unicode | specifications between the design team and members of the Unicode | |||
| Technical Consortium. The discussions at and subsequent to that | Technical Consortium. The discussions at and subsequent to that | |||
| meeting were very helpful in focusing the issues and in refining the | meeting were very helpful in focusing the issues and in refining the | |||
| specifications. The active participants at that meeting were (in | specifications. The active participants at that meeting were (in | |||
| alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | |||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | |||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | |||
| Michel Suignard, and Ken Whistler. We express our thanks to Google | Michel Suignard, and Ken Whistler. We express our thanks to Google | |||
| for support of that meeting and to the participants for their | for support of that meeting and to the participants for their | |||
| contributions. | contributions. | |||
| Useful comments and text on the WG versions of the draft were | Useful comments and text on the WG versions of the draft were | |||
| received from many participants in the IETF "IDNABIS" WG and a number | received from many participants in the IETF "IDNABIS" WG and a number | |||
| of document changes resulted from mailing list discussions made by | of document changes resulted from mailing list discussions made by | |||
| that group. Marcos Sanz provided specific analysis and suggestions | that group. Marcos Sanz provided specific analysis and suggestions | |||
| that were exceptionally helpful in refining the text, as did Vint | that were exceptionally helpful in refining the text, as did Vint | |||
| Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler. | Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler. | |||
| Lisa Dusseault provided extensive editorial suggestions during the | ||||
| spring of 2009, most of which were incorporated. | ||||
| 13. Contributors | 13. Contributors | |||
| While the listed editor held the pen, the core of this document and | While the listed editor held the pen, the core of this document and | |||
| the initial WG version represent the joint work and conclusions of | the initial WG version represent the joint work and conclusions of | |||
| an ad hoc design team consisting of the editor and, in alphabetic | an ad hoc design team consisting of the editor and, in alphabetic | |||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | |||
| In addition, there were many specific contributions and helpful | In addition, there were many specific contributions and helpful | |||
| comments from those listed in the Acknowledgments section and others | comments from those listed in the Acknowledgments section and others | |||
| who have contributed to the development and use of the IDNA | who have contributed to the development and use of the IDNA | |||
| skipping to change at page 49, line 5 ¶ | skipping to change at page 47, line 5 ¶ | |||
| o Added discussion of adding characters to an existing script to the | o Added discussion of adding characters to an existing script to the | |||
| discussion of unassigned code point transitions in Section 7.7. | discussion of unassigned code point transitions in Section 7.7. | |||
| o Tightened up the discussion of non-ASCII string processing | o Tightened up the discussion of non-ASCII string processing | |||
| (Section 8.1) slightly. | (Section 8.1) slightly. | |||
| o Removed some placeholders and comments that have been around long | o Removed some placeholders and comments that have been around long | |||
| enough to be considered acceptable or that no longer seem | enough to be considered acceptable or that no longer seem | |||
| necessary for other reasons. | necessary for other reasons. | |||
| A.10. Version -10 | ||||
| o Extensive editorial improvements, mostly due to suggestions from | ||||
| Lisa Dusseault. | ||||
| o Changes required for the new "mapping" approach and document have, | ||||
| in general, not been incorporated despite several suggestions. | ||||
| The editor intends to wait until the mapping model is stable, or | ||||
| at least until -11 of this document, before trying to incorporate | ||||
| those suggestions. | ||||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| Email: john+ietf@jck.com | Email: john+ietf@jck.com | |||
| End of changes. 119 change blocks. | ||||
| 900 lines changed or deleted | 813 lines changed or added | |||