| < draft-ietf-idnabis-rationale-06.txt | draft-ietf-idnabis-rationale-07.txt > | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft December 15, 2008 | Internet-Draft February 24, 2009 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: June 18, 2009 | Expires: August 28, 2009 | |||
| Internationalized Domain Names for Applications (IDNA): Background, | Internationalized Domain Names for Applications (IDNA): Background, | |||
| Explanation, and Rationale | Explanation, and Rationale | |||
| draft-ietf-idnabis-rationale-06.txt | draft-ietf-idnabis-rationale-07.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | This Internet-Draft is submitted to IETF in full conformance with the | |||
| applicable patent or other IPR claims of which he or she is aware | provisions of BCP 78 and BCP 79. | |||
| have been or will be disclosed, and any of which he or she becomes | ||||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on June 18, 2009. | This Internet-Draft will expire on August 28, 2009. | |||
| | Copyright Notice | |||
| | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| | document authors. All rights reserved. | |||
| | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| | Provisions Relating to IETF Documents in effect on the date of | |||
| | publication of this document (http://trustee.ietf.org/license-info). | |||
| | Please review these documents carefully, as they describe your rights | |||
| | and restrictions with respect to this document. | |||
| | This document may contain material from IETF Documents or IETF | |||
| | Contributions published or made publicly available before November | |||
| | 10, 2008. The person(s) controlling the copyright in some of this | |||
| | material may not have granted the IETF Trust the right to allow | |||
| | modifications of such material outside the IETF Standards Process. | |||
| | Without obtaining an adequate license from the person(s) controlling | |||
| | the copyright in such materials, this document may not be modified | |||
| | outside the IETF Standards Process, and derivative works of it may | |||
| | not be created outside the IETF Standards Process, except to format | |||
| | it for publication as an RFC or to translate it into languages other | |||
| | than English. | |||
| Abstract | Abstract | |||
| Several years have passed since the original protocol for | Several years have passed since the original protocol for | |||
| Internationalized Domain Names (IDNs) was completed and deployed. | Internationalized Domain Names (IDNs) was completed and deployed. | |||
| During that time, a number of issues have arisen, including the need | During that time, a number of issues have arisen, including the need | |||
| to update the system to deal with newer versions of Unicode. Some of | to update the system to deal with newer versions of Unicode. Some of | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 6 | |||
| 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 6 | |||
| 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 | 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 7 | |||
| 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 8 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 9 | |||
| 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 11 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | 3.1. A Tiered Model of Permitted Characters and Labels . . . . 11 | |||
| 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 | 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 12 | |||
| 3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 | 3.1.1.2. Rules and Their Application . . . . . . . . . . . 13 | |||
| 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, | 3.3. Layered Restrictions: Tables, Context, Registration, | |||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 13 | Applications . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 | 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 | |||
| 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 | 4.1. Display and Network Order . . . . . . . . . . . . . . . . 16 | |||
| 4.2. Entry and Display in Applications . . . . . . . . . . . . 15 | 4.2. Entry and Display in Applications . . . . . . . . . . . . 17 | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and | 4.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 16 | Alternate Character Forms . . . . . . . . . . . . . . . . 18 | |||
| 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 19 | 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20 | |||
| 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 20 | 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21 | |||
| 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 22 | |||
| 6. Front-end and User Interface Processing . . . . . . . . . . . 21 | 6. Front-end and User Interface Processing for Lookup . . . . . . 23 | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization . 24 | 7. Migration from IDNA2003 and Unicode Version Synchronization . 26 | |||
| 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 | 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 | 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 26 | |||
| 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26 | 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 27 | |||
| 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27 | 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 28 | |||
| 7.2. Changes in Character Interpretations . . . . . . . . . . . 28 | 7.2. Changes in Character Interpretations . . . . . . . . . . . 29 | |||
| 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 | 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 31 | |||
| 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31 | 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 32 | |||
| 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 31 | 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 32 | |||
| 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32 | 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 33 | |||
| 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 32 | 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 33 | |||
| 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 | 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 34 | |||
| 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 33 | 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 34 | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code | 7.7. Migration Between Unicode Versions: Unassigned Code | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | Points . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 37 | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 36 | 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 38 | |||
| 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 37 | 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 38 | |||
| 10. Internationalization Considerations . . . . . . . . . . . . . 37 | 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 38 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | 8.3. Root and other DNS Server Considerations . . . . . . . . . 39 | |||
| 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 | 9. Internationalization Considerations . . . . . . . . . . . . . 39 | |||
| 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 | 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 40 | |||
| 12. Security Considerations . . . . . . . . . . . . . . . . . . . 38 | 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 40 | |||
| 12.1. General Security Issues with IDNA . . . . . . . . . . . . 38 | 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 40 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | 11. Security Considerations . . . . . . . . . . . . . . . . . . . 40 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | 11.1. General Security Issues with IDNA . . . . . . . . . . . . 40 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 40 | 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 | 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
| | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 42 | |||
| | 14.2. Informative References . . . . . . . . . . . . . . . . . . 43 | |||
| | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.1. Changes between Version -00 and Version -01 of | A.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 45 | |||
| A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 | A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 | A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 43 | A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 44 | A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 45 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 48 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| The original standards for Internationalized Domain Names (IDNs) were | The original standards for Internationalized Domain Names (IDNs) were | |||
| completed and deployed starting in 2003. Those standards are known | completed and deployed starting in 2003. Those standards are known | |||
| as Internationalized Domain Names in Applications (IDNA), taken from | as Internationalized Domain Names in Applications (IDNA), taken from | |||
| the name of the highest level standard within the group, RFC 3490 | the name of the highest level standard within the group, RFC 3490 | |||
| [RFC3490]. After those standards were deployed, a number of issues | [RFC3490]. After those standards were deployed, a number of issues | |||
| arose that led to a call for a new version of the IDNA protocol and | arose that led to a call for a new version of the IDNA protocol and | |||
| the associated tables, including a subset of those described in a | the associated tables, including a subset of those described in a | |||
| recent IAB report [RFC4690] and the need to update the system to deal | recent IAB report [RFC4690] and the need to update the system to deal | |||
| with newer versions of Unicode. This document further explains the | with newer versions of Unicode. This document further explains the | |||
| issues that have been encountered when they are important to | issues that have been encountered when they are important to | |||
| understanding of the revised protocols. It also provides an overview | understanding of the revised protocols. It also provides an overview | |||
| of the new IDNA model and explanatory material for it. Additional | of the new IDNA model and explanatory material for it. Additional | |||
| explanatory material for the specific components of the proposals | explanatory material for the specific components of the proposals | |||
| appears with the associated documents. | appears with the associated documents. | |||
| | This document and the associated ones are written from the | |||
| | perspective of an IDNA-aware user, application, or implementation. | |||
| | While they may reiterate fundamental DNS rules and requirements for | |||
| | the convenience of the reader, they make no attempt to be | |||
| | comprehensive about DNS principles and should not be considered as a | |||
| | substitute for a thorough understanding of the DNS protocols and | |||
| | specifications. | |||
| A good deal of the background material that appeared in RFC 3490 | A good deal of the background material that appeared in RFC 3490 | |||
| [RFC3490] has been removed from this update. That material is either | [RFC3490] has been removed from this update. That material is either | |||
| of historical interest only or has been covered from a more recent | of historical interest only or has been covered from a more recent | |||
| perspective in RFC 4690 [RFC4690]. | perspective in RFC 4690 [RFC4690]. | |||
| This document is not normative. The information it provides is | This document is not normative. The information it provides is | |||
| intended to make the rules, tables, and protocol easier to understand | intended to make the rules, tables, and protocol easier to understand | |||
| and to provide overview information and suggestions for zone | and to provide overview information and suggestions for zone | |||
| administrators and others who need to make policy, deployment, and | administrators and others who need to make policy, deployment, and | |||
| similar decisions about IDNs. | similar decisions about IDNs. | |||
| skipping to change at page 5, line 51 | skipping to change at page 7, line 10 | |||
| This distinction is important because the reasonable goal of an IDN | This distinction is important because the reasonable goal of an IDN | |||
| effort is not to be able to write the great Klingon (or language of | effort is not to be able to write the great Klingon (or language of | |||
| one's choice) novel in DNS labels but to be able to form a usefully | one's choice) novel in DNS labels but to be able to form a usefully | |||
| broad range of mnemonics in ways that are as natural as possible in a | broad range of mnemonics in ways that are as natural as possible in a | |||
| very broad range of scripts. | very broad range of scripts. | |||
| 1.3.3. New Terminology and Restrictions | 1.3.3. New Terminology and Restrictions | |||
| These documents introduce new terminology, and precise definitions, | These documents introduce new terminology, and precise definitions, | |||
| for the terms "U-labels", "A-labels", labels that are "IDNA-valid", | for the terms "U-label", "A-Label", LDH-label (to which all valid | |||
| and an "LDH-label" (differing from an LDH-conformant label or fully- | pre-IDNA host names conformed), Reserved-LDH-label (R-LDH-label), XN- | |||
| qualified domain name). They also introduce a restriction, for IDNA- | label, Fake-A-Label, and Non-Reserved-LDH-label (NR-LDH-label). | |||
| conformant applications and DNS zones in which IDNA is used, on | ||||
| strings used as labels that contain "--" in the third and fourth | In addition, the term "putative label" has been adopted to refer to a | |||
| positions, essentially requiring that such strings be IDNA-valid. | label that may appear to meet certain definitional constraints but | |||
| This restriction on strings containing "--" is required for three | has not yet been sufficiently tested for validity. | |||
| | These definitions are illustrated in Figure 1 of the Definitions | |||
| | Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and | |||
| | fourth character from the beginning of the label. In IDNA-aware | |||
| | applications, only a subset of these reserved labels is permitted to | |||
| | be used, namely the A-label subset. A-labels are a subset of the | |||
| | R-LDH-labels that begin with the case-insensitive (?) string "xn--". | |||
| | Labels that bear this prefix but which are not otherwise valid fall | |||
| | into the "Fake-A-label" category. The non-reserved labels (NR-LDH- | |||
| | labels) are implicitly valid since they do not trigger any | |||
| | resemblance to IDNA-landr NR-LDH-labels. | |||
| | The creation of the Reserved-LDH category is required for three | |||
| reasons: | reasons: | |||
| o to prevent confusion with pre-IDNA coding forms; | o to prevent confusion with pre-IDNA coding forms; | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 7.4); | prefix, no matter how unlikely those might be (see Section 7.4); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the Punycode encoding | o to reduce the opportunities for attacks via the Punycode encoding | |||
| algorithm itself. | algorithm itself. | |||
| Figure 1 of the Definitions Document [IDNA2008-Defs] illustrates the | ||||
| terminology used by IDNA for various types of labels and strings and | ||||
| their relationship. | ||||
| 1.4. Objectives | 1.4. Objectives | |||
| The intent of the IDNA revision effort, and hence of this document | The intent of the IDNA revision effort, and hence of this document | |||
| and the associated ones, is to increase the usability and | and the associated ones, is to increase the usability and | |||
| effectiveness of internationalized domain names (IDNs) while | effectiveness of internationalized domain names (IDNs) while | |||
| preserving or strengthening the integrity of references that use | preserving or strengthening the integrity of references that use | |||
| them. The original "hostname" character definitions (see, e.g., | them. The original "hostname" character definitions (see, e.g., | |||
| [RFC0810]) struck a balance between the creation of useful mnemonics | [RFC0810]) struck a balance between the creation of useful mnemonics | |||
| and the introduction of parsing problems or general confusion in the | and the introduction of parsing problems or general confusion in the | |||
| contexts in which domain names are used. The objective of IDNA2008 | contexts in which domain names are used. The objective of IDNA2008 | |||
| skipping to change at page 8, line 4 | skipping to change at page 9, line 18 | |||
| IDNA allows the graceful introduction of IDNs not only by avoiding | IDNA allows the graceful introduction of IDNs not only by avoiding | |||
| upgrades to existing infrastructure (such as DNS servers and mail | upgrades to existing infrastructure (such as DNS servers and mail | |||
| transport agents), but also by allowing some rudimentary use of IDNs | transport agents), but also by allowing some rudimentary use of IDNs | |||
| in applications by using the ASCII-encoded representation of the | in applications by using the ASCII-encoded representation of the | |||
| labels containing non-ASCII characters. While such names are user- | labels containing non-ASCII characters. While such names are user- | |||
| unfriendly to read and type, and hence not optimal for user input, | unfriendly to read and type, and hence not optimal for user input, | |||
| they can be used as a last resort to allow rudimentary IDN usage. | they can be used as a last resort to allow rudimentary IDN usage. | |||
| For example, they might be the best choice for display if it were | For example, they might be the best choice for display if it were | |||
| known that relevant fonts were not available on the user's computer. | known that relevant fonts were not available on the user's computer. | |||
| In order to allow user-friendly input and output of the IDNs and | In order to allow user-friendly input and output of the IDNs and | |||
| acceptance of some characters as equivalent to those to be processed | acceptance of some characters as equivalent to those to be processed | |||
| according to the protocol, the applications need to be modified to | according to the protocol, the applications need to be modified to | |||
| conform to this specification. | conform to this specification. | |||
| IDNA uses the Unicode character repertoire, for continuity with the | This version of IDNA uses the Unicode character repertoire, for | |||
| original version of IDNA. | continuity with the original version of IDNA. | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing | 1.6. Comprehensibility of IDNA Mechanisms and Processing | |||
| One of the major goals of this work is to improve the general | One of the major goals of this work is to improve the general | |||
| understanding of how IDNA works and what characters are permitted and | understanding of how IDNA works and what characters are permitted and | |||
| what happens to them. Comprehensibility and predictability to users | what happens to them. Comprehensibility and predictability to users | |||
| and registrants are themselves important motivations and design goals | and registrants are themselves important motivations and design goals | |||
| for this effort. The effort includes some new terminology and a | for this effort. The effort includes some new terminology and a | |||
| revised and extended model, both covered in this section, and some | revised and extended model, both covered in this section, and some | |||
| more specific protocol, processing, and table modifications. Details | more specific protocol, processing, and table modifications. Details | |||
| skipping to change at page 9, line 26 | skipping to change at page 10, line 40 | |||
| the mapping but in accurately identifying the incoming character set | the mapping but in accurately identifying the incoming character set | |||
| and then applying the correct conversion routine. If a local | and then applying the correct conversion routine. If a local | |||
| operating system uses one of the ISO 8859 character sets or an | operating system uses one of the ISO 8859 character sets or an | |||
| extensive national or industrial system such as GB18030 [GB18030] or | extensive national or industrial system such as GB18030 [GB18030] or | |||
| BIG5 [BIG5], one must correctly identify the character set in use | BIG5 [BIG5], one must correctly identify the character set in use | |||
| before converting to Unicode even though those character coding | before converting to Unicode even though those character coding | |||
| systems are substantially or completely Unicode-compatible (i.e., all | systems are substantially or completely Unicode-compatible (i.e., all | |||
| of the code points in them have an exact and unique mapping to | of the code points in them have an exact and unique mapping to | |||
| Unicode code points). It may be even more difficult when the | Unicode code points). It may be even more difficult when the | |||
| character coding system in local use is based on conceptually | character coding system in local use is based on conceptually | |||
| different assumptions than those used by Unicode about, e.g., about | different assumptions than those used by Unicode about, e.g., font | |||
| font encodings used for publications in some Indic scripts. Those | encodings used for publications in some Indic scripts. Those | |||
| differences may not easily yield unambiguous conversions or | differences may not easily yield unambiguous conversions or | |||
| interpretations even if each coding system is internally consistent | interpretations even if each coding system is internally consistent | |||
| and adequate to represent the local language and script. | and adequate to represent the local language and script. | |||
| 2. Processing in IDNA2008 | 2. Processing in IDNA2008 | |||
| These specifications separate Domain Name Registration and Lookup in | These specifications separate Domain Name Registration and Lookup in | |||
| the protocol specification. Doing so reflects current practice in | the protocol specification. Doing so reflects current practice in | |||
| which per-registry restrictions and special processing are applied at | which per-registry restrictions and special processing are applied at | |||
| registration time but not during lookup. Even more important in the | registration time but not during lookup. Even more important in the | |||
| skipping to change at page 10, line 24 | skipping to change at page 11, line 37 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels | 3.1. A Tiered Model of Permitted Characters and Labels | |||
| Moving to an inclusion model requires respecifying the list of | Moving to an inclusion model requires respecifying the list of | |||
| characters that are permitted in IDNs. In IDNA2003, the role and | characters that are permitted in IDNs. In IDNA2003, the role and | |||
| utility of characters are independent of context and fixed forever | utility of characters are independent of context and fixed forever | |||
| (or until the standard is replaced). Making completely context- | (or until the standard is replaced). Making completely context- | |||
| independent rules globally has proven impractical because some | independent rules globally has proven impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| no visible effect(s) in others. IDNA2003 prohibited those types of | no visible effect(s) in others. IDNA2003 prohibited those types of | |||
| characters entirely. But the restrictions were much too severe to | characters entirely. But the restrictions led to a consensus that | |||
| permit an adequate range of mnemonics for identifiers based on some | under some conditions, these "joiner" characters were legitimately | |||
| languages. The requirement to support those characters but limit | needed to allow useful mnemonics for some languages and scripts. The | |||
| their use to very specific contexts was reinforced by the observation | requirement to support those characters but limit their use to very | |||
| that handling of particular characters across the languages that use | specific contexts was reinforced by the observation that handling of | |||
| a script, or the use of similar or identical-looking characters in | particular characters across the languages that use a script, or the | |||
| different scripts, is less well understood than many people believed | use of similar or identical-looking characters in different scripts, | |||
| it was several years ago. | is more complex than many people believed it was several years ago. | |||
| Independently of the characters chosen (see next subsection), the | Independently of the characters chosen (see next subsection), the | |||
| approach is to divide the characters that appear in Unicode into | approach is to divide the characters that appear in Unicode into | |||
| three categories: | three categories: | |||
| 3.1.1. PROTOCOL-VALID | 3.1.1. PROTOCOL-VALID | |||
| Characters identified as "PROTOCOL-VALID" (often abbreviated | Characters identified as "PROTOCOL-VALID" (often abbreviated | |||
| "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | |||
| Their use may be restricted by rules about the context in which they | Their use may be restricted by rules about the context in which they | |||
| skipping to change at page 11, line 39 | skipping to change at page 13, line 7 | |||
| "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | |||
| associated with a rule. In addition, the rule will define whether it | associated with a rule. In addition, the rule will define whether it | |||
| is to be applied on lookup as well as registration. A distinction is | is to be applied on lookup as well as registration. A distinction is | |||
| made between characters that indicate or prohibit joining (known as | made between characters that indicate or prohibit joining (known as | |||
| "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | |||
| contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | |||
| former require full testing at lookup time. | former require full testing at lookup time. | |||
| 3.1.1.2. Rules and Their Application | 3.1.1.2. Rules and Their Application | |||
| The actual rules may be present or absent. If present, they may have | The actual rules may be DEFINED or NULL. If present, they may have | |||
| values of "True" (character may be used in any position in any | values of "True" (character may be used in any position in any | |||
| label), "False" (character may not be used in any label), or may be a | label), "False" (character may not be used in any label), or may be a | |||
| set of procedural rules that specify the context in which the | set of procedural rules that specify the context in which the | |||
| character is permitted. | character is permitted. | |||
| Examples of descriptions of typical rules, stated informally and in | Examples of descriptions of typical rules, stated informally and in | |||
| English, include "Must follow a character from Script XYZ", "Must | English, include "Must follow a character from Script XYZ", "Must | |||
| occur only if the entire label is in Script ABC", "Must occur only if | occur only if the entire label is in Script ABC", "Must occur only if | |||
| the previous and subsequent characters have the DFG property". | the previous and subsequent characters have the DFG property". | |||
| skipping to change at page 13, line 17 | skipping to change at page 14, line 32 | |||
| For convenience in processing and table-building, code points that do | For convenience in processing and table-building, code points that do | |||
| not have assigned values in a given version of Unicode are treated as | not have assigned values in a given version of Unicode are treated as | |||
| belonging to a special UNASSIGNED category. Such code points are | belonging to a special UNASSIGNED category. Such code points are | |||
| prohibited in labels to be registered or looked up. The category | prohibited in labels to be registered or looked up. The category | |||
| differs from DISALLOWED in that code points are moved out of it by | differs from DISALLOWED in that code points are moved out of it by | |||
| the simple expedient of being assigned in a later version of Unicode | the simple expedient of being assigned in a later version of Unicode | |||
| (at which point, they are classified into one of the other categories | (at which point, they are classified into one of the other categories | |||
| as appropriate). | as appropriate). | |||
| | The rationale for restricting the processing of UNASSIGNED characters | |||
| | is simply that if such characters were permitted to be looked up, for | |||
| | example, and were later assigned, but subject to some set of | |||
| | contextual rules, un-updated instances of IDNA-aware software might | |||
| | permit lookup of labels containing the previously-unassigned | |||
| | characters while updated versions of IDNA-aware software might | |||
| | restrict their use in lookup, depending on the contextual rules. It | |||
| | should be clear that under no circumstance should an UNASSIGNED | |||
| | character be permitted in a label to be registered as part of a | |||
| | domain name. | |||
| 3.2. Registration Policy | 3.2. Registration Policy | |||
| While these recommendations cannot and should not define registry | While these recommendations cannot and should not define registry | |||
| policies, registries should develop and apply additional restrictions | policies, registries should develop and apply additional restrictions | |||
| to reduce confusion and other problems. For example, it is generally | as needed to reduce confusion and other problems. For example, it is | |||
| believed that labels containing characters from more than one script | generally believed that labels containing characters from more than | |||
| are a bad practice although there may be some important exceptions to | one script are a bad practice although there may be some important | |||
| that principle. Some registries may choose to restrict registrations | exceptions to that principle. Some registries may choose to restrict | |||
| to characters drawn from a very small number of scripts. For many | registrations to characters drawn from a very small number of | |||
| scripts, the use of variant techniques such as those described in | scripts. For many scripts, the use of variant techniques such as | |||
| RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for | those described in RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and | |||
| Chinese by the tables described in RFC 4713 [RFC4713] may be helpful | illustrated for Chinese by the tables described in RFC 4713 [RFC4713] | |||
| in reducing problems that might be perceived by users. | may be helpful in reducing problems that might be perceived by users. | |||
| In general, users will benefit if registries only permit characters | In general, users will benefit if registries only permit characters | |||
| from scripts that are well-understood by the registry or its | from scripts that are well-understood by the registry or its | |||
| advisers. If a registry decides to reduce opportunities for | advisers. If a registry decides to reduce opportunities for | |||
| confusion by constructing policies that disallow characters used in | confusion by constructing policies that disallow characters used in | |||
| historic writing systems or characters whose use is restricted to | historic writing systems or characters whose use is restricted to | |||
| specialized, highly technical contexts, some relevant information may | specialized, highly technical contexts, some relevant information may | |||
| be found in Section 2.4 "Specific Character Adjustments", Table 4 | be found in Section 2.4 "Specific Character Adjustments", Table 4 | |||
| "Candidate Characters for Exclusion from Identifiers" of | "Candidate Characters for Exclusion from Identifiers" of | |||
| [Unicode-UAX31] and Section 3.1. "General Security Profile for | [Unicode-UAX31] and Section 3.1. "General Security Profile for | |||
| Identifiers" in [Unicode-Security]. | Identifiers" in [Unicode-Security]. | |||
| It is worth stressing that these principles of policy development and | It is worth stressing that these principles of policy development and | |||
| application apply at all levels of the DNS, not only, e.g., TLD | application apply at all levels of the DNS, not only, e.g., TLD or | |||
| registrations and that even a trivial, "anything permitted that is | SLD registrations and that even a trivial, "anything permitted that | |||
| valid under the protocol" policy is helpful in that it helps users | is valid under the protocol" policy is helpful in that it helps users | |||
| and application developers know what to expect. | and application developers know what to expect. | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, Applications | 3.3. Layered Restrictions: Tables, Context, Registration, Applications | |||
| The essence of the character rules in IDNA2008 is based on the | The essence of the character rules in IDNA2008 is based on the | |||
| realization that there is no single magic bullet for any of the | realization that there is no single magic bullet for any of the | |||
| issues associated with a multiscript DNS. Instead, the | issues associated with a multiscript DNS. Instead, the | |||
| specifications define a variety of approaches that, together, | specifications define a variety of approaches that, together, | |||
| constitute multiple lines of defense against ambiguity in identifiers | constitute multiple lines of defense against ambiguity in identifiers | |||
| and loss of referential integrity. The actual character tables are | and loss of referential integrity. The actual character tables are | |||
| the first mechanism, protocol rules about how those characters are | the first mechanism, protocol rules about how those characters are | |||
| applied or restricted in context are the second, and those two in | applied or restricted in context are the second, and those two in | |||
| combination constitute the limits of what can be done by a protocol | combination constitute the limits of what can be done by a protocol | |||
| alone. As discussed in the previous section (Section 3.2), | alone. As discussed in the previous section (Section 3.2), | |||
| registries are expected to restrict what they permit to be | registries are expected to restrict what they permit to be | |||
| registered, devising and using rules that are designed to optimize | registered, devising and using rules that are designed to optimize | |||
| the balance between confusion and risk on the one hand and maximum | the balance between confusion and risk on the one hand and maximum | |||
| expressiveness in mnemonics on the other. | expressiveness in mnemonics on the other. | |||
| In addition, there is an important role for user agents in warning | In addition, there is an important role for user agents in warning | |||
| against label forms that appear unreasonable given their knowledge of | against label forms that appear problematic given their knowledge of | |||
| local contexts and conventions. Of course, no approach based on | local contexts and conventions. Of course, no approach based on | |||
| naming or identifiers alone can protect against all threats. | naming or identifiers alone can protect against all threats. | |||
| 4. Issues that Constrain Possible Solutions | 4. Issues that Constrain Possible Solutions | |||
| 4.1. Display and Network Order | 4.1. Display and Network Order | |||
| The correct treatment of domain names requires a clear distinction | The correct treatment of domain names requires a clear distinction | |||
| between Network Order (the order in which the code points are sent in | between Network Order (the order in which the code points are sent in | |||
| protocols) and Display Order (the order in which the code points are | protocols) and Display Order (the order in which the code points are | |||
| displayed on a screen or paper). The order of labels in a domain | displayed on a screen or paper). The order of labels in a domain | |||
| name that contains characters that are normally written right to left | name that contains characters that are normally written right to left | |||
| is discussed in [IDNA2008-Bidi]. In particular, there are questions | is discussed in [IDNA2008-Bidi]. In particular, there are questions | |||
| about the order in which labels are displayed if left to right and | about the order in which labels are displayed if left to right and | |||
| right to left labels are adjacent to each other, especially if there | right to left labels are adjacent to each other, especially if there | |||
| skipping to change at page 16, line 41 | skipping to change at page 18, line 22 | |||
| display the A-label. | display the A-label. | |||
| In any place where a protocol or document format allows transmission | In any place where a protocol or document format allows transmission | |||
| of the characters in internationalized labels, labels should be | of the characters in internationalized labels, labels should be | |||
| transmitted using whatever character encoding and escape mechanism | transmitted using whatever character encoding and escape mechanism | |||
| the protocol or document format uses at that place. This provision | the protocol or document format uses at that place. This provision | |||
| is intended to prevent situations in which, e.g., UTF-8 domain names | is intended to prevent situations in which, e.g., UTF-8 domain names | |||
| appear embedded in text that is otherwise in some other character | appear embedded in text that is otherwise in some other character | |||
| coding. | coding. | |||
| All protocols that use domain name slots already have the capacity | All protocols that use domain name slots (See Section 2.3.1.6 | |||
| for handling domain names in the ASCII charset. Thus, A-labels can | [[anchor12: ?? Verify this]] in [IDNA2008-Defs]) already have the | |||
| inherently be handled by those protocols. | capacity for handling domain names in the ASCII charset. Thus, | |||
| | A-labels can inherently be handled by those protocols. | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | |||
| Character Forms | Character Forms | |||
| [[anchor13: There is some internal redundancy and repetition in the | [[anchor13: There is some internal redundancy and repetition in the | |||
| material in this section. Specific suggestions about how to reduce or | material in this section. Specific suggestions about how to reduce or | |||
| eliminate redundant text would be appreciated. If no such | eliminate redundant text would be appreciated. If no such | |||
| suggestions are received before -07 is posted, this not will be | suggestions are received before -07 is posted, this note will be | |||
| removed.]] | removed.]] | |||
| Users often have expectations about character matching or equivalence | Users often have expectations about character matching or equivalence | |||
| that are based on their own languages and the orthography of those | that are based on their own languages and the orthography of those | |||
| languages. These expectations may not be consistent with forms or | languages. These expectations may not be consistent with forms or | |||
| actions that can be naturally accommodated in a character coding | actions that can be naturally accommodated in a character coding | |||
| system, especially if multiple languages are written using the same | system, especially if multiple languages are written using the same | |||
| script but using different conventions. A Norwegian user might | script but using different conventions. A Norwegian user might | |||
| expect a label with the ae-ligature to be treated as the same label | expect a label with the ae-ligature to be treated as the same label | |||
| as one using the Swedish spelling with a-diaeresis even though | as one using the Swedish spelling with a-diaeresis even though | |||
| skipping to change at page 19, line 4 | skipping to change at page 20, line 33 | |||
| these situations in a system such as IDNA -- or with Unicode | these situations in a system such as IDNA -- or with Unicode | |||
| normalization generally -- since determining what to do requires | normalization generally -- since determining what to do requires | |||
| information about the language being used, context, or both. | information about the language being used, context, or both. | |||
| Consequently, these specifications make no attempt to treat these | Consequently, these specifications make no attempt to treat these | |||
| combined characters in any special way. However, their existence | combined characters in any special way. However, their existence | |||
| provides a prime example of a situation in which a registry that is | provides a prime example of a situation in which a registry that is | |||
| aware of the language context in which labels are to be registered, | aware of the language context in which labels are to be registered, | |||
| and where that language sometimes (or always) treats the two- | and where that language sometimes (or always) treats the two- | |||
| character sequences as equivalent to the combined form, should give | character sequences as equivalent to the combined form, should give | |||
| serious consideration to applying a "variant" model [RFC3743] | serious consideration to applying a "variant" model [RFC3743] | |||
| [RFC4290], or to prohibiting registration of one of the forms entirely, | [RFC4290], or to prohibiting registration of one of the forms entirely, | |||
| to reduce the opportunities for user confusion and fraud that would | to reduce the opportunities for user confusion and fraud that would | |||
| result from the related strings being registered to different | result from the related strings being registered to different | |||
| parties. | parties. | |||
| [[anchor14: Placeholder: A discussion of the Arabic digit issue | [[anchor14: Placeholder: A discussion of the Arabic digit issue | |||
| shoudl go here once it is resolved in some appropriate way.]] | should go here once it is resolved in some appropriate way.]] | |||
| 4.4. Case Mapping and Related Issues | 4.4. Case Mapping and Related Issues | |||
| In the DNS, ASCII letters are stored with their case preserved. | In the DNS, ASCII letters are stored with their case preserved. | |||
| Matching during the query process is case-independent, but none of | Matching during the query process is case-independent, but none of | |||
| the information that might be represented by choices of case has been | the information that might be represented by choices of case has been | |||
| lost. That model has been accidentally helpful because, as people | lost. That model has been accidentally helpful because, as people | |||
| have created DNS labels by catenating words (or parts of words) to | have created DNS labels by catenating words (or parts of words) to | |||
| form labels, case has often been used to distinguish among components | form labels, case has often been used to distinguish among components | |||
| and make the labels more memorable. | and make the labels more memorable. | |||
| skipping to change at page 21, line 24 | skipping to change at page 23, line 5 | |||
| If lookup applications, as a user interface (UI) or other local | If lookup applications, as a user interface (UI) or other local | |||
| matter, decide to warn about some strings that are valid under the | matter, decide to warn about some strings that are valid under the | |||
| global rules but that they perceive as dangerous, that is their | global rules but that they perceive as dangerous, that is their | |||
| prerogative and we can only hope that the market (and maybe | prerogative and we can only hope that the market (and maybe | |||
| regulators) will reinforce the good choices and discourage the poor | regulators) will reinforce the good choices and discourage the poor | |||
| ones. In this context, a lookup application that decides a string | ones. In this context, a lookup application that decides a string | |||
| that is valid under the protocol is dangerous and refuses to look it | that is valid under the protocol is dangerous and refuses to look it | |||
| up is in violation of the protocols; one that is willing to look | up is in violation of the protocols; one that is willing to look | |||
| something up, but warns against it, is exercising a local choice. | something up, but warns against it, is exercising a local choice. | |||
| 6. Front-end and User Interface Processing | 6. Front-end and User Interface Processing for Lookup | |||
| Domain names may be identified and processed in many contexts. They | Domain names may be identified and processed in many contexts. They | |||
| may be typed in by users either by themselves or embedded in an | may be typed in by users either by themselves or embedded in an | |||
| identifier structured for a particular protocol or class of protocols | identifier structured for a particular protocol or class of protocols | |||
| such as email addresses, URIs, or IRIs. They may occur in running | such as email addresses, URIs, or IRIs. They may occur in running | |||
| text or be processed by one system after being provided in another. | text or be processed by one system after being provided in another. | |||
| Systems may wish to try to normalize URLs so as to determine (or | Systems may wish to try to normalize URLs so as to determine (or | |||
| guess) whether a reference is valid or two references point to the | guess) whether a reference is valid or two references point to the | |||
| same object without actually looking the objects up and comparing | same object without actually looking the objects up and comparing | |||
| them (that is necessary, not just a choice, for URI types that are | them (that is necessary, not just a choice, for URI types that are | |||
| not intended to be resolved). Some of these goals may be more easily | not intended to be resolved). Some of these goals may be more easily | |||
| and reliably satisfied than others. While there are strong arguments | and reliably satisfied than others. While there are strong arguments | |||
| for any domain name that is placed "on the wire" -- transmitted | for any domain name that is placed "on the wire" -- transmitted | |||
| between systems -- to be in the minimum-ambiguity forms of A-labels, | between systems -- to be in the zero-ambiguity forms of A-labels, it | |||
| U-labels, or LDH-labels, it is inevitable that programs that process | is inevitable that programs that process domain names will encounter | |||
| domain names will encounter variant forms. | U-labels or variant forms. | |||
| One source of such forms will be labels created under IDNA2003 | One source of such forms will be labels created under IDNA2003 | |||
| because that protocol allowed labels that were transformed before | because that protocol allowed labels that were transformed from | |||
| they were turned from native-character into ACE ("xn--...") format by | native-character format by mapping some characters into others before | |||
| mapping some characters into other. One consequence of the | conversion into ACE ("xn--...") format. One consequence of the | |||
| transformations was that, when the ToUnicode and ToASCII operations | transformations was that, when the ToUnicode and ToASCII operations | |||
| of IDNA2003 were applied, ToUnicode(ToASCII(original-label)) often | of IDNA2003 were applied, ToUnicode(ToASCII(original-label)) often | |||
| did not produce the original label. IDNA2008 explicitly defines | did not produce the original label. IDNA2008 explicitly defines | |||
| A-labels and U-labels as different forms of the same abstract label, | A-labels and U-labels as different forms of the same abstract label, | |||
| forms that are stable when conversions are performed between them, | forms that are stable when conversions are performed between them | |||
| without mappings. A different way of explaining this is that there | (without mappings). A different way of explaining this is that there | |||
| are, today, domain names in files on the Internet that use characters | are, today, domain names in files on the Internet that use characters | |||
| that cannot be represented directly in, or recovered from, (A-label) | that cannot be represented directly in, or recovered from, (A-label) | |||
| domain names but for which interpretations are provided by IDNA2003. | domain names but for which interpretations are provided by IDNA2003. | |||
| There are two major categories of such characters, those that are | There are two major categories of such characters, those that are | |||
| removed by NFKC normalization and those upper-case characters that | removed by NFKC normalization and those upper-case characters that | |||
| are mapped to lower-case (there are also a few characters that are | are mapped to lower-case (there are also a few characters that are | |||
| given special-case mapping treatment in Stringprep including lower- | given special-case mapping treatment in Stringprep, including lower- | |||
| case characters that are case-folded into other lower-case characters | case characters that are case-folded into other lower-case characters | |||
| or strings). | or strings). | |||
| Other issues in domain name identification and processing arise | Other issues in domain name identification and processing arise | |||
| because IDNA2003 specified that several other characters be treated | because IDNA2003 specified that several other characters be treated | |||
| as equivalent to the ASCII period (dot, full stop) character used as | as equivalent to the ASCII period (dot, full stop) character used as | |||
| a label separator. If a string that might be a domain name appears | a label separator. If a string that might be a domain name appears | |||
| in an arbitrary context (such as running text), it is difficult, even | in an arbitrary context (such as running text), it is difficult, even | |||
| with only ASCII characters, to know whether an actual domain name (or | with only ASCII characters, to know whether an actual domain name (or | |||
| a protocol parameter like a URI) is present and where it starts and | a protocol parameter like a URI) is present and where it starts and | |||
| skipping to change at page 23, line 22 | skipping to change at page 24, line 48 | |||
| o Highly Localized Preprocessing. | o Highly Localized Preprocessing. | |||
| Unlike the case above, there will be some situations in which | Unlike the case above, there will be some situations in which | |||
| software will be highly localized for a particular environment and | software will be highly localized for a particular environment and | |||
| carefully adapted to the expectations of users in that | carefully adapted to the expectations of users in that | |||
| environment. The many discussions about using the Internet to | environment. The many discussions about using the Internet to | |||
| preserve and support local cultures suggest that these cases may | preserve and support local cultures suggest that these cases may | |||
| be more common in the future than they have been so far. | be more common in the future than they have been so far. | |||
| In these cases, we should avoid trying to tell implementers what | In these cases, we should avoid trying to tell implementers what | |||
| they should do, if only because they are quite likely (and for | they should accept, if only because they are quite likely (and for | |||
| good reason) to ignore us. We would assume that they would map | good reason) to ignore us. We would assume that they would map | |||
| characters that the intuitions of their users would suggest be | characters that the intuitions of their users would suggest be | |||
| mapped and would hope that they would do that mapping as early as | mapped and would hope that they would do that mapping as early as | |||
| possible, storing A-label or U-label forms in files and | possible, storing A-label or U-label forms in files and | |||
| transporting only those forms between systems. One can imagine | transporting only those forms between systems. One can imagine | |||
| switches about whether some sorts of mappings occur, warnings | switches about whether some sorts of mappings occur, warnings | |||
| before applying them or, in a slightly more extreme version of the | before applying them or, in a slightly more extreme version of the | |||
| approach taken in Internet Explorer version 7 (IE7), systems that | approach taken in Internet Explorer version 7 (IE7), systems that | |||
| utterly refuse to handle "strange" characters at all if they | utterly refuse to handle "strange" characters at all if they | |||
| appear in U-label form. None of those local decisions are a | appear in U-label form. None of those local decisions are a | |||
| skipping to change at page 24, line 8 | skipping to change at page 25, line 34 | |||
| globally or compare equal when crude methods (i.e., those not | globally or compare equal when crude methods (i.e., those not | |||
| conforming to the strict definition of label equivalence given in | conforming to the strict definition of label equivalence given in | |||
| [IDNA2008-Defs]) are used are those in which all native-script labels | [IDNA2008-Defs]) are used are those in which all native-script labels | |||
| are in U-label form. Forms that assume mapping will occur, | are in U-label form. Forms that assume mapping will occur, | |||
| especially forms that were not valid under IDNA2003, may or may not | especially forms that were not valid under IDNA2003, may or may not | |||
| function in predictable ways across all implementations. | function in predictable ways across all implementations. | |||
| User interfaces involving Latin-based scripts should take special | User interfaces involving Latin-based scripts should take special | |||
| care when considering how to handle case mapping because small | care when considering how to handle case mapping because small | |||
| differences in label strings may cause behavior that is astonishing | differences in label strings may cause behavior that is astonishing | |||
| to users. Because case-insensitive mapping is done for ASCII strings | to users. Because case-insensitive comparison is done for ASCII | |||
| by DNS-servers, an all-ASCII label is treated as case-insensitive. | strings by DNS-servers, an all-ASCII label is treated as case- | |||
| However, if even one of the characters of that string is replaced by | insensitive. However, if even one of the characters of that string | |||
| one that requires the label to be given IDN treatment (e.g., by | is replaced by one that requires the label to be given IDN treatment | |||
| adding a diacritical mark), then the label immediately becomes case- | (e.g., by adding a diacritical mark), then the label effectively | |||
| sensitive. This suggests that case mapping for Latin-based scripts | becomes case-sensitive because only lower-case characters are | |||
| (and possibly other scripts with case distinctions) as a | permitted in IDNs. This suggests that case mapping for Latin-based | |||
| | scripts (and possibly other scripts with case distinctions) as a | |||
| preprocessing matter in applications may be wise to prevent user | preprocessing matter in applications may be wise to prevent user | |||
| astonishment, but, since all applications may not do this and | astonishment, but, since all applications may not do this and | |||
| ambiguity in transport is not desirable, case-dependent | ambiguity in transport is not desirable, case-dependent | |||
| forms should not be stored in files. | forms should not be stored in files. | |||
| | The comments above apply only in operations that look up names or | |||
| | interpret files. There are several reasons why registration | |||
| | activities should require final names and verification of those names | |||
| | by the would-be registrant. | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization | 7. Migration from IDNA2003 and Unicode Version Synchronization | |||
| 7.1. Design Criteria | 7.1. Design Criteria | |||
| As mentioned above and in RFC 4690, two key goals of the IDNA2008 | As mentioned above and in RFC 4690, two key goals of the IDNA2008 | |||
| design are to enable applications to be agnostic about whether they | design are to enable applications to be agnostic about whether they | |||
| are being run in environments supporting any Unicode version from 3.2 | are being run in environments supporting any Unicode version from 3.2 | |||
| onward and to permit incrementally adding new characters, character | onward and to permit incrementally adding new characters, character | |||
| groups, scripts, and other character collections as they are | groups, scripts, and other character collections as they are | |||
| incorporated into Unicode, without disruption and, in the long term, | incorporated into Unicode, without disruption and, in the long term, | |||
| skipping to change at page 24, line 49 ¶ | skipping to change at page 26, line 34 ¶ | |||
| 7.1.1. General IDNA Validity Criteria | 7.1.1. General IDNA Validity Criteria | |||
| The general criteria for a putative label, and the collection of | The general criteria for a putative label, and the collection of | |||
| characters that make it up, to be considered IDNA-valid are (the | characters that make it up, to be considered IDNA-valid are (the | |||
| actual rules are rigorously defined in the "Protocol" and "Tables" | actual rules are rigorously defined in the "Protocol" and "Tables" | |||
| documents): | documents): | |||
| o The characters are "letters", marks needed to form letters, | o The characters are "letters", marks needed to form letters, | |||
| numerals, or other code points used to write words in some | numerals, or other code points used to write words in some | |||
| language. Symbols, drawing characters, and various notational | language. Symbols, drawing characters, and various notational | |||
| characters are permanently excluded -- some because they are | characters are intended to be permanently excluded -- some because | |||
| actively dangerous in URI, IRI, or similar contexts and others | they are harmful in URI, IRI, or similar contexts (e.g., | |||
| because there is no evidence that they are important enough to | characters that appear to be slashes or other reserved URI | |||
| Internet operations or internationalization to justify expansion | punctuation) and others because there is no evidence that they are | |||
| of domain names beyond the general principle of "letters, digits, | important enough to Internet operations or internationalization to | |||
| and hyphen" and the complexities that would come with it | justify expansion of domain names beyond the general principle of | |||
| (additional discussion and rationale for the symbol decision | "letters, digits, and hyphen" and the complexities that would come | |||
| appears in Section 7.6). | with it (additional discussion and rationale for the symbol | |||
| decision appears in Section 7.6). | ||||
| o Other than in very exceptional cases, e.g., where they are needed | o Other than in very exceptional cases, e.g., where they are needed | |||
| to write substantially any word of a given language, punctuation | to write substantially any word of a given language, punctuation | |||
| characters are excluded as well. The fact that a word exists is | characters are excluded as well. The fact that a word exists is | |||
| not proof that it should be usable in a DNS label and DNS labels | not proof that it should be usable in a DNS label and DNS labels | |||
| are not expected to be usable for multiple-word phrases (although | are not expected to be usable for multiple-word phrases (although | |||
| they are certainly not prohibited if the conventions and | they are certainly not prohibited if the conventions and | |||
| orthography of a particular language cause that to be possible). | orthography of a particular language cause that to be possible). | |||
| Even for English, very common constructions -- contractions like | Even for English, very common constructions -- contractions like | |||
| "don't" or "it's", names that are written with apostrophes such as | "don't" or "it's", names that are written with apostrophes such as | |||
| "O'Reilly", or characters for which apostrophes are common | "O'Reilly", or characters for which apostrophes are common | |||
| substitutes cannot be represented in DNS labels. Words in English | substitutes cannot be represented in DNS labels. Words in English | |||
| whose usually-preferred spellings include diacritical marks cannot | whose usually-preferred spellings include diacritical marks cannot | |||
| be represented under the original hostname rules, but most can be | be represented under the original hostname rules, but most can be | |||
| represented if treated as IDNs. | represented if treated as IDNs. | |||
| o Characters that are unassigned (have no character assignment at | o Characters that are unassigned (have no character assignment at | |||
| all) in the version of Unicode being used by the registry or | all) in the version of Unicode being used by the registry or | |||
| application are not permitted, even on lookup. There are at least | application are not permitted, even on lookup. The issues | |||
| two reasons for this. | involved in this decision are discussed in Section 7.7. | |||
| * Tests involving the context of characters (e.g., some | ||||
| characters being permitted only adjacent to ones of specific | ||||
| types but otherwise invisible or very problematic for other | ||||
| reasons) and integrity tests on complete labels are needed. | ||||
| Unassigned code points cannot be permitted because one cannot | ||||
| determine whether particular code points will require | ||||
| contextual rules (and what those rules should be) before | ||||
| characters are assigned to them and the properties of those | ||||
| characters fully understood. | ||||
| * Unicode specifies that an unassigned code point normalizes (and | ||||
| case folds) to itself. If the code point is later assigned to | ||||
| a character, and particularly if the newly-assigned code point | ||||
| has a combining class that determines its placement relative to | ||||
| other combining characters, it could normalize to some other | ||||
| code point or sequence, creating confusion and/or violating | ||||
| other rules listed here. | ||||
| o Any character that is mapped to another character by a current | o Any character that is mapped to another character by a current | |||
| version of NFKC is prohibited as input to IDNA (for either | version of NFKC is prohibited as input to IDNA (for either | |||
| registration or lookup). With a few exceptions, this principle | registration or lookup). With a few exceptions, this principle | |||
| excludes any character mapped to another by Nameprep [RFC3491]. | excludes any character mapped to another by Nameprep [RFC3491]. | |||
| Tables used to identify the characters that are IDNA-valid are | Tables used to identify the characters that are IDNA-valid are | |||
| expected to be driven by the principles above, principles that are | expected to be driven by the principles above, principles that are | |||
| specified exactly in [IDNA2008-Tables]. The rules given there are | specified exactly in [IDNA2008-Tables]. The rules given there are | |||
| normative, rather than being just an interpretation of the tables. | normative, rather than being just an interpretation of the tables. | |||
| skipping to change at page 29, line 38 ¶ | skipping to change at page 31, line 7 ¶ | |||
| In principle, lookup applications could also compensate for the | In principle, lookup applications could also compensate for the | |||
| difference in interpretation by looking up the string according to | difference in interpretation by looking up the string according to | |||
| the interpretation specified in these documents and then, if that | the interpretation specified in these documents and then, if that | |||
| failed, doing the lookup with the mapping, simulating the IDNA2003 | failed, doing the lookup with the mapping, simulating the IDNA2003 | |||
| interpretation. The risk of false positives is such that this is | interpretation. The risk of false positives is such that this is | |||
| generally to be discouraged unless the application is able to engage | generally to be discouraged unless the application is able to engage | |||
| in an "is this what you meant" dialogue with the end user. | in an "is this what you meant" dialogue with the end user. | |||
| 7.3. More Flexibility in User Agents | 7.3. More Flexibility in User Agents | |||
| These specifications do not include mappings between one character or | These documents do not specify mappings between one character or code | |||
| code point and others for any reason. Instead, they prohibit the | point and others for any reason. Instead, they prohibit the | |||
| characters that would be mapped to others by normalization, upper | characters that would be mapped to others by normalization, upper | |||
| case to lower case changes, or other rules. As examples, while | case to lower case changes, or other rules. As examples, while | |||
| mathematical characters based on Latin ones are accepted as input to | mathematical characters based on Latin ones are accepted as input to | |||
| IDNA2003, they are prohibited in IDNA2008. Similarly, double-width | IDNA2003, they are prohibited in IDNA2008. Similarly, double-width | |||
| characters and other variations are prohibited as IDNA input. | characters and other variations are prohibited as IDNA input. | |||
| Since the rules in [IDNA2008-Tables] have the effect that only | Since the rules in [IDNA2008-Tables] have the effect that only | |||
| strings that are not transformed by NFKC are valid, if an application | strings that are not transformed by NFKC are valid, if an application | |||
| chooses to perform NFKC normalization before lookup, that operation | chooses to perform NFKC normalization before lookup, that operation | |||
| is safe since this will never make the application unable to look up | is safe since this will never make the application unable to look up | |||
| skipping to change at page 32, line 24 ¶ | skipping to change at page 33, line 39 ¶ | |||
| 2. Adjustments in IDNA tables or actions, including normalization | 2. Adjustments in IDNA tables or actions, including normalization | |||
| definitions, that affect characters that were already invalid | definitions, that affect characters that were already invalid | |||
| under IDNA2003. | under IDNA2003. | |||
| 3. Changes in the style of the IDNA definition that do not alter | 3. Changes in the style of the IDNA definition that do not alter | |||
| the actions performed by IDNA. | the actions performed by IDNA. | |||
| 7.4.3. Implications of Prefix Changes | 7.4.3. Implications of Prefix Changes | |||
| While it might be possible to make a prefix change, the costs of such | While it might be possible to make a prefix change, the costs of such | |||
| a change are considerable. Even if they wanted to do so, all | a change are considerable. Even if they wanted to do so, registries | |||
| registries could not convert all IDNA2003 ("xn--") registrations to a | could not convert all IDNA2003 ("xn--") registrations to a new form | |||
| new form at the same time and synchronize that change with | at the same time and synchronize that change with applications | |||
| applications supporting lookup. Unless all existing registrations | supporting lookup. Unless all existing registrations were simply to | |||
| were simply to be declared invalid (and perhaps even then) systems | be declared invalid (and perhaps even then) systems that needed to | |||
| that needed to support both labels with old prefixes and labels with | support both labels with old prefixes and labels with new ones would | |||
| new ones would first process a putative label under the IDNA2008 | first process a putative label under the IDNA2008 rules and try to | |||
| rules and try to look it up and then, if it were not found, would | look it up and then, if it were not found, would process the label | |||
| process the label under IDNA2003 rules and look it up again. That | under IDNA2003 rules and look it up again. That process could | |||
| process could significantly slow down all processing that involved | significantly slow down all processing that involved IDNs in the DNS | |||
| IDNs in the DNS especially since, in principle, a fully-qualified | especially since, in principle, a fully-qualified name could contain | |||
| name could contain a mixture of labels that were registered with the | a mixture of labels that were registered with the old and new | |||
| old and new prefixes, a situation that would make the use of DNS | prefixes, a situation that would make the use of DNS caching very | |||
| caching very difficult. In addition, looking up the same input | difficult. In addition, looking up the same input string as two | |||
| string as two separate A-labels would create some potential for | separate A-labels would create some potential for confusion and | |||
| confusion and attacks, since they could, in principle, map to | attacks, since they could, in principle, map to different targets and | |||
| different targets and then resolve to different entries in the DNS. | then resolve to different entries in the DNS. | |||
| Consequently, a prefix change is to be avoided if at all possible, | Consequently, a prefix change is to be avoided if at all possible, | |||
| even if it means accepting some IDNA2003 decisions about character | even if it means accepting some IDNA2003 decisions about character | |||
| distinctions as irreversible and/or giving special treatment to edge | distinctions as irreversible and/or giving special treatment to edge | |||
| cases. | cases. | |||
| 7.5. Stringprep Changes and Compatibility | 7.5. Stringprep Changes and Compatibility | |||
| The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | |||
| profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | |||
| skipping to change at page 34, line 13 ¶ | skipping to change at page 35, line 30 ¶ | |||
| read such a logo as "I love..." or "I heart...", considerable | read such a logo as "I love..." or "I heart...", considerable | |||
| knowledge of the coding distinctions made in Unicode is needed to | knowledge of the coding distinctions made in Unicode is needed to | |||
| know that there is more than one "heart" character (e.g., U+2665, | know that there is more than one "heart" character (e.g., U+2665, | |||
| U+2661, and U+2765) and how to describe it. These issues are of | U+2661, and U+2765) and how to describe it. These issues are of | |||
| particular importance if strings are expected to be understood or | particular importance if strings are expected to be understood or | |||
| transcribed by the listener after being read out loud. | transcribed by the listener after being read out loud. | |||
| [[anchor20: The above paragraph remains controversial as to | [[anchor20: The above paragraph remains controversial as to | |||
| whether it is valid. The WG will need to make a decision if this | whether it is valid. The WG will need to make a decision if this | |||
| section is not dropped entirely.]] | section is not dropped entirely.]] | |||
| o Consider the case of a screen reader used by blind Internet users | ||||
| who must listen to renderings of IDN domain names and possibly | ||||
| reproduce them on the keyboard. | ||||
| o As a simplified example of this, assume one wanted to use a | o As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know -- absent careful study of | would-be registrant has no way to know -- absent careful study of | |||
| the code tables -- whether it is ambiguous (e.g., where there are | the code tables -- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| the hypothetical label doesn't know whether to read it -- try to | the hypothetical label doesn't know whether to read it -- try to | |||
| transmit it to a colleague by voice -- as "heart", as "love", as | transmit it to a colleague by voice -- as "heart", as "love", as | |||
| "black heart", or as any of the other examples below. | "black heart", or as any of the other examples below. | |||
| skipping to change at page 35, line 11 ¶ | skipping to change at page 36, line 32 ¶ | |||
| In IDNA2003, labels containing unassigned code points are looked up | In IDNA2003, labels containing unassigned code points are looked up | |||
| on the assumption that, if they appear in labels and can be mapped | on the assumption that, if they appear in labels and can be mapped | |||
| and then resolved, the relevant standards must have changed and the | and then resolved, the relevant standards must have changed and the | |||
| registry has properly allocated only assigned values. | registry has properly allocated only assigned values. | |||
| In the protocol as described in these documents, strings containing | In the protocol as described in these documents, strings containing | |||
| unassigned code points must not be either looked up or registered. | unassigned code points must not be either looked up or registered. | |||
| There are several reasons for this, with the most important ones | There are several reasons for this, with the most important ones | |||
| being: | being: | |||
| o It cannot be known with sufficient reliability in advance that a | o It cannot be known in advance, and with sufficient reliability, | |||
| code point that was not previously assigned will not be assigned | that a code point that was not previously assigned will not be | |||
| to a compatibility character or one that would be otherwise | assigned to a compatibility character or one that would be | |||
| disallowed by the rules in [IDNA2008-Tables]. In IDNA2003, since | otherwise disallowed by the rules in [IDNA2008-Tables]. In | |||
| there is no direct dependency on NFKC (Stringprep's tables are | IDNA2003, since there is no direct dependency on NFKC | |||
| based on NFKC, but IDNA2003 depends only on Stringprep), | (Stringprep's tables are based on NFKC, but IDNA2003 depends only | |||
| allocation of a compatibility character might produce some odd | on Stringprep), allocation of a compatibility character might | |||
| situations, but it would not be a problem. In IDNA2008, where | produce some odd situations, but it would not be a problem. In | |||
| compatibility characters are generally assigned to DISALLOWED, | IDNA2008, where compatibility characters are assigned to | |||
| DISALLOWED unless character-specific exceptions are made, | ||||
| permitting strings containing unassigned characters to be looked | permitting strings containing unassigned characters to be looked | |||
| up would permit violating the principle that characters in | up would permit violating the principle that characters in | |||
| DISALLOWED are not looked up. | DISALLOWED are not looked up. | |||
| o The Unicode Standard specifies that an unassigned code point | ||||
| normalizes (and, where relevant, case folds) to itself. If the | ||||
| code point is later assigned to a character, and particularly if | ||||
| the newly-assigned code point has a combining class that | ||||
| determines its placement relative to other combining characters, | ||||
| it could normalize to some other code point or sequence, creating | ||||
| confusion and/or violating other rules listed here. | ||||
| o Tests involving the context of characters (e.g., some characters | ||||
| being permitted only adjacent to ones of specific types but | ||||
| otherwise invisible or very problematic for other reasons) and | ||||
| integrity tests on complete labels are needed. Unassigned code | ||||
| points cannot be permitted because one cannot determine whether | ||||
| particular code points will require contextual rules (and what | ||||
| those rules should be) before characters are assigned to them and | ||||
| the properties of those characters fully understood. | ||||
| o More generally, the status of an unassigned character with regard | o More generally, the status of an unassigned character with regard | |||
| to the DISALLOWED and PROTOCOL-VALID categories, and whether | to the DISALLOWED and PROTOCOL-VALID categories, and whether | |||
| contextual rules are required with the latter, cannot be evaluated | contextual rules are required with the latter, cannot be evaluated | |||
| until a character is actually assigned and known. By contrast, | until a character is actually assigned and known. By contrast, | |||
| characters that are actually DISALLOWED are placed in that | characters that are actually DISALLOWED are placed in that | |||
| category only as a consequence of rules applied to known | category only as a consequence of rules applied to known | |||
| properties or per-character evaluation. | properties or per-character evaluation. | |||
| Another way to look at this is that permitting an unassigned | ||||
| character to be looked up is nearly equivalent to reclassifying a | ||||
| character from DISALLOWED to PROTOCOL-VALID since different systems | ||||
| will interpret the character in different ways. | ||||
| It is possible to argue that the issues above are not important and | It is possible to argue that the issues above are not important and | |||
| that, as a consequence, it is better to retain the principle of | that, as a consequence, it is better to retain the principle of | |||
| looking up labels even if they contain unassigned characters because | looking up labels even if they contain unassigned characters because | |||
| all of the important scripts and characters have been coded as of | all of the important scripts and characters have been coded as of | |||
| Unicode 5.1 and hence unassigned code points will be assigned only to | Unicode 5.1 and hence unassigned code points will be assigned only to | |||
| obscure characters or archaic scripts. Unfortunately, that does not | obscure characters or archaic scripts. Unfortunately, that does not | |||
| appear to be a safe assumption for at least two reasons. First, much | appear to be a safe assumption for at least two reasons. First, much | |||
| the same claim of completeness has been made for earlier versions of | the same claim of completeness has been made for earlier versions of | |||
| Unicode. The reality is that a script that is obscure to much of the | Unicode. The reality is that a script that is obscure to much of the | |||
| world may still be very important to those who use it. Cultural and | world may still be very important to those who use it. Cultural and | |||
| skipping to change at page 36, line 18 ¶ | skipping to change at page 38, line 12 ¶ | |||
| containing that character but that is otherwise in ASCII is not | containing that character but that is otherwise in ASCII is not | |||
| really an IDN (in the U-label sense defined above) at all. After | really an IDN (in the U-label sense defined above) at all. After | |||
| Nameprep maps the Eszett out, the result is an ASCII string and so | Nameprep maps the Eszett out, the result is an ASCII string and so | |||
| does not get an xn-- prefix, but the string that can be displayed to | does not get an xn-- prefix, but the string that can be displayed to | |||
| a user appears to be an IDN. The newer version of the protocol | a user appears to be an IDN. The newer version of the protocol | |||
| eliminates this artifact. A character is either permitted as itself | eliminates this artifact. A character is either permitted as itself | |||
| or it is prohibited; special cases that make sense only in a | or it is prohibited; special cases that make sense only in a | |||
| particular linguistic or cultural context can be dealt with as | particular linguistic or cultural context can be dealt with as | |||
| localization matters where appropriate. | localization matters where appropriate. | |||
| 8. Acknowledgments | 8. Name Server Considerations | |||
| The editor and contributors would like to express their thanks to | 8.1. Processing Non-ASCII Strings | |||
| those who contributed significant early (pre-WG) review comments, | ||||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | ||||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | ||||
| were incorporated from suggestions, text, or comments about sections | ||||
| that were unclear supplied by Frank Ellerman, Michael Everson, Asmus | ||||
| Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, | ||||
| although, as usual, they bear little or no responsibility for the | ||||
| conclusions the editor and contributors reached after receiving their | ||||
| suggestions. Thanks are also due to Vint Cerf, Debbie Garside, and | ||||
| Jefsey Morphin for conversations that led to considerable | ||||
| improvements in the content of this document. | ||||
| A meeting was held on 30 January 2008 to attempt to reconcile | Existing DNS servers do not know the IDNA rules for handling non- | |||
| differences in perspective and terminology about this set of | ASCII forms of IDNs, and therefore need to be shielded from them. | |||
| specifications between the design team and members of the Unicode | All existing channels through which names can enter a DNS server | |||
| Technical Consortium. The discussions at and subsequent to that | database (for example, master files (as described in RFC 1034) and | |||
| meeting were very helpful in focusing the issues and in refining the | DNS update messages [RFC2136]) are IDN-unaware because they predate | |||
| specifications. The active participants at that meeting were (in | IDNA. Other sections of this document provide the needed shielding | |||
| alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | by ensuring that internationalized domain names entering DNS server | |||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | databases through such channels have already been converted to their | |||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | equivalent ASCII A-label forms. | |||
| Michel Suignard, and Ken Whistler. We express our thanks to Google | ||||
| for support of that meeting and to the participants for their | ||||
| contributions. | ||||
| Useful comments and text on the WG versions of the draft were | Because of the distinction made between the algorithms for | |||
| received from many participants in the IETF "IDNABIS" WG and a number | Registration and Lookup in [IDNA2008-Protocol] (a domain name | |||
| of document changes resulted from mailing list discussions made by | containing only ASCII codepoints can not be converted to an A-label), | |||
| that group. Marcos Sanz provided specific analysis and suggestions | there can not be more than one A-label form for any given U-label. | |||
| that were exceptionally helpful in refining the text, as did Vint | ||||
| Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan. | ||||
| 9. Contributors | As specified in RFC 2181 [RFC2181], the DNS protocol explicitly | |||
| allows domain labels to contain octets beyond the ASCII range | ||||
| (0000..007F), and this document does not change that. Note, however, | ||||
| that there is no defined interpretation of octets 0080..00FF as | ||||
| characters. If labels containing these octets are returned to | ||||
| applications, unpredictable behavior could result. The A-label form, | ||||
| which cannot contain those characters, is the only standard | ||||
| representation for internationalized labels in the DNS protocol. | ||||
| While the listed editor held the pen, the core of this document and | 8.2. DNSSEC Authentication of IDN Domain Names | |||
| the initial WG version represents the joint work and conclusions of | ||||
| an ad hoc design team consisting of the editor and, in alphabetic | ||||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | ||||
| In addition, there were many specific contributions and helpful | ||||
| comments from those listed in the Acknowledgments section and others | ||||
| who have contributed to the development and use of the IDNA | ||||
| protocols. | ||||
| 10. Internationalization Considerations | DNS Security (DNSSEC) [RFC2535] is a method for supplying | |||
| cryptographic verification information along with DNS messages. | ||||
| Public Key Cryptography is used in conjunction with digital | ||||
| signatures to provide a means for a requester of domain information | ||||
| to authenticate the source of the data. This ensures that it can be | ||||
| traced back to a trusted source, either directly or via a chain of | ||||
| trust linking the source of the information to the top of the DNS | ||||
| hierarchy. | ||||
| IDNA specifies that all internationalized domain names served by DNS | ||||
| servers that cannot be represented directly in ASCII MUST use the | ||||
| A-label form. Conversion to A-labels MUST be performed prior to a | ||||
| zone being signed by the private key for that zone. Because of this | ||||
| ordering, it is important to recognize that DNSSEC authenticates a | ||||
| domain name containing A-labels or conventional LDH-labels, not | ||||
| U-labels. In the presence of DNSSEC, no form of a zone file or query | ||||
| response that contains a U-label may be signed or the signature | ||||
| validated. | ||||
| One consequence of this for sites deploying IDNA in the presence of | ||||
| DNSSEC is that any special purpose proxies or forwarders used to | ||||
| transform user input into IDNs must be earlier in the lookup flow | ||||
| than DNSSEC authenticating nameservers for DNSSEC to work. | ||||
| 8.3. Root and other DNS Server Considerations | ||||
| IDNs in A-label form will generally be somewhat longer than current | ||||
| domain names, so the bandwidth needed by the root servers is likely | ||||
| to go up by a small amount. Also, queries and responses for IDNs | ||||
| will probably be somewhat longer than typical queries historically, | ||||
| so EDNS0 [RFC2671] support may be more important (otherwise, queries | ||||
| and responses may be forced to go to TCP instead of UDP). | ||||
| 9. Internationalization Considerations | ||||
| DNS labels and fully-qualified domain names provide mnemonics that | DNS labels and fully-qualified domain names provide mnemonics that | |||
| assist in identifying and referring to resources on the Internet. | assist in identifying and referring to resources on the Internet. | |||
| IDNs expand the range of those mnemonics to include those based on | IDNs expand the range of those mnemonics to include those based on | |||
| languages and character sets other than Western European and Roman- | languages and character sets other than Western European and Roman- | |||
| derived ones. But domain "names" are not, in general, words in any | derived ones. But domain "names" are not, in general, words in any | |||
| language. The recommendations of the IETF policy on character sets | language. The recommendations of the IETF policy on character sets | |||
| and languages, BCP 18 [RFC2277] are applicable to situations in which | and languages, BCP 18 [RFC2277] are applicable to situations in which | |||
| language identification is used to provide language-specific | language identification is used to provide language-specific | |||
| contexts. The DNS is, by contrast, global and international and | contexts. The DNS is, by contrast, global and international and | |||
| ultimately has nothing to do with languages. Adding languages (or | ultimately has nothing to do with languages. Adding languages (or | |||
| similar context) to IDNs generally, or to DNS matching in particular, | similar context) to IDNs generally, or to DNS matching in particular, | |||
| would imply context dependent matching in DNS, which would be a very | would imply context dependent matching in DNS, which would be a very | |||
| significant change to the DNS protocol itself. It would also imply | significant change to the DNS protocol itself. It would also imply | |||
| that users would need to identify the language associated with a | that users would need to identify the language associated with a | |||
| particular label in order to look that label up, a decision that | particular label in order to look that label up, a decision that | |||
| would be impossible in many or most cases. | would be impossible in many or most cases. | |||
| 11. IANA Considerations | 10. IANA Considerations | |||
| This section gives an overview of registries required for IDNA. The | This section gives an overview of registries required for IDNA. The | |||
| actual definitions of the first two appear in [IDNA2008-Tables]. | actual definitions of the first two appear in [IDNA2008-Tables]. | |||
| 11.1. IDNA Character Registry | 10.1. IDNA Character Registry | |||
| The distinction among the three major categories "UNASSIGNED", | The distinction among the three major categories "UNASSIGNED", | |||
| "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and | "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and | |||
| rules that are integral elements of [IDNA2008-Tables]. Convenience | rules that are integral elements of [IDNA2008-Tables]. Convenience | |||
| in programming and validation requires a registry of characters and | in programming and validation requires a registry of characters and | |||
| scripts and their categories, updated for each new version of Unicode | scripts and their categories, updated for each new version of Unicode | |||
| and the characters it contains. The details of this registry are | and the characters it contains. The details of this registry are | |||
| specified in [IDNA2008-Tables]. | specified in [IDNA2008-Tables]. | |||
| 11.2. IDNA Context Registry | 10.2. IDNA Context Registry | |||
| For characters that are defined in the IDNA Character Registry list | For characters that are defined in the IDNA Character Registry list | |||
| as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of | as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of | |||
| rule described in Section 3.1.1.1), IANA will create and maintain a | rule described in Section 3.1.1.1), IANA will create and maintain a | |||
| list of approved contextual rules. The details for those rules | list of approved contextual rules. The details for those rules | |||
| appear in [IDNA2008-Tables]. | appear in [IDNA2008-Tables]. | |||
| 11.3. IANA Repository of IDN Practices of TLDs | 10.3. IANA Repository of IDN Practices of TLDs | |||
| This registry, historically described as the "IANA Language Character | This registry, historically described as the "IANA Language Character | |||
| Set Registry" or "IANA Script Registry" (both somewhat misleading | Set Registry" or "IANA Script Registry" (both somewhat misleading | |||
| terms) is maintained by IANA at the request of ICANN. It is used to | terms) is maintained by IANA at the request of ICANN. It is used to | |||
| provide a central documentation repository of the IDN policies used | provide a central documentation repository of the IDN policies used | |||
| by top level domain (TLD) registries who volunteer to contribute to | by top level domain (TLD) registries who volunteer to contribute to | |||
| it and is used in conjunction with ICANN Guidelines for IDN use. | it and is used in conjunction with ICANN Guidelines for IDN use. | |||
| It is not an IETF-managed registry and, while the protocol changes | It is not an IETF-managed registry and, while the protocol changes | |||
| specified here may call for some revisions to the tables, these | specified here may call for some revisions to the tables, these | |||
| specifications have no direct effect on that registry and no IANA | specifications have no direct effect on that registry and no IANA | |||
| action is required as a result. | action is required as a result. | |||
| 12. Security Considerations | 11. Security Considerations | |||
| 12.1. General Security Issues with IDNA | 11.1. General Security Issues with IDNA | |||
| This document in the IDNA2008 series is purely explanatory and | This document in the IDNA2008 series is purely explanatory and | |||
| informational and consequently introduces no new security issues. It | informational and consequently introduces no new security issues. It | |||
| would, of course, be a poor idea for someone to try to implement from | would, of course, be a poor idea for someone to try to implement from | |||
| it; such an attempt would almost certainly lead to interoperability | it; such an attempt would almost certainly lead to interoperability | |||
| problems and might lead to security ones. A discussion of security | problems and might lead to security ones. A discussion of security | |||
| issues with IDNA, including some relevant history, appears in | issues with IDNA, including some relevant history, appears in | |||
| [IDNA2008-Defs]. | [IDNA2008-Defs]. | |||
| 13. References | 12. Acknowledgments | |||
| 13.1. Normative References | The editor and contributors would like to express their thanks to | |||
| those who contributed significant early (pre-WG) review comments, | ||||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | ||||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | ||||
| were incorporated from suggestions, text, or comments about sections | ||||
| that were unclear supplied by Vint Cerf, Frank Ellerman, Michael | ||||
| Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken | ||||
| Whistler, although, as usual, they bear little or no responsibility | ||||
| for the conclusions the editor and contributors reached after | ||||
| receiving their suggestions. Thanks are also due to Vint Cerf, | ||||
| Debbie Garside, and Jefsey Morfin for conversations that led to | ||||
| considerable improvements in the content of this document. | ||||
| A meeting was held on 30 January 2008 to attempt to reconcile | ||||
| differences in perspective and terminology about this set of | ||||
| specifications between the design team and members of the Unicode | ||||
| Technical Consortium. The discussions at and subsequent to that | ||||
| meeting were very helpful in focusing the issues and in refining the | ||||
| specifications. The active participants at that meeting were (in | ||||
| alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | ||||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | ||||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | ||||
| Michel Suignard, and Ken Whistler. We express our thanks to Google | ||||
| for support of that meeting and to the participants for their | ||||
| contributions. | ||||
| Useful comments and text on the WG versions of the draft were | ||||
| received from many participants in the IETF "IDNABIS" WG and a number | ||||
| of document changes resulted from mailing list discussions within | ||||
| that group. Marcos Sanz provided specific analysis and suggestions | ||||
| that were exceptionally helpful in refining the text, as did Vint | ||||
| Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan. | ||||
| 13. Contributors | ||||
| While the listed editor held the pen, the core of this document and | ||||
| the initial WG version represent the joint work and conclusions of | ||||
| an ad hoc design team consisting of the editor and, in alphabetic | ||||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | ||||
| In addition, there were many specific contributions and helpful | ||||
| comments from those listed in the Acknowledgments section and others | ||||
| who have contributed to the development and use of the IDNA | ||||
| protocols. | ||||
| 14. References | ||||
| 14.1. Normative References | ||||
| [ASCII] American National Standards Institute (formerly United | [ASCII] American National Standards Institute (formerly United | |||
| States of America Standards Institute), "USA Code for | States of America Standards Institute), "USA Code for | |||
| Information Interchange", ANSI X3.4-1968, 1968. | Information Interchange", ANSI X3.4-1968, 1968. | |||
| ANSI X3.4-1968 has been replaced by newer versions with | ANSI X3.4-1968 has been replaced by newer versions with | |||
| slight modifications, but the 1968 version remains | slight modifications, but the 1968 version remains | |||
| definitive for the Internet. | definitive for the Internet. | |||
| [IDNA2008-Bidi] | [IDNA2008-Bidi] | |||
| skipping to change at page 40, line 5 ¶ | skipping to change at page 43, line 15 ¶ | |||
| [Unicode51] | [Unicode51] | |||
| The Unicode Consortium, "The Unicode Standard, Version | The Unicode Consortium, "The Unicode Standard, Version | |||
| 5.1.0", 2008. | 5.1.0", 2008. | |||
| defined by: The Unicode Standard, Version 5.0, Boston, MA, | defined by: The Unicode Standard, Version 5.0, Boston, MA, | |||
| Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | |||
| Unicode 5.1.0 | Unicode 5.1.0 | |||
| (http://www.unicode.org/versions/Unicode5.1.0/). | (http://www.unicode.org/versions/Unicode5.1.0/). | |||
| 13.2. Informative References | 14.2. Informative References | |||
| [BIG5] Institute for Information Industry of Taiwan, "Computer | [BIG5] Institute for Information Industry of Taiwan, "Computer | |||
| Chinese Glyph and Character Code Mapping Table, Technical | Chinese Glyph and Character Code Mapping Table, Technical | |||
| Report C-26", 1984. | Report C-26", 1984. | |||
| There are several forms and variations and a closely- | There are several forms and variations and a closely- | |||
| related standard, CNS 11643. See the discussion in | related standard, CNS 11643. See the discussion in | |||
| Chapter 3 of Lunde, K., CJKV Information Processing, | Chapter 3 of Lunde, K., CJKV Information Processing, | |||
| O'Reilly & Associates, 1999 | O'Reilly & Associates, 1999 | |||
| skipping to change at page 40, line 36 ¶ | skipping to change at page 43, line 46 ¶ | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, November 1987. | STD 13, RFC 1034, November 1987. | |||
| [RFC1035] Mockapetris, P., "Domain names - implementation and | [RFC1035] Mockapetris, P., "Domain names - implementation and | |||
| specification", STD 13, RFC 1035, November 1987. | specification", STD 13, RFC 1035, November 1987. | |||
| [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | |||
| and Support", STD 3, RFC 1123, October 1989. | and Support", STD 3, RFC 1123, October 1989. | |||
| [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, | ||||
| "Dynamic Updates in the Domain Name System (DNS UPDATE)", | ||||
| RFC 2136, April 1997. | ||||
| [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS | [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS | |||
| Specification", RFC 2181, July 1997. | Specification", RFC 2181, July 1997. | |||
| [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | |||
| Languages", BCP 18, RFC 2277, January 1998. | Languages", BCP 18, RFC 2277, January 1998. | |||
| [RFC2535] Eastlake, D., "Domain Name System Security Extensions", | ||||
| RFC 2535, March 1999. | ||||
| [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", | ||||
| RFC 2671, August 1999. | ||||
| [RFC2673] Crawford, M., "Binary Labels in the Domain Name System", | [RFC2673] Crawford, M., "Binary Labels in the Domain Name System", | |||
| RFC 2673, August 1999. | RFC 2673, August 1999. | |||
| [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | |||
| specifying the location of services (DNS SRV)", RFC 2782, | specifying the location of services (DNS SRV)", RFC 2782, | |||
| February 2000. | February 2000. | |||
| [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | |||
| Internationalized Strings ("stringprep")", RFC 3454, | Internationalized Strings ("stringprep")", RFC 3454, | |||
| December 2002. | December 2002. | |||
| skipping to change at page 44, line 18 ¶ | skipping to change at page 47, line 38 ¶ | |||
| may be a more appropriate reference than one containing a year. | may be a more appropriate reference than one containing a year. | |||
| As discussed on the mailing list, we can and should discuss how to | As discussed on the mailing list, we can and should discuss how to | |||
| refer to these documents at an appropriate time (e.g., when we | refer to these documents at an appropriate time (e.g., when we | |||
| know when we will be finished) but, in the interim, it seems | know when we will be finished) but, in the interim, it seems | |||
| appropriate to simply start getting rid of the version-specific | appropriate to simply start getting rid of the version-specific | |||
| terminology where it can naturally be removed. | terminology where it can naturally be removed. | |||
| o Additional discussion of mappings, etc., especially for case- | o Additional discussion of mappings, etc., especially for case- | |||
| sensitivity. | sensitivity. | |||
| o Clarified relationship to base DNS specifications. | ||||
| o Consolidated discussion of lookup of unassigned characters. | ||||
| o More editorial fine-tuning. | o More editorial fine-tuning. | |||
| A.7. Version -07 | ||||
| o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label | ||||
| (or False-A-label), R-LDH-label, valid IDNA-label in | ||||
| Section 1.3.3. | ||||
| o Moved the "name server considerations" material to this document | ||||
| from Protocol because it is non-normative and not part of the | ||||
| protocol itself. | ||||
| o To improve clarity, redid discussion of the reasons why looking up | ||||
| unassigned code points is prohibited. | ||||
| o Editorial and other non-substantive corrections to reflect earlier | ||||
| errors as well as new definitions and terminology. | ||||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| Email: john+ietf@jck.com | Email: john+ietf@jck.com | |||
| Full Copyright Statement | ||||
| Copyright (C) The IETF Trust (2008). | ||||
| This document is subject to the rights, licenses and restrictions | ||||
| contained in BCP 78, and except as set forth therein, the authors | ||||
| retain all their rights. | ||||
| This document and the information contained herein are provided on an | ||||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | ||||
| THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | ||||
| OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | ||||
| THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
| Intellectual Property | ||||
| The IETF takes no position regarding the validity or scope of any | ||||
| Intellectual Property Rights or other rights that might be claimed to | ||||
| pertain to the implementation or use of the technology described in | ||||
| this document or the extent to which any license under such rights | ||||
| might or might not be available; nor does it represent that it has | ||||
| made any independent effort to identify any such rights. Information | ||||
| on the procedures with respect to rights in RFC documents can be | ||||
| found in BCP 78 and BCP 79. | ||||
| Copies of IPR disclosures made to the IETF Secretariat and any | ||||
| assurances of licenses to be made available, or the result of an | ||||
| attempt made to obtain a general license or permission for the use of | ||||
| such proprietary rights by implementers or users of this | ||||
| specification can be obtained from the IETF on-line IPR repository at | ||||
| http://www.ietf.org/ipr. | ||||
| The IETF invites any interested party to bring to its attention any | ||||
| copyrights, patents or patent applications, or other proprietary | ||||
| rights that may cover technology that may be required to implement | ||||
| this standard. Please address the information to the IETF at | ||||
| ietf-ipr@ietf.org. | ||||
| End of changes. 65 change blocks. | ||||
| 241 lines changed or deleted | 401 lines changed or added | |||