| draft-ietf-idnabis-rationale-01.txt | draft-ietf-idnabis-rationale-02.txt | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft July 12, 2008 | Internet-Draft September 12, 2008 | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: January 13, 2009 | Expires: March 16, 2009 | |||
| Internationalized Domain Names for Applications (IDNA): Definitions, | Internationalized Domain Names for Applications (IDNA): Definitions, | |||
| Background and Rationale | Background and Rationale | |||
| draft-ietf-idnabis-rationale-01.txt | draft-ietf-idnabis-rationale-02.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on January 13, 2009. | This Internet-Draft will expire on March 16, 2009. | |||
| Abstract | Abstract | |||
| Several years have passed since the original protocol for | Several years have passed since the original protocol for | |||
| Internationalized Domain Names (IDNs) was completed and deployed. | Internationalized Domain Names (IDNs) was completed and deployed. | |||
| During that time, a number of issues have arisen, including the need | During that time, a number of issues have arisen, including the need | |||
| to update the system to deal with newer versions of Unicode. Some of | to update the system to deal with newer versions of Unicode. Some of | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| skipping to change at page 2, line 17 | skipping to change at page 2, line 17 | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.3. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.4. Applicability and Function of IDNA . . . . . . . . . . . . 5 | 1.4. Applicability and Function of IDNA . . . . . . . . . . . . 5 | |||
| 1.5. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.5. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.5.1. Documents and Standards . . . . . . . . . . . . . . . 6 | 1.5.1. Documents and Standards . . . . . . . . . . . . . . . 6 | |||
| 1.5.2. Terminology about Characters and Character Sets . . . 6 | 1.5.2. Terminology about Characters and Character Sets . . . 6 | |||
| 1.5.3. DNS-related Terminology . . . . . . . . . . . . . . . 7 | 1.5.3. DNS-related Terminology . . . . . . . . . . . . . . . 7 | |||
| 1.5.4. Terminology Specific to IDNA . . . . . . . . . . . . . 7 | 1.5.4. Terminology Specific to IDNA . . . . . . . . . . . . . 7 | |||
| 1.5.5. Punycode is an Algorithm, not a Name . . . . . . . . . 10 | 1.5.5. Punycode is an Algorithm, not a Name . . . . . . . . . 11 | |||
| 1.5.6. Other Terminology Issues . . . . . . . . . . . . . . . 11 | 1.5.6. Other Terminology Issues . . . . . . . . . . . . . . . 11 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 12 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 12 | |||
| 2. Summary of Major Changes from IDNA2003 . . . . . . . . . . . . 13 | 2. The Revised IDNA Model . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3. The Revised IDNA Model . . . . . . . . . . . . . . . . . . . . 14 | 3. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 14 | 4. IDNA2008 Document List . . . . . . . . . . . . . . . . . . . . 14 | |||
| 5. IDNA2008 Document List . . . . . . . . . . . . . . . . . . . . 14 | 5. Permitted Characters: An Inclusion List . . . . . . . . . . . 15 | |||
| 6. Permitted Characters: An Inclusion List . . . . . . . . . . . 15 | 5.1. A Tiered Model of Permitted Characters and Labels . . . . 15 | |||
| 6.1. A Tiered Model of Permitted Characters and Labels . . . . 15 | 5.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 15 | |||
| 6.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 16 | 5.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 6.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 17 | 5.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 6.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 18 | 5.2. Registration Policy . . . . . . . . . . . . . . . . . . . 18 | |||
| 6.2. Registration Policy . . . . . . . . . . . . . . . . . . . 19 | 5.3. Layered Restrictions: Tables, Context, Registration, | |||
| 6.3. Layered Restrictions: Tables, Context, Registration, | ||||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 19 | Applications . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 7. Issues that Constrain Possible Solutions . . . . . . . . . . . 19 | 6. Issues that Constrain Possible Solutions . . . . . . . . . . . 19 | |||
| 7.1. Display and Network Order . . . . . . . . . . . . . . . . 19 | 6.1. Display and Network Order . . . . . . . . . . . . . . . . 19 | |||
| 7.2. Entry and Display in Applications . . . . . . . . . . . . 21 | 6.2. Entry and Display in Applications . . . . . . . . . . . . 20 | |||
| 7.3. Linguistic Expectations: Ligatures, Digraphs, and | 6.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 22 | Alternate Character Forms . . . . . . . . . . . . . . . . 21 | |||
| 7.4. Case Mapping and Related Issues . . . . . . . . . . . . . 24 | 6.4. Case Mapping and Related Issues . . . . . . . . . . . . . 24 | |||
| 7.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 25 | 6.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 25 | |||
| 8. IDNs and the Robustness Principle . . . . . . . . . . . . . . 25 | 7. IDNs and the Robustness Principle . . . . . . . . . . . . . . 25 | |||
| 9. Front-end and User Interface Processing . . . . . . . . . . . 26 | 8. Front-end and User Interface Processing . . . . . . . . . . . 26 | |||
| 10. Migration and Version Synchronization . . . . . . . . . . . . 29 | 9. Relationship to IDNA2003 and Earlier Versions of Unicode . . . 28 | |||
| 10.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 29 | 9.1. Summary of Major Changes from IDNA2003 . . . . . . . . . . 29 | |||
| 10.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 29 | 9.2. Migration and Version Synchronization . . . . . . . . . . 29 | |||
| 10.1.2. Labels in Registration . . . . . . . . . . . . . . . . 30 | 9.2.1. Design Criteria . . . . . . . . . . . . . . . . . . . 29 | |||
| 10.1.3. Labels in Resolution (Lookup) . . . . . . . . . . . . 31 | 9.2.2. More Flexibility in User Agents . . . . . . . . . . . 33 | |||
| 10.2. More Flexibility in User Agents . . . . . . . . . . . . . 32 | 9.2.3. The Question of Prefix Changes . . . . . . . . . . . . 34 | |||
| 10.3. The Question of Prefix Changes . . . . . . . . . . . . . . 33 | 9.2.4. Stringprep Changes and Compatibility . . . . . . . . . 36 | |||
| 10.3.1. Conditions Requiring a Prefix Change . . . . . . . . . 33 | 9.2.5. The Symbol Question . . . . . . . . . . . . . . . . . 37 | |||
| 10.3.2. Conditions Not Requiring a Prefix Change . . . . . . . 34 | 9.2.6. Migration Between Unicode Versions: Unassigned | |||
| 10.3.3. Implications of Prefix Changes . . . . . . . . . . . . 35 | Code Points . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 10.4. Stringprep Changes and Compatibility . . . . . . . . . . . 35 | 9.2.7. Other Compatibility Issues . . . . . . . . . . . . . . 39 | |||
| 10.5. The Symbol Question . . . . . . . . . . . . . . . . . . . 36 | 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 10.6. Migration Between Unicode Versions: Unassigned Code | 11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 37 | 12. Internationalization Considerations . . . . . . . . . . . . . 40 | |||
| 10.7. Other Compatibility Issues . . . . . . . . . . . . . . . . 38 | 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 | |||
| 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 | 13.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 41 | |||
| 12. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39 | 13.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 41 | |||
| 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 40 | 13.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 41 | |||
| 13.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 40 | 14. Security Considerations . . . . . . . . . . . . . . . . . . . 42 | |||
| 13.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 40 | 15. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| 13.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 40 | 15.1. Changes between Version -00 and Version -01 of | |||
| 14. Security Considerations . . . . . . . . . . . . . . . . . . . 41 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43 | |||
| 15. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 42 | 15.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 15.1. Version -01 of draft-klensin-idnabis-issues . . . . . . . 42 | 16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 15.2. Version -02 of draft-klensin-idnabis-issues . . . . . . . 42 | 16.1. Normative References . . . . . . . . . . . . . . . . . . . 44 | |||
| 15.3. Version -03 of draft-klensin-idnabis-issues . . . . . . . 43 | 16.2. Informative References . . . . . . . . . . . . . . . . . . 46 | |||
| 15.4. Version -04 of draft-klensin-idnabis-issues . . . . . . . 43 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| 15.5. Version -05 of draft-klensin-idnabis-issues . . . . . . . 43 | Intellectual Property and Copyright Statements . . . . . . . . . . 48 | |||
| 15.6. Version -06 of draft-klensin-idnabis-issues . . . . . . . 43 | ||||
| 15.7. Version -07 of draft-klensin-idnabis-issues . . . . . . . 44 | ||||
| 15.8. Version -00 of draft-ietf-idnabis-rationale . . . . . . . 44 | ||||
| 15.9. Version -01 of draft-ietf-idnabis-rationale . . . . . . . 45 | ||||
| 16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 46 | ||||
| 16.1. Normative References . . . . . . . . . . . . . . . . . . . 46 | ||||
| 16.2. Informative References . . . . . . . . . . . . . . . . . . 47 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 48 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . . . 49 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| Several years have passed since the original protocol for | Several years have passed since the original protocol for | |||
| Internationalized Domain Names (IDNs) was completed and deployed. | Internationalized Domain Names (IDNs) was completed and deployed. | |||
| During that time, a number of issues have arisen, including a subset | During that time, a number of issues have arisen, including a subset | |||
| of those described in a recent IAB report [RFC4690] and the need to | of those described in a recent IAB report [RFC4690] and the need to | |||
| update the system to deal with newer versions of Unicode. Those | update the system to deal with newer versions of Unicode. Those | |||
| skipping to change at page 4, line 48 | skipping to change at page 4, line 48 | |||
| [RFC0810]) struck a balance between the creation of useful mnemonics | [RFC0810]) struck a balance between the creation of useful mnemonics | |||
| and the introduction of parsing problems or general confusion in the | and the introduction of parsing problems or general confusion in the | |||
| contexts in which domain names are used. Our objective is to | contexts in which domain names are used. Our objective is to | |||
| preserve that balance while expanding the character repertoire to | preserve that balance while expanding the character repertoire to | |||
| include extended versions of Roman-derived scripts and scripts that | include extended versions of Roman-derived scripts and scripts that | |||
| are not Roman in origin. No work of this sort will be able to | are not Roman in origin. No work of this sort will be able to | |||
| completely eliminate sources of visual or textual confusion: such | completely eliminate sources of visual or textual confusion: such | |||
| confusion is possible even under the original rules where only ASCII | confusion is possible even under the original rules where only ASCII | |||
| characters were permitted. However, one can hope, through the | characters were permitted. However, one can hope, through the | |||
| application of different techniques at different points (see | application of different techniques at different points (see | |||
| Section 6.3), to keep problems to an acceptable minimum. One | Section 5.3), to keep problems to an acceptable minimum. One | |||
| consequence of this general objective is that the desire of some user | consequence of this general objective is that the desire of some user | |||
| or marketing community to use a particular string --whether the | or marketing community to use a particular string --whether the | |||
| reason is to try to write sentences of particular languages in the | reason is to try to write sentences of particular languages in the | |||
| DNS, to express a facsimile of the symbol for a brand, or for some | DNS, to express a facsimile of the symbol for a brand, or for some | |||
| other purpose-- is not a primary goal within the context of | other purpose-- is not a primary goal within the context of | |||
| applications in the domain name space. | applications in the domain name space. | |||
| 1.4. Applicability and Function of IDNA | 1.4. Applicability and Function of IDNA | |||
| The IDNA standard does not require any applications to conform to it, | The IDNA standard does not require any applications to conform to it, | |||
| nor does it retroactively change those applications. An application | nor does it retroactively change those applications. An application | |||
| can elect to use IDNA in order to support IDN while maintaining | can elect to use IDNA in order to support IDN while maintaining | |||
| interoperability with existing infrastructure. If an application | interoperability with existing infrastructure. If an application | |||
| wants to use non-ASCII characters in domain names, IDNA is the only | wants to use non-ASCII characters in domain names, IDNA is the only | |||
| currently-defined option. Adding IDNA support to an existing | currently-defined option. Adding IDNA support to an existing | |||
| application entails changes to the application only, and leaves room | application entails changes to the application only, and leaves room | |||
| for flexibility in front-end processing and more specifically in the | for flexibility in front-end processing and more specifically in the | |||
| user interface (see Section 9). | user interface (see Section 8). | |||
| A great deal of the discussion of IDN solutions has focused on | A great deal of the discussion of IDN solutions has focused on | |||
| transition issues and how IDNs will work in a world where not all of | transition issues and how IDNs will work in a world where not all of | |||
| the components have been updated. Proposals that were not chosen by | the components have been updated. Proposals that were not chosen by | |||
| the original IDN Working Group would depend on user applications, | the original IDN Working Group would depend on user applications, | |||
| resolvers, and DNS servers being updated in order for a user to apply | resolvers, and DNS servers being updated in order for a user to apply | |||
| an internationalized domain name in any form or coding acceptable | an internationalized domain name in any form or coding acceptable | |||
| under that method. While processing must be performed prior to or | under that method. While processing must be performed prior to or | |||
| after access to the DNS, no changes are needed to the DNS protocol or | after access to the DNS, no changes are needed to the DNS protocol or | |||
| any DNS servers or the resolvers on users' computers. | any DNS servers or the resolvers on users' computers. | |||
| skipping to change at page 5, line 51 | skipping to change at page 5, line 51 | |||
| repertoire of characters potentially makes the set of misspellings | repertoire of characters potentially makes the set of misspellings | |||
| larger, especially given that in some cases the same appearance, for | larger, especially given that in some cases the same appearance, for | |||
| example on a business card, might visually match several Unicode code | example on a business card, might visually match several Unicode code | |||
| points or several sequences of code points. | points or several sequences of code points. | |||
| IDNA allows the graceful introduction of IDNs not only by avoiding | IDNA allows the graceful introduction of IDNs not only by avoiding | |||
| upgrades to existing infrastructure (such as DNS servers and mail | upgrades to existing infrastructure (such as DNS servers and mail | |||
| transport agents), but also by allowing some rudimentary use of IDNs | transport agents), but also by allowing some rudimentary use of IDNs | |||
| in applications by using the ASCII representation of the non-ASCII | in applications by using the ASCII representation of the non-ASCII | |||
| name labels. While such names are user-unfriendly to read and type, | name labels. While such names are user-unfriendly to read and type, | |||
| and hence not optimal for user input, they allow (for instance) | and hence not optimal for user input, they can be used as a last | |||
| replying to email and clicking on URLs even though the domain name | resort to allow rudimentary IDN usage. For example, they might be | |||
| displayed is incomprehensible to the user. In order to allow user- | the best choice for display if it were known that relevant fonts were | |||
| not available on the user's computer. In order to allow user- | ||||
| friendly input and output of the IDNs and acceptance of some | friendly input and output of the IDNs and acceptance of some | |||
| characters as equivalent to those to be processed according to the | characters as equivalent to those to be processed according to the | |||
| protocol, the applications need to be modified to conform to this | protocol, the applications need to be modified to conform to this | |||
| specification. | specification. | |||
| IDNA uses the Unicode character repertoire, for continuity with | IDNA uses the Unicode character repertoire, for continuity with the | |||
| IDNA2003. | original version of IDNA. | |||
| 1.5. Terminology | 1.5. Terminology | |||
| 1.5.1. Documents and Standards | 1.5.1. Documents and Standards | |||
| This document uses the term "IDNA2003" to refer to the set of | This document uses the term "IDNA2003" to refer to the set of | |||
| standards that make up and support the version of IDNA published in | standards that make up and support the version of IDNA published in | |||
| 2003, i.e., those commonly known as the IDNA base specification | 2003, i.e., those commonly known as the IDNA base specification | |||
| [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | |||
| [RFC3454]. In this document, those names are used to refer, | [RFC3454]. In this document, those names are used to refer, | |||
| conceptually, to the individual documents, with the base IDNA | conceptually, to the individual documents, with the base IDNA | |||
| specification called just "IDNA". | specification called just "IDNA". | |||
| The term "IDNA2008" is used to refer to a new version of IDNA as | The term "IDNA2008" is used to refer to a new version of IDNA as | |||
| described in this document and in the documents described in | described in this document and in the documents described in | |||
| Section 5. References to "these specifications" are to the entire | Section 4. References to "these specifications" are to the entire | |||
| set. | set. | |||
| 1.5.2. Terminology about Characters and Character Sets | 1.5.2. Terminology about Characters and Character Sets | |||
| A code point is an integer value associated with a character in a | A code point is an integer value associated with a character in a | |||
| coded character set. | coded character set. | |||
| Unicode [Unicode51] is a coded character set containing almost | Unicode [Unicode51] is a coded character set containing almost | |||
| 100,000 characters as of the current version. A single Unicode code | 100,000 characters as of the current version. A single Unicode code | |||
| point is denoted by "U+" followed by four to six hexadecimal digits, | point is denoted by "U+" followed by four to six hexadecimal digits, | |||
| skipping to change at page 7, line 9 | skipping to change at page 7, line 10 | |||
| "Letters" are, informally, generalizations from the ASCII and common- | "Letters" are, informally, generalizations from the ASCII and common- | |||
| sense understanding of that term, i.e., characters that are used to | sense understanding of that term, i.e., characters that are used to | |||
| write text that are not digits, symbols, or punctuation. Formally, | write text that are not digits, symbols, or punctuation. Formally, | |||
| they are characters with a Unicode General Category value starting in | they are characters with a Unicode General Category value starting in | |||
| "L" (see Section 4.5 of [Unicode51]). | "L" (see Section 4.5 of [Unicode51]). | |||
| 1.5.3. DNS-related Terminology | 1.5.3. DNS-related Terminology | |||
| When discussing the DNS, this document generally assumes the | When discussing the DNS, this document generally assumes the | |||
| terminology used in the DNS specifications [RFC1034] [RFC1035]. The | terminology used in the DNS specifications [RFC1034] [RFC1035]. The | |||
| terms "lookup" and "resolution" are used interchangeably and the | term "lookup" is used to describe the combination of operations | |||
| process or application component that performs DNS resolution is | performed by this protocol and those actually performed by a DNS | |||
| called a "resolver". The process of placing an entry into the DNS is | resolver. The process of placing an entry into the DNS is referred | |||
| referred to as "registration" paralleling common contemporary usage | to as "registration", similar to common contemporary usage in other | |||
| in other contexts. Consequently, any DNS zone administration is | contexts. Consequently, any DNS zone administration is described as | |||
| described as a "registry", regardless of that actual administrative | a "registry", regardless of the actual administrative arrangements or | |||
| arrangements or level in the tree. A note about that relationship is | level in the DNS tree. A note about that relationship is included in | |||
| included in the text below where it seems particularly significant. | the text below where it seems particularly significant. | |||
| The term "LDH code points" is defined in this document to mean the | The term "LDH code points" is defined in this document to mean the | |||
| code points associated with ASCII letters, digits, and the hyphen- | code points associated with ASCII letters, digits, and the hyphen- | |||
| minus; that is, U+002D, 0030..0039, 0041..005A, and 0061..007A. "LDH" | minus; that is, U+002D, 0030..0039, 0041..005A, and 0061..007A. "LDH" | |||
| is an abbreviation for "letters, digits, hyphen". | is an abbreviation for "letters, digits, hyphen". | |||
| The base DNS specifications [RFC1034] [RFC1035] discuss "domain | The base DNS specifications [RFC1034] [RFC1035] discuss "domain | |||
| names" and "host names", but many people and sections of these | names" and "host names", but many people and sections of these | |||
| specifications use the terms interchangeably. Further, because those | specifications use the terms interchangeably. Lack of clarity about | |||
| documents were not terribly clear, many people who are sure they know | that terminology has contributed to confusion about intent in some | |||
| the exact definitions of each of these terms disagree on the | cases. This document generally uses the term "domain name". When it | |||
| definitions. This document generally uses the term "domain name". | refers to, e.g., host name syntax restrictions, it explicitly cites | |||
| When it refers to, e.g., host name syntax restrictions, it explicitly | the relevant defining documents. The remaining definitions in this | |||
| cites the relevant defining documents. The remaining definitions in | subsection are essentially a review. | |||
| this subsection are essentially a review. | ||||
| A label is an individual component of a domain name. Labels are | A label is an individual component of a domain name. Labels are | |||
| usually shown separated by dots; for example, the domain name | usually shown separated by dots; for example, the domain name | |||
| "www.example.com" is composed of three labels: "www", "example", and | "www.example.com" is composed of three labels: "www", "example", and | |||
| "com". (The zero-length root label described in [RFC1123], which can | "com". (The zero-length root label described in RFC 1123 [RFC1123], | |||
| be explicit as in "www.example.com." or implicit as in | which can be explicit as in "www.example.com." or implicit as in | |||
| "www.example.com", is not considered a label in this specification.) | "www.example.com", is not considered in this specification.) IDNA | |||
| IDNA extends the set of usable characters in labels that are text. | extends the set of usable characters in labels that are treated as | |||
| For the rest of this document, the term "label" is shorthand for | text (as distinct from the binary string labels discussed in RFC 1035 | |||
| "text label", and "every label" means "every text label". | and RFC 2181 [RFC2181] and the bitstring ones described in RFC 2673 | |||
| [RFC2673]). For the rest of this document and in the related ones, | ||||
| the term "label" is shorthand for "text label", and "every label" | ||||
| means "every text label". | ||||
| 1.5.4. Terminology Specific to IDNA | 1.5.4. Terminology Specific to IDNA | |||
| This section defines some terminology to reduce dependence on terms | This section defines some terminology to reduce dependence on terms | |||
| and definitions that have been problematic in the past. | and definitions that have been problematic in the past. | |||
| 1.5.4.1. Terms for IDN Label Codings | 1.5.4.1. Terms for IDN Label Codings | |||
| 1.5.4.1.1. IDNA-valid strings, A-label, and U-label | 1.5.4.1.1. IDNA-valid strings, A-label, and U-label | |||
| skipping to change at page 8, line 14 | skipping to change at page 8, line 20 | |||
| subsection. In the next, it defines a historical one to be slightly | subsection. In the next, it defines a historical one to be slightly | |||
| more precise for IDNA contexts. | more precise for IDNA contexts. | |||
| o A string is "IDNA-valid" if it meets all of the requirements of | o A string is "IDNA-valid" if it meets all of the requirements of | |||
| these specifications for an IDNA label. IDNA-valid strings may | these specifications for an IDNA label. IDNA-valid strings may | |||
| appear in either of two forms, defined immediately below. It is | appear in either of two forms, defined immediately below. It is | |||
| expected that specific reference will be made to the form | expected that specific reference will be made to the form | |||
| appropriate to any context in which the distinction is important. | appropriate to any context in which the distinction is important. | |||
| o An "A-label" is the ASCII-Compatible Encoding (ACE, see | o An "A-label" is the ASCII-Compatible Encoding (ACE, see | |||
| Section 1.5.4.4) form of an IDNA-valid string. It must be a | Section 1.5.4.5) form of an IDNA-valid string. It must be a | |||
| complete label: IDNA is defined for labels, not for parts of them | complete label: IDNA is defined for labels, not for parts of them | |||
| and not for complete domain names. This means, by definition, | and not for complete domain names. This means, by definition, | |||
| that every A-label will begin with the IDNA ACE prefix, "xn--", | that every A-label will begin with the IDNA ACE prefix, "xn--", | |||
| followed by a string that is a valid output of the Punycode | followed by a string that is a valid output of the Punycode | |||
| algorithm and hence a maximum of 59 ASCII characters in length. | algorithm and hence a maximum of 59 ASCII characters in length. | |||
| The prefix and string together must conform to all requirements | The prefix and string together must conform to all requirements | |||
| for a label that can be stored in the DNS including conformance to | for a label that can be stored in the DNS including conformance to | |||
| the LDH ("host name") rule described in RFC 1034, RFC 1123 and | the rules for the preferred form described in RFC 1034, RFC 1035, | |||
| elsewhere. | and RFC 1123. | |||
| o A "U-label" is an IDNA-valid string of Unicode characters, | o A "U-label" is an IDNA-valid string of Unicode characters, | |||
| including at least one non-ASCII character, expressed in a | including at least one non-ASCII character, expressed in a | |||
| standard Unicode Encoding Form, normally UTF-8 in an Internet | standard Unicode Encoding Form -- normally UTF-8 in an Internet | |||
| transmission context, and subject to the constraint below. | transmission context -- and subject to the constraint below. | |||
| Conversions between valid U-labels and valid A-labels is performed | Conversions between U-labels and A-labels are performed according | |||
| according to the specification in [RFC3492], adding or removing | to the "Punycode" specification [RFC3492], adding or removing the | |||
| the ACE prefix (see Section 1.5.4.4) as needed. | ACE prefix (see Section 1.5.4.5) as needed. | |||
| To be valid, U-labels and A-labels must obey an important symmetry | To be valid, U-labels and A-labels must obey an important symmetry | |||
| constraint. While that constraint may be tested in any of several | constraint. While that constraint may be tested in any of several | |||
| ways, an A-label must be capable of being produced by conversion from | ways, an A-label must be capable of being produced by conversion from | |||
| a U-label and a U-label must be capable of being produced by | a U-label and a U-label must be capable of being produced by | |||
| conversion from an A-label. Among other things, this implies that | conversion from an A-label. Among other things, this implies that | |||
| both U-labels and A-labels must represent strings in normalized form. | both U-labels and A-labels must be strings in Unicode NFC | |||
| These strings MUST contain only characters specified elsewhere in | [Unicode-UAX15] normalized form. These strings MUST contain only | |||
| this document and its companion documents, and only in the contexts | characters specified elsewhere in this document and its companion | |||
| indicated as appropriate. | documents, and only in the contexts indicated as appropriate. | |||
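
The symmetry constraint described above can be illustrated with a minimal sketch. It assumes Python's built-in "punycode" codec as a stand-in for the algorithm of [RFC3492]; the helper names (to_a_label, to_u_label, is_symmetric) are hypothetical and are not defined by these specifications. The sketch tests only NFC normalization and the round trip. It does not check whether the characters are permitted or whether contextual rules are met.

```python
import unicodedata

ACE_PREFIX = "xn--"  # the IDNA ACE prefix named in the text


def to_a_label(u_label: str) -> str:
    """Encode a candidate U-label: Punycode output plus the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode a candidate A-label: strip the ACE prefix, then Punycode-decode."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_symmetric(u_label: str) -> bool:
    """Check NFC form and that U-label -> A-label -> U-label is lossless."""
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False  # not in NFC, so it cannot be a valid U-label
    return to_u_label(to_a_label(u_label)) == u_label


print(to_a_label("bücher"))
print(is_symmetric("bu\u0308cher"))  # "u" + combining diaeresis, not NFC
```

A string that fails the NFC test is rejected before any conversion is attempted, which matches the statement that both forms must be in normalized form.
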
| Any rules or conventions that apply to DNS labels in general, such as | Any rules or conventions that apply to DNS labels in general, such as | |||
| rules about lengths of strings, apply to whichever of the U-label or | rules about lengths of strings, apply to whichever of the U-label or | |||
| A-label would be more restrictive. For the U-label, constraints | A-label would be more restrictive. For the U-label, constraints | |||
| imposed by existing protocols and their presentation forms make the | imposed by existing protocols and their presentation forms make the | |||
| length restriction apply to the length in octets of the UTF-8 form of | length restriction apply to the length in octets of the UTF-8 form of | |||
| those labels (which will always be greater than or equal to the | those labels (which will always be greater than or equal to the | |||
| length in code points). The exception to this, of course, is that | length in code points). The exception to this, of course, is that | |||
| the restriction to ASCII characters does not apply to the U-label. | the restriction to ASCII characters does not apply to the U-label. | |||
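
The length rule above can be made concrete with a short sketch. It assumes the 63-octet DNS label limit of RFC 1035 (consistent with the 59-character bound on the Punycode part plus the four-character prefix) and reads "whichever would be more restrictive" as applying the limit to both forms; that reading, and the function names, are assumptions made for illustration.

```python
MAX_LABEL_OCTETS = 63  # DNS label limit from RFC 1035
ACE_PREFIX = "xn--"


def label_lengths(u_label: str) -> tuple:
    """Return (code points, UTF-8 octets, A-label octets) for a U-label."""
    code_points = len(u_label)
    utf8_octets = len(u_label.encode("utf-8"))
    a_label_octets = len(ACE_PREFIX) + len(u_label.encode("punycode"))
    return code_points, utf8_octets, a_label_octets


def fits_dns_label(u_label: str) -> bool:
    """Apply the limit to whichever form is longer (an assumed reading)."""
    _, utf8_octets, a_label_octets = label_lengths(u_label)
    return max(utf8_octets, a_label_octets) <= MAX_LABEL_OCTETS


print(label_lengths("bücher"))
print(fits_dns_label("ü" * 40))  # 40 code points, but 80 UTF-8 octets
```

The second call shows why the octet count matters: a label well under the limit when counted in code points can still exceed it in its UTF-8 form.
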
| skipping to change at page 9, line 29 ¶ | skipping to change at page 9, line 33 ¶ | |||
| o cannot be processed as U-labels or A-labels as described in these | o cannot be processed as U-labels or A-labels as described in these | |||
| specifications, | specifications, | |||
| are invalid in IDNA-conformant applications as labels in domain names | are invalid in IDNA-conformant applications as labels in domain names | |||
| that identify Internet hosts or similar resources. This restriction | that identify Internet hosts or similar resources. This restriction | |||
| on strings containing "--" is required for three reasons: | on strings containing "--" is required for three reasons: | |||
| o to prevent confusion with pre-IDNA coding forms; | o to prevent confusion with pre-IDNA coding forms; | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 10.3); | prefix, no matter how unlikely those might be (see Section 9.2.3); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the encoding system. | o to reduce the opportunities for attacks via the encoding system. | |||
| 1.5.4.2. LDH-label and Internationalized Label | 1.5.4.2. LDH-label and Internationalized Label | |||
| In the hope of further clarifying discussions about IDNs, these | In the hope of further clarifying discussions about IDNs, these | |||
| specifications use the term "LDH-label" strictly to refer to an all- | specifications use the term "LDH-label" strictly to refer to an all- | |||
| ASCII label that obeys the "hostname" (LDH) conventions and that is | ASCII label that obeys the preferred syntax (often known as | |||
| not an IDN. In other words, only "U-label" and "A-label" refer to | "hostname" (from RFC 952 [RFC0952]) or "LDH") conventions and that is | |||
| IDNs; LDH-labels are not IDNs. "Internationalized label" is used | not an IDN. It should be stressed that an A-label obeys the | |||
| when a term is needed to refer to any of the three categories. There | "hostname" rules and is sometimes described as "LDH-conformant" or in | |||
| are some standardized DNS label formats, such as those for service | similar language but that it is not an LDH-label as used in this | |||
| location (SRV) records [RFC2782] that do not fall into any of the | document. | |||
| three categories and hence are not internationalized labels. | ||||
| 1.5.4.3. Equivalence | 1.5.4.3. Internationalized Domain Name | |||
| An "internationalized domain name" (IDN) is a domain name that may | ||||
| contain any mixture of LDH-labels, A-labels, or U-labels. This | ||||
| implies that every conventional domain name is an IDN (which implies | ||||
| that it is possible for a domain name to be an IDN without it | ||||
| containing any non-ASCII characters). Just as has been the case with | ||||
| ASCII names, some DNS zone administrators may impose restrictions, | ||||
| beyond those imposed by DNS or IDNA, on the characters or strings | ||||
| that may be registered as labels in their zones. Because of the | ||||
| diversity of characters that can be used in a U-label and the | ||||
| confusion they might cause, such restrictions are mandatory for IDN | ||||
| registries and zones even though the particular restrictions are not | ||||
| part of these specifications. Because these restrictions, commonly | ||||
| known as "registry restrictions", only affect what can be registered | ||||
| and not lookup processing, they have no effect on the syntax or | ||||
| semantics of DNS protocol messages; a query for a name that matches | ||||
| no records will yield the same response regardless of the reason why | ||||
| it is not in the zone. Clients issuing queries or interpreting | ||||
| responses cannot be assumed to have any knowledge of zone-specific | ||||
| restrictions or conventions. See Section 5.2. | ||||
| "Internationalized label" is used when a term is needed to refer to a | ||||
| single label of an IDN, i.e., one that might be any of an LDH-label, | ||||
| A-label, or U-label. There are some standardized DNS label formats, | ||||
| such as those for service location (SRV) records [RFC2782] that do | ||||
| not fall into any of the three categories and hence are not | ||||
| internationalized labels. | ||||
| 1.5.4.4. Equivalence | ||||
| In IDNA, equivalence of labels is defined in terms of the A-labels. | In IDNA, equivalence of labels is defined in terms of the A-labels. | |||
| If the A-labels are equal in a case-independent comparison, then the | If the A-labels are equal in a case-independent comparison, then the | |||
| labels are considered equivalent, no matter how they are represented. | labels are considered equivalent, no matter how they are represented. | |||
| Traditional LDH labels already have a notion of equivalence: within | Traditional LDH labels already have a notion of equivalence: within | |||
| that list of characters, upper case and lower case are considered | that list of characters, upper case and lower case are considered | |||
| equivalent. The IDNA notion of equivalence is an extension of that | equivalent. The IDNA notion of equivalence is an extension of that | |||
| older notion. Equivalent labels in IDNA are treated as alternate | older notion. Equivalent labels in IDNA are treated as alternate | |||
| forms of the same label, just as "foo" and "Foo" are treated as | forms of the same label, just as "foo" and "Foo" are treated as | |||
| alternate forms of the same label. | alternate forms of the same label. | |||
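
As a minimal sketch of the equivalence rule above, two labels can be compared by a case-independent match of their A-label (or LDH) forms. The function name is hypothetical, and the sketch assumes both inputs are already in ASCII form; converting U-labels to A-labels first is left to the caller.

```python
def labels_equivalent(a: str, b: str) -> bool:
    """Case-independent comparison of two ASCII (A-label or LDH) labels."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("convert U-labels to A-labels before comparing")
    return a.lower() == b.lower()


print(labels_equivalent("foo", "Foo"))
print(labels_equivalent("xn--bcher-kva", "XN--BCHER-KVA"))
```
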
| 1.5.4.4. ACE Prefix | 1.5.4.5. ACE Prefix | |||
| The "ACE prefix" is defined in this document to be a string of ASCII | The "ACE prefix" is defined in this document to be a string of ASCII | |||
| characters "xn--" that appears at the beginning of every A-label. | characters "xn--" that appears at the beginning of every A-label. | |||
| "ACE" stands for "ASCII-Compatible Encoding". | "ACE" stands for "ASCII-Compatible Encoding". | |||
| 1.5.4.5. Domain Name Slot | 1.5.4.6. Domain Name Slot | |||
| A "domain name slot" is defined in this document to be a protocol | A "domain name slot" is defined in this document to be a protocol | |||
| element or a function argument or a return value (and so on) | element or a function argument or a return value (and so on) | |||
| explicitly designated for carrying a domain name. Examples of domain | explicitly designated for carrying a domain name. Examples of domain | |||
| name slots include: the QNAME field of a DNS query; the name argument | name slots include: the QNAME field of a DNS query; the name argument | |||
| of the gethostbyname() or getaddrinfo() standard C library functions; | of the gethostbyname() or getaddrinfo() standard C library functions; | |||
| the part of an email address following the at-sign (@) in the | the part of an email address following the at-sign (@) in the | |||
| parameter to the SMTP MAIL or RCPT commands or the "From:" field of | parameter to the SMTP MAIL or RCPT commands or the "From:" field of | |||
| an email message header; and the host portion of the URI in the src | an email message header; and the host portion of the URI in the src | |||
| attribute of an HTML <IMG> tag. General text that just happens to | attribute of an HTML <IMG> tag. General text that just happens to | |||
| skipping to change at page 10, line 46 ¶ | skipping to change at page 11, line 35 ¶ | |||
| negotiation in an interactive session). | negotiation in an interactive session). | |||
| An "IDN-unaware domain name slot" is defined in this document to be | An "IDN-unaware domain name slot" is defined in this document to be | |||
| any domain name slot that is not an IDN-aware domain name slot. | any domain name slot that is not an IDN-aware domain name slot. | |||
| Obviously, this includes any domain name slot whose specification | Obviously, this includes any domain name slot whose specification | |||
| predates IDNA. | predates IDNA. | |||
| 1.5.5. Punycode is an Algorithm, not a Name | 1.5.5. Punycode is an Algorithm, not a Name | |||
| There has been some confusion about whether a "Punycode string" does | There has been some confusion about whether a "Punycode string" does | |||
| or does not include the prefix and about whether it is required that | or does not include the ACE prefix and about whether it is required | |||
| such strings could have been the output of ToASCII (see RFC 3490, | that such strings could have been the output of the ToASCII operation | |||
| Section 4 [RFC3490]). This specification discourages the use of the | (see RFC 3490, Section 4 [RFC3490]). This specification discourages | |||
| term "Punycode" to describe anything but the encoding method and | the use of the term "Punycode" to describe anything but the encoding | |||
| algorithm of [RFC3492]. The terms defined above are preferred as | method and algorithm of [RFC3492]. The terms defined above are | |||
| much clearer than terms such as "Punycode string". | preferred as much clearer than terms such as "Punycode string". | |||
| 1.5.6. Other Terminology Issues | 1.5.6. Other Terminology Issues | |||
| The document departs from historical DNS terminology and usage in one | The document departs from historical DNS terminology and usage in one | |||
| important respect. Over the years, the community has talked very | important respect. Over the years, the community has talked very | |||
| casually about "names" in the DNS, beginning with calling it "the | casually about "names" in the DNS, beginning with calling it "the | |||
| domain name system". That terminology is fine in the very precise | domain name system". That terminology is fine in the very precise | |||
| sense that the identifiers of the DNS do provide names for objects | sense that the identifiers of the DNS do provide names for objects | |||
| and addresses. But, in the context of IDNs, the term has introduced | and addresses. But, in the context of IDNs, the term has introduced | |||
| some confusion, confusion that has increased further as people have | some confusion, confusion that has increased further as people have | |||
| skipping to change at page 11, line 31 ¶ | skipping to change at page 12, line 19 ¶ | |||
| because they are mnemonics, they need not obey the orthographic | because they are mnemonics, they need not obey the orthographic | |||
| conventions of any language: it is not a requirement that it be | conventions of any language: it is not a requirement that it be | |||
| possible for them to be "words". | possible for them to be "words". | |||
| This distinction is important because the reasonable goal of an IDN | This distinction is important because the reasonable goal of an IDN | |||
| effort is not to be able to write the great Klingon (or language of | effort is not to be able to write the great Klingon (or language of | |||
| one's choice) novel in DNS labels but to be able to form a usefully | one's choice) novel in DNS labels but to be able to form a usefully | |||
| broad range of mnemonics in ways that are as natural as possible in a | broad range of mnemonics in ways that are as natural as possible in a | |||
| very broad range of scripts. | very broad range of scripts. | |||
| An "internationalized domain name" (IDN) is a domain name that may | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| contain any mixture of LDH-labels, A-labels, or U-labels. This | ||||
| implies that every conventional domain name is an IDN (which implies | ||||
| that it is possible for a domain name to be an IDN without it | ||||
| containing any non-ASCII characters). Just as has been the case with | ||||
| ASCII names, some DNS zone administrators may impose restrictions, | ||||
| beyond those imposed by DNS or IDNA, on the characters or strings | ||||
| that may be registered as labels in their zones. Because of the | ||||
| diversity of characters that can be used in a U-label and the | ||||
| confusion they might cause, such restrictions are mandatory for IDN | ||||
| registries and zones even though the particular restrictions are not | ||||
| part of these specifications. Because these restrictions, commonly | ||||
| known as "registry restrictions", only affect what can be registered | ||||
| and not resolution processing, they have no effect on the syntax or | ||||
| semantics of DNS protocol messages; a query for a name that matches | ||||
| no records will yield the same response regardless of the reason why | ||||
| it is not in the zone. Clients issuing queries or interpreting | ||||
| responses cannot be assumed to have any knowledge of zone-specific | ||||
| restrictions or conventions. See Section 6.2. | ||||
| "The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing | 1.6. Comprehensibility of IDNA Mechanisms and Processing | |||
| One of the major goals of this work is to improve the general | One of the major goals of this work is to improve the general | |||
| understanding of how IDNA works and what characters are permitted and | understanding of how IDNA works and what characters are permitted and | |||
| what happens to them. Comprehensibility and predictability to users | what happens to them. Comprehensibility and predictability to users | |||
| and registrants are themselves important motivations and design goals | and registrants are themselves important motivations and design goals | |||
| for this effort. The effort includes some new terminology and a | for this effort. The effort includes some new terminology and a | |||
| revised and extended model, both covered in this section, and some | revised and extended model, both covered in this section, and some | |||
| more specific protocol, processing, and table modifications. Details | more specific protocol, processing, and table modifications. Details | |||
| of the latter appear in other documents (see Section 5). | of the latter appear in other documents (see Section 4). | |||
| Several issues are inherent in the application of IDNs and, indeed, | Several issues are inherent in the application of IDNs and, indeed, | |||
| almost any other system that tries to handle international characters | almost any other system that tries to handle international characters | |||
| and concepts. They range from the apparently trivial --e.g., one | and concepts. They range from the apparently trivial --e.g., one | |||
| cannot display a character for which one does not have a font | cannot display a character for which one does not have a font | |||
| available locally-- to the more complex and subtle. Many people have | available locally-- to the more complex and subtle. Many people have | |||
| observed that internationalization is just a tool to enable effective | observed that internationalization is just a tool to enable effective | |||
| localization while permitting some global uniformity. Issues of | localization while permitting some global uniformity. Issues of | |||
| display, of exactly how various strings and characters are entered, | display, of exactly how various strings and characters are entered, | |||
| and so on are inherently issues about localization and user interface | and so on are inherently issues about localization and user interface | |||
| skipping to change at page 12, line 41 ¶ | skipping to change at page 13, line 10 ¶ | |||
| work when characters and fonts are not available, but they can only | work when characters and fonts are not available, but they can only | |||
| be general recommendations and, because display functions are rarely | be general recommendations and, because display functions are rarely | |||
| controlled by the types of applications that would call upon IDNA, | controlled by the types of applications that would call upon IDNA, | |||
| will rarely be very effective. | will rarely be very effective. | |||
| However, shifting responsibility for character mapping and other | However, shifting responsibility for character mapping and other | |||
| adjustments from the protocol (where it was located in IDNA2003) to | adjustments from the protocol (where it was located in IDNA2003) to | |||
| the user interface or processing before invoking IDNA raises issues | the user interface or processing before invoking IDNA raises issues | |||
| about both what that processing should do and about compatibility for | about both what that processing should do and about compatibility for | |||
| references prepared in an IDNA2003 context. Those issues are | references prepared in an IDNA2003 context. Those issues are | |||
| discussed in Section 9. | discussed in Section 8. | |||
| Operations for converting between local character sets and normalized | Operations for converting between local character sets and normalized | |||
| Unicode are part of this general set of user interface issues. The | Unicode are part of this general set of user interface issues. The | |||
| conversion is obviously not required at all in a Unicode-native | conversion is obviously not required at all in a Unicode-native | |||
| system that maintains all strings in Normalization Form C (NFC). It | system that maintains all strings in Normalization Form C (NFC). It | |||
| may, however, involve some complexity in a system that is not | may, however, involve some complexity in a system that is not | |||
| Unicode-native, especially if the elements of the local character set | Unicode-native, especially if the elements of the local character set | |||
| do not map exactly and unambiguously into Unicode characters or do so | do not map exactly and unambiguously into Unicode characters or do so | |||
| in a way that is not completely stable over time. Perhaps more | in a way that is not completely stable over time. Perhaps more | |||
| important, if a label being converted to a local character set | important, if a label being converted to a local character set | |||
| skipping to change at page 13, line 25 ¶ | skipping to change at page 13, line 42 ¶ | |||
| systems are substantially or completely Unicode-compatible (i.e., all | systems are substantially or completely Unicode-compatible (i.e., all | |||
| of the code points in them have an exact and unique mapping to | of the code points in them have an exact and unique mapping to | |||
| Unicode code points). It may be even more difficult when the | Unicode code points). It may be even more difficult when the | |||
| character coding system in local use is based on conceptually | character coding system in local use is based on conceptually | |||
| different assumptions than those used by Unicode about, e.g., | different assumptions than those used by Unicode about, e.g., | |||
| font encodings used for publications in some Indic scripts. Those | font encodings used for publications in some Indic scripts. Those | |||
| differences may not easily yield unambiguous conversions or | differences may not easily yield unambiguous conversions or | |||
| interpretations even if each coding system is internally consistent | interpretations even if each coding system is internally consistent | |||
| and adequate to represent the local language and script. | and adequate to represent the local language and script. | |||
| 2. Summary of Major Changes from IDNA2003 | 2. The Revised IDNA Model | |||
| 1. Update base character set from Unicode 3.2 to Unicode version- | ||||
| agnostic. | ||||
| 2. Separate the definitions for the "registration" and "lookup" | ||||
| activities. | ||||
| 3. Disallow symbol and punctuation characters except where special | ||||
| exceptions are necessary. | ||||
| 4. Remove the mapping and normalization steps from the protocol and | ||||
| have them instead done by the applications themselves, possibly | ||||
| in a local fashion, before invoking the protocol. | ||||
| 5. Change the way that the protocol specifies which characters are | ||||
| allowed in labels from "humans decide what the table of | ||||
| codepoints contains" to "decision about codepoints are based on | ||||
| Unicode properties plus a small exclusion list created by | ||||
| humans". | ||||
| 6. Introduce the new concept of characters that can be used only in | ||||
| specific contexts. | ||||
| 7. Allow typical words and names in languages such as Dhivehi and | ||||
| Yiddish to be expressed. | ||||
| 8. Make bidirectional domain names (delimited strings of labels, | ||||
| not just labels standing on their own) display in a non- | ||||
| surprising fashion. | ||||
| 9. Make bidirectional domain names in a paragraph display in a non- | ||||
| surprising fashion.[[anchor17: Is this statement necessary or is | ||||
| it redundant with the previous one?]] | ||||
| 10. Remove the dot separator from the mandatory part of the | ||||
| protocol. | ||||
| 11. Make some currently-valid labels that are not actually IDNA | ||||
| labels invalid. | ||||
| 3. The Revised IDNA Model | ||||
| IDNA is a client-side protocol, i.e., almost all of the processing is | IDNA is a client-side protocol, i.e., almost all of the processing is | |||
| performed by the client. The strings that appear in, and are | performed by the client. The strings that appear in, and are | |||
| resolved by, the DNS conform to the traditional rules for the naming | resolved by, the DNS conform to the traditional rules for the naming | |||
| of hosts, and consist of ASCII letters, digits, and hyphens. This | of hosts, and consist of ASCII letters, digits, and hyphens. This | |||
| approach permits IDNA to be deployed without modifications to the DNS | approach permits IDNA to be deployed without modifications to the DNS | |||
| itself. That, in turn, avoids both having to upgrade the entire | itself. That, in turn, avoids both having to upgrade the entire | |||
| Internet to support IDNs and needing to incur the unknown risks to | Internet to support IDNs and needing to incur the unknown risks to | |||
| deployed systems of DNS structural or design changes, especially if | deployed systems of DNS structural or design changes, especially if | |||
| those changes need to be deployed all at the same time. | those changes need to be deployed all at the same time. | |||
| 4. Processing in IDNA2008 | [[anchor17: This paragraph is somewhat redundant with material | |||
| above. It will be dropped in -03 if there are not strong arguments for | ||||
| keeping it here.]] | ||||
| These specifications separate Domain Name Registration and Resolution | 3. Processing in IDNA2008 | |||
| in the protocol specification. Doing so reflects current practice in | ||||
| These specifications separate Domain Name Registration and Lookup in | ||||
| the protocol specification. Doing so reflects current practice in | ||||
| which per-registry restrictions and special processing are applied at | which per-registry restrictions and special processing are applied at | |||
| registration time but not on resolution. Even more important in the | registration time but not during lookup. Even more important in the | |||
| longer term, it facilitates incremental addition of permitted | longer term, it facilitates incremental addition of permitted | |||
| character groups to avoid freezing on one particular version of | character groups to avoid freezing on one particular version of | |||
| Unicode. | Unicode. | |||
| The actual registration and lookup protocols for IDNA2008 are | The actual registration and lookup protocols for IDNA2008 are | |||
| specified in [IDNA2008-Protocol]. | specified in [IDNA2008-Protocol]. | |||
| 5. IDNA2008 Document List | 4. IDNA2008 Document List | |||
| [[anchor19: This section will need to be extensively revised or | [[anchor19: This section will need to be extensively revised or | |||
| removed before publication.]] | removed before publication.]] | |||
| The following documents are being produced as part of the IDNA2008 | The following documents are being produced as part of the IDNA2008 | |||
| effort. | effort. | |||
| o A revised version of this document, containing an overview, | o A revised version of this document, containing an overview, | |||
| rationale, and conformance conditions. | rationale, and conformance conditions. | |||
| o A separate document, drawn from material in early versions of this | o A separate document, drawn from material in early versions of this | |||
| one, that explicitly updates and replaces RFC 3490 but which has | one, that explicitly updates and replaces RFC 3490 but which has | |||
| most rationale material from that document moved to this one | most rationale material from that document moved to this one | |||
| [IDNA2008-Protocol]. | [IDNA2008-Protocol]. | |||
| o A document describing the "Bidi problem" with Stringprep and | o A document describing the "Bidi problem" with Stringprep and | |||
| proposing a solution [IDNA2008-Bidi]. | proposing a solution [IDNA2008-Bidi]. | |||
| o A specification of the categories and rules that identify the code | o A specification of the categories and rules that identify the code | |||
| points allowed in a U-label, based on Unicode 5.0 code | points allowed in a U-label, based on Unicode 5.0 code | |||
| assignments. See Section 6 and [IDNA2008-Tables]. | assignments. See Section 5 and [IDNA2008-Tables]. | |||
| o One or more documents containing guidance and suggestions for | o One or more documents containing guidance and suggestions for | |||
| registries (in this context, those responsible for establishing | registries (in this context, those responsible for establishing | |||
| policies for any zone file in the DNS, not only those at the top | policies for any zone file in the DNS, not only those at the top | |||
| or second level). The documents in this category may not be IETF | or second level). The documents in this category may not be IETF | |||
| products and may be prepared and completed asynchronously with | products and may be prepared and completed asynchronously with | |||
| those described above. | those described above. | |||
| 6. Permitted Characters: An Inclusion List | 5. Permitted Characters: An Inclusion List | |||
| This section provides an overview of the model used to establish the | This section provides an overview of the model used to establish the | |||
| algorithm and character lists of [IDNA2008-Tables] and describes the | algorithm and character lists of [IDNA2008-Tables] and describes the | |||
| names and applicability of the categories used there. Note that the | names and applicability of the categories used there. Note that the | |||
| inclusion of a character in the first category group does not imply | inclusion of a character in the first category group does not imply | |||
| that it can be used indiscriminately; some characters are associated | that it can be used indiscriminately; some characters are associated | |||
| with contextual rules that must be applied as well. | with contextual rules that must be applied as well. | |||
| The information given in this section is provided to make the rules, | The information given in this section is provided to make the rules, | |||
| tables, and protocol easier to understand. It is not normative. The | tables, and protocol easier to understand. It is not normative. The | |||
| normative generating rules appear in [IDNA2008-Tables] and the rules | normative generating rules appear in [IDNA2008-Tables] and the rules | |||
| that actually determine what labels can be registered or looked up | that actually determine what labels can be registered or looked up | |||
| are in [IDNA2008-Protocol]. | are in [IDNA2008-Protocol]. | |||
| 6.1. A Tiered Model of Permitted Characters and Labels | 5.1. A Tiered Model of Permitted Characters and Labels | |||
| Moving to an inclusion model requires respecifying the list of | Moving to an inclusion model requires respecifying the list of | |||
| characters that are permitted in IDNs. In IDNA2003, the role and | characters that are permitted in IDNs. In IDNA2003, the role and | |||
| utility of characters are independent of context and fixed forever | utility of characters are independent of context and fixed forever | |||
| (or until the standard is replaced). Making completely context- | (or until the standard is replaced). Making completely context- | |||
| independent rules globally has proven impractical because some | independent rules globally has proven impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| no visible effect(s) in others. Of necessity, IDNA2003 prohibited | no visible effect(s) in others. Of necessity, IDNA2003 prohibited | |||
| those types of characters entirely. But the restrictions were much | those types of characters entirely. But the restrictions were much | |||
| skipping to change at page 16, line 20 ¶ | skipping to change at page 15, line 46 ¶ | |||
| but limit their use to very specific contexts was reinforced by the | but limit their use to very specific contexts was reinforced by the | |||
| observation that handling of particular characters across the | observation that handling of particular characters across the | |||
| languages that use a script, or the use of similar or identical- | languages that use a script, or the use of similar or identical- | |||
| looking characters in different scripts, is less well understood than | looking characters in different scripts, is less well understood than | |||
| many people believed it was several years ago. | many people believed it was several years ago. | |||
| Independently of the characters chosen (see next subsection), the | Independently of the characters chosen (see next subsection), the | |||
| theory is to divide the characters that appear in Unicode into three | theory is to divide the characters that appear in Unicode into three | |||
| categories: | categories: | |||
| 6.1.1. PROTOCOL-VALID | 5.1.1. PROTOCOL-VALID | |||
| Characters identified as "PROTOCOL-VALID" (often abbreviated | Characters identified as "PROTOCOL-VALID" (often abbreviated | |||
| "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | |||
| Their use may be restricted by rules about the context in which they | Their use may be restricted by rules about the context in which they | |||
| appear or by other rules that apply to the entire label in which they | appear or by other rules that apply to the entire label in which they | |||
| are to be embedded. For example, any label that contains a character | are to be embedded. For example, any label that contains a character | |||
| in this group that has a "right to left" property must be used in | in this category that has a "right-to-left" property must be used in | |||
| context with the "Bidi" rules (see [IDNA2008-Bidi]). | context with the "Bidi" rules (see [IDNA2008-Bidi]). | |||
| The term "PROTOCOL-VALID", is used to stress the fact that the | The term "PROTOCOL-VALID" is used to stress the fact that the | |||
| presence of a character in this category does not imply that a given | presence of a character in this category does not imply that a given | |||
| registry need accept registrations containing any of the characters | registry need accept registrations containing any of the characters | |||
| in the category. Registries are still expected to apply judgment | in the category. Registries are still expected to apply judgment | |||
| about labels they will accept and to maintain rules consistent with | about labels they will accept and to maintain rules consistent with | |||
| those judgments (see [IDNA2008-Protocol] and Section 6.3). | those judgments (see [IDNA2008-Protocol] and Section 5.3). | |||
| Characters that are placed in the "PROTOCOL-VALID" category are never | Characters that are placed in the "PROTOCOL-VALID" category are never | |||
| removed from it unless the code points themselves are removed from | removed from it unless the code points themselves are removed from | |||
| Unicode (such removal would be inconsistent with the Unicode | Unicode (such removal would be inconsistent with the Unicode | |||
| stability principles (see [Unicode51], Appendix F) and hence should | stability principles (see [Unicode51], Appendix F) and hence should | |||
| never occur). | never occur). | |||
| [[anchor21: Placeholder: Does this topic or comment need additional | [[anchor21: Placeholder: Does this topic or comment need additional | |||
| discussion or explanation?]] | discussion or explanation?]] | |||
| 6.1.1.1. Contextual Rules | 5.1.1.1. Contextual Rules | |||
| Some characters may be unsuitable for general use in IDNs but | Some characters may be unsuitable for general use in IDNs but | |||
| necessary for the plausible support of some scripts. The two most | necessary for the plausible support of some scripts. The two most | |||
| commonly-cited examples are the zero-width joiner and non-joiner | commonly-cited examples are the zero-width joiner and non-joiner | |||
| characters (ZWNJ, U+200C, and ZWJ, U+200D), but provisions for | characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for | |||
| unambiguous labels may require that other characters be restricted to | unambiguous labels may require that other characters be restricted to | |||
| particular contexts. For example, the ASCII hyphen is not permitted | particular contexts. For example, the ASCII hyphen is not permitted | |||
| to start or end a label, whether that label contains non-ASCII | to start or end a label, whether that label contains non-ASCII | |||
| characters or not. | characters or not. | |||
| These characters must not appear in IDNs without additional | These characters must not appear in IDNs without additional | |||
| restrictions, typically because they have no visible consequences in | restrictions, typically because they have no visible consequences in | |||
| most scripts but affect format or presentation in a few others or | most scripts but affect format or presentation in a few others or | |||
| because they are combining characters that are safe for use only in | because they are combining characters that are safe for use only in | |||
| conjunction with particular characters or scripts. In order to | conjunction with particular characters or scripts. In order to | |||
| permit them to be used at all, they are specially identified as | permit them to be used at all, they are specially identified as | |||
| "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | |||
| associated with a rule. In addition, the rule will define whether it | associated with a rule. In addition, the rule will define whether it | |||
| is to be applied on lookup as well as registration. A distinction is | is to be applied on lookup as well as registration. A distinction is | |||
| made between characters that indicate or prohibit joining (known as | made between characters that indicate or prohibit joining (known as | |||
| "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | |||
| contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | |||
| former are fully tested at lookup time. | former are fully tested at lookup time. | |||
| 6.1.1.2. Rules and Their Application | 5.1.1.2. Rules and Their Application | |||
| The actual rules may be present or absent. If present, they may have | The actual rules may be present or absent. If present, they may have | |||
| values of "True" (character may be used in any position in any | values of "True" (character may be used in any position in any | |||
| label), "False" (character may not be used in any label), or may be | label), "False" (character may not be used in any label), or may be a | |||
| an extended regular expression that specifies the context in which | set of procedural rules that specify the context in which the | |||
| the character is permitted. | character is permitted. | |||
| Examples of descriptions of typical rules, stated informally and in | Examples of descriptions of typical rules, stated informally and in | |||
| English, include "Must follow a character from Script XYZ", "MUST | English, include "Must follow a character from Script XYZ", "MUST | |||
| occur only if the entire label is in Script ABC", "MUST occur only if | occur only if the entire label is in Script ABC", "MUST occur only if | |||
| the previous and subsequent characters have the DFG property". | the previous and subsequent characters have the DFG property". | |||
| Because it is easier to identify these characters than to know that | Because it is easier to identify these characters than to know that | |||
| they are actually needed in IDNs or how to establish exactly the | they are actually needed in IDNs or how to establish exactly the | |||
| right rules for each one, a rule may have a null value in a given | right rules for each one, a rule may have a null value in a given | |||
| version of the tables. Characters associated with null rules MUST | version of the tables. Characters associated with null rules MUST | |||
| NOT appear in putative labels for either registration or lookup. Of | NOT appear in putative labels for either registration or lookup. Of | |||
| course, a later version of the tables might contain a non-null rule. | course, a later version of the tables might contain a non-null rule. | |||
| [[anchor23: Definition of regular expression language to be supplied | The description of the syntax of the rules, and the rules themselves, | |||
| or replaced with a description of the definitional technique. It may | appears in [IDNA2008-Tables]. | |||
| be useful to move more of this material to Tables as part of moving | ||||
| the rules from Protocol to Tables.]] | ||||
| 6.1.2. DISALLOWED | 5.1.2. DISALLOWED | |||
| Some characters are sufficiently problematic for use in IDNs that | Some characters are sufficiently problematic for use in IDNs that | |||
| they should be excluded for both registration and lookup (i.e., | they should be excluded for both registration and lookup (i.e., IDNA- | |||
| conforming applications performing name resolution should verify that | conforming applications performing name lookup should verify that | |||
| these characters are absent; if they are present, the label strings | these characters are absent; if they are present, the label strings | |||
| should be rejected rather than converted to A-labels and looked up). | should be rejected rather than converted to A-labels and looked up). | |||
| Of course, this category would include code points that had been | Of course, this category would include code points that had been | |||
| removed entirely from Unicode should such removals ever occur. | removed entirely from Unicode should such removals ever occur. | |||
| Characters that are placed in the "DISALLOWED" category are expected | Characters that are placed in the "DISALLOWED" category are expected | |||
| to never be removed from it or reclassified. If a character is | to never be removed from it or reclassified. If a character is | |||
| classified as "DISALLOWED" in error and the error is sufficiently | classified as "DISALLOWED" in error and the error is sufficiently | |||
| problematic, the only recourse would be either to introduce a new | problematic, the only recourse would be either to introduce a new | |||
| code point into Unicode and classify it as "PROTOCOL-VALID" or for | code point into Unicode and classify it as "PROTOCOL-VALID" or for | |||
| the IETF to accept the considerable costs of an incompatible change | the IETF to accept the considerable costs of an incompatible change | |||
| and replace the relevant RFC with one containing appropriate | and replace the relevant RFC with one containing appropriate | |||
| exceptions. | exceptions. | |||
| [[anchor24: Note in Draft: the permanence of DISALLOWED was still | [[anchor23: Note in Draft: the permanence of DISALLOWED was still | |||
| under discussion in the WG when this draft was posted. The text | under discussion in the WG when this draft was posted. The text | |||
| above reflects the editor's opinion about the emerging consensus but | above reflects the editor's opinion about the emerging consensus but | |||
| is subject to change as the discussion continues.]] | is subject to change as the discussion continues.]] | |||
| There is provision for exception cases but, in general, characters | There is provision for exception cases but, in general, characters | |||
| are placed into "DISALLOWED" if they fall into one or more of the | are placed into "DISALLOWED" if they fall into one or more of the | |||
| following groups: | following groups: | |||
| o The character is a compatibility equivalent for another character. | o The character is a compatibility equivalent for another character. | |||
| In slightly more precise Unicode terms, application of | In slightly more precise Unicode terms, application of | |||
| normalization method NFKC to the character yields some other | normalization method NFKC to the character yields some other | |||
| character. | character. | |||
| o The character is an upper-case form or some other form that is | o The character is an upper-case form or some other form that is | |||
| mapped to another character by Unicode casefolding. | mapped to another character by Unicode casefolding. | |||
| o The character is a symbol or punctuation form or, more generally, | o The character is a symbol or punctuation form or, more generally, | |||
| something that is not a letter, digit, or a mark that is used to | something that is not a letter, digit, or a mark that is used to | |||
| form a letter or digit. | form a letter or digit. | |||
| 6.1.3. UNASSIGNED | 5.1.3. UNASSIGNED | |||
| For convenience in processing and table-building, code points that do | For convenience in processing and table-building, code points that do | |||
| not have assigned values in a given version of Unicode are treated as | not have assigned values in a given version of Unicode are treated as | |||
| belonging to a special UNASSIGNED category. Such code points MUST | belonging to a special UNASSIGNED category. Such code points MUST | |||
| NOT appear in labels to be registered or looked up. The category | NOT appear in labels to be registered or looked up. The category | |||
| differs from DISALLOWED in that code points are moved out of it by | differs from DISALLOWED in that code points are moved out of it by | |||
| the simple expedient of being assigned in a later version of Unicode | the simple expedient of being assigned in a later version of Unicode | |||
| (at which point, they are classified into one of the other categories | (at which point, they are classified into one of the other categories | |||
| as appropriate). | as appropriate). | |||
| 6.2. Registration Policy | 5.2. Registration Policy | |||
| While these recommendations cannot and should not define registry | While these recommendations cannot and should not define registry | |||
| policies, registries SHOULD develop and apply additional restrictions | policies, registries SHOULD develop and apply additional restrictions | |||
| to reduce confusion and other problems. For example, it is generally | to reduce confusion and other problems. For example, it is generally | |||
| believed that labels containing characters from more than one script | believed that labels containing characters from more than one script | |||
| are a bad practice although there may be some important exceptions to | are a bad practice although there may be some important exceptions to | |||
| that principle. Some registries may choose to restrict registrations | that principle. Some registries may choose to restrict registrations | |||
| to characters drawn from a very small number of scripts. For many | to characters drawn from a very small number of scripts. For many | |||
| scripts, the use of variant techniques such as those described in | scripts, the use of variant techniques such as those described in | |||
| [RFC3743] and [RFC4290], and illustrated for Chinese by the tables | [RFC3743] and [RFC4290], and illustrated for Chinese by the tables | |||
| described in RFC 4713 [RFC4713] may be helpful in reducing problems | described in RFC 4713 [RFC4713] may be helpful in reducing problems | |||
| that might be perceived by users. It is worth stressing that these | that might be perceived by users. It is worth stressing that these | |||
| principles of policy development and application apply at all levels | principles of policy development and application apply at all levels | |||
| of the DNS, not only, e.g., TLD registrations. | of the DNS, not only, e.g., TLD registrations, and that even a | |||
| trivial, "anything permitted that is valid under the protocol" policy | ||||
| is helpful in that it helps users and application developers know | ||||
| what to expect. | ||||
| 6.3. Layered Restrictions: Tables, Context, Registration, Applications | 5.3. Layered Restrictions: Tables, Context, Registration, Applications | |||
| The essence of the character rules in IDNA2008 is based on the | The essence of the character rules in IDNA2008 is based on the | |||
| realization that there is no magic bullet for any of the issues | realization that there is no magic bullet for any of the issues | |||
| associated with a multiscript DNS. Instead, the specifications | associated with a multiscript DNS. Instead, the specifications | |||
| define a variety of approaches that, together, constitute multiple | define a variety of approaches that, together, constitute multiple | |||
| lines of defense against ambiguity in identifiers and loss of | lines of defense against ambiguity in identifiers and loss of | |||
| referential integrity. The actual character tables are the first | referential integrity. The actual character tables are the first | |||
| mechanism, protocol rules about how those characters are applied or | mechanism, protocol rules about how those characters are applied or | |||
| restricted in context are the second, and those two in combination | restricted in context are the second, and those two in combination | |||
| constitute the limits of what can be done by a protocol alone. As | constitute the limits of what can be done by a protocol alone. As | |||
| discussed in the previous section (Section 6.2), registries are | discussed in the previous section (Section 5.2), registries are | |||
| expected to restrict what they permit to be registered, devising and | expected to restrict what they permit to be registered, devising and | |||
| using rules that are designed to optimize the balance between | using rules that are designed to optimize the balance between | |||
| confusion and risk on the one hand and maximum expressiveness in | confusion and risk on the one hand and maximum expressiveness in | |||
| mnemonics on the other. | mnemonics on the other. | |||
| In addition, there is an important role for user agents in warning | In addition, there is an important role for user agents in warning | |||
| against label forms that appear unreasonable given their knowledge of | against label forms that appear unreasonable given their knowledge of | |||
| local contexts and conventions. Of course, no approach based on | local contexts and conventions. Of course, no approach based on | |||
| naming or identifiers alone can protect against all threats. | naming or identifiers alone can protect against all threats. | |||
| [[anchor25: Note in Draft: the last sentence above basically | ||||
| duplicates a comment in Security Considerations. Is it worth having | ||||
| in both places??]] | ||||
| 7. Issues that Constrain Possible Solutions | 6. Issues that Constrain Possible Solutions | |||
| 7.1. Display and Network Order | 6.1. Display and Network Order | |||
| The correct treatment of domain names requires a clear distinction | The correct treatment of domain names requires a clear distinction | |||
| between Network Order (the order in which the code points are sent in | between Network Order (the order in which the code points are sent in | |||
| protocols) and Display Order (the order in which the code points are | protocols) and Display Order (the order in which the code points are | |||
| displayed on a screen or paper). The order of labels in a domain | displayed on a screen or paper). The order of labels in a domain | |||
| name that contains characters that are normally written right to left | name that contains characters that are normally written right to left | |||
| is discussed in [IDNA2008-Bidi]. In particular, there are questions | is discussed in [IDNA2008-Bidi]. In particular, there are questions | |||
| about the order in which labels are displayed if left to right and | about the order in which labels are displayed if left to right and | |||
| right to left labels are adjacent to each other, especially if there | right to left labels are adjacent to each other, especially if there | |||
| are also multiple consecutive appearances of one of the types. The | are also multiple consecutive appearances of one of the types. The | |||
| skipping to change at page 21, line 14 ¶ | skipping to change at page 20, line 43 ¶ | |||
| issues. | issues. | |||
| It should be obvious that any revision of IDNA, including the current | It should be obvious that any revision of IDNA, including the current | |||
| one, must be clear about the network (transmission on the wire) order | one, must be clear about the network (transmission on the wire) order | |||
| of characters in labels and for the labels in complete (fully- | of characters in labels and for the labels in complete (fully- | |||
| qualified) domain names. In order to prevent user confusion and, in | qualified) domain names. In order to prevent user confusion and, in | |||
| particular, to reduce the chances for inconsistent transcription of | particular, to reduce the chances for inconsistent transcription of | |||
| domain names from printed form, it is likely that some strong | domain names from printed form, it is likely that some strong | |||
| suggestions should be made about display order as well. | suggestions should be made about display order as well. | |||
| 7.2. Entry and Display in Applications | 6.2. Entry and Display in Applications | |||
| Applications can accept domain names using any character set or sets | Applications can accept domain names using any character set or sets | |||
| desired by the application developer or specified by the operating | desired by the application developer or specified by the operating | |||
| system, and can display domain names in any charset. That is, the | system, and can display domain names in any charset. That is, the | |||
| IDNA protocol does not affect the interface between users and | IDNA protocol does not affect the interface between users and | |||
| applications. | applications. | |||
| An IDNA-aware application can accept and display internationalized | An IDNA-aware application can accept and display internationalized | |||
| domain names in two formats: the internationalized character set(s) | domain names in two formats: the internationalized character set(s) | |||
| supported by the application (i.e., an appropriate local | supported by the application (i.e., an appropriate local | |||
| representation of a U-label), and as an A-label. Applications MAY | representation of a U-label), and as an A-label. Applications MAY | |||
| allow the display and user input of A-labels, but are encouraged to | allow the display of A-labels, but are encouraged to not do so except | |||
| not do so except as an interface for special purposes, possibly for | as an interface for special purposes, possibly for debugging, or to | |||
| debugging, or to cope with display limitations. A-labels are opaque | cope with display limitations. In general, they SHOULD allow, but | |||
| not encourage, user input of that label form. A-labels are opaque | ||||
| and ugly, and, where possible, should thus only be exposed to users | and ugly, and, where possible, should thus only be exposed to users | |||
| and in contexts in which they are absolutely needed. Because IDN | and in contexts in which they are absolutely needed. Because IDN | |||
| labels can be rendered either as the A-labels or U-labels, the | labels can be rendered either as A-labels or U-labels, the | |||
| application may reasonably have an option for the user to select the | application may reasonably have an option for the user to select the | |||
| preferred method of display; if it does, rendering the U-label should | preferred method of display; if it does, rendering the U-label should | |||
| normally be the default. | normally be the default. | |||
| Domain names are often stored and transported in many places. For | Domain names are often stored and transported in many places. For | |||
| example, they are part of documents such as mail messages and web | example, they are part of documents such as mail messages and web | |||
| pages. They are transported in many parts of many protocols, such as | pages. They are transported in many parts of many protocols, such as | |||
| both the control commands and the RFC 2822 body parts of SMTP, and | both the control commands and the RFC 2822 body parts of SMTP, and | |||
| the headers and the body content in HTTP. It is important to | the headers and the body content in HTTP. It is important to | |||
| remember that domain names appear both in domain name slots and in | remember that domain names appear both in domain name slots and in | |||
| skipping to change at page 22, line 17 ¶ | skipping to change at page 21, line 47 ¶ | |||
| transmitted using whatever character encoding and escape mechanism | transmitted using whatever character encoding and escape mechanism | |||
| the protocol or document format uses at that place. This provision | the protocol or document format uses at that place. This provision | |||
| is intended to prevent situations in which, e.g., UTF-8 domain names | is intended to prevent situations in which, e.g., UTF-8 domain names | |||
| appear embedded in text that is otherwise in some other character | appear embedded in text that is otherwise in some other character | |||
| coding. | coding. | |||
| All protocols that use domain name slots already have the capacity | All protocols that use domain name slots already have the capacity | |||
| for handling domain names in the ASCII charset. Thus, A-labels can | for handling domain names in the ASCII charset. Thus, A-labels can | |||
| inherently be handled by those protocols. | inherently be handled by those protocols. | |||
| 7.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | 6.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | |||
| Character Forms | Character Forms | |||
| Users often have expectations about character matching or equivalence | Users often have expectations about character matching or equivalence | |||
| that are based on their languages and the orthography of those | that are based on their languages and the orthography of those | |||
| languages. These expectations may not be consistent with forms or | languages. These expectations may not be consistent with forms or | |||
| actions that can be naturally accommodated in a character coding | actions that can be naturally accommodated in a character coding | |||
| system, especially if multiple languages are written using the same | system, especially if multiple languages are written using the same | |||
| script but using different conventions. A Norwegian user might | script but using different conventions. A Norwegian user might | |||
| expect a label with the ae-ligature to be treated as the same label | expect a label with the ae-ligature to be treated as the same label | |||
| as one using the Swedish spelling with a-umlaut even though applying | as one using the Swedish spelling with a-umlaut even though applying | |||
| skipping to change at page 23, line 31 ¶ | skipping to change at page 23, line 14 ¶ | |||
| current orthographic standards. | current orthographic standards. | |||
| That character (U+00E4) is also part of the German alphabet where, | That character (U+00E4) is also part of the German alphabet where, | |||
| unlike in the Nordic languages, the two-character sequence "ae" is | unlike in the Nordic languages, the two-character sequence "ae" is | |||
| usually treated as a fully acceptable alternate orthography for the | usually treated as a fully acceptable alternate orthography for the | |||
| "umlauted a" character. The inverse is however not true, and those | "umlauted a" character. The inverse is however not true, and those | |||
| two characters cannot necessarily be combined into an "umlauted a". | two characters cannot necessarily be combined into an "umlauted a". | |||
| This also applies to another German character, the "umlauted o" | This also applies to another German character, the "umlauted o" | |||
| (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example, | (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example, | |||
| cannot be used for writing the name of the author "Goethe". It is | cannot be used for writing the name of the author "Goethe". It is | |||
| also a letter in the Swedish alphabet where, in parallel to the | also a letter in the Swedish alphabet where, like the "umlauted a", | |||
| "umlauted a", it cannot be correctly represented as "oe" and in the | it cannot be correctly represented as "oe" and in the Norwegian | |||
| Norwegian alphabet, where it is represented, not as "umlauted o", but | alphabet, where it is represented, not as "umlauted o", but as | |||
| as "slashed o", U+00F8. | "slashed o", U+00F8. | |||
| Some of the ligatures that have explicit code points in Unicode were | Some of the ligatures that have explicit code points in Unicode were | |||
| given special handling in IDNA2003 and now pose additional problems | given special handling in IDNA2003 and now pose additional problems | |||
| as people argue that they should have been treated differently to | as people argue that they should have been treated differently to | |||
| preserve important information. For example, the German character | preserve important information. For example, the German character | |||
| Eszett (Sharp S, U+00DF) is retained as itself by NFKC but case- | Eszett (Sharp S, U+00DF) is retained as itself by NFKC but case- | |||
| folded by Stringprep to "ss", but the closely-related, but less | folded by Stringprep to "ss", but the closely-related, but less | |||
| frequently seen, character "Long S T" (U+FB05) is a compatibility | frequently seen, character "Long S T" (U+FB05) is a compatibility | |||
| character that is mapped out by NFKC. Unless exceptions are made, | character that is mapped out by NFKC. Unless exceptions are made, | |||
| both will be treated as DISALLOWED by IDNA2008. But there is | both will be treated as DISALLOWED by IDNA2008. But there is | |||
| significant interest in an exception, especially for Eszett. | significant interest in an exception, especially for Eszett. | |||
| Depending on what the exception was, making it would either raise | Depending on what the exception was, making it would either raise | |||
| some backward compatibility problems with IDNA2003 or create an | some backward compatibility problems with IDNA2003 or create an | |||
| unusual special case that would highlight differences in preferred | unusual special case that would highlight differences in preferred | |||
| orthography between German as written in Germany and German as | orthography between German as written in Germany and German as | |||
| written in some other countries, notably Switzerland. Additional | written in some other countries, notably Switzerland. Additional | |||
| discussion of issues with Eszett appears in Section 10.7. | discussion of issues with Eszett appears in Section 9.2.7. | |||
| Additional cases with alphabets written right to left are described | Additional cases with alphabets written right to left are described | |||
| in Section 7.5. | in Section 6.5. | |||
| Whether ligatures and digraphs are to be treated as a sequence of | Whether ligatures and digraphs are to be treated as a sequence of | |||
| characters or as a single standalone one constitutes a problem that | characters or as a single standalone one constitutes a problem that | |||
| cannot be resolved solely by operating on scripts. They are, | cannot be resolved solely by operating on scripts. They are, | |||
| however, a key concern in the IDN context. Their satisfactory | however, a key concern in the IDN context. Their satisfactory | |||
| resolution will require support in policies set by registries, which | resolution will require support in policies set by registries, which | |||
| therefore need to be particularly mindful not just of this specific | therefore need to be particularly mindful not just of this specific | |||
| issue, but of all other related matters that cannot be dealt with on | issue, but of all other related matters that cannot be dealt with on | |||
| an exclusively algorithmic basis. | an exclusively algorithmic basis. | |||
| skipping to change at page 24, line 33 ¶ | skipping to change at page 24, line 16 ¶ | |||
| combined characters in any special way. However, their existence | combined characters in any special way. However, their existence | |||
| provides a prime example of a situation in which a registry that is | provides a prime example of a situation in which a registry that is | |||
| aware of the language context in which labels are to be registered, | aware of the language context in which labels are to be registered, | |||
| and where that language sometimes (or always) treats the two- | and where that language sometimes (or always) treats the two- | |||
| character sequences as equivalent to the combined form, should give | character sequences as equivalent to the combined form, should give | |||
| serious consideration to applying a "variant" model [RFC3743] | serious consideration to applying a "variant" model [RFC3743] | |||
| [RFC4290] to reduce the opportunities for user confusion and fraud | [RFC4290] to reduce the opportunities for user confusion and fraud | |||
| that would result from the related strings being registered to | that would result from the related strings being registered to | |||
| different parties. | different parties. | |||
| 7.4. Case Mapping and Related Issues | 6.4. Case Mapping and Related Issues | |||
| Traditionally in the DNS, ASCII letters have been stored with their | Traditionally in the DNS, ASCII letters have been stored with their | |||
| case preserved. Matching during the query process has been case- | case preserved. Matching during the query process has been case- | |||
| independent, but none of the information that might be represented by | independent, but none of the information that might be represented by | |||
| choices of case has been lost. That model has been accidentally | choices of case has been lost. That model has been accidentally | |||
| helpful because, as people have created DNS labels by catenating | helpful because, as people have created DNS labels by catenating | |||
| words (or parts of words) to form labels, case has often been used to | words (or parts of words) to form labels, case has often been used to | |||
| distinguish among components and make the labels more memorable. | distinguish among components and make the labels more memorable. | |||
| The solution of keeping the characters separate but doing matching | The solution of keeping the characters separate but doing matching | |||
| skipping to change at page 25, line 13 ¶ | skipping to change at page 24, line 45 ¶ | |||
| permits, at the risk of some incompatibility, slightly more | permits, at the risk of some incompatibility, slightly more | |||
| flexibility in this area. That additional flexibility still does not | flexibility in this area. That additional flexibility still does not | |||
| solve the problem with final form sigma and other characters that | solve the problem with final form sigma and other characters that | |||
| Unicode treats as completely separate characters that match only | Unicode treats as completely separate characters that match only | |||
| under casemapping if at all. Many people now believe these should be | under casemapping if at all. Many people now believe these should be | |||
| handled as separate characters so information about them can be | handled as separate characters so information about them can be | |||
| preserved in the transformations to A-labels and back. However, | preserved in the transformations to A-labels and back. However, | |||
| making a change to permit that behavior would create a situation in | making a change to permit that behavior would create a situation in | |||
| which the same string, valid in both protocols, would be interpreted | which the same string, valid in both protocols, would be interpreted | |||
| differently by IDNA2003 and IDNA2008. In principle, that would | differently by IDNA2003 and IDNA2008. In principle, that would | |||
| violate one of the conditions discussed in Section 10.3.1 and hence | violate one of the conditions discussed in Section 9.2.3.1 and hence | |||
| require a prefix change. Of course, if a prefix change were made (at | require a prefix change. Of course, if a prefix change were made (at | |||
| the costs discussed in Section 10.3.3) there would be several | the costs discussed in Section 9.2.3.3) there would be several | |||
| options, including, if desired, assigning the character to the | options, including, if desired, assigning the character to the | |||
| CONTEXTUAL RULE REQUIRED category and requiring that it only be used | CONTEXTUAL RULE REQUIRED category and requiring that it only be used | |||
| in carefully-selected contexts. | in carefully-selected contexts. | |||
| 7.5. Right to Left Text | 6.5. Right to Left Text | |||
| In order to be sure that the directionality of right to left text is | In order to be sure that the directionality of right to left text is | |||
| unambiguous, IDNA2003 required that any label in which right to left | unambiguous, IDNA2003 required that any label in which right to left | |||
| characters appear both starts and ends with them, may not include any | characters appear both starts and ends with them, may not include any | |||
| characters with strong left to right properties (which excludes other | characters with strong left to right properties (which excludes other | |||
| alphabetic characters but permits European digits), and rejects any | alphabetic characters but permits European digits), and rejects any | |||
| other string that contains a right to left character. This is one of | other string that contains a right to left character. This is one of | |||
| the few places where the IDNA algorithms (both old and new) are | the few places where the IDNA algorithms (both old and new) are | |||
| required to look at an entire label, not just at individual | required to look at an entire label, not just at individual | |||
| characters. The algorithmic model used in IDNA2003 rejects the label | characters. The algorithmic model used in IDNA2003 rejects the label | |||
| skipping to change at page 25, line 44 ¶ | skipping to change at page 25, line 29 ¶ | |||
| This problem manifests itself in languages written with consonantal | This problem manifests itself in languages written with consonantal | |||
| alphabets to which diacritical vocalic systems are applied, and in | alphabets to which diacritical vocalic systems are applied, and in | |||
| languages with orthographies derived from them where the combining | languages with orthographies derived from them where the combining | |||
| marks may have different functionality. In both cases the combining | marks may have different functionality. In both cases the combining | |||
| marks can be essential components of the orthography. Examples of | marks can be essential components of the orthography. Examples of | |||
| this are Yiddish, written with an extended Hebrew script, and Dhivehi | this are Yiddish, written with an extended Hebrew script, and Dhivehi | |||
| (the official language of Maldives) which is written in the Thaana | (the official language of Maldives) which is written in the Thaana | |||
| script (which is, in turn, derived from the Arabic script). The new | script (which is, in turn, derived from the Arabic script). The new | |||
| rules for right to left scripts are described in [IDNA2008-Bidi]. | rules for right to left scripts are described in [IDNA2008-Bidi]. | |||
| 8. IDNs and the Robustness Principle | 7. IDNs and the Robustness Principle | |||
| The model of IDNs described in this document can be seen as a | The model of IDNs described in this document can be seen as a | |||
| particular instance of the "Robustness Principle" that has been so | particular instance of the "Robustness Principle" that has been so | |||
| important to other aspects of Internet protocol design. This | important to other aspects of Internet protocol design. This | |||
| principle is often stated as "Be conservative about what you send and | principle is often stated as "Be conservative about what you send and | |||
| liberal in what you accept" (See, e.g., RFC 1123, Section 1.2.2 | liberal in what you accept" (See, e.g., RFC 1123, Section 1.2.2 | |||
| [RFC1123]). For IDNs to work well, not only must the protocol be | [RFC1123]). For IDNs to work well, not only must the protocol be | |||
| carefully designed and implemented, but zone administrators | carefully designed and implemented, but zone administrators | |||
| (registries) must have and require sensible policies about what is | (registries) must have and require sensible policies about what is | |||
| registered -- conservative policies -- and implement and enforce | registered -- conservative policies -- and implement and enforce | |||
| them. | them. | |||
| Conversely, resolvers can (and SHOULD or maybe MUST) reject labels | Conversely, lookup applications can (and SHOULD or maybe MUST) reject | |||
| that clearly violate global (protocol) rules (no one has ever | labels that clearly violate global (protocol) rules (no one has ever | |||
| seriously claimed that being liberal in what is accepted requires | seriously claimed that being liberal in what is accepted requires | |||
| being stupid). However, once one gets past such global rules and | being stupid). However, once one gets past such global rules and | |||
| deals with anything sensitive to script or locale, it is necessary to | deals with anything sensitive to script or locale, it is necessary to | |||
| assume that garbage has not been placed into the DNS, i.e., one must | assume that garbage has not been placed into the DNS, i.e., one must | |||
| be liberal about what one is willing to look up in the DNS rather | be liberal about what one is willing to look up in the DNS rather | |||
| than guessing about whether it should have been permitted to be | than guessing about whether it should have been permitted to be | |||
| registered. | registered. | |||
| As mentioned elsewhere, if a string doesn't resolve, it makes no | As mentioned elsewhere, if a string cannot be successfully found in | |||
| the DNS after the lookup processing described here, it makes no | ||||
| difference whether it simply wasn't registered or was prohibited by | difference whether it simply wasn't registered or was prohibited by | |||
| some rule. | some rule. | |||
| If resolvers, as a user interface (UI) or other local matter, decide | If lookup applications, as a user interface (UI) or other local | |||
| to warn about some strings that are valid under the global rules but | matter, decide to warn about some strings that are valid under the | |||
| that they perceive as dangerous, that is their prerogative and we can | global rules but that they perceive as dangerous, that is their | |||
| only hope that the market (and maybe regulators) will reinforce the | prerogative and we can only hope that the market (and maybe | |||
| good choices and discourage the poor ones. In this context, a | regulators) will reinforce the good choices and discourage the poor | |||
| resolver that decides a string that is valid under the protocol is | ones. In this context, a lookup application that decides a string | |||
| dangerous and refuses to look it up is in violation of the protocols; | that is valid under the protocol is dangerous and refuses to look it | |||
| one that is willing to look something up, but warns against it, is | up is in violation of the protocols; one that is willing to look | |||
| exercising a local choice. | something up, but warns against it, is exercising a local choice. | |||
| 9. Front-end and User Interface Processing | 8. Front-end and User Interface Processing | |||
| Domain names may be identified and processed in many contexts. They | Domain names may be identified and processed in many contexts. They | |||
| may be typed in by users either by themselves or as part of URIs or | may be typed in by users either by themselves or as part of URIs or | |||
| IRIs. They may occur in running text or be processed by one system | IRIs. They may occur in running text or be processed by one system | |||
| after being provided in another. Systems may wish to try to | after being provided in another. Systems may wish to try to | |||
| normalize URLs so as to determine (or guess) whether a reference is | normalize URLs so as to determine (or guess) whether a reference is | |||
| valid or two references point to the same object without actually | valid or two references point to the same object without actually | |||
| looking the objects up and comparing them. Some of these goals may | looking the objects up and comparing them (that is necessary, not | |||
| be more easily and reliably satisfied than others. While there are | just a choice, for URI types that are not intended to be resolved). | |||
| strong arguments for any domain name that is placed "on the wire" -- | Some of these goals may be more easily and reliably satisfied than | |||
| transmitted between systems -- to be in the minimum-ambiguity forms | others. While there are strong arguments for any domain name that is | |||
| of A-labels, U-labels, or LDH-labels, it is inevitable that programs | placed "on the wire" -- transmitted between systems -- to be in the | |||
| that process domain names will encounter variant forms. One source | minimum-ambiguity forms of A-labels, U-labels, or LDH-labels, it is | |||
| of such forms will be labels created under IDNA2003. Because of the | inevitable that programs that process domain names will encounter | |||
| way that protocol was specified, there are a significant number of | variant forms. | |||
| domain names in files on the Internet that use characters that cannot | ||||
| be represented directly in domain names but for which interpretations | One source of such forms will be labels created under IDNA2003 | |||
| are provided. There are two major categories of such characters, | because that protocol allowed labels that were transformed before | |||
| those that are removed by NFKC normalization and those upper-case | they were turned from native-character into ACE ("xn--...") format. | |||
| characters that are mapped to lower-case (there are also a few | One consequence of the transformations was that, when the ToUnicode | |||
| characters that are given special-case mapping treatment in | and ToASCII operations of IDNA2003 were applied, | |||
| Stringprep). [[anchor29: The text above is too obscure, but was | ToUnicode(ToASCII(original-label)) often did not produce the | |||
| intended to address the mapping differences between IDNA2003 and the | original-label. IDNA2008 explicitly defines A-labels and U-labels as | |||
| current proposal. Patrik suggests the following, which will need | different forms of the same abstract label, forms that are stable | |||
| some tuning before it can be inserted: One source of such forms will | when conversions are performed between them, without mappings. A | |||
| be labels created under IDNA2003, as some allowed labels were | different way of explaining this is that there are, today, domain | |||
| transformed before they were turned into their ASCII (xn--) form so | names in files on the Internet that use characters that cannot be | |||
| that ToUnicode(ToASCII(label)) != label. This is why IDNA2008 | represented directly in, or recovered from, (A-label) domain names | |||
| explicitly defines A-label and U-label as forms of the label that | but for which interpretations are provided by IDNA2003. There are | |||
| are stable when converting between A-label and U-label, without | two major categories of such characters, those that are removed by | |||
| mappings. A different way of explaining this is that there could be | NFKC normalization and those upper-case characters that are mapped to | |||
| already today domain names in files on the Internet that use | ||||
| characters that cannot be represented directly in domain names but | ||||
| for which interpretations are provided. There are two major | ||||
| categories of such characters, those that are removed by NFKC | ||||
| normalization and those upper-case characters that are mapped to | ||||
| lower-case (there are also a few characters that are given special- | lower-case (there are also a few characters that are given special- | |||
| case mapping treatment in Stringprep)."]] | case mapping treatment in Stringprep). | |||
| Other issues in domain name identification and processing arise | Other issues in domain name identification and processing arise | |||
| because IDNA2003 specified that several other characters be treated | because IDNA2003 specified that several other characters be treated | |||
| as equivalent to the ASCII period (dot, full stop) character used as | as equivalent to the ASCII period (dot, full stop) character used as | |||
| a label separator. If a domain name appears in an arbitrary context | a label separator. If a string that might be a domain name appears | |||
| (such as running text), it is difficult, even with only ASCII | in an arbitrary context (such as running text), it is difficult, even | |||
| characters, to know whether a domain name (or a protocol parameter | with only ASCII characters, to know whether an actual domain name (or | |||
| like a URI) is present and where it starts and ends. When using | a protocol parameter like a URI) is present and where it starts and | |||
| Unicode this gets even more difficult if treatment of certain special | ends. When using Unicode, this gets even more difficult if treatment | |||
| characters (like the dot that separates labels in a domain name) | of certain special characters (like the dot that separates labels in | |||
| depends on context. That problem occurs if the dot is part of a | a domain name) depends on context (e.g., prior knowledge of whether | |||
| domain name or not, which would mean that, contrary to common | the string represents a domain name or not). That knowledge is not | |||
| practice today, the primary heuristic for identifying a domain name | available if the primary heuristic for identifying the presence of | |||
| depends on dots separating strings with no intervening spaces. | domain names in strings depends on the presence of dots separating | |||
| [[anchor30: Above text is a substitute for an earlier (pre -01) | groups of characters with no intervening spaces. | |||
| [[anchor27: Above text is a substitute for an earlier (pre -01) | ||||
| version and is hoped to be more clear. Comments and improvements | version and is hoped to be more clear. Comments and improvements | |||
| welcome.]] | welcome.]] | |||
| As discussed elsewhere in this document, the IDNA2008 model removes | As discussed elsewhere in this document, the IDNA2008 model removes | |||
| all of these mappings and interpretations, including the equivalence | all of these mappings and interpretations, including the equivalence | |||
| of different forms of dots, from the protocol, leaving such mappings | of different forms of dots, from the protocol, discouraging such | |||
| to local processing. This should not be taken to imply that local | mappings and leaving them, when necessary, to local processing. This | |||
| processing is optional or can be avoided entirely. Instead, unless | should not be taken to imply that local processing is optional or can | |||
| the program context is such that it is known that any IDNs that | be avoided entirely. Instead, unless the program context is such | |||
| appear will be either U-labels or A-labels, some local processing of | that it is known that any IDNs that appear will be either U-labels or | |||
| apparent domain name strings will be required, both to maintain | A-labels, or that other forms can safely be rejected, some local | |||
| compatibility with IDNA2003 and to prevent user astonishment. Such | processing of apparent domain name strings will be required, both to | |||
| local processing, while not specified in this document or the | maintain compatibility with IDNA2003 and to prevent user | |||
| associated ones, will generally take one of two forms: | astonishment. Such local processing, while not specified in this | |||
| document or the associated ones, will generally take one of two | ||||
| forms: | ||||
| o Generic Preprocessing. | o Generic Preprocessing. | |||
| When the context in which the program or system that processes | When the context in which the program or system that processes | |||
| domain names operates is global, a reasonable balance must be | domain names operates is global, a reasonable balance must be | |||
| found that is sensitive to the broad range of local needs and | found that is sensitive to the broad range of local needs and | |||
| assumptions while, at the same time, not sacrificing the needs of | assumptions while, at the same time, not sacrificing the needs of | |||
| one language, script, or user population to those of another. | one language, script, or user population to those of another. | |||
| For this case, the best practice will usually be to apply NFKC and | For this case, the best practice will usually be to apply NFKC and | |||
| case-mapping (or, perhaps better yet, Stringprep itself), plus | case-mapping (or, perhaps better yet, Stringprep itself), plus | |||
| skipping to change at page 28, line 35 ¶ | skipping to change at page 28, line 18 ¶ | |||
| software will be highly localized for a particular environment and | software will be highly localized for a particular environment and | |||
| carefully adapted to the expectations of users in that | carefully adapted to the expectations of users in that | |||
| environment. The many discussions about using the Internet to | environment. The many discussions about using the Internet to | |||
| preserve and support local cultures suggest that these cases may | preserve and support local cultures suggest that these cases may | |||
| be more common in the future than they have been so far. | be more common in the future than they have been so far. | |||
| In these cases, we should avoid trying to tell implementers what | In these cases, we should avoid trying to tell implementers what | |||
| they should do, if only because they are quite likely (and for | they should do, if only because they are quite likely (and for | |||
| good reason) to ignore us. We would assume that they would map | good reason) to ignore us. We would assume that they would map | |||
| characters that the intuitions of their users would suggest be | characters that the intuitions of their users would suggest be | |||
| mapped. One can imagine switches about whether some sorts of | mapped and would hope that they would do that mapping as early as | |||
| mappings occur, warnings before applying them or, in a slightly | possible, storing A-label or U-label forms in files and | |||
| more extreme version of the approach taken in Internet Explorer | transporting only those forms between systems. One can imagine | |||
| version 7 (IE7), utterly refuse to handle "strange" characters at | switches about whether some sorts of mappings occur, warnings | |||
| all if they appear in U-label form. None of those local decisions | before applying them or, in a slightly more extreme version of the | |||
| are a threat to interoperability as long as (i) only U-labels and | approach taken in Internet Explorer version 7 (IE7), systems that | |||
| utterly refuse to handle "strange" characters at all if they | ||||
| appear in U-label form. None of those local decisions are a | ||||
| threat to interoperability as long as (i) only U-labels and | ||||
| A-labels are used in interchange with systems outside the local | A-labels are used in interchange with systems outside the local | |||
| environment, (ii) no character that would be valid in a U-label as | environment, (ii) no character that would be valid in a U-label as | |||
| itself is mapped to something else, (iii) any local mappings are | itself is mapped to something else, (iii) any local mappings are | |||
| applied as a preprocessing step (or, for conversions from U-labels | applied as a preprocessing step (or, for conversions from U-labels | |||
| or A-labels to presentation forms, postprocessing), not as part of | or A-labels to presentation forms, postprocessing), not as part of | |||
| IDNA processing proper, and (iv) appropriate consideration is | IDNA processing proper, and (iv) appropriate consideration is | |||
| given to labels that might have entered the environment in | given to labels that might have entered the environment in | |||
| conformance to IDNA2003. [[anchor31: Placeholder: there have been | conformance to IDNA2003. [[anchor28: Placeholder: there have been | |||
| suggestions that this text be removed entirely. Comments (or | suggestions that this text be removed entirely. Comments (or | |||
| improved text) welcome.]] | improved text) welcome.]] | |||
| 10. Migration and Version Synchronization | In either case, it is vital that user interface designs and, where | |||
| the interfaces are not sufficient, users, be aware that the only | ||||
| forms of domain names that this protocol anticipates will resolve | ||||
| globally or compare equal when crude methods (i.e., those not | ||||
| conforming to Section 1.5.4.4) are used are those in which all | ||||
| native-script labels are in U-label form. Forms that assume mapping | ||||
| will occur, especially forms that were not valid under IDNA2003, may | ||||
| or may not function in predictable ways across all implementations. | ||||
| 10.1. Design Criteria | 9. Relationship to IDNA2003 and Earlier Versions of Unicode | |||
| 9.1. Summary of Major Changes from IDNA2003 | ||||
| 1. Change the base character set from Unicode 3.2 to one that is | ||||
| Unicode version-agnostic. | ||||
| 2. Separate the definitions for the "registration" and "lookup" | ||||
| activities. | ||||
| 3. Disallow symbol and punctuation characters except where special | ||||
| exceptions are necessary. | ||||
| 4. Remove the mapping and normalization steps from the protocol and | ||||
| have them instead done by the applications themselves, possibly | ||||
| in a local fashion, before invoking the protocol. | ||||
| 5. Change the way that the protocol specifies which characters are | ||||
| allowed in labels from "humans decide what the table of | ||||
| codepoints contains" to "decisions about codepoints are based on | ||||
| Unicode properties plus a small exclusion list created by | ||||
| humans". | ||||
| 6. Introduce the new concept of characters that can be used only in | ||||
| specific contexts. | ||||
| 7. Allow typical words and names in languages such as Dhivehi and | ||||
| Yiddish to be expressed. | ||||
| 8. Make bidirectional domain names (delimited strings of labels, | ||||
| not just labels standing on their own) display in a non- | ||||
| surprising fashion whether they appear in obvious domain name | ||||
| contexts or as part of running text in paragraphs. | ||||
| 9. Remove the dot separator from the mandatory part of the | ||||
| protocol. | ||||
| 10. Make some currently-valid labels that are not actually IDNA | ||||
| labels invalid. | ||||
| 9.2. Migration and Version Synchronization | ||||
| 9.2.1. Design Criteria | ||||
| As mentioned above and in RFC 4690, two key goals of this work are to | As mentioned above and in RFC 4690, two key goals of this work are to | |||
| enable applications to be agnostic about whether they are being run | enable applications to be agnostic about whether they are being run | |||
| in environments supporting any Unicode version from 3.2 onward and to | in environments supporting any Unicode version from 3.2 onward and to | |||
| permit incrementally adding permitted scripts and other character | permit incrementally adding permitted scripts and other character | |||
| collections without disruption or, subsequent to this version, | collections without disruption or, subsequent to this version, | |||
| "heavy" processes such as formation of an IETF WG. The mechanisms | "heavy" processes such as formation of an IETF WG. The mechanisms | |||
| that support this are outlined above, but this section reviews them | that support this are outlined above, but this section reviews them | |||
| in a context that may be more helpful to those who need to understand | in a context that may be more helpful to those who need to understand | |||
| the approach and make plans for it. | the approach and make plans for it. | |||
| 10.1.1. General IDNA Validity Criteria | 9.2.1.1. General IDNA Validity Criteria | |||
| The general criteria for a putative label, and the collection of | The general criteria for a putative label, and the collection of | |||
| characters that make it up, to be considered IDNA-valid are: | characters that make it up, to be considered IDNA-valid are: | |||
| o The characters are "letters", marks needed to form letters, | o The characters are "letters", marks needed to form letters, | |||
| numerals, or other code points used to write words in some | numerals, or other code points used to write words in some | |||
| language. Symbols, drawing characters, and various notational | language. Symbols, drawing characters, and various notational | |||
| characters are permanently excluded -- some because they are | characters are permanently excluded -- some because they are | |||
| actively dangerous in URI, IRI, or similar contexts and others | actively dangerous in URI, IRI, or similar contexts and others | |||
| because there is no evidence that they are important enough to | because there is no evidence that they are important enough to | |||
| Internet operations or internationalization to justify inclusion | Internet operations or internationalization to justify inclusion | |||
| and the complexities that would come with it (additional | and the complexities that would come with it (additional | |||
| discussion and rationale for the symbol decision appears in | discussion and rationale for the symbol decision appears in | |||
| Section 10.5). | Section 9.2.5). | |||
| o Other than in very exceptional cases, e.g., where they are needed | o Other than in very exceptional cases, e.g., where they are needed | |||
| to write substantially any word of a given language, punctuation | to write substantially any word of a given language, punctuation | |||
| characters are excluded as well. The fact that a word exists is | characters are excluded as well. The fact that a word exists is | |||
| not proof that it should be usable in a DNS label and DNS labels | not proof that it should be usable in a DNS label and DNS labels | |||
| are not expected to be usable for multiple-word phrases (although | are not expected to be usable for multiple-word phrases (although | |||
| they are certainly not prohibited if the conventions and | they are certainly not prohibited if the conventions and | |||
| orthography of a particular language cause that to be possible). | orthography of a particular language cause that to be possible). | |||
| Even for English, very common constructions -- contractions like | Even for English, very common constructions -- contractions like | |||
| "don't" or "it's", names that are written with apostrophes such as | "don't" or "it's", names that are written with apostrophes such as | |||
| "O'Reilly" or characters for which apostrophes are common | "O'Reilly" or characters for which apostrophes are common | |||
| substitutes, and words whose usually-preferred spellings retain | substitutes, and words whose usually-preferred spellings retain | |||
| diacritical marks from earlier forms -- cannot be represented in | diacritical marks from earlier forms -- cannot be represented in | |||
| DNS labels. | DNS labels. | |||
| o Characters that are unassigned (have no character assignment at | o Characters that are unassigned (have no character assignment at | |||
| all) in the version of Unicode being used by the registry or | all) in the version of Unicode being used by the registry or | |||
| application are not permitted, even on resolution (lookup). There | application are not permitted, even on lookup. There are at least | |||
| are at least two reasons for this. Tests involving the context of | two reasons for this. Tests involving the context of characters | |||
| characters (e.g., some characters being permitted only adjacent to | (e.g., some characters being permitted only adjacent to ones of | |||
| ones of specific types but otherwise invisible or very problematic | specific types but otherwise invisible or very problematic for | |||
| for other reasons) and integrity tests on complete labels are | other reasons) and integrity tests on complete labels are needed. | |||
| needed. Unassigned code points cannot be permitted because one | Unassigned code points cannot be permitted because one cannot | |||
| cannot determine whether particular code points will require | determine whether particular code points will require contextual | |||
| contextual rules (and what those rules should be) before | rules (and what those rules should be) before characters are | |||
| characters are assigned to them and the properties of those | assigned to them and the properties of those characters fully | |||
| characters fully understood. Second, Unicode specifies that an | understood. Second, Unicode specifies that an unassigned code | |||
| unassigned code point normalizes and case folds to itself. If the | point normalizes and case folds to itself. If the code point is | |||
| code point is later assigned to a character, and particularly if | later assigned to a character, and particularly if the newly- | |||
| the newly-assigned code point has a combining class that | assigned code point has a combining class that determines its | |||
| determines its placement relative to other combining characters, | placement relative to other combining characters, it could | |||
| it could normalize to some other code point or sequence, creating | normalize to some other code point or sequence, creating confusion | |||
| confusion and/or violating other rules listed here. | and/or violating other rules listed here. | |||
| o Any character that is mapped to another character by Nameprep2003 | o Any character that is mapped to another character by Nameprep2003 | |||
| or by a current version of NFKC is prohibited as input to IDNA | or by a current version of NFKC is prohibited as input to IDNA | |||
| (for either registration or resolution). Implementers of user | (for either registration or lookup). Implementers of user | |||
| interfaces to applications are free to make those conversions when | interfaces to applications are free to make those conversions when | |||
| they consider them suitable for their operating system | they consider them suitable for their operating system | |||
| environments, context, or users. | environments, context, or users. | |||
| Tables used to identify the characters that are IDNA-valid are | Tables used to identify the characters that are IDNA-valid are | |||
| expected to be driven by the principles above (described in more | expected to be driven by the principles above (described in more | |||
| precise form in [IDNA2008-Tables]). The principles are not just an | precise form in [IDNA2008-Tables]). The principles are not just an | |||
| interpretation of the tables. | interpretation of the tables. | |||
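
As an illustration of the unassigned-code-point criterion above, the following is a minimal sketch, not part of either draft. It assumes that the Unicode data bundled with the Python interpreter stands in for "the version of Unicode being used by the registry or application"; the function name `unassigned_code_points` is hypothetical. General category `Cn` covers both unassigned code points and noncharacters, so the check is slightly broader than the text requires.

```python
import unicodedata


def unassigned_code_points(label: str) -> list[str]:
    """List code points in `label` with general category Cn (not assigned).

    The answer depends on the Unicode version shipped with this interpreter,
    which is available as unicodedata.unidata_version.
    """
    return [f"U+{ord(ch):04X}" for ch in label if unicodedata.category(ch) == "Cn"]


if __name__ == "__main__":
    print("Unicode version in use:", unicodedata.unidata_version)
    # U+00FC (LATIN SMALL LETTER U WITH DIAERESIS) is assigned.
    print(unassigned_code_points("b\u00fccher"))
    # U+0378 is a reserved (unassigned) position in the Greek block.
    print(unassigned_code_points("abc\u0378"))
```

A system running older tables would report more code points as unassigned than one running newer tables, which is the version dependence the text describes.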
| 10.1.2. Labels in Registration | 9.2.1.2. Labels in Registration | |||
| Anyone entering a label into a DNS zone must properly validate that | Anyone entering a label into a DNS zone must properly validate that | |||
| label -- i.e., be sure that the criteria for that label are met -- in | label -- i.e., be sure that the criteria for that label are met -- in | |||
| order for applications to work as intended. This principle is not | order for applications to work as intended. This principle is not | |||
| new: for example, zone administrators are expected to verify that | new: for example, zone administrators are expected to verify that | |||
| names meet "hostname" [RFC0952] or special service location formats | names meet "hostname" [RFC0952] or special service location formats | |||
| [RFC2782] where necessary for the expected applications. For zones | [RFC2782] where necessary for the expected applications. For zones | |||
| that will contain IDNs, support for Unicode version-independence | that will contain IDNs, support for Unicode version-independence | |||
| requires restrictions on all strings placed in the zone. In | requires restrictions on all strings placed in the zone. In | |||
| particular, for such zones: | particular, for such zones: | |||
| o Any label that appears to be an A-label, i.e., any label that | o Any label that appears to be an A-label, i.e., any label that | |||
| starts in "xn--", MUST be IDNA-valid, i.e., it MUST be a | starts in "xn--", MUST be IDNA-valid, i.e., it MUST be a | |||
| valid A-label, as discussed in Section 3 above. | valid A-label, as discussed in Section 2 above. | |||
| o The Unicode tables (i.e., tables of code points, character | o The Unicode tables (i.e., tables of code points, character | |||
| classes, and properties) and IDNA tables (i.e., tables of | classes, and properties) and IDNA tables (i.e., tables of | |||
| contextual rules such as those described above), MUST be | contextual rules such as those described above), MUST be | |||
| consistent on the systems performing or validating labels to be | consistent on the systems performing or validating labels to be | |||
| registered. Note that this does not require that tables reflect | registered. Note that this does not require that tables reflect | |||
| the latest version of Unicode, only that all tables used on a | the latest version of Unicode, only that all tables used on a | |||
| given system are consistent with each other. | given system are consistent with each other. | |||
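
One necessary (but not sufficient) check for the first requirement above is that a label starting in "xn--" decodes and re-encodes to itself. The sketch below is an assumption-laden illustration, not text from the drafts: it uses Python's built-in `punycode` codec, the helper name `a_label_round_trips` is hypothetical, and it does not test for DISALLOWED code points, bidi conditions, or contextual rules.

```python
ACE_PREFIX = "xn--"


def a_label_round_trips(label: str) -> bool:
    """Return True if `label` looks like an A-label and survives a
    decode/re-encode round trip through Punycode.

    This is only one of the conditions for IDNA validity; table-based
    checks on the decoded characters are deliberately left out.
    """
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return False
    try:
        decoded = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        re_encoded = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    except ValueError:
        # UnicodeError is a subclass of ValueError; covers non-ASCII input
        # and malformed Punycode.
        return False
    # A label whose decoded form is pure ASCII is not a meaningful A-label.
    if decoded.isascii():
        return False
    return re_encoded == lowered


if __name__ == "__main__":
    print(a_label_round_trips("xn--bcher-kva"))
    print(a_label_round_trips("example"))
    print(a_label_round_trips("xn--abc-"))
```

A registry would run a check of this kind before the table-driven tests, since a label that fails the round trip cannot correspond to any U-label.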
| [[anchor33: Note in draft: the above text was changed significantly | [[anchor31: Note in draft: the above text was changed significantly | |||
| between -00 and -01 to clearly restrict its scope to zones supporting | between -00 and -01 to clearly restrict its scope to zones supporting | |||
| IDNA and to eliminate comments about labels containing "--" in the | IDNA and to eliminate comments about labels containing "--" in the | |||
| third and fourth positions but with different prefixes. There appears | third and fourth positions but with different prefixes. There appears | |||
| to be consensus that more extensive rules belong in a "best | to be consensus that more extensive rules belong in a "best | |||
| practices" document about appropriate DNS labels, but that document | practices" document about appropriate DNS labels, but that document | |||
| is not in-scope for the IDNABIS WG.]] | is not in-scope for the IDNABIS WG.]] | |||
| Under this model, a registry (or entity communicating with a registry | Under this model, a registry (or entity communicating with a registry | |||
| to accomplish name registrations) will need to update its tables -- | to accomplish name registrations) will need to update its tables -- | |||
| both the Unicode-associated tables and the tables of permitted IDN | both the Unicode-associated tables and the tables of permitted IDN | |||
| characters -- to enable a new script or other set of new characters. | characters -- to enable a new script or other set of new characters. | |||
| It will not be affected by newer versions of Unicode, or newly- | It will not be affected by newer versions of Unicode, or newly- | |||
| authorized characters, until and unless it wishes to make those | authorized characters, until and unless it wishes to make those | |||
| registrations. The registration side is also responsible -- under the | registrations. The registration side is also responsible -- under the | |||
| protocol and to registrants and users -- for much more careful | protocol and to registrants and users -- for much more careful | |||
| checking than is expected of application systems that look names up, | checking than is expected of application systems that look names up, | |||
| both checking as required by the protocol and checking required by | both checking as required by the protocol and checking required by | |||
| whatever policies it develops for minimizing risks due to confusable | whatever policies it develops for minimizing risks due to confusable | |||
| characters and sequences and preserving language or script integrity. | characters and sequences and preserving language or script integrity. | |||
| Systems looking up or resolving DNS labels, especially IDN DNS | Systems looking up or resolving DNS labels, especially IDN DNS | |||
| labels, MUST be able to assume that applicable registration rules | labels, MUST be able to assume that applicable registration rules | |||
| were followed for names entered into the DNS. | were followed for names entered into the DNS. | |||
| 10.1.3. Labels in Resolution (Lookup) | 9.2.1.3. Labels in Lookup | |||
| Anyone looking up a label in a DNS zone | Anyone looking up a label in a DNS zone | |||
| o MUST maintain a consistent set of tables, as discussed above. As | o MUST maintain a consistent set of tables, as discussed above. As | |||
| with registration, the tables need not reflect the latest version | with registration, the tables need not reflect the latest version | |||
| of Unicode but they MUST be consistent. | of Unicode but they MUST be consistent. | |||
| o MUST validate the characters in labels to be looked up only to the | o MUST validate the characters in labels to be looked up only to the | |||
| extent of determining that the U-label does not contain either | extent of determining that the U-label does not contain either | |||
| code points prohibited by IDNA (categorized as "DISALLOWED") or | code points prohibited by IDNA (categorized as "DISALLOWED") or | |||
| skipping to change at page 32, line 10 ¶ | skipping to change at page 32, line 46 ¶ | |||
| combining marks, that the "bidi" conditions are met if right to | combining marks, that the "bidi" conditions are met if right to | |||
| left characters appear, that any required contextual rules are | left characters appear, that any required contextual rules are | |||
| available and that, if such rules are associated with Joiner | available and that, if such rules are associated with Joiner | |||
| Controls, they are tested. | Controls, they are tested. | |||
| o MUST NOT validate other contextual rules about characters, | o MUST NOT validate other contextual rules about characters, | |||
| including mixed-script label prohibitions, although such rules MAY | including mixed-script label prohibitions, although such rules MAY | |||
| be used to influence presentation decisions in the user interface. | be used to influence presentation decisions in the user interface. | |||
| By avoiding applying its own interpretation of which labels are valid | By avoiding applying its own interpretation of which labels are valid | |||
| as a means of rejecting lookup attempts, the resolver application | as a means of rejecting lookup attempts, the lookup application | |||
| becomes less sensitive to version incompatibilities with the | becomes less sensitive to version incompatibilities with the | |||
| particular zone registry associated with the domain name. | particular zone registry associated with the domain name. | |||
| An application or client that looks names up in the DNS will be able | An application or client that processes names according to this | |||
| to resolve any name that is validly registered, as long as its | protocol and then resolves them in the DNS will be able to locate any | |||
| version of the Unicode-associated tables is sufficiently up-to-date | name that is validly registered, as long as its version of the | |||
| to interpret all of the characters in the label. It SHOULD | Unicode-associated tables is sufficiently up-to-date to interpret all | |||
| distinguish, in its messages to users, between "label contains an | of the characters in the label. It SHOULD distinguish, in its | |||
| unallocated code point" and other types of lookup failures. A | messages to users, between "label contains an unallocated code point" | |||
| failure on the basis of an old version of Unicode may lead the user | and other types of lookup failures. A failure on the basis of an old | |||
| to a desire to upgrade to a newer version, but will have no other ill | version of Unicode may lead the user to a desire to upgrade to a | |||
| effects (this is consistent with behavior in the transition to the | newer version, but will have no other ill effects (this is consistent | |||
| DNS when some hosts could not yet handle some forms of names or | with behavior in the transition to the DNS when some hosts could not | |||
| record types). | yet handle some forms of names or record types). | |||
| 10.2. More Flexibility in User Agents | 9.2.2. More Flexibility in User Agents | |||
| These specifications do not perform mappings between one character or | These specifications do not perform mappings between one character or | |||
| code point and others for any reason. Instead, they prohibit the | code point and others for any reason. Instead, they prohibit the | |||
| characters that would be mapped to others by normalization, case | characters that would be mapped to others by normalization, case | |||
| folding, or other rules. As examples, while mathematical characters | folding, or other rules. As examples, while mathematical characters | |||
| based on Latin ones are accepted as input to IDNA2003, they are | based on Latin ones are accepted as input to IDNA2003, they are | |||
| prohibited in IDNA2008. Similarly, double-width characters and other | prohibited in IDNA2008. Similarly, double-width characters and other | |||
| variations are prohibited as IDNA input. | variations are prohibited as IDNA input. | |||
| Since the rules in [IDNA2008-Tables] provide that only strings that | Since the rules in [IDNA2008-Tables] provide that only strings that | |||
| are stable under NFKC are valid, if it is convenient for an | are stable under NFKC are valid, if it is convenient for an | |||
| application to perform NFKC normalization before lookup, that | application to perform NFKC normalization before lookup, that | |||
| operation is safe since this will never make the application unable | operation is safe since this will never make the application unable | |||
| to look up any valid string. | to look up any valid string. | |||
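
The NFKC-stability property described above can be sketched as follows. This is an illustrative check only, assuming that Python's `unicodedata` module supplies the "current version of NFKC"; the function name `is_nfkc_stable` is hypothetical and does not appear in the drafts.

```python
import unicodedata


def is_nfkc_stable(s: str) -> bool:
    """True if NFKC normalization leaves `s` unchanged."""
    return unicodedata.normalize("NFKC", s) == s


if __name__ == "__main__":
    samples = {
        "precomposed u-umlaut": "b\u00fccher",
        "mathematical bold small a": "\U0001D41A",
        "fullwidth small a": "\uFF41",
    }
    for name, text in samples.items():
        print(f"{name}: stable={is_nfkc_stable(text)}")
```

Because every valid string is already stable, an application that normalizes before lookup changes only strings that would have been rejected anyway, which is why the text calls the operation safe.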
| In many cases these prohibitions should have no effect on what the | In many cases these prohibitions should have no effect on what the | |||
| user can type at resolution time. It is perfectly reasonable for | user can type as input to the lookup process. It is perfectly | |||
| systems that support user interfaces to perform some character | reasonable for systems that support user interfaces to perform some | |||
| mapping that is appropriate to the local environment. This would | character mapping that is appropriate to the local environment. This | |||
| normally be done prior to actual invocation of IDNA. At least | would normally be done prior to actual invocation of IDNA. At least | |||
| conceptually, the mapping would be part of the Unicode conversions | conceptually, the mapping would be part of the Unicode conversions | |||
| discussed above and in [IDNA2008-Protocol]. However, those changes | discussed above and in [IDNA2008-Protocol]. However, those changes | |||
| will be local ones only -- local to environments in which users will | will be local ones only -- local to environments in which users will | |||
| clearly understand that the character forms are equivalent. For use | clearly understand that the character forms are equivalent. For use | |||
| in interchange among systems, it appears to be much more important | in interchange among systems, it appears to be much more important | |||
| that U-labels and A-labels can be mapped back and forth without loss | that U-labels and A-labels can be mapped back and forth without loss | |||
| of information. | of information. | |||
| One specific, and very important, instance of this strategy arises | One specific, and very important, instance of this strategy arises | |||
| with case-folding. In the ASCII-only DNS, names are looked up and | with case-folding. In the ASCII-only DNS, names are looked up and | |||
| skipping to change at page 33, line 37 ¶ | skipping to change at page 34, line 27 ¶ | |||
| uppercase(lowercase(character)). That requirement may not be | uppercase(lowercase(character)). That requirement may not be | |||
| satisfied with IDNs. The relationship between upper case and lower | satisfied with IDNs. The relationship between upper case and lower | |||
| case may even be language-dependent, with different languages (or | case may even be language-dependent, with different languages (or | |||
| even the same language in different areas) expecting different | even the same language in different areas) expecting different | |||
| mappings. Of course, the expectations of users who are accustomed to | mappings. Of course, the expectations of users who are accustomed to | |||
| a case-insensitive DNS environment will probably be well-served if | a case-insensitive DNS environment will probably be well-served if | |||
| user agents perform case mapping prior to IDNA processing, but the | user agents perform case mapping prior to IDNA processing, but the | |||
| IDNA procedures themselves should neither require such mapping nor | IDNA procedures themselves should neither require such mapping nor | |||
| expect it when it is not natural to the localized environment. | expect it when it is not natural to the localized environment. | |||
| 10.3. The Question of Prefix Changes | 9.2.3. The Question of Prefix Changes | |||
| The conditions that would require a change in the IDNA "prefix" | The conditions that would require a change in the IDNA "prefix" | |||
| ("xn--" for the version of IDNA specified in [RFC3490]) have been a | ("xn--" for the version of IDNA specified in [RFC3490]) have been a | |||
| great concern to the community. A prefix change would clearly be | great concern to the community. A prefix change would clearly be | |||
| necessary if the algorithms were modified in a manner that would | necessary if the algorithms were modified in a manner that would | |||
| create serious ambiguities during subsequent transition in | create serious ambiguities during subsequent transition in | |||
| registrations. This section summarizes our conclusions about the | registrations. This section summarizes our conclusions about the | |||
| conditions under which changes in prefix would be necessary and the | conditions under which changes in prefix would be necessary and the | |||
| implications of such a change. | implications of such a change. | |||
| 10.3.1. Conditions Requiring a Prefix Change | 9.2.3.1. Conditions Requiring a Prefix Change | |||
| An IDN prefix change is needed if a given string would resolve or | An IDN prefix change is needed if a given string would be looked up | |||
| otherwise be interpreted differently depending on the version of the | or otherwise interpreted differently depending on the version of the | |||
| protocol or tables being used. Consequently, work to update IDNs | protocol or tables being used. Consequently, work to update IDNs | |||
| would require a prefix change if, and only if, one of the following | would require a prefix change if, and only if, one of the following | |||
| four conditions were met: | four conditions were met: | |||
| 1. The conversion of an A-label to Unicode (i.e., a U-label) yields | 1. The conversion of an A-label to Unicode (i.e., a U-label) yields | |||
| one string under IDNA2003 (RFC3490) and a different string under | one string under IDNA2003 (RFC3490) and a different string under | |||
| IDNA2008. | IDNA2008. | |||
| 2. An input string that is valid under IDNA2003 and also valid under | 2. An input string that is valid under IDNA2003 and also valid under | |||
| IDNA2008 yields two different A-labels with the different | IDNA2008 yields two different A-labels with the different | |||
| versions of IDNA. This condition is believed to be essentially | versions of IDNA. This condition is believed to be essentially | |||
| equivalent to the one above. | equivalent to the one above. | |||
| Note, however, that if the input string is valid under one | Note, however, that if the input string is valid under one | |||
| version and not valid under the other, this condition does not | version and not valid under the other, this condition does not | |||
| apply. See the first item in Section 10.3.2, below. | apply. See the first item in Section 9.2.3.2, below. | |||
| 3. A fundamental change is made to the semantics of the string that | 3. A fundamental change is made to the semantics of the string that | |||
| is inserted in the DNS, e.g., if a decision were made to try to | is inserted in the DNS, e.g., if a decision were made to try to | |||
| include language or specific script information in that string, | include language or specific script information in that string, | |||
| rather than having it be just a string of characters. | rather than having it be just a string of characters. | |||
| 4. A sufficiently large number of characters is added to Unicode so | 4. A sufficiently large number of characters is added to Unicode so | |||
| that the Punycode mechanism for block offsets no longer has | that the Punycode mechanism for block offsets no longer has | |||
| enough capacity to reference the higher-numbered planes and | enough capacity to reference the higher-numbered planes and | |||
| blocks. This condition is unlikely even in the long term and | blocks. This condition is unlikely even in the long term and | |||
| certain not to arise in the next few years. | certain not to arise in the next few years. | |||
| 10.3.2. Conditions Not Requiring a Prefix Change | 9.2.3.2. Conditions Not Requiring a Prefix Change | |||
| In particular, as a result of the principles described above, none of | In particular, as a result of the principles described above, none of | |||
| the following changes require a new prefix: | the following changes require a new prefix: | |||
| 1. Prohibition of some characters as input to IDNA. This may make | 1. Prohibition of some characters as input to IDNA. This may make | |||
| names that are now registered inaccessible, but does not require | names that are now registered inaccessible, but does not require | |||
| a prefix change. | a prefix change. | |||
| 2. Adjustments in Stringprep tables or IDNA actions, including | 2. Adjustments in Stringprep tables or IDNA actions, including | |||
| normalization definitions, that affect characters that were | normalization definitions, that affect characters that were | |||
| already invalid under IDNA2003. | already invalid under IDNA2003. | |||
| 3. Changes in the style of definitions of Stringprep or Nameprep | 3. Changes in the style of definitions of Stringprep or Nameprep | |||
| that do not alter the actions performed by them. | that do not alter the actions performed by them. | |||
| Of course, because these specifications do not involve changes to | Of course, because these specifications do not involve changes to | |||
| Stringprep or Nameprep, the third condition above and part of the | Stringprep or Nameprep, the third condition above and part of the | |||
| second are moot. | second are moot. | |||
| 10.3.3. Implications of Prefix Changes | 9.2.3.3. Implications of Prefix Changes | |||
| While it might be possible to make a prefix change, the costs of such | While it might be possible to make a prefix change, the costs of such | |||
| a change are considerable. Even if they wanted to do so, all | a change are considerable. Even if they wanted to do so, all | |||
| registries could not convert all IDNA2003 ("xn--") registrations to a | registries could not convert all IDNA2003 ("xn--") registrations to a | |||
| new form at the same time and synchronize that change with | new form at the same time and synchronize that change with | |||
| applications supporting lookup. Unless all existing registrations | applications supporting lookup. Unless all existing registrations | |||
| were simply to be declared invalid, and perhaps even then, systems | were simply to be declared invalid, and perhaps even then, systems | |||
| that needed to support both labels with old prefixes and labels with | that needed to support both labels with old prefixes and labels with | |||
| new ones would first process a putative label under the IDNA2008 | new ones would first process a putative label under the IDNA2008 | |||
| rules and try to look it up and then, if it were not found, would | rules and try to look it up and then, if it were not found, would | |||
| process the label under IDNA2003 rules and look it up again. That | process the label under IDNA2003 rules and look it up again. That | |||
| process could significantly slow down all processing that involved | process could significantly slow down all processing that involved | |||
| IDNs in the DNS especially since, in principle, a fully-qualified | IDNs in the DNS especially since, in principle, a fully-qualified | |||
| name could contain a mixture of labels that were registered with the | name could contain a mixture of labels that were registered with the | |||
| old and new prefixes, a situation that would make the use of DNS | old and new prefixes, a situation that would make the use of DNS | |||
| caching very difficult. In addition, looking up the same input | caching very difficult. In addition, looking up the same input | |||
| string as two separate A-labels would create some potential for | string as two separate A-labels would create some potential for | |||
| confusion and attacks, since they could, in principle, resolve to | confusion and attacks, since they could, in principle, map to | |||
| different targets. | different targets and then resolve to different DNS label nodes. | |||
| Consequently, a prefix change is to be avoided if at all possible, | Consequently, a prefix change is to be avoided if at all possible, | |||
| even if it means accepting some IDNA2003 decisions about character | even if it means accepting some IDNA2003 decisions about character | |||
| distinctions as irreversible. | distinctions as irreversible. | |||
| 10.4. Stringprep Changes and Compatibility | 9.2.4. Stringprep Changes and Compatibility | |||
| Concerns have been expressed about problems for non-DNS uses of | Concerns have been expressed about problems for non-DNS uses of | |||
| Stringprep being caused by changes to the specification intended to | Stringprep being caused by changes to the specification intended to | |||
| improve the handling of IDNs, most notably as this might affect | improve the handling of IDNs, most notably as this might affect | |||
| identification and authentication protocols. Section 10.3, above, | identification and authentication protocols. Section 9.2.3, above, | |||
| essentially also applies in this context. The proposed new inclusion | essentially also applies in this context. The proposed new inclusion | |||
| tables [IDNA2008-Tables], the reduction in the number of characters | tables [IDNA2008-Tables], the reduction in the number of characters | |||
| permitted as input for registration or resolution (Section 6), and | permitted as input for registration or lookup (Section 5), and even | |||
| even the proposed changes in handling of right to left strings | the proposed changes in handling of right to left strings | |||
| [IDNA2008-Bidi] either give interpretations to strings prohibited | [IDNA2008-Bidi] either give interpretations to strings prohibited | |||
| under IDNA2003 or prohibit strings that IDNA2003 permitted. Strings | under IDNA2003 or prohibit strings that IDNA2003 permitted. Strings | |||
| that are valid under both IDNA2003 and IDNA2008, and the | that are valid under both IDNA2003 and IDNA2008, and the | |||
| corresponding versions of Stringprep, are not changed in | corresponding versions of Stringprep, are not changed in | |||
| interpretation. This protocol does not use either Nameprep or | interpretation. This protocol does not use either Nameprep or | |||
| Stringprep as specified in IDNA2003. | Stringprep as specified in IDNA2003. | |||
| It is particularly important to keep IDNA processing separate from | It is particularly important to keep IDNA processing separate from | |||
| processing for various security protocols because some of the | processing for various security protocols because some of the | |||
| constraints that are necessary for smooth and comprehensible use of | constraints that are necessary for smooth and comprehensible use of | |||
| skipping to change at page 36, line 15 ¶ | skipping to change at page 37, line 5 ¶ | |||
| different requirements than IDNs. | different requirements than IDNs. | |||
| Perhaps even more important in practice, since most other known uses | Perhaps even more important in practice, since most other known uses | |||
| of Stringprep encode or process characters that are already in | of Stringprep encode or process characters that are already in | |||
| normalized form and expect the use of only those characters that can | normalized form and expect the use of only those characters that can | |||
| be used in writing words of languages, the changes proposed here and | be used in writing words of languages, the changes proposed here and | |||
| in [IDNA2008-Tables] are unlikely to have any effect at all, | in [IDNA2008-Tables] are unlikely to have any effect at all, | |||
| especially not on registries and registrations that follow rules | especially not on registries and registrations that follow rules | |||
| already in existence when this work started. | already in existence when this work started. | |||
| 10.5. The Symbol Question | 9.2.5. The Symbol Question | |||
| One of the major differences between this specification and the | One of the major differences between this specification and the | |||
| original version of IDNA is that the original version permitted non- | original version of IDNA is that the original version permitted non- | |||
| letter symbols of various sorts, including punctuation and line- | letter symbols of various sorts, including punctuation and line- | |||
| drawing symbols, in the protocol. They were always discouraged in | drawing symbols, in the protocol. They were always discouraged in | |||
| practice. In particular, both the "IESG Statement" about IDNA and | practice. In particular, both the "IESG Statement" about IDNA and | |||
| all versions of the ICANN Guidelines specify that only language | all versions of the ICANN Guidelines specify that only language | |||
| characters be used in labels. This specification disallows symbols | characters be used in labels. This specification disallows symbols | |||
| entirely. There are several reasons for this, which include: | entirely. There are several reasons for this, which include: | |||
| skipping to change at page 36, line 47 ¶ | skipping to change at page 37, line 37 ¶ | |||
| there are no uniform conventions for naming; variations such as | there are no uniform conventions for naming; variations such as | |||
| outline, solid, and shaded forms may or may not exist; and so on. | outline, solid, and shaded forms may or may not exist; and so on. | |||
| As just one example, consider a "heart" symbol as it might appear | As just one example, consider a "heart" symbol as it might appear | |||
| in a logo that might be read as "I love...". While the user might | in a logo that might be read as "I love...". While the user might | |||
| read such a logo as "I love..." or "I heart...", considerable | read such a logo as "I love..." or "I heart...", considerable | |||
| knowledge of the coding distinctions made in Unicode is needed to | knowledge of the coding distinctions made in Unicode is needed to | |||
| know that there is more than one "heart" character (e.g., U+2665, | know that there is more than one "heart" character (e.g., U+2665, | |||
| U+2661, and U+2765) and how to describe it. These issues are of | U+2661, and U+2765) and how to describe it. These issues are of | |||
| particular importance if strings are expected to be understood or | particular importance if strings are expected to be understood or | |||
| transcribed by the listener after being read out loud. | transcribed by the listener after being read out loud. | |||
| [[anchor35: The above paragraph remains controversial as to | [[anchor33: The above paragraph remains controversial as to | |||
| whether it is valid. The WG will need to make a decision if this | whether it is valid. The WG will need to make a decision if this | |||
| section is not dropped entirely.]] | section is not dropped entirely.]] | |||
| o As a simplified example of this, assume one wanted to use a | o As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know --absent careful study of | would-be registrant has no way to know --absent careful study of | |||
| the code tables-- whether it is ambiguous (e.g., where there are | the code tables-- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| skipping to change at page 37, line 30 ¶ | skipping to change at page 38, line 19 ¶ | |||
| distinction. We have a white heart (U+2661) and a few black hearts | distinction. We have a white heart (U+2661) and a few black hearts | |||
| and describing a label containing a heart symbol is hopelessly | and describing a label containing a heart symbol is hopelessly | |||
| ambiguous. In cities where "Square" is a popular part of a | ambiguous. In cities where "Square" is a popular part of a | |||
| location name, one might well want to use a square symbol in a | location name, one might well want to use a square symbol in a | |||
| label as well and there are far more squares of various flavors in | label as well and there are far more squares of various flavors in | |||
| Unicode than there are hearts or stars. | Unicode than there are hearts or stars. | |||
| o The consequence of these ambiguities of description and | o The consequence of these ambiguities of description and | |||
| dependencies on distinctions that were, or were not, made in | dependencies on distinctions that were, or were not, made in | |||
| Unicode codings, is that symbols are a very poor basis for | Unicode codings, is that symbols are a very poor basis for | |||
| reliable communication. Of course, these difficulties with | reliable communication. Consistent with this conclusion, the | |||
| symbols do not arise with actual pictographic languages and | Unicode standard recommends that strings used in identifiers not | |||
| scripts which would be treated like any other language characters; | contain symbols or punctuation [Unicode-UAX31]. Of course, these | |||
| the two should not be confused. | difficulties with symbols do not arise with actual pictographic | |||
| languages and scripts which would be treated like any other | ||||
| language characters; the two should not be confused. | ||||
| [[anchor36: Note in Draft: Should the above section be significantly | [[anchor34: Note in Draft: Should the above section be significantly | |||
| trimmed or eliminated?]] | trimmed or eliminated?]] | |||
| 10.6. Migration Between Unicode Versions: Unassigned Code Points | 9.2.6. Migration Between Unicode Versions: Unassigned Code Points | |||
| In IDNA2003, labels containing unassigned code points are resolved on | In IDNA2003, labels containing unassigned code points are looked up | |||
| the theory that, if they appear in labels and can be resolved, the | on the theory that, if they appear in labels and can be mapped and | |||
| relevant standards must have changed and the registry has properly | then resolved, the relevant standards must have changed and the | |||
| allocated only assigned values. | registry has properly allocated only assigned values. | |||
| In this specification, strings containing unassigned code points MUST | In this specification, strings containing unassigned code points MUST | |||
| NOT be either looked up or registered. There are several reasons for | NOT be either looked up or registered. There are several reasons for | |||
| this, with the most important ones being: | this, with the most important ones being: | |||
| o It cannot be known with sufficient reliability in advance that a | o It cannot be known with sufficient reliability in advance that a | |||
| code point that was not previously assigned will not be assigned | code point that was not previously assigned will not be assigned | |||
| to a compatibility character. In IDNA2003, since there is no | to a compatibility character. In IDNA2003, since there is no | |||
| direct dependency on NFKC (Stringprep's tables are based on NFKC, | direct dependency on NFKC (Stringprep's tables are based on NFKC, | |||
| but IDNA2003 depends only on Stringprep), allocation of a | but IDNA2003 depends only on Stringprep), allocation of a | |||
| skipping to change at page 38, line 32 ¶ | skipping to change at page 39, line 25 ¶ | |||
| obscure characters or archaic scripts. Unfortunately, that does not | obscure characters or archaic scripts. Unfortunately, that does not | |||
| appear to be a safe assumption for at least two reasons. First, much | appear to be a safe assumption for at least two reasons. First, much | |||
| the same claim of completeness has been made for earlier versions of | the same claim of completeness has been made for earlier versions of | |||
| Unicode. The reality is that a script that is obscure to much of the | Unicode. The reality is that a script that is obscure to much of the | |||
| world may still be very important to those who use it. Cultural and | world may still be very important to those who use it. Cultural and | |||
| linguistic preservation principles make it inappropriate to declare | linguistic preservation principles make it inappropriate to declare | |||
| the script of no importance in IDNs. Second, we already have | the script of no importance in IDNs. Second, we already have | |||
| counterexamples in, e.g., the relationships associated with new Han | counterexamples in, e.g., the relationships associated with new Han | |||
| characters being added (whether in the BMP or in Unicode Plane 2). | characters being added (whether in the BMP or in Unicode Plane 2). | |||
| 10.7. Other Compatibility Issues | 9.2.7. Other Compatibility Issues | |||
| The existing (2003) IDNA model includes several odd artifacts of the | The existing (2003) IDNA model includes several odd artifacts of the | |||
| context in which it was developed. Many, if not all, of these are | context in which it was developed. Many, if not all, of these are | |||
| potential avenues for exploits, especially if the registration | potential avenues for exploits, especially if the registration | |||
| process permits "source" names (names that have not been processed | process permits "source" names (names that have not been processed | |||
| through IDNA and nameprep) to be registered. As one example, since | through IDNA and nameprep) to be registered. As one example, since | |||
| the character Eszett, used in German, is mapped by IDNA2003 into the | the character Eszett, used in German, is mapped by IDNA2003 into the | |||
| sequence "ss" rather than being retained as itself or prohibited, a | sequence "ss" rather than being retained as itself or prohibited, a | |||
| string containing that character but that is otherwise in ASCII is | string containing that character but that is otherwise in ASCII is | |||
| not really an IDN (in the U-label sense defined above) at all. After | not really an IDN (in the U-label sense defined above) at all. After | |||
| Nameprep maps the Eszett out, the result is an ASCII string and so | Nameprep maps the Eszett out, the result is an ASCII string and so | |||
| does not get an xn-- prefix, but the string that can be displayed to | does not get an xn-- prefix, but the string that can be displayed to | |||
| a user appears to be an IDN. The proposed IDNA2008 eliminates this | a user appears to be an IDN. The proposed IDNA2008 eliminates this | |||
| artifact. A character is either permitted as itself or it is | artifact. A character is either permitted as itself or it is | |||
| prohibited; special cases that make sense only in a particular | prohibited; special cases that make sense only in a particular | |||
| linguistic or cultural context can be dealt with as localization | linguistic or cultural context can be dealt with as localization | |||
| matters where appropriate. | matters where appropriate. | |||
| 11. Acknowledgments | 10. Acknowledgments | |||
| The editor and contributors would like to express their thanks to | The editor and contributors would like to express their thanks to | |||
| those who contributed significant early (pre-WG) review comments, | those who contributed significant early (pre-WG) review comments, | |||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | |||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | Simon Josefsson, and Sam Weiler. In addition, some specific ideas | |||
| were incorporated from suggestions, text, or comments about sections | were incorporated from suggestions, text, or comments about sections | |||
| that were unclear supplied by Frank Ellerman, Michael Everson, Asmus | that were unclear supplied by Frank Ellerman, Michael Everson, Asmus | |||
| Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, | Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, | |||
| although, as usual, they bear little or no responsibility for the | although, as usual, they bear little or no responsibility for the | |||
| conclusions the editor and contributors reached after receiving their | conclusions the editor and contributors reached after receiving their | |||
| skipping to change at page 39, line 34 ¶ | skipping to change at page 40, line 25 ¶ | |||
| meeting were very helpful in focusing the issues and in refining the | meeting were very helpful in focusing the issues and in refining the | |||
| specifications. The active participants at that meeting were (in | specifications. The active participants at that meeting were (in | |||
| alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | |||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | |||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | |||
| Michel Suignard, and Ken Whistler. We express our thanks to Google | Michel Suignard, and Ken Whistler. We express our thanks to Google | |||
| for support of that meeting and to the participants for their | for support of that meeting and to the participants for their | |||
| contributions. | contributions. | |||
| Special thanks are due to Paul Hoffman for permission to extract | Special thanks are due to Paul Hoffman for permission to extract | |||
| material from his Internet-Draft to form the basis for Section 2. | material from his Internet-Draft to form the basis for Section 9.1. | |||
| Useful comments and text on the WG versions of the draft were | Useful comments and text on the WG versions of the draft were | |||
| received from many participants in the IETF "IDNABIS" WG and a number | received from many participants in the IETF "IDNABIS" WG and a number | |||
| of document changes resulted from mailing list discussions made by | of document changes resulted from mailing list discussions made by | |||
| that group. | that group. Marcos Sanz provided specific analysis and suggestions | |||
| that were exceptionally helpful in refining the text. | ||||
| 12. Contributors | 11. Contributors | |||
| While the listed editor held the pen, the core of this document and | While the listed editor held the pen, the core of this document and | |||
| the initial WG version represent the joint work and conclusions of | the initial WG version represent the joint work and conclusions of | |||
| an ad hoc design team consisting of the editor and, in alphabetic | an ad hoc design team consisting of the editor and, in alphabetic | |||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | |||
| In addition, there were many specific contributions and helpful | In addition, there were many specific contributions and helpful | |||
| comments from those listed in the Acknowledgments section and others | comments from those listed in the Acknowledgments section and others | |||
| who have contributed to the development and use of the IDNA | who have contributed to the development and use of the IDNA | |||
| protocols. | protocols. | |||
| 12. Internationalization Considerations | ||||
| DNS labels and fully-qualified domain names provide mnemonics that | ||||
| assist in identifying and referring to resources on the Internet. | ||||
| IDNs expand the range of those mnemonics to include those based on | ||||
| languages and character sets other than Western European and Roman- | ||||
| derived ones. But domain "names" are not, in general, words in any | ||||
| language. The recommendations of the IETF policy on character sets | ||||
| and languages, BCP 18 [RFC2277] are applicable to situations in which | ||||
| language identification is used to provide language-specific | ||||
| contexts. The DNS is, by contrast, global and international and | ||||
| ultimately has nothing to do with languages. Adding languages (or | ||||
| similar context) to IDNs generally, or to DNS matching in particular, | ||||
| would imply context dependent matching in DNS, which would be a very | ||||
| significant change to the DNS protocol itself. It would also imply | ||||
| that users would need to identify the language associated with a | ||||
| particular label in order to look that label up, a decision that | ||||
| would be impossible in many or most cases. | ||||
| 13. IANA Considerations | 13. IANA Considerations | |||
| This section gives an overview of registries required for IDNA. The | This section gives an overview of registries required for IDNA. The | |||
| actual definition of the first one appears in [IDNA2008-Tables]. | actual definitions of the first two appear in [IDNA2008-Tables]. | |||
| 13.1. IDNA Character Registry | 13.1. IDNA Character Registry | |||
| The distinction among the three major categories "UNASSIGNED", | The distinction among the three major categories "UNASSIGNED", | |||
| "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and | "DISALLOWED", and "PROTOCOL-VALID" is made by special categories and | |||
| rules that are integral elements of [IDNA2008-Tables]. Convenience | rules that are integral elements of [IDNA2008-Tables]. Convenience | |||
| in programming and validation requires a registry of characters and | in programming and validation requires a registry of characters and | |||
| scripts and their categories, updated for each new version of Unicode | scripts and their categories, updated for each new version of Unicode | |||
| and the characters it contains. The details of this registry are | and the characters it contains. The details of this registry are | |||
| specified in [IDNA2008-Tables]. | specified in [IDNA2008-Tables]. | |||
| 13.2. IDNA Context Registry | 13.2. IDNA Context Registry | |||
| For characters that are defined in the IDNA Character Registry list | For characters that are defined in the IDNA Character Registry list | |||
| as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of | as PROTOCOL-VALID but requiring a contextual rule (i.e., the types of | |||
| rule described in Section 6.1.1.1), IANA will create and maintain a | rule described in Section 5.1.1.1), IANA will create and maintain a | |||
| list of approved contextual rules. Additions or changes to these | list of approved contextual rules. The details for those rules | |||
| rules require IETF Review, as described in [RFC5226]. | appear in [IDNA2008-Tables]. | |||
| [[anchor41: Note in Draft: This section was changed between -00 and | ||||
| -01 based on list discussion. Consensus needs to be verified for | ||||
| that decision.]] | ||||
| A table from which that registry can be initialized, and some further | ||||
| discussion, appears in [RulesInit]. | ||||
| [[anchor42: This subsection should probably be moved to Tables along | ||||
| with the Contextual rules themselves (from Protocol) when the move is | ||||
| made.]] | ||||
| 13.3. IANA Repository of IDN Practices of TLDs | 13.3. IANA Repository of IDN Practices of TLDs | |||
| This registry, historically described as the "IANA Language Character | This registry, historically described as the "IANA Language Character | |||
| Set Registry" or "IANA Script Registry" (both somewhat misleading | Set Registry" or "IANA Script Registry" (both somewhat misleading | |||
| terms) is maintained by IANA at the request of ICANN. It is used to | terms) is maintained by IANA at the request of ICANN. It is used to | |||
| provide a central documentation repository of the IDN policies used | provide a central documentation repository of the IDN policies used | |||
| by top level domain (TLD) registries who volunteer to contribute to | by top level domain (TLD) registries who volunteer to contribute to | |||
| it and is used in conjunction with ICANN Guidelines for IDN use. | it and is used in conjunction with ICANN Guidelines for IDN use. | |||
| skipping to change at page 41, line 42 ¶ | skipping to change at page 42, line 44 ¶ | |||
| restrictions (subject to the limitations identified elsewhere in this | restrictions (subject to the limitations identified elsewhere in this | |||
| document) that try to minimize characters that have similar | document) that try to minimize characters that have similar | |||
| appearance or similar interpretations. It is worth noting that there | appearance or similar interpretations. It is worth noting that there | |||
| are no comprehensive technical solutions to the problems of | are no comprehensive technical solutions to the problems of | |||
| confusable characters. One can reduce the extent of the problems in | confusable characters. One can reduce the extent of the problems in | |||
| various ways, but probably never eliminate it. Some specific | various ways, but probably never eliminate it. Some specific | |||
| suggestions about identification and handling of confusable | suggestions about identification and handling of confusable | |||
| characters appear in a Unicode Consortium publication | characters appear in a Unicode Consortium publication | |||
| [Unicode-UTR36]. | [Unicode-UTR36]. | |||
| The registration and resolution models described above and in | The registration and lookup models described above and in | |||
| [IDNA2008-Protocol] change the mechanisms available for applications | [IDNA2008-Protocol] change the mechanisms available for lookup | |||
| and resolvers to determine the validity of labels they encounter. In | applications to determine the validity of labels they encounter. In | |||
| some respects, the ability to test is strengthened. For example, | some respects, the ability to test is strengthened. For example, | |||
| putative labels that contain unassigned code points will now be | putative labels that contain unassigned code points will now be | |||
| rejected, while IDNA2003 permitted them (something that is now | rejected, while IDNA2003 permitted them (something that is now | |||
| recognized as a considerable source of risk). On the other hand, the | recognized as a considerable source of risk). On the other hand, the | |||
| protocol specification no longer assumes that the application that | protocol specification no longer assumes that the application that | |||
| looks up a name will be able to determine, and apply, information | looks up a name will be able to determine, and apply, information | |||
| about the protocol version used in registration. In theory, that may | about the protocol version used in registration. In theory, that may | |||
| increase risk since the application will be able to do less pre- | increase risk since the application will be able to do less pre- | |||
| lookup validation. In practice, the protection afforded by that test | lookup validation. In practice, the protection afforded by that test | |||
| has been largely illusory for reasons explained in RFC 4690 and | has been largely illusory for reasons explained in RFC 4690 and | |||
| above. | above. | |||
| Any change to Stringprep or, more broadly, the IETF's model of the | Any change to Stringprep or, more broadly, the IETF's model of the | |||
| use of internationalized character strings in different protocols, | use of internationalized character strings in different protocols, | |||
| creates some risk of inadvertent changes to those protocols, | creates some risk of inadvertent changes to those protocols, | |||
| invalidating deployed applications or databases, and so on. Our | invalidating deployed applications or databases, and so on. Our | |||
| current hypothesis is that the same considerations that would require | current hypothesis is that the same considerations that would require | |||
| changing the IDN prefix (see Section 10.3.2) are the ones that would, | changing the IDN prefix (see Section 9.2.3.2) are the ones that | |||
| e.g., invalidate certificates or hashes that depend on Stringprep, | would, e.g., invalidate certificates or hashes that depend on | |||
| but those cases require careful consideration and evaluation. More | Stringprep, but those cases require careful consideration and | |||
| important, it is not necessary to change Stringprep2003 at all in | evaluation. More important, it is not necessary to change | |||
| order to make the IDNA changes contemplated here. It is far | Stringprep2003 at all in order to make the IDNA changes contemplated | |||
| preferable to create a separate document, or separate profile | here. It is far preferable to create a separate document, or | |||
| components, for IDN work, leaving the question of upgrading to other | separate profile components, for IDN work, leaving the question of | |||
| protocols to experts on them and eliminating any possible | upgrading to other protocols to experts on them and eliminating any | |||
| synchronization dependency between IDNA changes and possible upgrades | possible synchronization dependency between IDNA changes and possible | |||
| to security protocols or conventions. | upgrades to security protocols or conventions. | |||
| No mechanism involving names or identifiers alone can protect against | No mechanism involving names or identifiers alone can protect against | |||
| a wide variety of security threats and attacks that are largely independent | a wide variety of security threats and attacks that are largely independent | |||
| of them including spoofed pages, DNS query trapping and diversion, | of them including spoofed pages, DNS query trapping and diversion, | |||
| and so on. | and so on. | |||
| 15. Change Log | 15. Change Log | |||
| [[anchor45: RFC Editor: Please remove this section.]] | [[anchor42: RFC Editor: Please remove this section.]] | |||
| For version 00 of draft-ietf-idnabis-rationale, this list contains a | ||||
| complete trace going back through the earlier, design team, drafts. | ||||
| Material earlier than that described in Section 15.9 will be removed | ||||
| in WG draft -02. | ||||
| 15.1. Version -01 of draft-klensin-idnabis-issues | ||||
| Version -01 of this document is a considerable rewrite from -00. | ||||
| Many sections have been clarified or extended and several new | ||||
| sections have been added to reflect discussions in a number of | ||||
| contexts since -00 was issued. | ||||
| 15.2. Version -02 of draft-klensin-idnabis-issues | ||||
| o Corrected several editorial errors including an accidentally- | ||||
| introduced misstatement about NFKC. | ||||
| o Extensively revised the document to synchronize its terminology | ||||
| with version 03 of [IDNA2008-Tables] and to provide a better | ||||
| conceptual framework for its categories and how they are used. | ||||
| Added new material to clarify terminology and relationships with | ||||
| other efforts. More subtle changes in this version lay the | ||||
| groundwork for separating the document into a conceptual overview | ||||
| and a protocol specification for version 03. | ||||
| 15.3. Version -03 of draft-klensin-idnabis-issues | ||||
| o Removed protocol materials to a separate document and incorporated | ||||
| rationale and explanation materials from the original | ||||
| specification in RFC 3490 into this document. Cleaned up earlier | ||||
| text to reflect a more mature specification and restructured | ||||
| several sections and added additional rationale material. | ||||
| o Strengthened and clarified the A-label / U-label/ LDH-label | ||||
| definition. | ||||
| o Retitled the document to reflect its evolving role. | ||||
| 15.4. Version -04 of draft-klensin-idnabis-issues | ||||
| o Moved more text from "protocol" and further reorganized material. | ||||
| o Provided new material on "Contextual Rule Required". | ||||
| o Improved consistency of terminology, both internally and with the | ||||
| "tables" document. | ||||
| o Improved the IANA Considerations section and discussed the | ||||
| existing IDNA-related registry. | ||||
| o More small changes to increase consistency. | ||||
| 15.5. Version -05 of draft-klensin-idnabis-issues | ||||
| Changed "YES" category back to "ALWAYS" to re-synch with the tables | ||||
| document and provide clearer terminology. | ||||
| 15.6. Version -06 of draft-klensin-idnabis-issues | ||||
| o Clarified the prohibitions on strings that look like A-labels but | ||||
| are not and on unassigned code points. | ||||
| o Clarified length restrictions on IDN labels. | ||||
| o Revised the terminology definitions to remove the impression of | ||||
| circularity and removed invocations of ToASCII and ToUnicode, | ||||
| which do not exist in IDNA2008. | ||||
| o Added a new section on front-end processing. | ||||
| o Added a new section to discuss case-mapping. | ||||
| o Extended the discussion of prefix changes to identify the | ||||
| implications of making one. | ||||
| o Several more editorial improvements, corrected references, and | ||||
| similar adjustments. | ||||
| 15.7. Version -07 of draft-klensin-idnabis-issues | ||||
| o Added material that specifically defines the format of contextual | ||||
| rules. | ||||
| o Added and altered text after discussions at the 30 January meeting | ||||
| (see Section 11) and the follow-up to those discussions. Among | ||||
| the key decisions at that meeting were to eliminate the | ||||
| distinction among the valid categories (formerly "ALWAYS", "MAYBE | ||||
| YES", and "MAYBE NO"), to adjust the terminology accordingly, and | ||||
| to change "CONTEXTUAL RULE REQUIRED" from a separate category in | ||||
| this document and the protocol one to a modifier of what is now | ||||
| called "PROTOCOL-VALID". The consequent changes resulted in | ||||
| removal of several sections of explanation from this document. | ||||
| o Resynchronized terminology with "protocol" and "tables" documents. | ||||
| o More editorial and typographic corrections. | ||||
| 15.8. Version -00 of draft-ietf-idnabis-rationale | ||||
| o Rewrote the abstract and introduction, and retuned the title, to | ||||
| be more consistent with WG work and activities. Changed the file | ||||
| name to reflect WG naming. | ||||
| o Removed most of the material that explained, or compared this | ||||
| approach to, IDNA2003. Some of this material may appear in the | ||||
| non-WG "IDNA-alternatives" draft if it is ever completed. | ||||
| o Changed IDNA200X in terminology and references to IDNA2008. | ||||
| o Added a contextual rule for hyphen to the appendix, adjusted the | ||||
| rule syntax slightly, and supplied draft regular expression rules. | ||||
| o Responded to comments produced during the WG charter discussions | ||||
| and from several individuals. In general, comments requesting a | ||||
| reorganization of the collection of documents have not been | ||||
| responded to pending a WG decision on that topic. | ||||
| o Moved the contextual rule appendix out of here and into | ||||
| "Protocol". It may not belong there either, but definitely does | ||||
| not belong here, and was holding up getting this document out. | ||||
| o Many small editorial improvements, including reorganization of | ||||
| some material. | ||||
| Editorial note: While several sections have been removed from this | ||||
| version, the WG should discuss whether further cuts are desirable, | ||||
| e.g., whether Section 7.3, Section 7.4, or Section 10.3 provides | ||||
| enough value to be worth retaining. Can Section 10.4 be trimmed | ||||
| without loss of useful information and, if so, how? Section 10.7 | ||||
| appears critical of IDNA2003 in undesirable ways: should it be | ||||
| dropped or do people have suggestions about how to improve it? | ||||
| Strong opinions have been expressed that Section 10.5 should be | ||||
| trimmed significantly or removed entirely. The WG will need to | ||||
| discuss that too. Are there other materials that should be trimmed | ||||
| out? | ||||
| 15.9. Version -01 of draft-ietf-idnabis-rationale | 15.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale | ||||
| o Clarified the U-label definition to note that U-labels must | o Clarified the U-label definition to note that U-labels must | |||
| contain at least one non-ASCII character. Also clarified the | contain at least one non-ASCII character. Also clarified the | |||
| relationship among label types. | relationship among label types. | |||
| o Rewrote the discussion of Labels in Registration (Section 10.1.2) | o Rewrote the discussion of Labels in Registration (Section 9.2.1.2) | |||
| and related text in Section 1.5.4.1.1 to narrow its focus and | and related text in Section 1.5.4.1.1 to narrow its focus and | |||
| remove more general restrictions. Added a temporary note in line | remove more general restrictions. Added a temporary note in line | |||
| to explain the situation. | to explain the situation. | |||
| o Changed the "IDNA uses Unicode" statement to focus on | o Changed the "IDNA uses Unicode" statement to focus on | |||
| compatibility with IDNA2003 and avoid more general or | compatibility with IDNA2003 and avoid more general or | |||
| controversial assertions. | controversial assertions. | |||
| o Added a discussion of examples to Section 10.1. | o Added a discussion of examples to Section 9.2.1. | |||
| o Made a number of other small editorial changes and corrections | o Made a number of other small editorial changes and corrections | |||
| suggested by Mark Davis. | suggested by Mark Davis. | |||
| o Added several more discussion anchors and notes and expanded or | o Added several more discussion anchors and notes and expanded or | |||
| updated some existing ones. | updated some existing ones. | |||
| 15.2. Version -02 | ||||
| o Trimmed change log, removing information about pre-WG drafts. | ||||
| o Adjusted discussion of Contextual Rules to match the new location | ||||
| of the tables and some conceptual material. | ||||
| o Rewrote the material on preprocessing somewhat. | ||||
| o Moved the material about relationships with IDNA2003 to be part of | ||||
| a single section on transitions. | ||||
| o Removed several placeholders and made editorial changes in | ||||
| accordance with decisions made at IETF 72 in Dublin and not | ||||
| disputed on the mailing list. | ||||
| 16. References | 16. References | |||
| 16.1. Normative References | 16.1. Normative References | |||
| [ASCII] American National Standards Institute (formerly United | [ASCII] American National Standards Institute (formerly United | |||
| States of America Standards Institute), "USA Code for | States of America Standards Institute), "USA Code for | |||
| Information Interchange", ANSI X3.4-1968, 1968. | Information Interchange", ANSI X3.4-1968, 1968. | |||
| ANSI X3.4-1968 has been replaced by newer versions with | ANSI X3.4-1968 has been replaced by newer versions with | |||
| slight modifications, but the 1968 version remains | slight modifications, but the 1968 version remains | |||
| definitive for the Internet. | definitive for the Internet. | |||
| [IDNA2008-Bidi] | [IDNA2008-Bidi] | |||
| Alvestrand, H. and C. Karp, "An updated IDNA criterion for | Alvestrand, H. and C. Karp, "An updated IDNA criterion for | |||
| right to left scripts", July 2008, <http://www.ietf.org/ | right to left scripts", July 2008, <https:// | |||
| internet-drafts/draft-ietf-idnabs-bidi-01.txt>. | datatracker.ietf.org/drafts/draft-ietf-idnabs-bidi/>. | |||
| [IDNA2008-Protocol] | [IDNA2008-Protocol] | |||
| Klensin, J., "Internationalized Domain Names in | Klensin, J., "Internationalized Domain Names in | |||
| Applications (IDNA): Protocol", July 2008, <http:// | Applications (IDNA): Protocol", July 2008, <https:// | |||
| www.ietf.org/internet-drafts/ | datatracker.ietf.org/drafts/draft-ietf-idnabis-protocol/>. | |||
| draft-ietf-idnabis-protocol-02.txt>. | ||||
| [IDNA2008-Tables] | [IDNA2008-Tables] | |||
| Faltstrom, P., "The Unicode Code Points and IDNA", | Faltstrom, P., "The Unicode Code Points and IDNA", | |||
| May 2008, <http://www.ietf.org/internet-drafts/ | July 2008, <https://datatracker.ietf.org/drafts/ | |||
| draft-ietf-idnabis-tables-01.txt>. | draft-ietf-idnabis-tables/>. | |||
| A version of this document is available in HTML format at | A version of this document is available in HTML format at | |||
| http://stupid.domain.name/idnabis/ | http://stupid.domain.name/idnabis/ | |||
| draft-ietf-idnabis-tables-01.html | draft-ietf-idnabis-tables-02.html | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | |||
| Internationalized Strings ("stringprep")", RFC 3454, | Internationalized Strings ("stringprep")", RFC 3454, | |||
| December 2002. | December 2002. | |||
| [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| skipping to change at page 47, line 17 ¶ | skipping to change at page 45, line 43 ¶ | |||
| [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an | [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an | |||
| IANA Considerations Section in RFCs", BCP 26, RFC 5226, | IANA Considerations Section in RFCs", BCP 26, RFC 5226, | |||
| May 2008. | May 2008. | |||
| [RulesInit] | [RulesInit] | |||
| Klensin, J., "Internationalizing Domain Names in | Klensin, J., "Internationalizing Domain Names in | |||
| Applications (IDNA): Protocol, Appendix A Contextual Rules | Applications (IDNA): Protocol, Appendix A Contextual Rules | |||
| Table", July 2008, <http://www.ietf.org/internet-drafts/ | Table", July 2008, <http://www.ietf.org/internet-drafts/ | |||
| draft-ietf-idnabis-protocol-02.txt>. | draft-ietf-idnabis-protocol-02.txt>. | |||
| [Unicode-UAX15] | ||||
| The Unicode Consortium, "Unicode Standard Annex #15: | ||||
| Unicode Normalization Forms", March 2008, | ||||
| <http://www.unicode.org/reports/tr15/>. | ||||
| [Unicode51] | [Unicode51] | |||
| The Unicode Consortium, "The Unicode Standard, Version | The Unicode Consortium, "The Unicode Standard, Version | |||
| 5.1.0", 2008. | 5.1.0", 2008. | |||
| defined by: The Unicode Standard, Version 5.0, Boston, MA, | defined by: The Unicode Standard, Version 5.0, Boston, MA, | |||
| Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | |||
| Unicode 5.1.0 | Unicode 5.1.0 | |||
| (http://www.unicode.org/versions/Unicode5.1.0/). | (http://www.unicode.org/versions/Unicode5.1.0/). | |||
| 16.2. Informative References | 16.2. Informative References | |||
| skipping to change at page 48, line 9 ¶ | skipping to change at page 46, line 39 ¶ | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, November 1987. | STD 13, RFC 1034, November 1987. | |||
| [RFC1035] Mockapetris, P., "Domain names - implementation and | [RFC1035] Mockapetris, P., "Domain names - implementation and | |||
| specification", STD 13, RFC 1035, November 1987. | specification", STD 13, RFC 1035, November 1987. | |||
| [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | |||
| and Support", STD 3, RFC 1123, October 1989. | and Support", STD 3, RFC 1123, October 1989. | |||
| [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS | ||||
| Specification", RFC 2181, July 1997. | ||||
| [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | ||||
| Languages", BCP 18, RFC 2277, January 1998. | ||||
| [RFC2673] Crawford, M., "Binary Labels in the Domain Name System", | ||||
| RFC 2673, August 1999. | ||||
| [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for | |||
| specifying the location of services (DNS SRV)", RFC 2782, | specifying the location of services (DNS SRV)", RFC 2782, | |||
| February 2000. | February 2000. | |||
| [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint | [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint | |||
| Engineering Team (JET) Guidelines for Internationalized | Engineering Team (JET) Guidelines for Internationalized | |||
| Domain Names (IDN) Registration and Administration for | Domain Names (IDN) Registration and Administration for | |||
| Chinese, Japanese, and Korean", RFC 3743, April 2004. | Chinese, Japanese, and Korean", RFC 3743, April 2004. | |||
| [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource | [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource | |||
| skipping to change at page 48, line 33 ¶ | skipping to change at page 47, line 25 ¶ | |||
| December 2005. | December 2005. | |||
| [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and | [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and | |||
| Recommendations for Internationalized Domain Names | Recommendations for Internationalized Domain Names | |||
| (IDNs)", RFC 4690, September 2006. | (IDNs)", RFC 4690, September 2006. | |||
| [RFC4713] Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin, | [RFC4713] Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin, | |||
| "Registration and Administration Recommendations for | "Registration and Administration Recommendations for | |||
| Chinese Domain Names", RFC 4713, October 2006. | Chinese Domain Names", RFC 4713, October 2006. | |||
| [Unicode-UAX31] | ||||
| The Unicode Consortium, "Unicode Standard Annex #31: | ||||
| Unicode Identifier and Pattern Syntax", March 2008, | ||||
| <http://www.unicode.org/reports/tr31/>. | ||||
| [Unicode-UTR36] | [Unicode-UTR36] | |||
| The Unicode Consortium, "Unicode Technical Report #36: | The Unicode Consortium, "Unicode Technical Report #36: | |||
| Unicode Security Considerations", August 2006, | Unicode Security Considerations", July 2008, | |||
| <http://www.unicode.org/reports/tr36/>. | <http://www.unicode.org/reports/tr36/>. | |||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| End of changes. 135 change blocks. | ||||
| 558 lines changed or deleted | 481 lines changed or added | |||
This HTML diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/