| < draft-ietf-idnabis-rationale-05.txt | draft-ietf-idnabis-rationale-06.txt > | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft November 28, 2008 | Internet-Draft December 15, 2008 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: June 1, 2009 | Expires: June 18, 2009 | |||
| Internationalized Domain Names for Applications (IDNA): Background, | Internationalized Domain Names for Applications (IDNA): Background, | |||
| Explanation, and Rationale | Explanation, and Rationale | |||
| draft-ietf-idnabis-rationale-05.txt | draft-ietf-idnabis-rationale-06.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on June 1, 2009. | This Internet-Draft will expire on June 18, 2009. | |||
| Abstract | Abstract | |||
| Several years have passed since the original protocol for | Several years have passed since the original protocol for | |||
| Internationalized Domain Names (IDNs) was completed and deployed. | Internationalized Domain Names (IDNs) was completed and deployed. | |||
| During that time, a number of issues have arisen, including the need | During that time, a number of issues have arisen, including the need | |||
| to update the system to deal with newer versions of Unicode. Some of | to update the system to deal with newer versions of Unicode. Some of | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| skipping to change at page 2, line 15 ¶ | skipping to change at page 2, line 15 ¶ | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | |||
| 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | |||
| 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 | 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 | |||
| 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6 | 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | |||
| 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | |||
| 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 | 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 | |||
| 3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 | 3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 | |||
| 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, | 3.3. Layered Restrictions: Tables, Context, Registration, | |||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 13 | Applications . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 | 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 | |||
| 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 | 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 | |||
| 4.2. Entry and Display in Applications . . . . . . . . . . . . 15 | 4.2. Entry and Display in Applications . . . . . . . . . . . . 15 | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and | 4.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 16 | Alternate Character Forms . . . . . . . . . . . . . . . . 16 | |||
| 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 | 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 19 | |||
| 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 | 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 20 | |||
| 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | |||
| 6. Front-end and User Interface Processing . . . . . . . . . . . 21 | 6. Front-end and User Interface Processing . . . . . . . . . . . 21 | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization . 23 | 7. Migration from IDNA2003 and Unicode Version Synchronization . 24 | |||
| 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23 | 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 | 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 | |||
| 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 26 | |||
| 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 27 | |||
| 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | 7.2. Changes in Character Interpretations . . . . . . . . . . . 28 | |||
| 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 | 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 | |||
| 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 | 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 31 | |||
| 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 | 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 31 | |||
| 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 | 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 32 | |||
| 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 | 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 32 | |||
| 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 | 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 | |||
| 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 | 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 33 | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code | 7.7. Migration Between Unicode Versions: Unassigned Code | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35 | 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36 | 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 10. Internationalization Considerations . . . . . . . . . . . . . 36 | 10. Internationalization Considerations . . . . . . . . . . . . . 37 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 | 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 | |||
| 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 | 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 | |||
| 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 | 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 | |||
| 12. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | 12. Security Considerations . . . . . . . . . . . . . . . . . . . 38 | |||
| 12.1. General Security Issues with IDNA . . . . . . . . . . . . 37 | 12.1. General Security Issues with IDNA . . . . . . . . . . . . 38 | |||
| 12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 38 | ||||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 39 | 13.2. Informative References . . . . . . . . . . . . . . . . . . 40 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 | |||
| A.1. Changes between Version -00 and Version -01 of | A.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 | |||
| A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 | A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 | A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43 | A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 44 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . . . 45 | Intellectual Property and Copyright Statements . . . . . . . . . . 45 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| The original standards for Internationalized Domain Names (IDNs) were | The original standards for Internationalized Domain Names (IDNs) were | |||
| completed and deployed starting in 2003. Those standards are known | completed and deployed starting in 2003. Those standards are known | |||
| as Internationalized Domain Names in Applications (IDNA), taken from | as Internationalized Domain Names in Applications (IDNA), taken from | |||
| the name of the highest level standard within the group, RFC 3490 | the name of the highest level standard within the group, RFC 3490 | |||
| [RFC3490]. After those standards were deployed, a number of issues | [RFC3490]. After those standards were deployed, a number of issues | |||
| arose that called for a new version of the IDNA protocol and the | arose that led to a call for a new version of the IDNA protocol and | |||
| associated tables, including a subset of those described in a recent | the associated tables, including a subset of those described in a | |||
| IAB report [RFC4690] and the need to update the system to deal with | recent IAB report [RFC4690] and the need to update the system to deal | |||
| newer versions of Unicode. This document further explains the issues | with newer versions of Unicode. This document further explains the | |||
| that have been encountered when they are important to understanding | issues that have been encountered when they are important to | |||
| of the revised protocols. It also provides an overview of the new | understanding of the revised protocols. It also provides an overview | |||
| IDNA model and explanatory material for it. Additional explanatory | of the new IDNA model and explanatory material for it. Additional | |||
| material for the specific components of the proposals appears with | explanatory material for the specific components of the proposals | |||
| the associated documents. | appears with the associated documents. | |||
| A good deal of the background material that appeared in RFC 3490 | A good deal of the background material that appeared in RFC 3490 | |||
| [RFC3490] has been removed from this update. That material is either | [RFC3490] has been removed from this update. That material is either | |||
| of historical interest only or has been covered from a more recent | of historical interest only or has been covered from a more recent | |||
| perspective in RFC 4690 [RFC4690]. | perspective in RFC 4690 [RFC4690]. | |||
| This document is not normative. The information it provides is | This document is not normative. The information it provides is | |||
| intended to make the rules, tables, and protocol easier to understand | intended to make the rules, tables, and protocol easier to understand | |||
| and to provide overview information and suggestions for zone | and to provide overview information and suggestions for zone | |||
| administrators and others who need to make policy, deployment, and | administrators and others who need to make policy, deployment, and | |||
| skipping to change at page 5, line 19 ¶ | skipping to change at page 5, line 19 ¶ | |||
| 2003, i.e., those commonly known as the IDNA base specification | 2003, i.e., those commonly known as the IDNA base specification | |||
| [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | |||
| [RFC3454]. In this document, those names are used to refer, | [RFC3454]. In this document, those names are used to refer, | |||
| conceptually, to the individual documents, with the base IDNA | conceptually, to the individual documents, with the base IDNA | |||
| specification called just "IDNA". | specification called just "IDNA". | |||
| The term "IDNA2008" is used to refer to a new version of IDNA as | The term "IDNA2008" is used to refer to a new version of IDNA as | |||
| described in this document and in the documents described in the | described in this document and in the documents described in the | |||
| document listing of [IDNA2008-Defs]. IDNA2008 is not dependent on | document listing of [IDNA2008-Defs]. IDNA2008 is not dependent on | |||
| any of the IDNA2003 specifications other than the one for Punycode | any of the IDNA2003 specifications other than the one for Punycode | |||
| encoding. References to "these specifications" are to the entire | encoding. References to "these specifications" or "these documents" | |||
| set. | are to the entire IDNA2008 set. | |||
| 1.3.2. DNS "Name" Terminology | 1.3.2. DNS "Name" Terminology | |||
| These documents depart from historical DNS terminology and usage in | These documents depart from historical DNS terminology and usage in | |||
| one important respect. Over the years, the community has talked very | one important respect. Over the years, the community has talked very | |||
| casually about "names" in the DNS, beginning with calling it "the | casually about "names" in the DNS, beginning with calling it "the | |||
| domain name system". That terminology is fine in the very precise | domain name system". That terminology is fine in the very precise | |||
| sense that the identifiers of the DNS do provide names for objects | sense that the identifiers of the DNS do provide names for objects | |||
| and addresses. But, in the context of IDNs, the term has introduced | and addresses. But, in the context of IDNs, the term has introduced | |||
| some confusion, confusion that has increased further as people have | some confusion, confusion that has increased further as people have | |||
| skipping to change at page 5, line 50 ¶ | skipping to change at page 5, line 50 ¶ | |||
| possible for them to be "words". | possible for them to be "words". | |||
| This distinction is important because the reasonable goal of an IDN | This distinction is important because the reasonable goal of an IDN | |||
| effort is not to be able to write the great Klingon (or language of | effort is not to be able to write the great Klingon (or language of | |||
| one's choice) novel in DNS labels but to be able to form a usefully | one's choice) novel in DNS labels but to be able to form a usefully | |||
| broad range of mnemonics in ways that are as natural as possible in a | broad range of mnemonics in ways that are as natural as possible in a | |||
| very broad range of scripts. | very broad range of scripts. | |||
| 1.3.3. New Terminology and Restrictions | 1.3.3. New Terminology and Restrictions | |||
| These documents [IDNA2008-Defs] introduce new terminology, and | These documents introduce new terminology, and precise definitions, | |||
| precise definitions, for the terms "U-labels", "A-labels", labels | for the terms "U-labels", "A-labels", labels that are "IDNA-valid", | |||
| that are "IDNA-valid", and an "LDH-label" (differing from an LDH- | and an "LDH-label" (differing from an LDH-conformant label or fully- | |||
| conformant label or fully-qualified domain name). The also introduce | qualified domain name). They also introduce a restriction, for IDNA- | |||
| a restriction, for IDNA-conformant applications and DNS zones in | conformant applications and DNS zones in which IDNA is used, on | |||
| which IDNA is used, on strings used as labels that contain "--" in | strings used as labels that contain "--" in the third and fourth | |||
| the third and fourth positions, essentially requiring that such | positions, essentially requiring that such strings be IDNA-valid. | |||
| strings be IDNA-valid. This restriction on strings containing "--" | This restriction on strings containing "--" is required for three | |||
| is required for three reasons: | reasons: | |||
| o to prevent confusion with pre-IDNA coding forms; | o to prevent confusion with pre-IDNA coding forms; | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 7.4); | prefix, no matter how unlikely those might be (see Section 7.4); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the Punycode encoding | o to reduce the opportunities for attacks via the Punycode encoding | |||
| algorithm itself. | algorithm itself. | |||
| Figure 1 of the Definitions Document [IDNA2008-Defs] illustrates the | ||||
| terminology used by IDNA for various types of labels and strings and | ||||
| their relationship. | ||||
| 1.4. Objectives | 1.4. Objectives | |||
| The intent of the IDNA revision effort, and hence of this document | The intent of the IDNA revision effort, and hence of this document | |||
| and the associated ones, is to increase the usability and | and the associated ones, is to increase the usability and | |||
| effectiveness of internationalized domain names (IDNs) while | effectiveness of internationalized domain names (IDNs) while | |||
| preserving or strengthening the integrity of references that use | preserving or strengthening the integrity of references that use | |||
| them. The original "hostname" character definitions (see, e.g., | them. The original "hostname" character definitions (see, e.g., | |||
| [RFC0810]) struck a balance between the creation of useful mnemonics | [RFC0810]) struck a balance between the creation of useful mnemonics | |||
| and the introduction of parsing problems or general confusion in the | and the introduction of parsing problems or general confusion in the | |||
| contexts in which domain names are used. The objective of IDNA2008 | contexts in which domain names are used. The objective of IDNA2008 | |||
| skipping to change at page 9, line 42 ¶ | skipping to change at page 10, line 4 ¶ | |||
| Unicode. | Unicode. | |||
| The actual registration and lookup protocols for IDNA2008 are | The actual registration and lookup protocols for IDNA2008 are | |||
| specified in [IDNA2008-Protocol]. | specified in [IDNA2008-Protocol]. | |||
| 3. Permitted Characters: An Inclusion List | 3. Permitted Characters: An Inclusion List | |||
| This section provides an overview of the model used to establish the | This section provides an overview of the model used to establish the | |||
| algorithm and character lists of [IDNA2008-Tables] and describes the | algorithm and character lists of [IDNA2008-Tables] and describes the | |||
| names and applicability of the categories used there. Note that the | names and applicability of the categories used there. Note that the | |||
| inclusion of a character in the first category group does not imply | inclusion of a character in the first category group (Section 3.1.1) | |||
| that it can be used indiscriminately; some characters are associated | does not imply that it can be used indiscriminately; some characters | |||
| with contextual rules that must be applied as well. | are associated with contextual rules that must be applied as well. | |||
| The information given in this section is provided to make the rules, | The information given in this section is provided to make the rules, | |||
| tables, and protocol easier to understand. The normative generating | tables, and protocol easier to understand. The normative generating | |||
| rules that correspond to this informal discussion appear in | rules that correspond to this informal discussion appear in | |||
| [IDNA2008-Tables] and the rules that actually determine what labels | [IDNA2008-Tables] and the rules that actually determine what labels | |||
| can be registered or looked up are in [IDNA2008-Protocol]. | can be registered or looked up are in [IDNA2008-Protocol]. | |||
| 3.1. A Tiered Model of Permitted Characters and Labels | 3.1. A Tiered Model of Permitted Characters and Labels | |||
| Moving to an inclusion model requires respecifying the list of | Moving to an inclusion model requires respecifying the list of | |||
| characters that are permitted in IDNs. In IDNA2003, the role and | characters that are permitted in IDNs. In IDNA2003, the role and | |||
| utility of characters are independent of context and fixed forever | utility of characters are independent of context and fixed forever | |||
| (or until the standard is replaced). Making completely context- | (or until the standard is replaced). Making completely context- | |||
| independent rules globally has proven impractical because some | independent rules globally has proven impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| no visible effect(s) in others. IDNA2003 prohibited those types of | no visible effect(s) in others. IDNA2003 prohibited those types of | |||
| characters entirely. But the restrictions were much too severe to | characters entirely. But the restrictions were much too severe to | |||
| permit an adequate range of mnemonics for terminology based on some | permit an adequate range of mnemonics for identifiers based on some | |||
| languages. The requirement to support those characters but limit | languages. The requirement to support those characters but limit | |||
| their use to very specific contexts was reinforced by the observation | their use to very specific contexts was reinforced by the observation | |||
| that handling of particular characters across the languages that use | that handling of particular characters across the languages that use | |||
| a script, or the use of similar or identical-looking characters in | a script, or the use of similar or identical-looking characters in | |||
| different scripts, is less well understood than many people believed | different scripts, is less well understood than many people believed | |||
| it was several years ago. | it was several years ago. | |||
| Independently of the characters chosen (see next subsection), the | Independently of the characters chosen (see next subsection), the | |||
| approach is to divide the characters that appear in Unicode into | approach is to divide the characters that appear in Unicode into | |||
| three categories: | three categories: | |||
| skipping to change at page 16, line 40 ¶ | skipping to change at page 16, line 48 ¶ | |||
| appear embedded in text that is otherwise in some other character | appear embedded in text that is otherwise in some other character | |||
| coding. | coding. | |||
| All protocols that use domain name slots already have the capacity | All protocols that use domain name slots already have the capacity | |||
| for handling domain names in the ASCII charset. Thus, A-labels can | for handling domain names in the ASCII charset. Thus, A-labels can | |||
| inherently be handled by those protocols. | inherently be handled by those protocols. | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | |||
| Character Forms | Character Forms | |||
| [[anchor14: There is some internal redundancy and repetition in the | [[anchor13: There is some internal redundancy and repetition in the | |||
| material in this section. Specific suggestions about how to reduce or | material in this section. Specific suggestions about how to reduce or | |||
| eliminate redundant text for -05 would be appreciated.]] | eliminate redundant text would be appreciated. If no such | |||
| suggestions are received before -07 is posted, this note will be | ||||
| removed.]] | ||||
| Users often have expectations about character matching or equivalence | Users often have expectations about character matching or equivalence | |||
| that are based on their own languages and the orthography of those | that are based on their own languages and the orthography of those | |||
| languages. These expectations may not be consistent with forms or | languages. These expectations may not be consistent with forms or | |||
| actions that can be naturally accommodated in a character coding | actions that can be naturally accommodated in a character coding | |||
| system, especially if multiple languages are written using the same | system, especially if multiple languages are written using the same | |||
| script but using different conventions. A Norwegian user might | script but using different conventions. A Norwegian user might | |||
| expect a label with the ae-ligature to be treated as the same label | expect a label with the ae-ligature to be treated as the same label | |||
| as one using the Swedish spelling with a-diaeresis even though | as one using the Swedish spelling with a-diaeresis even though | |||
| applying that mapping to English would be astonishing to users. A | applying that mapping to English would be astonishing to users. A | |||
| skipping to change at page 18, line 42 ¶ | skipping to change at page 19, line 4 ¶ | |||
| these situations in a system such as IDNA -- or with Unicode | these situations in a system such as IDNA -- or with Unicode | |||
| normalization generally -- since determining what to do requires | normalization generally -- since determining what to do requires | |||
| information about the language being used, context, or both. | information about the language being used, context, or both. | |||
| Consequently, these specifications make no attempt to treat these | Consequently, these specifications make no attempt to treat these | |||
| combined characters in any special way. However, their existence | combined characters in any special way. However, their existence | |||
| provides a prime example of a situation in which a registry that is | provides a prime example of a situation in which a registry that is | |||
| aware of the language context in which labels are to be registered, | aware of the language context in which labels are to be registered, | |||
| and where that language sometimes (or always) treats the two- | and where that language sometimes (or always) treats the two- | |||
| character sequences as equivalent to the combined form, should give | character sequences as equivalent to the combined form, should give | |||
| serious consideration to applying a "variant" model [RFC3743] | serious consideration to applying a "variant" model [RFC3743] | |||
| [RFC4290], or to prohibiting registration of one of the forms entirely, | [RFC4290], or to prohibiting registration of one of the forms entirely, | |||
| to reduce the opportunities for user confusion and fraud that would | to reduce the opportunities for user confusion and fraud that would | |||
| result from the related strings being registered to different | result from the related strings being registered to different | |||
| parties. | parties. | |||
| [[anchor14: Placeholder: A discussion of the Arabic digit issue | ||||
| should go here once it is resolved in some appropriate way.]] | ||||
| 4.4. Case Mapping and Related Issues | 4.4. Case Mapping and Related Issues | |||
| In the DNS, ASCII letters are stored with their case preserved. | In the DNS, ASCII letters are stored with their case preserved. | |||
| Matching during the query process is case-independent, but none of | Matching during the query process is case-independent, but none of | |||
| the information that might be represented by choices of case has been | the information that might be represented by choices of case has been | |||
| lost. That model has been accidentally helpful because, as people | lost. That model has been accidentally helpful because, as people | |||
| have created DNS labels by catenating words (or parts of words) to | have created DNS labels by catenating words (or parts of words) to | |||
| form labels, case has often been used to distinguish among components | form labels, case has often been used to distinguish among components | |||
| and make the labels more memorable. | and make the labels more memorable. | |||
| The solution of keeping the characters separate but doing matching | The solution of keeping the characters separate but doing matching | |||
| independent of case is not feasible with IDNA or any IDNA-like model | independent of case is not feasible with IDNA or any IDNA-like model | |||
| because the matching would then have to be done on the server rather | because the matching would then have to be done on the server rather | |||
| than have characters mapped on the client. That situation was | than have characters mapped on the client. That situation was | |||
| recognized in IDNA2003 and nothing in IDNA2008 fundamentally changes | recognized in IDNA2003 and nothing in these specifications | |||
| it or could do so. In IDNA2003, all characters are case-folded and | fundamentally changes it or could do so. In IDNA2003, all characters | |||
| mapped. That results in upper-case characters being mapped to lower- | are case-folded and mapped. That results in upper-case characters | |||
| case ones and in some other transformations of alternate forms of | being mapped to lower-case ones and in some other transformations of | |||
| characters, especially those that do not have (or did not have) | alternate forms of characters, especially those that do not have (or | |||
| upper-case forms. For example, Greek Final Form Sigma (U+03C2) is | did not have) upper-case forms. For example, Greek Final Form Sigma | |||
| mapped to the medial form (U+03C3) and Eszett (German Sharp S, | (U+03C2) is mapped to the medial form (U+03C3) and Eszett (German | |||
| U+00DF) is mapped to "ss". Neither of these mappings is reversible | Sharp S, U+00DF) is mapped to "ss". Neither of these mappings is | |||
| because the upper case of U+03C3 is the Upper Case Sigma (U+03A3) and | reversible because the upper case of U+03C3 is the Upper Case Sigma | |||
| "ss" is an ASCII string. IDNA2008 permits, at the risk of some | (U+03A3) and "ss" is an ASCII string. IDNA2008 permits, at the risk | |||
| incompatibility, slightly more flexibility in this area by avoiding case | of some incompatibility, slightly more flexibility in this area by | |||
| folding and treating these characters as themselves. Approaches to | avoiding case folding and treating these characters as themselves. | |||
| handling the incompatibility are discussed in Section 7.2. Although | Approaches to handling that incompatibility are discussed in | |||
| information is lost in IDNA2003's ToASCII operation so that, in some | Section 7.2. Although information is lost in IDNA2003's ToASCII | |||
| sense, Final Sigma Eszett cannot be represented in an IDN at all, its | operation so that, in some sense, neither Final Sigma nor Eszett can | |||
| guarantee of mapping when those characters are used as input can be | be represented in an IDN at all, its guarantee of mapping when those | |||
| interpreted as violating one of the conditions discussed in | characters are used as input can be interpreted as violating one of | |||
| Section 7.4.1 and hence requiring a prefix change. The consensus was | the conditions discussed in Section 7.4.1 and hence requiring a | |||
| to not make a prefix change in spite of this issue. Of course, had a | prefix change. The consensus was to not make a prefix change in | |||
| prefix change been made (at the costs discussed in Section 7.4.3) | spite of this issue. Of course, had a prefix change been made (at | |||
| there would have been several options, including, if desired, | the costs discussed in Section 7.4.3) there would have been several | |||
| assignment of the character to the CONTEXTUAL RULE REQUIRED category | options, including, if desired, assignment of the character to the | |||
| and requiring that it only be used in carefully-selected contexts. | CONTEXTUAL RULE REQUIRED category and requiring that it only be used | |||
| in carefully-selected contexts. | ||||
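
The loss of information described in Section 4.4 can be illustrated with a short sketch. It assumes Python's built-in `str.casefold()` as a stand-in for the Unicode case folding used to build the IDNA2003 tables; it is not the Stringprep mapping itself, only an approximation that happens to agree with it for these two characters.

```python
# Illustrative only: str.casefold() approximates Unicode case folding and is
# an assumed stand-in for the IDNA2003 mapping of these two characters.
FINAL_SIGMA = "\u03c2"   # Greek Final Form Sigma
MEDIAL_SIGMA = "\u03c3"  # Greek Small Letter Sigma (medial form)
ESZETT = "\u00df"        # German Sharp S


def fold(text: str) -> str:
    """Return the case-folded form of text."""
    return text.casefold()


# Both characters fold to something other than themselves...
assert fold(FINAL_SIGMA) == MEDIAL_SIGMA
assert fold(ESZETT) == "ss"

# ...and upper-casing the folded result does not recover the original.
assert MEDIAL_SIGMA.upper() == "\u03a3"  # Upper Case Sigma
assert "ss".upper() == "SS"              # plain ASCII string
```

Once the folded form is all that is stored, nothing in the label indicates whether the user originally typed Final Sigma or Eszett.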
| 4.5. Right to Left Text | 4.5. Right to Left Text | |||
| In order to be sure that the directionality of right to left text is | In order to be sure that the directionality of right to left text is | |||
| unambiguous, IDNA2003 required that any label in which right to left | unambiguous, IDNA2003 required that any label in which right to left | |||
| characters appear both starts and ends with them, not include any | characters appear both starts and ends with them, not include any | |||
| characters with strong left to right properties (which excludes other | characters with strong left to right properties (which excludes other | |||
| alphabetic characters but permits European digits), and rejects any | alphabetic characters but permits European digits), and rejects any | |||
| other string that contains a right to left character. This is one of | other string that contains a right to left character. This is one of | |||
| the few places where the IDNA algorithms (both in IDNA2003 and in | the few places where the IDNA algorithms (both in IDNA2003 and in | |||
| skipping to change at page 22, line 14 ¶ | skipping to change at page 22, line 29 ¶ | |||
| in an arbitrary context (such as running text), it is difficult, even | in an arbitrary context (such as running text), it is difficult, even | |||
| with only ASCII characters, to know whether an actual domain name (or | with only ASCII characters, to know whether an actual domain name (or | |||
| a protocol parameter like a URI) is present and where it starts and | a protocol parameter like a URI) is present and where it starts and | |||
| ends. When using Unicode, this gets even more difficult if treatment | ends. When using Unicode, this gets even more difficult if treatment | |||
| of certain special characters (like the dot that separates labels in | of certain special characters (like the dot that separates labels in | |||
| a domain name) depends on context (e.g., prior knowledge of whether | a domain name) depends on context (e.g., prior knowledge of whether | |||
| the string represents a domain name or not). That knowledge is not | the string represents a domain name or not). That knowledge is not | |||
| available if the primary heuristic for identifying the presence of | available if the primary heuristic for identifying the presence of | |||
| domain names in strings depends on the presence of dots separating | domain names in strings depends on the presence of dots separating | |||
| groups of characters with no intervening spaces. | groups of characters with no intervening spaces. | |||
| [[anchor16: Above text is a substitute for an earlier (pre -01) | ||||
| version and is hoped to be more clear. Comments and improvements | ||||
| welcome.]] | ||||
| As discussed elsewhere in this document, the IDNA2008 model removes | As discussed elsewhere in this document, the IDNA2008 model removes | |||
| all of these mappings and interpretations, including the equivalence | all of these mappings and interpretations, including the equivalence | |||
| of different forms of dots, from the protocol, discouraging such | of different forms of dots, from the protocol, discouraging such | |||
| mappings and leaving them, when necessary, to local processing. This | mappings and leaving them, when necessary, to local processing. This | |||
| should not be taken to imply that local processing is optional or can | should not be taken to imply that local processing is optional or can | |||
| be avoided entirely, even if doing so might have been desirable in a | be avoided entirely, even if doing so might have been desirable in a | |||
| world without IDNA2003 IDNs in files and archives. Instead, unless | world without IDNA2003 IDNs in files and archives. Instead, unless | |||
| the program context is such that it is known that any IDNs that | the program context is such that it is known that any IDNs that | |||
| appear will contain either U-label or A-label forms, or that other | appear will contain either U-label or A-label forms, or that other | |||
| skipping to change at page 23, line 40 ¶ | skipping to change at page 24, line 5 ¶ | |||
| In either case, it is vital that user interface designs and, where | In either case, it is vital that user interface designs and, where | |||
| the interfaces are not sufficient, users, be aware that the only | the interfaces are not sufficient, users, be aware that the only | |||
| forms of domain names that this protocol anticipates will resolve | forms of domain names that this protocol anticipates will resolve | |||
| globally or compare equal when crude methods (i.e., those not | globally or compare equal when crude methods (i.e., those not | |||
| conforming to the strict definition of label equivalence given in | conforming to the strict definition of label equivalence given in | |||
| [IDNA2008-Defs]) are used are those in which all native-script labels | [IDNA2008-Defs]) are used are those in which all native-script labels | |||
| are in U-label form. Forms that assume mapping will occur, | are in U-label form. Forms that assume mapping will occur, | |||
| especially forms that were not valid under IDNA2003, may or may not | especially forms that were not valid under IDNA2003, may or may not | |||
| function in predictable ways across all implementations. | function in predictable ways across all implementations. | |||
| User interfaces involving Latin-based scripts should take special | ||||
| care when considering how to handle case mapping because small | ||||
| differences in label strings may cause behavior that is astonishing | ||||
| to users. Because case-insensitive mapping is done for ASCII strings | ||||
| by DNS-servers, an all-ASCII label is treated as case-insensitive. | ||||
| However, if even one of the characters of that string is replaced by | ||||
| one that requires the label to be given IDN treatment (e.g., by | ||||
| adding a diacritical mark), then the label immediately becomes case- | ||||
| sensitive. This suggests that case mapping for Latin-based scripts | ||||
| (and possibly other scripts with case distinctions) as a | ||||
| preprocessing matter in applications may be wise to prevent user | ||||
| astonishment, but, since all applications may not do this and | ||||
| ambiguity in transport is not desirable, case-dependent | ||||
| forms should not be stored in files. | ||||
| 7. Migration from IDNA2003 and Unicode Version Synchronization | 7. Migration from IDNA2003 and Unicode Version Synchronization | |||
| 7.1. Design Criteria | 7.1. Design Criteria | |||
| As mentioned above and in RFC 4690, two key goals of the IDNA2008 | As mentioned above and in RFC 4690, two key goals of the IDNA2008 | |||
| design are to enable applications to be agnostic about whether they | design are to enable applications to be agnostic about whether they | |||
| are being run in environments supporting any Unicode version from 3.2 | are being run in environments supporting any Unicode version from 3.2 | |||
| onward and to permit incrementally adding new characters, character | onward and to permit incrementally adding new characters, character | |||
| groups, scripts, and other character collections as they are | groups, scripts, and other character collections as they are | |||
| incorporated into Unicode, without disruption and, in the long term, | incorporated into Unicode, without disruption and, in the long term, | |||
| skipping to change at page 27, line 8 ¶ | skipping to change at page 27, line 33 ¶ | |||
| o Validate the label itself for conformance with a small number of | o Validate the label itself for conformance with a small number of | |||
| whole-label rules, notably verifying that there are no leading | whole-label rules, notably verifying that there are no leading | |||
| combining marks, that the "bidi" conditions are met if right to | combining marks, that the "bidi" conditions are met if right to | |||
| left characters appear, that any required contextual rules are | left characters appear, that any required contextual rules are | |||
| available and that, if such rules are associated with Joiner | available and that, if such rules are associated with Joiner | |||
| Controls, they are tested. | Controls, they are tested. | |||
| o Avoid validating other contextual rules about characters, | o Avoid validating other contextual rules about characters, | |||
| including mixed-script label prohibitions, although such rules may | including mixed-script label prohibitions, although such rules may | |||
| be used to influence presentation decisions in the user interface. | be used to influence presentation decisions in the user interface. | |||
| [[anchor19: Check this, and all similar statements, against | [[anchor18: Check this, and all similar statements, against | |||
| Protocol when that is finished.]] | Protocol when that is finished.]] | |||
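
One of the whole-label rules listed above, the absence of a leading combining mark, can be sketched as follows. This is a minimal illustration that assumes "combining mark" means any character whose Unicode General_Category begins with "M" (Mn, Mc, or Me); the function name is hypothetical, and the result depends on the Unicode version bundled with the local Python interpreter.

```python
import unicodedata


def has_leading_combining_mark(label: str) -> bool:
    """Return True if the first character of label is a combining mark.

    Assumption: a combining mark is any character whose General_Category
    starts with "M" (Mn, Mc, Me). An empty label has no leading mark.
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT at the start of a label is rejected;
# the same mark after a base letter is not a *leading* mark.
print(has_leading_combining_mark("\u0301abc"))
print(has_leading_combining_mark("a\u0301bc"))
```

The bidi and contextual-rule checks named in the same list item are considerably more involved and are not attempted here.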
| By avoiding applying its own interpretation of which labels are valid | By avoiding applying its own interpretation of which labels are valid | |||
| as a means of rejecting lookup attempts, the lookup application | as a means of rejecting lookup attempts, the lookup application | |||
| becomes less sensitive to version incompatibilities with the | becomes less sensitive to version incompatibilities with the | |||
| particular zone registry associated with the domain name. | particular zone registry associated with the domain name. | |||
| An application or client that processes names according to this | An application or client that processes names according to this | |||
| protocol and then resolves them in the DNS will be able to locate any | protocol and then resolves them in the DNS will be able to locate any | |||
| name that is validly registered, as long as its version of the | name that is validly registered, as long as its version of the | |||
| skipping to change at page 27, line 30 ¶ | skipping to change at page 28, line 7 ¶ | |||
| of the characters in the label. Messages to users should distinguish | of the characters in the label. Messages to users should distinguish | |||
| between "label contains an unallocated code point" and other types of | between "label contains an unallocated code point" and other types of | |||
| lookup failures. A failure on the basis of an old version of Unicode | lookup failures. A failure on the basis of an old version of Unicode | |||
| may lead the user to a desire to upgrade to a newer version, but will | may lead the user to a desire to upgrade to a newer version, but will | |||
| have no other ill effects (this is consistent with behavior in the | have no other ill effects (this is consistent with behavior in the | |||
| transition to the DNS when some hosts could not yet handle some forms | transition to the DNS when some hosts could not yet handle some forms | |||
| of names or record types). | of names or record types). | |||
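
The distinction between "label contains an unallocated code point" and other lookup failures could be surfaced roughly as below. This is a sketch under stated assumptions: it treats General_Category "Cn" in the local Unicode tables as "unassigned" (a category that also covers noncharacters such as U+FFFF), and the function names and message wording are hypothetical.

```python
import unicodedata
from typing import List, Optional


def unassigned_code_points(label: str) -> List[str]:
    """List code points in label that the local Unicode tables do not assign.

    Assumption: General_Category "Cn" is used as the test. The answer depends
    on the Unicode version shipped with the local Python interpreter.
    """
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cn"]


def precheck_message(label: str) -> Optional[str]:
    """Return a user-facing message if the label cannot be looked up locally."""
    missing = unassigned_code_points(label)
    if missing:
        return ("label contains an unallocated code point "
                f"({', '.join(missing)}) under Unicode "
                f"{unicodedata.unidata_version}")
    return None  # no objection from this check; other failures are separate
```

Reporting the local Unicode version alongside the code point gives the user a concrete reason to consider upgrading, which is the only consequence the text anticipates.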
| 7.2. Changes in Character Interpretations | 7.2. Changes in Character Interpretations | |||
| [[anchor20: Note in Draft: This subsection is completely new in | [[anchor19: Note in Draft: This subsection is completely new in | |||
| version -04 of this document. It could almost certainly use | version -04 and has been further tuned in -05 and -06 of this | |||
| improvement. It also contains some material that is redundant with | document. It could almost certainly use improvement, although this | |||
| material in other sections. I have not tried to remove that material | note will be removed if there are not significant suggestions about | |||
| and will not do so until the WG concludes that this section is | the -06 version. It also contains some material that is redundant | |||
| relatively stable, but would appreciate help in identifying what | with material in other sections. I have not tried to remove that | |||
| material and will not do so until the WG concludes that this section | ||||
| is relatively stable, but would appreciate help in identifying what | ||||
| should be removed or how this might be enhanced to contain more of | should be removed or how this might be enhanced to contain more of | |||
| that other material. --JcK]] | that other material. --JcK]] | |||
| In those scripts that make case distinctions, there are a few | In those scripts that make case distinctions, there are a few | |||
| characters for which an obvious and unique upper case character has | characters for which an obvious and unique upper case character has | |||
| not historically been available to match a lower case one or vice | not historically been available to match a lower case one or vice | |||
| versa. For those characters, the mappings used in constructing the | versa. For those characters, the mappings used in constructing the | |||
| Stringprep tables for IDNA2003, performed using the Unicode CaseFold | Stringprep tables for IDNA2003, performed using the Unicode CaseFold | |||
| operation (See Section 5.8 of the Unicode Standard [Unicode51]), | operation (See Section 5.8 of the Unicode Standard [Unicode51]), | |||
| generate different characters or sets of characters. Those | generate different characters or sets of characters. Those | |||
| skipping to change at page 28, line 29 ¶ | skipping to change at page 29, line 8 ¶ | |||
| but a judgment that the incompatibility was not significant enough to | but a judgment that the incompatibility was not significant enough to | |||
| justify a prefix change, the WG concluded that Eszett and Final Form | justify a prefix change, the WG concluded that Eszett and Final Form | |||
| Sigma should be treated as distinct and Protocol-Valid characters. | Sigma should be treated as distinct and Protocol-Valid characters. | |||
| The decision faces registries, especially registries maintaining | The decision faces registries, especially registries maintaining | |||
| zones for third parties, with a variation on what has become a | zones for third parties, with a variation on what has become a | |||
| familiar problem: how to introduce a new service in a way that does | familiar problem: how to introduce a new service in a way that does | |||
| not create confusion or significantly weaken or invalidate existing | not create confusion or significantly weaken or invalidate existing | |||
| identifiers. | identifiers. | |||
| While it is beyond the scope of these documents to specify a | There have traditionally been several approaches to problems of this | |||
| preference for any of them, or to suggest that there are not other | type. Without any preference or claim to completeness, these are: | |||
| possibilities, there have traditionally been several approaches to | ||||
| problems of this type: | ||||
| o Do not permit use of the newly-available character at the registry | o Do not permit use of the newly-available character at the registry | |||
| level. This might cause lookup failures if a domain name were | level. This might cause lookup failures if a domain name were to | |||
| written with the expectation of the IDNA2003 mapping behavior, but | be written with the expectation of the IDNA2003 mapping behavior, | |||
| would eliminate any possibility of false matches. | but would eliminate any possibility of false matches. | |||
| o Hold a "sunrise" arrangement in which holders of the previously- | o Hold a "sunrise"-like arrangement in which holders of the | |||
| mapped labels (labels containing "ss" in the Eszett case or ones | previously-mapped labels (labels containing "ss" in the Eszett | |||
| containing Lower Case Sigma in the Final Sigma case) are given | case or ones containing Lower Case Sigma in the Final Sigma case) | |||
| priority (and perhaps other benefits) for registering the | are given priority (and perhaps other benefits) for registering | |||
| corresponding string containing the newly-available characters. | the corresponding string containing the newly-available | |||
| characters. | ||||
| o Adopt some sort of "variant" approach in which registrants either | o Adopt some sort of "variant" approach in which registrants either | |||
| obtained labels with both character forms or one of them was | obtained labels with both character forms or one of them was | |||
| blocked from registration by anyone but the registrant of the | blocked from registration by anyone but the registrant of the | |||
| other form. | other form. | |||
| In principle, lookup applications could also compensate for the | In principle, lookup applications could also compensate for the | |||
| difference in interpretation by looking up the string according to | difference in interpretation by looking up the string according to | |||
| the IDNA2008 interpretation and then, if that failed, doing the lookup | the interpretation specified in these documents and then, if that | |||
| with the mapping, simulating the IDNA2003 interpretation. The risk | failed, doing the lookup with the mapping, simulating the IDNA2003 | |||
| of false positives is such that this is generally to be discouraged | interpretation. The risk of false positives is such that this is | |||
| unless the application is able to engage in a "did you really mean" | generally to be discouraged unless the application is able to engage | |||
| dialogue with the end user. | in a "is this what you meant" dialogue with the end user. | |||
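
The compensating lookup strategy described in the preceding paragraph might look like the following sketch. Everything here is an assumption made for illustration: `resolve` and `confirm` are hypothetical callables supplied by the application, and `str.casefold()` stands in for the IDNA2003 mapping, which it only approximates.

```python
from typing import Callable, Optional


def lookup_with_fallback(
    label: str,
    resolve: Callable[[str], Optional[str]],
    confirm: Callable[[str, str], bool],
) -> Optional[str]:
    """Try the label as written, then a simulated IDNA2003 mapping.

    resolve(label) returns an answer or None (hypothetical resolver hook).
    confirm(original, mapped) asks the user whether the mapped form is
    what was meant (hypothetical user-interface hook).
    """
    # First attempt: interpret the label without any mapping.
    answer = resolve(label)
    if answer is not None:
        return answer

    # Fallback: simulate the older mapping. casefold() is an assumed
    # stand-in, not the actual Stringprep profile.
    mapped = label.casefold()
    if mapped == label:
        return None  # mapping changes nothing, so there is nothing to retry

    answer = resolve(mapped)
    # Guard against false positives: accept only with user confirmation.
    if answer is not None and confirm(label, mapped):
        return answer
    return None
```

Requiring confirmation before returning the fallback answer reflects the caution in the text that this approach is generally to be discouraged without a dialogue with the end user.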
| 7.3. More Flexibility in User Agents | 7.3. More Flexibility in User Agents | |||
| These specifications do not perform mappings between one character or | These specifications do not include mappings between one character or | |||
| code point and others for any reason. Instead, they prohibit the | code point and others for any reason. Instead, they prohibit the | |||
| characters that would be mapped to others by normalization, case | characters that would be mapped to others by normalization, upper | |||
| folding (with exceptions for lower case characters that have no upper | case to lower case changes, or other rules. As examples, while | |||
| case form, which are retained), or other rules. As examples, while | ||||
| mathematical characters based on Latin ones are accepted as input to | mathematical characters based on Latin ones are accepted as input to | |||
| IDNA2003, they are prohibited in IDNA2008. Similarly, double-width | IDNA2003, they are prohibited in IDNA2008. Similarly, double-width | |||
| characters and other variations are prohibited as IDNA input. | characters and other variations are prohibited as IDNA input. | |||
| Since the rules in [IDNA2008-Tables] have the effect that only | Since the rules in [IDNA2008-Tables] have the effect that only | |||
| strings that are not transformed by NFKC are valid, if an application | strings that are not transformed by NFKC are valid, if an application | |||
| chooses to perform NFKC normalization before lookup, that operation | chooses to perform NFKC normalization before lookup, that operation | |||
| is safe since this will never make the application unable to look up | is safe since this will never make the application unable to look up | |||
| any valid string. However, as discussed above, the application | any valid string. However, as discussed above, the application | |||
| cannot guarantee that any other application will perform that | cannot guarantee that any other application will perform that | |||
| skipping to change at page 30, line 12 ¶ | skipping to change at page 30, line 38 ¶ | |||
| As suggested earlier in this section, it appears to be desirable to | As suggested earlier in this section, it appears to be desirable to | |||
| do as little character mapping as possible consistent with having | do as little character mapping as possible consistent with having | |||
| Unicode work correctly (e.g., NFC mapping to resolve different | Unicode work correctly (e.g., NFC mapping to resolve different | |||
| codings for the same character is still necessary although the | codings for the same character is still necessary although the | |||
| specifications require that it be performed prior to invoking the | specifications require that it be performed prior to invoking the | |||
| protocol) and to make the mapping between A-labels and U-labels | protocol) and to make the mapping between A-labels and U-labels | |||
| idempotent. Case-mapping is not an exception to this principle. If | idempotent. Case-mapping is not an exception to this principle. If | |||
| only lower case characters can be registered in the DNS (i.e., be | only lower case characters can be registered in the DNS (i.e., be | |||
| present in a U-label), then IDNA2008 should prohibit upper-case | present in a U-label), then IDNA2008 should prohibit upper-case | |||
| characters as input. Some other considerations reinforce this | characters as input (and therefore does so). Some other | |||
| conclusion. For example, an essential element of the ASCII case- | considerations reinforce this conclusion. For example, an essential | |||
| mapping functions is that uppercase(character) must be equal to | element of the ASCII case-mapping functions is that, for individual | |||
| characters, uppercase(character) must be equal to | ||||
| uppercase(lowercase(character)). That requirement may not be | uppercase(lowercase(character)). That requirement may not be | |||
| satisfied with IDNs. For example, there are some characters in | satisfied with IDNs. For example, there are some characters in | |||
| scripts that use case distinction that do not have counterparts in | scripts that use case distinction that do not have counterparts in | |||
| one case or the other. The relationship between upper case and lower | one case or the other. The relationship between upper case and lower | |||
| case may even be language-dependent, with different languages (or | case may even be language-dependent, with different languages (or | |||
| even the same language in different areas) expecting different | even the same language in different areas) expecting different | |||
| mappings. Of course, the expectations of users who are accustomed to | mappings. Of course, the expectations of users who are accustomed to | |||
| a case-insensitive DNS environment will probably be well-served if | a case-insensitive DNS environment will probably be well-served if | |||
| user agents perform case folding prior to IDNA processing, but the | user agents perform case folding prior to IDNA processing, but the | |||
| IDNA procedures themselves should neither require such mapping nor | IDNA procedures themselves should neither require such mapping nor | |||
| skipping to change at page 33, line 29 ¶ | skipping to change at page 34, line 9 ¶ | |||
| there are no uniform conventions for naming; variations such as | there are no uniform conventions for naming; variations such as | |||
| outline, solid, and shaded forms may or may not exist; and so on. | outline, solid, and shaded forms may or may not exist; and so on. | |||
| As just one example, consider a "heart" symbol as it might appear | As just one example, consider a "heart" symbol as it might appear | |||
| in a logo that might be read as "I love...". While the user might | in a logo that might be read as "I love...". While the user might | |||
| read such a logo as "I love..." or "I heart...", considerable | read such a logo as "I love..." or "I heart...", considerable | |||
| knowledge of the coding distinctions made in Unicode is needed to | knowledge of the coding distinctions made in Unicode is needed to | |||
| know that there is more than one "heart" character (e.g., U+2665, | know that there is more than one "heart" character (e.g., U+2665, | |||
| U+2661, and U+2765) and how to describe it. These issues are of | U+2661, and U+2765) and how to describe it. These issues are of | |||
| particular importance if strings are expected to be understood or | particular importance if strings are expected to be understood or | |||
| transcribed by the listener after being read out loud. | transcribed by the listener after being read out loud. | |||
| [[anchor21: The above paragraph remains controversial as to | [[anchor20: The above paragraph remains controversial as to | |||
| whether it is valid. The WG will need to make a decision if this | whether it is valid. The WG will need to make a decision if this | |||
| section is not dropped entirely.]] | section is not dropped entirely.]] | |||
| o As a simplified example of this, assume one wanted to use a | o As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know -- absent careful study of | would-be registrant has no way to know -- absent careful study of | |||
| the code tables -- whether it is ambiguous (e.g., where there are | the code tables -- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| skipping to change at page 34, line 27 ¶ | skipping to change at page 35, line 6 ¶ | |||
| languages and scripts which would be treated like any other | languages and scripts which would be treated like any other | |||
| language characters; the two should not be confused. | language characters; the two should not be confused. | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code Points | 7.7. Migration Between Unicode Versions: Unassigned Code Points | |||
| In IDNA2003, labels containing unassigned code points are looked up | In IDNA2003, labels containing unassigned code points are looked up | |||
| on the assumption that, if they appear in labels and can be mapped | on the assumption that, if they appear in labels and can be mapped | |||
| and then resolved, the relevant standards must have changed and the | and then resolved, the relevant standards must have changed and the | |||
| registry has properly allocated only assigned values. | registry has properly allocated only assigned values. | |||
| In IDNA2008, strings containing unassigned code points must not be | In the protocol as described in these documents, strings containing | |||
| either looked up or registered. There are several reasons for this, | unassigned code points must not be either looked up or registered. | |||
| with the most important ones being: | There are several reasons for this, with the most important ones | |||
| being: | ||||
| o It cannot be known with sufficient reliability in advance that a | o It cannot be known with sufficient reliability in advance that a | |||
| code point that was not previously assigned will not be assigned | code point that was not previously assigned will not be assigned | |||
| to a compatibility character. In IDNA2003, since there is no | to a compatibility character or one that would be otherwise | |||
| direct dependency on NFKC (Stringprep's tables are based on NFKC, | disallowed by the rules in [IDNA2008-Tables]. In IDNA2003, since | |||
| but IDNA2003 depends only on Stringprep), allocation of a | there is no direct dependency on NFKC (Stringprep's tables are | |||
| compatibility character might produce some odd situations, but it | based on NFKC, but IDNA2003 depends only on Stringprep), | |||
| would not be a problem. In IDNA2008, where compatibility | allocation of a compatibility character might produce some odd | |||
| characters are generally assigned to DISALLOWED, permitting | situations, but it would not be a problem. In IDNA2008, where | |||
| strings containing unassigned characters to be looked up would | compatibility characters are generally assigned to DISALLOWED, | |||
| permit violating the principle that characters in DISALLOWED are | permitting strings containing unassigned characters to be looked | |||
| not looked up. | up would permit violating the principle that characters in | |||
| DISALLOWED are not looked up. | ||||
| o More generally, the status of an unassigned character with regard | o More generally, the status of an unassigned character with regard | |||
| to the DISALLOWED and PROTOCOL-VALID categories, and whether | to the DISALLOWED and PROTOCOL-VALID categories, and whether | |||
| contextual rules are required with the latter, cannot be evaluated | contextual rules are required with the latter, cannot be evaluated | |||
| until a character is actually assigned and known. | until a character is actually assigned and known. By contrast, | |||
| characters that are actually DISALLOWED are placed in that | ||||
| category only as a consequence of rules applied to known | ||||
| properties or per-character evaluation. | ||||
| It is possible to argue that the issues above are not important and | It is possible to argue that the issues above are not important and | |||
| that, as a consequence, it is better to retain the principle of | that, as a consequence, it is better to retain the principle of | |||
| looking up labels even if they contain unassigned characters because | looking up labels even if they contain unassigned characters because | |||
| all of the important scripts and characters have been coded as of | all of the important scripts and characters have been coded as of | |||
| Unicode 5.1 and hence unassigned code points will be assigned only to | Unicode 5.1 and hence unassigned code points will be assigned only to | |||
| obscure characters or archaic scripts. Unfortunately, that does not | obscure characters or archaic scripts. Unfortunately, that does not | |||
| appear to be a safe assumption for at least two reasons. First, much | appear to be a safe assumption for at least two reasons. First, much | |||
| the same claim of completeness has been made for earlier versions of | the same claim of completeness has been made for earlier versions of | |||
| Unicode. The reality is that a script that is obscure to much of the | Unicode. The reality is that a script that is obscure to much of the | |||
| world may still be very important to those who use it. Cultural and | world may still be very important to those who use it. Cultural and | |||
| linguistic preservation principles make it inappropriate to declare | linguistic preservation principles make it inappropriate to declare | |||
| the script of no importance in IDNs. Second, we already have | the script of no importance in IDNs. Second, we already have | |||
| counterexamples in, e.g., the relationships associated with new Han | counterexamples in, e.g., the relationships associated with new Han | |||
| characters being added (whether in the BMP or in Unicode Plane 2). | characters being added (whether in the BMP or in Unicode Plane 2). | |||
| 7.8. Other Compatibility Issues | 7.8. Other Compatibility Issues | |||
| The existing (2003) IDNA model includes several odd artifacts of the | The 2003 IDNA model includes several odd artifacts of the context in | |||
| context in which it was developed. Many, if not all, of these are | which it was developed. Many, if not all, of these are potential | |||
| potential avenues for exploits, especially if the registration | avenues for exploits, especially if the registration process permits | |||
| process permits "source" names (names that have not been processed | "source" names (names that have not been processed through IDNA and | |||
| through IDNA and Nameprep) to be registered. As one example, since | Nameprep) to be registered. As one example, since the character | |||
| the character Eszett, used in German, is mapped by IDNA2003 into the | Eszett, used in German, is mapped by IDNA2003 into the sequence "ss" | |||
| sequence "ss" rather than being retained as itself or prohibited, a | rather than being retained as itself or prohibited, a string | |||
| string containing that character but that is otherwise in ASCII is | containing that character but that is otherwise in ASCII is not | |||
| not really an IDN (in the U-label sense defined above) at all. After | really an IDN (in the U-label sense defined above) at all. After | |||
| Nameprep maps the Eszett out, the result is an ASCII string and so | Nameprep maps the Eszett out, the result is an ASCII string and so | |||
| does not get an xn-- prefix, but the string that can be displayed to | does not get an xn-- prefix, but the string that can be displayed to | |||
| a user appears to be an IDN. The proposed IDNA2008 eliminates this | a user appears to be an IDN. The newer version of the protocol | |||
| artifact. A character is either permitted as itself or it is | eliminates this artifact. A character is either permitted as itself | |||
| prohibited; special cases that make sense only in a particular | or it is prohibited; special cases that make sense only in a | |||
| linguistic or cultural context can be dealt with as localization | particular linguistic or cultural context can be dealt with as | |||
| matters where appropriate. | localization matters where appropriate. | |||
| 8. Acknowledgments | 8. Acknowledgments | |||
| The editor and contributors would like to express their thanks to | The editor and contributors would like to express their thanks to | |||
| those who contributed significant early (pre-WG) review comments, | those who contributed significant early (pre-WG) review comments, | |||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | |||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | Simon Josefsson, and Sam Weiler. In addition, some specific ideas | |||
| were incorporated from suggestions, text, or comments about sections | were incorporated from suggestions, text, or comments about sections | |||
| that were unclear supplied by Frank Ellerman, Michael Everson, Asmus | that were unclear supplied by Frank Ellerman, Michael Everson, Asmus | |||
| Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, | Freytag, Erik van der Poel, Michel Suignard, and Ken Whistler, | |||
| skipping to change at page 37, line 51 ¶ | skipping to change at page 38, line 37 ¶ | |||
| 12. Security Considerations | 12. Security Considerations | |||
| 12.1. General Security Issues with IDNA | 12.1. General Security Issues with IDNA | |||
| This document in the IDNA2008 series is purely explanatory and | This document in the IDNA2008 series is purely explanatory and | |||
| informational and consequently introduces no new security issues. It | informational and consequently introduces no new security issues. It | |||
| would, of course, be a poor idea for someone to try to implement from | would, of course, be a poor idea for someone to try to implement from | |||
| it; such an attempt would almost certainly lead to interoperability | it; such an attempt would almost certainly lead to interoperability | |||
| problems and might lead to security ones. A discussion of security | problems and might lead to security ones. A discussion of security | |||
| issues with IDNA2008, and IDNA generally, appears in [IDNA2008-Defs]. | issues with IDNA, including some relevant history, appears in | |||
| [IDNA2008-Defs]. | ||||
| 12.2. Security Differences from IDNA2003 | ||||
| The registration and lookup models described in this set of documents | ||||
| change the mechanisms available for lookup applications to determine | ||||
| the validity of labels they encounter. In some respects, the ability | ||||
| to test is strengthened. For example, putative labels that contain | ||||
| unassigned code points will now be rejected, while IDNA2003 permitted | ||||
| them (something that is now recognized as a considerable source of | ||||
| risk). On the other hand, the protocol specification no longer | ||||
| assumes that the application that looks up a name will be able to | ||||
| determine, and apply, information about the protocol version used in | ||||
| registration. In theory, that may increase risk since the | ||||
| application will be able to do less pre-lookup validation. In | ||||
| practice, the protection afforded by that test has been largely | ||||
| illusory for reasons explained in RFC 4690 and above. | ||||
| Any change to Stringprep or, more broadly, the IETF's model of the | ||||
| use of internationalized character strings in different protocols, | ||||
| creates some risk of inadvertent changes to those protocols, | ||||
| invalidating deployed applications or databases, and so on. The same | ||||
| considerations that would require changing the IDN prefix (see the | ||||
| discussion of prefix changes in Section 7.4) are the ones that would, | ||||
| e.g., invalidate certificates or hashes that depend on Stringprep, | ||||
| but those cases require careful consideration and evaluation. More | ||||
| important, it is not necessary to change Stringprep at all in order | ||||
| to create a definition or implementation of IDNA as specified in this | ||||
| set of documents. Because these documents do not depend on | ||||
| Stringprep at all, the question of upgrading other protocols that do | ||||
| depend on Stringprep can be left to experts on those protocols: there | ||||
| is no dependency between IDNA changes and possible upgrades to | ||||
| security protocols or conventions. | ||||
| 13. References | 13. References | |||
| 13.1. Normative References | 13.1. Normative References | |||
| [ASCII] American National Standards Institute (formerly United | [ASCII] American National Standards Institute (formerly United | |||
| States of America Standards Institute), "USA Code for | States of America Standards Institute), "USA Code for | |||
| Information Interchange", ANSI X3.4-1968, 1968. | Information Interchange", ANSI X3.4-1968, 1968. | |||
| ANSI X3.4-1968 has been replaced by newer versions with | ANSI X3.4-1968 has been replaced by newer versions with | |||
| skipping to change at page 44, line 5 ¶ | skipping to change at page 43, line 48 ¶ | |||
| input to IDNA2003 and 2008. | input to IDNA2003 and 2008. | |||
| o Some material, including this section/appendix, rearranged. | o Some material, including this section/appendix, rearranged. | |||
| A.5. Version -05 | A.5. Version -05 | |||
| o Many small editorial changes, including changes to eliminate the | o Many small editorial changes, including changes to eliminate the | |||
| last vestiges of what appeared to be 2119 language (upper-case | last vestiges of what appeared to be 2119 language (upper-case | |||
| MUST, SHOULD, or MAY) and small adjustments to terminology. | MUST, SHOULD, or MAY) and small adjustments to terminology. | |||
| A.6. Version -06 | ||||
| o Removed Security Considerations material and pointed to Defs, | ||||
| where it now appears as of version 05. | ||||
| o Started changing uses of "IDNA2008" in running text to "in these | ||||
| specifications" or the equivalent. These documents are titled | ||||
| simply "IDNA"; once they are standardized, "the current version" | ||||
| may be a more appropriate reference than one containing a year. | ||||
| As discussed on the mailing list, we can and should discuss how to | ||||
| refer to these documents at an appropriate time (e.g., when we | ||||
| know when we will be finished) but, in the interim, it seems | ||||
| appropriate to simply start getting rid of the version-specific | ||||
| terminology where it can naturally be removed. | ||||
| o Additional discussion of mappings, etc., especially for case- | ||||
| sensitivity. | ||||
| o More editorial fine-tuning. | ||||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| Email: john+ietf@jck.com | Email: john+ietf@jck.com | |||
| End of changes. 45 change blocks. | ||||
| 172 lines changed or deleted | 190 lines changed or added | |||
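
The rule in Section 7.7 that strings containing unassigned code points must not be looked up or registered can be illustrated with a short sketch. It is not taken from the draft: the function names are hypothetical, and it assumes that Unicode General_Category "Cn" is an adequate stand-in for "unassigned" (that category also covers noncharacters, which [IDNA2008-Tables] classifies separately). The result depends on the Unicode version bundled with the Python build in use, which is the migration issue the section describes.

```python
import unicodedata


def unassigned_code_points(label: str) -> list[str]:
    """List code points in `label` that this Python build's Unicode
    database reports as General_Category Cn (hypothetical helper)."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) == "Cn"
    ]


def may_be_looked_up(label: str) -> bool:
    """Sketch of the Section 7.7 check only: refuse a label that has any
    unassigned code point. A real implementation applies many more rules."""
    return not unassigned_code_points(label)


if __name__ == "__main__":
    # U+0378 is assumed here to be unassigned in the local Unicode tables.
    for candidate in ("example", "ab\u0378"):
        print(repr(candidate), may_be_looked_up(candidate),
              unassigned_code_points(candidate))
```

An application built against an older Unicode version rejects a label that a newer registry could legitimately have registered. That conservative failure is the one the section argues for, as opposed to the IDNA2003 behavior of looking such labels up anyway.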
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
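
The Eszett artifact described in Section 7.8 can be observed with Python's built-in "idna" codec, which implements the IDNA2003 ToASCII operation (Nameprep followed by Punycode). The sketch below is illustrative only and is not part of the draft; the helper name is hypothetical.

```python
def to_ascii_2003(label: str) -> bytes:
    """Apply IDNA2003 processing to one label using the standard
    library's "idna" codec (hypothetical wrapper for illustration)."""
    return label.encode("idna")


if __name__ == "__main__":
    eszett_label = "stra\u00dfe"   # "strasse" written with Eszett (U+00DF)
    umlaut_label = "b\u00fccher"   # contains U+00FC, which Nameprep retains

    for label in (eszett_label, umlaut_label):
        ace = to_ascii_2003(label)
        # A label counts as an IDN in the A-label sense only if the
        # encoded form carries the "xn--" prefix.
        print(repr(label), ace, ace.startswith(b"xn--"))
```

Under these assumptions the Eszett label comes back as plain ASCII without the prefix, while the second label does receive one. This is the situation the section describes: the displayed string looks like an IDN but is not one.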