| < draft-ietf-idnabis-rationale-04.txt | draft-ietf-idnabis-rationale-05.txt > | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft November 2, 2008 | Internet-Draft November 28, 2008 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: May 6, 2009 | Expires: June 1, 2009 | |||
| Internationalized Domain Names for Applications (IDNA): Background, | Internationalized Domain Names for Applications (IDNA): Background, | |||
| Explanation, and Rationale | Explanation, and Rationale | |||
| draft-ietf-idnabis-rationale-04.txt | draft-ietf-idnabis-rationale-05.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on May 6, 2009. | This Internet-Draft will expire on June 1, 2009. | |||
| Abstract | Abstract | |||
| Several years have passed since the original protocol for | Several years have passed since the original protocol for | |||
| Internationalized Domain Names (IDNs) was completed and deployed. | Internationalized Domain Names (IDNs) was completed and deployed. | |||
| During that time, a number of issues have arisen, including the need | During that time, a number of issues have arisen, including the need | |||
| to update the system to deal with newer versions of Unicode. Some of | to update the system to deal with newer versions of Unicode. Some of | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 4 | 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | |||
| 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | |||
| 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 | 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 5 | |||
| 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6 | 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 6 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 7 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | |||
| 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels . . . . 9 | 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | |||
| 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 11 | 3.1.1.1. Contextual Rules . . . . . . . . . . . . . . . . . 11 | |||
| 3.1.1.2. Rules and Their Application . . . . . . . . . . . 11 | ||||
| 3.1.2. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | ||||
| 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.3. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 12 | 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, | 3.3. Layered Restrictions: Tables, Context, Registration, | |||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 13 | Applications . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 13 | 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 14 | |||
| 4.1. Display and Network Order . . . . . . . . . . . . . . . . 13 | 4.1. Display and Network Order . . . . . . . . . . . . . . . . 14 | |||
| 4.2. Entry and Display in Applications . . . . . . . . . . . . 15 | 4.2. Entry and Display in Applications . . . . . . . . . . . . 15 | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and | 4.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 16 | Alternate Character Forms . . . . . . . . . . . . . . . . 16 | |||
| 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 | 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 | |||
| 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 | 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 | |||
| 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 19 | 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | |||
| 6. Front-end and User Interface Processing . . . . . . . . . . . 20 | 6. Front-end and User Interface Processing . . . . . . . . . . . 21 | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization . 23 | 7. Migration from IDNA2003 and Unicode Version Synchronization . 23 | |||
| 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23 | 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 23 | 7.1.1. General IDNA Validity Criteria . . . . . . . . . . . . 24 | |||
| 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | |||
| 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | |||
| 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | |||
| 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 28 | 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 29 | |||
| 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 | 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 | |||
| 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 | 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 | |||
| 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 | 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 | |||
| 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 | 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 | |||
| 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 | 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 32 | |||
| 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 | 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code | 7.7. Migration Between Unicode Versions: Unassigned Code | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | Points . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 34 | 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35 | 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36 | 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 10. Internationalization Considerations . . . . . . . . . . . . . 36 | 10. Internationalization Considerations . . . . . . . . . . . . . 36 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 36 | 11.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 | |||
| 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 | 11.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 | |||
| 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 | 11.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 | |||
| 12. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | 12. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | |||
| 12.1. General Security Issues with IDNA . . . . . . . . . . . . 37 | 12.1. General Security Issues with IDNA . . . . . . . . . . . . 37 | |||
| 12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 37 | 12.2. Security Differences from IDNA2003 . . . . . . . . . . . . 38 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | 13.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 39 | 13.2. Informative References . . . . . . . . . . . . . . . . . . 39 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 41 | |||
| A.1. Changes between Version -00 and Version -01 of | A.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 41 | |||
| A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 42 | A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 43 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 44 | Intellectual Property and Copyright Statements . . . . . . . . . . 45 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| The original standards for Internationalized Domain Names (IDNs) were | The original standards for Internationalized Domain Names (IDNs) were | |||
| completed and deployed starting in 2003. Those standards are known | completed and deployed starting in 2003. Those standards are known | |||
| as Internationalized Domain Names in Applications (IDNA), taken from | as Internationalized Domain Names in Applications (IDNA), taken from | |||
| the name of the highest level standard within the group, RFC 3490 | the name of the highest level standard within the group, RFC 3490 | |||
| [RFC3490]. After those standards were deployed, a number of issues | [RFC3490]. After those standards were deployed, a number of issues | |||
| arose that called for a new version of the IDNA protocol and the | arose that called for a new version of the IDNA protocol and the | |||
| associated tables, including a subset of those described in a recent | associated tables, including a subset of those described in a recent | |||
| IAB report [RFC4690] and the need to update the system to deal with | IAB report [RFC4690] and the need to update the system to deal with | |||
| newer versions of Unicode. This document further explains the issues | newer versions of Unicode. This document further explains the issues | |||
| that have been encountered when they are important to understanding | that have been encountered when they are important to understanding | |||
| of the revised protocols. It also provides an overview of the new | of the revised protocols. It also provides an overview of the new | |||
| IDNA model and explanatory material for it. Additional explanatory | IDNA model and explanatory material for it. Additional explanatory | |||
| material for the specific components of the proposals appears with | material for the specific components of the proposals appears with | |||
| the associated documents. | the associated documents. | |||
| A good deal of the background material that appeared in RFC 3490 | ||||
| [RFC3490] has been removed from this update. That material is either | ||||
| of historical interest only or has been covered from a more recent | ||||
| perspective in RFC 4690 [RFC4690]. | ||||
| This document is not normative. The information it provides is | ||||
| intended to make the rules, tables, and protocol easier to understand | ||||
| and to provide overview information and suggestions for zone | ||||
| administrators and others who need to make policy, deployment, and | ||||
| similar decisions about IDNs. | ||||
| 1.2. Discussion Forum | 1.2. Discussion Forum | |||
| [[ RFC Editor: please remove this section. ]] | [[ RFC Editor: please remove this section. ]] | |||
| IDNA2008 is being discussed in the IETF "idnabis" Working Group and | IDNA2008 is being discussed in the IETF "idnabis" Working Group and | |||
| on the mailing list idna-update@alvestrand.no | on the mailing list idna-update@alvestrand.no | |||
| 1.3. Terminology | 1.3. Terminology | |||
| Terminology that is critical for understanding this document and the | Terminology that is critical for understanding this document and the | |||
| skipping to change at page 6, line 5 | skipping to change at page 6, line 17 | |||
| the third and fourth positions, essentially requiring that such | the third and fourth positions, essentially requiring that such | |||
| strings be IDNA-valid. This restriction on strings containing "--" | strings be IDNA-valid. This restriction on strings containing "--" | |||
| is required for three reasons: | is required for three reasons: | |||
| o to prevent confusion with pre-IDNA coding forms; | o to prevent confusion with pre-IDNA coding forms; | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 7.4); | prefix, no matter how unlikely those might be (see Section 7.4); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the encoding system. | o to reduce the opportunities for attacks via the Punycode encoding | |||
| algorithm itself. | ||||
| 1.4. Objectives | 1.4. Objectives | |||
| The intent of the IDNA revision effort, and hence of this document | The intent of the IDNA revision effort, and hence of this document | |||
| and the associated ones, is to increase the usability and | and the associated ones, is to increase the usability and | |||
| effectiveness of internationalized domain names (IDNs) while | effectiveness of internationalized domain names (IDNs) while | |||
| preserving or strengthening the integrity of references that use | preserving or strengthening the integrity of references that use | |||
| them. The original "hostname" character definitions (see, e.g., | them. The original "hostname" character definitions (see, e.g., | |||
| [RFC0810]) struck a balance between the creation of useful mnemonics | [RFC0810]) struck a balance between the creation of useful mnemonics | |||
| and the introduction of parsing problems or general confusion in the | and the introduction of parsing problems or general confusion in the | |||
| skipping to change at page 8, line 31 | skipping to change at page 8, line 45 | |||
| However, shifting responsibility for character mapping and other | However, shifting responsibility for character mapping and other | |||
| adjustments from the protocol (where it was located in IDNA2003) to | adjustments from the protocol (where it was located in IDNA2003) to | |||
| the user interface or processing before invoking IDNA raises issues | the user interface or processing before invoking IDNA raises issues | |||
| about both what that processing should do and about compatibility for | about both what that processing should do and about compatibility for | |||
| references prepared in an IDNA2003 context. Those issues are | references prepared in an IDNA2003 context. Those issues are | |||
| discussed in Section 6. | discussed in Section 6. | |||
| Operations for converting between local character sets and normalized | Operations for converting between local character sets and normalized | |||
| Unicode are part of this general set of user interface issues. The | Unicode are part of this general set of user interface issues. The | |||
| conversion is obviously not required at all in a Unicode-native | conversion is obviously not required at all in a Unicode-native | |||
| system that maintains all strings in Normalization Form C (NFC). It | system that maintains all strings in Normalization Form C (NFC). | |||
| may, however, involve some complexity in a system that is not | (See [Unicode-UAX15] for precise definitions of NFC and NFKC if | |||
| Unicode-native, especially if the elements of the local character set | needed.) It may, however, involve some complexity in a system that | |||
| do not map exactly and unambiguously into Unicode characters or do so | is not Unicode-native, especially if the elements of the local | |||
| in a way that is not completely stable over time. Perhaps more | character set do not map exactly and unambiguously into Unicode | |||
| important, if a label being converted to a local character set | characters or do so in a way that is not completely stable over time. | |||
| contains Unicode characters that have no correspondence in that | Perhaps more important, if a label being converted to a local | |||
| character set, the application may have to apply special, locally- | character set contains Unicode characters that have no correspondence | |||
| appropriate, methods to avoid or reduce loss of information. | in that character set, the application may have to apply special, | |||
| locally-appropriate, methods to avoid or reduce loss of information. | ||||
| Depending on the system involved, the major difficulty may not lie in | Depending on the system involved, the major difficulty may not lie in | |||
| the mapping but in accurately identifying the incoming character set | the mapping but in accurately identifying the incoming character set | |||
| and then applying the correct conversion routine. If a local | and then applying the correct conversion routine. If a local | |||
| operating system uses one of the ISO 8859 character sets or an | operating system uses one of the ISO 8859 character sets or an | |||
| extensive national or industrial system such as GB18030 [GB18030] or | extensive national or industrial system such as GB18030 [GB18030] or | |||
| BIG5 [BIG5], one must correctly identify the character set in use | BIG5 [BIG5], one must correctly identify the character set in use | |||
| before converting to Unicode even though those character coding | before converting to Unicode even though those character coding | |||
| systems are substantially or completely Unicode-compatible (i.e., all | systems are substantially or completely Unicode-compatible (i.e., all | |||
| of the code points in them have an exact and unique mapping to | of the code points in them have an exact and unique mapping to | |||
| skipping to change at page 9, line 33 | skipping to change at page 9, line 47 | |||
| 3. Permitted Characters: An Inclusion List | 3. Permitted Characters: An Inclusion List | |||
| This section provides an overview of the model used to establish the | This section provides an overview of the model used to establish the | |||
| algorithm and character lists of [IDNA2008-Tables] and describes the | algorithm and character lists of [IDNA2008-Tables] and describes the | |||
| names and applicability of the categories used there. Note that the | names and applicability of the categories used there. Note that the | |||
| inclusion of a character in the first category group does not imply | inclusion of a character in the first category group does not imply | |||
| that it can be used indiscriminately; some characters are associated | that it can be used indiscriminately; some characters are associated | |||
| with contextual rules that must be applied as well. | with contextual rules that must be applied as well. | |||
| The information given in this section is provided to make the rules, | The information given in this section is provided to make the rules, | |||
| tables, and protocol easier to understand. It is not normative. The | tables, and protocol easier to understand. The normative generating | |||
| normative generating rules appear in [IDNA2008-Tables] and the rules | rules that correspond to this informal discussion appear in | |||
| that actually determine what labels can be registered or looked up | [IDNA2008-Tables] and the rules that actually determine what labels | |||
| are in [IDNA2008-Protocol]. | can be registered or looked up are in [IDNA2008-Protocol]. | |||
| 3.1. A Tiered Model of Permitted Characters and Labels | 3.1. A Tiered Model of Permitted Characters and Labels | |||
| Moving to an inclusion model requires respecifying the list of | Moving to an inclusion model requires respecifying the list of | |||
| characters that are permitted in IDNs. In IDNA2003, the role and | characters that are permitted in IDNs. In IDNA2003, the role and | |||
| utility of characters are independent of context and fixed forever | utility of characters are independent of context and fixed forever | |||
| (or until the standard is replaced). Making completely context- | (or until the standard is replaced). Making completely context- | |||
| independent rules globally has proven impractical because some | independent rules globally has proven impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| skipping to change at page 10, line 12 | skipping to change at page 10, line 26 | |||
| characters entirely. But the restrictions were much too severe to | characters entirely. But the restrictions were much too severe to | |||
| permit an adequate range of mnemonics for terminology based on some | permit an adequate range of mnemonics for terminology based on some | |||
| languages. The requirement to support those characters but limit | languages. The requirement to support those characters but limit | |||
| their use to very specific contexts was reinforced by the observation | their use to very specific contexts was reinforced by the observation | |||
| that handling of particular characters across the languages that use | that handling of particular characters across the languages that use | |||
| a script, or the use of similar or identical-looking characters in | a script, or the use of similar or identical-looking characters in | |||
| different scripts, is less well understood than many people believed | different scripts, is less well understood than many people believed | |||
| it was several years ago. | it was several years ago. | |||
| Independently of the characters chosen (see next subsection), the | Independently of the characters chosen (see next subsection), the | |||
| theory is to divide the characters that appear in Unicode into three | approach is to divide the characters that appear in Unicode into | |||
| categories: | three categories: | |||
| 3.1.1. PROTOCOL-VALID | 3.1.1. PROTOCOL-VALID | |||
| Characters identified as "PROTOCOL-VALID" (often abbreviated | Characters identified as "PROTOCOL-VALID" (often abbreviated | |||
| "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | "PVALID") are, in general, permitted by IDNA for all uses in IDNs. | |||
| Their use may be restricted by rules about the context in which they | Their use may be restricted by rules about the context in which they | |||
| appear or by other rules that apply to the entire label in which they | appear or by other rules that apply to the entire label in which they | |||
| are to be embedded. For example, any label that contains a character | are to be embedded. For example, any label that contains a character | |||
| in this category that has a "right-to-left" property must be used in | in this category that has a "right-to-left" property must be used in | |||
| context with the "Bidi" rules (see [IDNA2008-Bidi]). | context with the "Bidi" rules (see [IDNA2008-Bidi]). | |||
| The term "PROTOCOL-VALID" is used to stress the fact that the | The term "PROTOCOL-VALID" is used to stress the fact that the | |||
| presence of a character in this category does not imply that a given | presence of a character in this category does not imply that a given | |||
| registry need accept registrations containing any of the characters | registry need accept registrations containing any of the characters | |||
| in the category. Registries are still expected to apply judgment | in the category. Registries are still expected to apply judgment | |||
| about labels they will accept and to maintain rules consistent with | about labels they will accept and to maintain rules consistent with | |||
| those judgments (see [IDNA2008-Protocol] and Section 3.3). | those judgments (see [IDNA2008-Protocol] and Section 3.3). | |||
| Characters that are placed in the "PROTOCOL-VALID" category are never | Characters that are placed in the "PROTOCOL-VALID" category are | |||
| removed from it unless the code points themselves are removed from | expected to never be removed from it or reclassified. While | |||
| Unicode (such removal would be inconsistent with the Unicode | theoretically characters could be removed from Unicode, such removal | |||
| stability principles (see [Unicode51], Appendix F) and hence should | would be inconsistent with the Unicode stability principles (see | |||
| never occur). | [Unicode51], Appendix F) and hence should never occur. | |||
| 3.1.1.1. Contextual Rules | 3.1.1.1. Contextual Rules | |||
| Some characters may be unsuitable for general use in IDNs but | Some characters may be unsuitable for general use in IDNs but | |||
| necessary for the plausible support of some scripts. The two most | necessary for the plausible support of some scripts. The two most | |||
| commonly-cited examples are the zero-width joiner and non-joiner | commonly-cited examples are the zero-width joiner and non-joiner | |||
| characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for | characters (ZWJ, U+200D and ZWNJ, U+200C), but provisions for | |||
| unambiguous labels may require that other characters be restricted to | unambiguous labels may require that other characters be restricted to | |||
| particular contexts. For example, the ASCII hyphen is not permitted | particular contexts. For example, the ASCII hyphen is not permitted | |||
| to start or end a label, whether that label contains non-ASCII | to start or end a label, whether that label contains non-ASCII | |||
| skipping to change at page 11, line 13 | skipping to change at page 11, line 28 | |||
| most scripts but affect format or presentation in a few others or | most scripts but affect format or presentation in a few others or | |||
| because they are combining characters that are safe for use only in | because they are combining characters that are safe for use only in | |||
| conjunction with particular characters or scripts. In order to | conjunction with particular characters or scripts. In order to | |||
| permit them to be used at all, they are specially identified as | permit them to be used at all, they are specially identified as | |||
| "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | "CONTEXTUAL RULE REQUIRED" and, when adequately understood, | |||
| associated with a rule. In addition, the rule will define whether it | associated with a rule. In addition, the rule will define whether it | |||
| is to be applied on lookup as well as registration. A distinction is | is to be applied on lookup as well as registration. A distinction is | |||
| made between characters that indicate or prohibit joining (known as | made between characters that indicate or prohibit joining (known as | |||
| "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | "CONTEXT-JOINER" or "CONTEXTJ") and other characters requiring | |||
| contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | contextual treatment ("CONTEXT-OTHER" or "CONTEXTO"). Only the | |||
| former are fully tested at lookup time. | former require full testing at lookup time. | |||
| 3.1.1.2. Rules and Their Application | 3.1.1.2. Rules and Their Application | |||
| The actual rules may be present or absent. If present, they may have | The actual rules may be present or absent. If present, they may have | |||
| values of "True" (character may be used in any position in any | values of "True" (character may be used in any position in any | |||
| label), "False" (character may not be used in any label), or may be a | label), "False" (character may not be used in any label), or may be a | |||
| set of procedural rules that specify the context in which the | set of procedural rules that specify the context in which the | |||
| character is permitted. | character is permitted. | |||
| Examples of descriptions of typical rules, stated informally and in | Examples of descriptions of typical rules, stated informally and in | |||
| skipping to change at page 11, line 41 ¶ | skipping to change at page 12, line 7 ¶ | |||
| version of the tables. Characters associated with null rules are not | version of the tables. Characters associated with null rules are not | |||
| permitted to appear in putative labels for either registration or | permitted to appear in putative labels for either registration or | |||
| lookup. Of course, a later version of the tables might contain a | lookup. Of course, a later version of the tables might contain a | |||
| non-null rule. | non-null rule. | |||
| The description of the syntax of the rules, and the rules themselves, | The description of the syntax of the rules, and the rules themselves, | |||
| appears in [IDNA2008-Tables]. | appears in [IDNA2008-Tables]. | |||
| 3.1.2. DISALLOWED | 3.1.2. DISALLOWED | |||
| Some characters are sufficiently problematic for use in IDNs that | Some characters are inappropriate for use in IDNs and are thus | |||
| they should be excluded for both registration and lookup (i.e., IDNA- | excluded for both registration and lookup (i.e., IDNA-conforming | |||
| conforming applications performing name lookup should verify that | applications performing name lookup should verify that these | |||
| these characters are absent; if they are present, the label strings | characters are absent; if they are present, the label strings should | |||
| should be rejected rather than converted to A-labels and looked up). | be rejected rather than converted to A-labels and looked up). Some of | |||
| these characters are problematic for use in IDNs (such as the | ||||
| FRACTION SLASH character, U+2044), while some of them (such as the | ||||
| various HEART symbols, e.g., U+2665, U+2661, and U+2765, see | ||||
| Section 7.6) simply fall outside the conventions for typical | ||||
| identifiers (basically letters and numbers). | ||||
| Of course, this category would include code points that had been | Of course, this category would include code points that had been | |||
| removed entirely from Unicode should such removals ever occur. | removed entirely from Unicode should such removals ever occur. | |||
| Characters that are placed in the "DISALLOWED" category are expected | Characters that are placed in the "DISALLOWED" category are expected | |||
| to never be removed from it or reclassified. If a character is | to never be removed from it or reclassified. If a character is | |||
| classified as "DISALLOWED" in error and the error is sufficiently | classified as "DISALLOWED" in error and the error is sufficiently | |||
| problematic, the only recourse would be either to introduce a new | problematic, the only recourse would be either to introduce a new | |||
| code point into Unicode and classify it as "PROTOCOL-VALID" or for | code point into Unicode and classify it as "PROTOCOL-VALID" or for | |||
| the IETF to accept the considerable costs of an incompatible change | the IETF to accept the considerable costs of an incompatible change | |||
| skipping to change at page 12, line 31 ¶ | skipping to change at page 12, line 50 ¶ | |||
| mapped to another character by Unicode casefolding. | mapped to another character by Unicode casefolding. | |||
| o The character is a symbol or punctuation form or, more generally, | o The character is a symbol or punctuation form or, more generally, | |||
| something that is not a letter, digit, or a mark that is used to | something that is not a letter, digit, or a mark that is used to | |||
| form a letter or digit. | form a letter or digit. | |||
| 3.1.3. UNASSIGNED | 3.1.3. UNASSIGNED | |||
| For convenience in processing and table-building, code points that do | For convenience in processing and table-building, code points that do | |||
| not have assigned values in a given version of Unicode are treated as | not have assigned values in a given version of Unicode are treated as | |||
| belonging to a special UNASSIGNED category. Such code points MUST | belonging to a special UNASSIGNED category. Such code points are | |||
| NOT appear in labels to be registered or looked up. The category | prohibited in labels to be registered or looked up. The category | |||
| differs from DISALLOWED in that code points are moved out of it by | differs from DISALLOWED in that code points are moved out of it by | |||
| the simple expedient of being assigned in a later version of Unicode | the simple expedient of being assigned in a later version of Unicode | |||
| (at which point, they are classified into one of the other categories | (at which point, they are classified into one of the other categories | |||
| as appropriate). | as appropriate). | |||
| 3.2. Registration Policy | 3.2. Registration Policy | |||
| While these recommendations cannot and should not define registry | While these recommendations cannot and should not define registry | |||
| policies, registries SHOULD develop and apply additional restrictions | policies, registries should develop and apply additional restrictions | |||
| to reduce confusion and other problems. For example, it is generally | to reduce confusion and other problems. For example, it is generally | |||
| believed that labels containing characters from more than one script | believed that labels containing characters from more than one script | |||
| are a bad practice although there may be some important exceptions to | are a bad practice although there may be some important exceptions to | |||
| that principle. Some registries may choose to restrict registrations | that principle. Some registries may choose to restrict registrations | |||
| to characters drawn from a very small number of scripts. For many | to characters drawn from a very small number of scripts. For many | |||
| scripts, the use of variant techniques such as those described in | scripts, the use of variant techniques such as those described in | |||
| RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for | RFC 3743 [RFC3743] and RFC 4290 [RFC4290], and illustrated for | |||
| Chinese by the tables described in RFC 4713 [RFC4713] may be helpful | Chinese by the tables described in RFC 4713 [RFC4713] may be helpful | |||
| in reducing problems that might be perceived by users. | in reducing problems that might be perceived by users. | |||
| skipping to change at page 15, line 26 ¶ | skipping to change at page 15, line 43 ¶ | |||
| Applications can accept domain names using any character set or sets | Applications can accept domain names using any character set or sets | |||
| desired by the application developer, specified by the operating | desired by the application developer, specified by the operating | |||
| system, or dictated by other constraints, and can display domain | system, or dictated by other constraints, and can display domain | |||
| names in any character set or character coding system. That is, the | names in any character set or character coding system. That is, the | |||
| IDNA protocol does not affect the interface between users and | IDNA protocol does not affect the interface between users and | |||
| applications. | applications. | |||
| An IDNA-aware application can accept and display internationalized | An IDNA-aware application can accept and display internationalized | |||
| domain names in two formats: the internationalized character set(s) | domain names in two formats: the internationalized character set(s) | |||
| supported by the application (i.e., an appropriate local | supported by the application (i.e., an appropriate local | |||
| representation of a U-label), and as an A-label. Applications MAY | representation of a U-label), and as an A-label. Applications may | |||
| allow the display of A-labels, but are encouraged to not do so except | allow the display of A-labels, but are encouraged to not do so except | |||
| as an interface for special purposes, possibly for debugging, or to | as an interface for special purposes, possibly for debugging, or to | |||
| cope with display limitations. In general, they SHOULD allow, but | cope with display limitations. In general, they should allow, but | |||
| not encourage, user input of that label form. A-labels are opaque | not encourage, user input of that label form. A-labels are opaque | |||
| and ugly and malicious variations on them are not easily detected by | and ugly and malicious variations on them are not easily detected by | |||
| users. Where possible, they should thus only be exposed to users and | users. Where possible, they should thus only be exposed to users and | |||
| in contexts in which they are absolutely needed. Because IDN labels | in contexts in which they are absolutely needed. Because IDN labels | |||
| can be rendered either as A-labels or U-labels, the application may | can be rendered either as A-labels or U-labels, the application may | |||
| reasonably have an option for the user to select the preferred method | reasonably have an option for the user to select the preferred method | |||
| of display; if it does, rendering the U-label should normally be the | of display; if it does, rendering the U-label should normally be the | |||
| default. | default. | |||
| Domain names are often stored and transported in many places. For | Domain names are often stored and transported in many places. For | |||
| example, they are part of documents such as mail messages and web | example, they are part of documents such as mail messages and web | |||
| pages. They are transported in many parts of many protocols, such as | pages. They are transported in many parts of many protocols, such as | |||
| both the control commands of SMTP and the associated message body | both the control commands of SMTP and the associated message body | |||
| parts, and in the headers and the body content in HTTP. It is | parts, and in the headers and the body content in HTTP. It is | |||
| important to remember that domain names appear both in domain name | important to remember that domain names appear both in domain name | |||
| slots and in the content that is passed over protocols. | slots and in the content that is passed over protocols. | |||
| In protocols and document formats that define how to handle | In protocols and document formats that define how to handle | |||
| specification or negotiation of charsets, labels can be encoded in | specification or negotiation of charsets, labels can be encoded in | |||
| any charset allowed by the protocol or document format. If a | any charset allowed by the protocol or document format. If a | |||
| protocol or document format only allows one charset, the labels MUST | protocol or document format only allows one charset, the labels must | |||
| be given in that charset. Of course, not all charsets can properly | be given in that charset. Of course, not all charsets can properly | |||
| represent all labels. If a U-label cannot be displayed in its | represent all labels. If a U-label cannot be displayed in its | |||
| entirety, the only choice (without loss of information) may be to | entirety, the only choice (without loss of information) may be to | |||
| display the A-label. | display the A-label. | |||
| In any place where a protocol or document format allows transmission | In any place where a protocol or document format allows transmission | |||
| of the characters in internationalized labels, labels SHOULD be | of the characters in internationalized labels, labels should be | |||
| transmitted using whatever character encoding and escape mechanism | transmitted using whatever character encoding and escape mechanism | |||
| the protocol or document format uses at that place. This provision | the protocol or document format uses at that place. This provision | |||
| is intended to prevent situations in which, e.g., UTF-8 domain names | is intended to prevent situations in which, e.g., UTF-8 domain names | |||
| appear embedded in text that is otherwise in some other character | appear embedded in text that is otherwise in some other character | |||
| coding. | coding. | |||
| All protocols that use domain name slots already have the capacity | All protocols that use domain name slots already have the capacity | |||
| for handling domain names in the ASCII charset. Thus, A-labels can | for handling domain names in the ASCII charset. Thus, A-labels can | |||
| inherently be handled by those protocols. | inherently be handled by those protocols. | |||
| skipping to change at page 22, line 8 ¶ | skipping to change at page 22, line 26 ¶ | |||
| welcome.]] | welcome.]] | |||
| As discussed elsewhere in this document, the IDNA2008 model removes | As discussed elsewhere in this document, the IDNA2008 model removes | |||
| all of these mappings and interpretations, including the equivalence | all of these mappings and interpretations, including the equivalence | |||
| of different forms of dots, from the protocol, discouraging such | of different forms of dots, from the protocol, discouraging such | |||
| mappings and leaving them, when necessary, to local processing. This | mappings and leaving them, when necessary, to local processing. This | |||
| should not be taken to imply that local processing is optional or can | should not be taken to imply that local processing is optional or can | |||
| be avoided entirely, even if doing so might have been desirable in a | be avoided entirely, even if doing so might have been desirable in a | |||
| world without IDNA2003 IDNs in files and archives. Instead, unless | world without IDNA2003 IDNs in files and archives. Instead, unless | |||
| the program context is such that it is known that any IDNs that | the program context is such that it is known that any IDNs that | |||
| appear will be either U-labels or A-labels, or that other forms can | appear will contain either U-label or A-label forms, or that other | |||
| safely be rejected, some local processing of apparent domain name | forms can safely be rejected, some local processing of apparent | |||
| strings will be required, both to maintain compatibility with | domain name strings will be required, both to maintain compatibility | |||
| IDNA2003 and to prevent user astonishment. Such local processing, | with IDNA2003 and to prevent user astonishment. Such local | |||
| while not specified in this document or the associated ones, will | processing, while not specified in this document or the associated | |||
| generally take one of two forms: | ones, will generally take one of two forms: | |||
| o Generic Preprocessing. | o Generic Preprocessing. | |||
| When the context in which the program or system that processes | When the context in which the program or system that processes | |||
| domain names operates is global, a reasonable balance must be | domain names operates is global, a reasonable balance must be | |||
| found that is sensitive to the broad range of local needs and | found that is sensitive to the broad range of local needs and | |||
| assumptions while, at the same time, not sacrificing the needs of | assumptions while, at the same time, not sacrificing the needs of | |||
| one language, script, or user population to those of another. | one language, script, or user population to those of another. | |||
| For this case, the best practice will usually be to apply NFKC and | For this case, the best practice will usually be to apply NFKC and | |||
| case-mapping (or, perhaps better yet, Stringprep itself), plus | case-mapping (or, perhaps better yet, Stringprep itself), plus | |||
| skipping to change at page 25, line 31 ¶ | skipping to change at page 25, line 49 ¶ | |||
| administrators have been expected to verify that names meet | administrators have been expected to verify that names meet | |||
| "hostname" [RFC0952] where necessary for the expected applications. | "hostname" [RFC0952] where necessary for the expected applications. | |||
| Later addition of special service location formats [RFC2782] imposed | Later addition of special service location formats [RFC2782] imposed | |||
| new requirements on zone administrators for the use of labels that | new requirements on zone administrators for the use of labels that | |||
| conform to the requirements of those formats. For zones that will | conform to the requirements of those formats. For zones that will | |||
| contain IDNs, support for Unicode version-independence requires | contain IDNs, support for Unicode version-independence requires | |||
| restrictions on all strings placed in the zone. In particular, for | restrictions on all strings placed in the zone. In particular, for | |||
| such zones: | such zones: | |||
| o Any label that appears to be an A-label, i.e., any label that | o Any label that appears to be an A-label, i.e., any label that | |||
| starts in "xn--", MUST be IDNA-valid, i.e., it MUST be a valid | starts in "xn--", must be IDNA-valid, i.e., it must be a valid | |||
| A-label, as discussed in Section 2 above. | A-label, as discussed in Section 2 above. | |||
| o The Unicode tables (i.e., tables of code points, character | o The Unicode tables (i.e., tables of code points, character | |||
| classes, and properties) and IDNA tables (i.e., tables of | classes, and properties) and IDNA tables (i.e., tables of | |||
| contextual rules such as those that appear in the Tables | contextual rules such as those that appear in the Tables | |||
| document), MUST be consistent on the systems performing or | document), must be consistent on the systems performing or | |||
| validating labels to be registered. Note that this does not | validating labels to be registered. Note that this does not | |||
| require that tables reflect the latest version of Unicode, only | require that tables reflect the latest version of Unicode, only | |||
| that all tables used on a given system are consistent with each | that all tables used on a given system are consistent with each | |||
| other. | other. | |||
| Under this model, a registry (or entity communicating with a registry | Under this model, a registry (or entity communicating with a registry | |||
| to accomplish name registrations) will need to update its tables -- | to accomplish name registrations) will need to update its tables -- | |||
| both the Unicode-associated tables and the tables of permitted IDN | both the Unicode-associated tables and the tables of permitted IDN | |||
| characters -- to enable a new script or other set of new characters. | characters -- to enable a new script or other set of new characters. | |||
| It will not be affected by newer versions of Unicode, or newly- | It will not be affected by newer versions of Unicode, or newly- | |||
| skipping to change at page 26, line 10 ¶ | skipping to change at page 26, line 30 ¶ | |||
| registrations. The zone administrator is also responsible -- under | registrations. The zone administrator is also responsible -- under | |||
| the protocol and to registrants and users -- for both checking as | the protocol and to registrants and users -- for both checking as | |||
| required by the protocol and verification that whatever policies it | required by the protocol and verification that whatever policies it | |||
| develops are complied with, whether those policies are for minimizing | develops are complied with, whether those policies are for minimizing | |||
| risks due to confusable characters and sequences, for preserving | risks due to confusable characters and sequences, for preserving | |||
| language or script integrity, or for other purposes. Those checking | language or script integrity, or for other purposes. Those checking | |||
| and verification procedures are more extensive than those that are | and verification procedures are more extensive than those that are | |||
| expected of applications systems that look names up. | expected of applications systems that look names up. | |||
| Systems looking up or resolving DNS labels, especially IDN DNS | Systems looking up or resolving DNS labels, especially IDN DNS | |||
| labels, MUST be able to assume that applicable registration rules | labels, must be able to assume that applicable registration rules | |||
| were followed for names entered into the DNS. | were followed for names entered into the DNS. | |||
| 7.1.3. Labels in Lookup | 7.1.3. Labels in Lookup | |||
| Anyone looking up a label in a DNS zone is required to | Anyone looking up a label in a DNS zone is required to | |||
| o Maintain a consistent set of tables, as discussed above. As with | o Maintain a consistent set of tables, as discussed above. As with | |||
| registration, the tables need not reflect the latest version of | registration, the tables need not reflect the latest version of | |||
| Unicode but they must be consistent. | Unicode but they must be consistent. | |||
| skipping to change at page 26, line 36 ¶ | skipping to change at page 27, line 8 ¶ | |||
| o Validate the label itself for conformance with a small number of | o Validate the label itself for conformance with a small number of | |||
| whole-label rules, notably verifying that there are no leading | whole-label rules, notably verifying that there are no leading | |||
| combining marks, that the "bidi" conditions are met if right to | combining marks, that the "bidi" conditions are met if right to | |||
| left characters appear, that any required contextual rules are | left characters appear, that any required contextual rules are | |||
| available and that, if such rules are associated with Joiner | available and that, if such rules are associated with Joiner | |||
| Controls, they are tested. | Controls, they are tested. | |||
| o Avoid validating other contextual rules about characters, | o Avoid validating other contextual rules about characters, | |||
| including mixed-script label prohibitions, although such rules may | including mixed-script label prohibitions, although such rules may | |||
| be used to influence presentation decisions in the user interface. | be used to influence presentation decisions in the user interface. | |||
| [[anchor19: Check this, and all similar statements, against | ||||
| Protocol when that is finished.]] | ||||
| By avoiding applying its own interpretation of which labels are valid | By avoiding applying its own interpretation of which labels are valid | |||
| as a means of rejecting lookup attempts, the lookup application | as a means of rejecting lookup attempts, the lookup application | |||
| becomes less sensitive to version incompatibilities with the | becomes less sensitive to version incompatibilities with the | |||
| particular zone registry associated with the domain name. | particular zone registry associated with the domain name. | |||
| An application or client that processes names according to this | An application or client that processes names according to this | |||
| protocol and then resolves them in the DNS will be able to locate any | protocol and then resolves them in the DNS will be able to locate any | |||
| name that is validly registered, as long as its version of the | name that is validly registered, as long as its version of the | |||
| Unicode-associated tables is sufficiently up-to-date to interpret all | Unicode-associated tables is sufficiently up-to-date to interpret all | |||
| of the characters in the label. Messages to users should distinguish | of the characters in the label. Messages to users should distinguish | |||
| between "label contains an unallocated code point" and other types of | between "label contains an unallocated code point" and other types of | |||
| lookup failures. A failure on the basis of an old version of Unicode | lookup failures. A failure on the basis of an old version of Unicode | |||
| may lead the user to a desire to upgrade to a newer version, but will | may lead the user to a desire to upgrade to a newer version, but will | |||
| have no other ill effects (this is consistent with behavior in the | have no other ill effects (this is consistent with behavior in the | |||
| transition to the DNS when some hosts could not yet handle some forms | transition to the DNS when some hosts could not yet handle some forms | |||
| of names or record types). | of names or record types). | |||
| 7.2. Changes in Character Interpretations | 7.2. Changes in Character Interpretations | |||
| [[anchor19: Note in Draft: This subsection is completely new in | [[anchor20: Note in Draft: This subsection is completely new in | |||
| version -04 of this document. It could almost certainly use | version -04 of this document. It could almost certainly use | |||
| improvement. It also contains some material that is redundant with | improvement. It also contains some material that is redundant with | |||
| material in other sections. I have not tried to remove that material | material in other sections. I have not tried to remove that material | |||
| and will not do so until the WG concludes that this section is | and will not do so until the WG concludes that this section is | |||
| relatively stable, but would appreciate help in identifying what | relatively stable, but would appreciate help in identifying what | |||
| should be removed or how this might be enhanced to contain more of | should be removed or how this might be enhanced to contain more of | |||
| that other material. --JcK]] | that other material. --JcK]] | |||
| In those scripts that make case distinctions, there are a few | In those scripts that make case distinctions, there are a few | |||
| characters for which an obvious and unique upper case character has | characters for which an obvious and unique upper case character has | |||
| skipping to change at page 31, line 40 ¶ | skipping to change at page 32, line 11 ¶ | |||
| new ones would first process a putative label under the IDNA2008 | new ones would first process a putative label under the IDNA2008 | |||
| rules and try to look it up and then, if it were not found, would | rules and try to look it up and then, if it were not found, would | |||
| process the label under IDNA2003 rules and look it up again. That | process the label under IDNA2003 rules and look it up again. That | |||
| process could significantly slow down all processing that involved | process could significantly slow down all processing that involved | |||
| IDNs in the DNS especially since, in principle, a fully-qualified | IDNs in the DNS especially since, in principle, a fully-qualified | |||
| name could contain a mixture of labels that were registered with the | name could contain a mixture of labels that were registered with the | |||
| old and new prefixes, a situation that would make the use of DNS | old and new prefixes, a situation that would make the use of DNS | |||
| caching very difficult. In addition, looking up the same input | caching very difficult. In addition, looking up the same input | |||
| string as two separate A-labels would create some potential for | string as two separate A-labels would create some potential for | |||
| confusion and attacks, since they could, in principle, map to | confusion and attacks, since they could, in principle, map to | |||
| different targets and then resolve to different DNS label nodes. | different targets and then resolve to different entries in the DNS. | |||
| Consequently, a prefix change is to be avoided if at all possible, | Consequently, a prefix change is to be avoided if at all possible, | |||
| even if it means accepting some IDNA2003 decisions about character | even if it means accepting some IDNA2003 decisions about character | |||
| distinctions as irreversible and/or giving special treatment to edge | distinctions as irreversible and/or giving special treatment to edge | |||
| cases. | cases. | |||
| 7.5. Stringprep Changes and Compatibility | 7.5. Stringprep Changes and Compatibility | |||
| The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | The Nameprep [RFC3491] specification, a key part of IDNA2003, is a | |||
| profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | profile of Stringprep [RFC3454]. While Nameprep is a Stringprep | |||
| skipping to change at page 33, line 9 ¶ | skipping to change at page 33, line 29 ¶ | |||
| there are no uniform conventions for naming; variations such as | there are no uniform conventions for naming; variations such as | |||
| outline, solid, and shaded forms may or may not exist; and so on. | outline, solid, and shaded forms may or may not exist; and so on. | |||
| As just one example, consider a "heart" symbol as it might appear | As just one example, consider a "heart" symbol as it might appear | |||
| in a logo that might be read as "I love...". While the user might | in a logo that might be read as "I love...". While the user might | |||
| read such a logo as "I love..." or "I heart...", considerable | read such a logo as "I love..." or "I heart...", considerable | |||
| knowledge of the coding distinctions made in Unicode is needed to | knowledge of the coding distinctions made in Unicode is needed to | |||
| know that there is more than one "heart" character (e.g., U+2665, | know that there is more than one "heart" character (e.g., U+2665, | |||
| U+2661, and U+2765) and how to describe it. These issues are of | U+2661, and U+2765) and how to describe it. These issues are of | |||
| particular importance if strings are expected to be understood or | particular importance if strings are expected to be understood or | |||
| transcribed by the listener after being read out loud. | transcribed by the listener after being read out loud. | |||
| [[anchor20: The above paragraph remains controversial as to | [[anchor21: The above paragraph remains controversial as to | |||
| whether it is valid. The WG will need to make a decision if this | whether it is valid. The WG will need to make a decision if this | |||
| section is not dropped entirely.]] | section is not dropped entirely.]] | |||
| o As a simplified example of this, assume one wanted to use a | o As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know -- absent careful study of | would-be registrant has no way to know -- absent careful study of | |||
| the code tables -- whether it is ambiguous (e.g., where there are | the code tables -- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| skipping to change at page 33, line 32 ¶ | skipping to change at page 34, line 4 ¶ | |||
| "black heart", or as any of the other examples below. | "black heart", or as any of the other examples below. | |||
| o The actual situation is even worse than this. There is no | o The actual situation is even worse than this. There is no | |||
| possible way for a normal, casual, user to tell the difference | possible way for a normal, casual, user to tell the difference | |||
| between the hearts of U+2665 and U+2765 and the stars of U+2606 | between the hearts of U+2665 and U+2765 and the stars of U+2606 | |||
| and U+2729 without somehow knowing to look for a | and U+2729 without somehow knowing to look for a | |||
| distinction. We have a white heart (U+2661) and a few black hearts. | distinction. We have a white heart (U+2661) and a few black hearts. | |||
| Consequently, describing a label as containing a heart is hopelessly | Consequently, describing a label as containing a heart is hopelessly | |||
| ambiguous: we can only know that it contains one of several | ambiguous: we can only know that it contains one of several | |||
| characters that look like hearts or have "heart" in their names. | characters that look like hearts or have "heart" in their names. | |||
| In cities where "Square" is a popular part of a location name, one | In cities where "Square" is a popular part of a location name, one | |||
| might well want to use a square symbol in a label as well and | might well want to use a square symbol in a label as well and | |||
| there are far more squares of various flavors in Unicode than | there are far more squares of various flavors in Unicode than | |||
| there are hearts or stars. | there are hearts or stars. | |||
| o The consequence of these ambiguities of description and | o The consequence of these ambiguities of description and | |||
| dependencies on distinctions that were, or were not, made in | dependencies on distinctions that were, or were not, made in | |||
| Unicode codings is that symbols are a very poor basis for reliable | Unicode codings is that symbols are a very poor basis for reliable | |||
| communication. Consistent with this conclusion, the Unicode | communication. Consistent with this conclusion, the Unicode | |||
| standard recommends that strings used in identifiers not contain | standard recommends that strings used in identifiers not contain | |||
| symbols or punctuation [Unicode-UAX31]. Of course, these | symbols or punctuation [Unicode-UAX31]. Of course, these | |||
| difficulties with symbols do not arise with actual pictographic | difficulties with symbols do not arise with actual pictographic | |||
| languages and scripts which would be treated like any other | languages and scripts which would be treated like any other | |||
| language characters; the two should not be confused. | language characters; the two should not be confused. | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code Points | 7.7. Migration Between Unicode Versions: Unassigned Code Points | |||
| In IDNA2003, labels containing unassigned code points are looked up | In IDNA2003, labels containing unassigned code points are looked up | |||
| on the theory that, if they appear in labels and can be mapped and | on the assumption that, if they appear in labels and can be mapped | |||
| then resolved, the relevant standards must have changed and the | and then resolved, the relevant standards must have changed and the | |||
| registry has properly allocated only assigned values. | registry has properly allocated only assigned values. | |||
| In IDNA2008, strings containing unassigned code points MUST NOT be | In IDNA2008, strings containing unassigned code points must not be | |||
| either looked up or registered. There are several reasons for this, | either looked up or registered. There are several reasons for this, | |||
| with the most important ones being: | with the most important ones being: | |||
| o It cannot be known with sufficient reliability in advance that a | o It cannot be known with sufficient reliability in advance that a | |||
| code point that was not previously assigned will not be assigned | code point that was not previously assigned will not be assigned | |||
| to a compatibility character. In IDNA2003, since there is no | to a compatibility character. In IDNA2003, since there is no | |||
| direct dependency on NFKC (Stringprep's tables are based on NFKC, | direct dependency on NFKC (Stringprep's tables are based on NFKC, | |||
| but IDNA2003 depends only on Stringprep), allocation of a | but IDNA2003 depends only on Stringprep), allocation of a | |||
| compatibility character might produce some odd situations, but it | compatibility character might produce some odd situations, but it | |||
| would not be a problem. In IDNA2008, where compatibility | would not be a problem. In IDNA2008, where compatibility | |||
| skipping to change at page 35, line 45 ¶ | skipping to change at page 36, line 18 ¶ | |||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | |||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | |||
| Michel Suignard, and Ken Whistler. We express our thanks to Google | Michel Suignard, and Ken Whistler. We express our thanks to Google | |||
| for support of that meeting and to the participants for their | for support of that meeting and to the participants for their | |||
| contributions. | contributions. | |||
| Useful comments and text on the WG versions of the draft were | Useful comments and text on the WG versions of the draft were | |||
| received from many participants in the IETF "IDNABIS" WG and a number | received from many participants in the IETF "IDNABIS" WG and a number | |||
| of document changes resulted from mailing list discussions made by | of document changes resulted from mailing list discussions made by | |||
| that group. Marcos Sanz provided specific analysis and suggestions | that group. Marcos Sanz provided specific analysis and suggestions | |||
| that were exceptionally helpful in refining the text, as did Mark | that were exceptionally helpful in refining the text, as did Vint | |||
| Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan. | Cerf, Mark Davis, Martin Duerst, Ken Whistler, and Andrew Sullivan. | |||
| 9. Contributors | 9. Contributors | |||
| While the listed editor held the pen, the core of this document and | While the listed editor held the pen, the core of this document and | |||
| the initial WG version represent the joint work and conclusions of | the initial WG version represent the joint work and conclusions of | |||
| an ad hoc design team consisting of the editor and, in alphabetic | an ad hoc design team consisting of the editor and, in alphabetic | |||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | |||
| In addition, there were many specific contributions and helpful | In addition, there were many specific contributions and helpful | |||
| comments from those listed in the Acknowledgments section and others | comments from those listed in the Acknowledgments section and others | |||
| who have contributed to the development and use of the IDNA | who have contributed to the development and use of the IDNA | |||
| skipping to change at page 42, line 35 ¶ | skipping to change at page 43, line 4 ¶ | |||
| the IETF does not normally annotate individual sections of documents | the IETF does not normally annotate individual sections of documents | |||
| with whether they are normative or not, concerns that we don't know | with whether they are normative or not, concerns that we don't know | |||
| which is which, claims that some material is normative that would be | which is which, claims that some material is normative that would be | |||
| problematic if so classified, etc., argue that we should at least be | problematic if so classified, etc., argue that we should at least be | |||
| able to have a clear discussion on the subject. | able to have a clear discussion on the subject. | |||
| Two annotations have been applied to sections that might reasonably | Two annotations have been applied to sections that might reasonably | |||
| be considered normative. One annotation is based on the list of | be considered normative. One annotation is based on the list of | |||
| sections in Mark Davis's note of 29 September (http:// | sections in Mark Davis's note of 29 September (http:// | |||
| www.alvestrand.no/pipermail/idna-update/2008-September/002667.html). | www.alvestrand.no/pipermail/idna-update/2008-September/002667.html). | |||
| The other is based on an elaboration of John Klensin's response on 7 | The other is based on an elaboration of John Klensin's response on 7 | |||
| October (http://www.alvestrand.no/pipermail/idna-update/2008-October/ | October (http://www.alvestrand.no/pipermail/idna-update/2008-October/ | |||
| 002691.html). These should just be considered two suggestions to | 002691.html). These should just be considered two suggestions to | |||
| illuminate and, one hopes, advance the Working Group's discussions. | illuminate and, one hopes, advance the Working Group's discussions. | |||
| Some additional editorial changes have been made, but they are | Some additional editorial changes have been made, but they are | |||
| basically trivial. In the editor's judgment, it is not possible to | basically trivial. In the editor's judgment, it is not possible to | |||
| make significantly more progress with this document until the matter | make significantly more progress with this document until the matter | |||
| of document organization is settled. | of document organization is settled. | |||
| A.4. Version -04 | A.4. Version -04 | |||
| o Definitional and other normative material moved to new document | o Definitional and other normative material moved to new document | |||
| (draft-ietf-idnabis-defs). Version -03 annotations removed. | (draft-ietf-idnabis-defs). Version -03 annotations removed. | |||
| o Material on differences between IDNA2003 and IDNA2003 moved to an | o Material on differences between IDNA2003 and IDNA2008 moved to an | |||
| appendix in Protocol. | appendix in Protocol. | |||
| o Material left over from the origins of this document as a | o Material left over from the origins of this document as a | |||
| preliminary proposal has been removed or rewritten. | preliminary proposal has been removed or rewritten. | |||
| o Changes made to reflect consensus call results, including removing | o Changes made to reflect consensus call results, including removing | |||
| several placeholder notes for discussion. | several placeholder notes for discussion. | |||
| o Added more material, including discussion of historic scripts, to | o Added more material, including discussion of historic scripts, to | |||
| Section 3.2 on registration policies. | Section 3.2 on registration policies. | |||
| o Added a new section (Section 7.2) to contain specific discussion | o Added a new section (Section 7.2) to contain specific discussion | |||
| of handling of characters that are interpreted differently in | of handling of characters that are interpreted differently in | |||
| input to IDNA2003 and 2008. | input to IDNA2003 and 2008. | |||
| o Some material, including this section/appendix, rearranged. | o Some material, including this section/appendix, rearranged. | |||
| A.5. Version -05 | ||||
| o Many small editorial changes, including changes to eliminate the | ||||
| last vestiges of what appeared to be 2119 language (upper-case | ||||
| MUST, SHOULD, or MAY) and small adjustments to terminology. | ||||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| Email: john+ietf@jck.com | Email: john+ietf@jck.com | |||
| End of changes. 48 change blocks. | ||||
| 75 lines changed or deleted | 107 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
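
The symbol discussion and Section 7.7 quoted in the diff above both turn on properties that can be read from the Unicode Character Database. The following is a minimal sketch, not part of either draft, that prints the formal names of the heart and star code points mentioned in the text and tests a string for unassigned code points (General Category "Cn"). The helper names describe() and has_unassigned() are hypothetical, and the use of U+0378 as a sample unassigned code point is an assumption. Results depend on the Unicode version bundled with the Python interpreter, which is the migration problem that Section 7.7 describes.

```python
import unicodedata

# Code points named in the text: one white heart, two black hearts, two stars.
SYMBOLS = [0x2661, 0x2665, 0x2765, 0x2606, 0x2729]


def describe(cp):
    """Return 'U+XXXX NAME (category)' for one code point (hypothetical helper)."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<no name>")
    return "U+%04X %s (%s)" % (cp, name, unicodedata.category(ch))


def has_unassigned(label):
    """True if any character in the label is unassigned (category 'Cn')
    in the Unicode version this interpreter ships with."""
    return any(unicodedata.category(ch) == "Cn" for ch in label)


if __name__ == "__main__":
    # None of the formal names is simply "heart" or "star".
    for cp in SYMBOLS:
        print(describe(cp))
    # U+0378 is assumed here to be unassigned; a later Unicode version
    # could assign it, which would change this result.
    print(has_unassigned("ex\u0378ample"))
```

Running the script lists one formal name per symbol, which shows why a spoken description such as "heart" does not identify a single code point.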