| draft-ietf-idnabis-rationale-10.txt | draft-ietf-idnabis-rationale-11.txt | |||
|---|---|---|---|---|
| Network Working Group J. Klensin | Network Working Group J. Klensin | |||
| Internet-Draft June 18, 2009 | Internet-Draft August 13, 2009 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: December 20, 2009 | Expires: February 14, 2010 | |||
| Internationalized Domain Names for Applications (IDNA): Background, | Internationalized Domain Names for Applications (IDNA): Background, | |||
| Explanation, and Rationale | Explanation, and Rationale | |||
| draft-ietf-idnabis-rationale-10.txt | draft-ietf-idnabis-rationale-11.txt | |||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
| provisions of BCP 78 and BCP 79. This document may contain material | provisions of BCP 78 and BCP 79. This document may contain material | |||
| from IETF Documents or IETF Contributions published or made publicly | from IETF Documents or IETF Contributions published or made publicly | |||
| available before November 10, 2008. The person(s) controlling the | available before November 10, 2008. The person(s) controlling the | |||
| copyright in some of this material may not have granted the IETF | copyright in some of this material may not have granted the IETF | |||
| Trust the right to allow modifications of such material outside the | Trust the right to allow modifications of such material outside the | |||
| IETF Standards Process. Without obtaining an adequate license from | IETF Standards Process. Without obtaining an adequate license from | |||
| skipping to change at page 1, line 43 | skipping to change at page 1, line 43 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on December 20, 2009. | This Internet-Draft will expire on February 14, 2010. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents in effect on the date of | Provisions Relating to IETF Documents in effect on the date of | |||
| publication of this document (http://trustee.ietf.org/license-info). | publication of this document (http://trustee.ietf.org/license-info). | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| skipping to change at page 2, line 27 | skipping to change at page 2, line 27 | |||
| these issues require tuning of the existing protocols and the tables | these issues require tuning of the existing protocols and the tables | |||
| on which they depend. This document provides an overview of a | on which they depend. This document provides an overview of a | |||
| revised system and provides explanatory material for its components. | revised system and provides explanatory material for its components. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | 1.1. Context and Overview . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3.1. Documents and Standards . . . . . . . . . . . . . . . 5 | 1.3.1. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | |||
| 1.3.2. DNS "Name" Terminology . . . . . . . . . . . . . . . . 5 | 1.3.2. New Terminology and Restrictions . . . . . . . . . . . 6 | |||
| 1.3.3. New Terminology and Restrictions . . . . . . . . . . . 6 | 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.4. Objectives . . . . . . . . . . . . . . . . . . . . . . . . 7 | ||||
| 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | 1.5. Applicability and Function of IDNA . . . . . . . . . . . . 7 | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | 1.6. Comprehensibility of IDNA Mechanisms and Processing . . . 8 | |||
| 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | 2. Processing in IDNA2008 . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | 3. Permitted Characters: An Inclusion List . . . . . . . . . . . 9 | |||
| 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | 3.1. A Tiered Model of Permitted Characters and Labels . . . . 10 | |||
| 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | 3.1.1. PROTOCOL-VALID . . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 | 3.1.2. CONTEXTUAL RULE REQUIRED . . . . . . . . . . . . . . . 11 | |||
| 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 | 3.1.2.2. Rules and Their Application . . . . . . . . . . . 12 | |||
| 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1.3. DISALLOWED . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | 3.1.4. UNASSIGNED . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 13 | 3.2. Registration Policy . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, | 3.3. Layered Restrictions: Tables, Context, Registration, | |||
| Applications . . . . . . . . . . . . . . . . . . . . . . . 14 | Applications . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 | 4. Issues that Constrain Possible Solutions . . . . . . . . . . . 15 | |||
| 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 | 4.1. Display and Network Order . . . . . . . . . . . . . . . . 15 | |||
| 4.2. Entry and Display in Applications . . . . . . . . . . . . 16 | 4.2. Entry and Display in Applications . . . . . . . . . . . . 16 | |||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and | 4.3. Linguistic Expectations: Ligatures, Digraphs, and | |||
| Alternate Character Forms . . . . . . . . . . . . . . . . 17 | Alternate Character Forms . . . . . . . . . . . . . . . . 18 | |||
| 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 18 | 4.4. Case Mapping and Related Issues . . . . . . . . . . . . . 20 | |||
| 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 19 | 4.5. Right to Left Text . . . . . . . . . . . . . . . . . . . . 21 | |||
| 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 20 | 5. IDNs and the Robustness Principle . . . . . . . . . . . . . . 21 | |||
| 6. Front-end and User Interface Processing for Lookup . . . . . . 20 | 6. Front-end and User Interface Processing for Lookup . . . . . . 22 | |||
| 7. Migration from IDNA2003 and Unicode Version Synchronization . 24 | 7. Migration from IDNA2003 and Unicode Version Synchronization . 24 | |||
| 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 | 7.1. Design Criteria . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 24 | 7.1.1. Summary and Discussion of IDNA Validity Criteria . . . 25 | |||
| 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | 7.1.2. Labels in Registration . . . . . . . . . . . . . . . . 25 | |||
| 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | 7.1.3. Labels in Lookup . . . . . . . . . . . . . . . . . . . 26 | |||
| 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | 7.2. Changes in Character Interpretations . . . . . . . . . . . 27 | |||
| 7.3. More Flexibility in User Agents . . . . . . . . . . . . . 28 | 7.3. Character Mapping . . . . . . . . . . . . . . . . . . . . 29 | |||
| 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 30 | 7.4. The Question of Prefix Changes . . . . . . . . . . . . . . 29 | |||
| 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 30 | 7.4.1. Conditions Requiring a Prefix Change . . . . . . . . . 29 | |||
| 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 31 | 7.4.2. Conditions Not Requiring a Prefix Change . . . . . . . 30 | |||
| 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 31 | 7.4.3. Implications of Prefix Changes . . . . . . . . . . . . 30 | |||
| 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 | 7.5. Stringprep Changes and Compatibility . . . . . . . . . . . 31 | |||
| 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 32 | 7.6. The Symbol Question . . . . . . . . . . . . . . . . . . . 31 | |||
| 7.7. Migration Between Unicode Versions: Unassigned Code | 7.7. Migration Between Unicode Versions: Unassigned Code | |||
| Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | Points . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 35 | 7.8. Other Compatibility Issues . . . . . . . . . . . . . . . . 34 | |||
| 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35 | 8. Name Server Considerations . . . . . . . . . . . . . . . . . . 35 | |||
| 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 36 | 8.1. Processing Non-ASCII Strings . . . . . . . . . . . . . . . 35 | |||
| 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 36 | 8.2. DNSSEC Authentication of IDN Domain Names . . . . . . . . 35 | |||
| 8.3. Root and other DNS Server Considerations . . . . . . . . . 37 | 8.3. Root and other DNS Server Considerations . . . . . . . . . 36 | |||
| 9. Internationalization Considerations . . . . . . . . . . . . . 37 | 9. Internationalization Considerations . . . . . . . . . . . . . 36 | |||
| 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 38 | 10.1. IDNA Character Registry . . . . . . . . . . . . . . . . . 37 | |||
| 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 38 | 10.2. IDNA Context Registry . . . . . . . . . . . . . . . . . . 37 | |||
| 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 38 | 10.3. IANA Repository of IDN Practices of TLDs . . . . . . . . . 37 | |||
| 11. Security Considerations . . . . . . . . . . . . . . . . . . . 38 | 11. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | |||
| 11.1. General Security Issues with IDNA . . . . . . . . . . . . 38 | 11.1. General Security Issues with IDNA . . . . . . . . . . . . 37 | |||
| 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 | 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 39 | 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 14.1. Normative References . . . . . . . . . . . . . . . . . . . 40 | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 39 | |||
| 14.2. Informative References . . . . . . . . . . . . . . . . . . 41 | 14.2. Informative References . . . . . . . . . . . . . . . . . . 40 | |||
| Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 43 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 42 | |||
| A.1. Changes between Version -00 and Version -01 of | A.1. Changes between Version -00 and Version -01 of | |||
| draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 43 | draft-ietf-idnabis-rationale . . . . . . . . . . . . . . . 42 | |||
| A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 44 | A.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 44 | A.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44 | A.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 45 | A.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 45 | A.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 46 | A.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 47 | A.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| A.11. Version -11 . . . . . . . . . . . . . . . . . . . . . . . 46 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Context and Overview | 1.1. Context and Overview | |||
| Internationalized Domain Names in Applications (IDNA) is a collection | Internationalized Domain Names in Applications (IDNA) is a collection | |||
| of standards that allow client applications to convert some Unicode | of standards that allow client applications to convert some Unicode | |||
| mnemonics to an ASCII-compatible encoding form ("ACE") which is a | mnemonics to an ASCII-compatible encoding form ("ACE") which is a | |||
| valid DNS label containing only letters, digits, and hyphens. The | valid DNS label containing only letters, digits, and hyphens. The | |||
| specific form of ACE label used by IDNA is called an "A-label". A | specific form of ACE label used by IDNA is called an "A-label". A | |||
| client can look up an exact A-label in the existing DNS, so A-labels | client can look up an exact A-label in the existing DNS, so A-labels | |||
| do not require any extensions to DNS, upgrades of DNS servers or | do not require any extensions to DNS, upgrades of DNS servers or | |||
| updates to low-level client libraries. An A-label is recognizable | updates to low-level client libraries. An A-label is recognizable | |||
| from the prefix "xn--" before the characters produced by the Punycode | from the prefix "xn--" before the characters produced by the Punycode | |||
| algorithm [RFC3492], thus a user application can identify an A-label | algorithm [RFC3492], thus a user application can identify an A-label | |||
| and convert it into Unicode (or some local coded character set) for | and convert it into Unicode (or some local coded character set) for | |||
| display. | display. | |||
| [[anchor3: Note in draft: The above discussion, and the rest of the | ||||
| text in this section, are very informal. In particular, the term | ||||
| "A-label" is used to refer to some things that don't meet all of the | ||||
| tests for A-labels. I have tightened it somewhat from the suggested | ||||
| text I received, but not very much. Is the current form ok with | ||||
| everyone???]] | ||||
| On the registry side, IDNA allows a registry to offer | On the registry side, IDNA allows a registry to offer | |||
| Internationalized Domain Names (IDNs) for registration as A-labels. | Internationalized Domain Names (IDNs) for registration as A-labels. | |||
| A registry may offer any subset of valid IDNs, and may apply any | A registry may offer any subset of valid IDNs, and may apply any | |||
| restrictions or bundling (grouping of similar labels together in one | restrictions or bundling (grouping of similar labels together in one | |||
| registration) appropriate for the context of that registry. | registration) appropriate for the context of that registry. | |||
| Registration of labels is sometimes discussed separately from lookup, | Registration of labels is sometimes discussed separately from lookup, | |||
| and is subject to a few specific requirements that do not apply to | and is subject to a few specific requirements that do not apply to | |||
| lookup. | lookup. | |||
| DNS clients and registries are subject to some differences in | DNS clients and registries are subject to some differences in | |||
| requirements for handling IDNs. In particular, registries are urged | requirements for handling IDNs. In particular, registries are urged | |||
| to register only exact, valid A-labels, while clients might do some | to register only exact, valid A-labels, while clients might do some | |||
| mapping to get from otherwise-invalid user input to a valid A-label. | mapping to get from otherwise-invalid user input to a valid A-label. | |||
| The first version of IDNA was published in 2003 and is referred to | The first version of IDNA was published in 2003 and is referred to | |||
| here as IDNA2003 to contrast it with the current version, which is | here as IDNA2003 to contrast it with the current version, which is | |||
| known as IDNA2008. The documents that made up both versions are | known as IDNA2008 (after the year in which IETF work started on it). | |||
| listed in Section 1.3.1. The characters that are valid in A-labels | IDNA2003 consists of four documents: the IDNA base specification | |||
| are identified from rules listed in the Tables document | [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep | |||
| [IDNA2008-Tables], but validity can be derived from the Unicode | [RFC3454]. The current set of documents, IDNA2008, are not dependent | |||
| properties of those characters with a very few exceptions. | on any of the IDNA2003 specifications other than the one for Punycode | |||
| encoding. References to "these specifications" or "these documents" | ||||
| are to the entire IDNA2008 set listed in [IDNA2008-Defs]. The | ||||
| characters that are valid in A-labels are identified from rules | ||||
| listed in the Tables document [IDNA2008-Tables], but validity can be | ||||
| derived from the Unicode properties of those characters with a very | ||||
| few exceptions. | ||||
| Traditionally, DNS labels are case-insensitive [RFC1034][RFC1035]. | Traditionally, DNS labels are matched case-insensitively | |||
| That pattern was preserved in IDNA2003, but if case rules are | [RFC1034][RFC1035]. That convention was preserved in IDNA2003 by a | |||
| enforced from one language, another language sometimes loses the | case-folding operation that generally maps capital letters into | |||
| ability to treat two characters separately. Case-sensitivity is | lower-case ones. However, if case rules are enforced from one | |||
| treated slightly differently in IDNA2008. | language, another language sometimes loses the ability to treat two | |||
| characters separately. Case-sensitivity is treated slightly | ||||
| differently in IDNA2008. | ||||
| IDNA2003 used Unicode version 3.2 only. In order to keep up with new | IDNA2003 used Unicode version 3.2 only. In order to keep up with new | |||
| characters added in new versions of Unicode, IDNA2008 decouples its | characters added in new versions of Unicode, IDNA2008 decouples its | |||
| rules from any particular version of Unicode. Instead, the | rules from any particular version of Unicode. Instead, the | |||
| attributes of new characters in Unicode determines how and whether | attributes of new characters in Unicode, supplemented by a small | |||
| the characters can be used in IDNA labels. | number of exception cases, determine how and whether the characters | |||
| can be used in IDNA labels. | ||||
| This document provides informational context for IDNA2008, including | This document provides informational context for IDNA2008, including | |||
| terminology, background, and policy discussions. | terminology, background, and policy discussions. | |||
| 1.2. Discussion Forum | 1.2. Discussion Forum | |||
| [[ RFC Editor: please remove this section. ]] | [[ RFC Editor: please remove this section. ]] | |||
| IDNA2008 is being discussed in the IETF "idnabis" Working Group and | IDNA2008 is being discussed in the IETF "idnabis" Working Group and | |||
| on the mailing list idna-update@alvestrand.no | on the mailing list idna-update@alvestrand.no | |||
| 1.3. Terminology | 1.3. Terminology | |||
| Terminology for IDNA2008 appears in [IDNA2008-Defs]. That document | Terminology for IDNA2008 appears in [IDNA2008-Defs]. That document | |||
| also contains a roadmap to the IDNA2008 document collection. No | also contains a roadmap to the IDNA2008 document collection. No | |||
| attempt should be made to understand this document without the | attempt should be made to understand this document without the | |||
| definitions and concepts that appear there. | definitions and concepts that appear there. | |||
| 1.3.1. Documents and Standards | 1.3.1. DNS "Name" Terminology | |||
| This document uses the term "IDNA2003" to refer to the set of | ||||
| standards published in 2003 to define IDNA: the IDNA base | ||||
| specification [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and | ||||
| Stringprep [RFC3454]. | ||||
| The term "IDNA2008" is used to refer to a new version of IDNA. | ||||
| IDNA2008 is not dependent on any of the IDNA2003 specifications other | ||||
| than the one for Punycode encoding. References to "these | ||||
| specifications" or "these documents" are to the entire IDNA2008 set | ||||
| listed in [IDNA2008-Defs]. | ||||
| 1.3.2. DNS "Name" Terminology | ||||
| In the context of IDNs, the DNS term 'name' has introduced some | In the context of IDNs, the DNS term "name" has introduced some | |||
| confusion as people speak of DNS labels in terms of the words or | confusion as people speak of DNS labels in terms of the words or | |||
| phrases of various natural languages. Historically, many of the | phrases of various natural languages. Historically, many of the | |||
| "names" in the DNS have been mnemonics to identify some particular | "names" in the DNS have been mnemonics to identify some particular | |||
| concept, object, or organization. They are typically rooted in some | concept, object, or organization. They are typically rooted in some | |||
| language because most people think in language-based ways. But, | language because most people think in language-based ways. But, | |||
| because they are mnemonics, they need not obey the orthographic | because they are mnemonics, they need not obey the orthographic | |||
| conventions of any language: it is not a requirement that it be | conventions of any language: it is not a requirement that it be | |||
| possible for them to be "words". | possible for them to be "words". | |||
| This distinction is important because the reasonable goal of an IDN | This distinction is important because the reasonable goal of an IDN | |||
| effort is not to be able to write the great Klingon (or language of | effort is not to be able to write the great Klingon (or language of | |||
| one's choice) novel in DNS labels but to be able to form a usefully | one's choice) novel in DNS labels but to be able to form a usefully | |||
| broad range of mnemonics in ways that are as natural as possible in a | broad range of mnemonics in ways that are as natural as possible in a | |||
| very broad range of scripts. | very broad range of scripts. | |||
| 1.3.3. New Terminology and Restrictions | 1.3.2. New Terminology and Restrictions | |||
| These documents introduce new terminology, and precise definitions, | These documents introduce new terminology, and precise definitions | |||
| for the terms "U-label", "A-Label", LDH-label (to which all valid | (in [IDNA2008-Defs]), for the terms "U-label", "A-Label", LDH-label | |||
| pre-IDNA host names conformed), Reserved-LDH-label (R-LDH-label), XN- | (to which all valid pre-IDNA host names conformed), Reserved-LDH- | |||
| label, Fake-A-Label, and Non-Reserved-LDH-label (NR-LDH-label). | label (R-LDH-label), XN-label, Fake-A-Label, and Non-Reserved-LDH- | |||
| label (NR-LDH-label). | ||||
| In addition, the term "putative label" has been adopted to refer to a | In addition, the term "putative label" has been adopted to refer to a | |||
| label that may appear to meet certain definitional constraints but | label that may appear to meet certain definitional constraints but | |||
| has not yet been sufficiently tested for validity. | has not yet been sufficiently tested for validity. | |||
| These definitions are illustrated in Figure 1 of the Definitions | These definitions are also illustrated in Figure 1 of the Definitions | |||
| Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and | Document [IDNA2008-Defs]. R-LDH-labels contain "--" in the third and | |||
| fourth character from the beginning of the label. In IDNA-aware | fourth character from the beginning of the label. In IDNA-aware | |||
| applications, only a subset of these reserved labels is permitted to | applications, only a subset of these reserved labels is permitted to | |||
| be used, namely the A-label subset. A-labels are a subset of the | be used, namely the A-label subset. A-labels are a subset of the | |||
| R-LDH-labels that begin with the case-insensitive string "xn--". | R-LDH-labels that begin with the case-insensitive string "xn--". | |||
| Labels that bear this prefix but which are not otherwise valid fall | Labels that bear this prefix but which are not otherwise valid fall | |||
| into the "Fake-A-label" category. The non-reserved labels (NR-LDH- | into the "Fake-A-label" category. The non-reserved labels (NR-LDH- | |||
| labels) are implicitly valid since they do not bear any | labels) are implicitly valid since they do not bear any | |||
| resemblance to the labels specified by IDNA. | resemblance to the labels specified by IDNA. | |||
| skipping to change at page 7, line 5 | skipping to change at page 6, line 40 | |||
| o to prevent confusion with pre-IDNA coding forms; | o to prevent confusion with pre-IDNA coding forms; | |||
| o to permit future extensions that would require changing the | o to permit future extensions that would require changing the | |||
| prefix, no matter how unlikely those might be (see Section 7.4); | prefix, no matter how unlikely those might be (see Section 7.4); | |||
| and | and | |||
| o to reduce the opportunities for attacks via the Punycode encoding | o to reduce the opportunities for attacks via the Punycode encoding | |||
| algorithm itself. | algorithm itself. | |||
| As with other documents in the IDNA2008 set, this document uses the | ||||
| term "registry" to describe any zone in the DNS. That term, and the | ||||
| terms "zone" or "zone administration", are interchangeable. | ||||
| 1.4. Objectives | 1.4. Objectives | |||
| These are the main objectives in revising IDNA. | These are the main objectives in revising IDNA. | |||
| o Use a more recent version of Unicode, and allow IDNA to be | o Use a more recent version of Unicode, and allow IDNA to be | |||
| independent of Unicode versions, so that IDNA2008 need not be | independent of Unicode versions, so that IDNA2008 need not be | |||
| update for implementations to adopt codepoints from new Unicode | updated for implementations to adopt codepoints from new Unicode | |||
| versions. | versions. | |||
| o Fix a very small number of code-point categorizations that have | o Fix a very small number of code-point categorizations that have | |||
| turned out to cause problems in the communities that use those | turned out to cause problems in the communities that use those | |||
| code-points. | code-points. | |||
| o Reduce the dependency on mapping, in order that the pre-mapped | o Reduce the dependency on mapping, in order that the pre-mapped | |||
| forms (which are not valid IDNA labels) tend to appear less often | forms (which are not valid IDNA labels) tend to appear less often | |||
| in various contexts, in favor of valid A-labels. | in various contexts, in favor of valid A-labels. | |||
| skipping to change at page 7, line 49 | skipping to change at page 7, line 40 | |||
| agents. The introduction of the larger repertoire of characters | agents. The introduction of the larger repertoire of characters | |||
| potentially makes the set of misspellings larger, especially given | potentially makes the set of misspellings larger, especially given | |||
| that in some cases the same appearance, for example on a business | that in some cases the same appearance, for example on a business | |||
| card, might visually match several Unicode code points or several | card, might visually match several Unicode code points or several | |||
| sequences of code points. | sequences of code points. | |||
| The IDNA standard does not require any applications to conform to it, | The IDNA standard does not require any applications to conform to it, | |||
| nor does it retroactively change those applications. An application | nor does it retroactively change those applications. An application | |||
| can elect to use IDNA in order to support IDN while maintaining | can elect to use IDNA in order to support IDN while maintaining | |||
| interoperability with existing infrastructure. If an application | interoperability with existing infrastructure. If an application | |||
| wants to use non-ASCII characters in domain names, IDNA is the only | wants to use non-ASCII characters in public DNS domain names, IDNA is | |||
| currently-defined option. Adding IDNA support to an existing | the only currently-defined option. Adding IDNA support to an | |||
| application entails changes to the application only, and leaves room | existing application entails changes to the application only, and | |||
| for flexibility in front-end processing and more specifically in the | leaves room for flexibility in front-end processing and more | |||
| user interface (see Section 6). | specifically in the user interface (see Section 6). | |||
| A great deal of the discussion of IDN solutions has focused on | A great deal of the discussion of IDN solutions has focused on | |||
| transition issues and how IDNs will work in a world where not all of | transition issues and how IDNs will work in a world where not all of | |||
| the components have been updated. Proposals that were not chosen by | the components have been updated. Proposals that were not chosen by | |||
| the original IDN Working Group would have depended on updating of | the original IDN Working Group would have depended on updating of | |||
| user applications, DNS resolvers, and DNS servers in order for a user | user applications, DNS resolvers, and DNS servers in order for a user | |||
| to apply an internationalized domain name in any form or coding | to apply an internationalized domain name in any form or coding | |||
| acceptable under that method. While processing must be performed | acceptable under that method. While processing must be performed | |||
| prior to or after access to the DNS, IDNA requires no changes to the | prior to or after access to the DNS, IDNA requires no changes to the | |||
| DNS protocol or any DNS servers or the resolvers on users' computers. | DNS protocol or any DNS servers or the resolvers on users' computers. | |||
| IDNA allows the graceful introduction of IDNs not only by avoiding | IDNA allows the graceful introduction of IDNs not only by avoiding | |||
| upgrades to existing infrastructure (such as DNS servers and mail | upgrades to existing infrastructure (such as DNS servers and mail | |||
| transport agents), but also by allowing some rudimentary use of IDNs | transport agents), but also by allowing some limited use of IDNs in | |||
| in applications by using the ASCII-encoded representation of the | applications by using the ASCII-encoded representation of the labels | |||
| labels containing non-ASCII characters. While such names are user- | containing non-ASCII characters. While such names are user- | |||
| unfriendly to read and type, and hence not optimal for user input, | unfriendly to read and type, and hence not optimal for user input, | |||
| they can be used as a last resort to allow rudimentary IDN usage. | they can be used as a last resort to allow rudimentary IDN usage. | |||
| For example, they might be the best choice for display if it were | For example, they might be the best choice for display if it were | |||
| known that relevant fonts were not available on the user's computer. | known that relevant fonts were not available on the user's computer. | |||
| In order to allow user-friendly input and output of the IDNs and | In order to allow user-friendly input and output of the IDNs and | |||
| acceptance of some characters as equivalent to those to be processed | acceptance of some characters as equivalent to those to be processed | |||
| according to the protocol, the applications need to be modified to | according to the protocol, the applications need to be modified to | |||
| conform to this specification. | conform to this specification. | |||
| This version of IDNA uses the Unicode character repertoire, for | This version of IDNA uses the Unicode character repertoire, for | |||
| continuity with the original version of IDNA. | continuity with the original version of IDNA. | |||
| 1.6. Comprehensibility of IDNA Mechanisms and Processing | 1.6. Comprehensibility of IDNA Mechanisms and Processing | |||
| One goal of IDNA2008, which is aided by the main goal of reducing the | One goal of IDNA2008, which is aided by the main goal of reducing the | |||
| dependency on mapping, is to improve the general understanding of how | dependency on mapping, is to improve the general understanding of how | |||
| to users and registrants are important design goals for this effort. | IDNA works and what characters are permitted and what happens to | |||
| End-user applications have an important role to play in increasing | them. Comprehensibility and predictability to users and registrants | |||
| this comprehensibility. | are important design goals for this effort. End-user applications | |||
| have an important role to play in increasing this comprehensibility. | ||||
| Any system that tries to handle international characters encounters | Any system that tries to handle international characters encounters | |||
| some common problems. For example, a UI cannot display a character | some common problems. For example, a UI cannot display a character | |||
| if no font for that character is available. In some cases, | if no font for that character is available. In some cases, | |||
| internationalization enables effective localization while maintaining | internationalization enables effective localization while maintaining | |||
| some global uniformity but losing some universality. | some global uniformity but losing some universality. | |||
| It is difficult to even make suggestions for end-user applications to | It is difficult to even make suggestions for end-user applications to | |||
| cope when characters and fonts are not available. Because display | cope when characters and fonts are not available. Because display | |||
| functions are rarely controlled by the types of applications that | functions are rarely controlled by the types of applications that | |||
| would call upon IDNA, such suggestions will rarely be very effective. | would call upon IDNA, such suggestions will rarely be very effective. | |||
| Converting between local character sets and normalized Unicode, if | Converting between local character sets and normalized Unicode, if | |||
| needed, is part of this set of user agent issues. This conversion | needed, is part of this set of user agent issues. This conversion | |||
| introduces complexity in a system that is not Unicode-native. If a | introduces complexity in a system that is not Unicode-native. If a | |||
| label is converted to a local character set that does not have all | label is converted to a local character set that does not have all | |||
| the needed characters, the user agent may have to add special logic | the needed characters, or that uses different character-coding | |||
| to avoid or reduce loss of information. | principles, the user agent may have to add special logic to avoid or | |||
| reduce loss of information. | ||||
| The major difficulty may lie in accurately identifying the incoming | The major difficulty may lie in accurately identifying the incoming | |||
| character set and applying the correct conversion routine. Even more | character set and applying the correct conversion routine. Even more | |||
| difficult, the local character coding system could be based on | difficult, the local character coding system could be based on | |||
| conceptually different assumptions than those used by Unicode (e.g., | conceptually different assumptions than those used by Unicode (e.g., | |||
| choice of font encodings used for publications in some Indic | choice of font encodings used for publications in some Indic | |||
| scripts). Those differences may not easily yield unambiguous | scripts). Those differences may not easily yield unambiguous | |||
| conversions or interpretations even if each coding system is | conversions or interpretations even if each coding system is | |||
| internally consistent and adequate to represent the local language | internally consistent and adequate to represent the local language | |||
| and script. | and script. | |||
| IDNA2008 shifts responsibility for character mapping and other | IDNA2008 shifts responsibility for character mapping and other | |||
| adjustments from the protocol (where it was located in IDNA2003) to | adjustments from the protocol (where it was located in IDNA2003) to | |||
| pre-processing before invoking IDNA. The intent is that this change | pre-processing before invoking IDNA itself. The intent is that this | |||
| leads to greater usage of fully-valid A-Labels in display, transit | change will lead to greater usage of fully-valid A-Labels or U-labels | |||
| and storage, which should aid comprehensibility. A careful look at | in display, transit and storage, which should aid comprehensibility | |||
| pre-processing raises issues about what that pre-processing should do | and predictability. A careful look at pre-processing raises issues | |||
| and at what point pre-processing becomes harmful, how universally | about what that pre-processing should do and at what point pre- | |||
| consistent pre-processing algorithms can be, and how to be compatible | processing becomes harmful, how universally consistent pre-processing | |||
| with labels prepared in an IDNA2003 context. Those issues are | algorithms can be, and how to be compatible with labels prepared in an | |||
| discussed in Section 6. [[anchor9: Fix section reference.]] | IDNA2003 context. Those issues are discussed in Section 6 and in the | |||
| separate document [IDNA2008-Mapping]. | ||||
| 2. Processing in IDNA2008 | 2. Processing in IDNA2008 | |||
| These specifications separate Domain Name Registration and Lookup in | These specifications separate Domain Name Registration and Lookup in | |||
| the protocol specification. This separation reflects current | the protocol specification. Although most steps in the two processes | |||
| practice in which per-registry restrictions and special processing | are similar, the separation reflects current practice in which per- | |||
| are applied at registration time but not during lookup. Another | registry (DNS zone) restrictions and special processing are applied | |||
| significant benefit is that separation facilitates incremental | at registration time but not during lookup. Another significant | |||
| addition of permitted character groups to avoid freezing on one | benefit is that separation facilitates incremental addition of | |||
| particular version of Unicode. | permitted character groups to avoid freezing on one particular | |||
| version of Unicode. | ||||
| The actual registration and lookup protocols for IDNA2008 are | The actual registration and lookup protocols for IDNA2008 are | |||
| specified in [IDNA2008-Protocol]. | specified in [IDNA2008-Protocol]. | |||
| 3. Permitted Characters: An Inclusion List | 3. Permitted Characters: An Inclusion List | |||
| IDNA2008 adopts the inclusion model. A code-point is assumed to be | IDNA2008 adopts the inclusion model. A code-point is assumed to be | |||
| invalid, unless it is included as part of a Unicode property-based | invalid for IDN use unless it is included as part of a Unicode | |||
| rule or in rare cases included individually by an exception. When an | property-based rule or, in rare cases, included individually by an | |||
| implementation moves to a new version of Unicode, the rules may | exception. When an implementation moves to a new version of Unicode, | |||
| indicate new valid code-points. | the rules may indicate new valid code-points. | |||
| This section provides an overview of the model used to establish the | This section provides an overview of the model used to establish the | |||
| algorithm and character lists of [IDNA2008-Tables] and describes the | algorithm and character lists of [IDNA2008-Tables] and describes the | |||
| names and applicability of the categories used there. Note that the | names and applicability of the categories used there. Note that the | |||
| inclusion of a character in the first category group (Section 3.1.1) | inclusion of a character in the first category group (Section 3.1.1) | |||
| does not imply that it can be used indiscriminately; some characters | does not imply that it can be used indiscriminately; some characters | |||
| are associated with contextual rules that must be applied as well. | are associated with contextual rules that must be applied as well. | |||
| The information given in this section is provided to make the rules, | The information given in this section is provided to make the rules, | |||
| tables, and protocol easier to understand. The normative generating | tables, and protocol easier to understand. The normative generating | |||
| skipping to change at page 10, line 33 ¶ | skipping to change at page 10, line 28 ¶ | |||
| list of characters that are permitted in IDNs. In IDNA2003, | list of characters that are permitted in IDNs. In IDNA2003, | |||
| character validity is independent of context and fixed forever (or | character validity is independent of context and fixed forever (or | |||
| until the standard is replaced). However, globally context- | until the standard is replaced). However, globally context- | |||
| independent rules have proved to be impractical because some | independent rules have proved to be impractical because some | |||
| characters, especially those that are called "Join_Controls" in | characters, especially those that are called "Join_Controls" in | |||
| Unicode, are needed to make reasonable use of some scripts but have | Unicode, are needed to make reasonable use of some scripts but have | |||
| no visible effect in others. IDNA2003 prohibited those types of | no visible effect in others. IDNA2003 prohibited those types of | |||
| characters entirely by discarding them. We now have a consensus that | characters entirely by discarding them. We now have a consensus that | |||
| under some conditions, these "joiner" characters are legitimately | under some conditions, these "joiner" characters are legitimately | |||
| needed to allow useful mnemonics for some languages and scripts. In | needed to allow useful mnemonics for some languages and scripts. In | |||
| general, context-dependent rules help deal with characters that are | general, context-dependent rules help deal with characters (generally | |||
| used differently across different scripts, and allow the standard to | characters that would otherwise be prohibited entirely) that are used | |||
| be applied more appropriately in cases where a string is not | differently or perceived differently across different scripts, and | |||
| universally handled the same way. | allow the standard to be applied more appropriately in cases where a | |||
| string is not universally handled the same way. | ||||
| IDNA2008 divides all possible Unicode code-points into four | IDNA2008 divides all possible Unicode code-points into four | |||
| categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and | categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and | |||
| UNASSIGNED. | UNASSIGNED. | |||
| 3.1.1. PROTOCOL-VALID | 3.1.1. PROTOCOL-VALID | |||
| Characters identified as "PROTOCOL-VALID" (often abbreviated | Characters identified as "PROTOCOL-VALID" (often abbreviated | |||
| "PVALID") are permitted in IDNs. Their use may be restricted by | "PVALID") are permitted in IDNs. Their use may be restricted by | |||
| rules about the context in which they appear or by other rules that | rules about the context in which they appear or by other rules that | |||
| skipping to change at page 11, line 23 ¶ | skipping to change at page 11, line 18 ¶ | |||
| expected to never be removed from it or reclassified. While | expected to never be removed from it or reclassified. While | |||
| theoretically characters could be removed from Unicode, such removal | theoretically characters could be removed from Unicode, such removal | |||
| would be inconsistent with the Unicode stability principles (see | would be inconsistent with the Unicode stability principles (see | |||
| [Unicode51], Appendix F) and hence should never occur. | [Unicode51], Appendix F) and hence should never occur. | |||
| 3.1.2. CONTEXTUAL RULE REQUIRED | 3.1.2. CONTEXTUAL RULE REQUIRED | |||
| Some characters may be unsuitable for general use in IDNs but | Some characters may be unsuitable for general use in IDNs but | |||
| necessary for the plausible support of some scripts. The two most | necessary for the plausible support of some scripts. The two most | |||
| commonly-cited examples are the zero-width joiner and non-joiner | commonly-cited examples are the zero-width joiner and non-joiner | |||
| characters (ZWJ, U+200D and ZWNJ, U+200C). | characters (ZWJ, U+200D and ZWNJ, U+200C) but other characters may | |||
| require special treatment because they would otherwise be DISALLOWED | ||||
| (typically because Unicode considers them punctuation or special | ||||
| symbols) but need to be permitted in limited contexts. Other | ||||
| characters are given this special treatment because they pose | ||||
| exceptional danger of being used to produce misleading labels or to | ||||
| cause unacceptable ambiguity in label matching and interpretation. | ||||
| 3.1.2.1. Contextual Restrictions | 3.1.2.1. Contextual Restrictions | |||
| Characters with contextual restrictions are identified as "CONTEXTUAL | Characters with contextual restrictions are identified as "CONTEXTUAL | |||
| RULE REQUIRED" and associated with a rule. The rule defines whether | RULE REQUIRED" and associated with a rule. The rule defines whether | |||
| the character is valid in a particular string, and also whether the | the character is valid in a particular string, and also whether the | |||
| rule itself is to be applied on lookup as well as registration. | rule itself is to be applied on lookup as well as registration. | |||
| A distinction is made between characters that indicate or prohibit | A distinction is made between characters that indicate or prohibit | |||
| joining and ones similar to them (known as "CONTEXT-JOINER" or | joining and ones similar to them (known as "CONTEXT-JOINER" or | |||
| skipping to change at page 11, line 48 ¶ | skipping to change at page 11, line 49 ¶ | |||
| It is important to note that these contextual rules cannot prevent | It is important to note that these contextual rules cannot prevent | |||
| all uses of the relevant characters that might be confusing or | all uses of the relevant characters that might be confusing or | |||
| problematic. What they are expected to do is to confine applicability | problematic. What they are expected to do is to confine applicability | |||
| of the characters to scripts (and narrower contexts) where zone | of the characters to scripts (and narrower contexts) where zone | |||
| administrators are knowledgeable enough about the use of those | administrators are knowledgeable enough about the use of those | |||
| characters to be prepared to deal with them appropriately. For | characters to be prepared to deal with them appropriately. For | |||
| example, a registry dealing with an Indic script that requires ZWJ | example, a registry dealing with an Indic script that requires ZWJ | |||
| and/or ZWNJ as part of the writing system is expected to understand | and/or ZWNJ as part of the writing system is expected to understand | |||
| where the characters have visible effect and where they do not and to | where the characters have visible effect and where they do not and to | |||
| make registration rules accordingly. By contrast, a registry dealing | make registration rules accordingly. By contrast, a registry dealing | |||
| with Latin or Cyrillic script might not be actively aware that the | primarily with Latin or Cyrillic script might not be actively aware | |||
| characters exist, much less about the consequences of embedding them | that the characters exist, much less about the consequences of | |||
| in labels drawn from those scripts. | embedding them in labels drawn from those scripts. | |||
| 3.1.2.2. Rules and Their Application | 3.1.2.2. Rules and Their Application | |||
| Rules have descriptions such as "Must follow a character from Script | Rules have descriptions such as "Must follow a character from Script | |||
| XYZ", "Must occur only if the entire label is in Script ABC", or | XYZ", "Must occur only if the entire label is in Script ABC", or | |||
| "Must occur only if the previous and subsequent characters have the | "Must occur only if the previous and subsequent characters have the | |||
| DFG property". The actual rules may be DEFINED or NULL. If present, | DFG property". The actual rules may be DEFINED or NULL. If present, | |||
| they may have values of "True" (character may be used in any position | they may have values of "True" (character may be used in any position | |||
| in any label), "False" (character may not be used in any label), or | in any label), "False" (character may not be used in any label), or | |||
| may be a set of procedural rules that specify the context in which | may be a set of procedural rules that specify the context in which | |||
| skipping to change at page 12, line 30 ¶ | skipping to change at page 12, line 30 ¶ | |||
| Because it is easier to identify these characters than to know that | Because it is easier to identify these characters than to know that | |||
| they are actually needed in IDNs or how to establish exactly the | they are actually needed in IDNs or how to establish exactly the | |||
| right rules for each one, a rule may have a null value in a given | right rules for each one, a rule may have a null value in a given | |||
| version of the tables. Characters associated with null rules are not | version of the tables. Characters associated with null rules are not | |||
| permitted to appear in putative labels for either registration or | permitted to appear in putative labels for either registration or | |||
| lookup. Of course, a later version of the tables might contain a | lookup. Of course, a later version of the tables might contain a | |||
| non-null rule. | non-null rule. | |||
| The actual rules and their descriptions are in [IDNA2008-Tables]. | The actual rules and their descriptions are in [IDNA2008-Tables]. | |||
| [[anchor12: ??? Section number would be good here.]] That document | [[anchor9: ??? Section number would be good here.]] That document | |||
| also creates a registry for future rules. | also specifies the creation of a registry for future rules. | |||
| 3.1.3. DISALLOWED | 3.1.3. DISALLOWED | |||
| Some characters are inappropriate for use in IDNs and are thus | Some characters are inappropriate for use in IDNs and are thus | |||
| excluded for both registration and lookup (i.e., IDNA-conforming | excluded for both registration and lookup (i.e., IDNA-conforming | |||
| applications performing name lookup should verify that these | applications performing name lookup should verify that these | |||
| characters are absent; if they are present, the label strings should | characters are absent; if they are present, the label strings should | |||
| be rejected rather than converted to A-labels and looked up). Some of | be rejected rather than converted to A-labels and looked up). Some of | |||
| these characters are problematic for use in IDNs (such as the | these characters are problematic for use in IDNs (such as the | |||
| FRACTION SLASH character, U+2044), while some of them (such as the | FRACTION SLASH character, U+2044), while some of them (such as the | |||
| skipping to change at page 13, line 37 ¶ | skipping to change at page 13, line 37 ¶ | |||
| For convenience in processing and table-building, code points that do | For convenience in processing and table-building, code points that do | |||
| not have assigned values in a given version of Unicode are treated as | not have assigned values in a given version of Unicode are treated as | |||
| belonging to a special UNASSIGNED category. Such code points are | belonging to a special UNASSIGNED category. Such code points are | |||
| prohibited in labels to be registered or looked up. The category | prohibited in labels to be registered or looked up. The category | |||
| differs from DISALLOWED in that code points are moved out of it by | differs from DISALLOWED in that code points are moved out of it by | |||
| the simple expedient of being assigned in a later version of Unicode | the simple expedient of being assigned in a later version of Unicode | |||
| (at which point, they are classified into one of the other categories | (at which point, they are classified into one of the other categories | |||
| as appropriate). | as appropriate). | |||
| The rationale for restricting the processing of UNASSIGNED characters | The rationale for restricting the processing of UNASSIGNED characters | |||
| is simply that if such characters were permitted to be looked up, for | is simply that the properties of such code points cannot be | |||
| example, and were later assigned, but subject to some set of | completely known until actual characters are assigned to them. If, | |||
| contextual rules, un-updated instances of IDNA-aware software might | for example, such a code point was permitted to be included in a | |||
| permit lookup of labels containing the previously-unassigned | label to be looked up, and the code point was later to be assigned to | |||
| characters while updated versions of IDNA-aware software might | a character that required some set of contextual rules, un-updated | |||
| restrict their use in lookup, depending on the contextual rules. It | instances of IDNA-aware software might permit lookup of labels | |||
| should be clear that under no circumstance should an UNASSIGNED | containing the previously-unassigned characters while updated | |||
| character be permitted in a label to be registered as part of a | versions of IDNA-aware software might restrict their use in lookup, | |||
| domain name. | depending on the contextual rules. It should be clear that under no | |||
| circumstance should an UNASSIGNED character be permitted in a label | ||||
| to be registered as part of a domain name. | ||||
| 3.2. Registration Policy | 3.2. Registration Policy | |||
| While these recommendations cannot and should not define registry | While these recommendations cannot and should not define registry | |||
| policies, registries should develop and apply additional restrictions | policies, registries should develop and apply additional restrictions | |||
| as needed to reduce confusion and other problems. For example, it is | as needed to reduce confusion and other problems. For example, it is | |||
| generally believed that labels containing characters from more than | generally believed that labels containing characters from more than | |||
| one script are a bad practice although there may be some important | one script are a bad practice although there may be some important | |||
| exceptions to that principle. Some registries may choose to restrict | exceptions to that principle. Some registries may choose to restrict | |||
| registrations to characters drawn from a very small number of | registrations to characters drawn from a very small number of | |||
| skipping to change at page 14, line 24 ¶ | skipping to change at page 14, line 30 ¶ | |||
| from scripts that are well-understood by the registry or its | from scripts that are well-understood by the registry or its | |||
| advisers. If a registry decides to reduce opportunities for | advisers. If a registry decides to reduce opportunities for | |||
| confusion by constructing policies that disallow characters used in | confusion by constructing policies that disallow characters used in | |||
| historic writing systems or characters whose use is restricted to | historic writing systems or characters whose use is restricted to | |||
| specialized, highly technical contexts, some relevant information may | specialized, highly technical contexts, some relevant information may | |||
| be found in Section 2.4 "Specific Character Adjustments", Table 4 | be found in Section 2.4 "Specific Character Adjustments", Table 4 | |||
| "Candidate Characters for Exclusion from Identifiers" of | "Candidate Characters for Exclusion from Identifiers" of | |||
| [Unicode-UAX31] and Section 3.1. "General Security Profile for | [Unicode-UAX31] and Section 3.1. "General Security Profile for | |||
| Identifiers" in [Unicode-Security]. | Identifiers" in [Unicode-Security]. | |||
| The requirement (in [IDNA2008-Protocol] [[anchor10: ?? Section | ||||
| number]]) that registration procedures use only U-labels and/or | ||||
| A-labels is intended to ensure that registrants are fully aware of | ||||
| exactly what is being registered as well as encouraging use of those | ||||
| canonical forms. That provision should not be interpreted as | ||||
| requiring that registrants provide characters in a particular | ||||
| code sequence. Registrant input conventions and management are part | ||||
| of registrant-registrar interactions and relationships between | ||||
| registries and registrars and are outside the scope of these | ||||
| standards. | ||||
| It is worth stressing that these principles of policy development and | It is worth stressing that these principles of policy development and | |||
| application apply at all levels of the DNS, not only, e.g., TLD or | application apply at all levels of the DNS, not only, e.g., TLD or | |||
| SLD registrations and that even a trivial, "anything permitted that | SLD registrations and that even a trivial, "anything permitted that | |||
| is valid under the protocol" policy is helpful in that it helps users | is valid under the protocol" policy is helpful in that it helps users | |||
| and application developers know what to expect. | and application developers know what to expect. | |||
| 3.3. Layered Restrictions: Tables, Context, Registration, Applications | 3.3. Layered Restrictions: Tables, Context, Registration, Applications | |||
| The character rules in IDNA2008 are based on the realization that | The character rules in IDNA2008 are based on the realization that | |||
| there is no single magic bullet for any of the issues associated with | there is no single magic bullet for any of the security, | |||
| IDNs. Instead, the specifications define a variety of approaches. | confusability, or other issues associated with IDNs. Instead, the | |||
| The character tables are the first mechanism, protocol rules about | specifications define a variety of approaches. The character tables | |||
| how those characters are applied or restricted in context are the | are the first mechanism, protocol rules about how those characters | |||
| second, and those two in combination constitute the limits of what | are applied or restricted in context are the second, and those two in | |||
| can be done in the protocol. As discussed in the previous section | combination constitute the limits of what can be done in the | |||
| (Section 3.2), registries are expected to restrict what they permit | protocol. As discussed in the previous section (Section 3.2), | |||
| to be registered, devising and using rules that are designed to | registries are expected to restrict what they permit to be | |||
| optimize the balance between confusion and risk on the one hand and | registered, devising and using rules that are designed to optimize | |||
| maximum expressiveness in mnemonics on the other. | the balance between confusion and risk on the one hand and maximum | |||
| expressiveness in mnemonics on the other. | ||||
| In addition, there is an important role for user agents in warning | In addition, there is an important role for user agents in warning | |||
| against label forms that appear problematic given their knowledge of | against label forms that appear problematic given their knowledge of | |||
| local contexts and conventions. Of course, no approach based on | local contexts and conventions. Of course, no approach based on | |||
| naming or identifiers alone can protect against all threats. | naming or identifiers alone can protect against all threats. | |||
| 4. Issues that Constrain Possible Solutions | 4. Issues that Constrain Possible Solutions | |||
| 4.1. Display and Network Order | 4.1. Display and Network Order | |||
| skipping to change at page 16, line 9 ¶ | skipping to change at page 16, line 24 ¶ | |||
| If each implementation of each application makes its own decisions on | If each implementation of each application makes its own decisions on | |||
| these issues, users will develop heuristics that will sometimes fail | these issues, users will develop heuristics that will sometimes fail | |||
| when switching applications. However, while some display order | when switching applications. However, while some display order | |||
| conventions, voluntarily adopted, would be desirable to reduce | conventions, voluntarily adopted, would be desirable to reduce | |||
| confusion, such suggestions are beyond the scope of these | confusion, such suggestions are beyond the scope of these | |||
| specifications. | specifications. | |||
| 4.2. Entry and Display in Applications | 4.2. Entry and Display in Applications | |||
| Applications can accept and display domain names using any character | Applications can accept and display domain names using any character | |||
| set or character coding system. That is, the IDNA protocol does not | set or character coding system. The IDNA protocol does not | |||
| necessarily affect the interface between users and applications. An | necessarily affect the interface between users and applications. An | |||
| IDNA-aware application can accept and display internationalized | IDNA-aware application can accept and display internationalized | |||
| domain names in two formats: the internationalized character set(s) | domain names in two formats: the internationalized character set(s) | |||
| supported by the application (i.e., an appropriate local | supported by the application (i.e., an appropriate local | |||
| representation of a U-label), and as an A-label. Applications may | representation of a U-label), and as an A-label. Applications may | |||
| allow the display of A-labels, but are encouraged to not do so except | allow the display of A-labels, but are encouraged to not do so except | |||
| as an interface for special purposes, possibly for debugging, or to | as an interface for special purposes, possibly for debugging, or to | |||
| cope with display limitations. In general, they should allow, but | cope with display limitations. In general, they should allow, but | |||
| not encourage, user input of A-labels. A-labels are opaque and ugly | not encourage, user input of A-labels. A-labels are opaque, ugly, | |||
| and malicious variations on them are not easily detected by users. | and malicious variations on them are not easily detected by users. | |||
| Where possible, they should thus only be exposed when they are | Where possible, they should thus only be exposed when they are | |||
| absolutely needed. Because IDN labels can be rendered either as | absolutely needed. Because IDN labels can be rendered either as | |||
| A-labels or U-labels, the application may reasonably have an option | A-labels or U-labels, the application may reasonably have an option | |||
| for the user to select the preferred method of display. Rendering | for the user to select the preferred method of display. Rendering | |||
| the U-label should normally be the default. | the U-label should normally be the default. | |||
| Domain names are often stored and transported in many places. For | Domain names are often stored and transported in many places. For | |||
| example, they are part of documents such as mail messages and web | example, they are part of documents such as mail messages and web | |||
| pages. They are transported in many parts of many protocols, such as | pages. They are transported in many parts of many protocols, such as | |||
| both the control commands of SMTP and associated the message body | both the control commands of SMTP and associated message body parts, | |||
| parts, and in the headers and the body content in HTTP. It is | and in the headers and the body content in HTTP. It is important to | |||
| important to remember that domain names appear both in domain name | remember that domain names appear both in domain name slots and in | |||
| slots and in the content that is passed over protocols. | the content that is passed over protocols. | |||
| In protocols and document formats that define how to handle | In protocols and document formats that define how to handle | |||
| specification or negotiation of charsets, labels can be encoded in | specification or negotiation of charsets, labels can be encoded in | |||
| any charset allowed by the protocol or document format. If a | any charset allowed by the protocol or document format. If a | |||
| protocol or document format only allows one charset, the labels must | protocol or document format only allows one charset, the labels must | |||
| be given in that charset. Of course, not all charsets can properly | be given in that charset. Of course, not all charsets can properly | |||
| represent all labels. If a U-label cannot be displayed in its | represent all labels. If a U-label cannot be displayed in its | |||
| entirety, the only choice (without loss of information) may be to | entirety, the only choice (without loss of information) may be to | |||
| display the A-label. | display the A-label. | |||
| Where a protocol or document format allows IDNs, labels should be in | Where a protocol or document format allows IDNs, labels should be in | |||
| whatever character encoding and escape mechanism the protocol or | whatever character encoding and escape mechanism the protocol or | |||
| document format uses at that place. This provision is intended to | document format uses at that place. This provision is intended to | |||
| prevent situations in which, e.g., UTF-8 domain names appear embedded | prevent situations in which, e.g., UTF-8 domain names appear embedded | |||
| in text that is otherwise in some other character coding. | in text that is otherwise in some other character coding. | |||
| All protocols that use domain name slots (See Section 2.3.1.6 | All protocols that use domain name slots (See Section 2.3.1.6 in | |||
| [[anchor15: ?? Verify this]] in [IDNA2008-Defs]) already have the | [IDNA2008-Defs]) already have the capacity for handling domain names | |||
| capacity for handling domain names in the ASCII charset. Thus, | in the ASCII charset. Thus, A-labels can inherently be handled by | |||
| A-labels can inherently be handled by those protocols. | those protocols. | |||
| These documents do not specify required mappings between one | ||||
| character or code point and others. An extended discussion of | ||||
| mapping issues occurs in Section 6 and specific recommendations | ||||
| appear in [IDNA2008-Mapping]. In general, IDNA2008 prohibits | ||||
| characters that would be mapped to others by normalization or other | ||||
| rules. As examples, while mathematical characters based on Latin | ||||
| ones are accepted as input to IDNA2003, they are prohibited in | ||||
| IDNA2008. Similarly, upper-case characters, double-width characters, | ||||
| and other variations are prohibited as IDNA input although mapping | ||||
| them as needed in user interfaces is strongly encouraged. | ||||
| Since the rules in [IDNA2008-Tables] have the effect that only | ||||
| strings that are not transformed by NFKC are valid, if an application | ||||
| chooses to perform NFKC normalization before lookup, that operation | ||||
| is safe since this will never make the application unable to look up | ||||
| any valid string. However, as discussed above, the application | ||||
| cannot guarantee that any other application will perform that | ||||
| mapping, so it should be used only with caution and for informed | ||||
| users. | ||||
| In many cases these prohibitions should have no effect on what the | ||||
| user can type as input to the lookup process. It is perfectly | ||||
| reasonable for systems that support user interfaces to perform some | ||||
| character mapping that is appropriate to the local environment. This | ||||
| would normally be done prior to actual invocation of IDNA. At least | ||||
| conceptually, the mapping would be part of the Unicode conversions | ||||
| discussed above and in [IDNA2008-Protocol]. However, those changes | ||||
| will be local ones only -- local to environments in which users will | ||||
| clearly understand that the character forms are equivalent. For use | ||||
| in interchange among systems, it appears to be much more important | ||||
| that U-labels and A-labels can be mapped back and forth without loss | ||||
| of information. | ||||
| One specific, and very important, instance of this strategy arises | ||||
| with case-folding. In the ASCII-only DNS, names are looked up and | ||||
| matched in a case-independent way, but no actual case-folding occurs. | ||||
| Names can be placed in the DNS in either upper or lower case form (or | ||||
| any mixture of them) and that form is preserved, returned in queries, | ||||
| and so on. IDNA2003 approximated that behavior for non-ASCII strings | ||||
| by performing case-folding at registration time (resulting in only | ||||
| lower-case IDNs in the DNS) and when names were looked up. | ||||
| As suggested earlier in this section, it appears to be desirable to | ||||
| do as little character mapping as possible as long as Unicode works | ||||
| correctly (e.g., NFC mapping to resolve different codings for the | ||||
| same character is still necessary although the specifications require | ||||
| that it be performed prior to invoking the protocol) in order to make | ||||
| the mapping between A-labels and U-labels idempotent. Case-mapping | ||||
| is not an exception to this principle. If only lower case characters | ||||
| can be registered in the DNS (i.e., be present in a U-label), then | ||||
| IDNA2008 should prohibit upper-case characters as input even though | ||||
| user interfaces to applications should probably map those characters. | ||||
| Some other considerations reinforce this conclusion. For example, in | ||||
| ASCII case-mapping for individual characters, uppercase(character) | ||||
| must be equal to uppercase(lowercase(character)). That may not be | ||||
| true with IDNs. In some scripts that use case distinctions, there | ||||
| are a few characters that do not have counterparts in one case or the | ||||
| other. The relationship between upper case and lower case may even | ||||
| be language-dependent, with different languages (or even the same | ||||
| language in different areas) expecting different mappings. User | ||||
| agents can meet the expectations of users who are accustomed to the | ||||
| case-insensitive DNS environment by performing case folding prior to | ||||
| IDNA processing, but the IDNA procedures themselves should neither | ||||
| require such mapping nor expect them when they are not natural to the | ||||
| localized environment. | ||||
| 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | 4.3. Linguistic Expectations: Ligatures, Digraphs, and Alternate | |||
| Character Forms | Character Forms | |||
| Users have expectations about character matching or equivalence that | Users have expectations about character matching or equivalence that | |||
| are based on their own languages and the orthography of those | are based on their own languages and the orthography of those | |||
| languages. These expectations may not always be met in a global | languages. These expectations may not always be met in a global | |||
| system, especially if multiple languages are written using the same | system, especially if multiple languages are written using the same | |||
| script but using different conventions. Some examples: | script but using different conventions. Some examples: | |||
| skipping to change at page 17, line 44 ¶ | skipping to change at page 19, line 29 ¶ | |||
| appear consecutively without forming a digraph, as in "tophat".) | appear consecutively without forming a digraph, as in "tophat".) | |||
| Certain digraphs may be indicated typographically by setting the two | Certain digraphs may be indicated typographically by setting the two | |||
| characters closer together than they would be if used consecutively | characters closer together than they would be if used consecutively | |||
| to represent different phonemes. Some digraphs are fully joined as | to represent different phonemes. Some digraphs are fully joined as | |||
| ligatures. For example, the word "encyclopaedia" is sometimes set | ligatures. For example, the word "encyclopaedia" is sometimes set | |||
| with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph | with a U+00E6 LATIN SMALL LIGATURE AE. When ligature and digraph | |||
| forms have the same interpretation across all languages that use a | forms have the same interpretation across all languages that use a | |||
| given script, application of Unicode normalization generally resolves | given script, application of Unicode normalization generally resolves | |||
| the differences and causes them to match. When they have different | the differences and causes them to match. When they have different | |||
| interpretations, matching must utilize other methods, presumably | interpretations, matching must utilize other methods, presumably | |||
| chosen at the registry completely optional typographic convenience | chosen at the registry level, or users must be educated to understand | |||
| for representing a digraph in one language (as in the above example | that matching will not occur. | |||
| with some spelling conventions), while in another language it is a | ||||
| single character that may not always be correctly representable by a | ||||
| two-letter sequence (as in the above example with different spelling | ||||
| conventions). This can be illustrated by many words in the Norwegian | ||||
| language, where the "ae" ligature is the 27th letter of a 29-letter | ||||
| extended Latin alphabet. It is equivalent to the 28th letter of the | ||||
| Swedish alphabet (also containing 29 letters), U+00E4 LATIN SMALL | ||||
| LETTER A WITH DIAERESIS, for which an "ae" cannot be substituted | ||||
| according to current orthographic standards. | ||||
| That character (U+00E4) is also part of the German alphabet where, | The nature of the problem can be illustrated by many words in the | |||
| unlike in the Nordic languages, the two-character sequence "ae" is | Norwegian language, where the "ae" ligature is the 27th letter of a | |||
| usually treated as a fully acceptable alternate orthography for the | 29-letter extended Latin alphabet. It is equivalent to the 28th | |||
| "umlauted a" character. The inverse is however not true, and those | letter of the Swedish alphabet (also containing 29 letters), U+00E4 | |||
| two characters cannot necessarily be combined into an "umlauted a". | LATIN SMALL LETTER A WITH DIAERESIS, for which an "ae" cannot be | |||
| This also applies to another German character, the "umlauted o" | substituted according to current orthographic standards. That | |||
| (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for example, | character (U+00E4) is also part of the German alphabet where, unlike | |||
| cannot be used for writing the name of the author "Goethe". It is | in the Nordic languages, the two-character sequence "ae" is usually | |||
| also a letter in the Swedish alphabet where, like the "a with | treated as a fully acceptable alternate orthography for the "umlauted | |||
| diaeresis", it cannot be correctly represented as "oe" and in the | a" character. The inverse is however not true, and those two | |||
| Norwegian alphabet, where it is represented, not as "o with | characters cannot necessarily be combined into an "umlauted a". This | |||
| diaeresis", but as "slashed o", U+00F8. | also applies to another German character, the "umlauted o" (U+00F6 | |||
| LATIN SMALL LETTER O WITH DIAERESIS) which, for example, cannot be | ||||
| used for writing the name of the author "Goethe". It is also a | ||||
| letter in the Swedish alphabet where, like the "a with diaeresis", it | ||||
| cannot be correctly represented as "oe" and in the Norwegian | ||||
| alphabet, where it is represented, not as "o with diaeresis", but as | ||||
| "slashed o", U+00F8. | ||||
| Some of the ligatures that have explicit code points in Unicode were | Some of the ligatures that have explicit code points in Unicode were | |||
| given special handling in IDNA2003 and now pose additional problems | given special handling in IDNA2003 and now pose additional problems | |||
| in transition. See Section 7.2. | in transition. See Section 7.2. | |||
| Additional cases with alphabets written right to left are described | Additional cases with alphabets written right to left are described | |||
| in Section 4.5. | in Section 4.5. | |||
| Matching and comparison algorithm selection often requires | Matching and comparison algorithm selection often requires | |||
| information about the language being used, context, or both -- | information about the language being used, context, or both -- | |||
| information that is not available to IDNA or the DNS. Consequently, | information that is not available to IDNA or the DNS. Consequently, | |||
| these specifications make no attempt to treat combined characters in | these specifications make no attempt to treat combined characters in | |||
| any special way. A registry that is aware of the language context in | any special way. A registry that is aware of the language context in | |||
| which labels are to be registered, and where that language sometimes | which labels are to be registered, and where that language sometimes | |||
| (or always) treats the two-character sequences as equivalent to the | (or always) treats the two-character sequences as equivalent to the | |||
| combined form, should give serious consideration to applying a | combined form, should give serious consideration to applying a | |||
| "variant" model [RFC3743] [RFC4290], or to prohibiting registration | "variant" model [RFC3743][RFC4290], or to prohibiting registration of | |||
| of one the forms entirely, to reduce the opportunities for user | one of the forms entirely, to reduce the opportunities for user | |||
| confusion and fraud that would result from the related strings being | confusion and fraud that would result from the related strings being | |||
| registered to different parties. | registered to different parties. | |||
| [[anchor16: Placeholder: A discussion of the Arabic digit issue | ||||
| should go here once it is resolved in some appropriate way.]] | ||||
| 4.4. Case Mapping and Related Issues | 4.4. Case Mapping and Related Issues | |||
| In the DNS, ASCII letters are stored with their case preserved. | In the DNS, ASCII letters are stored with their case preserved. | |||
| Matching during the query process is case-independent, but none of | Matching during the query process is case-independent, but none of | |||
| the information that might be represented by choices of case has been | the information that might be represented by choices of case has been | |||
| lost. That model has been accidentally helpful because, as people | lost. That model has been accidentally helpful because, as people | |||
| have created DNS labels by catenating words (or parts of words) to | have created DNS labels by catenating words (or parts of words) to | |||
| form labels, case has often been used to distinguish among components | form labels, case has often been used to distinguish among components | |||
| and make the labels more memorable. | and make the labels more memorable. | |||
| skipping to change at page 19, line 22 ¶ | skipping to change at page 20, line 48 ¶ | |||
| nothing in these specifications fundamentally changes it or could do | nothing in these specifications fundamentally changes it or could do | |||
| so. In IDNA2003, all characters are case-folded and mapped by | so. In IDNA2003, all characters are case-folded and mapped by | |||
| clients in a standardized step. | clients in a standardized step. | |||
| Some characters do not have upper case forms. For example the | Some characters do not have upper case forms. For example the | |||
| Unicode case folding operation maps Greek Final Form Sigma (U+03C2) | Unicode case folding operation maps Greek Final Form Sigma (U+03C2) | |||
| to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) | to the medial form (U+03C3) and maps Eszett (German Sharp S, U+00DF) | |||
| to "ss". Neither of these mappings is reversible because the upper | to "ss". Neither of these mappings is reversible because the upper | |||
| case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII | case of U+03C3 is the Upper Case Sigma (U+03A3) and "ss" is an ASCII | |||
| string. IDNA2008 permits, at the risk of some incompatibility, | string. IDNA2008 permits, at the risk of some incompatibility, | |||
| slightly more flexibility in this area by avoid case folding and | slightly more flexibility in this area by avoiding case folding and | |||
| treating these characters as themselves. Approaches to handling one- | treating these characters as themselves. Approaches to handling one- | |||
| way mappings are discussed in Section 7.2. | way mappings are discussed in Section 7.2. | |||
| Because IDNA2003 maps Final Sigma and Eszett to other characters, and | Because IDNA2003 maps Final Sigma and Eszett to other characters, and | |||
| the reverse mapping is never possible, that in some sense means that | the reverse mapping is never possible, that in some sense means that | |||
| neither Final Sigma nor Eszett can be represented in an IDNA2003 IDN. | neither Final Sigma nor Eszett can be represented in an IDNA2003 IDN. | |||
| With IDNA2008, both characters can be used in an IDN and so the | With IDNA2008, both characters can be used in an IDN and so the | |||
| A-label used for lookup for any U-label containing those characters | A-label used for lookup for any U-label containing those characters | |||
| is now different. See Section 7.1 for a discussion of what kinds of | is now different. See Section 7.1 for a discussion of what kinds of | |||
| changes might require the IDNA prefix to change; this case is clearly | changes might require the IDNA prefix to change; after extended | |||
| worth discussing but the WG came to consensus not to make a prefix | discussions, the WG came to consensus that the change for these | |||
| change anyway. | characters did not justify a prefix change. | |||
| 4.5. Right to Left Text | 4.5. Right to Left Text | |||
| In order to be sure that the directionality of right to left text is | In order to be sure that the directionality of right to left text is | |||
| unambiguous, IDNA2003 required that any label in which right to left | unambiguous, IDNA2003 required that any label in which right to left | |||
| characters appear both starts and ends with them and that it not | characters appear both starts and ends with them and that it not | |||
| include any characters with strong left to right properties (that | include any characters with strong left to right properties (that | |||
| excludes other alphabetic characters but permits European digits). | excludes other alphabetic characters but permits European digits). | |||
| Any other string that contains a right to left character and does not | Any other string that contains a right to left character and does not | |||
| meet those requirements is rejected. This is one of the few places | meet those requirements is rejected. This is one of the few places | |||
| skipping to change at page 20, line 47 ¶ | skipping to change at page 22, line 26 ¶ | |||
| If a string cannot be successfully found in the DNS after the lookup | If a string cannot be successfully found in the DNS after the lookup | |||
| processing described here, it makes no difference whether it simply | processing described here, it makes no difference whether it simply | |||
| wasn't registered or was prohibited by some rule at the registry. | wasn't registered or was prohibited by some rule at the registry. | |||
| Application implementors should be aware that where DNS wildcards are | Application implementors should be aware that where DNS wildcards are | |||
| used, the ability to successfully resolve a name does not guarantee | used, the ability to successfully resolve a name does not guarantee | |||
| that it was actually registered. | that it was actually registered. | |||
| 6. Front-end and User Interface Processing for Lookup | 6. Front-end and User Interface Processing for Lookup | |||
| [[anchor18: Note in Draft: While this section has been revised in | ||||
| version -10 to improve clarity, a significant revision is expected | ||||
| once the discussions of mapping stabilize.]] | ||||
| Domain names may be identified and processed in many contexts. They | Domain names may be identified and processed in many contexts. They | |||
| may be typed in by users either by themselves or embedded in an | may be typed in by users either by themselves or embedded in an | |||
| identifier such as email addresses, URIs, or IRIs. They may occur in | identifier such as email addresses, URIs, or IRIs. They may occur in | |||
| running text or be processed by one system after being provided in | running text or be processed by one system after being provided in | |||
| another. Systems may try to normalize URLs to determine (or guess) | another. Systems may try to normalize URLs to determine (or guess) | |||
| whether a reference is valid or two references point to the same | whether a reference is valid or two references point to the same | |||
| object without actually looking the objects up (comparison without | object without actually looking the objects up (comparison without | |||
| lookup is necessary for URI types that are not intended to be | lookup is necessary for URI types that are not intended to be | |||
| resolved). Some of these goals may be more easily and reliably | resolved). Some of these goals may be more easily and reliably | |||
| satisfied than others. While there are strong arguments for any | satisfied than others. While there are strong arguments for any | |||
| domain name that is placed "on the wire" -- transmitted between | domain name that is placed "on the wire" -- transmitted between | |||
| systems -- to be in the zero-ambiguity forms of A-labels, it is | systems -- to be in the zero-ambiguity forms of A-labels, it is | |||
| inevitable that programs that process domain names will encounter | inevitable that programs that process domain names will encounter | |||
| U-labels or variant forms. | U-labels or variant forms. | |||
| This section discusses these mapping and transformation issues among | An application that implements the IDNA protocol [IDNA2008-Protocol] | |||
| names, contrasting IDNA2003 and IDNA2008 behavior. The discussion | will always take any user input and convert it to a set of Unicode | |||
| applies only in operations that look up names or interpret files. | code points. That user input may be acquired by any of several | |||
| There are several reasons why registration activities should require | different input methods, all with differing conversion processes to | |||
| final names and verification of those names by the would-be | be taken into consideration (e.g., typed on a keyboard, written by | |||
| registrant. | hand onto some sort of digitizer, spoken into a microphone and | |||
| interpreted by a speech-to-text engine, etc.). The process of taking | ||||
| One source of label forms that are neither A-labels nor U-labels will | any particular user input and mapping it into a Unicode code point | |||
| be labels created under IDNA2003. That protocol allowed labels that | may be a simple one: If a user strikes the "A" key on a US English | |||
| were transformed from native-character format by mapping some | keyboard, without any modifiers such as the "Shift" key held down, in | |||
| characters into others before conversion into A-label format. One | order to draw a Latin small letter A ("a"), many (perhaps most) | |||
| consequence of the transformations was that conversion from the | modern operating system input methods will produce to the calling | |||
| A-label format back to native characters often did not produce the | application the code point U+0061, encoded in a single octet. | |||
| original label. IDNA2008 explicitly defines A-labels and U-labels as | ||||
| different forms of the same abstract label, forms that are stable | ||||
| when conversions are performed between them (without mappings). | ||||
| A different way of explaining this is that there are, today, domain | ||||
| names in files on the Internet that use characters that cannot be | ||||
| represented directly in, or recovered from, (A-label) domain names | ||||
| but for which interpretations were provided by IDNA2003). There are | ||||
| two major categories of characters irreversibly remapped by | ||||
| Stringprep, those that are removed by NFKC normalization and those | ||||
| upper-case characters that are mapped to lower-case (there are also a | ||||
| few characters that are given special-case mapping treatment, | ||||
| including lower-case characters that are case-folded into other | ||||
| lower-case characters or strings and those that are simply | ||||
| eliminated). | ||||
| Other issues in domain name identification and processing arise | ||||
| because IDNA2003 specified that several other characters be treated | ||||
| as equivalent to the ASCII period (dot, full stop) character used as | ||||
| a label separator. If a string that might be a domain name appears | ||||
| in an arbitrary context (such as running text), it is difficult, even | ||||
| with only ASCII characters, to know whether an actual domain name (or | ||||
| a protocol parameter like a URI) is present and where it starts and | ||||
| ends. When using Unicode, this gets even more difficult if treatment | ||||
| of certain special characters (like the dot that separates labels in | ||||
| a domain name) depends on context (e.g., prior knowledge of whether | ||||
| the string represents a domain name or not). That knowledge is not | ||||
| available if the primary heuristic for identifying the presence of | ||||
| domain names in strings depends on the presence of dots separating | ||||
| groups of characters with no intervening spaces. | ||||
| [[anchor19: Placeholder: In serial efforts to move the mapping model | Sometimes the process is somewhat more complicated: a user might | |||
| out of the protocol and leave it unspecified here, this paragraph has | strike a particular set of keys to represent a combining macron | |||
| become a complete botch. Rewrite when the mapping plan stabilizes.]] | followed by striking the "A" key in order to draw a Latin small | |||
| The IDNA2008 model removes all of these mappings and interpretations, | letter A with a macron above it. Depending on the operating system, | |||
| including the equivalence of different forms of dots, from the | the input method chosen by the user, and even the parameters with | |||
| protocol, discouraging such mappings and leaving them, when | which the application communicates with the input method, the result | |||
| necessary, to local processing. This should not be taken to imply | might be the code point U+0101 (encoded as two octets in UTF-8 or | |||
| that local processing is optional or can be avoided entirely, even if | UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed | |||
| doing so might have been desirable in a world without IDNA2003 IDNs | by the code point U+0304 (again, encoded in three or more octets, | |||
| in files and archives. Instead, unless the program context is such | depending upon the encoding used) or even the code point U+FF41 | |||
| that it is known that any IDNs that appear will contain either | followed by the code point U+0304 (and encoded in some form). And | |||
| U-label or A-label forms, or that other forms can safely be rejected, | these examples leave aside the issue of operating systems and input | |||
| some local processing of apparent domain name strings will be | methods that do not use Unicode code points for their character set. | |||
| required, both to maintain compatibility with IDNA2003 and to prevent | ||||
| user astonishment. Such local processing, while not specified in | ||||
| this document or the associated ones, will generally take one of two | ||||
| forms: | ||||
| o Generic Preprocessing. | In every case, applications (with the help of the operating systems | |||
| When the context in which the program or system that processes | on which they run and the input methods used) need to perform a | |||
| domain names operates is global, a reasonable balance must be | mapping from user input into Unicode code points. | |||
| found that is sensitive to the broad range of local needs and | ||||
| assumptions while, at the same time, not sacrificing the needs of | ||||
| one language, script, or user population to those of another. | ||||
| For this case, the best practice will usually be to apply NFKC and | The original version of the IDNA protocol [RFC3490] used a model | |||
| case-mapping (or, perhaps better yet, Stringprep itself), plus | whereby input was taken from the user, mapped (via whatever input | |||
| dot-mapping where appropriate, to the domain name string prior to | method mechanisms were used) to a set of Unicode code points, and | |||
| applying IDNA. That practice will not only yield a reasonable | then further mapped to a set of Unicode code points using the | |||
| compromise of user experience with protocol requirements but will | Nameprep profile specified in [RFC3491]. In this procedure, there | |||
| be almost completely compatible with the various forms permitted | are two separate mapping steps: First, a mapping done by the input | |||
| by IDNA2003. | method (which might be controlled by the operating system, the | |||
| application, or some combination) and then a second mapping performed | ||||
| by the Nameprep portion of the IDNA protocol. The mapping done in | ||||
| Nameprep includes a particular mapping table to re-map some | ||||
| characters to other characters, a particular normalization, and a set | ||||
| of prohibited characters. | ||||
| o Highly Localized Preprocessing. | Note that the result of the two-step mapping process means that the | |||
| Unlike the case above, there will be some situations in which | mapping chosen by the operating system or application in the first | |||
| software will be highly localized for a particular environment and | step might differ significantly from the mapping supplied by the | |||
| carefully adapted to the expectations of users in that | Nameprep profile in the second step. This has advantages and | |||
| environment. The many discussions about using the Internet to | disadvantages. Of course, the second mapping regularizes what gets | |||
| preserve and support local cultures suggest that these cases may | looked up in the DNS, making for better interoperability between | |||
| be more common in the future than they have been so far. | implementations which use the Nameprep mapping. However, the | |||
| application or operating system may choose mappings in their input | ||||
| methods, which when passed through the second (Nameprep) mapping | ||||
| result in characters that are "surprising" to the end user. | ||||
| In these cases, we should avoid trying to tell implementers what | The other important feature of the original version of the IDNA | |||
| they should accept, if only because they are quite likely (and for | protocol is that, with very few exceptions, it assumes that any set | |||
| good reason) to ignore us. We would assume that they would map | of Unicode code points provided to the Nameprep mapping can be mapped | |||
| characters that the intuitions of their users would suggest be | into a string of Unicode code points that are "sensible", even if | |||
| mapped and would hope that they would do that mapping as early as | that means mapping some code points to nothing (that is, removing the | |||
| possible, storing A-label or U-label forms in files and | code points from the string). This allowed maximum flexibility in | |||
| transporting only those forms between systems. One can imagine | input strings. | |||
| switches about whether some sorts of mappings occur, warnings | ||||
| before applying them or, in a slightly more extreme version of the | ||||
| approach taken in Internet Explorer version 7 (IE7), systems that | ||||
| utterly refuse to handle "strange" characters at all if they | ||||
| appear in U-label form. None of those local decisions are a | ||||
| threat to interoperability as long as (i) only U-labels and | ||||
| A-labels are used in interchange with systems outside the local | ||||
| environment, (ii) no character that would be valid in a U-label as | ||||
| itself is mapped to something else, (iii) any local mappings are | ||||
| applied as a preprocessing step (or, for conversions from U-labels | ||||
| or A-labels to presentation forms, postprocessing), not as part of | ||||
| IDNA processing proper, and (iv) appropriate consideration is | ||||
| given to labels that might have entered the environment in | ||||
| conformance to IDNA2003. | ||||
| In either case, it is vital that user interface designs and, where | The present version of IDNA differs significantly in approach from | |||
| the interfaces are not sufficient, users, be aware that the only | the original version. First and foremost, it does not provide | |||
| forms of domain names that this protocol anticipates will resolve | explicit mapping instructions. Instead, it assumes that the | |||
| globally or compare equal when crude methods (i.e., those not | application (perhaps via an operating system input method) will do | |||
| conforming to the strict definition of label equivalence given in | whatever mapping it requires to convert input into Unicode code | |||
| [IDNA2008-Defs]) are used are those in which all native-script labels | points. This has the advantage of giving flexibility to the | |||
| are in U-label form. Forms that assume mapping will occur, | application to choose a mapping that is suitable for its user given | |||
| especially forms that were not valid under IDNA2003, may or may not | specific user requirements, and avoids the two-step mapping of the | |||
| function in predictable ways across all implementations. | original protocol. Instead of a mapping, the current version of IDNA | |||
| provides a set of categories that can be used to specify the valid | ||||
| code points allowed in a domain name. | ||||
| User interfaces involving Latin-based scripts should take special | In principle, an application ought to take user input of a domain | |||
| care when considering how to handle case mapping because small | name and convert it to the set of Unicode code points that represent | |||
| differences in label strings may cause behavior that is astonishing | the domain name the user intends. As a practical matter, of course, | |||
| to users. Because case-insensitive comparison is done for ASCII | determining user intent is a tricky business, so an application needs | |||
| strings by DNS-servers, an all-ASCII label is treated as case- | to choose a reasonable mapping from user input. That may differ | |||
| insensitive. However, if even one of the characters of that string | based on the particular circumstances of a user, depending on locale, | |||
| is replaced by one that requires the label to be given IDN treatment | language, type of input method, etc. It is up to the application to | |||
| (e.g., by adding a diacritical mark), then the label effectively | make a reasonable choice. | |||
| becomes case-sensitive because only lower-case characters are | ||||
| permitted in IDNs. Preprocessing in applications to handle case | ||||
| mapping for Latin-based scripts (and possibly other scripts with case | ||||
| distinctions) may be wise to prevent user astonishment. However, not | ||||
| all applications may do this and ambiguity in transport is not | ||||
| desirable. Consequently the case-dependent forms should not be | ||||
| stored in files. | ||||
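As an illustration only, the three code point sequences mentioned above for a Latin small letter A with macron can be compared before and after a generic preprocessing step of NFKC plus lower-casing. This is a minimal sketch: the function name preprocess_label is hypothetical, and the step approximates, rather than implements, Nameprep or any mapping specified in the IDNA documents.

```python
import unicodedata


def preprocess_label(label: str) -> str:
    """Hypothetical local preprocessing: NFKC normalization, then lower-casing.

    This is an assumption made for illustration, not the Nameprep profile.
    """
    return unicodedata.normalize("NFKC", label).lower()


# Three possible results of the same keystrokes, depending on the input method.
precomposed = "\u0101"        # LATIN SMALL LETTER A WITH MACRON
decomposed = "a\u0304"        # U+0061 followed by COMBINING MACRON
fullwidth = "\uff41\u0304"    # FULLWIDTH LATIN SMALL LETTER A + COMBINING MACRON

for candidate in (precomposed, decomposed, fullwidth):
    mapped = preprocess_label(candidate)
    # Show the code points before and after the local mapping.
    before = " ".join(f"U+{ord(ch):04X}" for ch in candidate)
    after = " ".join(f"U+{ord(ch):04X}" for ch in mapped)
    print(f"{before} -> {after}")
```

Under this sketch all three inputs map to the single code point U+0101, which is why such a step, when used, belongs in local preprocessing before IDNA is invoked rather than in the protocol itself.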
| 7. Migration from IDNA2003 and Unicode Version Synchronization | 7. Migration from IDNA2003 and Unicode Version Synchronization | |||
| 7.1. Design Criteria | 7.1. Design Criteria | |||
| As mentioned above and in RFC 4690, two key goals of the IDNA2008 | As mentioned above and in RFC 4690, two key goals of the IDNA2008 | |||
| design are | design are | |||
| o to enable applications to be agnostic about whether they are being | o to enable applications to be agnostic about whether they are being | |||
| run in environments supporting any Unicode version from 3.2 | run in environments supporting any Unicode version from 3.2 | |||
| skipping to change at page 26, line 38 ¶ | skipping to change at page 27, line 16 ¶ | |||
| whole-label rules. In particular, it must verify that | whole-label rules. In particular, it must verify that | |||
| * there are no leading combining marks, | * there are no leading combining marks, | |||
| * the "bidi" conditions are met if right to left characters | * the "bidi" conditions are met if right to left characters | |||
| appear, | appear, | |||
| * any required contextual rules are available, and | * any required contextual rules are available, and | |||
| * any contextual rules that are associated with Joiner Controls | * any contextual rules that are associated with Joiner Controls | |||
| are tested. | (and "CONTEXTJ" characters more generally) are tested. | |||
| o Do not reject labels based on other contextual rules about | o Do not reject labels based on other contextual rules about | |||
| characters, including mixed-script label prohibitions. Such rules | characters, including mixed-script label prohibitions. Such rules | |||
| may be used to influence presentation decisions in the user | may be used to influence presentation decisions in the user | |||
| interface, but not to avoid looking up domain names. | interface, but not to avoid looking up domain names. | |||
| Lookup applications that follow these rules, rather than having | Lookup applications that follow these rules, rather than having | |||
| their own criteria for rejecting lookup attempts, are not sensitive | their own criteria for rejecting lookup attempts, are not sensitive | |||
| to version incompatibilities with the particular zone registry | to version incompatibilities with the particular zone registry | |||
| associated with the domain name except for labels containing | associated with the domain name except for labels containing | |||
| characters recently added to Unicode. | characters recently added to Unicode. | |||
| An application or client that processes names according to this | An application or client that processes names according to this | |||
| protocol and then resolves them in the DNS will be able to locate any | protocol and then resolves them in the DNS will be able to locate any | |||
| name that is registered, as long as those registrations are IDNA- | name that is registered, as long as those registrations are IDNA- | |||
| value and its version of the IDNA tables is sufficiently up-to-date | valid and its version of the IDNA tables is sufficiently up-to-date | |||
| to interpret all of the characters in the label. Messages to users | to interpret all of the characters in the label. Messages to users | |||
| should distinguish between "label contains an unallocated code point" | should distinguish between "label contains an unallocated code point" | |||
| and other types of lookup failures. A failure on the basis of an old | and other types of lookup failures. A failure on the basis of an old | |||
| version of Unicode may lead the user to a desire to upgrade to a | version of Unicode may lead the user to a desire to upgrade to a | |||
| newer version, but will have no other ill effects (this is consistent | newer version, but will have no other ill effects (this is consistent | |||
| with behavior in the transition to the DNS when some hosts could not | with behavior in the transition to the DNS when some hosts could not | |||
| yet handle some forms of names or record types). | yet handle some forms of names or record types). | |||
| 7.2. Changes in Character Interpretations | 7.2. Changes in Character Interpretations | |||
| [[anchor22: This subsection will need to be rewritten when the | ||||
| mapping decisions stabilize.]] | ||||
| In those scripts that make case distinctions, there are a few | In those scripts that make case distinctions, there are a few | |||
| characters for which an obvious and unique upper case character has | characters for which an obvious and unique upper case character has | |||
| not historically been available to match a lower case one or vice | not historically been available to match a lower case one or vice | |||
| versa. For those characters, the mappings used in constructing the | versa. For those characters, the mappings used in constructing the | |||
| Stringprep tables for IDNA2003, performed using the Unicode CaseFold | Stringprep tables for IDNA2003, performed using the Unicode CaseFold | |||
| operation (See Section 5.8 of the Unicode Standard [Unicode51]), | operation (See Section 5.8 of the Unicode Standard [Unicode51]), | |||
| generate different characters or sets of characters. Those | generate different characters or sets of characters. Those | |||
| operations are not reversible and lose even more information than | operations are not reversible and lose even more information than | |||
| traditional upper case or lower case transformations, but are more | traditional upper case or lower case transformations, but are more | |||
| useful than those transformations for comparison purposes. Two | useful than those transformations for comparison purposes. Two | |||
| notable characters of this type are the German character Eszett | notable characters of this type are the German character Eszett | |||
| (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The | (Sharp S, U+00DF) and the Greek Final Form Sigma (U+03C2). The | |||
| former is case-folded to the ASCII string "ss", the latter to a | former is case-folded to the ASCII string "ss", the latter to a | |||
| medial (Lower Case) Sigma (U+03C3). | medial (Lower Case) Sigma (U+03C3). | |||
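The information loss described above can be seen with a short sketch that uses Python's built-in str.casefold(), which applies Unicode full case folding. It is used here as a stand-in for the CaseFold operation and is not the Stringprep table processing itself; the helper name fold is hypothetical.

```python
def fold(text: str) -> str:
    """Apply Unicode full case folding (a stand-in for the CaseFold step)."""
    return text.casefold()


eszett = "\u00df"       # LATIN SMALL LETTER SHARP S
final_sigma = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
sigma = "\u03c3"        # GREEK SMALL LETTER SIGMA

# Folding is not reversible: distinct inputs collapse to the same output,
# so the original character cannot be recovered from the folded form.
print(fold(eszett) == fold("ss"))        # Eszett and "ss" become identical
print(fold(final_sigma) == fold(sigma))  # final and medial sigma become identical
```

Once folded, a label containing Eszett is indistinguishable from one spelled with "ss", which is the incompatibility that arises when the two are instead treated as distinct characters.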
| The decision to eliminate mappings, including case folding, from the | The decision to eliminate mandatory and standardized mappings, | |||
| IDNA2008 protocol in order to make A-labels and U-labels idempotent | including case folding, from the IDNA2008 protocol in order to make | |||
| made these characters problematic. If they were to be disallowed, | A-labels and U-labels idempotent made these characters problematic. | |||
| important words and mnemonics could not be written in | If they were to be disallowed, important words and mnemonics could | |||
| orthographically reasonable ways. If they were to be permitted as | not be written in orthographically reasonable ways. If they were to | |||
| distinct characters, there would be no information loss and | be permitted as distinct characters, there would be no information | |||
| registries would have more flexibility, but IDNA2003 and IDNA2008 | loss and registries would have more flexibility, but IDNA2003 and | |||
| lookups might result in different A-labels. | IDNA2008 lookups might result in different A-labels. | |||
| With the understanding that there would be incompatibility either way | With the understanding that there would be incompatibility either way | |||
| but a judgment that the incompatibility was not significant enough to | but a judgment that the incompatibility was not significant enough to | |||
| justify a prefix change, the WG concluded that Eszett and Final Form | justify a prefix change, the WG concluded that Eszett and Final Form | |||
| Sigma should be treated as distinct and Protocol-Valid characters. | Sigma should be treated as distinct and Protocol-Valid characters. | |||
| Registries, especially those maintaining zones for third parties, | Registries, especially those maintaining zones for third parties, | |||
| must decide how to introduce a new service in a way that does not | must decide how to introduce a new service in a way that does not | |||
| create confusion or significantly weaken or invalidate existing | create confusion or significantly weaken or invalidate existing | |||
| identifiers. This is not a new problem; registries were faced with | identifiers. This is not a new problem; registries were faced with | |||
| skipping to change at page 28, line 30 ¶ | skipping to change at page 29, line 5 ¶ | |||
| corresponding string containing Eszett or Final Sigma | corresponding string containing Eszett or Final Sigma | |||
| respectively. | respectively. | |||
| o Adopt some sort of "variant" approach in which registrants obtain | o Adopt some sort of "variant" approach in which registrants obtain | |||
| labels with both character forms. | labels with both character forms. | |||
| o Adopt a different form of "variant" approach in which registration | o Adopt a different form of "variant" approach in which registration | |||
| of additional names is either not permitted at all or permitted | of additional names is either not permitted at all or permitted | |||
| only by the registrant who already has one of the names. | only by the registrant who already has one of the names. | |||
| 7.3. More Flexibility in User Agents | 7.3. Character Mapping | |||
| [[anchor23: Note in Draft: This section is mapping-related and may | ||||
| need to be revised after that issue settles down.]] Also, it is | ||||
| closely related to Section 4.2 and may need to be cross-referenced | ||||
| from it or consolidated into it. | ||||
| These documents do not specify mappings between one character or code | ||||
| point and others. Instead, IDNA2008 prohibits characters that would | ||||
| be mapped to others by normalization, upper case to lower case | ||||
| changes, or other rules. As examples, while mathematical characters | ||||
| based on Latin ones are accepted as input to IDNA2003, they are | ||||
| prohibited in IDNA2008. Similarly, double-width characters and other | ||||
| variations are prohibited as IDNA input. | ||||
| Since the rules in [IDNA2008-Tables] have the effect that only | ||||
| strings that are not transformed by NFKC are valid, if an application | ||||
| chooses to perform NFKC normalization before lookup, that operation | ||||
| is safe since this will never make the application unable to look up | ||||
| any valid string. However, as discussed above, the application | ||||
| cannot guarantee that any other application will perform that | ||||
| mapping, so it should be used only with caution and for informed | ||||
| users. | ||||
| In many cases these prohibitions should have no effect on what the | ||||
| user can type as input to the lookup process. It is perfectly | ||||
| reasonable for systems that support user interfaces to perform some | ||||
| character mapping that is appropriate to the local environment. This | ||||
| would normally be done prior to actual invocation of IDNA. At least | ||||
| conceptually, the mapping would be part of the Unicode conversions | ||||
| discussed above and in [IDNA2008-Protocol]. However, those changes | ||||
| will be local ones only -- local to environments in which users will | ||||
| clearly understand that the character forms are equivalent. For use | ||||
| in interchange among systems, it appears to be much more important | ||||
| that U-labels and A-labels can be mapped back and forth without loss | ||||
| of information. | ||||
| One specific, and very important, instance of this strategy arises | ||||
| with case-folding. In the ASCII-only DNS, names are looked up and | ||||
| matched in a case-independent way, but no actual case-folding occurs. | ||||
| Names can be placed in the DNS in either upper or lower case form (or | ||||
| any mixture of them) and that form is preserved, returned in queries, | ||||
| and so on. IDNA2003 approximated that behavior for non-ASCII strings | ||||
| by performing case-folding at registration time (resulting in only | ||||
| lower-case IDNs in the DNS) and when names were looked up. | ||||
| As suggested earlier in this section, it appears to be desirable to | As discussed at length in Section 6, IDNA2003, via Nameprep (see | |||
| do as little character mapping as possible as long as Unicode works | Section 7.5), mapped many characters into related ones. Those | |||
| correctly (e.g., NFC mapping to resolve different codings for the | mappings no longer exist as requirements in IDNA2008. These | |||
| same character is still necessary although the specifications require | specifications strongly prefer that only A-labels or U-labels be used | |||
| that it be performed prior to invoking the protocol) in order to make | in protocol contexts and as much as practical more generally. | |||
| the mapping between A-labels and U-labels idempotent. Case-mapping | IDNA2008 does anticipate situations in which some mapping at the time | |||
| is not an exception to this principle. If only lower case characters | of user input into lookup applications is appropriate and desirable. | |||
| can be registered in the DNS (i.e., be present in a U-label), then | The issues are discussed in Section 6 and specific recommendations | |||
| IDNA2008 should prohibit upper-case characters as input. Some other | are made in [IDNA2008-Mapping]. | |||
| considerations reinforce this conclusion. For example, in ASCII | ||||
| case-mapping for individual characters, uppercase(character) must be | ||||
| equal to uppercase(lowercase(character)). That may not be true with | ||||
| IDNs. In some scripts that use case distinctions, there are a few | ||||
| characters that do not have counterparts in one case or the other. | ||||
| The relationship between upper case and lower case may even be | ||||
| language-dependent, with different languages (or even the same | ||||
| language in different areas) expecting different mappings. User | ||||
| agents can meet the expectations of users who are accustomed to the | ||||
| case-insensitive DNS environment by performing case folding prior to | ||||
| IDNA processing, but the IDNA procedures themselves should neither | ||||
| require such mapping nor expect them when they are not natural to the | ||||
| localized environment. | ||||
| 7.4. The Question of Prefix Changes | 7.4. The Question of Prefix Changes | |||
| The conditions that would require a change in the IDNA ACE prefix | The conditions that would require a change in the IDNA ACE prefix | |||
| ("xn--" for the version of IDNA specified in [RFC3490]) have been a | ("xn--" for the version of IDNA specified in [RFC3490]) have been a | |||
| great concern to the community. A prefix change would clearly be | great concern to the community. A prefix change would clearly be | |||
| necessary if the algorithms were modified in a manner that would | necessary if the algorithms were modified in a manner that would | |||
| create serious ambiguities during subsequent transition in | create serious ambiguities during subsequent transition in | |||
| registrations. This section summarizes our conclusions about the | registrations. This section summarizes our conclusions about the | |||
| conditions under which changes in prefix would be necessary and the | conditions under which changes in prefix would be necessary and the | |||
| skipping to change at page 33, line 8 ¶ | skipping to change at page 32, line 18 ¶ | |||
| such as outline, solid, and shaded forms may or may not exist; | such as outline, solid, and shaded forms may or may not exist; | |||
| and so on. As just one example, consider a "heart" symbol as it | and so on. As just one example, consider a "heart" symbol as it | |||
| might appear in a logo that might be read as "I love...". While | might appear in a logo that might be read as "I love...". While | |||
| the user might read such a logo as "I love..." or "I heart...", | the user might read such a logo as "I love..." or "I heart...", | |||
| considerable knowledge of the coding distinctions made in Unicode | considerable knowledge of the coding distinctions made in Unicode | |||
| is needed to know that there is more than one "heart" character | is needed to know that there is more than one "heart" character | |||
| (e.g., U+2665, U+2661, and U+2765) and how to describe it. These | (e.g., U+2665, U+2661, and U+2765) and how to describe it. These | |||
| issues are of particular importance if strings are expected to be | issues are of particular importance if strings are expected to be | |||
| understood or transcribed by the listener after being read out | understood or transcribed by the listener after being read out | |||
| loud. | loud. | |||
| [[anchor24: The above paragraph remains controversial as to | ||||
| whether it is valid. The WG will need to make a decision if this | ||||
| section is not dropped entirely.]] | ||||
| 3. Consider the case of a screen reader used by blind Internet users | 3. Design of a screen reader used by blind Internet users who must | |||
| who must listen to renderings of IDN domain names and possibly | listen to renderings of IDN domain names and possibly reproduce | |||
| reproduce them on the keyboard. | them on the keyboard becomes considerably more complicated when | |||
| the names of characters are not obvious and intuitive to anyone | ||||
| familiar with the language in question. | ||||
| 4. As a simplified example of this, assume one wanted to use a | 4. As a simplified example of this, assume one wanted to use a | |||
| "heart" or "star" symbol in a label. This is problematic because | "heart" or "star" symbol in a label. This is problematic because | |||
| those names are ambiguous in the Unicode system of naming (the | those names are ambiguous in the Unicode system of naming (the | |||
| actual Unicode names require far more qualification). A user or | actual Unicode names require far more qualification). A user or | |||
| would-be registrant has no way to know -- absent careful study of | would-be registrant has no way to know -- absent careful study of | |||
| the code tables -- whether it is ambiguous (e.g., where there are | the code tables -- whether it is ambiguous (e.g., where there are | |||
| multiple "heart" characters) or not. Conversely, the user seeing | multiple "heart" characters) or not. Conversely, the user seeing | |||
| the hypothetical label doesn't know whether to read it -- try to | the hypothetical label doesn't know whether to read it -- try to | |||
| transmit it to a colleague by voice -- as "heart", as "love", as | transmit it to a colleague by voice -- as "heart", as "love", as | |||
| skipping to change at page 34, line 24 ¶ | skipping to change at page 33, line 32 ¶ | |||
| o Tests involving the context of characters (e.g., some characters | o Tests involving the context of characters (e.g., some characters | |||
| being permitted only adjacent to others of specific types) and | being permitted only adjacent to others of specific types) and | |||
| integrity tests on complete labels are needed. Unassigned code | integrity tests on complete labels are needed. Unassigned code | |||
| points cannot be permitted because one cannot determine whether | points cannot be permitted because one cannot determine whether | |||
| particular code points will require contextual rules (and what | particular code points will require contextual rules (and what | |||
| those rules should be) before characters are assigned to them and | those rules should be) before characters are assigned to them and | |||
| the properties of those characters fully understood. | the properties of those characters fully understood. | |||
| o It cannot be known in advance, and with sufficient reliability, | o It cannot be known in advance, and with sufficient reliability, | |||
| that a no newly-assigned code point will associated with a | whether a newly-assigned code point will be associated with a | |||
| character that would be disallowed by the rules in | character that would be disallowed by the rules in | |||
| [IDNA2008-Tables] (such as a compatibility character). In | [IDNA2008-Tables] (such as a compatibility character). In | |||
| IDNA2003, since there is no direct dependency on NFKC (many of the | IDNA2003, since there is no direct dependency on NFKC (many of the | |||
| entries in Stringprep's tables are based on NFKC, but IDNA2003 | entries in Stringprep's tables are based on NFKC, but IDNA2003 | |||
| depends only on Stringprep), allocation of a compatibility | depends only on Stringprep), allocation of a compatibility | |||
| character might produce some odd situations, but it would not be a | character might produce some odd situations, but it would not be a | |||
| problem. In IDNA2008, where compatibility characters are | problem. In IDNA2008, where compatibility characters are | |||
| DISALLOWED unless character-specific exceptions are made, | DISALLOWED unless character-specific exceptions are made, | |||
| permitting strings containing unassigned characters to be looked | permitting strings containing unassigned characters to be looked | |||
| up would violate the principle that characters in DISALLOWED are | up would violate the principle that characters in DISALLOWED are | |||
| skipping to change at page 39, line 14 ¶ | skipping to change at page 38, line 15 ¶ | |||
| 12. Acknowledgments | 12. Acknowledgments | |||
| The editor and contributors would like to express their thanks to | The editor and contributors would like to express their thanks to | |||
| those who contributed significant early (pre-WG) review comments, | those who contributed significant early (pre-WG) review comments, | |||
| sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | sometimes accompanied by text, especially Mark Davis, Paul Hoffman, | |||
| Simon Josefsson, and Sam Weiler. In addition, some specific ideas | Simon Josefsson, and Sam Weiler. In addition, some specific ideas | |||
| were incorporated from suggestions, text, or comments about sections | were incorporated from suggestions, text, or comments about sections | |||
| that were unclear supplied by Vint Cerf, Frank Ellerman, Michael | that were unclear supplied by Vint Cerf, Frank Ellerman, Michael | |||
| Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken | Everson, Asmus Freytag, Erik van der Poel, Michel Suignard, and Ken | |||
| Whistler, although, as usual, they bear little or no responsibility | Whistler. Thanks are also due to Vint Cerf, Lisa Dusseault, Debbie | |||
| for the conclusions the editor and contributors reached after | Garside, and Jefsey Morfin for conversations that led to considerable | |||
| receiving their suggestions. Thanks are also due to Vint Cerf, Lisa | improvements in the content of this document. | |||
| Dusseault, Debbie Garside, and Jefsey Morfin for conversations that | ||||
| led to considerable improvements in the content of this document. | ||||
| A meeting was held on 30 January 2008 to attempt to reconcile | A meeting was held on 30 January 2008 to attempt to reconcile | |||
| differences in perspective and terminology about this set of | differences in perspective and terminology about this set of | |||
| specifications between the design team and members of the Unicode | specifications between the design team and members of the Unicode | |||
| Technical Consortium. The discussions at and subsequent to that | Technical Consortium. The discussions at and subsequent to that | |||
| meeting were very helpful in focusing the issues and in refining the | meeting were very helpful in focusing the issues and in refining the | |||
| specifications. The active participants at that meeting were (in | specifications. The active participants at that meeting were (in | |||
| alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | alphabetic order as usual) Harald Alvestrand, Vint Cerf, Tina Dam, | |||
| Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | Mark Davis, Lisa Dusseault, Patrik Faltstrom (by telephone), Cary | |||
| Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | Karp, John Klensin, Warren Kumari, Lisa Moore, Erik van der Poel, | |||
| skipping to change at page 39, line 42 ¶ | skipping to change at page 38, line 41 ¶ | |||
| Useful comments and text on the WG versions of the draft were | Useful comments and text on the WG versions of the draft were | |||
| received from many participants in the IETF "IDNABIS" WG and a number | received from many participants in the IETF "IDNABIS" WG and a number | |||
| of document changes resulted from mailing list discussions made by | of document changes resulted from mailing list discussions made by | |||
| that group. Marcos Sanz provided specific analysis and suggestions | that group. Marcos Sanz provided specific analysis and suggestions | |||
| that were exceptionally helpful in refining the text, as did Vint | that were exceptionally helpful in refining the text, as did Vint | |||
| Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler. | Cerf, Mark Davis, Martin Duerst, Andrew Sullivan, and Ken Whistler. | |||
| Lisa Dusseault provided extensive editorial suggestions during the | Lisa Dusseault provided extensive editorial suggestions during the | |||
| spring of 2009, most of which were incorporated. | spring of 2009, most of which were incorporated. | |||
| As is usual with IETF specifications, while the document represents | ||||
| rough consensus, it should not be assumed that all participants and | ||||
| contributors agree with all provisions. | ||||
| 13. Contributors | 13. Contributors | |||
| While the listed editor held the pen, the core of this document and | While the listed editor held the pen, the core of this document and | |||
| the initial WG version represents the joint work and conclusions of | the initial WG version represents the joint work and conclusions of | |||
| an ad hoc design team consisting of the editor and, in alphabetic | an ad hoc design team consisting of the editor and, in alphabetic | |||
| order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | order, Harald Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. | |||
| In addition, there were many specific contributions and helpful | Considerable material describing mapping principles has been | |||
| comments from those listed in the Acknowledgments section and others | incorporated from a draft of [IDNA2008-Mapping] by Pete Resnick and | |||
| who have contributed to the development and use of the IDNA | Paul Hoffman. In addition, there were many specific contributions | |||
| protocols. | and helpful comments from those listed in the Acknowledgments section | |||
| and others who have contributed to the development and use of the | ||||
| IDNA protocols. | ||||
| 14. References | 14. References | |||
| 14.1. Normative References | 14.1. Normative References | |||
| [ASCII] American National Standards Institute (formerly United | [ASCII] American National Standards Institute (formerly United | |||
| States of America Standards Institute), "USA Code for | States of America Standards Institute), "USA Code for | |||
| Information Interchange", ANSI X3.4-1968, 1968. | Information Interchange", ANSI X3.4-1968, 1968. | |||
| ANSI X3.4-1968 has been replaced by newer versions with | ANSI X3.4-1968 has been replaced by newer versions with | |||
| slight modifications, but the 1968 version remains | slight modifications, but the 1968 version remains | |||
| definitive for the Internet. | definitive for the Internet. | |||
| [IDNA2008-Bidi] | [IDNA2008-Bidi] | |||
| Alvestrand, H. and C. Karp, "An updated IDNA criterion for | Alvestrand, H. and C. Karp, "An updated IDNA criterion for | |||
| right to left scripts", July 2008, <https:// | right to left scripts", August 2009, <https:// | |||
| datatracker.ietf.org/drafts/draft-ietf-idnabis-bidi/>. | datatracker.ietf.org/drafts/draft-ietf-idnabis-bidi/>. | |||
| [IDNA2008-Defs] | [IDNA2008-Defs] | |||
| Klensin, J., "Internationalized Domain Names for | Klensin, J., "Internationalized Domain Names for | |||
| Applications (IDNA): Definitions and Document Framework", | Applications (IDNA): Definitions and Document Framework", | |||
| November 2008, <https://datatracker.ietf.org/drafts/ | August 2009, <https://datatracker.ietf.org/drafts/ | |||
| draft-ietf-idnabis-defs/>. | draft-ietf-idnabis-defs/>. | |||
| [IDNA2008-Protocol] | [IDNA2008-Protocol] | |||
| Klensin, J., "Internationalized Domain Names in | Klensin, J., "Internationalized Domain Names in | |||
| Applications (IDNA): Protocol", November 2008, <https:// | Applications (IDNA): Protocol", August 2009, <https:// | |||
| datatracker.ietf.org/drafts/draft-ietf-idnabis-protocol/>. | datatracker.ietf.org/drafts/draft-ietf-idnabis-protocol/>. | |||
| [IDNA2008-Tables] | [IDNA2008-Tables] | |||
| Faltstrom, P., "The Unicode Code Points and IDNA", | Faltstrom, P., "The Unicode Code Points and IDNA", | |||
| July 2008, <https://datatracker.ietf.org/drafts/ | August 2009, <https://datatracker.ietf.org/drafts/ | |||
| draft-ietf-idnabis-tables/>. | draft-ietf-idnabis-tables/>. | |||
| A version of this document is available in HTML format at | A version of this document is available in HTML format at | |||
| http://stupid.domain.name/idnabis/ | http://stupid.domain.name/idnabis/ | |||
| draft-ietf-idnabis-tables-02.html | draft-ietf-idnabis-tables-06.html | |||
| [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, March 2003. | RFC 3490, March 2003. | |||
| [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode | [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode | |||
| for Internationalized Domain Names in Applications | for Internationalized Domain Names in Applications | |||
| (IDNA)", RFC 3492, March 2003. | (IDNA)", RFC 3492, March 2003. | |||
| [Unicode-UAX15] | [Unicode-UAX15] | |||
| skipping to change at page 41, line 31 ¶ | skipping to change at page 40, line 40 ¶ | |||
| There are several forms and variations and a closely- | There are several forms and variations and a closely- | |||
| related standard, CNS 11643. See the discussion in | related standard, CNS 11643. See the discussion in | |||
| Chapter 3 of Lunde, K., CJKV Information Processing, | Chapter 3 of Lunde, K., CJKV Information Processing, | |||
| O'Reilly & Associates, 1999 | O'Reilly & Associates, 1999 | |||
| [GB18030] "Chinese National Standard GB 18030-2000: Information | [GB18030] "Chinese National Standard GB 18030-2000: Information | |||
| Technology -- Chinese ideograms coded character set for | Technology -- Chinese ideograms coded character set for | |||
| information interchange -- Extension for the basic set.", | information interchange -- Extension for the basic set.", | |||
| 2000. | 2000. | |||
| [IDNA2008-Mapping] | ||||
| Resnick, P., "Mapping Characters in IDNA", August 2009, | ||||
| <https://datatracker.ietf.org/drafts/ | ||||
| draft-ietf-idnabis-mapping/>. | ||||
| [RFC0810] Feinler, E., Harrenstien, K., Su, Z., and V. White, "DoD | [RFC0810] Feinler, E., Harrenstien, K., Su, Z., and V. White, "DoD | |||
| Internet host table specification", RFC 810, March 1982. | Internet host table specification", RFC 810, March 1982. | |||
| [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet | [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet | |||
| host table specification", RFC 952, October 1985. | host table specification", RFC 952, October 1985. | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, November 1987. | STD 13, RFC 1034, November 1987. | |||
| [RFC1035] Mockapetris, P., "Domain names - implementation and | [RFC1035] Mockapetris, P., "Domain names - implementation and | |||
| skipping to change at page 46, line 9 ¶ | skipping to change at page 45, line 18 ¶ | |||
| o Clarified relationship to base DNS specifications. | o Clarified relationship to base DNS specifications. | |||
| o Consolidated discussion of lookup of unassigned characters. | o Consolidated discussion of lookup of unassigned characters. | |||
| o More editorial fine-tuning. | o More editorial fine-tuning. | |||
| A.7. Version -07 | A.7. Version -07 | |||
| o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label | o Revised terminology by adding terms: NR-LDH-label, Invalid-A-label | |||
| (or False-A-label), R-LDH-label, valid IDNA-label in | (or False-A-label), R-LDH-label, valid IDNA-label in | |||
| Section 1.3.3. | Section 1.3.2. | |||
| o Moved the "name server considerations" material to this document | o Moved the "name server considerations" material to this document | |||
| from Protocol because it is non-normative and not part of the | from Protocol because it is non-normative and not part of the | |||
| protocol itself. | protocol itself. | |||
| o To improve clarity, redid discussion of the reasons why looking up | o To improve clarity, redid discussion of the reasons why looking up | |||
| unassigned code points is prohibited. | unassigned code points is prohibited. | |||
| o Editorial and other non-substantive corrections to reflect earlier | o Editorial and other non-substantive corrections to reflect earlier | |||
| errors as well as new definitions and terminology. | errors as well as new definitions and terminology. | |||
| skipping to change at page 47, line 16 ¶ | skipping to change at page 46, line 23 ¶ | |||
| o Extensive editorial improvements, mostly due to suggestions from | o Extensive editorial improvements, mostly due to suggestions from | |||
| Lisa Dusseault. | Lisa Dusseault. | |||
| o Changes required for the new "mapping" approach and document have, | o Changes required for the new "mapping" approach and document have, | |||
| in general, not been incorporated despite several suggestions. | in general, not been incorporated despite several suggestions. | |||
| The editor intends to wait until the mapping model is stable, or | The editor intends to wait until the mapping model is stable, or | |||
| at least until -11 of this document, before trying to incorporate | at least until -11 of this document, before trying to incorporate | |||
| those suggestions. | those suggestions. | |||
| A.11. Version -11 | ||||
| o Several placeholders for additional material or editing have been | ||||
| removed since no comments have been received. | ||||
| o Updated references. | ||||
| o Corrected an apparent patching error in Section 1.6 and another | ||||
| one in Section 4.3. | ||||
| o Adjusted several sections that had not properly reflected removal | ||||
| of the material that is now in the Definitions document and | ||||
| removed an unnecessary one. | ||||
| o New material added to Section 3.2 about registration policy issues | ||||
| to reflect discussions on the mailing list. | ||||
| o Incorporated mapping material from the former "Architectural | ||||
| Principles" of version -01 of the Mapping draft into Section 6 and | ||||
| removed most of the prior mapping material and explanations. | ||||
| o Eliminated the former Section 7.3 ("More Flexibility in User | ||||
| Agents"), moving its material into Section 4.2. The replacement | ||||
| section is basically a placeholder to retain the mapping issues as | ||||
| one of the migration topics. Note that this item and the previous | ||||
| one involve considerable text, so people should check things | ||||
| carefully. | ||||
| o Corrected several typographical and editorial errors that don't | ||||
| fall into any of the above categories. | ||||
| Author's Address | Author's Address | |||
| John C Klensin | John C Klensin | |||
| 1770 Massachusetts Ave, Ste 322 | 1770 Massachusetts Ave, Ste 322 | |||
| Cambridge, MA 02140 | Cambridge, MA 02140 | |||
| USA | USA | |||
| Phone: +1 617 245 1457 | Phone: +1 617 245 1457 | |||
| Email: john+ietf@jck.com | Email: john+ietf@jck.com | |||
| End of changes. 78 change blocks. | ||||
| 420 lines changed or deleted | 427 lines changed or added | |||