| < draft-faltstrom-unicode11-05.txt | draft-faltstrom-unicode11-06.txt > | |||
|---|---|---|---|---|
| Network Working Group P. Faltstrom | Network Working Group P. Faltstrom | |||
| Internet-Draft Netnod | Internet-Draft Netnod | |||
| Intended status: Informational December 01, 2018 | Intended status: Standards Track December 09, 2018 | |||
| Expires: June 4, 2019 | Expires: June 12, 2019 | |||
| IDNA2008 and Unicode 11.0.0 | IDNA2008 and Unicode 11.0.0 | |||
| draft-faltstrom-unicode11-05 | draft-faltstrom-unicode11-06 | |||
| Abstract | Abstract | |||
| This document describes changes between Unicode 6.3.0 and Unicode | This document describes the changes between Unicode 6.3.0 and Unicode | |||
| 11.0.0 in the context of IDNA2008. It further suggests for the IETF | 11.0.0 in the context of IDNA2008. It further suggests a path | |||
| a path forward regarding ensuring IDNA2008 follows the evolution of | forward for the IETF to ensure IDNA2008 follows the evolution of the | |||
| the Unicode Standard. | Unicode Standard. | |||
| In a few cases changes have been made in the Unicode Standard related | Some changes have been made in the Unicode Standard related to the | |||
| to the algorithm IDNA2008 specifies. IDNA2008 do give the ability to | algorithm IDNA2008 specifies. IDNA2008 allows adding exceptions to | |||
| add exceptions for backward compatibility to the algorithm but the | the algorithm for backward compatibility; however, this document | |||
| conclusions provided in this document suggests no such changes. | makes no such changes. Thus this document requests that IANA update | |||
| the tables to Unicode 11. | ||||
| Thus this document requests that IANA update the tables to Unicode | The document also recommends that all DNS registries continue the | |||
| 11. | practice of calculating a repertoire using conservatism and inclusion | |||
| principles. | ||||
| In addition, all registries should continue the practice of | TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC: | |||
| calculating a repertoire using conservatism and inclusion principles. | ||||
| This document is discussed on the i18nrp@ietf.org mailing list of the | ||||
| IETF. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on June 4, 2019. | This Internet-Draft will expire on June 12, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 27 ¶ | skipping to change at page 2, line 27 ¶ | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4 | 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4 | |||
| 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4 | 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4 | |||
| 3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 | 3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 4. Notable changes between Unicode 6.3.0 and 11.0.0 . . . . . . 6 | 4. Notable Changes Between Unicode 6.3.0 and 11.0.0 . . . . . . 6 | |||
| 4.1. Changes to Unicode 7.0.0 . . . . . . . . . . . . . . . . 6 | 4.1. Changes in Unicode 7.0.0 . . . . . . . . . . . . . . . . 6 | |||
| 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 7 | 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6 | |||
| 4.3. Changes to Unicode 11.0.0 . . . . . . . . . . . . . . . . 7 | 4.3. Changes in Unicode 11.0.0 . . . . . . . . . . . . . . . . 6 | |||
| 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 9 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 9 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 9 | |||
| 9.2. Non-normative references . . . . . . . . . . . . . . . . 10 | 9.2. Non-normative references . . . . . . . . . . . . . . . . 9 | |||
| Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12 | Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12 | |||
| Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15 | Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15 | |||
| Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16 | Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16 | |||
| Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17 | Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17 | |||
| Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18 | Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18 | |||
| Appendix F. Code points in Unicode Character Database (UCD) | Appendix F. Code points in Unicode Character Database (UCD) | |||
| format for Unicode 11.0.0 . . . . . . . . . . . . . 20 | format for Unicode 11.0.0 . . . . . . . . . . . . . 20 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79 | |||
| 1. Introduction | 1. Introduction | |||
| The current version of Internationalized Domain Names for | The current version of Internationalized Domain Names for | |||
| Applications (IDNA) was largely completed in 2008, known within the | Applications (IDNA) was largely completed in 2008, and is thus known | |||
| series and elsewhere as "IDNA2008" and is specified in a series of | within the series as "IDNA2008". It is specified in a series of documents | |||
| documents (see Section Section 3.1). The standard include an | listed in Section 3.1. The IDNA2008 standard includes an algorithm | |||
| algorithm by which a derived property value is calculated based on | by which a derived property value is calculated based on the | |||
| the properties defined in the Unicode Standard. | properties defined in the Unicode Standard. | |||
| When the Unicode Standard is updated code points are assigned and | When the Unicode Standard is updated, new code points are assigned | |||
| property values might be changed for already assigned code points. | and already-assigned code points can have their property values | |||
| changed. | ||||
| Assigning code points might create problems if the newly assigned | o Assigning code points can create problems if the newly-assigned | |||
| code points are compositions of code points so that it either changes | code points are compositions of code points and change (or would have | |||
| or would have changed the normalization functions. This is because | changed) the normalization functions. These problems can arise if | |||
| it changes the matching algorithms used which in turn might create | the new code points change the matching algorithms used and this | |||
| problems looking up already stored strings in for example DNS. | in turn creates problems looking up already stored strings. | |||
| Changing properties for already assigned code points might create | o Changing properties for already-assigned code points can create | |||
| problems if the change results in the derived property value changes. | problems if the property change results in changes to the derived | |||
| This might make an earlier allowed code point (derived property value | property value. This might make an earlier allowed code point | |||
| PVALID) not be allowed anymore (derived property value DISALLOWED). | whose derived property value is PVALID to then not be allowed | |||
| Or the other way around, a code point that was not allowed (and | anymore if its derived property value changes to DISALLOWED. The | |||
| because of that blocked in some situations) suddenly end up being | problem can also happen the other way around: a code point that | |||
| allowed. | was not allowed (and thus is blocked in some situations) can | |||
| suddenly end up being allowed. | ||||
| Historically the IETF has accepted all implications of changes in the | Historically, the IETF has accepted all implications of changes in | |||
| Unicode Standard even though the changes have resulted in problematic | the Unicode Standard even though the changes have resulted in | |||
| changes in the derived property value. The primary reason for that | problematic changes in the derived property value. The primary | |||
| is that staying with the Unicode Standard has been viewed as | reason for that choice is that staying with the Unicode Standard has | |||
| important given the diversity in implementations already existing in | been viewed as important because of the diversity of implementations | |||
| the wild. | already existing in the wild. | |||
| As described in Section 4, a few changes have been made regarding | As described in Section 4, a few changes have been made regarding | |||
| certain attributes to code points in Unicode between version 6.3.0 | certain attributes to code points in Unicode between version 6.3.0 | |||
| and 11.0.0. Such changes could result in either a change in the | and 11.0.0. Such changes could result in a change in the derived | |||
| derived property value for the code point in question or no such | property value for the code point in question. If a change occurs, | |||
| change. In turn, if the result is a change, it can be between any of | and it is between any of the derived property values except | |||
| the derived property values except DISALLOWED. Also in this case, | DISALLOWED, there is not a problem. This document concludes that no | |||
| when moving from version 6.3.0 to 11.0.0, this document concludes | exceptions are to be added to IDNA2008 even if changes in the derived | |||
| that no exceptions are to be added to IDNA2008 even if changes in the | property value are a result of the changes made in Unicode between | |||
| derived property value is a result of the changes made in Unicode. | version 6.3.0 and 11.0.0. | |||
| Specifically, the Internet Architecture Board did issue a statement | In 2015, the Internet Architecture Board (IAB) issued a statement | |||
| [IAB] which requested IETF to resolve the issues related to the code | [IAB] which requested the IETF to resolve the issues related to the | |||
| point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in | code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) that was | |||
| Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and | introduced in Unicode 7.0.0 [Unicode-7.0.0]. The current document | |||
| suggests IDNA2008 standard is to follow the Unicode Standard and not | resolves this issue and suggests that the IDNA2008 standard follow | |||
| update RFC 5892 [RFC5892] or any other IDNA2008 RFCs. | the Unicode Standard and not update RFC 5892 [RFC5892] or any other | |||
| IDNA2008 RFCs. | ||||
| 2. Keywords for Requirement Levels | 2. Keywords for Requirement Levels | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they | 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they | |||
| appear in all capitals, as shown here. | appear in all capitals, as shown here. | |||
| 3. Background | 3. Background | |||
| 3.1. IDNA2008 Documents | 3.1. IDNA2008 Documents | |||
| IDNA2008 consists of the following documents: | IDNA2008 consists of the following documents. The documents in the | |||
| set have informal names. | ||||
| o A document, RFC 5890 [RFC5890], containing definitions and other | o RFC 5890 [RFC5890], informally called "Defs" or "Definitions", | |||
| material that are needed for understanding other documents in the | contains definitions and other material that are needed for | |||
| set. It is referred to informally in other documents in the set | understanding other documents in the set. | |||
| as "Defs" or "Definitions". | ||||
| o A document, RFC 5891 [RFC5891], that describes the core IDNA2008 | o RFC 5891 [RFC5891], informally called "Protocol", describes the | |||
| protocol and its operations. It is to be interpreted in | core IDNA2008 protocol and its operations. It needs to be | |||
| combination with the Bidi document, described immediately below. | interpreted in combination with the Bidi document (described | |||
| It is referred to informally in other documents in the set as | below). | |||
| "Protocol". | ||||
| o A specification, RFC 5892 [RFC5892], of the categories and rules | o RFC 5892 [RFC5892], informally called "Tables", lists the | |||
| that identify the code points allowed in a label written in native | categories and rules that identify the code points allowed in a | |||
| character form (defined more specifically as a "U-label"), based | label written in native character form (called a "U-label"), and is | |||
| originally on Unicode 5.2.0 [Unicode-5.2.0] code point assignments | based originally on Unicode 5.2.0 [Unicode-5.2.0] code point | |||
| and additional rules unique to IDNA2008. The Unicode-based rules | assignments and additional rules unique to IDNA2008. The Unicode- | |||
| are expected to be stable across Unicode updates and hence | based rules in RFC 5892 are expected to be stable across Unicode | |||
| independent of Unicode versions. That specification obsoletes RFC | updates and hence independent of Unicode versions. RFC 5892 | |||
| 3491 [RFC3491] and IDN use of the tables to which it refers. It | obsoletes RFC 3491 [RFC3491], and in particular the use of the | |||
| is referred to informally in other documents in the set as | tables to which it refers. | |||
| "Tables". | ||||
| o A document, RFC 5893 [RFC5893], that specifies special rules | o RFC 5893 [RFC5893], informally called "Bidi", specifies special | |||
| (Bidi) for labels that contain characters that are written from | rules for labels that contain characters that are written from | |||
| right to left. | right to left. | |||
| o A document, RFC 5894 [RFC5894], that provides an overview of the | o RFC 5894 [RFC5894], informally called "Rationale", provides an | |||
| protocol and associated tables together with explanatory material | overview of the protocol and associated tables, and gives | |||
| and some rationale for the decisions that led to IDNA2008. That | explanatory material and some rationale for the decisions that led | |||
| document also contains advice for registry operations and those | to IDNA2008. It also contains advice for DNS registry operators | |||
| who use Internationalized Domain Names (IDNs). It is referred to | and others who use Internationalized Domain Names (IDNs). | |||
| informally in other documents in the set as "Rationale". | ||||
| o A document, RFC 5895 [RFC5895], that discusses the issue of | o RFC 5895 [RFC5895], informally called "Mapping", discusses the | |||
| mapping characters into other characters and that provides | issue of mapping characters into other characters and | |||
| guidance for doing so when that is appropriate. That document, | provides guidance for doing so when that is appropriate. RFC 5895 | |||
| referred to informally as "Mapping", provides advice; it is not a | provides advice and is not a required part of IDNA. | |||
| required part of IDNA. | ||||
| o A document, RFC 6452 [RFC6452], that looks at some changes made to | o RFC 6452 [RFC6452] describes some changes made to Unicode 6.0.0 | |||
| Unicode 6.0.0 [Unicode-6.0.0] that resulted in the derived | [Unicode-6.0.0] that resulted in the derived property value change | |||
| property value change for the code points U+0CF1, U+0CF2 and | for the code points U+0CF1, U+0CF2 and U+19DA. U+0CF1 and U+0CF2 | |||
| U+19DA. The first two changed from DISALLOWED to PVALID, the last | changed from DISALLOWED to PVALID, while U+19DA changed from | |||
| from PVALID to DISALLOWED. IETF came to the conclusion that no | PVALID to DISALLOWED. The IETF concluded that no update to RFC | |||
| update is needed to RFC 5892 [RFC5892] based on the changes made | 5892 [RFC5892] was needed based on the changes made in Unicode | |||
| in Unicode 6.0.0 [Unicode-6.0.0]. As a result, the derived | 6.0.0 [Unicode-6.0.0]. As a result, the derived property value | |||
| property value remained aligned with the Unicode Standard. | remained aligned with the Unicode Standard. | |||
| 3.2. Deployment | 3.2. Deployment | |||
| The deployment of IDNA2008 is unfortunately quite diverse. The | The level of deployment of IDNA2008 is unfortunately quite diverse. | |||
| following lists some of the strategies that existing implementations | The following lists some of the strategies that existing | |||
| are known to implement: | implementations are known to use: | |||
| o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] | o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] | |||
| which implies using a table within which it is said whether code | which implies using a table within which it is said whether code | |||
| points are allowed to be used or not, after doing the | points are allowed to be used or not, after doing the | |||
| normalization specified in IDNA2003. | normalization specified in IDNA2003. | |||
| o A mix between IDNA2003 and IDNA2008 where code points assigned to | o A mix between IDNA2003 and IDNA2008 where code points assigned to | |||
| Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | |||
| value calculated according to the algorithm specified in IDNA2008. | value calculated according to the algorithm specified in IDNA2008. | |||
| o Strict IDNA2008 following IANA which implies staying at Unicode | o Strict IDNA2008 following the current IANA tables, which implies | |||
| 6.3.0 [Unicode-6.3.0] and treating later assigned code points as | staying at Unicode 6.3.0 [Unicode-6.3.0] and treating later | |||
| UNASSIGNED. | assigned code points as UNASSIGNED. | |||
| o The IDNA2008 algorithm applied to whatever version of Unicode | o The IDNA2008 algorithm applied to whatever version of Unicode | |||
| Standard exists in the operating system and/or libraries used, | Standard exists in the operating system and/or libraries used, | |||
| regardless of whether the version is later than Unicode version | regardless of whether the version is later than Unicode version | |||
| 6.3.0 or not. | 6.3.0. | |||
| o A mix between IDNA2003 and IDNA2008 according to local | o A mix between IDNA2003 and IDNA2008 according to local | |||
| interpretation of the Unicode Technical Standard #46 [UTS-46]. | interpretation of the Unicode Technical Standard #46 [UTS-46]. | |||
| The issue is further complicated by having a very diverse | The issue is further complicated by having diverse implementations of | |||
| implementations of the requirements in RFC 5894 [RFC5894] by registry | the requirements in RFC 5894 [RFC5894] by DNS registry operators, | |||
| operators based on the IDNA2008 specification to create additional | based on the IDNA2008 specification, but with additional rules for | |||
| rules for what code points are allowed to be used for registration. | the specific code points that are allowed for registration. | |||
| In practice, the Unicode Consortium creates a maximum set of code | In practice, the Unicode Consortium creates a maximum set of code | |||
| points by assigning code points in the Unicode Standard. The | points by assigning code points in the Unicode Standard. The | |||
| IDNA2008 rules based on the Unicode Standard create a subset of these | IDNA2008 rules based on the Unicode Standard create a subset of these | |||
| by assigning the PVALID derived property value to them. Registries | by assigning the PVALID derived property value to them. DNS | |||
| (and others dealing with Internationalized Domain Names) are supposed | registries and other organizations that deal with IDNs are supposed | |||
| to create an even smaller subset that ultimately is the set of code | to create their own subsets from IDNA2008 for use by those registries | |||
| points that can be used in a particular registry. | and organizations. | |||
| There is further recommendation to be conservative when these subsets | SAC-084 [SAC-084] and RFC 6912 [RFC6912] recommend that DNS registries | |||
| are calculated and to use the inclusion principle; this is explained | and other organizations be conservative when calculating their | |||
| in SAC-084 [SAC-084] and RFC 6912 [RFC6912]. | subsets, and that they use the principle of creating subsets | |||
| by inclusion. | ||||
| 4. Notable changes between Unicode 6.3.0 and 11.0.0 | 4. Notable Changes Between Unicode 6.3.0 and 11.0.0 | |||
| 4.1. Changes to Unicode 7.0.0 | 4.1. Changes in Unicode 7.0.0 | |||
| The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was | The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was | |||
| introduced in Unicode 7.0.0. This was discussed in the IETF | introduced in Unicode 7.0.0. This was discussed extensively in the | |||
| extensively and by IAB in their statement [IAB] requesting the IETF | IETF, and by the IAB in their statement [IAB] requesting the IETF to | |||
| to investigate the issue. Specifically IAB stated: | investigate the issue. Specifically, the IAB stated: | |||
| On the same precautionary principle, the IAB recommends that the | On the same precautionary principle, the IAB recommends that the | |||
| Internationalized Domain Names for Applications (IDNA) Parameters | Internationalized Domain Names for Applications (IDNA) Parameters | |||
| registry (http://www.iana.org/assignments/idna-tables/) not be | registry (http://www.iana.org/assignments/idna-tables/) not be | |||
| updated to Unicode 7.0.0 until the IETF has consensus on a | updated to Unicode 7.0.0 until the IETF has consensus on a | |||
| solution to this problem. | solution to this problem. | |||
| The discussion in the IETF concluded that although it is possible to | The discussion in the IETF concluded that although it is possible to | |||
| create "the same" character in multiple ways, the issue with U+08A1 | create "the same" character in multiple ways, the issue with U+08A1 | |||
| is not unique. In the case of U+08A1, it can be represented with the | is not unique. The character U+08A1 can be represented with the | |||
| sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654). | sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654). | |||
| Just like LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) can be | This is analogous to LATIN SMALL LETTER A WITH DIAERESIS (U+00E4), which | |||
| represented via the sequence LATIN SMALL LETTER A (U+0061), and | can be represented with the sequence LATIN SMALL LETTER A (U+0061) | |||
| COMBINING DIAERESIS (U+0308). One difference between these sequences | followed by COMBINING DIAERESIS (U+0308). One difference between | |||
| is how they are treated in the normalization forms specified by the | these two sequences is how they are treated in the normalization | |||
| Unicode Consortium. | forms specified by the Unicode Consortium. | |||
| As U+08A1 is discussed in draft-freytag-troublesome-characters | U+08A1 is discussed in draft-freytag-troublesome-characters | |||
| [I-D.freytag-troublesome-characters] and elsewhere. Regardless of | [I-D.freytag-troublesome-characters] and other Internet Drafts. | |||
| whether those discussions ends in recommending including the code | Regardless of whether the discussion of those drafts ends in | |||
| point in the repertoire of characters permissible for registration or | recommendations to include the code point in the repertoire of | |||
| not, it is acceptable to allow the code point to have a derived | characters permissible for registration or not, it is still | |||
| property value of PVALID. | acceptable to allow the code point to have a derived property value | |||
| of PVALID. | ||||
| 4.2. Changes between Unicode 7.0.0 and 10.0.0 | 4.2. Changes between Unicode 7.0.0 and 10.0.0 | |||
| There are no changes made to Unicode between version 7.0.0 and 10.0.0 | There are no changes made to Unicode between version 7.0.0 and 10.0.0 | |||
| that impact IDNA2008 calculation of the derived property value. | that impact IDNA2008 calculation of the derived property value. | |||
| 4.3. Changes to Unicode 11.0.0 | 4.3. Changes in Unicode 11.0.0 | |||
| The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a | The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a | |||
| number of changes [Changes-11.0.0] from version 10.0.0, specifically | number of changes [Changes-11.0.0] from version 10.0.0. | |||
| to UnicodeData.txt: | ||||
| o Entries were added for the 684 new characters, including letters, | o 684 new characters were added, including letters, combining marks, | |||
| combining marks, digits, symbols, and punctuation marks. | digits, symbols, and punctuation marks. | |||
| o Georgian letters in the ranges U+10D0..U+10FA, U+10FD..U+10FF were | o Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF | |||
| changed from Lo to Ll, to reflect their status as the lowercase of | had their General_Category changed from Lo to Ll, to reflect | |||
| new Georgian case pairs. Case mappings were also added. | their status as the lowercase of new Georgian case pairs. Case | |||
| mappings were also added. | ||||
| o U+111C9 SHARADA SANDHI MARK was changed from Po to Mn, and from | o SHARADA SANDHI MARK (U+111C9) was changed from Po to Mn, and from | |||
| bc=L to bc=NSM. | bc=L to bc=NSM. | |||
| o U+11A07 ZANABAZAR SQUARE VOWEL SIGN AI and U+11A08 ZANABAZAR SQUARE | o The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | |||
| VOWEL SIGN AU were corrected from Mc to Mn. | ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to | |||
| Mn. | ||||
| o U+29A1 SPHERICAL ANGLE OPENING UP was changed to Bidi_M=N. | o SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N. | |||
| These changes to the Unicode Standard have the following implications | These changes to the Unicode Standard have the following implications | |||
| for these code points: | for these code points: | |||
| o The newly assigned 684 characters are to have a derived property | o The newly assigned 684 characters are to have a derived property | |||
| value as a result of applying the IDNA2008 algorithm. | value as a result of applying the IDNA2008 algorithm. | |||
| o The Georgian letters in the ranges U+10D0..U+10FA and | o The Georgian letters in the ranges U+10D0..U+10FA and | |||
| U+10FD..U+10FF have existed since before IDNA2008 was created. | U+10FD..U+10FF existed before IDNA2008 was created. Applying the | |||
| Applying the IDNA2008 algorithm to the code points did assign the | IDNA2008 algorithm to the code points assigned the derived | |||
| derived property value PVALID and that value is unchanged even if | property value PVALID, and that value is unchanged even if the | |||
| the underlying Unicode properties have changed. | underlying Unicode properties have changed. | |||
| o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0 | o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0 | |||
| [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code | [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code | |||
| point did assign the derived property value DISALLOWED. The | point assigned the derived property value DISALLOWED. The changes | |||
| changes in the underlying properties in the Unicode Standard | in the underlying properties in the Unicode Standard Version | |||
| Version 11.0.0 [Unicode-11.0.0] make the derived property value | 11.0.0 [Unicode-11.0.0] caused the derived property value to | |||
| change to PVALID which is an acceptable change. | change to PVALID, which is an acceptable change. | |||
| o The characters U+11A07 ZANABAZAR SQUARE VOWEL SIGN AI and U+11A08 | o The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | |||
| ZANABAZAR SQUARE VOWEL SIGN AU were added to Unicode 10.0.0 | ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode | |||
| [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the code | 10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the | |||
| points did assign the derived property value PVALID and that value | code points assigned the derived property value PVALID, and that | |||
| is unchanged even if the underlying Unicode properties have | value is unchanged even if the underlying Unicode properties have | |||
| changed. | changed. | |||
| o U+29A1 SPHERICAL ANGLE OPENING UP have existed since before | o SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was | |||
| IDNA2008 was created. Applying the IDNA2008 algorithm to the code | created. Applying the IDNA2008 algorithm to the code point | |||
| point did assign the derived property value PVALID and that value | assigned the derived property value PVALID, and that value is | |||
| is unchanged even if the underlying Unicode properties have | unchanged even if the underlying Unicode properties have changed. | |||
| changed. | ||||
| 5. Conclusion | 5. Conclusion | |||
| As described in Section 4 changes have been made to Unicode between | As described in Section 4, changes have been made to Unicode between | |||
| version 6.3.0 and 11.0.0. Some changes to specific characters | version 6.3.0 and 11.0.0. Some changes to specific characters | |||
| changed their derived property value. Others did not. Given the | changed their derived property value, while other changes did not. | |||
| diverse deployment described in Section 3.2 and the changes | Given the diverse deployment described in Section 3.2 and the changes | |||
| described, including implications to normalization, the conclusion is | described, including implications to normalization, the conclusion of | |||
| to not add any exception rules to IDNA2008. | this document is to not add any exception rules to IDNA2008. | |||
| To increase overall harmonization in the use of internationalized | To increase overall harmonization in the use of IDNs, this document | |||
| domain names, the author recommends that the derived property values | recommends that the derived property values MUST be calculated as | |||
| MUST be calculated as specified in the documents listed in | specified in the documents listed in Section 3.1 and with the | |||
| Section 3.1 also with code points in Unicode Version 11.0.0 | code points in Unicode Version 11.0.0 [Unicode-11.0.0]. | |||
| [Unicode-11.0.0]. | ||||
| All registries (and others) SHOULD calculate a repertoire using the | All DNS registries (and other organizations) SHOULD calculate a | |||
| conservatism and inclusion principles as laid out for example in in | repertoire using the conservatism and inclusion principles, as | |||
| SAC-084 [SAC-084]. | described in SAC-084 [SAC-084] and similar documents. | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| IANA is requested to update the registry of derived property values | IANA is requested to update the IDNA Parameters registry of derived | |||
| after validation with the Appointed Expert that the derived property | property values, after the expert reviewer validates that the derived | |||
| values are calculated correctly. | property values are calculated correctly. | |||
| 7. Security Considerations | 7. Security Considerations | |||
| This document makes recommendations regarding the use of the IDNA2008 | This document makes recommendations regarding the use of the IDNA2008 | |||
| algorithm for calculation of derived property values, based on the | algorithm for calculation of derived property values, based on the | |||
| current Unicode version. It also recommends that registries (and | current Unicode version. It also recommends that DNS registries (and | |||
| others dealing with Internationalized Domain Names) explicitly select | others dealing with Internationalized Domain Names) explicitly select | |||
| appropriate subsets of characters with the derived value of PVALID. | appropriate subsets of characters with the derived value of PVALID. | |||
| Not following these recommendations can lead to various security | Not following these recommendations can lead to various security | |||
| issues. Specifically, allowing confusable characters may lead to | issues. Specifically, allowing confusable characters may lead to | |||
| various phishing attacks. See Security Consideration Sections in the | various phishing attacks, as described in the Security Consideration | |||
| documents listed in Section 3.1. | Sections in the documents listed in Section 3.1. | |||
| 8. Acknowledgements | 8. Acknowledgements | |||
| Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik | Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik | |||
| Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | |||
| input to this document. | input to this document. | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | |||
| Profile for Internationalized Domain Names (IDN)", | Profile for Internationalized Domain Names (IDN)", | |||
| RFC 3491, DOI 10.17487/RFC3491, March 2003, | RFC 3491, DOI 10.17487/RFC3491, March 2003, | |||
| End of changes. 56 change blocks. | ||||
| 188 lines changed or deleted | 191 lines changed or added | |||