| < draft-faltstrom-unicode11-07.txt | draft-faltstrom-unicode11-08.txt > | |||
|---|---|---|---|---|
| Network Working Group P. Faltstrom | Network Working Group P. Faltstrom | |||
| Internet-Draft Netnod | Internet-Draft Netnod | |||
| Intended status: Standards Track January 07, 2019 | Intended status: Standards Track March 11, 2019 | |||
| Expires: July 11, 2019 | Expires: September 12, 2019 | |||
| IDNA2008 and Unicode 11.0.0 | IDNA2008 and Unicode 11.0.0 | |||
| draft-faltstrom-unicode11-07 | draft-faltstrom-unicode11-08 | |||
| Abstract | Abstract | |||
| This document describes the changes between Unicode 6.3.0 and Unicode | This document describes the changes between Unicode 6.3.0 and Unicode | |||
| 11.0.0 in the context of IDNA2008. It further suggests a path | 11.0.0 in the context of IDNA2008. Some additions and changes have | |||
| forward for the IETF to ensure IDNA2008 follows the evolution of the | been made in the Unicode Standard that affect the values produced by | |||
| Unicode Standard. | the algorithm IDNA2008 specifies. IDNA2008 allows adding | |||
| exceptions to the algorithm for backward compatibility; however, this | ||||
| Some changes have been made in the Unicode Standard related to the | document does not add any such exceptions. This document provides | |||
| algorithm IDNA2008 specifies. IDNA2008 allows adding exceptions to | the necessary tables to IANA to make its database consistent with | |||
| the algorithm for backward compatibility; however, this document | Unicode 11.0.0. | |||
| makes no such changes. Thus this document requests that IANA update | ||||
| the tables to Unicode 11. | ||||
| The document also recommends that all DNS registries continue the | To improve understanding, this document describes systems that are | |||
| practice of calculating a repertoire using conservatism and inclusion | being used as alternatives to those that conform to IDNA2008. | |||
| principles. | ||||
| TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC: | TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC: | |||
| This document is discussed on the i18nrp@ietf.org mailing list of the | This document is discussed on the i18nrp@ietf.org mailing list of the | |||
| IETF. | IETF. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| skipping to change at page 1, line 48 ¶ | skipping to change at page 1, line 45 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on July 11, 2019. | This Internet-Draft will expire on September 12, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4 | 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 4 | |||
| 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4 | 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 4 | |||
| 3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 | 3.2. Additional important IDNA2008-related documents . . . . . 5 | |||
| 4. Notable Changes Between Unicode 6.3.0 and 11.0.0 . . . . . . 6 | 3.3. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 4.1. Changes in Unicode 7.0.0 . . . . . . . . . . . . . . . . 6 | 4. Notable Changes Between Unicode 6.2.0 and 11.0.0 . . . . . . 7 | |||
| 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6 | 4.1. Changes between Unicode 6.2.0 and 7.0.0 . . . . . . . . . 7 | |||
| 4.3. Changes in Unicode 11.0.0 . . . . . . . . . . . . . . . . 6 | 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 8 | |||
| 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 4.3. Changes between Unicode 10.0.0 and 11.0.0 . . . . . . . . 8 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 5. U+111C9 SHARADA SANDHI MARK . . . . . . . . . . . . . . . . . 9 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 9 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 9.2. Non-normative references . . . . . . . . . . . . . . . . 9 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 11 | |||
| Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15 | 10.2. Non-normative references . . . . . . . . . . . . . . . . 12 | |||
| Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16 | Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 14 | |||
| Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17 | Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 17 | |||
| Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18 | Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 18 | |||
| Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 20 | ||||
| Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 21 | ||||
| Appendix F. Code points in Unicode Character Database (UCD) | Appendix F. Code points in Unicode Character Database (UCD) | |||
| format for Unicode 11.0.0 . . . . . . . . . . . . . 20 | format for Unicode 11.0.0 . . . . . . . . . . . . . 22 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 81 | |||
| 1. Introduction | 1. Introduction | |||
| The current version of Internationalized Domain Names for | The current version of Internationalized Domain Names for | |||
| Applications (IDNA) was largely completed in 2008, and is thus known | Applications (IDNA) was initiated in 2008, and despite not being | |||
| within as "IDNA2008". It is specified in a series of documents | completed until 2010, is widely known as "IDNA2008". It is specified | |||
| listed in Section 3.1. The IDNA2008 standard includes an algorithm | in the series of documents listed in Section 3.1. The IDNA2008 | |||
| by which a derived property value is calculated based on the | standard includes an algorithm by which a derived property value is | |||
| properties defined from the Unicode Standard. | calculated based on the properties defined from the Unicode Standard. | |||
| When the Unicode Standard is updated, new code points are assigned | When the Unicode Standard is updated, new code points are assigned | |||
| and already-assigned code points can have their property values | and already-assigned code points can have their property values | |||
| changed. | changed. | |||
| o Assigning code points can create problems if the newly-assigned | o Assigning code points can create problems if the newly-assigned | |||
| code points are compositions of code points changes (or would have | code points are compositions of existing code points and because | |||
| changed) the normalization functions. These problems can arise if | of that the normalization relationships associated with those code | |||
| the new code points change the matching algorithms used and this | points should have been changed. | |||
| in turn creates problems looking up already stored strings. | ||||
| o Changing properties for already-assigned code points can create | o Changing properties for already-assigned code points can create | |||
| problems if the property change results in changes to the derived | problems if the property change results in changes to the derived | |||
| property value. This might make an earlier allowed code point | property value. This might make an earlier allowed code point | |||
| whose derived property value is PVALID to then not be allowed | whose derived property value is PVALID to then not be allowed | |||
| anymore if its derived property value changes to DISALLOWED. The | anymore if its derived property value changes to DISALLOWED. The | |||
| problem can also happen the other way around: a code point that | problem can also happen the other way around: a code point that | |||
| was not allowed (and thus is blocked in some situations) to | was not allowed (and thus is prohibited) can suddenly end up being | |||
| suddenly end up being allowed. | allowed. | |||
| Historically, the IETF has accepted all implications of changes in | o Problems can also be created if the properties assigned to those | |||
| the Unicode Standard even though the changes have resulted in | code points are inconsistent with IDNA2008 assumptions about how | |||
| problematic changes in the derived property value. The primary | properties are assigned and/or about how code points with those | |||
| reason for that choice is that staying with the Unicode Standard has | properties are used or behave. | |||
| been viewed as important because of the diversity of implementations | ||||
| already existing in the wild. | ||||
| As described in Section 4, a few changes have been made regarding | There were three incompatible changes in the Unicode standard after | |||
| certain attributes to code points in Unicode between version 6.3.0 | Unicode 5.2 up to and including Unicode 6.0, as described in RFC 6452 | |||
| and 11.0.0. Such changes could result in a change in the derived | [RFC6452]. The code points U+0CF1 and U+0CF2 had a derived property | |||
| property value for the code point in question. If a change occurs, | value change from DISALLOWED to PVALID while U+19DA had a change in | |||
| and it is between any of the derived property values except | derived property value from PVALID to DISALLOWED. They were | |||
| DISALLOWED, there is not a problem. This document concludes that no | examined in great detail and IETF concluded that the consensus is | |||
| exceptions are to be added to IDNA2008 even if changes in the derived | that no update was needed to RFC 5892 [RFC5892] based on the changes | |||
| property value is a result of the changes made in Unicode between | made to the Unicode standard. | |||
| version 6.3.0 and 11.0.0. | ||||
| In 2015, the Internet Architecture Board (IAB) issued a statement | As described in Section 4, more changes have been made to code points | |||
| [IAB] which requested the IETF to resolve the issues related to the | between Unicode version 6.0 and 11.0.0 so that the derived property | |||
| code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) that was | values have been changed in an incompatible way. This document | |||
| introduced in Unicode 7.0.0 [Unicode-7.0.0]. The current document | concludes that no exceptions are to be added to IDNA2008 even though | |||
| resolves this issue and suggests that the IDNA2008 standard follows | there are changes in the derived property values as a result of the | |||
| the Unicode Standard and not update RFC 5892 [RFC5892] or any other | changes made in Unicode between version 6.2.0 and 11.0.0. | |||
| IDNA2008 RFCs. | ||||
| Further, in 2015, the Internet Architecture Board (IAB) issued a | ||||
| statement [IAB] which requested the IETF to resolve the issues | ||||
| related to the code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) | ||||
| that was introduced in Unicode 7.0.0 [Unicode-7.0.0]. This document | ||||
| concludes that this code point is not to be added to the exception | ||||
| list either. It should be noted that the review on U+08A1 indicated | ||||
| that it is not an isolated case and that a number of PVALID code | ||||
| points of long standing may have similar issues. The problem is | ||||
| described in more detail in a document in progress, draft-klensin- | ||||
| idna-5892upd-unicode70 [I-D.klensin-idna-5892upd-unicode70]. A | ||||
| fuller resolution of this issue may require future changes to | ||||
| IDNA2008 or additional specifications, but there is insufficient | ||||
| understanding yet of what would constitute the best approach. | ||||
| 2. Keywords for Requirement Levels | 2. Keywords for Requirement Levels | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they | 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they | |||
| appear in all capitals, as shown here. | appear in all capitals, as shown here. | |||
| 3. Background | 3. Background | |||
| 3.1. IDNA2008 Documents | 3.1. IDNA2008 Documents | |||
| IDNA2008 consists of the following documents. The documents in the | IDNA2008 consists of the following documents. The documents in the | |||
| set have informal names. | set have informal names. | |||
| o RFC 5890 [RFC5890], informally called "Defs" or "Definitions", | o Internationalized Domain Names for Applications (IDNA): | |||
| contains definitions and other material that are needed for | Definitions and Document Framework [RFC5890], informally called | |||
| understanding other documents in the set. | "Defs" or "Definitions", contains definitions and other material | |||
| that are needed for understanding other documents in the set. | ||||
| o RFC 5891 [RFC5891], informally called "Protocol", describes the | o Internationalized Domain Names in Applications (IDNA): Protocol | |||
| core IDNA2008 protocol and its operations. It needs to be | [RFC5891], informally called "Protocol", describes the core | |||
| interpreted in combination with the Bidi document (described | IDNA2008 protocol and its operations. It needs to be interpreted | |||
| below). | in combination with the Bidi document (described below). | |||
| o RFC 5892 [RFC5892], informally called "Tables", lists the | o The Unicode Code Points and Internationalized Domain Names for | |||
| categories and rules that identify the code points allowed in a | Applications (IDNA) [RFC5892], informally called "Tables", lists | |||
| label written in native character form (called a "U-label"), an is | the categories and rules that identify the code points allowed in | |||
| based originally on Unicode 5.2.0 [Unicode-5.2.0] code point | a label written in native character form (called a "U-label"), and | |||
| assignments and additional rules unique to IDNA2008. The Unicode- | is based on Unicode 5.2.0 [Unicode-5.2.0] code point assignments | |||
| based rules in RFC 5892 are expected to be stable across Unicode | and additional rules unique to IDNA2008. The Unicode-based rules | |||
| updates and hence independent of Unicode versions. RFC 5892 | in RFC 5892 are expected to be stable across Unicode updates and | |||
| hence independent of Unicode versions. RFC 5892 [RFC5892] | ||||
| obsoletes RFC 3491 [RFC3491], and in particular the use of the | obsoletes RFC 3491 [RFC3491], and in particular the use of the | |||
| tables to which it refers. | tables to which RFC 3491 [RFC3491] refers. | |||
| o RFC 5893 [RFC5893], informally called "Bidi", specifies special | o Right-to-Left Scripts for Internationalized Domain Names for | |||
| rules for labels that contain characters that are written from | Applications (IDNA) [RFC5893], informally called "Bidi", specifies | |||
| right to left. | special rules for labels that contain characters that are written | |||
| from right to left. | ||||
| o RFC 5894 [RFC5894], informally called "Rationale", provides an | o Internationalized Domain Names for Applications (IDNA): | |||
| overview of the protocol and associated tables, and gives | Background, Explanation, and Rationale [RFC5894], informally | |||
| explanatory material and some rationale for the decisions that led | called "Rationale", provides an overview of the protocol and | |||
| to IDNA2008. It also contains advice for DNS registry operators | associated tables, and gives explanatory material and some | |||
| and others who use Internationalized Domain Names (IDNs). | rationale for the decisions that led to IDNA2008. It also | |||
| contains advice for DNS registry operators and others who use | ||||
| Internationalized Domain Names (IDNs). | ||||
| o RFC 5895 [RFC5895], informally called "Mapping", discusses the | o Mapping Characters for Internationalized Domain Names in | |||
| issue of mapping characters into other characters and that | Applications (IDNA) 2008 [RFC5895], informally called "Mapping", | |||
| provides guidance for doing so when that is appropriate. RFC 5895 | discusses the issue of mapping characters into other characters | |||
| provides advice and is not a required part of IDNA. | and provides guidance for doing so when that is appropriate. RFC | |||
| 5895 provides advice only and is not a required part of IDNA. | ||||
| o RFC 6452 [RFC6452] describes some changes made to Unicode 6.0.0 | 3.2. Additional important IDNA2008-related documents | |||
| [Unicode-6.0.0] that resulted in the derived property value change | ||||
| for the code points U+0CF1, U+0CF2 and U+19DA. U+0CF1 and U+0CF2 | ||||
| changed from DISALLOWED to PVALID, while U+19DA changed from | ||||
| PVALID to DISALLOWED. The IETF concluded that no update to RFC | ||||
| 5892 [RFC5892] was needed based on the changes made in Unicode | ||||
| 6.0.0 [Unicode-6.0.0]. As a result, the derived property value | ||||
| remained aligned with the Unicode Standard. | ||||
| 3.2. Deployment | There are other documents important for the understanding and | |||
| functioning of IDNA2008, for example the following. | ||||
| The level of deployment of IDNA2008 is unfortunately quite diverse. | o The Unicode Code Points and Internationalized Domain Names for | |||
| The following lists some of the strategies that existing | Applications (IDNA) - Unicode 6.0 [RFC6452] describes some changes | |||
| implementations are known to use: | made to Unicode 6.0.0 [Unicode-6.0.0] that resulted in derived | |||
| property value change for the code points U+0CF1, U+0CF2 and | ||||
| U+19DA. U+0CF1 and U+0CF2 changed from DISALLOWED to PVALID, | ||||
| while U+19DA changed from PVALID to DISALLOWED. The IETF | ||||
| concluded that no update to RFC 5892 [RFC5892] was needed based on | ||||
| the changes made in Unicode 6.0.0 [Unicode-6.0.0]. As a result, | ||||
| the derived property value remained aligned with the Unicode | ||||
| Standard. Specifically, no exception was added. | ||||
| o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] | 3.3. Deployment | |||
| which implies using a table within which it is said whether code | ||||
| points are allowed to be used or not, after doing the | There are many variations on the general IDNA model in use in the | |||
| normalization specified in IDNA2003. | various parts of the community. The following lists some of the | |||
| strategies that implementations that claim to be IDNA compliant are | ||||
| known to use, but it should be noted the list is not complete: | ||||
| o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 | ||||
| [RFC3491]. Those specifications are dependent on case folding and | ||||
| NFKC normalization and on tables that specify for each code point | ||||
| whether it is allowed to be used or not, with a distinction made | ||||
| between use for "stored strings" and "query strings". The tables | ||||
| themselves are dependent on version 3.2 of The Unicode Standard | ||||
| [Unicode-3.2.0]. | ||||
| o A number of variations on IDNA2003, sometimes presented as | ||||
| "updated IDNA2003" or the like, which follow the principles of | ||||
| IDNA2003 as understood by the implementers but that use tables | ||||
| that represent how the implementers believe Stringprep [RFC3454] | ||||
| and Nameprep [RFC3491] would have evolved had the IETF not moved | ||||
| in the direction of IDNA2008 instead. | ||||
| o A mix between IDNA2003 and IDNA2008 where code points assigned to | o A mix between IDNA2003 and IDNA2008 where code points assigned to | |||
| Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | |||
| value calculated according to the algorithm specified in IDNA2008. | value calculated according to the algorithm specified in IDNA2008. | |||
| o Strict IDNA2008 following the current IANA tables, which implies | o A mix between IDNA2003 and IDNA2008 according to the Unicode | |||
| staying at Unicode 6.3.0 [Unicode-6.3.0] and treating later | Technical Standard #46 [UTS-46]. Because that document specifies | |||
| assigned code points as UNASSIGNED. | different profiles, there are several different variations that | |||
| leave users with no guarantee that two applications claiming | ||||
| o The IDNA2008 algorithm applied to whatever version of Unicode | conformance to UTS#46 will interoperate well with each other much | |||
| Standard exists in the operating system and/or libraries used, | less with conforming IDNA2008 implementations. UTS#46 is | |||
| regardless of whether the version is later than Unicode version | ultimately based on a normative table very much like the one used | |||
| 6.3.0. | by Stringprep [RFC3454] but updated for each new version of | |||
| Unicode. | ||||
| o A mix between IDNA2003 and IDNA2008 according to local | ||||
| interpretation of the Unicode Technical Standard #46 [UTS-46]. | ||||
| The issue is further complicated by having diverse implementations of | o The (normative) IDNA2008 algorithm applied to whatever version of | |||
| the requirements in RFC 5894 [RFC5894] by DNS registry operators, | Unicode Standard exists in the operating system and/or libraries | |||
| based on the IDNA2008 specification, but with additional rules for | used, independent of whatever version of tables appears in the | |||
| the specific code points that are allowed for registration. | (non-normative) IANA database. | |||
| In practice, the Unicode Consortium creates a maximum set of code | In practice, the Unicode Consortium creates a maximum set of code | |||
| points by assigning code points in the Unicode Standard. The | points by assigning code points in the Unicode Standard. The | |||
| IDNA2008 rules based on the Unicode Standard create a subset of these | IDNA2008 rules use the Unicode Standard to create a further subset of | |||
| by assigning the PVALID derived property value to them. DNS | code points and context that are permitted in DNS labels associated | |||
| with its PVALID, CONTEXTJ, and CONTEXTO derived property values. DNS | ||||
| registries and other organizations that deal with IDNs are supposed | registries and other organizations that deal with IDNs are supposed | |||
| to create their own subsets from IDNA2008 for use by those registries | to create their own subsets from IDNA2008 for use by those registries | |||
| and organizations. | and organizations. | |||
| SAC-084 [SAC-084] and RFC 6912 [RFC6912] recommend to DNS registries | This progressive subsetting and narrowing of the repertoire of code | |||
| and other organizations to be conservative when creating their | points that can be used in labels is an implementation of the | |||
| subsets, and to use the principle of creating subsets by inclusion. | principles of being conservative when deciding what code points to | |||
| include in such a subset. SAC-084 [SAC-084] and RFC 6912 [RFC6912] | ||||
| recommend to DNS registries and other organizations to be | ||||
| conservative when creating their subsets, and to use the principle of | ||||
| creating subsets by inclusion. | ||||
| 4. Notable Changes Between Unicode 6.3.0 and 11.0.0 | 4. Notable Changes Between Unicode 6.2.0 and 11.0.0 | |||
| 4.1. Changes in Unicode 7.0.0 | 4.1. Changes between Unicode 6.2.0 and 7.0.0 | |||
| Change in number of characters in each category: | ||||
| Code points that changed derived property value: 0 | ||||
| PVALID changed from 97946 to 99867 (+1921) | ||||
| UNASSIGNED changed from 864348 to 861509 (-2839) | ||||
| CONTEXTJ did not change, at 2 | ||||
| CONTEXTO did not change, at 25 | ||||
| DISALLOWED changed from 151791 to 152709 (+918) | ||||
| TOTAL did not change, at 1114112 | ||||
| There are no changes made to Unicode between version 6.2.0 and | ||||
| 7.0.0 that impact IDNA2008 calculation of the derived property | ||||
| values. | ||||
| The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was | The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was | |||
| introduced in Unicode 7.0.0. This was discussed extensively in the | introduced in Unicode 7.0.0. This was discussed extensively in the | |||
| IETF, and by the IAB in their statement [IAB] requesting the IETF to | IETF, and by the IAB in their statement [IAB] requesting the IETF to | |||
| investigate the issue. Specifically, the IAB stated: | investigate the issue. Specifically, the IAB stated: | |||
| On the same precautionary principle, the IAB recommends that the | On the same precautionary principle, the IAB recommends that the | |||
| Internationalized Domain Names for Applications (IDNA) Parameters | Internationalized Domain Names for Applications (IDNA) Parameters | |||
| registry (http://www.iana.org/assignments/idna-tables/) not be | registry (http://www.iana.org/assignments/idna-tables/) not be | |||
| updated to Unicode 7.0.0 until the IETF has consensus on a | updated to Unicode 7.0.0 until the IETF has consensus on a | |||
| solution to this problem. | solution to this problem. | |||
| The discussion in the IETF concluded that although it is possible to | The discussion in the IETF concluded that although it is possible to | |||
| create "the same" character in multiple ways, the issue with U+08A1 | create "the same" character in multiple ways, the issue with U+08A1 | |||
| is not unique. The character U+08A1 can be represented with the | is not unique. The character U+08A1 (ARABIC LETTER BEH WITH HAMZA | |||
| sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654). | ABOVE) can be represented with the sequence ARABIC LETTER BEH | |||
| This is identical to LATIN SMALL LETTER A WITH DIAERESIS (U+00E4), that | (U+0628) and ARABIC HAMZA ABOVE (U+0654). This is identical to LATIN | |||
| can be represented with the sequence LATIN SMALL LETTER A (U+0061) | SMALL LETTER O WITH STROKE (U+00F8), which can be represented with | |||
| followed by COMBINING DIAERESIS (U+0308). One difference between | the sequence LATIN SMALL LETTER O (U+006F) followed by COMBINING | |||
| these two sequences is how they are treated in the normalization | SHORT SOLIDUS OVERLAY (U+0337). | |||
| forms specified by the Unicode Consortium. | ||||
| U+08A1 is discussed in draft-freytag-troublesome-characters | Although the discussion about this specific code point resulted in | |||
| [I-D.freytag-troublesome-characters] and other Internet-Drafts. | acceptance of the derived property value of PVALID, the underlying | |||
| Regardless of whether the discussion of those drafts ends in | problem with combining sequences is not understood fully. Therefore | |||
| recommendations to include the code point in the repertoire of | it cannot be claimed that this case can be extrapolated to other | |||
| characters permissible for registration or not, it is still | situations and other code points. | |||
| acceptable to allow the code point to have a derived property value | ||||
| of PVALID. | ||||
| 4.2. Changes between Unicode 7.0.0 and 10.0.0 | 4.2. Changes between Unicode 7.0.0 and 10.0.0 | |||
| There are no changes made to Unicode between version 7.0.0 and 10.0.0 | Change in number of characters in each category: | |||
| that impact IDNA2008 calculation of the derived property value. | ||||
| 4.3. Changes in Unicode 11.0.0 | Code points that changed derived property value: 0 | |||
| The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a | PVALID changed from 99867 to 122411 (+22544) | |||
| number of changes [Changes-11.0.0] from version 10.0.0. | ||||
| o 684 new characters were added, including letters, combining marks, | UNASSIGNED changed from 861509 to 837775 (-23734) | |||
| digits, symbols, and punctuation marks. | ||||
| o Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF | CONTEXTJ did not change, at 2 | |||
| CONTEXTO did not change, at 25 | ||||
| DISALLOWED changed from 152709 to 153899 (+1190) | ||||
| TOTAL did not change, at 1114112 | ||||
| There are no changes made to Unicode between version 7.0.0 and | ||||
| 10.0.0 that impact IDNA2008 calculation of the derived property | ||||
| values. | ||||
| 4.3. Changes between Unicode 10.0.0 and 11.0.0 | ||||
Change in number of characters in each category: | ||||
| Code points that changed derived property value: 1 | ||||
| PVALID changed from 122411 to 122734 (+323) | ||||
| UNASSIGNED changed from 837775 to 837091 (-684) | ||||
| CONTEXTJ did not change, at 2 | ||||
| CONTEXTO did not change, at 25 | ||||
DISALLOWED changed from 153899 to 154260 (+361) | ||||
| TOTAL did not change, at 1114112 | ||||
| Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF | ||||
| had their General Properties changed from Lo to Ll, to reflect | had their General Properties changed from Lo to Ll, to reflect | |||
| their status as the lowercase of new Georgian case pairs. Case | their status as the lowercase of new Georgian case pairs. Case | |||
| mappings were also added. | mappings were also added. | |||
| o SHARADA SANDHI MARK (U+111C9 ) was changed from Po to Mn, and from | SHARADA SANDHI MARK (U+111C9) was changed from Po to Mn, and from | |||
| bc=L to bc=NSM. | bc=L to bc=NSM. | |||
| o The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | |||
| ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to | ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to | |||
| Mn. | Mn. | |||
| o SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N. | SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N. | |||
| These changes to the Unicode Standard have the following implications | These changes to the Unicode Standard have the following implications | |||
| for these code points: | for these code points: | |||
| o The newly assigned 684 characters are to have a derived property | o The newly assigned 684 characters are assigned a derived property | |||
| value as a result of applying the IDNA2008 algorithm. | value as a result of applying the IDNA2008 algorithm. | |||
| o The Georgian letters in the ranges U+10D0..U+10FA and | o The Georgian letters in the ranges U+10D0..U+10FA and | |||
| U+10FD..U+10FF existed before IDNA2008 was created. Applying the | U+10FD..U+10FF existed before IDNA2008 was created. Applying the | |||
| IDNA2008 algorithm to the code points assigned the derived | IDNA2008 algorithm to the code points assigned the derived | |||
| property value PVALID, and that value is unchanged even if the | property value PVALID, and that value is unchanged even if the | |||
| underlying Unicode properties have changed. | underlying Unicode properties have changed. The newly encoded | |||
| Mtavruli letters have general category "Lu" and are therefore | ||||
| DISALLOWED. | ||||
| o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0 | o The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0 | |||
| [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code | [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code | |||
| point assigned the derived property value DISALLOWED. The changes | point assigned the derived property value DISALLOWED. The changes | |||
| in the underlying properties in the Unicode Standard Version | in the underlying properties in the Unicode Standard Version | |||
| 11.0.0 [Unicode-11.0.0] caused the derived property value to | 11.0.0 [Unicode-11.0.0] caused the derived property value to | |||
| change to PVALID, which is an acceptable change. | change to PVALID. | |||
| o The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | o The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and | |||
| ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode | ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode | |||
| 10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the | 10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the | |||
| code points assigned the derived property value PVALID, and that | code points assigned the derived property value PVALID, and that | |||
| value is unchanged even if the underlying Unicode properties have | value is unchanged even if the underlying Unicode properties have | |||
| changed. | changed. | |||
| o SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was | o SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was | |||
| created. Applying the IDNA2008 algorithm to the code point | created. Applying the IDNA2008 algorithm to the code point | |||
| assigned the derived property value PVALID, and that value is | assigned the derived property value DISALLOWED, and that value is | |||
| unchanged even if the underlying Unicode properties have changed. | unchanged even if the underlying Unicode properties have changed. | |||
| 5. Conclusion | 5. U+111C9 SHARADA SANDHI MARK | |||
| As described in Section 4, changes have been made to Unicode between | As one can see in Section 4, there is one incompatible change made | |||
| version 6.3.0 and 11.0.0. Some changes to specific characters | between Unicode 6.2.0 and 11.0.0, the code point U+111C9. It has | |||
| changed their derived property value, while other changes did not. | changed derived property value from DISALLOWED to PVALID. In | |||
| Given the diverse deployment described in Section 3.2 and the changes | situations like these, IDNA2008 allows for the addition of rules to RFC | |||
5892 [RFC5892] Section 2.7 (BackwardCompatible (G)). The code | ||||
point, if accepted, might be rejected by implementations of IDNA2008 | ||||
based on versions of Unicode older than 11.0.0. As the | ||||
character is rarely used outside the group of Sharada specialists, | ||||
and used in some records for indicating sandhi breaks, the conclusion | ||||
is that it could either be added as an exception or be allowed to | ||||
change property value, as the use of the code point is limited outside | ||||
a special community. As including an exception would require changes | ||||
in deployed implementations of IDNA2008, the editor proposes | ||||
that such a BackwardCompatible rule NOT be added to IDNA2008. | ||||
| 6. Conclusion | ||||
| As described in Section 4 and Section 5, changes have been made to | ||||
| Unicode between version 6.2.0 and 11.0.0. Some changes to specific | ||||
| characters changed their derived property value, while other changes | ||||
| did not. Given what is described in Section 3.3 and the changes | ||||
| described, including implications to normalization, the conclusion of | described, including implications to normalization, the conclusion of | |||
| this document is to not add any exception rules to IDNA2008. | this document is to not add any exception rules to IDNA2008. | |||
| To increase overall harmonization in the use of IDNs, this document | This does not preclude any such updates to RFC 5892 [RFC5892] or any | |||
| recommends that the derived property values MUST be calculated as | other IDNA2008 related document in the future when new versions of | |||
| specified in the documents listed in Section 3.1 and with the | the Unicode Standard are released, and it might also happen that it is | |||
| code points in Unicode Version 11.0.0 [Unicode-11.0.0]. | found that the algorithm specified in IDNA2008 is not suitable for DNS | |||
| without additional rules, categories, or tuning. | ||||
| All DNS registries (and other organizations) SHOULD calculate a | ||||
| repertoire using the conservatism and inclusion principles, as | ||||
| described in SAC-084 [SAC-084] and similar documents. | ||||
| 6. IANA Considerations | 7. IANA Considerations | |||
| IANA is requested to update the IDNA Parameters registry of derived | IANA is requested to update the IDNA Parameters registry of derived | |||
| property values, after the expert reviewer validates that the derived | property values, after the expert reviewer validates that the derived | |||
| property values are calculated correctly. | property values are calculated correctly. | |||
| 7. Security Considerations | 8. Security Considerations | |||
| This document makes recommendations regarding the use of the IDNA2008 | This document makes recommendations regarding the use of the IDNA2008 | |||
| algorithm for calculation of derived property values, based on the | algorithm for calculation of derived property values, based on the | |||
| current Unicode version. It also recommends that DNS registries (and | Unicode version 11.0.0. This recommendation does not say anything | |||
| others dealing with Internationalized Domain Names) explicitly select | about what recommendations to make for future versions of the Unicode | |||
| appropriate subsets of characters with the derived value of PVALID. | Standard. | |||
| Not following these recommendations can lead to various security | Not following these recommendations can lead to various security | |||
| issues. Specifically, allowing confusable characters may lead to | issues. Specifically, allowing confusable characters may lead to | |||
| various phishing attacks, as described in the Security Considerations | various phishing attacks, as described in the Security Considerations | |||
| sections of the documents listed in Section 3.1. | sections of the documents listed in Section 3.1. | |||
| 8. Acknowledgements | 9. Acknowledgements | |||
| Thanks to Martin Duerst, Asmus Freytag, Ted Hardie, John Klensin, | Thanks to Harald Alvestrand, Marc Blanchet, Martin Duerst, Asmus | |||
| Erik Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | Freytag, Ted Hardie, John Klensin, Erik Nordmark, Pete Resnick, Peter | |||
| Saint-Andre, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | ||||
| input to this document. | input to this document. | |||
| 9. References | 10. References | |||
| 9.1. Normative References | ||||
| 10.1. Normative References | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | |||
| Profile for Internationalized Domain Names (IDN)", | Profile for Internationalized Domain Names (IDN)", | |||
| RFC 3491, DOI 10.17487/RFC3491, March 2003, | RFC 3491, DOI 10.17487/RFC3491, March 2003, | |||
| <https://www.rfc-editor.org/info/rfc3491>. | <https://www.rfc-editor.org/info/rfc3491>. | |||
| skipping to change at page 9, line 45 ¶ | skipping to change at page 12, line 9 ¶ | |||
| [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code | [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code | |||
| Points and Internationalized Domain Names for Applications | Points and Internationalized Domain Names for Applications | |||
| (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, | (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, | |||
| November 2011, <https://www.rfc-editor.org/info/rfc6452>. | November 2011, <https://www.rfc-editor.org/info/rfc6452>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| 9.2. Non-normative references | 10.2. Non-normative references | |||
| [Changes-11.0.0] | [Changes-11.0.0] | |||
| The Unicode Consortium, "Unicode Standard Annex #44", | The Unicode Consortium, "Unicode Standard Annex #44", | |||
| Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, | Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, | |||
| Change History https://www.unicode.org/reports/tr44/ | Change History https://www.unicode.org/reports/tr44/ | |||
| tr44-21d4.html#Change_History, May 2018. | tr44-21d4.html#Change_History, May 2018. | |||
| [I-D.freytag-troublesome-characters] | [I-D.freytag-troublesome-characters] | |||
| Freytag, A., Klensin, J., and A. Sullivan, "Those | Freytag, A., Klensin, J., and A. Sullivan, "Those | |||
| Troublesome Characters: A Registry of Unicode Code Points | Troublesome Characters: A Registry of Unicode Code Points | |||
| Needing Special Consideration When Used in Network | Needing Special Consideration When Used in Network | |||
| Identifiers", draft-freytag-troublesome-characters-02 | Identifiers", draft-freytag-troublesome-characters-02 | |||
| (work in progress), June 2018. | (work in progress), June 2018. | |||
| [I-D.klensin-idna-5892upd-unicode70] | ||||
| Klensin, J. and P. Faltstrom, "IDNA Update for Unicode 7.0 | ||||
| and Later Versions", draft-klensin-idna-5892upd- | ||||
| unicode70-05 (work in progress), October 2017. | ||||
| [IAB] Internet Architecture Board, "IAB Statement on Identifiers | [IAB] Internet Architecture Board, "IAB Statement on Identifiers | |||
| and Unicode 7.0.0", IAB Statement on Identifiers and | and Unicode 7.0.0", IAB Statement on Identifiers and | |||
| Unicode 7.0.0 | Unicode 7.0.0 | |||
| https://www.iab.org/documents/correspondence-reports- | https://www.iab.org/documents/correspondence-reports- | |||
| documents/2015-2/iab-statement-on-identifiers-and-unicode- | documents/2015-2/iab-statement-on-identifiers-and-unicode- | |||
| 7-0-0/, January 2015. | 7-0-0/, January 2015. | |||
| [N4330] Pandey, A., "Proposal to Encode the SANDHI MARK for | ||||
| Sharada", Proposal to Encode the SANDHI MARK for | ||||
| Sharada https://www.unicode.org/L2/L2012/12322-n4330- | ||||
| sharada-sandhi-mark.pdf, September 2012. | ||||
| [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | ||||
| Internationalized Strings ("stringprep")", RFC 3454, | ||||
| DOI 10.17487/RFC3454, December 2002, | ||||
| <https://www.rfc-editor.org/info/rfc3454>. | ||||
| [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, DOI 10.17487/RFC3490, March 2003, | RFC 3490, DOI 10.17487/RFC3490, March 2003, | |||
| <https://www.rfc-editor.org/info/rfc3490>. | <https://www.rfc-editor.org/info/rfc3490>. | |||
| [RFC5894] Klensin, J., "Internationalized Domain Names for | [RFC5894] Klensin, J., "Internationalized Domain Names for | |||
| Applications (IDNA): Background, Explanation, and | Applications (IDNA): Background, Explanation, and | |||
| Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, | Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, | |||
| <https://www.rfc-editor.org/info/rfc5894>. | <https://www.rfc-editor.org/info/rfc5894>. | |||
| End of changes. 57 change blocks. | ||||
| 187 lines changed or deleted | 299 lines changed or added | |||
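
The per-category counts given in Sections 4.2 and 4.3 of the newer revision can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of either draft: the numbers are transcribed from the tables above, while the names `COUNTS`, `CATEGORIES`, and `deltas` are assumptions introduced only for illustration. It checks that each column sums to the full code space of 1114112 code points and that the stated deltas cancel out.

```python
# Sanity check of the category counts quoted in Sections 4.2 and 4.3.
# The figures are copied from the tables; the helper names are hypothetical.

TOTAL_CODE_POINTS = 0x110000  # 1114112, the size of the Unicode code space

CATEGORIES = ("PVALID", "UNASSIGNED", "CONTEXTJ", "CONTEXTO", "DISALLOWED")

COUNTS = {
    "7.0.0": {"PVALID": 99867, "UNASSIGNED": 861509,
              "CONTEXTJ": 2, "CONTEXTO": 25, "DISALLOWED": 152709},
    "10.0.0": {"PVALID": 122411, "UNASSIGNED": 837775,
               "CONTEXTJ": 2, "CONTEXTO": 25, "DISALLOWED": 153899},
    "11.0.0": {"PVALID": 122734, "UNASSIGNED": 837091,
               "CONTEXTJ": 2, "CONTEXTO": 25, "DISALLOWED": 154260},
}


def deltas(old, new):
    """Return the per-category change in count between two versions."""
    return {cat: COUNTS[new][cat] - COUNTS[old][cat] for cat in CATEGORIES}


if __name__ == "__main__":
    for version, table in COUNTS.items():
        # Every code point falls in exactly one category.
        assert sum(table.values()) == TOTAL_CODE_POINTS, version
    for old, new in (("7.0.0", "10.0.0"), ("10.0.0", "11.0.0")):
        change = deltas(old, new)
        # Code points only move between categories, so the deltas sum to zero.
        assert sum(change.values()) == 0
        print(old, "->", new, change)
```

The drop in UNASSIGNED between 10.0.0 and 11.0.0 (684) matches the number of newly assigned characters mentioned in Section 4.3, split between PVALID (+323) and DISALLOWED (+361).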