| < draft-faltstrom-unicode11-04.txt | draft-faltstrom-unicode11-05.txt > | |||
|---|---|---|---|---|
| Network Working Group P. Faltstrom | Network Working Group P. Faltstrom | |||
| Internet-Draft Netnod | Internet-Draft Netnod | |||
| Intended status: Informational October 07, 2018 | Intended status: Informational December 01, 2018 | |||
| Expires: April 10, 2019 | Expires: June 4, 2019 | |||
| IDNA2008 and Unicode 11.0.0 | IDNA2008 and Unicode 11.0.0 | |||
| draft-faltstrom-unicode11-04 | draft-faltstrom-unicode11-05 | |||
| Abstract | Abstract | |||
| This document describes changes between Unicode 6.3.0 and Unicode | This document describes changes between Unicode 6.3.0 and Unicode | |||
| 11.0.0 in the context of IDNA2008. It further suggests for the IETF | 11.0.0 in the context of IDNA2008. It further suggests for the IETF | |||
| a path forward regarding ensuring IDNA2008 follows the evolution of | a path forward regarding ensuring IDNA2008 follows the evolution of | |||
| the Unicode Standard. | the Unicode Standard. | |||
| In a few cases changes have been made in the Unicode Standard related | In a few cases changes have been made in the Unicode Standard related | |||
| to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to | to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to | |||
| skipping to change at page 1, line 44 ¶ | skipping to change at page 1, line 44 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 10, 2019. | This Internet-Draft will expire on June 4, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 3, line 12 ¶ | skipping to change at page 3, line 12 ¶ | |||
| series and elsewhere as "IDNA2008" and is specified in a series of | series and elsewhere as "IDNA2008" and is specified in a series of | |||
| documents (see Section 3.1). The standard includes an | documents (see Section 3.1). The standard includes an | |||
| algorithm by which a derived property value is calculated based on | algorithm by which a derived property value is calculated based on | |||
| the properties defined in the Unicode Standard. | the properties defined in the Unicode Standard. | |||
| When the Unicode Standard is updated code points are assigned and | When the Unicode Standard is updated code points are assigned and | |||
| property values might be changed for already assigned code points. | property values might be changed for already assigned code points. | |||
| Assigning code points might create problems if the newly assigned | Assigning code points might create problems if the newly assigned | |||
| code points are compositions of code points so that it either changes | code points are compositions of code points so that it either changes | |||
| or would have changed the normalization functions. This because it | or would have changed the normalization functions. This is because | |||
| changes the matching algorithms used which in turn might create | it changes the matching algorithms used which in turn might create | |||
| problems looking up already stored strings in for example DNS. | problems looking up already stored strings in for example DNS. | |||
| Changing properties for already assigned code points might create | Changing properties for already assigned code points might create | |||
| problems if the change do result in the derived property value | problems if the change results in the derived property value changes. | |||
| changes. This might make an earlier allowed code point (derived | This might make an earlier allowed code point (derived property value | |||
| property value PVALID) not be allowed anymore (derived property value | PVALID) not be allowed anymore (derived property value DISALLOWED). | |||
| DISALLOWED). Or the other way around, a code point that was not | Or the other way around, a code point that was not allowed (and | |||
| allowed (and because of that blocked in some situations) suddenly ends | because of that blocked in some situations) suddenly ends up being | |||
| up being allowed. | allowed. | |||
| Historically the IETF has accepted all implications of changes in the | Historically the IETF has accepted all implications of changes in the | |||
| Unicode Standard even though the changes have resulted in problematic | Unicode Standard even though the changes have resulted in problematic | |||
| changes in the derived property value. The primary reason for that | changes in the derived property value. The primary reason for that | |||
| is that staying with the Unicode Standard has been viewed as | is that staying with the Unicode Standard has been viewed as | |||
| important given the diversity in implementations already existing in | important given the diversity in implementations already existing in | |||
| the wild. | the wild. | |||
| As described in Section 4, a few changes have been made regarding | As described in Section 4, a few changes have been made regarding | |||
| certain attributes to code points in Unicode between version 6.3.0 | certain attributes to code points in Unicode between version 6.3.0 | |||
| skipping to change at page 4, line 8 ¶ | skipping to change at page 4, line 8 ¶ | |||
| Specifically, the Internet Architecture Board did issue a statement | Specifically, the Internet Architecture Board did issue a statement | |||
| [IAB] which requested IETF to resolve the issues related to the code | [IAB] which requested IETF to resolve the issues related to the code | |||
| point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in | point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in | |||
| Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and | Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and | |||
| suggests IDNA2008 standard is to follow the Unicode Standard and not | suggests IDNA2008 standard is to follow the Unicode Standard and not | |||
| update RFC 5892 [RFC5892] or any other IDNA2008 RFCs. | update RFC 5892 [RFC5892] or any other IDNA2008 RFCs. | |||
| 2. Keywords for Requirement Levels | 2. Keywords for Requirement Levels | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| document are to be interpreted as described in RFC 2119 [RFC2119]. | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they | ||||
| appear in all capitals, as shown here. | ||||
| 3. Background | 3. Background | |||
| 3.1. IDNA2008 Documents | 3.1. IDNA2008 Documents | |||
| IDNA2008 consists of the following documents: | IDNA2008 consists of the following documents: | |||
| o A document, RFC 5890 [RFC5890], containing definitions and other | o A document, RFC 5890 [RFC5890], containing definitions and other | |||
| material that are needed for understanding other documents in the | material that are needed for understanding other documents in the | |||
| set. It is referred to informally in other documents in the set | set. It is referred to informally in other documents in the set | |||
| skipping to change at page 5, line 11 ¶ | skipping to change at page 5, line 15 ¶ | |||
| o A document, RFC 5895 [RFC5895], that discusses the issue of | o A document, RFC 5895 [RFC5895], that discusses the issue of | |||
| mapping characters into other characters and that provides | mapping characters into other characters and that provides | |||
| guidance for doing so when that is appropriate. That document, | guidance for doing so when that is appropriate. That document, | |||
| referred to informally as "Mapping", provides advice; it is not a | referred to informally as "Mapping", provides advice; it is not a | |||
| required part of IDNA. | required part of IDNA. | |||
| o A document, RFC 6452 [RFC6452], that looks at some changes made to | o A document, RFC 6452 [RFC6452], that looks at some changes made to | |||
| Unicode 6.0.0 [Unicode-6.0.0] that resulted in the derived | Unicode 6.0.0 [Unicode-6.0.0] that resulted in the derived | |||
| property value change for the code points U+0CF1, U+0CF2 and | property value change for the code points U+0CF1, U+0CF2 and | |||
| U+19DA. The first two changed from DISALLOWED to PVALID, the last | U+19DA. The first two changed from DISALLOWED to PVALID, the last | |||
| from PVALID to DISALLOWED. IETF came to the conclusion the | from PVALID to DISALLOWED. IETF came to the conclusion that no | |||
| changes where acceptable and RFC 5892 [RFC5892] was not updated to | update is needed to RFC 5892 [RFC5892] based on the changes made | |||
| make the derived property value not change for these code points. | in Unicode 6.0.0 [Unicode-6.0.0]. As a result, the derived | |||
| property value remained aligned with the Unicode Standard. | ||||
| 3.2. Deployment | 3.2. Deployment | |||
| The deployment of IDNA2008 is unfortunately quite diverse. The | The deployment of IDNA2008 is unfortunately quite diverse. The | |||
| following lists some of the strategies that existing implementations | following lists some of the strategies that existing implementations | |||
| are known to implement: | are known to implement: | |||
| o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] | o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] | |||
| which implies using a table within which it is said whether code | which implies using a table within which it is said whether code | |||
| points are allowed to be used or not, and this after doing the in | points are allowed to be used or not, after doing the | |||
| IDNA2003 included normalization. | normalization specified in IDNA2003. | |||
| o A mix between IDNA2003 and IDNA2008 where code points assigned to | o A mix between IDNA2003 and IDNA2008 where code points assigned to | |||
| Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property | |||
| value calculated according to the algorithm specified in IDNA2008. | value calculated according to the algorithm specified in IDNA2008. | |||
| o Strict IDNA2008 following IANA which implies stayed at Unicode | o Strict IDNA2008 following IANA which implies staying at Unicode | |||
| 6.3.0 [Unicode-6.3.0] and treating later assigned code points as | 6.3.0 [Unicode-6.3.0] and treating later assigned code points as | |||
| UNASSIGNED. | UNASSIGNED. | |||
| o The IDNA2008 algorithm applied to whatever version of Unicode | o The IDNA2008 algorithm applied to whatever version of Unicode | |||
| Standard exists in the operating system and/or libraries used, | Standard exists in the operating system and/or libraries used, | |||
| regardless of whether the version is later than Unicode version | regardless of whether the version is later than Unicode version | |||
| 6.3.0 or not. | 6.3.0 or not. | |||
| o A mix between IDNA2003 and IDNA2008 according to local | o A mix between IDNA2003 and IDNA2008 according to local | |||
| interpretation of the Unicode Technical Standard #46 [UTS-46]. | interpretation of the Unicode Technical Standard #46 [UTS-46]. | |||
| The issue is further complicated by having very diverse | The issue is further complicated by having very diverse | |||
| implementations of the requirements in RFC 5894 [RFC5894] that | implementations of the requirements in RFC 5894 [RFC5894] by registry | |||
| registry operators to based on the IDNA2008 specification create | operators based on the IDNA2008 specification to create additional | |||
| additional rules for what code points are allowed to be used for | rules for what code points are allowed to be used for registration. | |||
| registration. | ||||
| In practice, the Unicode Consortium creates a maximum set of code | In practice, the Unicode Consortium creates a maximum set of code | |||
| points by assigning code points in the Unicode Standard. The | points by assigning code points in the Unicode Standard. The | |||
| IDNA2008 rules based on the Unicode Standard create a subset of these | IDNA2008 rules based on the Unicode Standard create a subset of these | |||
| by assigning the PVALID derived property value to them. Registries | by assigning the PVALID derived property value to them. Registries | |||
| (and others dealing with Internationalized Domain Names) are supposed | (and others dealing with Internationalized Domain Names) are supposed | |||
| to create an even smaller subset that ultimately is the set of code | to create an even smaller subset that ultimately is the set of code | |||
| points that can be used in a particular registry. | points that can be used in a particular registry. | |||
| There is a further recommendation to be conservative when these subsets | There is a further recommendation to be conservative when these subsets | |||
| are calculated and to use the inclusion principle; this is explained | are calculated and to use the inclusion principle; this is explained | |||
| in SAC-084 [SAC-084] and RFC 6912 [RFC6912]. | in SAC-084 [SAC-084] and RFC 6912 [RFC6912]. | |||
| The complicated situation with deployment of IDNA2008 is discussed | ||||
| further in draft-klensin-idna-rfc5891bis | ||||
| [I-D.klensin-idna-rfc5891bis] and draft-freytag-troublesome- | ||||
| characters [I-D.freytag-troublesome-characters]. | ||||
| 4. Notable changes between Unicode 6.3.0 and 11.0.0 | 4. Notable changes between Unicode 6.3.0 and 11.0.0 | |||
| 4.1. Changes to Unicode 7.0.0 | 4.1. Changes to Unicode 7.0.0 | |||
| The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was | The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was | |||
| introduced in Unicode 7.0.0. This was discussed in the IETF | introduced in Unicode 7.0.0. This was discussed in the IETF | |||
| extensively and by IAB in their statement [IAB] requesting the IETF | extensively and by IAB in their statement [IAB] requesting the IETF | |||
| to investigate the issue. Specifically IAB stated: | to investigate the issue. Specifically IAB stated: | |||
| On the same precautionary principle, the IAB recommends that the | On the same precautionary principle, the IAB recommends that the | |||
| skipping to change at page 8, line 25 ¶ | skipping to change at page 8, line 25 ¶ | |||
| As described in Section 4 changes have been made to Unicode between | As described in Section 4 changes have been made to Unicode between | |||
| version 6.3.0 and 11.0.0. Some changes to specific characters | version 6.3.0 and 11.0.0. Some changes to specific characters | |||
| changed their derived property value. Others did not. Given the | changed their derived property value. Others did not. Given the | |||
| diverse deployment described in Section 3.2 and the changes | diverse deployment described in Section 3.2 and the changes | |||
| described, including implications to normalization, the conclusion is | described, including implications to normalization, the conclusion is | |||
| to not add any exception rules to IDNA2008. | to not add any exception rules to IDNA2008. | |||
| To increase overall harmonization in the use of internationalized | To increase overall harmonization in the use of internationalized | |||
| domain names, the author recommends that the derived property values | domain names, the author recommends that the derived property values | |||
| MUST be calculated according to the IDNA2008 specification for | MUST be calculated as specified in the documents listed in | |||
| Unicode Version 11.0.0 [Unicode-11.0.0]. | Section 3.1 also with code points in Unicode Version 11.0.0 | |||
| [Unicode-11.0.0]. | ||||
| All registries (and others) SHOULD calculate a repertoire, for | All registries (and others) SHOULD calculate a repertoire using the | |||
| example as explained in draft-freytag-troublesome-characters | conservatism and inclusion principles as laid out for example in | |||
| [I-D.freytag-troublesome-characters] and draft-klensin-idna- | SAC-084 [SAC-084]. | |||
| rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and | ||||
| inclusion principles as laid out in SAC-084 [SAC-084]. | ||||
| 6. IANA Considerations | 6. IANA Considerations | |||
| IANA is requested to update the registry of derived property values | IANA is requested to update the registry of derived property values | |||
| after validation with the Appointed Expert that the derived property | after validation with the Appointed Expert that the derived property | |||
| values are calculated correctly. | values are calculated correctly. | |||
| 7. Security Considerations | 7. Security Considerations | |||
| This document makes recommendations regarding the use of the IDNA2008 | This document makes recommendations regarding the use of the IDNA2008 | |||
| algorithm for calculation of derived property values, based on the | algorithm for calculation of derived property values, based on the | |||
| current Unicode version. It also recommends that registries (and | current Unicode version. It also recommends that registries (and | |||
| others dealing with Internationalized Domain Names) explicitly select | others dealing with Internationalized Domain Names) explicitly select | |||
| appropriate subsets of characters with the derived value of PVALID. | appropriate subsets of characters with the derived value of PVALID. | |||
| Not following these recommendations can lead to various security | Not following these recommendations can lead to various security | |||
| issues. Specifically, allowing confusable characters may lead to | issues. Specifically, allowing confusable characters may lead to | |||
| various phishing attacks. | various phishing attacks. See Security Consideration Sections in the | |||
| documents listed in Section 3.1. | ||||
| 8. Acknowledgements | 8. Acknowledgements | |||
| Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik | Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik | |||
| Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for | |||
| input to this document. | input to this document. | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| [IAB] Internet Architecture Board, "IAB Statement on Identifiers | ||||
| and Unicode 7.0.0", IAB Statement on Identifiers and | ||||
| Unicode 7.0.0 | ||||
| https://www.iab.org/documents/correspondence-reports- | ||||
| documents/2015-2/iab-statement-on-identifiers-and-unicode- | ||||
| 7-0-0/, January 2015. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | |||
| Profile for Internationalized Domain Names (IDN)", | Profile for Internationalized Domain Names (IDN)", | |||
| RFC 3491, DOI 10.17487/RFC3491, March 2003, | RFC 3491, DOI 10.17487/RFC3491, March 2003, | |||
| <https://www.rfc-editor.org/info/rfc3491>. | <https://www.rfc-editor.org/info/rfc3491>. | |||
| skipping to change at page 10, line 10 ¶ | skipping to change at page 9, line 50 ¶ | |||
| [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts | [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts | |||
| for Internationalized Domain Names for Applications | for Internationalized Domain Names for Applications | |||
| (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, | (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, | |||
| <https://www.rfc-editor.org/info/rfc5893>. | <https://www.rfc-editor.org/info/rfc5893>. | |||
| [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code | [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code | |||
| Points and Internationalized Domain Names for Applications | Points and Internationalized Domain Names for Applications | |||
| (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, | (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, | |||
| November 2011, <https://www.rfc-editor.org/info/rfc6452>. | November 2011, <https://www.rfc-editor.org/info/rfc6452>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
| 9.2. Non-normative references | 9.2. Non-normative references | |||
| [Changes-11.0.0] | [Changes-11.0.0] | |||
| The Unicode Consortium, "Unicode Standard Annex #44", | The Unicode Consortium, "Unicode Standard Annex #44", | |||
| Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, | Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, | |||
| Change History https://www.unicode.org/reports/tr44/ | Change History https://www.unicode.org/reports/tr44/ | |||
| tr44-21d4.html#Change_History, May 2018. | tr44-21d4.html#Change_History, May 2018. | |||
| [I-D.freytag-troublesome-characters] | [I-D.freytag-troublesome-characters] | |||
| Freytag, A., Klensin, J., and A. Sullivan, "Those | Freytag, A., Klensin, J., and A. Sullivan, "Those | |||
| Troublesome Characters: A Registry of Unicode Code Points | Troublesome Characters: A Registry of Unicode Code Points | |||
| Needing Special Consideration When Used in Network | Needing Special Consideration When Used in Network | |||
| Identifiers", draft-freytag-troublesome-characters-01 | Identifiers", draft-freytag-troublesome-characters-02 | |||
| (work in progress), June 2017. | (work in progress), June 2018. | |||
| [I-D.klensin-idna-rfc5891bis] | [IAB] Internet Architecture Board, "IAB Statement on Identifiers | |||
| Klensin, J. and A. Freytag, "Internationalized Domain | and Unicode 7.0.0", IAB Statement on Identifiers and | |||
| Names in Applications (IDNA): Registry Restrictions and | Unicode 7.0.0 | |||
| Recommendations", draft-klensin-idna-rfc5891bis-01 (work | https://www.iab.org/documents/correspondence-reports- | |||
| in progress), September 2017. | documents/2015-2/iab-statement-on-identifiers-and-unicode- | |||
| 7-0-0/, January 2015. | ||||
| [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, DOI 10.17487/RFC3490, March 2003, | RFC 3490, DOI 10.17487/RFC3490, March 2003, | |||
| <https://www.rfc-editor.org/info/rfc3490>. | <https://www.rfc-editor.org/info/rfc3490>. | |||
| [RFC5894] Klensin, J., "Internationalized Domain Names for | [RFC5894] Klensin, J., "Internationalized Domain Names for | |||
| Applications (IDNA): Background, Explanation, and | Applications (IDNA): Background, Explanation, and | |||
| Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, | Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, | |||
| <https://www.rfc-editor.org/info/rfc5894>. | <https://www.rfc-editor.org/info/rfc5894>. | |||
| End of changes. 18 change blocks. | ||||
| 51 lines changed or deleted | 46 lines changed or added | |||
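
Section 3.2 of the draft describes three nested sets: the code points assigned by Unicode, the subset that IDNA2008 derives as PVALID, and the still smaller repertoire a registry chooses. A minimal Python sketch of that nesting follows. The table `DERIVED` is a hypothetical three-entry excerpt that mirrors the post-Unicode-6.0.0 values given in the RFC 6452 bullet (U+0CF1 and U+0CF2 as PVALID, U+19DA as DISALLOWED); the function names are illustrative, and real label validation involves further rules (contextual rules, Bidi, and so on) that are not modelled here.

```python
# Hypothetical excerpt of a derived property table (assumption: a real
# implementation would load the full IANA table or compute it per RFC 5892).
DERIVED = {
    0x0CF1: "PVALID",      # changed from DISALLOWED with Unicode 6.0.0
    0x0CF2: "PVALID",      # changed from DISALLOWED with Unicode 6.0.0
    0x19DA: "DISALLOWED",  # changed from PVALID with Unicode 6.0.0
}


def pvalid_set(table):
    """Return the code points whose derived property value is PVALID."""
    return {cp for cp, value in table.items() if value == "PVALID"}


def registry_repertoire(table, registry_choice):
    """Inclusion principle: keep only code points that the registry has
    explicitly chosen AND that are PVALID in the derived table."""
    return pvalid_set(table) & set(registry_choice)


def label_in_repertoire(label, repertoire):
    """True if every code point of the label is in the repertoire."""
    return all(ord(ch) in repertoire for ch in label)


# Illustrative registry choice; U+19DA is requested but is not PVALID.
repertoire = registry_repertoire(DERIVED, {0x0CF1, 0x19DA})
```

Because the repertoire is built by intersection, a code point whose derived property value later changes to DISALLOWED drops out of the repertoire without any change to the registry's own list.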
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/
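
The draft's Introduction notes that newly assigned code points can cause matching problems when they interact with normalization, and Section 4.1 names ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) as the case the IAB raised. The sketch below, using Python's standard `unicodedata` module, illustrates one such problem: U+08A1 and the sequence U+0628 U+0654 remain distinct after NFC. The helper names are illustrative and not taken from the draft.

```python
import unicodedata

# Single code point introduced in Unicode 7.0.0.
BEH_WITH_HAMZA_ABOVE = "\u08A1"
# ARABIC LETTER BEH followed by combining ARABIC HAMZA ABOVE.
BEH_PLUS_HAMZA = "\u0628\u0654"


def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def same_after_nfc(a, b):
    """True if the two strings are identical once both are NFC-normalized."""
    return nfc(a) == nfc(b)


# For contrast: a precomposed Latin letter and its decomposed form do match.
E_ACUTE = "\u00E9"
E_PLUS_ACUTE = "e\u0301"
```

Here `same_after_nfc(E_ACUTE, E_PLUS_ACUTE)` is true, while `same_after_nfc(BEH_WITH_HAMZA_ABOVE, BEH_PLUS_HAMZA)` is false, so a lookup that relies on NFC alone treats the two Arabic forms as different strings.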