< draft-faltstrom-unicode11-02.txt   draft-faltstrom-unicode11-03.txt >
Network Working Group P. Faltstrom Network Working Group P. Faltstrom
Internet-Draft Netnod Internet-Draft Netnod
Intended status: Informational September 25, 2018 Intended status: Informational October 02, 2018
Expires: March 29, 2019 Expires: April 5, 2019
IDNA2008 and Unicode 11.0.0 IDNA2008 and Unicode 11.0.0
draft-faltstrom-unicode11-02 draft-faltstrom-unicode11-03
Abstract Abstract
This document describes changes between Unicode 6.3.0 and Unicode This document describes changes between Unicode 6.3.0 and Unicode
11.0.0 in the context of IDNA2008. It further suggests for the IETF 11.0.0 in the context of IDNA2008. It further suggests for the IETF
a path forward regarding ensuring IDNA2008 follows the evolution of a path forward regarding ensuring IDNA2008 follows the evolution of
the Unicode Standard. the Unicode Standard.
In a few cases changes have been made in the Unicode Standard related In a few cases changes have been made in the Unicode Standard related
to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 29, 2019. This Internet-Draft will expire on April 5, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 24 skipping to change at page 2, line 24
3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 3 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 3
3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5
4. Notable changes between Unicode 6.3.0 and 11.0.0 . . . . . . 6 4. Notable changes between Unicode 6.3.0 and 11.0.0 . . . . . . 6
4.1. Changes to Unicode 7.0.0 . . . . . . . . . . . . . . . . 6 4.1. Changes to Unicode 7.0.0 . . . . . . . . . . . . . . . . 6
4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6
4.3. Changes to Unicode 11.0.0 . . . . . . . . . . . . . . . . 6 4.3. Changes to Unicode 11.0.0 . . . . . . . . . . . . . . . . 6
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 9
9.1. Normative References . . . . . . . . . . . . . . . . . . 8 9.1. Normative References . . . . . . . . . . . . . . . . . . 9
9.2. Non-normative references . . . . . . . . . . . . . . . . 9 9.2. Non-normative references . . . . . . . . . . . . . . . . 10
Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12 Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12
Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15 Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15
Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16 Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16
Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17 Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17
Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18 Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18
Appendix F. Code points in Unicode Character Database (UCD) Appendix F. Code points in Unicode Character Database (UCD)
format for Unicode 11.0.0 . . . . . . . . . . . . . 20 format for Unicode 11.0.0 . . . . . . . . . . . . . 20
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79
1. Introduction 1. Introduction
skipping to change at page 5, line 38 skipping to change at page 5, line 38
o A mix between IDNA2003 and IDNA2008 according to local o A mix between IDNA2003 and IDNA2008 according to local
interpretation of the Unicode Technical Standard #46 [UTS-46]. interpretation of the Unicode Technical Standard #46 [UTS-46].
The issue is further complicated by very diverse The issue is further complicated by very diverse
implementations of the requirements in RFC 5894 [RFC5894] that implementations of the requirements in RFC 5894 [RFC5894] that
registry operators, based on the IDNA2008 specification, create registry operators, based on the IDNA2008 specification, create
additional rules for what code points are allowed to be used for additional rules for what code points are allowed to be used for
registration. registration.
In practice, the Unicode Consortium set a maximum set of code points In practice, the Unicode Consortium creates a maximum set of code
by assigning code points in the Unicode Standard. The IDNA2008 rules points by assigning code points in the Unicode Standard. The
based on the Unicode Standard create a subset of these by assigning IDNA2008 rules based on the Unicode Standard create a subset of these
the PVALID derived property value to them. Registries (and others by assigning the PVALID derived property value to them. Registries
dealing with Internationalized Domain Names) are supposed to create (and others dealing with Internationalized Domain Names) are supposed
an even smaller subset that ultimately is the set of code points that to create an even smaller subset that ultimately is the set of code
can be used in a particular registry. points that can be used in a particular registry.
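
The layering described above can be sketched as nested sets. The following Python is a minimal illustration only: the names `assigned`, `pvalid`, `registry_repertoire` and `is_registrable` are hypothetical, and the handful of code points is hand-picked for the example rather than taken from the IDNA tables or from any registry policy.

```python
# Hypothetical, hand-picked code points; not real IDNA table data.
assigned = {0x0041, 0x0061, 0x0062, 0x00DF, 0x2665}  # assigned in the Unicode Standard
pvalid = {0x0061, 0x0062, 0x00DF}                    # assumed subset with derived value PVALID
registry_repertoire = {0x0061, 0x0062}               # assumed choice of one registry


def is_registrable(label, repertoire):
    """Inclusion-style check: a label is accepted only if every
    code point in it was explicitly included in the repertoire."""
    return all(ord(ch) in repertoire for ch in label)


# Each layer is a subset of the layer above it.
layers_are_nested = registry_repertoire <= pvalid <= assigned
```

Under these assumptions `is_registrable("ab", registry_repertoire)` is true, while a label containing U+00DF is rejected by this particular registry even though the code point is in the assumed PVALID set.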
There is a further recommendation to be conservative when these subsets There is a further recommendation to be conservative when these subsets
are calculated and to use the inclusion principle; this is explained are calculated and to use the inclusion principle; this is explained
in SAC-084 [SAC-084] and RFC 6912 [RFC6912]. in SAC-084 [SAC-084] and RFC 6912 [RFC6912].
The complicated situation with deployment of IDNA2008 is discussed The complicated situation with deployment of IDNA2008 is discussed
further in draft-klensin-idna-rfc5891bis further in draft-klensin-idna-rfc5891bis
[I-D.klensin-idna-rfc5891bis] and draft-freytag-troublesome- [I-D.klensin-idna-rfc5891bis] and draft-freytag-troublesome-
characters [I-D.freytag-troublesome-characters]. characters [I-D.freytag-troublesome-characters].
4. Notable changes between Unicode 6.3.0 and 11.0.0 4. Notable changes between Unicode 6.3.0 and 11.0.0
4.1. Changes to Unicode 7.0.0 4.1. Changes to Unicode 7.0.0
The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was
introduced in Unicode 7.0.0. This was discussed in the IETF introduced in Unicode 7.0.0. This was discussed in the IETF
extensively and IAB in their statement [IAB] requesting the IETF to extensively and by IAB in their statement [IAB] requesting the IETF
investigate the issue and specifically IAB stated: to investigate the issue. Specifically IAB stated:
On the same precautionary principle, the IAB recommends that the On the same precautionary principle, the IAB recommends that the
Internationalized Domain Names for Applications (IDNA) Parameters Internationalized Domain Names for Applications (IDNA) Parameters
registry (http://www.iana.org/assignments/idna-tables/) not be registry (http://www.iana.org/assignments/idna-tables/) not be
updated to Unicode 7.0.0 until the IETF has consensus on a updated to Unicode 7.0.0 until the IETF has consensus on a
solution to this problem. solution to this problem.
The discussion in the IETF concluded that although it is possible to The discussion in the IETF concluded that although it is possible to
create "the same" character in multiple ways, the issue with U+08A1 create "the same" character in multiple ways, the issue with U+08A1
is not unique. In the case of U+08A1, it can be represented with the is not unique. In the case of U+08A1, it can be represented with the
skipping to change at page 6, line 43 skipping to change at page 6, line 43
As U+08A1 is discussed in draft-freytag-troublesome-characters As U+08A1 is discussed in draft-freytag-troublesome-characters
[I-D.freytag-troublesome-characters] and elsewhere. Regardless of [I-D.freytag-troublesome-characters] and elsewhere. Regardless of
whether those discussions end in recommending including the code whether those discussions end in recommending including the code
point in the repertoire of characters permissible for registration or point in the repertoire of characters permissible for registration or
not, it is acceptable to allow the code point to have a derived not, it is acceptable to allow the code point to have a derived
property value of PVALID. property value of PVALID.
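
The point that "the same" character can be created in multiple ways can be checked with a short Python sketch. The passage naming the alternative representation is elided in this diff, so the sequence used here (U+0628 ARABIC LETTER BEH followed by U+0654 ARABIC HAMZA ABOVE) is an assumption, and `nfc_equal` is a hypothetical helper name.

```python
import unicodedata

composed = "\u08A1"        # ARABIC LETTER BEH WITH HAMZA ABOVE
sequence = "\u0628\u0654"  # assumed alternative: BEH + combining HAMZA ABOVE


def nfc_equal(a, b):
    """Return True if two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# For contrast: a pair that normalization does unify.
e_acute_pair = ("\u00E9", "e\u0301")
```

Here `nfc_equal(composed, sequence)` is false, because U+08A1 has no canonical decomposition, whereas `nfc_equal(*e_acute_pair)` is true. Normalization alone therefore does not make the two Arabic forms compare equal.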
4.2. Changes between Unicode 7.0.0 and 10.0.0 4.2. Changes between Unicode 7.0.0 and 10.0.0
There are no changes made to Unicode between version 7.0.0 and 10.0.0 There are no changes made to Unicode between version 7.0.0 and 10.0.0
that impacts IDNA2008 calculation of the derived property value. that impact IDNA2008 calculation of the derived property value.
4.3. Changes to Unicode 11.0.0 4.3. Changes to Unicode 11.0.0
The Unicode Standard Version 11.0.0 [Unicode-11.0.0] have included a The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a
number of changes [Changes-11.0.0] from version 10.0.0, specifically number of changes [Changes-11.0.0] from version 10.0.0, specifically
to UnicodeData.txt: to UnicodeData.txt:
o Entries were added for the 684 new characters, including letters, o Entries were added for the 684 new characters, including letters,
combining marks, digits, symbols, and punctuation marks. combining marks, digits, symbols, and punctuation marks.
o Georgian letters in the ranges U+10D0..U+10FA, U+10FD..U+10FF were o Georgian letters in the ranges U+10D0..U+10FA, U+10FD..U+10FF were
changed from Lo to Ll, to reflect their status as the lowercase of changed from Lo to Ll, to reflect their status as the lowercase of
new Georgian case pairs. Case mappings were also added. new Georgian case pairs. Case mappings were also added.
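
The Georgian change can be observed from a Unicode character database. The sketch below uses Python's standard `unicodedata` module and is only illustrative: which categories it reports depends on the Unicode version bundled with the interpreter, and `georgian_categories` is a hypothetical helper name.

```python
import unicodedata

# Ranges taken from the text above.
GEORGIAN_RANGES = [(0x10D0, 0x10FA), (0x10FD, 0x10FF)]


def georgian_categories():
    """Collect the General_Category values found in the listed ranges."""
    return {
        unicodedata.category(chr(cp))
        for first, last in GEORGIAN_RANGES
        for cp in range(first, last + 1)
    }


# Unicode version of the data bundled with this interpreter, e.g. (11, 0, 0).
ucd_version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
```

With data for Unicode 11.0.0 or later, `georgian_categories()` should contain only `Ll`; with older data the same code points do not report `Ll`.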
skipping to change at page 8, line 15 skipping to change at page 8, line 15
5. Conclusion 5. Conclusion
As described in Section 4 changes have been made to Unicode between As described in Section 4 changes have been made to Unicode between
version 6.3.0 and 11.0.0. Some changes to specific characters version 6.3.0 and 11.0.0. Some changes to specific characters
changed their derived property value. Others did not. Given the changed their derived property value. Others did not. Given the
diverse deployment described in Section 3.2 and the changes diverse deployment described in Section 3.2 and the changes
described, including implications to normalization, the conclusion is described, including implications to normalization, the conclusion is
to not add any exception rules to IDNA2008. to not add any exception rules to IDNA2008.
To increase overall harmonization in the use of internationalized To increase overall harmonization in the use of internationalized
domain names, the the author recommends that the derived property domain names, the author recommends that the derived property values
values MUST be calculated according to the IDNA2008 specification for MUST be calculated according to the IDNA2008 specification for
Unicode Version 11.0.0 [Unicode-11.0.0]. Unicode Version 11.0.0 [Unicode-11.0.0].
All registries (and others) SHOULD calculate a repertoir, for example All registries (and others) SHOULD calculate a repertoire, for
as explained in draft-freytag-troublesome-characters example as explained in draft-freytag-troublesome-characters
[I-D.freytag-troublesome-characters] and draft-klensin-idna- [I-D.freytag-troublesome-characters] and draft-klensin-idna-
rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and
inclusion principles as laid out in SAC-084 [SAC-084]. inclusion principles as laid out in SAC-084 [SAC-084].
6. IANA Considerations 6. IANA Considerations
IANA is requested to update the registry of derived property values IANA is requested to update the registry of derived property values
after validation with the Appointed Expert that the derived property after validation with the Appointed Expert that the derived property
values are calculated correctly. values are calculated correctly.
7. Security Considerations 7. Security Considerations
Not following the recommendations regarding use of the IDNA2008 This document makes recommendations regarding the use of the IDNA2008
algorithm for calculation of the derived property value and/or algorithm for calculation of derived property values, based on the
explicitly deciding what subset of the by IDNA2008 algorith applied current Unicode version. It also recommends that registries (and
to current Unicode version should be permissable can lead to various others dealing with Internationalized Domain Names) explicitly select
security issues related to specifically confusability, and that way appropriate subsets of characters with the derived value of PVALID.
Not following these recommendations can lead to various security
issues. Specifically, allowing confusable characters may lead to
various phishing attacks. various phishing attacks.
8. Acknowledgements 8. Acknowledgements
Thanks to John Klensin, Asmus Freytag, Andrew Sullivan, Ted Hardie, Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin,
Suzanne Woolf and Michel Suignard for input to this document. Michel Suignard, Andrew Sullivan and Suzanne Woolf for input to this
document.
9. References 9. References
9.1. Normative References 9.1. Normative References
[IAB] Internet Architecture Board, "IAB Statement on Identifiers [IAB] Internet Architecture Board, "IAB Statement on Identifiers
and Unicode 7.0.0", IAB Statement on Identifiers and and Unicode 7.0.0", IAB Statement on Identifiers and
Unicode 7.0.0 Unicode 7.0.0
https://www.iab.org/documents/correspondence-reports- https://www.iab.org/documents/correspondence-reports-
documents/2015-2/iab-statement-on-identifiers-and-unicode- documents/2015-2/iab-statement-on-identifiers-and-unicode-
 End of changes. 12 change blocks. 
29 lines changed or deleted 33 lines changed or added
