< draft-faltstrom-unicode11-01.txt   draft-faltstrom-unicode11-02.txt >
Network Working Group P. Faltstrom Network Working Group P. Faltstrom
Internet-Draft Netnod Internet-Draft Netnod
Intended status: Informational July 2, 2018 Intended status: Informational September 25, 2018
Expires: January 3, 2019 Expires: March 29, 2019
IDNA2008 and Unicode 11.0.0 IDNA2008 and Unicode 11.0.0
draft-faltstrom-unicode11-01 draft-faltstrom-unicode11-02
Abstract Abstract
This document describes changes between Unicode 6.3.0 and Unicode This document describes changes between Unicode 6.3.0 and Unicode
11.0.0 in the context of IDNA2008. It further suggests for the IETF 11.0.0 in the context of IDNA2008. It further suggests for the IETF
a path forward regarding ensuring IDNA2008 follows the evolution of a path forward regarding ensuring IDNA2008 follows the evolution of
the Unicode Standard. the Unicode Standard.
In a few cases changes have been made in the Unicode Standard related
to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to
add exceptions for backward compatibility to the algorithm, but the
conclusions provided in this document suggest no such changes.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 3, 2019. This Internet-Draft will expire on March 29, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 11 skipping to change at page 2, line 15
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 3 2. Keywords for Requirement Levels . . . . . . . . . . . . . . . 3
3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 3 3.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 3
3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 4 3.2. Deployment . . . . . . . . . . . . . . . . . . . . . . . 5
4. Notable changes between Unicode 6.3.0 and 11.0.0 . . . . . . 5 4. Notable changes between Unicode 6.3.0 and 11.0.0 . . . . . . 6
4.1. Changes to Unicode 7.0.0 . . . . . . . . . . . . . . . . 5 4.1. Changes to Unicode 7.0.0 . . . . . . . . . . . . . . . . 6
4.2. Changes to Unicode 11.0.0 . . . . . . . . . . . . . . . . 6 4.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 6
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.3. Changes to Unicode 11.0.0 . . . . . . . . . . . . . . . . 6
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8
7. Security Considerations . . . . . . . . . . . . . . . . . . . 7 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8
9.1. Normative References . . . . . . . . . . . . . . . . . . 8 9.1. Normative References . . . . . . . . . . . . . . . . . . 8
9.2. Non-normative references . . . . . . . . . . . . . . . . 9 9.2. Non-normative references . . . . . . . . . . . . . . . . 9
Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 11 Appendix A. Changes from Unicode 6.3.0 to Unicode 7.0.0 . . . . 12
Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 14 Appendix B. Changes from Unicode 7.0.0 to Unicode 8.0.0 . . . . 15
Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 15 Appendix C. Changes from Unicode 8.0.0 to Unicode 9.0.0 . . . . 16
Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 16 Appendix D. Changes from Unicode 9.0.0 to Unicode 10.0.0 . . . . 17
Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 17 Appendix E. Changes from Unicode 10.0.0 to Unicode 11.0.0 . . . 18
Appendix F. Code points in Unicode Character Database (UCD) Appendix F. Code points in Unicode Character Database (UCD)
format for Unicode 11.0.0 . . . . . . . . . . . . . 19 format for Unicode 11.0.0 . . . . . . . . . . . . . 20
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 78 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 79
1. Introduction 1. Introduction
The current version of Internationalized Domain Names for The current version of Internationalized Domain Names for
Applications (IDNA) was largely completed in 2008, known within the Applications (IDNA) was largely completed in 2008, known within the
series and elsewhere as "IDNA2008" and is specified in a series of series and elsewhere as "IDNA2008" and is specified in a series of
documents (see Section 3.1). The standard includes an documents (see Section 3.1). The standard includes an
algorithm by which a derived property value is calculated based on algorithm by which a derived property value is calculated based on
the properties defined in the Unicode Standard. the properties defined in the Unicode Standard.
skipping to change at page 3, line 5 skipping to change at page 3, line 11
Assigning code points might create problems if the newly assigned Assigning code points might create problems if the newly assigned
code points are compositions of code points so that it either changes code points are compositions of code points so that it either changes
or would have changed the normalization functions. This is because it or would have changed the normalization functions. This is because it
changes the matching algorithms used which in turn might create changes the matching algorithms used which in turn might create
problems looking up already stored strings in, for example, DNS. problems looking up already stored strings in, for example, DNS.
Changing properties to already assigned code points might create Changing properties to already assigned code points might create
problems if the change results in the derived property value problems if the change results in the derived property value
changing. This might make an earlier allowed code point (derived changing. This might make an earlier allowed code point (derived
property value PVALID) not be allowed anymore (derived property value property value PVALID) not be allowed anymore (derived property value
DISALLOWED). DISALLOWED). Or the other way around, a code point that was not
allowed (and because of that blocked in some situations) suddenly ends
up being allowed.
Historically the IETF has accepted all implications of changes in the Historically the IETF has accepted all implications of changes in the
Unicode Standard even though the changes have resulted in problematic Unicode Standard even though the changes have resulted in problematic
changes in the derived property value. The primary reason for that changes in the derived property value. The primary reason for that
is that staying with the Unicode Standard has been viewed as is that staying with the Unicode Standard has been viewed as
important given the diversity in implementations already existing in important given the diversity in implementations already existing in
the wild. the wild.
The Internet Architecture Board did issue a statement [IAB] which As described in Section 4, a few changes have been made regarding
requested IETF to resolve the issues related to the code point ARABIC certain attributes to code points in Unicode between version 6.3.0
LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in Unicode 7.0.0 and 11.0.0. Such changes could result in either a change in the
[Unicode-7.0.0]. This document resolves this issue and suggests derived property value for the code point in question or no such
IDNA2008 standard is to follow the Unicode Standard and not update change. In turn, if the result is a change, it can be between any of
RFC 5892 [RFC5892] or any other IDNA2008 RFCs. the derived property values except DISALLOWED. Also in this case,
when moving from version 6.3.0 to 11.0.0, this document concludes
that no exceptions are to be added to IDNA2008 even if changes in the
derived property value are a result of the changes made in Unicode.
Specifically, the Internet Architecture Board did issue a statement
[IAB] which requested IETF to resolve the issues related to the code
point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in
Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and
suggests IDNA2008 standard is to follow the Unicode Standard and not
update RFC 5892 [RFC5892] or any other IDNA2008 RFCs.
2. Keywords for Requirement Levels 2. Keywords for Requirement Levels
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
3. Background 3. Background
3.1. IDNA2008 Documents 3.1. IDNA2008 Documents
skipping to change at page 6, line 13 skipping to change at page 6, line 34
create "the same" character in multiple ways, the issue with U+08A1 create "the same" character in multiple ways, the issue with U+08A1
is not unique. In the case of U+08A1, it can be represented with the is not unique. In the case of U+08A1, it can be represented with the
sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654). sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654).
Just like LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) can be Just like LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) can be
represented via the sequence LATIN SMALL LETTER A (U+0061), and represented via the sequence LATIN SMALL LETTER A (U+0061), and
COMBINING DIAERESIS (U+0308). One difference between these sequences COMBINING DIAERESIS (U+0308). One difference between these sequences
is how they are treated in the normalization forms specified by the is how they are treated in the normalization forms specified by the
Unicode Consortium. Unicode Consortium.
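The normalization difference noted above can be observed with a short script. This is a minimal sketch, not part of either draft revision: it assumes Python's standard unicodedata module, whose results reflect the Unicode version bundled with the interpreter, and the helper names nfc and code_points are illustrative only.

```python
import unicodedata


def nfc(text):
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# LATIN SMALL LETTER A + COMBINING DIAERESIS composes under NFC.
latin = nfc("\u0061\u0308")

# ARABIC LETTER BEH + ARABIC HAMZA ABOVE does not compose to U+08A1,
# because U+08A1 has no canonical decomposition.
arabic = nfc("\u0628\u0654")

print(code_points(latin))   # U+00E4
print(code_points(arabic))  # U+0628 U+0654
```

Because the two Arabic representations stay distinct after normalization, a comparison that relies on NFC alone does not treat them as the same string.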
As U+08A1 is discussed in draft-freytag-troublesome-characters As U+08A1 is discussed in draft-freytag-troublesome-characters
[I-D.freytag-troublesome-characters] and elsewhere, regardless of [I-D.freytag-troublesome-characters] and elsewhere. Regardless of
whether that discussion ends in recommending including the code point whether those discussions end in recommending including the code
in the repertoire of characters permissible for registration or not, point in the repertoire of characters permissible for registration or
it is acceptable to allow the code point to have a derived property not, it is acceptable to allow the code point to have a derived
value of PVALID. property value of PVALID.
4.2. Changes to Unicode 11.0.0 4.2. Changes between Unicode 7.0.0 and 10.0.0
No changes made to Unicode between versions 7.0.0 and 10.0.0 impact
the IDNA2008 calculation of the derived property value.
4.3. Changes to Unicode 11.0.0
The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a The Unicode Standard Version 11.0.0 [Unicode-11.0.0] has included a
number of changes [Changes-11.0.0] from version 10.0.0, specifically number of changes [Changes-11.0.0] from version 10.0.0, specifically
to UnicodeData.txt: to UnicodeData.txt:
o Entries were added for the 684 new characters, including letters, o Entries were added for the 684 new characters, including letters,
combining marks, digits, symbols, and punctuation marks. combining marks, digits, symbols, and punctuation marks.
o Georgian letters in the ranges U+10D0..U+10FA, U+10FD..U+10FF were o Georgian letters in the ranges U+10D0..U+10FA, U+10FD..U+10FF were
changed from Lo to Ll, to reflect their status as the lowercase of changed from Lo to Ll, to reflect their status as the lowercase of
skipping to change at page 7, line 27 skipping to change at page 8, line 7
changed. changed.
o U+29A1 SPHERICAL ANGLE OPENING UP has existed since before o U+29A1 SPHERICAL ANGLE OPENING UP has existed since before
IDNA2008 was created. Applying the IDNA2008 algorithm to the code IDNA2008 was created. Applying the IDNA2008 algorithm to the code
point did assign the derived property value PVALID and that value point did assign the derived property value PVALID and that value
is unchanged even if the underlying Unicode properties have is unchanged even if the underlying Unicode properties have
changed. changed.
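The General_Category change for the Georgian letters listed above (Lo to Ll) can be checked against a local Unicode database. This is a minimal sketch under stated assumptions: it uses Python's standard unicodedata module, the helper name general_categories is hypothetical, and the value reported depends on the Unicode version bundled with the interpreter (exposed as unicodedata.unidata_version), so no fixed output is claimed for the Georgian code point.

```python
import unicodedata


def general_categories(code_points):
    """Map each code point to its General_Category in the bundled UCD."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp)) for cp in code_points}


# The category reported for U+10D0 GEORGIAN LETTER AN depends on the
# bundled Unicode version: Lo before 11.0.0, Ll from 11.0.0 onwards.
print("Unicode version:", unicodedata.unidata_version)
print(general_categories([0x10D0, 0x10FA, 0x10FD, 0x10FF]))
```

Only the General_Category is inspected here; the derived property value itself is computed by the full set of rules in RFC 5892, which this sketch does not attempt to reproduce.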
5. Conclusion 5. Conclusion
Given the changes laid out in Section 4 the derived property values As described in Section 4 changes have been made to Unicode between
MUST be calculated according to the IDNA2008 specification for version 6.3.0 and 11.0.0. Some changes to specific characters
Unicode Version 11.0.0 [Unicode-11.0.0]. The changes in code points, changed their derived property value. Others did not. Given the
implications to normalization and changes in derived property values diverse deployment described in Section 3.2 and the changes
are acceptable. described, including implications to normalization, the conclusion is
to not add any exception rules to IDNA2008.
All registries and others SHOULD calculate a repertoire as explained To increase overall harmonization in the use of internationalized
in draft-freytag-troublesome-characters domain names, the author recommends that the derived property
values MUST be calculated according to the IDNA2008 specification for
Unicode Version 11.0.0 [Unicode-11.0.0].
All registries (and others) SHOULD calculate a repertoire, for example
as explained in draft-freytag-troublesome-characters
[I-D.freytag-troublesome-characters] and draft-klensin-idna- [I-D.freytag-troublesome-characters] and draft-klensin-idna-
rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and
inclusive principles as laid out in SAC-084 [SAC-084]. inclusive principles as laid out in SAC-084 [SAC-084].
6. IANA Considerations 6. IANA Considerations
IANA is requested to update the registry of derived property values IANA is requested to update the registry of derived property values
after validation with the Appointed Expert that the derived values after validation with the Appointed Expert that the derived property
are calculated correctly. values are calculated correctly.
7. Security Considerations 7. Security Considerations
Not following the recommendations regarding explicitly deciding what Not following the recommendations regarding use of the IDNA2008
subset of the by IDNA2008 algorithm applied to current Unicode version algorithm for calculation of the derived property value and/or
should be permissible can lead to various security issues related to explicitly deciding what subset of the by IDNA2008 algorithm applied
specifically confusability, and that way various phishing attacks. to current Unicode version should be permissible can lead to various
security issues related to specifically confusability, and that way
various phishing attacks.
8. Acknowledgements 8. Acknowledgements
Thanks to John Klensin, Asmus Freytag, Andrew Sullivan and Michel Thanks to John Klensin, Asmus Freytag, Andrew Sullivan, Ted Hardie,
Suignard for input to this document. Suzanne Woolf and Michel Suignard for input to this document.
9. References 9. References
9.1. Normative References 9.1. Normative References
[IAB] Internet Architecture Board, "IAB Statement on Identifiers [IAB] Internet Architecture Board, "IAB Statement on Identifiers
and Unicode 7.0.0", IAB Statement on Identifiers and and Unicode 7.0.0", IAB Statement on Identifiers and
Unicode 7.0.0 Unicode 7.0.0
https://www.iab.org/documents/correspondence-reports- https://www.iab.org/documents/correspondence-reports-
documents/2015-2/iab-statement-on-identifiers-and-unicode- documents/2015-2/iab-statement-on-identifiers-and-unicode-
 End of changes. 16 change blocks. 
46 lines changed or deleted 77 lines changed or added
