< draft-faltstrom-unicode11-04.txt   draft-faltstrom-unicode11-05.txt >
Network Working Group P. Faltstrom Network Working Group P. Faltstrom
Internet-Draft Netnod Internet-Draft Netnod
Intended status: Informational October 07, 2018 Intended status: Informational December 01, 2018
Expires: April 10, 2019 Expires: June 4, 2019
IDNA2008 and Unicode 11.0.0 IDNA2008 and Unicode 11.0.0
draft-faltstrom-unicode11-04 draft-faltstrom-unicode11-05
Abstract Abstract
This document describes changes between Unicode 6.3.0 and Unicode This document describes changes between Unicode 6.3.0 and Unicode
11.0.0 in the context of IDNA2008. It further suggests for the IETF 11.0.0 in the context of IDNA2008. It further suggests for the IETF
a path forward regarding ensuring IDNA2008 follows the evolution of a path forward regarding ensuring IDNA2008 follows the evolution of
the Unicode Standard. the Unicode Standard.
In a few cases changes have been made in the Unicode Standard related In a few cases changes have been made in the Unicode Standard related
to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to to the algorithm IDNA2008 specifies. IDNA2008 does give the ability to
skipping to change at page 1, line 44 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 10, 2019. This Internet-Draft will expire on June 4, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 12 skipping to change at page 3, line 12
series and elsewhere as "IDNA2008" and is specified in a series of series and elsewhere as "IDNA2008" and is specified in a series of
documents (see Section 3.1). The standard includes an documents (see Section 3.1). The standard includes an
algorithm by which a derived property value is calculated based on algorithm by which a derived property value is calculated based on
the properties defined in the Unicode Standard. the properties defined in the Unicode Standard.
When the Unicode Standard is updated code points are assigned and When the Unicode Standard is updated code points are assigned and
property values might be changed for already assigned code points. property values might be changed for already assigned code points.
Assigning code points might create problems if the newly assigned Assigning code points might create problems if the newly assigned
code points are compositions of code points so that it either changes code points are compositions of code points so that it either changes
or would have changed the normalization functions. This because it or would have changed the normalization functions. This is because
changes the matching algorithms used which in turn might create it changes the matching algorithms used which in turn might create
problems looking up already stored strings in for example DNS. problems looking up already stored strings in for example DNS.
Changing properties for already assigned code points might create Changing properties for already assigned code points might create
problems if the change do result in the derived property value problems if the change results in the derived property value changes.
changes. This might make an earlier allowed code point (derived This might make an earlier allowed code point (derived property value
property value PVALID) not be allowed anymore (derived property value PVALID) not be allowed anymore (derived property value DISALLOWED).
DISALLOWED). Or the other way around, a code point that was not Or the other way around, a code point that was not allowed (and
allowed (and because of that blocked in some situations) suddenly ends because of that blocked in some situations) suddenly ends up being
up being allowed. allowed.
Historically the IETF has accepted all implications of changes in the Historically the IETF has accepted all implications of changes in the
Unicode Standard even though the changes have resulted in problematic Unicode Standard even though the changes have resulted in problematic
changes in the derived property value. The primary reason for that changes in the derived property value. The primary reason for that
is that staying with the Unicode Standard has been viewed as is that staying with the Unicode Standard has been viewed as
important given the diversity in implementations already existing in important given the diversity in implementations already existing in
the wild. the wild.
As described in Section 4, a few changes have been made regarding As described in Section 4, a few changes have been made regarding
certain attributes to code points in Unicode between version 6.3.0 certain attributes to code points in Unicode between version 6.3.0
skipping to change at page 4, line 8 skipping to change at page 4, line 8
Specifically, the Internet Architecture Board did issue a statement Specifically, the Internet Architecture Board did issue a statement
[IAB] which requested IETF to resolve the issues related to the code [IAB] which requested IETF to resolve the issues related to the code
point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in
Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and
suggests that the IDNA2008 standard follow the Unicode Standard and not suggests that the IDNA2008 standard follow the Unicode Standard and not
update RFC 5892 [RFC5892] or any other IDNA2008 RFCs. update RFC 5892 [RFC5892] or any other IDNA2008 RFCs.
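The normalization concern around U+08A1 can be illustrated with a short Python sketch. It is not part of either draft revision. It assumes a Python build whose `unicodedata` module is based on Unicode 7.0.0 or later, and it assumes the comparison of interest is between U+08A1 and the sequence U+0628 U+0654 (ARABIC LETTER BEH followed by ARABIC HAMZA ABOVE). The helper name `nfc_equal` is hypothetical.

```python
import unicodedata

# U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE, introduced in Unicode 7.0.0.
precomposed = "\u08A1"
# Assumed comparison sequence: ARABIC LETTER BEH + ARABIC HAMZA ABOVE.
sequence = "\u0628\u0654"


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Shows which Unicode version this Python build carries.
    print("Unicode data version:", unicodedata.unidata_version)
    # U+08A1 has no canonical decomposition, so the two forms do not match.
    print("U+08A1 matches U+0628 U+0654 under NFC:", nfc_equal(precomposed, sequence))
```

Because normalization does not unify the two forms, a lookup that relies on NFC alone treats them as different labels even though they can look the same to a reader.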
2. Keywords for Requirement Levels 2. Keywords for Requirement Levels
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
document are to be interpreted as described in RFC 2119 [RFC2119]. "OPTIONAL" in this document are to be interpreted as described in BCP
14 RFC2119 [RFC2119] RFC8174 [RFC8174] when, and only when, they
appear in all capitals, as shown here.
3. Background 3. Background
3.1. IDNA2008 Documents 3.1. IDNA2008 Documents
IDNA2008 consists of the following documents: IDNA2008 consists of the following documents:
o A document, RFC 5890 [RFC5890], containing definitions and other o A document, RFC 5890 [RFC5890], containing definitions and other
material that are needed for understanding other documents in the material that are needed for understanding other documents in the
set. It is referred to informally in other documents in the set set. It is referred to informally in other documents in the set
skipping to change at page 5, line 11 skipping to change at page 5, line 15
o A document, RFC 5895 [RFC5895], that discusses the issue of o A document, RFC 5895 [RFC5895], that discusses the issue of
mapping characters into other characters and that provides mapping characters into other characters and that provides
guidance for doing so when that is appropriate. That document, guidance for doing so when that is appropriate. That document,
referred to informally as "Mapping", provides advice; it is not a referred to informally as "Mapping", provides advice; it is not a
required part of IDNA. required part of IDNA.
o A document, RFC 6452 [RFC6452], that looks at some changes made to o A document, RFC 6452 [RFC6452], that looks at some changes made to
Unicode 6.0.0 [Unicode-6.0.0] that resulted in the derived Unicode 6.0.0 [Unicode-6.0.0] that resulted in the derived
property value change for the code points U+0CF1, U+0CF2 and property value change for the code points U+0CF1, U+0CF2 and
U+19DA. The first two changed from DISALLOWED to PVALID, the last U+19DA. The first two changed from DISALLOWED to PVALID, the last
from PVALID to DISALLOWED. IETF came to the conclusion the from PVALID to DISALLOWED. IETF came to the conclusion that no
changes where acceptable and RFC 5892 [RFC5892] was not updated to update is needed to RFC 5892 [RFC5892] based on the changes made
make the derived property value not change for these code points. in Unicode 6.0.0 [Unicode-6.0.0]. As a result, the derived
property value remained aligned with the Unicode Standard.
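The kind of change RFC 6452 examined can be sketched as a comparison of two derived property tables. The sketch below is illustrative only: the tables `BEFORE` and `AFTER` are hypothetical excerpts holding just the three code points named above, and `changed_values` is a hypothetical helper, not part of any IDNA2008 specification.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpts of derived property tables. They contain only the
# three code points the text says changed with Unicode 6.0.0.
BEFORE: Dict[int, str] = {
    0x0CF1: "DISALLOWED",
    0x0CF2: "DISALLOWED",
    0x19DA: "PVALID",
}
AFTER: Dict[int, str] = {
    0x0CF1: "PVALID",
    0x0CF2: "PVALID",
    0x19DA: "DISALLOWED",
}


def changed_values(
    before: Dict[int, str], after: Dict[int, str]
) -> List[Tuple[str, str, str]]:
    """List (code point, old value, new value) for code points whose value differs."""
    changes = []
    # Only code points present in both tables can be said to have changed.
    for cp in sorted(before.keys() & after.keys()):
        if before[cp] != after[cp]:
            changes.append((f"U+{cp:04X}", before[cp], after[cp]))
    return changes


if __name__ == "__main__":
    for code_point, old, new in changed_values(BEFORE, AFTER):
        print(f"{code_point}: {old} -> {new}")
```

A report like this is one way to review what a new Unicode version does to already assigned code points before deciding whether an exception rule is warranted.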
3.2. Deployment 3.2. Deployment
The deployment of IDNA2008 is unfortunately quite diverse. The The deployment of IDNA2008 is unfortunately quite diverse. The
following lists some of the strategies that existing implementations following lists some of the strategies that existing implementations
are known to implement: are known to implement:
o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491] o IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491 [RFC3491]
which implies using a table within which it is said whether code which implies using a table within which it is said whether code
points are allowed to be used or not, and this after doing the in points are allowed to be used or not, after doing the
IDNA2003 included normalization. normalization specified in IDNA2003.
o A mix between IDNA2003 and IDNA2008 where code points assigned to o A mix between IDNA2003 and IDNA2008 where code points assigned to
Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property
value calculated according to the algorithm specified in IDNA2008. value calculated according to the algorithm specified in IDNA2008.
o Strict IDNA2008 following IANA which implies stayed at Unicode o Strict IDNA2008 following IANA which implies staying at Unicode
6.3.0 [Unicode-6.3.0] and treating later assigned code points as 6.3.0 [Unicode-6.3.0] and treating later assigned code points as
UNASSIGNED. UNASSIGNED.
o The IDNA2008 algorithm applied to whatever version of Unicode o The IDNA2008 algorithm applied to whatever version of Unicode
Standard exists in the operating system and/or libraries used, Standard exists in the operating system and/or libraries used,
regardless of whether the version is later than Unicode version regardless of whether the version is later than Unicode version
6.3.0 or not. 6.3.0 or not.
o A mix between IDNA2003 and IDNA2008 according to local o A mix between IDNA2003 and IDNA2008 according to local
interpretation of the Unicode Technical Standard #46 [UTS-46]. interpretation of the Unicode Technical Standard #46 [UTS-46].
The issue is further complicated by having very diverse The issue is further complicated by having very diverse
implementations of the requirements in RFC 5894 [RFC5894] that implementations of the requirements in RFC 5894 [RFC5894] by registry
registry operators to based on the IDNA2008 specification create operators based on the IDNA2008 specification to create additional
additional rules for what code points are allowed to be used for rules for what code points are allowed to be used for registration.
registration.
In practice, the Unicode Consortium creates a maximum set of code In practice, the Unicode Consortium creates a maximum set of code
points by assigning code points in the Unicode Standard. The points by assigning code points in the Unicode Standard. The
IDNA2008 rules based on the Unicode Standard create a subset of these IDNA2008 rules based on the Unicode Standard create a subset of these
by assigning the PVALID derived property value to them. Registries by assigning the PVALID derived property value to them. Registries
(and others dealing with Internationalized Domain Names) are supposed (and others dealing with Internationalized Domain Names) are supposed
to create an even smaller subset that ultimately is the set of code to create an even smaller subset that ultimately is the set of code
points that can be used in a particular registry. points that can be used in a particular registry.
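The nesting of these sets can be sketched in Python. This is a toy illustration under stated assumptions: the three sets hold a handful of code points chosen for the example, not real registry data, and `registry_repertoire` is a hypothetical helper name.

```python
from typing import Set

# Toy data, chosen only for illustration.
# Code points assigned in the Unicode Standard (the maximum set).
ASSIGNED: Set[int] = {0x0041, 0x0061, 0x0062, 0x00E5}
# Subset with derived property value PVALID; U+0041 (uppercase A) is left out.
PVALID: Set[int] = {0x0061, 0x0062, 0x00E5}
# Code points a hypothetical registry has explicitly chosen to include.
SELECTED: Set[int] = {0x0041, 0x0061, 0x00E5}


def registry_repertoire(pvalid: Set[int], selected: Set[int]) -> Set[int]:
    """Return the code points a registry permits.

    A code point is permitted only if it was explicitly selected AND it is
    PVALID; anything not listed is excluded by default.
    """
    return selected & pvalid


if __name__ == "__main__":
    repertoire = registry_repertoire(PVALID, SELECTED)
    print(sorted(f"U+{cp:04X}" for cp in repertoire))
```

Starting from an empty set and adding code points deliberately keeps the registry repertoire inside the PVALID set, which in turn stays inside the assigned set.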
There is a further recommendation to be conservative when these subsets There is a further recommendation to be conservative when these subsets
are calculated and to use the inclusion principle; this is explained are calculated and to use the inclusion principle; this is explained
in SAC-084 [SAC-084] and RFC 6912 [RFC6912]. in SAC-084 [SAC-084] and RFC 6912 [RFC6912].
The complicated situation with deployment of IDNA2008 is discussed
further in draft-klensin-idna-rfc5891bis
[I-D.klensin-idna-rfc5891bis] and draft-freytag-troublesome-
characters [I-D.freytag-troublesome-characters].
4. Notable changes between Unicode 6.3.0 and 11.0.0 4. Notable changes between Unicode 6.3.0 and 11.0.0
4.1. Changes to Unicode 7.0.0 4.1. Changes to Unicode 7.0.0
The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was The character ARABIC LETTER BEH WITH HAMZA ABOVE U+08A1 was
introduced in Unicode 7.0.0. This was discussed in the IETF introduced in Unicode 7.0.0. This was discussed in the IETF
extensively and by IAB in their statement [IAB] requesting the IETF extensively and by IAB in their statement [IAB] requesting the IETF
to investigate the issue. Specifically IAB stated: to investigate the issue. Specifically IAB stated:
On the same precautionary principle, the IAB recommends that the On the same precautionary principle, the IAB recommends that the
skipping to change at page 8, line 25 skipping to change at page 8, line 25
As described in Section 4 changes have been made to Unicode between As described in Section 4 changes have been made to Unicode between
version 6.3.0 and 11.0.0. Some changes to specific characters version 6.3.0 and 11.0.0. Some changes to specific characters
changed their derived property value. Others did not. Given the changed their derived property value. Others did not. Given the
diverse deployment described in Section 3.2 and the changes diverse deployment described in Section 3.2 and the changes
described, including implications to normalization, the conclusion is described, including implications to normalization, the conclusion is
to not add any exception rules to IDNA2008. to not add any exception rules to IDNA2008.
To increase overall harmonization in the use of internationalized To increase overall harmonization in the use of internationalized
domain names, the author recommends that the derived property values domain names, the author recommends that the derived property values
MUST be calculated according to the IDNA2008 specification for MUST be calculated as specified in the documents listed in
Unicode Version 11.0.0 [Unicode-11.0.0]. Section 3.1 also with code points in Unicode Version 11.0.0
[Unicode-11.0.0].
All registries (and others) SHOULD calculate a repertoire, for All registries (and others) SHOULD calculate a repertoire using the
example as explained in draft-freytag-troublesome-characters conservatism and inclusion principles as laid out for example in
[I-D.freytag-troublesome-characters] and draft-klensin-idna- SAC-084 [SAC-084].
rfc5891bis [I-D.klensin-idna-rfc5891bis] using the conservatism and
inclusion principles as laid out in SAC-084 [SAC-084].
6. IANA Considerations 6. IANA Considerations
IANA is requested to update the registry of derived property values IANA is requested to update the registry of derived property values
after validation with the Appointed Expert that the derived property after validation with the Appointed Expert that the derived property
values are calculated correctly. values are calculated correctly.
7. Security Considerations 7. Security Considerations
This document makes recommendations regarding the use of the IDNA2008 This document makes recommendations regarding the use of the IDNA2008
algorithm for calculation of derived property values, based on the algorithm for calculation of derived property values, based on the
current Unicode version. It also recommends that registries (and current Unicode version. It also recommends that registries (and
others dealing with Internationalized Domain Names) explicitly select others dealing with Internationalized Domain Names) explicitly select
appropriate subsets of characters with the derived value of PVALID. appropriate subsets of characters with the derived value of PVALID.
Not following these recommendations can lead to various security Not following these recommendations can lead to various security
issues. Specifically, allowing confusable characters may lead to issues. Specifically, allowing confusable characters may lead to
various phishing attacks. various phishing attacks. See Security Consideration Sections in the
documents listed in Section 3.1.
8. Acknowledgements 8. Acknowledgements
Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik Thanks to Martin Durst, Asmus Freytag, Ted Hardie, John Klensin, Erik
Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for Nordmark, Michel Suignard, Andrew Sullivan and Suzanne Woolf for
input to this document. input to this document.
9. References 9. References
9.1. Normative References 9.1. Normative References
[IAB] Internet Architecture Board, "IAB Statement on Identifiers
and Unicode 7.0.0", IAB Statement on Identifiers and
Unicode 7.0.0
https://www.iab.org/documents/correspondence-reports-
documents/2015-2/iab-statement-on-identifiers-and-unicode-
7-0-0/, January 2015.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", Profile for Internationalized Domain Names (IDN)",
RFC 3491, DOI 10.17487/RFC3491, March 2003, RFC 3491, DOI 10.17487/RFC3491, March 2003,
<https://www.rfc-editor.org/info/rfc3491>. <https://www.rfc-editor.org/info/rfc3491>.
skipping to change at page 10, line 10 skipping to change at page 9, line 50
[RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
for Internationalized Domain Names for Applications for Internationalized Domain Names for Applications
(IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,
<https://www.rfc-editor.org/info/rfc5893>. <https://www.rfc-editor.org/info/rfc5893>.
[RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
Points and Internationalized Domain Names for Applications Points and Internationalized Domain Names for Applications
(IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
November 2011, <https://www.rfc-editor.org/info/rfc6452>. November 2011, <https://www.rfc-editor.org/info/rfc6452>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
9.2. Non-normative references 9.2. Non-normative references
[Changes-11.0.0] [Changes-11.0.0]
The Unicode Consortium, "Unicode Standard Annex #44", The Unicode Consortium, "Unicode Standard Annex #44",
Unicode Standard Annex #44, UNICODE CHARACTER DATABASE, Unicode Standard Annex #44, UNICODE CHARACTER DATABASE,
Change History https://www.unicode.org/reports/tr44/ Change History https://www.unicode.org/reports/tr44/
tr44-21d4.html#Change_History, May 2018. tr44-21d4.html#Change_History, May 2018.
[I-D.freytag-troublesome-characters] [I-D.freytag-troublesome-characters]
Freytag, A., Klensin, J., and A. Sullivan, "Those Freytag, A., Klensin, J., and A. Sullivan, "Those
Troublesome Characters: A Registry of Unicode Code Points Troublesome Characters: A Registry of Unicode Code Points
Needing Special Consideration When Used in Network Needing Special Consideration When Used in Network
Identifiers", draft-freytag-troublesome-characters-01 Identifiers", draft-freytag-troublesome-characters-02
(work in progress), June 2017. (work in progress), June 2018.
[I-D.klensin-idna-rfc5891bis] [IAB] Internet Architecture Board, "IAB Statement on Identifiers
Klensin, J. and A. Freytag, "Internationalized Domain and Unicode 7.0.0", IAB Statement on Identifiers and
Names in Applications (IDNA): Registry Restrictions and Unicode 7.0.0
Recommendations", draft-klensin-idna-rfc5891bis-01 (work https://www.iab.org/documents/correspondence-reports-
in progress), September 2017. documents/2015-2/iab-statement-on-identifiers-and-unicode-
7-0-0/, January 2015.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, DOI 10.17487/RFC3490, March 2003, RFC 3490, DOI 10.17487/RFC3490, March 2003,
<https://www.rfc-editor.org/info/rfc3490>. <https://www.rfc-editor.org/info/rfc3490>.
[RFC5894] Klensin, J., "Internationalized Domain Names for [RFC5894] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Background, Explanation, and Applications (IDNA): Background, Explanation, and
Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
<https://www.rfc-editor.org/info/rfc5894>. <https://www.rfc-editor.org/info/rfc5894>.
 End of changes. 18 change blocks. 
51 lines changed or deleted 46 lines changed or added
