< draft-faltstrom-unicode12-06.txt   draft-faltstrom-unicode12-07.txt >
Network Working Group P. Faltstrom Network Working Group P. Faltstrom
Internet-Draft Netnod Internet-Draft Netnod
Intended status: Standards Track January 04, 2022 Intended status: Standards Track February 13, 2022
Expires: July 8, 2022 Expires: August 17, 2022
IDNA2008 and Unicode 12.0.0 IDNA2008 and Unicode 12.0.0
draft-faltstrom-unicode12-06 draft-faltstrom-unicode12-07
Abstract Abstract
This document describes the changes between Unicode 6.0.0 and Unicode This document describes the changes between Unicode 6.0.0 and Unicode
12.0.0 in the context of IDNA2008. Some additions and changes have 12.0.0 in the context of IDNA2008. Some additions and changes have
been made in the Unicode Standard that affect the values produced by been made in the Unicode Standard that affect the values produced by
the algorithm IDNA2008 specifies. IDNA2008 allows adding exceptions the algorithm IDNA2008 specifies. IDNA2008 allows adding exceptions
to the algorithm for backward compatibility; however, this document to the algorithm for backward compatibility; however, this document
does not add any such exceptions. This document provides the does not add any such exceptions. This document provides the
necessary tables to IANA to make its database consistent with Unicode necessary tables to IANA to make its database consistent with Unicode
skipping to change at page 1, line 45 skipping to change at page 1, line 45
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 8, 2022. This Internet-Draft will expire on August 17, 2022.
Copyright Notice Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 5 2.1. IDNA2008 Documents . . . . . . . . . . . . . . . . . . . 5
2.2. Additional important IDNA2008-related documents . . . . . 6 2.2. Additional important IDNA2008-related documents . . . . . 6
2.3. Deployment . . . . . . . . . . . . . . . . . . . . . . . 6 2.3. Deployment . . . . . . . . . . . . . . . . . . . . . . . 6
3. Notable Changes Between Unicode 6.0.0 and 12.0.0 . . . . . . 7 3. Notable Changes Between Unicode 6.0.0 and 12.0.0 . . . . . . 7
3.1. Changes between Unicode 6.0.0 and 7.0.0 . . . . . . . . . 7 3.1. Changes between Unicode 6.0.0 and 7.0.0 . . . . . . . . . 7
3.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 8 3.2. Changes between Unicode 7.0.0 and 10.0.0 . . . . . . . . 8
3.3. Changes between Unicode 10.0.0 and 11.0.0 . . . . . . . . 9 3.3. Changes between Unicode 10.0.0 and 11.0.0 . . . . . . . . 9
3.4. Changes between Unicode 11.0.0 and 12.0.0 . . . . . . . . 10 3.4. Changes between Unicode 11.0.0 and 12.0.0 . . . . . . . . 10
4. U+111C9 SHARADA SANDHI MARK . . . . . . . . . . . . . . . . . 11 4. U+111C9 SHARADA SANDHI MARK . . . . . . . . . . . . . . . . . 11
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 11 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 11
skipping to change at page 3, line 9 skipping to change at page 3, line 9
1. Introduction 1. Introduction
The current version of Internationalized Domain Names for The current version of Internationalized Domain Names for
Applications (IDNA) was initiated in 2008, and despite not being Applications (IDNA) was initiated in 2008, and despite not being
completed until 2010, is widely known as "IDNA2008". It is specified completed until 2010, is widely known as "IDNA2008". It is specified
in the series of documents listed in Section 2.1. The IDNA2008 in the series of documents listed in Section 2.1. The IDNA2008
standard includes an algorithm by which a derived property value is standard includes an algorithm by which a derived property value is
calculated based on the properties defined from the Unicode Standard. calculated based on the properties defined from the Unicode Standard.
The derived property values that can be calculated are defined in RFC The derived property values that can be calculated are defined in RFC
5892 [RFC5892]. The summary below is a summary to make the reading 5892 [RFC5892]. Below is a summary to aid in the reading of this
of this document easier. For definition of the terms, please see RFC document. For definition of the terms, please see RFC 5892
5892 [RFC5892]. [RFC5892].
o PROTOCOL VALID: Those that are allowed to be used in IDNs. Code o PROTOCOL VALID: Those that are allowed to be used in IDNs. Code
points with this property value are permitted for general use in points with this property value are permitted for general use in
IDNs. However, that a label consists only of code points that IDNs. However, that a label consists only of code points that
have this property value does not imply that the label can be used have this property value does not imply that the label can be used
in DNS. The abbreviated term PVALID is used to refer to this in DNS. The abbreviated term PVALID is used to refer to this
value. value.
o CONTEXTUAL RULE REQUIRED: Some characteristics of the character, o CONTEXTUAL RULE REQUIRED: Some characteristics of the character,
such as it being invisible in certain contexts or problematic in such as it being invisible in certain contexts or problematic in
skipping to change at page 4, line 15 skipping to change at page 4, line 15
o Problems can also be created if the properties assigned to those o Problems can also be created if the properties assigned to those
code points are inconsistent with IDNA2008 assumptions about how code points are inconsistent with IDNA2008 assumptions about how
properties are assigned and/or about how code points with those properties are assigned and/or about how code points with those
properties are used or behave. properties are used or behave.
There were three incompatible changes in the Unicode standard between There were three incompatible changes in the Unicode standard between
Unicode 5.2.0 [Unicode-5.2.0] and Unicode 6.0.0 [Unicode-6.0.0]; they Unicode 5.2.0 [Unicode-5.2.0] and Unicode 6.0.0 [Unicode-6.0.0]; they
are described in RFC 6452 [RFC6452]. The code points U+0CF1 and are described in RFC 6452 [RFC6452]. The code points U+0CF1 and
U+0CF2 had a derived property value change from DISALLOWED to PVALID, U+0CF2 had a derived property value change from DISALLOWED to PVALID,
and the code point U+19DA had a change in derived property value from and the code point U+19DA had a change in derived property value from
PVALID to DISALLOWED. These changes were exampined in great detail, PVALID to DISALLOWED. These changes were examined in great detail,
but the IETF concluded that these changes to the Unicode standard did but the IETF concluded that these changes to the Unicode standard did
not warrant an update to RFC 5892 [RFC5892]. not warrant an update to RFC 5892 [RFC5892].
As described in Section 3, more incompatible changes have been made As described in Section 3, more incompatible changes have been made
to code points between Unicode 6.0.0 and Unicode 12.0.0 to code points between Unicode 6.0.0 and Unicode 12.0.0
[Unicode-12.0.0]; however, the changes in the derived property values [Unicode-12.0.0]; however, the changes in the derived property values
do not result in exceptions (as defined in section 2.6 of RFC 5892 do not result in exceptions (as defined in section 2.6 of RFC 5892
[RFC5892]) being added to RFC 5892 [RFC5892]. [RFC5892]) being added to RFC 5892 [RFC5892].
Further, in 2015, the Internet Architecture Board (IAB) issued a Further, in 2015, the Internet Architecture Board (IAB) issued a
skipping to change at page 4, line 38 skipping to change at page 4, line 38
resolve the issues related to the code point ARABIC LETTER BEH WITH resolve the issues related to the code point ARABIC LETTER BEH WITH
HAMZA ABOVE (U+08A1) that was introduced in Unicode 7.0.0 HAMZA ABOVE (U+08A1) that was introduced in Unicode 7.0.0
[Unicode-7.0.0]. In February of that year, the statement was revised [Unicode-7.0.0]. In February of that year, the statement was revised
[IAB2005-2] to focus on the latter request. More details about the [IAB2005-2] to focus on the latter request. More details about the
problem of code point sequences not normalizing as one might expect problem of code point sequences not normalizing as one might expect
appear in a draft that was part of the discussion [IDNA7]. appear in a draft that was part of the discussion [IDNA7].
The result of the work in the IETF was that no exception was added to The result of the work in the IETF was that no exception was added to
RFC 5892 [RFC5892]; however, it should be noted that the review of RFC 5892 [RFC5892]; however, it should be noted that the review of
the issues around U+08A1 indicated that this code point is not an the issues around U+08A1 indicated that this code point is not an
isolated case and that a number of PVALID code points of long isolated case and that a number of long-standing PVALID code points
standing may have similar issues. While the affected code points may have similar issues. While the affected code points remain
remain PVALID in this document, identification of the problem PVALID in this document, identification of the problem resulted in a
resulted in a clarification of the review process for new Unicode clarification of the review process for new Unicode versions. That
versions. That clarification, which reinforces the original review clarification, which reinforces the original review plan to capture
plan to capture issues like these, was published as RFC 8753 issues like these, was published as RFC 8753 [RFC8753]. Any review
[RFC8753]. Any review of Unicode versions after 12.0.0 should be of Unicode versions after 12.0.0 should be made according to RFC 8753
made according to RFC 8753 [RFC8753]; an objective of this document [RFC8753]; an objective of this document is to ensure that a proper
is to ensure that a proper review of such versions after version review of such versions after version 12.0.0 can be made.
12.0.0 can be made.
2. Background 2. Background
2.1. IDNA2008 Documents 2.1. IDNA2008 Documents
IDNA2008 consists of the following documents. The documents in the IDNA2008 consists of the following documents. The documents in the
set have informal names. set have informal names.
o Internationalized Domain Names for Applications (IDNA): o Internationalized Domain Names for Applications (IDNA):
Definitions and Document Framework [RFC5890], informally called Definitions and Document Framework [RFC5890], informally called
"Defs" or "Definitions", contains definitions and other material "Defs" or "Definitions", contains definitions and other material
that are needed for understanding other documents in the set. that are needed for understanding other documents in the set.
skipping to change at page 6, line 48 skipping to change at page 6, line 48
that represent how the implementers believe Stringprep [RFC3454] that represent how the implementers believe Stringprep [RFC3454]
and Nameprep [RFC3491] would have evolved had the IETF not moved and Nameprep [RFC3491] would have evolved had the IETF not moved
in the direction of IDNA2008 instead. in the direction of IDNA2008 instead.
o A mix between IDNA2003 and IDNA2008 where code points assigned to o A mix between IDNA2003 and IDNA2008 where code points assigned to
Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property
value calculated according to the algorithm specified in IDNA2008. value calculated according to the algorithm specified in IDNA2008.
o A mix between IDNA2003 and IDNA2008 according to the Unicode o A mix between IDNA2003 and IDNA2008 according to the Unicode
Technical Standard #46 [UTS-46]. Because that document specifies Technical Standard #46 [UTS-46]. Because that document specifies
different profiles, there are several different variations that different profiles, there are several variations that leave users
leave users with no guarantee that two applications claiming with no guarantee that two applications claiming conformance to
conformance to UTS#46 will interoperate well with each other much UTS#46 will interoperate well with each other much less with
less with conforming IDNA2008 implementations. UTS#46 is conforming IDNA2008 implementations. UTS#46 is ultimately based
ultimately based on a normative table very much like the one used on a normative table very much like the one used by Stringprep
by Stringprep [RFC3454] but updated for each new version of [RFC3454] but updated for each new version of Unicode.
Unicode.
o The (normative) IDNA2008 algorithm applied to whatever version of o The (normative) IDNA2008 algorithm applied to whatever version of
Unicode Standard exists in the operating system and/or libraries Unicode Standard exists in the operating system and/or libraries
used, independent of whatever version of tables appears in the used, independent of whatever version of tables appears in the
(non-normative) IANA database. (non-normative) IANA database.
In practice, the Unicode Consortium creates a maximum set of code In practice, the Unicode Consortium creates a maximum set of code
points by assigning code points in the Unicode Standard. The points by assigning code points in the Unicode Standard. The
IDNA2008 rules use the Unicode Standard to create a further subset of IDNA2008 rules use the Unicode Standard to create a further subset of
code points and context that are permitted in DNS labels associated code points and context that are permitted in DNS labels associated
skipping to change at page 8, line 26 skipping to change at page 8, line 26
(Nonspacing_Mark), but that did not impact the calculation of the (Nonspacing_Mark), but that did not impact the calculation of the
derived property value which stayed at DISALLOWED. derived property value which stayed at DISALLOWED.
The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was
introduced in Unicode 7.0.0. This was discussed extensively in the introduced in Unicode 7.0.0. This was discussed extensively in the
IETF, and by the IAB in their statement [IAB2005-1] requesting the IETF, and by the IAB in their statement [IAB2005-1] requesting the
IETF to investigate the issue. Specifically, the IAB stated: IETF to investigate the issue. Specifically, the IAB stated:
On the same precautionary principle, the IAB recommends that the On the same precautionary principle, the IAB recommends that the
Internationalized Domain Names for Applications (IDNA) Parameters Internationalized Domain Names for Applications (IDNA) Parameters
registry <http://www.iana.org/assignments/idna-tables/> not be registry <https://www.iana.org/assignments/idna-tables/> not be
updated to Unicode 7.0.0 until the IETF has consensus on a updated to Unicode 7.0.0 until the IETF has consensus on a
solution to this problem. solution to this problem.
The discussion in the IETF concluded that although it is possible to The discussion in the IETF concluded that although it is possible to
create "the same" character in multiple ways, the issue with U+08A1 create "the same" character in multiple ways, the issue with U+08A1
is not unique. The character U+08A1 (ARABIC LETTER BEH WITH HAMZA is not unique. The character U+08A1 (ARABIC LETTER BEH WITH HAMZA
ABOVE) can be represented with the sequence ARABIC LETTER BEH ABOVE) can be represented with the sequence ARABIC LETTER BEH
(U+0628) and ARABIC HAMZA ABOVE (U+0654). This is identical to LATIN (U+0628) and ARABIC HAMZA ABOVE (U+0654). This is identical to LATIN
SMALL LETTER O WITH STROKE (U+00F8), which can be represented with SMALL LETTER O WITH STROKE (U+00F8), which can be represented with
the sequence LATIN SMALL LETTER O (U+006F) followed by COMBINING the sequence LATIN SMALL LETTER O (U+006F) followed by COMBINING
SHORT SOLIDUS OVERLAY (U+0337). SHORT SOLIDUS OVERLAY (U+0337).
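The normalization behaviour described above can be checked with a short Python sketch. It assumes the standard-library unicodedata module and a runtime whose Unicode data is at version 7.0.0 or later; the helper name nfc_equivalent is illustrative only. The ALEF WITH HAMZA ABOVE (U+0623) pair is not discussed in this document and is included purely as a contrast, because it does have a canonical decomposition.

```python
import unicodedata


def nfc_equivalent(a: str, b: str) -> bool:
    """Return True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+08A1 has no canonical decomposition, so NFC does not fold the
# BEH + HAMZA ABOVE sequence into it.
beh_hamza = nfc_equivalent("\u08A1", "\u0628\u0654")  # False

# Same situation for LATIN SMALL LETTER O WITH STROKE.
o_stroke = nfc_equivalent("\u00F8", "\u006F\u0337")  # False

# Contrast (assumption added for illustration): U+0623 decomposes
# canonically to U+0627 U+0654, so NFC treats the two forms as equal.
alef_hamza = nfc_equivalent("\u0623", "\u0627\u0654")  # True
```

In other words, a label containing U+08A1 and a label containing the two-code-point sequence remain distinct after normalization even though they may render the same.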
Although the discussion about this specific code point resulted in Although the discussion about this specific code point resulted in
acceptance of the derived property value of PVALID, the underlying acceptance of the derived property value of PVALID, the underlying
problem with combining sequences is not understood fully. Therefore problem with combining sequences is not understood fully. Therefore,
it cannot be claimed that this case can be extrapolated to other it cannot be claimed that this case can be extrapolated to other
situations and other code points. situations and other code points.
3.2. Changes between Unicode 7.0.0 and 10.0.0 3.2. Changes between Unicode 7.0.0 and 10.0.0
Change in number of characters in each category: Change in number of characters in each category:
Code points that changed derived property value: 0 Code points that changed derived property value: 0
PVALID changed from 99867 to 122411 (+22544) PVALID changed from 99867 to 122411 (+22544)
skipping to change at page 11, line 13 skipping to change at page 11, line 13
TOTAL did not change, at 1114112 TOTAL did not change, at 1114112
4. U+111C9 SHARADA SANDHI MARK 4. U+111C9 SHARADA SANDHI MARK
As one can see in Section 3, an incompatible property change was made As one can see in Section 3, an incompatible property change was made
between Unicode 6.0.0 and 12.0.0, affecting the code point U+111C9. between Unicode 6.0.0 and 12.0.0, affecting the code point U+111C9.
Its derived property value thus changed from DISALLOWED to PVALID. Its derived property value thus changed from DISALLOWED to PVALID.
In situations like these, IDNA2008 allows for addition of rules to RFC In situations like these, IDNA2008 allows for addition of rules to RFC
5892 [RFC5892] section 2.7. If the code point is accepted, it might 5892 [RFC5892] section 2.7. If the code point is accepted, it might
still be rejected if validated by software based on older versions of still be rejected if validated by software based on older versions of
Unicode than 11.0.0. As the character is rarely used outside of the Unicode than 12.0.0. As the character is rarely used outside the
group of Sharada specialists, and used in some records for indicating group of Sharada specialists, and used in some records for indicating
sandhi breaks, the conclusion is that it could either be added as an sandhi breaks, the conclusion is that it could either be added as an
exception or allowed to change its property value, as the use of the exception or allowed to change its property value, as the use of the
code point is limited outside a special community. As including an code point is limited outside a special community. As including an
exception would require implementation changes in deployed exception would require implementation changes in deployed
implementations of IDNA2008, the IETF has decided to not add a implementations of IDNA2008, the IETF has decided to not add a
BackwardCompatible rule to IDNA2008 (i.e. Section 2.7 of RFC 5892 BackwardCompatible rule to IDNA2008 (i.e. Section 2.7 of RFC 5892
[RFC5892]) for this code point. This also ensures that all sandhi [RFC5892]) for this code point. This also ensures that all sandhi
marks are treated in an equal way. marks are treated in an equal way.
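Why a validator built on older Unicode data may disagree about U+111C9 can be illustrated with a rough sketch of the LetterDigits rule from Section 2.1 of RFC 5892, which keys on General_Category. This is not the full IDNA2008 algorithm; the function name is hypothetical, and the result for U+111C9 depends on the Unicode version bundled with the Python runtime, so no output is shown.

```python
import unicodedata

# General_Category values that satisfy the LetterDigits rule
# (RFC 5892, Section 2.1).
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_letter_digit(cp: int) -> bool:
    """Check only the LetterDigits rule for one code point.

    The other RFC 5892 rules (Exceptions, BackwardCompatible,
    Unstable, and so on) are deliberately not modelled here.
    """
    return unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES


if __name__ == "__main__":
    # The answer reflects whatever Unicode data this runtime ships with.
    print("Unicode data version:", unicodedata.unidata_version)
    print("U+111C9 category:", unicodedata.category(chr(0x111C9)))
    print("U+111C9 in LetterDigits:", is_letter_digit(0x111C9))
```

Running the same check against libraries built on different Unicode versions is one way to observe the kind of disagreement between deployed implementations that this section describes.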
skipping to change at page 14, line 26 skipping to change at page 14, line 26
Names for Applications (IDNA) Review for New Unicode Names for Applications (IDNA) Review for New Unicode
Versions", RFC 8753, DOI 10.17487/RFC8753, April 2020, Versions", RFC 8753, DOI 10.17487/RFC8753, April 2020,
<https://www.rfc-editor.org/info/rfc8753>. <https://www.rfc-editor.org/info/rfc8753>.
[SAC-084] The Security and Stability Advisory Committee, "SAC084", [SAC-084] The Security and Stability Advisory Committee, "SAC084",
SSAC Comments on Guidelines for the Extended Process SSAC Comments on Guidelines for the Extended Process
Similarity Review Panel for the IDN ccTLD Fast Track Similarity Review Panel for the IDN ccTLD Fast Track
Process <https://www.icann.org/en/system/files/files/sac- Process <https://www.icann.org/en/system/files/files/sac-
084-en.pdf>, August 2016. 084-en.pdf>, August 2016.
[Unicode-10.0.0]
The Unicode Consortium, "The Unicode Standard, Version
10.0.0", The Unicode Standard, Version 10.0.0 ISBN
978-1-936213-16-0, June 2017.
[Unicode-11.0.0]
The Unicode Consortium, "The Unicode Standard, Version
11.0.0", The Unicode Standard, Version 11.0.0 ISBN
978-1-936213-19-1, June 2018.
[Unicode-12.0.0]
The Unicode Consortium, "The Unicode Standard, Version
12.0.0", The Unicode Standard, Version 12.0.0 ISBN
978-1-936213-22-1, March 2019.
[Unicode-3.2.0] [Unicode-3.2.0]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
3.2.0", The Unicode Standard, Version 3.2.0 ISBN 3.2.0", The Unicode Standard, Version 3.2.0 ISBN
0-201-61633-5, March 2002. 0-201-61633-5, March 2002.
[Unicode-5.2.0] [Unicode-5.2.0]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
5.2.0", The Unicode Standard, Version 5.2.0 ISBN 5.2.0", The Unicode Standard, Version 5.2.0 ISBN
978-1-936213-00-9, October 2009. 978-1-936213-00-9, October 2009.
skipping to change at page 15, line 20 skipping to change at page 15, line 5
[Unicode-7.0.0] [Unicode-7.0.0]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
7.0.0", The Unicode Standard, Version 7.0.0 ISBN 7.0.0", The Unicode Standard, Version 7.0.0 ISBN
978-1-936213-09-2, June 2014. 978-1-936213-09-2, June 2014.
[Unicode-8.0.0] [Unicode-8.0.0]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
8.0.0", The Unicode Standard, Version 8.0.0 ISBN 8.0.0", The Unicode Standard, Version 8.0.0 ISBN
978-1-936213-10-8, June 2015. 978-1-936213-10-8, June 2015.
[Unicode-10.0.0]
The Unicode Consortium, "The Unicode Standard, Version
10.0.0", The Unicode Standard, Version 10.0.0 ISBN
978-1-936213-16-0, June 2017.
[Unicode-11.0.0]
The Unicode Consortium, "The Unicode Standard, Version
11.0.0", The Unicode Standard, Version 11.0.0 ISBN
978-1-936213-19-1, June 2018.
[Unicode-12.0.0]
The Unicode Consortium, "The Unicode Standard, Version
12.0.0", The Unicode Standard, Version 12.0.0 ISBN
978-1-936213-22-1, March 2019.
[UTS-46] The Unicode Consortium, "Unicode Technical Standard #46, [UTS-46] The Unicode Consortium, "Unicode Technical Standard #46,
Version 12.0.0", UNICODE IDNA COMPATIBILITY Version 12.0.0", UNICODE IDNA COMPATIBILITY
PROCESSING <http://www.unicode.org/reports/tr46/>, March PROCESSING <https://www.unicode.org/reports/tr46/>, March
2019. 2019.
Appendix A. Changes from Unicode 6.0.0 to Unicode 7.0.0 Appendix A. Changes from Unicode 6.0.0 to Unicode 7.0.0
Changes from derived property value UNASSIGNED to either PVALID or Changes from derived property value UNASSIGNED to either PVALID or
DISALLOWED. DISALLOWED.
037F ; DISALLOWED # GREEK CAPITAL LETTER YOT 037F ; DISALLOWED # GREEK CAPITAL LETTER YOT
0528 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK 0528 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK
0529 ; PVALID # CYRILLIC SMALL LETTER EN WITH LEFT HOOK 0529 ; PVALID # CYRILLIC SMALL LETTER EN WITH LEFT HOOK
 End of changes. 15 change blocks. 
46 lines changed or deleted 43 lines changed or added
