Network Working Group                                       P. Faltstrom
Internet-Draft                                                    Netnod
Intended status: Standards Track                          March 11, 2021
Expires: September 12, 2021


                      IDNA2008 and Unicode 12.0.0
                      draft-faltstrom-unicode12-02

Abstract

   This document describes the changes between Unicode 6.2.0 and Unicode
   12.0.0 in the context of IDNA2008.
   Some additions and changes have been made in the Unicode Standard
   that affect the values produced by the algorithm IDNA2008 specifies.
   IDNA2008 allows adding exceptions to the algorithm for backward
   compatibility; however, this document does not add any such
   exceptions. This document provides the necessary tables to IANA to
   make its database consistent with Unicode 12.0.0.

   To improve understanding, this document describes systems that are
   being used as alternatives to those that conform to IDNA2008.

   TO BE REMOVED AT TIME OF PUBLICATION AS AN RFC:

   This document is discussed on the i18n-discuss@ietf.org mailing list
   of the IETF.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 12, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  IDNA2008 Documents
     2.2.  Additional important IDNA2008-related documents
     2.3.  Deployment
   3.  Notable Changes Between Unicode 6.2.0 and 12.0.0
     3.1.  Changes between Unicode 6.2.0 and 7.0.0
     3.2.  Changes between Unicode 7.0.0 and 10.0.0
     3.3.  Changes between Unicode 10.0.0 and 11.0.0
     3.4.  Changes between Unicode 11.0.0 and 12.0.0
   4.  U+111C9 SHARADA SANDHI MARK
   5.  Conclusion
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Non-normative references
   Appendix A.  Changes from Unicode 6.3.0 to Unicode 7.0.0
   Appendix B.  Changes from Unicode 7.0.0 to Unicode 8.0.0
   Appendix C.  Changes from Unicode 8.0.0 to Unicode 9.0.0
   Appendix D.  Changes from Unicode 9.0.0 to Unicode 10.0.0
   Appendix E.  Changes from Unicode 10.0.0 to Unicode 11.0.0
   Appendix F.  Changes from Unicode 11.0.0 to Unicode 12.0.0
   Author's Address

1.  Introduction

   The current version of Internationalized Domain Names for
   Applications (IDNA) was initiated in 2008, and despite not being
   completed until 2010, is widely known as "IDNA2008". It is specified
   in the series of documents listed in Section 2.1. The IDNA2008
   standard includes an algorithm by which a derived property value is
   calculated based on the properties defined in the Unicode Standard.

   The derived property values that can be calculated are defined in RFC
   5892 [RFC5892]. The summary below is intended to make this document
   easier to read. For definitions of the terms, please see RFC 5892
   [RFC5892].

   o  PROTOCOL VALID: Those that are allowed to be used in IDNs. Code
      points with this property value are permitted for general use in
      IDNs. However, the fact that a label consists only of code points
      that have this property value does not imply that the label can be
      used in DNS. The abbreviated term PVALID is used to refer to this
      value.

   o  CONTEXTUAL RULE REQUIRED: Some characteristics of the character,
      such as it being invisible in certain contexts or problematic in
      others, require that it not be used in labels unless specific
      other characters or properties are present. The abbreviated term
      CONTEXT is used to refer to this value.

   o  DISALLOWED: Those that should clearly not be included in IDNs.
      Code points with this property value are not permitted in IDNs.

   o  UNASSIGNED: Those code points that are not designated (i.e., are
      unassigned) in the Unicode Standard.

   When the Unicode Standard is updated, new code points are assigned
   and already-assigned code points can have their property values
   changed.

   o  Assigning code points can create problems if the newly-assigned
      code points are compositions of existing code points and because
      of that the normalization relationships associated with those code
      points should have been changed.
   o  Changing properties for already-assigned code points can create
      problems if the property change results in changes to the derived
      property value. This might cause a previously allowed code point,
      whose derived property value was PVALID, to no longer be allowed
      if its derived property value changes to DISALLOWED. The problem
      can also happen the other way around: a code point that was not
      allowed (and thus is prohibited) can suddenly end up being
      allowed.

   o  Problems can also be created if the properties assigned to those
      code points are inconsistent with IDNA2008 assumptions about how
      properties are assigned and/or about how code points with those
      properties are used or behave.

   There were three incompatible changes in the Unicode Standard after
   Unicode 5.2.0 [Unicode-5.2.0] up to and including Unicode 6.0.0
   [Unicode-6.0.0], as described in RFC 6452 [RFC6452]. The code points
   U+0CF1 and U+0CF2 had a derived property value change from DISALLOWED
   to PVALID while U+19DA had a change in derived property value from
   PVALID to DISALLOWED. They were examined in great detail, and the
   IETF consensus was that no update to RFC 5892 [RFC5892] was needed
   based on the changes made to the Unicode Standard.

   As described in Section 3, more changes have been made to code points
   between Unicode version 6.0.0 and Unicode version 12.0.0
   [Unicode-12.0.0] so that the derived property values have been
   changed in an incompatible way. This document concludes that no
   exceptions are to be added to RFC 5892 [RFC5892] even though there
   are changes in the derived property value as a result of the changes
   made in Unicode between version 6.2.0 and 12.0.0.
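
   The kind of incompatible change discussed above can be illustrated
   with a small sketch. It is not part of IDNA2008 or of any IANA
   tooling. The function names incompatible_changes and describe, and
   the two excerpt tables, are hypothetical and exist only for this
   illustration. The tables hold just the three code points that RFC
   6452 discusses, and a code point missing from a table is assumed to
   be UNASSIGNED.

```python
from typing import Dict, List, Tuple

# Derived property values under which a code point may appear in a label.
ALLOWED = {"PVALID", "CONTEXTJ", "CONTEXTO"}


def incompatible_changes(
    old: Dict[int, str], new: Dict[int, str]
) -> List[Tuple[int, str, str]]:
    """Return (code point, old value, new value) for changed code points.

    Assumption of this sketch: a code point absent from a table is
    UNASSIGNED. A move from UNASSIGNED to any other value is an ordinary
    new assignment and is therefore not reported.
    """
    changes = []
    for cp in sorted(set(old) | set(new)):
        before = old.get(cp, "UNASSIGNED")
        after = new.get(cp, "UNASSIGNED")
        if before == "UNASSIGNED" or before == after:
            continue
        changes.append((cp, before, after))
    return changes


def describe(before: str, after: str) -> str:
    """Say in which direction a change affects use in labels."""
    if before not in ALLOWED and after in ALLOWED:
        return "newly allowed"
    if before in ALLOWED and after not in ALLOWED:
        return "no longer allowed"
    return "changed"


# Illustrative excerpts only: the three code points named in RFC 6452.
UNICODE_5_2 = {0x0CF1: "DISALLOWED", 0x0CF2: "DISALLOWED", 0x19DA: "PVALID"}
UNICODE_6_0 = {0x0CF1: "PVALID", 0x0CF2: "PVALID", 0x19DA: "DISALLOWED"}

for cp, before, after in incompatible_changes(UNICODE_5_2, UNICODE_6_0):
    print(f"U+{cp:04X}: {before} -> {after} ({describe(before, after)})")
```

   Run against full derived property tables for two Unicode versions,
   a comparison of this shape would list exactly the code points that
   need the kind of review this document describes.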

   Further, in 2015, the Internet Architecture Board (IAB) issued a
   statement [IAB] which requested the IETF to resolve the issues
   related to the code point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1)
   that was introduced in Unicode 7.0.0 [Unicode-7.0.0]. This document
   concludes that this code point is not to be added to the exception
   list either. It should be noted that the review of U+08A1 indicated
   that it is not an isolated case and that a number of PVALID code
   points of long standing may have similar issues. The problem
   resulted in a clarification of the review process for new Unicode
   versions, RFC 8753 [RFC8753]. This clarification of the review
   process will impact review of Unicode versions after version 12.0.0.

2.  Background

2.1.  IDNA2008 Documents

   IDNA2008 consists of the following documents. The documents in the
   set have informal names.

   o  Internationalized Domain Names for Applications (IDNA):
      Definitions and Document Framework [RFC5890], informally called
      "Defs" or "Definitions", contains definitions and other material
      that are needed for understanding other documents in the set.

   o  Internationalized Domain Names in Applications (IDNA): Protocol
      [RFC5891], informally called "Protocol", describes the core
      IDNA2008 protocol and its operations. It needs to be interpreted
      in combination with the Bidi document (described below).

   o  The Unicode Code Points and Internationalized Domain Names for
      Applications (IDNA) [RFC5892], informally called "Tables", lists
      the categories and rules that identify the code points allowed in
      a label written in native character form (called a "U-label"), and
      is based on Unicode 5.2.0 [Unicode-5.2.0] code point assignments
      and additional rules unique to IDNA2008. The Unicode-based rules
      in RFC 5892 are expected to be stable across Unicode updates and
      hence independent of Unicode versions.
      RFC 5892 [RFC5892] obsoletes RFC 3491 [RFC3491], and in particular
      the use of the tables to which RFC 3491 [RFC3491] refers.

   o  Right-to-Left Scripts for Internationalized Domain Names for
      Applications (IDNA) [RFC5893], informally called "Bidi", specifies
      special rules for labels that contain characters that are written
      from right to left.

   o  Internationalized Domain Names for Applications (IDNA):
      Background, Explanation, and Rationale [RFC5894], informally
      called "Rationale", provides an overview of the protocol and
      associated tables, and gives explanatory material and some
      rationale for the decisions that led to IDNA2008. It also
      contains advice for DNS registry operators and others who use
      Internationalized Domain Names (IDNs).

   o  Mapping Characters for Internationalized Domain Names in
      Applications (IDNA) 2008 [RFC5895], informally called "Mapping",
      discusses the issue of mapping characters into other characters
      and provides guidance for doing so when that is appropriate. RFC
      5895 provides advice only and is not a required part of IDNA.

2.2.  Additional important IDNA2008-related documents

   There are other documents important for the understanding and
   functioning of IDNA2008, for example the following.

   o  The Unicode Code Points and Internationalized Domain Names for
      Applications (IDNA) - Unicode 6.0 [RFC6452] describes some changes
      made in Unicode 6.0.0 [Unicode-6.0.0] that resulted in derived
      property value changes for the code points U+0CF1, U+0CF2 and
      U+19DA. U+0CF1 and U+0CF2 changed from DISALLOWED to PVALID,
      while U+19DA changed from PVALID to DISALLOWED. The IETF
      concluded that no update to RFC 5892 [RFC5892] was needed based on
      the changes made in Unicode 6.0.0 [Unicode-6.0.0]. As a result,
      the derived property value remained aligned with the Unicode
      Standard. Specifically, no exception was added.

2.3.  Deployment

   There are many variations on the general IDNA model in use in the
   various parts of the community. The following list gives some of the
   strategies that implementations claiming to be IDNA compliant are
   known to use; it should be noted that the list is not complete:

   o  IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491
      [RFC3491]. Those specifications are dependent on case folding and
      NFKC normalization and on tables that specify for each code point
      whether it is allowed to be used or not, with a distinction made
      between use for "stored strings" and "query strings". The tables
      themselves are dependent on version 3.2 of The Unicode Standard
      [Unicode-3.2.0].

   o  A number of variations on IDNA2003, sometimes presented as
      "updated IDNA2003" or the like, which follow the principles of
      IDNA2003 as understood by the implementers but that use tables
      that represent how the implementers believe Stringprep [RFC3454]
      and Nameprep [RFC3491] would have evolved had the IETF not moved
      in the direction of IDNA2008 instead.

   o  A mix between IDNA2003 and IDNA2008 where code points assigned to
      Unicode after Unicode 3.2.0 [Unicode-3.2.0] have derived property
      value calculated according to the algorithm specified in IDNA2008.

   o  A mix between IDNA2003 and IDNA2008 according to the Unicode
      Technical Standard #46 [UTS-46]. Because that document specifies
      different profiles, there are several different variations that
      leave users with no guarantee that two applications claiming
      conformance to UTS#46 will interoperate well with each other, much
      less with conforming IDNA2008 implementations. UTS#46 is
      ultimately based on a normative table very much like the one used
      by Stringprep [RFC3454] but updated for each new version of
      Unicode.

   o  The (normative) IDNA2008 algorithm applied to whatever version of
      the Unicode Standard exists in the operating system and/or
      libraries used, independent of whatever version of tables appears
      in the (non-normative) IANA database.

   In practice, the Unicode Consortium creates a maximum set of code
   points by assigning code points in the Unicode Standard. The
   IDNA2008 rules use the Unicode Standard to create a further subset of
   code points and contexts that are permitted in DNS labels, namely
   those associated with its PVALID, CONTEXTJ, and CONTEXTO derived
   property values. DNS registries and other organizations that deal
   with IDNs are supposed to create their own subsets from IDNA2008 for
   use by those registries and organizations.

   This progressive subsetting and narrowing of the repertoire of code
   points that can be used in labels is an implementation of the
   principle of being conservative when deciding what code points to
   include in such a subset. SAC-084 [SAC-084] and RFC 6912 [RFC6912]
   recommend that DNS registries and other organizations be conservative
   when creating their subsets, and that they use the principle of
   creating subsets by inclusion.

3.  Notable Changes Between Unicode 6.2.0 and 12.0.0

3.1.  Changes between Unicode 6.2.0 and 7.0.0

   Change in number of characters in each category:

      Code points that changed derived property value: 0

      PVALID changed from 97946 to 99867 (+1921)

      UNASSIGNED changed from 864348 to 861509 (-2839)

      CONTEXTJ did not change, at 2

      CONTEXTO did not change, at 25

      DISALLOWED changed from 151791 to 152709 (+918)

      TOTAL did not change, at 1114112

      There are no changes made to Unicode between version 6.2.0 and
      7.0.0 that impact IDNA2008 calculation of the derived property
      values.

   The character ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) was
   introduced in Unicode 7.0.0.
   This was discussed extensively in the IETF, and by the IAB in their
   statement [IAB] requesting the IETF to investigate the issue.
   Specifically, the IAB stated:

      On the same precautionary principle, the IAB recommends that the
      Internationalized Domain Names for Applications (IDNA) Parameters
      registry (http://www.iana.org/assignments/idna-tables/) not be
      updated to Unicode 7.0.0 until the IETF has consensus on a
      solution to this problem.

   The discussion in the IETF concluded that although it is possible to
   create "the same" character in multiple ways, the issue with U+08A1
   is not unique. The character U+08A1 (ARABIC LETTER BEH WITH HAMZA
   ABOVE) can be represented with the sequence ARABIC LETTER BEH
   (U+0628) and ARABIC HAMZA ABOVE (U+0654). This is identical to the
   case of LATIN SMALL LETTER O WITH STROKE (U+00F8), which can be
   represented with the sequence LATIN SMALL LETTER O (U+006F) followed
   by COMBINING SHORT SOLIDUS OVERLAY (U+0337).

   Although the discussion about this specific code point resulted in
   acceptance of the derived property value of PVALID, the underlying
   problem with combining sequences is not fully understood. Therefore
   it cannot be claimed that this case can be extrapolated to other
   situations and other code points.

3.2.  Changes between Unicode 7.0.0 and 10.0.0

   Change in number of characters in each category:

      Code points that changed derived property value: 0

      PVALID changed from 99867 to 122411 (+22544)

      UNASSIGNED changed from 861509 to 837775 (-23734)

      CONTEXTJ did not change, at 2

      CONTEXTO did not change, at 25

      DISALLOWED changed from 152709 to 153899 (+1190)

      TOTAL did not change, at 1114112

      There are no changes made to Unicode between version 7.0.0 and
      10.0.0 that impact IDNA2008 calculation of the derived property
      values.

3.3.  Changes between Unicode 10.0.0 and 11.0.0

   Change in number of characters in each category:

      Code points that changed derived property value: 1

      PVALID changed from 122411 to 122734 (+323)

      UNASSIGNED changed from 837775 to 837091 (-684)

      CONTEXTJ did not change, at 2

      CONTEXTO did not change, at 25

      DISALLOWED changed from 153899 to 154260 (+361)

      TOTAL did not change, at 1114112

      Georgian letters in the ranges U+10D0..U+10FA and U+10FD..U+10FF
      had their General Category changed from Lo to Ll, to reflect
      their status as the lowercase of new Georgian case pairs. Case
      mappings were also added.

      SHARADA SANDHI MARK (U+111C9) was changed from Po to Mn, and from
      bc=L to bc=NSM.

      The properties for ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and
      ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were corrected from Mc to
      Mn.

      SPHERICAL ANGLE OPENING UP (U+29A1) was changed to Bidi_M=N.

   These changes to the Unicode Standard have the following implications
   for these code points:

   o  The newly assigned 684 characters are assigned a derived property
      value as a result of applying the IDNA2008 algorithm.

   o  The Georgian letters in the ranges U+10D0..U+10FA and
      U+10FD..U+10FF existed before IDNA2008 was created. Applying the
      IDNA2008 algorithm to these code points yielded the derived
      property value PVALID, and that value is unchanged even though the
      underlying Unicode properties have changed. The newly encoded
      Mtavruli letters have general category "Lu" and are therefore
      DISALLOWED.

   o  The U+111C9 SHARADA SANDHI MARK was added to Unicode 8.0.0
      [Unicode-8.0.0]. Applying the IDNA2008 algorithm to the code
      point yielded the derived property value DISALLOWED. The changes
      in the underlying properties in the Unicode Standard Version
      11.0.0 [Unicode-11.0.0] caused the derived property value to
      change to PVALID.

   o  The characters ZANABAZAR SQUARE VOWEL SIGN AI (U+11A07) and
      ZANABAZAR SQUARE VOWEL SIGN AU (U+11A08) were added to Unicode
      10.0.0 [Unicode-10.0.0]. Applying the IDNA2008 algorithm to the
      code points yielded the derived property value PVALID, and that
      value is unchanged even though the underlying Unicode properties
      have changed.

   o  SPHERICAL ANGLE OPENING UP (U+29A1) existed before IDNA2008 was
      created. Applying the IDNA2008 algorithm to the code point
      yielded the derived property value DISALLOWED, and that value is
      unchanged even though the underlying Unicode properties have
      changed.

3.4.  Changes between Unicode 11.0.0 and 12.0.0

   Change in number of characters in each category:

      Code points that changed derived property value: 0

      PVALID changed from 122734 to 123006 (+272)

      UNASSIGNED changed from 837091 to 836537 (-554)

      CONTEXTJ did not change, at 2

      CONTEXTO did not change, at 25

      DISALLOWED changed from 154260 to 154542 (+282)

      TOTAL did not change, at 1114112

4.  U+111C9 SHARADA SANDHI MARK

   As described in Section 3, an incompatible property change was made
   between Unicode 6.2.0 and 12.0.0, affecting the code point U+111C9.
   Its derived property value thus changed from DISALLOWED to PVALID.
   In situations like these, IDNA2008 allows for the addition of rules
   to Section 2.7 (BackwardCompatible (G)) of RFC 5892 [RFC5892]. If
   the code point is accepted, it might still be rejected if validated
   by software based on versions of Unicode older than 11.0.0. As the
   character is rarely used outside of the group of Sharada specialists,
   and is used in some records for indicating sandhi breaks, the
   conclusion is that it could either be added as an exception or
   allowed to change its property value, as the use of the code point is
   limited outside a special community.
   As including an exception would require implementation changes in
   deployed implementations of IDNA2008, the editor proposes that such a
   BackwardCompatible rule NOT be added to IDNA2008. This also ensures
   that all sandhi marks are treated in an equal way.

   The IETF has decided to NOT add a BackwardCompatible rule to IDNA2008
   (i.e., Section 2.7 of RFC 5892 [RFC5892]) for this code point.

5.  Conclusion

   As described in Section 3 and Section 4, changes have been made to
   Unicode between version 6.2.0 and 12.0.0. Some changes to specific
   characters changed their derived property value, whereas other
   changes did not. Given the deployment considerations described in
   Section 2.3 and changes in the Unicode Standard described in
   Section 3 and Section 4, including implications to normalization, the
   conclusion of this document is to not add any exception rules to
   IDNA2008.

   This document addresses only changes to Unicode between version 6.2.0
   and version 12.0.0. Changes in future Unicode versions might result
   in the conclusion that exception rules need to be added to IDNA2008
   after the review process explained in RFC 8753 [RFC8753]. Separately
   from any changes in Unicode, the IETF might conclude that updates to
   RFC 5892 [RFC5892] or other IDNA2008 documents might become
   necessary; such updates might include changes to the algorithm
   specified in IDNA2008 as well as additional rules, categories, or
   other forms of tuning, like the clarifications in RFC 8753
   [RFC8753].

6.  IANA Considerations

   IANA is requested to update the IDNA Parameters registry of derived
   property values, after the expert reviewer validates that the derived
   property values are calculated correctly.

7.  Security Considerations

   This document makes recommendations regarding the use of the IDNA2008
   algorithm for calculation of derived property values, based on
   Unicode version 12.0.0.
   These recommendations do not say anything about what recommendations
   to make for future versions of the Unicode Standard.

   Not following these recommendations can lead to various security
   issues. Specifically, allowing confusable characters may lead to
   various phishing attacks, as described in the Security Considerations
   sections of the documents listed in Section 2.1.

8.  Acknowledgements

   Thanks to Harald Alvestrand, Marc Blanchet, Martin Duerst, Asmus
   Freytag, Ted Hardie, John Klensin, Erik Nordmark, Pete Resnick, Peter
   Saint-Andre, Michel Suignard, Andrew Sullivan and Suzanne Woolf for
   input to this document.

9.  References

9.1.  Normative References

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, DOI 10.17487/RFC3491, March 2003.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, DOI 10.17487/RFC5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891,
              DOI 10.17487/RFC5891, August 2010.

   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, DOI 10.17487/RFC5892, August 2010.

   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
              for Internationalized Domain Names for Applications
              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010.

   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
              Points and Internationalized Domain Names for Applications
              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
              November 2011.

9.2.  Non-normative references

   [Changes-11.0.0]
              The Unicode Consortium, "Unicode Standard Annex #44",
              UNICODE CHARACTER DATABASE, Change History,
              https://www.unicode.org/reports/tr44/tr44-21d4.html#Change_History,
              May 2018.

   [IAB]      Internet Architecture Board, "IAB Statement on Identifiers
              and Unicode 7.0.0",
              https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/,
              January 2015.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              DOI 10.17487/RFC3454, December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, DOI 10.17487/RFC3490, March 2003.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, DOI 10.17487/RFC5895, September 2010.

   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
              "Principles for Unicode Code Point Inclusion in Labels in
              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013.

   [RFC8753]  Klensin, J. and P. Faeltstroem, "Internationalized Domain
              Names for Applications (IDNA) Review for New Unicode
              Versions", RFC 8753, DOI 10.17487/RFC8753, April 2020.

   [SAC-084]  The Security and Stability Advisory Committee, "SAC084",
              SSAC Comments on Guidelines for the Extended Process
              Similarity Review Panel for the IDN ccTLD Fast Track
              Process,
              https://www.icann.org/en/system/files/files/sac-084-en.pdf,
              August 2016.

   [Unicode-10.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              10.0.0", ISBN 978-1-936213-16-0, June 2017.

   [Unicode-11.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              11.0.0", ISBN 978-1-936213-19-1, June 2018.

   [Unicode-12.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              12.0.0", ISBN 978-1-936213-22-1, March 2019.

   [Unicode-3.2.0]
              The Unicode Consortium, "The Unicode Standard, Version
              3.2.0", ISBN 0-201-61633-5, March 2002.

   [Unicode-5.2.0]
              The Unicode Consortium, "The Unicode Standard, Version
              5.2.0", ISBN 978-1-936213-00-9, October 2009.

   [Unicode-6.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              6.0.0", ISBN 978-1-936213-01-6, October 2011.

   [Unicode-6.3.0]
              The Unicode Consortium, "The Unicode Standard, Version
              6.3.0", ISBN 978-1-936213-08-5, September 2013.

   [Unicode-7.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              7.0.0", ISBN 978-1-936213-09-2, June 2014.

   [Unicode-8.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              8.0.0", ISBN 978-1-936213-10-8, June 2015.

   [Unicode-9.0.0]
              The Unicode Consortium, "The Unicode Standard, Version
              9.0.0", ISBN 978-1-936213-13-9, June 2016.

   [UTS-46]   The Unicode Consortium, "Unicode Technical Standard #46,
              Version 12.0.0", UNICODE IDNA COMPATIBILITY PROCESSING,
              http://www.unicode.org/reports/tr46/, March 2019.

Appendix A.  Changes from Unicode 6.3.0 to Unicode 7.0.0

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.

037F        ; DISALLOWED # GREEK CAPITAL LETTER YOT
0528..052F  ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK..C
058D..058E  ; DISALLOWED # RIGHT-FACING ARMENIAN ETERNITY SIGN..LEFT-FA
0605        ; DISALLOWED # ARABIC NUMBER MARK ABOVE
08A1        ; PVALID     # ARABIC LETTER BEH WITH HAMZA ABOVE
08AD..08B2  ; PVALID     # ARABIC LETTER LOW ALEF..ARABIC LETTER ZAIN W
08FF        ; PVALID     # ARABIC MARK SIDEWAYS NOON GHUNNA
0978        ; PVALID     # DEVANAGARI LETTER MARWARI DDA
0980        ; PVALID     # BENGALI ANJI
0C00        ; PVALID     # TELUGU SIGN COMBINING CANDRABINDU ABOVE
0C34        ; PVALID     # TELUGU LETTER LLLA
0C81        ; PVALID     # KANNADA SIGN CANDRABINDU
0D01        ; PVALID     # MALAYALAM SIGN CANDRABINDU
0DE6..0DEF  ; PVALID     # SINHALA LITH DIGIT ZERO..SINHALA LITH DIGIT
16F1..16F8  ; PVALID     # RUNIC LETTER K..RUNIC LETTER FRANKS CASKET A
191D..191E  ; PVALID     # LIMBU LETTER GYAN..LIMBU LETTER TRA
1AB0..1ABE  ; PVALID     # COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINI
1CF8..1CF9  ; PVALID     # VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RIN
1DE7..1DF5  ; PVALID     # COMBINING LATIN SMALL LETTER ALPHA..COMBININ
20BB..20BD  ; DISALLOWED # NORDIC MARK SIGN..RUBLE SIGN
23F4..23FA  ; DISALLOWED # BLACK MEDIUM LEFT-POINTING TRIANGLE..BLACK C
2700        ; DISALLOWED # BLACK SAFETY SCISSORS
2B4D..2B4F  ; DISALLOWED # DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..SHOR
2B5A..2B73  ; DISALLOWED # SLANTED NORTH ARROW WITH HOOKED HEAD..DOWNWA
2B76..2B95  ; DISALLOWED # NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIG
2B98..2BB9  ; DISALLOWED # THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL AR
2BBD..2BC8  ; DISALLOWED # BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-
2BCA..2BD1  ; DISALLOWED # TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
2E3C..2E42  ; DISALLOWED # STENOGRAPHIC FULL STOP..DOUBLE LOW-REVERSED-
A698..A69D  ; DISALLOWED # CYRILLIC CAPITAL LETTER DOUBLE O..MODIFIER L
A794..A79F  ; PVALID     # LATIN SMALL LETTER C WITH PALATAL HOOK..LATI
A7AB..A7AD  ; DISALLOWED # LATIN CAPITAL LETTER REVERSED OPEN E..LATIN
A7B0..A7B1  ; DISALLOWED # LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL
A7F7        ; PVALID     # LATIN EPIGRAPHIC LETTER SIDEWAYS I
A9E0..A9FE  ; PVALID     # MYANMAR LETTER SHAN GHA..MYANMAR LETTER TAI
AA7C..AA7F  ; PVALID     # MYANMAR SIGN TAI LAING TONE-2..MYANMAR LETTE
AB30..AB5F  ; PVALID     # LATIN SMALL LETTER BARRED ALPHA..MODIFIER LE
AB64..AB65  ; PVALID     # LATIN SMALL LETTER INVERTED ALPHA..GREEK LET
FE27..FE2D  ; PVALID     # COMBINING LIGATURE LEFT HALF BELOW..COMBININ
1018B..1018C; DISALLOWED # GREEK ONE QUARTER SIGN..GREEK SINUSOID SIGN
101A0       ; DISALLOWED # GREEK SYMBOL TAU RHO
102E0..102FB; PVALID     # COPTIC EPACT THOUSANDS MARK..COPTIC EPACT NU
1031F       ; PVALID     # OLD ITALIC LETTER ESS
10350..1037A; PVALID     # OLD PERMIC LETTER AN..COMBINING OLD PERMIC L
10500..10527; PVALID     # ELBASAN LETTER A..ELBASAN LETTER KHE
10530..10563; PVALID     # CAUCASIAN ALBANIAN LETTER ALT..CAUCASIAN ALB
1056F       ; DISALLOWED # CAUCASIAN ALBANIAN CITATION MARK
10600..10736; PVALID     # LINEAR A SIGN AB001..LINEAR A SIGN A664
10740..10755; PVALID     # LINEAR A SIGN A701 A..LINEAR A SIGN A732 JE
10760..10767; PVALID     # LINEAR A SIGN A800..LINEAR A SIGN A807
10860..1089E; PVALID     # PALMYRENE LETTER ALEPH..NABATAEAN LETTER TAW
108A7..108AF; DISALLOWED # NABATAEAN NUMBER ONE..NABATAEAN NUMBER ONE H
10A80..10A9F; PVALID     # OLD NORTH ARABIAN LETTER HEH..OLD NORTH ARAB
10AC0..10AE6; PVALID     # MANICHAEAN LETTER ALEPH..MANICHAEAN ABBREVIA
10AEB..10AF6; DISALLOWED # MANICHAEAN NUMBER ONE..MANICHAEAN PUNCTUATIO
10B80..10B91; PVALID     # PSALTER PAHLAVI LETTER ALEPH..PSALTER PAHLAV
10B99..10B9C; DISALLOWED # PSALTER PAHLAVI SECTION MARK..PSALTER PAHLAV
10BA9..10BAF; DISALLOWED # PSALTER PAHLAVI NUMBER ONE..PSALTER PAHLAVI
1107F       ; PVALID     # BRAHMI NUMBER JOINER
   11150..11176; PVALID      # MAHAJANI LETTER A..MAHAJANI LIGATURE SHRI
   111CD       ; DISALLOWED  # SHARADA SUTRA MARK
   111DA       ; PVALID      # SHARADA EKAM
   111E1..111F4; DISALLOWED  # SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC N
   11200..11211; PVALID      # KHOJKI LETTER A..KHOJKI LETTER JJA
   11213..1123D; PVALID      # KHOJKI LETTER NYA..KHOJKI ABBREVIATION SIGN
   112B0..112EA; PVALID      # KHUDAWADI LETTER A..KHUDAWADI SIGN VIRAMA
   112F0..112F9; PVALID      # KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
   11301..11303; PVALID      # GRANTHA SIGN CANDRABINDU..GRANTHA SIGN VISAR
   11305..1130C; PVALID      # GRANTHA LETTER A..GRANTHA LETTER VOCALIC L
   1130F..11310; PVALID      # GRANTHA LETTER EE..GRANTHA LETTER AI
   11313..11328; PVALID      # GRANTHA LETTER OO..GRANTHA LETTER NA
   1132A..11330; PVALID      # GRANTHA LETTER PA..GRANTHA LETTER RA
   11332..11333; PVALID      # GRANTHA LETTER LA..GRANTHA LETTER LLA
   11335..11339; PVALID      # GRANTHA LETTER VA..GRANTHA LETTER HA
   1133C..11344; PVALID      # GRANTHA SIGN NUKTA..GRANTHA VOWEL SIGN VOCAL
   11347..11348; PVALID      # GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
   1134B..1134D; PVALID      # GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
   11357       ; PVALID      # GRANTHA AU LENGTH MARK
   1135D..11363; PVALID      # GRANTHA SIGN PLUTA..GRANTHA VOWEL SIGN VOCAL
   11366..1136C; PVALID      # COMBINING GRANTHA DIGIT ZERO..COMBINING GRAN
   11370..11374; PVALID      # COMBINING GRANTHA LETTER A..COMBINING GRANTH
   11480..114C7; PVALID      # TIRHUTA ANJI..TIRHUTA OM
   114D0..114D9; PVALID      # TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE
   11580..115B5; PVALID      # SIDDHAM LETTER A..SIDDHAM VOWEL SIGN VOCALIC
   115B8..115C9; PVALID      # SIDDHAM VOWEL SIGN E..SIDDHAM END OF TEXT MA
   11600..11644; PVALID      # MODI LETTER A..MODI SIGN HUVA
   11650..11659; PVALID      # MODI DIGIT ZERO..MODI DIGIT NINE
   118A0..118F2; DISALLOWED  # WARANG CITI CAPITAL LETTER NGAA..WARANG CITI
   118FF       ; PVALID      # WARANG CITI OM
   11AC0..11AF8; PVALID      # PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL S
   1236F..12398; PVALID      # CUNEIFORM SIGN KAP ELAMITE..CUNEIFORM SIGN U
   12463..1246E; DISALLOWED  # CUNEIFORM NUMERIC SIGN ONE QUARTER GUR..CUNE
   12474       ; DISALLOWED  # CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLO
   16A40..16A5E; PVALID      # MRO LETTER TA..MRO LETTER TEK
   16A60..16A69; PVALID      # MRO DIGIT ZERO..MRO DIGIT NINE
   16A6E..16A6F; DISALLOWED  # MRO DANDA..MRO DOUBLE DANDA
   16AD0..16AED; PVALID      # BASSA VAH LETTER ENNI..BASSA VAH LETTER I
   16AF0..16AF5; PVALID      # BASSA VAH COMBINING HIGH TONE..BASSA VAH FUL
   16B00..16B45; PVALID      # PAHAWH HMONG VOWEL KEEB..PAHAWH HMONG SIGN C
   16B50..16B59; PVALID      # PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT
   16B5B..16B61; DISALLOWED  # PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBE
   16B63..16B77; PVALID      # PAHAWH HMONG SIGN VOS LUB..PAHAWH HMONG SIGN
   16B7D..16B8F; PVALID      # PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG
   1BC00..1BC6A; PVALID      # DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M
   1BC70..1BC7C; PVALID      # DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLO
   1BC80..1BC88; PVALID      # DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HI
   1BC90..1BC99; PVALID      # DUPLOYAN AFFIX LOW ACUTE..DUPLOYAN AFFIX LOW
   1BC9C..1BCA3; DISALLOWED  # DUPLOYAN SIGN O WITH CROSS..SHORTHAND FORMAT
   1E800..1E8C4; PVALID      # MENDE KIKAKUI SYLLABLE M001 KI..MENDE KIKAKU
   1E8C7..1E8D6; DISALLOWED  # MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI COMBI
   1F0BF       ; DISALLOWED  # PLAYING CARD RED JOKER
   1F0E0..1F0F5; DISALLOWED  # PLAYING CARD FOOL..PLAYING CARD TRUMP-21
   1F10B..1F10C; DISALLOWED  # DINGBAT CIRCLED SANS-SERIF DIGIT ZERO..DINGB
   1F321..1F32C; DISALLOWED  # THERMOMETER..WIND BLOWING FACE
   1F336       ; DISALLOWED  # HOT PEPPER
   1F37D       ; DISALLOWED  # FORK AND KNIFE WITH PLATE
   1F394..1F39F; DISALLOWED  # HEART WITH TIP ON THE LEFT..ADMISSION TICKET
   1F3C5       ; DISALLOWED  # SPORTS MEDAL
   1F3CB..1F3CE; DISALLOWED  # WEIGHT LIFTER..RACING CAR
   1F3D4..1F3DF; DISALLOWED  # SNOW CAPPED MOUNTAIN..STADIUM
   1F3F1..1F3F7; DISALLOWED  # WHITE PENNANT..LABEL
   1F43F       ; DISALLOWED  # CHIPMUNK
   1F441       ; DISALLOWED  # EYE
   1F4F8       ; DISALLOWED  # CAMERA WITH FLASH
   1F4FD..1F4FE; DISALLOWED  # FILM PROJECTOR..PORTABLE STEREO
   1F53E..1F53F; DISALLOWED  # LOWER RIGHT SHADOWED WHITE CIRCLE..UPPER RIG
   1F544..1F54A; DISALLOWED  # NOTCHED RIGHT SEMICIRCLE WITH THREE DOTS..DO
   1F568..1F579; DISALLOWED  # RIGHT SPEAKER..JOYSTICK
   1F57B..1F5A3; DISALLOWED  # LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POI
   1F5A5..1F5FA; DISALLOWED  # DESKTOP COMPUTER..WORLD MAP
   1F641..1F642; DISALLOWED  # SLIGHTLY FROWNING FACE..SLIGHTLY SMILING FAC
   1F650..1F67F; DISALLOWED  # NORTH WEST POINTING LEAF..REVERSE CHECKER BO
   1F6C6..1F6CF; DISALLOWED  # TRIANGLE WITH ROUNDED CORNERS..BED
   1F6E0..1F6EC; DISALLOWED  # HAMMER AND WRENCH..AIRPLANE ARRIVING
   1F6F0..1F6F3; DISALLOWED  # SATELLITE..PASSENGER SHIP
   1F780..1F7D4; DISALLOWED  # BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE
   1F800..1F80B; DISALLOWED  # LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEA
   1F810..1F847; DISALLOWED  # LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROW
   1F850..1F859; DISALLOWED  # LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SER
   1F860..1F887; DISALLOWED  # WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE

Appendix B.  Changes from Unicode 7.0.0 to Unicode 8.0.0

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.

   08B3..08B4  ; PVALID      # ARABIC LETTER AIN WITH THREE DOTS BELOW..ARA
   08E3        ; PVALID      # ARABIC TURNED DAMMA BELOW
   0AF9        ; PVALID      # GUJARATI LETTER ZHA
   0C5A        ; PVALID      # TELUGU LETTER RRRA
   0D5F        ; PVALID      # MALAYALAM LETTER ARCHAIC II
   13F5        ; PVALID      # CHEROKEE LETTER MV
   13F8..13FD  ; DISALLOWED  # CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LET
   20BE        ; DISALLOWED  # LARI SIGN
   218A..218B  ; DISALLOWED  # TURNED DIGIT TWO..TURNED DIGIT THREE
   2BEC..2BEF  ; DISALLOWED  # LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARR
   9FCD..9FD5  ; PVALID      # ..
   A69E        ; PVALID      # COMBINING CYRILLIC LETTER EF
   A78F        ; PVALID      # LATIN LETTER SINOLOGICAL DOT
   A7B2..A7B7  ; DISALLOWED  # LATIN CAPITAL LETTER J WITH CROSSED-TAIL..LA
   A8FC..A8FD  ; DISALLOWED  # DEVANAGARI SIGN SIDDHAM..DEVANAGARI JAIN OM
   AB60..AB63  ; PVALID      # LATIN SMALL LETTER SAKHA YAT..LATIN SMALL LE
   AB70..ABBF  ; DISALLOWED  # CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
   FE2E..FE2F  ; PVALID      # COMBINING CYRILLIC TITLO LEFT HALF..COMBININ
   108E0..108F2; PVALID      # HATRAN LETTER ALEPH..HATRAN LETTER QOPH
   108F4..108F5; PVALID      # HATRAN LETTER SHIN..HATRAN LETTER TAW
   108FB..108FF; DISALLOWED  # HATRAN NUMBER ONE..HATRAN NUMBER ONE HUNDRED
   109BC..109BD; DISALLOWED  # MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS..M
   109C0..109CF; DISALLOWED  # MEROITIC CURSIVE NUMBER ONE..MEROITIC CURSIV
   109D2..109FF; DISALLOWED  # MEROITIC CURSIVE NUMBER ONE HUNDRED..MEROITI
   10C80..10CB2; DISALLOWED  # OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIA
   10CC0..10CF2; PVALID      # OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN
   10CFA..10CFF; DISALLOWED  # OLD HUNGARIAN NUMBER ONE..OLD HUNGARIAN NUMB
   111C9..111CC; DISALLOWED  # SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOW
   111DB..111DF; DISALLOWED  # SHARADA SIGN SIDDHAM..SHARADA SECTION MARK-2
   11280..11286; PVALID      # MULTANI LETTER A..MULTANI LETTER GA
   11288       ; PVALID      # MULTANI LETTER GHA
   1128A..1128D; PVALID      # MULTANI LETTER CA..MULTANI LETTER JJA
   1128F..1129D; PVALID      # MULTANI LETTER NYA..MULTANI LETTER BA
   1129F..112A9; PVALID      # MULTANI LETTER BHA..MULTANI SECTION MARK
   11300       ; PVALID      # GRANTHA SIGN COMBINING ANUSVARA ABOVE
   11350       ; PVALID      # GRANTHA OM
   115CA..115DD; DISALLOWED  # SIDDHAM SECTION MARK WITH TRIDENT AND U-SHAP
   11700..11719; PVALID      # AHOM LETTER KA..AHOM LETTER JHA
   1171D..1172B; PVALID      # AHOM CONSONANT SIGN MEDIAL LA..AHOM SIGN KIL
   11730..1173F; PVALID      # AHOM DIGIT ZERO..AHOM SYMBOL VI
   12399       ; PVALID      # CUNEIFORM SIGN U U
   12480..12543; PVALID      # CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM
   14400..14646; PVALID      # ANATOLIAN HIEROGLYPH A001..ANATOLIAN HIEROGL
   1D1DE..1D1E8; DISALLOWED  # MUSICAL SYMBOL KIEVAN C CLEF..MUSICAL SYMBOL
   1D800..1DA8B; DISALLOWED  # SIGNWRITING HAND-FIST INDEX..SIGNWRITING PAR
   1DA9B..1DA9F; PVALID      # SIGNWRITING FILL MODIFIER-2..SIGNWRITING FIL
   1DAA1..1DAAF; PVALID      # SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING
   1F32D..1F32F; DISALLOWED  # HOT DOG..BURRITO
   1F37E..1F37F; DISALLOWED  # BOTTLE WITH POPPING CORK..POPCORN
   1F3CF..1F3D3; DISALLOWED  # CRICKET BAT AND BALL..TABLE TENNIS PADDLE AN
   1F3F8..1F3FF; DISALLOWED  # BADMINTON RACQUET AND SHUTTLECOCK..EMOJI MOD
   1F4FF       ; DISALLOWED  # PRAYER BEADS
   1F54B..1F54F; DISALLOWED  # KAABA..BOWL OF HYGIEIA
   1F643..1F644; DISALLOWED  # UPSIDE-DOWN FACE..FACE WITH ROLLING EYES
   1F6D0       ; DISALLOWED  # PLACE OF WORSHIP
   1F910..1F918; DISALLOWED  # ZIPPER-MOUTH FACE..SIGN OF THE HORNS
   1F980..1F984; DISALLOWED  # CRAB..UNICORN FACE
   1F9C0       ; DISALLOWED  # CHEESE WEDGE

Appendix C.  Changes from Unicode 8.0.0 to Unicode 9.0.0

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.

   08B6..08BD  ; PVALID      # ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARA
   08D4..08E2  ; PVALID      # ARABIC SMALL HIGH WORD AR-RUB..ARABIC DISPUT
   0C80        ; PVALID      # KANNADA SIGN SPACING CANDRABINDU
   0D4F        ; DISALLOWED  # MALAYALAM SIGN PARA
   0D54..0D56  ; PVALID      # MALAYALAM LETTER CHILLU M..MALAYALAM LETTER
   0D58..0D5E  ; DISALLOWED  # MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTI
   0D76..0D78  ; DISALLOWED  # MALAYALAM FRACTION ONE SIXTEENTH..MALAYALAM
   1C80..1C88  ; DISALLOWED  # CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC S
   1DFB        ; PVALID      # COMBINING DELETION MARK
   23FB..23FE  ; DISALLOWED  # POWER SYMBOL..POWER SLEEP SYMBOL
   2E43..2E44  ; DISALLOWED  # DASH WITH LEFT UPTURN..DOUBLE SUSPENSION MAR
   A7AE        ; DISALLOWED  # LATIN CAPITAL LETTER SMALL CAPITAL I
   A8C5        ; PVALID      # SAURASHTRA SIGN CANDRABINDU
   1018D..1018E; DISALLOWED  # GREEK INDICTION SIGN..NOMISMA SIGN
   104B0..104D3; DISALLOWED  # OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER
   104D8..104FB; PVALID      # OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
   1123E       ; PVALID      # KHOJKI SIGN SUKUN
   11400..11459; PVALID      # NEWA LETTER A..NEWA DIGIT NINE
   1145B       ; DISALLOWED  # NEWA PLACEHOLDER MARK
   1145D       ; DISALLOWED  # NEWA INSERTION SIGN
   11660..1166C; DISALLOWED  # MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TUR
   11C00..11C08; PVALID      # BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC
   11C0A..11C36; PVALID      # BHAIKSUKI LETTER E..BHAIKSUKI VOWEL SIGN VOC
   11C38..11C45; PVALID      # BHAIKSUKI VOWEL SIGN E..BHAIKSUKI GAP FILLER
   11C50..11C6C; PVALID      # BHAIKSUKI DIGIT ZERO..BHAIKSUKI HUNDREDS UNI
   11C70..11C8F; DISALLOWED  # MARCHEN HEAD MARK..MARCHEN LETTER A
   11C92..11CA7; PVALID      # MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOIN
   11CA9..11CB6; PVALID      # MARCHEN SUBJOINED LETTER YA..MARCHEN SIGN CA
   16FE0       ; PVALID      # TANGUT ITERATION MARK
   17000..187EC; PVALID      # ..
   18800..18AF2; PVALID      # TANGUT COMPONENT-001..TANGUT COMPONENT-755
   1E000..1E006; PVALID      # COMBINING GLAGOLITIC LETTER AZU..COMBINING G
   1E008..1E018; PVALID      # COMBINING GLAGOLITIC LETTER ZEMLJA..COMBININ
   1E01B..1E021; PVALID      # COMBINING GLAGOLITIC LETTER SHTA..COMBINING
   1E023..1E024; PVALID      # COMBINING GLAGOLITIC LETTER YU..COMBINING GL
   1E026..1E02A; PVALID      # COMBINING GLAGOLITIC LETTER YO..COMBINING GL
   1E900..1E94A; DISALLOWED  # ADLAM CAPITAL LETTER ALIF..ADLAM NUKTA
   1E950..1E959; PVALID      # ADLAM DIGIT ZERO..ADLAM DIGIT NINE
   1E95E..1E95F; DISALLOWED  # ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIA
   1F19B..1F1AC; DISALLOWED  # SQUARED THREE D..SQUARED VOD
   1F23B       ; DISALLOWED  # SQUARED CJK UNIFIED IDEOGRAPH-914D
   1F57A       ; DISALLOWED  # MAN DANCING
   1F5A4       ; DISALLOWED  # BLACK HEART
   1F6D1..1F6D2; DISALLOWED  # OCTAGONAL SIGN..SHOPPING TROLLEY
   1F6F4..1F6F6; DISALLOWED  # SCOOTER..CANOE
   1F919..1F91E; DISALLOWED  # CALL ME HAND..HAND WITH INDEX AND MIDDLE FIN
   1F920..1F927; DISALLOWED  # FACE WITH COWBOY HAT..SNEEZING FACE
   1F930       ; DISALLOWED  # PREGNANT WOMAN
   1F933..1F93E; DISALLOWED  # SELFIE..HANDBALL
   1F940..1F94B; DISALLOWED  # WILTED FLOWER..MARTIAL ARTS UNIFORM
   1F950..1F95E; DISALLOWED  # CROISSANT..PANCAKES

Appendix D.  Changes from Unicode 9.0.0 to Unicode 10.0.0

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.

   0860..086A  ; PVALID      # SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER M
   09FC..09FD  ; PVALID      # BENGALI LETTER VEDIC ANUSVARA..BENGALI ABBRE
   0AFA..0AFF  ; PVALID      # GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCL
   0D00        ; PVALID      # MALAYALAM SIGN COMBINING ANUSVARA ABOVE
   0D3B..0D3C  ; PVALID      # MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALA
   1CF7        ; PVALID      # VEDIC SIGN ATIKRAMA
   1DF6..1DF9  ; PVALID      # COMBINING KAVYKA ABOVE RIGHT..COMBINING WIDE
   20BF        ; DISALLOWED  # BITCOIN SIGN
   23FF        ; DISALLOWED  # OBSERVER EYE SYMBOL
   2BD2        ; DISALLOWED  # GROUP MARK
   2E45..2E49  ; DISALLOWED  # INVERTED LOW KAVYKA..DOUBLE STACKED COMMA
   312E        ; PVALID      # BOPOMOFO LETTER O WITH DOT ABOVE
   9FD6..9FEA  ; PVALID      # ..
   1032D..1032F; PVALID      # OLD ITALIC LETTER YE..OLD ITALIC LETTER SOUT
   11A00..11A47; PVALID      # ZANABAZAR SQUARE LETTER A..ZANABAZAR SQUARE
   11A50..11A83; PVALID      # SOYOMBO LETTER A..SOYOMBO LETTER KSSA
   11A86..11A9C; PVALID      # SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO M
   11A9E..11AA2; DISALLOWED  # SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIP
   11D00..11D06; PVALID      # MASARAM GONDI LETTER A..MASARAM GONDI LETTER
   11D08..11D09; PVALID      # MASARAM GONDI LETTER AI..MASARAM GONDI LETTE
   11D0B..11D36; PVALID      # MASARAM GONDI LETTER AU..MASARAM GONDI VOWEL
   11D3A       ; PVALID      # MASARAM GONDI VOWEL SIGN E
   11D3C..11D3D; PVALID      # MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI V
   11D3F..11D47; PVALID      # MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI R
   11D50..11D59; PVALID      # MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGI
   16FE1       ; PVALID      # NUSHU ITERATION MARK
   1B002..1B11E; PVALID      # HENTAIGANA LETTER A-1..HENTAIGANA LETTER N-M
   1B170..1B2FB; PVALID      # NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
   1F260..1F265; DISALLOWED  # ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CA
   1F6D3..1F6D4; DISALLOWED  # STUPA..PAGODA
   1F6F7..1F6F8; DISALLOWED  # SLED..FLYING SAUCER
   1F900..1F90B; DISALLOWED  # CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWAR
   1F91F       ; DISALLOWED  # I LOVE YOU HAND SIGN
   1F928..1F92F; DISALLOWED  # FACE WITH ONE EYEBROW RAISED..SHOCKED FACE W
   1F931..1F932; DISALLOWED  # BREAST-FEEDING..PALMS UP TOGETHER
   1F94C       ; DISALLOWED  # CURLING STONE
   1F95F..1F96B; DISALLOWED  # DUMPLING..CANNED FOOD
   1F992..1F997; DISALLOWED  # GIRAFFE FACE..CRICKET
   1F9D0..1F9E6; DISALLOWED  # FACE WITH MONOCLE..SOCKS

Appendix E.  Changes from Unicode 10.0.0 to Unicode 11.0.0

   Changes from derived property value DISALLOWED to PVALID.

   111C9       ; PVALID      # SHARADA SANDHI MARK

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.

   0560        ; PVALID      # ARMENIAN SMALL LETTER TURNED AYB
   0588        ; PVALID      # ARMENIAN SMALL LETTER YI WITH STROKE
   05EF        ; PVALID      # HEBREW YOD TRIANGLE
   07FD..07FF  ; PVALID      # NKO DANTAYALAN..NKO TAMAN SIGN
   08D3        ; PVALID      # ARABIC SMALL LOW WAW
   09FE        ; PVALID      # BENGALI SANDHI MARK
   0A76        ; DISALLOWED  # GURMUKHI ABBREVIATION SIGN
   0C04        ; PVALID      # TELUGU SIGN COMBINING ANUSVARA ABOVE
   0C84        ; DISALLOWED  # KANNADA SIGN SIDDHAM
   1878        ; PVALID      # MONGOLIAN LETTER CHA WITH TWO DOTS
   1C90..1CBA  ; DISALLOWED  # GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIA
   1CBD..1CBF  ; DISALLOWED  # GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGI
   2BBA..2BBC  ; DISALLOWED  # OVERLAPPING WHITE SQUARES..OVERLAPPING BLACK
   2BD3..2BEB  ; DISALLOWED  # PLUTO FORM TWO..STAR WITH RIGHT HALF BLACK
   2BF0..2BFE  ; DISALLOWED  # ERIS FORM ONE..REVERSED RIGHT ANGLE
   2E4A..2E4E  ; DISALLOWED  # DOTTED SOLIDUS..PUNCTUS ELEVATUS MARK
   312F        ; PVALID      # BOPOMOFO LETTER NN
   9FEB..9FEF  ; PVALID      # ..
   A7AF        ; PVALID      # LATIN LETTER SMALL CAPITAL Q
   A7B8..A7B9  ; DISALLOWED  # LATIN CAPITAL LETTER U WITH STROKE..LATIN SM
   A8FE..A8FF  ; PVALID      # DEVANAGARI LETTER AY..DEVANAGARI VOWEL SIGN
   10A34..10A35; PVALID      # KHAROSHTHI LETTER TTTA..KHAROSHTHI LETTER VH
   10A48       ; DISALLOWED  # KHAROSHTHI FRACTION ONE HALF
   10D00..10D27; PVALID      # HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA SI
   10D30..10D39; PVALID      # HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA
   10F00..10F27; PVALID      # OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LIGATU
   10F30..10F59; PVALID      # SOGDIAN LETTER ALEPH..SOGDIAN PUNCTUATION HA
   110CD       ; DISALLOWED  # KAITHI NUMBER SIGN ABOVE
   11144..11146; PVALID      # CHAKMA LETTER LHAA..CHAKMA VOWEL SIGN EI
   1133B       ; PVALID      # COMBINING BINDU BELOW
   1145E       ; PVALID      # NEWA SANDHI MARK
   1171A       ; PVALID      # AHOM LETTER ALTERNATE BA
   11800..1183B; PVALID      # DOGRA LETTER A..DOGRA ABBREVIATION SIGN
   11A9D       ; PVALID      # SOYOMBO MARK PLUTA
   11D60..11D65; PVALID      # GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER
   11D67..11D68; PVALID      # GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTE
   11D6A..11D8E; PVALID      # GUNJALA GONDI LETTER OO..GUNJALA GONDI VOWEL
   11D90..11D91; PVALID      # GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI V
   11D93..11D98; PVALID      # GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI O
   11DA0..11DA9; PVALID      # GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGI
   11EE0..11EF8; PVALID      # MAKASAR LETTER KA..MAKASAR END OF SECTION
   16E40..16E9A; DISALLOWED  # MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN EX
   187ED..187F1; PVALID      # ..
   1D2E0..1D2F3; DISALLOWED  # MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN
   1D372..1D378; DISALLOWED  # IDEOGRAPHIC TALLY MARK ONE..TALLY MARK FIVE
   1EC71..1ECB4; DISALLOWED  # INDIC SIYAQ NUMBER ONE..INDIC SIYAQ ALTERNAT
   1F12F       ; DISALLOWED  # COPYLEFT SYMBOL
   1F6F9       ; DISALLOWED  # SKATEBOARD
   1F7D5..1F7D8; DISALLOWED  # CIRCLED TRIANGLE..NEGATIVE CIRCLED SQUARE
   1F94D..1F94F; DISALLOWED  # LACROSSE STICK AND BALL..FLYING DISC
   1F96C..1F970; DISALLOWED  # LEAFY GREEN..SMILING FACE WITH SMILING EYES
   1F973..1F976; DISALLOWED  # FACE WITH PARTY HORN AND PARTY HAT..FREEZING
   1F97A       ; DISALLOWED  # FACE WITH PLEADING EYES
   1F97C..1F97F; DISALLOWED  # LAB COAT..FLAT SHOE
   1F998..1F9A2; DISALLOWED  # KANGAROO..SWAN
   1F9B0..1F9B9; DISALLOWED  # EMOJI COMPONENT RED HAIR..SUPERVILLAIN
   1F9C1..1F9C2; DISALLOWED  # CUPCAKE..SALT SHAKER
   1F9E7..1F9FF; DISALLOWED  # RED GIFT ENVELOPE..NAZAR AMULET

Appendix F.  Changes from Unicode 11.0.0 to Unicode 12.0.0

   Changes from derived property value UNASSIGNED to either PVALID or
   DISALLOWED.
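
   The rows in Appendices A through F share one layout: a code point or
   an inclusive range of code points in hexadecimal, a semicolon, the
   derived property value, and a "#" comment carrying the (possibly
   truncated) character names.  As a minimal, non-normative sketch, and
   assuming that layout holds for every row, the following Python reads
   such rows and looks up a code point.  The names Row, parse_row, and
   lookup are illustrative only; they are not defined by IDNA2008 or by
   IANA.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Row(NamedTuple):
    """One table row (hypothetical helper type, not part of any spec)."""
    first: int    # first code point of the range
    last: int     # last code point of the range (inclusive)
    value: str    # derived property value, e.g. "PVALID"
    comment: str  # text after "#", kept verbatim (may be truncated)


# Assumed row layout: HEX[..HEX] ; VALUE # comment
ROW_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?"
    r"\s*;\s*(\w+)\s*#\s*(.*)$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one row; return None for headings, prose, and blank lines."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    # A row without ".." describes a single code point.
    last = int(m.group(2), 16) if m.group(2) else first
    return Row(first, last, m.group(3), m.group(4).strip())


def lookup(rows: Iterable[Row], code_point: int) -> Optional[str]:
    """Return the value of the first row covering code_point, else None."""
    for row in rows:
        if row.first <= code_point <= row.last:
            return row.value
    return None


if __name__ == "__main__":
    # Three rows copied from Appendix A.
    sample = [
        "037F        ; DISALLOWED  # GREEK CAPITAL LETTER YOT",
        "08AD..08B2  ; PVALID      # ARABIC LETTER LOW ALEF..ARABIC LETTER ZAIN W",
        "1018B..1018C; DISALLOWED  # GREEK ONE QUARTER SIGN..GREEK SINUSOID SIGN",
    ]
    rows = [r for r in map(parse_row, sample) if r is not None]
    for cp in (0x037F, 0x08B0, 0x1018C, 0x0041):
        print(f"U+{cp:04X}: {lookup(rows, cp)}")
```

   A result of None means only that no parsed row covers the code
   point; it says nothing about that code point's derived property
   value, which has to come from the complete IANA table.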

   0C77..0C7F  ; DISALLOWED  # TELUGU SIGN SIDDHAM..TELUGU SIGN TUUMU
   0E86..0E8A  ; PVALID      # LAO LETTER PALI GHA..LAO LETTER SO TAM
   0E8C..0EA3  ; PVALID      # LAO LETTER PALI JHA..LAO LETTER LO LING
   0EA7..0EB2  ; PVALID      # LAO LETTER WO..LAO VOWEL SIGN AA
   0EB4..0EBD  ; PVALID      # LAO VOWEL SIGN I..LAO SEMIVOWEL SIGN NYO
   1CD4..1CFA  ; PVALID      # VEDIC SIGN YAJURVEDIC MIDLINE SVARITA..VEDIC
   2B98..2C2E  ; DISALLOWED  # THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL AR
   2E30..2E4F  ; DISALLOWED  # RING POINT..CORNISH VERSE DIVIDER
   A7BA        ; DISALLOWED  # LATIN CAPITAL LETTER GLOTTAL A
   A7BB        ; PVALID      # LATIN SMALL LETTER GLOTTAL A
   A7BC        ; DISALLOWED  # LATIN CAPITAL LETTER GLOTTAL I
   A7BD        ; PVALID      # LATIN SMALL LETTER GLOTTAL I
   A7BE        ; DISALLOWED  # LATIN CAPITAL LETTER GLOTTAL U
   A7BF        ; PVALID      # LATIN SMALL LETTER GLOTTAL U
   A7C2        ; DISALLOWED  # LATIN CAPITAL LETTER ANGLICANA W
   A7C3        ; PVALID      # LATIN SMALL LETTER ANGLICANA W
   A7C4..A7C6  ; DISALLOWED  # LATIN CAPITAL LETTER C WITH PALATAL HOOK..LA
   AB60..AB67  ; PVALID      # LATIN SMALL LETTER SAKHA YAT..LATIN SMALL LE
   10FE0..10FF6; PVALID      # ELYMAIC LETTER ALEPH..ELYMAIC LIGATURE ZAYIN
   1145E..1145F; PVALID      # NEWA SANDHI MARK..NEWA LETTER VEDIC ANUSVARA
   11680..116B8; PVALID      # TAKRI LETTER A..TAKRI LETTER ARCHAIC KHA
   119A0..119A7; PVALID      # NANDINAGARI LETTER A..NANDINAGARI LETTER VOC
   119AA..119D7; PVALID      # NANDINAGARI LETTER E..NANDINAGARI VOWEL SIGN
   119DA..119E1; PVALID      # NANDINAGARI VOWEL SIGN E..NANDINAGARI SIGN A
   119E2       ; DISALLOWED  # NANDINAGARI SIGN SIDDHAM
   119E3..119E4; PVALID      # NANDINAGARI HEADSTROKE..NANDINAGARI VOWEL SI
   11A50..11A99; PVALID      # SOYOMBO LETTER A..SOYOMBO SUBJOINER
   11FC0..11FF1; DISALLOWED  # TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIE
   11FFF       ; DISALLOWED  # TAMIL PUNCTUATION END OF TEXT
   13430..13438; DISALLOWED  # EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIA
   16F00..16F4A; PVALID      # MIAO LETTER PA..MIAO LETTER RTE
   16F4F..16F87; PVALID      # MIAO SIGN CONSONANT MODIFIER BAR..MIAO VOWEL
   16FE2       ; DISALLOWED  # OLD CHINESE HOOK MARK
   16FE3       ; PVALID      # OLD CHINESE ITERATION MARK
   17000..187F7; PVALID      # ..
   1B150..1B152; PVALID      # HIRAGANA LETTER SMALL WI..HIRAGANA LETTER SM
   1B164..1B167; PVALID      # KATAKANA LETTER SMALL WI..KATAKANA LETTER SM
   1E100..1E12C; PVALID      # NYIAKENG PUACHUE HMONG LETTER MA..NYIAKENG P
   1E130..1E13D; PVALID      # NYIAKENG PUACHUE HMONG TONE-B..NYIAKENG PUAC
   1E140..1E149; PVALID      # NYIAKENG PUACHUE HMONG DIGIT ZERO..NYIAKENG
   1E14E       ; PVALID      # NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
   1E14F       ; DISALLOWED  # NYIAKENG PUACHUE HMONG CIRCLED CA
   1E2C0..1E2F9; PVALID      # WANCHO LETTER AA..WANCHO DIGIT NINE
   1E2FF       ; DISALLOWED  # WANCHO NGUN SIGN
   1E922..1E94B; PVALID      # ADLAM SMALL LETTER ALIF..ADLAM NASALIZATION
   1ED01..1ED3D; DISALLOWED  # OTTOMAN SIYAQ NUMBER ONE..OTTOMAN SIYAQ FRAC
   1F110..1F16C; DISALLOWED  # PARENTHESIZED LATIN CAPITAL LETTER A..RAISED
   1F300..1F6D5; DISALLOWED  # CYCLONE..HINDU TEMPLE
   1F6F0..1F6FA; DISALLOWED  # SATELLITE..AUTO RICKSHAW
   1F7E0..1F7EB; DISALLOWED  # LARGE ORANGE CIRCLE..LARGE BROWN SQUARE
   1F90D..1F971; DISALLOWED  # WHITE HEART..YAWNING FACE
   1F97A..1F9A2; DISALLOWED  # FACE WITH PLEADING EYES..SWAN
   1F9A5..1F9AA; DISALLOWED  # SLOTH..OYSTER
   1F9AE..1F9CA; DISALLOWED  # GUIDE DOG..ICE CUBE
   1F9CD..1FA53; DISALLOWED  # STANDING PERSON..BLACK CHESS KNIGHT-BISHOP
   1FA70..1FA73; DISALLOWED  # BALLET SHOES..SHORTS
   1FA78..1FA7A; DISALLOWED  # DROP OF BLOOD..STETHOSCOPE
   1FA80..1FA82; DISALLOWED  # YO-YO..PARACHUTE
   1FA90..1FA95; DISALLOWED  # RINGED PLANET..BANJO

Author's Address

   Patrik Faltstrom
   Netnod

   Email: paf@netnod.se