Network Working Group                                     P. Saint-Andre
Internet-Draft                                                  Filament
Obsoletes: 7564 (if approved)                                M. Blanchet
Intended status: Standards Track                                Viagenie
Expires: November 2, 2017                                    May 1, 2017

     PRECIS Framework: Preparation, Enforcement, and Comparison of
           Internationalized Strings in Application Protocols
                      draft-ietf-precis-7564bis-07

Abstract

   Application protocols using Unicode code points in protocol strings
   need to properly handle such strings in order to enforce
   internationalization rules for strings placed in various protocol
   slots (such as addresses and identifiers) and to perform valid
   comparison operations (e.g., for purposes of authentication or
   authorization).
   This document defines a framework enabling
   application protocols to perform the preparation, enforcement, and
   comparison of internationalized strings ("PRECIS") in a way that
   depends on the properties of Unicode code points and thus is more
   agile with respect to versions of Unicode. As a result, this
   framework provides a more sustainable approach to the handling of
   internationalized strings than the previous framework, known as
   Stringprep (RFC 3454). This document obsoletes RFC 7564.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 2, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Preparation, Enforcement, and Comparison
   4. String Classes
      4.1. Overview
      4.2. IdentifierClass
         4.2.1. Valid
         4.2.2. Contextual Rule Required
         4.2.3. Disallowed
         4.2.4. Unassigned
         4.2.5. Examples
      4.3. FreeformClass
         4.3.1. Valid
         4.3.2. Contextual Rule Required
         4.3.3. Disallowed
         4.3.4. Unassigned
         4.3.5. Examples
   5. Profiles
      5.1. Profiles Must Not Be Multiplied beyond Necessity
      5.2. Rules
         5.2.1. Width Mapping Rule
         5.2.2. Additional Mapping Rule
         5.2.3. Case Mapping Rule
         5.2.4. Normalization Rule
         5.2.5. Directionality Rule
      5.3. A Note about Spaces
   6. Applications
      6.1. How to Use PRECIS in Applications
      6.2. Further Excluded Characters
      6.3. Building Application-Layer Constructs
   7. Order of Operations
   8. Code Point Properties
   9. Category Definitions Used to Calculate Derived Property
      9.1. LetterDigits (A)
      9.2. Unstable (B)
      9.3. IgnorableProperties (C)
      9.4. IgnorableBlocks (D)
      9.5. LDH (E)
      9.6. Exceptions (F)
      9.7. BackwardCompatible (G)
      9.8. JoinControl (H)
      9.9. OldHangulJamo (I)
      9.10. Unassigned (J)
      9.11. ASCII7 (K)
      9.12. Controls (L)
      9.13. PrecisIgnorableProperties (M)
      9.14. Spaces (N)
      9.15. Symbols (O)
      9.16. Punctuation (P)
      9.17. HasCompat (Q)
      9.18. OtherLetterDigits (R)
   10. Guidelines for Designated Experts
   11. IANA Considerations
      11.1. PRECIS Derived Property Value Registry
      11.2. PRECIS Base Classes Registry
      11.3. PRECIS Profiles Registry
   12. Security Considerations
      12.1. General Issues
      12.2. Use of the IdentifierClass
      12.3. Use of the FreeformClass
      12.4. Local Character Set Issues
      12.5. Visually Similar Characters
      12.6. Security of Passwords
   13. Interoperability Considerations
      13.1. Coded Character Sets
      13.2. Dependency on Unicode
      13.3. Encoding
      13.4. Unicode Versions
      13.5. Potential Changes to Handling of Certain Unicode Code Points
   14. References
      14.1. Normative References
      14.2. Informative References
   Appendix A. Changes from RFC 7564
   Appendix B. Acknowledgements
   Authors' Addresses

1. Introduction

   Application protocols using Unicode code points [Unicode] in protocol
   strings need to properly handle such strings in order to enforce
   internationalization rules for strings placed in various protocol
   slots (such as addresses and identifiers) and to perform valid
   comparison operations (e.g., for purposes of authentication or
   authorization). This document defines a framework enabling
   application protocols to perform the preparation, enforcement, and
   comparison of internationalized strings ("PRECIS") in a way that
   depends on the properties of Unicode code points and thus is more
   agile with respect to versions of Unicode.

   As described in the PRECIS problem statement [RFC6885], many IETF
   protocols have used the Stringprep framework [RFC3454] as the basis
   for preparing, enforcing, and comparing protocol strings that contain
   Unicode code points, especially code points outside the ASCII range
   [RFC20]. The Stringprep framework was developed during work on the
   original technology for internationalized domain names (IDNs), here
   called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the
   Stringprep profile for IDNs. At the time, Stringprep was designed as
   a general framework so that other application protocols could define
   their own Stringprep profiles. Indeed, a number of application
   protocols defined such profiles.

   After the publication of [RFC3454] in 2002, several significant
   issues arose with the use of Stringprep in the IDN case, as
   documented in the IAB's recommendations regarding IDNs [RFC4690]
   (most significantly, Stringprep was tied to Unicode version 3.2).
   Therefore, the newer IDNA specifications, here called "IDNA2008"
   ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
   use Stringprep and Nameprep. This migration away from Stringprep for
   IDNs prompted other "customers" of Stringprep to consider new
   approaches to the preparation, enforcement, and comparison of
   internationalized strings, as described in [RFC6885].

   This document defines a framework for a post-Stringprep approach to
   the preparation, enforcement, and comparison of internationalized
   strings in application protocols, based on several principles:

   1. Define a small set of string classes that specify the Unicode
      code points appropriate for common application protocol
      constructs (where possible, maintaining compatibility with
      IDNA2008 to help ensure a more consistent user experience).

   2. Define each PRECIS string class in terms of Unicode code points
      and their properties so that an algorithm can be used to
      determine whether each code point or character category is
      (a) valid, (b) allowed in certain contexts, (c) disallowed, or
      (d) unassigned.

   3. Use an "inclusion model" such that a string class consists only
      of code points that are explicitly allowed, with the result that
      any code point not explicitly allowed is forbidden.

   4. Enable application protocols to define profiles of the PRECIS
      string classes if necessary (addressing matters such as width
      mapping, case mapping, Unicode normalization, and directionality)
      but strongly discourage the multiplication of profiles beyond
      necessity in order to avoid violations of the "Principle of Least
      Astonishment".

   It is expected that this framework will yield the following benefits:

   o  Application protocols will be more agile with regard to Unicode
      versions (recognizing that complete agility cannot be realized in
      practice).

   o  Implementers will be able to share code point tables and software
      code across application protocols, most likely by means of
      software libraries.

   o  End users will be able to acquire more accurate expectations about
      the code points that are acceptable in various contexts. Given
      this more uniform set of string classes, it is also expected that
      copy/paste operations between software implementing different
      application protocols will be more predictable and coherent.

   Whereas the string classes define the "baseline" code points for a
   range of applications, profiling enables application protocols to
   apply the string classes in ways that are appropriate for common
   constructs such as usernames [RFC7613], opaque strings such as
   passwords [RFC7613], and nicknames [RFC7700].
   Profiles are responsible for defining the handling of right-to-left
   code points as well as various mapping operations of the kind also
   discussed for IDNs in [RFC5895], such as case preservation or
   lowercasing, Unicode normalization, mapping of certain code points to
   other code points or to nothing, and mapping of fullwidth and
   halfwidth code points.

   When an application applies a profile of a PRECIS string class, it
   transforms an input string (which might or might not be conforming)
   into an output string that definitively conforms to the profile. In
   particular, this document focuses on the resulting ability to achieve
   the following objectives:

   a. Enforcing all the rules of a profile for a single output string
      (e.g., to determine if a string can be included in a protocol
      slot, communicated to another entity within a protocol, stored in
      a retrieval system, etc.) to check whether the output string
      conforms to the rules of the profile.

   b. Comparing two output strings to determine if they are equivalent,
      typically through octet-for-octet matching to test for
      "bit-string identity" (e.g., to make an access decision for
      purposes of authentication or authorization as further described
      in [RFC6943]).

   The opportunity to define profiles naturally introduces the
   possibility of a proliferation of profiles, thus potentially
   diminishing the benefits of common code and violating user
   expectations. See Section 5 for a discussion of this important
   topic.

   In addition, it is extremely important for protocol designers and
   application developers to understand that the transformation of an
   input string to an output string is rarely reversible. As one
   relatively simple example, case mapping would transform an input
   string of "StPeter" to "stpeter", and information about the
   capitalization of the first and third characters would be lost.
   Similar considerations apply to other forms of mapping and
   normalization.

   Although this framework is similar to IDNA2008 and includes by
   reference some of the character categories defined in [RFC5892], it
   defines additional character categories to meet the needs of common
   application protocols other than DNS.

   The character categories and calculation rules defined under
   Sections 8 and 9 are normative and apply to all Unicode code points.
   The code point table that results from applying the character
   categories and calculation rules to the latest version of Unicode can
   be found in an IANA registry.

2. Terminology

   Many important terms used in this document are defined in [RFC5890],
   [RFC6365], [RFC6885], and [Unicode]. The terms "left-to-right" (LTR)
   and "right-to-left" (RTL) are defined in Unicode Standard Annex #9
   [UAX9].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3. Preparation, Enforcement, and Comparison

   This document distinguishes between three different actions that an
   entity can take with regard to a string:

   o  Enforcement entails applying all of the rules specified for a
      particular string class or profile thereof to an individual
      string, for the purpose of checking whether the string conforms to
      all of the rules and thus determining if the string can be used in
      a given protocol slot.

   o  Comparison entails applying all of the rules specified for a
      particular string class or profile thereof to two separate
      strings, for the purpose of determining if the two strings are
      equivalent.

   o  Preparation primarily entails ensuring that the code points in an
      individual string are allowed by the underlying PRECIS string
      class, and sometimes also entails applying one or more of the
      rules specified for a particular string class or profile thereof.
      Preparation can be appropriate for constrained devices that can to
      some extent restrict the code points in a string to a limited
      repertoire of characters but that do not have the processing power
      or onboard memory to perform operations such as Unicode
      normalization. However, preparation does not ensure that an input
      string conforms to all of the rules for a string class or profile
      thereof.

      Note: The term "preparation" as used in this specification and
      related documents has a much more limited scope than it did in
      Stringprep; it essentially refers to a kind of preprocessing of
      an input string, not the actual operations that apply
      internationalization rules to produce an output string (here
      termed "enforcement") or to compare two output strings (here
      termed "comparison").

   In most cases, authoritative entities such as servers are responsible
   for enforcement, whereas subsidiary entities such as clients are
   responsible only for preparation. The rationale for this distinction
   is that clients might not have the facilities (in terms of device
   memory and processing power) to enforce all the rules regarding
   internationalized strings (such as width mapping and Unicode
   normalization), although they can more easily limit the repertoire of
   characters they offer to an end user. By contrast, it is assumed
   that a server would have more capacity to enforce the rules, and in
   any case acts as an authority regarding allowable strings in protocol
   slots such as addresses and endpoint identifiers.
   In addition, a client cannot necessarily be trusted to properly
   generate such strings, especially for security-sensitive contexts
   such as authentication and authorization.

4. String Classes

4.1. Overview

   Starting in 2010, various "customers" of Stringprep began to discuss
   the need to define a post-Stringprep approach to the preparation and
   comparison of internationalized strings other than IDNs. This
   community analyzed the existing Stringprep profiles and also weighed
   the costs and benefits of defining a relatively small set of Unicode
   code points that would minimize the potential for user confusion
   caused by visually similar code points (and thus be relatively
   "safe") vs. defining a much larger set of Unicode code points that
   would maximize the potential for user creativity (and thus be
   relatively "expressive"). As a result, the community concluded that
   most existing uses could be addressed by two string classes:

   IdentifierClass: a sequence of letters, numbers, and some symbols
      that is used to identify or address a network entity such as a
      user account, a venue (e.g., a chatroom), an information source
      (e.g., a data feed), or a collection of data (e.g., a file); the
      intent is that this class will minimize user confusion in a wide
      variety of application protocols, with the result that safety has
      been prioritized over expressiveness for this class.

   FreeformClass: a sequence of letters, numbers, symbols, spaces, and
      other code points that is used for free-form strings, including
      passwords as well as display elements such as human-friendly
      nicknames for devices or for participants in a chatroom; the
      intent is that this class will allow nearly any Unicode code
      point, with the result that expressiveness has been prioritized
      over safety for this class.
      Note well that protocol designers, application developers,
      service providers, and end users might not understand or be able
      to enter all of the code points that can be included in the
      FreeformClass -- see Section 12.3 for details.

   Future specifications might define additional PRECIS string classes,
   such as a class that falls somewhere between the IdentifierClass and
   the FreeformClass. At this time, it is not clear how useful such a
   class would be. In any case, because application developers are able
   to define profiles of PRECIS string classes, a protocol needing a
   construct between the IdentifierClass and the FreeformClass could
   define a restricted profile of the FreeformClass if needed.

   The following subsections discuss the IdentifierClass and
   FreeformClass in more detail, with reference to the dimensions
   described in Section 5 of [RFC6885]. Each string class is defined by
   the following behavioral rules:

   Valid: Defines which code points are treated as valid for the
      string.

   Contextual Rule Required: Defines which code points are treated as
      allowed only if the requirements of a contextual rule are met
      (i.e., either CONTEXTJ or CONTEXTO as originally defined in the
      IDNA2008 specifications).

   Disallowed: Defines which code points need to be excluded from the
      string.

   Unassigned: Defines application behavior in the presence of code
      points that are unknown (i.e., not yet designated) for the version
      of Unicode used by the application.

   This document defines the valid, contextual rule required,
   disallowed, and unassigned rules for the IdentifierClass and
   FreeformClass. As described under Section 5, profiles of these
   string classes are responsible for defining the width mapping,
   additional mappings, case mapping, normalization, and directionality
   rules.

4.2. IdentifierClass

   Most application technologies need strings that can be used to refer
   to, include, or communicate protocol strings like usernames,
   filenames, data feed identifiers, and chatroom names. We group such
   strings into a class called "IdentifierClass" having the following
   features.

4.2.1. Valid

   o  Code points traditionally used as letters and numbers in writing
      systems, i.e., the LetterDigits ("A") category first defined in
      [RFC5892] and listed here under Section 9.1.

   o  Code points in the range U+0021 through U+007E, i.e., the
      (printable) ASCII7 ("K") category defined under Section 9.11.
      These code points are "grandfathered" into PRECIS and thus are
      valid even if they would otherwise be disallowed according to the
      property-based rules specified in the next section.

   Note: Although the PRECIS IdentifierClass reuses the LetterDigits
   category from IDNA2008, the range of code points allowed in the
   IdentifierClass is wider than the range of code points allowed in
   IDNA2008. The main reason is that IDNA2008 applies the Unstable
   category before the LetterDigits category, thus disallowing
   uppercase code points, whereas the IdentifierClass does not apply
   the Unstable category.

4.2.2. Contextual Rule Required

   o  A number of code points from the Exceptions ("F") category defined
      under Section 9.6 (see Section 9.6 for a full list).

   o  Joining code points, i.e., the JoinControl ("H") category defined
      under Section 9.8.

4.2.3. Disallowed

   o  Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
      category defined under Section 9.9.

   o  Control code points, i.e., the Controls ("L") category defined
      under Section 9.12.

   o  Ignorable code points, i.e., the PrecisIgnorableProperties ("M")
      category defined under Section 9.13.

   o  Space code points, i.e., the Spaces ("N") category defined under
      Section 9.14.

   o  Symbol code points, i.e., the Symbols ("O") category defined under
      Section 9.15.

   o  Punctuation code points, i.e., the Punctuation ("P") category
      defined under Section 9.16.

   o  Any code point that is decomposed and recomposed into something
      other than itself under Unicode normalization form KC, i.e., the
      HasCompat ("Q") category defined under Section 9.17. These code
      points are disallowed even if they would otherwise be valid
      according to the property-based rules specified in the previous
      section.

   o  Letters and digits other than the "traditional" letters and digits
      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
      defined under Section 9.18.

4.2.4. Unassigned

   Any code points that are not yet designated in the Unicode coded
   character set are considered unassigned for purposes of the
   IdentifierClass, and such code points are to be treated as
   disallowed. See Section 9.10.

4.2.5. Examples

   As described in the Introduction to this document, the string classes
   do not handle all issues related to string preparation and comparison
   (such as case mapping); instead, such issues are handled at the level
   of profiles. Examples for profiles of the IdentifierClass can be
   found in [RFC7613] (the UsernameCaseMapped and UsernameCasePreserved
   profiles).

4.3. FreeformClass

   Some application technologies need strings that can be used in a
   free-form way, e.g., as a password in an authentication exchange (see
   [RFC7613]) or a nickname in a chatroom (see [RFC7700]). We group
   such things into a class called "FreeformClass" having the following
   features.

   Security Warning: As mentioned, the FreeformClass prioritizes
   expressiveness over safety; Section 12.3 describes some of the
   security hazards involved with using or profiling the
   FreeformClass.

   Security Warning: Consult Section 12.6 for relevant security
   considerations when strings conforming to the FreeformClass, or a
   profile thereof, are used as passwords.

4.3.1. Valid

   o  Traditional letters and numbers, i.e., the LetterDigits ("A")
      category first defined in [RFC5892] and listed here under
      Section 9.1.

   o  Letters and digits other than the "traditional" letters and digits
      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
      defined under Section 9.18.

   o  Code points in the range U+0021 through U+007E, i.e., the
      (printable) ASCII7 ("K") category defined under Section 9.11.

   o  Any code point that is decomposed and recomposed into something
      other than itself under Unicode normalization form KC, i.e., the
      HasCompat ("Q") category defined under Section 9.17.

   o  Space code points, i.e., the Spaces ("N") category defined under
      Section 9.14.

   o  Symbol code points, i.e., the Symbols ("O") category defined under
      Section 9.15.

   o  Punctuation code points, i.e., the Punctuation ("P") category
      defined under Section 9.16.

4.3.2. Contextual Rule Required

   o  A number of code points from the Exceptions ("F") category defined
      under Section 9.6 (see Section 9.6 for a full list).

   o  Joining code points, i.e., the JoinControl ("H") category defined
      under Section 9.8.

4.3.3. Disallowed

   o  Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
      category defined under Section 9.9.

   o  Control code points, i.e., the Controls ("L") category defined
      under Section 9.12.

   o  Ignorable code points, i.e., the PrecisIgnorableProperties ("M")
      category defined under Section 9.13.

4.3.4. Unassigned

   Any code points that are not yet designated in the Unicode coded
   character set are considered unassigned for purposes of the
   FreeformClass, and such code points are to be treated as disallowed.

4.3.5. Examples

   As described in the Introduction to this document, the string classes
   do not handle all issues related to string preparation and comparison
   (such as case mapping); instead, such issues are handled at the level
   of profiles. Examples for profiles of the FreeformClass can be found
   in [RFC7613] (the OpaqueString profile) and [RFC7700] (the Nickname
   profile).

5. Profiles

   This framework document defines the valid, contextual-rule-required,
   disallowed, and unassigned rules for the IdentifierClass and the
   FreeformClass. A profile of a PRECIS string class MUST define the
   width mapping, additional mappings (if any), case mapping,
   normalization, and directionality rules. A profile MAY also restrict
   the allowable code points above and beyond the definition of the
   relevant PRECIS string class (but MUST NOT add as valid any code
   points that are disallowed by the relevant PRECIS string class).
   These matters are discussed in the following subsections.

   Profiles of the PRECIS string classes are registered with the IANA as
   described under Section 11.3. Profile names use the following
   convention: they are of the form "Profilename of BaseClass", where
   the "Profilename" string is a differentiator and "BaseClass" is the
   name of the PRECIS string class being profiled; for example, the
   profile of the FreeformClass used for opaque strings such as
   passwords is the OpaqueString profile [RFC7613].

5.1. Profiles Must Not Be Multiplied beyond Necessity

   The risk of profile proliferation is significant because having too
   many profiles will result in different behavior across various
   applications, thus violating what is known in user interface design
   as the "Principle of Least Astonishment".

   Indeed, we already have too many profiles. Ideally we would have at
   most two or three profiles.
   Unfortunately, numerous application
   protocols exist with their own quirks regarding protocol strings.
   Domain names, email addresses, instant messaging addresses, chatroom
   nicknames, filenames, authentication identifiers, passwords, and
   other strings are already out there in the wild and need to be
   supported in existing application protocols such as DNS, SMTP, the
   Extensible Messaging and Presence Protocol (XMPP), Internet Relay
   Chat (IRC), NFS, the Internet Small Computer System Interface
   (iSCSI), the Extensible Authentication Protocol (EAP), and the Simple
   Authentication and Security Layer (SASL), among others.

   Nevertheless, profiles must not be multiplied beyond necessity.

   To help prevent profile proliferation, this document recommends
   sensible defaults for the various options offered to profile creators
   (such as width mapping and Unicode normalization). In addition, the
   guidelines for designated experts provided under Section 10 are meant
   to encourage a high level of due diligence regarding new profiles.

5.2. Rules

5.2.1. Width Mapping Rule

   The width mapping rule of a profile specifies whether width mapping
   is performed on a string, and how the mapping is done. Typically,
   such mapping consists of mapping fullwidth and halfwidth code points,
   i.e., code points with a Decomposition Type of Wide or Narrow, to
   their decomposition mappings; as an example, FULLWIDTH DIGIT ZERO
   (U+FF10) would be mapped to DIGIT ZERO (U+0030).

   The normalization form specified by a profile (see below) has an
   impact on the need for width mapping. Because width mapping is
   performed as a part of compatibility decomposition, a profile
   employing either normalization form KD (NFKD) or normalization form
   KC (NFKC) does not need to specify width mapping.
   However, if
   Unicode normalization form C (NFC) is used (as is recommended) then
   the profile needs to specify whether to apply width mapping; in this
   case, width mapping is in general RECOMMENDED because allowing
   fullwidth and halfwidth code points to remain unmapped to their
   compatibility variants would violate the "Principle of Least
   Astonishment". For more information about the concept of width in
   East Asian scripts within Unicode, see Unicode Standard Annex #11
   [UAX11].

      Note: Because the East Asian width property is not guaranteed to
      be stable by the Unicode Standard (see for details),
      the results of applying a given width mapping rule might not be
      consistent across different versions of Unicode.

5.2.2. Additional Mapping Rule

   The additional mapping rule of a profile specifies whether additional
   mappings are performed on a string, such as:

      Mapping of delimiter code points (such as '@', ':', '/', '+', and
      '-')

      Mapping of special code points (e.g., non-ASCII space code points
      to ASCII space or control code points to nothing).

   The PRECIS mappings document [RFC7790] describes such mappings in
   more detail.

5.2.3. Case Mapping Rule

   The case mapping rule of a profile specifies whether case mapping
   (instead of case preservation) is performed on a string, and how the
   mapping is applied (e.g., mapping uppercase and titlecase code points
   to their lowercase equivalents).

   If case mapping is desired (instead of case preservation), it is
   RECOMMENDED to use the Unicode toLowerCase() operation defined in the
   Unicode Standard [Unicode]. In contrast to the Unicode toCaseFold()
   operation, the toLowerCase() operation is less likely to violate the
   "Principle of Least Astonishment", especially when an application
   merely wishes to convert uppercase and titlecase code points to the
   lowercase equivalents while preserving lowercase code points.
   Although the toCaseFold() operation can be appropriate when an
   application needs to compare two strings (such as in search
   operations), in general few application developers and even fewer
   users understand its implications, so toLowerCase() is almost always
   the safer choice.

      Note: Neither toLowerCase() nor toCaseFold() is designed to handle
      various language-specific issues (such as so-called "dotless i" in
      several Turkic languages). The reader is referred to the PRECIS
      mappings document [RFC7790], which describes these issues in
      greater detail.

   In order to maximize entropy and minimize the potential for false
   positives, it is NOT RECOMMENDED for application protocols to map
   uppercase and titlecase code points to their lowercase equivalents
   when strings conforming to the FreeformClass, or a profile thereof,
   are used in passwords; instead, it is RECOMMENDED to preserve the
   case of all code points contained in such strings and then perform
   case-sensitive comparison. See also the related discussion in
   Section 12.6 and in [RFC7613].

5.2.4. Normalization Rule

   The normalization rule of a profile specifies which Unicode
   normalization form (D, KD, C, or KC) is to be applied (see Unicode
   Standard Annex #15 [UAX15] for background information).

   In accordance with [RFC5198], normalization form C (NFC) is
   RECOMMENDED.

   Protocol designers and application developers need to understand that
   the use of certain Unicode normalization forms, especially NFKC and
   NFKD, can result in significant loss of information in various
   circumstances, and that these circumstances can vary depending on the
   language and script of the strings to which the normalization forms
   are applied. Extreme care should be taken when specifying the use of
   these normalization forms.

5.2.5.
Directionality Rule

   The directionality rule of a profile specifies how to treat strings
   containing what are often called "right-to-left" (RTL) code points
   (see Unicode Standard Annex #9 [UAX9]). RTL code points come from
   scripts that are normally written from right to left and are
   considered by Unicode to, themselves, have right-to-left
   directionality. Some strings containing RTL code points also contain
   "left-to-right" (LTR) code points, such as ASCII numerals, as well as
   code points without directional properties. Consequently, such
   strings are known as "bidirectional strings".

   Presenting bidirectional strings in different layout systems (e.g., a
   user interface that is configured to handle primarily an RTL script
   vs. an interface that is configured to handle primarily an LTR
   script) can yield display results that, while predictable to those
   who understand the display rules, are counter-intuitive to casual
   users. In particular, the same bidirectional string (in PRECIS
   terms) might not be presented in the same way to users of those
   different layout systems, even though the presentation is consistent
   within any particular layout system. In some applications, these
   presentation differences might be considered problematic and thus the
   application designers might wish to restrict the use of bidirectional
   strings by specifying a directionality rule. In other applications,
   these presentation differences might not be considered problematic
   (this especially tends to be true of more "free-form" strings) and
   thus no directionality rule is needed.
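
   As a rough illustration of what makes a string "bidirectional" in
   the sense used above, the following Python sketch inspects the
   Unicode Bidi_Class of each code point. It is not a directionality
   rule and not the Unicode bidirectional algorithm. The function name
   and the grouping of Bidi_Class values (R and AL as right-to-left; L
   and EN, which covers ASCII numerals, as left-to-right) are
   assumptions made only for this example.

```python
import unicodedata

# Bidi_Class groupings assumed for this sketch only:
# R and AL mark right-to-left code points; L and EN (European digits
# such as ASCII numerals) are treated as left-to-right.
RTL_CLASSES = {"R", "AL"}
LTR_CLASSES = {"L", "EN"}


def direction_summary(s: str) -> str:
    """Classify a string as 'bidi', 'rtl', 'ltr', or 'neutral'."""
    classes = {unicodedata.bidirectional(ch) for ch in s}
    has_rtl = bool(classes & RTL_CLASSES)
    has_ltr = bool(classes & LTR_CLASSES)
    if has_rtl and has_ltr:
        return "bidi"
    if has_rtl:
        return "rtl"
    if has_ltr:
        return "ltr"
    return "neutral"  # no code point with a direction in either group


if __name__ == "__main__":
    for sample in ("abc", "\u05d0\u05d1", "\u05d0\u05d11", " "):
        print(repr(sample), direction_summary(sample))
```

   A check of this kind can only flag strings for which a profile's
   directionality rule would matter; deciding what to do with them
   remains a matter for the profile.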
   The PRECIS framework does not directly address how to deal with
   bidirectional strings across all string classes and profiles, and
   does not define any new directionality rules, because at present
   there is no widely accepted and implemented solution for the safe
   display of arbitrary bidirectional strings beyond the Unicode
   bidirectional algorithm [UAX9]. Although rules for management and
   display of bidirectional strings have been defined for domain name
   labels and similar identifiers through the "Bidi Rule" specified in
   the IDNA2008 specification on right-to-left scripts [RFC5893], those
   rules are quite restrictive and are not necessarily applicable to all
   bidirectional strings.

   The authors of a PRECIS profile might believe that they need to
   define a new directionality rule of their own. Because of the
   complexity of the issues involved, such a belief is almost always
   misguided, even if the authors have done a great deal of careful
   research into the challenges of displaying bidirectional strings.
   This document strongly suggests that profile authors who are thinking
   about defining a new directionality rule think again, and instead
   consider using the "Bidi Rule" [RFC5893] (for profiles based on the
   IdentifierClass) or following the Unicode bidirectional algorithm
   [UAX9] (for profiles based on the FreeformClass or in situations
   where the IdentifierClass is not appropriate).

5.3. A Note about Spaces

   With regard to the IdentifierClass, the consensus of the PRECIS
   Working Group was that spaces are problematic for many reasons,
   including the following:

   o  Many Unicode code points are confusable with ASCII space.

   o  Even if non-ASCII space code points are mapped to ASCII space
      (U+0020), space code points are often not rendered in user
      interfaces, leading to the possibility that a human user might
      consider a string containing spaces to be equivalent to the same
      string without spaces.

   o  In some locales, some devices are known to generate a code point
      other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
      user performs an action like hitting the space bar on a keyboard.

   One consequence of disallowing space code points in the
   IdentifierClass might be to effectively discourage their use within
   identifiers created in newer application protocols; given the
   challenges involved with properly handling space code points
   (especially non-ASCII space code points) in identifiers and other
   protocol strings, the PRECIS Working Group considered this to be a
   feature, not a bug.

   However, the FreeformClass does allow spaces, which enables
   application protocols to define profiles of the FreeformClass that
   are more flexible than any profiles of the IdentifierClass. In
   addition, as explained in Section 6.3, application protocols can also
   define application-layer constructs containing spaces.

6. Applications

6.1. How to Use PRECIS in Applications

   Although PRECIS has been designed with applications in mind,
   internationalization is not suddenly made easy through the use of
   PRECIS. Indeed, because it is extremely difficult for protocol
   designers and application developers to do the right thing for all
   users when supporting internationalized strings, often the safest
   option is to support only the ASCII range [RFC20] in various protocol
   slots.
   This state of affairs is unfortunate but is the direct result
   of the complexities involved with human languages (e.g., the vast
   number of code points, scripts, user communities, and rules with
   their inevitable exceptions), which kinds of strings application
   developers and their users wish to support, the wide range of devices
   that users employ to access services enabled by various Internet
   protocols, and so on.

   Despite these significant challenges, application and protocol
   developers sometimes persevere in attempting to support
   internationalized strings in their systems. These developers need to
   think carefully about how they will use the PRECIS string classes, or
   profiles thereof, in their applications. This section provides some
   guidelines to application developers (and to expert reviewers of
   application protocol specifications).

   o  Don't define your own profile unless absolutely necessary (see
      Section 5.1). Existing profiles have been designed for wide
      reuse. It is highly likely that an existing profile will meet
      your needs, especially given the ability to specify further
      excluded code points (Section 6.2) and to build application-layer
      constructs (see Section 6.3).

   o  Do specify:

      *  Exactly which entities are responsible for preparation,
         enforcement, and comparison of internationalized strings (e.g.,
         servers or clients).

      *  Exactly when those entities need to complete their tasks (e.g.,
         a server might need to enforce the rules of a profile before
         allowing a client to gain network access).

      *  Exactly which protocol slots need to be checked against which
         profiles (e.g., checking the address of a message's intended
         recipient against the UsernameCaseMapped profile [RFC7613] of
         the IdentifierClass, or checking the password of a user against
         the OpaqueString profile [RFC7613] of the FreeformClass).

      See [RFC7613] and [RFC7622] for definitions of these matters for
      several applications.

6.2. Further Excluded Characters

   An application protocol that uses a profile MAY specify particular
   code points that are not allowed in relevant slots within that
   application protocol, above and beyond those excluded by the string
   class or profile.

   That is, an application protocol MAY do either of the following:

   1. Exclude specific code points that are allowed by the relevant
      string class.

   2. Exclude code points matching certain Unicode properties (e.g.,
      math symbols) that are included in the relevant PRECIS string
      class.

   As a result of such exclusions, code points that are defined as valid
   for the PRECIS string class or profile will be defined as disallowed
   for the relevant protocol slot.

   Typically, such exclusions are defined for the purpose of backward
   compatibility with legacy formats within an application protocol.
   These are defined for application protocols, not profiles, in order
   to prevent multiplication of profiles beyond necessity (see
   Section 5.1).

6.3. Building Application-Layer Constructs

   Sometimes, an application-layer construct does not map in a
   straightforward manner to one of the base string classes or a profile
   thereof. Consider, for example, the "simple user name" construct in
   the Simple Authentication and Security Layer (SASL) [RFC4422].
   Depending on the deployment, a simple user name might take the form
   of a user's full name (e.g., the user's personal name followed by a
   space and then the user's family name).
   Such a simple user name
   cannot be defined as an instance of the IdentifierClass or a profile
   thereof, because space code points are not allowed in the
   IdentifierClass; however, it could be defined using a space-separated
   sequence of IdentifierClass instances, as in the following ABNF
   [RFC5234] from [RFC7613]:

      username   = userpart *(1*SP userpart)
      userpart   = 1*(idpoint)
                   ;
                   ; an "idpoint" is a Unicode code point that
                   ; can be contained in a string conforming to
                   ; the PRECIS IdentifierClass
                   ;

   Similar techniques could be used to define many application-layer
   constructs, say of the form "user@domain" or "/path/to/file".

7. Order of Operations

   To ensure proper comparison, the rules specified for a particular
   string class or profile MUST be applied in the following order:

   1. Width Mapping Rule

   2. Additional Mapping Rule

   3. Case Mapping Rule

   4. Normalization Rule

   5. Directionality Rule

   6. Behavioral rules for determining whether a code point is valid,
      allowed under a contextual rule, disallowed, or unassigned

   As already described, the width mapping, additional mapping, case
   mapping, normalization, and directionality rules are specified for
   each profile, whereas the behavioral rules are specified for each
   string class. Some of the logic behind this order is provided under
   Section 5.2.1 (see also the PRECIS mappings document [RFC7790]). In
   addition, this order is consistent with IDNA2008, and with both
   IDNA2003 and Stringprep before then, for the purpose of enabling code
   reuse and of ensuring as much continuity as possible with the
   Stringprep profiles that are obsoleted by several PRECIS profiles.
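
   The ordering above can be sketched in Python as a small pipeline.
   This is an illustration under stated assumptions, not a conforming
   implementation: the names width_map and apply_profile and all of
   their parameters are hypothetical, Python's str.lower() is used as a
   stand-in for the Unicode toLowerCase() operation, and the
   directionality and behavioral checks are left to caller-supplied
   functions.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map code points whose Decomposition Type is Wide or Narrow
    to their decomposition mappings (step 1)."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. '<wide> 0030'
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def apply_profile(s, *, width_mapping=True, additional=None, case_map=False,
                  form="NFC", directionality=None, is_allowed=None):
    """Apply a profile's rules in the order required by this section.

    additional, directionality, and is_allowed are optional callables
    supplied by the caller (hypothetical hooks for this sketch).
    """
    if width_mapping:                      # 1. Width Mapping Rule
        s = width_map(s)
    if additional is not None:             # 2. Additional Mapping Rule
        s = additional(s)
    if case_map:                           # 3. Case Mapping Rule
        s = s.lower()                      #    stand-in for toLowerCase()
    s = unicodedata.normalize(form, s)     # 4. Normalization Rule
    if directionality is not None and not directionality(s):
        raise ValueError("directionality rule violated")   # 5.
    if is_allowed is not None:             # 6. Behavioral rules
        for ch in s:
            if not is_allowed(ch):
                raise ValueError(f"code point U+{ord(ch):04X} not allowed")
    return s


def spaces_to_ascii(s: str) -> str:
    """Example additional mapping: non-ASCII spaces to U+0020."""
    return "".join(" " if unicodedata.category(c) == "Zs" else c for c in s)


if __name__ == "__main__":
    print(apply_profile("\uff21lice\u00a0Smith",
                        additional=spaces_to_ascii, case_map=True))
```

   Keeping each rule as a separate step makes the required order
   explicit and lets a profile enable or disable individual rules
   without changing their relative position.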

   Because of the order of operations specified here, applying the rules
   for any given PRECIS profile is not necessarily an idempotent
   procedure (e.g., under certain circumstances, such as when Unicode
   normalization form KC is used, performing Unicode normalization after
   case mapping can still yield uppercase characters for certain code
   points); therefore, implementations might need to apply the rules
   more than once to an internationalized string.

8. Code Point Properties

   In order to implement the string classes described above, this
   document does the following:

   1. Reviews and classifies the collections of code points in the
      Unicode coded character set by examining various code point
      properties.

   2. Defines an algorithm for determining a derived property value,
      which can vary depending on the string class being used by the
      relevant application protocol.

   This document is not intended to specify precisely how derived
   property values are to be applied in protocol strings. That
   information is the responsibility of the protocol specification that
   uses or profiles a PRECIS string class from this document. The value
   of the property is to be interpreted as follows.

   PROTOCOL VALID Those code points that are allowed to be used in any
      PRECIS string class (currently, IdentifierClass and
      FreeformClass). The abbreviated term "PVALID" is used to refer to
      this value in the remainder of this document.

   SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to
      be used in specific string classes. In the remainder of this
      document, the abbreviated term *_PVAL is used, where * = (ID |
      FREE), i.e., either "FREE_PVAL" or "ID_PVAL". In practice, the
      derived property ID_PVAL is not used in this specification,
      because every ID_PVAL code point is PVALID.

   CONTEXTUAL RULE REQUIRED Some characteristics of the code point,
      such as its being invisible in certain contexts or problematic in
      others, require that it not be used in a string unless specific
      other code points or properties are present in the string. As in
      IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED
      -- the first for Join_controls (called "CONTEXTJ") and the second
      for other code points (called "CONTEXTO"). A string MUST NOT
      contain any characters whose validity is context-dependent, unless
      the validity is positively confirmed by a contextual rule. To
      check this, each code point identified as CONTEXTJ or CONTEXTO in
      the PRECIS Derived Property Value registry MUST have a non-null
      rule. If such a code point is missing a rule, the string is
      invalid. If the rule exists but the result of applying the rule
      is negative or inconclusive, the proposed string is invalid. The
      most notable of the CONTEXTUAL RULE REQUIRED code points are the
      Join Control code points U+200D ZERO WIDTH JOINER and U+200C ZERO
      WIDTH NON-JOINER, which have a derived property value of CONTEXTJ.
      See Appendix A of [RFC5892] for more information.

   DISALLOWED Those code points that are not permitted in any PRECIS
      string class.

   SPECIFIC CLASS DISALLOWED Those code points that are not to be
      included in one of the string classes but that might be permitted
      in others. In the remainder of this document, the abbreviated
      term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"
      or "ID_DIS". In practice, the derived property FREE_DIS is not
      used in this specification, because every FREE_DIS code point is
      DISALLOWED.

   UNASSIGNED Those code points that are not designated (i.e., are
      unassigned) in the Unicode Standard.

   The algorithm to calculate the value of the derived property is as
   follows (implementations MUST NOT modify the order of operations
   within this algorithm, because doing so would cause inconsistent
   results across implementations):

   If .cp. .in. Exceptions Then Exceptions(cp);
   Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
   Else If .cp. .in. Unassigned Then UNASSIGNED;
   Else If .cp. .in. ASCII7 Then PVALID;
   Else If .cp. .in. JoinControl Then CONTEXTJ;
   Else If .cp. .in. OldHangulJamo Then DISALLOWED;
   Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
   Else If .cp. .in. Controls Then DISALLOWED;
   Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. LetterDigits Then PVALID;
   Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
   Else DISALLOWED;

   The value of the derived property calculated can depend on the string
   class; for example, if an identifier used in an application protocol
   is defined as profiling the PRECIS IdentifierClass then a space
   character such as U+0020 would be assigned to ID_DIS, whereas if an
   identifier is defined as profiling the PRECIS FreeformClass then the
   character would be assigned to FREE_PVAL. For the sake of brevity,
   the designation "FREE_PVAL" is used herein, instead of the longer
   designation "ID_DIS or FREE_PVAL". In practice, the derived
   properties ID_PVAL and FREE_DIS are not used in this specification,
   because every ID_PVAL code point is PVALID and every FREE_DIS code
   point is DISALLOWED.
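
   The cascade above can be sketched in Python as follows. This is an
   illustration only, and several inputs are assumptions: the
   EXCEPTIONS table is abridged to three entries (the complete table is
   in Section 2.6 of [RFC5892]), the OldHangulJamo and
   Default_Ignorable_Code_Point sets are caller-supplied parameters
   because Python's unicodedata module does not expose those
   properties, and the function and parameter names are hypothetical.
   Results also depend on the Unicode version bundled with the Python
   interpreter.

```python
import unicodedata

# Abridged stand-in for the Exceptions (F) table of RFC 5892, Section 2.6.
EXCEPTIONS = {0x00DF: "PVALID", 0x00B7: "CONTEXTO", 0x0640: "DISALLOWED"}
BACKWARD_COMPATIBLE = {}            # BackwardCompatible (G): currently empty
JOIN_CONTROL = {0x200C, 0x200D}     # JoinControl (H)

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}   # (A)
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}                # (R)
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                            # (O)
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}      # (P)


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point: U+FDD0..U+FDEF and every U+xxFFFE/U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp, string_class, *, old_hangul_jamo=frozenset(),
                     default_ignorable=frozenset()):
    """Return the derived property value of code point cp.

    string_class is "IdentifierClass" or "FreeformClass". The two
    keyword sets are hypothetical hooks for properties that the
    standard library does not provide.
    """
    ch = chr(cp)
    gc = unicodedata.category(ch)
    by_class = "ID_DIS" if string_class == "IdentifierClass" else "FREE_PVAL"

    # The order of these tests must not be changed.
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if gc == "Cn" and not is_noncharacter(cp):        # Unassigned (J)
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:                            # ASCII7 (K)
        return "PVALID"
    if cp in JOIN_CONTROL:
        return "CONTEXTJ"
    if cp in old_hangul_jamo:                         # OldHangulJamo (I)
        return "DISALLOWED"
    if cp in default_ignorable or is_noncharacter(cp):  # (M)
        return "DISALLOWED"
    if gc == "Cc":                                    # Controls (L)
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:       # HasCompat (Q)
        return by_class
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS:
        return by_class
    if gc == "Zs":                                    # Spaces (N)
        return by_class
    if gc in SYMBOLS or gc in PUNCTUATION:
        return by_class
    return "DISALLOWED"


if __name__ == "__main__":
    for cp in (0x0041, 0x0020, 0x200D, 0xFF10):
        print(f"U+{cp:04X}", derived_property(cp, "IdentifierClass"),
              derived_property(cp, "FreeformClass"))
```

   Writing the tests as a single ordered chain mirrors the requirement
   that implementations not reorder the algorithm: for instance,
   FULLWIDTH DIGIT ZERO (U+FF10) has General_Category Nd but is caught
   by the HasCompat test before the LetterDigits test is reached.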

   Use of the name of a rule (such as "Exceptions") implies the set of
   code points that the rule defines, whereas the same name as a
   function call (such as "Exceptions(cp)") implies the value that the
   code point has in the Exceptions table.

   The mechanisms described here allow determination of the value of the
   property for future versions of Unicode (including code points added
   after Unicode 5.2 or 7.0 depending on the category, because some
   categories mentioned in this document are simply pointers to IDNA2008
   and therefore were defined at the time of Unicode 5.2). Changes in
   Unicode properties that do not affect the outcome of this process
   therefore do not affect this framework. For example, a code point
   can have its Unicode General_Category value change from So to Sm, or
   from Lo to Ll, without affecting the algorithm results. Moreover,
   even if such changes were to result, the BackwardCompatible list
   (Section 9.7) can be adjusted to ensure the stability of the results.

9. Category Definitions Used to Calculate Derived Property

   The derived property obtains its value based on a two-step procedure:

   1. Code points are placed in one or more character categories either
      (1) based on core properties defined by the Unicode Standard or
      (2) by treating the code point as an exception and addressing the
      code point based on its code point value. These categories are
      not mutually exclusive.

   2. Set operations are used with these categories to determine the
      values for a property specific to a given string class. These
      operations are specified under Section 8.

      Note: Unicode property names and property value names might have
      short abbreviations, such as "gc" for the General_Category
      property and "Ll" for the Lowercase_Letter property value of the
      gc property.

   In the following specification of character categories, the operation
   that returns the value of a particular Unicode code point property
   for a code point is designated by using the formal name of that
   property (from the Unicode PropertyAliases.txt file [PropertyAliases])
   followed by "(cp)" for "code point". For example, the value of the
   General_Category property for a code point is indicated by
   General_Category(cp).

   The first ten categories (A-J) shown below were previously defined
   for IDNA2008 and are referenced from [RFC5892] to ease the
   understanding of how PRECIS handles various code points. Some of
   these categories are reused in PRECIS, and some of them are not;
   however, the lettering of categories is retained to prevent overlap
   and to ease implementation of both IDNA2008 and PRECIS in a single
   software application. The next eight categories (K-R) are specific
   to PRECIS.

9.1. LetterDigits (A)

   This category is defined in Section 2.1 of [RFC5892] and is included
   by reference for use in PRECIS.

9.2. Unstable (B)

   This category is defined in Section 2.2 of [RFC5892]. However, it is
   not used in PRECIS.

9.3. IgnorableProperties (C)

   This category is defined in Section 2.3 of [RFC5892]. However, it is
   not used in PRECIS.

   Note: See the PrecisIgnorableProperties ("M") category below for a
   more inclusive category used in PRECIS identifiers.

9.4. IgnorableBlocks (D)

   This category is defined in Section 2.4 of [RFC5892]. However, it is
   not used in PRECIS.

9.5. LDH (E)

   This category is defined in Section 2.5 of [RFC5892]. However, it is
   not used in PRECIS.

   Note: See the ASCII7 ("K") category below for a more inclusive
   category used in PRECIS identifiers.

9.6. Exceptions (F)

   This category is defined in Section 2.6 of [RFC5892] and is included
   by reference for use in PRECIS.

9.7.
BackwardCompatible (G)

   This category is defined in Section 2.7 of [RFC5892] and is included
   by reference for use in PRECIS.

   Note: Management of this category is handled via the processes
   specified in [RFC5892]. At the time of this writing (and also at the
   time that RFC 5892 was published), this category consisted of the
   empty set; however, that is subject to change as described in
   RFC 5892.

9.8. JoinControl (H)

   This category is defined in Section 2.8 of [RFC5892] and is included
   by reference for use in PRECIS.

   Note: In particular, the code points ZERO WIDTH JOINER (U+200D) and
   ZERO WIDTH NON-JOINER (U+200C) are necessary to produce certain
   combinations of characters in certain scripts (e.g., Arabic, Persian,
   and Indic scripts), but if used in other contexts can have
   consequences that violate the principle of least user astonishment.
   Therefore these code points are allowed only in contexts where they
   are appropriate, specifically where the relevant rule (CONTEXTJ or
   CONTEXTO) has been defined. See [RFC5892] and [RFC5894] for further
   discussion.

9.9. OldHangulJamo (I)

   This category is defined in Section 2.9 of [RFC5892] and is included
   by reference for use in PRECIS.

   Note: Exclusion of these code points results in disallowing certain
   archaic Korean syllables and in restricting supported Korean
   syllables to preformed, modern Hangul characters.

9.10. Unassigned (J)

   This category is defined in Section 2.10 of [RFC5892] and is included
   by reference for use in PRECIS.

9.11. ASCII7 (K)

   This PRECIS-specific category consists of all printable, non-space
   code points from the 7-bit ASCII range.
   By applying this category,
   the algorithm specified under Section 8 exempts these code points
   from other rules that might be applied during PRECIS processing, on
   the assumption that these code points are in such wide use that
   disallowing them would be counter-productive.

   K: cp is in {0021..007E}

9.12. Controls (L)

   This PRECIS-specific category consists of all control code points.

   L: Control(cp) = True

9.13. PrecisIgnorableProperties (M)

   This PRECIS-specific category is used to group code points that are
   discouraged from use in PRECIS string classes.

   M: Default_Ignorable_Code_Point(cp) = True or
      Noncharacter_Code_Point(cp) = True

   The definition for Default_Ignorable_Code_Point can be found in the
   DerivedCoreProperties.txt file [DerivedCoreProperties].

   Note: In general, these code points are constructs such as so-called
   soft hyphens, certain joining code points, various specialized code
   points for use within Unicode itself (e.g., language tags and
   variation selectors), and so on. Disallowing these code points in
   PRECIS reduces the potential for unexpected results in the use of
   internationalized strings.

9.14. Spaces (N)

   This PRECIS-specific category is used to group code points that are
   space code points.

   N: General_Category(cp) is in {Zs}

9.15. Symbols (O)

   This PRECIS-specific category is used to group code points that are
   symbols.

   O: General_Category(cp) is in {Sm, Sc, Sk, So}

9.16. Punctuation (P)

   This PRECIS-specific category is used to group code points that are
   punctuation code points.

   P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}

9.17. HasCompat (Q)

   This PRECIS-specific category is used to group any code point that is
   decomposed and recomposed into something other than itself under
   Unicode normalization form KC.

   Q: toNFKC(cp) != cp

   Typically this category is true of code points that are
   "compatibility decomposable characters" as defined in the Unicode
   Standard.

   The toNFKC() operation returns the code point in normalization form
   KC. For more information, see Section 5 of Unicode Standard Annex
   #15 [UAX15].

9.18. OtherLetterDigits (R)

   This PRECIS-specific category is used to group code points that are
   letters and digits other than the "traditional" letters and digits
   grouped under the LetterDigits (A) class (see Section 9.1).

   R: General_Category(cp) is in {Lt, Nl, No, Me}

10. Guidelines for Designated Experts

   Experience with internationalization in application protocols has
   shown that protocol designers and application developers usually do
   not understand the subtleties and tradeoffs involved with
   internationalization and that they need considerable guidance in
   making reasonable decisions with regard to the options before them.

   Therefore:

   o  Protocol designers are strongly encouraged to question the
      assumption that they need to define new profiles, because existing
      profiles are designed for wide reuse (see Section 5 for further
      discussion).

   o  Those who persist in defining new profiles are strongly encouraged
      to clearly explain a strong justification for doing so, and to
      publish a stable specification that provides all of the
      information described under Section 11.3.

   o  The designated experts for profile registration requests ought to
      seek answers to all of the questions provided under Section 11.3
      and to encourage applicants to provide a stable specification
      documenting the profile (even though the registration policy for
      PRECIS profiles is Expert Review and a stable specification is not
      strictly required).

   o  Developers of applications that use PRECIS are strongly encouraged
      to apply the guidelines provided under Section 6 and to seek out
      the advice of the designated experts or other knowledgeable
      individuals in doing so.

   o  All parties are strongly encouraged to help prevent the
      multiplication of profiles beyond necessity, as described under
      Section 5.1, and to use PRECIS in ways that will minimize user
      confusion and insecure application behavior.

   Internationalization can be difficult and contentious; designated
   experts, profile registrants, and application developers are strongly
   encouraged to work together in a spirit of good faith and mutual
   understanding to achieve rough consensus on profile registration
   requests and the use of PRECIS in particular applications. They are
   also encouraged to bring additional expertise into the discussion if
   that would be helpful in adding perspective or otherwise resolving
   issues.

11. IANA Considerations

11.1. PRECIS Derived Property Value Registry

   IANA has created and now maintains the "PRECIS Derived Property
   Value" registry that records the derived properties for the versions
   of Unicode that are released after (and including) version 7.0. The
   derived property value is to be calculated in cooperation with a
   designated expert [RFC5226] according to the rules specified under
   Sections 8 and 9.

   The IESG is to be notified if backward-incompatible changes to the
   table of derived properties are discovered or if other problems arise
   during the process of creating the table of derived property values
   or during expert review. Changes to the rules defined under
   Sections 8 and 9 require IETF Review.

11.2. PRECIS Base Classes Registry

   IANA has created the "PRECIS Base Classes" registry. In accordance
   with [RFC5226], the registration policy is "RFC Required".
1306 The registration template is as follows: 1308 Base Class: [the name of the PRECIS string class] 1310 Description: [a brief description of the PRECIS string class and its 1311 intended use, e.g., "A sequence of letters, numbers, and symbols 1312 that is used to identify or address a network entity."] 1314 Specification: [the RFC number] 1316 The initial registrations are as follows: 1318 Base Class: FreeformClass. 1319 Description: A sequence of letters, numbers, symbols, spaces, and 1320 other code points that is used for free-form strings. 1321 Specification: Section 4.3 of RFC 7564. 1323 Base Class: IdentifierClass. 1324 Description: A sequence of letters, numbers, and symbols that is 1325 used to identify or address a network entity. 1326 Specification: Section 4.2 of RFC 7564. 1328 11.3. PRECIS Profiles Registry 1330 IANA has created the "PRECIS Profiles" registry to identify profiles 1331 that use the PRECIS string classes. In accordance with [RFC5226], 1332 the registration policy is "Expert Review". This policy was chosen 1333 in order to ease the burden of registration while ensuring that 1334 "customers" of PRECIS receive appropriate guidance regarding the 1335 sometimes complex and subtle internationalization issues related to 1336 profiles of PRECIS string classes. 
1338 The registration template is as follows: 1340 Name: [the name of the profile] 1342 Base Class: [which PRECIS string class is being profiled] 1344 Applicability: [the specific protocol elements to which this profile 1345 applies, e.g., "Localparts in XMPP addresses."] 1347 Replaces: [the Stringprep profile that this PRECIS profile replaces, 1348 if any] 1350 Width Mapping Rule: [the behavioral rule for handling of width, 1351 e.g., "Map fullwidth and halfwidth code points to their 1352 compatibility variants."] 1354 Additional Mapping Rule: [any additional mappings that are required 1355 or recommended, e.g., "Map non-ASCII space code points to ASCII 1356 space."] 1358 Case Mapping Rule: [the behavioral rule for handling of case, e.g., 1359 "apply the Unicode toLowerCase() operation"] 1361 Normalization Rule: [which Unicode normalization form is applied, 1362 e.g., "NFC"] 1364 Directionality Rule: [the behavioral rule for handling of right-to- 1365 left code points, e.g., "The 'Bidi Rule' defined in RFC 5893 1366 applies."] 1368 Enforcement: [which entities enforce the rules, and when that 1369 enforcement occurs during protocol operations] 1371 Specification: [a pointer to relevant documentation, such as an RFC 1372 or Internet-Draft] 1374 In order to request a review, the registrant shall send a completed 1375 template to the precis@ietf.org list or its designated successor. 1377 Factors to focus on while defining profiles and reviewing profile 1378 registrations include the following: 1380 o Would an existing PRECIS string class or profile solve the 1381 problem? If not, why not? (See Section 5.1 for related 1382 considerations.) 1384 o Is the problem being addressed by this profile well defined? 1386 o Does the specification define what kinds of applications are 1387 involved and the protocol elements to which this profile applies? 1389 o Is the profile clearly defined? 
1391 o Is the profile based on an appropriate dividing line between user 1392 interface (culture, context, intent, locale, device limitations, 1393 etc.) and the use of conformant strings in protocol elements? 1395 o Are the width mapping, case mapping, additional mappings, 1396 normalization, and directionality rules appropriate for the 1397 intended use? 1399 o Does the profile explain which entities enforce the rules, and 1400 when such enforcement occurs during protocol operations? 1402 o Does the profile reduce the degree to which human users could be 1403 surprised or confused by application behavior (the "Principle of 1404 Least Astonishment")? 1406 o Does the profile introduce any new security concerns such as those 1407 described under Section 12 of this document (e.g., false positives 1408 for authentication or authorization)? 1410 12. Security Considerations 1412 12.1. General Issues 1414 If input strings that appear "the same" to users are programmatically 1415 considered to be distinct in different systems, or if input strings 1416 that appear distinct to users are programmatically considered to be 1417 "the same" in different systems, then users can be confused. Such 1418 confusion can have security implications, such as the false positives 1419 and false negatives discussed in [RFC6943]. One starting goal of 1420 work on the PRECIS framework was to limit the number of times that 1421 users are confused (consistent with the "Principle of Least 1422 Astonishment"). Unfortunately, this goal has been difficult to 1423 achieve given the large number of application protocols already in 1424 existence. Despite these difficulties, profiles should not be 1425 multiplied beyond necessity (see Section 5.1). 
In particular, 1426 application protocol designers should think long and hard before 1427 defining a new profile instead of using one that has already been 1428 defined, and if they decide to define a new profile then they should 1429 clearly explain their reasons for doing so. 1431 The security of applications that use this framework can depend in 1432 part on the proper preparation, enforcement, and comparison of 1433 internationalized strings. For example, such strings can be used to 1434 make authentication and authorization decisions, and the security of 1435 an application could be compromised if an entity providing a given 1436 string is connected to the wrong account or online resource based on 1437 different interpretations of the string (again, see [RFC6943]). 1439 Specifications of application protocols that use this framework are 1440 strongly encouraged to describe how internationalized strings are 1441 used in the protocol, including the security implications of any 1442 false positives and false negatives that might result from various 1443 enforcement and comparison operations. For some helpful guidelines, 1444 refer to [RFC6943], [RFC5890], [UTR36], and [UTS39]. 1446 12.2. Use of the IdentifierClass 1448 Strings that conform to the IdentifierClass and any profile thereof 1449 are intended to be relatively safe for use in a broad range of 1450 applications, primarily because they include only letters, digits, 1451 and "grandfathered" non-space code points from the ASCII range; thus, 1452 they exclude spaces, code points with compatibility equivalents, and 1453 almost all symbols and punctuation marks. However, because such 1454 strings can still include so-called confusable code points (see 1455 Section 12.5), protocol designers and implementers are encouraged to 1456 pay close attention to the security considerations described 1457 elsewhere in this document. 1459 12.3. 
Use of the FreeformClass 1461 Strings that conform to the FreeformClass and many profiles thereof 1462 can include virtually any Unicode code point. This makes the 1463 FreeformClass quite expressive, but also problematic from the 1464 perspective of possible user confusion. Protocol designers are 1465 hereby warned that the FreeformClass contains code points they might 1466 not understand, and are encouraged to profile the IdentifierClass 1467 wherever feasible; however, if an application protocol requires more 1468 code points than are allowed by the IdentifierClass, protocol 1469 designers are encouraged to define a profile of the FreeformClass 1470 that restricts the allowable code points as tightly as possible. 1471 (The PRECIS Working Group considered the option of allowing 1472 "superclasses" as well as profiles of PRECIS string classes, but 1473 decided against allowing superclasses to reduce the likelihood of 1474 security and interoperability problems.) 1476 12.4. Local Character Set Issues 1478 When systems use local character sets other than ASCII and Unicode, 1479 this specification leaves the problem of converting between the local 1480 character set and Unicode up to the application or local system. If 1481 different applications (or different versions of one application) 1482 implement different rules for conversions among coded character sets, 1483 they could interpret the same name differently and contact different 1484 application servers or other network entities. This problem is not 1485 solved by security protocols, such as Transport Layer Security (TLS) 1486 [RFC5246] and the Simple Authentication and Security Layer (SASL) 1487 [RFC4422], that do not take local character sets into account. 1489 12.5. Visually Similar Characters 1491 Some code points are visually similar and thus can cause confusion 1492 among humans. Such characters are often called "confusable 1493 characters" or "confusables". 
1495 The problem of confusable characters is not necessarily caused by the 1496 use of Unicode code points outside the ASCII range. For example, in 1497 some presentations and to some individuals the string "ju1iet" 1498 (spelled with DIGIT ONE, U+0031, as the third character) might appear 1499 to be the same as "juliet" (spelled with LATIN SMALL LETTER L, 1500 U+006C), especially on casual visual inspection. This phenomenon is 1501 sometimes called "typejacking". 1503 However, the problem is made more serious by introducing the full 1504 range of Unicode code points into protocol strings. For example, the 1505 code points U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the 1506 Cherokee block look similar to the ASCII code points representing 1507 "STPETER" as they might appear when presented using a "creative" font 1508 family. 1510 In some examples of confusable characters, it is unlikely that the 1511 average human could tell the difference between the real string and 1512 the fake string. (Indeed, there is no programmatic way to 1513 distinguish with full certainty which is the fake string and which is 1514 the real string; in some contexts, the string formed of Cherokee code 1515 points might be the real string and the string formed of ASCII code 1516 points might be the fake string.) Because PRECIS-compliant strings 1517 can contain almost any properly encoded Unicode code point, it can be 1518 relatively easy to fake or mimic some strings in systems that use the 1519 PRECIS framework. The fact that some strings are easily confused 1520 introduces security vulnerabilities of the kind that have also 1521 plagued the World Wide Web, specifically the phenomenon known as 1522 phishing. 
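The two examples above can be made concrete with a short, non-normative sketch. It assumes Python 3 and its standard unicodedata module; the helper name code_points() is hypothetical and is not defined by PRECIS. The sketch only shows that strings a human might read as identical are distinct code point sequences; it does not attempt to detect confusables.

```python
import unicodedata


def code_points(s):
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return ["U+%04X" % ord(ch) for ch in s]


fake = "ju1iet"   # DIGIT ONE (U+0031) as the third character
real = "juliet"   # LATIN SMALL LETTER L (U+006C) as the third character

# Zero-based positions at which the two strings differ.
differing = [i for i, (a, b) in enumerate(zip(fake, real)) if a != b]

# The Cherokee sequence quoted in the text, and its ASCII look-alike.
cherokee = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"
lookalike = "STPETER"

print(code_points(fake))
print(code_points(real))
print(differing)
print(code_points(cherokee))
# Neither direct comparison nor NFKC normalization treats the
# Cherokee string and the ASCII string as equal.
print(cherokee == lookalike)
print(unicodedata.normalize("NFKC", cherokee) == lookalike)
```

In this sketch the strings compare as unequal both before and after normalization, so any defense against such look-alikes has to come from policy rather than from string comparison alone.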
1524 Despite the fact that some specific suggestions about identification 1525 and handling of confusable characters appear in the Unicode Security 1526 Considerations [UTR36] and the Unicode Security Mechanisms [UTS39], 1527 it is also true (as noted in [RFC5890]) that "there are no 1528 comprehensive technical solutions to the problems of confusable 1529 characters." Because it is impossible to map visually similar 1530 characters without a great deal of context (such as knowing the font 1531 families used), the PRECIS framework does nothing to map similar- 1532 looking characters together, nor does it prohibit some characters 1533 because they look like others. 1535 Nevertheless, specifications for application protocols that use this 1536 framework are strongly encouraged to describe how confusable 1537 characters can be abused to compromise the security of systems that 1538 use the protocol in question, along with any protocol-specific 1539 suggestions for overcoming those threats. In particular, software 1540 implementations and service deployments that use PRECIS-based 1541 technologies are strongly encouraged to define and implement 1542 consistent policies regarding the registration, storage, and 1543 presentation of visually similar characters. The following 1544 recommendations are appropriate: 1546 1. An application service SHOULD define a policy that specifies the 1547 scripts or blocks of code points that the service will allow to 1548 be registered (e.g., in an account name) or stored (e.g., in a 1549 filename). 
Such a policy SHOULD be informed by the languages and 1550 scripts that are used to write registered account names; in 1551 particular, to reduce confusion, the service SHOULD forbid 1552 registration or storage of strings that contain code points from 1553 more than one script and SHOULD restrict registrations to code 1554 points drawn from a very small number of scripts (e.g., scripts 1555 that are well understood by the administrators of the service, to 1556 improve manageability). 1558 2. User-oriented application software SHOULD define a policy that 1559 specifies how internationalized strings will be presented to a 1560 human user. Because every human user of such software has a 1561 preferred language or a small set of preferred languages, the 1562 software SHOULD gather that information either explicitly from 1563 the user or implicitly via the operating system of the user's 1564 device. Furthermore, because most languages are typically 1565 represented by a single script or a small set of scripts, and 1566 because most scripts are typically contained in one or more 1567 blocks of code points, the software SHOULD warn the user when 1568 presenting a string that mixes code points from more than one 1569 script or block, or that uses code points outside the normal 1570 range of the user's preferred language(s). (Such a 1571 recommendation is not intended to discourage communication across 1572 different communities of language users; instead, it recognizes 1573 the existence of such communities and encourages due caution when 1574 presenting unfamiliar scripts or code points to human users.) 1576 The challenges inherent in supporting the full range of Unicode code 1577 points have in the past led some to hope for a way to 1578 programmatically negotiate more restrictive ranges based on locale, 1579 script, or other relevant factors; to tag the locale associated with 1580 a particular string; etc. 
As a general-purpose internationalization 1581 technology, the PRECIS framework does not include such mechanisms. 1583 12.6. Security of Passwords 1585 Two goals of passwords are to maximize the amount of entropy and to 1586 minimize the potential for false positives. These goals can be 1587 achieved in part by allowing a wide range of code points and by 1588 ensuring that passwords are handled in such a way that code points 1589 are not compared aggressively. Therefore, it is NOT RECOMMENDED for 1590 application protocols to profile the FreeformClass for use in 1591 passwords in a way that removes entire categories (e.g., by 1592 disallowing symbols or punctuation). Furthermore, it is NOT 1593 RECOMMENDED for application protocols to map uppercase and titlecase 1594 code points to their lowercase equivalents in such strings; instead, 1595 it is RECOMMENDED to preserve the case of all code points contained 1596 in such strings and to compare them in a case-sensitive manner. 1598 That said, software implementers need to be aware that there exist 1599 tradeoffs between entropy and usability. For example, allowing a 1600 user to establish a password containing "uncommon" code points might 1601 make it difficult for the user to access a service when using an 1602 unfamiliar or constrained input device. 1604 Some application protocols use passwords directly, whereas others 1605 reuse technologies that themselves process passwords (one example of 1606 such a technology is the Simple Authentication and Security Layer 1607 [RFC4422]). Moreover, passwords are often carried by a sequence of 1608 protocols with backend authentication systems or data storage systems 1609 such as RADIUS [RFC2865] and the Lightweight Directory Access 1610 Protocol (LDAP) [RFC4510]. 
Developers of application protocols are 1611 encouraged to look into reusing these profiles instead of defining 1612 new ones, so that end-user expectations about passwords are 1613 consistent no matter which application protocol is used. 1615 In protocols that provide passwords as input to a cryptographic 1616 algorithm such as a hash function, the client will need to perform 1617 proper preparation of the password before applying the algorithm, 1618 because the password is not available to the server in plaintext 1619 form. 1621 Further discussion of password handling can be found in [RFC7613]. 1623 13. Interoperability Considerations 1625 13.1. Coded Character Sets 1627 It is known that some existing applications and systems do not 1628 support the full Unicode coded character set, or even any characters 1629 outside the ASCII repertoire [RFC20]. If two (or more) applications 1630 or systems need to interoperate when exchanging data (e.g., for the 1631 purpose of authenticating the combination of a username and 1632 password), naturally they will need to have in common at least one 1633 coded character set and the repertoire of characters being exchanged 1634 (see [RFC6365] for definitions of these terms). Establishing such a 1635 baseline is a matter for the application or system that uses PRECIS, 1636 not for the PRECIS framework. 1638 13.2. Dependency on Unicode 1640 The only coded character set supported by PRECIS is Unicode. If an 1641 application or system does not support Unicode or uses a different 1642 coded character set [RFC6365], then the PRECIS rules cannot be 1643 applied to that application or system. 1645 13.3. Encoding 1647 Although strings that are consumed in PRECIS-based application 1648 protocols are often encoded using UTF-8 [RFC3629], the exact encoding 1649 is a matter for the application protocol that uses PRECIS, not for 1650 the PRECIS framework or for specifications that define PRECIS string 1651 classes or profiles thereof. 
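As a non-normative illustration of how encoding interacts with the client-side password preparation described in Section 12.6, the following sketch normalizes a password, encodes it as UTF-8, and only then applies a hash function. It assumes Python 3 with the standard hashlib and unicodedata modules. The helper names prepare_password() and digest() are hypothetical, the use of NFC and UTF-8 is an assumption made for the example, and the code does not implement any registered PRECIS profile (it performs no code point validation). SHA-256 stands in for whatever algorithm the application protocol actually specifies.

```python
import hashlib
import unicodedata


def prepare_password(password):
    """Hypothetical helper: NFC-normalize, then encode as UTF-8.

    This is an assumption for illustration only; a real profile
    also checks that every code point is allowed.
    """
    if not password:
        raise ValueError("empty password")
    return unicodedata.normalize("NFC", password).encode("utf-8")


def digest(password):
    """Hash the prepared password (SHA-256 chosen only as an example)."""
    return hashlib.sha256(prepare_password(password)).hexdigest()


# The same visible password typed on two devices: one produces a
# precomposed character, the other a base letter plus combining accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Without preparation, the raw UTF-8 bytes (and thus the hashes) differ.
raw_equal = (hashlib.sha256(composed.encode("utf-8")).hexdigest()
             == hashlib.sha256(decomposed.encode("utf-8")).hexdigest())

print(raw_equal)
print(digest(composed) == digest(decomposed))
# Case is preserved, so passwords differing only in case stay distinct.
print(digest("Secret") == digest("secret"))
```

Because the server in such protocols sees only the output of the algorithm, it cannot repair a mismatch afterward; this is why the preparation step is performed by the client before hashing.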
1653 13.4. Unicode Versions 1655 It is extremely important for protocol designers and application 1656 developers to understand that various changes can occur across 1657 versions of the Unicode Standard, and such changes can result in 1658 instability of PRECIS categories. The following are merely a few 1659 examples: 1661 o As described in [RFC6452], between Unicode 5.2 (current at the 1662 time IDNA2008 was originally published) and Unicode 6.0, three 1663 code points underwent changes in their GeneralCategory, resulting 1664 in modified handling depending on which version of Unicode is 1665 available on the underlying system. 1667 o The HasCompat() categorization of a given input string could 1668 change if, for example, the string includes a precomposed 1669 character that was added in a recent version of Unicode. 1671 o The East Asian width property, which is used in many PRECIS width- 1672 mapping rules, is not guaranteed to be stable across Unicode 1673 versions. 1675 Other such differences might arise between the version of Unicode 1676 current at the time of this writing (7.0) and future versions. 1678 13.5. Potential Changes to Handling of Certain Unicode Code Points 1680 As part of the review of Unicode 7.0 for IDNA, a question was raised 1681 about a newly added code point that led to a re-analysis of the 1682 normalization rules used by IDNA and inherited by this document 1683 (Section 5.2.4). Some of the general issues are described in 1684 [IAB-Statement] and pursued in more detail in [IDNA-Unicode]. 1686 At the time of writing, these issues have yet to be settled. 1687 However, implementers need to be aware that this specification is 1688 likely to be updated in the future to address these issues. The 1689 potential changes include the following: 1691 o The range of code points in the LetterDigits category 1692 (Sections 4.2.1 and 9.1) might be narrowed. 1694 o Some code points with special properties that are now allowed 1695 might be excluded.
1697 o More "Additional Mapping Rules" (Section 5.2.2) might be defined. 1699 o Alternative normalization methods might be added. 1701 Until these issues are sorted out, it is reasonable for the IANA to 1702 apply the same precautionary principle described in [IAB-Statement] 1703 to the PRECIS Derived Property Value Registry as is applied to the 1704 Internationalized Domain Names for Applications (IDNA) Parameters 1705 registry: that is, to not make further updates to the registry. 1707 Nevertheless, implementations and deployments are unlikely to 1708 encounter significant problems as a consequence of these issues or 1709 potential changes if they follow the advice given in this 1710 specification to use the more restrictive IdentifierClass whenever 1711 possible or, if using the FreeformClass, to allow only a restricted 1712 set of code points, particularly avoiding code points whose 1713 implications they do not understand. 1715 14. References 1717 14.1. Normative References 1719 [RFC20] Cerf, V., "ASCII format for network interchange", STD 80, 1720 RFC 20, DOI 10.17487/RFC0020, October 1969, 1721 . 1723 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1724 Requirement Levels", BCP 14, RFC 2119, 1725 DOI 10.17487/RFC2119, March 1997, 1726 . 1728 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network 1729 Interchange", RFC 5198, DOI 10.17487/RFC5198, March 2008, 1730 . 1732 [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in 1733 Internationalization in the IETF", BCP 166, RFC 6365, 1734 DOI 10.17487/RFC6365, September 2011, 1735 . 1737 [Unicode] The Unicode Consortium, "The Unicode Standard", 1738 . 1740 14.2. Informative References 1742 [DerivedCoreProperties] 1743 The Unicode Consortium, "DerivedCoreProperties-7.0.0.txt", 1744 Unicode Character Database, February 2014, 1745 . 1748 [Err4568] RFC Errata, "Erratum ID 4568", RFC 7564, 1749 . 
1751 [IAB-Statement] 1752 Internet Architecture Board, "IAB Statement on Identifiers 1753 and Unicode 7.0.0", February 2015, 1754 . 1758 [IDNA-Unicode] 1759 Klensin, J. and P. Faltstrom, "IDNA Update for Unicode 1760 7.0.0", Work in Progress, draft-klensin-idna-5892upd- 1761 unicode70-04, March 2015. 1763 [PropertyAliases] 1764 The Unicode Consortium, "PropertyAliases-7.0.0.txt", 1765 Unicode Character Database, November 2013, 1766 . 1769 [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson, 1770 "Remote Authentication Dial In User Service (RADIUS)", 1771 RFC 2865, DOI 10.17487/RFC2865, June 2000, 1772 . 1774 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of 1775 Internationalized Strings ("stringprep")", RFC 3454, 1776 DOI 10.17487/RFC3454, December 2002, 1777 . 1779 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 1780 "Internationalizing Domain Names in Applications (IDNA)", 1781 RFC 3490, DOI 10.17487/RFC3490, March 2003, 1782 . 1784 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 1785 Profile for Internationalized Domain Names (IDN)", 1786 RFC 3491, DOI 10.17487/RFC3491, March 2003, 1787 . 1789 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 1790 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 1791 2003, . 1793 [RFC4422] Melnikov, A., Ed. and K. Zeilenga, Ed., "Simple 1794 Authentication and Security Layer (SASL)", RFC 4422, 1795 DOI 10.17487/RFC4422, June 2006, 1796 . 1798 [RFC4510] Zeilenga, K., Ed., "Lightweight Directory Access Protocol 1799 (LDAP): Technical Specification Road Map", RFC 4510, 1800 DOI 10.17487/RFC4510, June 2006, 1801 . 1803 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and 1804 Recommendations for Internationalized Domain Names 1805 (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006, 1806 . 1808 [RFC5226] Narten, T. and H. 
Alvestrand, "Guidelines for Writing an 1809 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1810 DOI 10.17487/RFC5226, May 2008, 1811 . 1813 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1814 Specifications: ABNF", STD 68, RFC 5234, 1815 DOI 10.17487/RFC5234, January 2008, 1816 . 1818 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 1819 (TLS) Protocol Version 1.2", RFC 5246, 1820 DOI 10.17487/RFC5246, August 2008, 1821 . 1823 [RFC5890] Klensin, J., "Internationalized Domain Names for 1824 Applications (IDNA): Definitions and Document Framework", 1825 RFC 5890, DOI 10.17487/RFC5890, August 2010, 1826 . 1828 [RFC5891] Klensin, J., "Internationalized Domain Names in 1829 Applications (IDNA): Protocol", RFC 5891, 1830 DOI 10.17487/RFC5891, August 2010, 1831 . 1833 [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and 1834 Internationalized Domain Names for Applications (IDNA)", 1835 RFC 5892, DOI 10.17487/RFC5892, August 2010, 1836 . 1838 [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts 1839 for Internationalized Domain Names for Applications 1840 (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, 1841 . 1843 [RFC5894] Klensin, J., "Internationalized Domain Names for 1844 Applications (IDNA): Background, Explanation, and 1845 Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, 1846 . 1848 [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for 1849 Internationalized Domain Names in Applications (IDNA) 1850 2008", RFC 5895, DOI 10.17487/RFC5895, September 2010, 1851 . 1853 [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code 1854 Points and Internationalized Domain Names for Applications 1855 (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, 1856 November 2011, . 1858 [RFC6885] Blanchet, M. and A. 
Sullivan, "Stringprep Revision and 1859 Problem Statement for the Preparation and Comparison of 1860 Internationalized Strings (PRECIS)", RFC 6885, 1861 DOI 10.17487/RFC6885, March 2013, 1862 . 1864 [RFC6943] Thaler, D., Ed., "Issues in Identifier Comparison for 1865 Security Purposes", RFC 6943, DOI 10.17487/RFC6943, May 1866 2013, . 1868 [RFC7564] Saint-Andre, P. and M. Blanchet, "PRECIS Framework: 1869 Preparation, Enforcement, and Comparison of 1870 Internationalized Strings in Application Protocols", 1871 RFC 7564, DOI 10.17487/RFC7564, May 2015, 1872 . 1874 [RFC7613] Saint-Andre, P. and A. Melnikov, "Preparation, 1875 Enforcement, and Comparison of Internationalized Strings 1876 Representing Usernames and Passwords", RFC 7613, 1877 DOI 10.17487/RFC7613, August 2015, 1878 . 1880 [RFC7622] Saint-Andre, P., "Extensible Messaging and Presence 1881 Protocol (XMPP): Address Format", RFC 7622, 1882 DOI 10.17487/RFC7622, September 2015, 1883 . 1885 [RFC7700] Saint-Andre, P., "Preparation, Enforcement, and Comparison 1886 of Internationalized Strings Representing Nicknames", 1887 RFC 7700, DOI 10.17487/RFC7700, December 2015, 1888 . 1890 [RFC7790] Yoneya, Y. and T. Nemoto, "Mapping Characters for Classes 1891 of the Preparation, Enforcement, and Comparison of 1892 Internationalized Strings (PRECIS)", RFC 7790, 1893 DOI 10.17487/RFC7790, February 2016, 1894 . 1896 [UAX11] Unicode Standard Annex #11, "East Asian Width", edited by 1897 Ken Lunde. An integral part of The Unicode Standard, 1898 . 1900 [UAX15] Unicode Standard Annex #15, "Unicode Normalization Forms", 1901 edited by Mark Davis and Ken Whistler. An integral part of 1902 The Unicode Standard, . 1904 [UAX9] Unicode Standard Annex #9, "Unicode Bidirectional 1905 Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew 1906 Glass. An integral part of The Unicode Standard, 1907 . 1909 [UTR36] Unicode Technical Report #36, "Unicode Security 1910 Considerations", by Mark Davis and Michel Suignard, 1911 . 
1913 [UTS39] Unicode Technical Standard #39, "Unicode Security 1914 Mechanisms", edited by Mark Davis and Michel Suignard, 1915 . 1917 Appendix A. Changes from RFC 7564 1919 The following changes were made from [RFC7564]. 1921 o Recommended the Unicode toLowerCase() operation over the Unicode 1922 toCaseFold() operation in most PRECIS applications. 1924 o Clarified the meaning of "preparation" and described the 1925 motivation for including it in PRECIS. 1927 o Updated references. 1929 See [RFC7613] for a description of the differences from [RFC3454]. 1931 Appendix B. Acknowledgements 1933 Thanks to Martin Duerst, William Fisher, John Klensin, Christian 1934 Schudt, and Sam Whited for their feedback. Thanks to Sam Whited also 1935 for submitting [Err4568]. 1937 See [RFC7564] for acknowledgements related to the specification that 1938 this document supersedes. 1940 Some algorithms and textual descriptions have been borrowed from 1941 [RFC5892]. Some text regarding security has been borrowed from 1942 [RFC5890], [RFC7613], and [RFC7622]. 1944 Authors' Addresses 1946 Peter Saint-Andre 1947 Filament 1948 P.O. Box 787 1949 Parker, CO 80134 1950 USA 1952 Phone: +1 720 256 6756 1953 EMail: peter@filament.com 1954 URI: https://filament.com/ 1956 Marc Blanchet 1957 Viagenie 1958 246 Aberdeen 1959 Quebec, QC G1R 2E1 1960 Canada 1962 EMail: Marc.Blanchet@viagenie.ca 1963 URI: http://www.viagenie.ca/