idnits 2.17.1 draft-ietf-precis-framework-23.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 19, 2015) is 3353 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1752 -- Looks like a reference, but probably isn't: '2' on line 1754 == Outdated reference: A later version (-12) exists of draft-ietf-precis-mappings-08 == Outdated reference: A later version (-19) exists of draft-ietf-precis-nickname-14 == Outdated reference: A later version (-18) exists of draft-ietf-precis-saslprepbis-13 == Outdated reference: A later version (-24) exists of draft-ietf-xmpp-6122bis-18 == Outdated reference: A later version (-05) exists of draft-klensin-idna-5892upd-unicode70-03 -- Obsolete informational reference (is this intentional?): RFC 3454 (Obsoleted by RFC 7564) -- Obsolete informational reference (is this intentional?): RFC 3490 (Obsoleted by RFC 5890, RFC 5891) -- Obsolete informational reference (is this intentional?): RFC 3491 (Obsoleted by RFC 5891) -- Obsolete informational reference (is this intentional?): RFC 5226 (Obsoleted by RFC 8126) -- Obsolete informational reference (is this intentional?): RFC 5246 (Obsoleted by RFC 8446) Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 PRECIS P. Saint-Andre 3 Internet-Draft &yet 4 Obsoletes: 3454 (if approved) M. 
Blanchet 5 Intended status: Standards Track Viagenie 6 Expires: August 23, 2015 February 19, 2015 8 PRECIS Framework: Preparation, Enforcement, and Comparison of 9 Internationalized Strings in Application Protocols 10 draft-ietf-precis-framework-23 12 Abstract 14 Application protocols using Unicode characters in protocol strings 15 need to properly handle such strings in order to enforce 16 internationalization rules for strings placed in various protocol 17 slots (such as addresses and identifiers) and to perform valid 18 comparison operations (e.g., for purposes of authentication or 19 authorization). This document defines a framework enabling 20 application protocols to perform the preparation, enforcement, and 21 comparison of internationalized strings ("PRECIS") in a way that 22 depends on the properties of Unicode characters and thus is agile 23 with respect to versions of Unicode. As a result, this framework 24 provides a more sustainable approach to the handling of 25 internationalized strings than the previous framework, known as 26 Stringprep (RFC 3454). This document obsoletes RFC 3454. 28 Status of This Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at http://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on August 23, 2015. 45 Copyright Notice 47 Copyright (c) 2015 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 
50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 63 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 64 3. Preparation, Enforcement, and Comparison . . . . . . . . . . 7 65 4. String Classes . . . . . . . . . . . . . . . . . . . . . . . 7 66 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 7 67 4.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 9 68 4.2.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 9 69 4.2.2. Contextual Rule Required . . . . . . . . . . . . . . 9 70 4.2.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 10 71 4.2.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 10 72 4.2.5. Examples . . . . . . . . . . . . . . . . . . . . . . 10 73 4.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 11 74 4.3.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 11 75 4.3.2. Contextual Rule Required . . . . . . . . . . . . . . 11 76 4.3.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 12 77 4.3.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 12 78 4.3.5. Examples . . . . . . . . . . . . . . . . . . . . . . 12 79 5. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 12 80 5.1. Profiles Must Not Be Multiplied Beyond Necessity . . . . 13 81 5.2. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 13 82 5.2.1. Width Mapping Rule . . . . . . . . . . . . . . . . . 
13 83 5.2.2. Additional Mapping Rule . . . . . . . . . . . . . . . 14 84 5.2.3. Case Mapping Rule . . . . . . . . . . . . . . . . . . 14 85 5.2.4. Normalization Rule . . . . . . . . . . . . . . . . . 15 86 5.2.5. Directionality Rule . . . . . . . . . . . . . . . . . 15 87 5.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 16 88 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . 17 89 6.1. How to Use PRECIS in Applications . . . . . . . . . . . . 17 90 6.2. Further Excluded Characters . . . . . . . . . . . . . . . 17 91 6.3. Building Application-Layer Constructs . . . . . . . . . . 18 92 7. Order of Operations . . . . . . . . . . . . . . . . . . . . . 19 93 8. Code Point Properties . . . . . . . . . . . . . . . . . . . . 19 94 9. Category Definitions Used to Calculate Derived Property . . . 22 95 9.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . 22 96 9.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . 22 97 9.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 23 98 9.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 23 99 9.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 23 100 9.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . 23 101 9.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . 23 102 9.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 23 103 9.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 23 104 9.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . 24 105 9.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . 24 106 9.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . 24 107 9.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 24 108 9.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . 24 109 9.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 24 110 9.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 25 111 9.17. HasCompat (Q) . . . . . . . . . . . . . . . . 
. . . . . . 25 112 9.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 25 113 10. Guidelines for Designated Experts . . . . . . . . . . . . . . 25 114 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 115 11.1. PRECIS Derived Property Value Registry . . . . . . . . . 26 116 11.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . 26 117 11.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . 27 118 12. Security Considerations . . . . . . . . . . . . . . . . . . . 29 119 12.1. General Issues . . . . . . . . . . . . . . . . . . . . . 29 120 12.2. Use of the IdentifierClass . . . . . . . . . . . . . . . 30 121 12.3. Use of the FreeformClass . . . . . . . . . . . . . . . . 30 122 12.4. Local Character Set Issues . . . . . . . . . . . . . . . 30 123 12.5. Visually Similar Characters . . . . . . . . . . . . . . 30 124 12.6. Security of Passwords . . . . . . . . . . . . . . . . . 32 125 13. Interoperability Considerations . . . . . . . . . . . . . . . 33 126 13.1. Encoding . . . . . . . . . . . . . . . . . . . . . . . . 33 127 13.2. Character Sets . . . . . . . . . . . . . . . . . . . . . 33 128 13.3. Unicode Versions . . . . . . . . . . . . . . . . . . . . 34 129 13.4. Potential Changes to Handling of Certain Unicode Code 130 Points . . . . . . . . . . . . . . . . . . . . . . . . . 34 131 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 132 14.1. Normative References . . . . . . . . . . . . . . . . . . 35 133 14.2. Informative References . . . . . . . . . . . . . . . . . 35 134 14.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 38 135 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 38 136 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 39 138 1. 
Introduction 140 Application protocols using Unicode characters [Unicode7.0] in 141 protocol strings need to properly handle such strings in order to 142 enforce internationalization rules for strings placed in various 143 protocol slots (such as addresses and identifiers) and to perform 144 valid comparison operations (e.g., for purposes of authentication or 145 authorization). This document defines a framework enabling 146 application protocols to perform the preparation, enforcement, and 147 comparison of internationalized strings ("PRECIS") in a way that 148 depends on the properties of Unicode characters and thus is agile 149 with respect to versions of Unicode. 151 As described in the PRECIS problem statement [RFC6885], many IETF 152 protocols have used the Stringprep framework [RFC3454] as the basis 153 for preparing, enforcing, and comparing protocol strings that contain 154 Unicode characters, especially characters outside the ASCII range 155 [RFC20]. The Stringprep framework was developed during work on the 156 original technology for internationalized domain names (IDNs), here 157 called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the 158 Stringprep profile for IDNs. At the time, Stringprep was designed as 159 a general framework so that other application protocols could define 160 their own Stringprep profiles. Indeed, a number of application 161 protocols defined such profiles. 163 After the publication of [RFC3454] in 2002, several significant 164 issues arose with the use of Stringprep in the IDN case, as 165 documented in the IAB's recommendations regarding IDNs [RFC4690] 166 (most significantly, Stringprep was tied to Unicode version 3.2). 167 Therefore, the newer IDNA specifications, here called "IDNA2008" 168 ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer 169 use Stringprep and Nameprep. 
This migration away from Stringprep for 170 IDNs prompted other "customers" of Stringprep to consider new 171 approaches to the preparation, enforcement, and comparison of 172 internationalized strings, as described in [RFC6885]. 174 This document defines a framework for a post-Stringprep approach to 175 the preparation, enforcement, and comparison of internationalized 176 strings in application protocols, based on several principles: 178 1. Define a small set of string classes that specify the Unicode 179 characters (i.e., specific "code points") appropriate for common 180 application protocol constructs. 182 2. Define each PRECIS string class in terms of Unicode code points 183 and their properties so that an algorithm can be used to 184 determine whether each code point or character category is (a) 185 valid, (b) allowed in certain contexts, (c) disallowed, or (d) 186 unassigned. 188 3. Use an "inclusion model" such that a string class consists only 189 of code points that are explicitly allowed, with the result that 190 any code point not explicitly allowed is forbidden. 192 4. Enable application protocols to define profiles of the PRECIS 193 string classes if necessary (addressing matters such as width 194 mapping, case mapping, Unicode normalization, and directionality) 195 but strongly discourage the multiplication of profiles beyond 196 necessity in order to avoid violations of the Principle of Least 197 User Astonishment. 199 It is expected that this framework will yield the following benefits: 201 o Application protocols will be agile with regard to Unicode 202 versions. 204 o Implementers will be able to share code point tables and software 205 code across application protocols, most likely by means of 206 software libraries. 208 o End users will be able to acquire more accurate expectations about 209 the characters that are acceptable in various contexts. 
Given 210 this more uniform set of string classes, it is also expected that 211 copy/paste operations between software implementing different 212 application protocols will be more predictable and coherent. 214 Whereas the string classes define the "baseline" code points for a 215 range of applications, profiling enables application protocols to 216 apply the string classes in ways that are appropriate for common 217 constructs such as usernames [I-D.ietf-precis-saslprepbis], opaque 218 strings such as passwords [I-D.ietf-precis-saslprepbis], and 219 nicknames [I-D.ietf-precis-nickname]. Profiles are responsible for 220 defining the handling of right-to-left characters as well as various 221 mapping operations of the kind also discussed for IDNs in [RFC5895], 222 such as case preservation or lowercasing, Unicode normalization, 223 mapping of certain characters to other characters or to nothing, and 224 mapping of full-width and half-width characters. 226 When an application applies a profile of a PRECIS string class, it 227 transforms an input string (which might or might not be conforming) 228 into an output string that definitively conforms to the profile. In 229 particular, this document focuses on the resulting ability to achieve 230 the following objectives: 232 a. Enforcing all the rules of a profile for a single output 233 string (e.g., to determine if a string can be included in a 234 protocol slot, communicated to another entity within a protocol, 235 stored in a retrieval system, etc.). 237 b. Comparing two output strings to determine if they are equivalent, 238 typically through octet-for-octet matching to test for "bit- 239 string identity" (e.g., to make an access decision for purposes 240 of authentication or authorization as further described in 241 [RFC6943]). 
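The two objectives above (enforcing the rules on a single string, then comparing two enforced strings octet for octet) can be illustrated with a minimal sketch. The rules used here are NOT a registered PRECIS profile: the function names `enforce_toy_profile` and `equivalent`, the choice of case folding plus NFC, the rejected categories, and the use of UTF-8 for the octet comparison are all assumptions made purely for illustration.

```python
import unicodedata


def enforce_toy_profile(s: str) -> str:
    """Return the enforced form of s, or raise ValueError.

    Hypothetical rules, for illustration only: case-fold, normalize to
    NFC, then reject empty strings and any control, space, or
    unassigned character.
    """
    out = unicodedata.normalize("NFC", s.casefold())
    if not out:
        raise ValueError("empty string")
    for ch in out:
        if unicodedata.category(ch) in {"Cc", "Zs", "Cn"}:
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return out


def equivalent(a: str, b: str) -> bool:
    """Compare two strings octet for octet after enforcement.

    UTF-8 is an assumption of this sketch; the point is only that the
    comparison is done on the enforced output, not on the raw input.
    """
    return (enforce_toy_profile(a).encode("utf-8")
            == enforce_toy_profile(b).encode("utf-8"))
```

With these toy rules, `equivalent("StPeter", "stpeter")` holds because both inputs enforce to the same octets, while an input containing a space is rejected during enforcement rather than silently repaired.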
243 The opportunity to define profiles naturally introduces the 244 possibility of a proliferation of profiles, thus potentially 245 mitigating the benefits of common code and violating user 246 expectations. See Section 5 for a discussion of this important 247 topic. 249 In addition, it is extremely important for protocol designers and 250 application developers to understand that the transformation of an 251 input string to an output string is rarely reversible. As one 252 relatively simple example, case mapping would transform an input 253 string of "StPeter" to "stpeter", and information about the 254 capitalization of the first and third characters would be lost. 255 Similar considerations apply to other forms of mapping and 256 normalization. 258 Although this framework is similar to IDNA2008 and includes by 259 reference some of the character categories defined in [RFC5892], it 260 defines additional character categories to meet the needs of common 261 application protocols other than DNS. 263 The character categories and calculation rules defined under 264 Section 8 and Section 9 are normative and apply to all Unicode code 265 points. The code point table that results from applying the 266 character categories and calculation rules to the latest version of 267 Unicode can be found in an IANA registry. 269 2. Terminology 271 Many important terms used in this document are defined in [RFC5890], 272 [RFC6365], [RFC6885], and [Unicode7.0]. The terms "left-to-right" 273 (LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex 274 #9 [UAX9]. 276 As of the date of writing, the version of Unicode published by the 277 Unicode Consortium is 7.0 [Unicode7.0]; however, PRECIS is not tied 278 to a specific version of Unicode. The latest version of Unicode is 279 always available [UnicodeCurrent]. 
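Because PRECIS is not tied to one Unicode version, whether a code point counts as "not yet designated" depends on the Unicode data an implementation ships with. The following sketch, which assumes Python's standard `unicodedata` module, reports the Unicode version of the running interpreter and approximates the unassigned test as General_Category Cn with noncharacters excluded. The helper names are hypothetical, and the exclusion of noncharacters reflects this sketch's reading of the Unassigned category rather than a complete implementation.

```python
import unicodedata


def runtime_unicode_version() -> str:
    """Unicode version of the character database used by this interpreter."""
    return unicodedata.unidata_version


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacter code points (U+FDD0..U+FDEF, U+xFFFE/U+xFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """Approximate 'not yet designated' for the runtime's Unicode version.

    The answer can change when the interpreter is upgraded to a newer
    Unicode version, which is exactly the agility concern discussed above.
    """
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```

Two implementations built against different Unicode versions can therefore disagree about the same code point, which is one reason such code points are treated conservatively.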
281 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 282 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 283 "OPTIONAL" in this document are to be interpreted as described in 284 [RFC2119]. 286 3. Preparation, Enforcement, and Comparison 288 This document distinguishes between three different actions that an 289 entity can take with regard to a string: 291 o Enforcement entails applying all of the rules specified for a 292 particular string class or profile thereof to an individual 293 string, for the purpose of determining if the string can be used 294 in a given protocol slot. 296 o Comparison entails applying all of the rules specified for a 297 particular string class or profile thereof to two separate 298 strings, for the purpose of determining if the two strings are 299 equivalent. 301 o Preparation entails only ensuring that the characters in an 302 individual string are allowed by the underlying PRECIS string 303 class. 305 In most cases, authoritative entities such as servers are responsible 306 for enforcement, whereas subsidiary entities such as clients are 307 responsible only for preparation. The rationale for this distinction 308 is that clients might not have the facilities (in terms of device 309 memory and processing power) to enforce all the rules regarding 310 internationalized strings (such as width mapping and Unicode 311 normalization), although they can more easily limit the repertoire of 312 characters they offer to an end user. By contrast, it is assumed 313 that a server would have more capacity to enforce the rules, and in 314 any case acts as an authority regarding allowable strings in protocol 315 slots such as addresses and endpoint identifiers. In addition, a 316 client cannot necessarily be trusted to properly generate such 317 strings, especially for security-sensitive contexts such as 318 authentication and authorization. 320 4. String Classes 322 4.1. 
Overview 324 Starting in 2010, various "customers" of Stringprep began to discuss 325 the need to define a post-Stringprep approach to the preparation and 326 comparison of internationalized strings other than IDNs. This 327 community analyzed the existing Stringprep profiles and also weighed 328 the costs and benefits of defining a relatively small set of Unicode 329 characters that would minimize the potential for user confusion 330 caused by visually similar characters (and thus be relatively "safe") 331 vs. defining a much larger set of Unicode characters that would 332 maximize the potential for user creativity (and thus be relatively 333 "expressive"). As a result, the community concluded that most 334 existing uses could be addressed by two string classes: 336 IdentifierClass: a sequence of letters, numbers, and some symbols 337 that is used to identify or address a network entity such as a 338 user account, a venue (e.g., a chatroom), an information source 339 (e.g., a data feed), or a collection of data (e.g., a file); the 340 intent is that this class will minimize user confusion in a wide 341 variety of application protocols, with the result that safety has 342 been prioritized over expressiveness for this class. 344 FreeformClass: a sequence of letters, numbers, symbols, spaces, and 345 other characters that is used for free-form strings, including 346 passwords as well as display elements such as human-friendly 347 nicknames for devices or for participants in a chatroom; the 348 intent is that this class will allow nearly any Unicode character, 349 with the result that expressiveness has been prioritized over 350 safety for this class. Note well that protocol designers, 351 application developers, service providers, and end users might not 352 understand or be able to enter all of the characters that can be 353 included in the FreeformClass - see Section 12.3 for details. 
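The difference between the two classes can be approximated in code. The sketch below is deliberately incomplete: it ignores the Exceptions, BackwardCompatible, JoinControl, OldHangulJamo, and PrecisIgnorableProperties categories as well as all contextual rules, so it must not be used for real enforcement. The function name `rough_status` is hypothetical, and the General_Category sets are this sketch's reading of the category definitions in Section 9.

```python
import unicodedata
from typing import Tuple

# General_Category values per category (sketch's reading of Section 9).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}
SPACES = {"Zs"}
SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}
FREEFORM_ONLY = OTHER_LETTER_DIGITS | SPACES | SYMBOLS | PUNCTUATION


def rough_status(ch: str) -> Tuple[bool, bool]:
    """Return (valid in IdentifierClass, valid in FreeformClass), roughly."""
    cp = ord(ch)
    gc = unicodedata.category(ch)
    if 0x21 <= cp <= 0x7E:        # printable ASCII is valid in both classes
        return (True, True)
    if gc in {"Cc", "Cn"}:        # controls and unassigned: disallowed
        return (False, False)
    if unicodedata.normalize("NFKC", ch) != ch:
        return (False, True)      # has a compatibility equivalent
    if gc in LETTER_DIGITS:
        return (True, True)
    if gc in FREEFORM_ONLY:
        return (False, True)
    return (False, False)         # inclusion model: anything else is out
```

For example, a plain letter is acceptable in both classes, whereas a space, a symbol such as U+2603, or FULLWIDTH DIGIT ZERO (U+FF10) is acceptable only in the FreeformClass under this approximation.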
355 Future specifications might define additional PRECIS string classes, 356 such as a class that falls somewhere between the IdentifierClass and 357 the FreeformClass. At this time, it is not clear how useful such a 358 class would be. In any case, because application developers are able 359 to define profiles of PRECIS string classes, a protocol needing a 360 construct between the IdentifierClass and the FreeformClass could 361 define a restricted profile of the FreeformClass if needed. 363 The following subsections discuss the IdentifierClass and 364 FreeformClass in more detail, with reference to the dimensions 365 described in Section 3 of [RFC6885]. Each string class is defined by 366 the following behavioral rules: 368 Valid: Defines which code points are treated as valid for the 369 string. 371 Contextual Rule Required: Defines which code points are treated as 372 allowed only if the requirements of a contextual rule are met 373 (i.e., either CONTEXTJ or CONTEXTO). 375 Disallowed: Defines which code points need to be excluded from the 376 string. 378 Unassigned: Defines application behavior in the presence of code 379 points that are unknown (i.e., not yet designated) for the version 380 of Unicode used by the application. 382 This document defines the valid, contextual rule required, 383 disallowed, and unassigned rules for the IdentifierClass and 384 FreeformClass. As described under Section 5, profiles of these 385 string classes are responsible for defining the width mapping, 386 additional mappings, case mapping, normalization, and directionality 387 rules. 389 4.2. IdentifierClass 391 Most application technologies need strings that can be used to refer 392 to, include, or communicate protocol strings like usernames, file 393 names, data feed identifiers, and chatroom names. We group such 394 strings into a class called "IdentifierClass" having the following 395 features. 397 4.2.1. 
Valid 399 o Code points traditionally used as letters and numbers in writing 400 systems, i.e., the LetterDigits ("A") category first defined in 401 [RFC5892] and listed here under Section 9.1. 403 o Code points in the range U+0021 through U+007E, i.e., the 404 (printable) ASCII7 ("K") rule defined under Section 9.11. These 405 code points are "grandfathered" into PRECIS and thus are valid 406 even if they would otherwise be disallowed according to the 407 property-based rules specified in the next section. 409 Note: Although the PRECIS IdentifierClass re-uses the LetterDigits 410 category from IDNA2008, the range of characters allowed in the 411 IdentifierClass is wider than the range of characters allowed in 412 IDNA2008. The main reason is that IDNA2008 applies the Unstable 413 category before the LetterDigits category, thus disallowing 414 uppercase characters, whereas the IdentifierClass does not apply 415 the Unstable category. 417 4.2.2. Contextual Rule Required 419 o A number of characters from the Exceptions ("F") category defined 420 under Section 9.6 (see Section 9.6 for a full list). 422 o Joining characters, i.e., the JoinControl ("H") category defined 423 under Section 9.8. 425 4.2.3. Disallowed 427 o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category 428 defined under Section 9.9. 430 o Control characters, i.e., the Controls ("L") category defined 431 under Section 9.12. 433 o Ignorable characters, i.e., the PrecisIgnorableProperties ("M") 434 category defined under Section 9.13. 436 o Space characters, i.e., the Spaces ("N") category defined under 437 Section 9.14. 439 o Symbol characters, i.e., the Symbols ("O") category defined under 440 Section 9.15. 442 o Punctuation characters, i.e., the Punctuation ("P") category 443 defined under Section 9.16. 445 o Any character that has a compatibility equivalent, i.e., the 446 HasCompat ("Q") category defined under Section 9.17. 
These code 447 points are disallowed even if they would otherwise be valid 448 according to the property-based rules specified in the previous 449 section. 451 o Letters and digits other than the "traditional" letters and digits 452 allowed in IDNs, i.e., the OtherLetterDigits ("R") category 453 defined under Section 9.18. 455 4.2.4. Unassigned 457 Any code points that are not yet designated in the Unicode character 458 set are considered Unassigned for purposes of the IdentifierClass, 459 and such code points are to be treated as Disallowed. See 460 Section 9.10. 462 4.2.5. Examples 464 As described in the Introduction to this document, the string classes 465 do not handle all issues related to string preparation and comparison 466 (such as case mapping); instead, such issues are handled at the level 467 of profiles. Examples for two profiles of the IdentifierClass can be 468 found in [I-D.ietf-precis-saslprepbis] (the UsernameIdentifierClass 469 profile) and in [I-D.ietf-xmpp-6122bis] (the LocalpartIdentifierClass 470 profile). 472 4.3. FreeformClass 474 Some application technologies need strings that can be used in a 475 free-form way, e.g., as a password in an authentication exchange (see 476 [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see 477 [I-D.ietf-precis-nickname]). We group such things into a class 478 called "FreeformClass" having the following features. 480 Security Warning: As mentioned, the FreeformClass prioritizes 481 expressiveness over safety; Section 12.3 describes some of the 482 security hazards involved with using or profiling the 483 FreeformClass. 485 Security Warning: Consult Section 12.6 for relevant security 486 considerations when strings conforming to the FreeformClass, or a 487 profile thereof, are used as passwords. 489 4.3.1. Valid 491 o Traditional letters and numbers, i.e., the LetterDigits ("A") 492 category first defined in [RFC5892] and listed here under 493 Section 9.1. 
495 o Letters and digits other than the "traditional" letters and digits 496 allowed in IDNs, i.e., the OtherLetterDigits ("R") category 497 defined under Section 9.18. 499 o Code points in the range U+0021 through U+007E, i.e., the 500 (printable) ASCII7 ("K") rule defined under Section 9.11. 502 o Any character that has a compatibility equivalent, i.e., the 503 HasCompat ("Q") category defined under Section 9.17. 505 o Space characters, i.e., the Spaces ("N") category defined under 506 Section 9.14. 508 o Symbol characters, i.e., the Symbols ("O") category defined under 509 Section 9.15. 511 o Punctuation characters, i.e., the Punctuation ("P") category 512 defined under Section 9.16. 514 4.3.2. Contextual Rule Required 516 o A number of characters from the Exceptions ("F") category defined 517 under Section 9.6 (see Section 9.6 for a full list). 519 o Joining characters, i.e., the JoinControl ("H") category defined 520 under Section 9.8. 522 4.3.3. Disallowed 524 o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category 525 defined under Section 9.9. 527 o Control characters, i.e., the Controls ("L") category defined 528 under Section 9.12. 530 o Ignorable characters, i.e., the PrecisIgnorableProperties ("M") 531 category defined under Section 9.13. 533 4.3.4. Unassigned 535 Any code points that are not yet designated in the Unicode character 536 set are considered Unassigned for purposes of the FreeformClass, and 537 such code points are to be treated as Disallowed. 539 4.3.5. Examples 541 As described in the Introduction to this document, the string classes 542 do not handle all issues related to string preparation and comparison 543 (such as case mapping); instead, such issues are handled at the level 544 of profiles. Examples for two profiles of the FreeformClass can be 545 found in [I-D.ietf-precis-nickname] (the NicknameFreeformClass 546 profile) and in [I-D.ietf-xmpp-6122bis] (the 547 ResourcepartIdentifierClass profile). 549 5. 
Profiles 551 This framework document defines the valid, contextual-rule-required, 552 disallowed, and unassigned rules for the IdentifierClass and the 553 FreeformClass. A profile of a PRECIS string class MUST define the 554 width mapping, additional mappings (if any), case mapping, 555 normalization, and directionality rules. A profile MAY also restrict 556 the allowable characters above and beyond the definition of the 557 relevant PRECIS string class (but MUST NOT add as valid any code 558 points that are disallowed by the relevant PRECIS string class). 559 These matters are discussed in the following subsections. 561 Profiles of the PRECIS string classes are registered with the IANA as 562 described under Section 11.3. Profile names use the following 563 convention: they are of the form "Profilename of BaseClass", where 564 the "Profilename" string is a differentiator and "BaseClass" is the 565 name of the PRECIS string class being profiled; for example, the 566 profile of the FreeformClass used for opaque strings such as passwords is 567 the "OpaqueString" profile [I-D.ietf-precis-saslprepbis]. 569 5.1. Profiles Must Not Be Multiplied Beyond Necessity 571 The risk of profile proliferation is significant because having too 572 many profiles will result in different behavior across various 573 applications, thus violating what is known in user interface design 574 as the Principle of Least Astonishment. 576 Indeed, we already have too many profiles. Ideally we would have at 577 most two or three profiles. Unfortunately, numerous application 578 protocols exist with their own quirks regarding protocol strings. 579 Domain names, email addresses, instant messaging addresses, chatroom 580 nicknames, filenames, authentication identifiers, passwords, and 581 other strings are already out there in the wild and need to be 582 supported in existing application protocols such as DNS, SMTP, XMPP, 583 IRC, NFS, iSCSI, EAP, and SASL among others. 
585 Nevertheless, profiles must not be multiplied beyond necessity. 587 To help prevent profile proliferation, this document recommends 588 sensible defaults for the various options offered to profile creators 589 (such as width mapping and Unicode normalization). In addition, the 590 guidelines for designated experts provided under Section 10 are meant 591 to encourage a high level of due diligence regarding new profiles. 593 5.2. Rules 595 5.2.1. Width Mapping Rule 597 The width mapping rule of a profile specifies whether width mapping 598 is performed on the characters of a string, and how the mapping is 599 done. Typically such mapping consists of mapping fullwidth and 600 halfwidth characters, i.e., code points with a Decomposition Type of 601 Wide or Narrow, to their decomposition mappings; as an example, 602 FULLWIDTH DIGIT ZERO (U+FF10) would be mapped to DIGIT ZERO (U+0030). 604 The normalization form specified by a profile (see below) has an 605 impact on the need for width mapping. Because width mapping is 606 performed as a part of compatibility decomposition, a profile 607 employing either normalization form KD (NFKD) or normalization form 608 KC (NFKC) does not need to specify width mapping. However, if 609 Unicode normalization form C (NFC) is used (as is recommended) then 610 the profile needs to specify whether to apply width mapping; in this 611 case, width mapping is in general RECOMMENDED because allowing 612 fullwidth and halfwidth characters to remain unmapped to their 613 compatibility variants would violate the Principle of Least 614 Astonishment. For more information about the concept of width in 615 East Asian scripts within Unicode, see Unicode Standard Annex #11 616 [UAX11]. 618 5.2.2. 
Additional Mapping Rule 620 The additional mapping rule of a profile specifies whether additional 621 mappings are performed on the characters of a string, such as: 623 Mapping of delimiter characters (such as '@', ':', '/', '+', and 624 '-') 626 Mapping of special characters (e.g., non-ASCII space characters to 627 ASCII space or control characters to nothing). 629 The PRECIS mappings document [I-D.ietf-precis-mappings] describes 630 such mappings in more detail. 632 5.2.3. Case Mapping Rule 634 The case mapping rule of a profile specifies whether case mapping 635 (instead of case preservation) is performed on the characters of a 636 string, and how the mapping is applied (e.g., mapping uppercase and 637 titlecase characters to their lowercase equivalents). 639 If case mapping is desired (instead of case preservation), it is 640 RECOMMENDED to use Unicode Default Case Folding as defined in Chapter 641 3 of the Unicode Standard [Unicode7.0]. 643 Note: Unicode Default Case Folding is not designed to handle 644 various localization issues (such as so-called "dotless i" in 645 several Turkic languages). The PRECIS mappings document 646 [I-D.ietf-precis-mappings] describes these issues in greater 647 detail and defines a "local case mapping" method that handles some 648 locale-dependent and context-dependent mappings. 650 In order to maximize entropy and minimize the potential for false 651 positives, it is NOT RECOMMENDED for application protocols to map 652 uppercase and titlecase code points to their lowercase equivalents 653 when strings conforming to the FreeformClass, or a profile thereof, 654 are used in passwords; instead, it is RECOMMENDED to preserve the 655 case of all code points contained in such strings and then perform 656 case-sensitive comparison. See also the related discussion in 657 [I-D.ietf-precis-saslprepbis]. 659 5.2.4. 
Normalization Rule 661 The normalization rule of a profile specifies which Unicode 662 normalization form (D, KD, C, or KC) is to be applied (see Unicode 663 Standard Annex #15 [UAX15] for background information). 665 In accordance with [RFC5198], normalization form C (NFC) is 666 RECOMMENDED. 668 5.2.5. Directionality Rule 670 The directionality rule of a profile specifies how to treat strings 671 containing what are often called "right-to-left" (RTL) characters 672 (see Unicode Standard Annex #9 [UAX9]). RTL characters come from 673 scripts that are normally written from right to left and are 674 considered by Unicode to, themselves, have right-to-left 675 directionality. Some strings containing RTL characters also contain 676 "left-to-right" (LTR) characters, such as numerals, as well as 677 characters without directional properties. Consequently, such 678 strings are known as "bidirectional strings". 680 Presenting bidirectional strings in different layout systems (e.g., a 681 user interface that is configured to handle primarily an RTL script 682 vs. an interface that is configured to handle primarily an LTR 683 script) can yield display results that, while predictable to those 684 who understand the display rules, are counter-intuitive to casual 685 users. In particular, the same bidirectional string (in PRECIS 686 terms) might not be presented in the same way to users of those 687 different layout systems, even though the presentation is consistent 688 within any particular layout system. In some applications, these 689 presentation differences might be considered problematic and thus the 690 application designers might wish to restrict the use of bidirectional 691 strings by specifying a directionality rule. In other applications, 692 these presentation differences might not be considered problematic 693 (this especially tends to be true of more "free-form" strings) and 694 thus no directionality rule is needed. 
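The mapping, normalization, and directionality rules described in this section can be illustrated with a minimal sketch. It is not a conformant PRECIS implementation: the function names (width_map, map_spaces, apply_rules, has_rtl) are hypothetical, Python's str.casefold() is used as a stand-in for Unicode Default Case Folding, the only additional mapping shown is the "non-ASCII space to ASCII space" example, and the results depend on the Unicode version bundled with the Python interpreter. The has_rtl helper only detects whether a string contains RTL characters (Bidi classes R or AL); it does not enforce the "Bidi Rule" or any other directionality rule.

```python
import unicodedata


def width_map(s):
    """Map fullwidth/halfwidth characters to their decomposition mappings."""
    out = []
    for ch in s:
        # decomposition() returns e.g. "<wide> 0030" for U+FF10.
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] in ("<wide>", "<narrow>"):
            out.append("".join(chr(int(h, 16)) for h in parts[1:]))
        else:
            out.append(ch)
    return "".join(out)


def map_spaces(s):
    """Example additional mapping: any space separator (Zs) becomes U+0020."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)


def apply_rules(s, case_map=True):
    """Apply width, additional, case, and normalization rules, in that order."""
    s = width_map(s)
    s = map_spaces(s)
    if case_map:
        # Assumption: casefold() approximates Unicode Default Case Folding.
        s = s.casefold()
    return unicodedata.normalize("NFC", s)


def has_rtl(s):
    """True if the string contains a character with Bidi class R or AL."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in s)
```

A profile that preserves case (as recommended for passwords) would call apply_rules(s, case_map=False); a profile that needs a directionality rule could use has_rtl to decide whether that rule has to be checked at all.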
696 The PRECIS framework does not directly address how to deal with 697 bidirectional strings across all string classes and profiles, and 698 does not define any new directionality rules, since at present there 699 is no widely accepted and implemented solution for the safe display 700 of arbitrary bidirectional strings beyond the Unicode bidirectional 701 algorithm [UAX9]. Although rules for management and display of 702 bidirectional strings have been defined for domain name labels and 703 similar identifiers through the "Bidi Rule" specified in the IDNA2008 704 specification on right-to-left scripts [RFC5893], those rules are 705 quite restrictive and are not necessarily applicable to all 706 bidirectional strings. 708 The authors of a PRECIS profile might believe that they need to 709 define a new directionality rule of their own. Because of the 710 complexity of the issues involved, such a belief is almost always 711 misguided, even if the authors have done a great deal of careful 712 research into the challenges of displaying bidirectional strings. 713 This document strongly suggests that profile authors who are thinking 714 about defining a new directionality rule think again, and instead 715 consider using the "Bidi Rule" [RFC5893] (for profiles based on the 716 IdentifierClass) or following the Unicode bidirectional algorithm 717 [UAX9] (for profiles based on the FreeformClass or in situations 718 where the IdentifierClass is not appropriate). 720 5.3. A Note about Spaces 722 With regard to the IdentifierClass, the consensus of the PRECIS 723 Working Group was that spaces are problematic for many reasons, 724 including: 726 o Many Unicode characters are confusable with ASCII space. 
728 o Even if non-ASCII space characters are mapped to ASCII space 729 (U+0020), space characters are often not rendered in user 730 interfaces, leading to the possibility that a human user might 731 consider a string containing spaces to be equivalent to the same 732 string without spaces. 734 o In some locales, some devices are known to generate a character 735 other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a 736 user performs an action like hitting the space bar on a keyboard. 738 One consequence of disallowing space characters in the 739 IdentifierClass might be to effectively discourage their use within 740 identifiers created in newer application protocols; given the 741 challenges involved with properly handling space characters 742 (especially non-ASCII space characters) in identifiers and other 743 protocol strings, the PRECIS Working Group considered this to be a 744 feature, not a bug. 746 However, the FreeformClass does allow spaces, which enables 747 application protocols to define profiles of the FreeformClass that 748 are more flexible than any profiles of the IdentifierClass. In 749 addition, as explained in the previous section, application protocols 750 can also define application-layer constructs containing spaces. 752 6. Applications 754 6.1. How to Use PRECIS in Applications 756 Although PRECIS has been designed with applications in mind, 757 internationalization is not suddenly made easy through the use of 758 PRECIS. Application developers still need to give some thought to 759 how they will use the PRECIS string classes, or profiles thereof, in 760 their applications. This section provides some guidelines to 761 application developers (and to expert reviewers of application 762 protocol specifications). 764 o Don't define your own profile unless absolutely necessary (see 765 Section 5.1). Existing profiles have been designed for wide re-use. 
766 It is highly likely that an existing profile will meet your needs, 767 especially given the ability to specify further excluded 768 characters (Section 6.2) and to build application-layer constructs 769 (see Section 6.3). 771 o Do specify: 773 * Exactly which entities are responsible for preparation, 774 enforcement, and comparison of internationalized strings (e.g., 775 servers or clients). 777 * Exactly when those entities need to complete their tasks (e.g., 778 a server might need to enforce the rules of a profile before 779 allowing a client to gain network access). 781 * Exactly which protocol slots need to be checked against which 782 profiles (e.g., checking the address of a message's intended 783 recipient against the UsernameCaseMapped profile 784 [I-D.ietf-precis-saslprepbis] of the IdentifierClass, or 785 checking the password of a user against the OpaqueString 786 profile [I-D.ietf-precis-saslprepbis] of the FreeformClass). 788 See [I-D.ietf-precis-saslprepbis] and [I-D.ietf-xmpp-6122bis] for 789 definitions of these matters for several applications. 791 6.2. Further Excluded Characters 793 An application protocol that uses a profile MAY specify particular 794 code points that are not allowed in relevant slots within that 795 application protocol, above and beyond those excluded by the string 796 class or profile. 798 That is, an application protocol MAY do either of the following: 800 1. Exclude specific code points that are allowed by the relevant 801 string class. 803 2. Exclude characters matching certain Unicode properties (e.g., 804 math symbols) that are included in the relevant PRECIS string 805 class. 807 As a result of such exclusions, code points that are defined as valid 808 for the PRECIS string class or profile will be defined as disallowed 809 for the relevant protocol slot. 811 Typically, such exclusions are defined for the purpose of backward- 812 compatibility with legacy formats within an application protocol. 
813 These are defined for application protocols, not profiles, in order 814 to prevent multiplication of profiles beyond necessity (see 815 Section 5.1). 817 6.3. Building Application-Layer Constructs 819 Sometimes, an application-layer construct does not map in a 820 straightforward manner to one of the base string classes or a profile 821 thereof. Consider, for example, the "simple user name" construct in 822 the Simple Authentication and Security Layer (SASL) [RFC4422]. 823 Depending on the deployment, a simple user name might take the form 824 of a user's full name (e.g., the user's personal name followed by a 825 space and then the user's family name). Such a simple user name 826 cannot be defined as an instance of the IdentifierClass or a profile 827 thereof, since space characters are not allowed in the 828 IdentifierClass; however, it could be defined using a space-separated 829 sequence of IdentifierClass instances, as in the following ABNF 830 [RFC5234] from [I-D.ietf-precis-saslprepbis]: 832 username = userpart *(1*SP userpart) 833 userpart = 1*(idbyte) 834 ; 835 ; an "idbyte" is a byte used to represent a 836 ; UTF-8 encoded Unicode code point that can be 837 ; contained in a string that conforms to the 838 ; PRECIS "IdentifierClass" 839 ; 841 Similar techniques could be used to define many application-layer 842 constructs, say of the form "user@domain" or "/path/to/file". 844 7. Order of Operations 846 To ensure proper comparison, the rules specified for a particular 847 string class or profile MUST be applied in the following order: 849 1. Width Mapping Rule 851 2. Additional Mapping Rule 853 3. Case Mapping Rule 855 4. Normalization Rule 857 5. Directionality Rule 859 6. 
Behavioral rules for determining whether a code point is valid, 860 allowed under a contextual rule, disallowed, or unassigned 862 As already described, the width mapping, additional mapping, case 863 mapping, normalization, and directionality rules are specified for 864 each profile, whereas the behavioral rules are specified for each 865 string class. Some of the logic behind this order is provided under 866 Section 5.2.1 (see also the PRECIS mappings document 867 [I-D.ietf-precis-mappings]). 869 8. Code Point Properties 871 In order to implement the string classes described above, this 872 document does the following: 874 1. Reviews and classifies the collections of code points in the 875 Unicode character set by examining various code point properties. 877 2. Defines an algorithm for determining a derived property value, 878 which can vary depending on the string class being used by the 879 relevant application protocol. 881 This document is not intended to specify precisely how derived 882 property values are to be applied in protocol strings. That 883 information is the responsibility of the protocol specification that 884 uses or profiles a PRECIS string class from this document. The value 885 of the property is to be interpreted as follows. 887 PROTOCOL VALID Those code points that are allowed to be used in any 888 PRECIS string class (currently, IdentifierClass and 889 FreeformClass). The abbreviated term "PVALID" is used to refer to 890 this value in the remainder of this document. 892 SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to 893 be used in specific string classes. In the remainder of this 894 document, the abbreviated term *_PVAL is used, where * = (ID | 895 FREE), i.e., either "FREE_PVAL" or "ID_PVAL". In practice, the 896 derived property ID_PVAL is not used in this specification, since 897 every ID_PVAL code point is PVALID. 
899 CONTEXTUAL RULE REQUIRED Some characteristics of the character, such 900 as its being invisible in certain contexts or problematic in 901 others, require that it not be used in labels unless specific 902 other characters or properties are present. As in IDNA2008, there 903 are two subdivisions of CONTEXTUAL RULE REQUIRED, the first for 904 Join_controls (called "CONTEXTJ") and the second for other 905 characters (called "CONTEXTO"). A character with the derived 906 property value CONTEXTJ or CONTEXTO MUST NOT be used unless an 907 appropriate rule has been established and the context of the 908 character is consistent with that rule. The most notable of the 909 CONTEXTUAL RULE REQUIRED characters are the Join Control 910 characters U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON- 911 JOINER, which have a derived property value of CONTEXTJ. See 912 Appendix A of [RFC5892] for more information. 914 DISALLOWED Those code points that are not permitted in any PRECIS 915 string class. 917 SPECIFIC CLASS DISALLOWED Those code points that are not to be 918 included in one of the string classes but that might be permitted 919 in others. In the remainder of this document, the abbreviated 920 term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS" 921 or "ID_DIS". In practice, the derived property FREE_DIS is not 922 used in this specification, since every FREE_DIS code point is 923 DISALLOWED. 925 UNASSIGNED Those code points that are not designated (i.e., are 926 unassigned) in the Unicode Standard. 928 The algorithm to calculate the value of the derived property is as 929 follows (implementations MUST NOT modify the order of operations 930 within this algorithm, since doing so would cause inconsistent 931 results across implementations): 933 If .cp. .in. Exceptions Then Exceptions(cp); 934 Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp); 935 Else If .cp. .in. Unassigned Then UNASSIGNED; 936 Else If .cp. .in. 
ASCII7 Then PVALID; 937 Else If .cp. .in. JoinControl Then CONTEXTJ; 938 Else If .cp. .in. OldHangulJamo Then DISALLOWED; 939 Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED; 940 Else If .cp. .in. Controls Then DISALLOWED; 941 Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL; 942 Else If .cp. .in. LetterDigits Then PVALID; 943 Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL; 944 Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL; 945 Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL; 946 Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL; 947 Else DISALLOWED; 949 The value of the derived property calculated can depend on the string 950 class; for example, if an identifier used in an application protocol 951 is defined as profiling the PRECIS IdentifierClass then a space 952 character such as U+0020 would be assigned to ID_DIS, whereas if an 953 identifier is defined as profiling the PRECIS FreeformClass then the 954 character would be assigned to FREE_PVAL. For the sake of brevity, 955 the designation "FREE_PVAL" is used herein, instead of the longer 956 designation "ID_DIS or FREE_PVAL". In practice, the derived 957 properties ID_PVAL and FREE_DIS are not used in this specification, 958 since every ID_PVAL code point is PVALID and every FREE_DIS code 959 point is DISALLOWED. 961 Use of the name of a rule (such as "Exceptions") implies the set of 962 code points that the rule defines, whereas the same name as a 963 function call (such as "Exceptions(cp)") implies the value that the 964 code point has in the Exceptions table. 966 The mechanisms described here allow determination of the value of the 967 property for future versions of Unicode (including characters added 968 after Unicode 5.2 or 7.0 depending on the category, since some 969 categories mentioned in this document are simply pointers to IDNA2008 970 and therefore were defined at the time of Unicode 5.2). 
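The derived property algorithm above can be sketched in code as follows. This is only an approximation under several assumptions, not a reference implementation: the Exceptions table is a partial excerpt of Section 2.6 of [RFC5892], the OldHangulJamo ranges and the Default_Ignorable_Code_Point set are hard-coded approximations because Python's unicodedata module does not expose those properties, Controls is interpreted as General_Category Cc, and the results depend on the Unicode version bundled with the interpreter. The names derived_property, EXCEPTIONS, and so on are hypothetical.

```python
import unicodedata

# Partial excerpt of the Exceptions (F) table; a real implementation
# needs the complete table from RFC 5892, Section 2.6.
EXCEPTIONS = {0x00DF: "PVALID", 0x03C2: "PVALID",
              0x00B7: "CONTEXTO", 0x0640: "DISALLOWED"}
BACKWARD_COMPATIBLE = {}  # empty set at the time of writing (Section 9.7)

# Approximation of OldHangulJamo (I): conjoining jamo ranges.
OLD_HANGUL_JAMO = [(0x1100, 0x11FF), (0xA960, 0xA97C),
                   (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB)]
# Partial, hard-coded stand-in for Default_Ignorable_Code_Point.
DEFAULT_IGNORABLE = {0x00AD, 0x034F, 0x2060}

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}
SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}


def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp, string_class="FreeformClass"):
    """Evaluate the rules in the fixed order given by the algorithm."""
    ch = chr(cp)
    gc = unicodedata.category(ch)
    # "ID_DIS or FREE_PVAL" resolves according to the string class in use.
    by_class = "FREE_PVAL" if string_class == "FreeformClass" else "ID_DIS"
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if 0x0021 <= cp <= 0x007E:                      # ASCII7 (K)
        return "PVALID"
    if cp in (0x200C, 0x200D):                      # JoinControl (H)
        return "CONTEXTJ"
    if any(lo <= cp <= hi for lo, hi in OLD_HANGUL_JAMO):
        return "DISALLOWED"
    if cp in DEFAULT_IGNORABLE or is_noncharacter(cp):
        return "DISALLOWED"
    if gc == "Cc":                                  # Controls (L), assumed
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:     # HasCompat (Q)
        return by_class
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS or gc == "Zs" or gc in SYMBOLS or gc in PUNCTUATION:
        return by_class
    return "DISALLOWED"
```

For instance, derived_property(0x20, "IdentifierClass") yields "ID_DIS" while derived_property(0x20) yields "FREE_PVAL", which mirrors the U+0020 example given above.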
Changes in 971 Unicode properties that do not affect the outcome of this process 972 therefore do not affect this framework. For example, a character can 973 have its Unicode General_Category value (see Chapter 4 of the Unicode 974 Standard [Unicode7.0]) change from So to Sm, or from Lo to Ll, 975 without affecting the algorithm results. Moreover, even if such 976 changes were to result, the BackwardCompatible list (Section 9.7) can 977 be adjusted to ensure the stability of the results. 979 9. Category Definitions Used to Calculate Derived Property 981 The derived property obtains its value based on a two-step procedure: 983 1. Characters are placed in one or more character categories either 984 (1) based on core properties defined by the Unicode Standard or 985 (2) by treating the code point as an exception and addressing the 986 code point based on its code point value. These categories are 987 not mutually exclusive. 989 2. Set operations are used with these categories to determine the 990 values for a property specific to a given string class. These 991 operations are specified under Section 8. 993 Note: Unicode property names and property value names might have 994 short abbreviations, such as "gc" for the General_Category 995 property and "Ll" for the Lowercase_Letter property value of the 996 gc property. 998 In the following specification of character categories, the operation 999 that returns the value of a particular Unicode character property for 1000 a code point is designated by using the formal name of that property 1001 (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for 1002 "code point". For example, the value of the General_Category 1003 property for a code point is indicated by General_Category(cp). 1005 The first ten categories (A-J) shown below were previously defined 1006 for IDNA2008 and are referenced from [RFC5892] to ease the 1007 understanding of how PRECIS handles various characters. 
Some of 1008 these categories are reused in PRECIS and some of them are not; 1009 however, the lettering of categories is retained to prevent overlap 1010 and to ease implementation of both IDNA2008 and PRECIS in a single 1011 software application. The next eight categories (K-R) are specific 1012 to PRECIS. 1014 9.1. LetterDigits (A) 1016 This category is defined in Section 2.1 of [RFC5892] and is included 1017 by reference for use in PRECIS. 1019 9.2. Unstable (B) 1021 This category is defined in Section 2.2 of [RFC5892]. However, it is 1022 not used in PRECIS. 1024 9.3. IgnorableProperties (C) 1026 This category is defined in Section 2.3 of [RFC5892]. However, it is 1027 not used in PRECIS. 1029 Note: See the "PrecisIgnorableProperties (M)" category below for a 1030 more inclusive category used in PRECIS identifiers. 1032 9.4. IgnorableBlocks (D) 1034 This category is defined in Section 2.4 of [RFC5892]. However, it is 1035 not used in PRECIS. 1037 9.5. LDH (E) 1039 This category is defined in Section 2.5 of [RFC5892]. However, it is 1040 not used in PRECIS. 1042 Note: See the "ASCII7 (K)" category below for a more inclusive 1043 category used in PRECIS identifiers. 1045 9.6. Exceptions (F) 1047 This category is defined in Section 2.6 of [RFC5892] and is included 1048 by reference for use in PRECIS. 1050 9.7. BackwardCompatible (G) 1052 This category is defined in Section 2.7 of [RFC5892] and is included 1053 by reference for use in PRECIS. 1055 Note: Management of this category is handled via the processes 1056 specified in [RFC5892]. At the time of this writing (and also at the 1057 time that RFC 5892 was published), this category consisted of the 1058 empty set; however, that is subject to change as described in RFC 1059 5892. 1061 9.8. JoinControl (H) 1063 This category is defined in Section 2.8 of [RFC5892] and is included 1064 by reference for use in PRECIS. 1066 9.9. 
OldHangulJamo (I) 1068 This category is defined in Section 2.9 of [RFC5892] and is included 1069 by reference for use in PRECIS. 1071 9.10. Unassigned (J) 1073 This category is defined in Section 2.10 of [RFC5892] and is included 1074 by reference for use in PRECIS. 1076 9.11. ASCII7 (K) 1078 This PRECIS-specific category consists of all printable, non-space 1079 characters from the 7-bit ASCII range. By applying this category, 1080 the algorithm specified under Section 8 exempts these characters from 1081 other rules that might be applied during PRECIS processing, on the 1082 assumption that these code points are in such wide use that 1083 disallowing them would be counter-productive. 1085 K: cp is in {0021..007E} 1087 9.12. Controls (L) 1089 This PRECIS-specific category consists of all control characters. 1091 L: Control(cp) = True 1093 9.13. PrecisIgnorableProperties (M) 1095 This PRECIS-specific category is used to group code points that are 1096 discouraged from use in PRECIS string classes. 1098 M: Default_Ignorable_Code_Point(cp) = True or 1099 Noncharacter_Code_Point(cp) = True 1101 The definition for Default_Ignorable_Code_Point can be found in the 1102 DerivedCoreProperties.txt [2] file. 1104 9.14. Spaces (N) 1106 This PRECIS-specific category is used to group code points that are 1107 space characters. 1109 N: General_Category(cp) is in {Zs} 1111 9.15. Symbols (O) 1113 This PRECIS-specific category is used to group code points that are 1114 symbols. 1116 O: General_Category(cp) is in {Sm, Sc, Sk, So} 1118 9.16. Punctuation (P) 1120 This PRECIS-specific category is used to group code points that are 1121 punctuation characters. 1123 P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po} 1125 9.17. HasCompat (Q) 1127 This PRECIS-specific category is used to group code points that have 1128 compatibility equivalents as explained in Chapter 2 and Chapter 3 of 1129 the Unicode Standard [Unicode7.0]. 
1131 Q: toNFKC(cp) != cp 1133 The toNFKC() operation returns the code point in normalization form 1134 KC. For more information, see Section 5 of Unicode Standard Annex 1135 #15 [UAX15]. 1137 9.18. OtherLetterDigits (R) 1139 This PRECIS-specific category is used to group code points that are 1140 letters and digits other than the "traditional" letters and digits 1141 grouped under the LetterDigits (A) class (see Section 9.1). 1143 R: General_Category(cp) is in {Lt, Nl, No, Me} 1145 10. Guidelines for Designated Experts 1147 Experience with internationalization in application protocols has 1148 shown that protocol designers and application developers usually do 1149 not understand the subtleties and tradeoffs involved with 1150 internationalization and that they need considerable guidance in 1151 making reasonable decisions with regard to the options before them. 1153 Therefore: 1155 o Protocol designers are strongly encouraged to question the 1156 assumption that they need to define new profiles, since existing 1157 profiles are designed for wide re-use (see Section 5 for further 1158 discussion). 1160 o Those who persist in defining new profiles are strongly encouraged 1161 to clearly explain a strong justification for doing so, and to 1162 publish a stable specification that provides all of the 1163 information described under Section 11.3. 1165 o The designated experts for profile registration requests ought to 1166 seek answers to all of the questions provided under Section 11.3 1167 and to encourage applicants to provide a stable specification 1168 documenting the profile (even though the registration policy for 1169 PRECIS profiles is Expert Review and a stable specification is not 1170 strictly required). 1172 o Developers of applications that use PRECIS are strongly encouraged 1173 to apply the guidelines provided under Section 6 and to seek out 1174 the advice of the designated experts or other knowledgeable 1175 individuals in doing so. 
1177 o All parties are strongly encouraged to help prevent the 1178 multiplication of profiles beyond necessity, as described under 1179 Section 5.1, and to use PRECIS in ways that will minimize user 1180 confusion and insecure application behavior. 1182 Internationalization can be difficult and contentious; designated 1183 experts, profile registrants, and application developers are strongly 1184 encouraged to work together in a spirit of good faith and mutual 1185 understanding to achieve rough consensus on profile registration 1186 requests and the use of PRECIS in particular applications. They are 1187 also encouraged to bring additional expertise into the discussion if 1188 that would be helpful in adding perspective or otherwise resolving 1189 issues. 1191 11. IANA Considerations 1193 11.1. PRECIS Derived Property Value Registry 1195 IANA is requested to create a PRECIS-specific registry with the 1196 Derived Properties for the versions of Unicode that are released 1197 after (and including) version 7.0. The derived property value is to 1198 be calculated in cooperation with a designated expert [RFC5226] 1199 according to the rules specified under Section 8 and Section 9. 1201 The IESG is to be notified if backward-incompatible changes to the 1202 table of derived properties are discovered or if other problems arise 1203 during the process of creating the table of derived property values 1204 or during expert review. Changes to the rules defined under 1205 Section 8 and Section 9 require IETF Review. 1207 11.2. PRECIS Base Classes Registry 1209 IANA is requested to create a registry of PRECIS string classes. In 1210 accordance with [RFC5226], the registration policy is "RFC Required". 
1212 The registration template is as follows: 1214 Base Class: [the name of the PRECIS string class] 1216 Description: [a brief description of the PRECIS string class and its 1217 intended use, e.g., "A sequence of letters, numbers, and symbols 1218 that is used to identify or address a network entity."] 1220 Specification: [the RFC number] 1222 The initial registrations are as follows: 1224 Base Class: FreeformClass. 1225 Description: A sequence of letters, numbers, symbols, spaces, and 1226 other code points that is used for free-form strings. 1227 Specification: Section 4.3 of this document. 1228 [Note to RFC Editor: please change "this document" 1229 to the RFC number issued for this specification.] 1231 Base Class: IdentifierClass. 1232 Description: A sequence of letters, numbers, and symbols that is 1233 used to identify or address a network entity. 1234 Specification: Section 4.2 of this document. 1235 [Note to RFC Editor: please change "this document" 1236 to the RFC number issued for this specification.] 1238 11.3. PRECIS Profiles Registry 1240 IANA is requested to create a registry of profiles that use the 1241 PRECIS string classes. In accordance with [RFC5226], the 1242 registration policy is "Expert Review". This policy was chosen in 1243 order to ease the burden of registration while ensuring that 1244 "customers" of PRECIS receive appropriate guidance regarding the 1245 sometimes complex and subtle internationalization issues related to 1246 profiles of PRECIS string classes. 
1248 The registration template is as follows: 1250 Name: [the name of the profile] 1252 Base Class: [which PRECIS string class is being profiled] 1254 Applicability: [the specific protocol elements to which this profile 1255 applies, e.g., "Localparts in XMPP addresses."] 1257 Replaces: [the Stringprep profile that this PRECIS profile replaces, 1258 if any] 1260 Width Mapping Rule: [the behavioral rule for handling of width, 1261 e.g., "Map fullwidth and halfwidth characters to their 1262 compatibility variants."] 1264 Additional Mapping Rule: [any additional mappings that are required or 1265 recommended, e.g., "Map non-ASCII space characters to ASCII 1266 space."] 1268 Case Mapping Rule: [the behavioral rule for handling of case, e.g., 1269 "Unicode Default Case Folding"] 1271 Normalization Rule: [which Unicode normalization form is applied, 1272 e.g., "NFC"] 1274 Directionality Rule: [the behavioral rule for handling of right-to- 1275 left code points, e.g., "The 'Bidi Rule' defined in RFC 5893 1276 applies."] 1278 Enforcement: [which entities enforce the rules, and when that 1279 enforcement occurs during protocol operations] 1281 Specification: [a pointer to relevant documentation, such as an RFC 1282 or Internet-Draft] 1284 In order to request a review, the registrant shall send a completed 1285 template to the precis@ietf.org list or its designated successor. 1287 Factors to focus on while defining profiles and reviewing profile 1288 registrations include the following: 1290 o Would an existing PRECIS string class or profile solve the 1291 problem? If not, why not? (See Section 5.1 for related 1292 considerations.) 1294 o Is the problem being addressed by this profile well-defined? 1296 o Does the specification define what kinds of applications are 1297 involved and the protocol elements to which this profile applies? 1299 o Is the profile clearly defined? 
1301 o Is the profile based on an appropriate dividing line between user 1302 interface (culture, context, intent, locale, device limitations, 1303 etc.) and the use of conformant strings in protocol elements? 1305 o Are the width mapping, case mapping, additional mappings, 1306 normalization, and directionality rules appropriate for the 1307 intended use? 1309 o Does the profile explain which entities enforce the rules, and 1310 when such enforcement occurs during protocol operations? 1312 o Does the profile reduce the degree to which human users could be 1313 surprised or confused by application behavior (the "Principle of 1314 Least Astonishment")? 1316 o Does the profile introduce any new security concerns such as those 1317 described under Section 12 of this document (e.g., false positives 1318 for authentication or authorization)? 1320 12. Security Considerations 1322 12.1. General Issues 1324 If input strings that appear "the same" to users are programmatically 1325 considered to be distinct in different systems, or if input strings 1326 that appear distinct to users are programmatically considered to be 1327 "the same" in different systems, then users can be confused. Such 1328 confusion can have security implications, such as the false positives 1329 and false negatives discussed in [RFC6943]. One starting goal of 1330 work on the PRECIS framework was to limit the number of times that 1331 users are confused (consistent with the "Principle of Least 1332 Astonishment"). Unfortunately, this goal has been difficult to 1333 achieve given the large number of application protocols already in 1334 existence. Despite these difficulties, profiles should not be 1335 multiplied beyond necessity (see Section 5.1). 
In particular, 1336 application protocol designers should think long and hard before 1337 defining a new profile instead of using one that has already been 1338 defined, and if they decide to define a new profile then they should 1339 clearly explain their reasons for doing so. 1341 The security of applications that use this framework can depend in 1342 part on the proper preparation, enforcement, and comparison of 1343 internationalized strings. For example, such strings can be used to 1344 make authentication and authorization decisions, and the security of 1345 an application could be compromised if an entity providing a given 1346 string is connected to the wrong account or online resource based on 1347 different interpretations of the string (again, see [RFC6943]). 1349 Specifications of application protocols that use this framework are 1350 strongly encouraged to describe how internationalized strings are 1351 used in the protocol, including the security implications of any 1352 false positives and false negatives that might result from various 1353 enforcement and comparison operations. For some helpful guidelines, 1354 refer to [RFC6943], [RFC5890], [UTR36], and [UTS39]. 1356 12.2. Use of the IdentifierClass 1358 Strings that conform to the IdentifierClass and any profile thereof 1359 are intended to be relatively safe for use in a broad range of 1360 applications, primarily because they include only letters, digits, 1361 and "grandfathered" non-space characters from the ASCII range; thus 1362 they exclude spaces, characters with compatibility equivalents, and 1363 almost all symbols and punctuation marks. However, because such 1364 strings can still include so-called confusable characters (see 1365 Section 12.5), protocol designers and implementers are encouraged to 1366 pay close attention to the security considerations described 1367 elsewhere in this document. 1369 12.3. 
Use of the FreeformClass 1371 Strings that conform to the FreeformClass and many profiles thereof 1372 can include virtually any Unicode character. This makes the 1373 FreeformClass quite expressive, but also problematic from the 1374 perspective of possible user confusion. Protocol designers are 1375 hereby warned that the FreeformClass contains code points they might 1376 not understand, and are encouraged to profile the IdentifierClass 1377 wherever feasible; however, if an application protocol requires more 1378 code points than are allowed by the IdentifierClass, protocol 1379 designers are encouraged to define a profile of the FreeformClass 1380 that restricts the allowable code points as tightly as possible. 1381 (The PRECIS Working Group considered the option of allowing 1382 "superclasses" as well as profiles of PRECIS string classes, but 1383 decided against allowing superclasses to reduce the likelihood of 1384 security and interoperability problems.) 1386 12.4. Local Character Set Issues 1388 When systems use local character sets other than ASCII and Unicode, 1389 this specification leaves the problem of converting between the local 1390 character set and Unicode up to the application or local system. If 1391 different applications (or different versions of one application) 1392 implement different rules for conversions among coded character sets, 1393 they could interpret the same name differently and contact different 1394 application servers or other network entities. This problem is not 1395 solved by security protocols, such as Transport Layer Security (TLS) 1396 [RFC5246] and the Simple Authentication and Security Layer (SASL) 1397 [RFC4422], that do not take local character sets into account. 1399 12.5. Visually Similar Characters 1401 Some characters are visually similar and thus can cause confusion 1402 among humans. Such characters are often called "confusable 1403 characters" or "confusables".
1405 The problem of confusable characters is not necessarily caused by the 1406 use of Unicode code points outside the ASCII range. For example, in 1407 some presentations and to some individuals the string "ju1iet" 1408 (spelled with DIGIT ONE, U+0031, as the third character) might appear 1409 to be the same as "juliet" (spelled with LATIN SMALL LETTER L, 1410 U+006C), especially on casual visual inspection. This phenomenon is 1411 sometimes called "typejacking". 1413 However, the problem is made more serious by introducing the full 1414 range of Unicode code points into protocol strings. For example, the 1415 characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the 1416 Cherokee block look similar to the ASCII characters "STPETER" as they 1417 might appear when presented using a "creative" font family. 1419 In some examples of confusable characters, it is unlikely that the 1420 average human could tell the difference between the real string and 1421 the fake string. (Indeed, there is no programmatic way to 1422 distinguish with full certainty which is the fake string and which is 1423 the real string; in some contexts, the string formed of Cherokee 1424 characters might be the real string and the string formed of ASCII 1425 characters might be the fake string.) Because PRECIS-compliant 1426 strings can contain almost any properly-encoded Unicode code point, 1427 it can be relatively easy to fake or mimic some strings in systems 1428 that use the PRECIS framework. The fact that some strings are easily 1429 confused introduces security vulnerabilities of the kind that have 1430 also plagued the World Wide Web, specifically the phenomenon known as 1431 phishing. 
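As an illustration of the examples above, the following minimal Python sketch lists the code points behind each string using the standard unicodedata module. It is not part of the PRECIS framework; the helper name describe() and the variable names are hypothetical. The sketch shows only that look-alike strings are distinct at the code point level; consistent with the text above, it cannot determine which string is the "real" one.

```python
import unicodedata

def describe(s):
    """Return a 'U+XXXX NAME' entry for each code point in s (illustrative helper)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]

fake = "ju1iet"   # third character is DIGIT ONE
real = "juliet"   # third character is LATIN SMALL LETTER L
# The Cherokee code points cited above as resembling "STPETER".
cherokee = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"

# Positions at which the two ASCII strings differ.
diff = [(a, b) for a, b in zip(describe(fake), describe(real)) if a != b]
print(diff)

# Every character name in the Cherokee string starts with "CHEROKEE".
for entry in describe(cherokee):
    print(entry)

# The strings are never equal when compared programmatically.
print(fake == real, cherokee == "STPETER")
```

A check of this kind might support the registration or presentation policies an application defines, but it is a diagnostic aid under the stated assumptions, not a solution to the confusables problem.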
1433 Despite the fact that some specific suggestions about identification 1434 and handling of confusable characters appear in the Unicode Security 1435 Considerations [UTR36] and the Unicode Security Mechanisms [UTS39], 1436 it is also true (as noted in [RFC5890]) that "there are no 1437 comprehensive technical solutions to the problems of confusable 1438 characters". Because it is impossible to map visually similar 1439 characters without a great deal of context (such as knowing the font 1440 families used), the PRECIS framework does nothing to map similar- 1441 looking characters together, nor does it prohibit some characters 1442 because they look like others. 1444 Nevertheless, specifications for application protocols that use this 1445 framework are strongly encouraged to describe how confusable 1446 characters can be abused to compromise the security of systems that 1447 use the protocol in question, along with any protocol-specific 1448 suggestions for overcoming those threats. In particular, software 1449 implementations and service deployments that use PRECIS-based 1450 technologies are strongly encouraged to define and implement 1451 consistent policies regarding the registration, storage, and 1452 presentation of visually similar characters. The following 1453 recommendations are appropriate: 1455 1. An application service SHOULD define a policy that specifies the 1456 scripts or blocks of characters that the service will allow to be 1457 registered (e.g., in an account name) or stored (e.g., in a file 1458 name). 
Such a policy SHOULD be informed by the languages and 1459 scripts that are used to write registered account names; in 1460 particular, to reduce confusion, the service SHOULD forbid 1461 registration or storage of strings that contain characters from 1462 more than one script and SHOULD restrict registrations to 1463 characters drawn from a very small number of scripts (e.g., 1464 scripts that are well-understood by the administrators of the 1465 service, to improve manageability). 1467 2. User-oriented application software SHOULD define a policy that 1468 specifies how internationalized strings will be presented to a 1469 human user. Because every human user of such software has a 1470 preferred language or a small set of preferred languages, the 1471 software SHOULD gather that information either explicitly from 1472 the user or implicitly via the operating system of the user's 1473 device. Furthermore, because most languages are typically 1474 represented by a single script or a small set of scripts, and 1475 because most scripts are typically contained in one or more 1476 blocks of characters, the software SHOULD warn the user when 1477 presenting a string that mixes characters from more than one 1478 script or block, or that uses characters outside the normal range 1479 of the user's preferred language(s). (Such a recommendation is 1480 not intended to discourage communication across different 1481 communities of language users; instead, it recognizes the 1482 existence of such communities and encourages due caution when 1483 presenting unfamiliar scripts or characters to human users.) 1485 The challenges inherent in supporting the full range of Unicode code 1486 points have in the past led some to hope for a way to 1487 programmatically negotiate more restrictive ranges based on locale, 1488 script, or other relevant factors, to tag the locale associated with 1489 a particular string, etc. 
As a general-purpose internationalization 1490 technology, the PRECIS framework does not include such mechanisms. 1492 12.6. Security of Passwords 1494 Two goals of passwords are to maximize the amount of entropy and to 1495 minimize the potential for false positives. These goals can be 1496 achieved in part by allowing a wide range of code points and by 1497 ensuring that passwords are handled in such a way that code points 1498 are not compared aggressively. Therefore, it is NOT RECOMMENDED for 1499 application protocols to profile the FreeformClass for use in 1500 passwords in a way that removes entire categories (e.g., by 1501 disallowing symbols or punctuation). Furthermore, it is NOT 1502 RECOMMENDED for application protocols to map uppercase and titlecase 1503 code points to their lowercase equivalents in such strings; instead, 1504 it is RECOMMENDED to preserve the case of all code points contained 1505 in such strings and to compare them in a case-sensitive manner. 1507 That said, software implementers need to be aware that there exist 1508 tradeoffs between entropy and usability. For example, allowing a 1509 user to establish a password containing "uncommon" code points might 1510 make it difficult for the user to access a service when using an 1511 unfamiliar or constrained input device. 1513 Some application protocols use passwords directly, whereas others 1514 reuse technologies that themselves process passwords (one example of 1515 such a technology is the Simple Authentication and Security Layer 1516 [RFC4422]). Moreover, passwords are often carried by a sequence of 1517 protocols with backend authentication systems or data storage systems 1518 such as RADIUS [RFC2865] and LDAP [RFC4510]. Developers of 1519 application protocols are encouraged to look into reusing these 1520 profiles instead of defining new ones, so that end-user expectations 1521 about passwords are consistent no matter which application protocol 1522 is used. 
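The recommendation earlier in this section to preserve case and compare passwords in a case-sensitive manner can be sketched as follows. This is a minimal illustration, not a PRECIS profile: the function name passwords_match() is hypothetical, the use of Unicode Normalization Form C is an assumption made for the example rather than a rule stated in this section, and real deployments would normally compare derived values (e.g., salted hashes) rather than plaintext.

```python
import hmac
import unicodedata

def passwords_match(stored, supplied):
    """Case-sensitive comparison of two passwords (illustrative sketch).

    Assumption: both strings are normalized to NFC before comparison so
    that canonically equivalent sequences compare equal. No case mapping
    is applied, so uppercase and lowercase remain distinct.
    """
    a = unicodedata.normalize("NFC", stored).encode("utf-8")
    b = unicodedata.normalize("NFC", supplied).encode("utf-8")
    # Constant-time comparison of the encoded byte strings.
    return hmac.compare_digest(a, b)

same_case = passwords_match("Correct Horse", "Correct Horse")
diff_case = passwords_match("Correct Horse", "correct horse")
# Precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
equivalent = passwords_match("caf\u00e9", "cafe\u0301")
print(same_case, diff_case, equivalent)
```

Because case is preserved, "Correct Horse" and "correct horse" remain different passwords, which keeps the entropy that case mapping would otherwise remove.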
1524 In protocols that provide passwords as input to a cryptographic 1525 algorithm such as a hash function, the client will need to perform 1526 proper preparation of the password before applying the algorithm, 1527 since the password is not available to the server in plaintext form. 1529 Further discussion of password handling can be found in 1530 [I-D.ietf-precis-saslprepbis]. 1532 13. Interoperability Considerations 1534 13.1. Encoding 1536 Although strings that are consumed in PRECIS-based application 1537 protocols are often encoded using UTF-8 [RFC3629], the exact encoding 1538 is a matter for the application protocol that uses PRECIS, not for 1539 the PRECIS framework. 1541 13.2. Character Sets 1543 It is known that some existing systems are unable to support the full 1544 Unicode character set, or even any characters outside the ASCII 1545 range. If two (or more) applications need to interoperate when 1546 exchanging data (e.g., for the purpose of authenticating a username 1547 or password), they will naturally need to have in common at least one 1548 coded character set (as defined by [RFC6365]). Establishing such a 1549 baseline is a matter for the application protocol that uses PRECIS, 1550 not for the PRECIS framework. 1552 13.3. Unicode Versions 1554 Changes to the properties of Unicode code points can occur as the 1555 Unicode Standard is modified from time to time. For example, three 1556 code points underwent changes in their GeneralCategory between 1557 Unicode 5.2 (current at the time IDNA2008 was originally published) 1558 and Unicode 6.0, as described in [RFC6452]. Implementers might need 1559 to be aware that the treatment of these characters differs depending 1560 on which version of Unicode is available on the system that is using 1561 IDNA2008 or PRECIS. Other such differences might arise between the 1562 version of Unicode current at the time of this writing (7.0) and 1563 future versions. 1565 13.4. 
Potential Changes to Handling of Certain Unicode Code Points 1567 As part of the review of Unicode 7.0 for IDNA, a question was raised 1568 about a newly-added code point that led to a re-analysis of the 1569 Normalization Rules used by IDNA and inherited by this document 1570 (Section 5.2.4). Some of the general issues are described in 1571 [IAB-Statement] and pursued in more detail in 1572 [I-D.klensin-idna-5892upd-unicode70]. 1574 At the time of writing, these issues have yet to be settled. 1575 However, implementers need to be aware that this specification is 1576 likely to be updated in the future to address these issues. The 1577 potential changes include: 1579 o The range of characters in the LetterDigits category 1580 (Section 4.2.1 and Section 9.1) might be narrowed. 1582 o Some characters with special properties that are now allowed might 1583 be excluded. 1585 o More "Additional Mapping Rules" (Section 5.2.2) might be defined. 1587 o Alternative normalization methods might be added. 1589 Nevertheless, implementations and deployments that are sensitive to 1590 the advice given in this specification are unlikely to run into 1591 significant problems as a consequence of these issues or potential 1592 changes - specifically the advice to use the more restrictive 1593 IdentifierClass whenever possible, or if using the FreeformClass to 1594 allow only a restricted set of characters, particularly avoiding 1595 characters whose implications they do not actually understand. 1597 14. References 1599 14.1. Normative References 1601 [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20, 1602 October 1969. 1604 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1605 Requirement Levels", BCP 14, RFC 2119, March 1997. 1607 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network 1608 Interchange", RFC 5198, March 2008. 1610 [Unicode7.0] 1611 The Unicode Consortium, "The Unicode Standard, Version 1612 7.0.0", 2014, 1613 . 1615 14.2. 
Informative References 1617 [IAB-Statement] 1618 Internet Architecture Board, "IAB Statement on Identifiers 1619 and Unicode 7.0.0", January 2015, . 1623 [I-D.ietf-precis-mappings] 1624 Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS 1625 classes", draft-ietf-precis-mappings-08 (work in 1626 progress), June 2014. 1628 [I-D.ietf-precis-nickname] 1629 Saint-Andre, P., "Preparation and Comparison of 1630 Nicknames", draft-ietf-precis-nickname-14 (work in 1631 progress), December 2014. 1633 [I-D.ietf-precis-saslprepbis] 1634 Saint-Andre, P. and A. Melnikov, "Username and Password 1635 Preparation Algorithms", draft-ietf-precis-saslprepbis-13 1636 (work in progress), December 2014. 1638 [I-D.ietf-xmpp-6122bis] 1639 Saint-Andre, P., "Extensible Messaging and Presence 1640 Protocol (XMPP): Address Format", draft-ietf-xmpp- 1641 6122bis-18 (work in progress), December 2014. 1643 [I-D.klensin-idna-5892upd-unicode70] 1644 Klensin, J. and P. Faeltstroem, "IDNA Update for Unicode 1645 7.0.0", draft-klensin-idna-5892upd-unicode70-03 (work in 1646 progress), January 2015. 1648 [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson, 1649 "Remote Authentication Dial In User Service (RADIUS)", RFC 1650 2865, June 2000. 1652 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of 1653 Internationalized Strings ("stringprep")", RFC 3454, 1654 December 2002. 1656 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 1657 "Internationalizing Domain Names in Applications (IDNA)", 1658 RFC 3490, March 2003. 1660 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 1661 Profile for Internationalized Domain Names (IDN)", RFC 1662 3491, March 2003. 1664 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 1665 10646", STD 63, RFC 3629, November 2003. 1667 [RFC4422] Melnikov, A. and K. Zeilenga, "Simple Authentication and 1668 Security Layer (SASL)", RFC 4422, June 2006. 
1670 [RFC4510] Zeilenga, K., "Lightweight Directory Access Protocol 1671 (LDAP): Technical Specification Road Map", RFC 4510, June 1672 2006. 1674 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and 1675 Recommendations for Internationalized Domain Names 1676 (IDNs)", RFC 4690, September 2006. 1678 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1679 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1680 May 2008. 1682 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1683 Specifications: ABNF", STD 68, RFC 5234, January 2008. 1685 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 1686 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1688 [RFC5890] Klensin, J., "Internationalized Domain Names for 1689 Applications (IDNA): Definitions and Document Framework", 1690 RFC 5890, August 2010. 1692 [RFC5891] Klensin, J., "Internationalized Domain Names in 1693 Applications (IDNA): Protocol", RFC 5891, August 2010. 1695 [RFC5892] Faltstrom, P., "The Unicode Code Points and 1696 Internationalized Domain Names for Applications (IDNA)", 1697 RFC 5892, August 2010. 1699 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for 1700 Internationalized Domain Names for Applications (IDNA)", 1701 RFC 5893, August 2010. 1703 [RFC5894] Klensin, J., "Internationalized Domain Names for 1704 Applications (IDNA): Background, Explanation, and 1705 Rationale", RFC 5894, August 2010. 1707 [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for 1708 Internationalized Domain Names in Applications (IDNA) 1709 2008", RFC 5895, September 2010. 1711 [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in 1712 Internationalization in the IETF", BCP 166, RFC 6365, 1713 September 2011. 1715 [RFC6452] Faltstrom, P. and P. Hoffman, "The Unicode Code Points and 1716 Internationalized Domain Names for Applications (IDNA) - 1717 Unicode 6.0", RFC 6452, November 2011. 1719 [RFC6885] Blanchet, M. and A. 
Sullivan, "Stringprep Revision and 1720 Problem Statement for the Preparation and Comparison of 1721 Internationalized Strings (PRECIS)", RFC 6885, March 2013. 1723 [RFC6943] Thaler, D., "Issues in Identifier Comparison for Security 1724 Purposes", RFC 6943, May 2013. 1726 [UAX9] The Unicode Consortium, "Unicode Standard Annex #9: 1727 Unicode Bidirectional Algorithm", September 2012, 1728 . 1730 [UAX11] The Unicode Consortium, "Unicode Standard Annex #11: East 1731 Asian Width", September 2012, 1732 . 1734 [UAX15] The Unicode Consortium, "Unicode Standard Annex #15: 1735 Unicode Normalization Forms", August 2012, 1736 . 1738 [UnicodeCurrent] 1739 The Unicode Consortium, "The Unicode Standard", 1740 2014-present, . 1742 [UTR36] The Unicode Consortium, "Unicode Technical Report #36: 1743 Unicode Security Considerations", July 2012, 1744 . 1746 [UTS39] The Unicode Consortium, "Unicode Technical Standard #39: 1747 Unicode Security Mechanisms", July 2012, 1748 . 1750 14.3. URIs 1752 [1] http://unicode.org/Public/UNIDATA/PropertyAliases.txt 1754 [2] http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt 1756 Appendix A. Acknowledgements 1758 The authors would like to acknowledge the comments and contributions 1759 of the following individuals during working group discussion: David 1760 Black, Edward Burns, Dan Chiba, Mark Davis, Alan DeKok, Martin 1761 Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Bjoern 1762 Hoehrmann, Paul Hoffman, Jeffrey Hutzelman, Simon Josefsson, John 1763 Klensin, Alexey Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker, 1764 Pete Resnick, Andrew Sullivan, Dave Thaler, Yoshiro Yoneya, and 1765 Florian Zeitz. 1767 Special thanks are due to John Klensin and Patrik Faltstrom for their 1768 challenging feedback and detailed reviews. 
1770 Charlie Kaufman, Tom Taylor, and Tim Wicinski reviewed the document 1771 on behalf of the Security Directorate, the General Area Review Team, 1772 and the Operations and Management Directorate, respectively. 1774 During IESG review, Alissa Cooper, Stephen Farrell, and Barry Leiba 1775 provided comments that led to further improvements. 1777 Some algorithms and textual descriptions have been borrowed from 1778 [RFC5892]. Some text regarding security has been borrowed from 1779 [RFC5890], [I-D.ietf-precis-saslprepbis], and 1780 [I-D.ietf-xmpp-6122bis]. 1782 Peter Saint-Andre wishes to acknowledge Cisco Systems, Inc., for 1783 employing him during his work on earlier versions of this document. 1785 Authors' Addresses 1787 Peter Saint-Andre 1788 &yet 1790 Email: peter@andyet.com 1791 URI: https://andyet.com/ 1793 Marc Blanchet 1794 Viagenie 1795 246 Aberdeen 1796 Quebec, QC G1R 2E1 1797 Canada 1799 Email: Marc.Blanchet@viagenie.ca 1800 URI: http://www.viagenie.ca/