PRECIS                                                    P. Saint-Andre
Internet-Draft                                                      &yet
Obsoletes: 3454 (if approved)                                M. Blanchet
Intended status: Standards Track                                Viagenie
Expires: June 13, 2015                                 December 10, 2014

      PRECIS Framework: Preparation, Enforcement, and Comparison of
           Internationalized Strings in Application Protocols
                     draft-ietf-precis-framework-21

Abstract

   Application protocols using Unicode characters in protocol strings
   need to properly handle such strings in order to enforce
   internationalization rules for strings placed in various protocol
   slots (such as addresses and identifiers) and to perform valid
   comparison operations (e.g., for purposes of authentication or
   authorization). This document defines a framework enabling
   application protocols to perform the preparation, enforcement, and
   comparison of internationalized strings ("PRECIS") in a way that
   depends on the properties of Unicode characters and thus is agile
   with respect to versions of Unicode. As a result, this framework
   provides a more sustainable approach to the handling of
   internationalized strings than the previous framework, known as
   Stringprep (RFC 3454). This document obsoletes RFC 3454.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 13, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Preparation, Enforcement, and Comparison
   4.  String Classes
     4.1.  Overview
     4.2.  IdentifierClass
       4.2.1.  Valid
       4.2.2.  Contextual Rule Required
       4.2.3.  Disallowed
       4.2.4.  Unassigned
       4.2.5.  Examples
     4.3.  FreeformClass
       4.3.1.  Valid
       4.3.2.  Contextual Rule Required
       4.3.3.  Disallowed
       4.3.4.  Unassigned
       4.3.5.  Examples
   5.  Profiles
     5.1.  Profiles Must Not Be Multiplied Beyond Necessity
     5.2.  Rules
       5.2.1.  Width Mapping Rule
       5.2.2.  Additional Mapping Rule
       5.2.3.  Case Mapping Rule
       5.2.4.  Normalization Rule
       5.2.5.  Directionality Rule
     5.3.  A Note about Spaces
   6.  Applications
     6.1.  How to Use PRECIS in Applications
     6.2.  Further Excluded Characters
     6.3.  Building Application-Layer Constructs
   7.  Order of Operations
   8.  Code Point Properties
   9.  Category Definitions Used to Calculate Derived Property
     9.1.  LetterDigits (A)
     9.2.  Unstable (B)
     9.3.  IgnorableProperties (C)
     9.4.  IgnorableBlocks (D)
     9.5.  LDH (E)
     9.6.  Exceptions (F)
     9.7.  BackwardCompatible (G)
     9.8.  JoinControl (H)
     9.9.  OldHangulJamo (I)
     9.10. Unassigned (J)
     9.11. ASCII7 (K)
     9.12. Controls (L)
     9.13. PrecisIgnorableProperties (M)
     9.14. Spaces (N)
     9.15. Symbols (O)
     9.16. Punctuation (P)
     9.17. HasCompat (Q)
     9.18. OtherLetterDigits (R)
   10. Guidelines for Designated Experts
   11. IANA Considerations
     11.1.  PRECIS Derived Property Value Registry
     11.2.  PRECIS Base Classes Registry
     11.3.  PRECIS Profiles Registry
   12. Security Considerations
     12.1.  General Issues
     12.2.  Use of the IdentifierClass
     12.3.  Use of the FreeformClass
     12.4.  Local Character Set Issues
     12.5.  Visually Similar Characters
     12.6.  Security of Passwords
   13. Interoperability Considerations
   14. References
     14.1.  Normative References
     14.2.  Informative References
     14.3.  URIs
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Introduction

   Application protocols using Unicode characters [Unicode7.0] in
   protocol strings need to properly handle such strings in order to
   enforce internationalization rules for strings placed in various
   protocol slots (such as addresses and identifiers) and to perform
   valid comparison operations (e.g., for purposes of authentication or
   authorization).
   This document defines a framework enabling application protocols to
   perform the preparation, enforcement, and comparison of
   internationalized strings ("PRECIS") in a way that depends on the
   properties of Unicode characters and thus is agile with respect to
   versions of Unicode.

   As described in the PRECIS problem statement [RFC6885], many IETF
   protocols have used the Stringprep framework [RFC3454] as the basis
   for preparing, enforcing, and comparing protocol strings that contain
   Unicode characters, especially characters outside the ASCII range
   [RFC20]. The Stringprep framework was developed during work on the
   original technology for internationalized domain names (IDNs), here
   called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the
   Stringprep profile for IDNs. At the time, Stringprep was designed as
   a general framework so that other application protocols could define
   their own Stringprep profiles. Indeed, a number of application
   protocols defined such profiles.

   After the publication of [RFC3454] in 2002, several significant
   issues arose with the use of Stringprep in the IDN case, as
   documented in the IAB's recommendations regarding IDNs [RFC4690]
   (most significantly, Stringprep was tied to Unicode version 3.2).
   Therefore, the newer IDNA specifications, here called "IDNA2008"
   ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
   use Stringprep and Nameprep. This migration away from Stringprep for
   IDNs prompted other "customers" of Stringprep to consider new
   approaches to the preparation, enforcement, and comparison of
   internationalized strings, as described in [RFC6885].

   This document defines a framework for a post-Stringprep approach to
   the preparation, enforcement, and comparison of internationalized
   strings in application protocols, based on several principles:

   1.  Define a small set of string classes that specify the Unicode
       characters (i.e., specific "code points") appropriate for common
       application protocol constructs.

   2.  Define each PRECIS string class in terms of Unicode code points
       and their properties so that an algorithm can be used to
       determine whether each code point or character category is (a)
       valid, (b) allowed in certain contexts, (c) disallowed, or (d)
       unassigned.

   3.  Use an "inclusion model" such that a string class consists only
       of code points that are explicitly allowed, with the result that
       any code point not explicitly allowed is forbidden.

   4.  Enable application protocols to define profiles of the PRECIS
       string classes if necessary (addressing matters such as width
       mapping, case mapping, Unicode normalization, and directionality)
       but strongly discourage the multiplication of profiles beyond
       necessity in order to avoid violations of the Principle of Least
       User Astonishment.

   It is expected that this framework will yield the following benefits:

   o  Application protocols will be agile with regard to Unicode
      versions.

   o  Implementers will be able to share code point tables and software
      code across application protocols, most likely by means of
      software libraries.

   o  End users will be able to acquire more accurate expectations about
      the characters that are acceptable in various contexts. Given
      this more uniform set of string classes, it is also expected that
      copy/paste operations between software implementing different
      application protocols will be more predictable and coherent.
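
   Principles 2 and 3 above (property-based classification and the
   inclusion model) can be illustrated with a minimal Python sketch.
   It is not the normative calculation, which is specified in Section 8
   and Section 9. The function name classify is hypothetical, the
   LetterDigits categories are taken from [RFC5892], the HasCompat test
   assumes the definition toNFKC(cp) != cp, and contextual rules and
   most other categories are deliberately left out.

```python
import unicodedata

# General categories forming LetterDigits in RFC 5892 (assumption: carried
# over unchanged; Section 9.1 is the authoritative definition).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def classify(ch: str) -> str:
    """Return a rough IdentifierClass-style verdict for one character.

    Simplified on purpose: Exceptions, JoinControl, OldHangulJamo and the
    other categories of Section 9 are not modeled in this sketch.
    """
    cp = ord(ch)
    if unicodedata.category(ch) == "Cn":
        return "unassigned"  # not designated in this Unicode version
    if 0x21 <= cp <= 0x7E:
        return "valid"  # printable ASCII is grandfathered in
    if unicodedata.normalize("NFKC", ch) != ch:
        return "disallowed"  # has a compatibility equivalent (assumed test)
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "valid"
    # Inclusion model: anything not explicitly allowed is forbidden.
    return "disallowed"


if __name__ == "__main__":
    for sample in ["a", "\u00e9", " ", "\uff10", "\u0378"]:
        print(f"U+{ord(sample):04X}", classify(sample))
```

   Because the verdict depends only on properties looked up at run
   time, the same code yields updated results when the underlying
   Unicode database is upgraded, which is the agility the framework
   aims for.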

   Whereas the string classes define the "baseline" code points for a
   range of applications, profiling enables application protocols to
   apply the string classes in ways that are appropriate for common
   constructs such as usernames [I-D.ietf-precis-saslprepbis], opaque
   strings such as passwords [I-D.ietf-precis-saslprepbis], and
   nicknames [I-D.ietf-precis-nickname]. Profiles are responsible for
   defining the handling of right-to-left characters as well as various
   mapping operations of the kind also discussed for IDNs in [RFC5895],
   such as case preservation or lowercasing, Unicode normalization,
   mapping of certain characters to other characters or to nothing, and
   mapping of full-width and half-width characters.

   When an application applies a profile of a PRECIS string class, it
   transforms an input string (which might or might not be conforming)
   into an output string that definitively conforms to the profile. In
   particular, this document focuses on the resulting ability to achieve
   the following objectives:

   a.  Enforcing all the rules of a profile for a single output string
       (e.g., to determine if a string can be included in a protocol
       slot, communicated to another entity within a protocol, stored in
       a retrieval system, etc.).

   b.  Comparing two output strings to determine if they are equivalent,
       typically through octet-for-octet matching to test for
       "bit-string identity" (e.g., to make an access decision for
       purposes of authentication or authorization as further described
       in [RFC6943]).

   The opportunity to define profiles naturally introduces the
   possibility of a proliferation of profiles, thus potentially
   diminishing the benefits of common code and violating user
   expectations. See Section 5 for a discussion of this important
   topic.
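
   Objectives (a) and (b) above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not a
   conforming implementation: the profile shown is hypothetical (width
   mapping, then case folding, then NFC), the function names are
   invented for this example, and the character check only rejects
   control and unassigned code points rather than applying a full
   PRECIS string class.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth and halfwidth characters to their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0030"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def enforce(s: str) -> str:
    """Objective (a): return the output string or raise ValueError.

    Hypothetical profile: width mapping, Unicode case folding, NFC.
    """
    if not s:
        raise ValueError("empty string")
    for ch in s:
        # Stand-in for the real class check: reject controls and unassigned.
        if unicodedata.category(ch) in ("Cc", "Cn"):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return unicodedata.normalize("NFC", width_map(s).casefold())


def equivalent(a: str, b: str) -> bool:
    """Objective (b): octet-for-octet comparison of two output strings."""
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")


if __name__ == "__main__":
    print(enforce("StPeter"))
    print(equivalent("\uff53\uff54peter", "stpeter"))
```

   Comparison is deliberately defined on the enforced output rather
   than on the raw input, so two inputs that differ only in width or
   case yield identical octets under this hypothetical profile.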

   In addition, it is extremely important for protocol designers and
   application developers to understand that the transformation of an
   input string to an output string is rarely reversible. As one
   relatively simple example, case mapping would transform an input
   string of "StPeter" to "stpeter", and information about the
   capitalization of the first and third characters would be lost.
   Similar considerations apply to other forms of mapping and
   normalization.

   Although this framework is similar to IDNA2008 and includes by
   reference some of the character categories defined in [RFC5892], it
   defines additional character categories to meet the needs of common
   application protocols other than DNS.

   The character categories and calculation rules defined under
   Section 8 and Section 9 are normative and apply to all Unicode code
   points. The code point table that results from applying the
   character categories and calculation rules to the latest version of
   Unicode can be found in an IANA registry.

2.  Terminology

   Many important terms used in this document are defined in [RFC5890],
   [RFC6365], [RFC6885], and [Unicode7.0]. The terms "left-to-right"
   (LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex
   #9 [UAX9].

   As of the date of writing, the version of Unicode published by the
   Unicode Consortium is 7.0 [Unicode7.0]; however, PRECIS is not tied
   to a specific version of Unicode. The latest version of Unicode is
   always available [UnicodeCurrent].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Preparation, Enforcement, and Comparison

   This document distinguishes between three different actions that an
   entity can take with regard to a string:

   o  Enforcement entails applying all of the rules specified for a
      particular string class or profile thereof to an individual
      string, for the purpose of determining if the string can be used
      in a given protocol slot.

   o  Comparison entails applying all of the rules specified for a
      particular string class or profile thereof to two separate
      strings, for the purpose of determining if the two strings are
      equivalent.

   o  Preparation entails only ensuring that the characters in an
      individual string are allowed by the underlying PRECIS string
      class.

   In most cases, authoritative entities such as servers are responsible
   for enforcement, whereas subsidiary entities such as clients are
   responsible only for preparation. The rationale for this distinction
   is that clients might not have the facilities (in terms of device
   memory and processing power) to enforce all the rules regarding
   internationalized strings (such as width mapping and Unicode
   normalization), although they can more easily limit the repertoire of
   characters they offer to an end user. By contrast, it is assumed
   that a server would have more capacity to enforce the rules, and in
   any case acts as an authority regarding allowable strings in protocol
   slots such as addresses and endpoint identifiers. In addition, a
   client cannot necessarily be trusted to properly generate such
   strings, especially for security-sensitive contexts such as
   authentication and authorization.

4.  String Classes

4.1.  Overview

   Starting in 2010, various "customers" of Stringprep began to discuss
   the need to define a post-Stringprep approach to the preparation and
   comparison of internationalized strings other than IDNs.
   This community analyzed the existing Stringprep profiles and also
   weighed the costs and benefits of defining a relatively small set of
   Unicode characters that would minimize the potential for user
   confusion caused by visually similar characters (and thus be
   relatively "safe") vs. defining a much larger set of Unicode
   characters that would maximize the potential for user creativity (and
   thus be relatively "expressive"). As a result, the community
   concluded that most existing uses could be addressed by two string
   classes:

   IdentifierClass:  a sequence of letters, numbers, and some symbols
      that is used to identify or address a network entity such as a
      user account, a venue (e.g., a chatroom), an information source
      (e.g., a data feed), or a collection of data (e.g., a file); the
      intent is that this class will minimize user confusion in a wide
      variety of application protocols, with the result that safety has
      been prioritized over expressiveness for this class.

   FreeformClass:  a sequence of letters, numbers, symbols, spaces, and
      other characters that is used for free-form strings, including
      passwords as well as display elements such as human-friendly
      nicknames for devices or for participants in a chatroom; the
      intent is that this class will allow nearly any Unicode character,
      with the result that expressiveness has been prioritized over
      safety for this class. Note well that protocol designers,
      application developers, service providers, and end users might not
      understand or be able to enter all of the characters that can be
      included in the FreeformClass - see Section 12.3 for details.

   Future specifications might define additional PRECIS string classes,
   such as a class that falls somewhere between the IdentifierClass and
   the FreeformClass. At this time, it is not clear how useful such a
   class would be.
   In any case, because application developers are able to define
   profiles of PRECIS string classes, a protocol needing a construct
   between the IdentifierClass and the FreeformClass could define a
   restricted profile of the FreeformClass if needed.

   The following subsections discuss the IdentifierClass and
   FreeformClass in more detail, with reference to the dimensions
   described in Section 3 of [RFC6885]. Each string class is defined by
   the following behavioral rules:

   Valid:  Defines which code points are treated as valid for the
      string.

   Contextual Rule Required:  Defines which code points are treated as
      allowed only if the requirements of a contextual rule are met
      (i.e., either CONTEXTJ or CONTEXTO).

   Disallowed:  Defines which code points need to be excluded from the
      string.

   Unassigned:  Defines application behavior in the presence of code
      points that are unknown (i.e., not yet designated) for the version
      of Unicode used by the application.

   This document defines the valid, contextual rule required,
   disallowed, and unassigned rules for the IdentifierClass and
   FreeformClass. As described under Section 5, profiles of these
   string classes are responsible for defining the width mapping,
   additional mappings, case mapping, normalization, and directionality
   rules.

4.2.  IdentifierClass

   Most application technologies need strings that can be used to refer
   to, include, or communicate protocol strings like usernames, file
   names, data feed identifiers, and chatroom names. We group such
   strings into a class called "IdentifierClass" having the following
   features.

4.2.1.  Valid

   o  Code points traditionally used as letters and numbers in writing
      systems, i.e., the LetterDigits ("A") category first defined in
      [RFC5892] and listed here under Section 9.1.

   o  Code points in the range U+0021 through U+007E, i.e., the
      (printable) ASCII7 ("K") rule defined under Section 9.11. These
      code points are "grandfathered" into PRECIS and thus are valid
      even if they would otherwise be disallowed according to the
      property-based rules specified in the next section.

   Note: Although the PRECIS IdentifierClass re-uses the LetterDigits
   category from IDNA2008, the range of characters allowed in the
   IdentifierClass is wider than the range of characters allowed in
   IDNA2008. The main reason is that IDNA2008 applies the Unstable
   category before the LetterDigits category, thus disallowing
   uppercase characters, whereas the IdentifierClass does not apply
   the Unstable category.

4.2.2.  Contextual Rule Required

   o  A number of characters from the Exceptions ("F") category defined
      under Section 9.6 (see Section 9.6 for a full list).

   o  Joining characters, i.e., the JoinControl ("H") category defined
      under Section 9.8.

4.2.3.  Disallowed

   o  Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
      defined under Section 9.9.

   o  Control characters, i.e., the Controls ("L") category defined
      under Section 9.12.

   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
      category defined under Section 9.13.

   o  Space characters, i.e., the Spaces ("N") category defined under
      Section 9.14.

   o  Symbol characters, i.e., the Symbols ("O") category defined under
      Section 9.15.

   o  Punctuation characters, i.e., the Punctuation ("P") category
      defined under Section 9.16.

   o  Any character that has a compatibility equivalent, i.e., the
      HasCompat ("Q") category defined under Section 9.17. These code
      points are disallowed even if they would otherwise be valid
      according to the property-based rules specified in the previous
      section.

   o  Letters and digits other than the "traditional" letters and digits
      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
      defined under Section 9.18.

4.2.4.  Unassigned

   Any code points that are not yet designated in the Unicode character
   set are considered Unassigned for purposes of the IdentifierClass,
   and such code points are to be treated as Disallowed. See
   Section 9.10.

4.2.5.  Examples

   As described in the Introduction to this document, the string classes
   do not handle all issues related to string preparation and comparison
   (such as case mapping); instead, such issues are handled at the level
   of profiles. Examples for two profiles of the IdentifierClass can be
   found in [I-D.ietf-precis-saslprepbis] (the UsernameIdentifierClass
   profile) and in [I-D.ietf-xmpp-6122bis] (the LocalpartIdentifierClass
   profile).

4.3.  FreeformClass

   Some application technologies need strings that can be used in a
   free-form way, e.g., as a password in an authentication exchange (see
   [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see
   [I-D.ietf-precis-nickname]). We group such things into a class
   called "FreeformClass" having the following features.

      Security Warning: As mentioned, the FreeformClass prioritizes
      expressiveness over safety; Section 12.3 describes some of the
      security hazards involved with using or profiling the
      FreeformClass.

      Security Warning: Consult Section 12.6 for relevant security
      considerations when strings conforming to the FreeformClass, or a
      profile thereof, are used as passwords.

4.3.1.  Valid

   o  Traditional letters and numbers, i.e., the LetterDigits ("A")
      category first defined in [RFC5892] and listed here under
      Section 9.1.

   o  Letters and digits other than the "traditional" letters and digits
      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
      defined under Section 9.18.

   o  Code points in the range U+0021 through U+007E, i.e., the
      (printable) ASCII7 ("K") rule defined under Section 9.11.

   o  Any character that has a compatibility equivalent, i.e., the
      HasCompat ("Q") category defined under Section 9.17.

   o  Space characters, i.e., the Spaces ("N") category defined under
      Section 9.14.

   o  Symbol characters, i.e., the Symbols ("O") category defined under
      Section 9.15.

   o  Punctuation characters, i.e., the Punctuation ("P") category
      defined under Section 9.16.

4.3.2.  Contextual Rule Required

   o  A number of characters from the Exceptions ("F") category defined
      under Section 9.6 (see Section 9.6 for a full list).

   o  Joining characters, i.e., the JoinControl ("H") category defined
      under Section 9.8.

4.3.3.  Disallowed

   o  Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
      defined under Section 9.9.

   o  Control characters, i.e., the Controls ("L") category defined
      under Section 9.12.

   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
      category defined under Section 9.13.

4.3.4.  Unassigned

   Any code points that are not yet designated in the Unicode character
   set are considered Unassigned for purposes of the FreeformClass, and
   such code points are to be treated as Disallowed.

4.3.5.  Examples

   As described in the Introduction to this document, the string classes
   do not handle all issues related to string preparation and comparison
   (such as case mapping); instead, such issues are handled at the level
   of profiles. Examples for two profiles of the FreeformClass can be
   found in [I-D.ietf-precis-nickname] (the NicknameFreeformClass
   profile) and in [I-D.ietf-xmpp-6122bis] (the
   ResourcepartIdentifierClass profile).

5.  Profiles

   This framework document defines the valid, contextual-rule-required,
   disallowed, and unassigned rules for the IdentifierClass and the
   FreeformClass.
   A profile of a PRECIS string class MUST define the width mapping,
   additional mappings (if any), case mapping, normalization, and
   directionality rules. A profile MAY also restrict the allowable
   characters above and beyond the definition of the relevant PRECIS
   string class (but MUST NOT add as valid any code points that are
   disallowed by the relevant PRECIS string class). These matters are
   discussed in the following subsections.

   Profiles of the PRECIS string classes are registered with the IANA as
   described under Section 11.3. Profile names use the following
   convention: they are of the form "Profilename of BaseClass", where
   the "Profilename" string is a differentiator and "BaseClass" is the
   name of the PRECIS string class being profiled; for example, the
   profile of the FreeformClass used for opaque strings such as
   passwords is the "OpaqueString" profile
   [I-D.ietf-precis-saslprepbis].

5.1.  Profiles Must Not Be Multiplied Beyond Necessity

   The risk of profile proliferation is significant because having too
   many profiles will result in different behavior across various
   applications, thus violating what is known in user interface design
   as the Principle of Least Astonishment.

   Indeed, we already have too many profiles. Ideally we would have at
   most two or three profiles. Unfortunately, numerous application
   protocols exist with their own quirks regarding protocol strings.
   Domain names, email addresses, instant messaging addresses, chatroom
   nicknames, filenames, authentication identifiers, passwords, and
   other strings are already out there in the wild and need to be
   supported in existing application protocols such as DNS, SMTP, XMPP,
   IRC, NFS, iSCSI, EAP, and SASL among others.

   Nevertheless, profiles must not be multiplied beyond necessity.
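
   As an aid to implementers, the five rules that every profile MUST
   define (listed at the start of Section 5) can be modeled as a small
   record type. The Python sketch below is only one possible shape: the
   type and field names are hypothetical, the "Example" profile is
   invented, and the default values merely echo choices this document
   recommends or are placeholders, so they do not describe any
   registered profile.

```python
from dataclasses import dataclass
from enum import Enum


class BaseClass(Enum):
    IDENTIFIER = "IdentifierClass"
    FREEFORM = "FreeformClass"


# The four Unicode normalization forms a profile can choose from.
NORMALIZATION_FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


@dataclass(frozen=True)
class Profile:
    """Hypothetical container for the rules a profile has to specify."""

    name: str
    base_class: BaseClass
    width_mapping: bool = True           # illustrative default
    additional_mapping: str = "none"     # free-text description (assumption)
    case_mapping: str = "preserve"       # or "fold" (assumption)
    normalization: str = "NFC"           # NFC is the recommended form
    directionality: str = "bidi-rule"    # label for the RFC 5893 Bidi Rule

    def __post_init__(self) -> None:
        if self.normalization not in NORMALIZATION_FORMS:
            raise ValueError(f"unknown normalization form: {self.normalization}")
        if self.case_mapping not in ("preserve", "fold"):
            raise ValueError(f"unknown case mapping: {self.case_mapping}")

    @property
    def full_name(self) -> str:
        """Name in the 'Profilename of BaseClass' convention."""
        return f"{self.name} of {self.base_class.value}"


if __name__ == "__main__":
    example = Profile(name="Example", base_class=BaseClass.IDENTIFIER)
    print(example.full_name)
```

   Keeping every rule as an explicit field makes it harder to register
   a profile that silently leaves one of the required decisions
   unspecified.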

   To help prevent profile proliferation, this document recommends
   sensible defaults for the various options offered to profile creators
   (such as width mapping and Unicode normalization). In addition, the
   guidelines for designated experts provided under Section 10 are meant
   to encourage a high level of due diligence regarding new profiles.

5.2. Rules

5.2.1. Width Mapping Rule

   The width mapping rule of a profile specifies whether width mapping
   is performed on the characters of a string, and how the mapping is
   done. Typically such mapping consists of mapping fullwidth and
   halfwidth characters, i.e., code points with a Decomposition Type of
   Wide or Narrow, to their decomposition mappings; as an example,
   FULLWIDTH DIGIT ZERO (U+FF10) would be mapped to DIGIT ZERO (U+0030).

   The normalization form specified by a profile (see below) has an
   impact on the need for width mapping. Because width mapping is
   performed as a part of compatibility decomposition, a profile
   employing either normalization form KD (NFKD) or normalization form
   KC (NFKC) does not need to specify width mapping. However, if
   Unicode normalization form C (NFC) is used (as is recommended) then
   the profile needs to specify whether to apply width mapping; in this
   case, width mapping is in general RECOMMENDED because allowing
   fullwidth and halfwidth characters to remain unmapped to their
   compatibility variants would violate the Principle of Least
   Astonishment. For more information about the concept of width in
   East Asian scripts within Unicode, see Unicode Standard Annex #11
   [UAX11].

5.2.2. Additional Mapping Rule

   The additional mapping rule of a profile specifies whether additional
   mappings are performed on the characters of a string, such as:

      Mapping of delimiter characters (such as '@', ':', '/', '+', and
      '-')

      Mapping of special characters (e.g., non-ASCII space characters to
      ASCII space or control characters to nothing).

   The PRECIS mappings document [I-D.ietf-precis-mappings] describes
   such mappings in more detail.

5.2.3. Case Mapping Rule

   The case mapping rule of a profile specifies whether case mapping
   (instead of case preservation) is performed on the characters of a
   string, and how the mapping is applied (e.g., mapping uppercase and
   titlecase characters to their lowercase equivalents).

   If case mapping is desired (instead of case preservation), it is
   RECOMMENDED to use Unicode Default Case Folding as defined in Chapter
   3 of the Unicode Standard [Unicode7.0].

      Note: Unicode Default Case Folding is not designed to handle
      various localization issues (such as so-called "dotless i" in
      several Turkic languages). The PRECIS mappings document
      [I-D.ietf-precis-mappings] describes these issues in greater
      detail and defines a "local case mapping" method that handles some
      locale-dependent and context-dependent mappings.

   In order to maximize entropy and minimize the potential for false
   positives, it is NOT RECOMMENDED for application protocols to map
   uppercase and titlecase code points to their lowercase equivalents
   when strings conforming to the FreeformClass, or a profile thereof,
   are used in passwords; instead, it is RECOMMENDED to preserve the
   case of all code points contained in such strings and then perform
   case-sensitive comparison. See also the related discussion in
   [I-D.ietf-precis-saslprepbis].

5.2.4. Normalization Rule

   The normalization rule of a profile specifies which Unicode
   normalization form (D, KD, C, or KC) is to be applied (see Unicode
   Standard Annex #15 [UAX15] for background information).

   In accordance with [RFC5198], normalization form C (NFC) is
   RECOMMENDED.

5.2.5. Directionality Rule

   The directionality rule of a profile specifies how to treat strings
   that contain right-to-left (RTL) characters (see Unicode Standard
   Annex #9 [UAX9]). In general this document recommends applying the
   "Bidi Rule" from [RFC5893] to strings that contain RTL characters.

   Mixed-direction strings (that is, strings containing some portions
   that are left-to-right and other portions that are right-to-left) are
   not directly supported by the PRECIS framework itself, since there is
   currently no widely accepted and implemented solution for the safe
   display of mixed-direction strings. An application protocol that
   uses the PRECIS framework (or an extension to the framework) could
   define better ways to present mixed-direction strings; however, that
   work is outside the scope of this framework and would likely require
   a great deal of careful research into the problems of displaying
   bidirectional text.

5.3. A Note about Spaces

   With regard to the IdentifierClass, the consensus of the PRECIS
   Working Group was that spaces are problematic for many reasons,
   including:

   o  Many Unicode characters are confusable with ASCII space.

   o  Even if non-ASCII space characters are mapped to ASCII space
      (U+0020), space characters are often not rendered in user
      interfaces, leading to the possibility that a human user might
      consider a string containing spaces to be equivalent to the same
      string without spaces.

   o  In some locales, some devices are known to generate a character
      other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
      user performs an action like hitting the space bar on a keyboard.

   One consequence of disallowing space characters in the
   IdentifierClass might be to effectively discourage their use within
   identifiers created in newer application protocols; given the
   challenges involved with properly handling space characters
   (especially non-ASCII space characters) in identifiers and other
   protocol strings, the PRECIS Working Group considered this to be a
   feature, not a bug.

   However, the FreeformClass does allow spaces, which enables
   application protocols to define profiles of the FreeformClass that
   are more flexible than any profiles of the IdentifierClass. In
   addition, as explained in Section 6.3, application protocols
   can also define application-layer constructs containing spaces.

6. Applications

6.1. How to Use PRECIS in Applications

   Although PRECIS has been designed with applications in mind,
   internationalization is not suddenly made easy through the use of
   PRECIS. Application developers still need to give some thought to
   how they will use the PRECIS string classes, or profiles thereof, in
   their applications. This section provides some guidelines to
   application developers (and to expert reviewers of application
   protocol specifications).

   o  Don't define your own profile unless absolutely necessary (see
      Section 5.1). Existing profiles have been designed for wide re-use.
      It is highly likely that an existing profile will meet your needs,
      especially given the ability to specify further excluded
      characters (Section 6.2) and to build application-layer constructs
      (see Section 6.3).

   o  Do specify:

      *  Exactly which entities are responsible for preparation,
         enforcement, and comparison of internationalized strings (e.g.,
         servers or clients).

      *  Exactly when those entities need to complete their tasks (e.g.,
         a server might need to enforce the rules of a profile before
         allowing a client to gain network access).

      *  Exactly which protocol slots need to be checked against which
         profiles (e.g., checking the address of a message's intended
         recipient against the UsernameCaseMapped profile
         [I-D.ietf-precis-saslprepbis] of the IdentifierClass, or
         checking the password of a user against the OpaqueString
         profile [I-D.ietf-precis-saslprepbis] of the FreeformClass).

   See [I-D.ietf-precis-saslprepbis] and [I-D.ietf-xmpp-6122bis] for
   definitions of these matters for several applications.

6.2. Further Excluded Characters

   An application protocol that uses a profile MAY specify particular
   code points that are not allowed in relevant slots within that
   application protocol, above and beyond those excluded by the string
   class or profile.

   That is, an application protocol MAY do either of the following:

   1.  Exclude specific code points that are allowed by the relevant
       string class.

   2.  Exclude characters matching certain Unicode properties (e.g.,
       math symbols) that are included in the relevant PRECIS string
       class.

   As a result of such exclusions, code points that are defined as valid
   for the PRECIS string class or profile will be defined as disallowed
   for the relevant protocol slot.

   Typically, such exclusions are defined for the purpose of
   backward-compatibility with legacy formats within an application
   protocol. These are defined for application protocols, not profiles,
   in order to prevent multiplication of profiles beyond necessity (see
   Section 5.1).

6.3. Building Application-Layer Constructs

   Sometimes, an application-layer construct does not map in a
   straightforward manner to one of the base string classes or a profile
   thereof. Consider, for example, the "simple user name" construct in
   the Simple Authentication and Security Layer (SASL) [RFC4422].
   Depending on the deployment, a simple user name might take the form
   of a user's full name (e.g., the user's personal name followed by a
   space and then the user's family name). Such a simple user name
   cannot be defined as an instance of the IdentifierClass or a profile
   thereof, since space characters are not allowed in the
   IdentifierClass; however, it could be defined using a space-separated
   sequence of IdentifierClass instances, as in the following ABNF
   [RFC5234] from [I-D.ietf-precis-saslprepbis]:

      username   = userpart *(1*SP userpart)
      userpart   = 1*(idbyte)
                   ;
                   ; an "idbyte" is a byte used to represent a
                   ; UTF-8 encoded Unicode code point that can be
                   ; contained in a string that conforms to the
                   ; PRECIS "IdentifierClass"
                   ;

   Similar techniques could be used to define many application-layer
   constructs, say of the form "user@domain" or "/path/to/file".

7. Order of Operations

   To ensure proper comparison, the rules specified for a particular
   string class or profile MUST be applied in the following order:

   1.  Width Mapping Rule

   2.  Additional Mapping Rule

   3.  Case Mapping Rule

   4.  Normalization Rule

   5.  Directionality Rule

   6.  Behavioral rules for determining whether a code point is valid,
       allowed under a contextual rule, disallowed, or unassigned

   As already described, the width mapping, additional mapping, case
   mapping, normalization, and directionality rules are specified for
   each profile, whereas the behavioral rules are specified for each
   string class.
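
   As an informal illustration of the first four steps, the following
   Python sketch applies width mapping, one additional mapping, case
   mapping, and normalization in the order listed above. It is a
   sketch under stated assumptions, not a conformant PRECIS
   implementation: the function names (width_map,
   map_non_ascii_spaces, apply_profile_rules) are hypothetical, the
   only additional mapping shown is non-ASCII space to ASCII space,
   Python's str.casefold() stands in for Unicode Default Case Folding,
   and the directionality rule and the behavioral rules (steps 5 and
   6) are omitted entirely.

```python
import unicodedata


def width_map(s: str) -> str:
    """Step 1: map code points whose Decomposition Type is wide or
    narrow to their decomposition mappings; leave others unchanged."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0030"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def map_non_ascii_spaces(s: str) -> str:
    """Step 2 (one example of an additional mapping, chosen for
    illustration): map every General_Category Zs code point to U+0020."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)


def apply_profile_rules(s: str, *, fold_case: bool = True, form: str = "NFC") -> str:
    """Apply steps 1 to 4 in the required order.

    Steps 5 (directionality) and 6 (behavioral rules) are NOT
    implemented in this sketch.
    """
    s = width_map(s)                 # 1. Width Mapping Rule
    s = map_non_ascii_spaces(s)      # 2. Additional Mapping Rule
    if fold_case:
        s = s.casefold()             # 3. Case Mapping Rule (assumed stand-in)
    return unicodedata.normalize(form, s)  # 4. Normalization Rule


if __name__ == "__main__":
    # FULLWIDTH DIGIT ZERO (U+FF10) is mapped to DIGIT ZERO (U+0030).
    print(repr(apply_profile_rules("\uff10")))
```

   A profile used for passwords would call the hypothetical
   apply_profile_rules() with fold_case=False, consistent with the
   recommendation in Section 5.2.3 to preserve case in such strings.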
   Some of the logic behind this order is provided under
   Section 5.2.1 (see also the PRECIS mappings document
   [I-D.ietf-precis-mappings]).

8. Code Point Properties

   In order to implement the string classes described above, this
   document does the following:

   1.  Reviews and classifies the collections of code points in the
       Unicode character set by examining various code point properties.

   2.  Defines an algorithm for determining a derived property value,
       which can vary depending on the string class being used by the
       relevant application protocol.

   This document is not intended to specify precisely how derived
   property values are to be applied in protocol strings. That
   information is the responsibility of the protocol specification that
   uses or profiles a PRECIS string class from this document. The value
   of the property is to be interpreted as follows.

   PROTOCOL VALID  Those code points that are allowed to be used in any
      PRECIS string class (currently, IdentifierClass and
      FreeformClass). The abbreviated term "PVALID" is used to refer to
      this value in the remainder of this document.

   SPECIFIC CLASS PROTOCOL VALID  Those code points that are allowed to
      be used in specific string classes. In the remainder of this
      document, the abbreviated term *_PVAL is used, where * = (ID |
      FREE), i.e., either "FREE_PVAL" or "ID_PVAL". In practice, the
      derived property ID_PVAL is not used in this specification, since
      every ID_PVAL code point is PVALID.

   CONTEXTUAL RULE REQUIRED  Some characteristics of the character, such
      as its being invisible in certain contexts or problematic in
      others, require that it not be used in labels unless specific
      other characters or properties are present. As in IDNA2008, there
      are two subdivisions of CONTEXTUAL RULE REQUIRED, the first for
      Join_controls (called "CONTEXTJ") and the second for other
      characters (called "CONTEXTO"). A character with the derived
      property value CONTEXTJ or CONTEXTO MUST NOT be used unless an
      appropriate rule has been established and the context of the
      character is consistent with that rule. The most notable of the
      CONTEXTUAL RULE REQUIRED characters are the Join Control
      characters U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
      NON-JOINER, which have a derived property value of CONTEXTJ. See
      Appendix A of [RFC5892] for more information.

   DISALLOWED  Those code points that are not permitted in any PRECIS
      string class.

   SPECIFIC CLASS DISALLOWED  Those code points that are not to be
      included in one of the string classes but that might be permitted
      in others. In the remainder of this document, the abbreviated
      term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"
      or "ID_DIS". In practice, the derived property FREE_DIS is not
      used in this specification, since every FREE_DIS code point is
      DISALLOWED.

   UNASSIGNED  Those code points that are not designated (i.e., are
      unassigned) in the Unicode Standard.

   The algorithm to calculate the value of the derived property is as
   follows:

   If .cp. .in. Exceptions Then Exceptions(cp);
   Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
   Else If .cp. .in. Unassigned Then UNASSIGNED;
   Else If .cp. .in. ASCII7 Then PVALID;
   Else If .cp. .in. JoinControl Then CONTEXTJ;
   Else If .cp. .in. OldHangulJamo Then DISALLOWED;
   Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
   Else If .cp. .in. Controls Then DISALLOWED;
   Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. LetterDigits Then PVALID;
   Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
   Else DISALLOWED;

   The value of the derived property calculated can depend on the string
   class; for example, if an identifier used in an application protocol
   is defined as profiling the PRECIS IdentifierClass then a space
   character such as U+0020 would be assigned to ID_DIS, whereas if an
   identifier is defined as profiling the PRECIS FreeformClass then the
   character would be assigned to FREE_PVAL. For the sake of brevity,
   the designation "FREE_PVAL" is used herein, instead of the longer
   designation "ID_DIS or FREE_PVAL". In practice, the derived
   properties ID_PVAL and FREE_DIS are not used in this specification,
   since every ID_PVAL code point is PVALID and every FREE_DIS code
   point is DISALLOWED.

   Use of the name of a rule (such as "Exceptions") implies the set of
   code points that the rule defines, whereas the same name as a
   function call (such as "Exceptions(cp)") implies the value that the
   code point has in the Exceptions table.

   The mechanisms described here allow determination of the value of the
   property for future versions of Unicode (including characters added
   after Unicode 5.2 or 7.0 depending on the category, since some
   categories mentioned in this document are simply pointers to IDNA2008
   and therefore were defined at the time of Unicode 5.2). Changes in
   Unicode properties that do not affect the outcome of this process
   therefore do not affect this framework. For example, a character can
   have its Unicode General_Category value (see Chapter 4 of the Unicode
   Standard [Unicode7.0]) change from So to Sm, or from Lo to Ll,
   without affecting the algorithm results. Moreover, even if such
   changes were to result, the BackwardCompatible list (Section 9.7) can
   be adjusted to ensure the stability of the results.

9. Category Definitions Used to Calculate Derived Property

   The derived property obtains its value based on a two-step procedure:

   1.  Characters are placed in one or more character categories either
       (1) based on core properties defined by the Unicode Standard or
       (2) by treating the code point as an exception and addressing the
       code point based on its code point value. These categories are
       not mutually exclusive.

   2.  Set operations are used with these categories to determine the
       values for a property specific to a given string class. These
       operations are specified under Section 8.

      Note: Unicode property names and property value names might have
      short abbreviations, such as "gc" for the General_Category
      property and "Ll" for the Lowercase_Letter property value of the
      gc property.

   In the following specification of character categories, the operation
   that returns the value of a particular Unicode character property for
   a code point is designated by using the formal name of that property
   (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
   "code point". For example, the value of the General_Category
   property for a code point is indicated by General_Category(cp).

   The first ten categories (A-J) shown below were previously defined
   for IDNA2008 and are referenced from [RFC5892] to ease the
   understanding of how PRECIS handles various characters. Some of
   these categories are reused in PRECIS and some of them are not;
   however, the lettering of categories is retained to prevent overlap
   and to ease implementation of both IDNA2008 and PRECIS in a single
   software application. The next eight categories (K-R) are specific
   to PRECIS.

9.1. LetterDigits (A)

   This category is defined in Section 2.1 of [RFC5892] and is included
   by reference for use in PRECIS.

9.2. Unstable (B)

   This category is defined in Section 2.2 of [RFC5892] but is not used
   in PRECIS.

9.3. IgnorableProperties (C)

   This category is defined in Section 2.3 of [RFC5892] but is not used
   in PRECIS.

   Note: See the "PrecisIgnorableProperties (M)" category below for a
   more inclusive category used in PRECIS identifiers.

9.4. IgnorableBlocks (D)

   This category is defined in Section 2.4 of [RFC5892] but is not used
   in PRECIS.

9.5. LDH (E)

   This category is defined in Section 2.5 of [RFC5892] but is not used
   in PRECIS.

   Note: See the "ASCII7 (K)" category below for a more inclusive
   category used in PRECIS identifiers.

9.6. Exceptions (F)

   This category is defined in Section 2.6 of [RFC5892] and is included
   by reference for use in PRECIS.

9.7. BackwardCompatible (G)

   This category is defined in Section 2.7 of [RFC5892] and is included
   by reference for use in PRECIS.

   Note: Management of this category is handled via the processes
   specified in [RFC5892]. At the time of this writing (and also at the
   time that RFC 5892 was published), this category consisted of the
   empty set; however, that is subject to change as described in RFC
   5892.

9.8. JoinControl (H)

   This category is defined in Section 2.8 of [RFC5892] and is included
   by reference for use in PRECIS.

9.9. OldHangulJamo (I)

   This category is defined in Section 2.9 of [RFC5892] and is included
   by reference for use in PRECIS.

9.10. Unassigned (J)

   This category is defined in Section 2.10 of [RFC5892] and is included
   by reference for use in PRECIS.

9.11. ASCII7 (K)

   This PRECIS-specific category consists of all printable, non-space
   characters from the 7-bit ASCII range. By applying this category,
   the algorithm specified under Section 8 exempts these characters from
   other rules that might be applied during PRECIS processing, on the
   assumption that these code points are in such wide use that
   disallowing them would be counter-productive.

   K: cp is in {0021..007E}

9.12. Controls (L)

   This PRECIS-specific category consists of all control characters.

   L: Control(cp) = True

9.13. PrecisIgnorableProperties (M)

   This PRECIS-specific category is used to group code points that are
   discouraged from use in PRECIS string classes.

   M: Default_Ignorable_Code_Point(cp) = True or
      Noncharacter_Code_Point(cp) = True

   The definition for Default_Ignorable_Code_Point can be found in the
   DerivedCoreProperties.txt [2] file.

9.14. Spaces (N)

   This PRECIS-specific category is used to group code points that are
   space characters.

   N: General_Category(cp) is in {Zs}

9.15. Symbols (O)

   This PRECIS-specific category is used to group code points that are
   symbols.

   O: General_Category(cp) is in {Sm, Sc, Sk, So}

9.16. Punctuation (P)

   This PRECIS-specific category is used to group code points that are
   punctuation characters.

   P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}

9.17. HasCompat (Q)

   This PRECIS-specific category is used to group code points that have
   compatibility equivalents as explained in Chapter 2 and Chapter 3 of
   the Unicode Standard [Unicode7.0].

   Q: toNFKC(cp) != cp

   The toNFKC() operation returns the code point in normalization form
   KC. For more information, see Section 5 of Unicode Standard Annex
   #15 [UAX15].

9.18. OtherLetterDigits (R)

   This PRECIS-specific category is used to group code points that are
   letters and digits other than the "traditional" letters and digits
   grouped under the LetterDigits (A) class (see Section 9.1).

   R: General_Category(cp) is in {Lt, Nl, No, Me}

10. Guidelines for Designated Experts

   Experience with internationalization in application protocols has
   shown that protocol designers and application developers usually do
   not understand the subtleties and tradeoffs involved with
   internationalization and that they need considerable guidance in
   making reasonable decisions with regard to the options before them.

   Therefore:

   o  Protocol designers are strongly encouraged to question the
      assumption that they need to define new profiles, since existing
      profiles are designed for wide re-use (see Section 5 for further
      discussion).

   o  Those who persist in defining new profiles are strongly encouraged
      to clearly explain a strong justification for doing so, and to
      publish a stable specification that provides all of the
      information described under Section 11.3.

   o  The designated experts for profile registration requests ought to
      seek answers to all of the questions provided under Section 11.3
      and to encourage applicants to provide a stable specification
      documenting the profile (even though the registration policy for
      PRECIS profiles is Expert Review and a stable specification is not
      strictly required).

   o  Developers of applications that use PRECIS are strongly encouraged
      to apply the guidelines provided under Section 6 and to seek out
      the advice of the designated experts or other knowledgeable
      individuals in doing so.

   o  All parties are strongly encouraged to help prevent the
      multiplication of profiles beyond necessity, as described under
      Section 5.1, and to use PRECIS in ways that will minimize user
      confusion and insecure application behavior.

   Internationalization can be difficult and contentious; designated
   experts, profile registrants, and application developers are strongly
   encouraged to work together in a spirit of good faith and mutual
   understanding to achieve rough consensus on profile registration
   requests and the use of PRECIS in particular applications. They are
   also encouraged to bring additional expertise into the discussion if
   that would be helpful in adding perspective or otherwise resolving
   issues.

11. IANA Considerations

11.1. PRECIS Derived Property Value Registry

   IANA is requested to create a PRECIS-specific registry with the
   Derived Properties for the versions of Unicode that are released
   after (and including) version 7.0. The derived property value is to
   be calculated in cooperation with a designated expert [RFC5226]
   according to the rules specified under Section 8 and Section 9.

   The IESG is to be notified if backward-incompatible changes to the
   table of derived properties are discovered or if other problems arise
   during the process of creating the table of derived property values
   or during expert review. Changes to the rules defined under
   Section 8 and Section 9 require IETF Review.

11.2. PRECIS Base Classes Registry

   IANA is requested to create a registry of PRECIS string classes. In
   accordance with [RFC5226], the registration policy is "RFC Required".

   The registration template is as follows:

   Base Class:  [the name of the PRECIS string class]

   Description:  [a brief description of the PRECIS string class and its
      intended use, e.g., "A sequence of letters, numbers, and symbols
      that is used to identify or address a network entity."]

   Specification:  [the RFC number]

   The initial registrations are as follows:

   Base Class: FreeformClass.
   Description: A sequence of letters, numbers, symbols, spaces, and
      other code points that is used for free-form strings.
   Specification: Section 4.3 of this document.
      [Note to RFC Editor: please change "this document"
      to the RFC number issued for this specification.]

   Base Class: IdentifierClass.
   Description: A sequence of letters, numbers, and symbols that is
      used to identify or address a network entity.
   Specification: Section 4.2 of this document.
      [Note to RFC Editor: please change "this document"
      to the RFC number issued for this specification.]

11.3. PRECIS Profiles Registry

   IANA is requested to create a registry of profiles that use the
   PRECIS string classes. In accordance with [RFC5226], the
   registration policy is "Expert Review". This policy was chosen in
   order to ease the burden of registration while ensuring that
   "customers" of PRECIS receive appropriate guidance regarding the
   sometimes complex and subtle internationalization issues related to
   profiles of PRECIS string classes.

   The registration template is as follows:

   Name:  [the name of the profile]

   Base Class:  [which PRECIS string class is being profiled]

   Applicability:  [the specific protocol elements to which this profile
      applies, e.g., "Localparts in XMPP addresses."]

   Replaces:  [the Stringprep profile that this PRECIS profile replaces,
      if any]

   Width Mapping Rule:  [the behavioral rule for handling of width,
      e.g., "Map fullwidth and halfwidth characters to their
      compatibility variants."]

   Additional Mapping Rule:  [any additional mappings that are required
      or recommended, e.g., "Map non-ASCII space characters to ASCII
      space."]

   Case Mapping Rule:  [the behavioral rule for handling of case, e.g.,
      "Unicode Default Case Folding"]

   Normalization Rule:  [which Unicode normalization form is applied,
      e.g., "NFC"]

   Directionality Rule:  [the behavioral rule for handling of
      right-to-left code points, e.g., "The 'Bidi Rule' defined in
      RFC 5893 applies."]

   Enforcement:  [which entities enforce the rules, and when that
      enforcement occurs during protocol operations]

   Specification:  [a pointer to relevant documentation, such as an RFC
      or Internet-Draft]

   In order to request a review, the registrant shall send a completed
   template to the precis@ietf.org list or its designated successor.

   Factors to focus on while defining profiles and reviewing profile
   registrations include the following:

   o  Would an existing PRECIS string class or profile solve the
      problem? If not, why not? (See Section 5.1 for related
      considerations.)

   o  Is the problem being addressed by this profile well-defined?

   o  Does the specification define what kinds of applications are
      involved and the protocol elements to which this profile applies?

   o  Is the profile clearly defined?

   o  Is the profile based on an appropriate dividing line between user
      interface (culture, context, intent, locale, device limitations,
      etc.) and the use of conformant strings in protocol elements?

   o  Are the width mapping, case mapping, additional mappings,
      normalization, and directionality rules appropriate for the
      intended use?

   o  Does the profile explain which entities enforce the rules, and
      when such enforcement occurs during protocol operations?

   o  Does the profile reduce the degree to which human users could be
      surprised or confused by application behavior (the "Principle of
      Least Astonishment")?

   o  Does the profile introduce any new security concerns such as those
      described under Section 12 of this document (e.g., false positives
      for authentication or authorization)?

12. Security Considerations

12.1. General Issues

   If input strings that appear "the same" to users are programmatically
   considered to be distinct in different systems, or if input strings
   that appear distinct to users are programmatically considered to be
   "the same" in different systems, then users can be confused. Such
   confusion can have security implications, such as the false positives
   and false negatives discussed in [RFC6943]. One starting goal of
   work on the PRECIS framework was to limit the number of times that
   users are confused (consistent with the "Principle of Least
   Astonishment"). Unfortunately, this goal has been difficult to
   achieve given the large number of application protocols already in
   existence. Despite these difficulties, profiles should not be
   multiplied beyond necessity (see Section 5.1). In particular,
   application protocol designers should think long and hard before
   defining a new profile instead of using one that has already been
   defined, and if they decide to define a new profile then they should
   clearly explain their reasons for doing so.

   The security of applications that use this framework can depend in
   part on the proper preparation, enforcement, and comparison of
   internationalized strings. For example, such strings can be used to
   make authentication and authorization decisions, and the security of
   an application could be compromised if an entity providing a given
   string is connected to the wrong account or online resource based on
   different interpretations of the string (again, see [RFC6943]).

   Specifications of application protocols that use this framework are
   strongly encouraged to describe how internationalized strings are
   used in the protocol, including the security implications of any
   false positives and false negatives that might result from various
   enforcement and comparison operations. For some helpful guidelines,
   refer to [RFC6943], [RFC5890], [UTR36], and [UTS39].

12.2. Use of the IdentifierClass

   Strings that conform to the IdentifierClass and any profile thereof
   are intended to be relatively safe for use in a broad range of
   applications, primarily because they include only letters, digits,
   and "grandfathered" non-space characters from the ASCII range; thus
   they exclude spaces, characters with compatibility equivalents, and
   almost all symbols and punctuation marks. However, because such
   strings can still include so-called confusable characters (see
   Section 12.5), protocol designers and implementers are encouraged to
   pay close attention to the security considerations described
   elsewhere in this document.

12.3. Use of the FreeformClass

   Strings that conform to the FreeformClass and many profiles thereof
   can include virtually any Unicode character. This makes the
   FreeformClass quite expressive, but also problematic from the
   perspective of possible user confusion. Protocol designers are
   hereby warned that the FreeformClass contains code points they might
   not understand, and are encouraged to profile the IdentifierClass
   wherever feasible; however, if an application protocol requires more
   code points than are allowed by the IdentifierClass, protocol
   designers are encouraged to define a profile of the FreeformClass
   that restricts the allowable code points as tightly as possible.
   (The PRECIS Working Group considered the option of allowing
   "superclasses" as well as profiles of PRECIS string classes, but
   decided against allowing superclasses to reduce the likelihood of
   security and interoperability problems.)

12.4. Local Character Set Issues

   When systems use local character sets other than ASCII and Unicode,
   this specification leaves the problem of converting between the local
   character set and Unicode up to the application or local system. If
   different applications (or different versions of one application)
   implement different rules for conversions among coded character sets,
   they could interpret the same name differently and contact different
   application servers or other network entities. This problem is not
   solved by security protocols, such as Transport Layer Security (TLS)
   [RFC5246] and the Simple Authentication and Security Layer (SASL)
   [RFC4422], that do not take local character sets into account.

12.5. Visually Similar Characters

   Some characters are visually similar and thus can cause confusion
   among humans. Such characters are often called "confusable
   characters" or "confusables".

   The problem of confusable characters is not necessarily caused by the
   use of Unicode code points outside the ASCII range. For example, in
   some presentations and to some individuals the string "ju1iet"
   (spelled with DIGIT ONE, U+0031, as the third character) might appear
   to be the same as "juliet" (spelled with LATIN SMALL LETTER L,
   U+006C), especially on casual visual inspection. This phenomenon is
   sometimes called "typejacking".

   However, the problem is made more serious by introducing the full
   range of Unicode code points into protocol strings. For example, the
   characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
   Cherokee block look similar to the ASCII characters "STPETER" as they
   might appear when presented using a "creative" font family.

   In some examples of confusable characters, it is unlikely that the
   average human could tell the difference between the real string and
   the fake string. (Indeed, there is no programmatic way to
   distinguish with full certainty which is the fake string and which is
   the real string; in some contexts, the string formed of Cherokee
   characters might be the real string and the string formed of ASCII
   characters might be the fake string.) Because PRECIS-compliant
   strings can contain almost any properly-encoded Unicode code point,
   it can be relatively easy to fake or mimic some strings in systems
   that use the PRECIS framework. The fact that some strings are easily
   confused introduces security vulnerabilities of the kind that have
   also plagued the World Wide Web, specifically the phenomenon known as
   phishing.
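
   The two examples above can be inspected programmatically. The
   following Python sketch is illustrative only and is not part of the
   PRECIS framework; the helper name describe() is hypothetical. It
   lists the code point and Unicode character name of each character,
   which shows that strings that may look alike to a human are
   nevertheless different sequences of code points. It does not (and
   cannot) decide which string is the "real" one.

```python
import unicodedata

# The Cherokee code points quoted in the text, and the ASCII string
# they might resemble in some fonts.
CHEROKEE = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"
ASCII_LOOKALIKE = "STPETER"


def describe(s):
    """Return a list of (code point, Unicode name) pairs for a string.

    Hypothetical helper for illustration; the fallback name is used
    only if the local Unicode database has no name for a character.
    """
    return [("U+%04X" % ord(c), unicodedata.name(c, "<unnamed>"))
            for c in s]


if __name__ == "__main__":
    for label, value in (("ju1iet", "ju1iet"),
                         ("juliet", "juliet"),
                         ("Cherokee", CHEROKEE),
                         ("ASCII", ASCII_LOOKALIKE)):
        print(label)
        for code_point, name in describe(value):
            print("   ", code_point, name)
    # A code-point comparison distinguishes the strings even when a
    # visual inspection might not.
    print(CHEROKEE == ASCII_LOOKALIKE)
```

   Such a listing can help an administrator review a suspicious string,
   but it only reveals which code points are present; judging whether
   two strings are confusable still depends on context such as the
   fonts in use.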

   Despite the fact that some specific suggestions about identification
   and handling of confusable characters appear in the Unicode Security
   Considerations [UTR36] and the Unicode Security Mechanisms [UTS39],
   it is also true (as noted in [RFC5890]) that "there are no
   comprehensive technical solutions to the problems of confusable
   characters". Because it is impossible to map visually similar
   characters without a great deal of context (such as knowing the font
   families used), the PRECIS framework does nothing to map
   similar-looking characters together, nor does it prohibit some
   characters because they look like others.

   Nevertheless, specifications for application protocols that use this
   framework are strongly encouraged to describe how confusable
   characters can be abused to compromise the security of systems that
   use the protocol in question, along with any protocol-specific
   suggestions for overcoming those threats. In particular, software
   implementations and service deployments that use PRECIS-based
   technologies are strongly encouraged to define and implement
   consistent policies regarding the registration, storage, and
   presentation of visually similar characters. The following
   recommendations are appropriate:

   1.  An application service SHOULD define a policy that specifies the
       scripts or blocks of characters that the service will allow to be
       registered (e.g., in an account name) or stored (e.g., in a file
       name). Such a policy SHOULD be informed by the languages and
       scripts that are used to write registered account names; in
       particular, to reduce confusion, the service SHOULD forbid
       registration or storage of strings that contain characters from
       more than one script and SHOULD restrict registrations to
       characters drawn from a very small number of scripts (e.g.,
       scripts that are well-understood by the administrators of the
       service, to improve manageability).

   2.  User-oriented application software SHOULD define a policy that
       specifies how internationalized strings will be presented to a
       human user. Because every human user of such software has a
       preferred language or a small set of preferred languages, the
       software SHOULD gather that information either explicitly from
       the user or implicitly via the operating system of the user's
       device. Furthermore, because most languages are typically
       represented by a single script or a small set of scripts, and
       because most scripts are typically contained in one or more
       blocks of characters, the software SHOULD warn the user when
       presenting a string that mixes characters from more than one
       script or block, or that uses characters outside the normal range
       of the user's preferred language(s). (Such a recommendation is
       not intended to discourage communication across different
       communities of language users; instead, it recognizes the
       existence of such communities and encourages due caution when
       presenting unfamiliar scripts or characters to human users.)

   The challenges inherent in supporting the full range of Unicode code
   points have in the past led some to hope for a way to
   programmatically negotiate more restrictive ranges based on locale,
   script, or other relevant factors, to tag the locale associated with
   a particular string, etc. As a general-purpose internationalization
   technology, the PRECIS framework does not include such mechanisms.

12.6.  Security of Passwords

   Two goals of passwords are to maximize the amount of entropy and to
   minimize the potential for false positives. These goals can be
   achieved in part by allowing a wide range of code points and by
   ensuring that passwords are handled in such a way that code points
   are not compared aggressively. Therefore, it is NOT RECOMMENDED for
   application protocols to profile the FreeformClass for use in
   passwords in a way that removes entire categories (e.g., by
   disallowing symbols or punctuation). Furthermore, it is NOT
   RECOMMENDED for application protocols to map uppercase and titlecase
   code points to their lowercase equivalents in such strings; instead,
   it is RECOMMENDED to preserve the case of all code points contained
   in such strings and to compare them in a case-sensitive manner.

   That said, software implementers need to be aware that there exist
   tradeoffs between entropy and usability. For example, allowing a
   user to establish a password containing "uncommon" code points might
   make it difficult for the user to access a service when using an
   unfamiliar or constrained input device.

   Some application protocols use passwords directly, whereas others
   reuse technologies that themselves process passwords (one example of
   such a technology is the Simple Authentication and Security Layer
   [RFC4422]). Moreover, passwords are often carried by a sequence of
   protocols with backend authentication systems or data storage systems
   such as RADIUS [RFC2865] and LDAP [RFC4510]. Developers of
   application protocols are encouraged to look into reusing these
   profiles instead of defining new ones, so that end-user expectations
   about passwords are consistent no matter which application protocol
   is used.
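
   The case-handling recommendation at the start of this section can be
   illustrated with a short Python sketch. It is a simplification made
   for illustration, not a PRECIS profile: both function names are
   hypothetical, the strings are compared directly (a deployed system
   would normally compare derived values such as salted hashes), and no
   other preparation steps are shown. The second function exists only
   to contrast with the NOT RECOMMENDED behavior.

```python
import hmac


def passwords_match(stored, supplied):
    """Case-sensitive comparison: case is preserved, no case mapping.

    hmac.compare_digest is used so that the comparison time does not
    depend on where the two byte strings first differ.
    """
    return hmac.compare_digest(stored.encode("utf-8"),
                               supplied.encode("utf-8"))


def passwords_match_case_mapped(stored, supplied):
    """NOT RECOMMENDED: maps case before comparing (contrast only).

    Mapping case makes strings that differ only in case compare as
    equal, which increases the potential for false positives.
    """
    return stored.casefold() == supplied.casefold()


if __name__ == "__main__":
    print(passwords_match("Correct Horse", "Correct Horse"))
    print(passwords_match("Correct Horse", "correct horse"))
    print(passwords_match_case_mapped("Correct Horse", "correct horse"))
```

   The contrast shows the practical effect of the recommendation: under
   case mapping, two passwords that differ only in case are treated as
   the same password, whereas the case-sensitive comparison keeps them
   distinct.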

   In protocols that provide passwords as input to a cryptographic
   algorithm such as a hash function, the client will need to perform
   proper preparation of the password before applying the algorithm,
   since the password is not available to the server in plaintext form.

   Further discussion of password handling can be found in
   [I-D.ietf-precis-saslprepbis].

13.  Interoperability Considerations

   Although strings that are consumed in PRECIS-based application
   protocols are often encoded using UTF-8 [RFC3629], the exact encoding
   is a matter for the application protocol that uses PRECIS, not for
   the PRECIS framework.

   It is known that some existing systems are unable to support the full
   Unicode character set, or even any characters outside the ASCII
   range. If two (or more) applications need to interoperate when
   exchanging data (e.g., for the purpose of authenticating a username
   or password), they will naturally need to have in common at least one
   coded character set (as defined by [RFC6365]). Establishing such a
   baseline is a matter for the application protocol that uses PRECIS,
   not for the PRECIS framework.

   Changes to the properties of Unicode code points can occur as the
   Unicode Standard is modified from time to time. For example, three
   code points underwent changes in their GeneralCategory between
   Unicode 5.2 (current at the time IDNA2008 was originally published)
   and Unicode 6.0, as described in [RFC6452]. Implementers might need
   to be aware that the treatment of these characters differs depending
   on which version of Unicode is available on the system that is using
   IDNA2008 or PRECIS. Other such differences might arise between the
   version of Unicode current at the time of this writing (7.0) and
   future versions.

14.  References

14.1.  Normative References

   [RFC20]    Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [Unicode7.0]
              The Unicode Consortium, "The Unicode Standard, Version
              7.0.0", 2014.

14.2.  Informative References

   [I-D.ietf-precis-mappings]
              Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
              classes", draft-ietf-precis-mappings-08 (work in
              progress), June 2014.

   [I-D.ietf-precis-nickname]
              Saint-Andre, P., "Preparation and Comparison of
              Nicknames", draft-ietf-precis-nickname-13 (work in
              progress), November 2014.

   [I-D.ietf-precis-saslprepbis]
              Saint-Andre, P. and A. Melnikov, "Username and Password
              Preparation Algorithms", draft-ietf-precis-saslprepbis-12
              (work in progress), December 2014.

   [I-D.ietf-xmpp-6122bis]
              Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Address Format",
              draft-ietf-xmpp-6122bis-18 (work in progress), December
              2014.

   [RFC2865]  Rigney, C., Willens, S., Rubens, A., and W. Simpson,
              "Remote Authentication Dial In User Service (RADIUS)", RFC
              2865, June 2000.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)", RFC
              3491, March 2003.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
              Security Layer (SASL)", RFC 4422, June 2006.

   [RFC4510]  Zeilenga, K., "Lightweight Directory Access Protocol
              (LDAP): Technical Specification Road Map", RFC 4510, June
              2006.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
              Internationalization in the IETF", BCP 166, RFC 6365,
              September 2011.

   [RFC6452]  Faltstrom, P. and P. Hoffman, "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA) -
              Unicode 6.0", RFC 6452, November 2011.

   [RFC6885]  Blanchet, M. and A. Sullivan, "Stringprep Revision and
              Problem Statement for the Preparation and Comparison of
              Internationalized Strings (PRECIS)", RFC 6885, March 2013.

   [RFC6943]  Thaler, D., "Issues in Identifier Comparison for Security
              Purposes", RFC 6943, May 2013.

   [UAX9]     The Unicode Consortium, "Unicode Standard Annex #9:
              Unicode Bidirectional Algorithm", September 2012.

   [UAX11]    The Unicode Consortium, "Unicode Standard Annex #11: East
              Asian Width", September 2012.

   [UAX15]    The Unicode Consortium, "Unicode Standard Annex #15:
              Unicode Normalization Forms", August 2012.

   [UnicodeCurrent]
              The Unicode Consortium, "The Unicode Standard",
              2014-present.

   [UTR36]    The Unicode Consortium, "Unicode Technical Report #36:
              Unicode Security Considerations", July 2012.

   [UTS39]    The Unicode Consortium, "Unicode Technical Standard #39:
              Unicode Security Mechanisms", July 2012.

14.3.  URIs

   [1] http://unicode.org/Public/UNIDATA/PropertyAliases.txt

   [2] http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

Appendix A.  Acknowledgements

   The authors would like to acknowledge the comments and contributions
   of the following individuals during working group discussion: David
   Black, Edward Burns, Dan Chiba, Mark Davis, Alan DeKok, Martin
   Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Bjoern
   Hoehrmann, Paul Hoffman, Jeffrey Hutzelman, Simon Josefsson, John
   Klensin, Alexey Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker,
   Pete Resnick, Andrew Sullivan, Dave Thaler, Yoshiro Yoneya, and
   Florian Zeitz.

   Special thanks are due to John Klensin and Patrik Faltstrom for their
   challenging feedback and detailed reviews.

   Charlie Kaufman, Tom Taylor, and Tim Wicinski reviewed the document
   on behalf of the Security Directorate, the General Area Review Team,
   and the Operations and Management Directorate, respectively.

   During IESG review, Alissa Cooper, Stephen Farrell, and Barry Leiba
   provided comments that led to further improvements.

   Some algorithms and textual descriptions have been borrowed from
   [RFC5892]. Some text regarding security has been borrowed from
   [RFC5890], [I-D.ietf-precis-saslprepbis], and
   [I-D.ietf-xmpp-6122bis].

   Peter Saint-Andre wishes to acknowledge Cisco Systems, Inc., for
   employing him during his work on earlier versions of this document.

Authors' Addresses

   Peter Saint-Andre
   &yet

   Email: peter@andyet.com
   URI:   https://andyet.com/

   Marc Blanchet
   Viagenie
   246 Aberdeen
   Quebec, QC  G1R 2E1
   Canada

   Email: Marc.Blanchet@viagenie.ca
   URI:   http://www.viagenie.ca/