idnits 2.17.1 draft-ietf-precis-problem-statement-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 924: '...le username, and SHOULD use the SASLpr...' RFC 2119 keyword, line 928: '...mpty string), the server MUST fail the...' RFC 2119 keyword, line 932: '... [SASLprep]), and both client and server SHOULD (*) use the...' RFC 2119 keyword, line 936: '...mpty string), the server MUST fail the...' RFC 2119 keyword, line 938: '...s requirement to MUST. Currently, the...' (6 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 800 has weird spacing: '...is used iSCSI...' == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 12, 2012) is 4427 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'SASL' is mentioned on line 942, but not defined == Missing Reference: 'SASLprep' is mentioned on line 1064, but not defined == Missing Reference: 'StringPrep' is mentioned on line 1149, but not defined == Missing Reference: 'RFC3629' is mentioned on line 1053, but not defined == Missing Reference: 'Stringprep' is mentioned on line 1060, but not defined == Missing Reference: 'PR29' is mentioned on line 1076, but not defined == Missing Reference: 'UTF-8' is mentioned on line 1124, but not defined == Missing Reference: 'Unicode' is mentioned on line 1137, but not defined == Outdated reference: A later version (-09) exists of draft-iab-identifier-comparison-00 -- Obsolete informational reference (is this intentional?): RFC 3454 (Obsoleted by RFC 7564) -- Obsolete informational reference (is this intentional?): RFC 3490 (Obsoleted by RFC 5890, RFC 5891) -- Obsolete informational reference (is this intentional?): RFC 3491 (Obsoleted by RFC 5891) -- Obsolete informational reference (is this intentional?): RFC 3530 (Obsoleted by RFC 7530) -- Obsolete informational reference (is this intentional?): RFC 3920 (Obsoleted by RFC 6120) -- Obsolete informational reference (is this intentional?): RFC 4013 (Obsoleted by RFC 7613) -- Obsolete informational reference (is this intentional?): RFC 5661 (Obsoleted by RFC 8881) Summary: 1 error (**), 0 flaws (~~), 12 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. 
Blanchet 3 Internet-Draft Viagenie 4 Intended status: Informational A. Sullivan 5 Expires: September 13, 2012 Dyn, Inc. 6 March 12, 2012 8 Stringprep Revision Problem Statement 9 draft-ietf-precis-problem-statement-05.txt 11 Abstract 13 Using Unicode codepoints in protocol strings that expect comparison 14 with other strings requires preparation of the string that contains 15 the Unicode codepoints. Internationalizing Domain Names in 16 Applications (IDNA2003) defined and used Stringprep and Nameprep. 17 Other protocols subsequently defined Stringprep profiles. A new 18 approach different from Stringprep and Nameprep is used for a 19 revision of IDNA2003 (called IDNA2008). Other Stringprep profiles 20 need to be similarly updated or a replacement of Stringprep needs to 21 be designed. This document outlines the issues to be faced by those 22 designing a Stringprep replacement. 24 Status of this Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on September 13, 2012. 41 Copyright Notice 43 Copyright (c) 2012 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. 
Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 This document may contain material from IETF Documents or IETF 57 Contributions published or made publicly available before November 58 10, 2008. The person(s) controlling the copyright in some of this 59 material may not have granted the IETF Trust the right to allow 60 modifications of such material outside the IETF Standards Process. 61 Without obtaining an adequate license from the person(s) controlling 62 the copyright in such materials, this document may not be modified 63 outside the IETF Standards Process, and derivative works of it may 64 not be created outside the IETF Standards Process, except to format 65 it for publication as an RFC or to translate it into languages other 66 than English. 68 Table of Contents 70 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 71 2. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 5 72 3. Stringprep Profiles Limitations . . . . . . . . . . . . . . . 5 73 4. Major Topics for Consideration . . . . . . . . . . . . . . . . 6 74 4.1. Comparison . . . . . . . . . . . . . . . . . . . . . . . . 6 75 4.1.1. Types of Identifiers . . . . . . . . . . . . . . . . . 6 76 4.1.2. Effect of comparison . . . . . . . . . . . . . . . . . 7 77 4.2. Dealing with characters . . . . . . . . . . . . . . . . . 7 78 4.2.1. Case folding, case sensitivity, and case 79 preservation . . . . . . . . . . . . . . . . . . . . . 7 80 4.2.2. Stringprep and NFKC . . . . . . . . . . . . . . . . . 8 81 4.2.3. Character mapping . . . . . . . . . . . . . . . . . . 8 82 4.2.4. Prohibited characters . . . . . . . . . . . . . . . . 8 83 4.2.5. 
Internal structure, delimiters, and special 84 characters . . . . . . . . . . . . . . . . . . . . . . 9 85 4.2.6. Restrictions because of glyph similarity . . . . . . . 10 86 4.3. Where the data comes from and where it goes . . . . . . . 10 87 4.3.1. User input and the source of protocol elements . . . . 10 88 4.3.2. User output . . . . . . . . . . . . . . . . . . . . . 10 89 4.3.3. Operations . . . . . . . . . . . . . . . . . . . . . . 10 90 4.3.4. Some useful classes of strings . . . . . . . . . . . . 11 91 5. Considerations for Stringprep replacement . . . . . . . . . . 12 92 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 93 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 94 8. Discussion home for this draft . . . . . . . . . . . . . . . . 13 95 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 96 10. Informative References . . . . . . . . . . . . . . . . . . . . 13 97 Appendix A. Classification of Stringprep Profiles . . . . . . . . 17 98 Appendix B. Evaluation of Stringprep Profiles . . . . . . . . . . 18 99 B.1. iSCSI Stringprep Profiles: RFC3722, RFC3721, RFC3720 . . . 18 100 B.2. SMTP/POP3/ManageSieve Stringprep Profiles: 101 RFC4954,RFC5034,RFC 5804 . . . . . . . . . . . . . . . . . 20 102 B.3. IMAP Stringprep Profiles: RFC5738, RFC4314: Usernames . . 21 103 B.4. IMAP Stringprep Profiles: RFC5738: Passwords . . . . . . . 23 104 B.5. Anonymous SASL Stringprep Profiles: RFC4505 . . . . . . . 24 105 B.6. XMPP Stringprep Profiles: RFC3920 Nodeprep . . . . . . . . 26 106 B.7. XMPP Stringprep Profiles: RFC3920 Resourceprep . . . . . . 27 107 B.8. EAP Stringprep Profiles: RFC3748 . . . . . . . . . . . . . 27 108 Appendix C. Changes between versions . . . . . . . . . . . . . . 28 109 C.1. 00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 110 C.2. 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 111 C.3. 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 112 C.4. 03 . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 29 113 C.5. 04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 114 C.6. 05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 115 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29 117 1. Introduction 119 Internationalizing Domain Names in Applications (IDNA2003) [RFC3490], 120 [RFC3491], [RFC3492], [RFC3454] describes a mechanism for encoding 121 Unicode labels making up Internationalized Domain Names (IDNs) as 122 standard DNS labels. The labels were processed using a method called 123 Nameprep [RFC3491] and Punycode [RFC3492]. That method was specific 124 to IDNA2003, but is generalized as Stringprep [RFC3454]. The general 125 mechanism is used by other protocols with similar needs, but with 126 different constraints than IDNA2003. 128 Stringprep defines a framework within which protocols define their 129 Stringprep profiles. Known IETF specifications using Stringprep are 130 listed below: 131 o The Nameprep profile [RFC3490] for use in Internationalized Domain 132 Names (IDNs); 133 o NFSv4 [RFC3530] and NFSv4.1 [RFC5661]; 134 o The iSCSI profile [RFC3722] for use in Internet Small Computer 135 Systems Interface (iSCSI) Names; 136 o EAP [RFC3748]; 137 o The Nodeprep and Resourceprep profiles [RFC3920] for use in the 138 Extensible Messaging and Presence Protocol (XMPP), and the XMPP to 139 CPIM mapping [RFC3922] (the latter of these relies on the former); 140 o The Policy MIB profile [RFC4011] for use in the Simple Network 141 Management Protocol (SNMP); 142 o The SASLprep profile [RFC4013] for use in the Simple 143 Authentication and Security Layer (SASL), and SASL itself 144 [RFC4422]; 145 o TLS [RFC4279]; 146 o IMAP4 using SASLprep [RFC4314]; 147 o The trace profile [RFC4505] for use with the SASL ANONYMOUS 148 mechanism; 149 o The LDAP profile [RFC4518] for use with LDAP [RFC4511] and its 150 authentication methods [RFC4513]; 151 o Plain SASL using SASLprep [RFC4616]; 
152 o NNTP using SASLprep [RFC4643]; 153 o PKIX subject identification using LDAPprep [RFC4683]; 154 o Internet Application Protocol Collation Registry [RFC4790]; 155 o SMTP Auth using SASLprep [RFC4954]; 156 o POP3 Auth using SASLprep [RFC5034]; 157 o TLS SRP using SASLprep [RFC5054]; 158 o IRI and URI in XMPP [RFC5122]; 159 o PKIX CRL using LDAPprep [RFC5280]; 160 o IAX using Nameprep [RFC5456]; 161 o SASL SCRAM using SASLprep [RFC5802]; 162 o Remote management of Sieve using SASLprep [RFC5804]; 163 o The unicode-casemap Unicode Collation [RFC5051]. 165 However, a review [1] of these protocol specifications found that 166 they are very similar and can be grouped into a small number of 167 classes. Moreover, many reuse the same Stringprep profile, such as 168 the SASL one. 170 IDNA2003 was replaced because of some limitations described in 171 [RFC4690]. The new IDN specification, called IDNA2008 [RFC5890], 172 [RFC5891], [RFC5892], [RFC5893], was designed based on the 173 considerations found in [RFC5894]. One of the effects of IDNA2008 is 174 that Nameprep and Stringprep are not used at all. Instead, an 175 algorithm based on Unicode properties of codepoints is defined. That 176 algorithm generates a stable and complete table of the supported 177 Unicode codepoints for each Unicode version. This algorithm takes 178 an inclusion-based approach, instead of the exclusion-based 179 approach of Stringprep/Nameprep. 181 This document lists the shortcomings and issues found by protocols 182 listed above that defined Stringprep profiles. It also lists the 183 requirements for any potential replacement of Stringprep. 185 2. Conventions 187 This document uses the Unicode convention [2] to specify a Unicode 188 codepoint with the following syntax: U+ABCD where ABCD is the 189 codepoint in hexadecimal. 191 3. Stringprep Profiles Limitations 193 During IETF 77, a BOF [3] discussed the current state of the 194 protocols that have defined Stringprep profiles [NEWPREP]. 
The main 195 conclusions from that discussion were as follows: 196 o Stringprep is bound to version 3.2 of Unicode. Stringprep has not 197 been updated to new versions of Unicode. Therefore, the protocols 198 using Stringprep are stuck on Unicode 3.2. 199 o The protocols need to be updated to support new versions of 200 Unicode. The protocols would like to not be bound to a specific 201 version of Unicode, but rather have better Unicode agility in the 202 way of IDNA2008. This is important partly because it is usually 203 impossible for an application to require Unicode 3.2; the 204 application gets whatever version of Unicode is available on the 205 host. 206 o The protocols require better bidirectional support (bidi) than 207 currently offered by Stringprep. 209 o If the protocols are updated to use a new version of Stringprep or 210 another framework, then backward compatibility is an important 211 requirement. For example, Stringprep is based on and profiles may 212 use NFKC [UAX15], while IDNA2008 mostly uses NFC [UAX15]. 213 o Identifiers are passed between protocols. For example, the same 214 username string of codepoints may be passed between SASL, XMPP, 215 LDAP and EAP. Therefore, a common set of rules or classes of 216 strings is preferred over specific rules for each protocol. 217 Without real planning in advance, many stringprep profiles reuse 218 other profiles, so this goal was accomplished by accident with 219 Stringprep. 221 Protocols that use Stringprep profiles use strings for different 222 purposes: 223 o XMPP uses a different Stringprep profile for each part of the XMPP 224 address (JID): a localpart which is similar to a username and used 225 for authentication, a domainpart which is a domain name, and a 226 resource part which is less restrictive than the localpart. 227 o iSCSI uses a Stringprep profile for the IQN, which is very similar 228 to (often is) a DNS domain name. 229 o SASL and LDAP use a Stringprep profile for usernames. 
230 o LDAP uses a set of Stringprep profiles. 232 The consensus [4] of the BOF attendees is that it would be highly 233 desirable to have a replacement for Stringprep, with similar 234 characteristics to IDNA2008. That replacement should be defined so 235 that the protocols could use internationalized strings without a lot 236 of specialized internationalization work, since internationalization 237 expertise is not available in the respective protocols or working 238 groups. 240 4. Major Topics for Consideration 242 This section provides an overview of major topics that a Stringprep 243 replacement needs to address. The headings correspond roughly with 244 categories under which known Stringprep-using protocol RFCs have been 245 evaluated. For the details of those evaluations, see Appendix A. 247 4.1. Comparison 249 4.1.1. Types of Identifiers 251 Following [I-D.iab-identifier-comparison], it is possible to organize 252 identifiers into three classes with respect to how they may be compared 253 with one another: 255 Absolute Identifiers Identifiers that can be compared byte-by-byte 256 for equality. 257 Definite Identifiers Identifiers that have a well-defined comparison 258 algorithm on which all parties agree. 259 Indefinite Identifiers Identifiers that have no single comparison 260 algorithm on which all parties agree. 262 Definite Identifiers include cases like the comparison of Unicode 263 code points in different encodings: they do not match byte for byte, 264 but can all be converted to a single encoding which then does match 265 byte for byte. Indefinite Identifiers are sometimes algorithmically 266 comparable by well-specified subsets of parties. For more discussion 267 of these categories, see [I-D.iab-identifier-comparison]. 269 The section on treating the existing known cases, Appendix A, uses the 270 categories above. 272 4.1.2. 
Effect of comparison 274 The three classes of comparison style outlined in Section 4.1.1 may 275 have different effects when applied. It is necessary to evaluate what the 276 effects are if a comparison results in a false positive, and what the 277 effects are if a comparison results in a false negative, especially 278 in terms of the consequences to security and usability. 280 4.2. Dealing with characters 282 This section outlines a range of issues having to do with characters 283 in the target protocols, and outlines the ways in which IDNA2008 284 might be a good analogy to other protocols, and ways in which it 285 might be a poor one. 287 4.2.1. Case folding, case sensitivity, and case preservation 289 In IDNA2003, labels are always mapped to lower case before the 290 Punycode transformation. In IDNA2008, there is no mapping at all: 291 input is either a valid U-label or it is not. At the same time, 292 upper-case characters are by definition not valid U-labels, because 293 they fall into the Unstable category (category B) of [RFC5892]. 295 If there are protocols that require that upper and lower case be 296 preserved, then the analogy with IDNA2008 will break down. 297 Accordingly, existing protocols are to be evaluated according to the 298 following criteria: 300 1. Does the protocol use case folding? For all blocks of code 301 points, or just for certain subsets? 303 2. Is the system or protocol case sensitive? 304 3. Does the system or protocol preserve case? 306 4.2.2. Stringprep and NFKC 308 Stringprep profiles may use normalization. If they do, they use NFKC 309 [UAX15] (most profiles do). It is not clear that NFKC is the right 310 normalization to use in all cases. 
In [UAX15], there is the 311 following observation regarding Normalization Forms KC and KD: "It is 312 best to think of these Normalization Forms as being like uppercase or 313 lowercase mappings: useful in certain contexts for identifying core 314 meanings, but also performing modifications to the text that may not 315 always be appropriate." For things like the spelling of users' 316 names, then, NFKC may not be the best form to use. At the same time, 317 one of the nice things about NFKC is that it deals with the width of 318 characters that are otherwise similar, by mapping half-width and 319 full-width compatibility variants to a single form. This mapping step can be crucial in practice. A 320 replacement for stringprep depends on analyzing the different use 321 profiles and considering whether NFKC or NFC is a better 322 normalization for each profile. 324 For the purposes of evaluating an existing example of Stringprep use, 325 it is helpful to know whether it uses no normalization, NFKC, or NFC. 327 4.2.3. Character mapping 329 Along with the case mapping issues raised in Section 4.2.1, there is 330 the question of whether some characters are mapped either to other 331 characters or to nothing during Stringprep. [RFC3454], Section 3, 332 outlines a number of characters that are mapped to nothing, and also 333 permits Stringprep profiles to define their own mappings. 335 4.2.4. Prohibited characters 337 Along with case folding and other character mappings, many protocols 338 have characters that are simply disallowed. For example, control 339 characters and special characters such as "@" or "/" may be 340 prohibited in a protocol. 342 One of the primary changes of IDNA2008 is in the way it approaches 343 Unicode code points. IDNA2003 created an explicit list of excluded 344 or mapped-away characters; anything in Unicode 3.2 that was not so 345 listed could be assumed to be allowed under the protocol. 
IDNA2008 346 begins instead from the assumption that code points are disallowed, 347 and then relies on Unicode properties to derive whether a given code 348 point actually is allowed in the protocol. 350 Moreover, there is more than one class of "allowed in the protocol" 351 in IDNA2008 (but not in IDNA2003). While some code points are 352 disallowed outright, some are allowed only in certain contexts. The 353 reasons for the context-dependent rules have to do with the way some 354 characters are used. For instance, the ZERO WIDTH JOINER and ZERO 355 WIDTH NON-JOINER (ZWJ, U+200D and ZWNJ, U+200C) are allowed with 356 contextual rules because they are required in some circumstances, yet 357 are considered punctuation by Unicode and would therefore be 358 DISALLOWED under the usual IDNA2008 derivation rules. The goal of 359 IDNA2008 is to provide the widest repertoire of code points possible 360 and consistent with the traditional DNS LDH rule, trusting to the 361 operators of individual zones to make sensible (and usually more 362 restrictive) policies for their zones. 364 IDNA2008 may be a poor model for what other protocols ought to do in 365 this case, because it is designed to support an old protocol that is 366 designed to operate on the scale of the entire Internet. Moreover, 367 IDNA2008 is intended to be deployed without any change to the base 368 DNS protocol. Other protocols may aim at deployment in more local 369 environments, or may have protocol version negotiation built in. 371 4.2.5. Internal structure, delimiters, and special characters 373 IDNA2008 has a special problem with delimiters, because the delimiter 374 "character" in the DNS wire format is not really part of the data. 375 In DNS, labels are not separated exactly; instead, a label carries 376 with it an indicator that says how long the label is. When the label 377 is presented in presentation format as part of a fully qualified 378 domain name, the label separator FULL STOP, U+002E (.) 
is used to 379 break up the labels. But because that label separator does not 380 travel with the wire format of the domain name, there is no way to 381 encode a different, "internationalized" separator in IDNA2008. 383 Other protocols may include characters with similar special meaning 384 within the protocol. Common characters for these purposes include 385 FULL STOP, U+002E (.); COMMERCIAL AT, U+0040 (@); HYPHEN-MINUS, 386 U+002D (-); SOLIDUS, U+002F (/); and LOW LINE, U+005F (_). The mere 387 inclusion of such a character in the protocol is not enough for it to 388 be considered similar to another protocol using the same character; 389 instead, handling of the character must be taken into consideration 390 as well. 392 An important issue to tackle here is whether it is valuable to map to 393 or from these special characters as part of the Stringprep 394 replacement. In some locales, the analogue to FULL STOP, U+002E is 395 some other character, and users may expect to be able to substitute 396 their normal stop for FULL STOP, U+002E. At the same time, there are 397 predictability arguments in favour of treating identifiers with FULL 398 STOP, U+002E in them just the way they are treated under IDNA2008. 400 4.2.6. Restrictions because of glyph similarity 402 Homoglyphs are similarly (or identically) rendered glyphs of 403 different codepoints. For DNS names, homoglyphs may enable phishing. 404 If a protocol requires some visual comparison by end-users, then the 405 issue of homoglyphs needs to be considered. In the DNS context, these 406 issues are documented in [RFC5894] and [RFC4690]. IDNA2008 does not, 407 however, have a mechanism to deal with them, trusting to DNS zone 408 operators to enact sensible policies for the subset of Unicode they 409 wish to support, given their user community. A similar policy/ 410 protocol split may not be desirable in every protocol. 412 4.3. Where the data comes from and where it goes 414 4.3.1. 
User input and the source of protocol elements 416 Some protocol elements are provided by users, and others are not. 417 Those that are not may presumably be subject to greater restrictions, 418 whereas those that users provide likely need to permit the broadest 419 range of code points. The following questions are helpful: 421 1. Do users input the strings directly? 422 2. If so, how? (keyboard, stylus, voice, copy-paste, etc.) 423 3. Where do we place the dividing line between user interface and 424 protocol? (see [RFC5895]) 426 4.3.2. User output 428 Just as only some protocol elements are expected to be entered 429 directly by users, only some protocol elements are intended to be 430 consumed directly by users. It is important to know how users are 431 expected to be able to consume the protocol elements, because 432 different environments present different challenges. An element that 433 is only ever delivered as part of a vCard remains in machine-readable 434 format, so the problem of visual confusion is not a great one. Is 435 the protocol element published as part of a vCard, a web directory, 436 on a business card, or on "the side of a bus"? Do users use the 437 protocol element as an identifier (which means that they might enter 438 it again in some other context)? (See also Section 4.2.6.) 440 4.3.3. Operations 442 Some strings are useful as part of the protocol but are not used as 443 input to other operations (for instance, purely informative or 444 descriptive text). Other strings are used directly as input to other 445 operations (such as cryptographic hash functions), or are used 446 together with other strings (such as concatenating a string with 447 some others to form a unique identifier). 449 4.3.3.1. String classes 451 Strings often have a similar function in different protocols. For 452 instance, many different protocols contain user identifiers or 453 passwords. A single profile for all such uses might be desirable. 
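As an illustration of why preparation matters when a string feeds another operation such as a cryptographic hash function, the following is a minimal sketch and not part of any Stringprep profile. It assumes a hypothetical preparation step, prepare(), that performs Unicode normalization only, and uses SHA-256 over the UTF-8 encoding as a stand-in for whatever later operation a protocol applies. The sample strings are invented for the example.

```python
import hashlib
import unicodedata


def prepare(value: str, form: str = "NFKC") -> str:
    """Hypothetical preparation step: Unicode normalization only."""
    return unicodedata.normalize(form, value)


def digest(value: str) -> str:
    """Hash the UTF-8 encoding, standing in for any later operation."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Three strings that a user would likely regard as the same word.
composed = "caf\u00e9"                  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"               # "e" followed by U+0301 COMBINING ACUTE ACCENT
fullwidth = "\uff43\uff41\uff46\u00e9"  # fullwidth c, a, f, then U+00E9

# Without preparation, canonically equivalent strings hash differently.
raw_match = digest(composed) == digest(decomposed)

# After normalization, the two spellings produce the same digest.
prepared_match = digest(prepare(composed)) == digest(prepare(decomposed))

# NFC leaves width variants alone; NFKC maps them to the ordinary letters.
nfc_width_match = digest(prepare(composed, "NFC")) == digest(prepare(fullwidth, "NFC"))
nfkc_width_match = digest(prepare(composed)) == digest(prepare(fullwidth))

print(raw_match, prepared_match, nfc_width_match, nfkc_width_match)
```

Two systems that share a credentials database but prepare strings differently (one with NFC, one with NFKC, or one not at all) could therefore disagree about whether the same user-entered string matches, which is one argument for a common string class across protocols.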
455 Often, a string in a protocol is effectively a protocol element from 456 another protocol. For instance, different systems might use the same 457 credentials database for authentication. 459 4.3.3.2. Community Considerations 461 A Stringprep replacement that does anything more than just update 462 Stringprep to the latest version of Unicode will probably entail some 463 changes. It is important to identify the willingness of the 464 protocol-using community to accept backwards-incompatible changes. 465 By the same token, it is important to evaluate the desire of the 466 community for features not available under Stringprep. 468 4.3.3.3. Unicode Incompatible Changes 470 IDNA2008 uses an algorithm to derive the validity of a Unicode code 471 point for use under IDNA2008. It does this by using the properties 472 of each code point to test its validity. 474 This approach depends crucially on the idea that code points, once 475 valid for a protocol profile, will not later be made invalid. That 476 is not a guarantee currently provided by Unicode. Properties of code 477 points may change between versions of Unicode. Rarely, such a change 478 could cause a given code point to become invalid under a protocol 479 profile, even though the code point would be valid with an earlier 480 version of Unicode. This is not merely a theoretical possibility, 481 because it has occurred ([RFC6452]). 483 Accordingly, as with IDNA2008, a Stringprep replacement that intends to be 484 Unicode version agnostic will need to work out a mechanism to address 485 cases where incompatible changes occur because of new Unicode 486 versions. 488 4.3.4. Some useful classes of strings 490 With the above considerations in hand, we can usefully classify 491 strings into the following categories: 493 DomainClass Strings that are intended for use in a domain name slot, 494 as defined in [RFC5890]. 
Note that strings of DomainClass could 495 be used outside a domain name slot: the question here is what the 496 eventual intended use for the string is, and not whether the 497 string is actually functioning as a domain name at any moment. 498 NameClass Strings that are intended for use as identifiers but that 499 are not DomainClass strings. NameClass strings are normally 500 public data within the protocol where they are used: these are 501 intended as identifiers that can be passed around to identify 502 something. 503 FreeClass Strings that are intended to be used by the protocol as 504 free-form strings, but that have some significant handling within 505 the protocol. This includes things that are normally not public 506 data in a protocol (like passwords), and things that might have 507 additional restrictions within the protocol in question, such as a 508 friendly name in a chat room. 510 5. Considerations for Stringprep replacement 512 The above suggests the following guidance for replacing Stringprep: 513 o A stringprep replacement should be defined. 514 o The replacement should take an approach similar to IDNA2008 (e.g., 515 by using codepoint properties instead of codepoint whitelisting) 516 in that it enables better Unicode agility. 517 o Protocols share similar characteristics of strings. Therefore, 518 defining i18n preparation algorithms for the smallest set of 519 string classes may be sufficient for most cases, providing 520 coherence among a set of related protocols or protocols where 521 identifiers are exchanged. 522 o The sets of string classes need to be evaluated according to the 523 considerations that make up the headings in Section 4. 524 o It is reasonable to limit scope to Unicode code points, and rule 525 the mapping of data from other character encodings outside the 526 scope of this effort. 527 o Recommendations for handling protocol incompatibilities resulting 528 from changes to Unicode are required. 
529 o Compatibility within each protocol between a technique that is 530 stringprep-based and the technique's replacement has to be 531 considered very carefully. 533 Existing deployments already depend on Stringprep profiles. 534 Therefore, a replacement must consider the effects of any new 535 strategy on existing deployments. By way of comparison, it is worth 536 noting that some characters were acceptable in IDNA labels under 537 IDNA2003, but are not protocol-valid under IDNA2008 (and conversely); 538 disagreement about what to do during the transition has resulted in 539 different approaches to mapping. Different implementers may make 540 different decisions about what to do in such cases; this could have 541 interoperability effects. It is necessary to trade better support 542 for different linguistic environments against the potential side 543 effects of backward incompatibility. 545 6. Security Considerations 547 This document merely states what problems are to be solved, and does 548 not define a protocol. There are undoubtedly security implications 549 of the particular results that will come from the work to be 550 completed. 552 7. IANA Considerations 554 This document has no actions for IANA. 556 8. Discussion home for this draft 558 Note: RFC-Editor, please remove this section before publication. 560 This document is intended to define the problem space discussed on 561 the precis@ietf.org mailing list. 563 9. Acknowledgements 565 This document is the product of the PRECIS IETF Working Group, and 566 participants in that Working Group were helpful in addressing issues 567 with the text. 569 Specific contributions came from David Black, Alan DeKok, Bill 570 McQuillan, Alexey Melnikov, Peter Saint-Andre, Dave Thaler, and 571 Yoshiro Yoneya. 573 Dave Thaler provided the "buckets" insight in Section 4.1.1, central 574 to the organization of the problem. 

   Evaluations of Stringprep profiles that are included in Appendix B
   were done by David Black, Alexey Melnikov, Peter Saint-Andre, and
   Dave Thaler.

10. Informative References

   [I-D.iab-identifier-comparison]
              Thaler, D., "Issues in Identifier Comparison for Security
              Purposes", draft-iab-identifier-comparison-00 (work in
              progress), July 2011.

   [NEWPREP]  "Newprep BoF Meeting Minutes", March 2010.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File System
              (NFS) version 4 Protocol", RFC 3530, April 2003.

   [RFC3722]  Bakke, M., "String Profile for Internet Small Computer
              Systems Interface (iSCSI) Names", RFC 3722, April 2004.

   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H.
              Levkowetz, "Extensible Authentication Protocol (EAP)",
              RFC 3748, June 2004.

   [RFC3920]  Saint-Andre, P., Ed., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 3920, October 2004.

   [RFC3922]  Saint-Andre, P., "Mapping the Extensible Messaging and
              Presence Protocol (XMPP) to Common Presence and Instant
              Messaging (CPIM)", RFC 3922, October 2004.

   [RFC4011]  Waldbusser, S., Saperia, J., and T. Hongal, "Policy Based
              Management MIB", RFC 4011, March 2005.

   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
              and Passwords", RFC 4013, February 2005.

   [RFC4279]  Eronen, P. and H. Tschofenig, "Pre-Shared Key Ciphersuites
              for Transport Layer Security (TLS)", RFC 4279,
              December 2005.

   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
              RFC 4314, December 2005.

   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
              Security Layer (SASL)", RFC 4422, June 2006.

   [RFC4505]  Zeilenga, K., "Anonymous Simple Authentication and
              Security Layer (SASL) Mechanism", RFC 4505, June 2006.

   [RFC4511]  Sermersheim, J., "Lightweight Directory Access Protocol
              (LDAP): The Protocol", RFC 4511, June 2006.

   [RFC4513]  Harrison, R., "Lightweight Directory Access Protocol
              (LDAP): Authentication Methods and Security Mechanisms",
              RFC 4513, June 2006.

   [RFC4518]  Zeilenga, K., "Lightweight Directory Access Protocol
              (LDAP): Internationalized String Preparation", RFC 4518,
              June 2006.

   [RFC4616]  Zeilenga, K., "The PLAIN Simple Authentication and
              Security Layer (SASL) Mechanism", RFC 4616, August 2006.

   [RFC4643]  Vinocur, J. and K. Murchison, "Network News Transfer
              Protocol (NNTP) Extension for Authentication", RFC 4643,
              October 2006.

   [RFC4683]  Park, J., Lee, J., Lee, H., Park, S., and T. Polk,
              "Internet X.509 Public Key Infrastructure Subject
              Identification Method (SIM)", RFC 4683, October 2006.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC4790]  Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
              Application Protocol Collation Registry", RFC 4790,
              March 2007.

   [RFC4954]  Siemborski, R. and A. Melnikov, "SMTP Service Extension
              for Authentication", RFC 4954, July 2007.

   [RFC5034]  Siemborski, R. and A. Menon-Sen, "The Post Office Protocol
              (POP3) Simple Authentication and Security Layer (SASL)
              Authentication Mechanism", RFC 5034, July 2007.

   [RFC5051]  Crispin, M., "i;unicode-casemap - Simple Unicode Collation
              Algorithm", RFC 5051, October 2007.

   [RFC5054]  Taylor, D., Wu, T., Mavrogiannopoulos, N., and T. Perrin,
              "Using the Secure Remote Password (SRP) Protocol for TLS
              Authentication", RFC 5054, November 2007.

   [RFC5122]  Saint-Andre, P., "Internationalized Resource Identifiers
              (IRIs) and Uniform Resource Identifiers (URIs) for the
              Extensible Messaging and Presence Protocol (XMPP)",
              RFC 5122, February 2008.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

   [RFC5456]  Spencer, M., Capouch, B., Guy, E., Miller, F., and K.
              Shumard, "IAX: Inter-Asterisk eXchange Version 2",
              RFC 5456, February 2010.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol",
              RFC 5661, January 2010.

   [RFC5802]  Newman, C., Menon-Sen, A., Melnikov, A., and N. Williams,
              "Salted Challenge Response Authentication Mechanism
              (SCRAM) SASL and GSS-API Mechanisms", RFC 5802, July 2010.

   [RFC5804]  Melnikov, A. and T. Martin, "A Protocol for Remotely
              Managing Sieve Scripts", RFC 5804, July 2010.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H. and C.
              Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [RFC6452]  Faltstrom, P. and P. Hoffman, "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA) -
              Unicode 6.0", RFC 6452, November 2011.

   [UAX15]    "Unicode Standard Annex #15: Unicode Normalization Forms",
              UAX 15, September 2009.

   [1]

   [2]

   [3]

   [4]

   [5]

Appendix A. Classification of Stringprep Profiles

   A number of the known cases of Stringprep use were evaluated during
   the preparation of this document. The known cases are described
   here in two ways. The types of identifiers the protocol uses are
   first called out in the IDtype column (from Section 4.1.1), using
   the short forms "a" for Absolute, "d" for Definite, and "i" for
   Indefinite. Next, the "User?" column contains an "i" if the
   protocol string comes from user input, an "o" if the protocol string
   becomes user-facing output, "b" if both are true, and "n" if neither
   is true. The remaining columns have an "x" if and only if the
   protocol uses that class, as described in Section 4.3.4. Values
   marked "-" indicate that an answer is not useful; in this case, see
   the detailed discussion in Appendix B.

   +------+--------+-------+-------------+-----------+-----------+
   | RFC  | IDtype | User? | DomainClass | NameClass | FreeClass |
   +------+--------+-------+-------------+-----------+-----------+
   | 3722 | a      | o     |             | x         | x         |
   | 3748 | -      | -     | -           | x         | -         |
   | 3920 | a,d    | b     |             | x         | x         |
   | 4505 | a      | i     |             |           | x         |
   | 4314 | a,d    | b     |             | x         | x         |
   | 4954 | a,d    | b     |             | x         |           |
   | 5034 | a,d    | b     |             | x         |           |
   | 5804 | a,d    | b     |             | x         |           |
   +------+--------+-------+-------------+-----------+-----------+

                                  Table 1

   [[anchor22: This table now contains results of any reviews the WG
   did. Unreviewed things in the tracker are not reflected here.
   --ajs@anvilwalrusden.com]]

Appendix B. Evaluation of Stringprep Profiles

   This section summarizes the evaluation of Stringprep profiles [5]
   that was done to get a good understanding of the usage of
   Stringprep. This summary is neither normative nor a substitute for
   the actual evaluations. Reviewers used a common template so that
   the evaluations could be read coherently.

B.1. iSCSI Stringprep Profiles: RFC3722, RFC3721, RFC3720

   Description: An iSCSI session consists of an Initiator (i.e., host
      or server that uses storage) communicating with a target (i.e., a
      storage array or other system that provides storage). Both the
      iSCSI initiator and target are named by iSCSI Names. The iSCSI
      stringprep profile is used for iSCSI names.
   How it is used: To name iSCSI initiators and targets (see above).
      iSCSI names can also be used to identify SCSI ports (these are
      software entities in the iSCSI protocol, not hardware ports), and
      iSCSI logical units (storage volumes), although both are unusual
      in practice.
   What entities create these identifiers? Generally a Human user (1)
      configures an Automated system (2) that generates the names.
      Advance configuration of the system is required due to the
      embedded use of an external unique identifier (from the DNS or
      IEEE).
   How is the string input in the system? Keyboard and copy-paste are
      common.
      Copy-paste is common because iSCSI names are long enough
      to be problematic for humans to remember, causing use of email,
      sneaker-net, text files, etc. to avoid typing mistakes.
   Where do we place the dividing line between user interface and
      protocol? The iSCSI protocol requires that all i18n string
      preparation occur in the user interface. The iSCSI protocol
      treats iSCSI names as opaque identifiers that are compared
      byte-by-byte for equality. iSCSI names are generally not checked
      for correct formatting by the protocol.
   What entities enforce the rules? There are no iSCSI-specific
      enforcement entities, although the use of unique identifier
      information in the names relies on DNS registrars and the IEEE
      Registration Authority.
   Comparison: Byte-by-byte
   Case Folding, Sensitivity, Preservation: Case folding is required
      for the code blocks specified in RFC 3454, Table B.2. The
      overall iSCSI naming system (UI + protocol) is case-insensitive.
   What is the impact if the comparison results in a false positive?
      Potential access to the wrong storage.
      - If the initiator has no access to the wrong storage, an
        authentication failure is the probable result.
      - If the initiator has access to the wrong storage, the resulting
        misidentification could result in use of the wrong data and
        possible corruption of stored data.
   What is the impact if the comparison results in a false negative?
      Denial of authorized storage access.
   What are the security impacts? iSCSI names are often used as the
      authentication identities for storage systems. Comparison
      problems could result in authentication problems, although note
      that authentication failure ameliorates some of the false positive
      cases.
   Normalization: NFKC, as specified by RFC 3454.
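   As an illustration of the division of labour described above (all
   preparation in the user interface, byte-by-byte comparison in the
   protocol), the following Python sketch folds case and applies NFKC
   before comparing UTF-8 bytes. It is only a sketch under stated
   assumptions, not the iSCSI profile itself: the helper names are
   hypothetical, Python's standard stringprep module (Unicode 3.2
   tables) is assumed, and the mapping and disallowed-character rules
   of the profile are left out.

```python
import stringprep
import unicodedata

# stringprep tables are defined against Unicode 3.2, so use matching data.
_ucd = unicodedata.ucd_3_2_0


def prepare_in_ui(name: str) -> str:
    """Fold case (RFC 3454 Table B.2), then apply NFKC. Hypothetical helper."""
    folded = "".join(stringprep.map_table_b2(ch) for ch in name)
    return _ucd.normalize("NFKC", folded)


def protocol_equal(a: bytes, b: bytes) -> bool:
    """The protocol side sees opaque bytes and compares them as-is."""
    return a == b


# Illustrative names only; they are not taken from any deployment.
typed = "IQN.2001-04.COM.Example:Storage.Disk1"
stored = "iqn.2001-04.com.example:storage.disk1"

same_after_prep = protocol_equal(
    prepare_in_ui(typed).encode("utf-8"), stored.encode("utf-8")
)
same_without_prep = protocol_equal(
    typed.encode("utf-8"), stored.encode("utf-8")
)
```

   In this sketch, skipping the user-interface step yields a mismatch
   for what the user regards as the same name, which is the false
   negative case listed above.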
   Mapping: Yes, as specified by Table B.1 in RFC 3454.
   Disallowed Characters: Only the following characters are allowed:
      - ASCII dash, dot, colon
      - ASCII lower case letters and digits
      - Unicode lower case characters as specified by RFC 3454
      All other characters are disallowed.
   Which other strings or identifiers are these most similar to? None -
      iSCSI names are unique to iSCSI.
   Are these strings or identifiers sometimes the same as strings or
      identifiers from other protocols? No
   Does the identifier have internal structure that needs to be
      respected? Yes - ASCII dot, dash and colon are used for internal
      name structure. These are not reserved characters in that they
      can occur in the name in locations other than those used for
      structuring purposes (e.g., only the first occurrence of a colon
      character is structural, others are not).
   How are users exposed to these strings? How are they published?
      iSCSI names appear in server and storage system configuration
      interfaces. They also appear in system logs.
   Is the string / identifier used as input to other operations?
      Effectively, no. The rarely used port and logical unit names
      involve concatenation, which effectively extends a unique iSCSI
      Name for a target to uniquely identify something within that
      target.
   How much tolerance for change from existing stringprep approach?
      Good tolerance; the community would prefer that i18n experts solve
      i18n problems ;-).
   How strong a desire for change (e.g., for Unicode agility)? Unicode
      agility is desired in principle as long as nothing significant
      breaks.

B.2. SMTP/POP3/ManageSieve Stringprep Profiles: RFC4954, RFC5034,
     RFC5804

   Description: Authorization identity (user identifier) exchanged
      during SASL authentication: AUTH (SMTP/POP3) or AUTHENTICATE
      (ManageSieve) command.
   How It's Used: Used for proxy authorization, e.g.
      to [lawfully] impersonate a particular user after a privileged
      authentication.
   Who Generates It:
      - Typically generated by email system administrators using some
        tools/conventions, sometimes from some backend database.
      - In some setups human users can register their own usernames
        (e.g., webmail self-registration).
   User Input Methods:
      - Typed by user / selected from a list
      - Copy-and-paste
      - Perhaps voice input
      - Can also be specified in configuration files or on a command
        line
   Enforcement:
      - Rules enforced by server / add-on service (e.g., gateway
        service) on registration of account
   Comparison Method: "Type 1" (byte-for-byte) or "type 2" (compare by
      a common algorithm that everyone agrees on (e.g., normalize and
      then compare the result byte-by-byte))
   Case Folding, Sensitivity, Preservation: Most likely case sensitive.
      Exact requirements on case-sensitivity/case-preservation depend on
      a specific implementation, e.g., an implementation might treat all
      user identifiers as case insensitive (or case insensitive for the
      US-ASCII subset only).
   Impact of Comparison:
      False positives:
      - an unauthorized user is allowed email service access (login)
      False negatives:
      - an authorized user is denied email service access
   Normalization: NFKC (as per RFC 4013)
   Mapping: (see Section 2 of RFC 4013 for the full list): Non-ASCII
      spaces are mapped to space, etc.
   Disallowed Characters: (see Section 2 of RFC 4013 for the full
      list): Unicode Control characters, etc.
   String Classes:
      - simple username. See Section 2 of RFC 4013 for details on
        restrictions. Note that some implementations allow spaces in
        these. While implementations are not required to use a
        specific format, an authorization identity frequently has the
        same format as an email address (and EAI email address in the
        future), or as a left hand side of an email address.
        Note: whatever is recommended for SMTP/POP/ManageSieve
        authorization identity should also be used for IMAP
        authorization identities, as IMAP/POP3/SMTP/ManageSieve are
        frequently implemented together.
   Internal Structure: None
   User Output: Unlikely, but possible. For example, if it is the same
      as an email address.
   Operations:
      - Sometimes concatenated with other data and then used as input
        to a cryptographic hash function
   How much tolerance for change from existing stringprep approach? Not
      sure.
   Background information:
      In RFC 5034, when describing the POP3 AUTH command:

         The authorization identity generated by the SASL exchange is a
         simple username, and SHOULD use the SASLprep profile (see
         [RFC4013]) of the StringPrep algorithm (see [RFC3454]) to
         prepare these names for matching. If preparation of the
         authorization identity fails or results in an empty string
         (unless it was transmitted as the empty string), the server
         MUST fail the authentication.

      In RFC 4954, when describing the SMTP AUTH command:

         The authorization identity generated by this [SASL] exchange
         is a "simple username" (in the sense defined in [SASLprep]),
         and both client and server SHOULD (*) use the [SASLprep]
         profile of the [StringPrep] algorithm to prepare these names
         for transmission or comparison. If preparation of the
         authorization identity fails or results in an empty string
         (unless it was transmitted as the empty string), the server
         MUST fail the authentication.

         (*) Note: Future revision of this specification may change
         this requirement to MUST. Currently, the SHOULD is used in
         order to avoid breaking the majority of existing
         implementations.
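
   The server-side rule quoted above (fail the authentication if
   preparation fails, or if it yields an empty string that was not
   transmitted as empty) could be sketched in Python roughly as
   follows. This is an approximation and not a conforming SASLprep
   implementation: the function and exception names are hypothetical,
   Python's standard stringprep module is assumed, and the
   bidirectional and unassigned-code-point checks of RFC 4013 are
   deliberately omitted.

```python
import stringprep
import unicodedata


class PreparationError(ValueError):
    """Raised when a string cannot be prepared (hypothetical name)."""


# Prohibited-output tables listed in RFC 4013, Section 2.3.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def simplified_saslprep(value: str) -> str:
    """Approximate SASLprep; bidi and unassigned-code-point checks omitted."""
    mapped = []
    for ch in value:
        if stringprep.in_table_c12(ch):
            mapped.append(" ")   # non-ASCII spaces are mapped to space
        elif not stringprep.in_table_b1(ch):
            mapped.append(ch)    # Table B.1 characters are mapped to nothing
    # stringprep is defined against Unicode 3.2, so normalize with that data.
    prepared = unicodedata.ucd_3_2_0.normalize("NFKC", "".join(mapped))
    for ch in prepared:
        if any(check(ch) for check in _PROHIBITED):
            raise PreparationError(f"prohibited code point U+{ord(ch):04X}")
    return prepared


def accept_authorization_identity(transmitted: str) -> str:
    """Return the prepared identity, or raise so the caller fails the exchange."""
    prepared = simplified_saslprep(transmitted)
    if prepared == "" and transmitted != "":
        raise PreparationError("preparation produced an empty string")
    return prepared
```

   A caller would treat PreparationError as the signal to fail the
   authentication; an identity transmitted as the empty string passes
   through unchanged, matching the exception in the quoted text.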

      In RFC 5804, when describing the ManageSieve AUTHENTICATE
      command:

         The authorization identity generated by this [SASL] exchange
         is a "simple username" (in the sense defined in [SASLprep]),
         and both client and server MUST use the [SASLprep] profile of
         the [StringPrep] algorithm to prepare these names for
         transmission or comparison. If preparation of the
         authorization identity fails or results in an empty string
         (unless it was transmitted as the empty string), the server
         MUST fail the authentication.

B.3. IMAP Stringprep Profiles: RFC5738, RFC4314: Usernames

   Evaluation Note: These documents have 2 types of strings (usernames
      and passwords), so there are two separate templates.
   Description: "username" parameter to the IMAP LOGIN command,
      identifiers in IMAP ACL commands. Note that any valid username is
      also an IMAP ACL identifier, but IMAP ACL identifiers can include
      other things, such as the name of a group of users.
   How It's Used: Used for authentication (Usernames), or in IMAP
      Access Control Lists (Usernames or Group names)
   Who Generates It:
      - Typically generated by email system administrators using some
        tools/conventions, sometimes from some backend database.
      - In some setups human users can register their own usernames
        (e.g., webmail self-registration).
   User Input Methods:
      - Typed by user / selected from a list
      - Copy-and-paste
      - Perhaps voice input
      - Can also be specified in configuration files or on a command
        line
   Enforcement:
      - Rules enforced by server / add-on service (e.g., gateway
        service) on registration of account
   Comparison Method: "Type 1" (byte-for-byte) or "type 2" (compare by a
      common algorithm that everyone agrees on (e.g., normalize and then
      compare the result byte-by-byte))
   Case Folding, Sensitivity, Preservation:
      - Most likely case sensitive.
        Exact requirements on case-sensitivity/case-preservation depend
        on a specific implementation, e.g., an implementation might
        treat all user identifiers as case insensitive (or case
        insensitive for the US-ASCII subset only).
   Impact of Comparison:
      False positives:
      - an unauthorized user is allowed IMAP access (login)
      - improperly grant privileges (e.g., access to a specific
        mailbox, ability to manage ACLs for a mailbox)
      False negatives:
      - an authorized user is denied IMAP access
      - unable to use granted privileges (e.g., access to a specific
        mailbox, ability to manage ACLs for a mailbox)
   Normalization: NFKC (as per RFC 4013)
   Mapping: (see Section 2 of RFC 4013 for the full list): non-ASCII
      spaces are mapped to space
   Disallowed Characters: (see Section 2 of RFC 4013 for the full
      list): Unicode Control characters, etc.
   String Classes:
      - simple username. See Section 2 of RFC 4013 for details on
        restrictions. Note that some implementations allow spaces in
        these. While IMAP implementations are not required to use a
        specific format, an IMAP username frequently has the same
        format as an email address (and EAI email address in the
        future), or as a left hand side of an email address. Note:
        whatever is recommended for IMAP username should also be used
        for ManageSieve, POP3 and SMTP authorization identities, as
        IMAP/POP3/SMTP/ManageSieve are frequently implemented together.
   Internal Structure: None
   User Output: Unlikely, but possible. For example, if it is the same
      as an email address.
      - access control lists (e.g., in IMAP ACL extension), both when
        managing membership and listing membership of existing access
        control lists.
      - often show up as mailbox names (under Other Users IMAP
        namespace)
   Operations:
      - Sometimes concatenated with other data and then used as input
        to a cryptographic hash function
   How much tolerance for change from existing stringprep approach? Not
      sure. Non-ASCII IMAP usernames are currently prohibited by IMAP
      (RFC 3501). However, they are allowed when used in the IMAP ACL
      extension.

B.4. IMAP Stringprep Profiles: RFC5738: Passwords

   Description: "Password" parameter to the IMAP LOGIN command
   How It's Used: Used for authentication (Passwords)
   Who Generates It: Either generated by email system administrators
      using some tools/conventions, or specified by the human user.
   User Input Methods:
      - Typed by user
      - Copy-and-paste
      - Perhaps voice input
      - Can also be specified in configuration files or on a command
        line
   Enforcement: Rules enforced by server / add-on service (e.g.,
      gateway service or backend database) on registration of account
   Comparison Method: "Type 1" (byte-for-byte)
   Case Folding, Sensitivity, Preservation: Most likely case sensitive.
   Impact of Comparison:
      False positives:
      - an unauthorized user is allowed IMAP access (login)
      False negatives:
      - an authorized user is denied IMAP access
   Normalization: NFKC (as per RFC 4013)
   Mapping: (see Section 2 of RFC 4013 for the full list): non-ASCII
      spaces are mapped to space
   Disallowed Characters: (see Section 2 of RFC 4013 for the full
      list): Unicode Control characters, etc.
   String Classes: Currently defined as "simple username" (see Section
      2 of RFC 4013 for details on restrictions); however, this is
      likely to be a different class from usernames. Note that some
      implementations allow spaces in these. Passwords in all
      email-related protocols should be treated in the same way. The
      same passwords are frequently shared with web, IM, etc.
      applications.
   Internal Structure: None
   User Output:
      - text of email messages (e.g., in "you forgot your password"
        email messages)
      - web page / directory
      - side of the bus / in ads -- possible
   Operations: Sometimes concatenated with other data and then used as
      input to a cryptographic hash function. Frequently stored as is,
      or hashed.
   How much tolerance for change from existing stringprep approach? Not
      sure. Non-ASCII IMAP passwords are currently prohibited by IMAP
      (RFC 3501); however, they are likely to be in widespread use.
   Background information:
      RFC 5738 (IMAP I18N):

         5. UTF8=USER Capability

         If the "UTF8=USER" capability is advertised, that indicates
         the server accepts UTF-8 user names and passwords and applies
         SASLprep [RFC4013] to both arguments of the LOGIN command.
         The server MUST reject UTF-8 that fails to comply with the
         formal syntax in RFC 3629 [RFC3629] or if it encounters
         Unicode characters listed in Section 2.3 of SASLprep RFC 4013
         [RFC4013].

      RFC 4314 (IMAP4 Access Control List (ACL) Extension):

         3. Access control management commands and responses

         Servers, when processing a command that has an identifier as a
         parameter (i.e., any of SETACL, DELETEACL, and LISTRIGHTS
         commands), SHOULD first prepare the received identifier using
         "SASLprep" profile [SASLprep] of the "stringprep" algorithm
         [Stringprep]. If the preparation of the identifier fails or
         results in an empty string, the server MUST refuse to perform
         the command with a BAD response. Note that Section 6
         recommends additional identifier's verification steps.

      and in Section 6:

         This document relies on [SASLprep] to describe steps required
         to perform identifier canonicalization (preparation). The
         preparation algorithm in SASLprep was specifically designed
         such that its output is canonical, and it is well-formed.
         However, due to an anomaly [PR29] in the specification of
         Unicode normalization, canonical equivalence is not guaranteed
         for a select few character sequences. Identifiers prepared
         with SASLprep can be stored and returned by an ACL server.
         The anomaly affects ACL manipulation and evaluation of
         identifiers containing the selected character sequences.
         These sequences, however, do not appear in well-formed text.
         In order to address this problem, an ACL server MAY reject
         identifiers containing sequences described in [PR29] by
         sending the tagged BAD response. This is in addition to the
         requirement to reject identifiers that fail SASLprep
         preparation as described in Section 3.

B.5. Anonymous SASL Stringprep Profiles: RFC4505

   Description: RFC 4505 defines a "trace" field.
   Comparison: This field is not intended for comparison (only used for
      logging).
   Case folding; case sensitivity, preserve case: No case folding/case
      sensitive
   Do users input the strings directly? Yes. Possibly entered in
      configuration UIs, or on a command line. Can also be stored in
      configuration files. The value can also be automatically
      generated by clients (e.g., a fixed string is used, or a user's
      email address).
   How do users input strings? Keyboard/voice, stylus (pick from a
      list). Copy-paste - possibly.
   Normalization: None
   Disallowed Characters: Control characters are disallowed. (See
      Section 3 of RFC 4505)
   Which other strings or identifiers are these most similar to? RFC
      4505 says that the trace "should take one of two forms: an
      Internet email address, or an opaque string that does not contain
      the '@' (U+0040) character and that can be interpreted by the
      system administrator of the client's domain." In practice, this
      is free-form text, so it belongs to a different class from "email
      address" or "username".
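   As a rough illustration of the entries above, the following Python
   sketch checks a candidate trace string against the tables that
   Section 3 of RFC 4505 prohibits and reports which of the two
   suggested forms it resembles. The function names are hypothetical,
   Python's standard stringprep module is assumed, and the
   bidirectional check required by the profile is not performed, so
   this is not a complete implementation of the "trace" profile.

```python
import stringprep

# Tables prohibited by the "trace" profile (RFC 4505, Section 3).
_TRACE_PROHIBITED = (
    stringprep.in_table_c21,  # ASCII control characters
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use characters
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c8,   # change display properties / deprecated
    stringprep.in_table_c9,   # tagging characters
)


def trace_is_acceptable(trace: str) -> bool:
    """True if no character is in a prohibited table (bidi check omitted)."""
    return not any(check(ch) for ch in trace for check in _TRACE_PROHIBITED)


def trace_form(trace: str) -> str:
    """Classify a trace as 'email-like' or 'opaque' by the presence of '@'."""
    return "email-like" if "@" in trace else "opaque"
```

   Note that no mapping or normalization step appears here, in line
   with the "Normalization: None" entry; the string is checked as
   received.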
   Are these strings or identifiers sometimes the same as strings or
      identifiers from other protocols (e.g., does an IM system
      sometimes use the same credentials database for authentication as
      an email system)? Yes: see above. However, there is no strong
      need to keep them consistent in the future.
   How are users exposed to these strings, how are they published?
      They are not published. However, the value can be seen in server
      logs.
   Impacts of false positives and false negatives: False positive: a
      user can be confused with another user. False negative: a single
      user is treated as two distinct users. But note that the trace
      field is not authenticated, so it can be easily falsified.
   Tolerance of changes in the community: The community would be
      flexible.
   Delimiters: No internal structure, but see comments above about
      frequent use of email addresses.
   Background information:
         The Anonymous Mechanism

         The mechanism consists of a single message from the client to
         the server. The client may include in this message trace
         information in the form of a string of [UTF-8]-encoded
         [Unicode] characters prepared in accordance with [StringPrep]
         and the "trace" stringprep profile defined in Section 3 of
         this document. The trace information, which has no semantical
         value, should take one of two forms: an Internet email
         address, or an opaque string that does not contain the '@'
         (U+0040) character and that can be interpreted by the system
         administrator of the client's domain. For privacy reasons, an
         Internet email address or other information identifying the
         user should only be used with permission from the user.

         3. The "trace" Profile of "Stringprep"

         This section defines the "trace" profile of [StringPrep].
         This profile is designed for use with the SASL ANONYMOUS
         Mechanism.
         Specifically, the client is to prepare the message production
         in accordance with this profile.

         The character repertoire of this profile is Unicode 3.2
         [Unicode].

         No mapping is required by this profile.

         No Unicode normalization is required by this profile.

         The list of unassigned code points for this profile is that
         provided in Appendix A of [StringPrep]. Unassigned code
         points are not prohibited.

         Characters from the following tables of [StringPrep] are
         prohibited:
         - C.2.1 (ASCII control characters)
         - C.2.2 (Non-ASCII control characters)
         - C.3 (Private use characters)
         - C.4 (Non-character code points)
         - C.5 (Surrogate codes)
         - C.6 (Inappropriate for plain text)
         - C.8 (Change display properties are deprecated)
         - C.9 (Tagging characters)

         No additional characters are prohibited.

         This profile requires bidirectional character checking per
         Section 6 of [StringPrep].

B.6. XMPP Stringprep Profiles: RFC3920 Nodeprep

   Description: Localpart of JabberID ("JID"), as in:
      localpart@domainpart/resourcepart
   How It's Used:
      - Usernames (e.g., stpeter@jabber.org)
      - Chatroom names (e.g., precis@jabber.ietf.org)
      - Publish-subscribe nodes
      - Bot names
   Who Generates It:
      - Typically, end users via an XMPP client
      - Sometimes created in an automated fashion
   User Input Methods:
      - Typed by user
      - Copy-and-paste
      - Perhaps voice input
      - Clicking a URI/IRI
   Enforcement:
      - Rules enforced by server / add-on service (e.g., chatroom
        service) on registration of account, creation of room, etc.
   Comparison Method: "Type 2" (common algorithm)
   Case Folding, Sensitivity, Preservation:
      - Strings are always folded to lowercase
      - Case is not preserved
   Impact of Comparison:
      False positives:
      - unable to authenticate at server (or authenticate to wrong
        account)
      - add wrong person to buddy list
      - join the wrong chatroom
      - improperly grant privileges (e.g., chatroom admin)
      - subscribe to wrong pubsub node
      - interact with wrong bot
      - allow communication with blocked entity
      False negatives:
      - unable to authenticate
      - unable to add someone to buddy list
      - unable to join desired chatroom
      - unable to use granted privileges (e.g., chatroom admin)
      - unable to subscribe to desired pubsub node
      - unable to interact with desired bot
      - disallow communication with unblocked entity
   Normalization: NFKC
   Mapping: Spaces are mapped to nothing
   Disallowed Characters: ", &, ', /, :, <, >, @
   String Classes:
      - Often similar to generic username
      - Often similar to localpart of email address
      - Sometimes same as localpart of email address
   Internal Structure: None
   User Output:
      - vCard
      - email signature
      - web page / directory
      - text of message (e.g., in a chatroom)
   Operations:
      - Sometimes concatenated with other data and then used as input
        to a cryptographic hash function

B.7. XMPP Stringprep Profiles: RFC3920 Resourceprep

   Description:
      - Resourcepart of JabberID ("JID"), as in:
        localpart@domainpart/resourcepart
      - Typically free-form text
   How It's Used:
      - Device / session names (e.g., stpeter@jabber.org/Home)
      - Nicknames (e.g., precis@jabber.ietf.org/StPeter)
   Who Generates It:
      - Often human users via an XMPP client
      - Often generated in an automated fashion by client or server
   User Input Methods:
      - Typed by user
      - Copy-and-paste
      - Perhaps voice input
      - Clicking a URI/IRI
   Enforcement:
      - Rules enforced by server / add-on service (e.g., chatroom
        service) on account login, joining a chatroom, etc.
   Comparison Method: "Type 2" (byte-for-byte)
   Case Folding, Sensitivity, Preservation:
      - Strings are never folded
      - Case is preserved
   Impact of Comparison:
      False positives:
      - interact with wrong device (e.g., for file transfer or voice
        call)
      - interact with wrong chatroom participant
      - improperly grant privileges (e.g., chatroom moderator)
      - allow communication with blocked entity
      False negatives:
      - unable to choose desired chatroom nick
      - unable to use granted privileges (e.g., chatroom moderator)
      - disallow communication with unblocked entity
   Normalization: NFKC
   Mapping: Spaces are mapped to nothing
   Disallowed Characters: None
   String Classes: Basically a free-form identifier
   Internal Structure: None
   User Output:
      - text of message (e.g., in a chatroom)
      - device names often not exposed to human users
   Operations: Sometimes concatenated with other data and then used as
      input to a cryptographic hash function

B.8. EAP Stringprep Profiles: RFC3748

   Description: RFC 3748 section 5 references Stringprep, but the WG
      did not agree with the text (it was added by the IESG) and there
      are no known implementations that use Stringprep.
      The main problem with that text is that the use of strings is a
      per-method concept, not a generic EAP concept, so RFC 3748 itself
      does not really use Stringprep, but individual EAP methods could.
      As such, the answers to the template questions are mostly not
      applicable, but a few answers are universal across methods.  The
      list of IANA-registered EAP methods is at
      http://www.iana.org/assignments/eap-numbers/eap-numbers.xml#eap-numbers-3
   Comparison Methods:  n/a (per-method)
   Case Folding, Case Sensitivity, Case Preservation:  n/a (per-method)
   Impact of comparison:  A false positive results in unauthorized
      network access (and possibly theft of service if someone else is
      billed).  A false negative results in lack of authorized network
      access (no connectivity).
   User input:  n/a (per-method)
   Normalization:  n/a (per-method)
   Mapping:  n/a (per-method)
   Disallowed characters:  n/a (per-method)
   String classes:  Although some EAP methods may use a syntax similar
      to other types of identifiers, EAP mandates that the actual values
      must not be assumed to be identifiers usable with anything else.
   Internal structure:  n/a (per-method)
   User output:  Identifiers are never displayed to humans, except
      perhaps as they are typed by a human.
   Operations:  n/a (per-method)
   Community considerations:  There is no resistance to change for the
      base EAP protocol (as noted, the WG didn't want the existing
      text).  However, actual use of Stringprep, if any, within specific
      EAP methods may meet resistance.  It is currently unknown whether
      any EAP methods use Stringprep.

Appendix C.  Changes between versions

   Note to RFC Editor: This section should be removed prior to
   publication.

C.1.  00

   First WG version.  Based on
   draft-blanchet-precis-problem-statement-00.

C.2.
01

   o  Made clear that the document is talking only about Unicode code
      points, and not any particular encoding.
   o  Substantially reorganized the document along the lines of the
      review template at .
   o  Included specific questions for each topic for consideration.
   o  Moved spot for individual protocol review to appendix.  Not
      populated yet.

C.3.  02

   o  Cleared up details of comparison classes
   o  Added a section on changes in Unicode

C.4.  03

   o  Aligned comparison discussion with identifier discussion from
      draft-iab-identifier-comparison-00
   o  Added section on classes of strings ("Namey" and so on)

C.5.  04

   Keepalive version

C.6.  05

   o  Changed classes of strings to align with framework doc
   o  Altered table in Appendix A
   o  Added all profile evaluations from the WG wiki in Appendix B

Authors' Addresses

   Marc Blanchet
   Viagenie
   246 Aberdeen
   Quebec, QC  G1R 2E1
   Canada

   Email: Marc.Blanchet@viagenie.ca
   URI:   http://viagenie.ca

   Andrew Sullivan
   Dyn, Inc.
   150 Dow St
   Manchester, NH  03101
   U.S.A.

   Email: asullivan@dyn.com