idnits 2.17.1

draft-saintandre-username-interop-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 31, 2014) is 3671 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '0' on line 369

  -- Looks like a reference, but probably isn't: '1' on line 370

  == Outdated reference: A later version (-23) exists of
     draft-ietf-precis-framework-15

  == Outdated reference: A later version (-12) exists of
     draft-ietf-precis-mappings-07

  == Outdated reference: A later version (-18) exists of
     draft-ietf-precis-saslprepbis-07

  -- Obsolete informational reference (is this intentional?): RFC 821
     (Obsoleted by RFC 2821)

  -- Obsolete informational reference (is this intentional?): RFC 2822
     (Obsoleted by RFC 5322)

  -- Obsolete informational reference (is this intentional?): RFC 4282
     (Obsoleted by RFC 7542)

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                     P. Saint-Andre
Internet-Draft                                                      &yet
Intended status: Informational                            March 31, 2014
Expires: October 2, 2014

  An Interoperable Subset of Characters for Internationalized Usernames
                  draft-saintandre-username-interop-03

Abstract

   Various Internet protocols define constructs for usernames, i.e., the
   localpart of an address such as "localpart@example.com".  This
   document describes a subset of Unicode characters to allow in
   internationalized usernames for the sake of maximal interoperability
   across Internet protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 2, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Subset
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Analysis
   Appendix B.  Acknowledgements
   Author's Address

1.  Introduction

   Various Internet protocols define constructs for usernames, i.e., the
   localpart of an address such as "localpart@example.com".  As further
   described in Appendix A, examples include the localparts of email
   addresses, Kerberos Principal Names, Network Access Identifiers, SIP
   URIs, instant messaging URIs and presence URIs, XMPP addresses, and
   account URIs, as well as certain forms of SASL simple user names (see
   [I-D.ietf-precis-saslprepbis]).  This document describes a subset of
   Unicode characters [UNICODE] to allow in internationalized usernames
   for the sake of maximal interoperability across Internet protocols.
   This subset might prove useful in cases where a provider offers
   multiple services (say, email and instant messaging) using the same
   underlying identifier, or where the same identifier (e.g., an account
   URI) is used when interacting with multiple providers.

2.  Terminology

   Many important terms used in this document are defined in
   [I-D.ietf-precis-framework], [RFC6365], and [UNICODE].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Subset

   The interoperable subset of characters provided here is defined as a
   profile of the PRECIS IdentifierClass specified in
   [I-D.ietf-precis-framework].  In essence, the IdentifierClass
   restricts the allowable characters to letters and digits from all the
   scripts of Unicode [UNICODE] while grandfathering all the characters
   from the ASCII range [RFC20].  The profile defined here,
   "LocalpartIdentifierClass", further restricts the characters from the
   ASCII range to those known to work across existing application
   protocols (as described under Appendix A).

   The syntax is defined as follows using the Augmented Backus-Naur Form
   (ABNF) as specified in [RFC5234].

      localpart   = 1*1023(localpoint)
                    ;
                    ; a "localpoint" is a UTF-8 encoded Unicode code point
                    ; that conforms to the "LocalpartIdentifierClass"
                    ; profile of the PRECIS IdentifierClass

   A "localpart" MUST consist only of Unicode code points that conform
   to the "LocalpartIdentifierClass" profile of the "IdentifierClass"
   base string class defined in [I-D.ietf-precis-framework].  The
   LocalpartIdentifierClass profile includes all code points allowed by
   the IdentifierClass base class, with the exception of the following
   characters, which are disallowed (again, see Appendix A for the
   reasoning behind these restrictions):

      U+0022 (QUOTATION MARK), i.e., '"'

      U+0023 (NUMBER SIGN), i.e., '#'

      U+0025 (PERCENT SIGN), i.e., '%'

      U+0026 (AMPERSAND), i.e., '&'

      U+0027 (APOSTROPHE), i.e., "'"

      U+0028 (LEFT PARENTHESIS), i.e., '('

      U+0029 (RIGHT PARENTHESIS), i.e., ')'

      U+002C (COMMA), i.e., ','

      U+002E (FULL STOP), i.e., '.'

      U+002F (SOLIDUS), i.e., '/'

      U+003A (COLON), i.e., ':'

      U+003B (SEMICOLON), i.e., ';'

      U+003C (LESS-THAN SIGN), i.e., '<'

      U+003E (GREATER-THAN SIGN), i.e., '>'

      U+003F (QUESTION MARK), i.e., '?'

      U+0040 (COMMERCIAL AT), i.e., '@'

      U+005B (LEFT SQUARE BRACKET), i.e., '['

      U+005C (REVERSE SOLIDUS), i.e., '\'

      U+005D (RIGHT SQUARE BRACKET), i.e., ']'

      U+005E (CIRCUMFLEX ACCENT), i.e., '^'

      U+0060 (GRAVE ACCENT), i.e., '`'

      U+007B (LEFT CURLY BRACKET), i.e., '{'

      U+007C (VERTICAL LINE), i.e., '|'

      U+007D (RIGHT CURLY BRACKET), i.e., '}'

   The normalization and mapping rules for the LocalpartIdentifierClass
   are as follows, where the operations specified MUST be completed in
   the order shown:

   1.  Fullwidth and halfwidth characters MUST be mapped to their
       decomposition mappings.

   2.  So-called additional mappings MAY be applied, such as mapping of
       characters that are similar to common delimiters (such as '@',
       ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
       STOP (U+3002) to FULL STOP (U+002E)) and special handling of
       certain characters or classes of characters (e.g., mapping of
       non-ASCII spaces to ASCII space); the PRECIS mappings document
       [I-D.ietf-precis-mappings] describes such mappings in more
       detail.

   3.  Uppercase and titlecase characters MUST be mapped to their
       lowercase equivalents.

   4.  All characters MUST be mapped using Unicode Normalization Form C
       (NFC).

   With regard to directionality, applications MUST apply the "Bidi
   Rule" defined in [RFC5893] (i.e., each of the six conditions of the
   Bidi Rule must be satisfied).

   A localpart MUST NOT be zero octets in length and MUST NOT be more
   than 1023 octets in length.  This rule is to be enforced after any
   normalization and mapping of code points.

4.  IANA Considerations

   The IANA shall add the following entry to the PRECIS Profiles
   Registry:

   Name:  LocalpartIdentifierClass.

   Applicability:  Usernames that are intended to be interoperable
      across multiple application protocols.

   Base Class:  IdentifierClass.

   Replaces:  None.

   Width Mapping:  Map fullwidth and halfwidth characters to their
      decomposition mappings.

   Additional Mappings:  None required or recommended.

   Case Mapping:  Map uppercase and titlecase characters to lowercase.

   Normalization:  NFC.

   Directionality:  The "Bidi Rule" defined in RFC 5893 applies.

   Exclusions:  24 non-alphanumeric characters in the ASCII range.

   Enforcement:  Up to the application protocol or deployment.

   Specification:  this document.  [Note to RFC Editor: please change
      "this document" to the RFC number issued for this specification.]

5.  Security Considerations

   Deploying usernames that are interoperable across multiple protocols
   could potentially give malicious entities multiple ways to attack an
   account or user.

   The security considerations described in [I-D.ietf-precis-framework]
   apply to the "IdentifierClass" base string class used in this
   document.

   The security considerations described in [UTS39] apply to the use of
   Unicode characters.

6.  References

6.1.  Normative References

   [I-D.ietf-precis-framework]
              Saint-Andre, P. and M. Blanchet, "Precis Framework:
              Handling Internationalized Strings in Protocols",
              draft-ietf-precis-framework-15 (work in progress),
              March 2014.

   [RFC20]    Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5893]  Alvestrand, H. and C.
              Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              6.3", 2013.

6.2.  Informative References

   [I-D.ietf-appsawg-acct-uri]
              Saint-Andre, P., "The 'acct' URI Scheme",
              draft-ietf-appsawg-acct-uri-07 (work in progress),
              January 2014.

   [I-D.ietf-precis-mappings]
              Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
              classes", draft-ietf-precis-mappings-07 (work in
              progress), February 2014.

   [I-D.ietf-precis-saslprepbis]
              Saint-Andre, P. and A. Melnikov, "Preparation and
              Comparison of Internationalized Strings Representing
              Usernames and Passwords", draft-ietf-precis-saslprepbis-07
              (work in progress), March 2014.

   [RFC821]   Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
              821, August 1982.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April
              2001.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3856]  Rosenberg, J., "A Presence Event Package for the Session
              Initiation Protocol (SIP)", RFC 3856, August 2004.

   [RFC3860]  Peterson, J., "Common Profile for Instant Messaging
              (CPIM)", RFC 3860, August 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC4120]  Neuman, C., Yu, T., Hartman, S., and K. Raeburn, "The
              Kerberos Network Authentication Service (V5)", RFC 4120,
              July 2005.

   [RFC4282]  Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The
              Network Access Identifier", RFC 4282, December 2005.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
              Internationalization in the IETF", BCP 166, RFC 6365,
              September 2011.

   [UTS39]    The Unicode Consortium, "Unicode Technical Standard #39:
              Unicode Security Mechanisms", July 2012.

Appendix A.  Analysis

   This document takes the following username constructs into
   consideration:

   o  Email addresses [RFC5322]

   o  Kerberos Principal Names [RFC4120]

   o  Network Access Identifiers [RFC4282]

   o  SIP URIs [RFC3261]

   o  Instant messaging URIs [RFC3860] and presence URIs [RFC3856]

   o  XMPP addresses (a.k.a. Jabber Identifiers) [RFC6120]

   o  Account URIs [I-D.ietf-appsawg-acct-uri]

   Each of those address formats defines something that can be used as
   the "localpart" of an address.

   The localpart of an email address uses either the "local-part" or the
   "dot-atom-text" rule in [RFC5322].  Here we make the simplifying
   assumption that the "dot-atom-text" rule applies:

      dot-atom-text   =   1*atext *("." 1*atext)
      atext           =   ALPHA / DIGIT /    ; Any character except
                          "!" / "#" / "$" /  ; controls, SP, and
                          "%" / "&" / "'" /  ; specials.  Used for
                          "*" / "+" / "-" /  ; atoms.
                          "/" / "=" / "?" /
                          "^" / "_" / "`" /
                          "{" / "|" / "}" /
                          "~"

   We make the same simplifying assumption for im: and pres: URIs
   (although their specifications reference [RFC2822]).

   A Kerberos Principal Name is a sequence of strings of type
   KerberosString, where each KerberosString is a GeneralString that is
   constrained to contain only characters in IA5String.

      PrincipalName   ::= SEQUENCE {
              name-type       [0] Int32,
              name-string     [1] SEQUENCE OF KerberosString
      }
      KerberosString  ::= GeneralString (IA5String)

   A Network Access Identifier inherits from [RFC821].
   Here we care only about the "username" rule:

      username    =  dot-string
      dot-string  =  string
      dot-string  =/ dot-string "." string
      string      =  char
      string      =/ string char
      char        =  c
      char        =/ "\" x
      c           =  %x21    ; '!'              allowed
                             ; '"'              not allowed
      c           =/ %x23    ; '#'              allowed
      c           =/ %x24    ; '$'              allowed
      c           =/ %x25    ; '%'              allowed
      c           =/ %x26    ; '&'              allowed
      c           =/ %x27    ; '''              allowed
                             ; '(', ')'         not allowed
      c           =/ %x2A    ; '*'              allowed
      c           =/ %x2B    ; '+'              allowed
                             ; ','              not allowed
      c           =/ %x2D    ; '-'              allowed
                             ; '.'              not allowed
      c           =/ %x2F    ; '/'              allowed
      c           =/ %x30-39 ; '0'-'9'          allowed
                             ; ';', ':', '<'    not allowed
      c           =/ %x3D    ; '='              allowed
                             ; '>'              not allowed
      c           =/ %x3F    ; '?'              allowed
                             ; '@'              not allowed
      c           =/ %x41-5a ; 'A'-'Z'          allowed
                             ; '[', '\', ']'    not allowed
      c           =/ %x5E    ; '^'              allowed
      c           =/ %x5F    ; '_'              allowed
      c           =/ %x60    ; '`'              allowed
      c           =/ %x61-7A ; 'a'-'z'          allowed
      c           =/ %x7B    ; '{'              allowed
      c           =/ %x7C    ; '|'              allowed
      c           =/ %x7D    ; '}'              allowed
      c           =/ %x7E    ; '~'              allowed
                             ; DEL              not allowed
      c           =/ %x80-FF ; UTF-8-Octet      allowed
      x           =  %x00-FF ; all 128 ASCII characters

   The localpart of a sip:/sips: URI inherits from the "userinfo" rule
   in [RFC3986] with several changes; here we discuss the SIP "user"
   rule only:

      user             =  1*( unreserved / escaped / user-unreserved )
      user-unreserved  =  "&" / "=" / "+" / "$" / "," / ";" / "?" / "/"
      unreserved       =  alphanum / mark
      mark             =  "-" / "_" / "." / "!" / "~" / "*" / "'"
                          / "(" / ")"

   The localpart of an XMPP address allows any ASCII character except
   space, controls, and the " & ' / : < > @ characters.
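
   As an illustration only, the XMPP rule just described can be sketched
   in Python.  The names XMPP_DISALLOWED, xmpp_ascii_allowed, and
   allowed_punctuation are hypothetical and are not taken from any
   specification; the sketch covers only the ASCII range and says
   nothing about the non-ASCII code points an XMPP localpart may
   contain.

```python
# Sketch only: models the ASCII portion of the XMPP localpart rule
# described in the text. All names here are hypothetical.

# The eight printable ASCII characters the text lists as excluded.
XMPP_DISALLOWED = frozenset("\"&'/:<>@")


def xmpp_ascii_allowed(ch: str) -> bool:
    """Return True if a single ASCII character may appear in an XMPP localpart."""
    if len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError("expected a single ASCII character")
    code = ord(ch)
    # Controls (U+0000-U+001F), space (U+0020), and DEL (U+007F) are excluded.
    if code <= 0x20 or code == 0x7F:
        return False
    return ch not in XMPP_DISALLOWED


def allowed_punctuation() -> str:
    """Return the allowed non-alphanumeric ASCII characters in code point order."""
    return "".join(
        chr(c)
        for c in range(0x21, 0x7F)
        if not chr(c).isalnum() and xmpp_ascii_allowed(chr(c))
    )


if __name__ == "__main__":
    print(allowed_punctuation())
```

   Under these assumptions, allowed_punctuation() returns the 24
   characters shown in the "A" row for XMPP addresses in Table 1.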

   The 'acct' URI syntax borrows the 'host', 'pct-encoded',
   'sub-delims', and 'unreserved' rules from [RFC3986]:

      acctURI      =  "acct" ":" userpart "@" host
      userpart     =  unreserved / sub-delims
                      0*( unreserved / pct-encoded / sub-delims )

   To summarize the foregoing information, the following table lists the
   allowed and disallowed characters in the localpart of identifiers for
   each protocol (aside from the alphanumeric, space, and control
   characters), in order by hexadecimal character number (where each "A"
   row shows the allowed characters and each "D" row shows the
   disallowed characters).

      Table 1: Allowed and Disallowed Characters (Non-Alphanumeric)

      +---+----------------------------------+
      |    EMAIL ADDRESSES, IM/PRES URIs     |
      +---+----------------------------------+
      | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
      | D |  "     ()  , . :;< > @[\]        |
      +---+----------------------------------+
      |       KERBEROS PRINCIPAL NAMES       |
      +---+----------------------------------+
      | A | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
      | D |                                  |
      +---+----------------------------------+
      |      NETWORK ACCESS IDENTIFIERS      |
      +---+----------------------------------+
      | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
      | D |  "     ()  , . :;< > @[\]        |
      +---+----------------------------------+
      |            SIP/SIPS URIs             |
      +---+----------------------------------+
      | A | !  $ &'()*+,-./ ; = ?     _    ~ |
      | D |  "# %          : < > @[\]^ `{|}  |
      +---+----------------------------------+
      |            XMPP ADDRESSES            |
      +---+----------------------------------+
      | A | ! #$%  ()*+,-.  ; = ?  [\]^_`{|}~ |
      | D |  "   &'       /: < > @           |
      +---+----------------------------------+
      |              ACCT URIs               |
      +---+----------------------------------+
      | A | !  $%&'()*+,-.  ; =    \ ^_`{|}~ |
      | D |  "#           /: < >?@[ ]        |
      +---+----------------------------------+

   The interoperable subset allows only characters that are allowed in
   all of the foregoing formats, as shown in the following table.

      Table 2: Subset Characters (Non-Alphanumeric)

      +---+----------------------------------+
      |         INTEROPERABLE SUBSET         |
      +---+----------------------------------+
      | A | !  $     *+ -     =       _    ~ |
      | D |  "# %&'()  , ./:;< >?@[\]^ `{|}  |
      +---+----------------------------------+

Appendix B.  Acknowledgements

   Thanks to Sean Turner for inspiring the work on this document.
   Thanks also to Paul Hoffman, John Klensin, and Glen Zorn for their
   comments.

Author's Address

   Peter Saint-Andre
   &yet
   P.O. Box 787
   Parker, CO  80134
   USA

   Email: ietf@stpeter.im