idnits 2.17.1
draft-saintandre-username-interop-03.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (March 31, 2014) is 3671 days in the past. Is this
intentional?
Checking references for intended status: Informational
----------------------------------------------------------------------------
-- Looks like a reference, but probably isn't: '0' on line 369
-- Looks like a reference, but probably isn't: '1' on line 370
== Outdated reference: A later version (-23) exists of
draft-ietf-precis-framework-15
== Outdated reference: A later version (-12) exists of
draft-ietf-precis-mappings-07
== Outdated reference: A later version (-18) exists of
draft-ietf-precis-saslprepbis-07
-- Obsolete informational reference (is this intentional?): RFC 821
(Obsoleted by RFC 2821)
-- Obsolete informational reference (is this intentional?): RFC 2822
(Obsoleted by RFC 5322)
-- Obsolete informational reference (is this intentional?): RFC 4282
(Obsoleted by RFC 7542)
Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 6 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group P. Saint-Andre
3 Internet-Draft &yet
4 Intended status: Informational March 31, 2014
5 Expires: October 2, 2014
7 An Interoperable Subset of Characters for Internationalized Usernames
8 draft-saintandre-username-interop-03
10 Abstract
12 Various Internet protocols define constructs for usernames, i.e., the
13 localpart of an address such as "localpart@example.com". This
14 document describes a subset of Unicode characters to allow in
15 internationalized usernames for the sake of maximal interoperability
16 across Internet protocols.
18 Status of This Memo
20 This Internet-Draft is submitted in full conformance with the
21 provisions of BCP 78 and BCP 79.
23 Internet-Drafts are working documents of the Internet Engineering
24 Task Force (IETF). Note that other groups may also distribute
25 working documents as Internet-Drafts. The list of current Internet-
26 Drafts is at http://datatracker.ietf.org/drafts/current/.
28 Internet-Drafts are draft documents valid for a maximum of six months
29 and may be updated, replaced, or obsoleted by other documents at any
30 time. It is inappropriate to use Internet-Drafts as reference
31 material or to cite them other than as "work in progress."
33 This Internet-Draft will expire on October 2, 2014.
35 Copyright Notice
37 Copyright (c) 2014 IETF Trust and the persons identified as the
38 document authors. All rights reserved.
40 This document is subject to BCP 78 and the IETF Trust's Legal
41 Provisions Relating to IETF Documents
42 (http://trustee.ietf.org/license-info) in effect on the date of
43 publication of this document. Please review these documents
44 carefully, as they describe your rights and restrictions with respect
45 to this document. Code Components extracted from this document must
46 include Simplified BSD License text as described in Section 4.e of
47 the Trust Legal Provisions and are provided without warranty as
48 described in the Simplified BSD License.
50 Table of Contents
52 1. Introduction
53 2. Terminology
54 3. Subset
55 4. IANA Considerations
56 5. Security Considerations
57 6. References
58 6.1. Normative References
59 6.2. Informative References
60 Appendix A. Analysis
61 Appendix B. Acknowledgements
62 Author's Address
64 1. Introduction
66 Various Internet protocols define constructs for usernames, i.e., the
67 localpart of an address such as "localpart@example.com". As further
68 described under Appendix A, examples include the localparts of email
69 addresses, Kerberos Principal Names, Network Access Identifiers, SIP
70 URIs, instant messaging URIs and presence URIs, XMPP addresses, and
71 account URIs, as well as certain forms of SASL simple user names (see
72 [I-D.ietf-precis-saslprepbis]). This document describes a subset of
73 Unicode characters [UNICODE] to allow in internationalized usernames
74 for the sake of maximal interoperability across Internet protocols.
75 This subset might prove useful in cases where a provider offers
76 multiple services (say, email and instant messaging) using the same
77 underlying identifier, or where the same identifier (e.g., an account
78 URI) is used when interacting with multiple providers.
80 2. Terminology
82 Many important terms used in this document are defined in
83 [I-D.ietf-precis-framework], [RFC6365], and [UNICODE].
85 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
86 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
87 "OPTIONAL" in this document are to be interpreted as described in
88 [RFC2119].
90 3. Subset
92 The interoperable subset of characters provided here is defined as a
93 profile of the PRECIS IdentifierClass specified in
94 [I-D.ietf-precis-framework]. In essence, the IdentifierClass
95 restricts the allowable characters to letters and digits from all the
96 scripts of Unicode [UNICODE] while grandfathering all the characters
97 from the ASCII range [RFC20]. The profile defined here,
98 "LocalpartIdentifierClass", further restricts the characters from the
99 ASCII range to those known to work across existing application
100 protocols (as described under Appendix A).
102 The syntax is defined as follows using the Augmented Backus-Naur Form
103 (ABNF) as specified in [RFC5234].
105 localpart = 1*1023(localpoint)
106 ;
107 ; a "localpoint" is a UTF-8 encoded Unicode code point
108 ; that conforms to the "LocalpartIdentifierClass"
109 ; profile of the PRECIS IdentifierClass
111 A "localpart" MUST consist only of Unicode code points that conform
112 to the "LocalpartIdentifierClass" profile of the "IdentifierClass"
113 base string class defined in [I-D.ietf-precis-framework]. The
114 LocalpartIdentifierClass profile includes all code points allowed by
115 the IdentifierClass base class, with the exception of the following
116 characters, which are disallowed (again, see Appendix A for the
117 reasoning behind these restrictions):
119 U+0022 (QUOTATION MARK), i.e., '"'
121 U+0023 (NUMBER SIGN), i.e., '#'
123 U+0025 (PERCENT SIGN), i.e., '%'
125 U+0026 (AMPERSAND), i.e., '&'
127 U+0027 (APOSTROPHE), i.e., "'"
129 U+0028 (LEFT PARENTHESIS), i.e., '('
131 U+0029 (RIGHT PARENTHESIS), i.e., ')'
133 U+002C (COMMA), i.e., ','
135 U+002E (FULL STOP), i.e., '.'
137 U+002F (SOLIDUS), i.e., '/'
139 U+003A (COLON), i.e., ':'
141 U+003B (SEMICOLON), i.e., ';'
143 U+003C (LESS-THAN SIGN), i.e., '<'
145 U+003E (GREATER-THAN SIGN), i.e., '>'
146 U+003F (QUESTION MARK), i.e., '?'
148 U+0040 (COMMERCIAL AT), i.e., '@'
150 U+005B (LEFT SQUARE BRACKET), i.e., '['
152 U+005C (REVERSE SOLIDUS), i.e., '\'
154 U+005D (RIGHT SQUARE BRACKET), i.e., ']'
156 U+005E (CIRCUMFLEX ACCENT), i.e., '^'
158 U+0060 (GRAVE ACCENT), i.e., '`'
160 U+007B (LEFT CURLY BRACKET), i.e., '{'
162 U+007C (VERTICAL LINE), i.e., '|'
164 U+007D (RIGHT CURLY BRACKET), i.e., '}'
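As an illustration only, the exclusion list above can be expressed as a
set and checked directly. The following Python sketch is not part of
the profile; the names EXCLUDED and find_excluded are hypothetical, and
the check covers only the 24 excluded ASCII characters, not full
membership in the PRECIS IdentifierClass (which requires the derived
property tables from the PRECIS framework).

```python
# The 24 ASCII characters excluded by the LocalpartIdentifierClass
# profile, in code point order (U+0022 through U+007D).
EXCLUDED = frozenset('"#%&\'(),./:;<>?@[\\]^`{|}')


def find_excluded(localpart):
    """Return the excluded characters found, in order of appearance.

    This checks only the profile's ASCII exclusions; it does not
    verify that the remaining code points are in the IdentifierClass.
    """
    return [ch for ch in localpart if ch in EXCLUDED]
```

For example, find_excluded("juliet.capulet") reports the FULL STOP,
while a string such as "juliet+capulet" yields an empty list.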
166 The normalization and mapping rules for the LocalpartIdentifierClass
167 are as follows, where the operations specified MUST be completed in
168 the order shown:
170 1. Fullwidth and halfwidth characters MUST be mapped to their
171 decomposition mappings.
173 2. So-called additional mappings MAY be applied, such as mapping of
174 characters that are similar to common delimiters (such as '@',
175 ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
176 STOP (U+3002) to FULL STOP (U+002E)) and special handling of
177 certain characters or classes of characters (e.g., mapping of
178 non-ASCII spaces to ASCII space); the PRECIS mappings document
179 [I-D.ietf-precis-mappings] describes such mappings in more
180 detail.
182 3. Uppercase and titlecase characters MUST be mapped to their
183 lowercase equivalents.
185 4. All characters MUST be mapped using Unicode Normalization Form C
186 (NFC).
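The ordered steps above might be sketched in Python as follows. This
is a rough approximation under stated assumptions, not a conformant
PRECIS implementation: the function names are hypothetical, the
optional additional mappings of step 2 are omitted, and step 3 uses
Python's built-in str.lower(), which may differ from the case mapping
a given PRECIS implementation applies (for instance in its handling
of final sigma).

```python
import unicodedata


def map_width(text):
    """Step 1: map fullwidth and halfwidth characters to their
    decomposition mappings (tagged <wide> or <narrow> in the
    Unicode Character Database)."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            ch = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
        out.append(ch)
    return "".join(out)


def prepare_localpart(text):
    """Apply steps 1, 3, and 4 in the order given above."""
    mapped = map_width(text)                      # step 1: width mapping
    # step 2 (additional mappings) is optional and omitted here
    lowered = mapped.lower()                      # step 3: approximation
    return unicodedata.normalize("NFC", lowered)  # step 4: NFC
```

Under these assumptions, a string beginning with FULLWIDTH LATIN
CAPITAL LETTER A (U+FF21) followed by "lice" is prepared as "alice".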
188 With regard to directionality, applications MUST apply the "Bidi
189 Rule" defined in [RFC5893] (i.e., each of the six conditions of the
190 Bidi Rule must be satisfied).
192 A localpart MUST NOT be zero octets in length and MUST NOT be more
193 than 1023 octets in length. This rule is to be enforced after any
194 normalization and mapping of code points.
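The length rule is stated in octets, so a check needs to measure the
UTF-8 encoding rather than the number of code points. A minimal
Python sketch follows; the function name is hypothetical, and the
input is assumed to have already been normalized and mapped.

```python
def has_valid_length(prepared_localpart):
    """Return True if the UTF-8 encoding is 1 to 1023 octets long.

    The argument is assumed to be the string that results after
    normalization and mapping have been applied.
    """
    octets = len(prepared_localpart.encode("utf-8"))
    return 1 <= octets <= 1023
```

Note that 512 repetitions of a two-octet character such as U+00E9
already exceed the limit, although the string has fewer than 1023
code points.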
196 4. IANA Considerations
198 The IANA shall add the following entry to the PRECIS Profiles
199 Registry:
201 Name: LocalpartIdentifierClass.
203 Applicability: Usernames that are intended to be interoperable
204 across multiple application protocols.
206 Base Class: IdentifierClass.
208 Replaces: None.
210 Width Mapping: Map fullwidth and halfwidth characters to their
211 decomposition mappings.
213 Additional Mappings: None required or recommended.
215 Case Mapping: Map uppercase and titlecase characters to lowercase.
217 Normalization: NFC.
219 Directionality: The "Bidi Rule" defined in RFC 5893 applies.
221 Exclusions: 24 non-alphanumeric characters in the ASCII range.
223 Enforcement: Up to the application protocol or deployment.
225 Specification: this document. [Note to RFC Editor: please change
226 "this document" to the RFC number issued for this specification.]
228 5. Security Considerations
230 Deploying usernames that are interoperable across multiple protocols
231 could give malicious entities multiple ways to attack an account or
232 user.
234 The security considerations described in [I-D.ietf-precis-framework]
235 apply to the "IdentifierClass" base string class used in this
236 document.
238 The security considerations described in [UTS39] apply to the use of
239 Unicode characters.
241 6. References
243 6.1. Normative References
245 [I-D.ietf-precis-framework]
246 Saint-Andre, P. and M. Blanchet, "Precis Framework:
247 Handling Internationalized Strings in Protocols", draft-
248 ietf-precis-framework-15 (work in progress), March 2014.
250 [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20,
251 October 1969.
253 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
254 Requirement Levels", BCP 14, RFC 2119, March 1997.
256 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
257 Specifications: ABNF", STD 68, RFC 5234, January 2008.
259 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
260 Internationalized Domain Names for Applications (IDNA)",
261 RFC 5893, August 2010.
263 [UNICODE] The Unicode Consortium, "The Unicode Standard, Version
264 6.3", 2013.
267 6.2. Informative References
269 [I-D.ietf-appsawg-acct-uri]
270 Saint-Andre, P., "The 'acct' URI Scheme", draft-ietf-
271 appsawg-acct-uri-07 (work in progress), January 2014.
273 [I-D.ietf-precis-mappings]
274 Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
275 classes", draft-ietf-precis-mappings-07 (work in
276 progress), February 2014.
278 [I-D.ietf-precis-saslprepbis]
279 Saint-Andre, P. and A. Melnikov, "Preparation and
280 Comparison of Internationalized Strings Representing
281 Usernames and Passwords", draft-ietf-precis-saslprepbis-07
282 (work in progress), March 2014.
284 [RFC821] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
285 821, August 1982.
287 [RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April
288 2001.
290 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
291 A., Peterson, J., Sparks, R., Handley, M., and E.
292 Schooler, "SIP: Session Initiation Protocol", RFC 3261,
293 June 2002.
295 [RFC3856] Rosenberg, J., "A Presence Event Package for the Session
296 Initiation Protocol (SIP)", RFC 3856, August 2004.
298 [RFC3860] Peterson, J., "Common Profile for Instant Messaging
299 (CPIM)", RFC 3860, August 2004.
301 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
302 Resource Identifier (URI): Generic Syntax", STD 66, RFC
303 3986, January 2005.
305 [RFC4120] Neuman, C., Yu, T., Hartman, S., and K. Raeburn, "The
306 Kerberos Network Authentication Service (V5)", RFC 4120,
307 July 2005.
309 [RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The
310 Network Access Identifier", RFC 4282, December 2005.
312 [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
313 October 2008.
315 [RFC6120] Saint-Andre, P., "Extensible Messaging and Presence
316 Protocol (XMPP): Core", RFC 6120, March 2011.
318 [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in
319 Internationalization in the IETF", BCP 166, RFC 6365,
320 September 2011.
322 [UTS39] The Unicode Consortium, "Unicode Technical Standard #39:
323 Unicode Security Mechanisms", July 2012.
326 Appendix A. Analysis
328 This document takes the following username constructs into
329 consideration:
331 o Email addresses [RFC5322]
333 o Kerberos Principal Names [RFC4120]
335 o Network Access Identifiers [RFC4282]
337 o SIP URIs [RFC3261]
338 o Instant messaging URIs [RFC3860] and presence URIs [RFC3856]
340 o XMPP addresses (a.k.a. Jabber Identifiers) [RFC6120]
342 o Account URIs [I-D.ietf-appsawg-acct-uri]
344 Each of those address formats defines something that can be used as
345 the "localpart" of an address.
347 The localpart of an email address uses either the "local-part" or the
348 "dot-atom-text" rule in [RFC5322]. Here we make the simplifying
349 assumption that the "dot-atom-text" rule applies:
351 dot-atom-text = 1*atext *("." 1*atext)
352 atext = ALPHA / DIGIT / ; Any character except
353 "!" / "#" / "$" / ; controls, SP, and
354 "%" / "&" / "'" / ; specials. Used for
355 "*" / "+" / "-" / ; atoms.
356 "/" / "=" / "?" /
357 "^" / "_" / "`" /
358 "{" / "|" / "}" /
359 "~"
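For illustration, the "dot-atom-text" rule above can be transcribed
into a regular expression. This Python sketch is only a transcription
of the two ABNF rules shown (the names ATEXT, DOT_ATOM_TEXT, and
is_dot_atom_text are hypothetical); it does not cover quoted strings
or the obsolete forms that [RFC5322] also permits.

```python
import re

# atext: ALPHA / DIGIT / the 19 printable specials listed in the rule
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(candidate):
    """Return True if the whole string matches dot-atom-text."""
    return DOT_ATOM_TEXT.fullmatch(candidate) is not None
```

A localpart such as "juliet.capulet" matches, whereas strings with a
leading, trailing, or doubled FULL STOP do not.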
361 We make the same simplifying assumption for im: and pres: URIs
362 (although their specifications reference [RFC2822]).
364 A Kerberos Principal Name is a sequence of strings of type
365 KerberosString, where each KerberosString is a GeneralString that is
366 constrained to contain only characters in IA5String.
368 PrincipalName ::= SEQUENCE {
369 name-type [0] Int32,
370 name-string [1] SEQUENCE OF KerberosString
371 }
372 KerberosString ::= GeneralString (IA5String)
374 A Network Access Identifier inherits from [RFC821]. Here we care
375 only about the "username" rule:
377 username = dot-string
378 dot-string = string
379 dot-string =/ dot-string "." string
380 string = char
381 string =/ string char
382 char = c
383 char =/ "\" x
384 c = %x21 ; '!' allowed
385 ; '"' not allowed
386 c =/ %x23 ; '#' allowed
387 c =/ %x24 ; '$' allowed
388 c =/ %x25 ; '%' allowed
389 c =/ %x26 ; '&' allowed
390 c =/ %x27 ; ''' allowed
391 ; '(', ')' not allowed
392 c =/ %x2A ; '*' allowed
393 c =/ %x2B ; '+' allowed
394 ; ',' not allowed
395 c =/ %x2D ; '-' allowed
396 ; '.' not allowed
397 c =/ %x2F ; '/' allowed
398 c =/ %x30-39 ; '0'-'9' allowed
399 ; ';', ':', '<' not allowed
400 c =/ %x3D ; '=' allowed
401 ; '>' not allowed
402 c =/ %x3F ; '?' allowed
403 ; '@' not allowed
404 c =/ %x41-5a ; 'A'-'Z' allowed
405 ; '[', '\', ']' not allowed
406 c =/ %x5E ; '^' allowed
407 c =/ %x5F ; '_' allowed
408 c =/ %x60 ; '`' allowed
409 c =/ %x61-7A ; 'a'-'z' allowed
410 c =/ %x7B ; '{' allowed
411 c =/ %x7C ; '|' allowed
412 c =/ %x7D ; '}' allowed
413 c =/ %x7E ; '~' allowed
414 ; DEL not allowed
415 c =/ %x80-FF ; UTF-8-Octet allowed
416 x = %x00-7F ; all 128 ASCII characters
418 The localpart of a sip:/sips: URI inherits from the "userinfo" rule
419 in [RFC3986] with several changes; here we discuss the SIP "user"
420 rule only:
422 user = 1*( unreserved / escaped / user-unreserved )
423 user-unreserved = "&" / "=" / "+" / "$" / "," / ";" / "?" / "/"
424 unreserved = alphanum / mark
425 mark = "-" / "_" / "." / "!" / "~" / "*" / "'"
426 / "(" / ")"
428 The localpart of an XMPP address allows any ASCII character except
429 space, controls, and the " & ' / : < > @ characters.
431 The 'acct' URI syntax borrows the 'host', 'pct-encoded',
432 'sub-delims', and 'unreserved' rules from [RFC3986]:
434 acctURI = "acct" ":" userpart "@" host
435 userpart = unreserved / sub-delims
436 0*( unreserved / pct-encoded / sub-delims )
438 To summarize the foregoing information, the following table lists the
439 allowed and disallowed characters in the localpart of identifiers for
440 each protocol (aside from the alphanumeric, space, and control
441 characters), in order by hexadecimal character number (where each "A"
442 row shows the allowed characters and each "D" row shows the
443 disallowed characters).
445 Table 1: Allowed and Disallowed Characters (Non-Alphanumeric)
447 +---+----------------------------------+
448 | EMAIL ADDRESSES, IM/PRES URIs        |
449 +---+----------------------------------+
450 | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
451 | D |  "     ()  , . :;< > @[\]        |
452 +---+----------------------------------+
453 | KERBEROS PRINCIPAL NAMES             |
454 +---+----------------------------------+
455 | A | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
456 | D |                                  |
457 +---+----------------------------------+
458 | NETWORK ACCESS IDENTIFIERS           |
459 +---+----------------------------------+
460 | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
461 | D |  "     ()  , . :;< > @[\]        |
462 +---+----------------------------------+
463 | SIP/SIPS URIs                        |
464 +---+----------------------------------+
465 | A | !  $ &'()*+,-./ ; = ?     _    ~ |
466 | D |  "# %          : < > @[\]^ `{|}  |
467 +---+----------------------------------+
468 | XMPP ADDRESSES                       |
469 +---+----------------------------------+
470 | A | ! #$%  ()*+,-.  ; = ? [\]^_`{|}~ |
471 | D |  "   &'       /: < > @           |
472 +---+----------------------------------+
473 | ACCT URIs                            |
474 +---+----------------------------------+
475 | A | !  $%&'()*+,-.  ; =    \ ^_`{|}~ |
476 | D |  "#           /: < >?@[ ]        |
477 +---+----------------------------------+
479 The interoperable subset allows only characters that are allowed in
480 all of the foregoing formats, as shown in the following table.
482 Table 2: Subset Characters (Non-Alphanumeric)
484 +---+----------------------------------+
485 | INTEROPERABLE SUBSET                 |
486 +---+----------------------------------+
487 | A | !  $     *+ -     =       _    ~ |
488 | D |  "# %&'()  , ./:;< >?@[\]^ `{|}  |
489 +---+----------------------------------+
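The derivation of Table 2 from Table 1 is a set intersection: a
character is in the subset only if no format disallows it. The Python
sketch below reproduces that step from the "D" rows of Table 1. The
dictionary keys and function name are hypothetical labels chosen for
this example.

```python
# The 32 non-alphanumeric printable ASCII characters, in code point order.
ALL = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

# Disallowed characters per format, copied from the "D" rows of Table 1.
DISALLOWED = {
    "email-im-pres": '"(),.:;<>@[\\]',
    "kerberos": "",
    "nai": '"(),.:;<>@[\\]',
    "sip": '"#%:<>@[\\]^`{|}',
    "xmpp": '"&\'/:<>@',
    "acct": '"#/:<>?@[]',
}


def interoperable_subset():
    """Return the characters that no format in Table 1 disallows."""
    allowed = set(ALL)
    for disallowed in DISALLOWED.values():
        allowed -= set(disallowed)
    # Preserve code point order for readability.
    return "".join(ch for ch in ALL if ch in allowed)
```

Evaluating interoperable_subset() gives the eight characters in the
"A" row of Table 2, leaving the 24 exclusions listed in Section 3.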
491 Appendix B. Acknowledgements
493 Thanks to Sean Turner for inspiring the work on this document.
494 Thanks also to Paul Hoffman, John Klensin, and Glen Zorn for their
495 comments.
497 Author's Address
499 Peter Saint-Andre
500 &yet
501 P.O. Box 787
502 Parker, CO 80134
503 USA
505 Email: ietf@stpeter.im