Network Working Group                                        M. Blanchet
Internet-Draft                                                  Viagenie
Intended status: Informational                               A. Sullivan
Expires: September 13, 2012                                    Dyn, Inc.
                                                          March 12, 2012

                 Stringprep Revision Problem Statement
               draft-ietf-precis-problem-statement-05.txt

Abstract

   Protocol strings that contain Unicode codepoints and that are
   compared with other strings need to be prepared before comparison.
   Internationalizing Domain Names in Applications (IDNA2003) defined
   and used Stringprep and Nameprep.  Other protocols subsequently
   defined Stringprep profiles.  The revision of IDNA2003, called
   IDNA2008, uses a different approach that relies on neither
   Stringprep nor Nameprep.  The other Stringprep profiles need to be
   similarly updated, or a replacement for Stringprep needs to be
   designed.  This document outlines the issues to be faced by those
   designing a Stringprep replacement.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Stringprep Profiles Limitations
   4.  Major Topics for Consideration
     4.1.  Comparison
       4.1.1.  Types of Identifiers
       4.1.2.  Effect of comparison
     4.2.  Dealing with characters
       4.2.1.  Case folding, case sensitivity, and case preservation
       4.2.2.  Stringprep and NFKC
       4.2.3.  Character mapping
       4.2.4.  Prohibited characters
       4.2.5.  Internal structure, delimiters, and special characters
       4.2.6.  Restrictions because of glyph similarity
     4.3.  Where the data comes from and where it goes
       4.3.1.  User input and the source of protocol elements
       4.3.2.  User output
       4.3.3.  Operations
       4.3.4.  Some useful classes of strings
   5.  Considerations for Stringprep replacement
   6.  Security Considerations
   7.  IANA Considerations
   8.  Discussion home for this draft
   9.  Acknowledgements
   10. Informative References
   Appendix A.  Classification of Stringprep Profiles
   Appendix B.  Evaluation of Stringprep Profiles
     B.1.  iSCSI Stringprep Profiles: RFC3722, RFC3721, RFC3720
     B.2.  SMTP/POP3/ManageSieve Stringprep Profiles: RFC4954, RFC5034,
           RFC5804
     B.3.  IMAP Stringprep Profiles: RFC5738, RFC4314: Usernames
     B.4.  IMAP Stringprep Profiles: RFC5738: Passwords
     B.5.  Anonymous SASL Stringprep Profiles: RFC4505
     B.6.  XMPP Stringprep Profiles: RFC3920 Nodeprep
     B.7.  XMPP Stringprep Profiles: RFC3920 Resourceprep
     B.8.  EAP Stringprep Profiles: RFC3748
   Appendix C.  Changes between versions
     C.1.  00
     C.2.  01
     C.3.  02
     C.4.  03
     C.5.  04
     C.6.  05
   Authors' Addresses

1.  Introduction

   Internationalizing Domain Names in Applications (IDNA2003) [RFC3490],
   [RFC3491], [RFC3492], [RFC3454] describes a mechanism for encoding
   Unicode labels making up Internationalized Domain Names (IDNs) as
   standard DNS labels.  The labels were processed using a method called
   Nameprep [RFC3491] and Punycode [RFC3492].  Nameprep is specific to
   IDNA2003, but it is a profile of a more general framework called
   Stringprep [RFC3454].  Stringprep is used by other protocols with
   similar needs but with different constraints than IDNA2003.

   Stringprep defines a framework within which protocols define their
   Stringprep profiles.  Known IETF specifications using Stringprep are
   listed below:
   o  The Nameprep profile [RFC3491] for use in Internationalized Domain
      Names (IDNs);
   o  NFSv4 [RFC3530] and NFSv4.1 [RFC5661];
   o  The iSCSI profile [RFC3722] for use in Internet Small Computer
      Systems Interface (iSCSI) Names;
   o  EAP [RFC3748];
   o  The Nodeprep and Resourceprep profiles [RFC3920] for use in the
      Extensible Messaging and Presence Protocol (XMPP), and the XMPP to
      CPIM mapping [RFC3922] (the latter of these relies on the former);
   o  The Policy MIB profile [RFC4011] for use in the Simple Network
      Management Protocol (SNMP);
   o  The SASLprep profile [RFC4013] for use in the Simple
      Authentication and Security Layer (SASL), and SASL itself
      [RFC4422];
   o  TLS [RFC4279];
   o  IMAP4 using SASLprep [RFC4314];
   o  The trace profile [RFC4505] for use with the SASL ANONYMOUS
      mechanism;
   o  The LDAP profile [RFC4518] for use with LDAP [RFC4511] and its
      authentication methods [RFC4513];
   o  Plain SASL using SASLprep [RFC4616];
   o  NNTP using SASLprep [RFC4643];
   o  PKIX subject identification using LDAPprep [RFC4683];
   o  Internet Application Protocol Collation Registry [RFC4790];
   o  SMTP Auth using SASLprep [RFC4954];
   o  POP3 Auth using SASLprep [RFC5034];
   o  TLS SRP using SASLprep [RFC5054];
   o  IRI and URI in XMPP [RFC5122];
   o  PKIX CRL using LDAPprep [RFC5280];
   o  IAX using Nameprep [RFC5456];
   o  SASL SCRAM using SASLprep [RFC5802];
   o  Remote management of Sieve using SASLprep [RFC5804];
   o  The unicode-casemap Unicode Collation [RFC5051].

   A review [1] of these protocol specifications found that they are
   very similar and can be grouped into a small number of classes.
   Moreover, many reuse the same Stringprep profile, such as the SASL
   one.

   IDNA2003 was replaced because of limitations described in
   [RFC4690].  The new IDN specification, called IDNA2008 [RFC5890],
   [RFC5891], [RFC5892], [RFC5893], was designed based on the
   considerations found in [RFC5894].  One effect of IDNA2008 is that
   Nameprep and Stringprep are not used at all.  Instead, an algorithm
   based on the Unicode properties of codepoints is defined.  That
   algorithm generates a stable and complete table of the supported
   Unicode codepoints for each Unicode version.  The algorithm takes an
   inclusion-based approach, instead of the exclusion-based approach of
   Stringprep/Nameprep.

   This document lists the shortcomings and issues found by the
   protocols listed above that define Stringprep profiles.  It also
   lists the requirements for any potential replacement of Stringprep.

2.  Conventions

   This document uses the Unicode convention [2] to specify Unicode
   codepoints with the following syntax: U+ABCD, where ABCD is the
   codepoint in hexadecimal.
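   For illustration only, a minimal Python sketch of this notation
   follows; the helper name to_u_notation is hypothetical.  The Unicode
   convention uses at least four hexadecimal digits, so code points
   beyond the Basic Multilingual Plane are written with five or six.

```python
def to_u_notation(ch: str) -> str:
    """Format one code point in the U+ABCD convention (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # Upper-case hexadecimal, zero-padded to at least four digits.
    return f"U+{ord(ch):04X}"


print(to_u_notation("\u00e9"))      # U+00E9
print(to_u_notation("\U0001F600"))  # U+1F600
```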

3.  Stringprep Profiles Limitations

   During IETF 77, a BOF [3] discussed the current state of the
   protocols that have defined Stringprep profiles [NEWPREP].  The main
   conclusions from that discussion were as follows:
   o  Stringprep is bound to version 3.2 of Unicode.  Stringprep has not
      been updated to new versions of Unicode.  Therefore, the protocols
      using Stringprep are stuck with Unicode 3.2.
   o  The protocols need to be updated to support new versions of
      Unicode.  The protocols would like not to be bound to a specific
      version of Unicode, but rather to have better Unicode agility in
      the way of IDNA2008.  This is important partly because it is
      usually impossible for an application to require Unicode 3.2; the
      application gets whatever version of Unicode is available on the
      host.
   o  The protocols require better bidirectional (bidi) support than
      Stringprep currently offers.
   o  If the protocols are updated to use a new version of Stringprep or
      another framework, then backward compatibility is an important
      requirement.  For example, Stringprep is based on, and profiles
      may use, NFKC [UAX15], while IDNA2008 mostly uses NFC [UAX15].
   o  Identifiers are passed between protocols.  For example, the same
      username string of codepoints may be passed between SASL, XMPP,
      LDAP, and EAP.  Therefore, a common set of rules or classes of
      strings is preferred over specific rules for each protocol.
      Without real planning in advance, many Stringprep profiles reuse
      other profiles, so this goal was accomplished by accident with
      Stringprep.
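   The binding to Unicode 3.2 can be observed with Python's standard
   library, whose stringprep module evaluates the RFC 3454 tables
   against the Unicode 3.2 database (unicodedata.ucd_3_2_0).  In this
   sketch, U+20B9 INDIAN RUPEE SIGN, an arbitrary example of a
   character added to Unicode long after version 3.2, is an ordinary
   currency symbol to the host but an unassigned code point (RFC 3454
   table A.1) to Stringprep.

```python
import stringprep
import unicodedata

ch = "\u20b9"  # INDIAN RUPEE SIGN, added to Unicode after version 3.2

# The Unicode 3.2 database that Python's stringprep module relies on:
print(unicodedata.ucd_3_2_0.category(ch))  # Cn (unassigned)
print(stringprep.in_table_a1(ch))          # True: RFC 3454 table A.1

# The host's current Unicode database:
print(unicodedata.unidata_version)
print(unicodedata.category(ch))            # Sc (currency symbol)
```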

   Protocols that use Stringprep profiles use strings for different
   purposes:
   o  XMPP uses a different Stringprep profile for each part of the XMPP
      address (JID): a localpart, which is similar to a username and is
      used for authentication; a domainpart, which is a domain name; and
      a resourcepart, which is less restrictive than the localpart.
   o  iSCSI uses a Stringprep profile for the IQN, which is very similar
      to (and often is) a DNS domain name.
   o  SASL and LDAP use a Stringprep profile for usernames.
   o  LDAP uses a set of Stringprep profiles.

   The consensus [4] of the BOF attendees is that it would be highly
   desirable to have a replacement for Stringprep with characteristics
   similar to IDNA2008.  That replacement should be defined so that the
   protocols could use internationalized strings without a lot of
   specialized internationalization work, since internationalization
   expertise is not available in the respective protocols or working
   groups.

4.  Major Topics for Consideration

   This section provides an overview of major topics that a Stringprep
   replacement needs to address.  The headings correspond roughly to
   the categories under which known Stringprep-using protocol RFCs have
   been evaluated.  For the details of those evaluations, see
   Appendix B.

4.1.  Comparison

4.1.1.  Types of Identifiers

   Following [I-D.iab-identifier-comparison], it is possible to organize
   identifiers into three classes with respect to how they may be
   compared with one another:

   Absolute Identifiers  Identifiers that can be compared byte-by-byte
      for equality.
   Definite Identifiers  Identifiers that have a well-defined comparison
      algorithm on which all parties agree.
   Indefinite Identifiers  Identifiers that have no single comparison
      algorithm on which all parties agree.

   Definite Identifiers include cases like the comparison of Unicode
   code points in different encodings: they do not match byte for byte,
   but can all be converted to a single encoding, which then does match
   byte for byte.  Indefinite Identifiers are sometimes algorithmically
   comparable by well-specified subsets of parties.  For more discussion
   of these categories, see [I-D.iab-identifier-comparison].
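   A minimal Python sketch of a Definite Identifier comparison follows,
   assuming each identifier arrives as bytes together with a known
   encoding: the raw octets differ, but decoding both to code points
   yields one agreed form.  The function name same_identifier is
   hypothetical, and it deliberately performs no normalization.

```python
def same_identifier(a: bytes, enc_a: str, b: bytes, enc_b: str) -> bool:
    """Compare two encoded identifiers by code points (hypothetical).

    No normalization is applied, so precomposed and decomposed forms
    of the same visible text still compare as different.
    """
    return a.decode(enc_a) == b.decode(enc_b)


name = "Zo\u00eb"  # "Zoe" with diaeresis, precomposed U+00EB
utf8 = name.encode("utf-8")
utf16 = name.encode("utf-16-be")

print(utf8 == utf16)                                        # False
print(same_identifier(utf8, "utf-8", utf16, "utf-16-be"))  # True
```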

   The treatment of the existing known cases in Appendix A uses the
   categories above.

4.1.2.  Effect of comparison

   The three classes of comparison style outlined in Section 4.1.1 may
   have different effects when applied.  It is necessary to evaluate the
   effects of a comparison that yields a false positive, and of one that
   yields a false negative, especially in terms of the consequences for
   security and usability.

4.2.  Dealing with characters

   This section outlines a range of issues having to do with characters
   in the target protocols, and the ways in which IDNA2008 might be a
   good analogy for other protocols and the ways in which it might be a
   poor one.

4.2.1.  Case folding, case sensitivity, and case preservation

   In IDNA2003, labels are always mapped to lower case before the
   Punycode transformation.  In IDNA2008, there is no mapping at all:
   input is either a valid U-label or it is not.  At the same time,
   upper-case characters are by definition not valid in U-labels,
   because they fall into the Unstable category (category B) of
   [RFC5892].

   If there are protocols that require upper and lower case to be
   preserved, then the analogy with IDNA2008 breaks down.  Accordingly,
   existing protocols are to be evaluated according to the following
   criteria:

   1.  Does the protocol use case folding?  For all blocks of code
       points, or just for certain subsets?
   2.  Is the system or protocol case sensitive?
   3.  Does the system or protocol preserve case?
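   The contrast between the two IDNA generations described at the start
   of this section can be seen in a short Python sketch.  Python's
   built-in "idna" codec implements IDNA2003 (RFC 3490), so upper- and
   lower-case input produce the same ACE label.  The helper is_unstable
   is a hypothetical, simplified rendering of the RFC 5892 Unstable
   rule; it uses str.casefold() for the case-folding step and the
   host's Unicode version rather than a pinned one.

```python
import unicodedata


def idna2003_label(label: str) -> bytes:
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490);
    # non-ASCII labels pass through Nameprep, which case-folds them.
    return label.encode("idna")


def is_unstable(cp: str) -> bool:
    # Simplified RFC 5892 category B (Unstable) test:
    # toNFKC(toCaseFold(toNFKC(cp))) != cp, with str.casefold()
    # standing in for toCaseFold and the host's Unicode version.
    folded = unicodedata.normalize("NFKC", cp).casefold()
    return unicodedata.normalize("NFKC", folded) != cp


# IDNA2003: both spellings map to the same ACE label.
print(idna2003_label("B\u00fccher"))  # b'xn--bcher-kva'
print(idna2003_label("b\u00fccher"))  # b'xn--bcher-kva'

# IDNA2008-style derivation: "B" is Unstable, so it is simply not
# valid in a U-label; nothing maps it to "b".
print(is_unstable("B"), is_unstable("b"))  # True False
```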

4.2.2.  Stringprep and NFKC

   Stringprep profiles may use normalization.  If they do, they use NFKC
   [UAX15] (most profiles do).  It is not clear that NFKC is the right
   normalization to use in all cases.  [UAX15] makes the following
   observation regarding Normalization Forms KC and KD: "It is best to
   think of these Normalization Forms as being like uppercase or
   lowercase mappings: useful in certain contexts for identifying core
   meanings, but also performing modifications to the text that may not
   always be appropriate."  For things like the spelling of users'
   names, then, NFKC may not be the best form to use.  At the same time,
   one of the nice things about NFKC is that it deals with the width of
   characters that are otherwise similar, by mapping half-width and
   full-width variants to their ordinary forms.  This mapping step can
   be crucial in practice.  A replacement for Stringprep depends on
   analyzing the different use profiles and considering whether NFKC or
   NFC is a better normalization for each profile.

   For the purposes of evaluating an existing example of Stringprep use,
   it is helpful to know whether it uses no normalization, NFKC, or NFC.
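   The difference between the two forms is easy to see with Python's
   unicodedata module.  In this sketch, NFC leaves all three sample
   characters unchanged, while NFKC maps the full-width and half-width
   forms to their ordinary counterparts (the width folding described
   above) and also splits the "fi" ligature into two letters, the kind
   of modification that [UAX15] warns may not always be appropriate.

```python
import unicodedata

samples = {
    "FULLWIDTH LATIN CAPITAL LETTER A": "\uff21",
    "HALFWIDTH KATAKANA LETTER A": "\uff71",
    "LATIN SMALL LIGATURE FI": "\ufb01",
}

for name, s in samples.items():
    nfc = unicodedata.normalize("NFC", s)    # canonical only: unchanged
    nfkc = unicodedata.normalize("NFKC", s)  # compatibility mappings applied
    print(f"{name}: NFC={nfc!r} NFKC={nfkc!r}")
```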

4.2.3.  Character mapping

   Along with the case mapping issues raised in Section 4.2.1, there is
   the question of whether some characters are mapped either to other
   characters or to nothing during Stringprep.  [RFC3454], Section 3,
   outlines a number of characters that are mapped to nothing, and also
   permits Stringprep profiles to define their own mappings.
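   Python's standard stringprep module exposes the RFC 3454 tables, so
   the "commonly mapped to nothing" step (table B.1) can be sketched as
   below.  The function name map_to_nothing is hypothetical, and a real
   profile would apply its own additional mappings as well.

```python
import stringprep


def map_to_nothing(s: str) -> str:
    """Drop every code point listed in RFC 3454 table B.1."""
    return "".join(ch for ch in s if not stringprep.in_table_b1(ch))


# U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE are both in table B.1.
print(map_to_nothing("user\u00adname\u200b"))  # username
```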

4.2.4.  Prohibited characters

   Along with case folding and other character mappings, many protocols
   have characters that are simply disallowed.  For example, control
   characters and special characters such as "@" or "/" may be
   prohibited in a protocol.

   One of the primary changes of IDNA2008 is in the way it approaches
   Unicode code points.  IDNA2003 created an explicit list of excluded
   or mapped-away characters; anything in Unicode 3.2 that was not so
   listed could be assumed to be allowed under the protocol.  IDNA2008
   begins instead from the assumption that code points are disallowed,
   and then relies on Unicode properties to derive whether a given code
   point actually is allowed in the protocol.
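   The two philosophies can be contrasted in a short Python sketch.
   Everything here is hypothetical and heavily simplified: the
   prohibited list is an invented three-character example, and the
   inclusion test keeps only two steps of the RFC 5892 derivation (the
   Unstable check and the LetterDigits general categories), omitting
   exceptions, contextual rules, and the other categories.  Both tests
   use the host's Unicode data.

```python
import unicodedata

# Exclusion-based (Stringprep style): allowed unless explicitly listed.
PROHIBITED = {"@", "/", " "}  # hypothetical, tiny list


def allowed_by_exclusion(ch: str) -> bool:
    return ch not in PROHIBITED


# Inclusion-based (IDNA2008 style): disallowed unless the code point's
# properties place it in an allowed class.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def allowed_by_inclusion(ch: str) -> bool:
    folded = unicodedata.normalize("NFKC", ch).casefold()
    if unicodedata.normalize("NFKC", folded) != ch:
        return False  # Unstable, e.g. upper-case letters
    return unicodedata.category(ch) in LETTER_DIGITS


snowman = "\u2603"  # SNOWMAN, general category So
print(allowed_by_exclusion(snowman), allowed_by_inclusion(snowman))  # True False
print(allowed_by_exclusion("a"), allowed_by_inclusion("a"))          # True True
```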

   Moreover, IDNA2008 (unlike IDNA2003) has more than one class of
   "allowed in the protocol".  While some code points are disallowed
   outright, some are allowed only in certain contexts.  The reasons for
   the context-dependent rules have to do with the way some characters
   are used.  For instance, the ZERO WIDTH JOINER and ZERO WIDTH
   NON-JOINER (ZWJ, U+200D and ZWNJ, U+200C) are allowed with contextual
   rules because they are required in some circumstances, yet are
   classified by Unicode as format characters (General_Category Cf) and
   would therefore be DISALLOWED under the usual IDNA2008 derivation
   rules.  The goal of IDNA2008 is to provide the widest repertoire of
   code points consistent with the traditional DNS LDH rule, trusting
   the operators of individual zones to make sensible (and usually more
   restrictive) policies for their zones.
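   A contextual rule is a predicate over a code point's neighbours
   rather than over the code point alone.  The Python sketch below
   implements only the first clause of the RFC 5892 CONTEXTJ rules (the
   joiner is acceptable when the preceding code point has
   Canonical_Combining_Class Virama, value 9); the additional
   joining-type clause that RFC 5892 defines for ZWNJ is omitted, and
   the function name joiner_ok is hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA = 9       # Canonical_Combining_Class value of Virama


def joiner_ok(label: str, i: int) -> bool:
    """Simplified CONTEXTJ test for the joiner at label[i] (hypothetical).

    Only the Virama clause of RFC 5892 Appendix A.1/A.2 is implemented;
    the joining-type clause that A.1 adds for ZWNJ is omitted.
    """
    if label[i] not in (ZWNJ, ZWJ):
        raise ValueError("label[i] is not ZWJ or ZWNJ")
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA


# DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, ZWJ
print(joiner_ok("\u0915\u094d\u200d", 2))  # True
print(joiner_ok("a\u200d", 1))             # False
```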

   IDNA2008 may be a poor model for what other protocols ought to do in
   this case, because it supports an old protocol that operates on the
   scale of the entire Internet.  Moreover, IDNA2008 is intended to be
   deployed without any change to the base DNS protocol.  Other
   protocols may aim at deployment in more local environments, or may
   have protocol version negotiation built in.

4.2.5.  Internal structure, delimiters, and special characters

   IDNA2008 has a special problem with delimiters, because the delimiter
   "character" in the DNS wire format is not really part of the data.
   In the DNS wire format, labels are not separated by a delimiter at
   all; instead, each label carries with it an indicator that says how
   long the label is.  When the label is presented in presentation
   format as part of a fully qualified domain name, the label separator
   FULL STOP, U+002E (.) is used to break up the labels.  But because
   that label separator does not travel with the wire format of the
   domain name, there is no way to encode a different,
   "internationalized" separator in IDNA2008.
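   A minimal Python sketch of the wire encoding, restricted to ASCII
   labels and ignoring name compression, shows why the separator is not
   part of the data: each FULL STOP is replaced by a length octet, and a
   zero-length label ends the name.  The function name to_wire is
   hypothetical.

```python
def to_wire(name: str) -> bytes:
    """Encode an ASCII domain name as DNS wire-format labels (hypothetical)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        data = label.encode("ascii")
        if not 0 < len(data) < 64:
            raise ValueError("each label must be 1 to 63 octets")
        out.append(len(data))  # a length octet takes the place of "."
        out += data
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


print(to_wire("www.example.com"))  # b'\x03www\x07example\x03com\x00'
```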

   Other protocols may include characters with similar special meaning
   within the protocol.  Common characters for these purposes include
   FULL STOP, U+002E (.); COMMERCIAL AT, U+0040 (@); HYPHEN-MINUS,
   U+002D (-); SOLIDUS, U+002F (/); and LOW LINE, U+005F (_).  The mere
   inclusion of such a character in the protocol is not enough for it to
   be considered similar to another protocol using the same character;
   instead, handling of the character must be taken into consideration
   as well.

   An important issue to tackle here is whether it is valuable to map to
   or from these special characters as part of the Stringprep
   replacement.  In some locales, the analogue to FULL STOP, U+002E is
   some other character, and users may expect to be able to substitute
   their normal stop for FULL STOP, U+002E.  At the same time, there are
   predictability arguments in favour of treating identifiers with FULL
   STOP, U+002E in them just the way they are treated under IDNA2008.
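   IDNA2003 itself took the mapping approach for separators: RFC 3490,
   Section 3.1, treats IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH FULL
   STOP (U+FF0E), and HALFWIDTH IDEOGRAPHIC FULL STOP (U+FF61) as
   equivalent to FULL STOP, U+002E.  The Python sketch below shows such
   a mapping as a hypothetical pre-processing function, together with
   the same behaviour in Python's built-in IDNA2003 codec.

```python
# Label separators recognized by IDNA2003 (RFC 3490, Section 3.1).
IDNA2003_STOPS = {"\u002e", "\u3002", "\uff0e", "\uff61"}


def map_stops(name: str) -> str:
    """Map every IDNA2003 label separator to FULL STOP (hypothetical)."""
    return "".join("." if ch in IDNA2003_STOPS else ch for ch in name)


print(map_stops("example\u3002com"))      # example.com
# Python's "idna" codec (IDNA2003) splits on the same separators.
print("example\u3002com".encode("idna"))  # b'example.com'
```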

4.2.6.  Restrictions because of glyph similarity

   Homoglyphs are similarly (or identically) rendered glyphs of
   different codepoints.  For DNS names, homoglyphs may enable phishing.
   If a protocol requires some visual comparison by end-users, then the
   issue of homoglyphs needs to be considered.  In the DNS context, these
   issues are documented in [RFC5894] and [RFC4690].  IDNA2008 does not,
   however, have a mechanism to deal with them, trusting DNS zone
   operators to enact sensible policies for the subset of Unicode they
   wish to support, given their user community.  A similar policy/
   protocol split may not be desirable in every protocol.
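   The Python sketch below shows why visual comparison cannot be
   replaced by code point comparison alone: a string containing U+0430
   CYRILLIC SMALL LETTER A looks like its all-Latin counterpart but is a
   different sequence of code points.  The mixed-script test is a crude,
   hypothetical heuristic that reads the first word of each letter's
   Unicode character name; a real check would use the Unicode Script
   property (which Python's unicodedata does not expose) and the
   confusables data of Unicode Technical Standard #39.

```python
import unicodedata


def name_scripts(s: str) -> set:
    """Crude script guess from the first word of each letter's name."""
    return {unicodedata.name(ch).split()[0] for ch in s if ch.isalpha()}


latin = "example"
spoof = "ex\u0430mple"  # third letter is U+0430 CYRILLIC SMALL LETTER A

print(latin == spoof)               # False
print(sorted(name_scripts(latin)))  # ['LATIN']
print(sorted(name_scripts(spoof)))  # ['CYRILLIC', 'LATIN']
```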

4.3.  Where the data comes from and where it goes

4.3.1.  User input and the source of protocol elements

   Some protocol elements are provided by users, and others are not.
   Those that are not may presumably be subject to greater restrictions,
   whereas those that users provide likely need to permit the broadest
   range of code points.  The following questions are helpful:

   1.  Do users input the strings directly?
   2.  If so, how (keyboard, stylus, voice, copy-paste, etc.)?
   3.  Where do we place the dividing line between user interface and
       protocol (see [RFC5895])?

4.3.2.  User output

   Just as only some protocol elements are expected to be entered
   directly by users, only some protocol elements are intended to be
   consumed directly by users.  It is important to know how users are
   expected to be able to consume the protocol elements, because
   different environments present different challenges.  An element that
   is only ever delivered as part of a vCard remains in machine-readable
   format, so the problem of visual confusion is not a great one.  Is
   the protocol element published as part of a vCard, a web directory,
   on a business card, or on "the side of a bus"?  Do users use the
   protocol element as an identifier (which means that they might enter
   it again in some other context)?  (See also Section 4.2.6.)

4.3.3.  Operations

   Some strings are useful as part of the protocol but are not used as
   input to other operations (for instance, purely informative or
   descriptive text).  Other strings are used directly as input to other
   operations (such as cryptographic hash functions), or are combined
   with other strings (such as concatenating a string with some others
   to form a unique identifier).
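   Strings used as input to operations such as hashing must be prepared
   before the operation, because the operation sees only code points.
   In the Python sketch below, two renderings of "cafe" with an acute
   accent that look identical (precomposed U+00E9 versus "e" followed by
   U+0301 COMBINING ACUTE ACCENT) produce different SHA-256 digests
   until both are normalized.  NFC is chosen only as an example
   preparation step, and the helper names are hypothetical.

```python
import hashlib
import unicodedata


def digest(s: str) -> str:
    """SHA-256 of the UTF-8 encoding of s (hypothetical helper)."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def prepare(s: str) -> str:
    """Example preparation step: NFC normalization only."""
    return unicodedata.normalize("NFC", s)


composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" + U+0301 COMBINING ACUTE ACCENT

print(digest(composed) == digest(decomposed))                    # False
print(digest(prepare(composed)) == digest(prepare(decomposed)))  # True
```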

4.3.3.1.  String classes

   Strings often have a similar function in different protocols.  For
   instance, many different protocols contain user identifiers or
   passwords.  A single profile for all such uses might be desirable.

   Often, a string in a protocol is effectively a protocol element from
   another protocol.  For instance, different systems might use the same
   credentials database for authentication.

4.3.3.2.  Community Considerations

   A Stringprep replacement that does anything more than just update
   Stringprep to the latest version of Unicode will probably entail some
   changes.  It is important to identify the willingness of the
   protocol-using community to accept backwards-incompatible changes.
   By the same token, it is important to evaluate the desire of the
   community for features not available under Stringprep.

4.3.3.3.  Unicode Incompatible Changes

   IDNA2008 uses an algorithm to derive the validity of a Unicode code
   point for use under IDNA2008.  It does this by using the properties
   of each code point to test its validity.

   This approach depends crucially on the idea that code points, once
   valid for a protocol profile, will not later be made invalid.  That
   is not a guarantee currently provided by Unicode.  Properties of code
   points may change between versions of Unicode.  Rarely, such a change
   could cause a given code point to become invalid under a protocol
   profile, even though the code point would be valid with an earlier
   version of Unicode.  This is not merely a theoretical possibility,
   because it has occurred ([RFC6452]).

   Accordingly, as with IDNA2008, a Stringprep replacement that intends
   to be Unicode version agnostic will need to work out a mechanism to
   address cases where incompatible changes occur because of new
   Unicode versions.

488	4.3.4.  Some useful classes of strings

490	   With the above considerations in hand, we can usefully classify
491	   strings into the following categories:

493	   DomainClass  Strings that are intended for use in a domain name slot,
494	      as defined in [RFC5890].  Note that strings of DomainClass could
495	      be used outside a domain name slot: the question here is what the
496	      eventual intended use for the string is, and not whether the
497	      string is actually functioning as a domain name at any moment.
498	   NameClass  Strings that are intended for use as identifiers but that
499	      are not DomainClass strings.  NameClass strings are normally
500	      public data within the protocol where they are used: these are
501	      intended as identifiers that can be passed around to identify
502	      something.
503	   FreeClass  Strings that are intended to be used by the protocol as
504	      free-form strings, but that have some significant handling within
505	      the protocol.  This includes things that are normally not public
506	      data in a protocol (like passwords), and things that might have
507	      additional restrictions within the protocol in question, such as a
508	      friendly name in a chat room.
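   To make the distinction between the classes concrete, the Python
   sketch below models them as an enumeration with a property-based
   acceptance test for NameClass and FreeClass.  The General_Category
   sets assigned to each class are hypothetical choices made only to
   show that classes can be defined by code point properties rather
   than by a fixed whitelist; this document does not define them.
   DomainClass strings are left to IDNA2008 [RFC5891] and are not
   modeled.

```python
import unicodedata
from enum import Enum


class StringClass(Enum):
    DOMAIN = "DomainClass"  # intended for a domain name slot [RFC5890]
    NAME = "NameClass"      # other identifiers, normally public data
    FREE = "FreeClass"      # free-form strings such as passwords


# Hypothetical property sets, for illustration only.
_NAME_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Nd"}
_FREE_CATEGORIES = _NAME_CATEGORIES | {
    "Nl", "No", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Sm", "Sc", "Sk", "So", "Zs",
}


def is_valid(s: str, cls: StringClass) -> bool:
    """Return True if s is non-empty and every code point is allowed."""
    if cls is StringClass.DOMAIN:
        # Domain names follow IDNA2008; not modeled in this sketch.
        raise NotImplementedError("use an IDNA2008 implementation")
    allowed = _NAME_CATEGORIES if cls is StringClass.NAME else _FREE_CATEGORIES
    return bool(s) and all(unicodedata.category(ch) in allowed for ch in s)


if __name__ == "__main__":
    print(is_valid("juliet", StringClass.NAME))          # True
    print(is_valid("Juliet Capulet", StringClass.NAME))  # False: space is Zs
    print(is_valid("Juliet Capulet", StringClass.FREE))  # True
```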

510	5.  Considerations for Stringprep replacement

512	   The above suggests the following guidance for replacing Stringprep:
513	   o  A Stringprep replacement should be defined.
514	   o  The replacement should take an approach similar to IDNA2008 (e.g.,
515	      by using code point properties instead of code point
516	      whitelisting), since doing so enables better Unicode agility.
517	   o  Many protocols use strings with similar characteristics.
518	      Therefore, defining i18n preparation algorithms for the smallest
519	      possible set of string classes may be sufficient for most cases,
520	      providing coherence among a set of related protocols or protocols
521	      where identifiers are exchanged.
522	   o  The sets of string classes need to be evaluated according to the
523	      considerations that make up the headings in Section 4.
524	   o  It is reasonable to limit the scope to Unicode code points, and to
525	      rule the mapping of data from other character encodings outside
526	      the scope of this effort.
527	   o  Recommendations for handling protocol incompatibilities resulting
528	      from changes to Unicode are required.
529	   o  Compatibility within each protocol between a technique that is
530	      Stringprep-based and the technique's replacement has to be
531	      considered very carefully.

533	   Existing deployments already depend on Stringprep profiles.
534	   Therefore, a replacement must consider the effects of any new
535	   strategy on existing deployments.  By way of comparison, it is worth
536	   noting that some characters were acceptable in IDNA labels under
537	   IDNA2003, but are not protocol-valid under IDNA2008 (and conversely);
538	   disagreement about what to do during the transition has resulted in
539	   different approaches to mapping.  Different implementers may make
540	   different decisions about what to do in such cases; this could have
541	   interoperability effects.  It is necessary to trade better support
542	   for different linguistic environments against the potential side
543	   effects of backward incompatibility.
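   One way an implementer might gauge these effects before a transition
   is to run both the old and the new preparation over identifiers that
   a deployment already stores, and list those whose outcome would
   change.  The Python sketch below assumes two hypothetical callables,
   "old_prepare" (the existing Stringprep profile) and "new_prepare"
   (its replacement), each returning the prepared string or raising
   ValueError on rejection; the toy stand-ins in the example are
   illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Prepare = Callable[[str], str]  # returns prepared form or raises ValueError


@dataclass
class Change:
    identifier: str
    old: Optional[str]  # None: the old technique rejected the string
    new: Optional[str]  # None: the new technique rejects the string


def _try(prepare: Prepare, s: str) -> Optional[str]:
    try:
        return prepare(s)
    except ValueError:
        return None


def audit(stored: Iterable[str], old_prepare: Prepare,
          new_prepare: Prepare) -> List[Change]:
    """List stored identifiers whose outcome differs between techniques."""
    changes = []
    for s in stored:
        old, new = _try(old_prepare, s), _try(new_prepare, s)
        if old != new:
            changes.append(Change(s, old, new))
    return changes


if __name__ == "__main__":
    # Toy stand-ins: the old technique lowercases everything, while the
    # new one preserves case and rejects strings containing a space.
    def old_prepare(s: str) -> str:
        return s.lower()

    def new_prepare(s: str) -> str:
        if " " in s:
            raise ValueError("space not allowed")
        return s

    for change in audit(["alice", "Bob", "carol smith"],
                        old_prepare, new_prepare):
        print(change)
```

   Both kinds of difference matter: an identifier that maps to a
   different prepared form can silently stop matching, while one that
   is newly rejected becomes unusable.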

545	6.  Security Considerations

547	   This document merely states what problems are to be solved, and does
548	   not define a protocol.  There are undoubtedly security implications
549	   of the particular results that will come from the work to be
550	   completed.

552	7.  IANA Considerations

554	   This document has no actions for IANA.

556	8.  Discussion home for this draft

558	   Note: RFC-Editor, please remove this section before publication.

560	   This document is intended to define the problem space discussed on
561	   the precis@ietf.org mailing list.

563	9.  Acknowledgements

565	   This document is the product of the PRECIS IETF Working Group, and
566	   participants in that Working Group were helpful in addressing issues
567	   with the text.

569	   Specific contributions came from David Black, Alan DeKok, Bill
570	   McQuillan, Alexey Melnikov, Peter Saint-Andre, Dave Thaler, and
571	   Yoshiro Yoneya.

573	   Dave Thaler provided the "buckets" insight in Section 4.1.1, central
574	   to the organization of the problem.

576	   The evaluations of Stringprep profiles included in Appendix B were
577	   done by David Black, Alexey Melnikov, Peter Saint-Andre, and Dave
578	   Thaler.

580	10.  Informative References

582	   [I-D.iab-identifier-comparison]
583	              Thaler, D., "Issues in Identifier Comparison for Security
584	              Purposes", draft-iab-identifier-comparison-00 (work in
585	              progress), July 2011.

587	   [NEWPREP]  "Newprep BoF Meeting Minutes", March 2010.

589	   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
590	              Internationalized Strings ("stringprep")", RFC 3454,
591	              December 2002.

593	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
594	              "Internationalizing Domain Names in Applications (IDNA)",
595	              RFC 3490, March 2003.

597	   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
598	              Profile for Internationalized Domain Names (IDN)",
599	              RFC 3491, March 2003.

601	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
602	              for Internationalized Domain Names in Applications
603	              (IDNA)", RFC 3492, March 2003.

605	   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
606	              Beame, C., Eisler, M., and D. Noveck, "Network File System
607	              (NFS) version 4 Protocol", RFC 3530, April 2003.

609	   [RFC3722]  Bakke, M., "String Profile for Internet Small Computer
610	              Systems Interface (iSCSI) Names", RFC 3722, April 2004.

612	   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H.
613	              Levkowetz, "Extensible Authentication Protocol (EAP)",
614	              RFC 3748, June 2004.

616	   [RFC3920]  Saint-Andre, P., Ed., "Extensible Messaging and Presence
617	              Protocol (XMPP): Core", RFC 3920, October 2004.

619	   [RFC3922]  Saint-Andre, P., "Mapping the Extensible Messaging and
620	              Presence Protocol (XMPP) to Common Presence and Instant
621	              Messaging (CPIM)", RFC 3922, October 2004.

623	   [RFC4011]  Waldbusser, S., Saperia, J., and T. Hongal, "Policy Based
624	              Management MIB", RFC 4011, March 2005.

626	   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
627	              and Passwords", RFC 4013, February 2005.

629	   [RFC4279]  Eronen, P. and H. Tschofenig, "Pre-Shared Key Ciphersuites
630	              for Transport Layer Security (TLS)", RFC 4279,
631	              December 2005.

633	   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
634	              RFC 4314, December 2005.

636	   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
637	              Security Layer (SASL)", RFC 4422, June 2006.

639	   [RFC4505]  Zeilenga, K., "Anonymous Simple Authentication and
640	              Security Layer (SASL) Mechanism", RFC 4505, June 2006.

642	   [RFC4511]  Sermersheim, J., "Lightweight Directory Access Protocol
643	              (LDAP): The Protocol", RFC 4511, June 2006.

645	   [RFC4513]  Harrison, R., "Lightweight Directory Access Protocol
646	              (LDAP): Authentication Methods and Security Mechanisms",
647	              RFC 4513, June 2006.

649	   [RFC4518]  Zeilenga, K., "Lightweight Directory Access Protocol
650	              (LDAP): Internationalized String Preparation", RFC 4518,
651	              June 2006.

653	   [RFC4616]  Zeilenga, K., "The PLAIN Simple Authentication and
654	              Security Layer (SASL) Mechanism", RFC 4616, August 2006.

656	   [RFC4643]  Vinocur, J. and K. Murchison, "Network News Transfer
657	              Protocol (NNTP) Extension for Authentication", RFC 4643,
658	              October 2006.

660	   [RFC4683]  Park, J., Lee, J., Lee, H., Park, S., and T. Polk,
661	              "Internet X.509 Public Key Infrastructure Subject
662	              Identification Method (SIM)", RFC 4683, October 2006.

664	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
665	              Recommendations for Internationalized Domain Names
666	              (IDNs)", RFC 4690, September 2006.

668	   [RFC4790]  Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
669	              Application Protocol Collation Registry", RFC 4790,
670	              March 2007.

672	   [RFC4954]  Siemborski, R. and A. Melnikov, "SMTP Service Extension
673	              for Authentication", RFC 4954, July 2007.

675	   [RFC5034]  Siemborski, R. and A. Menon-Sen, "The Post Office Protocol
676	              (POP3) Simple Authentication and Security Layer (SASL)
677	              Authentication Mechanism", RFC 5034, July 2007.

679	   [RFC5051]  Crispin, M., "i;unicode-casemap - Simple Unicode Collation
680	              Algorithm", RFC 5051, October 2007.

682	   [RFC5054]  Taylor, D., Wu, T., Mavrogiannopoulos, N., and T. Perrin,
683	              "Using the Secure Remote Password (SRP) Protocol for TLS
684	              Authentication", RFC 5054, November 2007.

686	   [RFC5122]  Saint-Andre, P., "Internationalized Resource Identifiers
687	              (IRIs) and Uniform Resource Identifiers (URIs) for the
688	              Extensible Messaging and Presence Protocol (XMPP)",
689	              RFC 5122, February 2008.

691	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
692	              Housley, R., and W. Polk, "Internet X.509 Public Key
693	              Infrastructure Certificate and Certificate Revocation List
694	              (CRL) Profile", RFC 5280, May 2008.

696	   [RFC5456]  Spencer, M., Capouch, B., Guy, E., Miller, F., and K.
697	              Shumard, "IAX: Inter-Asterisk eXchange Version 2",
698	              RFC 5456, February 2010.

700	   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
701	              System (NFS) Version 4 Minor Version 1 Protocol",
702	              RFC 5661, January 2010.

704	   [RFC5802]  Newman, C., Menon-Sen, A., Melnikov, A., and N. Williams,
705	              "Salted Challenge Response Authentication Mechanism
706	              (SCRAM) SASL and GSS-API Mechanisms", RFC 5802, July 2010.

708	   [RFC5804]  Melnikov, A. and T. Martin, "A Protocol for Remotely
709	              Managing Sieve Scripts", RFC 5804, July 2010.

711	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
712	              Applications (IDNA): Definitions and Document Framework",
713	              RFC 5890, August 2010.

715	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
716	              Applications (IDNA): Protocol", RFC 5891, August 2010.

718	   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
719	              Internationalized Domain Names for Applications (IDNA)",
720	              RFC 5892, August 2010.

722	   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
723	              Internationalized Domain Names for Applications (IDNA)",
724	              RFC 5893, August 2010.

726	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
727	              Applications (IDNA): Background, Explanation, and
728	              Rationale", RFC 5894, August 2010.

730	   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
731	              Internationalized Domain Names in Applications (IDNA)
732	              2008", RFC 5895, September 2010.

734	   [RFC6452]  Faltstrom, P. and P. Hoffman, "The Unicode Code Points and
735	              Internationalized Domain Names for Applications (IDNA) -
736	              Unicode 6.0", RFC 6452, November 2011.

738	   [UAX15]    "Unicode Standard Annex #15: Unicode Normalization Forms",
739	              UAX 15, September 2009.

741	   [1]  <http://www.ietf.org/proceedings/78/slides/precis-2.pdf>

743	   [2]  <http://www.unicode.org/standard/principles.html>

745	   [3]  <http://www.ietf.org/proceedings/77/newprep.html>

747	   [4]  <http://www.ietf.org/proceedings/77/minutes/newprep.txt>

749	   [5]  <http://trac.tools.ietf.org/wg/precis/trac/report/6>

751	Appendix A.  Classification of Stringprep Profiles

753	   A number of the known cases of Stringprep use were evaluated during
754	   the preparation of this document.  The known cases are described
755	   here by identifier type, user interaction, and string class.  The
756	   types of identifiers the protocol uses are first called out in the
757	   IDtype column (from Section 4.1.1), using the short forms "a" for
758	   Absolute, "d" for Definite, and "i" for Indefinite.  Next, the User?
759	   column contains an "i" if the protocol string comes from user input,
760	   an "o" if the protocol string becomes user-facing output, "b" if both
761	   are true, and "n" if neither is true.  The remaining columns have an
762	   "x" if and only if the protocol uses that class, as described in
763	   Section 4.3.4.  Values marked "-" indicate that an answer is not
764	   useful; in this case, see the detailed discussion in Appendix B.

766	      +------+--------+-------+-------------+-----------+-----------+
767	      |  RFC | IDtype | User? | DomainClass | NameClass | FreeClass |
768	      +------+--------+-------+-------------+-----------+-----------+
769	      | 3722 |    a   |   o   |             |     x     |     x     |
770	      | 3748 |    -   |   -   |      -      |     x     |     -     |
771	      | 3920 |   a,d  |   b   |             |     x     |     x     |
772	      | 4505 |    a   |   i   |             |           |     x     |
773	      | 4314 |   a,d  |   b   |             |     x     |     x     |
774	      | 4954 |   a,d  |   b   |             |     x     |           |
775	      | 5034 |   a,d  |   b   |             |     x     |           |
776	      | 5804 |   a,d  |   b   |             |     x     |           |
777	      +------+--------+-------+-------------+-----------+-----------+

779	                                  Table 1

781	   [[anchor22: This table now contains results of any reviews the WG
782	   did.  Unreviewed things in the tracker are not reflected here.
783	   --ajs@anvilwalrusden.com]]

785	Appendix B.  Evaluation of Stringprep Profiles

787	   This section summarizes the evaluation of Stringprep profiles [5]
788	   that was done to get a good understanding of the usage of
789	   Stringprep.  This summary is not normative, nor does it reproduce
790	   the evaluations themselves.  A template was used so that reviewers
791	   would produce a coherent view across all evaluations.

793	B.1.  iSCSI Stringprep Profiles: RFC3722, RFC3721, RFC3720

795	   Description:  An iSCSI session consists of an initiator (i.e., a host
796	      or server that uses storage) communicating with a target (i.e., a
797	      storage array or other system that provides storage).  Both the
798	      iSCSI initiator and target are named by iSCSI Names.  The iSCSI
799	      Stringprep profile is used for iSCSI names.
800	   How it is used  iSCSI names identify iSCSI initiators and targets
801	      (see above).  They can also identify SCSI ports (these are
802	      software entities in the iSCSI protocol, not hardware ports) and
803	      iSCSI logical units (storage volumes), although both uses are rare.
804	   What entities create these identifiers?  Generally a Human user (1)
805	      configures an Automated system (2) that generates the names.
806	      Advance configuration of the system is required due to the
807	      embedded use of external unique identifiers (from the DNS or IEEE).
808	   How is the string input in the system?  Keyboard and copy-paste are
809	      common.  Copy-paste is common because iSCSI names are long enough
810	      to be problematic for humans to remember, causing use of email,
811	      sneaker-net, text files, etc., to avoid typing mistakes.

813	   Where do we place the dividing line between user interface and
814	   protocol?  The iSCSI protocol requires that all i18n string
815	      preparation occur in the user interface.  The iSCSI protocol
816	      treats iSCSI names as opaque identifiers that are compared byte-
817	      by-byte for equality. iSCSI names are generally not checked for
818	      correct formatting by the protocol.
819	   What entities enforce the rules?  There are no iSCSI-specific
820	      enforcement entities, although the use of unique identifier
821	      information in the names relies on DNS registrars and the IEEE
822	      Registration Authority.
823	   Comparison  Byte-by-byte
824	   Case Folding, Sensitivity, Preservation  Case folding is required for
825	      the code points specified in RFC 3454, Table B.2.  The overall
826	      iSCSI naming system (UI + protocol) is case-insensitive.
827	   What is the impact if the comparison results in a false positive?
828	      Potential access to the wrong storage. - If the initiator has no
829	      access to the wrong storage, an authentication failure is the
830	      probable result. - If the initiator has access to the wrong
831	      storage, the resulting misidentification could result in use of
832	      the wrong data and possible corruption of stored data.
833	   What is the impact if the comparison results in a false negative?
834	      Denial of authorized storage access.
835	   What are the security impacts?  iSCSI names are often used as the
836	      authentication identities for storage systems.  Comparison
837	      problems could result in authentication problems, although note
838	      that authentication failure ameliorates some of the false positive
839	      cases.
840	   Normalization  NFKC, as specified by RFC 3454.
841	   Mapping  Yes, as specified by Table B.1 in RFC 3454
842	   Disallowed Characters  Only the following characters are allowed: -
843	      ASCII dash, dot, colon - ASCII lower case letters and digits -
844	      Unicode lower case characters as specified by RFC 3454.  All other
845	      characters are disallowed.
846	   Which other strings or identifiers are these most similar to?  None -
847	      iSCSI names are unique to iSCSI.
848	   Are these strings or identifiers sometimes the same as strings or
849	   identifiers from other protocols?  No
850	   Does the identifier have internal structure that needs to be
851	   respected?  Yes - ASCII dot, dash and colon are used for internal
852	      name structure.  These are not reserved characters in that they
853	      can occur in the name in locations other than those used for
854	      structuring purposes (e.g., only the first occurrence of a colon
855	      character is structural, others are not).
856	   How are users exposed to these strings?  How are they published?
857	      iSCSI names appear in server and storage system configuration
858	      interfaces.  They also appear in system logs.

860	   Is the string / identifier used as input to other operations?
861	      Effectively, no.  The rarely used port and logical unit names
862	      involve concatenation, which effectively extends a unique iSCSI
863	      Name for a target to uniquely identify something within that
864	      target.
865	   How much tolerance for change from existing stringprep approach?
866	      Good tolerance; the community would prefer that i18n experts solve
867	      i18n problems ;-).
868	   How strong a desire for change (e.g., for Unicode agility)?  Unicode
869	      agility is desired in principle as long as nothing significant
870	      breaks.
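   The division of labor described in this template (preparation in the
   user interface, byte-by-byte comparison in the protocol) can be
   sketched in Python with the standard "stringprep" module, which
   exposes the RFC 3454 tables, and "unicodedata.ucd_3_2_0" for
   Unicode 3.2 NFKC.  This is an approximation for illustration, not a
   conforming RFC 3722 implementation: bidirectional-text checks are
   omitted, the allowed-character test follows the summary above, and
   rejecting unassigned code points is an assumption.  The function
   names are hypothetical.

```python
import stringprep
from typing import Tuple
from unicodedata import ucd_3_2_0

# ASCII characters the summary above allows after preparation.
_ASCII_ALLOWED = set("-.:0123456789abcdefghijklmnopqrstuvwxyz")

# RFC 3454 tables used to reject non-ASCII output.  Treating unassigned
# code points (Table A.1) as disallowed is an assumption of this sketch.
_PROHIBITED = (
    stringprep.in_table_a1,       # unassigned in Unicode 3.2
    stringprep.in_table_c11_c12,  # space characters
    stringprep.in_table_c21_c22,  # control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)


def iscsi_prep(name: str) -> str:
    """Approximate the UI-side preparation of an iSCSI name."""
    # Mapping: drop Table B.1 characters; case fold with Table B.2.
    mapped = "".join(stringprep.map_table_b2(ch) for ch in name
                     if not stringprep.in_table_b1(ch))
    # Normalization: NFKC as defined for Unicode 3.2.
    prepared = ucd_3_2_0.normalize("NFKC", mapped)
    if not prepared:
        raise ValueError("empty iSCSI name")
    for ch in prepared:
        if ch.isascii():
            allowed = ch in _ASCII_ALLOWED
        else:
            allowed = not any(table(ch) for table in _PROHIBITED)
        if not allowed:
            raise ValueError(f"disallowed character U+{ord(ch):04X}")
    return prepared


def same_iscsi_name(a: str, b: str) -> bool:
    """The protocol compares (already prepared) names byte by byte."""
    return a.encode("utf-8") == b.encode("utf-8")


def split_at_first_colon(name: str) -> Tuple[str, str]:
    """Only the first colon is structural; later colons are data."""
    head, _, tail = name.partition(":")
    return head, tail


if __name__ == "__main__":
    typed = "iqn.2001-04.com.example:Storage\u00ad.Disk1"  # soft hyphen
    print(same_iscsi_name(iscsi_prep(typed),
                          "iqn.2001-04.com.example:storage.disk1"))  # True
    print(split_at_first_colon(
        "iqn.2001-04.com.example:storage:diskarrays-sn-a8675309"))
```

   Because the protocol only compares bytes, any string that bypasses
   the user-interface step (for example, one pasted into a
   configuration file) is compared unprepared.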

872	B.2.  SMTP/POP3/ManageSieve Stringprep Profiles: RFC4954, RFC5034,
873	      RFC5804

875	   Description:  Authorization identity (user identifier) exchanged
876	      during SASL authentication: AUTH (SMTP/POP3) or AUTHENTICATE
877	      (ManageSieve) command.
878	   How It's Used:  Used for proxy authorization, e.g., to [lawfully]
879	      impersonate a particular user after a privileged authentication.
880	   Who Generates It:  Typically generated by email system administrators
881	      using some tools/conventions, sometimes from some backend
882	      database. - In some setups, human users can register their own
883	      usernames (e.g., webmail self-registration).
884	   User Input Methods:  - Typed by user / selected from a list - Copy-
885	      and-paste - Perhaps voice input - Can also be specified in
886	      configuration files or on a command line
887	   Enforcement:  - Rules enforced by server / add-on service (e.g.,
888	      gateway service) on registration of account
889	   Comparison Method:  "Type 1" (byte-for-byte) or "type 2" (compare by
890	      a common algorithm that everyone agrees on (e.g., normalize and
891	      then compare the result byte-by-byte))
892	   Case Folding, Sensitivity, Preservation:  Most likely case sensitive.
893	      Exact requirements on case-sensitivity/case-preservation depend on
894	      a specific implementation, e.g. an implementation might treat all
895	      user identifiers as case insensitive (or case insensitive for US-
896	      ASCII subset only).
897	   Impact of Comparison:  False positives: - an unauthorized user is
898	      allowed email service access (login) False negatives: - an
899	      authorized user is denied email service access
900	   Normalization:  NFKC (as per RFC 4013)
901	   Mapping:  (see Section 2 of RFC 4013 for the full list): Non-ASCII
902	      spaces are mapped to space, etc.
903	   Disallowed Characters:  (see Section 2 of RFC 4013 for the full
904	      list): Unicode Control characters, etc.

906	   String Classes:  - simple username.  See Section 2 of RFC 4013 for
907	      details on restrictions.  Note that some implementations allow
908	      spaces in these.  While implementations are not required to use a
909	      specific format, an authorization identity frequently has the same
910	      format as an email address (and EAI email address in the future),
911	      or as the left-hand side of an email address.  Note: whatever is
912	      recommended for SMTP/POP/ManageSieve authorization identity should
913	      also be used for IMAP authorization identities, as IMAP/POP3/SMTP/
914	      ManageSieve are frequently implemented together.
915	   Internal Structure:  None
916	   User Output:  Unlikely, but possible.  For example, if it is the same
917	      as an email address.
918	   Operations:  - Sometimes concatenated with other data and then used
919	      as input to a cryptographic hash function
920	   How much tolerance for change from existing stringprep approach?  Not
921	      sure.
922	   Background information:  In RFC 5034, when describing the POP3 AUTH
923	      command: The authorization identity generated by the SASL exchange
924	      is a simple username, and SHOULD use the SASLprep profile (see
925	      [RFC4013]) of the StringPrep algorithm (see [RFC3454]) to prepare
926	      these names for matching.  If preparation of the authorization
927	      identity fails or results in an empty string (unless it was
928	      transmitted as the empty string), the server MUST fail the
929	      authentication.  In RFC 4954, when describing the SMTP AUTH
930	      command: The authorization identity generated by this [SASL]
931	      exchange is a "simple username" (in the sense defined in
932	      [SASLprep]), and both client and server SHOULD (*) use the
933	      [SASLprep] profile of the [StringPrep] algorithm to prepare these
934	      names for transmission or comparison.  If preparation of the
935	      authorization identity fails or results in an empty string (unless
936	      it was transmitted as the empty string), the server MUST fail the
937	      authentication. (*) Note: Future revision of this specification
938	      may change this requirement to MUST.  Currently, the SHOULD is
939	      used in order to avoid breaking the majority of existing
940	      implementations.  In RFC 5804, when describing the ManageSieve
941	      AUTHENTICATE command: The authorization identity generated by this
942	      [SASL] exchange is a "simple username" (in the sense defined in
943	      [SASLprep]), and both client and server MUST use the [SASLprep]
944	      profile of the [StringPrep] algorithm to prepare these names for
945	      transmission or comparison.  If preparation of the authorization
946	      identity fails or results in an empty string (unless it was
947	      transmitted as the empty string), the server MUST fail the
948	      authentication.
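   The server-side rule quoted above (prepare the authorization identity
   with SASLprep; fail if preparation fails, or if it yields an empty
   string from a non-empty input) can be sketched in Python as follows.
   The "saslprep" function is a compact approximation of RFC 4013 built
   on the standard "stringprep" module and Unicode 3.2 NFKC from
   "unicodedata.ucd_3_2_0"; treating the input as a stored string (so
   unassigned code points are rejected) is a choice of this sketch.
   "AuthenticationFailed" and "prepare_authzid" are hypothetical names,
   not part of any SASL library.

```python
import stringprep
from unicodedata import ucd_3_2_0

_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
    stringprep.in_table_a1,       # unassigned (stored-string assumption)
)


def saslprep(s: str) -> str:
    """Approximate SASLprep (RFC 4013); raises ValueError on failure."""
    # Mapping: Table B.1 to nothing, non-ASCII spaces to U+0020.
    # SASLprep deliberately performs no case mapping.
    mapped = "".join(" " if stringprep.in_table_c12(ch) else ch
                     for ch in s if not stringprep.in_table_b1(ch))
    out = ucd_3_2_0.normalize("NFKC", mapped)
    for ch in out:
        if any(table(ch) for table in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # Bidirectional rules from Section 6 of RFC 3454.
    if any(stringprep.in_table_d1(ch) for ch in out):
        if any(stringprep.in_table_d2(ch) for ch in out):
            raise ValueError("mixed RandALCat and LCat characters")
        if not (stringprep.in_table_d1(out[0])
                and stringprep.in_table_d1(out[-1])):
            raise ValueError("RandALCat string must start and end with one")
    return out


class AuthenticationFailed(Exception):
    """Hypothetical signal that the server must fail authentication."""


def prepare_authzid(transmitted: str) -> str:
    try:
        prepared = saslprep(transmitted)
    except ValueError as exc:
        raise AuthenticationFailed(str(exc)) from exc
    if prepared == "" and transmitted != "":
        raise AuthenticationFailed("authzid prepared to an empty string")
    return prepared


if __name__ == "__main__":
    for s in ["I\u00adX", "USER", "\u2168", "", "\u00ad", "\u0007"]:
        try:
            print(repr(s), "->", repr(prepare_authzid(s)))
        except AuthenticationFailed as exc:
            print(repr(s), "-> fail:", exc)
```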

950	B.3.  IMAP Stringprep Profiles: RFC5738, RFC4314: Usernames

951	   Evaluation Note  These documents have two types of strings
952	      (usernames and passwords), so there are two separate templates.
953	   Description:  "username" parameter to the IMAP LOGIN command,
954	      identifiers in IMAP ACL commands.  Note that any valid username is
955	      also an IMAP ACL identifier, but IMAP ACL identifiers can include
956	      other things, such as names of groups of users.
957	   How It's Used:  Used for authentication (Usernames), or in IMAP
958	      Access Control Lists (Usernames or Group names)
959	   Who Generates It:  - Typically generated by email system
960	      administrators using some tools/conventions, sometimes from some
961	      backend database. - In some setups, human users can register their
962	      own usernames (e.g., webmail self-registration).
963	   User Input Methods:  - Typed by user / selected from a list - Copy-
964	      and-paste - Perhaps voice input - Can also be specified in
965	      configuration files or on a command line
966	   Enforcement:  - Rules enforced by server / add-on service (e.g.,
967	      gateway service) on registration of account
968	   Comparison Method:  "Type 1" (byte-for-byte) or "type 2" (compare by a
969	      common algorithm that everyone agrees on (e.g., normalize and then
970	      compare the result byte-by-byte))
971	   Case Folding, Sensitivity, Preservation:  - Most likely case
972	      sensitive.  Exact requirements on case-sensitivity/
973	      case-preservation depend on a specific implementation, e.g. an
974	      implementation might treat all user identifiers as case
975	      insensitive (or case insensitive for US-ASCII subset only).
976	   Impact of Comparison:  False positives: - an unauthorized user is
977	      allowed IMAP access (login) - improperly grant privileges (e.g.,
978	      access to a specific mailbox, ability to manage ACLs for a
979	      mailbox) False negatives: - an authorized user is denied IMAP
980	      access - unable to use granted privileges (e.g., access to a
981	      specific mailbox, ability to manage ACLs for a mailbox)
982	   Normalization:  NFKC (as per RFC 4013)
983	   Mapping:  (see Section 2 of RFC 4013 for the full list): non-ASCII
984	      spaces are mapped to space
985	   Disallowed Characters:  (see Section 2 of RFC 4013 for the full
986	      list): Unicode Control characters, etc.
987	   String Classes:  - simple username.  See Section 2 of RFC 4013 for
988	      details on restrictions.  Note that some implementations allow
989	      spaces in these.  While IMAP implementations are not required to
990	      use a specific format, an IMAP username frequently has the same
991	      format as an email address (and EAI email address in the future),
992	      or as the left-hand side of an email address.  Note: whatever is
993	      recommended for IMAP username should also be used for ManageSieve,
994	      POP3 and SMTP authorization identities, as IMAP/POP3/SMTP/
995	      ManageSieve are frequently implemented together.

997	   Internal Structure:  None
998	   User Output:  Unlikely, but possible.  For example, if it is the same
999	      as an email address. - access control lists (e.g. in IMAP ACL
1000	      extension), both when managing membership and listing membership
1001	      of existing access control lists. - often show up as mailbox names
1002	      (under Other Users IMAP namespace)
1003	   Operations:  - Sometimes concatenated with other data and then used
1004	      as input to a cryptographic hash function
1005	   How much tolerance for change from existing stringprep approach?  Not
1006	      sure.  Non-ASCII IMAP usernames are currently prohibited by IMAP
1007	      (RFC 3501).  However, they are allowed when used in the IMAP ACL
1008	      extension.
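   The two comparison methods named in this template, and the "case
   insensitive for US-ASCII subset only" variant mentioned under case
   folding, can be contrasted with a short Python sketch.  NFKC is used
   here as a hypothetical "common algorithm" for type 2; a deployment
   could equally agree on SASLprep or another preparation.  The
   function names are illustrative.

```python
import unicodedata


def compare_type1(a: str, b: str) -> bool:
    """Type 1: byte-for-byte comparison of the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


def compare_type2(a: str, b: str) -> bool:
    """Type 2: apply an agreed algorithm (here NFKC), then compare bytes."""
    return compare_type1(unicodedata.normalize("NFKC", a),
                         unicodedata.normalize("NFKC", b))


_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")


def compare_ascii_ci(a: str, b: str) -> bool:
    """Case-insensitive for A-Z only; other characters must match exactly."""
    return compare_type1(a.translate(_ASCII_LOWER), b.translate(_ASCII_LOWER))


if __name__ == "__main__":
    composed, decomposed = "jos\u00e9", "jose\u0301"  # same text, two forms
    print(compare_type1(composed, decomposed))          # False
    print(compare_type2(composed, decomposed))          # True
    print(compare_ascii_ci("Jose", "jOSE"))             # True
    print(compare_ascii_ci("JOS\u00c9", "jos\u00e9"))   # False: not ASCII
```

   The first pair shows how type 1 comparison can produce a false
   negative (an authorized user denied access) when two clients send
   the same username in different normalization forms.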

1010	B.4.  IMAP Stringprep Profiles: RFC5738: Passwords

1012	   Description:  "Password" parameter to the IMAP LOGIN command
1013	   How It's Used:  Used for authentication (Passwords)
1014	   Who Generates It:  Either generated by email system administrators
1015	      using some tools/conventions, or specified by the human user.
1016	   User Input Methods:  - Typed by user - Copy-and-paste - Perhaps voice
1017	      input - Can also be specified in configuration files or on a
1018	      command line
1019	   Enforcement:  Rules enforced by server / add-on service (e.g.,
1020	      gateway service or backend database) on registration of account
1021	   Comparison Method:  "Type 1" (byte-for-byte)
1022	   Case Folding, Sensitivity, Preservation:  Most likely case sensitive.
1023	   Impact of Comparison:  False positives: - an unauthorized user is
1024	      allowed IMAP access (login) False negatives: - an authorized user
1025	      is denied IMAP access
1026	   Normalization:  NFKC (as per RFC 4013)
1027	   Mapping:  (see Section 2 of RFC 4013 for the full list): non-ASCII
1028	      spaces are mapped to space
1029	   Disallowed Characters:  (see Section 2 of RFC 4013 for the full
1030	      list): Unicode Control characters, etc.
1031	   String Classes:  Currently defined as "simple username" (see Section
1032	      2 of RFC 4013 for details on restrictions); however, this is
1033	      likely to be a different class from usernames.  Note that some
1034	      implementations allow spaces in these.  Passwords in all
1035	      email-related protocols should be treated in the same way.  The
1036	      same passwords are often shared with web, IM, etc. applications.
1037	   Internal Structure:  None
1038	   User Output:  - text of email messages (e.g. in "you forgot your
1039	      password" email messages) - web page / directory - side of the bus
1040	      / in ads -- possible

1042	   Operations:  Sometimes concatenated with other data and then used as
1043	      input to a cryptographic hash function.  Frequently stored as is,
1044	      or hashed.
1045	   How much tolerance for change from existing stringprep approach?  Not
1046	      sure.  Non-ASCII IMAP passwords are currently prohibited by IMAP
1047	      (RFC 3501); however, they are likely to be in widespread use.
1048	   Background information:  RFC 5738 (IMAP I18N): 5.  UTF8=USER
1049	      Capability If the "UTF8=USER" capability is advertised, that
1050	      indicates the server accepts UTF-8 user names and passwords and
1051	      applies SASLprep [RFC4013] to both arguments of the LOGIN command.
1052	      The server MUST reject UTF-8 that fails to comply with the formal
1053	      syntax in RFC 3629 [RFC3629] or if it encounters Unicode
1054	      characters listed in Section 2.3 of SASLprep RFC 4013 [RFC4013].
1055	      RFC 4314 (IMAP4 Access Control List (ACL) Extension): 3.  Access
1056	      control management commands and responses Servers, when processing
1057	      a command that has an identifier as a parameter (i.e., any of
1058	      SETACL, DELETEACL, and LISTRIGHTS commands), SHOULD first prepare
1059	      the received identifier using "SASLprep" profile [SASLprep] of the
1060	      "stringprep" algorithm [Stringprep].  If the preparation of the
1061	      identifier fails or results in an empty string, the server MUST
1062	      refuse to perform the command with a BAD response.  Note that
1063	      Section 6 recommends additional identifier's verification steps.
1064	      and in Section 6: This document relies on [SASLprep] to describe
1065	      steps required to perform identifier canonicalization
1066	      (preparation).  The preparation algorithm in SASLprep was
1067	      specifically designed such that its output is canonical, and it is
1068	      well-formed.  However, due to an anomaly [PR29] in the
1069	      specification of Unicode normalization, canonical equivalence is
1070	      not guaranteed for a select few character sequences.  Identifiers
1071	      prepared with SASLprep can be stored and returned by an ACL
1072	      server.  The anomaly affects ACL manipulation and evaluation of
1073	      identifiers containing the selected character sequences.  These
1074	      sequences, however, do not appear in well-formed text.  In order
1075	      to address this problem, an ACL server MAY reject identifiers
1076	      containing sequences described in [PR29] by sending the tagged BAD
1077	      response.  This is in addition to the requirement to reject
1078	      identifiers that fail SASLprep preparation as described in Section
1079	      3.
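
   The two requirements quoted above can be illustrated with a minimal
   Python sketch.  The first requirement is to reject ill-formed UTF-8
   and characters that SASLprep prohibits.  The second is to answer BAD
   when preparation fails or yields an empty string.  This is not a
   reference implementation, and it rests on several assumptions:

   o  It uses Python's standard "stringprep" module, whose tables
      follow Unicode 3.2, as Stringprep does.

   o  It treats every identifier as a stored string, so unassigned
      code points are rejected.

   o  It does not reject the [PR29] sequences.

   o  The names saslprep(), PrepError, and acl_status() are
      hypothetical.

```python
import stringprep
import unicodedata

# Stringprep and SASLprep are defined against Unicode 3.2.
UCD = unicodedata.ucd_3_2_0

# SASLprep prohibited output (RFC 4013, Section 2.3).
PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)


class PrepError(ValueError):
    """Hypothetical error type raised when preparation fails."""


def saslprep(s: str, stored: bool = True) -> str:
    """Sketch of SASLprep; 'stored' rejects unassigned code points (A.1)."""
    # Mapping: non-ASCII spaces become U+0020; table B.1 maps to nothing.
    mapped = "".join(" " if stringprep.in_table_c12(c) else c
                     for c in s if not stringprep.in_table_b1(c))
    # Normalization: Unicode form KC, using the Unicode 3.2 database.
    out = UCD.normalize("NFKC", mapped)
    for c in out:
        if any(check(c) for check in PROHIBITED):
            raise PrepError(f"prohibited character U+{ord(c):04X}")
        if stored and stringprep.in_table_a1(c):
            raise PrepError(f"unassigned code point U+{ord(c):04X}")
    # Bidirectional check (RFC 3454, Section 6).
    randal = [stringprep.in_table_d1(c) for c in out]
    if any(randal):
        if any(stringprep.in_table_d2(c) for c in out):
            raise PrepError("mixes RandALCat and LCat characters")
        if not (randal[0] and randal[-1]):
            raise PrepError("RandALCat string must start and end with RandALCat")
    return out


def acl_status(raw: bytes) -> str:
    """Hypothetical SETACL/DELETEACL/LISTRIGHTS identifier check."""
    try:
        # Python's strict UTF-8 decoder rejects ill-formed sequences.
        identifier = saslprep(raw.decode("utf-8"))
    except (UnicodeDecodeError, PrepError):
        return "BAD"
    return "OK" if identifier else "BAD"  # an empty result is also refused
```

   Under these assumptions, acl_status(b"fred") returns "OK".  The
   following inputs each return "BAD":

   o  the byte 0xFF, which is ill-formed UTF-8;

   o  U+0007, a prohibited control character;

   o  a lone U+00AD (SOFT HYPHEN), which is mapped to nothing and so
      leaves an empty string.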

1081	B.5.  Anonymous SASL Stringprep Profiles: RFC4505

1083	   Description:  RFC 4505 defines a "trace" field.
1084	   Comparison:  This field is not intended for comparison; it is used
1085	      only for logging.

1087	   Case folding, case sensitivity, case preservation:  No case folding;
1088	      case-sensitive.
1089	   Do users input the strings directly?  Yes.  The value may be entered
1090	      in configuration UIs or on a command line, and it can also be
1091	      stored in configuration files.  It may also be generated
1092	      automatically by clients (e.g., a fixed string or the user's
1093	      email address).
1094	   How do users input strings?  Keyboard, voice, or stylus (picking
1095	      from a list); possibly copy-and-paste.
1096	   Normalization:  None
1097	   Disallowed Characters:  Control characters and several other
1098	      classes of characters are disallowed (see Section 3 of RFC 4505).
1099	   Which other strings or identifiers are these most similar to?  RFC
1100	      4505 says that the trace "should take one of two forms: an
1101	      Internet email address, or an opaque string that does not contain
1102	      the '@' (U+0040) character and that can be interpreted by the
1103	      system administrator of the client's domain."  In practice, this
1104	      is free-form text, so it belongs to a different class from "email
1105	      address" or "username".
1106	   Are these strings or identifiers sometimes the same as strings or
1107	   identifiers from other protocols (e.g., does an IM system sometimes
1108	   use the same credentials database for authentication as an email
1109	   system)?  Yes; see above.  However, there is no strong need to keep
1110	      them consistent in the future.
1111	   How are users exposed to these strings, and how are they published?
1112	      They are not published, but the value can appear in server logs.
1113	   Impacts of false positives and false negatives:  False positive: a
1114	      user can be confused with another user.  False negative: a single
1115	      user is treated as two distinct users.  Note, however, that the
1116	      trace field is not authenticated, so it can easily be falsified.
1117	   Tolerance of changes in the community:  The community would be
1118	      flexible.
1119	   Delimiters:  No internal structure, but see comments above about
1120	      frequent use of email addresses.
1121	   Background information:  The Anonymous Mechanism: The mechanism
1122	      consists of a single message from the client to the server.  The
1123	      client may include in this message trace information in the form
1124	      of a string of [UTF-8]-encoded [Unicode] characters prepared in
1125	      accordance with [StringPrep] and the "trace" stringprep profile
1126	      defined in Section 3 of this document.  The trace information,
1127	      which has no semantical value, should take one of two forms: an
1128	      Internet email address, or an opaque string that does not contain
1129	      the '@' (U+0040) character and that can be interpreted by the
1130	      system administrator of the client's domain.  For privacy reasons,
1131	      an Internet email address or other information identifying the
1132	      user should only be used with permission from the user.
1133	      Section 3, The "trace" Profile of "Stringprep": This section
1134	      defines the "trace" profile of [StringPrep].  This profile is
1135	      designed for use with the SASL ANONYMOUS Mechanism.  Specifically,
1136	      the client is to prepare the message production in accordance with
1137	      this profile.  The character repertoire of this profile is Unicode
1138	      3.2 [Unicode].  No mapping is required by this profile.  No
1139	      Unicode normalization is required by this profile.  The list of
1140	      unassigned code points for this profile is that provided in
1141	      Appendix A of [StringPrep].  Unassigned code points are not
1142	      prohibited.  Characters from the following tables of [StringPrep]
1143	      are prohibited: C.2.1 (ASCII control characters), C.2.2 (Non-ASCII
1144	      control characters), C.3 (Private use characters), C.4
1145	      (Non-character code points), C.5 (Surrogate codes), C.6
1146	      (Inappropriate for plain text), C.8 (Change display properties are
1147	      deprecated), and C.9 (Tagging characters).  No additional
1148	      characters are prohibited.  This profile requires bidirectional
1149	      character checking per Section 6 of [StringPrep].
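
   A short Python sketch shows how little the "trace" profile does.  It
   checks a string against the prohibited tables and the bidirectional
   rule quoted above.  Otherwise it returns the string unchanged: there
   is no mapping, normalization, or case folding, and unassigned code
   points are allowed.  The sketch assumes Python's standard
   "stringprep" module.  The names prepare_trace() and
   looks_like_email() are hypothetical.  The '@' test only tells apart
   the two forms described in RFC 4505; it does not validate an email
   address.

```python
import stringprep

# Tables prohibited by the RFC 4505 "trace" profile (note: no C.1 or C.7).
PROHIBITED = (
    stringprep.in_table_c21_c22,  # C.2.1 and C.2.2: control characters
    stringprep.in_table_c3,       # C.3: private use
    stringprep.in_table_c4,       # C.4: non-character code points
    stringprep.in_table_c5,       # C.5: surrogate codes
    stringprep.in_table_c6,       # C.6: inappropriate for plain text
    stringprep.in_table_c8,       # C.8: change display properties/deprecated
    stringprep.in_table_c9,       # C.9: tagging characters
)


def prepare_trace(trace: str) -> str:
    """Check a trace string; return it unchanged if it is acceptable."""
    for c in trace:
        if any(check(c) for check in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(c):04X}")
    # Bidirectional check (RFC 3454, Section 6).
    randal = [stringprep.in_table_d1(c) for c in trace]
    if any(randal):
        if any(stringprep.in_table_d2(c) for c in trace):
            raise ValueError("mixes RandALCat and LCat characters")
        if not (randal[0] and randal[-1]):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return trace  # no mapping, normalization, or case folding


def looks_like_email(trace: str) -> bool:
    """Opaque traces must not contain '@', so '@' signals the email form."""
    return "@" in trace
```

   Because no normalization is performed, U+FB01 (LATIN SMALL LIGATURE
   FI) and "fi" remain different traces under this sketch.  That is
   tolerable here only because trace values are not intended for
   comparison.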

1151	B.6.  XMPP Stringprep Profiles: RFC3920 Nodeprep

1153	   Description:  Localpart of JabberID ("JID"), as in:
1154	      localpart@domainpart/resourcepart
1155	   How It's Used:  Usernames (e.g., stpeter@jabber.org); chatroom
1156	      names (e.g., precis@jabber.ietf.org); publish-subscribe nodes;
1157	      bot names
1158	   Who Generates It:  Typically, end users via an XMPP client;
1159	      sometimes created in an automated fashion
1160	   User Input Methods:  Typed by user; copy-and-paste; perhaps voice
1161	      input; clicking a URI/IRI
1162	   Enforcement:  Rules enforced by server / add-on service (e.g.,
1163	      chatroom service) on registration of account, creation of room,
1164	      etc.
1165	   Comparison Method:  "Type 2" (common algorithm)
1166	   Case Folding, Sensitivity, Preservation:  Strings are always folded
1167	      to lowercase; case is not preserved
1168	   Impact of Comparison:  False positives: unable to authenticate at
1169	      server (or authenticate to wrong account); add wrong person to
1170	      buddy list; join the wrong chatroom; improperly grant privileges
1171	      (e.g., chatroom admin); subscribe to wrong pubsub node; interact
1172	      with wrong bot; allow communication with blocked entity.  False
1173	      negatives: unable to authenticate; unable to add someone to
1174	      buddy list; unable to join desired chatroom; unable to use
1175	      granted privileges (e.g., chatroom admin); unable to subscribe to
1176	      desired pubsub node; unable to interact with desired bot;
1177	      disallow communication with unblocked entity
1178	   Normalization:  NFKC
1179	   Mapping:  Spaces are mapped to nothing
1180	   Disallowed Characters:  " & ' / : < > @
1181	   String Classes:  Often similar to generic username; often similar
1182	      to localpart of email address; sometimes same as localpart of
1183	      email address
1184	   Internal Structure:  None
1185	   User Output:  vCard; email signature; web page / directory; text
1186	      of message (e.g., in a chatroom)
1187	   Operations:  Sometimes concatenated with other data and then used
1188	      as input to a cryptographic hash function
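
   The Nodeprep steps can be sketched in Python.  This is an
   illustration under stated assumptions, not a conforming
   implementation.  It follows the tables that RFC 3920, Appendix A,
   lists for Nodeprep:

   o  table B.1 is mapped to nothing;

   o  table B.2 case folding is applied;

   o  the string is normalized with NFKC;

   o  tables C.1.1 through C.9, plus the eight characters listed
      above, are prohibited;

   o  the bidirectional check is performed.

   The sketch uses Python's standard "stringprep" module and does not
   reject unassigned code points.  Under those tables, ordinary spaces
   are prohibited rather than removed.  Table B.1 removes only
   characters such as U+00AD (SOFT HYPHEN) and U+200B (ZERO WIDTH
   SPACE).  The helper jid_digest() is hypothetical.  It assumes the
   domain is already prepared, and it shows why a localpart must be
   prepared before it is concatenated and hashed.

```python
import hashlib
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0  # Stringprep tables are frozen at Unicode 3.2

NODE_EXTRA = set("\"&'/:<>@")  # the eight additionally prohibited characters

PROHIBITED = (
    stringprep.in_table_c11_c12, stringprep.in_table_c21_c22,
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nodeprep(localpart: str) -> str:
    # Mapping: remove table B.1 characters, then case-fold with table B.2.
    mapped = "".join(stringprep.map_table_b2(c)
                     for c in localpart if not stringprep.in_table_b1(c))
    out = UCD.normalize("NFKC", mapped)
    for c in out:
        if c in NODE_EXTRA or any(check(c) for check in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(c):04X}")
    # Bidirectional check (RFC 3454, Section 6).
    randal = [stringprep.in_table_d1(c) for c in out]
    if any(randal):
        if any(stringprep.in_table_d2(c) for c in out):
            raise ValueError("mixes RandALCat and LCat characters")
        if not (randal[0] and randal[-1]):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return out


def jid_digest(localpart: str, domain: str) -> str:
    """Hypothetical: hash 'localpart@domain' after preparing the localpart.

    The domain is assumed to be prepared already (Nameprep is out of scope).
    """
    data = nodeprep(localpart) + "@" + domain
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

   With these assumptions, nodeprep("StPeter") and nodeprep() applied to
   "StPeter" spelled with U+FF33 (FULLWIDTH LATIN CAPITAL LETTER S) both
   return "stpeter".  As a result, jid_digest() produces the same hash
   for either spelling, which hashing the unprepared strings would not.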

1190	B.7.  XMPP Stringprep Profiles: RFC3920 Resourceprep

1192	   Description:  Resourcepart of JabberID ("JID"), as in:
1193	      localpart@domainpart/resourcepart; typically free-form text
1194	   How It's Used:  Device / session names (e.g.,
1195	      stpeter@jabber.org/Home); nicknames (e.g.,
1196	      precis@jabber.ietf.org/StPeter)
1197	   Who Generates It:  Often human users via an XMPP client; often
1198	      generated in an automated fashion by client or server
1199	   User Input Methods:  Typed by user; copy-and-paste; perhaps voice
1200	      input; clicking a URI/IRI
1201	   Enforcement:  Rules enforced by server / add-on service (e.g.,
1202	      chatroom service) on account login, joining a chatroom, etc.
1203	   Comparison Method:  "Type 2" (byte-for-byte)
1204	   Case Folding, Sensitivity, Preservation:  Strings are never folded;
1205	      case is preserved
1206	   Impact of Comparison:  False positives: interact with wrong device
1207	      (e.g., for file transfer or voice call); interact with wrong
1208	      chatroom participant; improperly grant privileges (e.g., chatroom
1209	      moderator); allow communication with blocked entity.  False
1210	      negatives: unable to choose desired chatroom nick; unable to
1211	      use granted privileges (e.g., chatroom moderator); disallow
1212	      communication with unblocked entity
1213	   Normalization:  NFKC
1214	   Mapping:  Spaces are mapped to nothing
1215	   Disallowed Characters:  None
1216	   String Classes:  Basically a free-form identifier
1217	   Internal Structure:  None
1218	   User Output:  Text of message (e.g., in a chatroom); device names
1219	      often not exposed to human users
1220	   Operations:  Sometimes concatenated with other data and then used as
1221	      input to a cryptographic hash function
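
   For comparison with Nodeprep, a Python sketch can apply the
   Resourceprep steps from RFC 3920, Appendix B:

   o  table B.1 is mapped to nothing;

   o  there is no case folding;

   o  the string is normalized with NFKC;

   o  tables C.1.2 through C.9 are prohibited;

   o  the bidirectional check is performed.

   Under these assumptions, no characters are prohibited beyond those
   Stringprep tables, so ASCII space, for example, remains allowed.
   The function names are hypothetical, and unassigned code points are
   not rejected.  same_resource() compares the UTF-8 bytes of two
   prepared strings, mirroring the byte-for-byte comparison method.

```python
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0  # Stringprep tables are frozen at Unicode 3.2

PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c21_c22,
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def resourceprep(resource: str) -> str:
    # Mapping: remove table B.1 characters only; case is preserved.
    mapped = "".join(c for c in resource if not stringprep.in_table_b1(c))
    out = UCD.normalize("NFKC", mapped)
    for c in out:
        if any(check(c) for check in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(c):04X}")
    # Bidirectional check (RFC 3454, Section 6).
    randal = [stringprep.in_table_d1(c) for c in out]
    if any(randal):
        if any(stringprep.in_table_d2(c) for c in out):
            raise ValueError("mixes RandALCat and LCat characters")
        if not (randal[0] and randal[-1]):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return out


def same_resource(a: str, b: str) -> bool:
    """Byte-for-byte comparison of the prepared, UTF-8 encoded strings."""
    return resourceprep(a).encode("utf-8") == resourceprep(b).encode("utf-8")
```

   Under this sketch, "Home" and "home" name different resources.
   "Home" and "Ho<U+00AD>me" (with a SOFT HYPHEN), however, compare
   equal after preparation.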

1223	B.8.  EAP Stringprep Profiles: RFC3748
1224	   Description:  RFC 3748, Section 5 references Stringprep, but the WG
1225	      did not agree with the text (it was added by the IESG), and there
1226	      are no known implementations that use Stringprep.  The main
1227	      problem with that text is that the use of strings is a per-method
1228	      concept, not a generic EAP concept, so RFC 3748 itself does not
1229	      really use Stringprep, although individual EAP methods could.  As
1230	      such, the answers to the template questions are mostly not
1231	      applicable, but a few answers are universal across methods.  The
1232	      list of IANA-registered EAP methods is at <http://www.iana.org/
1233	      assignments/eap-numbers/eap-numbers.xml#eap-numbers-3>.
1234	   Comparison Methods:  n/a (per-method)
1235	   Case Folding, Case Sensitivity, Case Preservation:  n/a (per-method)
1236	   Impact of comparison:  A false positive results in unauthorized
1237	      network access (and possibly theft of service if someone else is
1238	      billed).  A false negative results in lack of authorized network
1239	      access (no connectivity).
1240	   User input:  n/a (per-method)
1241	   Normalization:  n/a (per-method)
1242	   Mapping:  n/a (per-method)
1243	   Disallowed characters:  n/a (per-method)
1244	   String classes:  Although some EAP methods may use a syntax similar
1245	      to other types of identifiers, EAP mandates that the actual values
1246	      must not be assumed to be identifiers usable with anything else.
1247	   Internal structure:  n/a (per-method)
1248	   User output:  Identifiers are never displayed to humans, except
1249	      perhaps as they are being typed.
1250	   Operations:  n/a (per-method)
1251	   Community considerations:  There is no resistance to change for the
1252	      base EAP protocol (as noted, the WG did not want the existing
1253	      text).  However, changes to the use of Stringprep, if any, within
1254	      specific EAP methods may meet resistance.  It is currently
1255	      unknown whether any EAP methods use Stringprep.

1257	Appendix C.  Changes between versions

1259	   Note to RFC Editor: This section should be removed prior to
1260	   publication.

1262	C.1.  00

1264	   First WG version.  Based on
1265	   draft-blanchet-precis-problem-statement-00.

1267	C.2.  01
1268	   o  Made clear that the document is talking only about Unicode code
1269	      points, and not any particular encoding.
1270	   o  Substantially reorganized the document along the lines of the
1271	      review template at <http://trac.tools.ietf.org/wg/precis/trac/
1272	      wiki/StringprepReviewTemplate>.
1273	   o  Included specific questions for each topic for consideration.
1274	   o  Moved spot for individual protocol review to appendix.  Not
1275	      populated yet.

1277	C.3.  02

1279	   o  Cleared up details of comparison classes
1280	   o  Added a section on changes in Unicode

1282	C.4.  03

1284	   o  Aligned comparison discussion with identifier discussion from
1285	      draft-iab-identifier-comparison-00
1286	   o  Added section on classes of strings ("Namey" and so on)

1288	C.5.  04

1290	   Keepalive version

1292	C.6.  05

1294	   o  Changed classes of strings to align with framework doc
1295	   o  Altered table in Appendix A
1296	   o  Added all profile evaluations from the WG wiki in Appendix B

1298	Authors' Addresses

1300	   Marc Blanchet
1301	   Viagenie
1302	   246 Aberdeen
1303	   Quebec, QC  G1R 2E1
1304	   Canada

1306	   Email: Marc.Blanchet@viagenie.ca
1307	   URI:   http://viagenie.ca
1308	   Andrew Sullivan
1309	   Dyn, Inc.
1310	   150 Dow St
1311	   Manchester, NH  03101
1312	   U.S.A.

1314	   Email: asullivan@dyn.com