idnits 2.17.1 

draft-ietf-precis-problem-statement-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 11, 2011) is 4673 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-09) exists of
     draft-iab-identifier-comparison-00

  -- Obsolete informational reference (is this intentional?): RFC 3454
     (Obsoleted by RFC 7564)

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3491
     (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (Obsoleted by RFC 7530)

  -- Obsolete informational reference (is this intentional?): RFC 3920
     (Obsoleted by RFC 6120)

  -- Obsolete informational reference (is this intentional?): RFC 4013
     (Obsoleted by RFC 7613)

  -- Obsolete informational reference (is this intentional?): RFC 5661
     (Obsoleted by RFC 8881)


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                        M. Blanchet
3	Internet-Draft                                                  Viagenie
4	Intended status: Informational                               A. Sullivan
5	Expires: January 12, 2012                                  July 11, 2011

7	                 Stringprep Revision Problem Statement
8	               draft-ietf-precis-problem-statement-03.txt

10	Abstract

12	   Using Unicode codepoints in protocol strings that expect comparison
13	   with other strings requires preparation of the string that contains
14	   the Unicode codepoints.  Internationalizing Domain Names in
15	   Applications (IDNA2003) defined and used Stringprep and Nameprep.
16	   Other protocols subsequently defined Stringprep profiles.  A new
17	   approach different from Stringprep and Nameprep is used for a
18	   revision of IDNA2003 (called IDNA2008).  Other Stringprep profiles
19	   need to be similarly updated or a replacement of Stringprep needs to
20	   be designed.  This document outlines the issues to be faced by those
21	   designing a Stringprep replacement.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on January 12, 2012.

40	Copyright Notice

42	   Copyright (c) 2011 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	   This document may contain material from IETF Documents or IETF
56	   Contributions published or made publicly available before November
57	   10, 2008.  The person(s) controlling the copyright in some of this
58	   material may not have granted the IETF Trust the right to allow
59	   modifications of such material outside the IETF Standards Process.
60	   Without obtaining an adequate license from the person(s) controlling
61	   the copyright in such materials, this document may not be modified
62	   outside the IETF Standards Process, and derivative works of it may
63	   not be created outside the IETF Standards Process, except to format
64	   it for publication as an RFC or to translate it into languages other
65	   than English.

67	Table of Contents

69	   1.  Introduction
70	   2.  Issues raised during newprep BOF
71	   3.  Major Topics for Consideration
72	     3.1.  Comparison
73	       3.1.1.  Types of Identifiers
74	       3.1.2.  Effect of comparison
75	     3.2.  Dealing with characters
76	       3.2.1.  Case folding, case sensitivity, and case
77	               preservation
78	       3.2.2.  Stringprep and NFKC
79	       3.2.3.  Character mapping
80	       3.2.4.  Prohibited characters
81	       3.2.5.  Internal structure, delimiters, and special
82	               characters
83	     3.3.  Where the data comes from and where it goes
84	       3.3.1.  User input and the source of protocol elements
85	       3.3.2.  User output
86	       3.3.3.  Operations
87	       3.3.4.  Some useful classes of strings
88	   4.  Considerations for Stringprep replacement
89	   5.  Security Considerations
90	   6.  IANA Considerations
91	   7.  Discussion home for this draft
92	   8.  Acknowledgements
93	   9.  Informative References
94	   Appendix A.  Protocols known to be using Stringprep
95	   Appendix B.  Detailed discussion of protocols under
96	                consideration
97	   Appendix C.  Changes between versions
98	     C.1.  00
99	     C.2.  01
100	     C.3.  02
101	     C.4.  03
102	   Authors' Addresses

104	1.  Introduction

106	   Internationalizing Domain Names in Applications (IDNA2003) [RFC3490],
107	   [RFC3491], [RFC3492], [RFC3454] described a mechanism for encoding
108	   Unicode labels making up Internationalized Domain Names (IDNs) as
109	   standard DNS labels.  The labels were prepared using a profile called
110	   Nameprep [RFC3491] and encoded using Punycode [RFC3492].  Nameprep is
111	   specific to IDNA2003, but is a profile of a general framework,
112	   Stringprep [RFC3454].  That framework can be used by other protocols
113	   with similar needs but different constraints than IDNA2003.

115	   Stringprep defines a framework within which protocols define their
116	   Stringprep profiles.  Known IETF specifications using Stringprep are
117	   listed below:
118	   o  The Nameprep profile [RFC3491] for use in Internationalized Domain
119	      Names (IDNs);
120	   o  NFSv4 [RFC3530] and NFSv4.1 [RFC5661];
121	   o  The iSCSI profile [RFC3722] for use in Internet Small Computer
122	      Systems Interface (iSCSI) Names;
123	   o  EAP [RFC3748];
124	   o  The Nodeprep and Resourceprep profiles [RFC3920] for use in the
125	      Extensible Messaging and Presence Protocol (XMPP), and the XMPP to
126	      CPIM mapping [RFC3922] (the latter of these relies on the former);
127	   o  The Policy MIB profile [RFC4011] for use in the Simple Network
128	      Management Protocol (SNMP);
129	   o  The SASLprep profile [RFC4013] for use in the Simple
130	      Authentication and Security Layer (SASL), and SASL itself
131	      [RFC4422];
132	   o  TLS [RFC4279];
133	   o  IMAP4 using SASLprep [RFC4314];
134	   o  The trace profile [RFC4505] for use with the SASL ANONYMOUS
135	      mechanism;
136	   o  The LDAP profile [RFC4518] for use with LDAP [RFC4511] and its
137	      authentication methods [RFC4513];
138	   o  Plain SASL using SASLprep [RFC4616];
139	   o  NNTP using SASLprep [RFC4643];
140	   o  PKIX subject identification using LDAPprep [RFC4683];
141	   o  Internet Application Protocol Collation Registry [RFC4790];
142	   o  SMTP Auth using SASLprep [RFC4954];
143	   o  POP3 Auth using SASLprep [RFC5034];
144	   o  TLS SRP using SASLprep [RFC5054];
145	   o  IRI and URI in XMPP [RFC5122];
146	   o  PKIX CRL using LDAPprep [RFC5280];
147	   o  IAX using Nameprep [RFC5456];
148	   o  SASL SCRAM using SASLprep [RFC5802];
149	   o  Remote management of Sieve using SASLprep [RFC5804];
150	   o  The i;unicode-casemap Unicode Collation [RFC5051].

152	   There turned out to be some difficulties with IDNA2003, documented in
153	   [RFC4690].  These difficulties led to a new IDN specification, called
154	   IDNA2008 [RFC5890], [RFC5891], [RFC5892], [RFC5893].  Additional
155	   background and explanations of the decisions embodied in IDNA2008 are
156	   presented in [RFC5894].  One of the effects of IDNA2008 is that
157	   Nameprep and Stringprep are not used at all.  Instead, an algorithm
158	   based on Unicode properties of codepoints is defined.  That algorithm
159	   generates a stable and complete table of the supported Unicode
160	   codepoints.  This algorithm is based on an inclusion-based approach,
161	   instead of the exclusion-based approach of Stringprep/Nameprep.

163	   This document lists the shortcomings and issues found by protocols
164	   listed above that defined Stringprep profiles.  It also lists some
165	   early conclusions and requirements for a potential replacement of
166	   Stringprep.

168	2.  Issues raised during newprep BOF

170	   During IETF 77, a BOF discussed the current state of the protocols
171	   that have defined Stringprep profiles [NEWPREP].  The main
172	   conclusions from that discussion were as follows:
173	   o  Stringprep is bound to a specific version of Unicode: 3.2.
174	      Stringprep has not been updated to new versions of Unicode.
175	      Therefore, the protocols using Stringprep are stuck with Unicode
176	      3.2.
177	   o  The protocols need to be updated to support new versions of
178	      Unicode.  The protocols would like to not be bound to a specific
179	      version of Unicode, but rather have better Unicode agility in the
180	      way of IDNA2008.  This is important partly because it is usually
181	      impossible for an application to require Unicode 3.2; the
182	      application gets whatever version of Unicode is available on the
183	      host.
184	   o  The protocols require better bidirectional support (bidi) than
185	      currently offered by Stringprep.
186	   o  If the protocols are updated to use a new version of Stringprep or
187	      another framework, then backward compatibility is an important
188	      requirement.  For example, Stringprep is based on and may use NFKC
189	      [UAX15], while IDNA2008 mostly uses NFC [UAX15].
190	   o  Protocols use each other; for example, a protocol can use user
191	      identifiers that are later passed to SASL, LDAP or another
192	      authentication mechanism.  Therefore, a common set of rules or
193	      classes of strings is preferred over specific rules for each
194	      protocol.
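
   The version-binding point above can be made concrete with a small
   Python sketch.  It is an illustration only and assumes Python's
   standard unicodedata module; the function name is hypothetical.  It
   reports the Unicode version that the host runtime happens to
   provide next to the frozen 3.2 tables that Stringprep depends on.

```python
import unicodedata


def unicode_versions():
    """Return (host_version, frozen_version) as strings."""
    # Version of the Unicode database bundled with this Python runtime.
    host = unicodedata.unidata_version
    # Python also ships a frozen Unicode 3.2 database, which is what
    # Stringprep-based code has to rely on.
    frozen = unicodedata.ucd_3_2_0.unidata_version
    return host, frozen


host, frozen = unicode_versions()
print("host Unicode:", host, "| Stringprep Unicode:", frozen)
```

   The host version differs from one runtime release to the next,
   which is the situation the second bullet describes.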

196	   Protocols that use Stringprep profiles use strings for different
197	   purposes:
198	   o  XMPP uses a different Stringprep profile for each part of the XMPP
199	      address (JID): a localpart which is similar to a username and used
200	      for authentication, a domainpart which is a domain name and a
201	      resource part which is less restrictive than the localpart.
202	   o  iSCSI uses a Stringprep profile for the IQN, which is very similar
203	      to (often is) a DNS domain name.
204	   o  SASL and LDAP use a Stringprep profile for usernames.
205	   o  LDAP uses a set of Stringprep profiles.

207	   During the newprep BOF, it was the consensus of the attendees that it
208	   would be highly desirable to have a replacement of Stringprep, with
209	   similar characteristics to IDNA2008.  That replacement should be
210	   defined so that the protocols could use internationalized strings
211	   without a lot of specialized internationalization work, since
212	   internationalization expertise is not available in the respective
213	   protocols or working groups.

215	3.  Major Topics for Consideration

217	   This section provides an overview of major topics that a Stringprep
218	   replacement needs to address.  The headings correspond roughly with
219	   categories under which known Stringprep-using protocol RFCs have been
220	   evaluated.  For the details of those evaluations, see Appendix A.

222	3.1.  Comparison

224	3.1.1.  Types of Identifiers

226	   Following [I-D.iab-identifier-comparison], we can organize
227	   identifiers into three classes in respect of how they may be compared
228	   with one another:

230	   Absolute Identifiers  Identifiers that can be compared byte-by-byte
231	      for equality.
232	   Definite Identifiers  Identifiers that have a well-defined comparison
233	      algorithm on which all parties agree.
234	   Indefinite Identifiers  Identifiers that have no single comparison
235	      algorithm on which all parties agree.

237	   Definite Identifiers include cases like the comparison of Unicode
238	   code points in different encodings: they do not match byte for byte,
239	   but can all be converted to a single encoding which then does match
240	   byte for byte.  Indefinite Identifiers are sometimes algorithmically
241	   comparable by well-specified subsets of parties.  For more discussion
242	   of these categories, see [I-D.iab-identifier-comparison].
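
   As a sketch of the Definite Identifier case described above, the
   following Python fragment compares one identifier carried in two
   encodings.  The function name and the choice of UTF-8 as the single
   comparison encoding are assumptions made for illustration.

```python
def same_identifier(raw_a: bytes, enc_a: str,
                    raw_b: bytes, enc_b: str) -> bool:
    """Compare two encoded identifiers by converting both to UTF-8."""
    canonical_a = raw_a.decode(enc_a).encode("utf-8")
    canonical_b = raw_b.decode(enc_b).encode("utf-8")
    return canonical_a == canonical_b


name = "caf\u00e9"
as_utf8 = name.encode("utf-8")
as_utf16 = name.encode("utf-16-be")

# The raw octets differ, so a byte-by-byte comparison fails.
print(as_utf8 == as_utf16)
# After conversion to one encoding, the identifiers match.
print(same_identifier(as_utf8, "utf-8", as_utf16, "utf-16-be"))
```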

244	   The section on treating the existing known cases, Appendix A, uses
245	   these categories.

247	3.1.2.  Effect of comparison

249	   The three classes of comparison style outlined in Section 3.1.1 may
250	   have different effects when applied.  It is necessary to evaluate the
251	   effects of a comparison that results in a false positive, and those
252	   of a comparison that results in a false negative, especially in terms
253	   of the consequences for security and usability.

255	3.2.  Dealing with characters

257	   This section outlines a range of issues having to do with characters
258	   in the target protocols, and spends some effort to outline the ways
259	   in which IDNA2008 might be a good analogy to other protocols, and
260	   ways in which it might be a poor one.

262	3.2.1.  Case folding, case sensitivity, and case preservation

264	   In IDNA2003, labels are always mapped to lower case before the
265	   Punycode transformation.  In IDNA2008, there is no mapping at all:
266	   input is either a valid U-label or it is not.  At the same time,
267	   upper-case characters are by definition not valid U-labels, because
268	   they fall into the Unstable category (category B) of [RFC5892].

270	   If there are protocols that require upper and lower cases be
271	   preserved, then the analogy with IDNA2008 will break down.
272	   Accordingly, existing protocols are to be evaluated according to the
273	   following criteria:

275	   1.  Does the protocol use case folding?  For all blocks of code
276	       points, or just for certain subsets?
277	   2.  Is the system or protocol case sensitive?
278	   3.  Does the system or protocol preserve case?
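
   The first criterion matters because "case folding" is not a single
   operation.  The Python sketch below contrasts simple lowercasing
   with Unicode full case folding.  It uses Python's built-in
   str.casefold as a stand-in; that is an assumption for illustration
   and is not the Stringprep folding table.  The function names are
   hypothetical.

```python
def lower_equal(a: str, b: str) -> bool:
    """Case-insensitive match using simple lowercasing only."""
    return a.lower() == b.lower()


def fold_equal(a: str, b: str) -> bool:
    """Case-insensitive match using Unicode full case folding."""
    return a.casefold() == b.casefold()


# Plain ASCII behaves the same under both approaches.
print(fold_equal("Example", "EXAMPLE"))
# U+00DF (sharp s) has no single-character upper case, so the two
# approaches disagree about whether these strings match.
print(lower_equal("Stra\u00dfe", "STRASSE"))
print(fold_equal("Stra\u00dfe", "STRASSE"))
```

   Neither function preserves case; a protocol that answers "yes" to
   the third question would need to store the original string as well.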

280	3.2.2.  Stringprep and NFKC

282	   Stringprep profiles may use normalization.  If they do, they use NFKC
283	   [UAX15].  It is not clear that NFKC is the right normalization to use
284	   in all cases.  In [UAX15], there is the following observation
285	   regarding Normalization Forms KC and KD: "It is best to think of
286	   these Normalization Forms as being like uppercase or lowercase
287	   mappings: useful in certain contexts for identifying core meanings,
288	   but also performing modifications to the text that may not always be
289	   appropriate."  For things like the spelling of users' names, then,
290	   NFKC may not be the best form to use.  At the same time, one of the
291	   nice things about NFKC is that it deals with the width of characters
292	   that are otherwise similar, by unifying half-width and full-width
293	   variants.  This mapping step can be crucial in practice.  The WG will
294	   need to analyze the different use profiles and consider whether NFKC
295	   or NFC is a better normalization for each profile.

297	   For the purposes of evaluating an existing example of Stringprep use,
298	   it is helpful to know whether it uses no normalization, NFKC, or NFC.
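
   The difference between the two forms can be seen with a few sample
   strings.  This is a minimal Python sketch using the standard
   unicodedata module; the sample strings are chosen for illustration
   and do not come from any profile.

```python
import unicodedata

samples = {
    "fullwidth Latin A": "\uff21",
    "halfwidth katakana KA": "\uff76",
    "ligature fi": "\ufb01",
    "e + combining acute": "e\u0301",
}

for label, text in samples.items():
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    # ascii() keeps the output readable on any terminal.
    print(f"{label}: NFC={ascii(nfc)} NFKC={ascii(nfkc)}")
```

   NFC leaves the width variants and the ligature untouched, while
   NFKC replaces them with their compatibility equivalents.  Both
   forms compose the last sample into U+00E9.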

300	3.2.3.  Character mapping

302	   Along with the case mapping issues raised in Section 3.2.1, there is
303	   the question of whether some characters are mapped either to other
304	   characters or to nothing during Stringprep.  [RFC3454], Section 3,
305	   outlines a number of characters that are mapped to nothing, and also
306	   permits Stringprep profiles to define their own mappings.
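
   The two mapping steps can be sketched with Python's standard
   stringprep module, which exposes the [RFC3454] tables.  The helper
   name is hypothetical, and the sketch is deliberately incomplete: it
   applies only tables B.1 and B.2 and omits normalization, prohibited
   output, and bidi checks that a real profile would also specify.

```python
import stringprep


def apply_b1_and_b2(text: str) -> str:
    """Apply the RFC 3454 B.1 (map to nothing) and B.2 (fold) tables."""
    out = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            continue  # characters commonly mapped to nothing
        out.append(stringprep.map_table_b2(ch))
    return "".join(out)


# U+00AD SOFT HYPHEN is in table B.1 and disappears from the output.
print(ascii(apply_b1_and_b2("Ex\u00adAMPLE")))
```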

308	3.2.4.  Prohibited characters

310	   Along with case folding and other character mappings, many protocols
311	   have characters that are simply disallowed.  For example, control
312	   characters and special characters such as "@" or "/" may be
313	   prohibited in a protocol.

315	   One of the primary changes of IDNA2008 is in the way it approaches
316	   Unicode code points.  IDNA2003 created an explicit list of excluded
317	   or mapped-away characters; anything in Unicode 3.2 that was not so
318	   listed could be assumed to be allowed under the protocol.  IDNA2008
319	   begins instead from the assumption that code points are disallowed,
320	   and then relies on Unicode properties to derive whether a given code
321	   point actually is allowed in the protocol.
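
   The contrast between the two approaches can be sketched as follows.
   Both rule sets are hypothetical and far smaller than anything in
   IDNA2003 or IDNA2008; they are meant only to show how an unlisted
   code point is treated under each model.

```python
import unicodedata

# Hypothetical exclusion list: anything not listed is allowed.
EXCLUDED = {"@", "/", " "}

# Hypothetical inclusion rule: only these General Categories pass.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def allowed_by_exclusion(ch: str) -> bool:
    return ch not in EXCLUDED


def allowed_by_inclusion(ch: str) -> bool:
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


# U+2603 SNOWMAN is a symbol (category So) that neither rule names.
for ch in ["a", "@", "\u2603"]:
    print(ascii(ch), allowed_by_exclusion(ch), allowed_by_inclusion(ch))
```

   The exclusion model admits the symbol because nobody listed it,
   while the inclusion model rejects it because no rule admits it.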

323	   Moreover, there is more than one class of "allowed in the protocol".
324	   While some code points are disallowed outright, some are allowed only
325	   in certain contexts.  The reasons for the context-dependent rules
326	   have to do with the way some characters are used.  For instance, the
327	   ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER (ZWJ, U+200D and ZWNJ,
328	   U+200C) are allowed with contextual rules because they are required
329	   in some circumstances, yet are invisible format characters that
330	   would otherwise be DISALLOWED under the usual IDNA2008 derivation
331	   rules.  The goal is to provide the widest repertoire of code points
332	   possible that is consistent with the traditional DNS, trusting to
333	   the operators of individual zones to make sensible (and usually more
334	   restrictive) policies for their zones.
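
   A context-dependent rule can be sketched as a predicate over a code
   point and its neighbours.  The Python fragment below is a
   simplified illustration: it implements only the "preceded by a
   virama" test for ZWNJ and omits the joining-type test that
   [RFC5892] also defines.  The function name is hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def zwnj_allowed_at(text: str, index: int) -> bool:
    """Simplified check: accept ZWNJ only directly after a virama."""
    if text[index] != ZWNJ:
        raise ValueError("no ZWNJ at this index")
    if index == 0:
        return False
    return unicodedata.combining(text[index - 1]) == VIRAMA_CCC


# Devanagari KA, VIRAMA, ZWNJ, SSA: the ZWNJ follows a virama.
print(zwnj_allowed_at("\u0915\u094d\u200c\u0937", 2))
# Between two ASCII letters the same code point is rejected.
print(zwnj_allowed_at("a\u200cb", 1))
```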

336	   IDNA2008 may be a poor model for what other protocols ought to do in
337	   this case, because it is designed to support an old protocol that is
338	   designed to operate on the scale of the entire Internet.  Moreover,
339	   IDNA2008 is intended to be deployed without any change to the base
340	   DNS protocol.  Other protocols may aim at deployment in more local
341	   environments, or may have protocol version negotiation built in.

343	3.2.5.  Internal structure, delimiters, and special characters

345	   IDNA2008 has a special problem with delimiters, because the delimiter
346	   "character" in the DNS wire format is not really part of the data.
347	   In DNS, labels are not separated exactly; instead, a label carries
348	   with it an indicator that says how long the label is.  When the label
349	   is presented in presentation format as part of a fully qualified
350	   domain name, the label separator FULL STOP, U+002E (.) is used to
351	   break up the labels.  But because that label separator does not
352	   travel with the wire format of the domain name, there is no way to
353	   encode a different, "internationalized" separator in IDNA2008.
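
   The length-prefixed layout can be illustrated with a short Python
   sketch.  It handles only plain ASCII names without escapes, and the
   function name is hypothetical.

```python
def to_wire(name: str) -> bytes:
    """Encode an ASCII presentation-format name as length-prefixed labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("label must be 1 to 63 octets")
        wire.append(len(raw))   # length octet takes the separator's place
        wire += raw
    wire.append(0)              # zero-length root label ends the name
    return bytes(wire)


print(to_wire("www.example.com"))
```

   No FULL STOP appears anywhere in the result, so there is nothing in
   the wire format that an alternative separator could replace.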

355	   Other protocols may include characters with similar special meaning
356	   within the protocol.  Common characters for these purposes include
357	   FULL STOP, U+002E (.); COMMERCIAL AT, U+0040 (@); HYPHEN-MINUS,
358	   U+002D (-); SOLIDUS, U+002F (/); and LOW LINE, U+005F (_).  The mere
359	   inclusion of such a character in the protocol is not enough for it to
360	   be considered similar to another protocol using the same character;
361	   instead, handling of the character must be taken into consideration
362	   as well.

364	   An important issue to tackle here is whether it is valuable to map to
365	   or from these special characters as part of the Stringprep
366	   replacement.  In some locales, the analogue to FULL STOP, U+002E is
367	   some other character, and users may expect to be able to substitute
368	   their normal stop for FULL STOP, U+002E.  At the same time, there are
369	   predictability arguments in favour of treating names with FULL STOP,
370	   U+002E in them just the way they are treated under IDNA2008.
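
   The trade-off can be sketched with the three additional dot-like
   characters that [RFC3490] requires IDNA2003 software to recognize
   as label separators.  The function names and the map_dots switch
   are hypothetical and serve only to show both behaviours side by
   side.

```python
# Dot-like characters that RFC 3490 treats as label separators, in
# addition to U+002E itself.
IDNA2003_DOTS = {"\u3002", "\uff0e", "\uff61"}


def map_separators(name: str) -> str:
    """Replace each alternative separator with FULL STOP, U+002E."""
    return "".join("." if ch in IDNA2003_DOTS else ch for ch in name)


def split_labels(name: str, map_dots: bool) -> list:
    """Split a name into labels, optionally mapping separators first."""
    if map_dots:
        name = map_separators(name)
    return name.split(".")


typed = "www\u3002example\uff0ecom"
print(split_labels(typed, map_dots=True))
print(len(split_labels(typed, map_dots=False)))
```

   With mapping, the input splits into three labels; without it, the
   whole input is treated as a single label.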

372	3.3.  Where the data comes from and where it goes

374	3.3.1.  User input and the source of protocol elements

376	   Some protocol elements are provided by users, and others are not.
377	   Those that are not may presumably be subject to greater restrictions,
378	   whereas those that users provide likely need to permit the broadest
379	   range of code points.  The following questions are helpful:

381	   1.  Do users input the strings directly?
382	   2.  If so, how? (keyboard, stylus, voice, copy-paste, etc.)
383	   3.  Where do we place the dividing line between user interface and
384	       protocol? (see [RFC5895])

386	3.3.2.  User output

388	   Just as only some protocol elements are expected to be entered
389	   directly by users, only some protocol elements are intended to be
390	   consumed directly by users.  It is important to know how users are
391	   expected to be able to consume the protocol elements, because
392	   different environments present different challenges.  An element that
393	   is only ever delivered as part of a vCard remains in machine-readable
394	   format, so the problem of visual confusion is not a great one.  Is
395	   the protocol element published as part of a vCard, a web directory,
396	   on a business card, or on "the side of a bus"?  Do users use the
397	   protocol element as an identifier (which means that they might enter
398	   it again in some other context)?

400	3.3.3.  Operations

402	   Some strings are useful as part of the protocol but are not used as
403	   input to other operations (for instance, purely informative or
404	   descriptive text).  Other strings are used directly as input to other
405	   operations (such as cryptographic hash functions), or are combined
406	   with other strings (such as by concatenating a string with some
407	   others to form a unique identifier).
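
   Strings that feed other operations are sensitive to preparation in
   a way that purely descriptive text is not.  The Python sketch below
   hashes two visually identical strings; SHA-256 and NFC are chosen
   here only for illustration, and the function name is hypothetical.

```python
import hashlib
import unicodedata


def digest(text: str, form=None) -> str:
    """SHA-256 of the UTF-8 octets, optionally normalizing first."""
    if form is not None:
        text = unicodedata.normalize(form, text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


composed = "caf\u00e9"       # precomposed e with acute
decomposed = "cafe\u0301"    # e followed by a combining acute

# Without preparation the digests differ.
print(digest(composed) == digest(decomposed))
# After the same normalization on both sides they agree.
print(digest(composed, "NFC") == digest(decomposed, "NFC"))
```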

409	3.3.3.1.  String classes

411	   Strings often have a similar function in different protocols.  For
412	   instance, many different protocols contain user identifiers or
413	   passwords.  A single profile for all such uses might be desirable.

415	   Often, a string in a protocol is effectively a protocol element from
416	   another protocol.  For instance, different systems might use the same
417	   credentials database for authentication.

419	3.3.3.2.  Community considerations

421	   A Stringprep replacement that does anything more than just update
422	   Stringprep to the latest version of Unicode will probably entail some
423	   changes.  It is important to identify the willingness of the
424	   protocol-using community to accept backwards-incompatible changes.
425	   By the same token, it is important to evaluate the desire of the
426	   community for features not available under Stringprep.

428	3.3.3.3.  What to do about Unicode changes

430	   IDNA2008 uses an algorithm to derive the validity of a Unicode code
431	   point for use under IDNA2008.  It does this by using the properties
432	   of each code point to test its validity.

434	   This approach depends crucially on the idea that code points, once
435	   valid for a protocol profile, will not later be made invalid.  That
436	   is not a guarantee currently provided by Unicode.  Properties of code
437	   points may change between versions of Unicode.  Rarely, such a change
438	   could cause a given code point to become invalid under a protocol
439	   profile, even though the code point would be valid with an earlier
440	   version of Unicode.  This is not merely a theoretical possibility,
441	   because it has occurred ([I-D.faltstrom-5892bis]).

443	   Accordingly, a Stringprep replacement that intends to be Unicode
444	   version agnostic will need to work out a mechanism to address cases
445	   where incompatible changes occur because of new Unicode versions.
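
   One way to detect such cases is to compare code point properties
   across Unicode versions.  The Python sketch below is an
   illustration under a strong limitation: the standard unicodedata
   module carries only a frozen Unicode 3.2 database and the host's
   current one, so it cannot compare two arbitrary versions.  Code
   points that were unassigned in 3.2 also show up as differences.
   The function names are hypothetical.

```python
import unicodedata

old_db = unicodedata.ucd_3_2_0   # frozen Unicode 3.2 database


def category_pair(ch: str):
    """Return (category in Unicode 3.2, category on this host)."""
    return old_db.category(ch), unicodedata.category(ch)


def category_changed(ch: str) -> bool:
    old, new = category_pair(ch)
    return old != new


# "a" is stable; U+19DA was not yet assigned in Unicode 3.2.
for ch in ["a", "\u19da"]:
    print(f"U+{ord(ch):04X}", category_pair(ch), category_changed(ch))
```

   A derived-property table built from one database could be checked
   against another in the same way, flagging any code point whose
   derived status would change.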

447	3.3.4.  Some useful classes of strings

449	   With the above considerations in hand, we can usefully classify
450	   strings into the following categories, inspired by those outlined in
451	   [I-D.saintandre-xmpp-i18n]:

453	   Domainy strings  Strings that are intended for use in a domain name
454	      slot, as defined in [RFC5890].  Note that domainy strings could be
455	      used outside a domain name slot: the question here is what the
456	      eventual intended use for the string is, and not whether the
457	      string is actually functioning as a domain name at any moment.
458	   Namey strings  Strings that are intended for use as identifiers but
459	      that are not domainy strings.  Namey strings are normally public
460	      data within the protocol where they are used: these are intended
461	      as identifiers that can be passed around to identify something.
462	   Secretish strings  Strings that are intended for use as passwords or
463	      passphrases or other such type of token.  Secretish strings are
464	      normally not public data within the protocol where they are used:
465	      they function as a token for authorization, and normally should
466	      not be shared publicly.
467	   Protocolish strings  Strings that are intended to be used by the
468	      protocol as free-form strings, but that have some significant
469	      handling within the protocol.  For instance, a protocol slot that
470	      allows free-form text where case is not preserved would need to
471	      have case mapping rules applied; in this case, the string would be
472	      a protocolish string.
473	   String blobs  Elements of the protocol that look like strings to
474	      users, but that are passed around in the protocol unchanged and
475	      that cannot be used for comparison or other purposes.  In effect,
476	      these are strings that are part of a protocol payload, and are not
477	      themselves part of the protocol at all.

479	4.  Considerations for Stringprep replacement

481	   The above suggests the following direction for the working group:
482	   o  A Stringprep replacement should be defined.
483	   o  The replacement should take an approach similar to IDNA2008, in
484	      that it enables Unicode agility.
485	   o  Protocols use strings with similar characteristics.  Therefore,
486	      defining i18n preparation algorithms for a (small) set of string
487	      classes may be sufficient for most cases and would provide
488	      coherence among a set of related protocols.
489	   o  The sets of string classes need to be evaluated according to the
490	      considerations that make up the headings in Section 3.
491	   o  It is reasonable to limit scope to Unicode code points, and rule
492	      the mapping of data from other character encodings outside the
493	      scope of this effort.
494	   o  Recommendations for handling protocol incompatibilities resulting
495	      from changes to Unicode are required.

497	   Existing deployments already depend on Stringprep profiles.
498	   Therefore, the working group will need to consider the effects of any
499	   new strategy on existing deployments.  By way of comparison, it is
500	   worth noting that some characters were acceptable in IDNA labels
501	   under IDNA2003, but are not protocol-valid under IDNA2008 (and
502	   conversely).  Different implementers may make different decisions
503	   about what to do in such cases; this could have interoperability
504	   effects.  The working group will need to trade better support for
505	   different linguistic environments against the potential side effects
506	   of backward incompatibility.

508	5.  Security Considerations

510	   This document merely states what problems are to be solved, and does
511	   not define a protocol.  There are undoubtedly security implications
512	   of the particular results that will come from the work to be
513	   completed.

515	6.  IANA Considerations

517	   This document has no actions for IANA.

519	7.  Discussion home for this draft

521	   This document is intended to define the problem space discussed on
522	   the precis@ietf.org mailing list.

524	8.  Acknowledgements

526	   This document is the product of the PRECIS IETF Working Group, and
527	   participants in that Working Group were helpful in addressing issues
528	   with the text.

530	   Specific contributions came from David Black, Alan DeKok, Bill
531	   McQuillan, Alexey Melnikov, Peter Saint-Andre, Dave Thaler, and
532	   Yoshiro Yoneya.

534	   Dave Thaler provided the "buckets" insight in Section 3.1.1, central
535	   to the organization of the problem.

537	9.  Informative References

539	   [I-D.faltstrom-5892bis]
540	              Faltstrom, P. and P. Hoffman, "The Unicode code points and
541	              IDNA - Unicode 6.0", draft-faltstrom-5892bis-05 (work in
542	              progress), June 2011.

544	   [I-D.iab-identifier-comparison]
545	              Thaler, D., "Issues in Identifier Comparison for Security
546	              Purposes", draft-iab-identifier-comparison-00 (work in
547	              progress), July 2011.

549	   [I-D.saintandre-xmpp-i18n]
550	              Saint-Andre, P., "Internationalized Addresses in XMPP",
551	              draft-saintandre-xmpp-i18n-03 (work in progress),
552	              March 2011.

554	   [NEWPREP]  "Newprep BoF Meeting Minutes", March 2010.

556	   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
557	              Internationalized Strings ("stringprep")", RFC 3454,
558	              December 2002.

560	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
561	              "Internationalizing Domain Names in Applications (IDNA)",
562	              RFC 3490, March 2003.

564	   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
565	              Profile for Internationalized Domain Names (IDN)",
566	              RFC 3491, March 2003.

568	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
569	              for Internationalized Domain Names in Applications
570	              (IDNA)", RFC 3492, March 2003.

572	   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
573	              Beame, C., Eisler, M., and D. Noveck, "Network File System
574	              (NFS) version 4 Protocol", RFC 3530, April 2003.

576	   [RFC3722]  Bakke, M., "String Profile for Internet Small Computer
577	              Systems Interface (iSCSI) Names", RFC 3722, April 2004.

579	   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H.
580	              Levkowetz, "Extensible Authentication Protocol (EAP)",
581	              RFC 3748, June 2004.

583	   [RFC3920]  Saint-Andre, P., Ed., "Extensible Messaging and Presence
584	              Protocol (XMPP): Core", RFC 3920, October 2004.

586	   [RFC3922]  Saint-Andre, P., "Mapping the Extensible Messaging and
587	              Presence Protocol (XMPP) to Common Presence and Instant
588	              Messaging (CPIM)", RFC 3922, October 2004.

590	   [RFC4011]  Waldbusser, S., Saperia, J., and T. Hongal, "Policy Based
591	              Management MIB", RFC 4011, March 2005.

593	   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
594	              and Passwords", RFC 4013, February 2005.

596	   [RFC4279]  Eronen, P. and H. Tschofenig, "Pre-Shared Key Ciphersuites
597	              for Transport Layer Security (TLS)", RFC 4279,
598	              December 2005.

600	   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
601	              RFC 4314, December 2005.

603	   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
604	              Security Layer (SASL)", RFC 4422, June 2006.

606	   [RFC4505]  Zeilenga, K., "Anonymous Simple Authentication and
607	              Security Layer (SASL) Mechanism", RFC 4505, June 2006.

609	   [RFC4511]  Sermersheim, J., "Lightweight Directory Access Protocol
610	              (LDAP): The Protocol", RFC 4511, June 2006.

612	   [RFC4513]  Harrison, R., "Lightweight Directory Access Protocol
613	              (LDAP): Authentication Methods and Security Mechanisms",
614	              RFC 4513, June 2006.

616	   [RFC4518]  Zeilenga, K., "Lightweight Directory Access Protocol
617	              (LDAP): Internationalized String Preparation", RFC 4518,
618	              June 2006.

620	   [RFC4616]  Zeilenga, K., "The PLAIN Simple Authentication and
621	              Security Layer (SASL) Mechanism", RFC 4616, August 2006.

623	   [RFC4643]  Vinocur, J. and K. Murchison, "Network News Transfer
624	              Protocol (NNTP) Extension for Authentication", RFC 4643,
625	              October 2006.

627	   [RFC4683]  Park, J., Lee, J., Lee, H., Park, S., and T. Polk,
628	              "Internet X.509 Public Key Infrastructure Subject
629	              Identification Method (SIM)", RFC 4683, October 2006.

631	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
632	              Recommendations for Internationalized Domain Names
633	              (IDNs)", RFC 4690, September 2006.

635	   [RFC4790]  Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
636	              Application Protocol Collation Registry", RFC 4790,
637	              March 2007.

639	   [RFC4954]  Siemborski, R. and A. Melnikov, "SMTP Service Extension
640	              for Authentication", RFC 4954, July 2007.

642	   [RFC5034]  Siemborski, R. and A. Menon-Sen, "The Post Office Protocol
643	              (POP3) Simple Authentication and Security Layer (SASL)
644	              Authentication Mechanism", RFC 5034, July 2007.

646	   [RFC5051]  Crispin, M., "i;unicode-casemap - Simple Unicode Collation
647	              Algorithm", RFC 5051, October 2007.

649	   [RFC5054]  Taylor, D., Wu, T., Mavrogiannopoulos, N., and T. Perrin,
650	              "Using the Secure Remote Password (SRP) Protocol for TLS
651	              Authentication", RFC 5054, November 2007.

653	   [RFC5122]  Saint-Andre, P., "Internationalized Resource Identifiers
654	              (IRIs) and Uniform Resource Identifiers (URIs) for the
655	              Extensible Messaging and Presence Protocol (XMPP)",
656	              RFC 5122, February 2008.

658	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
659	              Housley, R., and W. Polk, "Internet X.509 Public Key
660	              Infrastructure Certificate and Certificate Revocation List
661	              (CRL) Profile", RFC 5280, May 2008.

663	   [RFC5456]  Spencer, M., Capouch, B., Guy, E., Miller, F., and K.
664	              Shumard, "IAX: Inter-Asterisk eXchange Version 2",
665	              RFC 5456, February 2010.

667	   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
668	              System (NFS) Version 4 Minor Version 1 Protocol",
669	              RFC 5661, January 2010.

671	   [RFC5802]  Newman, C., Menon-Sen, A., Melnikov, A., and N. Williams,
672	              "Salted Challenge Response Authentication Mechanism
673	              (SCRAM) SASL and GSS-API Mechanisms", RFC 5802, July 2010.

675	   [RFC5804]  Melnikov, A. and T. Martin, "A Protocol for Remotely
676	              Managing Sieve Scripts", RFC 5804, July 2010.

678	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
679	              Applications (IDNA): Definitions and Document Framework",
680	              RFC 5890, August 2010.

682	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
683	              Applications (IDNA): Protocol", RFC 5891, August 2010.

685	   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
686	              Internationalized Domain Names for Applications (IDNA)",
687	              RFC 5892, August 2010.

689	   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
690	              Internationalized Domain Names for Applications (IDNA)",
691	              RFC 5893, August 2010.

693	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
694	              Applications (IDNA): Background, Explanation, and
695	              Rationale", RFC 5894, August 2010.

697	   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
698	              Internationalized Domain Names in Applications (IDNA)
699	              2008", RFC 5895, September 2010.

701	   [UAX15]    "Unicode Standard Annex #15: Unicode Normalization Forms",
702	              UAX 15, September 2009.

704	Appendix A.  Protocols known to be using Stringprep

706	   The known cases are here described in two ways.  The types of
707	   identifiers the protocol uses are first called out in the ID type
708	   column (from Section 3.1.1), using the short forms "a" for Absolute,
709	   "d" for Definite, and "i" for Indefinite.  Next, there is a column
710	   that contains an "i" if the protocol string comes from user input, an
711	   "o" if the protocol string becomes user-facing output, "b" if both
712	   are true, and "n" if neither is true.  The remaining columns have an
713	   "x" if and only if the protocol uses that class, as described in
714	   Section 3.3.4.  Values marked "-" indicate that an answer is not
715	   useful; in this case, see detailed discussion in Appendix B.

717	   +------+--------+-------+-------+-------+---------+---------+------+
718	   |  RFC | IDtype | User? | Dom'y | Nam'y | Sec'ish | Pro'ish | Blob |
719	   +------+--------+-------+-------+-------+---------+---------+------+
720	   | 3722 |    a   |   o   |       |   x   |    x    |    x    |      |
721	   | 3748 |    -   |   -   |   -   |   x   |    -    |    -    |   -  |
722	   | 3920 |   a,d  |   b   |       |   x   |         |    x    |      |
723	   | 4314 |   a,d  |   b   |       |   x   |    x    |    x    |      |
724	   +------+--------+-------+-------+-------+---------+---------+------+

726	                                  Table 1
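
   The column legend above can be read mechanically.  The following
   Python sketch decodes one row of Table 1; the function decode_row()
   and its dictionary keys are illustrative assumptions and are not
   defined by this document or by any protocol.

```python
# Short forms from the legend: ID type and user input/output columns.
ID_TYPES = {"a": "Absolute", "d": "Definite", "i": "Indefinite"}
USER = {"i": "input", "o": "output", "b": "both", "n": "neither"}
# Class columns, in the order they appear in Table 1.
CLASSES = ["Dom'y", "Nam'y", "Sec'ish", "Pro'ish", "Blob"]


def decode_row(line: str) -> dict:
    """Decode one '| ... |' data row of Table 1 into a dictionary.

    A "-" cell means the answer is not useful; it is reported as None
    for the first two columns and listed under "not_useful" for classes.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rfc, idtype, user, *flags = cells
    return {
        "rfc": int(rfc),
        "id_types": None if idtype == "-"
        else [ID_TYPES[t] for t in idtype.split(",")],
        "user": None if user == "-" else USER[user],
        "classes": [n for n, f in zip(CLASSES, flags) if f == "x"],
        "not_useful": [n for n, f in zip(CLASSES, flags) if f == "-"],
    }


row_3920 = decode_row(
    "| 3920 |   a,d  |   b   |       |   x   |         |    x    |      |"
)
row_3748 = decode_row(
    "| 3748 |    -   |   -   |   -   |   x   |    -    |    -    |   -  |"
)
```

   Here row_3920 records Absolute and Definite identifiers, strings
   that are both user input and user-facing output, and the "Nam'y"
   and "Pro'ish" classes, while row_3748 keeps the "-" cells distinct
   from empty ones so that readers are pointed to Appendix B.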

728	   [[anchor21: The table still needs to be filled in, I am aware.
729	   --ajs@anvilwalrusden.com]]

731	Appendix B.  Detailed discussion of protocols under consideration

733	   Below are detailed reviews of the protocols under consideration
734	   (where such reviews are available). [[anchor22: These are to be cut
735	   and pasted from the wiki. --ajs@anvilwalrusden.com]]

737	Appendix C.  Changes between versions

739	   Note to RFC Editor: This section should be removed prior to
740	   publication.

742	C.1.  00

744	   First WG version.  Based on
745	   draft-blanchet-precis-problem-statement-00.

747	C.2.  01

749	   o  Made clear that the document is talking only about Unicode code
750	      points, and not any particular encoding.
751	   o  Substantially reorganized the document along the lines of the
752	      review template at <http://trac.tools.ietf.org/wg/precis/trac/
753	      wiki/StringprepReviewTemplate>.
754	   o  Included specific questions for each topic for consideration.
755	   o  Moved spot for individual protocol review to appendix.  Not
756	      populated yet.

758	C.3.  02

760	   o  Cleared up details of comparison classes.
761	   o  Added a section on changes in Unicode.

763	C.4.  03

765	   o  Aligned comparison discussion with identifier discussion from
766	      draft-iab-identifier-comparison-00.
767	   o  Added section on classes of strings ("Namey" and so on).

769	Authors' Addresses

771	   Marc Blanchet
772	   Viagenie
773	   2600 boul. Laurier, suite 625
774	   Quebec, QC  G1V 4W1
775	   Canada

777	   Email: Marc.Blanchet@viagenie.ca
778	   URI:   http://viagenie.ca

780	   Andrew Sullivan
781	   519 Maitland St.
782	   London, ON  N6B 2Z5
783	   Canada

785	   Email: ajs@anvilwalrusden.com