idnits 2.17.1 

draft-ietf-precis-framework-21.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 10, 2014) is 3418 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 1662

  -- Looks like a reference, but probably isn't: '2' on line 1664

  == Outdated reference: A later version (-12) exists of
     draft-ietf-precis-mappings-08

  == Outdated reference: A later version (-19) exists of
     draft-ietf-precis-nickname-13

  == Outdated reference: A later version (-18) exists of
     draft-ietf-precis-saslprepbis-12

  == Outdated reference: A later version (-24) exists of
     draft-ietf-xmpp-6122bis-18

  -- Obsolete informational reference (is this intentional?): RFC 3454
     (Obsoleted by RFC 7564)

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3491
     (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 5246
     (Obsoleted by RFC 8446)


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	PRECIS                                                    P. Saint-Andre
3	Internet-Draft                                                      &yet
4	Obsoletes: 3454 (if approved)                                M. Blanchet
5	Intended status: Standards Track                                Viagenie
6	Expires: June 13, 2015                                 December 10, 2014

8	     PRECIS Framework: Preparation, Enforcement, and Comparison of
9	           Internationalized Strings in Application Protocols
10	                     draft-ietf-precis-framework-21

12	Abstract

14	   Application protocols using Unicode characters in protocol strings
15	   need to properly handle such strings in order to enforce
16	   internationalization rules for strings placed in various protocol
17	   slots (such as addresses and identifiers) and to perform valid
18	   comparison operations (e.g., for purposes of authentication or
19	   authorization).  This document defines a framework enabling
20	   application protocols to perform the preparation, enforcement, and
21	   comparison of internationalized strings ("PRECIS") in a way that
22	   depends on the properties of Unicode characters and thus is agile
23	   with respect to versions of Unicode.  As a result, this framework
24	   provides a more sustainable approach to the handling of
25	   internationalized strings than the previous framework, known as
26	   Stringprep (RFC 3454).  This document obsoletes RFC 3454.

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at http://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on June 13, 2015.

45	Copyright Notice

47	   Copyright (c) 2014 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document.  Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document.  Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Table of Contents

62	   1.  Introduction
63	   2.  Terminology
64	   3.  Preparation, Enforcement, and Comparison
65	   4.  String Classes
66	     4.1.  Overview
67	     4.2.  IdentifierClass
68	       4.2.1.  Valid
69	       4.2.2.  Contextual Rule Required
70	       4.2.3.  Disallowed
71	       4.2.4.  Unassigned
72	       4.2.5.  Examples
73	     4.3.  FreeformClass
74	       4.3.1.  Valid
75	       4.3.2.  Contextual Rule Required
76	       4.3.3.  Disallowed
77	       4.3.4.  Unassigned
78	       4.3.5.  Examples
79	   5.  Profiles
80	     5.1.  Profiles Must Not Be Multiplied Beyond Necessity
81	     5.2.  Rules
82	       5.2.1.  Width Mapping Rule
83	       5.2.2.  Additional Mapping Rule
84	       5.2.3.  Case Mapping Rule
85	       5.2.4.  Normalization Rule
86	       5.2.5.  Directionality Rule
87	     5.3.  A Note about Spaces
88	   6.  Applications
89	     6.1.  How to Use PRECIS in Applications
90	     6.2.  Further Excluded Characters
91	     6.3.  Building Application-Layer Constructs
92	   7.  Order of Operations
93	   8.  Code Point Properties
94	   9.  Category Definitions Used to Calculate Derived Property
95	     9.1.  LetterDigits (A)
96	     9.2.  Unstable (B)
97	     9.3.  IgnorableProperties (C)
98	     9.4.  IgnorableBlocks (D)
99	     9.5.  LDH (E)
100	     9.6.  Exceptions (F)
101	     9.7.  BackwardCompatible (G)
102	     9.8.  JoinControl (H)
103	     9.9.  OldHangulJamo (I)
104	     9.10. Unassigned (J)
105	     9.11. ASCII7 (K)
106	     9.12. Controls (L)
107	     9.13. PrecisIgnorableProperties (M)
108	     9.14. Spaces (N)
109	     9.15. Symbols (O)
110	     9.16. Punctuation (P)
111	     9.17. HasCompat (Q)
112	     9.18. OtherLetterDigits (R)
113	   10. Guidelines for Designated Experts
114	   11. IANA Considerations
115	     11.1.  PRECIS Derived Property Value Registry
116	     11.2.  PRECIS Base Classes Registry
117	     11.3.  PRECIS Profiles Registry
118	   12. Security Considerations
119	     12.1.  General Issues
120	     12.2.  Use of the IdentifierClass
121	     12.3.  Use of the FreeformClass
122	     12.4.  Local Character Set Issues
123	     12.5.  Visually Similar Characters
124	     12.6.  Security of Passwords
125	   13. Interoperability Considerations
126	   14. References
127	     14.1.  Normative References
128	     14.2.  Informative References
129	     14.3.  URIs
130	   Appendix A.  Acknowledgements
131	   Authors' Addresses

133	1.  Introduction

135	   Application protocols using Unicode characters [Unicode7.0] in
136	   protocol strings need to properly handle such strings in order to
137	   enforce internationalization rules for strings placed in various
138	   protocol slots (such as addresses and identifiers) and to perform
139	   valid comparison operations (e.g., for purposes of authentication or
140	   authorization).  This document defines a framework enabling
141	   application protocols to perform the preparation, enforcement, and
142	   comparison of internationalized strings ("PRECIS") in a way that
143	   depends on the properties of Unicode characters and thus is agile
144	   with respect to versions of Unicode.

146	   As described in the PRECIS problem statement [RFC6885], many IETF
147	   protocols have used the Stringprep framework [RFC3454] as the basis
148	   for preparing, enforcing, and comparing protocol strings that contain
149	   Unicode characters, especially characters outside the ASCII range
150	   [RFC20].  The Stringprep framework was developed during work on the
151	   original technology for internationalized domain names (IDNs), here
152	   called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the
153	   Stringprep profile for IDNs.  At the time, Stringprep was designed as
154	   a general framework so that other application protocols could define
155	   their own Stringprep profiles.  Indeed, a number of application
156	   protocols defined such profiles.

158	   After the publication of [RFC3454] in 2002, several significant
159	   issues arose with the use of Stringprep in the IDN case, as
160	   documented in the IAB's recommendations regarding IDNs [RFC4690]
161	   (most significantly, Stringprep was tied to Unicode version 3.2).
162	   Therefore, the newer IDNA specifications, here called "IDNA2008"
163	   ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
164	   use Stringprep and Nameprep.  This migration away from Stringprep for
165	   IDNs prompted other "customers" of Stringprep to consider new
166	   approaches to the preparation, enforcement, and comparison of
167	   internationalized strings, as described in [RFC6885].

169	   This document defines a framework for a post-Stringprep approach to
170	   the preparation, enforcement, and comparison of internationalized
171	   strings in application protocols, based on several principles:

173	   1.  Define a small set of string classes that specify the Unicode
174	       characters (i.e., specific "code points") appropriate for common
175	       application protocol constructs.

177	   2.  Define each PRECIS string class in terms of Unicode code points
178	       and their properties so that an algorithm can be used to
179	       determine whether each code point or character category is (a)
180	       valid, (b) allowed in certain contexts, (c) disallowed, or (d)
181	       unassigned.

183	   3.  Use an "inclusion model" such that a string class consists only
184	       of code points that are explicitly allowed, with the result that
185	       any code point not explicitly allowed is forbidden.

187	   4.  Enable application protocols to define profiles of the PRECIS
188	       string classes if necessary (addressing matters such as width
189	       mapping, case mapping, Unicode normalization, and directionality)
190	       but strongly discourage the multiplication of profiles beyond
191	       necessity in order to avoid violations of the Principle of Least
192	       User Astonishment.

194	   It is expected that this framework will yield the following benefits:

196	   o  Application protocols will be agile with regard to Unicode
197	      versions.

199	   o  Implementers will be able to share code point tables and software
200	      code across application protocols, most likely by means of
201	      software libraries.

203	   o  End users will be able to acquire more accurate expectations about
204	      the characters that are acceptable in various contexts.  Given
205	      this more uniform set of string classes, it is also expected that
206	      copy/paste operations between software implementing different
207	      application protocols will be more predictable and coherent.

209	   Whereas the string classes define the "baseline" code points for a
210	   range of applications, profiling enables application protocols to
211	   apply the string classes in ways that are appropriate for common
212	   constructs such as usernames [I-D.ietf-precis-saslprepbis], opaque
213	   strings such as passwords [I-D.ietf-precis-saslprepbis], and
214	   nicknames [I-D.ietf-precis-nickname].  Profiles are responsible for
215	   defining the handling of right-to-left characters as well as various
216	   mapping operations of the kind also discussed for IDNs in [RFC5895],
217	   such as case preservation or lowercasing, Unicode normalization,
218	   mapping of certain characters to other characters or to nothing, and
219	   mapping of full-width and half-width characters.

221	   When an application applies a profile of a PRECIS string class, it
222	   transforms an input string (which might or might not be conforming)
223	   into an output string that definitively conforms to the profile.  In
224	   particular, this document focuses on the resulting ability to achieve
225	   the following objectives:

227	   a.  Enforcing all the rules of a profile for a single output
228	       string (e.g., to determine if a string can be included in a
229	       protocol slot, communicated to another entity within a protocol,
230	       stored in a retrieval system, etc.).

232	   b.  Comparing two output strings to determine if they are equivalent,
233	       typically through octet-for-octet matching to test for "bit-
234	       string identity" (e.g., to make an access decision for purposes
235	       of authentication or authorization as further described in
236	       [RFC6943]).
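
   As an illustration of objective (b), the following Python sketch
   compares two strings octet-for-octet after UTF-8 encoding.  The
   helper name octets_equal and the choice of NFC as the rule applied
   beforehand are assumptions made for this example only; an actual
   profile specifies its own rules.  The sketch shows why comparison is
   performed on output strings: two inputs that render identically can
   differ as octets until the same rules have been applied to both.

```python
import unicodedata


def octets_equal(a: str, b: str) -> bool:
    """Test for bit-string identity on the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


# Two inputs for the same visible text: precomposed U+00E9 versus
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "\u00e9"
decomposed = "e\u0301"

# The raw inputs are different octet sequences.
raw_match = octets_equal(composed, decomposed)

# After applying the same (assumed) normalization rule to both,
# the resulting output strings are identical octet-for-octet.
out_a = unicodedata.normalize("NFC", composed)
out_b = unicodedata.normalize("NFC", decomposed)
output_match = octets_equal(out_a, out_b)
```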

238	   The opportunity to define profiles naturally introduces the
239	   possibility of a proliferation of profiles, thus potentially
240	   diminishing the benefits of common code and violating user
241	   expectations.  See Section 5 for a discussion of this important
242	   topic.

244	   In addition, it is extremely important for protocol designers and
245	   application developers to understand that the transformation of an
246	   input string to an output string is rarely reversible.  As one
247	   relatively simple example, case mapping would transform an input
248	   string of "StPeter" to "stpeter", and information about the
249	   capitalization of the first and third characters would be lost.
250	   Similar considerations apply to other forms of mapping and
251	   normalization.
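
   The loss of information can be seen in a short Python sketch.  Here
   str.lower() is only a stand-in for whatever case mapping rule a
   profile defines, and the function name case_map is hypothetical.
   Because several distinct inputs produce the same output, no inverse
   function can recover the original string.

```python
def case_map(s: str) -> str:
    """Stand-in for a profile's case mapping rule (assumed: lowercasing)."""
    return s.lower()


inputs = ["StPeter", "stpeter", "STPETER"]

# Every input collapses to the same output string, so the original
# capitalization cannot be reconstructed from the output alone.
outputs = {case_map(s) for s in inputs}
```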

253	   Although this framework is similar to IDNA2008 and includes by
254	   reference some of the character categories defined in [RFC5892], it
255	   defines additional character categories to meet the needs of common
256	   application protocols other than DNS.

258	   The character categories and calculation rules defined under
259	   Section 8 and Section 9 are normative and apply to all Unicode code
260	   points.  The code point table that results from applying the
261	   character categories and calculation rules to the latest version of
262	   Unicode can be found in an IANA registry.

264	2.  Terminology

266	   Many important terms used in this document are defined in [RFC5890],
267	   [RFC6365], [RFC6885], and [Unicode7.0].  The terms "left-to-right"
268	   (LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex
269	   #9 [UAX9].

271	   As of the date of writing, the version of Unicode published by the
272	   Unicode Consortium is 7.0 [Unicode7.0]; however, PRECIS is not tied
273	   to a specific version of Unicode.  The latest version of Unicode is
274	   always available [UnicodeCurrent].

276	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
277	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
278	   "OPTIONAL" in this document are to be interpreted as described in
279	   [RFC2119].

281	3.  Preparation, Enforcement, and Comparison

283	   This document distinguishes between three different actions that an
284	   entity can take with regard to a string:

286	   o  Enforcement entails applying all of the rules specified for a
287	      particular string class or profile thereof to an individual
288	      string, for the purpose of determining if the string can be used
289	      in a given protocol slot.

291	   o  Comparison entails applying all of the rules specified for a
292	      particular string class or profile thereof to two separate
293	      strings, for the purpose of determining if the two strings are
294	      equivalent.

296	   o  Preparation entails only ensuring that the characters in an
297	      individual string are allowed by the underlying PRECIS string
298	      class.

300	   In most cases, authoritative entities such as servers are responsible
301	   for enforcement, whereas subsidiary entities such as clients are
302	   responsible only for preparation.  The rationale for this distinction
303	   is that clients might not have the facilities (in terms of device
304	   memory and processing power) to enforce all the rules regarding
305	   internationalized strings (such as width mapping and Unicode
306	   normalization), although they can more easily limit the repertoire of
307	   characters they offer to an end user.  By contrast, it is assumed
308	   that a server would have more capacity to enforce the rules, and in
309	   any case acts as an authority regarding allowable strings in protocol
310	   slots such as addresses and endpoint identifiers.  In addition, a
311	   client cannot necessarily be trusted to properly generate such
312	   strings, especially for security-sensitive contexts such as
313	   authentication and authorization.
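
   The division of labor between the three actions can be sketched in
   Python as follows.  Everything named here is hypothetical: the
   predicate in_string_class is a deliberately simplified stand-in and
   is not the IdentifierClass or the FreeformClass, and the rules
   applied in enforce (lowercasing followed by NFC) are assumed for
   illustration rather than taken from any registered profile.

```python
import unicodedata


def in_string_class(ch: str) -> bool:
    """Simplified stand-in for a string class check (not a real class)."""
    return ch.isalnum() or "\u0021" <= ch <= "\u007e"


def prepare(s: str) -> str:
    """Preparation: only check that each character is allowed."""
    for ch in s:
        if not in_string_class(ch):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return s


def enforce(s: str) -> str:
    """Enforcement: apply the (assumed) profile rules, then check."""
    mapped = unicodedata.normalize("NFC", s.lower())
    return prepare(mapped)


def compare(a: str, b: str) -> bool:
    """Comparison: enforce both strings, then match octet-for-octet."""
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")
```

   In this sketch a client would typically call only prepare, whereas a
   server would call enforce before storing a string and compare when
   making an access decision.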

315	4.  String Classes

317	4.1.  Overview

319	   Starting in 2010, various "customers" of Stringprep began to discuss
320	   the need to define a post-Stringprep approach to the preparation and
321	   comparison of internationalized strings other than IDNs.  This
322	   community analyzed the existing Stringprep profiles and also weighed
323	   the costs and benefits of defining a relatively small set of Unicode
324	   characters that would minimize the potential for user confusion
325	   caused by visually similar characters (and thus be relatively "safe")
326	   vs. defining a much larger set of Unicode characters that would
327	   maximize the potential for user creativity (and thus be relatively
328	   "expressive").  As a result, the community concluded that most
329	   existing uses could be addressed by two string classes:

331	   IdentifierClass:  a sequence of letters, numbers, and some symbols
332	      that is used to identify or address a network entity such as a
333	      user account, a venue (e.g., a chatroom), an information source
334	      (e.g., a data feed), or a collection of data (e.g., a file); the
335	      intent is that this class will minimize user confusion in a wide
336	      variety of application protocols, with the result that safety has
337	      been prioritized over expressiveness for this class.

339	   FreeformClass:  a sequence of letters, numbers, symbols, spaces, and
340	      other characters that is used for free-form strings, including
341	      passwords as well as display elements such as human-friendly
342	      nicknames for devices or for participants in a chatroom; the
343	      intent is that this class will allow nearly any Unicode character,
344	      with the result that expressiveness has been prioritized over
345	      safety for this class.  Note well that protocol designers,
346	      application developers, service providers, and end users might not
347	      understand or be able to enter all of the characters that can be
348	      included in the FreeformClass; see Section 12.3 for details.

350	   Future specifications might define additional PRECIS string classes,
351	   such as a class that falls somewhere between the IdentifierClass and
352	   the FreeformClass.  At this time, it is not clear how useful such a
353	   class would be.  In any case, because application developers are able
354	   to define profiles of PRECIS string classes, a protocol needing a
355	   construct between the IdentifierClass and the FreeformClass could
356	   define a restricted profile of the FreeformClass if needed.

358	   The following subsections discuss the IdentifierClass and
359	   FreeformClass in more detail, with reference to the dimensions
360	   described in Section 3 of [RFC6885].  Each string class is defined by
361	   the following behavioral rules:

363	   Valid:  Defines which code points are treated as valid for the
364	      string.

366	   Contextual Rule Required:  Defines which code points are treated as
367	      allowed only if the requirements of a contextual rule are met
368	      (i.e., either CONTEXTJ or CONTEXTO).

370	   Disallowed:  Defines which code points need to be excluded from the
371	      string.

373	   Unassigned:  Defines application behavior in the presence of code
374	      points that are unknown (i.e., not yet designated) for the version
375	      of Unicode used by the application.

377	   This document defines the valid, contextual rule required,
378	   disallowed, and unassigned rules for the IdentifierClass and
379	   FreeformClass.  As described under Section 5, profiles of these
380	   string classes are responsible for defining the width mapping,
381	   additional mappings, case mapping, normalization, and directionality
382	   rules.

384	4.2.  IdentifierClass

386	   Most application technologies need strings that can be used to refer
387	   to, include, or communicate protocol strings like usernames, file
388	   names, data feed identifiers, and chatroom names.  We group such
389	   strings into a class called "IdentifierClass" having the following
390	   features.

392	4.2.1.  Valid

394	   o  Code points traditionally used as letters and numbers in writing
395	      systems, i.e., the LetterDigits ("A") category first defined in
396	      [RFC5892] and listed here under Section 9.1.

398	   o  Code points in the range U+0021 through U+007E, i.e., the
399	      (printable) ASCII7 ("K") rule defined under Section 9.11.  These
400	      code points are "grandfathered" into PRECIS and thus are valid
401	      even if they would otherwise be disallowed according to the
402	      property-based rules specified in the next section.

404	      Note: Although the PRECIS IdentifierClass re-uses the LetterDigits
405	      category from IDNA2008, the range of characters allowed in the
406	      IdentifierClass is wider than the range of characters allowed in
407	      IDNA2008.  The main reason is that IDNA2008 applies the Unstable
408	      category before the LetterDigits category, thus disallowing
409	      uppercase characters, whereas the IdentifierClass does not apply
410	      the Unstable category.
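
   A minimal Python sketch of the two "Valid" rules follows.  The set
   of General_Category values used for LetterDigits is an assumption
   taken from the definition in [RFC5892]; Section 9.1 is authoritative.
   The function names are hypothetical.  The sketch covers only these
   two rules and does not apply the exceptions, contextual rules, or
   disallowed categories described elsewhere in this document, and its
   results depend on the Unicode version bundled with the Python
   runtime.

```python
import unicodedata

# Assumed from RFC 5892: General_Category values for LetterDigits (A).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_letter_digits(cp: int) -> bool:
    """LetterDigits (A): letters and digits of writing systems."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS


def is_ascii7(cp: int) -> bool:
    """ASCII7 (K): printable ASCII, U+0021 through U+007E."""
    return 0x0021 <= cp <= 0x007E


def passes_valid_rules(cp: int) -> bool:
    """True if either of the two 'Valid' rules matches the code point."""
    return is_letter_digits(cp) or is_ascii7(cp)
```

   For example, "@" (U+0040) is punctuation by property but is accepted
   here because it falls in the grandfathered ASCII7 range, whereas
   U+0020 SPACE matches neither rule.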

412	4.2.2.  Contextual Rule Required

414	   o  A number of characters from the Exceptions ("F") category defined
415	      under Section 9.6 (see Section 9.6 for a full list).

417	   o  Joining characters, i.e., the JoinControl ("H") category defined
418	      under Section 9.8.

420	4.2.3.  Disallowed

422	   o  Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
423	      defined under Section 9.9.

425	   o  Control characters, i.e., the Controls ("L") category defined
426	      under Section 9.12.

428	   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
429	      category defined under Section 9.13.

431	   o  Space characters, i.e., the Spaces ("N") category defined under
432	      Section 9.14.

434	   o  Symbol characters, i.e., the Symbols ("O") category defined under
435	      Section 9.15.

437	   o  Punctuation characters, i.e., the Punctuation ("P") category
438	      defined under Section 9.16.

440	   o  Any character that has a compatibility equivalent, i.e., the
441	      HasCompat ("Q") category defined under Section 9.17.  These code
442	      points are disallowed even if they would otherwise be valid
443	      according to the property-based rules specified in the previous
444	      section.

446	   o  Letters and digits other than the "traditional" letters and digits
447	      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
448	      defined under Section 9.18.
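
   Several of the categories above can be approximated with Python's
   unicodedata module, as in the sketch below.  The General_Category
   sets and the NFKC test for HasCompat are assumptions about the
   definitions in Section 9, which is authoritative; the function name
   is hypothetical.  OldHangulJamo and PrecisIgnorableProperties are
   omitted because the standard library does not expose the properties
   they rely on, and the order of checks is illustrative (Section 8
   governs the actual calculation).

```python
import unicodedata
from typing import Optional

# Assumed General_Category sets for the categories named in the text.
SPACES = {"Zs"}
SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}


def identifier_disallowed_reason(cp: int) -> Optional[str]:
    """Return the matching disallowed category, or None if none applies."""
    if 0x0021 <= cp <= 0x007E:
        return None  # printable ASCII is grandfathered as valid
    ch = chr(cp)
    gc = unicodedata.category(ch)
    if gc == "Cc":
        return "Controls (L)"
    if unicodedata.normalize("NFKC", ch) != ch:
        return "HasCompat (Q)"
    if gc in OTHER_LETTER_DIGITS:
        return "OtherLetterDigits (R)"
    if gc in SPACES:
        return "Spaces (N)"
    if gc in SYMBOLS:
        return "Symbols (O)"
    if gc in PUNCTUATION:
        return "Punctuation (P)"
    return None
```

   U+FF21 FULLWIDTH LATIN CAPITAL LETTER A shows why HasCompat
   overrides the letter rules: it is an uppercase letter by property,
   yet it is reported as HasCompat because NFKC maps it to "A".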

450	4.2.4.  Unassigned

452	   Any code points that are not yet designated in the Unicode character
453	   set are considered Unassigned for purposes of the IdentifierClass,
454	   and such code points are to be treated as Disallowed.  See
455	   Section 9.10.
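
   Whether a code point is unassigned depends on the Unicode version
   available to the application.  The Python sketch below assumes that
   Section 9.10 follows the definition in [RFC5892] (General_Category
   Cn, excluding noncharacters); the function names are hypothetical.
   Because the answer comes from the Unicode data bundled with the
   runtime, a code point reported as unassigned today can become
   assigned after an upgrade.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """Assumed Unassigned (J) test: category Cn and not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)


# The Unicode version that the answers above are relative to.
unicode_version = unicodedata.unidata_version
```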

457	4.2.5.  Examples

459	   As described in the Introduction to this document, the string classes
460	   do not handle all issues related to string preparation and comparison
461	   (such as case mapping); instead, such issues are handled at the level
462	   of profiles.  Examples for two profiles of the IdentifierClass can be
463	   found in [I-D.ietf-precis-saslprepbis] (the UsernameIdentifierClass
464	   profile) and in [I-D.ietf-xmpp-6122bis] (the LocalpartIdentifierClass
465	   profile).

467	4.3.  FreeformClass

469	   Some application technologies need strings that can be used in a
470	   free-form way, e.g., as a password in an authentication exchange (see
471	   [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see
472	   [I-D.ietf-precis-nickname]).  We group such strings into a class
473	   called "FreeformClass" having the following features.

475	      Security Warning: As mentioned, the FreeformClass prioritizes
476	      expressiveness over safety; Section 12.3 describes some of the
477	      security hazards involved with using or profiling the
478	      FreeformClass.

480	      Security Warning: Consult Section 12.6 for relevant security
481	      considerations when strings conforming to the FreeformClass, or a
482	      profile thereof, are used as passwords.

484	4.3.1.  Valid

486	   o  Traditional letters and numbers, i.e., the LetterDigits ("A")
487	      category first defined in [RFC5892] and listed here under
488	      Section 9.1.

490	   o  Letters and digits other than the "traditional" letters and digits
491	      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
492	      defined under Section 9.18.

494	   o  Code points in the range U+0021 through U+007E, i.e., the
495	      (printable) ASCII7 ("K") rule defined under Section 9.11.

497	   o  Any character that has a compatibility equivalent, i.e., the
498	      HasCompat ("Q") category defined under Section 9.17.

500	   o  Space characters, i.e., the Spaces ("N") category defined under
501	      Section 9.14.

503	   o  Symbol characters, i.e., the Symbols ("O") category defined under
504	      Section 9.15.

506	   o  Punctuation characters, i.e., the Punctuation ("P") category
507	      defined under Section 9.16.

509	4.3.2.  Contextual Rule Required

511	   o  A number of characters from the Exceptions ("F") category defined
512	      under Section 9.6 (see Section 9.6 for a full list).

514	   o  Joining characters, i.e., the JoinControl ("H") category defined
515	      under Section 9.8.

517	4.3.3.  Disallowed

519	   o  Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
520	      defined under Section 9.9.

522	   o  Control characters, i.e., the Controls ("L") category defined
523	      under Section 9.12.

525	   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
526	      category defined under Section 9.13.

528	4.3.4.  Unassigned

530	   Any code points that are not yet designated in the Unicode character
531	   set are considered Unassigned for purposes of the FreeformClass, and
532	   such code points are to be treated as Disallowed.

534	4.3.5.  Examples

536	   As described in the Introduction to this document, the string classes
537	   do not handle all issues related to string preparation and comparison
538	   (such as case mapping); instead, such issues are handled at the level
539	   of profiles.  Examples for two profiles of the FreeformClass can be
540	   found in [I-D.ietf-precis-nickname] (the NicknameFreeformClass
541	   profile) and in [I-D.ietf-xmpp-6122bis] (the
542	   ResourcepartIdentifierClass profile).

544	5.  Profiles

546	   This framework document defines the valid, contextual-rule-required,
547	   disallowed, and unassigned rules for the IdentifierClass and the
548	   FreeformClass.  A profile of a PRECIS string class MUST define the
549	   width mapping, additional mappings (if any), case mapping,
550	   normalization, and directionality rules.  A profile MAY also restrict
551	   the allowable characters above and beyond the definition of the
552	   relevant PRECIS string class (but MUST NOT add as valid any code
553	   points that are disallowed by the relevant PRECIS string class).
554	   These matters are discussed in the following subsections.

556	   Profiles of the PRECIS string classes are registered with the IANA as
557	   described under Section 11.3.  Profile names use the following
558	   convention: they are of the form "Profilename of BaseClass", where
559	   the "Profilename" string is a differentiator and "BaseClass" is the
560	   name of the PRECIS string class being profiled; for example, the
561	   profile of the FreeformClass used for opaque strings such as
562	   passwords is the "OpaqueString" profile [I-D.ietf-precis-saslprepbis].

564	5.1.  Profiles Must Not Be Multiplied Beyond Necessity

566	   The risk of profile proliferation is significant because having too
567	   many profiles will result in different behavior across various
568	   applications, thus violating what is known in user interface design
569	   as the Principle of Least Astonishment.

571	   Indeed, we already have too many profiles.  Ideally we would have at
572	   most two or three profiles.  Unfortunately, numerous application
573	   protocols exist with their own quirks regarding protocol strings.
574	   Domain names, email addresses, instant messaging addresses, chatroom
575	   nicknames, filenames, authentication identifiers, passwords, and
576	   other strings are already out there in the wild and need to be
577	   supported in existing application protocols such as DNS, SMTP, XMPP,
578	   IRC, NFS, iSCSI, EAP, and SASL among others.

580	   Nevertheless, profiles must not be multiplied beyond necessity.

582	   To help prevent profile proliferation, this document recommends
583	   sensible defaults for the various options offered to profile creators
584	   (such as width mapping and Unicode normalization).  In addition, the
585	   guidelines for designated experts provided under Section 10 are meant
586	   to encourage a high level of due diligence regarding new profiles.

588	5.2.  Rules

590	5.2.1.  Width Mapping Rule

592	   The width mapping rule of a profile specifies whether width mapping
593	   is performed on the characters of a string, and how the mapping is
594	   done.  Typically such mapping consists of mapping fullwidth and
595	   halfwidth characters, i.e., code points with a Decomposition Type of
596	   Wide or Narrow, to their decomposition mappings; as an example,
597	   FULLWIDTH DIGIT ZERO (U+FF10) would be mapped to DIGIT ZERO (U+0030).

599	   The normalization form specified by a profile (see below) has an
600	   impact on the need for width mapping.  Because width mapping is
601	   performed as a part of compatibility decomposition, a profile
602	   employing either normalization form KD (NFKD) or normalization form
603	   KC (NFKC) does not need to specify width mapping.  However, if
604	   Unicode normalization form C (NFC) is used (as is recommended) then
605	   the profile needs to specify whether to apply width mapping; in this
606	   case, width mapping is in general RECOMMENDED because allowing
607	   fullwidth and halfwidth characters to remain unmapped to their
608	   compatibility variants would violate the Principle of Least
609	   Astonishment.  For more information about the concept of width in
610	   East Asian scripts within Unicode, see Unicode Standard Annex #11
611	   [UAX11].
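
   As a rough illustration of the width mapping described above (a
   sketch, not a normative procedure), the following Python function
   replaces each code point whose Decomposition Type is Wide or Narrow
   with its decomposition mapping.  The name width_map is hypothetical;
   the function relies on the standard unicodedata module, whose data
   reflects whichever Unicode version ships with the interpreter.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth and halfwidth code points to their decomposition mappings."""
    out = []
    for ch in s:
        # decomposition() returns e.g. "<wide> 0030", or "" when there is none.
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            code_points = decomp.split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            # Other compatibility decompositions are deliberately left alone.
            out.append(ch)
    return "".join(out)
```

   Unlike NFKC, this touches only Wide and Narrow decompositions, so
   it can be combined with NFC without folding other compatibility
   characters.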

613	5.2.2.  Additional Mapping Rule

615	   The additional mapping rule of a profile specifies whether additional
616	   mappings are performed on the characters of a string, such as:

618	      Mapping of delimiter characters (such as '@', ':', '/', '+', and
619	      '-')

621	      Mapping of special characters (e.g., non-ASCII space characters to
622	      ASCII space or control characters to nothing).

624	   The PRECIS mappings document [I-D.ietf-precis-mappings] describes
625	   such mappings in more detail.

627	5.2.3.  Case Mapping Rule

629	   The case mapping rule of a profile specifies whether case mapping
630	   (instead of case preservation) is performed on the characters of a
631	   string, and how the mapping is applied (e.g., mapping uppercase and
632	   titlecase characters to their lowercase equivalents).

634	   If case mapping is desired (instead of case preservation), it is
635	   RECOMMENDED to use Unicode Default Case Folding as defined in Chapter
636	   3 of the Unicode Standard [Unicode7.0].

638	      Note: Unicode Default Case Folding is not designed to handle
639	      various localization issues (such as so-called "dotless i" in
640	      several Turkic languages).  The PRECIS mappings document
641	      [I-D.ietf-precis-mappings] describes these issues in greater
642	      detail and defines a "local case mapping" method that handles some
643	      locale-dependent and context-dependent mappings.
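
   The following Python sketch illustrates locale-independent case
   folding.  It assumes that Python's built-in str.casefold(), which
   applies the Unicode full case folding data, is an acceptable
   stand-in for Unicode Default Case Folding; the helper names are
   hypothetical.

```python
def case_map(s: str) -> str:
    """Fold case without regard to locale (assumed stand-in for default case folding)."""
    return s.casefold()


def equal_after_case_mapping(a: str, b: str) -> bool:
    """Compare two strings after applying the same case mapping to both."""
    return case_map(a) == case_map(b)
```

   Because the folding ignores locale, LATIN CAPITAL LETTER I (U+0049)
   always folds to "i" and LATIN SMALL LETTER DOTLESS I (U+0131) is left
   unchanged, which is the behavior the note above warns about for
   Turkic languages.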

645	   In order to maximize entropy and minimize the potential for false
646	   positives, it is NOT RECOMMENDED for application protocols to map
647	   uppercase and titlecase code points to their lowercase equivalents
648	   when strings conforming to the FreeformClass, or a profile thereof,
649	   are used in passwords; instead, it is RECOMMENDED to preserve the
650	   case of all code points contained in such strings and then perform
651	   case-sensitive comparison.  See also the related discussion in
652	   [I-D.ietf-precis-saslprepbis].

654	5.2.4.  Normalization Rule

656	   The normalization rule of a profile specifies which Unicode
657	   normalization form (D, KD, C, or KC) is to be applied (see Unicode
658	   Standard Annex #15 [UAX15] for background information).

660	   In accordance with [RFC5198], normalization form C (NFC) is
661	   RECOMMENDED.
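
   A minimal Python sketch of a normalization rule follows, using the
   standard unicodedata.normalize() function.  The wrapper name
   normalization_rule is hypothetical.

```python
import unicodedata

ALLOWED_FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


def normalization_rule(s: str, form: str = "NFC") -> str:
    """Apply the Unicode normalization form chosen by a profile (NFC by default)."""
    if form not in ALLOWED_FORMS:
        raise ValueError("unknown normalization form: " + form)
    return unicodedata.normalize(form, s)
```

   For instance, NFC composes "e" followed by COMBINING ACUTE ACCENT
   (U+0301) into U+00E9 but leaves FULLWIDTH DIGIT ZERO (U+FF10)
   untouched, whereas NFKC also maps U+FF10 to U+0030; this is why a
   profile using NFC has to state its width mapping separately.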

663	5.2.5.  Directionality Rule

665	   The directionality rule of a profile specifies how to treat strings
666	   that contain right-to-left (RTL) characters (see Unicode Standard
667	   Annex #9 [UAX9]).  In general this document recommends applying the
668	   "Bidi Rule" from [RFC5893] to strings that contain RTL characters.

670	   Mixed-direction strings (that is, strings containing some portions
671	   that are left-to-right and other portions that are right-to-left) are
672	   not directly supported by the PRECIS framework itself, since there is
673	   currently no widely accepted and implemented solution for the safe
674	   display of mixed-direction strings.  An application protocol that
675	   uses the PRECIS framework (or an extension to the framework) could
676	   define better ways to present mixed-direction strings; however, that
677	   work is outside the scope of this framework and would likely require
678	   a great deal of careful research into the problems of displaying
679	   bidirectional text.

681	5.3.  A Note about Spaces

683	   With regard to the IdentifierClass, the consensus of the PRECIS
684	   Working Group was that spaces are problematic for many reasons,
685	   including:

687	   o  Many Unicode characters are confusable with ASCII space.

689	   o  Even if non-ASCII space characters are mapped to ASCII space
690	      (U+0020), space characters are often not rendered in user
691	      interfaces, leading to the possibility that a human user might
692	      consider a string containing spaces to be equivalent to the same
693	      string without spaces.

695	   o  In some locales, some devices are known to generate a character
696	      other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
697	      user performs an action like hitting the space bar on a keyboard.

699	   One consequence of disallowing space characters in the
700	   IdentifierClass might be to effectively discourage their use within
701	   identifiers created in newer application protocols; given the
702	   challenges involved with properly handling space characters
703	   (especially non-ASCII space characters) in identifiers and other
704	   protocol strings, the PRECIS Working Group considered this to be a
705	   feature, not a bug.

707	   However, the FreeformClass does allow spaces, which enables
708	   application protocols to define profiles of the FreeformClass that
709	   are more flexible than any profiles of the IdentifierClass.  In
710	   addition, as explained under Section 6.3, application protocols
711	   can also define application-layer constructs containing spaces.

713	6.  Applications

715	6.1.  How to Use PRECIS in Applications

717	   Although PRECIS has been designed with applications in mind,
718	   internationalization is not suddenly made easy through the use of
719	   PRECIS.  Application developers still need to give some thought to
720	   how they will use the PRECIS string classes, or profiles thereof, in
721	   their applications.  This section provides some guidelines to
722	   application developers (and to expert reviewers of application
723	   protocol specifications).

725	   o  Don't define your own profile unless absolutely necessary (see
726	      Section 5.1).  Existing profiles have been designed for wide
727	      re-use.  It is highly likely that an existing profile will meet
728	      your needs, especially given the ability to specify further
729	      excluded characters (Section 6.2) and to build application-layer
730	      constructs (see Section 6.3).

732	   o  Do specify:

734	      *  Exactly which entities are responsible for preparation,
735	         enforcement, and comparison of internationalized strings (e.g.,
736	         servers or clients).

738	      *  Exactly when those entities need to complete their tasks (e.g.,
739	         a server might need to enforce the rules of a profile before
740	         allowing a client to gain network access).

742	      *  Exactly which protocol slots need to be checked against which
743	         profiles (e.g., checking the address of a message's intended
744	         recipient against the UsernameCaseMapped profile
745	         [I-D.ietf-precis-saslprepbis] of the IdentifierClass, or
746	         checking the password of a user against the OpaqueString
747	         profile [I-D.ietf-precis-saslprepbis] of the FreeformClass).

749	      See [I-D.ietf-precis-saslprepbis] and [I-D.ietf-xmpp-6122bis] for
750	      definitions of these matters for several applications.

752	6.2.  Further Excluded Characters

754	   An application protocol that uses a profile MAY specify particular
755	   code points that are not allowed in relevant slots within that
756	   application protocol, above and beyond those excluded by the string
757	   class or profile.

759	   That is, an application protocol MAY do either of the following:

761	   1.  Exclude specific code points that are allowed by the relevant
762	       string class.

764	   2.  Exclude characters matching certain Unicode properties (e.g.,
765	       math symbols) that are included in the relevant PRECIS string
766	       class.

768	   As a result of such exclusions, code points that are defined as valid
769	   for the PRECIS string class or profile will be defined as disallowed
770	   for the relevant protocol slot.

772	   Typically, such exclusions are defined for the purpose of backward-
773	   compatibility with legacy formats within an application protocol.
774	   These are defined for application protocols, not profiles, in order
775	   to prevent multiplication of profiles beyond necessity (see
776	   Section 5.1).
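
   One way an application protocol might express such exclusions is
   sketched below in Python.  The function name and parameters are
   hypothetical; the check is assumed to run in addition to, not
   instead of, validation against the string class or profile, so it
   can only narrow what a slot accepts.

```python
import unicodedata
from typing import AbstractSet


def allowed_in_slot(
    s: str,
    excluded_code_points: AbstractSet[int] = frozenset(),
    excluded_categories: AbstractSet[str] = frozenset(),
) -> bool:
    """Return False if s contains a code point the protocol slot excludes."""
    for ch in s:
        # Option 1: specific code points excluded by the application protocol.
        if ord(ch) in excluded_code_points:
            return False
        # Option 2: exclusion by Unicode property (here, General_Category).
        if unicodedata.category(ch) in excluded_categories:
            return False
    return True
```

   For example, a slot could exclude COMMERCIAL AT (U+0040) as a
   delimiter, or exclude math symbols by passing the General_Category
   value "Sm".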

778	6.3.  Building Application-Layer Constructs

780	   Sometimes, an application-layer construct does not map in a
781	   straightforward manner to one of the base string classes or a profile
782	   thereof.  Consider, for example, the "simple user name" construct in
783	   the Simple Authentication and Security Layer (SASL) [RFC4422].
784	   Depending on the deployment, a simple user name might take the form
785	   of a user's full name (e.g., the user's personal name followed by a
786	   space and then the user's family name).  Such a simple user name
787	   cannot be defined as an instance of the IdentifierClass or a profile
788	   thereof, since space characters are not allowed in the
789	   IdentifierClass; however, it could be defined using a space-separated
790	   sequence of IdentifierClass instances, as in the following ABNF
791	   [RFC5234] from [I-D.ietf-precis-saslprepbis]:

793	      username   = userpart *(1*SP userpart)
794	      userpart   = 1*(idbyte)
795	                   ;
796	                   ; an "idbyte" is a byte used to represent a
797	                   ; UTF-8 encoded Unicode code point that can be
798	                   ; contained in a string that conforms to the
799	                   ; PRECIS "IdentifierClass"
800	                   ;

802	   Similar techniques could be used to define many application-layer
803	   constructs, say of the form "user@domain" or "/path/to/file".
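
   The ABNF above can be checked with a short Python sketch.  The
   default userpart test here is only a placeholder that rejects
   whitespace; a real implementation would test each code point
   against the IdentifierClass (or a profile thereof).  All names are
   hypothetical.

```python
import re
from typing import Callable


def placeholder_userpart(part: str) -> bool:
    """Placeholder for an IdentifierClass check: non-empty and free of whitespace."""
    return part != "" and not any(ch.isspace() for ch in part)


def is_username(s: str, is_userpart: Callable[[str], bool] = placeholder_userpart) -> bool:
    """Check username = userpart *(1*SP userpart)."""
    # Splitting on runs of U+0020 implements 1*SP; a leading or trailing
    # space yields an empty part, which is then rejected.
    parts = re.split(" +", s)
    return all(is_userpart(part) for part in parts)
```

   The design keeps the construct (space-separated parts) apart from
   the per-part check, so the same string class logic is reused for
   every userpart.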

805	7.  Order of Operations

807	   To ensure proper comparison, the rules specified for a particular
808	   string class or profile MUST be applied in the following order:

810	   1.  Width Mapping Rule

812	   2.  Additional Mapping Rule

814	   3.  Case Mapping Rule

816	   4.  Normalization Rule

818	   5.  Directionality Rule

820	   6.  Behavioral rules for determining whether a code point is valid,
821	       allowed under a contextual rule, disallowed, or unassigned

823	   As already described, the width mapping, additional mapping, case
824	   mapping, normalization, and directionality rules are specified for
825	   each profile, whereas the behavioral rules are specified for each
826	   string class.  Some of the logic behind this order is provided under
827	   Section 5.2.1 (see also the PRECIS mappings document
828	   [I-D.ietf-precis-mappings]).
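
   The ordering can be sketched in Python as a small pipeline.  This is
   an illustration under stated assumptions, not a definitive
   implementation: the Profile type, its field names, and the example
   rules are all hypothetical, and the directionality and behavioral
   rules are modeled simply as checks that accept or reject the mapped
   string.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional

Mapping = Callable[[str], str]
Check = Callable[[str], bool]


@dataclass
class Profile:
    width_mapping: Optional[Mapping] = None
    additional_mapping: Optional[Mapping] = None
    case_mapping: Optional[Mapping] = None
    normalization: Optional[Mapping] = None
    directionality: Optional[Check] = None
    behavioral: Optional[Check] = None


def apply_rules(s: str, profile: Profile) -> str:
    """Apply a profile's rules in the order listed in this section."""
    # Steps 1-4: mappings and normalization transform the string.
    for mapping in (profile.width_mapping, profile.additional_mapping,
                    profile.case_mapping, profile.normalization):
        if mapping is not None:
            s = mapping(s)
    # Steps 5-6: directionality and behavioral rules accept or reject it.
    for check in (profile.directionality, profile.behavioral):
        if check is not None and not check(s):
            raise ValueError("string rejected by profile")
    return s


def spaces_to_ascii(s: str) -> str:
    """Example additional mapping: map any Zs code point to U+0020."""
    return "".join(" " if unicodedata.category(c) == "Zs" else c for c in s)


# Hypothetical example profile; the behavioral check only rejects controls.
example = Profile(
    additional_mapping=spaces_to_ascii,
    case_mapping=str.casefold,
    normalization=lambda s: unicodedata.normalize("NFC", s),
    behavioral=lambda s: all(unicodedata.category(c) != "Cc" for c in s),
)
```

   With this example profile, "E" + U+0301 + U+00A0 + "X" becomes
   U+00E9 + U+0020 + "x", and a string containing U+0000 is rejected.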

830	8.  Code Point Properties

832	   In order to implement the string classes described above, this
833	   document does the following:

835	   1.  Reviews and classifies the collections of code points in the
836	       Unicode character set by examining various code point properties.

838	   2.  Defines an algorithm for determining a derived property value,
839	       which can vary depending on the string class being used by the
840	       relevant application protocol.

842	   This document is not intended to specify precisely how derived
843	   property values are to be applied in protocol strings.  That
844	   information is the responsibility of the protocol specification that
845	   uses or profiles a PRECIS string class from this document.  The value
846	   of the property is to be interpreted as follows.

848	   PROTOCOL VALID  Those code points that are allowed to be used in any
849	      PRECIS string class (currently, IdentifierClass and
850	      FreeformClass).  The abbreviated term "PVALID" is used to refer to
851	      this value in the remainder of this document.

853	   SPECIFIC CLASS PROTOCOL VALID  Those code points that are allowed to
854	      be used in specific string classes.  In the remainder of this
855	      document, the abbreviated term *_PVAL is used, where * = (ID |
856	      FREE), i.e., either "FREE_PVAL" or "ID_PVAL".  In practice, the
857	      derived property ID_PVAL is not used in this specification, since
858	      every ID_PVAL code point is PVALID.

860	   CONTEXTUAL RULE REQUIRED  Some characteristics of the character, such
861	      as its being invisible in certain contexts or problematic in
862	      others, require that it not be used in labels unless specific
863	      other characters or properties are present.  As in IDNA2008, there
864	      are two subdivisions of CONTEXTUAL RULE REQUIRED, the first for
865	      Join_controls (called "CONTEXTJ") and the second for other
866	      characters (called "CONTEXTO").  A character with the derived
867	      property value CONTEXTJ or CONTEXTO MUST NOT be used unless an
868	      appropriate rule has been established and the context of the
869	      character is consistent with that rule.  The most notable of the
870	      CONTEXTUAL RULE REQUIRED characters are the Join Control
871	      characters U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-
872	      JOINER, which have a derived property value of CONTEXTJ.  See
873	      Appendix A of [RFC5892] for more information.

875	   DISALLOWED  Those code points that are not permitted in any PRECIS
876	      string class.

878	   SPECIFIC CLASS DISALLOWED  Those code points that are not to be
879	      included in one of the string classes but that might be permitted
880	      in others.  In the remainder of this document, the abbreviated
881	      term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"
882	      or "ID_DIS".  In practice, the derived property FREE_DIS is not
883	      used in this specification, since every FREE_DIS code point is
884	      DISALLOWED.

886	   UNASSIGNED  Those code points that are not designated (i.e., are
887	      unassigned) in the Unicode Standard.

889	   The algorithm to calculate the value of the derived property is as
890	   follows:

892	   If .cp. .in. Exceptions Then Exceptions(cp);
893	   Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
894	   Else If .cp. .in. Unassigned Then UNASSIGNED;
895	   Else If .cp. .in. ASCII7 Then PVALID;
896	   Else If .cp. .in. JoinControl Then CONTEXTJ;
897	   Else If .cp. .in. OldHangulJamo Then DISALLOWED;
898	   Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
899	   Else If .cp. .in. Controls Then DISALLOWED;
900	   Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
901	   Else If .cp. .in. LetterDigits Then PVALID;
902	   Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
903	   Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
904	   Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
905	   Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
906	   Else DISALLOWED;

908	   The value of the derived property calculated can depend on the string
909	   class; for example, if an identifier used in an application protocol
910	   is defined as profiling the PRECIS IdentifierClass then a space
911	   character such as U+0020 would be assigned to ID_DIS, whereas if an
912	   identifier is defined as profiling the PRECIS FreeformClass then the
913	   character would be assigned to FREE_PVAL.  For the sake of brevity,
914	   the designation "FREE_PVAL" is used herein, instead of the longer
915	   designation "ID_DIS or FREE_PVAL".  In practice, the derived
916	   properties ID_PVAL and FREE_DIS are not used in this specification,
917	   since every ID_PVAL code point is PVALID and every FREE_DIS code
918	   point is DISALLOWED.

920	   Use of the name of a rule (such as "Exceptions") implies the set of
921	   code points that the rule defines, whereas the same name as a
922	   function call (such as "Exceptions(cp)") implies the value that the
923	   code point has in the Exceptions table.
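
   The algorithm can be sketched in Python as follows.  This is an
   approximation under stated assumptions rather than a reference
   implementation: the Exceptions and BackwardCompatible tables, the
   OldHangulJamo set, and the Default_Ignorable_Code_Point set are not
   available from the standard unicodedata module and must be supplied
   by the caller (they default to empty); Controls is assumed to mean
   General_Category Cc; and results depend on the Unicode version
   shipped with the interpreter.  All names are hypothetical.

```python
import unicodedata

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}   # category A (RFC 5892)
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}               # category R
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                           # category O
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}     # category P


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp, string_class="FreeformClass", exceptions=None,
                     backward_compatible=None, old_hangul_jamo=frozenset(),
                     default_ignorable=frozenset()):
    """Return the derived property value of code point cp for a string class."""
    if string_class not in ("IdentifierClass", "FreeformClass"):
        raise ValueError("unknown string class: " + string_class)
    # "ID_DIS or FREE_PVAL" resolves according to the string class in use.
    by_class = "ID_DIS" if string_class == "IdentifierClass" else "FREE_PVAL"
    ch = chr(cp)
    gc = unicodedata.category(ch)
    if exceptions and cp in exceptions:
        return exceptions[cp]                    # Exceptions(cp)
    if backward_compatible and cp in backward_compatible:
        return backward_compatible[cp]           # BackwardCompatible(cp)
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"                      # Unassigned (J)
    if 0x21 <= cp <= 0x7E:
        return "PVALID"                          # ASCII7 (K)
    if cp in (0x200C, 0x200D):
        return "CONTEXTJ"                        # JoinControl (H)
    if cp in old_hangul_jamo:
        return "DISALLOWED"                      # OldHangulJamo (I), caller-supplied
    if cp in default_ignorable or is_noncharacter(cp):
        return "DISALLOWED"                      # PrecisIgnorableProperties (M)
    if gc == "Cc":
        return "DISALLOWED"                      # Controls (L), assumed to be gc=Cc
    if unicodedata.normalize("NFKC", ch) != ch:
        return by_class                          # HasCompat (Q)
    if gc in LETTER_DIGITS:
        return "PVALID"                          # LetterDigits (A)
    if gc in OTHER_LETTER_DIGITS or gc == "Zs" or gc in SYMBOLS or gc in PUNCTUATION:
        return by_class                          # categories R, N, O, P
    return "DISALLOWED"
```

   For example, U+0020 yields ID_DIS for the IdentifierClass and
   FREE_PVAL for the FreeformClass, matching the description above.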

925	   The mechanisms described here allow determination of the value of the
926	   property for future versions of Unicode (including characters added
927	   after Unicode 5.2 or 7.0 depending on the category, since some
928	   categories mentioned in this document are simply pointers to IDNA2008
929	   and therefore were defined at the time of Unicode 5.2).  Changes in
930	   Unicode properties that do not affect the outcome of this process
931	   therefore do not affect this framework.  For example, a character can
932	   have its Unicode General_Category value (see Chapter 4 of the Unicode
933	   Standard [Unicode7.0]) change from So to Sm, or from Lo to Ll,
934	   without affecting the algorithm results.  Moreover, even if such
935	   changes were to result, the BackwardCompatible list (Section 9.7) can
936	   be adjusted to ensure the stability of the results.

938	9.  Category Definitions Used to Calculate Derived Property

940	   The derived property obtains its value based on a two-step procedure:

942	   1.  Characters are placed in one or more character categories either
943	       (1) based on core properties defined by the Unicode Standard or
944	       (2) by treating the code point as an exception and addressing the
945	       code point based on its code point value.  These categories are
946	       not mutually exclusive.

948	   2.  Set operations are used with these categories to determine the
949	       values for a property specific to a given string class.  These
950	       operations are specified under Section 8.

952	      Note: Unicode property names and property value names might have
953	      short abbreviations, such as "gc" for the General_Category
954	      property and "Ll" for the Lowercase_Letter property value of the
955	      gc property.

957	   In the following specification of character categories, the operation
958	   that returns the value of a particular Unicode character property for
959	   a code point is designated by using the formal name of that property
960	   (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
961	   "code point".  For example, the value of the General_Category
962	   property for a code point is indicated by General_Category(cp).

964	   The first ten categories (A-J) shown below were previously defined
965	   for IDNA2008 and are referenced from [RFC5892] to ease the
966	   understanding of how PRECIS handles various characters.  Some of
967	   these categories are reused in PRECIS and some of them are not;
968	   however, the lettering of categories is retained to prevent overlap
969	   and to ease implementation of both IDNA2008 and PRECIS in a single
970	   software application.  The next eight categories (K-R) are specific
971	   to PRECIS.

973	9.1.  LetterDigits (A)

975	   This category is defined in Section 2.1 of [RFC5892] and is included
976	   by reference for use in PRECIS.

978	9.2.  Unstable (B)

980	   This category is defined in Section 2.2 of [RFC5892] but is not used
981	   in PRECIS.

983	9.3.  IgnorableProperties (C)

985	   This category is defined in Section 2.3 of [RFC5892] but is not used
986	   in PRECIS.

988	   Note: See the "PrecisIgnorableProperties (M)" category below for a
989	   more inclusive category used in PRECIS identifiers.

991	9.4.  IgnorableBlocks (D)

993	   This category is defined in Section 2.4 of [RFC5892] but is not used
994	   in PRECIS.

996	9.5.  LDH (E)

998	   This category is defined in Section 2.5 of [RFC5892] but is not used
999	   in PRECIS.

1001	   Note: See the "ASCII7 (K)" category below for a more inclusive
1002	   category used in PRECIS identifiers.

1004	9.6.  Exceptions (F)

1006	   This category is defined in Section 2.6 of [RFC5892] and is included
1007	   by reference for use in PRECIS.

1009	9.7.  BackwardCompatible (G)

1011	   This category is defined in Section 2.7 of [RFC5892] and is included
1012	   by reference for use in PRECIS.

1014	   Note: Management of this category is handled via the processes
1015	   specified in [RFC5892].  At the time of this writing (and also at the
1016	   time that RFC 5892 was published), this category consisted of the
1017	   empty set; however, that is subject to change as described in RFC
1018	   5892.

1020	9.8.  JoinControl (H)

1022	   This category is defined in Section 2.8 of [RFC5892] and is included
1023	   by reference for use in PRECIS.

1025	9.9.  OldHangulJamo (I)

1027	   This category is defined in Section 2.9 of [RFC5892] and is included
1028	   by reference for use in PRECIS.

1030	9.10.  Unassigned (J)

1032	   This category is defined in Section 2.10 of [RFC5892] and is included
1033	   by reference for use in PRECIS.

1035	9.11.  ASCII7 (K)

1037	   This PRECIS-specific category consists of all printable, non-space
1038	   characters from the 7-bit ASCII range.  By applying this category,
1039	   the algorithm specified under Section 8 exempts these characters from
1040	   other rules that might be applied during PRECIS processing, on the
1041	   assumption that these code points are in such wide use that
1042	   disallowing them would be counter-productive.

1044	   K: cp is in {0021..007E}

1046	9.12.  Controls (L)

1048	   This PRECIS-specific category consists of all control characters.

1050	   L: Control(cp) = True

1052	9.13.  PrecisIgnorableProperties (M)

1054	   This PRECIS-specific category is used to group code points that are
1055	   discouraged from use in PRECIS string classes.

1057	   M: Default_Ignorable_Code_Point(cp) = True or
1058	      Noncharacter_Code_Point(cp) = True

1060	   The definition for Default_Ignorable_Code_Point can be found in the
1061	   DerivedCoreProperties.txt [2] file.

1063	9.14.  Spaces (N)

1065	   This PRECIS-specific category is used to group code points that are
1066	   space characters.

1068	   N: General_Category(cp) is in {Zs}

1070	9.15.  Symbols (O)

1072	   This PRECIS-specific category is used to group code points that are
1073	   symbols.

1075	   O: General_Category(cp) is in {Sm, Sc, Sk, So}

1077	9.16.  Punctuation (P)

1079	   This PRECIS-specific category is used to group code points that are
1080	   punctuation characters.

1082	   P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}

1084	9.17.  HasCompat (Q)

1086	   This PRECIS-specific category is used to group code points that have
1087	   compatibility equivalents as explained in Chapter 2 and Chapter 3 of
1088	   the Unicode Standard [Unicode7.0].

1090	   Q: toNFKC(cp) != cp

1092	   The toNFKC() operation returns the code point in normalization form
1093	   KC.  For more information, see Section 5 of Unicode Standard Annex
1094	   #15 [UAX15].
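
   As an informal illustration, the HasCompat test for a single code
   point can be written in Python with unicodedata.normalize(); the
   function name has_compat is hypothetical.

```python
import unicodedata


def has_compat(cp: int) -> bool:
    """Category Q: true when NFKC changes the code point (toNFKC(cp) != cp)."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```

   Note that the test is true not only for characters with
   compatibility decompositions, such as FULLWIDTH DIGIT ZERO (U+FF10)
   and CIRCLED DIGIT ONE (U+2460), but also for canonical singletons
   such as ANGSTROM SIGN (U+212B), which NFKC replaces with U+00C5.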

1096	9.18.  OtherLetterDigits (R)

1098	   This PRECIS-specific category is used to group code points that are
1099	   letters and digits other than the "traditional" letters and digits
1100	   grouped under the LetterDigits (A) class (see Section 9.1).

1102	   R: General_Category(cp) is in {Lt, Nl, No, Me}

1104	10.  Guidelines for Designated Experts

1106	   Experience with internationalization in application protocols has
1107	   shown that protocol designers and application developers usually do
1108	   not understand the subtleties and tradeoffs involved with
1109	   internationalization and that they need considerable guidance in
1110	   making reasonable decisions with regard to the options before them.

1112	   Therefore:

1114	   o  Protocol designers are strongly encouraged to question the
1115	      assumption that they need to define new profiles, since existing
1116	      profiles are designed for wide re-use (see Section 5 for further
1117	      discussion).

1119	   o  Those who persist in defining new profiles are strongly encouraged
1120	      to provide a clear and strong justification for doing so, and to
1121	      publish a stable specification that provides all of the
1122	      information described under Section 11.3.

1124	   o  The designated experts for profile registration requests ought to
1125	      seek answers to all of the questions provided under Section 11.3
1126	      and to encourage applicants to provide a stable specification
1127	      documenting the profile (even though the registration policy for
1128	      PRECIS profiles is Expert Review and a stable specification is not
1129	      strictly required).

1131	   o  Developers of applications that use PRECIS are strongly encouraged
1132	      to apply the guidelines provided under Section 6 and to seek out
1133	      the advice of the designated experts or other knowledgeable
1134	      individuals in doing so.

1136	   o  All parties are strongly encouraged to help prevent the
1137	      multiplication of profiles beyond necessity, as described under
1138	      Section 5.1, and to use PRECIS in ways that will minimize user
1139	      confusion and insecure application behavior.

1141	   Internationalization can be difficult and contentious; designated
1142	   experts, profile registrants, and application developers are strongly
1143	   encouraged to work together in a spirit of good faith and mutual
1144	   understanding to achieve rough consensus on profile registration
1145	   requests and the use of PRECIS in particular applications.  They are
1146	   also encouraged to bring additional expertise into the discussion if
1147	   that would be helpful in adding perspective or otherwise resolving
1148	   issues.

1150	11.  IANA Considerations

1152	11.1.  PRECIS Derived Property Value Registry

1154	   IANA is requested to create a PRECIS-specific registry with the
1155	   Derived Properties for the versions of Unicode that are released
1156	   after (and including) version 7.0.  The derived property value is to
1157	   be calculated in cooperation with a designated expert [RFC5226]
1158	   according to the rules specified under Section 8 and Section 9.

1160	   The IESG is to be notified if backward-incompatible changes to the
1161	   table of derived properties are discovered or if other problems arise
1162	   during the process of creating the table of derived property values
1163	   or during expert review.  Changes to the rules defined under
1164	   Section 8 and Section 9 require IETF Review.

1166	11.2.  PRECIS Base Classes Registry

1168	   IANA is requested to create a registry of PRECIS string classes.  In
1169	   accordance with [RFC5226], the registration policy is "RFC Required".

1171	   The registration template is as follows:

1173	   Base Class:  [the name of the PRECIS string class]

1175	   Description:  [a brief description of the PRECIS string class and its
1176	      intended use, e.g., "A sequence of letters, numbers, and symbols
1177	      that is used to identify or address a network entity."]

1179	   Specification:  [the RFC number]

1181	   The initial registrations are as follows:

1183	   Base Class: FreeformClass.
1184	   Description: A sequence of letters, numbers, symbols, spaces, and
1185	         other code points that is used for free-form strings.
1186	   Specification: Section 4.3 of this document.
1187	                  [Note to RFC Editor: please change "this document"
1188	                  to the RFC number issued for this specification.]

1190	   Base Class: IdentifierClass.
1191	   Description: A sequence of letters, numbers, and symbols that is
1192	         used to identify or address a network entity.
1193	   Specification: Section 4.2 of this document.
1194	                  [Note to RFC Editor: please change "this document"
1195	                  to the RFC number issued for this specification.]

1197	11.3.  PRECIS Profiles Registry

1199	   IANA is requested to create a registry of profiles that use the
1200	   PRECIS string classes.  In accordance with [RFC5226], the
1201	   registration policy is "Expert Review".  This policy was chosen in
1202	   order to ease the burden of registration while ensuring that
1203	   "customers" of PRECIS receive appropriate guidance regarding the
1204	   sometimes complex and subtle internationalization issues related to
1205	   profiles of PRECIS string classes.

1207	   The registration template is as follows:

1209	   Name:  [the name of the profile]

1211	   Base Class:  [which PRECIS string class is being profiled]

1213	   Applicability:  [the specific protocol elements to which this profile
1214	      applies, e.g., "Localparts in XMPP addresses."]

1216	   Replaces:  [the Stringprep profile that this PRECIS profile replaces,
1217	      if any]

1219	   Width Mapping Rule:  [the behavioral rule for handling of width,
1220	      e.g., "Map fullwidth and halfwidth characters to their
1221	      compatibility variants."]

1223	   Additional Mapping Rule:  [any additional mappings that are
1224	      required or recommended, e.g., "Map non-ASCII space characters
1225	      to ASCII space."]

1227	   Case Mapping Rule:  [the behavioral rule for handling of case, e.g.,
1228	      "Unicode Default Case Folding"]

1230	   Normalization Rule:  [which Unicode normalization form is applied,
1231	      e.g., "NFC"]

1233	   Directionality Rule:  [the behavioral rule for handling of right-to-
1234	      left code points, e.g., "The 'Bidi Rule' defined in RFC 5893
1235	      applies."]

1237	   Enforcement:  [which entities enforce the rules, and when that
1238	      enforcement occurs during protocol operations]

1240	   Specification:  [a pointer to relevant documentation, such as an RFC
1241	      or Internet-Draft]
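
   As an illustration only, the template fields can be modeled as a
   small record so that a registration is checked for completeness
   before it is sent for review.  The class name, field names, and the
   profile name "ExampleLocalpart" below are hypothetical; the example
   values are the ones quoted in the template itself, and nothing here
   is part of the registry definition.

```python
from dataclasses import dataclass, fields

# The two PRECIS string classes named in this document.
KNOWN_BASE_CLASSES = {"IdentifierClass", "FreeformClass"}


@dataclass(frozen=True)
class ProfileRegistration:
    """One field per entry of the registration template (hypothetical)."""
    name: str
    base_class: str
    applicability: str
    replaces: str
    width_mapping_rule: str
    additional_mapping_rule: str
    case_mapping_rule: str
    normalization_rule: str
    directionality_rule: str
    enforcement: str
    specification: str

    def __post_init__(self):
        # A profile has to be based on an existing string class.
        if self.base_class not in KNOWN_BASE_CLASSES:
            raise ValueError(f"unknown base class: {self.base_class!r}")

    def render(self):
        """Format the record in the 'Label:  value' style of the template."""
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").title()
            lines.append(f"{label}:  {getattr(self, f.name)}")
        return "\n".join(lines)


example = ProfileRegistration(
    name="ExampleLocalpart",  # hypothetical profile name
    base_class="IdentifierClass",
    applicability="Localparts in XMPP addresses.",
    replaces="None",
    width_mapping_rule="Map fullwidth and halfwidth characters to their "
                       "compatibility variants.",
    additional_mapping_rule="None",
    case_mapping_rule="Unicode Default Case Folding",
    normalization_rule="NFC",
    directionality_rule="The 'Bidi Rule' defined in RFC 5893 applies.",
    enforcement="To be completed by the registrant.",
    specification="To be completed by the registrant.",
)
print(example.render())
```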

1243	   In order to request a review, the registrant shall send a completed
1244	   template to the precis@ietf.org list or its designated successor.

1246	   Factors to focus on while defining profiles and reviewing profile
1247	   registrations include the following:

1249	   o  Would an existing PRECIS string class or profile solve the
1250	      problem?  If not, why not?  (See Section 5.1 for related
1251	      considerations.)

1253	   o  Is the problem being addressed by this profile well-defined?

1255	   o  Does the specification define what kinds of applications are
1256	      involved and the protocol elements to which this profile applies?

1258	   o  Is the profile clearly defined?

1260	   o  Is the profile based on an appropriate dividing line between user
1261	      interface (culture, context, intent, locale, device limitations,
1262	      etc.) and the use of conformant strings in protocol elements?

1264	   o  Are the width mapping, case mapping, additional mappings,
1265	      normalization, and directionality rules appropriate for the
1266	      intended use?

1268	   o  Does the profile explain which entities enforce the rules, and
1269	      when such enforcement occurs during protocol operations?

1271	   o  Does the profile reduce the degree to which human users could be
1272	      surprised or confused by application behavior (the "Principle of
1273	      Least Astonishment")?

1275	   o  Does the profile introduce any new security concerns such as those
1276	      described under Section 12 of this document (e.g., false positives
1277	      for authentication or authorization)?

1279	12.  Security Considerations

1281	12.1.  General Issues

1283	   If input strings that appear "the same" to users are programmatically
1284	   considered to be distinct in different systems, or if input strings
1285	   that appear distinct to users are programmatically considered to be
1286	   "the same" in different systems, then users can be confused.  Such
1287	   confusion can have security implications, such as the false positives
1288	   and false negatives discussed in [RFC6943].  One starting goal of
1289	   work on the PRECIS framework was to limit the number of times that
1290	   users are confused (consistent with the "Principle of Least
1291	   Astonishment").  Unfortunately, this goal has been difficult to
1292	   achieve given the large number of application protocols already in
1293	   existence.  Despite these difficulties, profiles should not be
1294	   multiplied beyond necessity (see Section 5.1).  In particular,
1295	   application protocol designers should think long and hard before
1296	   defining a new profile instead of using one that has already been
1297	   defined, and if they decide to define a new profile then they should
1298	   clearly explain their reasons for doing so.

1300	   The security of applications that use this framework can depend in
1301	   part on the proper preparation, enforcement, and comparison of
1302	   internationalized strings.  For example, such strings can be used to
1303	   make authentication and authorization decisions, and the security of
1304	   an application could be compromised if an entity providing a given
1305	   string is connected to the wrong account or online resource based on
1306	   different interpretations of the string (again, see [RFC6943]).

1308	   Specifications of application protocols that use this framework are
1309	   strongly encouraged to describe how internationalized strings are
1310	   used in the protocol, including the security implications of any
1311	   false positives and false negatives that might result from various
1312	   enforcement and comparison operations.  For some helpful guidelines,
1313	   refer to [RFC6943], [RFC5890], [UTR36], and [UTS39].

1315	12.2.  Use of the IdentifierClass

1317	   Strings that conform to the IdentifierClass and any profile thereof
1318	   are intended to be relatively safe for use in a broad range of
1319	   applications, primarily because they include only letters, digits,
1320	   and "grandfathered" non-space characters from the ASCII range; thus
1321	   they exclude spaces, characters with compatibility equivalents, and
1322	   almost all symbols and punctuation marks.  However, because such
1323	   strings can still include so-called confusable characters (see
1324	   Section 12.5), protocol designers and implementers are encouraged to
1325	   pay close attention to the security considerations described
1326	   elsewhere in this document.

1328	12.3.  Use of the FreeformClass

1330	   Strings that conform to the FreeformClass and many profiles thereof
1331	   can include virtually any Unicode character.  This makes the
1332	   FreeformClass quite expressive, but also problematic from the
1333	   perspective of possible user confusion.  Protocol designers are
1334	   hereby warned that the FreeformClass contains code points they might
1335	   not understand, and are encouraged to profile the IdentifierClass
1336	   wherever feasible; however, if an application protocol requires more
1337	   code points than are allowed by the IdentifierClass, protocol
1338	   designers are encouraged to define a profile of the FreeformClass
1339	   that restricts the allowable code points as tightly as possible.
1340	   (The PRECIS Working Group considered the option of allowing
1341	   "superclasses" as well as profiles of PRECIS string classes, but
1342	   decided against allowing superclasses to reduce the likelihood of
1343	   security and interoperability problems.)

1345	12.4.  Local Character Set Issues

1347	   When systems use local character sets other than ASCII and Unicode,
1348	   this specification leaves the problem of converting between the local
1349	   character set and Unicode up to the application or local system.  If
1350	   different applications (or different versions of one application)
1351	   implement different rules for conversions among coded character sets,
1352	   they could interpret the same name differently and contact different
1353	   application servers or other network entities.  This problem is not
1354	   solved by security protocols, such as Transport Layer Security (TLS)
1355	   [RFC5246] and the Simple Authentication and Security Layer (SASL)
1356	   [RFC4422], that do not take local character sets into account.
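
   The following sketch illustrates the conversion problem.  The byte
   string and the choice of ISO 8859-1 and ISO 8859-5 as the two local
   character sets are assumptions made for the example; the point is
   only that identical octets can yield different Unicode strings
   depending on the conversion rule that a system applies.

```python
# The same four octets, as they might arrive from a legacy system.
raw = b"caf\xe9"

# One application assumes ISO 8859-1 (Latin-1) ...
as_latin1 = raw.decode("iso-8859-1")

# ... while another assumes ISO 8859-5 (Latin/Cyrillic).
as_cyrillic = raw.decode("iso-8859-5")

# The resulting Unicode strings differ in their last code point, so the
# two applications would look up different names.
print([f"U+{ord(ch):04X}" for ch in as_latin1])
print([f"U+{ord(ch):04X}" for ch in as_cyrillic])
print(as_latin1 == as_cyrillic)
```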

1358	12.5.  Visually Similar Characters

1360	   Some characters are visually similar and thus can cause confusion
1361	   among humans.  Such characters are often called "confusable
1362	   characters" or "confusables".

1364	   The problem of confusable characters is not necessarily caused by the
1365	   use of Unicode code points outside the ASCII range.  For example, in
1366	   some presentations and to some individuals the string "ju1iet"
1367	   (spelled with DIGIT ONE, U+0031, as the third character) might appear
1368	   to be the same as "juliet" (spelled with LATIN SMALL LETTER L,
1369	   U+006C), especially on casual visual inspection.  This phenomenon is
1370	   sometimes called "typejacking".

1372	   However, the problem is made more serious by introducing the full
1373	   range of Unicode code points into protocol strings.  For example, the
1374	   characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
1375	   Cherokee block look similar to the ASCII characters "STPETER" as they
1376	   might appear when presented using a "creative" font family.
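
   A short sketch shows why software cannot rely on string comparison
   or normalization to catch this case.  It uses only the code points
   listed above; the helper name "describe" is hypothetical.

```python
import unicodedata

# "STPETER" written with ASCII letters.
ascii_name = "STPETER"

# The Cherokee code points listed in the text above.
cherokee_name = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"


def describe(s):
    """Return (code point, Unicode character name) pairs for a string."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in s]


# Same length, but a code-point comparison treats the strings as distinct.
same_length = len(ascii_name) == len(cherokee_name)
strings_equal = ascii_name == cherokee_name

# Compatibility normalization does not bring them together either.
nfkc_equal = (unicodedata.normalize("NFKC", ascii_name)
              == unicodedata.normalize("NFKC", cherokee_name))

for code_point, name in describe(cherokee_name):
    print(code_point, name)
print(same_length, strings_equal, nfkc_equal)
```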

1378	   In some examples of confusable characters, it is unlikely that the
1379	   average human could tell the difference between the real string and
1380	   the fake string.  (Indeed, there is no programmatic way to
1381	   distinguish with full certainty which is the fake string and which is
1382	   the real string; in some contexts, the string formed of Cherokee
1383	   characters might be the real string and the string formed of ASCII
1384	   characters might be the fake string.)  Because PRECIS-compliant
1385	   strings can contain almost any properly-encoded Unicode code point,
1386	   it can be relatively easy to fake or mimic some strings in systems
1387	   that use the PRECIS framework.  The fact that some strings are easily
1388	   confused introduces security vulnerabilities of the kind that have
1389	   also plagued the World Wide Web, specifically the phenomenon known as
1390	   phishing.

1392	   Despite the fact that some specific suggestions about identification
1393	   and handling of confusable characters appear in the Unicode Security
1394	   Considerations [UTR36] and the Unicode Security Mechanisms [UTS39],
1395	   it is also true (as noted in [RFC5890]) that "there are no
1396	   comprehensive technical solutions to the problems of confusable
1397	   characters".  Because it is impossible to map visually similar
1398	   characters without a great deal of context (such as knowing the font
1399	   families used), the PRECIS framework does nothing to map similar-
1400	   looking characters together, nor does it prohibit some characters
1401	   because they look like others.

1403	   Nevertheless, specifications for application protocols that use this
1404	   framework are strongly encouraged to describe how confusable
1405	   characters can be abused to compromise the security of systems that
1406	   use the protocol in question, along with any protocol-specific
1407	   suggestions for overcoming those threats.  In particular, software
1408	   implementations and service deployments that use PRECIS-based
1409	   technologies are strongly encouraged to define and implement
1410	   consistent policies regarding the registration, storage, and
1411	   presentation of visually similar characters.  The following
1412	   recommendations are appropriate:

1414	   1.  An application service SHOULD define a policy that specifies the
1415	       scripts or blocks of characters that the service will allow to be
1416	       registered (e.g., in an account name) or stored (e.g., in a file
1417	       name).  Such a policy SHOULD be informed by the languages and
1418	       scripts that are used to write registered account names; in
1419	       particular, to reduce confusion, the service SHOULD forbid
1420	       registration or storage of strings that contain characters from
1421	       more than one script and SHOULD restrict registrations to
1422	       characters drawn from a very small number of scripts (e.g.,
1423	       scripts that are well-understood by the administrators of the
1424	       service, to improve manageability).

1426	   2.  User-oriented application software SHOULD define a policy that
1427	       specifies how internationalized strings will be presented to a
1428	       human user.  Because every human user of such software has a
1429	       preferred language or a small set of preferred languages, the
1430	       software SHOULD gather that information either explicitly from
1431	       the user or implicitly via the operating system of the user's
1432	       device.  Furthermore, because most languages are typically
1433	       represented by a single script or a small set of scripts, and
1434	       because most scripts are typically contained in one or more
1435	       blocks of characters, the software SHOULD warn the user when
1436	       presenting a string that mixes characters from more than one
1437	       script or block, or that uses characters outside the normal range
1438	       of the user's preferred language(s).  (Such a recommendation is
1439	       not intended to discourage communication across different
1440	       communities of language users; instead, it recognizes the
1441	       existence of such communities and encourages due caution when
1442	       presenting unfamiliar scripts or characters to human users.)
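
   The two recommendations above could be approximated as in the
   following sketch.  It is not a conformant script check: the
   first word of each Unicode character name is used as a rough stand-in
   for the Script property, which is an assumption of this example,
   and the function names and the default set of allowed scripts are
   hypothetical.  A deployment would more likely rely on the data and
   mechanisms described in [UTS39].

```python
import unicodedata


def rough_scripts(s):
    """Approximate the set of scripts used by the letters in s.

    Assumption: the first word of the character name (for example
    "LATIN", "CYRILLIC", "CHEROKEE") approximates the Script property.
    Digits, punctuation, and other non-letters are ignored.
    """
    scripts = set()
    for ch in s:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(s):
    """True if the letters of s appear to come from more than one script."""
    return len(rough_scripts(s)) > 1


def allowed_for_registration(s, allowed_scripts=frozenset({"LATIN"})):
    """Recommendation 1: a single script, drawn from a small allowed set."""
    scripts = rough_scripts(s)
    return len(scripts) <= 1 and scripts <= allowed_scripts


def needs_warning(s, user_scripts=frozenset({"LATIN"})):
    """Recommendation 2: warn on mixed or unfamiliar scripts."""
    return is_mixed_script(s) or not rough_scripts(s) <= user_scripts


print(allowed_for_registration("juliet"))
print(needs_warning("juli\u0435t"))  # U+0435 is a Cyrillic letter
```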

1444	   The challenges inherent in supporting the full range of Unicode code
1445	   points have in the past led some to hope for a way to
1446	   programmatically negotiate more restrictive ranges based on locale,
1447	   script, or other relevant factors, to tag the locale associated with
1448	   a particular string, etc.  As a general-purpose internationalization
1449	   technology, the PRECIS framework does not include such mechanisms.

1451	12.6.  Security of Passwords

1453	   Two goals of passwords are to maximize the amount of entropy and to
1454	   minimize the potential for false positives.  These goals can be
1455	   achieved in part by allowing a wide range of code points and by
1456	   ensuring that passwords are handled in such a way that code points
1457	   are not compared aggressively.  Therefore, it is NOT RECOMMENDED for
1458	   application protocols to profile the FreeformClass for use in
1459	   passwords in a way that removes entire categories (e.g., by
1460	   disallowing symbols or punctuation).  Furthermore, it is NOT
1461	   RECOMMENDED for application protocols to map uppercase and titlecase
1462	   code points to their lowercase equivalents in such strings; instead,
1463	   it is RECOMMENDED to preserve the case of all code points contained
1464	   in such strings and to compare them in a case-sensitive manner.
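
   A minimal sketch of case-preserving comparison follows.  The use of
   NFC as the only preparation step is an assumption of this example
   (the rules for a real profile are defined by that profile), and the
   function names are hypothetical.  Prepared strings are compared
   directly here for brevity; a deployment would normally compare
   values derived from the password rather than the password itself.

```python
import hmac
import unicodedata


def prepare_password(password):
    """Normalize to NFC without any case mapping (assumed rule)."""
    return unicodedata.normalize("NFC", password)


def passwords_match(supplied, expected):
    """Case-sensitive comparison of two prepared passwords."""
    a = prepare_password(supplied).encode("utf-8")
    b = prepare_password(expected).encode("utf-8")
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(a, b)


# Different normalization forms of the same text still match ...
print(passwords_match("P\u00e4ssword", "Pa\u0308ssword"))
# ... but a difference in case does not.
print(passwords_match("Password", "password"))
```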

1466	   That said, software implementers need to be aware that there exist
1467	   tradeoffs between entropy and usability.  For example, allowing a
1468	   user to establish a password containing "uncommon" code points might
1469	   make it difficult for the user to access a service when using an
1470	   unfamiliar or constrained input device.

1472	   Some application protocols use passwords directly, whereas others
1473	   reuse technologies that themselves process passwords (one example of
1474	   such a technology is the Simple Authentication and Security Layer
1475	   [RFC4422]).  Moreover, passwords are often carried by a sequence of
1476	   protocols with backend authentication systems or data storage systems
1477	   such as RADIUS [RFC2865] and LDAP [RFC4510].  Developers of
1478	   application protocols are encouraged to look into reusing existing
1479	   password profiles instead of defining new ones, so that end-user
1480	   expectations about passwords are consistent no matter which
1481	   application protocol is used.

1483	   In protocols that provide passwords as input to a cryptographic
1484	   algorithm such as a hash function, the client will need to perform
1485	   proper preparation of the password before applying the algorithm,
1486	   since the password is not available to the server in plaintext form.
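
   The ordering matters because a hash function operates on octets: two
   inputs that are canonically equivalent but encoded differently
   produce unrelated outputs unless they are prepared first.  The
   sketch below assumes NFC as the preparation step and PBKDF2 with
   SHA-256 as the algorithm; both choices, the iteration count, and the
   function names are illustrative assumptions, not requirements of
   this framework.

```python
import hashlib
import unicodedata


def prepare_password(password):
    """Normalize to NFC without any case mapping (assumed rule)."""
    return unicodedata.normalize("NFC", password)


def derive_key(password, salt, iterations=10_000):
    """Prepare the password on the client, then apply the hash."""
    prepared = prepare_password(password).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", prepared, salt, iterations)


salt = b"0123456789abcdef"   # fixed only to keep the example repeatable
composed = "caf\u00e9"        # e with acute as one code point
decomposed = "cafe\u0301"     # e followed by a combining acute accent

# With preparation, both inputs yield the same derived key.
print(derive_key(composed, salt) == derive_key(decomposed, salt))

# Without preparation, the derived keys differ.
unprepared_a = hashlib.pbkdf2_hmac(
    "sha256", composed.encode("utf-8"), salt, 10_000)
unprepared_b = hashlib.pbkdf2_hmac(
    "sha256", decomposed.encode("utf-8"), salt, 10_000)
print(unprepared_a == unprepared_b)
```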

1488	   Further discussion of password handling can be found in
1489	   [I-D.ietf-precis-saslprepbis].

1491	13.  Interoperability Considerations

1493	   Although strings that are consumed in PRECIS-based application
1494	   protocols are often encoded using UTF-8 [RFC3629], the exact encoding
1495	   is a matter for the application protocol that uses PRECIS, not for
1496	   the PRECIS framework.

1498	   It is known that some existing systems are unable to support the full
1499	   Unicode character set, or even any characters outside the ASCII
1500	   range.  If two (or more) applications need to interoperate when
1501	   exchanging data (e.g., for the purpose of authenticating a username
1502	   or password), they will naturally need to have in common at least one
1503	   coded character set (as defined by [RFC6365]).  Establishing such a
1504	   baseline is a matter for the application protocol that uses PRECIS,
1505	   not for the PRECIS framework.

1507	   Changes to the properties of Unicode code points can occur as the
1508	   Unicode Standard is modified from time to time.  For example, three
1509	   code points underwent changes in their GeneralCategory between
1510	   Unicode 5.2 (current at the time IDNA2008 was originally published)
1511	   and Unicode 6.0, as described in [RFC6452].  Implementers might need
1512	   to be aware that the treatment of these characters differs depending
1513	   on which version of Unicode is available on the system that is using
1514	   IDNA2008 or PRECIS.  Other such differences might arise between the
1515	   version of Unicode current at the time of this writing (7.0) and
1516	   future versions.
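
   An implementation can at least report which version of the Unicode
   Character Database it was built against, as in the sketch below.
   The three code points inspected are those understood here to be the
   ones discussed in [RFC6452]; that list and the function names are
   assumptions of this example and need to be checked against that
   document.

```python
import unicodedata


def unicode_version():
    """Version of the Unicode data available to this runtime, as a tuple."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def general_categories(code_points):
    """Map each code point to the General_Category this runtime reports."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp)) for cp in code_points}


# Assumed to be the code points whose category changed in Unicode 6.0.
changed_in_6_0 = [0x0CF1, 0x0CF2, 0x19DA]

print("Unicode data version:", unicodedata.unidata_version)
for code_point, category in general_categories(changed_in_6_0).items():
    print(code_point, category)

if unicode_version() < (7, 0):
    print("Runtime predates Unicode 7.0; derived properties may differ.")
```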

1518	14.  References

1520	14.1.  Normative References

1522	   [RFC20]    Cerf, V., "ASCII format for network interchange", RFC 20,
1523	              October 1969.

1525	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1526	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1528	   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
1529	              Interchange", RFC 5198, March 2008.

1531	   [Unicode7.0]
1532	              The Unicode Consortium, "The Unicode Standard, Version
1533	              7.0.0", 2014,
1534	              <http://www.unicode.org/versions/Unicode7.0.0/>.

1536	14.2.  Informative References

1538	   [I-D.ietf-precis-mappings]
1539	              Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
1540	              classes", draft-ietf-precis-mappings-08 (work in
1541	              progress), June 2014.

1543	   [I-D.ietf-precis-nickname]
1544	              Saint-Andre, P., "Preparation and Comparison of
1545	              Nicknames", draft-ietf-precis-nickname-13 (work in
1546	              progress), November 2014.

1548	   [I-D.ietf-precis-saslprepbis]
1549	              Saint-Andre, P. and A. Melnikov, "Username and Password
1550	              Preparation Algorithms", draft-ietf-precis-saslprepbis-12
1551	              (work in progress), December 2014.

1553	   [I-D.ietf-xmpp-6122bis]
1554	              Saint-Andre, P., "Extensible Messaging and Presence
1555	              Protocol (XMPP): Address Format", draft-ietf-xmpp-
1556	              6122bis-18 (work in progress), December 2014.

1558	   [RFC2865]  Rigney, C., Willens, S., Rubens, A., and W. Simpson,
1559	              "Remote Authentication Dial In User Service (RADIUS)", RFC
1560	              2865, June 2000.

1562	   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
1563	              Internationalized Strings ("stringprep")", RFC 3454,
1564	              December 2002.

1566	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1567	              "Internationalizing Domain Names in Applications (IDNA)",
1568	              RFC 3490, March 2003.

1570	   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1571	              Profile for Internationalized Domain Names (IDN)", RFC
1572	              3491, March 2003.

1574	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
1575	              10646", STD 63, RFC 3629, November 2003.

1577	   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
1578	              Security Layer (SASL)", RFC 4422, June 2006.

1580	   [RFC4510]  Zeilenga, K., "Lightweight Directory Access Protocol
1581	              (LDAP): Technical Specification Road Map", RFC 4510, June
1582	              2006.

1584	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
1585	              Recommendations for Internationalized Domain Names
1586	              (IDNs)", RFC 4690, September 2006.

1588	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
1589	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1590	              May 2008.

1592	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1593	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1595	   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
1596	              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

1598	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
1599	              Applications (IDNA): Definitions and Document Framework",
1600	              RFC 5890, August 2010.

1602	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
1603	              Applications (IDNA): Protocol", RFC 5891, August 2010.

1605	   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
1606	              Internationalized Domain Names for Applications (IDNA)",
1607	              RFC 5892, August 2010.

1609	   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
1610	              Internationalized Domain Names for Applications (IDNA)",
1611	              RFC 5893, August 2010.

1613	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
1614	              Applications (IDNA): Background, Explanation, and
1615	              Rationale", RFC 5894, August 2010.

1617	   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
1618	              Internationalized Domain Names in Applications (IDNA)
1619	              2008", RFC 5895, September 2010.

1621	   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
1622	              Internationalization in the IETF", BCP 166, RFC 6365,
1623	              September 2011.

1625	   [RFC6452]  Faltstrom, P. and P. Hoffman, "The Unicode Code Points and
1626	              Internationalized Domain Names for Applications (IDNA) -
1627	              Unicode 6.0", RFC 6452, November 2011.

1629	   [RFC6885]  Blanchet, M. and A. Sullivan, "Stringprep Revision and
1630	              Problem Statement for the Preparation and Comparison of
1631	              Internationalized Strings (PRECIS)", RFC 6885, March 2013.

1633	   [RFC6943]  Thaler, D., "Issues in Identifier Comparison for Security
1634	              Purposes", RFC 6943, May 2013.

1636	   [UAX9]     The Unicode Consortium, "Unicode Standard Annex #9:
1637	              Unicode Bidirectional Algorithm", September 2012,
1638	              <http://unicode.org/reports/tr9/>.

1640	   [UAX11]    The Unicode Consortium, "Unicode Standard Annex #11: East
1641	              Asian Width", September 2012,
1642	              <http://unicode.org/reports/tr11/>.

1644	   [UAX15]    The Unicode Consortium, "Unicode Standard Annex #15:
1645	              Unicode Normalization Forms", August 2012,
1646	              <http://unicode.org/reports/tr15/>.

1648	   [UnicodeCurrent]
1649	              The Unicode Consortium, "The Unicode Standard",
1650	              2014-present, <http://www.unicode.org/versions/latest/>.

1652	   [UTR36]    The Unicode Consortium, "Unicode Technical Report #36:
1653	              Unicode Security Considerations", July 2012,
1654	              <http://unicode.org/reports/tr36/>.

1656	   [UTS39]    The Unicode Consortium, "Unicode Technical Standard #39:
1657	              Unicode Security Mechanisms", July 2012,
1658	              <http://unicode.org/reports/tr39/>.

1660	14.3.  URIs

1662	   [1] http://unicode.org/Public/UNIDATA/PropertyAliases.txt

1664	   [2] http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

1666	Appendix A.  Acknowledgements

1668	   The authors would like to acknowledge the comments and contributions
1669	   of the following individuals during working group discussion: David
1670	   Black, Edward Burns, Dan Chiba, Mark Davis, Alan DeKok, Martin
1671	   Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Bjoern
1672	   Hoehrmann, Paul Hoffman, Jeffrey Hutzelman, Simon Josefsson, John
1673	   Klensin, Alexey Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker,
1674	   Pete Resnick, Andrew Sullivan, Dave Thaler, Yoshiro Yoneya, and
1675	   Florian Zeitz.

1677	   Special thanks are due to John Klensin and Patrik Faltstrom for their
1678	   challenging feedback and detailed reviews.

1680	   Charlie Kaufman, Tom Taylor, and Tim Wicinski reviewed the document
1681	   on behalf of the Security Directorate, the General Area Review Team,
1682	   and the Operations and Management Directorate, respectively.

1684	   During IESG review, Alissa Cooper, Stephen Farrell, and Barry Leiba
1685	   provided comments that led to further improvements.

1687	   Some algorithms and textual descriptions have been borrowed from
1688	   [RFC5892].  Some text regarding security has been borrowed from
1689	   [RFC5890], [I-D.ietf-precis-saslprepbis], and
1690	   [I-D.ietf-xmpp-6122bis].

1692	   Peter Saint-Andre wishes to acknowledge Cisco Systems, Inc., for
1693	   employing him during his work on earlier versions of this document.

1695	Authors' Addresses

1697	   Peter Saint-Andre
1698	   &yet

1700	   Email: peter@andyet.com
1701	   URI:   https://andyet.com/

1703	   Marc Blanchet
1704	   Viagenie
1705	   246 Aberdeen
1706	   Quebec, QC  G1R 2E1
1707	   Canada

1709	   Email: Marc.Blanchet@viagenie.ca
1710	   URI:   http://www.viagenie.ca/