idnits 2.17.1 

draft-ietf-precis-framework-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 10, 2013) is 3940 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-12) exists of
     draft-ietf-precis-mappings-02

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-precis-mappings (ref. 'I-D.ietf-precis-mappings')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  == Outdated reference: A later version (-19) exists of
     draft-ietf-precis-nickname-06

  == Outdated reference: A later version (-18) exists of
     draft-ietf-precis-saslprepbis-02

  == Outdated reference: A later version (-24) exists of
     draft-ietf-xmpp-6122bis-07

  -- Obsolete informational reference (is this intentional?): RFC 3454
     (Obsoleted by RFC 7564)

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3491
     (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 5246
     (Obsoleted by RFC 8446)


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	PRECIS                                                    P. Saint-Andre
3	Internet-Draft                                       Cisco Systems, Inc.
4	Obsoletes: 3454 (if approved)                                M. Blanchet
5	Intended status: Standards Track                                Viagenie
6	Expires: January 11, 2014                                  July 10, 2013

8	   PRECIS Framework: Preparation and Comparison of Internationalized
9	                    Strings in Application Protocols
10	                     draft-ietf-precis-framework-09

12	Abstract

14	   Application protocols using Unicode code points in protocol strings
15	   need to properly prepare such strings in order to perform valid
16	   comparison operations (e.g., for purposes of authentication or
17	   authorization).  This document defines a framework enabling
18	   application protocols to perform the preparation and comparison of
19	   internationalized strings (a.k.a. "PRECIS") in a way that depends on
20	   the properties of Unicode code points and thus is agile with respect
21	   to versions of Unicode.  As a result, this framework provides a more
22	   sustainable approach to the handling of internationalized strings
23	   than the previous framework, known as Stringprep (RFC 3454).  A
24	   specification that reuses this framework can either directly use the
25	   PRECIS string classes or subclass the PRECIS string classes as
26	   needed.  This framework takes an approach similar to the revised
27	   internationalized domain names (IDNs) in applications (IDNA)
28	   technology (RFC 5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894) and
29	   thus adheres to the high-level design goals described in the IAB's
30	   recommendations regarding IDNs (RFC 4690), albeit for application
31	   technologies other than the Domain Name System (DNS).  This document
32	   obsoletes RFC 3454.

34	Status of this Memo

36	   This Internet-Draft is submitted in full conformance with the
37	   provisions of BCP 78 and BCP 79.

39	   Internet-Drafts are working documents of the Internet Engineering
40	   Task Force (IETF).  Note that other groups may also distribute
41	   working documents as Internet-Drafts.  The list of current Internet-
42	   Drafts is at http://datatracker.ietf.org/drafts/current/.

44	   Internet-Drafts are draft documents valid for a maximum of six months
45	   and may be updated, replaced, or obsoleted by other documents at any
46	   time.  It is inappropriate to use Internet-Drafts as reference
47	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on January 11, 2014.

50	Copyright Notice

52	   Copyright (c) 2013 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (http://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Introduction
68	   2.  Terminology
69	   3.  String Classes
70	     3.1.  Overview
71	     3.2.  Order of Operations
72	     3.3.  IdentifierClass
73	     3.4.  FreeformClass
74	   4.  Use of PRECIS String Classes
75	     4.1.  Principles
76	     4.2.  Subclassing
77	     4.3.  Building Application-Layer Constructs
78	     4.4.  A Note about Spaces
79	   5.  Code Point Properties
80	   6.  Category Definitions Used to Calculate Derived Property
81	       Value
82	     6.1.  LetterDigits (A)
83	     6.2.  Unstable (B)
84	     6.3.  IgnorableProperties (C)
85	     6.4.  IgnorableBlocks (D)
86	     6.5.  LDH (E)
87	     6.6.  Exceptions (F)
88	     6.7.  BackwardCompatible (G)
89	     6.8.  JoinControl (H)
90	     6.9.  OldHangulJamo (I)
91	     6.10. Unassigned (J)
92	     6.11. ASCII7 (K)
93	     6.12. Controls (L)
94	     6.13. PrecisIgnorableProperties (M)
95	     6.14. Spaces (N)
96	     6.15. Symbols (O)
97	     6.16. Punctuation (P)
98	     6.17. HasCompat (Q)
99	     6.18. OtherLetterDigits (R)
100	   7.  Calculation of the Derived Property
101	   8.  Code Points
102	   9.  Security Considerations
103	     9.1.  General Issues
104	     9.2.  Use of the IdentifierClass
105	     9.3.  Use of the FreeformClass
106	     9.4.  Local Character Set Issues
107	     9.5.  Visually Similar Characters
108	     9.6.  Security of Passwords
109	   10. IANA Considerations
110	     10.1. PRECIS Derived Property Value Registry
111	     10.2. PRECIS Base Classes Registry
112	     10.3. PRECIS Subclasses Registry
113	     10.4. PRECIS Usage Registry
114	   11. Interoperability Considerations
115	   12. References
116	     12.1. Normative References
117	     12.2. Informative References
118	   Appendix A.  Codepoint Table
119	   Appendix B.  Acknowledgements
120	   Authors' Addresses

122	1.  Introduction

124	   As described in the problem statement for the preparation and
125	   comparison of internationalized strings ("PRECIS") [RFC6885], many
126	   IETF protocols have used the Stringprep framework [RFC3454] as the
127	   basis for preparing and comparing protocol strings that contain
128	   Unicode code points [UNICODE] outside the ASCII range [RFC20].  The
129	   Stringprep framework was developed during work on the original
130	   technology for internationalized domain names (IDNs), here called
131	   "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the Stringprep
132	   profile for IDNs.  At the time, Stringprep was designed as a general
133	   framework so that other application protocols could define their own
134	   Stringprep profiles for the preparation and comparison of strings and
135	   identifiers.  Indeed, a number of application protocols defined such
136	   profiles.

138	   After the publication of [RFC3454] in 2002, several significant
139	   issues arose with the use of Stringprep in the IDN case, as
140	   documented in the IAB's recommendations regarding IDNs [RFC4690]
141	   (most significantly, Stringprep was tied to Unicode version 3.2).
142	   Therefore, the newer IDNA specifications, here called "IDNA2008"
143	   ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
144	   use Stringprep and Nameprep.  This migration away from Stringprep for
145	   IDNs has prompted other "customers" of Stringprep to consider new
146	   approaches to the preparation and comparison of internationalized
147	   strings (a.k.a. "PRECIS"), as described in [RFC6885].

149	   This document defines a framework for a post-Stringprep approach to
150	   the preparation and comparison of internationalized strings in
151	   application protocols, based on several principles:

153	   1.  Define a small set of string classes appropriate for common
154	       application protocol constructs such as usernames and free-form
155	       strings.
156	   2.  Define each PRECIS string class in terms of Unicode code points
157	       and their properties so that an algorithm can be used to
158	       determine whether each code point or character category is valid,
159	       disallowed, or unassigned.
160	   3.  Define string classes in terms of allowable code points, so that
161	       any code point not explicitly allowed is forbidden.
162	   4.  Enable application protocols to subclass the PRECIS string
163	       classes if needed, mainly to disallow particular code points that
164	       are currently disallowed in the relevant application protocol
165	       (e.g., characters with special or reserved meaning, such as "@"
166	       and "/" when used as separators within identifiers).
167	   5.  Leave various mapping operations (e.g., case preservation or
168	       lowercasing, Unicode normalization, mapping of certain characters
169	       to other characters or to nothing, handling of full-width and
170	       half-width characters, handling of right-to-left characters) as
171	       the responsibility of application protocols, as was done for
172	       IDNA2008 through an IDNA-specific mapping document [RFC5895].

174	   It is expected that this framework will yield the following benefits:

176	   o  Application protocols will be more version-agile with regard to
177	      the Unicode database.
178	   o  Implementers will be able to share code point tables and software
179	      code across application protocols, most likely by means of
180	      software libraries.
181	   o  End users will be able to acquire more accurate expectations about
182	      the code points that are acceptable in various contexts.  Given
183	      this more uniform set of string classes, it is also expected that
184	      copy/paste operations between software implementing different
185	      application protocols will be more predictable and coherent.

187	   Although this framework is similar to IDNA2008 and borrows some of
188	   the character categories defined in [RFC5892], it defines additional
189	   string classes and character categories to meet the needs of common
190	   application protocols.

192	2.  Terminology

194	   Many important terms used in this document are defined in [RFC5890],
195	   [RFC6365], [RFC6885], and [UNICODE].

197	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
198	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
199	   "OPTIONAL" in this document are to be interpreted as described in
200	   [RFC2119].

202	3.  String Classes

204	3.1.  Overview

206	   IDNA2008 essentially defines a string class of internationalized
207	   domain name (IDN), although it does not use the term "string class".
208	   (This document does not define a string class for domain names, and
209	   application protocols are strongly encouraged to use IDNA2008 as the
210	   appropriate method to prepare domain names and hostnames.)  Because
211	   the IDN string class is designed to meet the particular requirements
212	   of the Domain Name System (DNS), additional string classes are needed
213	   for non-DNS applications.

215	   Starting in 2010, various "customers" of Stringprep began to discuss
216	   the need to define a post-Stringprep approach to the preparation and
217	   comparison of internationalized strings.  As a result of analyzing
218	   existing Stringprep profiles, this community concluded that most
219	   existing uses could be addressed by two string classes:

221	   IdentifierClass:  a sequence of letters, numbers, and symbols that is
222	      used to identify or address a network entity such as a user
223	      account, a venue (e.g., a chatroom), an information source (e.g.,
224	      a data feed), or a collection of data (e.g., a file); the intent
225	      is that this class will be very safe for use in a wide variety of
226	      application protocols, with the result that safety has been
227	      prioritized over inclusiveness for this class.
228	   FreeformClass:  a sequence of letters, numbers, symbols, spaces, and
229	      other code points that is used for free-form strings, including
230	      passwords as well as display elements such as human-friendly
231	      nicknames in chatrooms; the intent is that this class will allow
232	      nearly any Unicode character, with the result that inclusiveness
233	      has been prioritized over safety for this class (e.g., protocol
234	      designers, application developers, service providers, and end
235	      users might not understand or be able to enter all of the
236	      characters that can be included in the FreeformClass).

238	   Although members of the community discussed the possibility of
239	   defining other PRECIS string classes (e.g., a class falling somewhere
240	   between the IdentifierClass and the FreeformClass), they concluded
241	   that the IdentifierClass would be a safe choice meeting the needs of
242	   many or even most application protocols, and that protocols needing a
243	   wider range of Unicode characters could use the FreeformClass
244	   directly or subclass it if needed.

246	   The following subsections discuss the IdentifierClass and
247	   FreeformClass in more detail, with reference to the dimensions
248	   described in Section 3 of [RFC6885].  (Naturally, future documents
249	   can define PRECIS string classes beyond the IdentifierClass and
250	   FreeformClass; see Section 10.2.)  Each string class (or a particular
251	   usage thereof) is defined by the following behavioral rules:

253	   Valid:  defines which code points and character categories are
254	      treated as valid input to the string.
255	   Disallowed:  defines which code points and character categories are
256	      treated as disallowed for the string.
257	   Unassigned:  defines application behavior in the presence of code
258	      points that are unassigned, i.e., unknown for the version of
259	      Unicode the application is built upon.

261	   Width Mapping:  specifies if width mapping is performed on fullwidth
262	      and halfwidth characters, and how the mapping is done (e.g.,
263	      mapping fullwidth and halfwidth characters to their decomposition
264	      equivalents).
265	   Additional Mappings:  specifies whether additional mappings are to be
266	      applied, such as mapping of delimiter characters, mapping of
267	      special characters (e.g., non-ASCII space characters to ASCII
268	      space or certain characters to nothing), and case mapping based on
269	      language and local context (see [I-D.ietf-precis-mappings]).
270	   Case Mapping:  specifies if case mapping is performed (instead of
271	      case preservation) on uppercase and titlecase characters, and how
272	      the mapping is done (e.g., mapping uppercase and titlecase
273	      characters to their lowercase equivalents).
274	   Normalization:  defines which Unicode normalization form (D, KD, C,
275	      or KC) is to be applied (see [UAX15]).
276	   Directionality:  defines application behavior in the presence of code
277	      points that have directionality, in particular right-to-left code
278	      points as defined in the Unicode database (see [UAX9]).

280	   This document defines the valid, disallowed, and unassigned rules for
281	   the IdentifierClass and FreeformClass.  Application protocols that
282	   use these string classes are responsible for defining the
283	   normalization, case mapping, width mapping, and directionality rules,
284	   as well as any additional mappings to be applied.

286	3.2.  Order of Operations

288	   To ensure proper comparison, the following order of operations is
289	   REQUIRED:

291	   1.  Width mapping
292	   2.  Additional mappings as specified in [I-D.ietf-precis-mappings]:
293	       1.  Delimiter mapping
294	       2.  Special mapping
295	       3.  Local case mapping
296	   3.  Non-local case mapping
297	   4.  Normalization
298	   5.  PRECIS protocol
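
   The ordering above can be sketched in Python as a chain of string
   functions.  This is an illustration under stated assumptions, not
   part of the framework: the names width_map, additional_map,
   case_map, and prepare are hypothetical; the additional-mapping step
   shows only one possible "special mapping" (space characters to
   U+0020); and step 5 is read here as applying the rules of the
   string class, supplied by the caller as a predicate.

```python
import unicodedata
from typing import Callable


def width_map(s: str) -> str:
    """Step 1: map fullwidth/halfwidth code points to their decompositions."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # Decomposition looks like "<wide> 0041": a tag, then hex code points.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def additional_map(s: str) -> str:
    """Step 2 (one example only): map every space separator to U+0020."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)


def case_map(s: str) -> str:
    """Step 3 (one possible rule): map characters to lowercase."""
    return s.lower()


def prepare(s: str, is_allowed: Callable[[str], bool], form: str = "NFC") -> str:
    """Apply the steps in the REQUIRED order, then enforce the string class."""
    s = width_map(s)                     # 1. width mapping
    s = additional_map(s)                # 2. additional mappings
    s = case_map(s)                      # 3. non-local case mapping
    s = unicodedata.normalize(form, s)   # 4. normalization
    for ch in s:                         # 5. string class rules (assumed reading)
        if not is_allowed(ch):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return s
```

   Running the class check last means that it sees the string in its
   final, mapped form, so a code point that is mapped away in an
   earlier step does not cause the string to be rejected.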

300	3.3.  IdentifierClass

302	   Most application technologies need strings that can be used to refer
303	   to, include, or communicate protocol strings like usernames, file
304	   names, data feed identifiers, and chatroom names.  We group such
305	   strings into a class called "IdentifierClass" having the following
306	   features.

308	3.3.1.  Valid

310	   o  Code points traditionally used as letters and numbers in writing
311	      systems, i.e., the LetterDigits ("A") category first defined in
312	      [RFC5892] and listed here under Section 6.1.
313	   o  Code points in the range U+0021 through U+007E, i.e., the
314	      (printable) ASCII7 ("K") rule defined under Section 6.11.  These
315	      code points are "grandfathered" into PRECIS and thus are valid
316	      even if they would otherwise be disallowed according to the
317	      property-based rules specified in the next section.

319	   Although the PRECIS IdentifierClass re-uses the LetterDigits category
320	   from IDNA2008, the range of characters allowed in the IdentifierClass
321	   is wider than the range of characters allowed in IDNA2008.  The main
322	   reason is that IDNA2008 applies the Unstable category before the
323	   LetterDigits category, thus disallowing uppercase characters, whereas
324	   the IdentifierClass does not apply the Unstable category.

326	3.3.2.  Disallowed

328	   o  Control characters, i.e., the Controls ("L") category defined
329	      under Section 6.12.
330	   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
331	      category defined under Section 6.13.
332	   o  Space characters, i.e., the Spaces ("N") category defined under
333	      Section 6.14.
334	   o  Symbol characters, i.e., the Symbols ("O") category defined under
335	      Section 6.15.
336	   o  Punctuation characters, i.e., the Punctuation ("P") category
337	      defined under Section 6.16.
338	   o  Any character that has a compatibility equivalent, i.e., the
339	      HasCompat ("Q") category defined under Section 6.17.  These code
340	      points are disallowed even if they would otherwise be valid
341	      according to the property-based rules specified in the previous
342	      section.
343	   o  Letters and digits other than the "traditional" letters and digits
344	      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
345	      defined under Section 6.18.

347	3.3.3.  Unassigned

349	   Any code points that are not yet assigned in the Unicode character
350	   set SHALL be considered Unassigned for purposes of the
351	   IdentifierClass.
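
   The valid, disallowed, and unassigned rules for the IdentifierClass
   can be approximated with Python's unicodedata module.  The sketch
   below is a deliberate simplification and rests on assumptions: it
   takes LetterDigits to be the General_Category set given in
   [RFC5892], treats HasCompat as "the code point changes under NFKC",
   treats every code point with General_Category Cn (including
   noncharacters) as unassigned, and omits the Exceptions,
   BackwardCompatible, JoinControl, and OldHangulJamo categories
   defined later in this document.  The function name
   identifier_status is hypothetical, and results depend on the
   Unicode version shipped with the Python interpreter.

```python
import unicodedata

# General_Category values assumed for the LetterDigits ("A") category.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def identifier_status(ch: str) -> str:
    """Return "valid", "disallowed", or "unassigned" for one code point."""
    cp = ord(ch)
    if unicodedata.category(ch) == "Cn":
        return "unassigned"      # not assigned in this Unicode version
    if 0x21 <= cp <= 0x7E:
        return "valid"           # ASCII7 ("K"), grandfathered
    if unicodedata.normalize("NFKC", ch) != ch:
        return "disallowed"      # HasCompat ("Q"), simplified test
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "valid"           # LetterDigits ("A")
    return "disallowed"          # anything not explicitly allowed


def is_identifier(s: str) -> bool:
    """True if s is non-empty and every code point is valid."""
    return bool(s) and all(identifier_status(ch) == "valid" for ch in s)
```

   Because the final branch returns "disallowed", the sketch follows
   the principle that any code point not explicitly allowed is
   forbidden.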

353	3.3.4.  Width Mapping

355	   The width mapping rule MUST be specified by each application protocol
356	   that uses or subclasses the IdentifierClass.

358	3.3.5.  Additional Mappings

360	   Additional mapping rules (if any) MUST be specified by each
361	   application protocol that uses or subclasses the IdentifierClass (see
362	   [I-D.ietf-precis-mappings]).

364	3.3.6.  Case Mapping

366	   The case mapping rule MUST be specified by each application protocol
367	   that uses or subclasses the IdentifierClass.

369	3.3.7.  Normalization

371	   The Unicode normalization form MUST be specified by each application
372	   protocol that uses or subclasses the IdentifierClass.

374	   However, in accordance with [RFC5198], normalization form C (NFC) is
375	   RECOMMENDED.

377	3.3.8.  Directionality

379	   The directionality rule MUST be specified by each application
380	   protocol that uses or subclasses the IdentifierClass.

382	3.4.  FreeformClass

384	   Some application technologies need strings that can be used in a
385	   free-form way, e.g., as a password in an authentication exchange (see
386	   [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see
387	   [I-D.ietf-precis-nickname]).  We group such things into a class
388	   called "FreeformClass" having the following features.

390	   Note: Consult Section 9.6 for relevant security considerations when
391	   strings conforming to the FreeformClass, or a subclass thereof, are
392	   used as passwords.

394	3.4.1.  Valid

396	   o  Traditional letters and numbers, i.e., the LetterDigits ("A")
397	      category first defined in [RFC5892] and listed here under
398	      Section 6.1.

400	   o  Letters and digits other than the "traditional" letters and digits
401	      allowed in IDNs, i.e., the OtherLetterDigits ("R") category
402	      defined under Section 6.18.
403	   o  Code points in the range U+0021 through U+007E, i.e., the
404	      (printable) ASCII7 ("K") rule defined under Section 6.11.
405	   o  Any character that has a compatibility equivalent, i.e., the
406	      HasCompat ("Q") category defined under Section 6.17.
407	   o  Space characters, i.e., the Spaces ("N") category defined under
408	      Section 6.14.
409	   o  Symbol characters, i.e., the Symbols ("O") category defined under
410	      Section 6.15.
411	   o  Punctuation characters, i.e., the Punctuation ("P") category
412	      defined under Section 6.16.

414	3.4.2.  Disallowed

416	   o  Control characters, i.e., the Controls ("L") category defined
417	      under Section 6.12.
418	   o  Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
419	      category defined under Section 6.13.

421	3.4.3.  Unassigned

423	   Any code points that are not yet assigned in the Unicode character
424	   set SHALL be considered Unassigned for purposes of the FreeformClass.

426	3.4.4.  Width Mapping

428	   The width mapping rule MUST be specified by each application protocol
429	   that uses or subclasses the FreeformClass.

431	   Because one aspect of Unicode normalization form KC is width mapping,
432	   a PRECIS usage or subclass that uses NFKC does not need to specify
433	   width mapping.  However, if NFC is used then the usage or subclass
434	   needs to specify whether to apply width mapping; in this case, width
435	   mapping is in general RECOMMENDED because allowing fullwidth and
436	   halfwidth characters to remain unmapped to their decomposition
437	   equivalents would violate the principle of least user surprise.  For
438	   more information about the concept of width in East Asian scripts
439	   within Unicode, see for instance [UAX11].
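
   The relationship between NFKC and width mapping can be observed
   directly with Python's unicodedata module.  The variable names
   below are illustrative only.

```python
import unicodedata

fullwidth = "\uFF21\uFF22\uFF23"   # FULLWIDTH LATIN CAPITAL LETTER A, B, C
halfwidth = "\uFF76"               # HALFWIDTH KATAKANA LETTER KA
ligature = "\uFB01"                # LATIN SMALL LIGATURE FI

# NFC leaves fullwidth and halfwidth forms untouched, so a usage that
# chooses NFC has to state separately whether width mapping applies.
nfc_fullwidth = unicodedata.normalize("NFC", fullwidth)

# NFKC replaces them with their decomposition equivalents.
nfkc_fullwidth = unicodedata.normalize("NFKC", fullwidth)
nfkc_halfwidth = unicodedata.normalize("NFKC", halfwidth)

# NFKC also rewrites other compatibility characters, not only wide
# and narrow forms.
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

   Since NFKC also changes characters such as the ligature above, NFC
   combined with an explicit width mapping rule is a narrower
   transformation than NFKC.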

441	3.4.5.  Additional Mappings

443	   Additional mapping rules (if any) MUST be specified by each
444	   application protocol that uses or subclasses the FreeformClass (see
445	   [I-D.ietf-precis-mappings]).

447	3.4.6.  Case Mapping

449	   The case mapping rule MUST be specified by each application protocol
450	   that uses or subclasses the FreeformClass.

452	   In general, the combination of case preservation and case-insensitive
453	   comparison of internationalized strings is NOT RECOMMENDED; instead,
454	   application protocols SHOULD either (a) not preserve case but perform
455	   case-insensitive comparison or (b) preserve case but perform case-
456	   sensitive comparison.

458	   In order to maximize entropy and minimize the potential for false
459	   positives, it is NOT RECOMMENDED for application protocols to map
460	   uppercase and titlecase code points to their lowercase equivalents
461	   when strings conforming to the FreeformClass, or a subclass thereof,
462	   are used in passwords; instead, it is RECOMMENDED to preserve the
463	   case of all code points contained in such strings and then perform
464	   case-sensitive comparison.  See also the related discussion in
465	   [I-D.ietf-precis-saslprepbis].
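
   The two combinations recommended above can be sketched as follows.
   The function names are hypothetical, NFC is assumed as the
   normalization form, and a real usage would also apply the string
   class rules before comparing.

```python
import unicodedata


def prepare_case_mapped(s: str) -> str:
    """Option (a): do not preserve case; comparison is case-insensitive."""
    return unicodedata.normalize("NFC", s.lower())


def prepare_case_preserved(s: str) -> str:
    """Option (b): preserve case; comparison is case-sensitive."""
    return unicodedata.normalize("NFC", s)


def same_nickname(a: str, b: str) -> bool:
    # Display strings such as nicknames: map case, then compare.
    return prepare_case_mapped(a) == prepare_case_mapped(b)


def same_password(a: str, b: str) -> bool:
    # Passwords: keep case so that no entropy is discarded.
    return prepare_case_preserved(a) == prepare_case_preserved(b)
```

   In both options, what is stored and what is compared use the same
   rule, which avoids the NOT RECOMMENDED mix of case preservation
   with case-insensitive comparison.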

467	3.4.7.  Normalization

469	   The Unicode normalization form MUST be specified by each application
470	   protocol that uses or subclasses the FreeformClass.

472	   However, in accordance with [RFC5198], normalization form C (NFC) is
473	   RECOMMENDED.

475	3.4.8.  Directionality

477	   The directionality rule MUST be specified by each application
478	   protocol that uses or subclasses the FreeformClass.

480	4.  Use of PRECIS String Classes

482	4.1.  Principles

484	   This document defines the valid, disallowed, and unassigned rules.
485	   Application protocols that use the PRECIS string classes MUST define
486	   the width mapping, additional mapping (if any), case mapping,
487	   normalization, and directionality rules.  That is, such definitions
488	   MUST at a minimum specify the following:

490	   Width Mapping:  Whether fullwidth and halfwidth code points are to be
491	      mapped to their decomposition equivalents.
492	   Additional Mappings:  Whether additional mappings are to be applied,
493	      such as mapping of delimiter characters, mapping of special
494	      characters (e.g., non-ASCII space characters to ASCII space or
495	      certain characters to nothing), and case mapping based on language
496	      and local context (see [I-D.ietf-precis-mappings]).
497	   Case Mapping:  Whether uppercase and titlecase code points are to be
498	      (a) preserved or (b) mapped to lowercase.
499	   Normalization:  Which Unicode normalization form (D, KD, C, or KC) is
500	      to be applied (see [UAX15] for background information); in
501	      accordance with [RFC5198], NFC is RECOMMENDED.
502	   Directionality:  Whether any instance of the class that contains a
503	      right-to-left code point is to be considered a right-to-left
504	      string, or whether some other rule is to be applied (e.g., the
505	      "Bidi Rule" from [RFC5893]).
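
   One way for an implementation to record these choices is a small
   immutable record.  The class name PrecisUsage, its field names, and
   the allowed string values are hypothetical and are not defined by
   this document; the example instance is likewise invented for
   illustration and does not describe any registered usage.

```python
from dataclasses import dataclass
from typing import Tuple

BASE_CLASSES = ("IdentifierClass", "FreeformClass")
CASE_RULES = ("preserve", "lowercase")
NORMALIZATION_FORMS = ("NFD", "NFKD", "NFC", "NFKC")


@dataclass(frozen=True)
class PrecisUsage:
    """Rules an application protocol states when it uses a string class."""
    base_class: str                       # PRECIS string class being used
    width_mapping: bool                   # map fullwidth/halfwidth code points?
    additional_mappings: Tuple[str, ...]  # e.g. ("delimiter", "special")
    case_mapping: str                     # "preserve" or "lowercase"
    directionality: str                   # e.g. "bidi-rule" for RFC 5893
    normalization: str = "NFC"            # NFC is RECOMMENDED

    def __post_init__(self) -> None:
        if self.base_class not in BASE_CLASSES:
            raise ValueError(f"unknown string class: {self.base_class}")
        if self.case_mapping not in CASE_RULES:
            raise ValueError(f"unknown case mapping rule: {self.case_mapping}")
        if self.normalization not in NORMALIZATION_FORMS:
            raise ValueError(f"unknown normalization form: {self.normalization}")


# Invented example: a free-form display string with lowercasing.
example_usage = PrecisUsage(
    base_class="FreeformClass",
    width_mapping=True,
    additional_mappings=("special",),
    case_mapping="lowercase",
    directionality="bidi-rule",
)
```

   Making every field except the normalization form mandatory mirrors
   the requirement that each rule be specified explicitly, with NFC as
   the recommended default.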

507	4.2.  Subclassing

509	   Application protocols are allowed to subclass the PRECIS string
510	   classes specified in this document.  As the word "subclass" implies,
511	   a subclass MUST NOT add as valid any code points or character
512	   categories that are disallowed by the relevant PRECIS string class.
513	   However, a subclass MAY do either of the following:

515	   1.  Exclude specific code points that are included in the relevant
516	       PRECIS string class.
517	   2.  Exclude characters matching certain Unicode properties (e.g.,
518	       math symbols) that are included in the relevant PRECIS string
519	       class.

521	   As a result, code points that are defined as valid for the PRECIS
522	   string class being subclassed will be defined as disallowed for the
523	   subclass.
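
   Because a subclass can only take code points away, it can be
   modeled as a filter wrapped around the predicate of the string
   class being subclassed.  In the sketch below, make_subclass and
   localpart_like are hypothetical names, and base_is_valid is a
   stand-in (printable ASCII only) rather than a real PRECIS string
   class.

```python
import unicodedata
from typing import Callable, FrozenSet


def make_subclass(
    base_is_valid: Callable[[str], bool],
    excluded_points: FrozenSet[str] = frozenset(),
    excluded_categories: FrozenSet[str] = frozenset(),
) -> Callable[[str], bool]:
    """Build a predicate that only ever narrows the base class."""
    def is_valid(ch: str) -> bool:
        if ch in excluded_points:
            return False     # 1. specific code points excluded
        if unicodedata.category(ch) in excluded_categories:
            return False     # 2. code points excluded by Unicode property
        return base_is_valid(ch)
    return is_valid


# Stand-in base class for illustration only: U+0021 through U+007E.
def base_is_valid(ch: str) -> bool:
    return 0x21 <= ord(ch) <= 0x7E


# Exclude two separator characters and all math symbols (category Sm).
localpart_like = make_subclass(base_is_valid, frozenset("@/"), frozenset({"Sm"}))
```

   The wrapper has no way to accept a code point that the base
   predicate rejects, which matches the rule that a subclass MUST NOT
   add valid code points.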

525	   Application protocols that subclass the PRECIS string classes MUST
526	   register with the IANA as described under Section 10.3.

528	   It is RECOMMENDED for subclass names to be of the form
529	   "SubclassBaseClass", where the "Subclass" string is a differentiator
530	   and "BaseClass" is the name of the PRECIS string class being
531	   subclassed; for example, the subclass of the IdentifierClass used for
532	   localparts in the Extensible Messaging and Presence Protocol (XMPP)
533	   is named "LocalpartIdentifierClass" [I-D.ietf-xmpp-6122bis].

535	4.3.  Building Application-Layer Constructs

537	   Sometimes, an application-layer construct does not map directly to
538	   one of the PRECIS string classes.  Consider, for example, the "simple
539	   user name" construct in the Simple Authentication and Security Layer
540	   (SASL) [RFC4422].  Depending on the deployment, a simple user name
541	   might take the form of a user's full name (e.g., the user's personal
542	   name followed by a space and then the user's family name).  Such a
543	   simple user name cannot be defined as an instance of the
544	   IdentifierClass, since space characters are not allowed in the
545	   IdentifierClass; however, it could be defined using a space-separated
546	   sequence of IdentifierClass instances, as in the following pseudo-
547	   ABNF [RFC5234]:

549	      fullname = namepart [1*(1*SP namepart)]
550	      namepart = 1*(idpoint)
551	                 ;
552	                 ; an "idpoint" is a UTF-8 encoded Unicode code point
553	                 ; that conforms to the PRECIS IdentifierClass

555	   Similar techniques could be used to define many application-layer
556	   constructs, say of the form "user@domain" or "/path/to/file".
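
   The "fullname" rule above can be sketched in Python as follows.
   This is only an illustration: the is_idpoint() helper is a
   hypothetical and deliberately crude stand-in for "conforms to the
   PRECIS IdentifierClass" (it accepts printable ASCII other than
   space, plus the LetterDigits general categories), whereas a real
   implementation would apply the full algorithm from Section 7.

```python
import unicodedata

# Assumption: general categories of the LetterDigits (A) category.
LETTER_DIGITS = {"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"}


def is_idpoint(ch: str) -> bool:
    """Hypothetical, simplified IdentifierClass membership test."""
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:  # printable ASCII, excluding U+0020
        return True
    return unicodedata.category(ch) in LETTER_DIGITS


def is_fullname(s: str) -> bool:
    """fullname = namepart [1*(1*SP namepart)]; namepart = 1*(idpoint)."""
    # A fullname starts and ends with a namepart, never with a space.
    if not s or s[0] == " " or s[-1] == " ":
        return False
    # Splitting on U+0020 and dropping empty pieces mirrors "1*SP".
    parts = [p for p in s.split(" ") if p]
    return all(is_idpoint(ch) for part in parts for ch in part)
```

   Only U+0020 acts as a separator in this sketch; a string that uses
   another space character (such as U+00A0) between the parts is
   rejected rather than silently mapped.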

558	4.4.  A Note about Spaces

560	   With regard to the IdentifierClass, the consensus of the PRECIS
561	   Working Group was that spaces are problematic for many reasons,
562	   including:

564	   o  Many Unicode characters are confusable with ASCII space.
565	   o  Even if non-ASCII space characters are mapped to ASCII space
566	      (U+0020), space characters are often not rendered in user
567	      interfaces, leading to the possibility that a human user might
568	      consider a string containing spaces to be equivalent to the same
569	      string without spaces.
570	   o  In some locales, some devices are known to generate a character
571	      other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
572	      user performs an action such as hitting the space bar on a keyboard.

574	   One consequence of disallowing space characters in the
575	   IdentifierClass might be to effectively discourage the use of ASCII
576	   space (or, even more problematically, non-ASCII space characters)
577	   within identifiers created in newer application protocols; given the
578	   challenges involved in properly handling space characters in
579	   identifiers and other protocol strings, the Working Group considered
580	   this to be a feature, not a bug.

582	   However, the FreeformClass does allow spaces, which enables
583	   application protocols to define subclasses of the FreeformClass that
584	   are more flexible than any profiles of the IdentifierClass.

586	5.  Code Point Properties

588	   In order to implement the string classes described above, this
589	   document does the following:

591	   1.  Reviews and classifies the collections of code points in the
592	       Unicode character set by examining various code point properties.
593	   2.  Defines an algorithm for determining a derived property value,
594	       which can vary depending on the string class being used by the
595	       relevant application protocol.

597	   This document is not intended to specify precisely how derived
598	   property values are to be applied in protocol strings.  That
599	   information is the responsibility of the protocol specification that
600	   uses or subclasses a PRECIS string class from this document.

602	   The value of the property is to be interpreted as follows.

604	   PROTOCOL VALID  Those code points that are allowed to be used in any
605	      PRECIS string class (IdentifierClass and FreeformClass).  Code
606	      points with this property value are permitted for general use in
607	      any string class.  The abbreviated term PVALID is used to refer to
608	      this value in the remainder of this document.
609	   SPECIFIC CLASS PROTOCOL VALID  Those code points that are allowed to
610	      be used in specific string classes.  Code points with this
611	      property value are permitted for use in specific string classes.
612	      In the remainder of this document, the abbreviated term *_PVAL is
613	      used, where * = (ID | FREE), i.e., either FREE_PVAL or ID_PVAL.
614	   CONTEXTUAL RULE REQUIRED  Some characteristics of the character, such
615	      as its being invisible in certain contexts or problematic in
616	      others, require that it not be used in labels unless specific
617	      other characters or properties are present.  The abbreviated term
618	      CONTEXT is used to refer to this value in the remainder of this
619	      document.  As in IDNA2008, there are two subdivisions of
620	      CONTEXTUAL RULE REQUIRED, the first for Join_controls (called
621	      CONTEXTJ) and the second for other characters (called CONTEXTO).
622	   DISALLOWED  Those code points that must not be permitted in any PRECIS
623	      string class.
624	   SPECIFIC CLASS DISALLOWED  Those code points that are not to be
625	      included in a specific string class.  Code points with this
626	      property value are not permitted in one of the string classes but
627	      might be permitted in others.  In the remainder of this document,
628	      the abbreviated term *_DIS is used, where * = (ID | FREE), i.e.,
629	      either FREE_DIS or ID_DIS.

631	   UNASSIGNED  Those code points that are not designated (i.e., are
632	      unassigned) in the Unicode Standard.

634	   The mechanisms described here allow determination of the value of the
635	   property for future versions of Unicode (including characters added
636	   after Unicode 5.2 or 6.1 depending on the category, since some
637	   categories in this document are reused from IDNA2008 and therefore
638	   were defined at the time of Unicode 5.2).  Changes in Unicode
639	   properties that do not affect the outcome of this process do not
640	   affect this framework.  For example, a character can have its Unicode
641	   General_Category value [UNICODE] change from So to Sm, or from Lo to
642	   Ll, without affecting the algorithm results.  Moreover, even if such
643	   changes were to result, the BackwardCompatible list (Section 6.7) can
644	   be adjusted to ensure the stability of the results.

646	   Some code points need to be allowed in exceptional circumstances, but
647	   ought to be excluded in all other cases; these rules are also
648	   described in other documents.  The most notable of these are the Join
649	   Control characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
650	   NON-JOINER.  Both of them have the derived property value CONTEXTJ.
651	   A character with the derived property value CONTEXTJ or CONTEXTO
652	   (CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
653	   rule has been established and the context of the character is
654	   consistent with that rule.  It is invalid to generate a string
655	   containing these characters unless such a contextual rule is found
656	   and satisfied.  PRECIS does not define its own contextual rules, but
657	   instead re-uses the contextual rules defined for IDNA2008; please see
658	   Appendix A of [RFC5892] for more information.

660	6.  Category Definitions Used to Calculate Derived Property Value

662	   The derived property obtains its value based on a two-step procedure:

664	   1.  Characters are placed in one or more character categories either
665	       (1) based on core properties defined by the Unicode Standard or
666	       (2) by treating the code point as an exception and addressing the
667	       code point based on its code point value.  These categories are
668	       not mutually exclusive.
669	   2.  Set operations are used with these categories to determine the
670	       values for a property that is specific to a given string class.
671	       These operations are specified under Section 7.

673	   (Note: Unicode property names and property value names might have
674	   short abbreviations, such as "gc" for the General_Category property
675	   and "Ll" for the Lowercase_Letter property value of the gc property.)

677	   In the following specification of character categories, the operation
678	   that returns the value of a particular Unicode character property for
679	   a code point is designated by using the formal name of that property
680	   (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
681	   "code point".  For example, the value of the General_Category
682	   property for a code point is indicated by General_Category(cp).

684	   The first ten categories (A-J) shown below were previously defined
685	   for IDNA2008 and are copied directly from [RFC5892].  Some of these
686	   categories are reused in PRECIS and some of them are not; however,
687	   the lettering of categories is retained to prevent overlap and to
688	   ease implementation of both IDNA2008 and PRECIS in a single software
689	   application.  The next eight categories (K-R) are specific to PRECIS.

691	6.1.  LetterDigits (A)

693	   Note: This category is defined in [RFC5892] and copied here for use
694	   in PRECIS.

696	   A: General_Category(cp) is in {Ll, Lu, Lm, Lo, Mn, Mc, Nd}

698	   These rules identify characters commonly used in mnemonics and often
699	   informally described as "language characters".

701	   For more information, see section 4.5 of [UNICODE].

703	   The categories used in this rule are:
704	   o  Ll - Lowercase_Letter
705	   o  Lu - Uppercase_Letter
706	   o  Lm - Modifier_Letter
707	   o  Lo - Other_Letter
708	   o  Mn - Nonspacing_Mark
709	   o  Mc - Spacing_Mark
710	   o  Nd - Decimal_Number
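
   As an illustration, membership in LetterDigits can be tested with
   the General_Category value reported by a Unicode library.  The
   sketch below uses Python's standard unicodedata module; the
   function name is hypothetical, and the result depends on the
   version of the Unicode Character Database bundled with the Python
   interpreter in use.

```python
import unicodedata

# General_Category values listed for LetterDigits (A).
LETTER_DIGITS = frozenset({"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"})


def in_letter_digits(cp: int) -> bool:
    """Return True if General_Category(cp) is in the LetterDigits set."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS
```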

712	6.2.  Unstable (B)

714	   Note: This category is defined in [RFC5892] but not used in PRECIS.

716	6.3.  IgnorableProperties (C)

718	   Note: This category is defined in [RFC5892] but not used in PRECIS.
719	   See the "PrecisIgnorableProperties (M)" category below for a more
720	   inclusive category used in PRECIS identifiers.

722	6.4.  IgnorableBlocks (D)

724	   Note: This category is defined in [RFC5892] but not used in PRECIS.

726	6.5.  LDH (E)

728	   Note: This category is defined in [RFC5892] but not used in PRECIS.
729	   See the "ASCII7 (K)" category below for a more inclusive category
730	   used in PRECIS identifiers.

732	6.6.  Exceptions (F)

734	   Note: This category is defined in [RFC5892] and used in PRECIS to
735	   ensure consistent treatment of the relevant code points.

737	   F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
738	                0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
739	                0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
740	                06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
741	                302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
742	                30FB}

744	   This category explicitly lists code points for which the category
745	   cannot be assigned using only the core property values that exist in
746	   the Unicode standard.  The values are according to the table below:

748	   PVALID -- Would otherwise have been DISALLOWED

750	   00DF; PVALID     # LATIN SMALL LETTER SHARP S
751	   03C2; PVALID     # GREEK SMALL LETTER FINAL SIGMA
752	   06FD; PVALID     # ARABIC SIGN SINDHI AMPERSAND
753	   06FE; PVALID     # ARABIC SIGN SINDHI POSTPOSITION MEN
754	   0F0B; PVALID     # TIBETAN MARK INTERSYLLABIC TSHEG
755	   3007; PVALID     # IDEOGRAPHIC NUMBER ZERO

757	   CONTEXTO -- Would otherwise have been DISALLOWED

759	   00B7; CONTEXTO   # MIDDLE DOT
760	   0375; CONTEXTO   # GREEK LOWER NUMERAL SIGN (KERAIA)
761	   05F3; CONTEXTO   # HEBREW PUNCTUATION GERESH
762	   05F4; CONTEXTO   # HEBREW PUNCTUATION GERSHAYIM
763	   30FB; CONTEXTO   # KATAKANA MIDDLE DOT

765	   CONTEXTO -- Would otherwise have been PVALID

767	   0660; CONTEXTO   # ARABIC-INDIC DIGIT ZERO
768	   0661; CONTEXTO   # ARABIC-INDIC DIGIT ONE
769	   0662; CONTEXTO   # ARABIC-INDIC DIGIT TWO
770	   0663; CONTEXTO   # ARABIC-INDIC DIGIT THREE
771	   0664; CONTEXTO   # ARABIC-INDIC DIGIT FOUR
772	   0665; CONTEXTO   # ARABIC-INDIC DIGIT FIVE
773	   0666; CONTEXTO   # ARABIC-INDIC DIGIT SIX
774	   0667; CONTEXTO   # ARABIC-INDIC DIGIT SEVEN
775	   0668; CONTEXTO   # ARABIC-INDIC DIGIT EIGHT
776	   0669; CONTEXTO   # ARABIC-INDIC DIGIT NINE
777	   06F0; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT ZERO
778	   06F1; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT ONE
779	   06F2; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT TWO
780	   06F3; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT THREE
781	   06F4; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT FOUR
782	   06F5; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT FIVE
783	   06F6; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT SIX
784	   06F7; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT SEVEN
785	   06F8; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT EIGHT
786	   06F9; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT NINE

788	   DISALLOWED -- Would otherwise have been PVALID

790	   0640; DISALLOWED # ARABIC TATWEEL
791	   07FA; DISALLOWED # NKO LAJANYALAN
792	   302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
793	   302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
794	   3031; DISALLOWED # VERTICAL KANA REPEAT MARK
795	   3032; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
796	   3033; DISALLOWED # VERTICAL KANA REPEAT MARK UPPER HALF
797	   3034; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
798	                      UPPER HALF
799	   3035; DISALLOWED # VERTICAL KANA REPEAT MARK LOWER HALF
800	   303B; DISALLOWED # VERTICAL IDEOGRAPHIC ITERATION MARK
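
   For implementers, the table above can be transcribed into a simple
   lookup structure.  The following Python sketch is one possible
   encoding; the EXCEPTIONS name is an assumption made for
   illustration, and the values are copied from the table above.

```python
# Exceptions (F): code point -> derived property value.
EXCEPTIONS = {}

for cp in (0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007):
    EXCEPTIONS[cp] = "PVALID"

for cp in (0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB):
    EXCEPTIONS[cp] = "CONTEXTO"

# ARABIC-INDIC DIGIT ZERO..NINE and EXTENDED ARABIC-INDIC DIGIT ZERO..NINE.
for cp in list(range(0x0660, 0x066A)) + list(range(0x06F0, 0x06FA)):
    EXCEPTIONS[cp] = "CONTEXTO"

for cp in (0x0640, 0x07FA, 0x302E, 0x302F, 0x3031,
           0x3032, 0x3033, 0x3034, 0x3035, 0x303B):
    EXCEPTIONS[cp] = "DISALLOWED"
```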

802	6.7.  BackwardCompatible (G)

804	   Note: This category is defined in [RFC5892] and copied here for use
805	   in PRECIS.  Because of how the PRECIS string classes are defined,
806	   only changes that would result in code points being added to or
807	   removed from the LetterDigits ("A") category would result in
808	   backward-incompatible modifications to code point assignments.
809	   Therefore, management of this category is handled via the processes
810	   specified in [RFC5892].

812	   G: cp is in {}

814	   This category includes the code points for which property values in
815	   versions of Unicode after 5.2 have changed in such a way that the
816	   derived property value would no longer be PVALID or DISALLOWED.  If
817	   changes are made to future versions of Unicode so that code points
818	   might change property value from PVALID or DISALLOWED, then this
819	   table can be updated to keep special exception values so that the
820	   property values for code points stay stable.

822	6.8.  JoinControl (H)

824	   Note: This category is defined in [RFC5892] and copied here for use
825	   in PRECIS.

827	   H: Join_Control(cp) = True

829	   This category consists of Join Control characters (i.e., they are not
830	   in LetterDigits (Section 6.1) but are still required in strings under
831	   some circumstances).

833	6.9.  OldHangulJamo (I)

835	   Note: This category is defined in [RFC5892] and copied here for use
836	   in PRECIS.

838	   I: Hangul_Syllable_Type(cp) is in {L, V, T}

840	   This category consists of all conjoining Hangul Jamo (Leading Jamo,
841	   Vowel Jamo, and Trailing Jamo).

843	   Elimination of conjoining Hangul Jamos from the set of PVALID
844	   characters results in restricting the set of Korean PVALID characters
845	   just to preformed, modern Hangul syllable characters.  Old Hangul
846	   syllables, which must be spelled with sequences of conjoining Hangul
847	   Jamos, are not PVALID for string classes.

849	6.10.  Unassigned (J)

851	   Note: This category is defined in [RFC5892] and copied here for use
852	   in PRECIS.

854	   J: General_Category(cp) is in {Cn} and
855	      Noncharacter_Code_Point(cp) = False

857	   This category consists of code points in the Unicode character set
858	   that are not (yet) assigned.  It should be noted that Unicode
859	   distinguishes between 'unassigned code points' and 'unassigned
860	   characters'.  The unassigned code points are all but (Cn -
861	   Noncharacters), while the unassigned *characters* are all but (Cn +
862	   Cs).
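
   The rule for category J can be sketched as shown below.  Python's
   unicodedata module does not expose the Noncharacter_Code_Point
   property directly, so the sketch computes it from the fixed set of
   66 noncharacters (U+FDD0..U+FDEF plus the last two code points of
   each plane).  The function names are hypothetical, and which code
   points are reported as Cn depends on the Unicode version bundled
   with the interpreter.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point(cp): U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_unassigned(cp: int) -> bool:
    """J: General_Category(cp) is Cn and cp is not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```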

864	6.11.  ASCII7 (K)

866	   This PRECIS-specific category exempts most characters in the
867	   (printable) ASCII-7 range from other rules that might be applied
868	   during PRECIS processing, on the assumption that these code points
869	   are in such wide use that disallowing them would be counter-
870	   productive.

872	   K: cp is in {0021..007E}

874	6.12.  Controls (L)

876	   L: Control(cp) = True

878	6.13.  PrecisIgnorableProperties (M)

880	   This PRECIS-specific category is used to group code points that are
881	   not recommended for use in PRECIS string classes.

883	   M: Default_Ignorable_Code_Point(cp) = True or
884	      Noncharacter_Code_Point(cp) = True

886	   The definition for Default_Ignorable_Code_Point can be found in the
887	   DerivedCoreProperties.txt [2] file, and at the time of Unicode 6.1 is
888	   as follows:

890	     Other_Default_Ignorable_Code_Point
891	   + Cf (Format characters)
892	   + Variation_Selector
893	   - White_Space
894	   - FFF9..FFFB (Annotation Characters)
895	   - 0600..0604, 06DD, 070F, 110BD (exceptional Cf characters
896	                                    that should be visible)

898	6.14.  Spaces (N)

900	   This PRECIS-specific category is used to group code points that are
901	   space characters.

903	   N: General_Category(cp) is in {Zs}

905	6.15.  Symbols (O)

907	   This PRECIS-specific category is used to group code points that are
908	   symbols.

910	   O: General_Category(cp) is in {Sm, Sc, Sk, So}

912	6.16.  Punctuation (P)

914	   This PRECIS-specific category is used to group code points that are
915	   punctuation characters.

917	   P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}

919	6.17.  HasCompat (Q)

921	   This PRECIS-specific category is used to group code points that have
922	   compatibility equivalents as explained in Chapter 2 and Chapter 3 of
923	   [UNICODE].

925	   Q: toNFKC(cp) != cp

927	   The toNFKC() operation returns the code point in normalization form
928	   KC.  For more information, see Section 5 of [UAX15].
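
   As an illustration, the HasCompat test can be written with Python's
   standard unicodedata module, as in the sketch below (the function
   name is hypothetical).  Note that the test as written compares the
   code point with its NFKC form, so it also matches code points whose
   NFKC form differs for canonical reasons alone, such as U+212B
   ANGSTROM SIGN.

```python
import unicodedata


def has_compat(cp: int) -> bool:
    """Q: toNFKC(cp) != cp."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```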

930	6.18.  OtherLetterDigits (R)

932	   This PRECIS-specific category is used to group code points that are
933	   letters and digits other than the "traditional" letters and digits
934	   grouped under the LetterDigits (A) class (see Section 6.1).

936	   R: General_Category(cp) is in {Lt, Nl, No, Me}

938	7.  Calculation of the Derived Property

940	   Possible values of the derived property are:

942	   o  PVALID
943	   o  ID_PVAL
944	   o  FREE_PVAL
945	   o  CONTEXTJ
946	   o  CONTEXTO
947	   o  DISALLOWED
948	   o  ID_DIS
949	   o  FREE_DIS
950	   o  UNASSIGNED

952	   Note: The value of the derived property calculated can depend on the
953	   string class; for example, if an identifier used in an application
954	   protocol is defined as using or subclassing the PRECIS
955	   IdentifierClass then a space character such as U+0020 would be
956	   assigned to ID_DIS, whereas if an identifier is defined as using or
957	   subclassing the PRECIS FreeformClass then the character would be
958	   assigned to FREE_PVAL.  For the sake of brevity, the designation
959	   "FREE_PVAL" is used in the code point tables, instead of the longer
960	   designation "ID_DIS or FREE_PVAL".  In practice, the derived
961	   properties ID_PVAL and FREE_DIS are not used in this specification,
962	   since every ID_PVAL code point is PVALID and every FREE_DIS code
963	   point is DISALLOWED.

965	   The algorithm to calculate the value of the derived property is as
966	   follows.  (Note: Use of the name of a rule (such as "Exceptions")
967	   implies the set of code points that the rule defines, whereas the
968	   same name as a function call (such as "Exceptions(cp)") implies the
969	   value that the code point has in the Exceptions table.)

971	   If .cp. .in. Exceptions Then Exceptions(cp);
972	   Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
973	   Else If .cp. .in. Unassigned Then UNASSIGNED;
974	   Else If .cp. .in. ASCII7 Then PVALID;
975	   Else If .cp. .in. JoinControl Then CONTEXTJ;
976	   Else If .cp. .in. OldHangulJamo Then DISALLOWED;
977	   Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
978	   Else If .cp. .in. Controls Then DISALLOWED;
979	   Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
980	   Else If .cp. .in. LetterDigits Then PVALID;
981	   Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
982	   Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
983	   Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
984	   Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
985	   Else DISALLOWED;
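
   The algorithm above can be sketched in Python as follows.  This is
   a non-normative illustration under several assumptions: Python's
   unicodedata module does not expose Hangul_Syllable_Type or
   Default_Ignorable_Code_Point, so both are approximated here with
   hard-coded ranges (the latter a snapshot roughly as of Unicode 6.1)
   that a real implementation would load from the Unicode data files;
   "Control(cp) = True" is taken to mean General_Category Cc; and, as
   in the code point tables, "FREE_PVAL" stands for "ID_DIS or
   FREE_PVAL".  All names are hypothetical.

```python
import unicodedata


def _span(*pairs):
    """Expand inclusive (lo, hi) pairs into a frozenset of code points."""
    return frozenset(cp for lo, hi in pairs for cp in range(lo, hi + 1))


# Exceptions (F), transcribed from the table in Section 6.6.
EXCEPTIONS = {
    **dict.fromkeys((0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007), "PVALID"),
    **dict.fromkeys((0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB), "CONTEXTO"),
    **dict.fromkeys(_span((0x0660, 0x0669), (0x06F0, 0x06F9)), "CONTEXTO"),
    **dict.fromkeys((0x0640, 0x07FA, 0x302E, 0x302F, 0x3031, 0x3032,
                     0x3033, 0x3034, 0x3035, 0x303B), "DISALLOWED"),
}
BACKWARD_COMPATIBLE = {}  # G is empty in this document.
JOIN_CONTROL = frozenset({0x200C, 0x200D})  # H

# Assumption: hard-coded Hangul_Syllable_Type L, V, and T ranges (I).
OLD_HANGUL_JAMO = _span((0x1100, 0x11FF), (0xA960, 0xA97C),
                        (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB))

# Assumption: snapshot of Default_Ignorable_Code_Point (part of M).
DEFAULT_IGNORABLE = _span(
    (0x00AD, 0x00AD), (0x034F, 0x034F), (0x115F, 0x1160), (0x17B4, 0x17B5),
    (0x180B, 0x180D), (0x200B, 0x200F), (0x202A, 0x202E), (0x2060, 0x206F),
    (0x3164, 0x3164), (0xFE00, 0xFE0F), (0xFEFF, 0xFEFF), (0xFFA0, 0xFFA0),
    (0xFFF0, 0xFFF8), (0x1D173, 0x1D17A), (0xE0000, 0xE0FFF))

LETTER_DIGITS = {"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"}   # A
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}               # R
SPACES = {"Zs"}                                              # N
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                           # O
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}     # P


def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp):
    """Walk the rules in the order given in Section 7."""
    ch = chr(cp)
    gc = unicodedata.category(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if gc == "Cn" and not is_noncharacter(cp):        # Unassigned (J)
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:                            # ASCII7 (K)
        return "PVALID"
    if cp in JOIN_CONTROL:
        return "CONTEXTJ"
    if cp in OLD_HANGUL_JAMO:
        return "DISALLOWED"
    if cp in DEFAULT_IGNORABLE or is_noncharacter(cp):  # M
        return "DISALLOWED"
    if gc == "Cc":                                    # Controls (L)
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:       # HasCompat (Q)
        return "FREE_PVAL"
    if gc in LETTER_DIGITS:
        return "PVALID"
    # R, N, O, and P all yield the same value, so they are merged here.
    if gc in OTHER_LETTER_DIGITS | SPACES | SYMBOLS | PUNCTUATION:
        return "FREE_PVAL"
    return "DISALLOWED"
```

   The order of the tests matters: for example, U+00DF is matched by
   the Exceptions rule before any later rule is consulted, and a
   noncharacter such as U+FFFF skips the Unassigned rule and is caught
   by PrecisIgnorableProperties instead.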

987	8.  Code Points

989	   The Categories and Rules defined under Section 6 and Section 7 apply
990	   to all Unicode code points.  The table in Appendix A shows, for
991	   illustrative purposes, the consequences of the categories and
992	   classification rules, and the resulting property values.

994	   The list of code points that can be found in Appendix A is non-
995	   normative.  Instead, the rules defined by Section 6 and Section 7 are
996	   normative, and any tables are derived from the rules.

998	9.  Security Considerations

1000	9.1.  General Issues

1002	   The security of applications that use this framework can depend in
1003	   part on the proper preparation and comparison of internationalized
1004	   strings.  For example, such strings can be used to make
1005	   authentication and authorization decisions, and the security of an
1006	   application could be compromised if an entity providing a given
1007	   string is connected to the wrong account or online resource based on
1008	   different interpretations of the string.

1010	   Specifications of application protocols that use this framework are
1011	   encouraged to describe how internationalized strings are used in the
1012	   protocol, including the security implications of any false positives
1013	   and false negatives that might result from various comparison
1014	   operations.  For some helpful guidelines, refer to [RFC6943],
1015	   [RFC5890], [UTR36], and [UTR39].

1017	9.2.  Use of the IdentifierClass

1019	   Strings that conform to the IdentifierClass and any subclass thereof
1020	   are intended to be relatively safe for use in a broad range of
1021	   applications, primarily because they include only letters, digits,
1022	   and "grandfathered" non-space characters from the ASCII range; thus
1023	   they exclude spaces, characters with compatibility equivalents, and
1024	   almost all symbols and punctuation marks.  However, because such
1025	   strings can still include so-called confusable characters (see
1026	   Section 9.5), protocol designers and implementers are encouraged to
1027	   pay close attention to the security considerations described
1028	   elsewhere in this document.

1030	9.3.  Use of the FreeformClass

1032	   Strings that conform to the FreeformClass and many subclasses thereof
1033	   can include virtually any Unicode character.  This makes the
1034	   FreeformClass quite expressive, but also problematic from the
1035	   perspective of possible user confusion.  Protocol designers are
1036	   hereby warned that the FreeformClass contains code points they might
1037	   not understand, and are encouraged to use or subclass the
1038	   IdentifierClass wherever feasible; however, if an application
1039	   protocol requires more code points than are allowed by the
1040	   IdentifierClass, protocol designers are encouraged to define a
1041	   subclass of the FreeformClass that restricts the allowable code
1042	   points as tightly as possible.  (The working group considered the
1043	   option of allowing superclasses as well as subclasses of PRECIS
1044	   string classes, but decided against allowing superclasses to reduce
1045	   the likelihood of security and interoperability problems.)

1047	9.4.  Local Character Set Issues

1049	   When systems use local character sets other than ASCII and Unicode,
1050	   these specifications leave the problem of converting between the
1051	   local character set and Unicode up to the application or local
1052	   system.  If different applications (or different versions of one
1053	   application) implement different rules for conversions among coded
1054	   character sets, they could interpret the same name differently and
1055	   contact different application servers or other network entities.
1056	   This problem is not solved by security protocols, such as Transport
1057	   Layer Security (TLS) [RFC5246] and the Simple Authentication and
1058	   Security Layer (SASL) [RFC4422], that do not take local character
1059	   sets into account.

1061	9.5.  Visually Similar Characters

1063	   Some characters are visually similar and thus can cause confusion
1064	   among humans.  Such characters are often called "confusable
1065	   characters" or "confusables".

1067	   The problem of confusable characters is not necessarily caused by the
1068	   use of Unicode code points outside the ASCII range.  For example, in
1069	   some presentations and to some individuals the string "ju1iet"
1070	   (spelled with the Arabic numeral one as the third character) might
1071	   appear to be the same as "juliet" (spelled with the lowercase version
1072	   of the letter "L"), especially on casual visual inspection.  This
1073	   phenomenon is sometimes called "typejacking".

1075	   However, the problem is made more serious by introducing the full
1076	   range of Unicode code points into protocol strings.  For example, the
1077	   characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
1078	   Cherokee block look similar to the ASCII characters "STPETER" as they
1079	   might look when presented using a "creative" font family.

1081	   In some examples of confusable characters, it is unlikely that the
1082	   average human could tell the difference between the real string and
1083	   the fake string.  (Indeed, there is no programmatic way to
1084	   distinguish with full certainty which is the fake string and which is
1085	   the real string; in some contexts, the string formed of Cherokee
1086	   characters might be the real string and the string formed of ASCII
1087	   characters might be the fake string.)  Because PRECIS-compliant
1088	   strings can contain almost any properly-encoded Unicode code point,
1089	   it can be relatively easy to fake or mimic some strings in systems
1090	   that use the PRECIS framework.  The fact that some strings are easily
1091	   confused introduces security vulnerabilities of the kind that have
1092	   also plagued the World Wide Web, specifically the phenomenon known as
1093	   phishing.

1095	   Despite the fact that some specific suggestions about identification
1096	   and handling of confusable characters appear in the Unicode Security
1097	   Considerations [UTR36], it is also true (as noted in [RFC5890]) that
1098	   "there are no comprehensive technical solutions to the problems of
1099	   confusable characters".  Because it is impossible to map visually
1100	   similar characters without a great deal of context (such as knowing
1101	   the font families used), the PRECIS framework does nothing to map
1102	   similar-looking characters together, nor does it prohibit some
1103	   characters because they look like others.

1105	   Nevertheless, specifications for application protocols that use this
1106	   framework MUST describe how confusable characters can be used to
1107	   compromise the security of systems that use the protocol in question,
1108	   along with any protocol-specific suggestions for overcoming those
1109	   threats.  In particular, software implementations and service
1110	   deployments that use PRECIS-based technologies are strongly
1111	   encouraged to define and implement consistent policies regarding the
1112	   registration, storage, and presentation of visually similar
1113	   characters.  The following recommendations are appropriate:

1115	   1.  An application service SHOULD define a policy that specifies the
1116	       scripts or blocks of characters that the service will allow to be
1117	       registered (e.g., in an account name) or stored (e.g., in a file
1118	       name).  Such a policy SHOULD be informed by the languages and
1119	       scripts that are used to write registered account names; in
1120	       particular, to reduce confusion, the service SHOULD forbid
1121	       registration or storage of strings that contain characters from
1122	       more than one script and SHOULD restrict registrations to
1123	       characters drawn from a very small number of scripts (e.g.,
1124	       scripts that are well-understood by the administrators of the
1125	       service, to improve manageability).
1126	   2.  User-oriented application software SHOULD define a policy that
1127	       specifies how internationalized strings will be presented to a
1128	       human user.  Because every human user of such software has a
1129	       preferred language or a small set of preferred languages, the
1130	       software SHOULD gather that information either explicitly from
1131	       the user or implicitly via the operating system of the user's
1132	       device.  Furthermore, because most languages are typically
1133	       represented by a single script or a small set of scripts, and
1134	       because most scripts are typically contained in one or more
1135	       blocks of characters, the software SHOULD warn the user when
1136	       presenting a string that mixes characters from more than one
1137	       script or block, or that uses characters outside the normal range
1138	       of the user's preferred language(s).  (Such a recommendation is
1139	       not intended to discourage communication across different
1140	       communities of language users; instead, it recognizes the
1141	       existence of such communities and encourages due caution when
1142	       presenting unfamiliar scripts or characters to human users.)
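
   The mixed-script warning in the second recommendation might be
   approximated as in the sketch below.  Python's standard library
   does not expose the Unicode Script property, so this sketch uses a
   rough heuristic that is an assumption of this example, not part of
   the framework: it treats the first word of each letter's Unicode
   character name (e.g., "LATIN", "CYRILLIC", "CHEROKEE") as its
   script.  A production implementation would instead consult the
   Script property from the Unicode Character Database.  The function
   names are hypothetical.

```python
import unicodedata


def letter_scripts(s: str) -> set:
    """Heuristic: the first word of each letter's Unicode name."""
    scripts = set()
    for ch in s:
        # Only letters are considered; digits and punctuation are shared
        # across many scripts and would produce spurious warnings.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def should_warn(s: str) -> bool:
    """Warn when the letters of a string come from more than one script."""
    return len(letter_scripts(s)) > 1
```

   This heuristic also flags legitimate mixtures (for instance,
   Japanese text that combines kana and ideographs), so a deployed
   policy would need an allow-list of script combinations that are
   normal for the user's preferred languages.  It does not flag
   "ju1iet", because the confusion there involves a digit within a
   single script.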

1144	9.6.  Security of Passwords

1146	   Two goals of passwords are to maximize the amount of entropy and to
1147	   minimize the potential for false positives.  These goals can be
1148	   achieved in part by allowing a wide range of code points and by
1149	   ensuring that passwords are handled in such a way that code points
1150	   are not compared aggressively.  Therefore, it is NOT RECOMMENDED for
1151	   application protocols to subclass the FreeformClass for use in
1152	   passwords in a way that removes entire categories (e.g., by
1153	   disallowing symbols or punctuation).  Furthermore, it is NOT
1154	   RECOMMENDED for application protocols to map uppercase and titlecase
1155	   code points to their lowercase equivalents in such strings; instead,
1156	   it is RECOMMENDED to preserve the case of all code points contained
1157	   in such strings and to compare them in a case-sensitive manner.

1159	   That said, software implementers need to be aware that there exist
1160	   tradeoffs between entropy and usability.  For example, allowing a
1161	   user to establish a password containing "uncommon" code points might
1162	   make it difficult for the user to access a service when using an
1163	   unfamiliar or constrained input device.

1165	   Some application protocols use passwords directly, whereas others
1166	   reuse technologies that themselves process passwords (one example of
1167	   such a technology is the Simple Authentication and Security Layer
1168	   [RFC4422]).  Moreover, passwords are often carried by a sequence of
1169	   protocols with backend authentication systems or data storage systems
1170	   such as RADIUS [RFC2865] and LDAP [RFC4510].  Developers of
1171	   application protocols are encouraged to look into reusing these
1172	   profiles instead of defining new ones, so that end-user expectations
1173	   about passwords are consistent no matter which application protocol
1174	   is used.

1176	10.  IANA Considerations

1178	10.1.  PRECIS Derived Property Value Registry

1180	   IANA is requested to create a PRECIS-specific registry with the
1181	   Derived Properties for the versions of Unicode that are released
1182	   after (and including) version 6.1.  The derived property value is to
1183	   be calculated in cooperation with a designated expert [RFC5226]
1184	   according to the rules specified under Section 6 and Section 7, not
1185	   by copying the non-normative table found under Appendix A.

1187	   The IESG is to be notified if backward-incompatible changes to the
1188	   table of derived properties are discovered or if other problems arise
1189	   during the process of creating the table of derived property values
1190	   or during expert review.  Changes to the rules defined under
1191	   Section 6 and Section 7 require IETF Review, as described in
1192	   [RFC5226].

1194	10.2.  PRECIS Base Classes Registry

1196	   IANA is requested to create a registry of PRECIS string classes.  In
1197	   accordance with [RFC5226], the registration policy is "RFC Required".

1199	   The registration template is as follows:

1201	   Base Class:  [the name of the PRECIS string class]
1202	   Description:  [a brief description of the PRECIS string class and its
1203	      intended use, e.g., "A sequence of letters, numbers, and symbols
1204	      that is used to identify or address a network entity."]
1205	   Width Mapping:  [the behavioral rule for handling of width, e.g.,
1206	      "Map fullwidth and halfwidth characters to their decomposition
1207	      equivalents."]
1208	   Additional Mappings:  [any additional mappings that are required or
1209	      recommended, e.g., "Map non-ASCII space characters to ASCII
1210	      space."; or "Application Specific" if to be defined by protocols
1211	      that use the PRECIS string class]
1212	   Case Mapping:  [the behavioral rule for handling of case, e.g., "Map
1213	      uppercase and titlecase characters to lowercase."; or "Application
1214	      Specific" if to be defined by protocols that use the PRECIS string
1215	      class]
1216	   Normalization:  [which Unicode normalization form is applied, e.g.,
1217	      "NFC"; or "Application Specific" if to be defined by protocols
1218	      that use the PRECIS string class]
1219	   Directionality:  [the behavioral rule for handling of right-to-left
1220	      code points, e.g., "The 'Bidi Rule' defined in RFC 5893 applies.";
1221	      or "Application Specific" if to be defined by protocols that use
1222	      the PRECIS string class]
1223	   Specification:  [the RFC number]

1225	   The initial registrations are as follows:

1227	   Base Class: FreeformClass.
1228	   Description: A sequence of letters, numbers, symbols, spaces, and
1229	         other code points that is used for free-form strings.
1230	   Width Mapping: Application Specific.
1231	   Additional Mappings: Application Specific.
1232	   Case Mapping: Application Specific.
1233	   Normalization: Application Specific.
1234	   Directionality: Application Specific.
1235	   Specification: RFC XXXX.  [Note to RFC Editor: please change XXXX to
1236	                  the number issued for this specification.]

1238	   Base Class: IdentifierClass.
1239	   Description: A sequence of letters, numbers, and symbols that is
1240	         used to identify or address a network entity.
1241	   Width Mapping: Application Specific.
1242	   Additional Mappings: Application Specific.
1243	   Case Mapping: Application Specific.
1244	   Normalization: Application Specific.
1245	   Directionality: Application Specific.
1246	   Specification: RFC XXXX.  [Note to RFC Editor: please change XXXX to
1247	                  the number issued for this specification.]
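
   The template and the two initial registrations above can be pictured
   as records with one field per template line.  The following Python
   sketch is illustrative only: the class name BaseClassRegistration,
   its field names, and the INITIAL_REGISTRATIONS mapping are
   hypothetical names introduced here and are not part of the registry
   or of any IANA tooling.

```python
from dataclasses import dataclass

APPLICATION_SPECIFIC = "Application Specific"


@dataclass(frozen=True)
class BaseClassRegistration:
    """One entry in the PRECIS Base Classes Registry (hypothetical model)."""
    base_class: str
    description: str
    # Both initial registrations defer every behavioral rule to the
    # protocols that use the class, so that is the default here.
    width_mapping: str = APPLICATION_SPECIFIC
    additional_mappings: str = APPLICATION_SPECIFIC
    case_mapping: str = APPLICATION_SPECIFIC
    normalization: str = APPLICATION_SPECIFIC
    directionality: str = APPLICATION_SPECIFIC
    specification: str = "RFC XXXX"  # placeholder kept from the draft


INITIAL_REGISTRATIONS = {
    reg.base_class: reg
    for reg in (
        BaseClassRegistration(
            "FreeformClass",
            "A sequence of letters, numbers, symbols, spaces, and other "
            "code points that is used for free-form strings.",
        ),
        BaseClassRegistration(
            "IdentifierClass",
            "A sequence of letters, numbers, and symbols that is used to "
            "identify or address a network entity.",
        ),
    )
}
```

   Modeling the entries as immutable records mirrors the fact that a
   registration changes only through a new specification, not through
   edits made by the applications that consume it.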

1249	10.3.  PRECIS Subclasses Registry

1251	   IANA is requested to create a registry of subclasses that use the
1252	   PRECIS string classes.  In accordance with [RFC5226], the
1253	   registration policy is "Expert Review".  This policy was chosen in
1254	   order to ensure that "customers" of PRECIS receive appropriate
1255	   guidance regarding the sometimes complex and subtle
1256	   internationalization issues related to subclassing of PRECIS string
1257	   classes.

1259	   The registration template is as follows:

1261	   Subclass:  [the name of the subclass]
1262	   Base Class:  [which PRECIS string class is being subclassed]
1263	   Exclusions:  [a brief description of the specific code points that
1264	      are excluded or of the properties based on which characters are
1265	      excluded, e.g., "Eight legacy characters in the ASCII range" or
1266	      "Any character that has a compatibility equivalent, i.e., the
1267	      HasCompat category"]
1268	   Specification:  [a pointer to relevant documentation, such as an RFC
1269	      or Internet-Draft]

1271	   In order to request a review, the registrant shall send a completed
1272	   template to the precis@ietf.org list or its designated successor.

1274	   Factors to focus on while reviewing subclass registrations include
1275	   the following:

1277	   o  Is the problem well-defined?
1278	   o  Is it clear what applications will use this subclass?
1279	   o  Would an existing PRECIS string class or subclass solve the
1280	      problem?
1281	   o  Are the defined exclusions a reasonable solution to the problem
1282	      for the relevant applications?
1283	   o  Is the subclass clearly defined?
1284	   o  Does the subclass reduce the degree to which human users could be
1285	      surprised by application behavior (the "principle of least user
1286	      surprise")?
1287	   o  Is the subclass based on an appropriate dividing line between user
1288	      interface (culture, context, intent, locale, device limitations,
1289	      etc.) and the use of conformant strings in protocol elements?
1290	   o  Does the subclass introduce any new security concerns (e.g., false
1291	      positives for authentication or authorization)?

1293	10.4.  PRECIS Usage Registry

1295	   IANA is requested to create a registry of application protocols that
1296	   use the PRECIS string classes.  The registry will include one entry
1297	   for each use (e.g., if a protocol uses both the IdentifierClass and
1298	   the FreeformClass then the specification for that protocol would
1299	   submit two registrations).  In accordance with [RFC5226], the
1300	   registration policy is "Expert Review".  This policy was chosen in
1301	   order to ensure that "customers" of PRECIS receive appropriate
1302	   guidance regarding the sometimes complex and subtle
1303	   internationalization issues related to use of PRECIS string classes.

1305	   The registration template is as follows:

1307	   Applicability:  [the specific protocol elements to which this usage
1308	      applies, e.g., "Localparts in XMPP addresses."]
1309	   Base Class:  [the PRECIS string class that is being used or
1310	      subclassed]
1311	   Subclass:  [whether the protocol has defined a subclass of the PRECIS
1312	      string class and, if so, the name of the subclass, e.g., "Yes,
1313	      LocalpartIdentifierClass."]
1314	   Replaces:  [the Stringprep profile that this PRECIS usage replaces,
1315	      if any]
1316	   Width Mapping:  [the behavioral rule for handling of width, e.g.,
1317	      "Map fullwidth and halfwidth characters to their decomposition
1318	      equivalents."]
1319	   Additional Mappings:  [any additional mappings that are required or
1320	      recommended, e.g., "Map non-ASCII space characters to ASCII
1321	      space."]
1322	   Case Mapping:  [the behavioral rule for handling of case, e.g., "Map
1323	      uppercase and titlecase characters to lowercase."]
1324	   Normalization:  [which Unicode normalization form is applied, e.g.,
1325	      "NFC"]
1326	   Directionality:  [the behavioral rule for handling of right-to-left
1327	      code points, e.g., "The 'Bidi Rule' defined in RFC 5893 applies."]
1328	   Specification:  [a pointer to relevant documentation, such as an RFC
1329	      or Internet-Draft]
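
   To make the example values quoted in the template more concrete, the
   following Python sketch applies them to a string using only the
   standard unicodedata module.  It is a rough illustration under
   stated assumptions, not a PRECIS implementation: the function names
   are hypothetical, the rules are applied in the order in which the
   template lists them, the check of which code points a string class
   allows is omitted, and the directionality rule is not covered.

```python
import unicodedata


def map_width(text):
    """Map fullwidth and halfwidth characters to their decompositions."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def map_spaces(text):
    """Map every space separator (General_Category Zs) to ASCII space."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )


def apply_example_rules(text):
    """Apply the template's example rules in template order (assumption)."""
    text = map_width(text)      # Width Mapping example
    text = map_spaces(text)     # Additional Mappings example
    text = text.lower()         # Case Mapping example
    return unicodedata.normalize("NFC", text)  # Normalization example
```

   For instance, apply_example_rules() turns a fullwidth spelling of
   "Juliet" into the ASCII string "juliet", and turns "e" followed by
   U+0301 COMBINING ACUTE ACCENT into the single code point U+00E9.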

1331	   In order to request a review, the registrant shall send a completed
1332	   template to the precis@ietf.org list or its designated successor.

1334	   Factors to focus on while reviewing usage registrations include the
1335	   following:

1337	   o  Does the specification define what kinds of applications are
1338	      involved and the protocol elements to which this usage applies?
1339	   o  Is there a PRECIS string class or subclass that would be more
1340	      appropriate to use?
1341	   o  Are the normalization, case mapping, width mapping, additional
1342	      mapping, and directionality rules appropriate for the intended
1343	      use?
1345	   o  Does the usage reduce the degree to which human users could be
1346	      surprised by application behavior (the "principle of least user
1347	      surprise")?
1348	   o  Is the usage based on an appropriate dividing line between user
1349	      interface (culture, context, intent, locale, device limitations,
1350	      etc.) and the use of conformant strings in protocol elements?
1351	   o  Does the usage introduce any new security concerns (e.g., false
1352	      positives for authentication or authorization)?

1354	11.  Interoperability Considerations

1356	   Although strings that are consumed in PRECIS-based application
1357	   protocols are often encoded using UTF-8 [RFC3629], the exact encoding
1358	   is a matter for the application protocol that reuses PRECIS, not for
1359	   the PRECIS framework.

1361	   It is known that some existing systems are unable to support the full
1362	   Unicode character set, or even any characters outside the ASCII
1363	   range.  If two (or more) applications need to interoperate when
1364	   exchanging data (e.g., for the purpose of authenticating a username
1365	   or password), they will naturally need to have in common at least one
1366	   coded character set (as defined by [RFC6365]).  Establishing such a
1367	   baseline is a matter for the application protocol that reuses PRECIS,
1368	   not for the PRECIS framework.
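
   As a loose illustration of the baseline problem described above, the
   Python sketch below reports which of several candidate character
   sets can represent every character of a given string.  Python codec
   names stand in for coded character sets here, which is a
   simplification; the function name and the default candidate list
   are assumptions made for this example.  How two applications
   actually agree on a baseline remains a matter for the application
   protocol.

```python
def common_charsets(text, candidates=("ascii", "latin-1", "utf-8")):
    """Return the candidate codecs able to represent all of `text`."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character falls outside this set
        usable.append(name)
    return usable
```

   A username that yields an empty list for the character sets
   supported by a legacy backend cannot be passed to that backend
   unchanged.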

1370	12.  References

1372	12.1.  Normative References

1374	   [I-D.ietf-precis-mappings]
1375	              Yoneya, Y. and T. Nemoto, "Mapping characters for precis
1376	              classes", draft-ietf-precis-mappings-02 (work in
1377	              progress), May 2013.

1379	   [RFC20]    Cerf, V., "ASCII format for network interchange", RFC 20,
1380	              October 1969.

1382	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1383	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1385	   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
1386	              Interchange", RFC 5198, March 2008.

1388	   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
1389	              6.2", 2012,
1390	              <http://www.unicode.org/versions/Unicode6.2.0/>.

1392	12.2.  Informative References

1394	   [I-D.ietf-precis-nickname]
1395	              Saint-Andre, P., "Preparation and Comparison of
1396	              Nicknames", draft-ietf-precis-nickname-06 (work in
1397	              progress), July 2013.

1399	   [I-D.ietf-precis-saslprepbis]
1400	              Saint-Andre, P. and A. Melnikov, "Username and Password
1401	              Preparation Algorithms", draft-ietf-precis-saslprepbis-02
1402	              (work in progress), April 2013.

1404	   [I-D.ietf-xmpp-6122bis]
1405	              Saint-Andre, P., "Extensible Messaging and Presence
1406	              Protocol (XMPP): Address Format",
1407	              draft-ietf-xmpp-6122bis-07 (work in progress), April 2013.

1409	   [RFC2865]  Rigney, C., Willens, S., Rubens, A., and W. Simpson,
1410	              "Remote Authentication Dial In User Service (RADIUS)",
1411	              RFC 2865, June 2000.

1413	   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
1414	              Internationalized Strings ("stringprep")", RFC 3454,
1415	              December 2002.

1417	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1418	              "Internationalizing Domain Names in Applications (IDNA)",
1419	              RFC 3490, March 2003.

1421	   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1422	              Profile for Internationalized Domain Names (IDN)",
1423	              RFC 3491, March 2003.

1425	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
1426	              10646", STD 63, RFC 3629, November 2003.

1428	   [RFC4422]  Melnikov, A. and K. Zeilenga, "Simple Authentication and
1429	              Security Layer (SASL)", RFC 4422, June 2006.

1431	   [RFC4510]  Zeilenga, K., "Lightweight Directory Access Protocol
1432	              (LDAP): Technical Specification Road Map", RFC 4510,
1433	              June 2006.

1435	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
1436	              Recommendations for Internationalized Domain Names
1437	              (IDNs)", RFC 4690, September 2006.

1439	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
1440	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1441	              May 2008.

1443	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1444	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1446	   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
1447	              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

1449	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
1450	              Applications (IDNA): Definitions and Document Framework",
1451	              RFC 5890, August 2010.

1453	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
1454	              Applications (IDNA): Protocol", RFC 5891, August 2010.

1456	   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
1457	              Internationalized Domain Names for Applications (IDNA)",
1458	              RFC 5892, August 2010.

1460	   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
1461	              Internationalized Domain Names for Applications (IDNA)",
1462	              RFC 5893, August 2010.

1464	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
1465	              Applications (IDNA): Background, Explanation, and
1466	              Rationale", RFC 5894, August 2010.

1468	   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
1469	              Internationalized Domain Names in Applications (IDNA)
1470	              2008", RFC 5895, September 2010.

1472	   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
1473	              Internationalization in the IETF", BCP 166, RFC 6365,
1474	              September 2011.

1476	   [RFC6885]  Blanchet, M. and A. Sullivan, "Stringprep Revision and
1477	              Problem Statement for the Preparation and Comparison of
1478	              Internationalized Strings (PRECIS)", RFC 6885, March 2013.

1480	   [RFC6943]  Thaler, D., "Issues in Identifier Comparison for Security
1481	              Purposes", RFC 6943, May 2013.

1483	   [UAX9]     The Unicode Consortium, "Unicode Standard Annex #9:
1484	              Unicode Bidirectional Algorithm", September 2012,
1485	              <http://unicode.org/reports/tr9/>.

1487	   [UAX11]    The Unicode Consortium, "Unicode Standard Annex #11: East
1488	              Asian Width", September 2012,
1489	              <http://unicode.org/reports/tr11/>.

1491	   [UAX15]    The Unicode Consortium, "Unicode Standard Annex #15:
1492	              Unicode Normalization Forms", August 2012,
1493	              <http://unicode.org/reports/tr15/>.

1495	   [UTR36]    The Unicode Consortium, "Unicode Technical Report #36:
1496	              Unicode Security Considerations", July 2012,
1497	              <http://unicode.org/reports/tr36/>.

1499	   [UTR39]    The Unicode Consortium, "Unicode Technical Report #39:
1500	              Unicode Security Mechanisms", July 2012,
1501	              <http://unicode.org/reports/tr39/>.

1503	URIs

1505	   [1]  <http://unicode.org/Public/UNIDATA/PropertyAliases.txt>

1507	   [2]  <http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>

1509	Appendix A.  Codepoint Table

1511	   WARNING: The following table is provisional and is still being
1512	   verified!

1514	   If one applies the property calculation rules from Section 7 to the
1515	   code points 0x0000 to 0x10FFFF in Unicode 6.2, the result is as shown
1516	   in the following table, in Unicode Character Database (UCD) format.
1517	   The columns of the table are as follows:

1519	   1.  The code point or code point range.
1520	   2.  The assignment for the code point or range, where the value is
1521	       one of PVALID, DISALLOWED, UNASSIGNED, CONTEXTO, CONTEXTJ, or
1522	       FREE_PVAL (which includes ID_DIS).
1523	   3.  The name or names for the code point or range.

1525	   This table is non-normative, is included only for illustrative
1526	   purposes, and applies only to Unicode 6.2, not to past or future
1527	   versions of Unicode.  Please note that the strings displayed in the
1528	   third column are not necessarily the formal name of the code point
1529	   (as defined in [UNICODE]) because the fixed width of the RFC format
1530	   necessitated truncation of many names.
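
   The three-column layout described above is simple to read
   mechanically.  The following Python sketch parses rows of that shape
   and looks up the assignment for a code point.  The names ROW_RE,
   parse_row, and lookup are hypothetical, the sketch expects a row
   without any surrounding line-number or page decoration, and, like
   the table itself, it is illustrative rather than normative.

```python
import re

# <first>[..<last>] ; <assignment> # <name or names>
ROW_RE = re.compile(
    r"^\s*([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)\s*#\s*(.*)$"
)


def parse_row(line):
    """Return (first, last, assignment, names), or None if malformed."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3), m.group(4).strip()


def lookup(rows, code_point):
    """Return the assignment covering code_point, or None if not listed."""
    for first, last, assignment, _names in rows:
        if first <= code_point <= last:
            return assignment
    return None


sample = [
    "0000..001F  ; DISALLOWED  # <control>",
    "0020        ; FREE_PVAL   # SPACE",
    "0021..007E  ; PVALID      # EXCLAM MARK .. TILDE",
]
rows = [parse_row(line) for line in sample]
```

   With the three sample rows, lookup(rows, 0x41) gives "PVALID" and
   lookup(rows, 0x20) gives "FREE_PVAL".  Returning None for a row that
   does not match the pattern makes transcription errors in a range
   easy to detect.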

1532	   0000..001F  ; DISALLOWED  # <control>
1533	   0020        ; FREE_PVAL   # SPACE
1534	   0021..007E  ; PVALID      # EXCLAM MARK .. TILDE
1535	   007F..009F  ; DISALLOWED  # <control>
1536	   00A0..00AC  ; FREE_PVAL   # NO-BREAK SPACE .. NOT SIGN
1537	   00AD        ; DISALLOWED  # SOFT HYPH
1538	   00AE..00B6  ; FREE_PVAL   # REGISTERED SIGN .. PILCROW SIGN
1539	   00B7        ; CONTEXTO    # MIDDLE DOT
1540	   00B8..00BF  ; FREE_PVAL   # CEDILLA..INV QUEST IND
1541	   00C0..00D6  ; PVALID      # LAT CAP LET A W GRAV..LAT CAP O
1542	   00D7        ; FREE_PVAL   # MULTIPLICATION SIGN
1543	   00D8..00F6  ; PVALID      # LAT CAP LET O W STROKE..LAT SM
1544	   00F7        ; FREE_PVAL   # DIVISION SIGN
1545	   00F8..0131  ; PVALID      # LAT SM LET O W STROKE..LAT SM LET
1546	   0132..0133  ; FREE_PVAL   # LAT CAP LIG IJ..LAT SM LIG IJ
1547	   0134..013E  ; PVALID      # LAT CAP LET J W CIRCUM..LAT SM LET
1548	   013F..0140  ; FREE_PVAL   # LAT CAP LET L W MID DOT..LAT SM LET
1549	   0141..0148  ; PVALID      # LAT CAP LET L W STROKE..LAT SM LET
1550	   0149        ; FREE_PVAL   # LAT SM LET N PRECEDED BY APOS
1551	   014A..017E  ; PVALID      # LAT CAP LET ENG..LAT SM LET Z W CA
1552	   017F        ; FREE_PVAL   # LAT SM LET LONG S
1553	   0180..01C3  ; PVALID      # LAT SM LET B W STROKE..LAT LET RETR
1554	   01C4..01CC  ; FREE_PVAL   # LAT CAP LET DZ W CARON..LAT SM
1555	   01CD..01F0  ; PVALID      # LAT CAP LET A W CARON..LAT SM LET J
1556	   01F1..01F3  ; FREE_PVAL   # LAT CAP LET DZ..LAT SM LET DZ
1557	   01F4..02AF  ; PVALID      # LAT CAP LET G W ACUTE..LAT SM
1558	   02B0..02B8  ; FREE_PVAL   # MOD LET SM H..MOD LET SM Y
1559	   02B9..02C1  ; PVALID      # MOD LET PRIME..MOD LET REV GLOT ST
1560	   02C2..02C5  ; FREE_PVAL   # MOD LET L ARROW..MOD LET D ARROW
1561	   02C6..02D1  ; PVALID      # MOD LET CIRCUM ACC..MOD LET HALF TR
1562	   02D2..02EB  ; FREE_PVAL   # MOD LET CENT R HALF RING..MOD LET Y
1563	   02EC        ; PVALID      # MOD LET VOICING
1564	   02ED        ; FREE_PVAL   # MOD LET UNASPIRATED
1565	   02EE        ; PVALID      # MOD LET DOUBLE APOS
1566	   02EF..02FF  ; FREE_PVAL   # MOD LET LOW D ARR..MOD LET LOW L AR
1567	   0300..034E  ; PVALID      # COMB GRAVE ACCENT..COMB UP ARROW BE
1568	   034F        ; DISALLOWED  # COMB GRAPHEME JOINER
1569	   0350..0374  ; PVALID      # COMB RIGHT ARROWHEAD..GREEK NUM SIG
1570	   0375        ; CONTEXTO    # GREEK LOW NUM SIGN
1571	   0376..0377  ; PVALID      # GR CAP LET PAMPHYLIAN DIGAMMA..GR S
1572	   0378..0379  ; UNASSIGNED  # <reserved>..<reserved>
1573	   037A        ; FREE_PVAL   # GR YPOGEGRAMMENI
1574	   037B..037D  ; PVALID      # GR SM REV LUN SIG..GR SM REV DOT LU
1575	   037E        ; FREE_PVAL   # GREEK QUEST MARK
1576	   037F..0383  ; UNASSIGNED  # <reserved>..<reserved>
1577	   0384..0385  ; FREE_PVAL   # GREEK TONOS..GREEK DIALYTIKA TONOS
1578	   0386        ; PVALID      # GR CAP LET ALPHA W TONOS
1579	   0387        ; FREE_PVAL   # GREEK ANO TELEIA
1580	   0388..038A  ; PVALID      # GR CAP LET EPSILON W TONOS..GR CAP
1581	   038B        ; UNASSIGNED  # <reserved>
1582	   038C        ; PVALID      # GREEK CAP LET OMICRON W TONOS
1583	   038D        ; UNASSIGNED  # <reserved>
1584	   038E..03A1  ; PVALID      # GR CAP LET UPSILON W TONOS..GR CAP
1585	   03A2        ; UNASSIGNED  # <reserved>
1586	   03A3..03CF  ; PVALID      # GREEK CAP LET SIGMA..GR CAP
1587	   03D0..03D2  ; FREE_PVAL   # GR BETA SYM..GR UPSILON W HOOK
1588	   03D3..03D4  ; PVALID      # GR UPSILON W ACUTE AND HOOK..GR UP
1589	   03D5..03D6  ; FREE_PVAL   # GR PHI SYM..GR PI SYM
1590	   03D7..03EF  ; PVALID      # GR KAI SYM..COPT SM LET DEI
1591	   03F0..03F2  ; FREE_PVAL   # GR KAPPA SYM..GR LUNATE SIGMA
1592	   03F3        ; PVALID      # GREEK LET YOT
1593	   03F4..03F6  ; FREE_PVAL   # GR CAP THETA..GR REV LUNATE EPSILON
1594	   03F7..03F8  ; PVALID      # GR CAP LET SHO..GR SM LET SHO
1595	   03F9        ; FREE_PVAL   # GREEK CAP LUNATE SIGMA SYM
1596	   03FA..0481  ; PVALID      # GR CAP LET SAN..CYR SM LET KOPPA
1597	   0482        ; FREE_PVAL   # CYR THOUSANDS SIGN
1598	   0483..0487  ; PVALID      # COMB CYR TITLO..COMB CYR POK
1599	   0488..0489  ; FREE_PVAL   # COMB CYR HUNDRED THOUSANDS SIGN..C
1600	   048A..0527  ; PVALID      # CYR CAP LET SH I W TAIL..CYR S
1601	   0528..0530  ; UNASSIGNED  # <reserved>..<reserved>
1602	   0531..0556  ; PVALID      # ARM CAP LET AYB..ARM CAP LET FEH
1603	   0557..0558  ; UNASSIGNED  # <reserved>..<reserved>
1604	   0559        ; PVALID      # ARM MOD LET LEFT HALF RING
1605	   055A..055F  ; FREE_PVAL   # ARM APOS..ARM ABBREV
1606	   0560        ; UNASSIGNED  # <reserved>
1607	   0561..0586  ; PVALID      # ARM SM LET AYB..ARMENIAN SM LE
1608	   0587        ; FREE_PVAL   # ARM SM LIG ECH YIWN
1609	   0588        ; UNASSIGNED  # <reserved>
1610	   0589..058A  ; FREE_PVAL   # ARMENIAN FULL STOP..ARMENIAN HYPH
1611	   058B..058E  ; UNASSIGNED  # <reserved>..<reserved>
1612	   058F        ; FREE_PVAL   # ARMENIAN DRAM SIGN
1613	   0590        ; UNASSIGNED  # <reserved>
1614	   0591..05BD  ; PVALID      # HEBR ACC ETNAHTA..HEBR PNT ME
1615	   05BE        ; FREE_PVAL   # HEBR PUNCT MAQAF
1616	   05BF        ; PVALID      # HEBR PNT RAFE
1617	   05C0        ; FREE_PVAL   # HEBR PUNCT PASEQ
1618	   05C1..05C2  ; PVALID      # HEBR PNT SHIN DOT..HEBR PNT SIN DOT
1619	   05C3        ; FREE_PVAL   # HEBR PUNCT SOF PASUQ
1620	   05C4..05C5  ; PVALID      # HEBR MARK UP DOT..HEBR MARK LOW DOT
1621	   05C6        ; FREE_PVAL   # HEBR PUNCT NUN HAFUKHA
1622	   05C7        ; PVALID      # HEBR PNT QAMATS QATAN
1623	   05C8..05CF  ; UNASSIGNED  # <reserved>..<reserved>
1624	   05D0..05EA  ; PVALID      # HEBR LET ALEF..HEBR LET TAV
1625	   05EB..05EF  ; UNASSIGNED  # <reserved>..<reserved>
1626	   05F0..05F2  ; PVALID      # HEBR LIG YIDDISH DOUBLE VAV..HEBR L
1627	   05F3..05F4  ; CONTEXTO    # HEBR PUNCT GERESH..HEBR PUNCTUATIO
1628	   05F5..05FF  ; UNASSIGNED  # <reserved>..<reserved>
1629	   0600..0604  ; DISALLOWED  # ARAB NUM SIGN..ARAB SIGN SAM
1630	   0605        ; UNASSIGNED  # <reserved>
1631	   0606..060F  ; FREE_PVAL   # AR-IND CUBE ROOT..ARAB SIGN MISRA
1632	   0610..061A  ; PVALID      # ARAB SIGN SALLALLAHOU ALAYHE ..AR
1633	   061B        ; FREE_PVAL   # ARAB SEMICOLON
1634	   061C..061D  ; UNASSIGNED  # <reserved>..<reserved>
1635	   061E..061F  ; FREE_PVAL   # ARAB TRIPLE DOT PUNCT MARK..ARAB Q
1636	   0620..063F  ; PVALID      # ARAB LET KASH..ARAB LET FARSI YEH
1637	   0640        ; DISALLOWED  # ARAB TATWEEL
1638	   0641..065F  ; PVALID      # ARAB LET FEH..ARAB WAVY HAMZA BEL
1639	   0660..0669  ; CONTEXTO    # AR-IND DIG ZERO..AR-IND DIG
1640	   066A..066D  ; FREE_PVAL   # ARAB PCT SIGN..ARAB FIVE PNTED STA
1641	   066E..0674  ; PVALID      # ARAB LET DOTLESS BEH..ARAB LET HIG
1642	   0675..0678  ; FREE_PVAL   # ARAB LET HIGH HAMZA ALEF..ARAB LET
1643	   0679..06D3  ; PVALID      # ARAB LET TTEH..ARAB LET YEH BARREE
1644	   06D4        ; FREE_PVAL   # ARAB FULL STOP
1645	   06D5..06DC  ; PVALID      # ARAB LET AE..ARAB SM HIGH SEEN
1646	   06DD        ; DISALLOWED  # ARAB END OF AYAH
1647	   06DE        ; FREE_PVAL   # ARAB START OF RUB EL HIZB
1648	   06DF..06E8  ; PVALID      # ARAB SM HIGH ROUNDED ZERO..ARAB SM
1649	   06E9        ; FREE_PVAL   # ARAB PLACE OF SAJDAH
1650	   06EA..06EF  ; PVALID      # ARAB EMPTY CENTRE LOW STOP..ARAB LET
1651	   06F0..06F9  ; CONTEXTO    # EXT AR-IND DIG ZERO..EXT A
1652	   06FA..06FF  ; PVALID      # ARAB LET SHEEN W DOT BEL..ARAB
1653	   0700..070D  ; FREE_PVAL   # SYR END OF PARA..SYR HARKLEAN AST
1654	   070E        ; UNASSIGNED  # <reserved>
1655	   070F        ; DISALLOWED  # SYR ABBR MARK
1656	   0710..074A  ; PVALID      # SYR LET ALAPH..SYR BARREKH
1657	   074B..074C  ; UNASSIGNED  # <reserved>..<reserved>
1658	   074D..07B1  ; PVALID      # SYR LET SOGDIAN ZHAIN..THAANA LET N
1659	   07B2..07BF  ; UNASSIGNED  # <reserved>..<reserved>
1660	   07C0..07F5  ; PVALID      # NKO DIG ZERO..NKO LOW TONE APOS
1661	   07F6..07F9  ; FREE_PVAL   # NKO SYM OO DENNEN..NKO EXCLAMATI
1662	   07FA        ; DISALLOWED  # NKO LAJANYALAN
1663	   07FB..07FF  ; UNASSIGNED  # <reserved>..<reserved>
1664	   0800..082D  ; PVALID      # SAMAR LET ALAF..SAMAR MARK NEQUDA
1665	   082E..082F  ; UNASSIGNED  # <reserved>..<reserved>
1666	   0830..083E  ; FREE_PVAL   # SAMAR PUNCT NEQUDAA..SAMAR PUN
1667	   083F        ; UNASSIGNED  # <reserved>
1668	   0840..085B  ; PVALID      # MANDAIC LET HALQA..MANDAIC GEM
1669	   085C..085D  ; UNASSIGNED  # <reserved>..<reserved>
1670	   085E        ; FREE_PVAL   # MANDAIC PUNCTUATION
1671	   085F..089F  ; UNASSIGNED  # <reserved>..<reserved>
1672	   08A0        ; PVALID      # ARAB LET BEH W SM V BEL
1673	   08A1        ; UNASSIGNED  # <reserved>
1674	   08A2..08AC  ; PVALID      # ARAB LET JEEM W 2 DOTS AB..ARAB
1675	   08AD..08E3  ; UNASSIGNED  # <reserved>..<reserved>
1676	   08E4..08FE  ; PVALID      # ARAB CURLY FATHA..ARAB DAMMA W
1677	   08FF        ; UNASSIGNED  # <reserved>
1678	   0900..0963  ; PVALID      # DEVAN SIGN INV CANDRABINDU..DEVAN V
1679	   0964..0965  ; FREE_PVAL   # DEVAN DANDA..DEVAN DOUBLE DANDA
1680	   0966..096F  ; PVALID      # DEVAN DIG ZERO..DEVAN DIG NINE
1681	   0970        ; FREE_PVAL   # DEVAN ABBR SIGN
1682	   0971..0977  ; PVALID      # DEVAN SIGN HIGH SPACING DOT..DEVAN
1683	   0978        ; UNASSIGNED  # <reserved>
1684	   0979..097F  ; PVALID      # DEVAN LET ZHA..DEVAN LET BBA
1685	   0980        ; UNASSIGNED  # <reserved>
1686	   0981..0983  ; PVALID      # BENG SIGN CANDRABINDU..BENG SIGN VIS
1687	   0984        ; UNASSIGNED  # <reserved>
1688	   0985..098C  ; PVALID      # BENG LET A..BENG LET VOC L
1689	   098D..098E  ; UNASSIGNED  # <reserved>..<reserved>
1690	   098F..0990  ; PVALID      # BENG LET E..BENG LET AI
1691	   0991..0992  ; UNASSIGNED  # <reserved>..<reserved>
1692	   0993..09A8  ; PVALID      # BENG LET O..BENG LET NA
1693	   09A9        ; UNASSIGNED  # <reserved>
1694	   09AA..09B0  ; PVALID      # BENG LET PA..BENG LET RA
1695	   09B1        ; UNASSIGNED  # <reserved>
1696	   09B2        ; PVALID      # BENG LET LA
1697	   09B3..09B5  ; UNASSIGNED  # <reserved>..<reserved>
1698	   09B6..09B9  ; PVALID      # BENG LET SHA..BENG LET HA
1699	   09BA..09BB  ; UNASSIGNED  # <reserved>..<reserved>
1700	   09BC..09C4  ; PVALID      # BENG SIGN NUKTA..BENG VOW SIGN VOCAL
1701	   09C5..09C6  ; UNASSIGNED  # <reserved>..<reserved>
1702	   09C7..09C8  ; PVALID      # BENG VOW SIGN E..BENG VOW SIGN AI
1703	   09C9..09CA  ; UNASSIGNED  # <reserved>..<reserved>
1704	   09CB..09CE  ; PVALID      # BENG VOW SIGN O..BENG LET KHANDA
1705	   09CF..09D6  ; UNASSIGNED  # <reserved>..<reserved>
1706	   09D7        ; PVALID      # BENG AU LEN MARK
1707	   09D8..09DB  ; UNASSIGNED  # <reserved>..<reserved>
1708	   09DC..09DD  ; PVALID      # BENG LET RRA..BENG LET RHA
1709	   09DE        ; UNASSIGNED  # <reserved>
1710	   09DF..09E3  ; PVALID      # BENG LET YYA..BENG VOW SIG
1711	   09E4..09E5  ; UNASSIGNED  # <reserved>..<reserved>
1712	   09E6..09F1  ; PVALID      # BENG DIG ZERO..BENG LET RA W L
1713	   09F2..09FB  ; FREE_PVAL   # BENG RUPEE MARK..BENG GANDA MARK
1714	   09FC..0A00  ; UNASSIGNED  # <reserved>..<reserved>
1715	   0A01..0A03  ; PVALID      # GURMUKHI SIGN ADAK BINDI..GURMUKHI
1716	   0A04        ; UNASSIGNED  # <reserved>
1717	   0A05..0A0A  ; PVALID      # GURMUKHI LET A..GURMUKHI LET UU
1718	   0A0B..0A0E  ; UNASSIGNED  # <reserved>..<reserved>
1719	   0A0F..0A10  ; PVALID      # GURMUKHI LET EE..GURMUKHI LET AI
1720	   0A11..0A12  ; UNASSIGNED  # <reserved>..<reserved>
1721	   0A13..0A28  ; PVALID      # GURMUKHI LET OO..GURMUKHI LET NA
1722	   0A29        ; UNASSIGNED  # <reserved>
1723	   0A2A..0A30  ; PVALID      # GURMUKHI LET PA..GURMUKHI LET RA
1724	   0A31        ; UNASSIGNED  # <reserved>
1725	   0A32..0A33  ; PVALID      # GURMUKHI LET LA..GURMUKHI LET LLA
1726	   0A34        ; UNASSIGNED  # <reserved>
1727	   0A35..0A36  ; PVALID      # GURMUKHI LET VA..GURMUKHI LET SHA
1728	   0A37        ; UNASSIGNED  # <reserved>
1729	   0A38..0A39  ; PVALID      # GURMUKHI LET SA..GURMUKHI LET HA
1730	   0A3A..0A3B  ; UNASSIGNED  # <reserved>..<reserved>
1731	   0A3C        ; PVALID      # GURMUKHI SIGN NUKTA
1732	   0A3D        ; UNASSIGNED  # <reserved>
1733	   0A3E..0A42  ; PVALID      # GURMUKHI VOW SIGN AA..GURMUKHI V
1734	   0A43..0A46  ; UNASSIGNED  # <reserved>..<reserved>
1735	   0A47..0A48  ; PVALID      # GURMUKHI VOW SIGN EE..GURMUKHI V
1736	   0A49..0A4A  ; UNASSIGNED  # <reserved>..<reserved>
1737	   0A4B..0A4D  ; PVALID      # GURMUKHI VOW SIGN OO..GURMUKHI S
1738	   0A4E..0A50  ; UNASSIGNED  # <reserved>..<reserved>
1739	   0A51        ; PVALID      # GURMUKHI SIGN UDAAT
1740	   0A52..0A58  ; UNASSIGNED  # <reserved>..<reserved>
1741	   0A59..0A5C  ; PVALID      # GURMUKHI LET KHHA..GURMUKHI LET RRA
1742	   0A5D        ; UNASSIGNED  # <reserved>
1743	   0A5E        ; PVALID      # GURMUKHI LET FA
1744	   0A5F..0A65  ; UNASSIGNED  # <reserved>..<reserved>
1745	   0A66..0A75  ; PVALID      # GURMUKHI DIG ZERO..GURMUKHI SIGN YA
1746	   0A76..0A80  ; UNASSIGNED  # <reserved>..<reserved>
1747	   0A81..0A83  ; PVALID      # GUJARATI SIGN CANDRABINDU..GUJARATI
1748	   0A84        ; UNASSIGNED  # <reserved>
1749	   0A85..0A8D  ; PVALID      # GUJARATI LET A..GUJARATI VOW CAND
1750	   0A8E        ; UNASSIGNED  # <reserved>
1751	   0A8F..0A91  ; PVALID      # GUJARATI LET E..GUJARATI VOW CAND
1752	   0A92        ; UNASSIGNED  # <reserved>
1753	   0A93..0AA8  ; PVALID      # GUJARATI LET O..GUJARATI LET NA
1754	   0AA9        ; UNASSIGNED  # <reserved>
1755	   0AAA..0AB0  ; PVALID      # GUJARATI LET PA..GUJARATI LET RA
1756	   0AB1        ; UNASSIGNED  # <reserved>
1757	   0AB2..0AB3  ; PVALID      # GUJARATI LET LA..GUJARATI LET LLA
1758	   0AB4        ; UNASSIGNED  # <reserved>
1759	   0AB5..0AB9  ; PVALID      # GUJARATI LET VA..GUJARATI LET HA
1760	   0ABA..0ABB  ; UNASSIGNED  # <reserved>..<reserved>
1761	   0ABC..0AC5  ; PVALID      # GUJARATI SIGN NUKTA..GUJARATI VOW
1762	   0AC6        ; UNASSIGNED  # <reserved>
1763	   0AC7..0AC9  ; PVALID      # GUJARATI VOW SIGN E..GUJARATI VOW
1764	   0ACA        ; UNASSIGNED  # <reserved>
1765	   0ACB..0ACD  ; PVALID      # GUJARATI VOW SIGN O..GUJARATI SIG
1766	   0ACE..0ACF  ; UNASSIGNED  # <reserved>..<reserved>
1767	   0AD0        ; PVALID      # GUJARATI OM
1768	   0AD1..0ADF  ; UNASSIGNED  # <reserved>..<reserved>
1769	   0AE0..0AE3  ; PVALID      # GUJARATI LET VOC RR..GUJARATI V
1770	   0AE4..0AE5  ; UNASSIGNED  # <reserved>..<reserved>
1771	   0AE6..0AEF  ; PVALID      # GUJARATI DIG ZERO..GUJARATI DIG NINE
1772	   0AF0..0AF1  ; FREE_PVAL   # GUJARATI ABBR SIGN..GUJARATI RUPEE S
1773	   0AF2..0B00  ; UNASSIGNED  # <reserved>..<reserved>
1774	   0B01..0B03  ; PVALID      # ORIYA SIGN CANDRABINDU..ORIYA SIGN V
1775	   0B04        ; UNASSIGNED  # <reserved>
1776	   0B05..0B0C  ; PVALID      # ORIYA LET A..ORIYA LET VOC L
1777	   0B0D..0B0E  ; UNASSIGNED  # <reserved>..<reserved>
1778	   0B0F..0B10  ; PVALID      # ORIYA LET E..ORIYA LET AI
1779	   0B11..0B12  ; UNASSIGNED  # <reserved>..<reserved>
1780	   0B13..0B28  ; PVALID      # ORIYA LET O..ORIYA LET NA
1781	   0B29        ; UNASSIGNED  # <reserved>
1782	   0B2A..0B30  ; PVALID      # ORIYA LET PA..ORIYA LET RA
1783	   0B31        ; UNASSIGNED  # <reserved>
1784	   0B32..0B33  ; PVALID      # ORIYA LET LA..ORIYA LET LLA
1785	   0B34        ; UNASSIGNED  # <reserved>
1786	   0B35..0B39  ; PVALID      # ORIYA LET VA..ORIYA LET HA
1787	   0B3A..0B3B  ; UNASSIGNED  # <reserved>..<reserved>
1788	   0B3C..0B44  ; PVALID      # ORIYA SIGN NUKTA..ORIYA VOW SIGN
1789	   0B45..0B46  ; UNASSIGNED  # <reserved>..<reserved>
1790	   0B47..0B48  ; PVALID      # ORIYA VOW SIGN E..ORIYA VOW SIG
1791	   0B49..0B4A  ; UNASSIGNED  # <reserved>..<reserved>
1792	   0B4B..0B4D  ; PVALID      # ORIYA VOW SIGN O..ORIYA SIGN VIRA
1793	   0B4E..0B55  ; UNASSIGNED  # <reserved>..<reserved>
1794	   0B56..0B57  ; PVALID      # ORIYA AI LEN MARK..ORIYA AU LENG
1795	   0B58..0B5B  ; UNASSIGNED  # <reserved>..<reserved>
1796	   0B5C..0B5D  ; PVALID      # ORIYA LET RRA..ORIYA LET RHA
1797	   0B5E        ; UNASSIGNED  # <reserved>
1798	   0B5F..0B63  ; PVALID      # ORIYA LET YYA..ORIYA VOW SIGN VOCA
1799	   0B64..0B65  ; UNASSIGNED  # <reserved>..<reserved>
1800	   0B66..0B6F  ; PVALID      # ORIYA DIG ZERO..ORIYA DIG NINE
1801	   0B70        ; FREE_PVAL   # ORIYA ISSHAR
1802	   0B71        ; PVALID      # ORIYA LET WA
1803	   0B72..0B77  ; FREE_PVAL   # ORIYA FRACT ONE QUART..ORIYA FRACT
1804	   0B78..0B81  ; UNASSIGNED  # <reserved>..<reserved>
1805	   0B82..0B83  ; PVALID      # TAMIL SIGN ANUSVARA..TAMIL SIGN VIS
1806	   0B84        ; UNASSIGNED  # <reserved>
1807	   0B85..0B8A  ; PVALID      # TAMIL LET A..TAMIL LET UU
1808	   0B8B..0B8D  ; UNASSIGNED  # <reserved>..<reserved>
1809	   0B8E..0B90  ; PVALID      # TAMIL LET E..TAMIL LET AI
1810	   0B91        ; UNASSIGNED  # <reserved>
1811	   0B92..0B95  ; PVALID      # TAMIL LET O..TAMIL LET KA
1812	   0B96..0B98  ; UNASSIGNED  # <reserved>..<reserved>
1813	   0B99..0B9A  ; PVALID      # TAMIL LET NGA..TAMIL LET CA
1814	   0B9B        ; UNASSIGNED  # <reserved>
1815	   0B9C        ; PVALID      # TAMIL LET JA
1816	   0B9D        ; UNASSIGNED  # <reserved>
1817	   0B9E..0B9F  ; PVALID      # TAMIL LET NYA..TAMIL LET TTA
1818	   0BA0..0BA2  ; UNASSIGNED  # <reserved>..<reserved>
1819	   0BA3..0BA4  ; PVALID      # TAMIL LET NNA..TAMIL LET TA
1820	   0BA5..0BA7  ; UNASSIGNED  # <reserved>..<reserved>
1821	   0BA8..0BAA  ; PVALID      # TAMIL LET NA..TAMIL LET PA
1822	   0BAB..0BAD  ; UNASSIGNED  # <reserved>..<reserved>
1823	   0BAE..0BB9  ; PVALID      # TAMIL LET MA..TAMIL LET HA
1824	   0BBA..0BBD  ; UNASSIGNED  # <reserved>..<reserved>
1825	   0BBE..0BC2  ; PVALID      # TAMIL VOW SIGN AA..TAMIL VOW SI
1826	   0BC3..0BC5  ; UNASSIGNED  # <reserved>..<reserved>
1827	   0BC6..0BC8  ; PVALID      # TAMIL VOW SIGN E..TAMIL VOW SIG
1828	   0BC9        ; UNASSIGNED  # <reserved>
1829	   0BCA..0BCD  ; PVALID      # TAMIL VOW SIGN O..TAMIL SIGN VIRA
1830	   0BCE..0BCF  ; UNASSIGNED  # <reserved>..<reserved>
1831	   0BD0        ; PVALID      # TAMIL OM
1832	   0BD1..0BD6  ; UNASSIGNED  # <reserved>..<reserved>
1833	   0BD7        ; PVALID      # TAMIL AU LEN MARK
1834	   0BD8..0BE5  ; UNASSIGNED  # <reserved>..<reserved>
1835	   0BE6..0BEF  ; PVALID      # TAMIL DIG ZERO..TAMIL DIG NINE
1836	   0BF0..0BFA  ; FREE_PVAL   # TAMIL NUM TEN..TAMIL NUM SIGN
1837	   0BFB..0C00  ; UNASSIGNED  # <reserved>..<reserved>
1838	   0C01..0C03  ; PVALID      # TELUGU SIGN CANDRABINDU..TELUGU SIG
1839	   0C04        ; UNASSIGNED  # <reserved>
1840	   0C05..0C0C  ; PVALID      # TELUGU LET A..TELUGU LET VOC L
1841	   0C0D        ; UNASSIGNED  # <reserved>
1842	   0C0E..0C10  ; PVALID      # TELUGU LET E..TELUGU LET AI
1843	   0C11        ; UNASSIGNED  # <reserved>
1844	   0C12..0C28  ; PVALID      # TELUGU LET O..TELUGU LET NA
1845	   0C29        ; UNASSIGNED  # <reserved>
1846	   0C2A..0C33  ; PVALID      # TELUGU LET PA..TELUGU LET LLA
1847	   0C34        ; UNASSIGNED  # <reserved>
1848	   0C35..0C39  ; PVALID      # TELUGU LET VA..TELUGU LET HA
1849	   0C3A..0C3C  ; UNASSIGNED  # <reserved>..<reserved>
1850	   0C3D..0C44  ; PVALID      # TELUGU SIGN AVAGRAHA..TELUGU VOW SI
1851	   0C45        ; UNASSIGNED  # <reserved>
1852	   0C46..0C48  ; PVALID      # TELUGU VOW SIGN E..TELUGU VOW SIGN
1853	   0C49        ; UNASSIGNED  # <reserved>
1854	   0C4A..0C4D  ; PVALID      # TELUGU VOW SIGN O..TELUGU SIGN VIRA
1855	   0C4E..0C54  ; UNASSIGNED  # <reserved>..<reserved>
1856	   0C55..0C56  ; PVALID      # TELUGU LEN MARK..TELUGU AI LEN MARK
1857	   0C57        ; UNASSIGNED  # <reserved>
1858	   0C58..0C59  ; PVALID      # TELUGU LET TSA..TELUGU LET DZA
1859	   0C5A..0C5F  ; UNASSIGNED  # <reserved>..<reserved>
1860	   0C60..0C63  ; PVALID      # TELUGU LET VOC RR..TELUGU VOW S
1861	   0C64..0C65  ; UNASSIGNED  # <reserved>..<reserved>
1862	   0C66..0C6F  ; PVALID      # TELUGU DIG ZERO..TELUGU DIG NINE
1863	   0C70..0C77  ; UNASSIGNED  # <reserved>..<reserved>
1864	   0C78..0C7F  ; FREE_PVAL   # TELUGU FRACTION DIG ZERO..TELUGU S
1865	   0C80..0C81  ; UNASSIGNED  # <reserved>..<reserved>
1866	   0C82..0C83  ; PVALID      # KANNADA SIGN ANUSVARA..KANNADA SIGN
1867	   0C84        ; UNASSIGNED  # <reserved>
1868	   0C85..0C8C  ; PVALID      # KANNADA LET A..KANNADA LET VOC L
1869	   0C8D        ; UNASSIGNED  # <reserved>
1870	   0C8E..0C90  ; PVALID      # KANNADA LET E..KANNADA LET AI
1871	   0C91        ; UNASSIGNED  # <reserved>
1872	   0C92..0CA8  ; PVALID      # KANNADA LET O..KANNADA LET NA
1873	   0CA9        ; UNASSIGNED  # <reserved>
1874	   0CAA..0CB3  ; PVALID      # KANNADA LET PA..KANNADA LET LLA
1875	   0CB4        ; UNASSIGNED  # <reserved>
1876	   0CB5..0CB9  ; PVALID      # KANNADA LET VA..KANNADA LET HA
1877	   0CBA..0CBB  ; UNASSIGNED  # <reserved>..<reserved>
1878	   0CBC..0CC4  ; PVALID      # KANNADA SIGN NUKTA..KANNADA VOW SIG
1879	   0CC5        ; UNASSIGNED  # <reserved>
1880	   0CC6..0CC8  ; PVALID      # KANNADA VOW SIGN E..KANNADA VOW SIG
1881	   0CC9        ; UNASSIGNED  # <reserved>
1882	   0CCA..0CCD  ; PVALID      # KANNADA VOW SIGN O..KANNADA SIGN VI
1883	   0CCE..0CD4  ; UNASSIGNED  # <reserved>..<reserved>
1884	   0CD5..0CD6  ; PVALID      # KANNADA LEN MARK..KANNADA AI LEN MA
1885	   0CD7..0CDD  ; UNASSIGNED  # <reserved>..<reserved>
1886	   0CDE        ; PVALID      # KANNADA LET FA
1887	   0CDF        ; UNASSIGNED  # <reserved>
1888	   0CE0..0CE3  ; PVALID      # KANNADA LET VOC RR..KANNADA VOW SIG
1889	   0CE4..0CE5  ; UNASSIGNED  # <reserved>..<reserved>
1890	   0CE6..0CEF  ; PVALID      # KANNADA DIG ZERO..KANNADA DIG NINE
1891	   0CF0        ; UNASSIGNED  # <reserved>
1892	   0CF1..0CF2  ; DISALLOWED  # KANNADA SIGN JIHVAMULIYA..KANNADA S
1893	   0CF3..0D01  ; UNASSIGNED  # <reserved>..<reserved>
1894	   0D02..0D03  ; PVALID      # MALAY SIGN ANUSVARA..MALAY SIGN VIS
1895	   0D04        ; UNASSIGNED  # <reserved>
1896	   0D05..0D0C  ; PVALID      # MALAY LET A..MALAY LET VOC
1897	   0D0D        ; UNASSIGNED  # <reserved>
1898	   0D0E..0D10  ; PVALID      # MALAY LET E..MALAY LET AI
1899	   0D11        ; UNASSIGNED  # <reserved>
1900	   0D12..0D3A  ; PVALID      # MALAY LET O..MALAY LET TTTA
1901	   0D3B..0D3C  ; UNASSIGNED  # <reserved>..<reserved>
1902	   0D3D..0D44  ; PVALID      # MALAY SIGN AVAGRAHA..MALAY VOW SIG
1903	   0D45        ; UNASSIGNED  # <reserved>
1904	   0D46..0D48  ; PVALID      # MALAY VOW SIGN E..MALAY VOW SIGN
1905	   0D49        ; UNASSIGNED  # <reserved>
1906	   0D4A..0D4E  ; PVALID      # MALAY VOW SIGN O..MALAY LET DOT REP
1907	   0D4F..0D56  ; UNASSIGNED  # <reserved>..<reserved>
1908	   0D57        ; PVALID      # MALAY AU LEN MARK
1909	   0D58..0D5F  ; UNASSIGNED  # <reserved>..<reserved>
1910	   0D60..0D63  ; PVALID      # MALAY LET VOC RR..MALAY VOW
1911	   0D64..0D65  ; UNASSIGNED  # <reserved>..<reserved>
1912	   0D66..0D6F  ; PVALID      # MALAY DIG ZERO..MALAY DIG NINE
1913	   0D70..0D75  ; FREE_PVAL   # MALAY NUM TEN..MALAY FRACTION THR
1914	   0D76..0D78  ; UNASSIGNED  # <reserved>..<reserved>
1915	   0D79        ; FREE_PVAL   # MALAY DATE MARK
1916	   0D7A..0D7F  ; PVALID      # MALAY LET CHILLU NN..MALAY LET
1917	   0D80..0D81  ; UNASSIGNED  # <reserved>..<reserved>
1918	   0D82..0D83  ; PVALID      # SINH SIGN ANUSVARAYA..SINH SIGN VIS
1919	   0D84        ; UNASSIGNED  # <reserved>
1920	   0D85..0D96  ; PVALID      # SINH LET AYANNA..SINH LET AUYANN
1921	   0D97..0D99  ; UNASSIGNED  # <reserved>..<reserved>
1922	   0D9A..0DB1  ; PVALID      # SINH LET ALPAPRAANA KAYANNA..SINH L
1923	   0DB2        ; UNASSIGNED  # <reserved>
1924	   0DB3..0DBB  ; PVALID      # SINH LET SANYAKA DAYANNA..SINH LETT
1925	   0DBC        ; UNASSIGNED  # <reserved>
1926	   0DBD        ; PVALID      # SINH LET DANTAJA LAYANNA
1927	   0DBE..0DBF  ; UNASSIGNED  # <reserved>..<reserved>
1928	   0DC0..0DC6  ; PVALID      # SINH LET VAYANNA..SINH LET FAYAN
1929	   0DC7..0DC9  ; UNASSIGNED  # <reserved>..<reserved>
1930	   0DCA        ; PVALID      # SINH SIGN AL-LAKUNA
1931	   0DCB..0DCE  ; UNASSIGNED  # <reserved>..<reserved>
1932	   0DCF..0DD4  ; PVALID      # SINH VOW SIGN AELA-PILLA..SINH VOW
1933	   0DD5        ; UNASSIGNED  # <reserved>
1934	   0DD6        ; PVALID      # SINH VOW SIGN DIGA PAA-PILLA
1935	   0DD7        ; UNASSIGNED  # <reserved>
1936	   0DD8..0DDF  ; PVALID      # SINH VOW SIGN GAETTA-PILLA..SINH VO
1937	   0DE0..0DF1  ; UNASSIGNED  # <reserved>..<reserved>
1938	   0DF2..0DF3  ; PVALID      # SINH VOW SIGN DIGA GAETTA-PILLA..SI
1939	   0DF4        ; FREE_PVAL   # SINH PUNCT KUNDDALIYA
1940	   0DF5..0E00  ; UNASSIGNED  # <reserved>..<reserved>
1941	   0E01..0E32  ; PVALID      # THAI CHAR KO KAI..THAI CHAR SARA A
1942	   0E33        ; FREE_PVAL   # THAI CHAR SARA AM
1943	   0E34..0E3A  ; PVALID      # THAI CHAR SARA I..THAI CHAR PHINTH
1944	   0E3B..0E3E  ; UNASSIGNED  # <reserved>..<reserved>
1945	   0E3F        ; FREE_PVAL   # THAI CURRENCY SYM BAHT
1946	   0E40..0E4E  ; PVALID      # THAI CHAR SARA E..THAI CHAR YAMAKK
1947	   0E4F        ; FREE_PVAL   # THAI CHAR FONGMAN
1948	   0E50..0E59  ; PVALID      # THAI DIG ZERO..THAI DIG NINE
1949	   0E5A..0E5B  ; FREE_PVAL   # THAI CHAR ANGKHANKHU..THAI CHAR KH
1950	   0E5C..0E80  ; UNASSIGNED  # <reserved>..<reserved>
1951	   0E81..0E82  ; PVALID      # LAO LET KO..LAO LET KHO SUNG
1952	   0E83        ; UNASSIGNED  # <reserved>
1953	   0E84        ; PVALID      # LAO LET KHO TAM
1954	   0E85..0E86  ; UNASSIGNED  # <reserved>..<reserved>
1955	   0E87..0E88  ; PVALID      # LAO LET NGO..LAO LET CO
1956	   0E89        ; UNASSIGNED  # <reserved>
1957	   0E8A        ; PVALID      # LAO LET SO TAM
1958	   0E8B..0E8C  ; UNASSIGNED  # <reserved>..<reserved>
1959	   0E8D        ; PVALID      # LAO LET NYO
1960	   0E8E..0E93  ; UNASSIGNED  # <reserved>..<reserved>
1961	   0E94..0E97  ; PVALID      # LAO LET DO..LAO LET THO TAM
1962	   0E98        ; UNASSIGNED  # <reserved>
1963	   0E99..0E9F  ; PVALID      # LAO LET NO..LAO LET FO SUNG
1964	   0EA0        ; UNASSIGNED  # <reserved>
1965	   0EA1..0EA3  ; PVALID      # LAO LET MO..LAO LET LO LING
1966	   0EA4        ; UNASSIGNED  # <reserved>
1967	   0EA5        ; PVALID      # LAO LET LO LOOT
1968	   0EA6        ; UNASSIGNED  # <reserved>
1969	   0EA7        ; PVALID      # LAO LET WO
1970	   0EA8..0EA9  ; UNASSIGNED  # <reserved>..<reserved>
1971	   0EAA..0EAB  ; PVALID      # LAO LET SO SUNG..LAO LET HO SUNG
1972	   0EAC        ; UNASSIGNED  # <reserved>
1973	   0EAD..0EB2  ; PVALID      # LAO LET O..LAO VOW SIGN AA
1974	   0EB3        ; FREE_PVAL   # LAO VOW SIGN AM
1975	   0EB4..0EB9  ; PVALID      # LAO VOW SIGN I..LAO VOW SIGN UU
1976	   0EBA        ; UNASSIGNED  # <reserved>
1977	   0EBB..0EBD  ; PVALID      # LAO VOW SIGN MAI KON..LAO SEMIVOW SIG
1978	   0EBE..0EBF  ; UNASSIGNED  # <reserved>..<reserved>
1979	   0EC0..0EC4  ; PVALID      # LAO VOW SIGN E..LAO VOW SIGN AI
1980	   0EC5        ; UNASSIGNED  # <reserved>
1981	   0EC6        ; PVALID      # LAO KO LA
1982	   0EC7        ; UNASSIGNED  # <reserved>
1983	   0EC8..0ECD  ; PVALID      # LAO TONE MAI EK..LAO NIGGAHITA
1984	   0ECE..0ECF  ; UNASSIGNED  # <reserved>..<reserved>
1985	   0ED0..0ED9  ; PVALID      # LAO DIG ZERO..LAO DIG NINE
1986	   0EDA..0EDB  ; UNASSIGNED  # <reserved>..<reserved>
1987	   0EDC..0EDD  ; FREE_PVAL   # LAO HO NO..LAO HO MO
1988	   0EDE..0EDF  ; PVALID      # LAO LET KHMU GO..LAO LET KHMU NYO
1989	   0EE0..0EFF  ; UNASSIGNED  # <reserved>..<reserved>
1990	   0F00        ; PVALID      # TIB SYL OM
1991	   0F01..0F0A  ; FREE_PVAL   # TIB MARK GTER YIG MGO TRUNC A..TIB
1992	   0F0B        ; PVALID      # TIB MARK INTERSYLLABIC TSHEG
1993	   0F0C..0F17  ; FREE_PVAL   # TIB MARK DELIMITER TSHEG BSTAR..TIB
1994	   0F18..0F19  ; PVALID      # TIB ASTROLOGICAL SIGN -KHYUD PA..TIB
1995	   0F1A..0F1F  ; FREE_PVAL   # TIB SIGN RDEL DKAR GCIG..TIB SIGN RD
1996	   0F20..0F29  ; PVALID      # TIB DIG ZERO..TIB DIG NINE
1997	   0F2A..0F34  ; FREE_PVAL   # TIB DIG HALF ONE..TIB MARK BSDUS R
1998	   0F35        ; PVALID      # TIB MARK NGAS BZUNG NYI ZLA
1999	   0F36        ; FREE_PVAL   # TIB MARK CARET DZUD RTAGS BZHI MIG C
2000	   0F37        ; PVALID      # TIB MARK NGAS BZUNG SGOR RTAGS
2001	   0F38        ; FREE_PVAL   # TIB MARK CHE MGO
2002	   0F39        ; PVALID      # TIB MARK TSA PHRU
2003	   0F3A..0F3D  ; FREE_PVAL   # TIB MARK GUG RTAGS GYON..TIB MARK AN
2004	   0F3E..0F47  ; PVALID      # TIB SIGN YAR TSHES..TIB LET JA
2005	   0F48        ; UNASSIGNED  # <reserved>
2006	   0F49..0F6C  ; PVALID      # TIB LET NYA..TIB LET RRA
2007	   0F6D..0F70  ; UNASSIGNED  # <reserved>..<reserved>
2008	   0F71..0F76  ; PVALID      # TIB VOW SIGN AA..TIB VOW SIGN VO
2009	   0F77..0F79  ; FREE_PVAL   # TIB VOW SIGN UU..TIB VOW SIGN VO
2010	   0F7A..0F80  ; PVALID      # TIB VOW SIGN E..TIB VOW SIGN REV
2011	   0F81        ; FREE_PVAL   # TIB VOW SIGN REV II
2012	   0F82..0F84  ; PVALID      # TIB SIGN NYI ZLA NAA DA..TIB MARK H
2013	   0F85        ; FREE_PVAL   # TIB MARK PALUTA
2014	   0F86..0F8F  ; PVALID      # TIB SIGN LCI RTAGS..TIB SUBJOIN S
2015	   0F90..0F92  ; PVALID      # TIB SUBJOIN LET KA..TIB SUBJOIN
2016	   0F93        ; FREE_PVAL   # TIB SUBJOIN LET GHA
2017	   0F94..0F97  ; PVALID      # TIB SUBJOIN LET NGA..TIB SUBJOI
2018	   0F98        ; UNASSIGNED  # <reserved>
2019	   0F99..0FBC  ; PVALID      # TIB SUBJOIN LET NYA..TIB SUBJOI
2020	   0FBD        ; UNASSIGNED  # <reserved>
2021	   0FBE..0FC5  ; FREE_PVAL   # TIB KU RU KHA..TIB SYM RDO RJE
2022	   0FC6        ; PVALID      # TIB SYM PADMA GDAN
2023	   0FC7..0FCC  ; FREE_PVAL   # TIB SYM RDO RJE RGYA GRAM..TIB SY
2024	   0FCD        ; UNASSIGNED  # <reserved>
2025	   0FCE..0FDA  ; FREE_PVAL   # TIB SIGN RDEL NAG RDEL DKAR..TIB MA
2026	   0FDB..0FFF  ; UNASSIGNED  # <reserved>..<reserved>
2027	   1000..1049  ; PVALID      # MYAN LET KA..MYAN DIG NINE
2028	   104A..104F  ; FREE_PVAL   # MYAN SIGN LITTLE SECTION..MYAN SYM
2029	   1050..109D  ; PVALID      # MYAN LET SHA..MYAN VOW SIGN AITON
2030	   109E..109F  ; FREE_PVAL   # MYAN SYM SHAN ONE..MYAN SYM SHAN EX
2031	   10A0..10C5  ; PVALID      # GEORG CAP LET AN..GEORG CAP LET HOE
2032	   10C6        ; UNASSIGNED  # <reserved>
2033	   10C7        ; PVALID      # GEORG CAP LET YN
2034	   10C8..10CC  ; UNASSIGNED  # <reserved>..<reserved>
2035	   10CD        ; PVALID      # GEORG CAP LET AEN
2036	   10CE..10CF  ; UNASSIGNED  # <reserved>..<reserved>
2037	   10D0..10FA  ; PVALID      # GEORG LET AN..GEORG LET AIN
2038	   10FB..10FC  ; FREE_PVAL   # GEORG PARA SEP..MOD LET GEORG NAR
2039	   10FD..10FF  ; PVALID      # GEORG LET AEN..GEORG LET LABIAL
2040	   1100..11FF  ; DISALLOWED  # HANGUL CHO KIYEOK..HANGUL JONG SSA
2041	   1200..1248  ; PVALID      # ETHI SYL HA..ETHI SYL QWA
2042	   1249        ; UNASSIGNED  # <reserved>
2043	   124A..124D  ; PVALID      # ETHI SYL QWI..ETHI SYL QWE
2044	   124E..124F  ; UNASSIGNED  # <reserved>..<reserved>
2045	   1250..1256  ; PVALID      # ETHI SYL QHA..ETHI SYL QHO
2046	   1257        ; UNASSIGNED  # <reserved>
2047	   1258        ; PVALID      # ETHI SYL QHWA
2048	   1259        ; UNASSIGNED  # <reserved>
2049	   125A..125D  ; PVALID      # ETHI SYL QHWI..ETHI SYL QH
2050	   125E..125F  ; UNASSIGNED  # <reserved>..<reserved>
2051	   1260..1288  ; PVALID      # ETHI SYL BA..ETHI SYL XWA
2052	   1289        ; UNASSIGNED  # <reserved>
2053	   128A..128D  ; PVALID      # ETHI SYL XWI..ETHI SYL XWE
2054	   128E..128F  ; UNASSIGNED  # <reserved>..<reserved>
2055	   1290..12B0  ; PVALID      # ETHI SYL NA..ETHI SYL KWA
2056	   12B1        ; UNASSIGNED  # <reserved>
2057	   12B2..12B5  ; PVALID      # ETHI SYL KWI..ETHI SYL KWE
2058	   12B6..12B7  ; UNASSIGNED  # <reserved>..<reserved>
2059	   12B8..12BE  ; PVALID      # ETHI SYL KXA..ETHI SYL KXO
2060	   12BF        ; UNASSIGNED  # <reserved>
2061	   12C0        ; PVALID      # ETHI SYL KXWA
2062	   12C1        ; UNASSIGNED  # <reserved>
2063	   12C2..12C5  ; PVALID      # ETHI SYL KXWI..ETHI SYL KX
2064	   12C6..12C7  ; UNASSIGNED  # <reserved>..<reserved>
2065	   12C8..12D6  ; PVALID      # ETHI SYL WA..ETHI SYL PHAR
2066	   12D7        ; UNASSIGNED  # <reserved>
2067	   12D8..1310  ; PVALID      # ETHI SYL ZA..ETHI SYL GWA
2068	   1311        ; UNASSIGNED  # <reserved>
2069	   1312..1315  ; PVALID      # ETHI SYL GWI..ETHI SYL GWE
2070	   1316..1317  ; UNASSIGNED  # <reserved>..<reserved>
2071	   1318..135A  ; PVALID      # ETHI SYL GGA..ETHI SYL FYA
2072	   135B..135C  ; UNASSIGNED  # <reserved>..<reserved>
2073	   135D..135F  ; PVALID      # ETHI COMB GEM AND VOW..ETHI COMB GE
2074	   1360..137C  ; FREE_PVAL   # ETHI SECT MARK..ETHI NUM TEN THOUS
2075	   137D..137F  ; UNASSIGNED  # <reserved>..<reserved>
2076	   1380..138F  ; PVALID      # ETHI SYL SEBATBEIT MWA..ETHI SYL PW
2077	   1390..1399  ; FREE_PVAL   # ETHI TON MARK YIZET..ETHI TON MARK
2078	   139A..139F  ; UNASSIGNED  # <reserved>..<reserved>
2079	   13A0..13F4  ; PVALID      # CHEROKEE LET A..CHEROKEE LET YV
2080	   13F5..13FF  ; UNASSIGNED  # <reserved>..<reserved>
2081	   1400        ; FREE_PVAL   # CANAD SYL HYPHEN
2082	   1401..166C  ; PVALID      # CANAD SYL E..CANAD SYL CAR
2083	   166D..166E  ; FREE_PVAL   # CANAD SYL CHI SIGN..CANAD SYLLAB
2084	   166F..167F  ; PVALID      # CANAD SYL QAI..CANAD SYL B
2085	   1680        ; FREE_PVAL   # OGHAM SPACE MARK
2086	   1681..169A  ; PVALID      # OGHAM LET BEITH..OGHAM LET PEITH
2087	   169B..169C  ; FREE_PVAL   # OGHAM FEATHER MARK..OGHAM REV FEAT
2088	   169D..169F  ; UNASSIGNED  # <reserved>..<reserved>
2089	   16A0..16EA  ; PVALID      # RUNIC LET FEHU FEOH FE F..RUNIC LET
2090	   16EB..16F0  ; FREE_PVAL   # RUNIC SINGLE PUNCT..RUNIC BELGTHOR
2091	   16F1..16FF  ; UNASSIGNED  # <reserved>..<reserved>
2092	   1700..170C  ; PVALID      # TAGALOG LET A..TAGALOG LET YA
2093	   170D        ; UNASSIGNED  # <reserved>
2094	   170E..1714  ; PVALID      # TAGALOG LET LA..TAGALOG SIGN VIRAMA
2095	   1715..171F  ; UNASSIGNED  # <reserved>..<reserved>
2096	   1720..1734  ; PVALID      # HANUNOO LET A..HANUNOO SIGN PAMUDPO
2097	   1735..1736  ; FREE_PVAL   # PHILIP SINGLE PUNCT..PHILIP DOUBLE
2098	   1737..173F  ; UNASSIGNED  # <reserved>..<reserved>
2099	   1740..1753  ; PVALID      # BUHID LET A..BUHID VOW SIGN U
2100	   1754..175F  ; UNASSIGNED  # <reserved>..<reserved>
2101	   1760..176C  ; PVALID      # TAGBANWA LET A..TAGBANWA LET YA
2102	   176D        ; UNASSIGNED  # <reserved>
2103	   176E..1770  ; PVALID      # TAGBANWA LET LA..TAGBANWA LET SA
2104	   1771        ; UNASSIGNED  # <reserved>
2105	   1772..1773  ; PVALID      # TAGBANWA VOW SIGN I..TAGBANWA VOW S
2106	   1774..177F  ; UNASSIGNED  # <reserved>..<reserved>
2107	   1780..17D3  ; PVALID      # KHMER LET KA..KHMER SIGN BATHAMASAT
2108	   17D4..17D6  ; FREE_PVAL   # KHMER SIGN KHAN..KHMER SIGN CAMNUC
2109	   17D7        ; PVALID      # KHMER SIGN LEK TOO
2110	   17D8..17DB  ; FREE_PVAL   # KHMER SIGN BEYYAL..KHMER CURR SYM R
2111	   17DC..17DD  ; PVALID      # KHMER SIGN AVAKRAHASANYA..KHMER SIG
2112	   17DE..17DF  ; UNASSIGNED  # <reserved>..<reserved>
2113	   17E0..17E9  ; PVALID      # KHMER DIG ZERO..KHMER DIG NINE
2114	   17EA..17EF  ; UNASSIGNED  # <reserved>..<reserved>
2115	   17F0..17F9  ; FREE_PVAL   # KHMER SYM LEK ATTAK SON..KHMER SYM
2116	   17FA..17FF  ; UNASSIGNED  # <reserved>..<reserved>
2117	   1800..180A  ; FREE_PVAL   # MONG BIRGA..MONG NIRUGU
2118	   180B..180D  ; PVALID      # MONG FREE VAR SEL ONE..MONG FREE VA
2119	   180E        ; FREE_PVAL   # MONG VOW SEP
2120	   180F        ; UNASSIGNED  # <reserved>
2121	   1810..1819  ; PVALID      # MONG DIG ZERO..MONG DIG NINE
2122	   181A..181F  ; UNASSIGNED  # <reserved>..<reserved>
2123	   1820..1877  ; PVALID      # MONG LET A..MONG LET MANCHU
2124	   1878..187F  ; UNASSIGNED  # <reserved>..<reserved>
2125	   1880..18AA  ; PVALID      # MONG LET ALI GALI ANUSVARA ONE..MON
2126	   18AB..18AF  ; UNASSIGNED  # <reserved>..<reserved>
2127	   18B0..18F5  ; PVALID      # CANAD SYL OY..CANAD SYL CA
2128	   18F6..18FF  ; UNASSIGNED  # <reserved>..<reserved>
2129	   1900..191C  ; PVALID      # LIMBU VOW-CARRIER LET..LIMBU LET HA
2130	   191D..191F  ; UNASSIGNED  # <reserved>..<reserved>
2131	   1920..192B  ; PVALID      # LIMBU VOW SIGN A..LIMBU SUBJOIN LET
2132	   192C..192F  ; UNASSIGNED  # <reserved>..<reserved>
2133	   1930..193B  ; PVALID      # LIMBU SM LET KA..LIMBU SIGN SA-I
2134	   193C..193F  ; UNASSIGNED  # <reserved>..<reserved>
2135	   1940        ; FREE_PVAL   # LIMBU SIGN LOO
2136	   1941..1943  ; UNASSIGNED  # <reserved>..<reserved>
2137	   1944..1945  ; FREE_PVAL   # LIMBU EXCLAM MARK..LIMBU QUEST MARK
2138	   1946..196D  ; PVALID      # LIMBU DIG ZERO..TAI LE LET AI
2139	   196E..196F  ; UNASSIGNED  # <reserved>..<reserved>
2140	   1970..1974  ; PVALID      # TAI LE LET TONE-2..TAI LE LET TONE-
2141	   1975..197F  ; UNASSIGNED  # <reserved>..<reserved>
2142	   1980..19AB  ; PVALID      # NEW TAI LUE LET HIGH QA..NEW TAI LU
2143	   19AC..19AF  ; UNASSIGNED  # <reserved>..<reserved>
2144	   19B0..19C9  ; PVALID      # NEW TAI LUE VOW SIGN VOW SHORT..NEW
2145	   19CA..19CF  ; UNASSIGNED  # <reserved>..<reserved>
2146	   19D0..19D9  ; PVALID      # NEW TAI LUE DIG ZERO..NEW TAI DIG N
2147	   19DA        ; FREE_PVAL   # NEW TAI LUE THAM
2148	   19DB..19DD  ; UNASSIGNED  # <reserved>..<reserved>
2149	   19DE..19FF  ; FREE_PVAL   # NEW TAI LUE SIGN LAE..KHMER SYM DAP
2150	   1A00..1A1B  ; PVALID      # BUGIN LET KA..BUGIN VOW SIGN AE
2151	   1A1C..1A1D  ; UNASSIGNED  # <reserved>..<reserved>
2152	   1A1E..1A1F  ; FREE_PVAL   # BUGIN PALLAWA..BUGIN END OF SECTION
2153	   1A20..1A5E  ; PVALID      # TAI THAM LET HIGH KA..TAI THAM CONS
2154	   1A5F        ; UNASSIGNED  # <reserved>
2155	   1A60..1A7C  ; PVALID      # TAI THAM SIGN SAKOT..TAI THAM SIGN
2156	   1A7D..1A7E  ; UNASSIGNED  # <reserved>..<reserved>
2157	   1A7F..1A89  ; PVALID      # TAI THAM COMB CRYPT DOT..TAI THAM D
2158	   1A8A..1A8F  ; UNASSIGNED  # <reserved>..<reserved>
2159	   1A90..1A99  ; PVALID      # TAI THAM THAM DIG ZERO..TAI THAM TH
2160	   1A9A..1A9F  ; UNASSIGNED  # <reserved>..<reserved>
2161	   1AA0..1AA6  ; FREE_PVAL   # TAI THAM SIGN WIANG..TAI THAM SIGN
2162	   1AA7        ; PVALID      # TAI THAM SIGN MAI YAMOK
2163	   1AA8..1AAD  ; FREE_PVAL   # TAI THAM SIGN KAAN..TAI THAM SIGN C
2164	   1AAE..1AFF  ; UNASSIGNED  # <reserved>..<reserved>
2165	   1B00..1B4B  ; PVALID      # BAL SIGN ULU RICEM..BAL LET ASYURA
2166	   1B4C..1B4F  ; UNASSIGNED  # <reserved>..<reserved>
2167	   1B50..1B59  ; PVALID      # BAL DIG ZERO..BAL DIG NINE
2168	   1B5A..1B6A  ; FREE_PVAL   # BAL PANTI..BAL MUS SYM DANG
2169	   1B6B..1B73  ; PVALID      # BAL MUS SYM COMB TEGEH..BAL MUS
2170	   1B74..1B7C  ; FREE_PVAL   # BAL MUS SYM RIGHT-HAND OPEN DUG
2171	   1B7D..1B7F  ; UNASSIGNED  # <reserved>..<reserved>
2172	   1B80..1BF3  ; PVALID      # SUND SIGN PANYECEK..BATAK PANONGONAN
2173	   1BF4..1BFB  ; UNASSIGNED  # <reserved>..<reserved>
2174	   1BFC..1BFF  ; FREE_PVAL   # BATAK SYM BINDU NA METEK..BATAK SYM
2175	   1C00..1C37  ; PVALID      # LEPCHA LET KA..LEPCHA SIGN NUKTA
2176	   1C38..1C3A  ; UNASSIGNED  # <reserved>..<reserved>
2177	   1C3B..1C3F  ; FREE_PVAL   # LEPCHA PUNCT TA-ROL..LEPCHA PUNCT T
2178	   1C40..1C49  ; PVALID      # LEPCHA DIG ZERO..LEPCHA DIG NINE
2179	   1C4A..1C4C  ; UNASSIGNED  # <reserved>..<reserved>
2180	   1C4D..1C7D  ; PVALID      # LEPCHA LET TTA..OL CHIKI AHAD
2181	   1C7E..1C7F  ; FREE_PVAL   # OL CHIKI PUNCT MUCAAD..OL CHIKI PUN
2182	   1C80..1CBF  ; UNASSIGNED  # <reserved>..<reserved>
2183	   1CC0..1CC7  ; FREE_PVAL   # SUNDA PUNCT BINDU SURYA..SUNDA PUNC
2184	   1CC8..1CCF  ; UNASSIGNED  # <reserved>..<reserved>
2185	   1CD0..1CD2  ; PVALID      # VED TONE KARSHANA..VED TONE PRENKHA
2186	   1CD3        ; FREE_PVAL   # VED SIGN NIHSHVASA
2187	   1CD4..1CF6  ; PVALID      # VED SIGN YAJURVEDIC MID SVARITA..VE
2188	   1CF7..1CFF  ; UNASSIGNED  # <reserved>..<reserved>
2189	   1D00..1D2B  ; PVALID      # LAT LET SM CAP A..CYR LET SM
2190	   1D2C..1D2E  ; FREE_PVAL   # MOD LET CAP A..MOD LET C
2191	   1D2F        ; PVALID      # MOD LET CAP BARRED B
2192	   1D30..1D3A  ; FREE_PVAL   # MOD LET CAP D..MOD LET C
2193	   1D3B        ; PVALID      # MOD LET CAP REV N
2194	   1D3C..1D4D  ; FREE_PVAL   # MOD LET CAP O..MOD LET S
2195	   1D4E        ; PVALID      # MOD LET SM TURNED I
2196	   1D4F..1D6A  ; FREE_PVAL   # MOD LET SM K..GREEK SUB SMA
2197	   1D6B..1D77  ; PVALID      # LAT SM LET UE..LAT SM LET TU
2198	   1D78        ; FREE_PVAL   # MOD LET CYR EN
2199	   1D79..1D9A  ; PVALID      # LAT SM LET INSULAR G..LAT SM LE
2200	   1D9B..1DBF  ; FREE_PVAL   # MOD LET SM TURNED ALPHA..MOD
2201	   1DC0..1DE6  ; PVALID      # COMB DOTTED GRAVE ACCENT..COMB LAT
2202	   1DE7..1DFB  ; UNASSIGNED  # <reserved>..<reserved>
2203	   1DFC..1E99  ; PVALID      # COMB DOUBLE INV BREVE BEL..LAT SM LET Y
2204	   1E9A        ; FREE_PVAL   # LAT SM LET A WITH R HALF RING
2205	   1E9B..1F15  ; PVALID      # LAT SM LET LONG S W DOT ABOVE..GR
2206	   1F16..1F17  ; UNASSIGNED  # <reserved>..<reserved>
2207	   1F18..1F1D  ; PVALID      # GREEK CAP LET EPSILON W PSILI..GRE
2208	   1F1E..1F1F  ; UNASSIGNED  # <reserved>..<reserved>
2209	   1F20..1F45  ; PVALID      # GREEK SM LET ETA W PSILI..GREEK SMA
2210	   1F46..1F47  ; UNASSIGNED  # <reserved>..<reserved>
2211	   1F48..1F4D  ; PVALID      # GREEK CAP LET OMICRON W PSILI..GRE
2212	   1F4E..1F4F  ; UNASSIGNED  # <reserved>..<reserved>
2213	   1F50..1F57  ; PVALID      # GREEK SM LET UPSILON W PSILI..GREEK
2214	   1F58        ; UNASSIGNED  # <reserved>
2215	   1F59        ; PVALID      # GREEK CAP LET UPSILON W DASIA
2216	   1F5A        ; UNASSIGNED  # <reserved>
2217	   1F5B        ; PVALID      # GREEK CAP LET UPSILON W DASIA AND
2218	   1F5C        ; UNASSIGNED  # <reserved>
2219	   1F5D        ; PVALID      # GREEK CAP LET UPSILON W DASIA AND
2220	   1F5E        ; UNASSIGNED  # <reserved>
2221	   1F5F..1F7D  ; PVALID      # GREEK CAP LET UPSILON W DASIA A..GR
2222	   1F7E..1F7F  ; UNASSIGNED  # <reserved>..<reserved>
2223	   1F80..1F87  ; PVALID      # GREEK SM LET ALPHA W PSILI AND YPOG
2224	   1F88..1F8F  ; FREE_PVAL   # GREEK CAP LET ALPHA W PSILI AND..GR
2225	   1F90..1F97  ; PVALID      # GREEK SM LET ETA W PSILI AND YP..GR
2226	   1F98..1F9F  ; FREE_PVAL   # GREEK CAP LET ETA W PSILI AND P..GR
2227	   1FA0..1FA7  ; PVALID      # GREEK SM LET OMEGA W PSILI AND ..GR
2228	   1FA8..1FAF  ; FREE_PVAL   # GREEK CAP LET OMEGA W PSILI AN..GR
2229	   1FB0..1FB4  ; PVALID      # GREEK SM LET ALPHA W VRACHY..GREEK
2230	   1FB5        ; UNASSIGNED  # <reserved>
2231	   1FB6..1FBB  ; PVALID      # GREEK SM LET ALPHA W PERISPOMEN..GR
2232	   1FBC..1FBD  ; FREE_PVAL   # GREEK CAP LET ALPHA W PROSGEGRA..GR
2233	   1FBE        ; PVALID      # GREEK PROSGEGRAMMENI
2234	   1FBF..1FC1  ; FREE_PVAL   # GREEK PSILI..GREEK DIALYTIKA AND PE
2235	   1FC2..1FC4  ; PVALID      # GREEK SM LET ETA W VARIA AND YP..GR
2236	   1FC5        ; UNASSIGNED  # <reserved>
2237	   1FC6..1FCB  ; PVALID      # GREEK SM LET ETA W PERISPOMENI..GR
2238	   1FCC..1FCF  ; FREE_PVAL   # GREEK CAP LET ETA W PROSGEGRAM..GR
2239	   1FD0..1FD3  ; PVALID      # GREEK SM LET IOTA W VRACHY..GREEK S
2240	   1FD4..1FD5  ; UNASSIGNED  # <reserved>..<reserved>
2241	   1FD6..1FDB  ; PVALID      # GREEK SM LET IOTA W PERISPOMENI..GR
2242	   1FDC        ; UNASSIGNED  # <reserved>
2243	   1FDD..1FDF  ; FREE_PVAL   # GREEK DASIA AND VARIA..GREEK DASIA
2244	   1FE0..1FEC  ; PVALID      # GREEK SM LET UPSILON W VRACHY..GREE
2245	   1FED..1FEF  ; FREE_PVAL   # GREEK DIALYTIKA AND VARIA..GREEK VA
2246	   1FF0..1FF1  ; UNASSIGNED  # <reserved>..<reserved>
2247	   1FF2..1FF4  ; PVALID      # GREEK SM LET OMEGA W VARIA AND YPOG
2248	   1FF5        ; UNASSIGNED  # <reserved>
2249	   1FF6..1FFB  ; PVALID      # GREEK SM LET OMEGA W PERISPOMEN..GR
2250	   1FFC..1FFE  ; FREE_PVAL   # GREEK CAP LET OMEGA W PROSGEGRA..GR
2251	   1FFF        ; UNASSIGNED  # <reserved>
2252	   2000..200A  ; FREE_PVAL   # EN QUAD..HAIR SPACE
2253	   200B        ; DISALLOWED  # ZERO WIDTH SPACE
2254	   200C..200D  ; CONTEXTJ    # ZERO WIDTH NON-JOINER..ZERO WIDTH J
2255	   200E..200F  ; DISALLOWED  # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT M
2256	   2010..2027  ; FREE_PVAL   # HYPHEN..HYPHENATION POINT
2257	   2028..202E  ; DISALLOWED  # LINE SEP..RIGHT-TO-LEFT OVERRIDE
2258	   202F..205F  ; FREE_PVAL   # NARROW NO-BREAK SPACE..MED MATH SP
2259	   2060..2064  ; DISALLOWED  # WORD JOINER..INVISIBLE PLUS
2260	   2065..2069  ; UNASSIGNED  # <reserved>..<reserved>
2261	   206A..206F  ; DISALLOWED  # INHIBIT SYMM SWAP..NOM DIGIT SHAPES
2262	   2070..2071  ; FREE_PVAL   # SUPER ZERO..SUPER LAT SM LET I
2263	   2072..2073  ; UNASSIGNED  # <reserved>..<reserved>
2264	   2074..208E  ; FREE_PVAL   # SUPER FOUR..SUB RIGHT PARENTHESIS
2265	   208F        ; UNASSIGNED  # <reserved>
2266	   2090..209C  ; FREE_PVAL   # LAT SUB SM LET A..LAT SUB SM LET T
2267	   209D..209F  ; UNASSIGNED  # <reserved>..<reserved>
2268	   20A0..20B9  ; FREE_PVAL   # EURO-CURRENCY SIGN..INDIAN RUPEE SI
2269	   20BA..20CF  ; UNASSIGNED  # <reserved>..<reserved>
2270	   20D0..20DC  ; PVALID      # COMB LEFT HARPOON ABOVE..COMB FOUR
2271	   20DD..20E0  ; FREE_PVAL   # COMB ENC CIRC..COMB ENC CIRC BACKS
2272	   20E1        ; PVALID      # COMB L R ARROW ABOVE
2273	   20E2..20E4  ; FREE_PVAL   # COMB ENC SCREEN..COMB ENC UPWARD PO
2274	   20E5..20F0  ; PVALID      # COMB REV SOLIDUS OVERLAY..COMB ASTE
2275	   20F1..20FF  ; UNASSIGNED  # <reserved>..<reserved>
2276	   2100..2129  ; FREE_PVAL   # ACCOUNT OF..TURNED GREEK SM LET IOT
2277	   212A..212B  ; PVALID      # KELVIN SIGN..ANGSTROM SIGN
2278	   212C..2131  ; FREE_PVAL   # SCRIPT CAP C..SCRIPT CAP F
2279	   2132        ; PVALID      # TURNED CAP F
2280	   2133..214D  ; FREE_PVAL   # SCRIPT CAP M..AKTIESELSKAB
2281	   214E        ; PVALID      # TURNED SM F
2282	   214F..2182  ; DISALLOWED  # SYM FOR SAMAR SOURCE..ROM NUM TEN T
2283	   2183..2184  ; PVALID      # ROM NUM REV ONE HUNDRED..LAT SM LET
2284	   2185..2189  ; FREE_PVAL   # ROM NUM SIX LATE FORM..VULGAR FRACT
2285	   218A..218F  ; UNASSIGNED  # <reserved>..<reserved>
2286	   2190..23F3  ; FREE_PVAL   # LEFTWARDS ARROW..HOURGLASS WITH FLO
2287	   23F4..23FF  ; UNASSIGNED  # <reserved>..<reserved>
2288	   2400..2426  ; FREE_PVAL   # SYM FOR NULL..SYM FOR SUB FORM
2289	   2427..243F  ; UNASSIGNED  # <reserved>..<reserved>
2290	   2440..244A  ; FREE_PVAL   # OCR HOOK..OCR DOUBLE BACKSLASH
2291	   244B..245F  ; UNASSIGNED  # <reserved>..<reserved>
2292	   2460..26FF  ; FREE_PVAL   # CIRCLED DIG ONE..WHITE FLAG W HORIZ
2293	   2700        ; UNASSIGNED  # <reserved>
2294	   2701..2B4C  ; FREE_PVAL   # UP BLADE SCISSORS..RIGHTWARDS ARROW
2295	   2B4D..2B4F  ; UNASSIGNED  # <reserved>..<reserved>
2296	   2B50..2B59  ; FREE_PVAL   # WHITE MEDIUM STAR..HEAVY CIRCLED SA
2297	   2B5A..2BFF  ; UNASSIGNED  # <reserved>..<reserved>
2298	   2C00..2C2E  ; PVALID      # GLAG CAP LET AZU..GLAG CA
2299	   2C2F        ; UNASSIGNED  # <reserved>
2300	   2C30..2C5E  ; PVALID      # GLAG SM LET AZU..GLAG SMAL
2301	   2C5F        ; UNASSIGNED  # <reserved>
2302	   2C60..2C7B  ; PVALID      # LAT CAP LET L W DOUBLE BAR..LAT SM
2303	   2C7C..2C7D  ; FREE_PVAL   # LAT SUB SM LET J..MOD LET CAP V
2304	   2C7E..2CE4  ; PVALID      # LAT CAP LET S W SWASH TAIL..COPT SY
2305	   2CE5..2CEA  ; FREE_PVAL   # COPT SYM MI RO..COPT SYM SHIMA SIMA
2306	   2CEB..2CF3  ; PVALID      # COPT CAP LET CRYPTOGRAMMIC SHEI..CO
2307	   2CF4..2CF8  ; UNASSIGNED  # <reserved>..<reserved>
2308	   2CF9..2CFF  ; FREE_PVAL   # COPT OLD NUB FULL STOP..COPT MORPHO
2309	   2D00..2D25  ; PVALID      # GEORG SM LET AN..GEORG SM LET
2310	   2D26        ; UNASSIGNED  # <reserved>
2311	   2D27        ; PVALID      # GEORG SM LET YN
2312	   2D28..2D2C  ; UNASSIGNED  # <reserved>..<reserved>
2313	   2D2D        ; PVALID      # GEORG SM LET AEN
2314	   2D2E..2D2F  ; UNASSIGNED  # <reserved>..<reserved>
2315	   2D30..2D67  ; PVALID      # TIFINAGH LET YA..TIFINAGH LETTER YO
2316	   2D68..2D6E  ; UNASSIGNED  # <reserved>..<reserved>
2317	   2D6F..2D70  ; PVALID      # TIFINAGH MOD LET LABIALIZATION MARK
2318	   2D71..2D7E  ; UNASSIGNED  # <reserved>..<reserved>
2319	   2D7F..2D96  ; PVALID      # TIFINAGH CONS JOINER..ETHI SYL GGW
2320	   2D97..2D9F  ; UNASSIGNED  # <reserved>..<reserved>
2321	   2DA0..2DA6  ; PVALID      # ETHI SYL SSA..ETHI SYL SSO
2322	   2DA7        ; UNASSIGNED  # <reserved>
2323	   2DA8..2DAE  ; PVALID      # ETHI SYL CCA..ETHI SYL CCO
2324	   2DAF        ; UNASSIGNED  # <reserved>
2325	   2DB0..2DB6  ; PVALID      # ETHI SYL ZZA..ETHI SYL ZZO
2326	   2DB7        ; UNASSIGNED  # <reserved>
2327	   2DB8..2DBE  ; PVALID      # ETHI SYL CCHA..ETHI SYL CC
2328	   2DBF        ; UNASSIGNED  # <reserved>
2329	   2DC0..2DC6  ; PVALID      # ETHI SYL QYA..ETHI SYL QYO
2330	   2DC7        ; UNASSIGNED  # <reserved>
2331	   2DC8..2DCE  ; PVALID      # ETHI SYL KYA..ETHI SYL KYO
2332	   2DCF        ; UNASSIGNED  # <reserved>
2333	   2DD0..2DD6  ; PVALID      # ETHI SYL XYA..ETHI SYL XYO
2334	   2DD7        ; UNASSIGNED  # <reserved>
2335	   2DD8..2DDE  ; PVALID      # ETHI SYL GYA..ETHI SYL GYO
2336	   2DDF        ; UNASSIGNED  # <reserved>
2337	   2DE0..2DFF  ; PVALID      # COMB CYR LET BE..COMB CYRI
2338	   2E00..2E2E  ; FREE_PVAL   # RIGHT ANGLE SUB MARK..REV QUEST MAR
2339	   2E2F        ; PVALID      # VERT TILDE
2340	   2E30..2E3B  ; FREE_PVAL   # RING PNT..THREE-EM DASH
2341	   2E3C..2E7F  ; UNASSIGNED  # <reserved>..<reserved>
2342	   2E80..2E99  ; FREE_PVAL   # CJK RAD REPEAT..CJK RAD RAP
2343	   2E9A        ; UNASSIGNED  # <reserved>
2344	   2E9B..2EF3  ; FREE_PVAL   # CJK RAD CHOKE..CJK RAD C-SIMPLIFIED
2345	   2EF4..2EFF  ; UNASSIGNED  # <reserved>..<reserved>
2346	   2F00..2FD5  ; FREE_PVAL   # KANGXI RAD ONE..KANGXI RAD FLUTE
2347	   2FD6..2FEF  ; UNASSIGNED  # <reserved>..<reserved>
2348	   2FF0..2FFB  ; FREE_PVAL   # IDEO DESC CHAR LEFT TO RIGHT..IDEO
2349	   2FFC..2FFF  ; UNASSIGNED  # <reserved>..<reserved>
2350	   3000..3004  ; FREE_PVAL   # IDEO SPACE..JAPAN INDUST STAND
2351	   3005..3007  ; PVALID      # IDEO ITER MARK..IDEO NUMB ZERO
2352	   3008..3029  ; FREE_PVAL   # LEFT ANGLE BRACKET..HANGZH NUM NINE
2353	   302A..302D  ; PVALID      # IDEO LEVEL TONE MARK..IDEO ENT
2354	   302E..3030  ; FREE_PVAL   # HANGUL SING DOT TONE MARK..WAVY DAS
2355	   3031..3035  ; DISALLOWED  # VERT KANA REP MARK..VERT KANA REP M
2356	   3036..303A  ; FREE_PVAL   # CIRCLED POSTAL MARK..HANGZH NUM THI
2357	   303B        ; DISALLOWED  # VERT IDEO ITER MARK
2358	   303C        ; PVALID      # MASU MARK
2359	   303D..303F  ; DISALLOWED  # PART ALTER MARK..IDEO HALF FILL
2360	   3040        ; UNASSIGNED  # <reserved>
2361	   3041..3096  ; PVALID      # HIRAGANA LET SM A..HIRAGANA LET SMA
2362	   3097..3098  ; UNASSIGNED  # <reserved>..<reserved>
2363	   3099..309A  ; PVALID      # COMB KAT-HIR VOICED SOUND
2364	   309B..309C  ; FREE_PVAL   # KAT-HIR VOICED SOUND MARK..KAT-HIR
2365	   309D..309E  ; PVALID      # HIRAGANA ITER MARK..HIRAGANA VOICED
2366	   309F..30A0  ; FREE_PVAL   # HIRAGANA DIGRAPH YORI..KAT-HIR DOU
2367	   30A1..30FA  ; PVALID      # KATAKANA LET SM A..KATAKANA LET VO
2368	   30FB        ; CONTEXTO    # KATAKANA MIDDLE DOT
2369	   30FC..30FE  ; PVALID      # KAT-HIR PROLONGED SOUND MARK..KATA
2370	   30FF        ; FREE_PVAL   # KATAKANA DIGRAPH KOTO
2371	   3100..3104  ; UNASSIGNED  # <reserved>..<reserved>
2372	   3105..312D  ; PVALID      # BOPOMOFO LET B..BOPOMOFO LET IH
2373	   312E..3130  ; UNASSIGNED  # <reserved>..<reserved>
2374	   3131..3163  ; FREE_PVAL   # HANGUL LET KIYEOK..HANGUL LET I
2375	   3164        ; DISALLOWED  # HANGUL FILLER
2376	   3165..318E  ; FREE_PVAL   # HANGUL LET SSANGNIEUN..HANGUL LET
2377	   318F        ; UNASSIGNED  # <reserved>
2378	   3190..319F  ; FREE_PVAL   # IDEO ANNO LINK MARK..IDEO ANNO MAN
2379	   31A0..31BA  ; PVALID      # BOPOMOFO LET BU..BOPOMOFO LET ZY
2380	   31BB..31BF  ; UNASSIGNED  # <reserved>..<reserved>
2381	   31C0..31E3  ; FREE_PVAL   # CJK STROKE T..CJK STROKE Q
2382	   31E4..31EF  ; UNASSIGNED  # <reserved>..<reserved>
2383	   31F0..31FF  ; PVALID      # KATAKANA LET SM KU..KATAKANA LET SM
2384	   3200..321E  ; FREE_PVAL   # PAREN HANGUL KIYEOK..PAREN KOREAN C
2385	   321F        ; UNASSIGNED  # <reserved>
2386	   3220..32FE  ; FREE_PVAL   # PAREN IDEO ONE..CIRCLED KATAKANA WO
2387	   32FF        ; UNASSIGNED  # <reserved>
2388	   3300..33FF  ; FREE_PVAL   # SQUARE APAATO..SQUARE GAL
2389	   3400..4DB5  ; PVALID      # <CJK Ideograph Extension A>
2390	   4DB6..4DBF  ; UNASSIGNED  # <reserved>..<reserved>
2391	   4DC0..4DFF  ; FREE_PVAL   # HEX FOR THE CREATIVE HEAVEN..HEX FO
2392	   4E00..9FCC  ; PVALID      # <CJK Ideograph>
2393	   9FCD..9FFF  ; UNASSIGNED  # <reserved>..<reserved>
2394	   A000..A48C  ; PVALID      # YI SYL IT..YI SYL YYR
2395	   A48D..A48F  ; UNASSIGNED  # <reserved>..<reserved>
2396	   A490..A4C6  ; FREE_PVAL   # YI RAD QOT..YI RAD KE
2397	   A4C7..A4CF  ; UNASSIGNED  # <reserved>..<reserved>
2398	   A4D0..A4FD  ; PVALID      # LISU LET BA..LISU LET TONE MYA JEU
2399	   A4FE..A4FF  ; FREE_PVAL   # LISU PUNCT COMMA..LISU PUNCT FUL
2400	   A500..A60C  ; PVALID      # VAI SYL EE..VAI SYL LENENER
2401	   A60D..A60F  ; FREE_PVAL   # VAI COMMA..VAI QUEST MARK
2402	   A610..A62B  ; PVALID      # VAI SYL NDOLE FA..VAI SYL NDOLE DO
2403	   A62C..A63F  ; UNASSIGNED  # <reserved>..<reserved>
2404	   A640..A66F  ; PVALID      # CYR CAP LET ZEMLYA..COMB CYR VZMET
2405	   A670..A673  ; FREE_PVAL   # COMB CYR TEN MILLIONS SIGN..SLAVON
2406	   A674..A67D  ; PVALID      # COMB CYR KAVYKA..COMB CYR PAYEROK
2407	   A67E        ; FREE_PVAL   # CYR KAVYKA
2408	   A67F..A697  ; PVALID      # CYR PAYEROK..CYR SM LET SHWE
2409	   A698..A69E  ; UNASSIGNED  # <reserved>..<reserved>
2410	   A69F..A6E5  ; PVALID      # COMB CYR LET IOTIFIED E..BAMUM LET
2411	   A6E6..A6EF  ; FREE_PVAL   # BAMUM LET MO..BAMUM LET KOGHOM
2412	   A6F0..A6F1  ; PVALID      # BAMUM COMB MARK KOQNDON..BAMUM COMB
2413	   A6F2..A6F7  ; FREE_PVAL   # BAMUM NJAEMLI..BAMUM QUEST MARK
2414	   A6F8..A6FF  ; UNASSIGNED  # <reserved>..<reserved>
2415	   A700..A716  ; FREE_PVAL   # MOD LET CHIN TONE YIN PING..MOD
2416	   A717..A71F  ; PVALID      # MOD LET DOT VERT BAR..MOD L
2417	   A720..A721  ; FREE_PVAL   # MOD LET STRESS AND HIGH TONE..MOD
2418	   A722..A76F  ; PVALID      # LAT CAP LET EGYPT ALEF..LAT SM LET
2419	   A770        ; FREE_PVAL   # MODIFIER LETTER US
2420	   A771..A788  ; PVALID      # LATIN SMALL LETTER DUM..MOD LET LOW
2421	   A789..A78A  ; FREE_PVAL   # MOD LET COLON..MOD LET SH EQUALS SI
2422	   A78B..A78E  ; PVALID      # LAT SM LET SALTILLO..LAT SM LET L W
2423	   A78F        ; UNASSIGNED  # <reserved>
2424	   A790..A793  ; PVALID      # LAT CAP LET N W DESC..LAT SM LET C
2425	   A794..A79F  ; UNASSIGNED  # <reserved>..<reserved>
2426	   A7A0..A7AA  ; PVALID      # LAT CAP LET G W OBLIQUE STROKE..LAT
2427	   A7AB..A7F7  ; UNASSIGNED  # <reserved>..<reserved>
2428	   A7F8..A7F9  ; FREE_PVAL   # MOD LET CAP H W STROKE..MOD LET SM
2429	   A7FA..A827  ; PVALID      # LAT LET SM CAP TURNED M..SYLOTI NA
2430	   A828..A82B  ; FREE_PVAL   # SYLOTI NAGRI POET MARK-1..SYLOTI NA
2431	   A82C..A82F  ; UNASSIGNED  # <reserved>..<reserved>
2432	   A830..A839  ; FREE_PVAL   # N INDIC FRACT ONE QUART..N INDIC QU
2433	   A83A..A83F  ; UNASSIGNED  # <reserved>..<reserved>
2434	   A840..A873  ; PVALID      # PHAGS-PA LET KA..PHAGS-PA LET CANDR
2435	   A874..A877  ; FREE_PVAL   # PHAGS-PA SINGLE HEAD MARK..PHAGS-PA
2436	   A878..A87F  ; UNASSIGNED  # <reserved>..<reserved>
2437	   A880..A8C4  ; PVALID      # SAUR SIGN ANUSVARA..SAUR SIGN VIRAM
2438	   A8C5..A8CD  ; UNASSIGNED  # <reserved>..<reserved>
2439	   A8CE..A8CF  ; FREE_PVAL   # SAUR DANDA..SAUR DOUBLE DANDA
2440	   A8D0..A8D9  ; PVALID      # SAUR DIG ZERO..SAUR DIG NINE
2441	   A8DA..A8DF  ; UNASSIGNED  # <reserved>..<reserved>
2442	   A8E0..A8F7  ; PVALID      # COMB DEVAN DIG ZERO..DEVAN SIGN CAN
2443	   A8F8..A8FA  ; FREE_PVAL   # DEVAN SIGN PUSHPIKA..DEVAN CARET
2444	   A8FB        ; PVALID      # DEVAN HEADSTROKE
2445	   A8FC..A8FF  ; UNASSIGNED  # <reserved>..<reserved>
2446	   A900..A92D  ; PVALID      # KAYAH LI DIG ZERO..KAYAH LI TONE CA
2447	   A92E..A92F  ; FREE_PVAL   # KAYAH LI SIGN CWI..KAYAH LI SIGN SH
2448	   A930..A953  ; PVALID      # REJANG LET KA..REJANG VIRAMA
2449	   A954..A95E  ; UNASSIGNED  # <reserved>..<reserved>
2450	   A95F        ; FREE_PVAL   # REJANG SECTION MARK
2451	   A960..A97C  ; DISALLOWED  # HANGUL CHO TIKEUT-MIUEM..HANGUL CHO
2452	   A97D..A97F  ; UNASSIGNED  # <reserved>..<reserved>
2453	   A980..A9C0  ; PVALID      # JAV SIGN PANYANGGA..JAV PANGKON
2454	   A9C1..A9CD  ; FREE_PVAL   # JAV LEFT RERENGGAN..JAV TURNED PADA
2455	   A9CE        ; UNASSIGNED  # <reserved>
2456	   A9CF..A9D9  ; PVALID      # JAV PANGRANGKEP..JAV DIG NINE
2457	   A9DA..A9DD  ; UNASSIGNED  # <reserved>..<reserved>
2458	   A9DE..A9DF  ; FREE_PVAL   # JAV PADA TIRTA TUMETES..JAV PADA I
2459	   A9E0..A9FF  ; UNASSIGNED  # <reserved>..<reserved>
2460	   AA00..AA36  ; PVALID      # CHAM LET A..CHAM CONS SIGN WA
2461	   AA37..AA3F  ; UNASSIGNED  # <reserved>..<reserved>
2462	   AA40..AA4D  ; PVALID      # CHAM LET FIN K..CHAM CONS SIGN FIN
2463	   AA4E..AA4F  ; UNASSIGNED  # <reserved>..<reserved>
2464	   AA50..AA59  ; PVALID      # CHAM DIG ZERO..CHAM DIG NINE
2465	   AA5A..AA5B  ; UNASSIGNED  # <reserved>..<reserved>
2466	   AA5C..AA5F  ; FREE_PVAL   # CHAM PUNCT SPIRAL..CHAM PUNCT TR
2467	   AA60..AA76  ; PVALID      # MYAN LET KHAMTI GA..MYAN LOGOGRAM K
2468	   AA77..AA79  ; FREE_PVAL   # MYAN SYM AITON EXCLAM..MYAN SYM AIT
2469	   AA7A..AA7B  ; PVALID      # MYAN LET AITON RA..MYAN SIGN PAO KA
2470	   AA7C..AA7F  ; UNASSIGNED  # <reserved>..<reserved>
2471	   AA80..AAC2  ; PVALID      # TAI VIET LET LOW KO..TAI VIET TONE
2472	   AAC3..AADA  ; UNASSIGNED  # <reserved>..<reserved>
2473	   AADB..AADD  ; PVALID      # TAI VIET SYM KON..TAI VIET SYM SAM
2474	   AADE..AADF  ; FREE_PVAL   # TAI VIET SYM HO HOI..TAI VIET SYM K
2475	   AAE0..AAEF  ; PVALID      # MEETEI MAYEK LET E..MEETEI MAYEK VO
2476	   AAF0..AAF1  ; FREE_PVAL   # MEETEI MAYEK CHEIKHAN..MEETEI MAYEK
2477	   AAF2..AAF6  ; PVALID      # MEETEI MAYEK ANJI..MEETEI MAYEK VIR
2478	   AAF7..AB00  ; UNASSIGNED  # <reserved>..<reserved>
2479	   AB01..AB06  ; PVALID      # ETHI SYL TTHU..ETHI SYL TTHO
2480	   AB07..AB08  ; UNASSIGNED  # <reserved>..<reserved>
2481	   AB09..AB0E  ; PVALID      # ETHI SYL DDHAA..ETHI SYL DDHO
2482	   AB0F..AB10  ; UNASSIGNED  # <reserved>..<reserved>
2483	   AB11..AB16  ; PVALID      # ETHI SYL DZU..ETHI SYL DZO
2484	   AB17..AB1F  ; UNASSIGNED  # <reserved>..<reserved>
2485	   AB20..AB26  ; PVALID      # ETHI SYL CCHHA..ETHI SYL CCHHO
2486	   AB27        ; UNASSIGNED  # <reserved>
2487	   AB28..AB2E  ; PVALID      # ETHI SYL BBAA..ETHI SYL BBO
2488	   AB2F..ABBF  ; UNASSIGNED  # <reserved>..<reserved>
2489	   ABC0..ABEA  ; PVALID      # MEETEI MAYEK LET KOK..MEETEI MAYEK
2490	   ABEB        ; FREE_PVAL   # MEETEI MAYEK CHEIKHEI
2491	   ABEC..ABED  ; PVALID      # MEETEI MAYEK LUM IYEK..MEETEI MAYEK
2492	   ABEE..ABEF  ; UNASSIGNED  # <reserved>..<reserved>
2493	   ABF0..ABF9  ; PVALID      # MEETEI MAYEK DIG ZERO..MEETEI MAYEK
2494	   ABFA..ABFF  ; UNASSIGNED  # <reserved>..<reserved>
2495	   AC00..D7A3  ; PVALID      # <Hangul Syllable>
2496	   D7A4..D7AF  ; UNASSIGNED  # <reserved>..<reserved>
2497	   D7B0..D7C6  ; DISALLOWED  # HANGUL JUNG O-YEO..HANGUL JUNG ARAE
2498	   D7C7..D7CA  ; UNASSIGNED  # <reserved>..<reserved>
2499	   D7CB..D7FB  ; DISALLOWED  # HANGUL JONG NIEUN-RIEUL..HANGUL JON
2500	   D7FC..D7FF  ; UNASSIGNED  # <reserved>..<reserved>
2501	   D800..F8FF  ; DISALLOWED  # <Non Private Use High Surrogate>
2502	   F900..FA6D  ; PVALID      # CJK COMP IDEO-F900..CJK COMP IDEO
2503	   FA6E..FA6F  ; UNASSIGNED  # <reserved>..<reserved>
2504	   FA70..FAD9  ; FREE_PVAL   # CJK COMP IDEO-FA70..CJK COMP IDEO
2505	   FADA..FAFF  ; UNASSIGNED  # <reserved>..<reserved>
2506	   FB00..FB06  ; FREE_PVAL   # LAT SM LIG FF..LAT SM LIG ST
2507	   FB07..FB12  ; UNASSIGNED  # <reserved>..<reserved>
2508	   FB13..FB17  ; FREE_PVAL   # ARMENIAN SM LIG MEN NOW..ARMENIAN SM
2509	   FB18..FB1C  ; UNASSIGNED  # <reserved>..<reserved>
2510	   FB1D..FB1F  ; PVALID      # HEBR LET YOD W HIRIQ..HEBR LIG YID Y
2511	   FB20..FB29  ; FREE_PVAL   # HEBR LET ALT AYIN..HEBR LET ALT PLUS
2512	   FB2A..FB36  ; PVALID      # HEBR LET SHIN W SHIN DOT..HEBR LET Z
2513	   FB37        ; UNASSIGNED  # <reserved>
2514	   FB38..FB3C  ; FREE_PVAL   # HEBR LET TET W DAGESH..HEBR LET
2515	   FB3D        ; UNASSIGNED  # <reserved>
2516	   FB3E        ; FREE_PVAL   # HEBR LET MEM W DAGESH
2517	   FB3F        ; UNASSIGNED  # <reserved>
2518	   FB40..FB41  ; FREE_PVAL   # HEBR LET NUN W DAGESH..HEBR LET
2519	   FB42        ; UNASSIGNED  # <reserved>
2520	   FB43..FB44  ; FREE_PVAL   # HEBR LET FIN PE W DAGESH..HEBR L
2521	   FB45        ; UNASSIGNED  # <reserved>
2522	   FB46..FB4E  ; PVALID      # HEBR LET TSADI W DAGESH..HEBR LET P
2523	   FB4F..FBC1  ; FREE_PVAL   # HEBR LIG ALEF LAMED..ARAB SYM S
2524	   FBC2..FBD2  ; UNASSIGNED  # <reserved>..<reserved>
2525	   FBD3..FD3F  ; FREE_PVAL   # ARAB LET NG ISO FORM..ORNATE RIGHT
2526	   FD40..FD4F  ; UNASSIGNED  # <reserved>..<reserved>
2527	   FD50..FD8F  ; FREE_PVAL   # ARAB LIG TEH W JEEM W MEEM INIT
2528	   FD90..FD91  ; UNASSIGNED  # <reserved>..<reserved>
2529	   FD92..FDC7  ; FREE_PVAL   # ARAB LIG MEEM W JEEM W KHAH INI
2530	   FDC8..FDEF  ; UNASSIGNED  # <reserved>..<reserved>
2531	   FDF0..FDFD  ; FREE_PVAL   # ARAB LIG SALLA USED..ARAB LIG BISMI
2532	   FDFE..FDFF  ; UNASSIGNED  # <reserved>..<reserved>
2533	   FE00..FE0F  ; PVALID      # VAR SEL-1..VAR SEL-16
2534	   FE10..FE19  ; FREE_PVAL   # PRES FORM FOR VERT COMMA..PRES FORM
2535	   FE20..FE26  ; PVALID      # COMB LIG LEFT HALF..COMB CONJ MACRO
2536	   FE27..FE2F  ; UNASSIGNED  # <reserved>..<reserved>
2537	   FE30..FE52  ; FREE_PVAL   # PRES FORM FOR VERT TWO DOT LEAD..SM
2538	   FE53        ; UNASSIGNED  # <reserved>
2539	   FE54..FE66  ; FREE_PVAL   # SM SEMICOLON..SM EQUALS SIGN
2540	   FE67        ; UNASSIGNED  # <reserved>
2541	   FE68..FE6B  ; FREE_PVAL   # SM REV SOLIDUS..SM COMM AT
2542	   FE6C..FE6F  ; UNASSIGNED  # <reserved>..<reserved>
2543	   FE70..FE72  ; FREE_PVAL   # ARAB FATHATAN ISO FORM..ARAB DAMMAT
2544	   FE73        ; PVALID      # ARAB TAIL FRAGMENT
2545	   FE74        ; FREE_PVAL   # ARAB KASRATAN ISO FORM
2546	   FE75        ; UNASSIGNED  # <reserved>
2547	   FE76..FEFC  ; FREE_PVAL   # ARAB FATHA ISO FORM..ARAB LIG LAM W
2548	   FEFD..FEFE  ; UNASSIGNED  # <reserved>..<reserved>
2549	   FEFF        ; DISALLOWED  # ZERO WIDTH NO-BREAK SPACE
2550	   FF00        ; UNASSIGNED  # <reserved>
2551	   FF01..FF9F  ; FREE_PVAL   # FULLW EXCLAM MARK..HALFW KATA SE
2552	   FFA0        ; DISALLOWED  # HALFW HANGUL FILLER
2553	   FFA1..FFBE  ; FREE_PVAL   # HALFW HANGUL LET KIYEOK..HALFW H
2554	   FFBF..FFC1  ; UNASSIGNED  # <reserved>..<reserved>
2555	   FFC2..FFC7  ; FREE_PVAL   # HALFW HANGUL LET A..HALFW HANGUL
2556	   FFC8..FFC9  ; UNASSIGNED  # <reserved>..<reserved>
2557	   FFCA..FFCF  ; FREE_PVAL   # HALFW HANGUL LET YEO..HALFW HANGU
2558	   FFD0..FFD1  ; UNASSIGNED  # <reserved>..<reserved>
2559	   FFD2..FFD7  ; FREE_PVAL   # HALFW HANGUL LET YO..HALFW HANGUL
2560	   FFD8..FFD9  ; UNASSIGNED  # <reserved>..<reserved>
2561	   FFDA..FFDC  ; FREE_PVAL   # HALFW HANGUL LET EU..HALFW HANGUL
2562	   FFDD..FFDF  ; UNASSIGNED  # <reserved>..<reserved>
2563	   FFE0..FFE6  ; FREE_PVAL   # FULLW CENT SIGN..FULLW WON SIGN
2564	   FFE7        ; UNASSIGNED  # <reserved>
2565	   FFE8..FFEE  ; FREE_PVAL   # HALFW FORMS LIGHT VERT..HALFW WH
2566	   FFEF..FFF8  ; UNASSIGNED  # <reserved>..<reserved>
2567	   FFF9..FFFB  ; DISALLOWED  # INTERL ANNO ANCHOR..INTERL ANNO TER
2568	   FFFC..FFFD  ; FREE_PVAL   # OBJECT REPL CHAR..REPL CHAR
2569	   FFFE..FFFF  ; UNASSIGNED  # <reserved>..<reserved>
2570	   10000..1000B; PVALID      # LIN B SYL B008 A..LIN B SYL
2571	   1000C       ; UNASSIGNED  # <reserved>
2572	   1000D..10026; PVALID      # LIN B SYL B036 JO..LIN B SYL
2573	   10027       ; UNASSIGNED  # <reserved>
2574	   10028..1003A; PVALID      # LIN B SYL B060 RA..LIN B SYL
2575	   1003B       ; UNASSIGNED  # <reserved>
2576	   1003C..1003D; PVALID      # LIN B SYL B017 ZA..LIN B SYL
2577	   1003E       ; UNASSIGNED  # <reserved>
2578	   1003F..1004D; PVALID      # LIN B SYL B020 ZO..LIN B SYL
2579	   1004E..1004F; UNASSIGNED  # <reserved>..<reserved>
2580	   10050..1005D; PVALID      # LIN B SYM B018..LIN B SYM B089
2581	   1005E..1007F; UNASSIGNED  # <reserved>..<reserved>
2582	   10080..100FA; PVALID      # LIN B IDEO B100 MAN..LIN B IDEO
2583	   100FB..100FF; UNASSIGNED  # <reserved>..<reserved>
2584	   10100..10102; FREE_PVAL   # AEG WORD SEP LINE..AEG CHECK MAR
2585	   10103..10106; UNASSIGNED  # <reserved>..<reserved>
2586	   10107..10133; FREE_PVAL   # AEG NUM ONE..AEG NUM NINETY THOU
2587	   10134..10136; UNASSIGNED  # <reserved>..<reserved>
2588	   10137..1018A; FREE_PVAL   # AEG WEIGHT BASE UNIT..GREEK ZERO SI
2589	   1018B..1018F; UNASSIGNED  # <reserved>..<reserved>
2590	   10190..1019B; FREE_PVAL   # ROM SEXTANS SIGN..ROM CENTURIAL SIG
2591	   1019C..101CF; UNASSIGNED  # <reserved>..<reserved>
2592	   101D0..101FC; FREE_PVAL   # PHAISTOS DISC SIGN PED..PHAISTOS DI
2593	   101FD       ; PVALID      # PHAISTOS DISC SIGN COMB OBLIQUE STR
2594	   101FE..1027F; UNASSIGNED  # <reserved>..<reserved>
2595	   10280..1029C; PVALID      # LYCIAN LET A..LYCIAN LET X
2596	   1029D..1029F; UNASSIGNED  # <reserved>..<reserved>
2597	   102A0..102D0; PVALID      # CARIAN LET A..CARIAN LET UUU3
2598	   102D1..102FF; UNASSIGNED  # <reserved>..<reserved>
2599	   10300..1031E; PVALID      # OLD ITAL LET A..OLD ITAL LET UU
2600	   1031F       ; UNASSIGNED  # <reserved>
2601	   10320..10323; FREE_PVAL   # OLD ITAL NUM ONE..OLD ITAL NUM F
2602	   10324..1032F; UNASSIGNED  # <reserved>..<reserved>
2603	   10330..10340; PVALID      # GOTH LET AHSA..GOTH LET PAIRTHRA
2604	   10341       ; FREE_PVAL   # GOTH LET NINETY
2605	   10342..10349; PVALID      # GOTH LET RAIDA..GOTH LET OTHAL
2606	   1034A       ; FREE_PVAL   # GOTH LET NINE HUNDRED
2607	   1034B..1037F; UNASSIGNED  # <reserved>..<reserved>
2608	   10380..1039D; PVALID      # UGAR LET ALPA..UGAR LET SSU
2609	   1039E       ; UNASSIGNED  # <reserved>
2610	   1039F       ; FREE_PVAL   # UGAR WORD DIVIDER
2611	   103A0..103C3; PVALID      # OLD PERS SIGN A..OLD PERS SIGN HA
2612	   103C4..103C7; UNASSIGNED  # <reserved>..<reserved>
2613	   103C8..103CF; PVALID      # OLD PERS SIGN AURAMAZDAA..OLD PERS
2614	   103D0..103D5; FREE_PVAL   # OLD PERS WORD DIVIDER..OLD PERS NUM
2615	   103D6..103FF; UNASSIGNED  # <reserved>..<reserved>
2616	   10400..1049D; PVALID      # DESERET CAP LET LONG I..OSMANYA LET
2617	   1049E..1049F; UNASSIGNED  # <reserved>..<reserved>
2618	   104A0..104A9; PVALID      # OSMANYA DIG ZERO..OSMANYA DIG NINE
2619	   104AA..107FF; UNASSIGNED  # <reserved>..<reserved>
2620	   10800..10805; PVALID      # CYPRIOT SYL A..CYPRIOT SYL JA
2621	   10806..10807; UNASSIGNED  # <reserved>..<reserved>
2622	   10808       ; PVALID      # CYPRIOT SYL JO
2623	   10809       ; UNASSIGNED  # <reserved>
2624	   1080A..10835; PVALID      # CYPRIOT SYL KA..CYPRIOT SYL WO
2625	   10836       ; UNASSIGNED  # <reserved>
2626	   10837..10838; PVALID      # CYPRIOT SYL XA..CYPRIOT SYL XE
2627	   10839..1083B; UNASSIGNED  # <reserved>..<reserved>
2628	   1083C       ; PVALID      # CYPRIOT SYL ZA
2629	   1083D..1083E; UNASSIGNED  # <reserved>..<reserved>
2630	   1083F..10855; PVALID      # CYPRIOT SYL ZO..IMP ARAM LET TAW
2631	   10856       ; UNASSIGNED  # <reserved>
2632	   10857..1085F; FREE_PVAL   # IMP ARAM SECT SIGN..IMP ARAM
2633	   10860..108FF; UNASSIGNED  # <reserved>..<reserved>
2634	   10900..10915; PVALID      # PHOEN LET ALF..PHOEN LET TAU
2635	   10916..1091B; FREE_PVAL   # PHOEN NUM ONE..PHOEN NUM THR
2636	   1091C..1091E; UNASSIGNED  # <reserved>..<reserved>
2637	   1091F       ; FREE_PVAL   # PHOEN WORD SEP
2638	   10920..10939; PVALID      # LYDIAN LET A..LYDIAN LET C
2639	   1093A..1093E; UNASSIGNED  # <reserved>..<reserved>
2640	   1093F       ; FREE_PVAL   # LYDIAN TRIANGULAR MARK
2641	   10940..1097F; UNASSIGNED  # <reserved>..<reserved>
2642	   10980..109B7; PVALID      # MERO HIER LET A..MERO CURS LET
2643	   109B8..109BD; UNASSIGNED  # <reserved>..<reserved>
2644	   109BE..109BF; PVALID      # MERO CURS LOG RMT..MERO CURS L
2645	   109C0..109FF; UNASSIGNED  # <reserved>..<reserved>
2646	   10A00..10A03; PVALID      # KHARO LET A..KHARO VOW SIGN V
2647	   10A04       ; UNASSIGNED  # <reserved>
2648	   10A05..10A06; PVALID      # KHARO VOW SIGN E..KHARO VOW SI
2649	   10A07..10A0B; UNASSIGNED  # <reserved>..<reserved>
2650	   10A0C..10A13; PVALID      # KHARO VOW LEN MARK..KHARO LET
2651	   10A14       ; UNASSIGNED  # <reserved>
2652	   10A15..10A17; PVALID      # KHARO LET CA..KHARO LET JA
2653	   10A18       ; UNASSIGNED  # <reserved>
2654	   10A19..10A33; PVALID      # KHARO LET NYA..KHARO LET TTT
2655	   10A34..10A37; UNASSIGNED  # <reserved>..<reserved>
2656	   10A38..10A3A; PVALID      # KHARO SIGN BAR ABOVE..KHARO SIGN D
2657	   10A3B..10A3E; UNASSIGNED  # <reserved>..<reserved>
2658	   10A3F       ; PVALID      # KHARO VIRAMA
2659	   10A40..10A47; FREE_PVAL   # KHARO DIG ONE..KHARO NUM ONE
2660	   10A48..10A4F; UNASSIGNED  # <reserved>..<reserved>
2661	   10A50..10A58; FREE_PVAL   # KHARO PUNCT DOT..KHARO PUNCT
2662	   10A59..10A5F; UNASSIGNED  # <reserved>..<reserved>
2663	   10A60..10A7C; PVALID      # OLD S ARAB LET HE..OLD SOUTH ARAB
2664	   10A7D..10A7F; FREE_PVAL   # OLD S ARAB NUM ONE..OLD SOUTH ARAB
2665	   10A80..10AFF; UNASSIGNED  # <reserved>..<reserved>
2666	   10B00..10B35; PVALID      # AVESTAN LET A..AVESTAN LET HE
2667	   10B36..10B38; UNASSIGNED  # <reserved>..<reserved>
2668	   10B39..10B3F; FREE_PVAL   # AVESTAN ABBR MARK..LARGE ONE RING O
2669	   10B40..10B55; PVALID      # INSCRIPT PARTHIAN LET ALEPH..INSCRI
2670	   10B56..10B57; UNASSIGNED  # <reserved>..<reserved>
2671	   10B58..10B5F; FREE_PVAL   # INSCRIPT PARTHIAN NUM ONE..INSCRIPT
2672	   10B60..10B72; PVALID      # INSCRIPT PAHLAVI LET ALEPH..INSCRIP
2673	   10B73..10B77; UNASSIGNED  # <reserved>..<reserved>
2674	   10B78..10B7F; FREE_PVAL   # INSCRIPT PAHLAVI NUM ONE..INSCRIPT
2675	   10B80..10BFF; UNASSIGNED  # <reserved>..<reserved>
2676	   10C00..10C48; PVALID      # OLD TURK LET ORKHON A..OLD TURK LET
2677	   10C49..10E5F; UNASSIGNED  # <reserved>..<reserved>
2678	   10E60..10E7E; FREE_PVAL   # RUMI DIG ONE..RUMI FRACTION TWO THI
2679	   10E7F..10FFF; UNASSIGNED  # <reserved>..<reserved>
2680	   11000..11046; PVALID      # BRAHMI SIGN CANDRABINDU..BRAHMI VIR
2681	   11047..1104D; FREE_PVAL   # BRAHMI DANDA..BRAHMI PUNCT LOTUS
2682	   1104E..11051; UNASSIGNED  # <reserved>..<reserved>
2683	   11052..11065; FREE_PVAL   # BRAHMI NUM ONE..BRAHMI NUM ONE THOU
2684	   11066..1106F; PVALID      # BRAHMI DIG ZERO..BRAHMI DIG NINE
2685	   11070..1107F; UNASSIGNED  # <reserved>..<reserved>
2686	   11080..110BA; PVALID      # KAITHI SIGN CANDRABINDU..KAITHI SIG
2687	   110BB..110BC; FREE_PVAL   # KAITHI ABBR SIGN..KAITHI ENUM SIGN
2688	   110BD       ; DISALLOWED  # KAITHI NUM SIGN
2689	   110BE..110C1; FREE_PVAL   # KAITHI SECT MARK..KAITHI DOUBLE DAN
2690	   110C2..110CF; UNASSIGNED  # <reserved>..<reserved>
2691	   110D0..110E8; PVALID      # SORA SOMPENG LETTER SAH..SORA SOMPE
2692	   110E9..110EF; UNASSIGNED  # <reserved>..<reserved>
2693	   110F0..110F9; PVALID      # SORA SOMPENG DIG ZERO..SORA SOMPENG DI
2694	   110FA..110FF; UNASSIGNED  # <reserved>..<reserved>
2695	   11100..11134; PVALID      # CHAKMA SIGN CANDRABINDU..CHAKMA MAAYY
2696	   11135       ; UNASSIGNED  # <reserved>
2697	   11136..1113F; PVALID      # CHAKMA DIG ZERO..CHAKMA DIG NINE
2698	   11140..11143; FREE_PVAL   # CHAKMA SECT MARK..CHAKMA QUEST MARK
2699	   11144..1117F; UNASSIGNED  # <reserved>..<reserved>
2700	   11180..111C4; PVALID      # SHARADA SIGN CANDRABINDU..SHARADA OM
2701	   111C5..111C8; FREE_PVAL   # SHARADA DANDA..SHARADA SEPARATOR
2702	   111C9..111CF; UNASSIGNED  # <reserved>..<reserved>
2703	   111D0..111D9; PVALID      # SHARADA DIG ZERO..SHARADA DIG NINE
2704	   111DA..1167F; UNASSIGNED  # <reserved>..<reserved>
2705	   11680..116B7; PVALID      # TAKRI LET A..TAKRI SIGN NUKTA
2706	   116B8..116BF; UNASSIGNED  # <reserved>..<reserved>
2707	   116C0..116C9; PVALID      # TAKRI DIGIT ZERO..TAKRI DIG NINE
2708	   116CA..11FFF; UNASSIGNED  # <reserved>..<reserved>
2709	   12000..1236E; PVALID      # CUNEI SIGN A..CUNEI SIGN ZUM
2710	   1236F..123FF; UNASSIGNED  # <reserved>..<reserved>
2711	   12400..12462; FREE_PVAL   # CUNEI NUM SIGN TWO ASH..CUNEI NUM
2712	   12463..1246F; UNASSIGNED  # <reserved>..<reserved>
2713	   12470..12473; FREE_PVAL   # CUNEI PUNCT SIGN OLD ASSYRIAN WORD
2714	   12474..12FFF; UNASSIGNED  # <reserved>..<reserved>
2715	   13000..1342E; PVALID      # EGYPT HIERO A001..EGYPT HIERO AA032
2716	   1342F..167FF; UNASSIGNED  # <reserved>..<reserved>
2717	   16800..16A38; PVALID      # BAMUM LET PHASE-A NGKUE MFON..BAMUM LE
2718	   16A39..16EFF; UNASSIGNED  # <reserved>..<reserved>
2719	   16F00..16F44; PVALID      # MIAO LET PA..MIAO LET HHA
2720	   16F45..16F4F; UNASSIGNED  # <reserved>..<reserved>
2721	   16F50..16F7E; PVALID      # MIAO LET NAS..MIAO VOWEL SIGN NG
2722	   16F7F..16F8E; UNASSIGNED  # <reserved>..<reserved>
2723	   16F8F..16F9F; PVALID      # MIAO TONE RIGHT..MIAO LET REF TON
2724	   16FA0..1AFFF; UNASSIGNED  # <reserved>..<reserved>
2725	   1B000..1B001; PVALID      # KATA LET ARCH E..KATA LET ARCH YE
2726	   1B002..1CFFF; UNASSIGNED  # <reserved>..<reserved>
2727	   1D000..1D0F5; FREE_PVAL   # BYZ MUS SYM PSILI..BYZ MUS
2728	   1D0F6..1D0FF; UNASSIGNED  # <reserved>..<reserved>
2729	   1D100..1D126; FREE_PVAL   # MUS SYM SINGLE BARLINE..MUS SYMBOL
2730	   1D127..1D128; UNASSIGNED  # <reserved>..<reserved>
2731	   1D129..1D164; FREE_PVAL   # MUS SYM MULT MEASURE REST..MUS SYM ONE
2732	   1D165..1D169; PVALID      # MUS SYM COMB STEM..MUS SYM COMB TREMOL
2733	   1D16A..1D16C; FREE_PVAL   # MUS SYM FING TREM-1..MUS SYM FING TREM
2734	   1D16D..1D172; PVALID      # MUS SYM COMB AUG DOT..MUS SYM COMB FL
2735	   1D173..1D17A; DISALLOWED  # MUS SYM BEGIN BEAM..MUS SYM END PHRASE
2736	   1D17B..1D182; PVALID      # MUS SYM COMB ACCENT..MUS SYM COMB LOUR
2737	   1D183..1D184; FREE_PVAL   # MUS SYM ARP UP..MUS SYM ARP DOWN
2738	   1D185..1D18B; PVALID      # MUS SYM COMB DOIT..MUS SYM COMB TRIPLE
2739	   1D18C..1D1A9; FREE_PVAL   # MUS SYM RINFORZANDO..MUS SYM DEG SLASH
2740	   1D1AA..1D1AD; PVALID      # MUS SYM COMB DOWN BOW..MUS SYM COMB SN
2741	   1D1AE..1D1DD; FREE_PVAL   # MUS SYM PEDAL MARK..MUS SYM PES SUBPUN
2742	   1D1DE..1D1FF; UNASSIGNED  # <reserved>..<reserved>
2743	   1D200..1D241; FREE_PVAL   # GREEK VOCAL NOTATION SYM-1..GREEK INS
2744	   1D242..1D244; PVALID      # COMB GREEK MUS TRISEME..COMB GREEK MU
2745	   1D245       ; FREE_PVAL   # GREEK MUSICAL LEIMMA
2746	   1D246..1D2FF; UNASSIGNED  # <reserved>..<reserved>
2747	   1D300..1D356; DISALLOWED  # MONOG FOR EARTH..TETRAG FOR FOSTERING
2748	   1D357..1D35F; UNASSIGNED  # <reserved>..<reserved>
2749	   1D360..1D371; DISALLOWED  # COUNT ROD UNIT DIG ONE..COUNT ROD TE
2750	   1D372..1D3FF; UNASSIGNED  # <reserved>..<reserved>
2751	   1D400..1D454; FREE_PVAL   # MATH BOLD CAP A..MATH IT
2752	   1D455       ; UNASSIGNED  # <reserved>
2753	   1D456..1D49C; FREE_PVAL   # MATH ITAL SM I..MATH SC
2754	   1D49D       ; UNASSIGNED  # <reserved>
2755	   1D49E..1D49F; FREE_PVAL   # MATH SCRIPT CAP C..MATH
2756	   1D4A0..1D4A1; UNASSIGNED  # <reserved>..<reserved>
2757	   1D4A2       ; FREE_PVAL   # MATH SCRIPT CAP G
2758	   1D4A3..1D4A4; UNASSIGNED  # <reserved>..<reserved>
2759	   1D4A5..1D4A6; FREE_PVAL   # MATH SCRIPT CAP J..MATH
2760	   1D4A7..1D4A8; UNASSIGNED  # <reserved>..<reserved>
2761	   1D4A9..1D4AC; FREE_PVAL   # MATH SCRIPT CAP N..MATH
2762	   1D4AD       ; UNASSIGNED  # <reserved>
2763	   1D4AE..1D4B9; FREE_PVAL   # MATH SCRIPT CAP S..MATH
2764	   1D4BA       ; UNASSIGNED  # <reserved>
2765	   1D4BB       ; FREE_PVAL   # MATH SCRIPT SM F
2766	   1D4BC       ; UNASSIGNED  # <reserved>
2767	   1D4BD..1D4C3; FREE_PVAL   # MATH SCRIPT SM H..MATH SC
2768	   1D4C4       ; UNASSIGNED  # <reserved>
2769	   1D4C5..1D505; FREE_PVAL   # MATH SCRIPT SM P..MATH FR
2770	   1D506       ; UNASSIGNED  # <reserved>
2771	   1D507..1D50A; FREE_PVAL   # MATH FRAKTUR CAP D..MATH
2772	   1D50B..1D50C; UNASSIGNED  # <reserved>..<reserved>
2773	   1D50D..1D514; FREE_PVAL   # MATH FRAKTUR CAP J..MATH
2774	   1D515       ; UNASSIGNED  # <reserved>
2775	   1D516..1D51C; FREE_PVAL   # MATH FRAKTUR CAP S..MATH
2776	   1D51D       ; UNASSIGNED  # <reserved>
2777	   1D51E..1D539; FREE_PVAL   # MATH FRAKTUR SM A..MATH D
2778	   1D53A       ; UNASSIGNED  # <reserved>
2779	   1D53B..1D53E; FREE_PVAL   # MATH DOUBLE-STRUCK CAP D..MATHEM
2780	   1D53F       ; UNASSIGNED  # <reserved>
2781	   1D540..1D544; FREE_PVAL   # MATH DOUBLE-STRUCK CAP I..MATHEM
2782	   1D545       ; UNASSIGNED  # <reserved>
2783	   1D546       ; FREE_PVAL   # MATH DOUBLE-STRUCK CAP O
2784	   1D547..1D549; UNASSIGNED  # <reserved>..<reserved>
2785	   1D54A..1D550; FREE_PVAL   # MATH DOUBLE-STRUCK CAP S..MATHEM
2786	   1D551       ; UNASSIGNED  # <reserved>
2787	   1D552..1D6A5; FREE_PVAL   # MATH DOUBLE-STRUCK SM A..MATHEMAT
2788	   1D6A6..1D6A7; UNASSIGNED  # <reserved>..<reserved>
2789	   1D6A8..1D7CB; FREE_PVAL   # MATH BOLD CAP ALPHA..MATHEMATICA
2790	   1D7CC..1D7CD; UNASSIGNED  # <reserved>..<reserved>
2791	   1D7CE..1D7FF; FREE_PVAL   # MATH BOLD DIG ZERO..MATH M
2792	   1D800..1EDFF; UNASSIGNED  # <reserved>..<reserved>
2793	   1EE00..1EE03; FREE_PVAL   # ARAB MATH ALEF..ARAB MATH DAL
2794	   1EE04       ; UNASSIGNED  # <reserved>
2795	   1EE05..1EE1F; FREE_PVAL   # ARAB MATH WAW..ARAB MATH DOTLESS QAF
2796	   1EE20       ; UNASSIGNED  # <reserved>
2797	   1EE21..1EE22; FREE_PVAL   # ARAB MATH INIT BEH..ARAB MATH INIT JEE
2798	   1EE23       ; UNASSIGNED  # <reserved>
2799	   1EE24       ; FREE_PVAL   # ARAB MATH INIT HEH
2800	   1EE25..1EE26; UNASSIGNED  # <reserved>..<reserved>
2801	   1EE27       ; FREE_PVAL   # ARAB MATH INIT HAH
2802	   1EE28       ; UNASSIGNED  # <reserved>
2803	   1EE29..1EE32; FREE_PVAL   # ARAB MATH INIT YEH..ARAB MATH INIT QAF
2804	   1EE33       ; UNASSIGNED  # <reserved>
2805	   1EE34..1EE37; FREE_PVAL   # ARAB MATH INIT SHEEN..ARAB MATH INITIA
2806	   1EE38       ; UNASSIGNED  # <reserved>
2807	   1EE39       ; FREE_PVAL   # ARAB MATH INIT DAD
2808	   1EE3A       ; UNASSIGNED  # <reserved>
2809	   1EE3B       ; FREE_PVAL   # ARAB MATH INIT GHAIN
2810	   1EE3C..1EE41; UNASSIGNED  # <reserved>..<reserved>
2811	   1EE42       ; FREE_PVAL   # ARAB MATH TAILED JEEM
2812	   1EE43..1EE46; UNASSIGNED  # <reserved>..<reserved>
2813	   1EE47       ; FREE_PVAL   # ARAB MATH TAILED HAH
2814	   1EE48       ; UNASSIGNED  # <reserved>
2815	   1EE49       ; FREE_PVAL   # ARAB MATH TAILED YEH
2816	   1EE4A       ; UNASSIGNED  # <reserved>
2817	   1EE4B       ; FREE_PVAL   # ARAB MATH TAILED LAM
2818	   1EE4C       ; UNASSIGNED  # <reserved>
2819	   1EE4D..1EE4F; FREE_PVAL   # ARAB MATH TAILED NOON..ARAB MATH TAILE
2820	   1EE50       ; UNASSIGNED  # <reserved>
2821	   1EE51..1EE52; FREE_PVAL   # ARAB MATH TAILED SAD..ARAB MATH TAILED
2822	   1EE53       ; UNASSIGNED  # <reserved>
2823	   1EE54       ; FREE_PVAL   # ARAB MATH TAILED SHEEN
2824	   1EE55..1EE56; UNASSIGNED  # <reserved>..<reserved>
2825	   1EE57       ; FREE_PVAL   # ARAB MATH TAILED KHAH
2826	   1EE58       ; UNASSIGNED  # <reserved>
2827	   1EE59       ; FREE_PVAL   # ARAB MATH TAILED DAD
2828	   1EE5A       ; UNASSIGNED  # <reserved>
2829	   1EE5B       ; FREE_PVAL   # ARAB MATH TAILED GHAIN
2830	   1EE5C       ; UNASSIGNED  # <reserved>
2831	   1EE5D       ; FREE_PVAL   # ARAB MATH TAILED DOTLESS NOON
2832	   1EE5E       ; UNASSIGNED  # <reserved>
2833	   1EE5F       ; FREE_PVAL   # ARAB MATH TAILED DOTLESS QAF
2834	   1EE60       ; UNASSIGNED  # <reserved>
2835	   1EE61..1EE62; FREE_PVAL   # ARAB MATH STRETCHED BEH..ARAB MATH STR
2836	   1EE63       ; UNASSIGNED  # <reserved>
2837	   1EE64       ; FREE_PVAL   # ARAB MATH STRETCHED HEH
2838	   1EE65..1EE66; UNASSIGNED  # <reserved>..<reserved>
2839	   1EE67..1EE6A; FREE_PVAL   # ARAB MATH STRETCHED HAH..ARAB MATH STR
2840	   1EE6B       ; UNASSIGNED  # <reserved>
2841	   1EE6C..1EE72; FREE_PVAL   # ARAB MATH STRETCHED MEEM..ARAB MATH ST
2842	   1EE73       ; UNASSIGNED  # <reserved>
2843	   1EE74..1EE77; FREE_PVAL   # ARAB MATH STRETCHED SHEEN..ARAB MATH S
2844	   1EE78       ; UNASSIGNED  # <reserved>
2845	   1EE79..1EE7C; FREE_PVAL   # ARAB MATH STRETCHED DAD..ARAB MATH STR
2846	   1EE7D       ; UNASSIGNED  # <reserved>
2847	   1EE7E       ; FREE_PVAL   # ARAB MATH STRETCHED DOTLESS FEH
2848	   1EE7F       ; UNASSIGNED  # <reserved>
2849	   1EE80..1EE89; FREE_PVAL   # ARAB MATH LOOPED ALEF..ARAB MATH LOOPE
2850	   1EE8A       ; UNASSIGNED  # <reserved>
2851	   1EE8B..1EE9B; FREE_PVAL   # ARAB MATH LOOPED LAM..ARAB MATH LOOPED
2852	   1EE9C..1EEA0; UNASSIGNED  # <reserved>..<reserved>
2853	   1EEA1..1EEA3; FREE_PVAL   # ARAB MATH DOUBLE-STRUCK BEH..ARAB MATH
2854	   1EEA4       ; UNASSIGNED  # <reserved>
2855	   1EEA5..1EEA9; FREE_PVAL   # ARAB MATH DOUBLE-STRUCK WAW..ARAB MATH
2856	   1EEAA       ; UNASSIGNED  # <reserved>
2857	   1EEAB..1EEBB; FREE_PVAL   # ARAB MATH DOUBLE-STRUCK LAM..ARAB MATH
2858	   1EEBC..1EEEF; UNASSIGNED  # <reserved>..<reserved>
2859	   1EEF0..1EEF1; FREE_PVAL   # ARAB MATH OP MEEM W HAH W TATWEEL..AR
2860	   1EEF2..1EFFF; UNASSIGNED  # <reserved>..<reserved>
2861	   1F000..1F02B; FREE_PVAL   # MAHJONG TILE EAST WIND..MAHJONG TILE B
2862	   1F02C..1F02F; UNASSIGNED  # <reserved>..<reserved>
2863	   1F030..1F093; FREE_PVAL   # DOMINO TILE HORIZ BACK..DOMINO TILE VE
2864	   1F094..1F09F; UNASSIGNED  # <reserved>..<reserved>
2865	   1F0A0..1F0AE; FREE_PVAL   # PLAY CARD BACK..PLAY CARD KING OF SPAD
2866	   1F0AF..1F0B0; UNASSIGNED  # <reserved>..<reserved>
2867	   1F0B1..1F0BE; FREE_PVAL   # PLAY CARD ACE OF HEARTS..PLAY CARD KIN
2868	   1F0BF..1F0C0; UNASSIGNED  # <reserved>..<reserved>
2869	   1F0C1..1F0CF; FREE_PVAL   # PLAY CARD ACE OF DIAMONDS..PLAY CARD B
2870	   1F0D0       ; UNASSIGNED  # <reserved>
2871	   1F0D1..1F0DF; FREE_PVAL   # PLAY CARD ACE OF CLUBS..PLAY CARD WHIT
2872	   1F0E0..1F0FF; UNASSIGNED  # <reserved>..<reserved>
2873	   1F100..1F10A; FREE_PVAL   # DIG ZERO FULL STOP..DIG NINE COMMA
2874	   1F10B..1F10F; UNASSIGNED  # <reserved>..<reserved>
2875	   1F110..1F12E; FREE_PVAL   # PARENTHESIZED LAT CAP LET A..CIRCLE
2876	   1F12F       ; UNASSIGNED  # <reserved>
2877	   1F130..1F16B; FREE_PVAL   # SQUARED LAT CAP LET A..RAISED MD SIGN
2878	   1F16C..1F16F; UNASSIGNED  # <reserved>..<reserved>
2879	   1F170..1F19A; FREE_PVAL   # NEG SQ LAT CAP LET A..SQUARED VS
2880	   1F19B..1F1E5; UNASSIGNED  # <reserved>..<reserved>
2881	   1F1E6..1F202; FREE_PVAL   # REG IND SYMB LET A..SQ KATAKANA SA
2882	   1F203..1F20F; UNASSIGNED  # <reserved>..<reserved>
2883	   1F210..1F23A; FREE_PVAL   # SQ CJK UNIF IDEO-624B..SQ CJK UNIF IDE
2884	   1F23B..1F23F; UNASSIGNED  # <reserved>..<reserved>
2885	   1F240..1F248; FREE_PVAL   # TORT SH BRACK CJK UNIF IDEO-672C..TORT
2886	   1F249..1F24F; UNASSIGNED  # <reserved>..<reserved>
2887	   1F250..1F251; FREE_PVAL   # CIRC IDEO ADVANTAGE..CIRC IDEO ACCEPT
2888	   1F252..1F2FF; UNASSIGNED  # <reserved>..<reserved>
2889	   1F300..1F320; FREE_PVAL   # CYCLONE..SHOOTING STAR
2890	   1F321..1F32F; UNASSIGNED  # <reserved>..<reserved>
2891	   1F330..1F335; FREE_PVAL   # CHESTNUT..CACTUS
2892	   1F336       ; UNASSIGNED  # <reserved>
2893	   1F337..1F37C; FREE_PVAL   # TULIP..BABY BOTTLE
2894	   1F37D..1F37F; UNASSIGNED  # <reserved>..<reserved>
2895	   1F380..1F393; FREE_PVAL   # RIBBON..GRADUATION CAP
2896	   1F394..1F39F; UNASSIGNED  # <reserved>..<reserved>
2897	   1F3A0..1F3C4; FREE_PVAL   # CAROUSEL HORSE..SURFER
2898	   1F3C5       ; UNASSIGNED  # <reserved>
2899	   1F3C6..1F3CA; FREE_PVAL   # TROPHY..SWIMMER
2900	   1F3CB..1F3DF; UNASSIGNED  # <reserved>..<reserved>
2901	   1F3E0..1F3F0; FREE_PVAL   # HOUSE BUILDING..EUROPEAN CASTLE
2902	   1F3F1..1F3FF; UNASSIGNED  # <reserved>..<reserved>
2903	   1F400..1F43E; FREE_PVAL   # RAT..PAW PRINTS
2904	   1F43F       ; UNASSIGNED  # <reserved>
2905	   1F440       ; FREE_PVAL   # EYES
2906	   1F441       ; UNASSIGNED  # <reserved>
2907	   1F442..1F4F7; FREE_PVAL   # EAR..CAMERA
2908	   1F4F8       ; UNASSIGNED  # <reserved>
2909	   1F4F9..1F4FC; FREE_PVAL   # VIDEO CAMERA..VIDEOCASSETTE
2910	   1F4FD..1F4FF; UNASSIGNED  # <reserved>..<reserved>
2911	   1F500..1F53D; FREE_PVAL   # TWISTED RIGHTWARDS ARROWS..DOWN-POINTI
2912	   1F53E..1F53F; UNASSIGNED  # <reserved>..<reserved>
2913	   1F540..1F543; FREE_PVAL   # CIRCLED CROSS POMMEE..NOTCHED LEFT SEM
2914	   1F544..1F54F; UNASSIGNED  # <reserved>..<reserved>
2915	   1F550..1F567; FREE_PVAL   # CLOCK FACE ONE OCLOCK..CLOCK FACE TWEL
2916	   1F568..1F5FA; UNASSIGNED  # <reserved>..<reserved>
2917	   1F5FB..1F640; FREE_PVAL   # MOUNT FUJI..WEARY CAT FACE
2918	   1F641..1F644; UNASSIGNED  # <reserved>..<reserved>
2919	   1F645..1F64F; FREE_PVAL   # FACE WITH NO GOOD GESTURE..PERSON W FO
2920	   1F650..1F67F; UNASSIGNED  # <reserved>..<reserved>
2921	   1F680..1F6C5; FREE_PVAL   # ROCKET..LEFT LUGGAGE
2922	   1F6C6..1F6FF; UNASSIGNED  # <reserved>..<reserved>
2923	   1F700..1F773; FREE_PVAL   # ALCHEMICAL SYMBOL FOR QUINTESSENCE..AL
2924	   1F774..1FFFF; UNASSIGNED  # <reserved>..<reserved>
2925	   20000..2A6D6; PVALID      # <CJK Ideograph Extension B>
2926	   2A6D7..2A6FF; UNASSIGNED  # <reserved>..<reserved>
2927	   2A700..2B734; PVALID      # <CJK Ideograph Extension C>
2928	   2B735..2B73F; UNASSIGNED  # <reserved>..<reserved>
2929	   2B740..2B81D; PVALID      # <CJK Ideograph Extension D>
2930	   2F800..2FA1D; FREE_PVAL   # CJK COMP IDEO-2F800..CJK COMPA
2931	   2FA1E..2FFFD; UNASSIGNED  # <reserved>..<reserved>
2932	   2FFFE..2FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2933	   30000..3FFFD; UNASSIGNED  # <reserved>..<reserved>
2934	   3FFFE..3FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2935	   40000..4FFFD; UNASSIGNED  # <reserved>..<reserved>
2936	   4FFFE..4FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2937	   50000..5FFFD; UNASSIGNED  # <reserved>..<reserved>
2938	   5FFFE..5FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2939	   60000..6FFFD; UNASSIGNED  # <reserved>..<reserved>
2940	   6FFFE..6FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2941	   70000..7FFFD; UNASSIGNED  # <reserved>..<reserved>
2942	   7FFFE..7FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2943	   80000..8FFFD; UNASSIGNED  # <reserved>..<reserved>
2944	   8FFFE..8FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2945	   90000..9FFFD; UNASSIGNED  # <reserved>..<reserved>
2946	   9FFFE..9FFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2947	   A0000..AFFFD; UNASSIGNED  # <reserved>..<reserved>
2948	   AFFFE..AFFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2949	   B0000..BFFFD; UNASSIGNED  # <reserved>..<reserved>
2950	   BFFFE..BFFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2951	   C0000..CFFFD; UNASSIGNED  # <reserved>..<reserved>
2952	   CFFFE..CFFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2953	   D0000..DFFFD; UNASSIGNED  # <reserved>..<reserved>
2954	   DFFFE..DFFFF; DISALLOWED  # <noncharacter>..<noncharacter>
2955	   E0000       ; UNASSIGNED  # <reserved>
2956	   E0001       ; DISALLOWED  # LANGUAGE TAG
2957	   E0002..E001F; UNASSIGNED  # <reserved>..<reserved>
2958	   E0020..E007F; DISALLOWED  # TAG SPACE..CANCEL TAG
2959	   E0080..E00FF; UNASSIGNED  # <reserved>..<reserved>
2960	   E0100..E01EF; PVALID      # VAR SEL-17..VAR SEL-256
2961	   E01F0..EFFFD; UNASSIGNED  # <reserved>..<reserved>
2962	   EFFFE..10FFFF; DISALLOWED # <noncharacter>..<noncharacter>
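
   The rows above share one layout: a code point or an inclusive range
   in hexadecimal, a semicolon, the derived property value, and a "#"
   comment.  The following is a minimal, non-normative sketch of how
   such rows could be loaded and queried.  It assumes the rows are
   supplied without any leading line-number column, and the names
   ROW, parse_table, and lookup are illustrative only; they are not
   defined by this document.

```python
import bisect
import re

# Matches one table row, e.g.
# "1F300..1F320; FREE_PVAL   # CYCLONE..SHOOTING STAR".
# Group 1 is the first code point, group 2 the optional last code
# point, group 3 the derived property value.
ROW = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def parse_table(text):
    """Return a sorted list of (first, last, value) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip blank lines and anything that is not a row
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        rows.append((first, last, m.group(3)))
    rows.sort()
    return rows


def lookup(rows, cp):
    """Return the value for code point cp, or None if no row covers it."""
    # Find the last row whose first code point is <= cp.
    i = bisect.bisect_right(rows, (cp, 0x110000, "")) - 1
    if i >= 0 and rows[i][0] <= cp <= rows[i][1]:
        return rows[i][2]
    return None


if __name__ == "__main__":
    sample = """
       1F300..1F320; FREE_PVAL   # CYCLONE..SHOOTING STAR
       1F321..1F32F; UNASSIGNED  # <reserved>..<reserved>
       E0001       ; DISALLOWED  # LANGUAGE TAG
    """
    table = parse_table(sample)
    print(lookup(table, 0x1F320))
```

   Returning None for a code point that no row covers, rather than
   guessing a value, makes gaps or overlaps in a hand-edited table
   easier to detect.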

2964	Appendix B.  Acknowledgements

2966	   The authors would like to acknowledge the comments and contributions
2967	   of the following individuals: David Black, Mark Davis, Alan DeKok,
2968	   Martin Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Paul
2969	   Hoffman, Jeffrey Hutzelman, Simon Josefsson, John Klensin, Alexey
2970	   Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker, Pete Resnick,
2971	   Andrew Sullivan, Dave Thaler, and Yoshiro Yoneya.

2973	   Some algorithms and textual descriptions have been borrowed from
2975	   [RFC5892].  Some text regarding security has been borrowed from
2976	   [RFC5890] and [I-D.ietf-xmpp-6122bis].

2978	Authors' Addresses

2980	   Peter Saint-Andre
2981	   Cisco Systems, Inc.
2982	   1899 Wynkoop Street, Suite 600
2983	   Denver, CO  80202
2984	   USA

2986	   Phone: +1-303-308-3282
2987	   Email: psaintan@cisco.com

2989	   Marc Blanchet
2990	   Viagenie
2991	   246 Aberdeen
2992	   Quebec, QC  G1R 2E1
2993	   Canada

2995	   Email: Marc.Blanchet@viagenie.ca
2996	   URI:   http://www.viagenie.ca/