idnits 2.17.1
draft-ietf-precis-framework-09.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (July 10, 2013) is 3940 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Outdated reference: A later version (-12) exists of
draft-ietf-precis-mappings-02
** Downref: Normative reference to an Informational draft:
draft-ietf-precis-mappings (ref. 'I-D.ietf-precis-mappings')
-- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'
== Outdated reference: A later version (-19) exists of
draft-ietf-precis-nickname-06
== Outdated reference: A later version (-18) exists of
draft-ietf-precis-saslprepbis-02
== Outdated reference: A later version (-24) exists of
draft-ietf-xmpp-6122bis-07
-- Obsolete informational reference (is this intentional?): RFC 3454
(Obsoleted by RFC 7564)
-- Obsolete informational reference (is this intentional?): RFC 3490
(Obsoleted by RFC 5890, RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 3491
(Obsoleted by RFC 5891)
-- Obsolete informational reference (is this intentional?): RFC 5226
(Obsoleted by RFC 8126)
-- Obsolete informational reference (is this intentional?): RFC 5246
(Obsoleted by RFC 8446)
Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 7 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 PRECIS P. Saint-Andre
3 Internet-Draft Cisco Systems, Inc.
4 Obsoletes: 3454 (if approved) M. Blanchet
5 Intended status: Standards Track Viagenie
6 Expires: January 11, 2014 July 10, 2013
8 PRECIS Framework: Preparation and Comparison of Internationalized
9 Strings in Application Protocols
10 draft-ietf-precis-framework-09
12 Abstract
14 Application protocols using Unicode code points in protocol strings
15 need to properly prepare such strings in order to perform valid
16 comparison operations (e.g., for purposes of authentication or
17 authorization). This document defines a framework enabling
18 application protocols to perform the preparation and comparison of
19 internationalized strings (a.k.a. "PRECIS") in a way that depends on
20 the properties of Unicode code points and thus is agile with respect
21 to versions of Unicode. As a result, this framework provides a more
22 sustainable approach to the handling of internationalized strings
23 than the previous framework, known as Stringprep (RFC 3454). A
24 specification that reuses this framework can either directly use the
25 PRECIS string classes or subclass the PRECIS string classes as
26 needed. This framework takes an approach similar to the revised
27 internationalized domain names (IDNs) in applications (IDNA)
28 technology (RFC 5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894) and
29 thus adheres to the high-level design goals described in the IAB's
30 recommendations regarding IDNs (RFC 4690), albeit for application
31 technologies other than the Domain Name System (DNS). This document
32 obsoletes RFC 3454.
34 Status of this Memo
36 This Internet-Draft is submitted in full conformance with the
37 provisions of BCP 78 and BCP 79.
39 Internet-Drafts are working documents of the Internet Engineering
40 Task Force (IETF). Note that other groups may also distribute
41 working documents as Internet-Drafts. The list of current Internet-
42 Drafts is at http://datatracker.ietf.org/drafts/current/.
44 Internet-Drafts are draft documents valid for a maximum of six months
45 and may be updated, replaced, or obsoleted by other documents at any
46 time. It is inappropriate to use Internet-Drafts as reference
47 material or to cite them other than as "work in progress."
48 This Internet-Draft will expire on January 11, 2014.
50 Copyright Notice
52 Copyright (c) 2013 IETF Trust and the persons identified as the
53 document authors. All rights reserved.
55 This document is subject to BCP 78 and the IETF Trust's Legal
56 Provisions Relating to IETF Documents
57 (http://trustee.ietf.org/license-info) in effect on the date of
58 publication of this document. Please review these documents
59 carefully, as they describe your rights and restrictions with respect
60 to this document. Code Components extracted from this document must
61 include Simplified BSD License text as described in Section 4.e of
62 the Trust Legal Provisions and are provided without warranty as
63 described in the Simplified BSD License.
65 Table of Contents
67 1. Introduction
68 2. Terminology
69 3. String Classes
70 3.1. Overview
71 3.2. Order of Operations
72 3.3. IdentifierClass
73 3.4. FreeformClass
74 4. Use of PRECIS String Classes
75 4.1. Principles
76 4.2. Subclassing
77 4.3. Building Application-Layer Constructs
78 4.4. A Note about Spaces
79 5. Code Point Properties
80 6. Category Definitions Used to Calculate Derived Property
81 Value
82 6.1. LetterDigits (A)
83 6.2. Unstable (B)
84 6.3. IgnorableProperties (C)
85 6.4. IgnorableBlocks (D)
86 6.5. LDH (E)
87 6.6. Exceptions (F)
88 6.7. BackwardCompatible (G)
89 6.8. JoinControl (H)
90 6.9. OldHangulJamo (I)
91 6.10. Unassigned (J)
92 6.11. ASCII7 (K)
93 6.12. Controls (L)
94 6.13. PrecisIgnorableProperties (M)
95 6.14. Spaces (N)
96 6.15. Symbols (O)
97 6.16. Punctuation (P)
98 6.17. HasCompat (Q)
99 6.18. OtherLetterDigits (R)
100 7. Calculation of the Derived Property
101 8. Code Points
102 9. Security Considerations
103 9.1. General Issues
104 9.2. Use of the IdentifierClass
105 9.3. Use of the FreeformClass
106 9.4. Local Character Set Issues
107 9.5. Visually Similar Characters
108 9.6. Security of Passwords
109 10. IANA Considerations
110 10.1. PRECIS Derived Property Value Registry
111 10.2. PRECIS Base Classes Registry
112 10.3. PRECIS Subclasses Registry
113 10.4. PRECIS Usage Registry
114 11. Interoperability Considerations
115 12. References
116 12.1. Normative References
117 12.2. Informative References
118 Appendix A. Codepoint Table
119 Appendix B. Acknowledgements
120 Authors' Addresses
122 1. Introduction
124 As described in the problem statement for the preparation and
125 comparison of internationalized strings ("PRECIS") [RFC6885], many
126 IETF protocols have used the Stringprep framework [RFC3454] as the
127 basis for preparing and comparing protocol strings that contain
128 Unicode code points [UNICODE] outside the ASCII range [RFC20]. The
129 Stringprep framework was developed during work on the original
130 technology for internationalized domain names (IDNs), here called
131 "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the Stringprep
132 profile for IDNs. At the time, Stringprep was designed as a general
133 framework so that other application protocols could define their own
134 Stringprep profiles for the preparation and comparison of strings and
135 identifiers. Indeed, a number of application protocols defined such
136 profiles.
138 After the publication of [RFC3454] in 2002, several significant
139 issues arose with the use of Stringprep in the IDN case, as
140 documented in the IAB's recommendations regarding IDNs [RFC4690]
141 (most significantly, Stringprep was tied to Unicode version 3.2).
142 Therefore, the newer IDNA specifications, here called "IDNA2008"
143 ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
144 use Stringprep and Nameprep. This migration away from Stringprep for
145 IDNs has prompted other "customers" of Stringprep to consider new
146 approaches to the preparation and comparison of internationalized
147 strings (a.k.a. "PRECIS"), as described in [RFC6885].
149 This document defines a framework for a post-Stringprep approach to
150 the preparation and comparison of internationalized strings in
151 application protocols, based on several principles:
153 1. Define a small set of string classes appropriate for common
154 application protocol constructs such as usernames and free-form
155 strings.
156 2. Define each PRECIS string class in terms of Unicode code points
157 and their properties so that an algorithm can be used to
158 determine whether each code point or character category is valid,
159 disallowed, or unassigned.
160 3. Define string classes in terms of allowable code points, so that
161 any code point not explicitly allowed is forbidden.
162 4. Enable application protocols to subclass the PRECIS string
163 classes if needed, mainly to disallow particular code points that
164 are currently disallowed in the relevant application protocol
165 (e.g., characters with special or reserved meaning, such as "@"
166 and "/" when used as separators within identifiers).
167 5. Leave various mapping operations (e.g., case preservation or
168 lowercasing, Unicode normalization, mapping of certain characters
169 to other characters or to nothing, handling of full-width and
170 half-width characters, handling of right-to-left characters) as
171 the responsibility of application protocols, as was done for
172 IDNA2008 through an IDNA-specific mapping document [RFC5895].
174 It is expected that this framework will yield the following benefits:
176 o Application protocols will be more version-agile with regard to
177 the Unicode database.
178 o Implementers will be able to share code point tables and software
179 code across application protocols, most likely by means of
180 software libraries.
181 o End users will be able to acquire more accurate expectations about
182 the code points that are acceptable in various contexts. Given
183 this more uniform set of string classes, it is also expected that
184 copy/paste operations between software implementing different
185 application protocols will be more predictable and coherent.
187 Although this framework is similar to IDNA2008 and borrows some of
188 the character categories defined in [RFC5892], it defines additional
189 string classes and character categories to meet the needs of common
190 application protocols.
192 2. Terminology
194 Many important terms used in this document are defined in [RFC5890],
195 [RFC6365], [RFC6885], and [UNICODE].
197 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
198 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
199 "OPTIONAL" in this document are to be interpreted as described in
200 [RFC2119].
202 3. String Classes
204 3.1. Overview
206 IDNA2008 essentially defines a string class of internationalized
207 domain name (IDN), although it does not use the term "string class".
208 (This document does not define a string class for domain names, and
209 application protocols are strongly encouraged to use IDNA2008 as the
210 appropriate method to prepare domain names and hostnames.) Because
211 the IDN string class is designed to meet the particular requirements
212 of the Domain Name System (DNS), additional string classes are needed
213 for non-DNS applications.
215 Starting in 2010, various "customers" of Stringprep began to discuss
216 the need to define a post-Stringprep approach to the preparation and
217 comparison of internationalized strings. As a result of analyzing
218 existing Stringprep profiles, this community concluded that most
219 existing uses could be addressed by two string classes:
221 IdentifierClass: a sequence of letters, numbers, and symbols that is
222 used to identify or address a network entity such as a user
223 account, a venue (e.g., a chatroom), an information source (e.g.,
224 a data feed), or a collection of data (e.g., a file); the intent
225 is that this class will be very safe for use in a wide variety of
226 application protocols, with the result that safety has been
227 prioritized over inclusiveness for this class.
228 FreeformClass: a sequence of letters, numbers, symbols, spaces, and
229 other code points that is used for free-form strings, including
230 passwords as well as display elements such as human-friendly
231 nicknames in chatrooms; the intent is that this class will allow
232 nearly any Unicode character, with the result that inclusiveness
233 has been prioritized over safety for this class (e.g., protocol
234 designers, application developers, service providers, and end
235 users might not understand or be able to enter all of the
236 characters that can be included in the FreeformClass).
238 Although members of the community discussed the possibility of
239 defining other PRECIS string classes (e.g., a class falling somewhere
240 between the IdentifierClass and the FreeformClass), they concluded
241 that the IdentifierClass would be a safe choice meeting the needs of
242 many or even most application protocols, and that protocols needing a
243 wider range of Unicode characters could use the FreeformClass
244 directly or subclass it if needed.
246 The following subsections discuss the IdentifierClass and
247 FreeformClass in more detail, with reference to the dimensions
248 described in Section 3 of [RFC6885]. (Naturally, future documents
249 can define PRECIS string classes beyond the IdentifierClass and
250 FreeformClass; see Section 10.2.) Each string class (or a particular
251 usage thereof) is defined by the following behavioral rules:
253 Valid: defines which code points and character categories are
254 treated as valid input to the string.
255 Disallowed: defines which code points and character categories are
256 treated as disallowed for the string.
257 Unassigned: defines application behavior in the presence of code
258 points that are unassigned, i.e., unknown for the version of
259 Unicode the application is built upon.
261 Width Mapping: specifies if width mapping is performed on fullwidth
262 and halfwidth characters, and how the mapping is done (e.g.,
263 mapping fullwidth and halfwidth characters to their decomposition
264 equivalents).
265 Additional Mappings: specifies whether additional mappings are to be
266 applied, such as mapping of delimiter characters, mapping of
267 special characters (e.g., non-ASCII space characters to ASCII
268 space or certain characters to nothing), and case mapping based on
269 language and local context (see [I-D.ietf-precis-mappings]).
270 Case Mapping: specifies if case mapping is performed (instead of
271 case preservation) on uppercase and titlecase characters, and how
272 the mapping is done (e.g., mapping uppercase and titlecase
273 characters to their lowercase equivalents).
274 Normalization: defines which Unicode normalization form (D, KD, C,
275 or KC) is to be applied (see [UAX15]).
276 Directionality: defines application behavior in the presence of code
277 points that have directionality, in particular right-to-left code
278 points as defined in the Unicode database (see [UAX9]).
280 This document defines the valid, disallowed, and unassigned rules for
281 the IdentifierClass and FreeformClass. Application protocols that
282 use these string classes are responsible for defining the
283 normalization, case mapping, width mapping, and directionality rules,
284 as well as any additional mappings to be applied.
286 3.2. Order of Operations
288 To ensure proper comparison, the following order of operations is
289 REQUIRED:
291 1. Width mapping
292 2. Additional mappings as specified in [I-D.ietf-precis-mappings]:
293 1. Delimiter mapping
294 2. Special mapping
295 3. Local case mapping
296 3. Non-local case mapping
297 4. Normalization
298 5. PRECIS protocol
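
The required ordering can be sketched as a small pipeline. The following Python is only an illustration under stated assumptions: the function names (width_map, additional_map, prepare) are hypothetical, the "additional mapping" step shows just one possible special mapping (non-ASCII spaces to ASCII space), and lowercasing via str.lower() stands in for whatever non-local case mapping an application protocol actually specifies. Step 5 (the PRECIS valid/disallowed/unassigned check) is left as a comment.

```python
import unicodedata


def width_map(s: str) -> str:
    """Step 1: map fullwidth and halfwidth code points to their
    decomposition equivalents (decomposition tags <wide> and <narrow>)."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def additional_map(s: str) -> str:
    """Step 2 (one illustrative special mapping, an assumption of this
    sketch): map any space separator (category Zs) to ASCII space."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)


def prepare(s: str, *, lowercase: bool = True, form: str = "NFC") -> str:
    """Apply the steps in the order listed above."""
    s = width_map(s)                    # 1. width mapping
    s = additional_map(s)               # 2. additional mappings
    if lowercase:
        s = s.lower()                   # 3. non-local case mapping
    s = unicodedata.normalize(form, s)  # 4. normalization
    # 5. The PRECIS valid/disallowed/unassigned check would run here.
    return s
```

Keeping each step as a separate function makes the ordering explicit and lets a protocol swap in its own mapping choices without changing the sequence.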
300 3.3. IdentifierClass
302 Most application technologies need strings that can be used to refer
303 to, include, or communicate protocol strings like usernames, file
304 names, data feed identifiers, and chatroom names. We group such
305 strings into a class called "IdentifierClass" having the following
306 features.
308 3.3.1. Valid
310 o Code points traditionally used as letters and numbers in writing
311 systems, i.e., the LetterDigits ("A") category first defined in
312 [RFC5892] and listed here under Section 6.1.
313 o Code points in the range U+0021 through U+007E, i.e., the
314 (printable) ASCII7 ("K") rule defined under Section 6.11. These
315 code points are "grandfathered" into PRECIS and thus are valid
316 even if they would otherwise be disallowed according to the
317 property-based rules specified in the next section.
319 Although the PRECIS IdentifierClass re-uses the LetterDigits category
320 from IDNA2008, the range of characters allowed in the IdentifierClass
321 is wider than the range of characters allowed in IDNA2008. The main
322 reason is that IDNA2008 applies the Unstable category before the
323 LetterDigits category, thus disallowing uppercase characters, whereas
324 the IdentifierClass does not apply the Unstable category.
326 3.3.2. Disallowed
328 o Control characters, i.e., the Controls ("L") category defined
329 under Section 6.12.
330 o Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
331 category defined under Section 6.13.
332 o Space characters, i.e., the Spaces ("N") category defined under
333 Section 6.14.
334 o Symbol characters, i.e., the Symbols ("O") category defined under
335 Section 6.15.
336 o Punctuation characters, i.e., the Punctuation ("P") category
337 defined under Section 6.16.
338 o Any character that has a compatibility equivalent, i.e., the
339 HasCompat ("Q") category defined under Section 6.17. These code
340 points are disallowed even if they would otherwise be valid
341 according to the property-based rules specified in the previous
342 section.
343 o Letters and digits other than the "traditional" letters and digits
344 allowed in IDNs, i.e., the OtherLetterDigits ("R") category
345 defined under Section 6.18.
347 3.3.3. Unassigned
349 Any code points that are not yet assigned in the Unicode character
350 set SHALL be considered Unassigned for purposes of the
351 IdentifierClass.
353 3.3.4. Width Mapping
355 The width mapping rule MUST be specified by each application protocol
356 that uses or subclasses the IdentifierClass.
358 3.3.5. Additional Mappings
360 Additional mapping rules (if any) MUST be specified by each
361 application protocol that uses or subclasses the IdentifierClass (see
362 [I-D.ietf-precis-mappings]).
364 3.3.6. Case Mapping
366 The case mapping rule MUST be specified by each application protocol
367 that uses or subclasses the IdentifierClass.
369 3.3.7. Normalization
371 The Unicode normalization form MUST be specified by each application
372 protocol that uses or subclasses the IdentifierClass.
374 However, in accordance with [RFC5198], normalization form C (NFC) is
375 RECOMMENDED.
377 3.3.8. Directionality
379 The directionality rule MUST be specified by each application
380 protocol that uses or subclasses the IdentifierClass.
382 3.4. FreeformClass
384 Some application technologies need strings that can be used in a
385 free-form way, e.g., as a password in an authentication exchange (see
386 [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see
387 [I-D.ietf-precis-nickname]). We group such things into a class
388 called "FreeformClass" having the following features.
390 Note: Consult Section 9.6 for relevant security considerations when
391 strings conforming to the FreeformClass, or a subclass thereof, are
392 used as passwords.
394 3.4.1. Valid
396 o Traditional letters and numbers, i.e., the LetterDigits ("A")
397 category first defined in [RFC5892] and listed here under
398 Section 6.1.
400 o Letters and digits other than the "traditional" letters and digits
401 allowed in IDNs, i.e., the OtherLetterDigits ("R") category
402 defined under Section 6.18.
403 o Code points in the range U+0021 through U+007E, i.e., the
404 (printable) ASCII7 ("K") rule defined under Section 6.11.
405 o Any character that has a compatibility equivalent, i.e., the
406 HasCompat ("Q") category defined under Section 6.17.
407 o Space characters, i.e., the Spaces ("N") category defined under
408 Section 6.14.
409 o Symbol characters, i.e., the Symbols ("O") category defined under
410 Section 6.15.
411 o Punctuation characters, i.e., the Punctuation ("P") category
412 defined under Section 6.16.
414 3.4.2. Disallowed
416 o Control characters, i.e., the Controls ("L") category defined
417 under Section 6.12.
418 o Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
419 category defined under Section 6.13.
421 3.4.3. Unassigned
423 Any code points that are not yet assigned in the Unicode character
424 set SHALL be considered Unassigned for purposes of the FreeformClass.
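
To make the contrast between the two string classes concrete, here is a deliberately simplified Python sketch of the valid/disallowed/unassigned decision for a single code point. It is not the algorithm of Section 7. The General_Category sets used for LetterDigits, OtherLetterDigits, Symbols, and Punctuation, and the NFKC test used for HasCompat, are assumptions of this sketch; the Exceptions, BackwardCompatible, JoinControl, OldHangulJamo, and PrecisIgnorableProperties categories are omitted entirely. The function name classify is hypothetical.

```python
import unicodedata

# Assumed approximations of the categories named in Sections 3.3 and 3.4.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}  # "A"
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}              # "R"
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                          # "O"
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}    # "P"


def classify(ch: str, string_class: str) -> str:
    """Return 'valid', 'disallowed', or 'unassigned' for one code point.

    string_class is 'identifier' or 'freeform'.
    """
    freeform = string_class == "freeform"
    cat = unicodedata.category(ch)
    if cat == "Cn":
        # Simplification: Python reports noncharacters as Cn too.
        return "unassigned"
    if 0x21 <= ord(ch) <= 0x7E:
        return "valid"  # printable ASCII7 ("K"), valid in both classes
    if cat == "Cc":
        return "disallowed"  # Controls ("L"), disallowed in both classes
    if unicodedata.normalize("NFKC", ch) != ch:
        # HasCompat ("Q"): valid only in the FreeformClass.
        return "valid" if freeform else "disallowed"
    if cat in LETTER_DIGITS:
        return "valid"
    if (cat in OTHER_LETTER_DIGITS or cat == "Zs"
            or cat in SYMBOLS or cat in PUNCTUATION):
        # "R", Spaces ("N"), "O", "P": valid only in the FreeformClass.
        return "valid" if freeform else "disallowed"
    return "disallowed"  # anything not explicitly allowed is forbidden
```

For example, ASCII space and FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) come out as disallowed for 'identifier' but valid for 'freeform', while a tab is disallowed in both.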
426 3.4.4. Width Mapping
428 The width mapping rule MUST be specified by each application protocol
429 that uses or subclasses the FreeformClass.
431 Because one aspect of Unicode normalization form KC is width mapping,
432 a PRECIS usage or subclass that uses NFKC does not need to specify
433 width mapping. However, if NFC is used then the usage or subclass
434 needs to specify whether to apply width mapping; in this case, width
435 mapping is in general RECOMMENDED because allowing fullwidth and
436 halfwidth characters to remain unmapped to their decomposition
437 equivalents would violate the principle of least user surprise. For
438 more information about the concept of width in East Asian scripts
439 within Unicode, see for instance [UAX11].
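
The difference between NFC and NFKC with respect to width can be checked directly with Python's standard unicodedata module. This is only a small demonstration; the variable names are arbitrary.

```python
import unicodedata

# FULLWIDTH LATIN CAPITAL LETTER A, B, C (U+FF21..U+FF23)
fullwidth = "\uFF21\uFF22\uFF23"

# NFC leaves fullwidth characters untouched, so a usage or subclass
# that chooses NFC has to decide separately about width mapping.
nfc = unicodedata.normalize("NFC", fullwidth)

# NFKC applies compatibility decompositions, which include <wide>
# and <narrow>, so width mapping happens as a side effect.
nfkc = unicodedata.normalize("NFKC", fullwidth)
```

Here nfc is still the fullwidth string, whereas nfkc is the ASCII string "ABC".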
441 3.4.5. Additional Mappings
443 Additional mapping rules (if any) MUST be specified by each
444 application protocol that uses or subclasses the FreeformClass (see
445 [I-D.ietf-precis-mappings]).
447 3.4.6. Case Mapping
449 The case mapping rule MUST be specified by each application protocol
450 that uses or subclasses the FreeformClass.
452 In general, the combination of case preservation and case-insensitive
453 comparison of internationalized strings is NOT RECOMMENDED; instead,
454 application protocols SHOULD either (a) not preserve case but perform
455 case-insensitive comparison or (b) preserve case but perform case-
456 sensitive comparison.
458 In order to maximize entropy and minimize the potential for false
459 positives, it is NOT RECOMMENDED for application protocols to map
460 uppercase and titlecase code points to their lowercase equivalents
461 when strings conforming to the FreeformClass, or a subclass thereof,
462 are used in passwords; instead, it is RECOMMENDED to preserve the
463 case of all code points contained in such strings and then perform
464 case-sensitive comparison. See also the related discussion in
465 [I-D.ietf-precis-saslprepbis].
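
The two recommended combinations can be illustrated with a minimal Python sketch. The function names are hypothetical, str.lower() stands in for whatever case mapping a protocol specifies, and the password comparison is shown on plain strings purely for illustration (a real deployment would compare derived or hashed values).

```python
def store_caseless(s: str) -> str:
    """Option (a): do not preserve case; store the lowercased form."""
    return s.lower()


def match_caseless(stored: str, presented: str) -> bool:
    """Option (a): case-insensitive comparison against the stored form."""
    return stored == presented.lower()


def match_case_sensitive(stored: str, presented: str) -> bool:
    """Option (b): case was preserved at storage time, so compare exactly.
    This is the combination recommended above for passwords."""
    return stored == presented
```

With option (a), "Alice" and "ALICE" match; with option (b), "Secret" and "secret" do not, which keeps the full range of distinct password strings available.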
467 3.4.7. Normalization
469 The Unicode normalization form MUST be specified by each application
470 protocol that uses or subclasses the FreeformClass.
472 However, in accordance with [RFC5198], normalization form C (NFC) is
473 RECOMMENDED.
475 3.4.8. Directionality
477 The directionality rule MUST be specified by each application
478 protocol that uses or subclasses the FreeformClass.
480 4. Use of PRECIS String Classes
482 4.1. Principles
484 This document defines the valid, disallowed, and unassigned rules.
485 Application protocols that use the PRECIS string classes MUST define
486 the width mapping, additional mapping (if any), case mapping,
487 normalization, and directionality rules. That is, such definitions
488 MUST at a minimum specify the following:
490 Width Mapping: Whether fullwidth and halfwidth code points are to be
491 mapped to their decomposition equivalents.
492 Additional Mappings: Whether additional mappings are to be applied,
493 such as mapping of delimiter characters, mapping of special
494 characters (e.g., non-ASCII space characters to ASCII space or
495 certain characters to nothing), and case mapping based on language
496 and local context (see [I-D.ietf-precis-mappings]).
497 Case Mapping: Whether uppercase and titlecase code points are to be
498 (a) preserved or (b) mapped to lowercase.
499 Normalization: Which Unicode normalization form (D, KD, C, or KC) is
500 to be applied (see [UAX15] for background information); in
501 accordance with [RFC5198], NFC is RECOMMENDED.
502 Directionality: Whether any instance of the class that contains a
503 right-to-left code point is to be considered a right-to-left
504 string, or whether some other rule is to be applied (e.g., the
505 "Bidi Rule" from [RFC5893]).
507 4.2. Subclassing
509 Application protocols are allowed to subclass the PRECIS string
510 classes specified in this document. As the word "subclass" implies,
511 a subclass MUST NOT add as valid any code points or character
512 categories that are disallowed by the relevant PRECIS string class.
513 However, a subclass MAY do either of the following:
515 1. Exclude specific code points that are included in the relevant
516 PRECIS string class.
517 2. Exclude characters matching certain Unicode properties (e.g.,
518 math symbols) that are included in the relevant PRECIS string
519 class.
521 As a result, code points that are defined as valid for the PRECIS
522 string class being subclassed will be defined as disallowed for the
523 subclass.
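
The "narrowing only" nature of subclassing can be sketched in Python as a predicate wrapper. Everything named here is hypothetical: base_identifier is a crude stand-in for the IdentifierClass (printable ASCII plus assumed letter and digit categories), and localpart_like is an invented subclass, not the registered definition of any real one.

```python
import unicodedata
from typing import Callable, FrozenSet

Predicate = Callable[[str], bool]


def base_identifier(ch: str) -> bool:
    """Hypothetical stand-in for a base class: printable ASCII or a
    code point in an assumed set of letter/digit categories."""
    if 0x21 <= ord(ch) <= 0x7E:
        return True
    return unicodedata.category(ch) in {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def make_subclass(
    base: Predicate,
    excluded_points: FrozenSet[str] = frozenset(),
    excluded_categories: FrozenSet[str] = frozenset(),
) -> Predicate:
    """Build a subclass predicate that can only remove code points:
    a code point is allowed only if the base class allows it AND it is
    not excluded individually or by General_Category."""
    def allowed(ch: str) -> bool:
        return (
            base(ch)
            and ch not in excluded_points
            and unicodedata.category(ch) not in excluded_categories
        )
    return allowed


# Invented example: exclude the separators "@" and "/" (specific code
# points) and all math symbols (General_Category Sm).
localpart_like = make_subclass(base_identifier, frozenset("@/"), frozenset({"Sm"}))
```

Because the base predicate is always consulted first, a subclass built this way cannot add code points that the base class disallows.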
525 Application protocols that subclass the PRECIS string classes MUST
526 register with the IANA as described under Section 10.3.
528 It is RECOMMENDED for subclass names to be of the form
529 "SubclassBaseClass", where the "Subclass" string is a differentiator
530 and "BaseClass" is the name of the PRECIS string class being
531 subclassed; for example, the subclass of the IdentifierClass used for
532 localparts in the Extensible Messaging and Presence Protocol (XMPP)
533 is named "LocalpartIdentifierClass" [I-D.ietf-xmpp-6122bis].
535 4.3. Building Application-Layer Constructs
537 Sometimes, an application-layer construct does not map directly to
538 one of the PRECIS string classes. Consider, for example, the "simple
539 user name" construct in the Simple Authentication and Security Layer
540 (SASL) [RFC4422]. Depending on the deployment, a simple user name
541 might take the form of a user's full name (e.g., the user's personal
542 name followed by a space and then the user's family name). Such a
543 simple user name cannot be defined as an instance of the
544 IdentifierClass, since space characters are not allowed in the
545 IdentifierClass; however, it could be defined using a space-separated
546 sequence of IdentifierClass instances, as in the following pseudo-
547 ABNF [RFC5234]:
549 fullname = namepart [1*(1*SP namepart)]
550 namepart = 1*(idpoint)
551 ;
552 ; an "idpoint" is a UTF-8 encoded Unicode code point
553 ; that conforms to the PRECIS IdentifierClass
555 Similar techniques could be used to define many application-layer
556 constructs, say of the form "user@domain" or "/path/to/file".
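
A checker for the "fullname" construct above might look like the following Python sketch. The is_idpoint predicate is a hypothetical, simplified stand-in for IdentifierClass membership (printable ASCII, or an assumed letter/digit category with no compatibility equivalent); only the structure of is_fullname follows the pseudo-ABNF.

```python
import unicodedata


def is_idpoint(ch: str) -> bool:
    """Simplified, assumed test for an IdentifierClass code point."""
    if 0x21 <= ord(ch) <= 0x7E:
        return True
    if unicodedata.normalize("NFKC", ch) != ch:
        return False
    return unicodedata.category(ch) in {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_fullname(s: str) -> bool:
    """fullname = namepart [1*(1*SP namepart)]

    One or more nameparts separated by runs of one or more ASCII
    spaces; no leading or trailing space."""
    parts = s.split(" ")
    # An empty first or last part means the string was empty or had a
    # leading/trailing space; empty parts in the middle are just the
    # result of several consecutive spaces, which 1*SP permits.
    if parts[0] == "" or parts[-1] == "":
        return False
    return all(is_idpoint(ch) for part in parts for ch in part)
```

The space separator is handled by the construct itself, so each namepart can still be validated against the IdentifierClass, which does not allow spaces.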
558 4.4. A Note about Spaces
560 With regard to the IdentifierClass, the consensus of the PRECIS
561 Working Group was that spaces are problematic for many reasons,
562 including:
564 o Many Unicode characters are confusable with ASCII space.
565 o Even if non-ASCII space characters are mapped to ASCII space
566 (U+0020), space characters are often not rendered in user
567 interfaces, leading to the possibility that a human user might
568 consider a string containing spaces to be equivalent to the same
569 string without spaces.
570 o In some locales, some devices are known to generate a character
571 other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
572 user performs an action such as hitting the space bar on a keyboard.
574 One consequence of disallowing space characters in the
575 IdentifierClass might be to effectively discourage the use of ASCII
576 space (or, even more problematically, non-ASCII space characters)
577 within identifiers created in newer application protocols; given the
578 challenges involved in properly handling space characters in
579 identifiers and other protocol strings, the Working Group considered
580 this to be a feature, not a bug.
582 However, the FreeformClass does allow spaces, which enables
583 application protocols to define subclasses of the FreeformClass that
584 are more flexible than any profiles of the IdentifierClass.
586 5. Code Point Properties
588 In order to implement the string classes described above, this
589 document does the following:
591 1. Reviews and classifies the collections of code points in the
592 Unicode character set by examining various code point properties.
593 2. Defines an algorithm for determining a derived property value,
594 which can vary depending on the string class being used by the
595 relevant application protocol.
597 This document is not intended to specify precisely how derived
598 property values are to be applied in protocol strings. That
599 information is the responsibility of the protocol specification that
600 uses or subclasses a PRECIS string class from this document.
602 The value of the property is to be interpreted as follows.
604 PROTOCOL VALID Those code points that are allowed to be used in any
605 PRECIS string class (IdentifierClass and FreeformClass). Code
606 points with this property value are permitted for general use in
607 any string class. The abbreviated term PVALID is used to refer to
608 this value in the remainder of this document.
609 SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to
610 be used in specific string classes. Code points with this
611 property value are permitted for use in specific string classes.
612 In the remainder of this document, the abbreviated term *_PVAL is
613 used, where * = (ID | FREE), i.e., either FREE_PVAL or ID_PVAL.
614 CONTEXTUAL RULE REQUIRED Some characteristics of the character, such
615 as its being invisible in certain contexts or problematic in
616 others, require that it not be used in labels unless specific
617 other characters or properties are present. The abbreviated term
618 CONTEXT is used to refer to this value in the remainder of this
619 document. As in IDNA2008, there are two subdivisions of
620 CONTEXTUAL RULE REQUIRED, the first for Join_controls (called
621 CONTEXTJ) and the second for other characters (called CONTEXTO).
622 DISALLOWED Those code points that must not be permitted in any PRECIS
623 string class.
624 SPECIFIC CLASS DISALLOWED Those code points that are not to be
625 included in a specific string class. Code points with this
626 property value are not permitted in one of the string classes but
627 might be permitted in others. In the remainder of this document,
628 the abbreviated term *_DIS is used, where * = (ID | FREE), i.e.,
629 either FREE_DIS or ID_DIS.
631 UNASSIGNED Those code points that are not designated (i.e., are
632 unassigned) in the Unicode Standard.
634 The mechanisms described here allow determination of the value of the
635 property for future versions of Unicode (including characters added
636 after Unicode 5.2 or 6.1 depending on the category, since some
637 categories in this document are reused from IDNA2008 and therefore
638 were defined at the time of Unicode 5.2). Changes in Unicode
639 properties that do not affect the outcome of this process do not
640 affect this framework. For example, a character can have its Unicode
641 General_Category value [UNICODE] change from So to Sm, or from Lo to
642 Ll, without affecting the algorithm results. Moreover, even if such
643 changes were to result, the BackwardCompatible list (Section 6.7) can
644 be adjusted to ensure the stability of the results.
646 Some code points need to be allowed in exceptional circumstances, but
647 ought to be excluded in all other cases; these rules are also
648 described in other documents. The most notable of these are the Join
649 Control characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
650 NON-JOINER. Both of them have the derived property value CONTEXTJ.
651 A character with the derived property value CONTEXTJ or CONTEXTO
652 (CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
653 rule has been established and the context of the character is
654 consistent with that rule. It is invalid to generate a string
655 containing these characters unless such a contextual rule is found
656 and satisfied. PRECIS does not define its own contextual rules, but
657 instead re-uses the contextual rules defined for IDNA2008; please see
658 Appendix A of [RFC5892] for more information.
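A contextual rule can be pictured as a predicate over a code point and its neighbors. The Python sketch below assumes the ZERO WIDTH JOINER rule as given in Appendix A.2 of [RFC5892] (the preceding code point has Canonical_Combining_Class Virama, numeric value 9). The function name is hypothetical, and [RFC5892] remains the authority for the rule itself.

```python
import unicodedata

ZWJ = "\u200D"
VIRAMA_CCC = 9  # numeric Canonical_Combining_Class value named "Virama"


def zwj_context_ok(s: str, i: int) -> bool:
    """Return True if the ZERO WIDTH JOINER at index i directly follows a virama.

    Assumption: this mirrors the rule in RFC 5892, Appendix A.2.
    """
    if s[i] != ZWJ:
        raise ValueError("code point at index i is not U+200D")
    # A joiner at the start of the string has no preceding code point.
    return i > 0 and unicodedata.combining(s[i - 1]) == VIRAMA_CCC
```

A string that contains U+200D at a position where this predicate is false would not satisfy the contextual rule and so would be invalid.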
660 6. Category Definitions Used to Calculate Derived Property Value
662 The derived property obtains its value based on a two-step procedure:
664 1. Characters are placed in one or more character categories either
665 (1) based on core properties defined by the Unicode Standard or
666 (2) by treating the code point as an exception and addressing the
667 code point based on its code point value. These categories are
668 not mutually exclusive.
669 2. Set operations are used with these categories to determine the
670 values for a property that is specific to a given string class.
671 These operations are specified under Section 7.
673 (Note: Unicode property names and property value names might have
674 short abbreviations, such as "gc" for the General_Category property
675 and "Ll" for the Lowercase_Letter property value of the gc property.)
677 In the following specification of character categories, the operation
678 that returns the value of a particular Unicode character property for
679 a code point is designated by using the formal name of that property
680 (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
681 "code point". For example, the value of the General_Category
682 property for a code point is indicated by General_Category(cp).
684 The first ten categories (A-J) shown below were previously defined
685 for IDNA2008 and are copied directly from [RFC5892]. Some of these
686 categories are reused in PRECIS and some of them are not; however,
687 the lettering of categories is retained to prevent overlap and to
688 ease implementation of both IDNA2008 and PRECIS in a single software
689 application. The next eight categories (K-R) are specific to PRECIS.
691 6.1. LetterDigits (A)
693 Note: This category is defined in [RFC5892] and copied here for use
694 in PRECIS.
696 A: General_Category(cp) is in {Ll, Lu, Lm, Lo, Mn, Mc, Nd}
698 These rules identify characters commonly used in mnemonics and often
699 informally described as "language characters".
701 For more information, see section 4.5 of [UNICODE].
703 The categories used in this rule are:
704 o Ll - Lowercase_Letter
705 o Lu - Uppercase_Letter
706 o Lm - Modifier_Letter
707 o Lo - Other_Letter
708 o Mn - Nonspacing_Mark
709 o Mc - Spacing_Mark
710 o Nd - Decimal_Number
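Membership in LetterDigits (A) can be tested directly against the General_Category property. The following is a minimal Python sketch using the standard unicodedata module; the function name is hypothetical, and the result depends on the Unicode version bundled with the Python runtime.

```python
import unicodedata

# General_Category values listed in rule A.
LETTER_DIGIT_CATEGORIES = frozenset({"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"})


def is_letter_digit(cp: int) -> bool:
    """Return True if the code point falls in the LetterDigits (A) category."""
    return unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES
```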
712 6.2. Unstable (B)
714 Note: This category is defined in [RFC5892] but not used in PRECIS.
716 6.3. IgnorableProperties (C)
718 Note: This category is defined in [RFC5892] but not used in PRECIS.
719 See the "PrecisIgnorableProperties (M)" category below for a more
720 inclusive category used in PRECIS identifiers.
722 6.4. IgnorableBlocks (D)
724 Note: This category is defined in [RFC5892] but not used in PRECIS.
726 6.5. LDH (E)
728 Note: This category is defined in [RFC5892] but not used in PRECIS.
729 See the "ASCII7 (K)" category below for a more inclusive category
730 used in PRECIS identifiers.
732 6.6. Exceptions (F)
734 Note: This category is defined in [RFC5892] and used in PRECIS to
735 ensure consistent treatment of the relevant code points.
737 F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
738 0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
739 0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
740 06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
741 302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
742 30FB}
744 This category explicitly lists code points for which the category
745 cannot be assigned using only the core property values that exist in
746 the Unicode standard. The values are according to the table below:
748 PVALID -- Would otherwise have been DISALLOWED
750 00DF; PVALID # LATIN SMALL LETTER SHARP S
751 03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA
752 06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND
753 06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN
754 0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
755 3007; PVALID # IDEOGRAPHIC NUMBER ZERO
757 CONTEXTO -- Would otherwise have been DISALLOWED
759 00B7; CONTEXTO # MIDDLE DOT
760 0375; CONTEXTO # GREEK LOWER NUMERAL SIGN (KERAIA)
761 05F3; CONTEXTO # HEBREW PUNCTUATION GERESH
762 05F4; CONTEXTO # HEBREW PUNCTUATION GERSHAYIM
763 30FB; CONTEXTO # KATAKANA MIDDLE DOT
765 CONTEXTO -- Would otherwise have been PVALID
767 0660; CONTEXTO # ARABIC-INDIC DIGIT ZERO
768 0661; CONTEXTO # ARABIC-INDIC DIGIT ONE
769 0662; CONTEXTO # ARABIC-INDIC DIGIT TWO
770 0663; CONTEXTO # ARABIC-INDIC DIGIT THREE
771 0664; CONTEXTO # ARABIC-INDIC DIGIT FOUR
772 0665; CONTEXTO # ARABIC-INDIC DIGIT FIVE
773 0666; CONTEXTO # ARABIC-INDIC DIGIT SIX
774 0667; CONTEXTO # ARABIC-INDIC DIGIT SEVEN
775 0668; CONTEXTO # ARABIC-INDIC DIGIT EIGHT
776 0669; CONTEXTO # ARABIC-INDIC DIGIT NINE
777 06F0; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO
778 06F1; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ONE
779 06F2; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT TWO
780 06F3; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT THREE
781 06F4; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FOUR
782 06F5; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FIVE
783 06F6; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SIX
784 06F7; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SEVEN
785 06F8; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT EIGHT
786 06F9; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT NINE
788 DISALLOWED -- Would otherwise have been PVALID
790 0640; DISALLOWED # ARABIC TATWEEL
791 07FA; DISALLOWED # NKO LAJANYALAN
792 302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
793 302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
794 3031; DISALLOWED # VERTICAL KANA REPEAT MARK
795 3032; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
796 3033; DISALLOWED # VERTICAL KANA REPEAT MARK UPPER HALF
797 3034; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
798 UPPER HA
799 3035; DISALLOWED # VERTICAL KANA REPEAT MARK LOWER HALF
800 303B; DISALLOWED # VERTICAL IDEOGRAPHIC ITERATION MARK
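Because the Exceptions category is an explicit list, an implementation might hold it as a lookup table from code point to derived property value. The Python sketch below transcribes the table above; the names EXCEPTIONS and _table are hypothetical.

```python
def _table(value, *code_points):
    """Map every given code point to the same derived property value."""
    return {cp: value for cp in code_points}


# Code point -> derived property value, transcribed from the table above.
EXCEPTIONS = {
    **_table("PVALID", 0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007),
    **_table("CONTEXTO", 0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB,
             *range(0x0660, 0x066A),   # ARABIC-INDIC DIGIT ZERO..NINE
             *range(0x06F0, 0x06FA)),  # EXTENDED ARABIC-INDIC DIGIT ZERO..NINE
    **_table("DISALLOWED", 0x0640, 0x07FA, 0x302E, 0x302F,
             *range(0x3031, 0x3036),   # VERTICAL KANA REPEAT marks
             0x303B),
}
```

The table has 41 entries, matching the 41 code points enumerated in rule F.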
802 6.7. BackwardCompatible (G)
804 Note: This category is defined in [RFC5892] and copied here for use
805 in PRECIS. Because of how the PRECIS string classes are defined,
806 only changes that would result in code points being added to or
807 removed from the LetterDigits ("A") category would result in
808 backward-incompatible modifications to code point assignments.
809 Therefore, management of this category is handled via the processes
810 specified in [RFC5892].
812 G: cp is in {}
814 This category includes the code points for which property values in
815 versions of Unicode after 5.2 have changed in such a way that the
816 derived property value would no longer be PVALID or DISALLOWED. If
817 changes are made to future versions of Unicode so that code points
818 might change property value from PVALID or DISALLOWED, then this
819 table can be updated to keep special exception values so that the
820 property values for code points stay stable.
822 6.8. JoinControl (H)
824 Note: This category is defined in [RFC5892] and copied here for use
825 in PRECIS.
827 H: Join_Control(cp) = True
829 This category consists of Join Control characters (i.e., they are not
830 in LetterDigits (Section 6.1) but are still required in strings under
831 some circumstances).
833 6.9. OldHangulJamo (I)
835 Note: This category is defined in [RFC5892] and copied here for use
836 in PRECIS.
838 I: Hangul_Syllable_Type(cp) is in {L, V, T}
840 This category consists of all conjoining Hangul Jamo (Leading Jamo,
841 Vowel Jamo, and Trailing Jamo).
843 Elimination of conjoining Hangul Jamos from the set of PVALID
844 characters results in restricting the set of Korean PVALID characters
845 just to preformed, modern Hangul syllable characters. Old Hangul
846 syllables, which must be spelled with sequences of conjoining Hangul
847 Jamos, are not PVALID for string classes.
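Python's standard library does not expose Hangul_Syllable_Type, so the sketch below approximates rule I with hard-coded ranges. The ranges are an assumption transcribed from the Unicode HangulSyllableType.txt data file and should be checked against the Unicode version in use; the names are hypothetical.

```python
# Assumed ranges for Hangul_Syllable_Type values L, V, and T.
CONJOINING_JAMO_RANGES = (
    (0x1100, 0x115F),  # L: leading consonants (Hangul Jamo)
    (0xA960, 0xA97C),  # L: Hangul Jamo Extended-A
    (0x1160, 0x11A7),  # V: vowels (Hangul Jamo)
    (0xD7B0, 0xD7C6),  # V: Hangul Jamo Extended-B
    (0x11A8, 0x11FF),  # T: trailing consonants (Hangul Jamo)
    (0xD7CB, 0xD7FB),  # T: Hangul Jamo Extended-B
)


def is_old_hangul_jamo(cp: int) -> bool:
    """Return True if the code point is a conjoining Hangul Jamo (L, V, or T)."""
    return any(lo <= cp <= hi for lo, hi in CONJOINING_JAMO_RANGES)
```

Precomposed syllables such as U+AC00 fall outside these ranges and so are not captured by this category.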
849 6.10. Unassigned (J)
851 Note: This category is defined in [RFC5892] and copied here for use
852 in PRECIS.
854 J: General_Category(cp) is in {Cn} and
855 Noncharacter_Code_Point(cp) = False
857 This category consists of code points in the Unicode character set
858 that are not (yet) assigned. It should be noted that Unicode
859 distinguishes between 'unassigned code points' and 'unassigned
860 characters'. The unassigned code points are all but (Cn -
861 Noncharacters), while the unassigned *characters* are all but (Cn +
862 Cs).
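Rule J combines a General_Category test with the exclusion of noncharacters. In the Python sketch below, the noncharacter test is written out by hand (U+FDD0..U+FDEF plus the last two code points of every plane) because the standard unicodedata module has no Noncharacter_Code_Point accessor; both function names are hypothetical, and the result depends on the Unicode version bundled with the runtime.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Return True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """Return True if the code point is in category Cn and is not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```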
864 6.11. ASCII7 (K)
866 This PRECIS-specific category exempts most characters in the
867 (printable) ASCII-7 range from other rules that might be applied
868 during PRECIS processing, on the assumption that these code points
869 are in such wide use that disallowing them would be counter-
870 productive.
872 K: cp is in {0021..007E}
874 6.12. Controls (L)
876 L: Control(cp) = True
878 6.13. PrecisIgnorableProperties (M)
880 This PRECIS-specific category is used to group code points that are
881 not recommended for use in PRECIS string classes.
883 M: Default_Ignorable_Code_Point(cp) = True or
884 Noncharacter_Code_Point(cp) = True
886 The definition for Default_Ignorable_Code_Point can be found in the
887 DerivedCoreProperties.txt [2] file, and at the time of Unicode 6.1 is
888 as follows:
890 Other_Default_Ignorable_Code_Point
891 + Cf (Format characters)
892 + Variation_Selector
893 - White_Space
894 - FFF9..FFFB (Annotation Characters)
895 - 0600..0604, 06DD, 070F, 110BD (exceptional Cf characters
896 that should be visible)
898 6.14. Spaces (N)
900 This PRECIS-specific category is used to group code points that are
901 space characters.
903 N: General_Category(cp) is in {Zs}
905 6.15. Symbols (O)
907 This PRECIS-specific category is used to group code points that are
908 symbols.
910 O: General_Category(cp) is in {Sm, Sc, Sk, So}
912 6.16. Punctuation (P)
914 This PRECIS-specific category is used to group code points that are
915 punctuation characters.
917 P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
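The Spaces (N), Symbols (O), and Punctuation (P) categories are all plain General_Category tests, so one helper can serve all three. The following Python sketch uses the standard unicodedata module; the function and constant names are hypothetical.

```python
import unicodedata

SPACES = frozenset({"Zs"})
SYMBOLS = frozenset({"Sm", "Sc", "Sk", "So"})
PUNCTUATION = frozenset({"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"})


def in_general_categories(cp: int, categories: frozenset) -> bool:
    """Return True if General_Category(cp) is one of the given values."""
    return unicodedata.category(chr(cp)) in categories


def is_space(cp: int) -> bool:
    return in_general_categories(cp, SPACES)


def is_symbol(cp: int) -> bool:
    return in_general_categories(cp, SYMBOLS)


def is_punctuation(cp: int) -> bool:
    return in_general_categories(cp, PUNCTUATION)
```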
919 6.17. HasCompat (Q)
921 This PRECIS-specific category is used to group code points that have
922 compatibility equivalents as explained in Chapter 2 and Chapter 3 of
923 [UNICODE].
925 Q: toNFKC(cp) != cp
927 The toNFKC() operation returns the code point in normalization form
928 KC. For more information, see Section 5 of [UAX15].
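Rule Q maps naturally onto a normalization call. The Python sketch below applies unicodedata.normalize with form NFKC to the single code point and compares the result with the input; the function name is hypothetical.

```python
import unicodedata


def has_compat(cp: int) -> bool:
    """Return True if NFKC normalization changes the code point (rule Q)."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```

Note that this test also flags code points whose canonical (not only compatibility) normalization differs from the input, such as U+212B ANGSTROM SIGN, because NFKC applies both kinds of mapping.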
930 6.18. OtherLetterDigits (R)
932 This PRECIS-specific category is used to group code points that are
933 letters and digits other than the "traditional" letters and digits
934 grouped under the LetterDigits (A) class (see Section 6.1).
936 R: General_Category(cp) is in {Lt, Nl, No, Me}
938 7. Calculation of the Derived Property
940 Possible values of the derived property are:
942 o PVALID
943 o ID_PVAL
944 o FREE_PVAL
945 o CONTEXTJ
946 o CONTEXTO
947 o DISALLOWED
948 o ID_DIS
949 o FREE_DIS
950 o UNASSIGNED
952 Note: The value of the derived property calculated can depend on the
953 string class; for example, if an identifier used in an application
954 protocol is defined as using or subclassing the PRECIS
955 IdentifierClass then a space character such as U+0020 would be
956 assigned to ID_DIS, whereas if an identifier is defined as using or
957 subclassing the PRECIS FreeformClass then the character would be
958 assigned to FREE_PVAL. For the sake of brevity, the designation
959 "FREE_PVAL" is used in the code point tables, instead of the longer
960 designation "ID_DIS or FREE_PVAL". In practice, the derived
961 properties ID_PVAL and FREE_DIS are not used in this specification,
962 since every ID_PVAL code point is PVALID and every FREE_DIS code
963 point is DISALLOWED.
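The class-dependent reading of the derived property can be sketched as a small function. In the Python below, the function name and the outcome strings are hypothetical; the sketch reports UNASSIGNED as its own outcome instead of deciding how a protocol treats such code points, and it follows the table convention of writing "FREE_PVAL" for "ID_DIS or FREE_PVAL".

```python
STRING_CLASSES = ("IdentifierClass", "FreeformClass")


def outcome(derived_value: str, string_class: str) -> str:
    """Interpret a derived property value for one PRECIS string class."""
    if string_class not in STRING_CLASSES:
        raise ValueError(f"unknown string class: {string_class}")
    if derived_value == "PVALID":
        return "allowed"
    if derived_value == "FREE_PVAL":
        # Shorthand for "ID_DIS or FREE_PVAL".
        return "allowed" if string_class == "FreeformClass" else "disallowed"
    if derived_value in ("CONTEXTJ", "CONTEXTO"):
        return "contextual rule required"
    if derived_value == "DISALLOWED":
        return "disallowed"
    if derived_value == "UNASSIGNED":
        return "unassigned"
    raise ValueError(f"unknown derived property value: {derived_value}")
```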
965 The algorithm to calculate the value of the derived property is as
966 follows. (Note: Use of the name of a rule (such as "Exceptions")
967 implies the set of code points that the rule defines, whereas the
968 same name as a function call (such as "Exceptions(cp)") implies the
969 value that the code point has in the Exceptions table.)
971 If .cp. .in. Exceptions Then Exceptions(cp);
972 Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
973 Else If .cp. .in. Unassigned Then UNASSIGNED;
974 Else If .cp. .in. ASCII7 Then PVALID;
975 Else If .cp. .in. JoinControl Then CONTEXTJ;
976 Else If .cp. .in. OldHangulJamo Then DISALLOWED;
977 Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
978 Else If .cp. .in. Controls Then DISALLOWED;
979 Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
980 Else If .cp. .in. LetterDigits Then PVALID;
981 Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
982 Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
983 Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
984 Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
985 Else DISALLOWED;
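The ordered rule chain above can be sketched in Python as a first-match loop. The sketch does not compute category membership itself: the caller is assumed to supply one predicate per category plus the Exceptions and BackwardCompatible tables, all under hypothetical names. "FREE_PVAL" stands for "ID_DIS or FREE_PVAL", as in the code point tables.

```python
from typing import Callable, Mapping

Predicate = Callable[[int], bool]

# (category name, derived value), in the order the rules are tested.
ORDERED_RULES = [
    ("Unassigned", "UNASSIGNED"),
    ("ASCII7", "PVALID"),
    ("JoinControl", "CONTEXTJ"),
    ("OldHangulJamo", "DISALLOWED"),
    ("PrecisIgnorableProperties", "DISALLOWED"),
    ("Controls", "DISALLOWED"),
    ("HasCompat", "FREE_PVAL"),
    ("LetterDigits", "PVALID"),
    ("OtherLetterDigits", "FREE_PVAL"),
    ("Spaces", "FREE_PVAL"),
    ("Symbols", "FREE_PVAL"),
    ("Punctuation", "FREE_PVAL"),
]


def derived_property(cp: int,
                     exceptions: Mapping[int, str],
                     backward_compatible: Mapping[int, str],
                     predicates: Mapping[str, Predicate]) -> str:
    """Return the derived property value of cp using first-match semantics."""
    if cp in exceptions:
        return exceptions[cp]
    if cp in backward_compatible:
        return backward_compatible[cp]
    for name, value in ORDERED_RULES:
        if predicates[name](cp):
            return value
    return "DISALLOWED"
```

Because the first matching rule wins, the order of ORDERED_RULES matters: for example, a code point that is both in ASCII7 and in Punctuation resolves to PVALID.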
987 8. Code Points
989 The Categories and Rules defined under Section 6 and Section 7 apply
990 to all Unicode code points. The table in Appendix A shows, for
991 illustrative purposes, the consequences of the categories and
992 classification rules, and the resulting property values.
994 The list of code points that can be found in Appendix A is non-
995 normative. Instead, the rules defined by Section 6 and Section 7 are
996 normative, and any tables are derived from the rules.
998 9. Security Considerations
1000 9.1. General Issues
1002 The security of applications that use this framework can depend in
1003 part on the proper preparation and comparison of internationalized
1004 strings. For example, such strings can be used to make
1005 authentication and authorization decisions, and the security of an
1006 application could be compromised if an entity providing a given
1007 string is connected to the wrong account or online resource based on
1008 different interpretations of the string.
1010 Specifications of application protocols that use this framework are
1011 encouraged to describe how internationalized strings are used in the
1012 protocol, including the security implications of any false positives
1013 and false negatives that might result from various comparison
1014 operations. For some helpful guidelines, refer to [RFC6943],
1015 [RFC5890], [UTR36], and [UTR39].
1017 9.2. Use of the IdentifierClass
1019 Strings that conform to the IdentifierClass and any subclass thereof
1020 are intended to be relatively safe for use in a broad range of
1021 applications, primarily because they include only letters, digits,
1022 and "grandfathered" non-space characters from the ASCII range; thus
1023 they exclude spaces, characters with compatibility equivalents, and
1024 almost all symbols and punctuation marks. However, because such
1025 strings can still include so-called confusable characters (see
1026 Section 9.5), protocol designers and implementers are encouraged to
1027 pay close attention to the security considerations described
1028 elsewhere in this document.
1030 9.3. Use of the FreeformClass
1032 Strings that conform to the FreeformClass and many subclasses thereof
1033 can include virtually any Unicode character. This makes the
1034 FreeformClass quite expressive, but also problematic from the
1035 perspective of possible user confusion. Protocol designers are
1036 hereby warned that the FreeformClass contains code points they might
1037 not understand, and are encouraged to use or subclass the
1038 IdentifierClass wherever feasible; however, if an application
1039 protocol requires more code points than are allowed by the
1040 IdentifierClass, protocol designers are encouraged to define a
1041 subclass of the FreeformClass that restricts the allowable code
1042 points as tightly as possible. (The working group considered the
1043 option of allowing superclasses as well as subclasses of PRECIS
1044 string classes, but decided against allowing superclasses to reduce
1045 the likelihood of security and interoperability problems.)
1047 9.4. Local Character Set Issues
1049 When systems use local character sets other than ASCII and Unicode,
1050 these specifications leave the problem of converting between the
1051 local character set and Unicode up to the application or local
1052 system. If different applications (or different versions of one
1053 application) implement different rules for conversions among coded
1054 character sets, they could interpret the same name differently and
1055 contact different application servers or other network entities.
1056 This problem is not solved by security protocols, such as Transport
1057 Layer Security (TLS) [RFC5246] and the Simple Authentication and
1058 Security Layer (SASL) [RFC4422], that do not take local character
1059 sets into account.
1061 9.5. Visually Similar Characters
1063 Some characters are visually similar and thus can cause confusion
1064 among humans. Such characters are often called "confusable
1065 characters" or "confusables".
1067 The problem of confusable characters is not necessarily caused by the
1068 use of Unicode code points outside the ASCII range. For example, in
1069 some presentations and to some individuals the string "ju1iet"
1070 (spelled with the Arabic numeral one as the third character) might
1071 appear to be the same as "juliet" (spelled with the lowercase version
1072 of the letter "L"), especially on casual visual inspection. This
1073 phenomenon is sometimes called "typejacking".
1075 However, the problem is made more serious by introducing the full
1076 range of Unicode code points into protocol strings. For example, the
1077 characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
1078 Cherokee block look similar to the ASCII characters "STPETER" as they
1079 might look when presented using a "creative" font family.
1081 In some examples of confusable characters, it is unlikely that the
1082 average human could tell the difference between the real string and
1083 the fake string. (Indeed, there is no programmatic way to
1084 distinguish with full certainty which is the fake string and which is
1085 the real string; in some contexts, the string formed of Cherokee
1086 characters might be the real string and the string formed of ASCII
1087 characters might be the fake string.) Because PRECIS-compliant
1088 strings can contain almost any properly-encoded Unicode code point,
1089 it can be relatively easy to fake or mimic some strings in systems
1090 that use the PRECIS framework. The fact that some strings are easily
1091 confused introduces security vulnerabilities of the kind that have
1092 also plagued the World Wide Web, specifically the phenomenon known as
1093 phishing.
1095 Despite the fact that some specific suggestions about identification
1096 and handling of confusable characters appear in the Unicode Security
1097 Considerations [UTR36], it is also true (as noted in [RFC5890]) that
1098 "there are no comprehensive technical solutions to the problems of
1099 confusable characters". Because it is impossible to map visually
1100 similar characters without a great deal of context (such as knowing
1101 the font families used), the PRECIS framework does nothing to map
1102 similar-looking characters together, nor does it prohibit some
1103 characters because they look like others.
1105 Nevertheless, specifications for application protocols that use this
1106 framework MUST describe how confusable characters can be used to
1107 compromise the security of systems that use the protocol in question,
1108 along with any protocol-specific suggestions for overcoming those
1109 threats. In particular, software implementations and service
1110 deployments that use PRECIS-based technologies are strongly
1111 encouraged to define and implement consistent policies regarding the
1112 registration, storage, and presentation of visually similar
1113 characters. The following recommendations are appropriate:
1115 1. An application service SHOULD define a policy that specifies the
1116 scripts or blocks of characters that the service will allow to be
1117 registered (e.g., in an account name) or stored (e.g., in a file
1118 name). Such a policy SHOULD be informed by the languages and
1119 scripts that are used to write registered account names; in
1120 particular, to reduce confusion, the service SHOULD forbid
1121 registration or storage of strings that contain characters from
1122 more than one script and SHOULD restrict registrations to
1123 characters drawn from a very small number of scripts (e.g.,
1124 scripts that are well-understood by the administrators of the
1125 service, to improve manageability).
1126 2. User-oriented application software SHOULD define a policy that
1127 specifies how internationalized strings will be presented to a
1128 human user. Because every human user of such software has a
1129 preferred language or a small set of preferred languages, the
1130 software SHOULD gather that information either explicitly from
1131 the user or implicitly via the operating system of the user's
1132 device. Furthermore, because most languages are typically
1133 represented by a single script or a small set of scripts, and
1134 because most scripts are typically contained in one or more
1135 blocks of characters, the software SHOULD warn the user when
1136 presenting a string that mixes characters from more than one
1137 script or block, or that uses characters outside the normal range
1138 of the user's preferred language(s). (Such a recommendation is
1139 not intended to discourage communication across different
1140 communities of language users; instead, it recognizes the
1141 existence of such communities and encourages due caution when
1142 presenting unfamiliar scripts or characters to human users.)
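The mixed-script warning in the second recommendation could be prototyped as follows. Python's standard library does not expose the Unicode Script property, so this sketch uses the first word of each letter's character name as a rough stand-in. That heuristic is an assumption made for illustration only, not the Script property itself; a production system would want real script data. Function names are hypothetical.

```python
import unicodedata


def script_hints(s: str) -> set:
    """Collect a rough script label for every letter in s.

    Heuristic: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC", "CHEROKEE"). Non-letters are ignored.
    """
    hints = set()
    for ch in s:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                hints.add(name.split()[0])
    return hints


def looks_mixed_script(s: str) -> bool:
    """Return True if the letters in s appear to come from more than one script."""
    return len(script_hints(s)) > 1
```

This sketch would flag a Latin string containing a single Cyrillic letter, but it would not flag "ju1iet", since digits are ignored and all of its letters are Latin; same-script confusables need a different defense.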
1144 9.6. Security of Passwords
1146 Two goals of passwords are to maximize the amount of entropy and to
1147 minimize the potential for false positives. These goals can be
1148 achieved in part by allowing a wide range of code points and by
1149 ensuring that passwords are handled in such a way that code points
1150 are not compared aggressively. Therefore, it is NOT RECOMMENDED for
1151 application protocols to subclass the FreeformClass for use in
1152 passwords in a way that removes entire categories (e.g., by
1153 disallowing symbols or punctuation). Furthermore, it is NOT
1154 RECOMMENDED for application protocols to map uppercase and titlecase
1155 code points to their lowercase equivalents in such strings; instead,
1156 it is RECOMMENDED to preserve the case of all code points contained
1157 in such strings and to compare them in a case-sensitive manner.
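A case-preserving, case-sensitive comparison can be sketched in Python as below. The use of hmac.compare_digest (a constant-time comparison) and the function name are assumptions added for illustration; the sketch compares plaintext for brevity, whereas deployed systems typically compare salted hashes.

```python
import hmac


def passwords_equal(supplied: str, stored: str) -> bool:
    """Compare two passwords code point for code point, with no case mapping."""
    # Encoding to UTF-8 preserves every code point; nothing is folded or mapped.
    return hmac.compare_digest(supplied.encode("utf-8"), stored.encode("utf-8"))
```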
1159 That said, software implementers need to be aware that there exist
1160 tradeoffs between entropy and usability. For example, allowing a
1161 user to establish a password containing "uncommon" code points might
1162 make it difficult for the user to access a service when using an
1163 unfamiliar or constrained input device.
1165 Some application protocols use passwords directly, whereas others
1166 reuse technologies that themselves process passwords (one example of
1167 such a technology is the Simple Authentication and Security Layer
1168 [RFC4422]). Moreover, passwords are often carried by a sequence of
1169 protocols with backend authentication systems or data storage systems
1170 such as RADIUS [RFC2865] and LDAP [RFC4510]. Developers of
1171 application protocols are encouraged to look into reusing these
1172 profiles instead of defining new ones, so that end-user expectations
1173 about passwords are consistent no matter which application protocol
1174 is used.
1176 10. IANA Considerations
1178 10.1. PRECIS Derived Property Value Registry
1180 IANA is requested to create a PRECIS-specific registry with the
1181 Derived Properties for the versions of Unicode that are released
1182 after (and including) version 6.1. The derived property value is to
1183 be calculated in cooperation with a designated expert [RFC5226]
1184 according to the rules specified under Section 6 and Section 7, not
1185 by copying the non-normative table found under Appendix A.
1187 The IESG is to be notified if backward-incompatible changes to the
1188 table of derived properties are discovered or if other problems arise
1189 during the process of creating the table of derived property values
1190 or during expert review. Changes to the rules defined under
1191 Section 6 and Section 7 require IETF Review, as described in
1192 [RFC5226].
1194 10.2. PRECIS Base Classes Registry
1196 IANA is requested to create a registry of PRECIS string classes. In
1197 accordance with [RFC5226], the registration policy is "RFC Required".
1199 The registration template is as follows:
1201 Base Class: [the name of the PRECIS string class]
1202 Description: [a brief description of the PRECIS string class and its
1203 intended use, e.g., "A sequence of letters, numbers, and symbols
1204 that is used to identify or address a network entity."]
1205 Width Mapping: [the behavioral rule for handling of width, e.g.,
1206 "Map fullwidth and halfwidth characters to their decomposition
1207 equivalents."]
1208 Additional Mappings: [any additional mappings that are required or
1209 recommended, e.g., "Map non-ASCII space characters to ASCII
1210 space."; or "Application Specific" if to be defined by protocols
1211 that use the PRECIS string class]
1212 Case Mapping: [the behavioral rule for handling of case, e.g., "Map
1213 uppercase and titlecase characters to lowercase."; or "Application
1214 Specific" if to be defined by protocols that use the PRECIS string
1215 class]
1216 Normalization: [which Unicode normalization form is applied, e.g.,
1217 "NFC"; or "Application Specific" if to be defined by protocols
1218 that use the PRECIS string class]
1219 Directionality: [the behavioral rule for handling of right-to-left
1220 code points, e.g., "The 'Bidi Rule' defined in RFC 5893 applies.";
1221 or "Application Specific" if to be defined by protocols that use
1222 the PRECIS string class]
1223 Specification: [the RFC number]
1225 The initial registrations are as follows:
1227 Base Class: FreeformClass.
1228 Description: A sequence of letters, numbers, symbols, spaces, and
1229 other code points that is used for free-form strings.
1230 Width Mapping: Application Specific.
1231 Additional Mappings: Application Specific.
1232 Case Mapping: Application Specific.
1233 Normalization: Application Specific.
1234 Directionality: Application Specific.
1235 Specification: RFC XXXX. [Note to RFC Editor: please change XXXX to
1236 the number issued for this specification.]
1238 Base Class: IdentifierClass.
1239 Description: A sequence of letters, numbers, and symbols that is
1240 used to identify or address a network entity.
1241 Width Mapping: Application Specific.
1242 Additional Mappings: Application Specific.
1243 Case Mapping: Application Specific.
1244 Normalization: Application Specific.
1245 Directionality: Application Specific.
1246 Specification: RFC XXXX. [Note to RFC Editor: please change XXXX to
1247 the number issued for this specification.]
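For tooling that consumes the registry, the template and the two initial registrations might be modeled as a small record type. The Python sketch below is one possible shape; the class and field names are hypothetical, and "RFC XXXX" is kept as the placeholder used above.

```python
from dataclasses import dataclass

APP_SPECIFIC = "Application Specific"


@dataclass(frozen=True)
class BaseClassRegistration:
    """One entry in the PRECIS Base Classes Registry."""
    base_class: str
    description: str
    specification: str
    width_mapping: str = APP_SPECIFIC
    additional_mappings: str = APP_SPECIFIC
    case_mapping: str = APP_SPECIFIC
    normalization: str = APP_SPECIFIC
    directionality: str = APP_SPECIFIC


INITIAL_REGISTRATIONS = [
    BaseClassRegistration(
        base_class="FreeformClass",
        description=("A sequence of letters, numbers, symbols, spaces, and "
                     "other code points that is used for free-form strings."),
        specification="RFC XXXX",
    ),
    BaseClassRegistration(
        base_class="IdentifierClass",
        description=("A sequence of letters, numbers, and symbols that is "
                     "used to identify or address a network entity."),
        specification="RFC XXXX",
    ),
]
```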
1249 10.3. PRECIS Subclasses Registry
1251 IANA is requested to create a registry of subclasses that use the
1252 PRECIS string classes. In accordance with [RFC5226], the
1253 registration policy is "Expert Review". This policy was chosen in
1254 order to ensure that "customers" of PRECIS receive appropriate
1255 guidance regarding the sometimes complex and subtle
1256 internationalization issues related to subclassing of PRECIS string
1257 classes.
1259 The registration template is as follows:
1261 Subclass: [the name of the subclass]
1262 Base Class: [which PRECIS string class is being subclassed]
1263 Exclusions: [a brief description of the specific code points that
1264 are excluded or of the properties based on which characters are
1265 excluded, e.g., "Eight legacy characters in the ASCII range" or
1266 "Any character that has a compatibility equivalent, i.e., the
1267 HasCompat category"]
1268 Specification: [a pointer to relevant documentation, such as an RFC
1269 or Internet-Draft]
1271 In order to request a review, the registrant shall send a completed
1272 template to the precis@ietf.org list or its designated successor.
1274 Factors to focus on while reviewing subclass registrations include
1275 the following:
1277 o Is the problem well-defined?
1278 o Is it clear what applications will use this subclass?
1279 o Would an existing PRECIS string class or subclass solve the
1280 problem?
1281 o Are the defined exclusions a reasonable solution to the problem
1282 for the relevant applications?
1283 o Is the subclass clearly defined?
1284 o Does the subclass reduce the degree to which human users could be
1285 surprised by application behavior (the "principle of least user
1286 surprise")?
1287 o Is the subclass based on an appropriate dividing line between user
1288 interface (culture, context, intent, locale, device limitations,
1289 etc.) and the use of conformant strings in protocol elements?
1290 o Does the subclass introduce any new security concerns (e.g., false
1291 positives for authentication or authorization)?
1293 10.4. PRECIS Usage Registry
1295 IANA is requested to create a registry of application protocols that
1296 use the PRECIS string classes. The registry will include one entry
1297 for each use (e.g., if a protocol uses both the IdentifierClass and
1298 the FreeformClass then the specification for that protocol would
1299 submit two registrations). In accordance with [RFC5226], the
1300 registration policy is "Expert Review". This policy was chosen in
1301 order to ensure that "customers" of PRECIS receive appropriate
1302 guidance regarding the sometimes complex and subtle
1303 internationalization issues related to use of PRECIS string classes.
1305 The registration template is as follows:
1307 Applicability: [the specific protocol elements to which this usage
1308 applies, e.g., "Localparts in XMPP addresses."]
1309 Base Class: [the PRECIS string class that is being used or
1310 subclassed]
1311 Subclass: [whether the protocol has defined a subclass of the PRECIS
1312 string class and, if so, the name of the subclass, e.g., "Yes,
1313 LocalpartIdentifierClass."]
1314 Replaces: [the Stringprep profile that this PRECIS usage replaces,
1315 if any]
1316 Width Mapping: [the behavioral rule for handling of width, e.g.,
1317 "Map fullwidth and halfwidth characters to their decomposition
1318 equivalents."]
1319 Additional Mappings: [any additional mappings that are required or
1320 recommended, e.g., "Map non-ASCII space characters to ASCII
1321 space."]
1322 Case Mapping: [the behavioral rule for handling of case, e.g., "Map
1323 uppercase and titlecase characters to lowercase."]
1324 Normalization: [which Unicode normalization form is applied, e.g.,
1325 "NFC"]
1326 Directionality: [the behavioral rule for handling of right-to-left
1327 code points, e.g., "The 'Bidi Rule' defined in RFC 5893 applies."]
1328 Specification: [a pointer to relevant documentation, such as an RFC
1329 or Internet-Draft]
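As a non-normative illustration, a completed usage template could be held in a simple record before being sent for review. The following Python sketch is only one possible representation: the class name UsageRegistration, its field names, and the render method are hypothetical and are not defined by this document. Most field values are the examples quoted in the template above; the Base Class value is an assumption, and the Replaces and Specification values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class UsageRegistration:
    """Hypothetical record mirroring the usage registration template."""
    applicability: str
    base_class: str
    subclass: str
    replaces: str
    width_mapping: str
    additional_mappings: str
    case_mapping: str
    normalization: str
    directionality: str
    specification: str

    def render(self) -> str:
        """Format the record as 'Label: value' lines in template order."""
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").title()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)


# Values taken from the template's own examples where available.
example = UsageRegistration(
    applicability="Localparts in XMPP addresses.",
    base_class="IdentifierClass",  # assumption for illustration
    subclass="Yes, LocalpartIdentifierClass.",
    replaces="(placeholder)",
    width_mapping="Map fullwidth and halfwidth characters to their "
                  "decomposition equivalents.",
    additional_mappings="Map non-ASCII space characters to ASCII space.",
    case_mapping="Map uppercase and titlecase characters to lowercase.",
    normalization="NFC",
    directionality="The 'Bidi Rule' defined in RFC 5893 applies.",
    specification="(placeholder)",
)
```

Calling example.render() yields one line per template field, in the same order as the template, which could then be pasted into a message to the review list.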
1331 In order to request a review, the registrant shall send a completed
1332 template to the precis@ietf.org list or its designated successor.
1334 Factors to focus on while reviewing usage registrations include the
1335 following:
1337 o Does the specification define what kinds of applications are
1338 involved and the protocol elements to which this usage applies?
1339 o Is there a PRECIS string class or subclass that would be more
1340 appropriate to use?
1341 o Are the normalization, case mapping, width mapping, additional
1342 mapping, and directionality rules appropriate for the intended
1343 use?
1345 o Does the usage reduce the degree to which human users could be
1346 surprised by application behavior (the "principle of least user
1347 surprise")?
1348 o Is the usage based on an appropriate dividing line between user
1349 interface (culture, context, intent, locale, device limitations,
1350 etc.) and the use of conformant strings in protocol elements?
1351 o Does the usage introduce any new security concerns (e.g., false
1352 positives for authentication or authorization)?
1354 11. Interoperability Considerations
1356 Although strings that are consumed in PRECIS-based application
1357 protocols are often encoded using UTF-8 [RFC3629], the exact encoding
1358 is a matter for the application protocol that reuses PRECIS, not for
1359 the PRECIS framework.
1361 It is known that some existing systems are unable to support the full
1362 Unicode character set, or even any characters outside the ASCII
1363 range. If two (or more) applications need to interoperate when
1364 exchanging data (e.g., for the purpose of authenticating a username
1365 or password), they will naturally need to have in common at least one
1366 coded character set (as defined by [RFC6365]). Establishing such a
1367 baseline is a matter for the application protocol that reuses PRECIS,
1368 not for the PRECIS framework.
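As a non-normative illustration of the point above, an application that must interoperate with a peer limited to the ASCII range might test a string against that baseline before sending it, and otherwise encode it as UTF-8. The Python sketch below is hypothetical: the function names fits_ascii_baseline and encode_for_wire are not defined by this document, and the choice of ASCII as the baseline is an assumption that an application protocol would have to make for itself.

```python
def fits_ascii_baseline(s: str) -> bool:
    """True if every code point of the string is in the ASCII range."""
    return all(ord(ch) < 0x80 for ch in s)


def encode_for_wire(s: str) -> bytes:
    """Encode the string as UTF-8, the encoding often used on the wire."""
    return s.encode("utf-8")
```

A string that passes fits_ascii_baseline produces the same octets in ASCII and in UTF-8, which is why ASCII can serve as a common subset when one side cannot handle anything more.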
1370 12. References
1372 12.1. Normative References
1374 [I-D.ietf-precis-mappings]
1375 Yoneya, Y. and T. NEMOTO, "Mapping characters for precis
1376 classes", draft-ietf-precis-mappings-02 (work in
1377 progress), May 2013.
1379 [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20,
1380 October 1969.
1382 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1383 Requirement Levels", BCP 14, RFC 2119, March 1997.
1385 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
1386 Interchange", RFC 5198, March 2008.
1388 [UNICODE] The Unicode Consortium, "The Unicode Standard, Version
1389 6.2", 2012.
1392 12.2. Informative References
1394 [I-D.ietf-precis-nickname]
1395 Saint-Andre, P., "Preparation and Comparison of
1396 Nicknames", draft-ietf-precis-nickname-06 (work in
1397 progress), July 2013.
1399 [I-D.ietf-precis-saslprepbis]
1400 Saint-Andre, P. and A. Melnikov, "Username and Password
1401 Preparation Algorithms", draft-ietf-precis-saslprepbis-02
1402 (work in progress), April 2013.
1404 [I-D.ietf-xmpp-6122bis]
1405 Saint-Andre, P., "Extensible Messaging and Presence
1406 Protocol (XMPP): Address Format",
1407 draft-ietf-xmpp-6122bis-07 (work in progress), April 2013.
1409 [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson,
1410 "Remote Authentication Dial In User Service (RADIUS)",
1411 RFC 2865, June 2000.
1413 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
1414 Internationalized Strings ("stringprep")", RFC 3454,
1415 December 2002.
1417 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
1418 "Internationalizing Domain Names in Applications (IDNA)",
1419 RFC 3490, March 2003.
1421 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
1422 Profile for Internationalized Domain Names (IDN)",
1423 RFC 3491, March 2003.
1425 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
1426 10646", STD 63, RFC 3629, November 2003.
1428 [RFC4422] Melnikov, A. and K. Zeilenga, "Simple Authentication and
1429 Security Layer (SASL)", RFC 4422, June 2006.
1431 [RFC4510] Zeilenga, K., "Lightweight Directory Access Protocol
1432 (LDAP): Technical Specification Road Map", RFC 4510,
1433 June 2006.
1435 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
1436 Recommendations for Internationalized Domain Names
1437 (IDNs)", RFC 4690, September 2006.
1439 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
1440 IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1441 May 2008.
1443 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
1444 Specifications: ABNF", STD 68, RFC 5234, January 2008.
1446 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
1447 (TLS) Protocol Version 1.2", RFC 5246, August 2008.
1449 [RFC5890] Klensin, J., "Internationalized Domain Names for
1450 Applications (IDNA): Definitions and Document Framework",
1451 RFC 5890, August 2010.
1453 [RFC5891] Klensin, J., "Internationalized Domain Names in
1454 Applications (IDNA): Protocol", RFC 5891, August 2010.
1456 [RFC5892] Faltstrom, P., "The Unicode Code Points and
1457 Internationalized Domain Names for Applications (IDNA)",
1458 RFC 5892, August 2010.
1460 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
1461 Internationalized Domain Names for Applications (IDNA)",
1462 RFC 5893, August 2010.
1464 [RFC5894] Klensin, J., "Internationalized Domain Names for
1465 Applications (IDNA): Background, Explanation, and
1466 Rationale", RFC 5894, August 2010.
1468 [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for
1469 Internationalized Domain Names in Applications (IDNA)
1470 2008", RFC 5895, September 2010.
1472 [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in
1473 Internationalization in the IETF", BCP 166, RFC 6365,
1474 September 2011.
1476 [RFC6885] Blanchet, M. and A. Sullivan, "Stringprep Revision and
1477 Problem Statement for the Preparation and Comparison of
1478 Internationalized Strings (PRECIS)", RFC 6885, March 2013.
1480 [RFC6943] Thaler, D., "Issues in Identifier Comparison for Security
1481 Purposes", RFC 6943, May 2013.
1483 [UAX9] The Unicode Consortium, "Unicode Standard Annex #9:
1484 Unicode Bidirectional Algorithm", September 2012.
1487 [UAX11] The Unicode Consortium, "Unicode Standard Annex #11: East
1488 Asian Width", September 2012.
1491 [UAX15] The Unicode Consortium, "Unicode Standard Annex #15:
1492 Unicode Normalization Forms", August 2012.
1495 [UTR36] The Unicode Consortium, "Unicode Technical Report #36:
1496 Unicode Security Considerations", July 2012.
1499 [UTR39] The Unicode Consortium, "Unicode Technical Report #39:
1500 Unicode Security Mechanisms", July 2012.
1509 Appendix A. Codepoint Table
1511 WARNING: The following table is provisional and is still being
1512 verified!
1514 If one applies the property calculation rules from Section 7 to the
1515 code points 0x0000 to 0x10FFFF in Unicode 6.2, the result is as shown
1516 in the following table, in Unicode Character Database (UCD) format.
1517 The columns of the table are as follows:
1519 1. The code point or code point range.
1520 2. The assignment for the code point or range, where the value is
1521 one of PVALID, DISALLOWED, UNASSIGNED, CONTEXTO, CONTEXTJ, or
1522 FREE_PVAL (which includes ID_DIS).
1523 3. The name or names for the code point or range.
1525 This table is non-normative, is included only for illustrative
1526 purposes, and applies only to Unicode 6.2, not to past or future
1527 versions of Unicode. Please note that the strings displayed in the
1528 third column are not necessarily the formal name of the code point
1529 (as defined in [UNICODE]) because the fixed width of the RFC format
1530 necessitated truncation of many names.
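As a non-normative illustration, rows in the three-column format just described can be parsed mechanically. The Python sketch below is a minimal reader, not a definitive implementation: the names TableRow, parse_row, and lookup are hypothetical, and the regular expression assumes that each row consists of a hexadecimal code point or a "first..last" range, a semicolon, the assignment, a number sign, and an optional name.

```python
import re
from typing import List, NamedTuple, Optional


class TableRow(NamedTuple):
    first: int   # first code point of the range
    last: int    # last code point (equal to first for a single code point)
    value: str   # assignment, e.g. PVALID or DISALLOWED
    name: str    # possibly truncated name text (may be empty)


_ROW = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?"
    r"\s*;\s*(\w+)\s*#\s*(.*)$"
)


def parse_row(line: str) -> Optional[TableRow]:
    """Parse one table row; return None if the row is malformed."""
    m = _ROW.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return TableRow(first, last, m.group(3), m.group(4).strip())


def lookup(rows: List[TableRow], code_point: int) -> Optional[str]:
    """Return the assignment for a code point, or None if no row covers it."""
    for row in rows:
        if row.first <= code_point <= row.last:
            return row.value
    return None


sample = [parse_row(line) for line in (
    "0020 ; FREE_PVAL # SPACE",
    "0021..007E ; PVALID # EXCLAM MARK .. TILDE",
    "00AD ; DISALLOWED # SOFT HYPH",
)]
```

Because parse_row returns None for any row that does not match the expected shape, a reader of this kind can also be used to spot typographical errors in a provisional table, such as a range written with a single dot.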
1532 0000..001F ; DISALLOWED #
1533 0020 ; FREE_PVAL # SPACE
1534 0021..007E ; PVALID # EXCLAM MARK .. TILDE
1535 007F..009F ; DISALLOWED #
1536 00A0..00AC ; FREE_PVAL # NO-BREAK SPACE .. NOT SIGN
1537 00AD ; DISALLOWED # SOFT HYPH
1538 00AE..00B6 ; FREE_PVAL # REGISTERED SIGN .. PILCROW SIGN
1539 00B7 ; CONTEXTO # MIDDLE DOT
1540 00B8..00BF ; FREE_PVAL # CEDILLA..INV QUEST IND
1541 00C0..00D6 ; PVALID # LAT CAP LET A W GRAV..LAT CAP O
1542 00D7 ; FREE_PVAL # MULTIPLICATION SIGN
1543 00D8..00F6 ; PVALID # LAT CAP LET O W STROKE..LAT SM
1544 00F7 ; FREE_PVAL # DIVISION SIGN
1545 00F8..0131 ; PVALID # LAT SM LET O W STROKE..LAT SM LET
1546 0132..0133 ; FREE_PVAL # LAT CAP LIG IJ..LAT SM LIG IJ
1547 0134..013E ; PVALID # LAT CAP LET J W CIRCUM..LAT SM LET
1548 013F..0140 ; FREE_PVAL # LAT CAP LET L W MID DOT..LAT SM LET
1549 0141..0148 ; PVALID # LAT CAP LET L W STROKE..LAT SM LET
1550 0149 ; FREE_PVAL # LAT SM LET N PRECEDED BY APOS
1551 014A..017E ; PVALID # LAT CAP LET ENG..LAT SM LET Z W CA
1552 017F ; FREE_PVAL # LAT SM LET LONG S
1553 0180..01C3 ; PVALID # LAT SM LET B W STROKE..LAT LET RETR
1554 01C4..01CC ; FREE_PVAL # LAT CAP LET DZ W CARON..LAT SM
1555 01CD..01F0 ; PVALID # LAT CAP LET A W CARON..LAT SM LET J
1556 01F1..01F3 ; FREE_PVAL # LAT CAP LET DZ..LAT SM LET DZ
1557 01F4..02AF ; PVALID # LAT CAP LET G W ACUTE..LAT SM
1558 02B0..02B8 ; FREE_PVAL # MOD LET SM H..MOD LET SM Y
1559 02B9..02C1 ; PVALID # MOD LET PRIME..MOD LET REV GLOT ST
1560 02C2..02C5 ; FREE_PVAL # MOD LET L ARROW..MOD LET D ARROW
1561 02C6..02D1 ; PVALID # MOD LET CIRCUM ACC..MOD LET HALF TR
1562 02D2..02EB ; FREE_PVAL # MOD LET CENT R HALF RING..MOD LET Y
1563 02EC ; PVALID # MOD LET VOICING
1564 02ED ; FREE_PVAL # MOD LET UNASPIRATED
1565 02EE ; PVALID # MOD LET DOUBLE APOS
1566 02EF..02FF ; FREE_PVAL # MOD LET LOW D ARR..MOD LET LOW L AR
1567 0300..034E ; PVALID # COMB GRAVE ACCENT..COMB UP ARROW BE
1568 034F ; DISALLOWED # COMB GRAPHEME JOINER
1569 0350..0374 ; PVALID # COMB RIGHT ARROWHEAD..GREEK NUM SIG
1570 0375 ; CONTEXTO # GREEK LOW NUM SIGN
1571 0376..0377 ; PVALID # GR CAP LET PAMPHYLIAN DIGAMMA..GR S
1572 0378..0379 ; UNASSIGNED # ..
1573 037A ; FREE_PVAL # GREEK YPOGEGRAMMENI
1574 037B..037D ; PVALID # GR SM REV LUN SIG..GR SM REV DOT LU
1575 037E ; FREE_PVAL # GREEK QUEST MARK
1576 037F..0383 ; UNASSIGNED # ..
1577 0384..0385 ; FREE_PVAL # GREEK TONOS..GREEK DIALYTIKA TONOS
1578 0386 ; PVALID # GR CAP LET ALPHA W TONOS
1579 0387 ; FREE_PVAL # GREEK ANO TELEIA
1580 0388..038A ; PVALID # GR CAP LET EPSILON W TONOS..GR CAP
1581 038B ; UNASSIGNED #
1582 038C ; PVALID # GREEK CAP LET OMICRON W TONOS
1583 038D ; UNASSIGNED #
1584 038E..03A1 ; PVALID # GR CAP LET UPSILON W TONOS..GR CAP
1585 03A2 ; UNASSIGNED #
1586 03A3..03CF ; PVALID # GREEK CAP LET SIGMA..GR CAP
1587 03D0..03D2 ; FREE_PVAL # GR BETA SYM..GR UPSILON W HOOK
1588 03D3..03D4 ; PVALID # GR UPSILON W ACUTE AND HOOK..GR UP
1589 03D5..03D6 ; FREE_PVAL # GR PHI SYM..GR PI SYM
1590 03D7..03EF ; PVALID # GR KAI SYM..COPT SM LET DEI
1591 03F0..03F2 ; FREE_PVAL # GR KAPPA SYM..GR LUNATE SIGMA
1592 03F3 ; PVALID # GREEK LET YOT
1593 03F4..03F6 ; FREE_PVAL # GR CAP THETA..GR REV LUNATE EPSILON
1594 03F7..03F8 ; PVALID # GR CAP LET SHO..GR SM LET SHO
1595 03F9 ; FREE_PVAL # GREEK CAP LUNATE SIGMA SYM
1596 03FA..0481 ; PVALID # GR CAP LET SAN..CYR SML LET KOPPA
1597 0482 ; FREE_PVAL # CYR THOUSANDS SIGN
1598 0483..0487 ; PVALID # COMB CYR TITLO..COMB CYR POK
1599 0488..0489 ; FREE_PVAL # COMB CYR HUNDRED THOUSANDS SIGN..C
1600 048A..0527 ; PVALID # CYR CAP LET SH I W TAIL..CYR S
1601 0528..0530 ; UNASSIGNED # ..
1602 0531..0556 ; PVALID # ARM CAP LET AYB..ARM CAP LET FEH
1603 0557..0558 ; UNASSIGNED # ..
1604 0559 ; PVALID # ARM MOD LET LEFT HALF RING
1605 055A..055F ; FREE_PVAL # ARM APOS..ARM ABBREV
1606 0560 ; UNASSIGNED #
1607 0561..0586 ; PVALID # ARM SM LET AYB..ARMENIAN SM LE
1608 0587 ; FREE_PVAL # ARM SM LIG ECH YIWN
1609 0588 ; UNASSIGNED #
1610 0589..058A ; FREE_PVAL # ARMENIAN FULL STOP..ARMENIAN HYPH
1611 058B..058E ; UNASSIGNED # ..
1612 058F ; FREE_PVAL # ARMENIAN DRAM SIGN
1613 0590 ; UNASSIGNED #
1614 0591..05BD ; PVALID # HEBR ACC ETNAHTA..HEBR PNT ME
1615 05BE ; FREE_PVAL # HEBR PUNCT MAQAF
1616 05BF ; PVALID # HEBR PNT RAFE
1617 05C0 ; FREE_PVAL # HEBR PUNCT PASEQ
1618 05C1..05C2 ; PVALID # HEBR PNT SHIN DOT..HEBR PNT SIN DOT
1619 05C3 ; FREE_PVAL # HEBR PUNCT SOF PASUQ
1620 05C4..05C5 ; PVALID # HEBR MARK UP DOT..HEBR MARK LOW DOT
1621 05C6 ; FREE_PVAL # HEBR PUNCT NUN HAFUKHA
1622 05C7 ; PVALID # HEBR PNT QAMATS QATAN
1623 05C8..05CF ; UNASSIGNED # ..
1624 05D0..05EA ; PVALID # HEBR LET ALEF..HEBR LET TAV
1625 05EB..05EF ; UNASSIGNED # ..
1626 05F0..05F2 ; PVALID # HEBR LIG YIDDISH DOUBLE VAV..HEBR L
1627 05F3..05F4 ; CONTEXTO # HEBR PUNCT GERESH..HEBR PUNCTUATIO
1628 05F5..05FF ; UNASSIGNED # ..
1629 0600..0604 ; DISALLOWED # ARAB NUM SIGN..ARAB SIGN SAM
1630 0605 ; UNASSIGNED #
1631 0606..060F ; FREE_PVAL # AR-IND CUBE ROOT..ARAB SIGN MISRA
1632 0610..061A ; PVALID # ARAB SIGN SALLALLAHOU ALAYHE ..AR
1633 061B ; FREE_PVAL # ARAB SEMICOLON
1634 061C..061D ; UNASSIGNED # ..
1635 061E..061F ; FREE_PVAL # ARAB TRIPLE DOT PUNCT MARK..ARAB Q
1636 0620..063F ; PVALID # ARAB LET KASH..ARAB LET FARSI YEH
1637 0640 ; DISALLOWED # ARAB TATWEEL
1638 0641..065F ; PVALID # ARAB LET FEH..ARAB WAVY HAMZA BEL
1639 0660..0669 ; CONTEXTO # AR-IND DIG ZERO..AR-IND DIG
1640 066A..066D ; FREE_PVAL # ARAB PCT SIGN..ARAB FIVE PNTED STA
1641 066E..0674 ; PVALID # ARAB LET DOTLESS BEH..ARAB LET HIG
1642 0675..0678 ; FREE_PVAL # ARAB LET HIGH HAMZA ALEF..ARAB LET
1643 0679..06D3 ; PVALID # ARAB LET TTEH..ARAB LET YEH BARREE
1644 06D4 ; FREE_PVAL # ARAB FULL STOP
1645 06D5..06DC ; PVALID # ARAB LET AE..ARAB SM HIGH SEEN
1646 06DD ; DISALLOWED # ARAB END OF AYAH
1647 06DE ; FREE_PVAL # ARAB START OF RUB EL HIZB
1648 06DF..06E8 ; PVALID # ARAB SM HIGH ROUNDED ZERO..ARAB SM
1649 06E9 ; FREE_PVAL # ARAB PLACE OF SAJDAH
1650 06EA..06EF ; PVALID # ARAB EMPTY CENTRE LOW STOP..ARAB LET
1651 06F0..06F9 ; CONTEXTO # EXT AR-IND DIG ZERO..EXT A
1652 06FA..06FF ; PVALID # ARAB LET SHEEN W DOT BEL..ARAB
1653 0700..070D ; FREE_PVAL # SYR END OF PARA..SYR HARKLEAN AST
1654 070E ; UNASSIGNED #
1655 070F ; DISALLOWED # SYR ABBR MARK
1656 0710..074A ; PVALID # SYR LET ALAPH..SYR BARREKH
1657 074B..074C ; UNASSIGNED # ..
1658 074D..07B1 ; PVALID # SYR LET SOGDIAN ZHAIN..THAANA LET N
1659 07B2..07BF ; UNASSIGNED # ..
1660 07C0..07F5 ; PVALID # NKO DIG ZERO..NKO LOW TONE APOS
1661 07F6..07F9 ; FREE_PVAL # NKO SYM OO DENNEN..NKO EXCLAMATI
1662 07FA ; DISALLOWED # NKO LAJANYALAN
1663 07FB..07FF ; UNASSIGNED # ..
1664 0800..082D ; PVALID # SAMAR LET ALAF..SAMAR MARK NEQUDA
1665 082E..082F ; UNASSIGNED # ..
1666 0830..083E ; FREE_PVAL # SAMAR PUNCT NEQUDAA..SAMAR PUN
1667 083F ; UNASSIGNED #
1668 0840..085B ; PVALID # MANDAIC LET HALQA..MANDAIC GEM
1669 085C..085D ; UNASSIGNED # ..
1670 085E ; FREE_PVAL # MANDAIC PUNCTUATION
1671 085F..089F ; UNASSIGNED # ..
1672 08A0 ; PVALID # ARAB LET BEH W SM V BEL
1673 08A1 ; UNASSIGNED #
1674 08A2..08AC ; PVALID # ARAB LET JEEM W 2 DOTS AB..ARAB
1675 08AD..08E3 ; UNASSIGNED # ..
1676 08E4..08FE ; PVALID # ARAB CURLY FATHA..ARAB DAMMA W
1677 08FF ; UNASSIGNED #
1678 0900..0963 ; PVALID # DEVAN SIGN INV CANDRABINDU..DEVAN V
1679 0964..0965 ; FREE_PVAL # DEVAN DANDA..DEVAN DOUBLE DANDA
1680 0966..096F ; PVALID # DEVAN DIG ZERO..DEVAN DIG NINE
1681 0970 ; FREE_PVAL # DEVAN ABBR SIGN
1682 0971..0977 ; PVALID # DEVAN SIGN HIGH SPACING DOT..DEVAN
1683 0978 ; UNASSIGNED #
1684 0979..097F ; PVALID # DEVAN LET ZHA..DEVAN LET BBA
1685 0980 ; UNASSIGNED #
1686 0981..0983 ; PVALID # BENG SIGN CANDRABINDU..BENG SIGN VIS
1687 0984 ; UNASSIGNED #
1688 0985..098C ; PVALID # BENG LET A..BENG LET VOC L
1689 098D..098E ; UNASSIGNED # ..
1690 098F..0990 ; PVALID # BENG LET E..BENG LET AI
1691 0991..0992 ; UNASSIGNED # ..
1692 0993..09A8 ; PVALID # BENG LET O..BENG LET NA
1693 09A9 ; UNASSIGNED #
1694 09AA..09B0 ; PVALID # BENG LET PA..BENG LET RA
1695 09B1 ; UNASSIGNED #
1696 09B2 ; PVALID # BENG LET LA
1697 09B3..09B5 ; UNASSIGNED # ..
1698 09B6..09B9 ; PVALID # BENG LET SHA..BENG LET HA
1699 09BA..09BB ; UNASSIGNED # ..
1700 09BC..09C4 ; PVALID # BENG SIGN NUKTA..BENG VOW SIGN VOCAL
1701 09C5..09C6 ; UNASSIGNED # ..
1702 09C7..09C8 ; PVALID # BENG VOW SIGN E..BENG VOW SIGN AI
1703 09C9..09CA ; UNASSIGNED # ..
1704 09CB..09CE ; PVALID # BENG VOW SIGN O..BENG LET KHANDA
1705 09CF..09D6 ; UNASSIGNED # ..
1706 09D7 ; PVALID # BENG AU LEN MARK
1707 09D8..09DB ; UNASSIGNED # ..
1708 09DC..09DD ; PVALID # BENG LET RRA..BENG LET RHA
1709 09DE ; UNASSIGNED #
1710 09DF..09E3 ; PVALID # BENG LET YYA..BENG VOW SIG
1711 09E4..09E5 ; UNASSIGNED # ..
1712 09E6..09F1 ; PVALID # BENG DIG ZERO..BENG LET RA W L
1713 09F2..09FB ; FREE_PVAL # BENG RUPEE MARK..BENG GANDA MARK
1714 09FC..0A00 ; UNASSIGNED # ..
1715 0A01..0A03 ; PVALID # GURMUKHI SIGN ADAK BINDI..GURMUKHI
1716 0A04 ; UNASSIGNED #
1717 0A05..0A0A ; PVALID # GURMUKHI LET A..GURMUKHI LET UU
1718 0A0B..0A0E ; UNASSIGNED # ..
1719 0A0F..0A10 ; PVALID # GURMUKHI LET EE..GURMUKHI LET AI
1720 0A11..0A12 ; UNASSIGNED # ..
1721 0A13..0A28 ; PVALID # GURMUKHI LET OO..GURMUKHI LET NA
1722 0A29 ; UNASSIGNED #
1723 0A2A..0A30 ; PVALID # GURMUKHI LET PA..GURMUKHI LET RA
1724 0A31 ; UNASSIGNED #
1725 0A32..0A33 ; PVALID # GURMUKHI LET LA..GURMUKHI LET LLA
1726 0A34 ; UNASSIGNED #
1727 0A35..0A36 ; PVALID # GURMUKHI LET VA..GURMUKHI LET SHA
1728 0A37 ; UNASSIGNED #
1729 0A38..0A39 ; PVALID # GURMUKHI LET SA..GURMUKHI LET HA
1730 0A3A..0A3B ; UNASSIGNED # ..
1731 0A3C ; PVALID # GURMUKHI SIGN NUKTA
1732 0A3D ; UNASSIGNED #
1733 0A3E..0A42 ; PVALID # GURMUKHI VOW SIGN AA..GURMUKHI V
1734 0A43..0A46 ; UNASSIGNED # ..
1735 0A47..0A48 ; PVALID # GURMUKHI VOW SIGN EE..GURMUKHI V
1736 0A49..0A4A ; UNASSIGNED # ..
1737 0A4B..0A4D ; PVALID # GURMUKHI VOW SIGN OO..GURMUKHI S
1738 0A4E..0A50 ; UNASSIGNED # ..
1739 0A51 ; PVALID # GURMUKHI SIGN UDAAT
1740 0A52..0A58 ; UNASSIGNED # ..
1741 0A59..0A5C ; PVALID # GURMUKHI LET KHHA..GURMUKHI LET RRA
1742 0A5D ; UNASSIGNED #
1743 0A5E ; PVALID # GURMUKHI LET FA
1744 0A5F..0A65 ; UNASSIGNED # ..
1745 0A66..0A75 ; PVALID # GURMUKHI DIG ZERO..GURMUKHI SIGN YA
1746 0A76..0A80 ; UNASSIGNED # ..
1747 0A81..0A83 ; PVALID # GUJARATI SIGN CANDRABINDU..GUJARATI
1748 0A84 ; UNASSIGNED #
1749 0A85..0A8D ; PVALID # GUJARATI LET A..GUJARATI VOW CAND
1750 0A8E ; UNASSIGNED #
1751 0A8F..0A91 ; PVALID # GUJARATI LET E..GUJARATI VOW CAND
1752 0A92 ; UNASSIGNED #
1753 0A93..0AA8 ; PVALID # GUJARATI LET O..GUJARATI LET NA
1754 0AA9 ; UNASSIGNED #
1755 0AAA..0AB0 ; PVALID # GUJARATI LET PA..GUJARATI LET RA
1756 0AB1 ; UNASSIGNED #
1757 0AB2..0AB3 ; PVALID # GUJARATI LET LA..GUJARATI LET LLA
1758 0AB4 ; UNASSIGNED #
1759 0AB5..0AB9 ; PVALID # GUJARATI LET VA..GUJARATI LET HA
1760 0ABA..0ABB ; UNASSIGNED # ..
1761 0ABC..0AC5 ; PVALID # GUJARATI SIGN NUKTA..GUJARATI VOW
1762 0AC6 ; UNASSIGNED #
1763 0AC7..0AC9 ; PVALID # GUJARATI VOW SIGN E..GUJARATI VOW
1764 0ACA ; UNASSIGNED #
1765 0ACB..0ACD ; PVALID # GUJARATI VOW SIGN O..GUJARATI SIG
1766 0ACE..0ACF ; UNASSIGNED # ..
1767 0AD0 ; PVALID # GUJARATI OM
1768 0AD1..0ADF ; UNASSIGNED # ..
1769 0AE0..0AE3 ; PVALID # GUJARATI LET VOC RR..GUJARATI V
1770 0AE4..0AE5 ; UNASSIGNED # ..
1771 0AE6..0AEF ; PVALID # GUJARATI DIG ZERO..GUJARATI DIG NINE
1772 0AF0..0AF1 ; FREE_PVAL # GUJARATI ABBR SIGN..GUJARATI RUPEE S
1773 0AF2..0B00 ; UNASSIGNED # ..
1774 0B01..0B03 ; PVALID # ORIYA SIGN CANDRABINDU..ORIYA SIGN V
1775 0B04 ; UNASSIGNED #
1776 0B05..0B0C ; PVALID # ORIYA LET A..ORIYA LET VOC L
1777 0B0D..0B0E ; UNASSIGNED # ..
1778 0B0F..0B10 ; PVALID # ORIYA LET E..ORIYA LET AI
1779 0B11..0B12 ; UNASSIGNED # ..
1780 0B13..0B28 ; PVALID # ORIYA LET O..ORIYA LET NA
1781 0B29 ; UNASSIGNED #
1782 0B2A..0B30 ; PVALID # ORIYA LET PA..ORIYA LET RA
1783 0B31 ; UNASSIGNED #
1784 0B32..0B33 ; PVALID # ORIYA LET LA..ORIYA LET LLA
1785 0B34 ; UNASSIGNED #
1786 0B35..0B39 ; PVALID # ORIYA LET VA..ORIYA LET HA
1787 0B3A..0B3B ; UNASSIGNED # ..
1788 0B3C..0B44 ; PVALID # ORIYA SIGN NUKTA..ORIYA VOW SIGN
1789 0B45..0B46 ; UNASSIGNED # ..
1790 0B47..0B48 ; PVALID # ORIYA VOW SIGN E..ORIYA VOW SIG
1791 0B49..0B4A ; UNASSIGNED # ..
1792 0B4B..0B4D ; PVALID # ORIYA VOW SIGN O..ORIYA SIGN VIRA
1793 0B4E..0B55 ; UNASSIGNED # ..
1794 0B56..0B57 ; PVALID # ORIYA AI LEN MARK..ORIYA AU LENG
1795 0B58..0B5B ; UNASSIGNED # ..
1796 0B5C..0B5D ; PVALID # ORIYA LET RRA..ORIYA LET RHA
1797 0B5E ; UNASSIGNED #
1798 0B5F..0B63 ; PVALID # ORIYA LET YYA..ORIYA VOW SIGN VOCA
1799 0B64..0B65 ; UNASSIGNED # ..
1800 0B66..0B6F ; PVALID # ORIYA DIG ZERO..ORIYA DIG NINE
1801 0B70 ; FREE_PVAL # ORIYA ISSHAR
1802 0B71 ; PVALID # ORIYA LET WA
1803 0B72..0B77 ; FREE_PVAL # ORIYA FRACT ONE QUART..ORIYA FRACT
1804 0B78..0B81 ; UNASSIGNED # ..
1805 0B82..0B83 ; PVALID # TAMIL SIGN ANUSVARA..TAMIL SIGN VIS
1806 0B84 ; UNASSIGNED #
1807 0B85..0B8A ; PVALID # TAMIL LET A..TAMIL LET UU
1808 0B8B..0B8D ; UNASSIGNED # ..
1809 0B8E..0B90 ; PVALID # TAMIL LET E..TAMIL LET AI
1810 0B91 ; UNASSIGNED #
1811 0B92..0B95 ; PVALID # TAMIL LET O..TAMIL LET KA
1812 0B96..0B98 ; UNASSIGNED # ..
1813 0B99..0B9A ; PVALID # TAMIL LET NGA..TAMIL LET CA
1814 0B9B ; UNASSIGNED #
1815 0B9C ; PVALID # TAMIL LET JA
1816 0B9D ; UNASSIGNED #
1817 0B9E..0B9F ; PVALID # TAMIL LET NYA..TAMIL LET TTA
1818 0BA0..0BA2 ; UNASSIGNED # ..
1819 0BA3..0BA4 ; PVALID # TAMIL LET NNA..TAMIL LET TA
1820 0BA5..0BA7 ; UNASSIGNED # ..
1821 0BA8..0BAA ; PVALID # TAMIL LET NA..TAMIL LET PA
1822 0BAB..0BAD ; UNASSIGNED # ..
1823 0BAE..0BB9 ; PVALID # TAMIL LET MA..TAMIL LET HA
1824 0BBA..0BBD ; UNASSIGNED # ..
1825 0BBE..0BC2 ; PVALID # TAMIL VOW SIGN AA..TAMIL VOW SI
1826 0BC3..0BC5 ; UNASSIGNED # ..
1827 0BC6..0BC8 ; PVALID # TAMIL VOW SIGN E..TAMIL VOW SIG
1828 0BC9 ; UNASSIGNED #
1829 0BCA..0BCD ; PVALID # TAMIL VOW SIGN O..TAMIL SIGN VIRA
1830 0BCE..0BCF ; UNASSIGNED # ..
1831 0BD0 ; PVALID # TAMIL OM
1832 0BD1..0BD6 ; UNASSIGNED # ..
1833 0BD7 ; PVALID # TAMIL AU LEN MARK
1834 0BD8..0BE5 ; UNASSIGNED # ..
1835 0BE6..0BEF ; PVALID # TAMIL DIG ZERO..TAMIL DIG NINE
1836 0BF0..0BFA ; FREE_PVAL # TAMIL NUM TEN..TAMIL NUM SIGN
1837 0BFB..0C00 ; UNASSIGNED # ..
1838 0C01..0C03 ; PVALID # TELUGU SIGN CANDRABINDU..TELUGU SIG
1839 0C04 ; UNASSIGNED #
1840 0C05..0C0C ; PVALID # TELUGU LET A..TELUGU LET VOC L
1841 0C0D ; UNASSIGNED #
1842 0C0E..0C10 ; PVALID # TELUGU LET E..TELUGU LET AI
1843 0C11 ; UNASSIGNED #
1844 0C12..0C28 ; PVALID # TELUGU LET O..TELUGU LET NA
1845 0C29 ; UNASSIGNED #
1846 0C2A..0C33 ; PVALID # TELUGU LET PA..TELUGU LET LLA
1847 0C34 ; UNASSIGNED #
1848 0C35..0C39 ; PVALID # TELUGU LET VA..TELUGU LET HA
1849 0C3A..0C3C ; UNASSIGNED # ..
1850 0C3D..0C44 ; PVALID # TELUGU SIGN AVAGRAHA..TELUGU VOW SI
1851 0C45 ; UNASSIGNED #
1852 0C46..0C48 ; PVALID # TELUGU VOW SIGN E..TELUGU VOW SIGN
1853 0C49 ; UNASSIGNED #
1854 0C4A..0C4D ; PVALID # TELUGU VOW SIGN O..TELUGU SIGN VIRA
1855 0C4E..0C54 ; UNASSIGNED # ..
1856 0C55..0C56 ; PVALID # TELUGU LEN MARK..TELUGU AI LEN MARK
1857 0C57 ; UNASSIGNED #
1858 0C58..0C59 ; PVALID # TELUGU LET TSA..TELUGU LET DZA
1859 0C5A..0C5F ; UNASSIGNED # ..
1860 0C60..0C63 ; PVALID # TELUGU LET VOC RR..TELUGU VOW S
1861 0C64..0C65 ; UNASSIGNED # ..
1862 0C66..0C6F ; PVALID # TELUGU DIG ZERO..TELUGU DIG NINE
1863 0C70..0C77 ; UNASSIGNED # ..
1864 0C78..0C7F ; FREE_PVAL # TELUGU FRACTION DIG ZERO..TELUGU S
1865 0C80..0C81 ; UNASSIGNED # ..
1866 0C82..0C83 ; PVALID # KANNADA SIGN ANUSVARA..KANNADA SIGN
1867 0C84 ; UNASSIGNED #
1868 0C85..0C8C ; PVALID # KANNADA LET A..KANNADA LET VOC L
1869 0C8D ; UNASSIGNED #
1870 0C8E..0C90 ; PVALID # KANNADA LET E..KANNADA LET AI
1871 0C91 ; UNASSIGNED #
1872 0C92..0CA8 ; PVALID # KANNADA LET O..KANNADA LET NA
1873 0CA9 ; UNASSIGNED #
1874 0CAA..0CB3 ; PVALID # KANNADA LET PA..KANNADA LET LLA
1875 0CB4 ; UNASSIGNED #
1876 0CB5..0CB9 ; PVALID # KANNADA LET VA..KANNADA LET HA
1877 0CBA..0CBB ; UNASSIGNED # ..
1878 0CBC..0CC4 ; PVALID # KANNADA SIGN NUKTA..KANNADA VOW SIG
1879 0CC5 ; UNASSIGNED #
1880 0CC6..0CC8 ; PVALID # KANNADA VOW SIGN E..KANNADA VOW SIG
1881 0CC9 ; UNASSIGNED #
1882 0CCA..0CCD ; PVALID # KANNADA VOW SIGN O..KANNADA SIGN VI
1883 0CCE..0CD4 ; UNASSIGNED # ..
1884 0CD5..0CD6 ; PVALID # KANNADA LEN MARK..KANNADA AI LEN MA
1885 0CD7..0CDD ; UNASSIGNED # ..
1886 0CDE ; PVALID # KANNADA LET FA
1887 0CDF ; UNASSIGNED #
1888 0CE0..0CE3 ; PVALID # KANNADA LET VOC RR..KANNADA VOW SIG
1889 0CE4..0CE5 ; UNASSIGNED # ..
1890 0CE6..0CEF ; PVALID # KANNADA DIG ZERO..KANNADA DIG NINE
1891 0CF0 ; UNASSIGNED #
1892 0CF1..0CF2 ; DISALLOWED # KANNADA SIGN JIHVAMULIYA..KANNADA S
1893 0CF3..0D01 ; UNASSIGNED # ..
1894 0D02..0D03 ; PVALID # MALAY SIGN ANUSVARA..MALAY SIGN VIS
1895 0D04 ; UNASSIGNED #
1896 0D05..0D0C ; PVALID # MALAY LET A..MALAY LET VOC
1897 0D0D ; UNASSIGNED #
1898 0D0E..0D10 ; PVALID # MALAY LET E..MALAY LET AI
1899 0D11 ; UNASSIGNED #
1900 0D12..0D3A ; PVALID # MALAY LET O..MALAY LET TTTA
1901 0D3B..0D3C ; UNASSIGNED # ..
1902 0D3D..0D44 ; PVALID # MALAY SIGN AVAGRAHA..MALAY VOW SIG
1903 0D45 ; UNASSIGNED #
1904 0D46..0D48 ; PVALID # MALAY VOW SIGN E..MALAY VOW SIGN
1905 0D49 ; UNASSIGNED #
1906 0D4A..0D4E ; PVALID # MALAY VOW SIGN O..MALAY LET DOT REP
1907 0D4F..0D56 ; UNASSIGNED # ..
1908 0D57 ; PVALID # MALAY AU LEN MARK
1909 0D58..0D5F ; UNASSIGNED # ..
1910 0D60..0D63 ; PVALID # MALAY LET VOC RR..MALAY VOW
1911 0D64..0D65 ; UNASSIGNED # ..
1912 0D66..0D6F ; PVALID # MALAY DIG ZERO..MALAY DIG NINE
1913 0D70..0D75 ; FREE_PVAL # MALAY NUM TEN..MALAY FRACTION THR
1914 0D76..0D78 ; UNASSIGNED # ..
1915 0D79 ; FREE_PVAL # MALAY DATE MARK
1916 0D7A..0D7F ; PVALID # MALAY LET CHILLU NN..MALAY LET
1917 0D80..0D81 ; UNASSIGNED # ..
1918 0D82..0D83 ; PVALID # SINH SIGN ANUSVARAYA..SINH SIGN VIS
1919 0D84 ; UNASSIGNED #
1920 0D85..0D96 ; PVALID # SINH LET AYANNA..SINH LET AUYANN
1921 0D97..0D99 ; UNASSIGNED # ..
1922 0D9A..0DB1 ; PVALID # SINH LET ALPAPRAANA KAYANNA..SINH L
1923 0DB2 ; UNASSIGNED #
1924 0DB3..0DBB ; PVALID # SINH LET SANYAKA DAYANNA..SINH LETT
1925 0DBC ; UNASSIGNED #
1926 0DBD ; PVALID # SINH LET DANTAJA LAYANNA
1927 0DBE..0DBF ; UNASSIGNED # ..
1928 0DC0..0DC6 ; PVALID # SINH LET VAYANNA..SINH LET FAYAN
1929 0DC7..0DC9 ; UNASSIGNED # ..
1930 0DCA ; PVALID # SINH SIGN AL-LAKUNA
1931 0DCB..0DCE ; UNASSIGNED # ..
1932 0DCF..0DD4 ; PVALID # SINH VOW SIGN AELA-PILLA..SINH VOW
1933 0DD5 ; UNASSIGNED #
1934 0DD6 ; PVALID # SINH VOW SIGN DIGA PAA-PILLA
1935 0DD7 ; UNASSIGNED #
1936 0DD8..0DDF ; PVALID # SINH VOW SIGN GAETTA-PILLA..SINH VO
1937 0DE0..0DF1 ; UNASSIGNED # ..
1938 0DF2..0DF3 ; PVALID # SINH VOW SIGN DIGA GAETTA-PILLA..SI
1939 0DF4 ; FREE_PVAL # SINH PUNCT KUNDDALIYA
1940 0DF5..0E00 ; UNASSIGNED # ..
1941 0E01..0E32 ; PVALID # THAI CHAR KO KAI..THAI CHAR SARA A
1942 0E33 ; FREE_PVAL # THAI CHAR SARA AM
1943 0E34..0E3A ; PVALID # THAI CHAR SARA I..THAI CHAR PHINTH
1944 0E3B..0E3E ; UNASSIGNED # ..
1945 0E3F ; FREE_PVAL # THAI CURRENCY SYM BAHT
1946 0E40..0E4E ; PVALID # THAI CHAR SARA E..THAI CHAR YAMAKK
1947 0E4F ; FREE_PVAL # THAI CHAR FONGMAN
1948 0E50..0E59 ; PVALID # THAI DIG ZERO..THAI DIG NINE
1949 0E5A..0E5B ; FREE_PVAL # THAI CHAR ANGKHANKHU..THAI CHAR KH
1950 0E5C..0E80 ; UNASSIGNED # ..
1951 0E81..0E82 ; PVALID # LAO LET KO..LAO LET KHO SUNG
1952 0E83 ; UNASSIGNED #
1953 0E84 ; PVALID # LAO LET KHO TAM
1954 0E85..0E86 ; UNASSIGNED # ..
1955 0E87..0E88 ; PVALID # LAO LET NGO..LAO LET CO
1956 0E89 ; UNASSIGNED #
1957 0E8A ; PVALID # LAO LET SO TAM
1958 0E8B..0E8C ; UNASSIGNED # ..
1959 0E8D ; PVALID # LAO LET NYO
1960 0E8E..0E93 ; UNASSIGNED # ..
1961 0E94..0E97 ; PVALID # LAO LET DO..LAO LET THO TAM
1962 0E98 ; UNASSIGNED #
1963 0E99..0E9F ; PVALID # LAO LET NO..LAO LET FO SUNG
1964 0EA0 ; UNASSIGNED #
1965 0EA1..0EA3 ; PVALID # LAO LET MO..LAO LET LO LING
1966 0EA4 ; UNASSIGNED #