Internet-Draft                                      Kurt D. Zeilenga
Intended Category: Standards Track               OpenLDAP Foundation
Expires in six months                               15 February 2004

                LDAP: Internationalized String Preparation
                   <draft-ietf-ldapbis-strprep-03.txt>

Status of this Memo

  This document is an Internet-Draft and is in full conformance with all
  provisions of Section 10 of RFC2026.

  Distribution of this memo is unlimited.  Technical discussion of this
  document will take place on the IETF LDAP Revision Working Group
  mailing list <ietf-ldapbis@openldap.org>.  Please send editorial
  comments directly to the author <Kurt@OpenLDAP.org>.

  Internet-Drafts are working documents of the Internet Engineering Task
  Force (IETF), its areas, and its working groups.  Note that other
  groups may also distribute working documents as Internet-Drafts.
  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  The list of current Internet-Drafts can be accessed at
  <http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
  Internet-Draft Shadow Directories can be accessed at
  <http://www.ietf.org/shadow.html>.

  Copyright (C) The Internet Society (2004).  All Rights Reserved.

  Please see the Full Copyright section near the end of this document
  for more information.

Abstract

  The previous Lightweight Directory Access Protocol (LDAP) technical
  specifications did not precisely define how character string matching
  is to be performed.  This led to a number of usability and
  interoperability problems.  This document defines string preparation
  algorithms for character-based matching rules defined for use in LDAP.

Conventions

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in BCP 14 [RFC2119].

  Character names in this document use the notation for code points and
  names from the Unicode Standard [Unicode].  For example, the letter
  "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
  In the lists of mappings and the prohibited characters, the "U+" is
  left off to make the lists easier to read.  The comments for character
  ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
  and do not come from the standard.

  Note: a glossary of terms used in Unicode can be found in [Glossary].
  Information on the Unicode character encoding model can be found in
  [CharModel].

1. Introduction

1.1. Background

  A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
  [Syntaxes] defines an algorithm for determining whether a presented
  value matches an attribute value in accordance with the criteria
  defined for the rule.  The assertion may evaluate to True, False, or
  Undefined:

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value or not.

  For instance, the caseIgnoreMatch matching rule may be used to
  determine whether the commonName attribute contains a particular
  value without regard for case and insignificant spaces.

1.2. X.500 String Matching Rules

  "X.520: Selected attribute types" [X.520] provides (amongst other
  things) value syntaxes and matching rules for comparing values
  commonly used in the Directory.  These specifications are inadequate
  for strings composed of Unicode [Unicode] characters.

  The caseIgnoreMatch matching rule [X.520], for example, is simply
  defined as being a case insensitive comparison where insignificant
  spaces are ignored.  For printableString, there is only one space
  character and case mapping is bijective, hence this definition is
  sufficient.  However, for Unicode string types such as
  universalString, this is not sufficient.  For example, a case
  insensitive matching implementation which folded lower case characters
  to upper case would yield different results than an implementation
  which used upper case to lower case folding.  Likewise, one
  implementation may treat only SPACE (U+0020) as a space, a second may
  treat any character with the space separator (Zs) property as a
  space, and a third may treat any character with the whitespace (WS)
  bidirectional category as a space.
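
  As an illustration of the case folding problem, the following Python
  sketch compares the same pair of values under the two folding
  choices.  It relies on Python's built-in str.upper() and str.lower(),
  which implement full case mappings from a later Unicode version than
  the one this document specifies; it is illustrative only and is not
  part of the algorithms defined here.  The function names are
  hypothetical.

```python
import unicodedata

def upper_fold_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison that folds to upper case."""
    return a.upper() == b.upper()

def lower_fold_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison that folds to lower case."""
    return a.lower() == b.lower()

stored = "STRASSE"
presented = "stra\u00dfe"  # contains LATIN SMALL LETTER SHARP S

# Full upper-case mapping turns U+00DF into "SS", so these match ...
print(upper_fold_equal(stored, presented))  # True
# ... but lower() leaves U+00DF unchanged, so these do not.
print(lower_fold_equal(stored, presented))  # False

# NO-BREAK SPACE (U+00A0) is a "space" only under the Zs definition.
nbsp = "\u00a0"
print(nbsp == " ", unicodedata.category(nbsp) == "Zs")  # False True
```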

  The lack of precise specification for character string matching has
  led to significant interoperability problems.  When such matching is
  used in certificate chain validation, these inconsistencies can also
  lead to security vulnerabilities.  To address these problems, this
  document defines precise algorithms for preparing character strings
  for matching.

1.3. Relationship to "stringprep"

  The character string preparation algorithms described in this document
  are based upon the "stringprep" approach [StringPrep].  In
  "stringprep", presented and stored values are first prepared for
  comparison so that a character-by-character comparison yields the
  "correct" result.

  The approach used here is a refinement of the "stringprep"
  [StringPrep] approach.  Each algorithm involves two additional
  preparation steps:

  a) prior to applying the Unicode string preparation steps outlined in
     "stringprep", the string is transcoded to Unicode;

  b) after applying the Unicode string preparation steps outlined in
     "stringprep", characters insignificant to the matching rules are
     removed.

  Hence, preparation of character strings for X.500 matching involves
  the following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Removal

  These steps are described in Section 2.

1.4. Relationship to the LDAP Technical Specification

  This document is an integral part of the LDAP technical specification
  [Roadmap], which obsoletes the previously defined LDAP technical
  specification [RFC3377] in its entirety.

  This document details new LDAP internationalized character string
  preparation algorithms used by [Syntaxes] and possibly other technical
  specifications defining LDAP syntaxes and/or matching rules.

1.5. Relationship to X.500

  LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
  As such, there is a strong desire for alignment between LDAP and X.500
  syntax and semantics.  The character string preparation algorithms
  described in this document are based upon the "Internationalized
  String Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint
  Study Group 2.

2. String Preparation

  The following six-step process SHALL be applied to each presented
  value and each attribute value in preparation for character string
  matching rule evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Removal

  Failure in any step causes the assertion to evaluate to Undefined.

  This process is intended to act upon non-empty character strings.  If
  the string to prepare is empty, this process is not applied and the
  assertion evaluates to Undefined.

  The character repertoire of this process is Unicode 3.2 [Unicode].
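
  The control flow described above can be sketched in Python as
  follows.  This is a minimal, non-normative sketch: the names
  PrepError, prepare, and evaluate_match are hypothetical, None stands
  in for Undefined, only equality matching is shown, and the concrete
  steps are supplied by the caller as functions that either return the
  prepared string or raise PrepError.

```python
from typing import Callable, List, Optional

class PrepError(Exception):
    """Raised by a preparation step that fails (hypothetical name)."""

Step = Callable[[str], str]

def prepare(value: str, steps: List[Step]) -> str:
    """Run the steps in order; empty strings are not prepared."""
    if value == "":
        raise PrepError("empty string: process not applied")
    for step in steps:
        value = step(value)  # any step may raise PrepError
    return value

def evaluate_match(presented: str, stored: str,
                   steps: List[Step]) -> Optional[bool]:
    """Return True or False, or None to stand for Undefined."""
    try:
        return prepare(presented, steps) == prepare(stored, steps)
    except PrepError:
        return None

# Usage with two toy steps (not the steps defined in this document).
def reject_digits(s: str) -> str:
    if any(ch.isdigit() for ch in s):
        raise PrepError("digit found")
    return s

toy_steps: List[Step] = [str.lower, reject_digits]
print(evaluate_match("Foo", "foo", toy_steps))  # True
print(evaluate_match("Foo", "bar", toy_steps))  # False
print(evaluate_match("", "foo", toy_steps))     # None (Undefined)
print(evaluate_match("f00", "foo", toy_steps))  # None (Undefined)
```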

2.1. Transcode

  Each non-Unicode string value is transcoded to Unicode.

  TeletexString [X.680][T.61] values are transcoded to Unicode as
  described in Appendix A.

  PrintableString [X.680] values are transcoded directly to Unicode:
  each PrintableString character is mapped to the Unicode code point
  with the same value.

  UniversalString, UTF8String, and bmpString [X.680] values need not be
  transcoded as they are Unicode-based strings (in the case of
  bmpString, a subset of Unicode).

  The output is the transcoded string.
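
  For the Unicode-based types, transcoding amounts to decoding the
  encoded octets.  The Python sketch below assumes the input is the
  content octets of a BER/DER-encoded value, with BMPString as two-octet
  and UniversalString as four-octet big-endian characters; the names
  transcode and TranscodeError are hypothetical, and TeletexString is
  left to Appendix A.  Python's utf-16-be codec also accepts surrogate
  pairs, which a strict UCS-2 BMPString decoder would reject, and the
  ascii codec does not check the narrower PrintableString character
  set.

```python
class TranscodeError(Exception):
    """Transcoding failed, so the assertion is Undefined."""

# Python codec per ASN.1 string type (assumption: BER/DER content
# octets; multi-octet types are big-endian).
_CODECS = {
    "PrintableString": "ascii",      # a subset of ASCII
    "UTF8String": "utf-8",
    "BMPString": "utf-16-be",        # two octets per character
    "UniversalString": "utf-32-be",  # four octets per character
}

def transcode(content: bytes, asn1_type: str) -> str:
    if asn1_type == "TeletexString":
        raise NotImplementedError("TeletexString: see Appendix A")
    codec = _CODECS.get(asn1_type)
    if codec is None:
        raise TranscodeError("unsupported string type: " + asn1_type)
    try:
        return content.decode(codec)
    except UnicodeDecodeError as exc:
        raise TranscodeError(str(exc)) from exc

print(repr(transcode(b"\x00A\x00\xe9", "BMPString")))  # 'Aé'
```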

2.2. Map

  SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
  points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
  VARIATION SELECTOR (U+180B-180D, FE00-FE0F) code points are also
  mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
  mapped to nothing.

  CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
  TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
  (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

  All other control code points (e.g., Cc) or code points with a control
  function (e.g., Cf) are mapped to nothing.

  ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code points
  with the Separator (space, line, or paragraph) property (i.e., Zs, Zl,
  or Zp) are mapped to SPACE (U+0020).

  For case ignore, numeric, and stored prefix string matching rules,
  characters are case folded per Table B.2 of [StringPrep].

  The output is the mapped string.
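
  The mapping rules above can be sketched in Python using the standard
  library's Unicode 3.2 character database (unicodedata.ucd_3_2_0) and
  its stringprep module, whose map_table_b2() implements Table B.2 of
  RFC 3454.  This sketch assumes that table matches the one in the
  cited [StringPrep] revision, and it reads "control code points" and
  "code points with a control function" as general categories Cc and
  Cf.  The function name map_step is hypothetical.

```python
import stringprep
from unicodedata import ucd_3_2_0 as ucd

# Code points this section maps to nothing by name; they are listed
# explicitly so they are removed whatever their general category.
_TO_NOTHING = (
    {0x00AD, 0x1806, 0x034F, 0xFFFC, 0x200B}
    | set(range(0x180B, 0x180E))   # U+180B-180D
    | set(range(0xFE00, 0xFE10))   # U+FE00-FE0F
)
# TAB, LF, VT, FF, CR, and NEL are mapped to SPACE.
_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}

def map_step(s: str, case_fold: bool = False) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        category = ucd.category(ch)  # Unicode 3.2 general category
        if cp in _TO_NOTHING:
            continue
        if cp in _TO_SPACE:
            out.append(" ")
        elif category in ("Cc", "Cf"):
            continue  # all other control/format code points
        elif category in ("Zs", "Zl", "Zp"):
            out.append(" ")  # all other separators
        elif case_fold:
            out.append(stringprep.map_table_b2(ch))  # RFC 3454 B.2
        else:
            out.append(ch)
    return "".join(out)

print(repr(map_step("Foo\u00adBar\tX", case_fold=True)))  # 'foobar x'
```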

2.3. Normalize

  The input string is normalized to Unicode Form KC (compatibility
  composed) as described in [UAX15].  The output is the normalized
  string.
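
  In Python, Form KC normalization against the Unicode 3.2 tables is
  available through unicodedata.ucd_3_2_0.  The sketch below (the name
  normalize_step is hypothetical) shows the step and a few of the
  compatibility and canonical mappings it performs.

```python
from unicodedata import ucd_3_2_0 as ucd

def normalize_step(s: str) -> str:
    """Normalize to Unicode Form KC using the Unicode 3.2 tables."""
    return ucd.normalize("NFKC", s)

# LATIN SMALL LIGATURE FI, FULLWIDTH LATIN CAPITAL LETTER A,
# CIRCLED DIGIT ONE, and "e" followed by COMBINING ACUTE ACCENT.
for sample in ["\ufb01le", "\uff21", "\u2460", "e\u0301"]:
    print(repr(sample), "->", repr(normalize_step(sample)))
```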

2.4. Prohibit

  All Unassigned code points are prohibited.  Unassigned code points are
  listed in Table A.1 of [StringPrep].

  Characters which, per Section 5.8 of [StringPrep], change display
  properties or are deprecated are prohibited.  These characters are
  listed in Table C.8 of [StringPrep].

  Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
  prohibited.

  All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
  2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
  7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
  CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
  prohibited.

  Surrogate codes (U+D800-DFFF) are prohibited.

  The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

  The step fails if the input string contains any prohibited code point.
  Otherwise, the output is the input string.
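
  The prohibition checks can be sketched with Python's stringprep
  module, which provides the RFC 3454 tables; this assumes those tables
  match the cited [StringPrep] revision.  RFC 3454 Tables C.3, C.4, and
  C.5 cover the private use, non-character, and surrogate ranges listed
  above.  The names prohibit_step and ProhibitedCodePoint are
  hypothetical.

```python
import stringprep

class ProhibitedCodePoint(ValueError):
    """The Prohibit step failed (hypothetical exception name)."""

def prohibit_step(s: str) -> str:
    for ch in s:
        if (stringprep.in_table_a1(ch)        # unassigned in Unicode 3.2
                or stringprep.in_table_c8(ch)  # display/deprecated
                or stringprep.in_table_c3(ch)  # private use
                or stringprep.in_table_c4(ch)  # non-character
                or stringprep.in_table_c5(ch)  # surrogate codes
                or ch == "\ufffd"):            # REPLACEMENT CHARACTER
            raise ProhibitedCodePoint("U+%04X is prohibited" % ord(ch))
    return s  # no prohibited code points: output is the input

print(prohibit_step("abc"))  # abc
```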

2.5. Check bidi

  This step fails if the input string does not conform to the
  bidirectional character restrictions detailed in Section 6 of
  [StringPrep].  Otherwise, the output is the input string.
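
  A Python sketch of this check, using the stringprep module's RFC 3454
  Tables D.1 (RandALCat) and D.2 (LCat), follows.  It implements
  requirements 2 and 3 of RFC 3454 Section 6; requirement 1 (no Table
  C.8 characters) is already enforced by the Prohibit step.  This
  assumes the cited [StringPrep] revision keeps those rules, and the
  names check_bidi_step and BidiError are hypothetical.

```python
import stringprep

class BidiError(ValueError):
    """The Check bidi step failed (hypothetical exception name)."""

def check_bidi_step(s: str) -> str:
    """RFC 3454 Section 6, requirements 2 and 3."""
    if not any(stringprep.in_table_d1(ch) for ch in s):
        return s  # no RandALCat characters: nothing to check
    if any(stringprep.in_table_d2(ch) for ch in s):
        raise BidiError("RandALCat and LCat characters are mixed")
    if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
        raise BidiError("must start and end with a RandALCat character")
    return s

print(check_bidi_step("\u05d0 1 \u05d1") == "\u05d0 1 \u05d1")  # True
```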

2.6. Insignificant Character Removal

  In this step, characters insignificant to the matching rule are
  removed.  The characters to be removed differ from matching rule to
  matching rule.

  Section 2.6.1 applies to case ignore and exact string matching.
  Section 2.6.2 applies to numericString matching.
  Section 2.6.3 applies to telephoneNumber matching.

2.6.1. Insignificant Space Removal

  For the purposes of this section, a space is defined to be the SPACE
  (U+0020) code point followed by no combining marks.

  NOTE - The previous steps ensure that the string cannot contain any
         code points in the separator class, other than SPACE (U+0020).

  If the input string consists entirely of spaces or is empty, the
  output is a string consisting of exactly one space (i.e., " ").

  Otherwise, the following spaces are removed:
    - leading spaces (i.e., those preceding the first character that is
      not a space);
    - trailing spaces (i.e., those following the last character that is
      not a space);
    - all but one space of each run of multiple consecutive spaces
      (such a run is taken as equivalent to a single space character).

  For example, removal of spaces from the Form KC string:
      "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
  would result in the output string:
      "foo<SPACE>bar"
  and the Form KC string:
      "<SPACE><SPACE><SPACE>"
  would result in the output string:
      "<SPACE>".
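
  The removal rules can be sketched in Python as follows.  The sketch
  assumes that "combining marks" means characters whose Unicode 3.2
  general category is Mn, Mc, or Me (looked up through
  unicodedata.ucd_3_2_0), and the function names are hypothetical.  A
  SPACE followed by a combining mark is kept as an ordinary character.

```python
from unicodedata import ucd_3_2_0 as ucd

def _is_space(s: str, i: int) -> bool:
    """SPACE (U+0020) not followed by a combining mark."""
    nxt = s[i + 1] if i + 1 < len(s) else ""
    return s[i] == " " and not (nxt and ucd.category(nxt).startswith("M"))

def remove_insignificant_spaces(s: str) -> str:
    out = []
    pending_space = False  # inside a run of spaces after a non-space
    for i, ch in enumerate(s):
        if _is_space(s, i):
            pending_space = bool(out)  # leading spaces are dropped
            continue
        if pending_space:
            out.append(" ")  # a run of spaces collapses to one space
            pending_space = False
        out.append(ch)
    # Trailing spaces are never emitted; all-space or empty input
    # yields exactly one space.
    return "".join(out) if out else " "

print(repr(remove_insignificant_spaces("  foo  bar  ")))  # 'foo bar'
```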

2.6.2. numericString Insignificant Character Removal

  For the purposes of this section, a space is defined to be the SPACE
  (U+0020) code point followed by no combining marks.

  All spaces are regarded as not significant.  If the input string
  consists entirely of spaces or is empty, the output is a string
  consisting of exactly one space (i.e., " ").  Otherwise, all spaces
  are removed.

  For example, removal of spaces from the Form KC string:
      "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
  would result in the output string:
      "123456"
  and the Form KC string:
      "<SPACE><SPACE><SPACE>"
  would result in the output string:
      "<SPACE>".
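
  A corresponding Python sketch for numericString, under the same
  assumption about combining marks (general category Mn, Mc, or Me in
  Unicode 3.2) and with hypothetical function names:

```python
from unicodedata import ucd_3_2_0 as ucd

def _is_space(s: str, i: int) -> bool:
    """SPACE (U+0020) not followed by a combining mark."""
    nxt = s[i + 1] if i + 1 < len(s) else ""
    return s[i] == " " and not (nxt and ucd.category(nxt).startswith("M"))

def remove_numeric_insignificant(s: str) -> str:
    kept = "".join(ch for i, ch in enumerate(s) if not _is_space(s, i))
    return kept or " "  # all-space or empty input yields one space

print(repr(remove_numeric_insignificant("  123  456  ")))  # '123456'
```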

2.6.3. telephoneNumber Insignificant Character Removal

  For the purposes of this section, a hyphen is defined to be the
  HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
  NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
  (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
  combining marks.  A space is defined to be the SPACE (U+0020) code
  point followed by no combining marks.

  All hyphens and spaces are considered insignificant.  If the string
  contains only spaces and hyphens or is empty, then the output is a
  string consisting of exactly one space.  Otherwise, all hyphens and
  spaces are removed.

  For example, removal of hyphens and spaces from the Form KC string:
      "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
  would result in the output string:
      "123456"
  and the Form KC string:
      "<HYPHEN><HYPHEN><HYPHEN>"
  would result in the output string:
      "<SPACE>".
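
  A Python sketch for telephoneNumber, again assuming that combining
  marks are the characters of general category Mn, Mc, or Me and using
  hypothetical function names, follows.  After the Normalize step, some
  of the listed hyphens (such as U+FE63 and U+FF0D) will already have
  become HYPHEN-MINUS, but the sketch accepts all of them.

```python
from unicodedata import ucd_3_2_0 as ucd

# HYPHEN-MINUS, ARMENIAN HYPHEN, HYPHEN, NON-BREAKING HYPHEN,
# MINUS SIGN, SMALL HYPHEN-MINUS, FULLWIDTH HYPHEN-MINUS
HYPHENS = frozenset("\u002d\u058a\u2010\u2011\u2212\ufe63\uff0d")

def _is_insignificant(s: str, i: int) -> bool:
    """A space or hyphen that is not followed by a combining mark."""
    if s[i] != " " and s[i] not in HYPHENS:
        return False
    nxt = s[i + 1] if i + 1 < len(s) else ""
    return not (nxt and ucd.category(nxt).startswith("M"))

def remove_telephone_insignificant(s: str) -> str:
    kept = "".join(ch for i, ch in enumerate(s)
                   if not _is_insignificant(s, i))
    return kept or " "  # only spaces/hyphens, or empty: one space

print(repr(remove_telephone_insignificant(" -123  456 -")))  # '123456'
```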

3. Security Considerations

  "Preparation of Internationalized Strings ('stringprep')"
  [StringPrep] security considerations generally apply to the
  algorithms described here.

4. Contributors

  Appendices A and B of this document were authored by Howard Chu
  <hyc@symas.com> of Symas Corporation (based upon information provided
  in RFC 1345 [RFC1345]).

5. Acknowledgments

  The approach used in this document is based upon design principles and
  algorithms described in "Preparation of Internationalized Strings
  ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet.  Some
  additional guidance was drawn from Unicode Technical Standards,
  Technical Reports, and Notes.

  This document is a product of the IETF LDAP Revision (LDAPBIS) Working
  Group.

6. Author's Address

  Kurt D. Zeilenga
  OpenLDAP Foundation

  Email: Kurt@OpenLDAP.org

7. References

7.1. Normative References

  [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14 (also RFC 2119), March 1997.

  [Roadmap]     Zeilenga, K. (editor), "LDAP: Technical Specification
                Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
                progress.

  [StringPrep]  Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ('stringprep')",
                draft-hoffman-rfc3454bis-xx.txt, a work in progress.

  [Syntaxes]    Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
                draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.

  [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
                3.2.0" is defined by "The Unicode Standard, Version 3.0"
                (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                as amended by the "Unicode Standard Annex #27: Unicode
                3.1" (http://www.unicode.org/reports/tr27/) and by the
                "Unicode Standard Annex #28: Unicode 3.2"
                (http://www.unicode.org/reports/tr28/).

  [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
                Unicode Normalization Forms, Version 3.2.0",
                <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
                March 2002.

  [X.680]       International Telecommunication Union -
                Telecommunication Standardization Sector, "Abstract
                Syntax Notation One (ASN.1) - Specification of Basic
                Notation", X.680(1997) (also ISO/IEC 8824-1:1998).

  [T.61]        CCITT (now ITU), "Character Repertoire and Coded
                Character Sets for the International Teletex Service",
                T.61, 1988.

7.2. Informative References

  [X.500]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The Directory
                -- Overview of concepts, models and services",
                X.500(1993) (also ISO/IEC 9594-1:1994).

  [X.501]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The Directory
                -- Models", X.501(1993) (also ISO/IEC 9594-2:1994).

  [X.520]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The
                Directory: Selected Attribute Types", X.520(1993) (also
                ISO/IEC 9594-6:1994).

  [Glossary]    The Unicode Consortium, "Unicode Glossary",
                <http://www.unicode.org/glossary/>.

  [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
                #17, Character Encoding Model", UTR17,
                <http://www.unicode.org/unicode/reports/tr17/>, August
                2000.

  [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
                for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
                work in progress.

  [RFC1345]     Simonsen, K., "Character Mnemonics & Character Sets",
                RFC 1345, June 1992.

  [RFC3377]     Hodges, J. and R. Morgan, "Lightweight Directory Access
                Protocol (v3): Technical Specification", RFC 3377,
                September 2002.

Appendix A. Teletex (T.61) to Unicode

  This appendix defines an algorithm for transcoding [T.61] characters
  to [Unicode] characters for use in string preparation for LDAP
  matching rules.  This appendix is normative.

  The transcoding algorithm is derived from the T.61-8bit definition
  provided in [RFC1345].  With a few exceptions, the T.61 character
  codes from x00 to x7f are equivalent to the corresponding [Unicode]
  code points, and their values are left unchanged by this algorithm.
  For example, the T.61 code x20 is identical to SPACE (U+0020).  The
  exceptions are the following T.61 codes, which are undefined: x23,
  x24, x5c, x5e, x60, x7b, x7d, and x7e.

  The codes from x80 to x9f are also equivalent to the corresponding
  Unicode code points.  This is specified for completeness only, as
  these codes are control characters, which the Map step (Section 2.2)
  maps to nothing (or, in the case of x85 NEXT LINE, to SPACE).

  The remaining T.61 codes (xa0 to xff) are mapped as shown in
  Table A.1.  Table positions marked "??" are undefined.

  Input strings containing undefined T.61 codes SHALL produce an
  Undefined matching result.  For diagnostic purposes, this algorithm
  does not fail for undefined input codes.  Instead, undefined codes in
  the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
  As the Prohibit step (Section 2.4) prohibits the REPLACEMENT
  CHARACTER, string preparation then fails and the assertion evaluates
  to Undefined, as required.

  Note: RFC 1345 listed the non-spacing accent code points as residing
        in the range starting at (U+E000).  In the current Unicode
        standard, the (U+E000) range is reserved for Private Use, and
        the non-spacing accents are in the range starting at (U+0300).
        The tables here use the (U+0300) range for these accents.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
   a8| 00a8 |  ??  |  ??  | 00ab |  ??  |  ??  |  ??  |  ??  |
   b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
   b8| 00f7 |  ??  |  ??  | 00bb | 00bc | 00bd | 00be | 00bf |
   c0|  ??  | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
   c8| 0308 |  ??  | 030a | 0327 | 0332 | 030b | 0328 | 030c |
   d0|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   d8|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0| 2126 | 00c6 | 00d0 | 00aa |  ??  | 0126 | 0132 | 013f |
   e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
   f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
   f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b |  ??  |
   --+------+------+------+------+------+------+------+------+
            Table A.1:  Mapping of 8-bit T.61 codes to Unicode
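
  The single-octet part of this transcoding can be sketched in Python
  as a table lookup.  The table data below is copied from Table A.1;
  the function name t61_byte_to_unicode is hypothetical.  Accent
  prefixes (xc1 to xcf) simply yield their non-spacing accent here;
  moving each accent after its base character is a separate step.

```python
REPLACEMENT = "\ufffd"

# Table A.1, rows xa0 through xf8, eight codes per row; "??" means
# undefined.
_TABLE_A1 = """
00a0 00a1 00a2 00a3 0024 00a5 0023 00a7
00a8  ??   ??  00ab  ??   ??   ??   ??
00b0 00b1 00b2 00b3 00d7 00b5 00b6 00b7
00f7  ??   ??  00bb 00bc 00bd 00be 00bf
 ??  0300 0301 0302 0303 0304 0306 0307
0308  ??  030a 0327 0332 030b 0328 030c
 ??   ??   ??   ??   ??   ??   ??   ??
 ??   ??   ??   ??   ??   ??   ??   ??
2126 00c6 00d0 00aa  ??  0126 0132 013f
0141 00d8 0152 00ba 00de 0166 014a 0149
0138 00e6 0111 00f0 0127 0131 0133 0140
0142 00f8 0153 00df 00fe 0167 014b  ??
"""
_HIGH = [None if tok == "??" else chr(int(tok, 16))
         for tok in _TABLE_A1.split()]
assert len(_HIGH) == 0x60  # codes xa0 to xff

# Codes in x00-x7f that T.61 leaves undefined.
_UNDEFINED_LOW = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B, 0x7D, 0x7E}

def t61_byte_to_unicode(code: int) -> str:
    """Map one T.61 octet; undefined codes become U+FFFD."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an octet: %r" % code)
    if code < 0x80:
        return REPLACEMENT if code in _UNDEFINED_LOW else chr(code)
    if code < 0xA0:
        return chr(code)  # C1 controls, handled later by the Map step
    mapped = _HIGH[code - 0xA0]
    return REPLACEMENT if mapped is None else mapped

print("".join(t61_byte_to_unicode(b) for b in b"Price: \xa4 5"))
# Price: $ 5
```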

  T.61 also defines a number of accented characters that are formed by
  an accent prefix followed by a base character.  These prefixes are in
  the code range xc1 to xcf.  If a prefix character appears at the end
  of a string, the result is undefined.  Otherwise, each such sequence
  is mapped to Unicode by substituting the corresponding non-spacing
  accent code (as listed in Table A.1) for the accent prefix and
  exchanging the order so that the base character precedes the accent.
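
  The prefix handling can be sketched in Python as below.  To keep the
  sketch self-contained, the single-octet mapping (Table A.1 together
  with the rules for x00 to x9f) is passed in as a function.  Where this
  appendix leaves behavior unstated, the sketch makes assumptions: a
  prefix at the end of the string is mapped to U+FFFD so that the
  Prohibit step makes the result Undefined, and the octet after a
  prefix is always treated as the base character.  The names
  transcode_t61 and is_accent_prefix are hypothetical.

```python
from typing import Callable

REPLACEMENT = "\ufffd"

def is_accent_prefix(code: int) -> bool:
    """T.61 accent prefixes occupy xc1 to xcf."""
    return 0xC1 <= code <= 0xCF

def transcode_t61(data: bytes, byte_to_char: Callable[[int], str]) -> str:
    """Transcode T.61 octets, moving each accent after its base."""
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if not is_accent_prefix(code):
            out.append(byte_to_char(code))
            i += 1
        elif i + 1 == len(data):
            # Prefix at end of string is undefined.  Assumption: emit
            # U+FFFD so that the Prohibit step fails the preparation.
            out.append(REPLACEMENT)
            i += 1
        else:
            accent = byte_to_char(code)       # non-spacing accent
            base = byte_to_char(data[i + 1])  # assumption: next octet
            out.append(base + accent)         # base first, then accent
            i += 2
    return "".join(out)

# Usage with a toy mapping covering only two accents (not Table A.1).
toy = {0xC1: "\u0300", 0xC2: "\u0301"}

def toy_byte_to_char(code: int) -> str:
    return toy.get(code, chr(code))

print(repr(transcode_t61(b"caf\xc2e", toy_byte_to_char)))
```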

Appendix B. Additional Teletex (T.61) to Unicode Tables

  All of the accented characters in T.61 have a corresponding code point
  in Unicode.  For the sake of completeness, the combined character
  codes are presented in the following tables.  This appendix is
  informative; for matching purposes, it is sufficient to map the
  non-spacing accent and exchange the order of the character pair as
  specified in Appendix A.
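
  As a spot check of that statement, the Python sketch below maps a few
  T.61 accent prefixes to their non-spacing accents (from Table A.1),
  appends each accent after a base letter, and applies Form KC
  normalization with the Unicode 3.2 tables (unicodedata.ucd_3_2_0).
  For the sampled pairs, this yields the combined characters listed in
  Tables B.2 through B.7, because each of these combined characters is
  canonically equivalent to its base letter followed by the accent.
  The names ACCENTS, combine, and SAMPLES are hypothetical, and only a
  handful of table entries are checked.

```python
from unicodedata import ucd_3_2_0 as ucd

# Non-spacing accents for some T.61 prefixes, from Table A.1.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
}

def combine(prefix: int, base: str) -> str:
    """Reorder as in Appendix A, then normalize as in Section 2.3."""
    return ucd.normalize("NFKC", base + ACCENTS[prefix])

# (prefix, base letter, expected entry from Tables B.2 to B.7)
SAMPLES = [
    (0xC1, "A", "\u00c0"),
    (0xC2, "e", "\u00e9"),
    (0xC3, "w", "\u0175"),
    (0xC4, "n", "\u00f1"),
    (0xC5, "a", "\u0101"),
    (0xC6, "g", "\u011f"),
]
for prefix, base, expected in SAMPLES:
    result = combine(prefix, base)
    print("x%02x %s -> U+%04X" % (prefix, base, ord(result)),
          result == expected)
```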

511	B.1. Combinations with SPACE

513	  Accents may be combined with a <SPACE> to generate the accent by
514	  itself.  For each accent code, the result of combining with <SPACE> is
515	  listed in Table B.1.

517	     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
518	   --+------+------+------+------+------+------+------+------+
519	   c0|  ??  | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
520	   c8| 00a8 |  ??  | 02da | 00b8 |  ??  | 02dd | 02db | 02c7 |
521	   --+------+------+------+------+------+------+------+------+
522	       Table B.1:  Mapping of T.61 Accents with <SPACE> to Unicode

524	B.2. Combinations for xc1: (Grave accent)

526	  T.61 has predefined characters for combinations with A, E, I, O, and
527	  U.  Unicode also defines combinations for N, W, and Y.  All of these
528	  combinations are present in Table B.2.

530	     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
531	   --+------+------+------+------+------+------+------+------+
532	   40|  ??  | 00c0 |  ??  |  ??  |  ??  | 00c8 |  ??  |  ??  |
533	   48|  ??  | 00cc |  ??  |  ??  |  ??  |  ??  | 01f8 | 00d2 |
534	   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00d9 |  ??  | 1e80 |
535	   58|  ??  | 1ef2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
536	   60|  ??  | 00e0 |  ??  |  ??  |  ??  | 00e8 |  ??  |  ??  |
537	   68|  ??  | 00ec |  ??  |  ??  |  ??  |  ??  | 01f9 | 00f2 |
538	   70|  ??  |  ??  |  ??  |  ??  |  ??  | 00f9 |  ??  | 1e81 |
539	   78|  ??  | 1ef3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
540	   --+------+------+------+------+------+------+------+------+
541	           Table B.2: Mapping of T.61 Grave Accent Combinations

543	B.3. Combinations for xc2: (Acute accent)

545	  T.61 has predefined characters for combinations with A, E, I, O, U, Y,
546	  C, L, N, R, S, and Z.  Unicode also defines G, K, M, P, and W.  All of
547	  these combinations are present in Table B.3.

549	     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
550	   --+------+------+------+------+------+------+------+------+
551	   40|  ??  | 00c1 |  ??  | 0106 |  ??  | 00c9 |  ??  | 01f4 |
552	   48|  ??  | 00cd |  ??  | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
553	   50| 1e54 |  ??  | 0154 | 015a |  ??  | 00da |  ??  | 1e82 |
554	   58|  ??  | 00dd | 0179 |  ??  |  ??  |  ??  |  ??  |  ??  |
555	   60|  ??  | 00e1 |  ??  | 0107 |  ??  | 00e9 |  ??  | 01f5 |
556	   68|  ??  | 00ed |  ??  | 1e31 | 013a | 1e3f | 0144 | 00f3 |
557	   70| 1e55 |  ??  | 0155 | 015b |  ??  | 00fa |  ??  | 1e83 |
558	   78|  ??  | 00fd | 017a |  ??  |  ??  |  ??  |  ??  |  ??  |
559	   --+------+------+------+------+------+------+------+------+
560	           Table B.3: Mapping of T.61 Acute Accent Combinations

562	B.4. Combinations for xc3: (Circumflex)

564	  T.61 has predefined characters for combinations with A, E, I, O, U, Y,
565	  C, G, H, J, S, and W.  Unicode also defines the combination for Z.
566	  All of these combinations are present in Table B.4.

568	     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
569	   --+------+------+------+------+------+------+------+------+
570	   40|  ??  | 00c2 |  ??  | 0108 |  ??  | 00ca |  ??  | 011c |
571	   48| 0124 | 00ce | 0134 |  ??  |  ??  |  ??  |  ??  | 00d4 |
572	   50|  ??  |  ??  |  ??  | 015c |  ??  | 00db |  ??  | 0174 |
573	   58|  ??  | 0176 | 1e90 |  ??  |  ??  |  ??  |  ??  |  ??  |
574	   60|  ??  | 00e2 |  ??  | 0109 |  ??  | 00ea |  ??  | 011d |
575	   68| 0125 | 00ee | 0135 |  ??  |  ??  |  ??  |  ??  | 00f4 |
576	   70|  ??  |  ??  |  ??  | 015d |  ??  | 00fb |  ??  | 0175 |
577	   78|  ??  | 0177 | 1e91 |  ??  |  ??  |  ??  |  ??  |  ??  |
578	   --+------+------+------+------+------+------+------+------+
579	        Table B.4: Mapping of T.61 Circumflex Accent Combinations

581	B.5. Combinations for xc4: (Tilde)

583	  T.61 has predefined characters for combinations with A, I, O, U, and
584	  N.  Unicode also defines E, V, and Y.  All of these combinations are
585	  present in Table B.5.

587	     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
588	   --+------+------+------+------+------+------+------+------+
589	   40|  ??  | 00c3 |  ??  |  ??  |  ??  | 1ebc |  ??  |  ??  |
590	   48|  ??  | 0128 |  ??  |  ??  |  ??  |  ??  | 00d1 | 00d5 |
591	   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0168 | 1e7c |  ??  |
592	   58|  ??  | 1ef8 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
593	   60|  ??  | 00e3 |  ??  |  ??  |  ??  | 1ebd |  ??  |  ??  |
594	   68|  ??  | 0129 |  ??  |  ??  |  ??  |  ??  | 00f1 | 00f5 |
595	   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0169 | 1e7d |  ??  |
596	   78|  ??  | 1ef9 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
597	   --+------+------+------+------+------+------+------+------+
598	           Table B.5: Mapping of T.61 Tilde Accent Combinations

B.6. Combinations for xc5: (Macron)

  T.61 has predefined characters for combinations with A, E, I, O, and
  U.  Unicode also defines Y, G, and AE.  All of these combinations are
  present in Table B.6.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0100 |  ??  |  ??  |  ??  | 0112 |  ??  | 1e20 |
   48|  ??  | 012a |  ??  |  ??  |  ??  |  ??  |  ??  | 014c |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016a |  ??  |  ??  |
   58|  ??  | 0232 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0101 |  ??  |  ??  |  ??  | 0113 |  ??  | 1e21 |
   68|  ??  | 012b |  ??  |  ??  |  ??  |  ??  |  ??  | 014d |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016b |  ??  |  ??  |
   78|  ??  | 0233 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0|  ??  | 01e2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   f0|  ??  | 01e3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.6: Mapping of T.61 Macron Accent Combinations
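
  In T.61 an accented letter is transmitted as the non-spacing accent
  byte followed by the base letter, so decoding is a table lookup keyed
  on the second byte.  Rows e0 and f0 of Table B.6 show that the base
  byte need not be in the ASCII range: 0xe1 and 0xf1 combine with the
  macron to give U+01E2 and U+01E3.  The following Python sketch decodes
  only the macron prefix (0xc5) and, as a simplifying assumption, passes
  every other byte below 0x80 through as ASCII, which T.61 does not
  guarantee for every position; decode_macron is a hypothetical name.

```python
# Non-spacing macron prefix byte (from the B.6 heading).
MACRON = 0xC5

# Table B.6 as {base byte: Unicode code point}; 0xe1/0xf1 come from rows e0/f0.
MACRON_TABLE = {
    0x41: 0x0100, 0x45: 0x0112, 0x47: 0x1E20, 0x49: 0x012A,
    0x4F: 0x014C, 0x55: 0x016A, 0x59: 0x0232, 0xE1: 0x01E2,
    0x61: 0x0101, 0x65: 0x0113, 0x67: 0x1E21, 0x69: 0x012B,
    0x6F: 0x014D, 0x75: 0x016B, 0x79: 0x0233, 0xF1: 0x01E3,
}


def decode_macron(data: bytes) -> str:
    """Decode macron pairs (0xc5 <base>); other high bytes are rejected."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == MACRON:
            # The accent byte must be followed by a base byte listed in B.6.
            if i + 1 >= len(data) or data[i + 1] not in MACRON_TABLE:
                raise ValueError(f"no macron combination at offset {i}")
            out.append(chr(MACRON_TABLE[data[i + 1]]))
            i += 2
        elif b < 0x80:
            # Simplifying assumption: treat the low half as plain ASCII.
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02x} is not handled by this sketch")
    return "".join(out)
```

  For example, decode_macron(b"T\xc5oky\xc5o") returns "T\u014dky\u014d"
  (Tokyo with two long o's).  A full decoder would keep one such table
  per accent byte and dispatch on the prefix.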

B.7. Combinations for xc6: (Breve)

  T.61 has predefined characters for combinations with A, U, and G.
  Unicode also defines E, I, and O.  All of these combinations are
  present in Table B.7.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
   48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
   68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
           Table B.7: Mapping of T.61 Breve Accent Combinations

B.8. Combinations for xc7: (Dot Above)

  T.61 has predefined characters for C, E, G, I, and Z.  Unicode also
  defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y.  All of these
  combinations are present in Table B.8.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
   48| 1e22 | 0130 |  ??  |  ??  |  ??  | 1e40 | 1e44 | 022e |
   50| 1e56 |  ??  | 1e58 | 1e60 | 1e6a |  ??  |  ??  | 1e86 |
   58| 1e8a | 1e8e | 017b |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
   68| 1e23 |  ??  |  ??  |  ??  |  ??  | 1e41 | 1e45 | 022f |
   70| 1e57 |  ??  | 1e59 | 1e61 | 1e6b |  ??  |  ??  | 1e87 |
   78| 1e8b | 1e8f | 017c |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.8: Mapping of T.61 Dot Above Accent Combinations

B.9. Combinations for xc8: (Diaeresis)

  T.61 has predefined characters for A, E, I, O, U, and Y.  Unicode also
  defines H, W, X, and t.  All of these combinations are present in
  Table B.9.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c4 |  ??  |  ??  |  ??  | 00cb |  ??  |  ??  |
   48| 1e26 | 00cf |  ??  |  ??  |  ??  |  ??  |  ??  | 00d6 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00dc |  ??  | 1e84 |
   58| 1e8c | 0178 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e4 |  ??  |  ??  |  ??  | 00eb |  ??  |  ??  |
   68| 1e27 | 00ef |  ??  |  ??  |  ??  |  ??  |  ??  | 00f6 |
   70|  ??  |  ??  |  ??  |  ??  | 1e97 | 00fc |  ??  | 1e85 |
   78| 1e8d | 00ff |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.9: Mapping of T.61 Diaeresis Accent Combinations
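
  Each entry in these tables is intended to be the precomposed
  character for the base letter followed by the corresponding combining
  mark, so a table can be cross-checked against Unicode canonical
  composition (Normalization Form C).  The following Python sketch does
  this for Table B.9 with the standard unicodedata module and COMBINING
  DIAERESIS (U+0308).  The helper name mismatches is illustrative, and
  unicodedata reflects the Unicode version bundled with the interpreter
  rather than any version referenced by this document.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

# Table B.9 as {base letter: Unicode code point}.
DIAERESIS_TABLE = {
    "A": 0x00C4, "E": 0x00CB, "H": 0x1E26, "I": 0x00CF, "O": 0x00D6,
    "U": 0x00DC, "W": 0x1E84, "X": 0x1E8C, "Y": 0x0178,
    "a": 0x00E4, "e": 0x00EB, "h": 0x1E27, "i": 0x00EF, "o": 0x00F6,
    "t": 0x1E97, "u": 0x00FC, "w": 0x1E85, "x": 0x1E8D, "y": 0x00FF,
}


def mismatches(table: dict[str, int], mark: str) -> list[str]:
    """Return base letters whose entry differs from NFC(base + mark)."""
    bad = []
    for base, code_point in table.items():
        composed = unicodedata.normalize("NFC", base + mark)
        if composed != chr(code_point):
            bad.append(base)
    return bad


print(mismatches(DIAERESIS_TABLE, COMBINING_DIAERESIS))  # prints []
```

  An empty list means every entry agrees with canonical composition; an
  entry that has no precomposed Unicode equivalent would be reported by
  its base letter.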

B.10. Combinations for xca: (Ring Above)

  T.61 has predefined characters for A and U.  Unicode also defines w
  and y.  All of these combinations are present in Table B.10.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016e |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016f |  ??  | 1e98 |
   78|  ??  | 1e99 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
        Table B.10: Mapping of T.61 Ring Above Accent Combinations

B.11. Combinations for xcb: (Cedilla)

  T.61 has predefined characters for C, G, K, L, N, R, S, and T.
  Unicode also defines E, D, and H.  All of these combinations are
  present in Table B.11.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  |  ??  |  ??  | 00c7 | 1e10 | 0228 |  ??  | 0122 |
   48| 1e28 |  ??  |  ??  | 0136 | 013b |  ??  | 0145 |  ??  |
   50|  ??  |  ??  | 0156 | 015e | 0162 |  ??  |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  |  ??  |  ??  | 00e7 | 1e11 | 0229 |  ??  | 0123 |
   68| 1e29 |  ??  |  ??  | 0137 | 013c |  ??  | 0146 |  ??  |
   70|  ??  |  ??  | 0157 | 015f | 0163 |  ??  |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.11: Mapping of T.61 Cedilla Accent Combinations

B.12. Combinations for xcd: (Double Acute Accent)

  T.61 has predefined characters for O and U.  These combinations are
  present in Table B.12.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0150 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0170 |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0151 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0171 |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
       Table B.12: Mapping of T.61 Double Acute Accent Combinations

B.13. Combinations for xce: (Ogonek)

  T.61 has predefined characters for A, E, I, and U.  Unicode also
  defines the combination for O.  All of these combinations are present
  in Table B.13.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0104 |  ??  |  ??  |  ??  | 0118 |  ??  |  ??  |
   48|  ??  | 012e |  ??  |  ??  |  ??  |  ??  |  ??  | 01ea |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0172 |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0105 |  ??  |  ??  |  ??  | 0119 |  ??  |  ??  |
   68|  ??  | 012f |  ??  |  ??  |  ??  |  ??  |  ??  | 01eb |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0173 |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.13: Mapping of T.61 Ogonek Accent Combinations

B.14. Combinations for xcf: (Caron)

  T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
  Unicode also defines A, I, O, U, G, H, j, and K.  All of these
  combinations are present in Table B.14.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 01cd |  ??  | 010c | 010e | 011a |  ??  | 01e6 |
   48| 021e | 01cf |  ??  | 01e8 | 013d |  ??  | 0147 | 01d1 |
   50|  ??  |  ??  | 0158 | 0160 | 0164 | 01d3 |  ??  |  ??  |
   58|  ??  |  ??  | 017d |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 01ce |  ??  | 010d | 010f | 011b |  ??  | 01e7 |
   68| 021f | 01d0 | 01f0 | 01e9 | 013e |  ??  | 0148 | 01d2 |
   70|  ??  |  ??  | 0159 | 0161 | 0165 | 01d4 |  ??  |  ??  |
   78|  ??  |  ??  | 017e |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.14: Mapping of T.61 Caron Accent Combinations
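
  As in ASCII, the lowercase base letters in rows 60 through 78 sit 0x20
  above their uppercase counterparts in rows 40 through 58, so each
  lowercase entry should be the lowercase form of the entry 0x20 bytes
  below it.  The following Python sketch checks that symmetry for Table
  B.14 using str.lower() and lists the entries that exist only in
  lowercase; for the caron this is j (U+01F0), for which Unicode has no
  precomposed capital.  The function names are illustrative only.

```python
# Table B.14 (caron) as {base byte: Unicode code point}.
CARON_TABLE = {
    0x41: 0x01CD, 0x43: 0x010C, 0x44: 0x010E, 0x45: 0x011A, 0x47: 0x01E6,
    0x48: 0x021E, 0x49: 0x01CF, 0x4B: 0x01E8, 0x4C: 0x013D, 0x4E: 0x0147,
    0x4F: 0x01D1, 0x52: 0x0158, 0x53: 0x0160, 0x54: 0x0164, 0x55: 0x01D3,
    0x5A: 0x017D,
    0x61: 0x01CE, 0x63: 0x010D, 0x64: 0x010F, 0x65: 0x011B, 0x67: 0x01E7,
    0x68: 0x021F, 0x69: 0x01D0, 0x6A: 0x01F0, 0x6B: 0x01E9, 0x6C: 0x013E,
    0x6E: 0x0148, 0x6F: 0x01D2, 0x72: 0x0159, 0x73: 0x0161, 0x74: 0x0165,
    0x75: 0x01D4, 0x7A: 0x017E,
}


def case_mismatches(table: dict[int, int]) -> list[str]:
    """Uppercase bytes whose entry does not lowercase to the entry at byte + 0x20."""
    problems = []
    for base, code_point in table.items():
        if 0x40 <= base <= 0x5F:
            lower = table.get(base + 0x20)
            if lower is None or chr(code_point).lower() != chr(lower):
                problems.append(hex(base))
    return problems


def lowercase_only(table: dict[int, int]) -> list[str]:
    """Lowercase bytes (0x60-0x7f) with no uppercase entry 0x20 below them."""
    return [hex(b) for b in table if 0x60 <= b <= 0x7F and b - 0x20 not in table]
```

  For Table B.14, case_mismatches returns an empty list and
  lowercase_only returns ["0x6a"].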

Appendix C -- Mapping Table

  Input       Output
  -----       ------
  0000-0008
  0009-000D   0020
  000E-001F
  007F-0084
  0085        0020
  0086-009F
  00A0        0020
  00AD
  034F
  06DD
  070F
  1680        0020
  1806
  180B-180E
  2000-200A   0020
  200B-200F
  2028-2029   0020
  202A-202E
  202F        0020
  205F        0020
  2060-2063
  206A-206F
  3000        0020
  FE00-FE0F
  FEFF
  FFF9-FFFC
  1D173-1D17A
  E0001
  E0020-E007F

  Input code points with an empty Output column are mapped to nothing
  (that is, removed).
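
  The following Python sketch applies this table code point by code
  point and leaves code points that are not listed unchanged; None in
  MAP_RANGES stands for an empty Output column.  It covers only this
  table, not any other preparation step.  It assumes that U+0085 (NEXT
  LINE) maps to SPACE while the surrounding C1 controls map to nothing,
  and it uses FE00-FE0F for the variation selectors.  MAP_RANGES and
  map_string are hypothetical names.

```python
import bisect

# (first, last, replacement) from the mapping table; None = map to nothing.
# U+0085 (NEXT LINE) maps to SPACE; the C1 controls around it map to nothing.
MAP_RANGES = sorted([
    (0x0000, 0x0008, None), (0x0009, 0x000D, " "), (0x000E, 0x001F, None),
    (0x007F, 0x0084, None), (0x0085, 0x0085, " "), (0x0086, 0x009F, None),
    (0x00A0, 0x00A0, " "), (0x00AD, 0x00AD, None), (0x034F, 0x034F, None),
    (0x06DD, 0x06DD, None), (0x070F, 0x070F, None), (0x1680, 0x1680, " "),
    (0x1806, 0x1806, None), (0x180B, 0x180E, None), (0x2000, 0x200A, " "),
    (0x200B, 0x200F, None), (0x2028, 0x2029, " "), (0x202A, 0x202E, None),
    (0x202F, 0x202F, " "), (0x205F, 0x205F, " "), (0x2060, 0x2063, None),
    (0x206A, 0x206F, None), (0x3000, 0x3000, " "), (0xFE00, 0xFE0F, None),
    (0xFEFF, 0xFEFF, None), (0xFFF9, 0xFFFC, None), (0x1D173, 0x1D17A, None),
    (0xE0001, 0xE0001, None), (0xE0020, 0xE007F, None),
], key=lambda r: r[0])

# Start of each range, for binary search.
STARTS = [first for first, _, _ in MAP_RANGES]


def map_string(s: str) -> str:
    """Drop or replace listed code points; keep every other code point."""
    out = []
    for ch in s:
        cp = ord(ch)
        i = bisect.bisect_right(STARTS, cp) - 1  # last range starting <= cp
        if i >= 0 and cp <= MAP_RANGES[i][1]:
            replacement = MAP_RANGES[i][2]
            if replacement is not None:
                out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)
```

  For example, map_string("a\u00adb\u00a0c") returns "ab c": the SOFT
  HYPHEN is removed and the NO-BREAK SPACE becomes SPACE.  Because the
  ranges do not overlap, a binary search over their start points finds
  the only candidate range for each code point.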

Intellectual Property Rights

  The IETF takes no position regarding the validity or scope of any
  intellectual property or other rights that might be claimed to pertain
  to the implementation or use of the technology described in this
  document or the extent to which any license under such rights might or
  might not be available; neither does it represent that it has made any
  effort to identify any such rights.  Information on the IETF's
  procedures with respect to rights in standards-track and
  standards-related documentation can be found in BCP-11.  Copies of
  claims of rights made available for publication and any assurances of
  licenses to be made available, or the result of an attempt made to
  obtain a general license or permission for the use of such proprietary
  rights by implementors or users of this specification can be obtained
  from the IETF Secretariat.

  The IETF invites any interested party to bring to its attention any
  copyrights, patents or patent applications, or other proprietary
  rights which may cover technology that may be required to practice
  this standard.  Please address the information to the IETF Executive
  Director.

Full Copyright Statement

  Copyright (C) The Internet Society (2004).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published and
  distributed, in whole or in part, without restriction of any kind,
  provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be followed,
  or as required to translate it into languages other than English.