Internet Draft                                            Jeffrey Altman
draft-ietf-krb-wg-utf8-profile-00.txt                Columbia University
                                                       February 12, 2002
Expires in six months

             Stringprep Profile for Kerberos UTF-8 Strings

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.

Abstract

   This document describes how to prepare the UTF-8 strings used in
   Kerberos in order to increase the likelihood that name input and
   name comparison work in ways that make sense for typical users
   throughout the world. It is a profile of the stringprep protocol
   developed in the IDN working group.

1. Introduction

   This document specifies processing rules for the UTF-8 strings that
   users enter as Kerberos principal names and as input to
   cryptographic string-to-key functions. It is a profile of stringprep
   [STRINGPREP].

   This profile defines the following, as required by [STRINGPREP]:

   - The intended applicability of the profile: Kerberos principal
     names and input to cryptographic string-to-key functions

   - The character repertoire that is the input and output to
     stringprep: defined in Section 2

   - The list of unassigned code points for the repertoire: defined in
     Appendix F

   - The mappings used: defined in Section 3

   - The Unicode normalization used: defined in Section 4

   - The characters that are prohibited as output: defined in Section 5
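
   As a rough illustration of how the pieces listed above fit together,
   the following Python sketch applies them in the stringprep order:
   mapping (Section 3), NFKC normalization (Section 4), the prohibited
   output check (Section 5), and the unassigned code point check
   (Section 6). The function `prepare`, its parameters, and the toy
   tables are hypothetical; real tables come from Appendices D, E, and
   F, which are still to be filled in. It also uses Python's bundled
   Unicode data, which is newer than Unicode 3.1.

```python
import unicodedata
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical table shapes: a str.translate() mapping (None or ""
# deletes a character) and inclusive (low, high) code point ranges.
MappingTable = Dict[int, Optional[str]]
RangeTable = Iterable[Tuple[int, int]]


def prepare(text: str,
            mapping: MappingTable,
            prohibited: RangeTable,
            is_unassigned: Callable[[int], bool],
            allow_unassigned: bool = False) -> str:
    """Apply map, normalize, prohibit, and unassigned checks in order."""
    # Section 3: mapping (Appendix D in a real implementation).
    mapped = text.translate(mapping)
    # Section 4: Unicode normalization form KC (uses the Unicode
    # version bundled with Python, not necessarily Unicode 3.1).
    normalized = unicodedata.normalize("NFKC", mapped)
    ranges = list(prohibited)
    for ch in normalized:
        cp = ord(ch)
        # Section 5: prohibited output (Appendix E).
        if any(lo <= cp <= hi for lo, hi in ranges):
            raise ValueError(f"prohibited code point U+{cp:04X}")
        # Section 6: unassigned code points (Appendix F).
        if not allow_unassigned and is_unassigned(cp):
            raise ValueError(f"unassigned code point U+{cp:04X}")
    return normalized


# Toy tables for illustration only.
demo_mapping = {0x00AD: None}          # delete SOFT HYPHEN
demo_prohibited = [(0x0000, 0x001F)]   # C0 control characters
print(prepare("Jo\u00adhn", demo_mapping, demo_prohibited,
              lambda cp: False))       # John
```

   Passing the tables in as parameters mirrors the rule in Sections 3
   and 5 that implementations work from the appendix tables rather than
   from the explanatory subsections.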

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Examples in this document use the notation for code points and names
   from the Unicode Standard [Unicode3.1] and ISO/IEC 10646 [ISO10646].
   For example, the letter "a" may be represented as either "U+0061" or
   "LATIN SMALL LETTER A". In the lists of prohibited characters, the
   "U+" is left off to make the lists easier to read. The comments for
   character ranges are shown in square brackets (such as "[SYMBOLS]")
   and do not come from the standards.
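
   The notation above can be produced mechanically. This Python sketch,
   using only the standard `unicodedata` module, formats a code point in
   "U+XXXX" form and as a list entry in the "XXXX; NAME" style used in
   Sections 3 and 5. The helper names `u_plus` and `list_entry` are
   illustrative. `unicodedata` reflects the Unicode version bundled with
   Python rather than Unicode 3.1, and control characters have no
   Unicode character name, so entries for them need a bracketed comment.

```python
import unicodedata


def u_plus(cp: int) -> str:
    """Return the "U+XXXX" form, with at least four hex digits."""
    return f"U+{cp:04X}"


def list_entry(cp: int, comment: str = "") -> str:
    """Return a "XXXX; NAME" line as used in this profile's lists.

    Characters without a Unicode name (such as controls) fall back to
    the supplied bracketed comment, e.g. "[CONTROL CHARACTERS]".
    """
    name = unicodedata.name(chr(cp), None) or comment
    return f"{cp:04X}; {name}"


print(u_plus(ord("a")), unicodedata.name("a"))  # U+0061 LATIN SMALL LETTER A
print(list_entry(0x00AD))                       # 00AD; SOFT HYPHEN
print(list_entry(0x0000, "[CONTROL CHARACTERS]"))
# 0000; [CONTROL CHARACTERS]
```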

2. Character Repertoire

   This profile uses the Unicode 3.1 [Unicode3.1] repertoire. Unicode
   3.1 was chosen instead of a version of ISO/IEC 10646 because ISO/IEC
   10646 is expected to be updated soon after this document becomes an
   RFC. Unicode 3.1 has exactly the repertoire that is expected in the
   next version of ISO/IEC 10646.

3. Mapping

   This profile specifies stringprep mapping using the mapping table in
   Appendix D. That table includes all the steps described in this
   section.

   Note that the text in this section describes how Appendix D was
   formed. It is there for readers who want to understand more, but it
   should be ignored by implementors. Implementations of this profile
   MUST map based on Appendix D, not based on the descriptions in this
   section of how Appendix D was created.

3.1 Mapped out

   The following characters are simply deleted from the input (that is,
   they are mapped to nothing) because their presence or absence should
   not make two strings different.

   Some characters are only useful in line-based text, and are
   otherwise invisible and ignored.

      00AD; SOFT HYPHEN
      1806; MONGOLIAN TODO SOFT HYPHEN
      200B; ZERO WIDTH SPACE
      FEFF; ZERO WIDTH NO-BREAK SPACE

   Variation selectors and cursive connectors select different glyphs,
   but do not bear semantics.

      180B; MONGOLIAN FREE VARIATION SELECTOR ONE
      180C; MONGOLIAN FREE VARIATION SELECTOR TWO
      180D; MONGOLIAN FREE VARIATION SELECTOR THREE
      200C; ZERO WIDTH NON-JOINER
      200D; ZERO WIDTH JOINER
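
   A minimal Python sketch of this step, built directly from the two
   lists above, uses `str.translate` with a table that maps each code
   point to `None`, which deletes it. The names `MAP_TO_NOTHING` and
   `map_out` are illustrative; a conforming implementation would use
   Appendix D.

```python
# Code points from Section 3.1 that are mapped to nothing (deleted).
MAP_TO_NOTHING = {
    0x00AD: None,  # SOFT HYPHEN
    0x1806: None,  # MONGOLIAN TODO SOFT HYPHEN
    0x200B: None,  # ZERO WIDTH SPACE
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE
    0x180B: None,  # MONGOLIAN FREE VARIATION SELECTOR ONE
    0x180C: None,  # MONGOLIAN FREE VARIATION SELECTOR TWO
    0x180D: None,  # MONGOLIAN FREE VARIATION SELECTOR THREE
    0x200C: None,  # ZERO WIDTH NON-JOINER
    0x200D: None,  # ZERO WIDTH JOINER
}


def map_out(text: str) -> str:
    """Delete the invisible characters listed in Section 3.1."""
    return text.translate(MAP_TO_NOTHING)


# A pasted name with a leading ZERO WIDTH NO-BREAK SPACE and a stray
# SOFT HYPHEN equals the plain name once these are removed.
print(map_out("\ufeffjen\u00adnifer"))  # jennifer
```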

3.2 Space Character Conversions

   The following Unicode spaces are to be mapped to 0020; SPACE:

      00A0; NO-BREAK SPACE
      2000; EN QUAD
      2001; EM QUAD
      2002; EN SPACE
      2003; EM SPACE
      2004; THREE-PER-EM SPACE
      2005; FOUR-PER-EM SPACE
      2006; SIX-PER-EM SPACE
      2007; FIGURE SPACE
      2008; PUNCTUATION SPACE
      2009; THIN SPACE
      200A; HAIR SPACE
      202F; NARROW NO-BREAK SPACE
      3000; IDEOGRAPHIC SPACE
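
   The space conversions can be sketched the same way in Python; here
   each listed code point maps to U+0020 instead of to nothing. The
   names `SPACE_MAP` and `map_spaces` are illustrative, and the table is
   copied from the list above rather than from Appendix D.

```python
# Code points from Section 3.2 that are mapped to 0020; SPACE.
SPACE_CODE_POINTS = [
    0x00A0, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202F, 0x3000,
]
SPACE_MAP = {cp: " " for cp in SPACE_CODE_POINTS}


def map_spaces(text: str) -> str:
    """Replace each Unicode space listed in Section 3.2 with U+0020."""
    return text.translate(SPACE_MAP)


# An IDEOGRAPHIC SPACE and a NO-BREAK SPACE both become plain spaces.
print(map_spaces("Jane\u3000Doe"))  # Jane Doe
print(map_spaces("Jane\u00a0Doe"))  # Jane Doe
```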

4. Normalization

   This profile specifies using Unicode normalization form KC, as
   described in [UAX15].
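
   For readers unfamiliar with form KC, the following Python sketch
   shows its effect on a few inputs. It calls the standard library's
   `unicodedata.ucd_3_2_0.normalize`, which applies Unicode 3.2.0 data;
   Python does not ship Unicode 3.1 tables, so this only approximates
   the version named by this profile and is not a conformance test. The
   `nfkc` wrapper is a hypothetical helper.

```python
import unicodedata

# Python bundles Unicode 3.2.0 data for stringprep-era protocols;
# Unicode 3.1 data is not available, so this is an approximation.
UCD = unicodedata.ucd_3_2_0


def nfkc(text: str) -> str:
    """Apply Unicode normalization form KC."""
    return UCD.normalize("NFKC", text)


examples = {
    "\uff2a\uff4f\uff48\uff4e": "John",  # fullwidth letters -> ASCII
    "\ufb01le": "file",                 # LATIN SMALL LIGATURE FI
    "Jose\u0301": "Jos\u00e9",          # e + COMBINING ACUTE ACCENT
    "\u1100\u1161": "\uac00",           # conjoining jamo -> syllable GA
}
for raw, expected in examples.items():
    assert nfkc(raw) == expected, (raw, nfkc(raw))
```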

   NOTE: There was some discussion on the mailing list suggesting that
   Unicode NFKC does not properly handle the composition of normalized
   Hangul strings. Following the lead of the IDN working group, the
   Kerberos working group will not attempt to second-guess the authors
   of Unicode Standard Annex #15 (formerly Technical Report 15)
   [UAX15], which specifies the normalization methods, or the
   Ideographic Rapporteur Group (IRG), the formal subgroup of ISO/IEC
   JTC1/SC2/WG2 charged with approving all CJKV elements of the Unicode
   standards. Such issues are outside the working group's charter and
   its area of expertise.

5. Prohibited Output

   This profile specifies using the prohibition table in Appendix E.

   Note that the subsections below describe how Appendix E was formed.
   They are there for readers who want to understand more, but they
   should be ignored by implementors. Implementations of this profile
   MUST prohibit code points based on the lists in Appendix E, not
   based on the descriptions in this section of how Appendix E was
   created. If there are any discrepancies between the lists in
   Appendix E and the subsections below, the lists in Appendix E always
   take precedence.

   Some code points listed in one subsection would also appear in
   other subsections. Each code point is only listed once in the tables
   in Appendix E.
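
   Because the subsections overlap (for example, 206A-206F appears in
   both Sections 5.1 and 5.7, and FFF9-FFFC in both Sections 5.1 and
   5.5), building a table in which each code point appears once amounts
   to merging inclusive ranges. A minimal Python sketch follows, with
   the hypothetical name `merge_ranges`. It also joins ranges that are
   merely adjacent; that is one layout choice, not necessarily the
   format Appendix E will use.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) code point range


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or touch."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or is adjacent to the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


section_5_1 = [(0x206A, 0x206F), (0xFFF9, 0xFFFC)]
section_5_5 = [(0xFFF9, 0xFFF9), (0xFFFA, 0xFFFA),
               (0xFFFB, 0xFFFB), (0xFFFC, 0xFFFC)]
section_5_7 = [(0x200E, 0x200F), (0x202A, 0x202E), (0x206A, 0x206F)]
combined = merge_ranges(section_5_1 + section_5_5 + section_5_7)
print([f"{lo:04X}-{hi:04X}" for lo, hi in combined])
# ['200E-200F', '202A-202E', '206A-206F', 'FFF9-FFFC']
```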

5.1 Control characters

   Control characters (or characters with control function) cannot be
   seen and can cause unpredictable results when displayed.

      0000-001F; [CONTROL CHARACTERS]
      007F; DELETE
      0080-009F; [CONTROL CHARACTERS]
      070F; SYRIAC ABBREVIATION MARK
      180E; MONGOLIAN VOWEL SEPARATOR
      2028; LINE SEPARATOR
      2029; PARAGRAPH SEPARATOR
      206A-206F; [CONTROL CHARACTERS]
      FFF9-FFFC; [CONTROL CHARACTERS]
      1D173-1D17A; [MUSICAL CONTROL CHARACTERS]

5.2 Private use and replacement characters

   Because private-use characters do not have defined meanings, they
   are prohibited. The private-use characters are:

      E000-F8FF; [PRIVATE USE, PLANE 0]
      F0000-FFFFD; [PRIVATE USE, PLANE 15]
      100000-10FFFD; [PRIVATE USE, PLANE 16]

   The replacement character (U+FFFD) has no known semantic definition
   in a name, and is often displayed by renderers to indicate "there
   would be some character here, but it cannot be rendered". For
   example, on a computer with no Asian fonts, a name with three
   ideographs might be rendered with three replacement characters.

      FFFD; REPLACEMENT CHARACTER
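
   U+FFFD also commonly enters strings through lossy decoding: a decoder
   that replaces invalid input substitutes U+FFFD, so a name typed in a
   legacy 8-bit encoding but read as UTF-8 can arrive containing it. The
   Python sketch below shows that effect using the standard "replace"
   error handler of `bytes.decode`, and checks the Section 5.2 ranges.
   `PRIVATE_USE`, `is_private_use`, and `violates_5_2` are illustrative
   names.

```python
# Private-use ranges from Section 5.2 (inclusive).
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
REPLACEMENT_CHARACTER = 0xFFFD


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE)


def violates_5_2(text: str) -> bool:
    """True if text has a private-use or replacement character."""
    return any(is_private_use(ord(ch)) or ord(ch) == REPLACEMENT_CHARACTER
               for ch in text)


# "cafe" with e-acute, encoded as ISO 8859-1 but decoded as UTF-8: the
# lone 0xE9 byte is invalid UTF-8 and is replaced by U+FFFD.
garbled = "caf\u00e9".encode("latin-1").decode("utf-8", errors="replace")
print(ascii(garbled))          # 'caf\ufffd'
print(violates_5_2(garbled))   # True
```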

5.3 Non-character code points

   Non-character code points are code points that have been allocated
   in ISO/IEC 10646 but are not characters. Because they are already
   assigned, they are guaranteed not to later change into characters.

      FDD0-FDEF; [NONCHARACTER CODE POINTS]
      FFFE-FFFF; [NONCHARACTER CODE POINTS]
      1FFFE-1FFFF; [NONCHARACTER CODE POINTS]
      2FFFE-2FFFF; [NONCHARACTER CODE POINTS]
      3FFFE-3FFFF; [NONCHARACTER CODE POINTS]
      4FFFE-4FFFF; [NONCHARACTER CODE POINTS]
      5FFFE-5FFFF; [NONCHARACTER CODE POINTS]
      6FFFE-6FFFF; [NONCHARACTER CODE POINTS]
      7FFFE-7FFFF; [NONCHARACTER CODE POINTS]
      8FFFE-8FFFF; [NONCHARACTER CODE POINTS]
      9FFFE-9FFFF; [NONCHARACTER CODE POINTS]
      AFFFE-AFFFF; [NONCHARACTER CODE POINTS]
      BFFFE-BFFFF; [NONCHARACTER CODE POINTS]
      CFFFE-CFFFF; [NONCHARACTER CODE POINTS]
      DFFFE-DFFFF; [NONCHARACTER CODE POINTS]
      EFFFE-EFFFF; [NONCHARACTER CODE POINTS]
      FFFFE-FFFFF; [NONCHARACTER CODE POINTS]
      10FFFE-10FFFF; [NONCHARACTER CODE POINTS]

   The non-character code points are listed in the PropList.txt file
   from the Unicode database.
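
   The list above follows a simple pattern: the 32 code points
   FDD0-FDEF plus the last two code points of each of the 17 planes. A
   minimal Python sketch of that rule (the name `is_noncharacter` is
   illustrative) can regenerate and cross-check the table.

```python
def is_noncharacter(cp: int) -> bool:
    """Noncharacter test for code points in 0..10FFFF."""
    # FDD0-FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Regenerate the per-plane rows of the Section 5.3 list.
plane_rows = [(plane << 16 | 0xFFFE, plane << 16 | 0xFFFF)
              for plane in range(17)]
for lo, hi in plane_rows:
    print(f"{lo:04X}-{hi:04X}; [NONCHARACTER CODE POINTS]")

total = sum(is_noncharacter(cp) for cp in range(0x110000))
print(total)  # 66 (32 in FDD0-FDEF plus 2 in each of 17 planes)
```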

5.4 Surrogate codes

   The following code points are permanently reserved for use as
   surrogate code values in the UTF-16 encoding, will never be assigned
   to characters, and are therefore prohibited:

      D800-DFFF; [SURROGATE CODES]
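
   The reason these values can never be characters is visible in how
   UTF-16 encodes code points above FFFF: each becomes a pair drawn
   from D800-DBFF and DC00-DFFF. The Python sketch below computes that
   pair by hand, compares it with the standard UTF-16 codec, and shows
   that Python's UTF-8 codec refuses to encode a lone surrogate.
   `utf16_pair` and `is_surrogate` are illustrative names.

```python
from typing import Tuple


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def utf16_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points 10000-10FFFF use surrogates")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# First code point of the 1D173-1D17A range in Section 5.1.
high, low = utf16_pair(0x1D173)
print(f"{high:04X} {low:04X}")  # D834 DD73
assert "\U0001D173".encode("utf-16-be") == bytes(
    [high >> 8, high & 0xFF, low >> 8, low & 0xFF])

# A lone surrogate is not a character and has no valid UTF-8 encoding.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```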

5.5 Inappropriate for plain text

   The following characters should not appear in regular text.

      FFF9; INTERLINEAR ANNOTATION ANCHOR
      FFFA; INTERLINEAR ANNOTATION SEPARATOR
      FFFB; INTERLINEAR ANNOTATION TERMINATOR
      FFFC; OBJECT REPLACEMENT CHARACTER

5.6 Inappropriate for canonical representation

   The ideographic description characters allow different sequences of
   characters to be rendered the same way, which makes them
   inappropriate for names that must have a single canonical
   representation.

      2FF0-2FFB; [IDEOGRAPHIC DESCRIPTION CHARACTERS]

5.7 Change display properties

   The following characters, some of which are deprecated in ISO/IEC
   10646, can cause changes in display or the order in which characters
   appear when rendered.

      200E; LEFT-TO-RIGHT MARK
      200F; RIGHT-TO-LEFT MARK
      202A; LEFT-TO-RIGHT EMBEDDING
      202B; RIGHT-TO-LEFT EMBEDDING
      202C; POP DIRECTIONAL FORMATTING
      202D; LEFT-TO-RIGHT OVERRIDE
      202E; RIGHT-TO-LEFT OVERRIDE
      206A; INHIBIT SYMMETRIC SWAPPING
      206B; ACTIVATE SYMMETRIC SWAPPING
      206C; INHIBIT ARABIC FORM SHAPING
      206D; ACTIVATE ARABIC FORM SHAPING
      206E; NATIONAL DIGIT SHAPES
      206F; NOMINAL DIGIT SHAPES

5.8 Tagging characters

   The following characters are used for tagging text and are
   invisible.

      E0001; LANGUAGE TAG
      E0020-E007F; [TAGGING CHARACTERS]
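
   The tag characters E0020-E007E mirror printable ASCII at an offset of
   E0000, so a string can carry ASCII text that most renderers do not
   display. The Python sketch below builds such a string and detects it
   with the Section 5.8 ranges. The helper names `is_tagging` and
   `hide_as_tags` and the principal name "alice" are illustrative.

```python
def is_tagging(cp: int) -> bool:
    """Section 5.8: LANGUAGE TAG or the E0020-E007F tag block."""
    return cp == 0xE0001 or 0xE0020 <= cp <= 0xE007F


def hide_as_tags(ascii_text: str) -> str:
    """Encode printable ASCII as the corresponding tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in ascii_text):
        raise ValueError("only printable ASCII has tag equivalents")
    return "".join(chr(0xE0000 + ord(ch)) for ch in ascii_text)


# LANGUAGE TAG plus the tag form of "en", prefixed to a name. Many
# renderers would display only "alice".
tagged = chr(0xE0001) + hide_as_tags("en") + "alice"
print(len("alice"), len(tagged))                  # 5 8
print(any(is_tagging(ord(ch)) for ch in tagged))  # True
```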

6. Unassigned Code Points

   This profile lists the unassigned code points for Unicode 3.1 in
   Appendix F. The list in Appendix F MUST be used by implementations
   of this specification. If there are any discrepancies between the
   list in Appendix F and the Unicode 3.1 specification, the list in
   Appendix F always takes precedence.
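
   Until Appendix F is filled in, an approximation is available in
   Python's standard `stringprep` module, whose `in_table_a1` function
   implements the unassigned code point table (Table A.1) of the later
   stringprep specification, RFC 3454, against Unicode 3.2 data. That
   is a different Unicode version from the 3.1 repertoire named here,
   so the sketch below is illustrative rather than conformant. The
   `check_unassigned` helper and its `allow_unassigned` flag are
   hypothetical; the flag models the distinction drawn in the NOTE WELL
   paragraph of Appendix E for names that are being stored.

```python
import stringprep
import unicodedata


def check_unassigned(text: str, allow_unassigned: bool) -> None:
    """Raise ValueError for unassigned code points unless allowed.

    Uses RFC 3454 Table A.1 (Unicode 3.2), which only approximates the
    Unicode 3.1 list that Appendix F of this profile will contain.
    """
    if allow_unassigned:
        return
    for ch in text:
        if stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")


# U+1F600 was assigned long after Unicode 3.2, so the 3.2-era table
# treats it as unassigned even though current Python data knows it.
print(stringprep.in_table_a1("\U0001F600"))  # True
print(unicodedata.category("\U0001F600"))    # So
check_unassigned("alice", allow_unassigned=False)      # passes
check_unassigned("\U0001F600", allow_unassigned=True)  # passes
```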

7. Security Considerations

   ISO/IEC 10646 has many characters that look similar. In many cases,
   users of security protocols might do visual matching, such as when
   comparing the names of trusted third parties. This profile does
   nothing to map similar-looking characters together.

   Principal names and passwords are entered by users and used within
   the Kerberos protocol. The security of the Internet would be
   compromised if a user entering a single internationalized string
   could be connected to different servers or denied access based on
   different interpretations of internationalized strings.
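
   To make the first paragraph concrete: a principal name spelled with
   CYRILLIC SMALL LETTER A (U+0430) in place of LATIN SMALL LETTER A
   (U+0061) looks the same in many fonts but remains a different string
   after NFKC, so the two names never compare equal under this profile.
   The Python sketch below shows this and adds a crude mixed-script
   heuristic based on the first word of each letter's Unicode name.
   That heuristic is an illustrative assumption, not part of this
   profile, and the principal names are hypothetical.

```python
import unicodedata

latin = "alice@EXAMPLE.COM"
lookalike = "\u0430lice@EXAMPLE.COM"  # first letter is U+0430

# NFKC does not fold look-alike letters from different scripts.
print(unicodedata.normalize("NFKC", latin) ==
      unicodedata.normalize("NFKC", lookalike))  # False


def letter_scripts(text: str) -> set:
    """Crude heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


print(sorted(letter_scripts(lookalike)))  # ['CYRILLIC', 'LATIN']
```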

8. References

   [CharModel]   Unicode Technical Report #17, Character Encoding
                 Model.

   [Glossary]    Unicode Glossary.

   [ISO10646]    ISO/IEC 10646-1:2000. International Standard --
                 Information technology -- Universal Multiple-Octet
                 Coded Character Set (UCS) -- Part 1: Architecture and
                 Basic Multilingual Plane.

   [RFC2119]     Scott Bradner, "Key words for use in RFCs to Indicate
                 Requirement Levels", RFC 2119, March 1997.

   [STRINGPREP]  Paul Hoffman and Marc Blanchet, "Preparation of
                 Internationalized Strings ("stringprep")",
                 draft-hoffman-stringprep, work in progress.

   [Unicode3.1]  The Unicode Standard, Version 3.1.0: The Unicode
                 Consortium. The Unicode Standard, Version 3.0.
                 Reading, MA, Addison-Wesley Developers Press, 2000.
                 ISBN 0-201-61633-5, as amended by: Unicode Standard
                 Annex #27: Unicode 3.1.

   [UAX15]       Mark Davis and Martin Duerst. Unicode Standard Annex
                 #15: Unicode Normalization Forms, Version 3.1.0.

A. Acknowledgements

   This draft is based upon the work of the IETF IDN Working Group's
   IDN Nameprep design team.

B. IANA Considerations

   This is a profile of stringprep. When it becomes an RFC, it should
   be registered in the stringprep profile registry.

C. Author Contact Information

   Jeffrey Altman
   jaltman@columbia.edu
   Columbia University
   612 West 115th Street
   New York NY 10025

D. Mapping Tables

   The following is the mapping table from Section 3. The table has
   three columns:

   - the character that is mapped from
   - the zero or more characters that it is mapped to
   - the reason for the mapping

   The columns are separated by semicolons. Note that the second column
   may be empty, or it may have one character, or it may have more than
   one character, with each character separated by a space.
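
   Because the table itself has not been filled in yet, the following
   Python sketch parses hypothetical lines written in the three-column
   format just described and produces a table usable with
   `str.translate`. The sample rows reuse mappings from Sections 3.1
   and 3.2, but their exact spelling (hexadecimal code points, the
   wording of the reason column) is an assumption, as is the name
   `parse_mapping_table`.

```python
def parse_mapping_table(text: str) -> dict:
    """Parse 'from; to...; reason' lines into a str.translate table.

    Code points are assumed to be written in hexadecimal. An empty
    second column maps the character to nothing (the empty string).
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the Start/End marker lines.
        if not line or line.startswith("-----"):
            continue
        source, targets, _reason = (f.strip() for f in line.split(";", 2))
        table[int(source, 16)] = "".join(
            chr(int(cp, 16)) for cp in targets.split())
    return table


# Hypothetical rows in the Appendix D format.
sample = """\
----- Start Mapping Table -----
00AD; ; Map out
00A0; 0020; Space conversion
----- End Mapping Table -----
"""
table = parse_mapping_table(sample)
print("Jo\u00adhn\u00a0Smith".translate(table))  # John Smith
```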

   ----- Start Mapping Table -----
   ... to be filled in ...
   ----- End Mapping Table -----

E. Prohibited Code Point List

   ----- Start Prohibited Table -----
   ... to be filled in ...
   ----- End Prohibited Table -----

   NOTE WELL: Software that follows this specification and that will be
   used to check names before they are stored in an authoritative
   database, such as a Kerberos principal database, MUST add all
   unassigned code points to the list of characters that are
   prohibited. See Section 6 of [STRINGPREP] for more details.

F. Unassigned Code Point List

   ----- Start Unassigned Table -----
   ... to be filled in ...
   ----- End Unassigned Table -----