idnits 2.17.1 

draft-ietf-krb-wg-utf8-profile-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 400 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 11 instances of too long lines in the document, the longest
     one being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 382 has weird spacing: '...versity   incl...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 12, 2002) is 8101 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'CONTROL CHARACTERS' is mentioned on line 181, but
     not defined

  == Missing Reference: 'PRIVATE USE' is mentioned on line 191, but not
     defined

  == Missing Reference: 'PLANE 0' is mentioned on line 189, but not defined

  == Missing Reference: 'PLANE 15' is mentioned on line 190, but not defined

  == Missing Reference: 'PLANE 16' is mentioned on line 191, but not defined

  == Missing Reference: 'SURROGATE CODES' is mentioned on line 235, but not
     defined

  == Missing Reference: 'TAGGING CHARACTERS' is mentioned on line 279, but
     not defined

  == Unused Reference: 'CharModel' is defined on line 305, but no explicit
     reference was found in the text

  == Unused Reference: 'Glossary' is defined on line 308, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CharModel'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Glossary'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX15'


     Summary: 7 errors (**), 0 flaws (~~), 13 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                              Jeffrey Altman
2	draft-ietf-krb-wg-utf8-profile-00.txt                  Columbia University
3	February 12, 2002
4	Expires in six months

6	        Stringprep Profile for Kerberos UTF-8 Strings

8	Status of this memo

10	This document is an Internet-Draft and is in full conformance with all
11	provisions of Section 10 of RFC2026.

13	Internet-Drafts are working documents of the Internet Engineering Task
14	Force (IETF), its areas, and its working groups. Note that other groups
15	may also distribute working documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference material
20	or to cite them other than as "work in progress."

22	To view the list of Internet-Draft Shadow Directories, see
23	http://www.ietf.org/shadow.html.

25	Abstract

27	This document describes how to prepare UTF-8 strings
28	in order to increase the likelihood that name input and name comparison
29	work in ways that make sense for typical users throughout the world. This
30	is a profile of the stringprep protocol developed in the IDN working group.

32	1. Introduction

34	This document specifies processing rules that will allow users to enter
35	Kerberos Principal Names and input to cryptographic String to Key functions.
36	It is a profile of stringprep [STRINGPREP].

38	This profile defines the following, as required by [STRINGPREP]:

40	- The intended applicability of the profile: Kerberos principal names
41	and input to string-to-key functions

43	- The character repertoire that is the input and output to stringprep:
44	defined in Section 2

46	- The list of unassigned code points for the repertoire: defined
47	in Appendix F.

49	- The mappings used: defined in Section 3.

51	- The Unicode normalization used: defined in Section 4

53	- The characters that are prohibited as output: defined in Section 5
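
The items above fit together as a three-step pipeline that follows the
order of Sections 3 to 5: map, normalize, then check for prohibited
output. The Python sketch below is illustrative only. The function name
`prepare` and the two small sample tables are hypothetical stand-ins for
Appendices D and E, and Python's `unicodedata` module implements NFKC
for the Unicode version bundled with the interpreter, which is newer
than Unicode 3.1.

```python
import unicodedata

# Hypothetical stand-ins for Appendix D and Appendix E. The real tables
# are much larger and take precedence over anything shown here.
SAMPLE_MAPPING = {0x00AD: "", 0x00A0: " "}   # SOFT HYPHEN, NO-BREAK SPACE
SAMPLE_PROHIBITED = [(0x0000, 0x001F), (0xFFFD, 0xFFFD)]


def prepare(text, mapping=SAMPLE_MAPPING, prohibited=SAMPLE_PROHIBITED):
    """Map, normalize (NFKC), then reject prohibited output."""
    # Step 1 (Section 3): replace each code point using the mapping table.
    mapped = "".join(mapping.get(ord(ch), ch) for ch in text)
    # Step 2 (Section 4): apply Unicode normalization form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3 (Section 5): fail if any prohibited code point remains.
    for ch in normalized:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in prohibited):
            raise ValueError("prohibited code point U+%04X" % cp)
    return normalized
```

With these sample tables, a string containing a soft hyphen and a
no-break space is prepared to the same result as its plain ASCII
equivalent, while a string containing a control character is rejected.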

55	1.2 Terminology

57	The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
58	"MAY" in this document are to be interpreted as described in RFC 2119
59	[RFC2119].

61	Examples in this document use the notation for code points and names
62	from the Unicode Standard [Unicode3.1] and ISO/IEC 10646 [ISO10646]. For
63	example, the letter "a" may be represented as either "U+0061" or "LATIN
64	SMALL LETTER A". In the lists of prohibited characters, the "U+" is left
65	off to make the lists easier to read. The comments for character ranges
66	are shown in square brackets (such as "[SYMBOLS]") and do not come from
67	the standards.

69	2. Character Repertoire

71	Unicode 3.1 [Unicode3.1] is the repertoire used in this profile.
72	The reason Unicode 3.1 was chosen instead of a version of
73	ISO/IEC 10646 is that ISO/IEC 10646 is expected to be updated soon after
74	this document becomes an RFC. Unicode 3.1 has the exact repertoire that
75	is expected in the next version of ISO/IEC 10646, and is therefore used
76	here.

78	3. Mapping

80	This profile specifies stringprep mapping using the mapping table
81	in Appendix D. That table includes all the steps described in this
82	section.

84	Note that the text in this section describes how Appendix D was
85	formed. It is there for people who want to understand more, but it
86	should be ignored by implementors. Implementations of this profile
87	MUST map based on Appendix D, not based on the descriptions in this
88	section of how Appendix D was created.

90	3.1 Mapped out

92	The following characters are simply deleted from the input (that is,
93	they are mapped to nothing) because their presence or absence should not
94	make two strings different.

96	Some characters are only useful in line-based text, and are otherwise
97	invisible and ignored.

99	00AD; SOFT HYPHEN
100	1806; MONGOLIAN TODO SOFT HYPHEN
101	200B; ZERO WIDTH SPACE
102	FEFF; ZERO WIDTH NO-BREAK SPACE

104	Variation selectors and cursive connectors select different glyphs, but
105	do not bear semantics.

107	180B; MONGOLIAN FREE VARIATION SELECTOR ONE
108	180C; MONGOLIAN FREE VARIATION SELECTOR TWO
109	180D; MONGOLIAN FREE VARIATION SELECTOR THREE
110	200C; ZERO WIDTH NON-JOINER
111	200D; ZERO WIDTH JOINER
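
As an illustration of "mapped to nothing", the following Python sketch
deletes the nine code points listed in this section. The names
`MAPPED_TO_NOTHING` and `delete_mapped_out` are hypothetical; a
conforming implementation would be driven by Appendix D rather than by
this hand-copied set.

```python
# Code points from Section 3.1 that are mapped to nothing.
MAPPED_TO_NOTHING = frozenset([
    0x00AD, 0x1806, 0x200B, 0xFEFF,          # line-based text characters
    0x180B, 0x180C, 0x180D, 0x200C, 0x200D,  # variation selectors, joiners
])


def delete_mapped_out(text):
    """Return text with every Section 3.1 code point removed."""
    return "".join(ch for ch in text if ord(ch) not in MAPPED_TO_NOTHING)
```

After this step, a name typed with an embedded SOFT HYPHEN or ZERO WIDTH
SPACE compares equal to the same name typed without it.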

113	3.2 Space Character Conversions

115	The following Unicode spaces are to be mapped to 0020; SPACE:

117	00A0; NO-BREAK SPACE
118	2000; EN QUAD
119	2001; EM QUAD
120	2002; EN SPACE
121	2003; EM SPACE
122	2004; THREE-PER-EM SPACE
123	2005; FOUR-PER-EM SPACE
124	2006; SIX-PER-EM SPACE
125	2007; FIGURE SPACE
126	2008; PUNCTUATION SPACE
127	2009; THIN SPACE
128	200A; HAIR SPACE
129	202F; NARROW NO-BREAK SPACE
130	3000; IDEOGRAPHIC SPACE
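
A minimal Python sketch of the space conversion, assuming the fourteen
code points listed above. The names `SPACE_TABLE` and `map_spaces` are
hypothetical, and Appendix D remains the authoritative table.

```python
# The fourteen space characters from Section 3.2; 2000-200A is contiguous.
SPACE_CODE_POINTS = [0x00A0, *range(0x2000, 0x200B), 0x202F, 0x3000]

# str.translate accepts a dict that maps code points to replacement text.
SPACE_TABLE = {cp: " " for cp in SPACE_CODE_POINTS}


def map_spaces(text):
    """Replace every Section 3.2 space character with U+0020 SPACE."""
    return text.translate(SPACE_TABLE)
```

Note that 200B (ZERO WIDTH SPACE) is not in this table, because
Section 3.1 maps it to nothing instead.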

132	4. Normalization

134	This profile specifies using Unicode normalization form KC, as described
135	in [UAX15].
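
For readers unfamiliar with form KC, the following Python sketch shows
its effect on a few inputs. It relies on the standard `unicodedata`
module, which normalizes according to the Unicode version shipped with
the interpreter rather than Unicode 3.1, so it is an approximation of
what this profile specifies. The helper name `normalize_kc` is
hypothetical.

```python
import unicodedata


def normalize_kc(text):
    """Apply Unicode normalization form KC to text."""
    return unicodedata.normalize("NFKC", text)


# Canonical composition: "e" + COMBINING ACUTE ACCENT becomes U+00E9.
composed = normalize_kc("e\u0301")

# Compatibility decomposition: LATIN SMALL LIGATURE FI becomes "fi".
ligature = normalize_kc("\ufb01")

# Compatibility decomposition: FULLWIDTH LATIN CAPITAL LETTER A becomes "A".
fullwidth = normalize_kc("\uff21")
```

Because both composition and compatibility mappings are applied, two
strings that differ only in such encoding choices normalize to the same
result.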

137	NOTE: There was some discussion on the mailing list that would suggest
138	that Unicode NFKC does not properly handle the composition of
139	normalized Hangul strings.  Following the lead of the IDN working
140	group, the Kerberos working group will not attempt to second-guess the
141	authors of Unicode 3.1 Annex 15 (formerly Technical Report 15)
142	[UAX15], which specifies the normalization methods, or the Ideographic
143	Rapporteur Group (IRG), which is the formal subgroup of ISO/IEC
144	JTC1/SC2/WG2 charged with approving all CJKV elements of the Unicode
145	standards.  Such issues are outside the working group's charter and
146	its area of expertise.

148	5. Prohibited Output

150	This profile specifies using the prohibition table in Appendix E.

152	Note that the subsections below describe how Appendix E was formed.
153	They are there for people who want to understand more, but they should
154	be ignored by implementors. Implementations of this profile MUST
155	prohibit output based on Appendix E, not based on the descriptions in
156	this section of how Appendix E was created.

158	The collected lists of prohibited code points can be found in Appendix E
159	of this document. The lists in Appendix E MUST be used by implementations
160	of this specification. If there are any discrepancies between the lists
161	in Appendix E and the subsections below, the lists in Appendix E always
162	take precedence.

164	Some code points listed in one section would also appear in other
165	sections. Each code point is only listed once in the tables in Appendix
166	E.

168	5.1 Control characters

170	Control characters (or characters with control function) cannot be seen
171	and can cause unpredictable results when displayed.

173	0000-001F; [CONTROL CHARACTERS]
174	007F; DELETE
175	0080-009F; [CONTROL CHARACTERS]
176	070F; SYRIAC ABBREVIATION MARK
177	180E; MONGOLIAN VOWEL SEPARATOR
178	2028; LINE SEPARATOR
179	2029; PARAGRAPH SEPARATOR
180	206A-206F; [CONTROL CHARACTERS]
181	FFF9-FFFC; [CONTROL CHARACTERS]
182	1D173-1D17A; [MUSICAL CONTROL CHARACTERS]
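
Checking for prohibited output amounts to a range lookup. The Python
sketch below transcribes the ranges of this subsection; the names
`CONTROL_RANGES` and `find_prohibited` are hypothetical, and a real
implementation would load the complete table from Appendix E instead.

```python
# Inclusive (first, last) ranges transcribed from Section 5.1.
CONTROL_RANGES = [
    (0x0000, 0x001F), (0x007F, 0x007F), (0x0080, 0x009F),
    (0x070F, 0x070F), (0x180E, 0x180E), (0x2028, 0x2029),
    (0x206A, 0x206F), (0xFFF9, 0xFFFC), (0x1D173, 0x1D17A),
]


def find_prohibited(text, ranges=CONTROL_RANGES):
    """Return the code points of text that fall inside any given range."""
    return [ord(ch) for ch in text
            if any(lo <= ord(ch) <= hi for lo, hi in ranges)]
```

An empty result means the string contains none of the listed control
characters. The same function can be reused for the other subsections
by passing a different list of ranges.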

184	5.2 Private use and replacement characters

186	Because private-use characters do not have defined meanings, they are
187	prohibited. The private-use characters are:

189	E000-F8FF; [PRIVATE USE, PLANE 0]
190	F0000-FFFFD; [PRIVATE USE, PLANE 15]
191	100000-10FFFD; [PRIVATE USE, PLANE 16]

193	The replacement character (U+FFFD) has no known semantic definition in a
194	name, and is often displayed by renderers to indicate "there would be
195	some character here, but it cannot be rendered". For example, on a
196	computer with no Asian fonts, a name with three ideographs might be
197	rendered with three replacement characters.

199	FFFD; REPLACEMENT CHARACTER

201	5.3 Non-character code points

203	Non-character code points are code points that have been allocated in
204	ISO/IEC 10646 but are not characters. Because they are already assigned,
205	they are guaranteed not to later change into characters.

207	FDD0-FDEF; [NONCHARACTER CODE POINTS]
208	FFFE-FFFF; [NONCHARACTER CODE POINTS]
209	1FFFE-1FFFF; [NONCHARACTER CODE POINTS]
210	2FFFE-2FFFF; [NONCHARACTER CODE POINTS]
211	3FFFE-3FFFF; [NONCHARACTER CODE POINTS]
212	4FFFE-4FFFF; [NONCHARACTER CODE POINTS]
213	5FFFE-5FFFF; [NONCHARACTER CODE POINTS]
214	6FFFE-6FFFF; [NONCHARACTER CODE POINTS]
215	7FFFE-7FFFF; [NONCHARACTER CODE POINTS]
216	8FFFE-8FFFF; [NONCHARACTER CODE POINTS]
217	9FFFE-9FFFF; [NONCHARACTER CODE POINTS]
218	AFFFE-AFFFF; [NONCHARACTER CODE POINTS]
219	BFFFE-BFFFF; [NONCHARACTER CODE POINTS]
220	CFFFE-CFFFF; [NONCHARACTER CODE POINTS]
221	DFFFE-DFFFF; [NONCHARACTER CODE POINTS]
222	EFFFE-EFFFF; [NONCHARACTER CODE POINTS]
223	FFFFE-FFFFF; [NONCHARACTER CODE POINTS]
224	10FFFE-10FFFF; [NONCHARACTER CODE POINTS]

226	The non-character code points are listed in the PropList.txt file from
227	the Unicode database.
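
The list above follows a regular pattern: the block FDD0-FDEF plus the
last two code points of each of the seventeen planes. The Python sketch
below expresses that pattern as a predicate. The function name
`is_noncharacter` is hypothetical, and the sketch is offered only as an
aid to understanding the list, not as a replacement for Appendix E.

```python
def is_noncharacter(cp):
    """True for FDD0-FDEF and for the last two code points of each plane."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # xFFFE and xFFFF in planes 0 through 16: the low 16 bits, with the
    # lowest bit ignored, equal FFFE.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
```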

229	5.4 Surrogate codes

231	The following code points are permanently reserved for use as surrogate
232	code values in the UTF-16 encoding, will never be assigned to
233	characters, and are therefore prohibited:

235	D800-DFFF; [SURROGATE CODES]

237	5.5 Inappropriate for plain text

239	The following characters should not appear in regular text.

241	FFF9; INTERLINEAR ANNOTATION ANCHOR
242	FFFA; INTERLINEAR ANNOTATION SEPARATOR
243	FFFB; INTERLINEAR ANNOTATION TERMINATOR
244	FFFC; OBJECT REPLACEMENT CHARACTER

246	5.6 Inappropriate for canonical representation

248	The ideographic description characters allow different sequences of
249	characters to be rendered the same way, which makes them inappropriate
250	for host names that must have a single canonical representation.

252	2FF0-2FFB; [IDEOGRAPHIC DESCRIPTION CHARACTERS]

254	5.7 Change display properties

256	The following characters, some of which are deprecated in ISO/IEC 10646,
257	can cause changes in display or the order in which characters appear
258	when rendered.

260	200E; LEFT-TO-RIGHT MARK
261	200F; RIGHT-TO-LEFT MARK
262	202A; LEFT-TO-RIGHT EMBEDDING
263	202B; RIGHT-TO-LEFT EMBEDDING
264	202C; POP DIRECTIONAL FORMATTING
265	202D; LEFT-TO-RIGHT OVERRIDE
266	202E; RIGHT-TO-LEFT OVERRIDE
267	206A; INHIBIT SYMMETRIC SWAPPING
268	206B; ACTIVATE SYMMETRIC SWAPPING
269	206C; INHIBIT ARABIC FORM SHAPING
270	206D; ACTIVATE ARABIC FORM SHAPING
271	206E; NATIONAL DIGIT SHAPES
272	206F; NOMINAL DIGIT SHAPES

274	5.8 Tagging characters

276	The following characters are used for tagging text and are invisible.

278	E0001; LANGUAGE TAG
279	E0020-E007F; [TAGGING CHARACTERS]

281	6. Unassigned Code Points in Internationalized Host Names

283	This profile lists the unassigned code points for Unicode 3.1 in
284	Appendix F. The list in Appendix F MUST be used by implementations of
285	this specification. If there are any discrepancies between the list in
286	Appendix F and the Unicode 3.1 specification, the list in Appendix F
287	always takes precedence.

289	7. Security Considerations

291	ISO/IEC 10646 has many characters that look similar. In many cases,
292	users of security protocols might do visual matching, such as when
293	comparing the names of trusted third parties. This profile does nothing
294	to map similar-looking characters together.
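
The Python sketch below illustrates this limitation. It compares two
strings after NFKC normalization using the standard `unicodedata`
module (which tracks a newer Unicode version than 3.1). The helper name
`same_after_nfkc` and the sample strings are hypothetical.

```python
import unicodedata

latin = "admin"
mixed = "\u0430dmin"   # first letter is U+0430 CYRILLIC SMALL LETTER A


def same_after_nfkc(a, b):
    """True if a and b are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# The Cyrillic letter looks like Latin "a" in many fonts but stays distinct.
lookalike_matches = same_after_nfkc(latin, mixed)

# A fullwidth "a" (U+FF41) is a compatibility variant and does fold to "a".
fullwidth_matches = same_after_nfkc("\uff41dmin", latin)
```

Here `lookalike_matches` is false and `fullwidth_matches` is true:
normalization unifies compatibility variants of the same character, but
it does not unify distinct characters that merely look alike.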

296	Principal names and passwords are entered by users and used within the
297	Kerberos protocol. The
298	security of the Internet would be compromised if a user entering a
299	single internationalized string could be connected to different servers
300	or denied access based on different interpretations of
301	internationalized strings.

303	8. References

305	[CharModel] Unicode Technical Report #17, Character Encoding Model.
306	<http://www.unicode.org/unicode/reports/tr17/>.

308	[Glossary] Unicode Glossary, <http://www.unicode.org/glossary/>.

310	[ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information
311	technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
312	1: Architecture and Basic Multilingual Plane.

314	[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
315	Requirement Levels", March 1997, RFC 2119.

317	[STRINGPREP] Paul Hoffman and Marc Blanchet, "Preparation of
318	Internationalized Strings ("stringprep")", draft-hoffman-stringprep,
319	work in progress

321	[Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode
322	Consortium. The Unicode Standard, Version 3.0. Reading, MA,
323	Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended
324	by: Unicode Standard Annex #27: Unicode 3.1
325	<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.

327	[UAX15] Mark Davis and Martin Duerst. Unicode Standard Annex #15:
328	Unicode Normalization Forms, Version 3.1.0.
329	<http://www.unicode.org/unicode/reports/tr15/tr15-21.html>

331	A. Acknowledgements

333	This draft is based upon the work of the IETF IDN Working Group's
334	IDN Nameprep design team.

336	B. IANA Considerations

338	This is a profile of stringprep. When it becomes an RFC, it
339	should be registered in the stringprep profile registry.

341	C. Author Contact Information

343	Jeffrey Altman
344	jaltman@columbia.edu
345	Columbia University
346	612 West 115th Street
347	New York NY 10025

349	D. Mapping Tables

351	The following is the mapping table from Section 3. The table has three
352	columns:
353	- the character that is mapped from
354	- the zero or more characters that it is mapped to
355	- the reason for the mapping
356	The columns are separated by semicolons. Note that the second column may
357	be empty, or it may have one character, or it may have more than one
358	character, with each character separated by a space.
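
Since the table itself is not yet filled in, the following Python
sketch only illustrates how a line in the three-column format described
above might be parsed. It assumes that code points are written in
hexadecimal without the "U+" prefix, as in the lists elsewhere in this
document. The function name `parse_mapping_line` and the sample lines
are hypothetical.

```python
def parse_mapping_line(line):
    """Parse 'FROM; TO TO ...; reason' into (code point, string, reason)."""
    # Split on the first two semicolons only, so the reason may contain one.
    source, targets, reason = (field.strip() for field in line.split(";", 2))
    # The second column holds zero or more space-separated code points.
    mapped = "".join(chr(int(cp, 16)) for cp in targets.split())
    return int(source, 16), mapped, reason


# Hypothetical sample lines covering zero, one, and two target characters.
deleted = parse_mapping_line("00AD; ; Map to nothing")
spaced = parse_mapping_line("00A0; 0020; Space conversion")
expanded = parse_mapping_line("FB01; 0066 0069; Hypothetical example")
```

An empty second column yields an empty replacement string, which
corresponds to deleting the character from the input.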

360	----- Start Mapping Table -----
361	... to be filled in ...
362	----- End Mapping Table -----

364	E. Prohibited Code Point List

366	----- Start Prohibited Table -----
367	... to be filled in ...
368	----- End Prohibited Table -----

370	NOTE WELL: Software that follows this specification that will be used to
371	check names before they are put in authoritative name servers MUST add
372	all unassigned code points to the list of characters that are prohibited.
373	See Section 6 of [STRINGPREP] for more details.

375	F. Unassigned Code Point List

377	----- Start Unassigned Table -----
378	... to be filled in ...
379	----- End Unassigned Table -----
