Internet Draft                                          Jeffrey Altman
draft-ietf-krb-wg-utf8-profile-00.txt              Columbia University
February 12, 2002
Expires in six months

Stringprep Profile for Kerberos UTF-8 Strings

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.

Abstract

This document describes how to prepare UTF-8 strings in order to
increase the likelihood that name input and name comparison work in
ways that make sense for typical users throughout the world. This is a
profile of the stringprep protocol developed in the IDN working group.

1. Introduction

This document specifies processing rules that will allow users to enter
Kerberos Principal Names and input to cryptographic String to Key
functions. It is a profile of stringprep [STRINGPREP].

This profile defines the following, as required by [STRINGPREP]:

- The intended applicability of the profile: Kerberos principal names
and input to string-to-key functions

- The character repertoire that is the input and output to stringprep:
defined in Section 2

- The list of unassigned code points for the repertoire: defined
in Appendix F

- The mappings used: defined in Section 3

- The Unicode normalization used: defined in Section 4

- The characters that are prohibited as output: defined in Section 5

1.1 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Examples in this document use the notation for code points and names
from the Unicode Standard [Unicode3.1] and ISO/IEC 10646 [ISO10646]. For
example, the letter "a" may be represented as either "U+0061" or "LATIN
SMALL LETTER A". In the lists of prohibited characters, the "U+" is left
off to make the lists easier to read. The comments for character ranges
are shown in square brackets (such as "[SYMBOLS]") and do not come from
the standards.

2. Character Repertoire

Unicode 3.1 [Unicode3.1] is the repertoire used in this profile.
The reason Unicode 3.1 was chosen instead of a version of
ISO/IEC 10646 is that ISO/IEC 10646 is expected to be updated soon after
this document becomes an RFC. Unicode 3.1 has the exact repertoire that
is expected in the next version of ISO/IEC 10646, and is therefore used
here.

3. Mapping

This profile specifies stringprep mapping using the mapping table
in Appendix D. That table includes all the steps described in this
section.

Note that the text in this section describes how Appendix D was formed.
It is there for people who want to understand more, but it should be
ignored by implementors. Implementations of this profile MUST map based
on Appendix D, not based on the descriptions in this section of how
Appendix D was created.

3.1 Mapped out

The following characters are simply deleted from the input (that is,
they are mapped to nothing) because their presence or absence should not
make two strings different.

Some characters are only useful in line-based text, and are otherwise
invisible and ignored.

00AD; SOFT HYPHEN
1806; MONGOLIAN TODO SOFT HYPHEN
200B; ZERO WIDTH SPACE
FEFF; ZERO WIDTH NO-BREAK SPACE

Variation selectors and cursive connectors select different glyphs, but
do not bear semantics.

180B; MONGOLIAN FREE VARIATION SELECTOR ONE
180C; MONGOLIAN FREE VARIATION SELECTOR TWO
180D; MONGOLIAN FREE VARIATION SELECTOR THREE
200C; ZERO WIDTH NON-JOINER
200D; ZERO WIDTH JOINER

3.2 Space Character Conversions

The following Unicode spaces are to be mapped to 0020; SPACE:

00A0; NO-BREAK SPACE
2000; EN QUAD
2001; EM QUAD
2002; EN SPACE
2003; EM SPACE
2004; THREE-PER-EM SPACE
2005; FOUR-PER-EM SPACE
2006; SIX-PER-EM SPACE
2007; FIGURE SPACE
2008; PUNCTUATION SPACE
2009; THIN SPACE
200A; HAIR SPACE
202F; NARROW NO-BREAK SPACE
3000; IDEOGRAPHIC SPACE

4. Normalization

This profile specifies using Unicode normalization form KC, as described
in [UAX15].

NOTE: Some discussion on the mailing list suggested that Unicode NFKC
does not properly handle the composition of normalized Hangul strings.
Following the lead of the IDN working group, the Kerberos working group
will not attempt to second-guess the authors of Unicode 3.1 Annex 15
(formerly Technical Report 15) [UAX15], which specifies the
normalization methods, or the Ideographic Rapporteur Group (IRG), which
is the formal subgroup of ISO/IEC JTC1/SC2/WG2 charged with approving
all CJKV elements of the Unicode standards. Such issues are outside the
working group's charter and its area of expertise.

5. Prohibited Output

This profile specifies using the prohibition table in Appendix E.

Note that the subsections below describe how Appendix E was formed. They
are there for people who want to understand more, but they should be
ignored by implementors.
Implementations of this profile MUST prohibit code points based on
Appendix E, not based on the descriptions in this section of how
Appendix E was created.

The collected lists of prohibited code points can be found in Appendix E
of this document. The lists in Appendix E MUST be used by
implementations of this specification. If there are any discrepancies
between the lists in Appendix E and the subsections below, the lists in
Appendix E always take precedence.

Some code points listed in one subsection also appear in other
subsections. Each code point is listed only once in the tables in
Appendix E.

5.1 Control characters

Control characters (or characters with control function) cannot be seen
and can cause unpredictable results when displayed.

0000-001F; [CONTROL CHARACTERS]
007F; DELETE
0080-009F; [CONTROL CHARACTERS]
070F; SYRIAC ABBREVIATION MARK
180E; MONGOLIAN VOWEL SEPARATOR
2028; LINE SEPARATOR
2029; PARAGRAPH SEPARATOR
206A-206F; [CONTROL CHARACTERS]
FFF9-FFFC; [CONTROL CHARACTERS]
1D173-1D17A; [MUSICAL CONTROL CHARACTERS]

5.2 Private use and replacement characters

Because private-use characters do not have defined meanings, they are
prohibited. The private-use characters are:

E000-F8FF; [PRIVATE USE, PLANE 0]
F0000-FFFFD; [PRIVATE USE, PLANE 15]
100000-10FFFD; [PRIVATE USE, PLANE 16]

The replacement character (U+FFFD) has no known semantic definition in a
name, and is often displayed by renderers to indicate "there would be
some character here, but it cannot be rendered". For example, on a
computer with no Asian fonts, a name with three ideographs might be
rendered with three replacement characters.

FFFD; REPLACEMENT CHARACTER

5.3 Non-character code points

Non-character code points are code points that have been allocated in
ISO/IEC 10646 but are not characters.
Because they are already assigned, they are guaranteed not to later
change into characters.

FDD0-FDEF; [NONCHARACTER CODE POINTS]
FFFE-FFFF; [NONCHARACTER CODE POINTS]
1FFFE-1FFFF; [NONCHARACTER CODE POINTS]
2FFFE-2FFFF; [NONCHARACTER CODE POINTS]
3FFFE-3FFFF; [NONCHARACTER CODE POINTS]
4FFFE-4FFFF; [NONCHARACTER CODE POINTS]
5FFFE-5FFFF; [NONCHARACTER CODE POINTS]
6FFFE-6FFFF; [NONCHARACTER CODE POINTS]
7FFFE-7FFFF; [NONCHARACTER CODE POINTS]
8FFFE-8FFFF; [NONCHARACTER CODE POINTS]
9FFFE-9FFFF; [NONCHARACTER CODE POINTS]
AFFFE-AFFFF; [NONCHARACTER CODE POINTS]
BFFFE-BFFFF; [NONCHARACTER CODE POINTS]
CFFFE-CFFFF; [NONCHARACTER CODE POINTS]
DFFFE-DFFFF; [NONCHARACTER CODE POINTS]
EFFFE-EFFFF; [NONCHARACTER CODE POINTS]
FFFFE-FFFFF; [NONCHARACTER CODE POINTS]
10FFFE-10FFFF; [NONCHARACTER CODE POINTS]

The non-character code points are listed in the PropList.txt file from
the Unicode database.

5.4 Surrogate codes

The following code points are permanently reserved for use as surrogate
code values in the UTF-16 encoding, will never be assigned to
characters, and are therefore prohibited:

D800-DFFF; [SURROGATE CODES]

5.5 Inappropriate for plain text

The following characters should not appear in regular text.

FFF9; INTERLINEAR ANNOTATION ANCHOR
FFFA; INTERLINEAR ANNOTATION SEPARATOR
FFFB; INTERLINEAR ANNOTATION TERMINATOR
FFFC; OBJECT REPLACEMENT CHARACTER

5.6 Inappropriate for canonical representation

The ideographic description characters allow different sequences of
characters to be rendered the same way, which makes them inappropriate
for host names that must have a single canonical representation.

2FF0-2FFB; [IDEOGRAPHIC DESCRIPTION CHARACTERS]

5.7 Change display properties

The following characters, some of which are deprecated in ISO/IEC 10646,
can cause changes in display or the order in which characters appear
when rendered.

200E; LEFT-TO-RIGHT MARK
200F; RIGHT-TO-LEFT MARK
202A; LEFT-TO-RIGHT EMBEDDING
202B; RIGHT-TO-LEFT EMBEDDING
202C; POP DIRECTIONAL FORMATTING
202D; LEFT-TO-RIGHT OVERRIDE
202E; RIGHT-TO-LEFT OVERRIDE
206A; INHIBIT SYMMETRIC SWAPPING
206B; ACTIVATE SYMMETRIC SWAPPING
206C; INHIBIT ARABIC FORM SHAPING
206D; ACTIVATE ARABIC FORM SHAPING
206E; NATIONAL DIGIT SHAPES
206F; NOMINAL DIGIT SHAPES

5.8 Tagging characters

The following characters are used for tagging text and are invisible.

E0001; LANGUAGE TAG
E0020-E007F; [TAGGING CHARACTERS]

6. Unassigned Code Points in Internationalized Host Names

This profile lists the unassigned code points for Unicode 3.1 in
Appendix F. The list in Appendix F MUST be used by implementations of
this specification. If there are any discrepancies between the list in
Appendix F and the Unicode 3.1 specification, the list in Appendix F
always takes precedence.

7. Security Considerations

ISO/IEC 10646 has many characters that look similar. In many cases,
users of security protocols might do visual matching, such as when
comparing the names of trusted third parties. This profile does nothing
to map similar-looking characters together.

Principal names and passwords are entered by users and used within the
Kerberos protocol. The security of the Internet would be compromised if
a user entering a single internationalized string could be connected to
different servers or denied access based on different interpretations of
internationalized strings.

8. References

[CharModel] Unicode Technical Report #17, Character Encoding Model.

[Glossary] Unicode Glossary.

[ISO10646] ISO/IEC 10646-1:2000. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[STRINGPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Strings ("stringprep")", draft-hoffman-stringprep,
work in progress.

[Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode
Consortium. The Unicode Standard, Version 3.0. Reading, MA,
Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended
by: Unicode Standard Annex #27: Unicode 3.1.

[UAX15] Mark Davis and Martin Duerst. Unicode Standard Annex #15:
Unicode Normalization Forms, Version 3.1.0.

A. Acknowledgements

This draft is based upon the work of the IETF IDN Working Group's
IDN Nameprep design team.

B. IANA Considerations

This is a profile of stringprep. When it becomes an RFC, it
should be registered in the stringprep profile registry.

C. Author Contact Information

Jeffrey Altman
jaltman@columbia.edu
Columbia University
612 West 115th Street
New York NY 10025

D. Mapping Tables

The following is the mapping table from Section 3. The table has three
columns:
- the character that is mapped from
- the zero or more characters that it is mapped to
- the reason for the mapping
The columns are separated by semicolons. Note that the second column may
be empty, or it may have one character, or it may have more than one
character, with each character separated by a space.

----- Start Mapping Table -----
... to be filled in ...
----- End Mapping Table -----

E. Prohibited Code Point List

----- Start Prohibited Table -----
... to be filled in ...
----- End Prohibited Table -----

NOTE WELL: Software that follows this specification that will be used to
check names before they are put in authoritative name servers MUST add
all unassigned code points to the list of characters that are
prohibited. See Section 6 of [STRINGPREP] for more details.

F. Unassigned Code Point List

----- Start Unassigned Table -----
... to be filled in ...
----- End Unassigned Table -----
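
As a non-normative illustration, the three processing steps of this
profile (mapping in Section 3, normalization in Section 4, and the
prohibited-output check in Section 5) might be sketched in Python as
shown below. Several things here are assumptions and not part of this
specification: the names MAPPED_TO_NOTHING, MAPPED_TO_SPACE,
PROHIBITED_RANGES, is_prohibited and prepare are hypothetical; the
tables are transcribed from the descriptive subsections because
Appendices D and E are not yet filled in; and Python's
unicodedata.normalize applies NFKC for the Unicode version bundled with
the interpreter, which is not necessarily Unicode 3.1. The unassigned
code points of Appendix F are not handled.

```python
import unicodedata

# Section 3.1: code points that are deleted (mapped to nothing).
MAPPED_TO_NOTHING = {
    0x00AD, 0x1806, 0x200B, 0xFEFF,          # line-based text helpers
    0x180B, 0x180C, 0x180D, 0x200C, 0x200D,  # variation selectors, joiners
}

# Section 3.2: Unicode spaces that are mapped to U+0020 SPACE.
MAPPED_TO_SPACE = {0x00A0, 0x202F, 0x3000} | set(range(0x2000, 0x200B))

# Sections 5.1-5.8: prohibited output as inclusive (low, high) ranges.
# Ranges repeated across subsections (5.5, part of 5.7) appear once.
PROHIBITED_RANGES = [
    (0x0000, 0x001F), (0x007F, 0x009F), (0x070F, 0x070F),    # 5.1
    (0x180E, 0x180E), (0x2028, 0x2029), (0x206A, 0x206F),    # 5.1
    (0xFFF9, 0xFFFC), (0x1D173, 0x1D17A),                    # 5.1, 5.5
    (0xE000, 0xF8FF), (0xF0000, 0xFFFFD),                    # 5.2
    (0x100000, 0x10FFFD), (0xFFFD, 0xFFFD),                  # 5.2
    (0xFDD0, 0xFDEF),                                        # 5.3
    (0xD800, 0xDFFF),                                        # 5.4
    (0x2FF0, 0x2FFB),                                        # 5.6
    (0x200E, 0x200F), (0x202A, 0x202E),                      # 5.7
    (0xE0001, 0xE0001), (0xE0020, 0xE007F),                  # 5.8
]


def is_prohibited(cp: int) -> bool:
    """Return True if code point cp is prohibited by Section 5."""
    # Section 5.3: the last two code points of every plane (xFFFE, xFFFF).
    if (cp & 0xFFFE) == 0xFFFE:
        return True
    return any(low <= cp <= high for low, high in PROHIBITED_RANGES)


def prepare(text: str) -> str:
    """Map, normalize (NFKC), then reject prohibited output."""
    mapped = []
    for ch in text:
        cp = ord(ch)
        if cp in MAPPED_TO_NOTHING:
            continue  # Section 3.1: drop the character entirely
        mapped.append(" " if cp in MAPPED_TO_SPACE else ch)
    normalized = unicodedata.normalize("NFKC", "".join(mapped))
    for ch in normalized:
        if is_prohibited(ord(ch)):
            raise ValueError("prohibited code point U+%04X" % ord(ch))
    return normalized
```

With these definitions, prepare("ali\u00ADce") yields "alice" because
SOFT HYPHEN is mapped to nothing, while a string that contains U+202E
(RIGHT-TO-LEFT OVERRIDE) is rejected with a ValueError. A conforming
implementation would instead load its tables from Appendices D, E and F
once they are populated.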