idnits 2.17.1

draft-hoffman-utf8-rfcs-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 336.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 347.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 354.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 360.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 87: '... language (MUST, SHOULD, and so on). ...'

  -- The draft header indicates that this document updates RFC2223, but the
     abstract doesn't seem to mention this, which it should.
  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

     (Using the creation date from RFC2223, updated by this document, for
     RFC5378 checks: 1997-10-01)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 2, 2008) is 5683 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 2223
     (Obsoleted by RFC 7322)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                         P. Hoffman
3   Internet-Draft                                            VPN Consortium
4   Updates: 2223 (if approved)                                      T. Bray
5   Intended status: Informational                          Sun Microsystems
6   Expires: April 5, 2009                                   October 2, 2008

8                      Using non-ASCII Characters in RFCs
9                        draft-hoffman-utf8-rfcs-03.txt

11  Status of this Memo

13     By submitting this Internet-Draft, each author represents that any
14     applicable patent or other IPR claims of which he or she is aware
15     have been or will be disclosed, and any of which he or she becomes
16     aware will be disclosed, in accordance with Section 6 of BCP 79.

18     Internet-Drafts are working documents of the Internet Engineering
19     Task Force (IETF), its areas, and its working groups. Note that
20     other groups may also distribute working documents as Internet-
21     Drafts.

23     Internet-Drafts are draft documents valid for a maximum of six months
24     and may be updated, replaced, or obsoleted by other documents at any
25     time. It is inappropriate to use Internet-Drafts as reference
26     material or to cite them other than as "work in progress."

28     The list of current Internet-Drafts can be accessed at
29     http://www.ietf.org/ietf/1id-abstracts.txt.

31     The list of Internet-Draft Shadow Directories can be accessed at
32     http://www.ietf.org/shadow.html.

34     This Internet-Draft will expire on April 5, 2009.

36  Copyright Notice

38     Copyright (C) The IETF Trust (2008).

40  Abstract

42     This document specifies a change to the IETF process in which
43     Internet Drafts and RFCs are allowed to contain non-ASCII characters.
44     The proposed change is to change the encoding of Internet Drafts and
45     RFCs to UTF-8.

47  1. Introduction

49     The purpose of this document is to specify a way for the IETF to use
50     non-ASCII characters in Internet Drafts and RFCs.

52     Various guideline documents in the IETF, notably [RFC2223], specify
53     that RFCs must use only the US-ASCII character set. This restriction
54     has historically caused problems, notably:

56     o  Names and addresses of authors of IETF documents are misspelled

58     o  Names and document titles in references are misspelled

60     o  Protocol examples that include non-ASCII characters cannot be
61        included straightforwardly

63     The first two issues cause real problems for people searching for
64     RFCs for particular authors or references that contain non-ASCII
65     characters. For many languages that use Latin characters outside the
66     ASCII range, there are not absolute mappings between those non-ASCII
67     characters and ASCII equivalents. A common example is that "a-with-
68     umlaut" (U+00E4) may be mapped to "a" or to "ae"; many other mapping
69     difficulties exist.
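
As an illustration of the mapping ambiguity described above, the following minimal Python sketch applies two common ASCII-folding conventions to the same input and arrives at two different spellings. The sample word, the function names, and the digraph table are assumptions made for this illustration only; none of them come from the draft.

```python
import unicodedata

# Hypothetical digraph table for illustration; real transliteration
# conventions vary by language and are not defined by this document.
DIGRAPHS = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"}


def strip_marks(text: str) -> str:
    """Fold to ASCII by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_digraphs(text: str) -> str:
    """Fold to ASCII by replacing each umlaut vowel with a two-letter form."""
    return "".join(DIGRAPHS.get(ch, ch) for ch in text)


word = "B\u00e4r"  # contains a-with-umlaut (U+00E4)
print(strip_marks(word))      # Bar
print(expand_digraphs(word))  # Baer
```

Both outputs are plausible ASCII spellings of the same word, so a reader searching an ASCII-only document has to guess which one the author used.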

71     The third issue reduces the effectiveness of IETF specifications.
72     Implementors of protocols which carry textual payloads often
73     experience difficulty in achieving interoperability related to the
74     use of character sets from around the world. Specifications which
75     can provide concrete examples of such protocol scenarios will be of
76     significant benefit to these implementors.

78     Now that UTF-8 [RFC3629] is nearly universally available in text-
79     editing and display systems, the IETF can eliminate these problems by
80     allowing RFCs to use UTF-8.

82     This document uses example characters as specified in [RFC5137]. Had
83     the recommendations from this document already been implemented, this
84     alternate representation would, of course, not be necessary.

86     It is important to note that this document does not use RFC 2119
87     language (MUST, SHOULD, and so on). Instead, it lists practices that
88     the IETF should consider. If the ideas in this document are adopted,
89     the final list of rules for using UTF-8 in Internet Drafts and RFCs
90     would be published by the IAOC. The authors are open to changing
91     this and using 2119-style language if the community prefers it.

93  2. Use of UTF-8 in Internet Drafts and RFCs

95     Upon publication of this document as an RFC, all existing RFCs and
96     Internet Drafts will be considered to be encoded in UTF-8. The RFC
97     Editor needs to change their processes to publish documents that are
98     valid UTF-8.

100    Similarly, upon acceptance of this document by the IETF, the IAOC
101    should direct the IETF Secretariat to have all Internet Drafts
102    encoded in UTF-8. The Secretariat needs to change their processes to
103    publish documents that are valid UTF-8.

105 2.1. Limits On the Locations In Which Non-ASCII Text May Be Used

107    It is suggested that the IETF Secretariat and RFC Editor limit non-
108    ASCII characters to the following:

110    o  Names and addresses of authors, used at the top of RFCs and in
111       Author Contact sections

113    o  Names and document titles used in References sections

115    o  Quotations from non-English languages

117    o  Protocol examples that show non-ASCII characters, for example in
118       Internationalized Domain Names (IDNs), Internationalized Resource
119       Identifiers (IRIs), and internationalized email addresses.

121 2.2. Allowable Character Repertoire

123    UTF-8 is an encoding of the Unicode Character Set and can be used to
124    encode any of its numeric codepoints, from 0 to 0x10FFFF inclusive.
125    Specifications encoded in UTF-8 should not contain the encodings of
126    certain Unicode codepoints. The codepoint ranges given in this
127    section are inclusive:

129    o  The "ASCII control characters" in the ranges U+0000 to U+0008, and
130       U+000B to U+001F. These lack either visual representations,
131       interoperable semantics, or both.

133    o  The Surrogate-block range U+D800 to U+DFFF. These codepoints do
134       not identify characters, but exist to support the UTF-16 encoding.

136    o  The ZERO WIDTH NO-BREAK SPACE U+FEFF and its mirror image U+FFFE.

138    o  The Private-Use-Area ranges, U+E000 to U+F8FF, U+F0000 to U+FFFFD,
139       and U+100000 to U+10FFFD.

141    Specifications encoded in UTF-8 should not contain the encodings of
142    Unicode codepoints which are "Compatibility Characters", that is,
143    those whose properties include a compatibility decomposition. Note
144    that such characters occur rarely and detecting them requires run-
145    time access to the Unicode character database, which may not be
146    practical in some situations.

148 2.3. Normalization

150    Due to the way that Unicode uses combining characters, there are
151    sometimes multiple codepoint sequences that denote what, to a human,
152    is the same character. For example, the character "lowercase-a-with-
153    accent" can be spelled in two ways: as a single character (U+00E1) or
154    as two characters (U+0061 followed by U+0301). This can present
155    problems in searching and rendering.

157    The process of standardizing on one of these possibilities is
158    referred to as "normalization" and several "normalization forms" are
159    defined by the Unicode Consortium. All UTF-8 text appearing in RFCs
160    (but not necessarily Internet Drafts) ought to be normalized using
161    Normalization Form C.

163 2.4. Author and Employer Names

165    Authors can choose how to spell their names and the names of their
166    employers in the various parts of Internet Drafts they are writing.
167    The spelling at the top of the first page of the document needs to
168    match the spelling in the "Authors' Addresses" section near the end
169    of the document, but the latter can have alternate spellings to help
170    those searching documents by name. Postal information listed in the
171    "Authors' Addresses" section can also use non-ASCII.

173    For example, assume that an author whose name is
174    Fltstrm has a preferred all-ASCII spelling of
175    Xiaodong Faltstrom. Two expected allowed methods for spelling his
176    name would be:

178    Network Working Group                                      X. Faltstrom
179    Internet-Draft                                                ExampleCo
180    . . .
181    Author's Address

183       Xiaodong Faltstrom ( Fltstrm)
184       ExampleCo

186       Email: xiaodong.faltstrom@example.com

188    Network Working Group                                        X. Fltstrm
189    Internet-Draft                                                ExampleCo
190    . . .
191    Author's Address

193       Fltstrm (Xiaodong Faltstrom)
194       ExampleCo

196       Email: xiaodong.faltstrom@example.com

198 3. Security Considerations

200    A display program that expects only US-ASCII input may fail when it
201    encounters octets outside the US-ASCII range of values. Such a
202    failure may become a security issue. For example, the program may
203    display incorrect results for the input. More seriously, the program
204    may have an internal error that causes it to fail in a security-
205    compromising fashion. Note that such a program is vulnerable to many
206    attacks other than just showing IETF documents.

208    Someone could insert a UTF-8 host name in an RFC that has visually
209    confusing characters. Another person could copy that host name out
210    of the RFC and have it resolve to an unintended DNS name. This
211    scenario seems quite far-fetched, given that tracking the RFC back to
212    the author is trivial.

214 4. IAOC considerations

216    If this document is adopted by the IETF, it will be up to the IAOC to
217    have the IETF Secretariat and the RFC Editor implement it. The IAOC
218    needs to consider all of the suggested rules in this document, both
219    the positive ones (such as allowing additional characters in some
220    parts of Internet Drafts and RFCs) and the negative ones (such as
221    disallowing particular characters from being used). The IAOC might
222    want to publish proposed instructions to the IETF Secretariat and the
223    RFC Editor and ask for community input on the specific instructions.

225 5. Informative References

227    [RFC2223] Postel, J. and J. Reynolds, "Instructions to RFC Authors",
228              RFC 2223, October 1997.

230    [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
231              10646", STD 63, RFC 3629, November 2003.

233    [RFC5137] Klensin, J., "ASCII Escaping of Unicode Characters",
234              BCP 137, RFC 5137, February 2008.

236 Appendix A. Arguments Against Changing to UTF-8

238    Over more than a decade, the question of changing the encoding of
239    RFCs to UTF-8 has come up repeatedly. Although many people wanted
240    the change, various people had different reasons why they felt it was
241    a bad idea. This appendix is a summary of those arguments and an
242    explanation of why they are no longer as critical as they were long
243    ago.

245 A.1. Difficulty in Displaying

247    Some text display systems only know how to display US-ASCII.
248    Displaying an RFC that uses non-ASCII characters encoded in UTF-8
249    will cause those characters to be unreadable.

251    There are, of course, still such display systems, and there always
252    will be. However, the number is dwindling as more software is
253    improved to display non-ASCII characters and, in particular, to read
254    UTF-8 as an encoding. Of the systems that can only render US-ASCII,
255    only a small subset drop non-ASCII characters: the others show an
256    incorrect character in its place. Thus, the person using such a
257    system can often see that there is a problem, and can possibly choose
258    to get better display software.

260 A.2. Difficulty in Printing

262    Some printers can only print a limited set of characters due to the
263    fact that they are character-oriented, not graphical. Such printers
264    inherently cannot print characters they do not understand. Almost
265    all such printers print the ASCII characters just fine.

267    There are, of course, still such printers, and there always will be.
268    However, the number is dwindling as older printers are replaced with
269    ones that can print graphics so that now-common text features like
270    boldface and italics can be printed.

272 A.3. Insufficient Fonts

274    Almost no display system that can display text that is encoded with
275    UTF-8 can display every character in the Unicode repertoire. Thus,
276    some non-ASCII characters that are included in RFCs will not display
277    properly.

279    Virtually every system that can display Unicode knows how to
280    substitute a replacement character for ones that cannot be displayed.
281    In fact, most such systems have glyphs for rendering unknown
282    characters and different glyphs for rendering known characters for
283    which the system has no font.

285 A.4. Inability to Search for Non-ASCII Characters

287    If authors start using non-ASCII characters in their names and/or
288    addresses, people who know the characters but are unfamiliar with the
289    user interface on their computers may not be able to enter those
290    characters in the search criteria. For example, some people do not
291    know how to enter "u-with-umlaut" in their operating system, even
292    though the operating system allows such input.

294    This is a valid concern, but one that is orthogonal to whether or not
295    RFCs should use these characters. The alternative (never go to
296    UTF-8) simply shifts the problem to forcing the user to guess which
297    ASCII-only spelling to use when searching.

299 Appendix B. Changes from -02 to -03

301    Changed the example name from Frank Hrst to
302    Fltstrm.

304    In 2.1, changed "It is suggested that the RFC Editor limit..." to "It
305    is suggested that the IETF Secretariat and RFC Editor limit..."

307    Made 2.4 match 2.1 by saying that postal addresses can be in UTF-8 as
308    well.

310 Authors' Addresses

312    Paul Hoffman
313    VPN Consortium

315    Email: paul.hoffman@vpnc.org

317    Tim Bray
318    Sun Microsystems

320    Email: tbray@textuality.com

322 Full Copyright Statement

324    Copyright (C) The IETF Trust (2008).

326    This document is subject to the rights, licenses and restrictions
327    contained in BCP 78, and except as set forth therein, the authors
328    retain all their rights.

330    This document and the information contained herein are provided on an
331    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
332    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
333    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
334    OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
335    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
336    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

338 Intellectual Property

340    The IETF takes no position regarding the validity or scope of any
341    Intellectual Property Rights or other rights that might be claimed to
342    pertain to the implementation or use of the technology described in
343    this document or the extent to which any license under such rights
344    might or might not be available; nor does it represent that it has
345    made any independent effort to identify any such rights. Information
346    on the procedures with respect to rights in RFC documents can be
347    found in BCP 78 and BCP 79.

349    Copies of IPR disclosures made to the IETF Secretariat and any
350    assurances of licenses to be made available, or the result of an
351    attempt made to obtain a general license or permission for the use of
352    such proprietary rights by implementers or users of this
353    specification can be obtained from the IETF on-line IPR repository at
354    http://www.ietf.org/ipr.

356    The IETF invites any interested party to bring to its attention any
357    copyrights, patents or patent applications, or other proprietary
358    rights that may cover technology that may be required to implement
359    this standard. Please address the information to the IETF at
360    ietf-ipr@ietf.org.

362 Acknowledgment

364    Funding for the RFC Editor function is provided by the IETF
365    Administrative Support Activity (IASA).
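
The repertoire limits of Section 2.2 and the normalization recommendation of Section 2.3 could be checked mechanically along the following lines. This is a minimal Python sketch and not a tool defined by the draft. The function names and the message wording are hypothetical. The compatibility test assumes that a compatibility decomposition is marked by a leading angle-bracket tag in the decomposition field of the Unicode database shipped with the interpreter, so results may vary with the Unicode version in use.

```python
import unicodedata

# Inclusive codepoint ranges that Section 2.2 says should not appear.
EXCLUDED_RANGES = [
    (0x0000, 0x0008),      # ASCII control characters
    (0x000B, 0x001F),      # ASCII control characters
    (0xD800, 0xDFFF),      # surrogate block
    (0xFEFF, 0xFEFF),      # ZERO WIDTH NO-BREAK SPACE
    (0xFFFE, 0xFFFE),      # mirror image of U+FEFF
    (0xE000, 0xF8FF),      # private use area
    (0xF0000, 0xFFFFD),    # private use area
    (0x100000, 0x10FFFD),  # private use area
]


def is_excluded(ch: str) -> bool:
    """True if the codepoint falls in one of the excluded ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in EXCLUDED_RANGES)


def is_compatibility(ch: str) -> bool:
    """True if the character has a compatibility decomposition.

    Assumption: such decompositions start with a tag like "<compat>".
    """
    return unicodedata.decomposition(ch).startswith("<")


def check_text(text: str) -> list:
    """Return a list of human-readable problems found in the text."""
    problems = []
    for offset, ch in enumerate(text):
        label = "U+%04X" % ord(ch)
        if is_excluded(ch):
            problems.append("offset %d: %s is outside the allowed repertoire"
                            % (offset, label))
        elif is_compatibility(ch):
            problems.append("offset %d: %s is a compatibility character"
                            % (offset, label))
    if unicodedata.normalize("NFC", text) != text:
        problems.append("text is not in Normalization Form C")
    return problems


# U+0061 followed by U+0301 is the two-character spelling from Section 2.3.
for problem in check_text("a\u0301 and \ufb01"):
    print(problem)
```

The ranges are taken as written in Section 2.2, so the sketch also flags U+000C and U+000D. A production checker would need a policy decision on line endings and page breaks, which this sketch does not attempt to make.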