idnits 2.17.1 draft-flanagan-nonascii-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 10 instances of too long lines in the document, the longest one being 273 characters in excess of 72. -- The document has examples using IPv4 documentation addresses according to RFC6890, but does not use any IPv6 documentation addresses. Maybe there should be IPv6 examples, too? ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 216: '...me or code point MUST be included in t...' -- The abstract seems to indicate that this document updates RFC7322, but the header doesn't have an 'Updates:' line to match this. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (November 18, 2015) is 3081 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'GOST3410' is mentioned on line 369, but not defined == Unused Reference: 'RFC3550' is defined on line 469, but no explicit reference was found in the text == Unused Reference: 'RFC6949' is defined on line 483, but no explicit reference was found in the text ** Obsolete normative reference: RFC 7564 (Obsoleted by RFC 8264) ** Obsolete normative reference: RFC 7613 (Obsoleted by RFC 8265) Summary: 5 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force H. Flanagan, Ed. 3 Internet-Draft RFC Editor 4 Intended status: Informational November 18, 2015 5 Expires: May 21, 2016 7 The Use of Non-ASCII Characters in RFCs 8 draft-flanagan-nonascii-06 10 Abstract 12 In order to support the internationalization of protocols and a more 13 diverse Internet community, the RFC Series must evolve to allow for 14 the use of non-ASCII characters in RFCs. While English remains the 15 required language of the Series, the encoding of future RFCs will be 16 in UTF-8, allowing for a broader range of characters than typically 17 used in the English language. This document describes the RFC Editor 18 requirements and guidance regarding the use of non-ASCII characters 19 in RFCs. 21 This document updates RFC 7322. 23 Status of This Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 
33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on May 21, 2016. 40 Copyright Notice 42 Copyright (c) 2015 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 58 2. Basic requirements . . . . . . . . . . . . . . . . . . . . . 3 59 3. Rules for the use of non-ASCII characters . . . . . . . . . . 3 60 3.1. General usage throughout a document . . . . . . . . . . . 4 61 3.2. Authors, Contributors, and Acknowledgments . . . . . . . 4 62 3.3. Company Names . . . . . . . . . . . . . . . . . . . . . . 5 63 3.4. Body of the document . . . . . . . . . . . . . . . . . . 5 64 3.5. Tables . . . . . . . . . . . . . . . . . . . . . . . . . 7 65 3.6. Code components . . . . . . . . . . . . . . . . . . . . . 8 66 3.7. Bibliographic text . . . . . . . . . . . . . . . . . . . 8 67 3.8. Keywords and Citation Tags . . . . . . . . . . . . . . . 9 68 3.9. Address Information . . . . . . . . . . . . . . . . . . . 9 69 4. Normalization Forms . . . . . . . . . . . . . . . . . . . . . 9 70 5. XML Markup . . . . . . . . . . . . . . . . . . . . . . . . . 9 71 6. 
IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 72 7. Internationalization Considerations . . . . . . . . . . . . . 10 73 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 74 9. Change log - to be removed by the RFC Editor . . . . . . . . 10 75 9.1. -04 to -05 . . . . . . . . . . . . . . . . . . . . . . . 10 76 9.2. -04 to -05 . . . . . . . . . . . . . . . . . . . . . . . 10 77 9.3. -02 to -04 . . . . . . . . . . . . . . . . . . . . . . . 10 78 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 79 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 11 80 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12 82 1. Introduction 84 For much of the history of the RFC Series, the character encoding 85 used for RFCs has been ASCII [ANSI.X3-4.1986]. This was a sensible 86 choice at the time: the language of the Series has always been 87 English, a language that primarily uses ASCII-encoded characters 88 (ignoring for a moment words borrowed from more richly decorated 89 alphabets); and ASCII is the "lowest common denominator" for 90 character encoding, making cross-platform viewing trivial. 92 There are limits to ASCII, however, that hinder its continued use as 93 the exclusive character encoding for the Series. The increasing need 94 for easily readable, internationalized content suggests it is time to 95 allow non-ASCII characters in RFCs where necessary. To support this 96 move away from ASCII, RFCs will switch to UTF-8 as the 97 default character encoding and will allow a broad range of 98 Unicode characters [UnicodeCurrent]. Note that the RFC 99 Editor may reject any code point that does not render adequately in 100 enough formats or in enough rendering engines using the current 101 tooling.
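As an illustration only, an author who wants to see which non-ASCII characters a draft contains could list them together with their code points and Unicode character names. The sketch below uses Python's standard unicodedata module; the function name describe_non_ascii and the sample sentence are hypothetical and are not part of any RFC Editor tooling.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, 'U+XXXX', name) for each distinct non-ASCII character.

    Characters are reported in order of first appearance. The name falls
    back to "<unnamed>" when the Unicode database has no name for it.
    """
    seen = []
    for ch in text:
        if ord(ch) > 0x7F and ch not in seen:
            seen.append(ch)
    return [
        (ch, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
        for ch in seen
    ]


# Hypothetical sample text containing U+2206, written as an escape so the
# source of this sketch stays ASCII-only.
sample = "Temperature changes are indicated by \u2206."
for _char, code_point, name in describe_non_ascii(sample):
    # Print only the ASCII-safe parts so this works on any terminal.
    print(code_point, name)
```

A report of this kind does not decide whether a character renders adequately; it only gives the author and the RFC Editor a list of code points to review.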
103 Given the continuing goal of maximum readability across platforms, 104 the use of non-ASCII characters in a document should be limited to 105 where they are necessary within the text. This document describes the 106 rules under which non-ASCII characters may be used in an RFC. These 107 rules will be applied as the necessary changes are made to submission 108 checking and editorial tools. 110 This document updates the RFC Style Guide [RFC7322]. 112 The details described in this document are expected to change based 113 on experience gained in implementing the RFC production center's 114 toolset. Revised documents will be published capturing those changes 115 as the toolset is completed. Other implementers must not expect 116 those changes to remain backwards-compatible with the details 117 described in this document. 119 2. Basic requirements 121 Two fundamental requirements inform the guidance and examples 122 provided in this document. They are: 124 o Searches against RFC indexes and database tables need to return 125 expected results and support appropriate Unicode string matching 126 behaviors; 128 o RFCs must be able to display correctly across a wide range of 129 readers and browsers. People whose systems do not have the fonts 130 needed to display a particular RFC need to be able to read the 131 various publication formats and the XML correctly in order to 132 understand and implement the information described in the 133 document. 135 3. Rules for the use of non-ASCII characters 137 This section describes the guidelines for the use of non-ASCII 138 characters in the header, body, and reference sections of an RFC. If 139 the RFC Editor identifies areas where the use of non-ASCII characters 140 negatively impacts the readability of the text, they will request 141 alternate text.
143 The RFC Editor may, in cases of entire words represented in non-ASCII 144 characters, ask for a set of reviewers to verify the meaning, 145 spelling, characters, and grammar of the text. 147 3.1. General usage throughout a document 149 Where the use of non-ASCII characters is purely as part of an example 150 and not otherwise required for correct protocol operation, escaping 151 the non-ASCII character is not required. Note, however, that as the 152 language of the RFC Series is English, the use of non-ASCII 153 characters is based on the spelling of words commonly used in the 154 English language following the guidance in the Merriam-Webster 155 dictionary [MerrWeb]. 157 The RFC Editor will use the primary spelling listed in that 158 dictionary by default. 160 Example of non-ASCII characters that do not require escaping 161 [RFC4475]: 163 This particular response contains unreserved and non-ascii 164 UTF-8 characters. 165 This response is well formed. A parser must accept this message. 166 Message Details : unreason 167 SIP/2.0 200 = 2**3 * 5**2 но сто девяносто девять - простое 168 Via: SIP/2.0/UDP 192.0.2.198;branch=z9hG4bK1324923 169 Call-ID: unreason.1234ksdfak3j2erwedfsASdf 170 CSeq: 35 INVITE 171 From: sip:user@example.com;tag=11141343 172 To: sip:user@example.edu;tag=2229 Content-Length: 154 173 Content-Type: application/sdp 175 3.2. Authors, Contributors, and Acknowledgments 177 Person names may appear in several places within an RFC. In all 178 cases, valid Unicode is required. For names that include non-ASCII 179 characters, an author-provided, ASCII-only identifier is required to 180 assist in search and indexing of the document. 182 Example for the header: 184 Network Working Group L. Daigle 185 Request for Comments: 2611 Thinking Cat Enterprises 186 BCP: 33 D. van Gulik 187 Category: Best Current Practice ISIS/CEO, JRC Ispra 188 R. Iannella 189 DSTC Pty Ltd 190 P. Faeltstroem (P. 
Faltstrom) 191 Tele2/Swipnet 192 June 1999 194 Example for the Acknowledgements: 196 OLD: The following people contributed significant text to early 197 versions of this draft: Patrik Faltstrom, William Chan, and Fred 198 Baker. 200 PROPOSED/NEW: The following people contributed significant text to 201 early versions of this draft: Patrik Faeltstroem (Patrik Faltstrom), 202 陈智昌 (William Chan), and Fred Baker. 204 3.3. Company Names 206 Company names may appear in several places within an RFC. The rules 207 for company names follow similar guidance to that of person names. 208 Valid Unicode is required. For company names that include non-ASCII 209 characters, an ASCII-only identifier is required to assist in search 210 and indexing of the document. 212 3.4. Body of the document 214 When the mention of non-ASCII characters is required for correct 215 protocol operation and understanding, the characters' Unicode 216 character name or code point MUST be included in the text. 218 o Non-ASCII characters will require identifying the Unicode code 219 point. 221 o Use of the actual UTF-8 character (e.g., Δ) is encouraged so 222 that a reader can more easily see what the character is, if their 223 device can render the text. 225 o The use of the Unicode character names like "INCREMENT" in 226 addition to the use of Unicode code points is also encouraged. 227 When used, Unicode character names should be in all capital 228 letters. 230 Examples: 232 OLD [RFC7564]: 234 However, the problem is made more serious by introducing the full 235 range of Unicode code points into protocol strings. For example, 236 the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from 237 the Cherokee block look similar to the ASCII characters "STPETER" as 238 they might appear when presented using a "creative" font family. 240 NEW/ALLOWED: 242 However, the problem is made more serious by introducing the full 243 range of Unicode code points into protocol strings. 
For example, 244 the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 245 (ᏚᎢᎵᎬᎢᎬᏒ) from the Cherokee block look similar to the ASCII 246 characters "STPETER" as they might appear when presented using a 247 "creative" font family. 249 ALSO ACCEPTABLE: 251 However, the problem is made more serious by introducing the full 252 range of Unicode code points into protocol strings. For example, 253 the characters "ᏚᎢᎵᎬᎢᎬᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2 254 U+13AC U+13D2) from the Cherokee block look similar to the ASCII 255 characters "STPETER" as they might appear when presented using a 256 "creative" font family. 258 Example of proper identification of Unicode characters in an RFC: 260 Acceptable: 262 o Temperature changes in the Temperature Control Protocol are 263 indicated by the U+2206 character. 265 Preferred: 267 1. Temperature changes in the Temperature Control Protocol are 268 indicated by the U+2206 character ("Δ"). 270 2. Temperature changes in the Temperature Control Protocol are 271 indicated by the U+2206 character (INCREMENT). 273 3. Temperature changes in the Temperature Control Protocol are 274 indicated by the U+2206 character ("Δ", INCREMENT). 276 4. Temperature changes in the Temperature Control Protocol are 277 indicated by the U+2206 character (INCREMENT, "Δ"). 279 5. Temperature changes in the Temperature Control Protocol are 280 indicated by the "Delta" character "Δ" (U+2206). 282 6. Temperature changes in the Temperature Control Protocol are 283 indicated by the character "Δ" (INCREMENT, U+2206). 285 Which of options (1) through (6) is preferred may 286 depend on context and the specific character(s) in question. All are 287 acceptable within an RFC. BCP 137, "ASCII Escaping of Unicode 288 Characters" [RFC5137], describes the pros and cons of different options for 289 identifying Unicode characters in an ASCII document. 291 3.5.
Tables 293 Tables follow the same rules for identifiers and characters as in 294 "Section 3.4. Body of the document". If it is sensible (i.e., more 295 understandable for a reader) for a given document to have two tables 296 -- one including the identifiers and non-ASCII characters and a 297 second with just the non-ASCII characters -- that will be allowed on 298 a case-by-case basis. 300 Original text from "Preparation, Enforcement, and Comparison of 301 Internationalized Strings Representing Usernames and Passwords" 302 [RFC7613]. 304 Table 3: A sample of legal passwords 306 +------------------------------------+------------------------------+ 307 | # | Password | Notes | 308 +------------------------------------+------------------------------+ 309 | 12| | ASCII space is allowed | 310 +------------------------------------+------------------------------+ 311 | 13| | Different from example 12 | 312 +------------------------------------+------------------------------+ 313 | 14| <πßå> | Non-ASCII letters are OK | 314 | | | (e.g., GREEK SMALL LETTER | 315 | | | PI, U+03C0) | 316 +------------------------------------+------------------------------+ 317 | 15| | Symbols are OK (e.g., BLACK | 318 | | | DIAMOND SUIT, U+2666) | 319 +------------------------------------+------------------------------+ 320 | 16| | OGHAM SPACE MARK, U+1680, is | 321 | | | mapped to U+0020 and thus | 322 | | | the full string is mapped to | 323 | | | | 324 +------------------------------------+------------------------------+ 326 Preferred text: 328 Table 3: A sample of legal passwords 330 +------------------------------------+------------------------------+ 331 | # | Password | Notes | 332 +------------------------------------+------------------------------+ 333 | 12| | ASCII space is allowed | 334 +------------------------------------+------------------------------+ 335 | 13| | Different from example 12 | 336 +------------------------------------+------------------------------+ 337 | 14| 
<πss๗> | Non-ASCII letters are OK | 338 | | | (e.g., GREEK SMALL LETTER | 339 | | | PI, U+03C0; LATIN SMALL | 340 | | | LETTER SHARP S, U+00DF; THAI | 341 | | | DIGIT SEVEN, U+0E57) | 342 +------------------------------------+------------------------------+ 343 | 15| | Symbols are OK (e.g., BLACK | 344 | | | DIAMOND SUIT, U+2666) | 345 +------------------------------------+------------------------------+ 346 | 16| | OGHAM SPACE MARK, U+1680, is | 347 | | | mapped to U+0020 and thus | 348 | | | the full string is mapped to | 349 | | | | 350 +------------------------------------+------------------------------+ 352 3.6. Code components 354 The RFC Editor encourages the use of the U+ notation except within a 355 code component where you must follow the rules of the programming 356 language in which you are writing the code. 358 3.7. Bibliographic text 360 The reference entry must be in English; whatever subfields are 361 present must be available in ASCII-encoded characters. As long as 362 good sense is used, the reference entry may also include non-ASCII 363 characters at the author's discretion and as provided by the author. 364 The RFC Editor may request a review of the non-ASCII reference entry. 365 This applies to both normative and informative references. 367 Example: 369 [GOST3410] "Information technology. Cryptographic data security. 370 Signature and verification processes of [electronic] 371 digital signature.", GOST R 34.10-2001, Gosudarstvennyi 372 Standard of Russian Federation, Government Committee of 373 Russia for Standards, 2001. (In Russian) 375 Allowable addition to the above citation: 376 "Информационная технология. Криптографическая защита 377 информации. Процессы формирования и проверки 378 электронной цифровой подписи", GOST R 34.10-2001, 379 Государственный стандарт Российской Федерации, 2001. 381 3.8. Keywords and Citation Tags 383 Keywords and citation tags must be ASCII only. 385 3.9. 
Address Information 387 The purpose of providing address information, either postal or 388 email, is to help readers of an RFC contact the author or 389 authors. Authors may include the official postal address as 390 recognized by their company or local postal service without 391 additional non-ASCII character escapes. If the email address 392 includes non-ASCII characters and is a valid email address at the 393 time of publication, non-ASCII character escapes are not required. 395 4. Normalization Forms 397 Authors should not expect normalization forms to be preserved. If a 398 particular normalization form is expected, note that in the text of 399 the RFC. 401 5. XML Markup 403 As described above, use of non-ASCII characters in areas such as 404 email addresses, company names, postal addresses, and person names is allowed. In order to 405 make it easier for code to identify the appropriate ASCII 406 alternatives, authors must include an "ascii" attribute in their XML 407 markup when an ASCII alternative is required. See 408 [I-D.hoffman-xml2rfc] for more detail on how to tag ASCII 409 alternatives. 411 6. IANA Considerations 413 This document makes no request of IANA. 415 Note to RFC Editor: this section may be removed on publication as an 416 RFC. 418 7. Internationalization Considerations 420 The ability to use non-ASCII characters in RFCs in a clear and 421 consistent manner will improve the ability to describe 422 internationalized protocols and will recognize the diversity of 423 authors. However, the goal of readability will override the use of 424 non-ASCII characters within the text. 426 8. Security Considerations 428 Text containing non-ASCII characters must be verified to be valid Unicode that matches the expected text, in 429 order to preserve expected behavior and protocol information. 431 9. Change log - to be removed by the RFC Editor 433 9.1. -04 to -05 435 Keywords: expanded section to include citation tags.
437 Internationalization considerations: reiterated that the use of 438 non-ASCII characters is not automatically guaranteed. 440 9.2. -04 to -05 442 Introduction: added statement regarding document subject to change. 444 Tables: added example. 446 Code: removed placeholder for example. 448 9.3. -02 to -04 450 Introduction and Abstract: change to be clearer about what/why 451 non-ASCII characters are being allowed. 453 XML Markup: section added. 455 10. References 457 [ANSI.X3-4.1986] 458 American National Standards Institute, "Coded Character 459 Set - 7-bit American Standard Code for Information 460 Interchange", ANSI X3.4, 1986. 462 [I-D.hoffman-xml2rfc] 463 Hoffman, P., "The 'XML2RFC' version 3 Vocabulary", 464 draft-hoffman-xml2rfc-23 (work in progress), September 2015. 466 [MerrWeb] Merriam-Webster, Inc., "Merriam-Webster's Collegiate 467 Dictionary, 11th Edition", 2009. 469 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 470 Jacobson, "RTP: A Transport Protocol for Real-Time 471 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 472 July 2003. 474 [RFC4475] Sparks, R., Ed., Hawrylyshen, A., Johnston, A., Rosenberg, 475 J., and H. Schulzrinne, "Session Initiation Protocol (SIP) 476 Torture Test Messages", RFC 4475, DOI 10.17487/RFC4475, 477 May 2006. 479 [RFC5137] Klensin, J., "ASCII Escaping of Unicode Characters", 480 BCP 137, RFC 5137, DOI 10.17487/RFC5137, February 2008. 483 [RFC6949] Flanagan, H. and N. Brownlee, "RFC Series Format 484 Requirements and Future Development", RFC 6949, 485 DOI 10.17487/RFC6949, May 2013. 488 [RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322, 489 DOI 10.17487/RFC7322, September 2014. 492 [RFC7564] Saint-Andre, P. and M. Blanchet, "PRECIS Framework: 493 Preparation, Enforcement, and Comparison of 494 Internationalized Strings in Application Protocols", 495 RFC 7564, DOI 10.17487/RFC7564, May 2015. 498 [RFC7613] Saint-Andre, P. and A.
Melnikov, "Preparation, 499 Enforcement, and Comparison of Internationalized Strings 500 Representing Usernames and Passwords", RFC 7613, 501 DOI 10.17487/RFC7613, August 2015. 504 [UnicodeCurrent] 505 The Unicode Consortium, "The Unicode Standard", 506 2014-present. 508 Appendix A. Acknowledgements 510 With many thanks to the members of the IAB i18n program and the RFC 511 Format Design Team. 513 Author's Address 515 Heather Flanagan (editor) 516 RFC Editor 518 Email: rse@rfc-editor.org 519 URI: http://orcid.org/0000-0002-2647-2220