idnits 2.17.1 draft-iab-rfc-nonascii-00.txt: -(171): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(192): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(249): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(257): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(341): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(380): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(381): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(382): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(383): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 27 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. -- The document has examples using IPv4 documentation addresses according to RFC6890, but does not use any IPv6 documentation addresses. Maybe there should be IPv6 examples, too? 
** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 220: '...me or code point MUST be included in t...' -- The abstract seems to indicate that this document updates RFC7322, but the header doesn't have an 'Updates:' line to match this. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (January 12, 2016) is 3026 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'GOST3410' is mentioned on line 373, but not defined == Unused Reference: 'RFC3550' is defined on line 509, but no explicit reference was found in the text == Unused Reference: 'RFC6949' is defined on line 523, but no explicit reference was found in the text == Outdated reference: A later version (-04) exists of draft-iab-xml2rfc-01 ** Obsolete normative reference: RFC 7564 (Obsoleted by RFC 8264) ** Obsolete normative reference: RFC 7613 (Obsoleted by RFC 8265) Summary: 4 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Architecture Board H. Flanagan, Ed. 3 Internet-Draft RFC Editor 4 Intended status: Informational January 12, 2016 5 Expires: July 15, 2016 7 The Use of Non-ASCII Characters in RFCs 8 draft-iab-rfc-nonascii-00 10 Abstract 12 In order to support the internationalization of protocols and a more 13 diverse Internet community, the RFC Series must evolve to allow for 14 the use of non-ASCII characters in RFCs. 
While English remains the 15 required language of the Series, the encoding of future RFCs will be 16 in UTF-8, allowing for a broader range of characters than typically 17 used in the English language. This document describes the RFC Editor 18 requirements and guidance regarding the use of non-ASCII characters 19 in RFCs. 21 This document updates RFC 7322. Please review the PDF version of 22 this draft. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on July 15, 2016. 41 Copyright Notice 43 Copyright (c) 2016 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 2. Basic requirements . . . . . . . . . . . . . . . . . . . . . 3 60 3. 
Rules for the use of non-ASCII characters . . . . . . . . . . 3 61 3.1. General usage throughout a document . . . . . . . . . . . 4 62 3.2. Authors, Contributors, and Acknowledgments . . . . . . . 4 63 3.3. Company Names . . . . . . . . . . . . . . . . . . . . . . 5 64 3.4. Body of the document . . . . . . . . . . . . . . . . . . 5 65 3.5. Tables . . . . . . . . . . . . . . . . . . . . . . . . . 7 66 3.6. Code components . . . . . . . . . . . . . . . . . . . . . 8 67 3.7. Bibliographic text . . . . . . . . . . . . . . . . . . . 8 68 3.8. Keywords and Citation Tags . . . . . . . . . . . . . . . 9 69 3.9. Address Information . . . . . . . . . . . . . . . . . . . 9 70 4. Normalization Forms . . . . . . . . . . . . . . . . . . . . . 10 71 5. XML Markup . . . . . . . . . . . . . . . . . . . . . . . . . 10 72 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 73 7. Internationalization Considerations . . . . . . . . . . . . . 11 74 8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 75 9. Change log - to be removed by the RFC Editor . . . . . . . . 11 76 9.1. draft-flanagan-nonascii to draft-iab-rfc-nonascii-00 . . 11 77 9.2. -04 to -05 . . . . . . . . . . . . . . . . . . . . . . . 11 78 9.3. -04 to -05 . . . . . . . . . . . . . . . . . . . . . . . 11 79 9.4. -02 to -04 . . . . . . . . . . . . . . . . . . . . . . . 11 80 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 81 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 13 82 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13 84 1. Introduction 86 Please review the PDF version of this draft. 88 For much of the history of the RFC Series, the character encoding 89 used for RFCs has been ASCII [ANSI.X3-4.1986]. 
This was a sensible 90 choice at the time: the language of the Series has always been 91 English, a language that primarily uses ASCII-encoded characters 92 (ignoring for a moment words borrowed from more richly decorated 93 alphabets); and ASCII is the "lowest common denominator" for 94 character encoding, making cross-platform viewing trivial. 96 There are limits to ASCII, however, that hinder its continued use as 97 the exclusive character encoding for the Series. The increasing need 98 for easily readable, internationalized content suggests it is time to 99 allow non-ASCII characters in RFCs where necessary. To support this 100 move away from ASCII, RFCs will switch to UTF-8 as the 101 default character encoding and allow a broad range of 102 Unicode characters [UnicodeCurrent]. Note that the RFC 103 Editor may reject any code point that does not render adequately in 104 enough formats or in enough rendering engines using the current 105 tooling. 107 Given the continuing goal of maximum readability across platforms, 108 the use of non-ASCII characters in a document should be limited to 109 where they are necessary within the text. This document describes the 110 rules under which non-ASCII characters may be used in an RFC. These 111 rules will be applied as the necessary changes are made to submission 112 checking and editorial tools. 114 This document updates the RFC Style Guide [RFC7322]. 116 The details described in this document are expected to change based 117 on experience gained in implementing the RFC production center's 118 toolset. Revised documents will be published capturing those changes 119 as the toolset is completed. Other implementers must not expect 120 those changes to remain backwards-compatible with the details 121 described in this document. 123 2. Basic requirements 125 Two fundamental requirements inform the guidance and examples 126 provided in this document. 
They are: 128 o Searches against RFC indexes and database tables need to return 129 expected results and support appropriate Unicode string matching 130 behaviors; 132 o RFCs must be able to display correctly across a wide range of 133 readers and browsers. People whose system does not have the fonts 134 needed to display a particular RFC need to be able to read the 135 various publication formats and the XML correctly in order to 136 understand and implement the information described in the 137 document. 139 3. Rules for the use of non-ASCII characters 141 This section describes the guidelines for the use of non-ASCII 142 characters in the header, body, and reference sections of an RFC. If 143 the RFC Editor identifies areas where the use of non-ASCII characters 144 negatively impacts the readability of the text, they will request 145 alternate text. 147 The RFC Editor may, in cases of entire words represented in non-ASCII 148 characters, ask for a set of reviewers to verify the meaning, 149 spelling, characters, and grammar of the text. 151 3.1. General usage throughout a document 153 Where the use of non-ASCII characters is purely as part of an example 154 and not otherwise required for correct protocol operation, escaping 155 the non-ASCII character is not required. Note, however, that as the 156 language of the RFC Series is English, the use of non-ASCII 157 characters is based on the spelling of words commonly used in the 158 English language following the guidance in the Merriam-Webster 159 dictionary [MerrWeb]. 161 The RFC Editor will use the primary spelling listed in that 162 dictionary by default. 164 Example of non-ASCII characters that do not require escaping 165 [RFC4475]: 167 This particular response contains unreserved and non-ascii 168 UTF-8 characters. 169 This response is well formed. A parser must accept this message. 
170 Message Details : unreason 171 SIP/2.0 200 = 2**3 * 5**2 но сто девяносто девять - простое 172 Via: SIP/2.0/UDP 192.0.2.198;branch=z9hG4bK1324923 173 Call-ID: unreason.1234ksdfak3j2erwedfsASdf 174 CSeq: 35 INVITE 175 From: sip:user@example.com;tag=11141343 176 To: sip:user@example.edu;tag=2229 Content-Length: 154 177 Content-Type: application/sdp 179 3.2. Authors, Contributors, and Acknowledgments 181 Person names may appear in several places within an RFC. In all 182 cases, valid Unicode is required. For names that include characters 183 outside of the Unicode Latin and Latin Extended script, an author- 184 provided, ASCII-only identifier is required to assist in search and 185 indexing of the document. 187 Example for the header: 189 Internet Engineering Task Force (IETF) J. Tong 190 Request for Comments: 7380 C. Bi, Ed. 191 Category: Standards Track China Telecom 192 ISSN: 2070-1721 רוני אבן (R. Even) 193 吴钦 (Q. Wu), Ed. 194 R. Huang 195 Huawei 196 November 2014 198 Example for the Acknowledgements: 200 OLD: The following people contributed significant text to early 201 versions of this draft: Patrik Faltstrom, William Chan, and Fred 202 Baker. 204 PROPOSED/NEW: The following people contributed significant text to 205 early versions of this draft: Patrik Fältström, 陈智昌 206 (William Chan), and Fred Baker. 208 3.3. Company Names 210 Company names may appear in several places within an RFC. In all 211 cases, valid Unicode is required. For names that include characters 212 outside of the Unicode Latin and Latin Extended script, an author- 213 provided, ASCII-only identifier is required to assist in search and 214 indexing of the document. 216 3.4. Body of the document 218 When the mention of non-ASCII characters is required for correct 219 protocol operation and understanding, the characters' Unicode 220 character name or code point MUST be included in the text. 222 o Non-ASCII characters will require identifying the Unicode code 223 point. 
225 o Use of the actual UTF-8 character (e.g., Δ) is encouraged so 226 that a reader can more easily see what the character is, if their 227 device can render the text. 229 o The use of the Unicode character names like "INCREMENT" in 230 addition to the use of Unicode code points is also encouraged. 231 When used, Unicode character names should be in all capital 232 letters. 234 Examples: 236 OLD [RFC7564]: 238 However, the problem is made more serious by introducing the full 239 range of Unicode code points into protocol strings. For example, 240 the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from 241 the Cherokee block look similar to the ASCII characters "STPETER" as 242 they might appear when presented using a "creative" font family. 244 NEW/ALLOWED: 246 However, the problem is made more serious by introducing the full 247 range of Unicode code points into protocol strings. For example, 248 the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 249 (ᏚᎢᎵᎬᎢᎬᏒ) from the Cherokee block look similar to the ASCII 250 characters "STPETER" as they might appear when presented using a 251 "creative" font family. 253 ALSO ACCEPTABLE: 255 However, the problem is made more serious by introducing the full 256 range of Unicode code points into protocol strings. For example, 257 the characters "ᏚᎢᎵᎬᎢᎬᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2 258 U+13AC U+13D2) from the Cherokee block look similar to the ASCII 259 characters "STPETER" as they might appear when presented using a 260 "creative" font family. 262 Example of proper identification of Unicode characters in an RFC: 264 Acceptable: 266 o Temperature changes in the Temperature Control Protocol are 267 indicated by the U+2206 character. 269 Preferred: 271 1. Temperature changes in the Temperature Control Protocol are 272 indicated by the U+2206 character ("Δ"). 274 2. Temperature changes in the Temperature Control Protocol are 275 indicated by the U+2206 character (INCREMENT). 277 3. 
Temperature changes in the Temperature Control Protocol are 278 indicated by the U+2206 character ("Δ", INCREMENT). 280 4. Temperature changes in the Temperature Control Protocol are 281 indicated by the U+2206 character (INCREMENT, "Δ"). 283 5. Temperature changes in the Temperature Control Protocol are 284 indicated by the "Delta" character "Δ" (U+2206). 286 6. Temperature changes in the Temperature Control Protocol are 287 indicated by the character "Δ" (INCREMENT, U+2206). 289 Which option of (1), (2), (3), (4), (5), or (6) is preferred may 290 depend on context and the specific character(s) in question. All are 291 acceptable within an RFC. BCP 137, "ASCII Escaping of Unicode 292 Characters", describes the pros and cons of different options for 293 identifying Unicode characters in an ASCII document [RFC5137]. 295 3.5. Tables 297 Tables follow the same rules for identifiers and characters as in 298 "Section 3.4. Body of the document". If it is sensible (i.e., more 299 understandable for a reader) for a given document to have two tables 300 -- one including the identifiers and non-ASCII characters and a 301 second with just the non-ASCII characters -- that will be allowed on 302 a case-by-case basis. 304 Original text from "Preparation, Enforcement, and Comparison of 305 Internationalized Strings Representing Usernames and Passwords" 306 [RFC7613]. 
308 Table 3: A sample of legal passwords 310 +------------------------------------+------------------------------+ 311 | # | Password | Notes | 312 +------------------------------------+------------------------------+ 313 | 12| | ASCII space is allowed | 314 +------------------------------------+------------------------------+ 315 | 13| | Different from example 12 | 316 +------------------------------------+------------------------------+ 317 | 14| <πßå> | Non-ASCII letters are OK | 318 | | | (e.g., GREEK SMALL LETTER | 319 | | | PI, U+03C0) | 320 +------------------------------------+------------------------------+ 321 | 15| | Symbols are OK (e.g., BLACK | 322 | | | DIAMOND SUIT, U+2666) | 323 +------------------------------------+------------------------------+ 324 | 16| | OGHAM SPACE MARK, U+1680, is | 325 | | | mapped to U+0020 and thus | 326 | | | the full string is mapped to | 327 | | | | 328 +------------------------------------+------------------------------+ 330 Preferred text: 332 Table 3: A sample of legal passwords 334 +------------------------------------+------------------------------+ 335 | # | Password | Notes | 336 +------------------------------------+------------------------------+ 337 | 12| | ASCII space is allowed | 338 +------------------------------------+------------------------------+ 339 | 13| | Different from example 12 | 340 +------------------------------------+------------------------------+ 341 | 14| <πß๗> | Non-ASCII letters are OK | 342 | | | (e.g., GREEK SMALL LETTER | 343 | | | PI, U+03C0; LATIN SMALL | 344 | | | LETTER SHARP S, U+00DF; THAI | 345 | | | DIGIT SEVEN, U+0E57) | 346 +------------------------------------+------------------------------+ 347 | 15| | Symbols are OK (e.g., BLACK | 348 | | | DIAMOND SUIT, U+2666) | 349 +------------------------------------+------------------------------+ 350 | 16| | OGHAM SPACE MARK, U+1680, is | 351 | | | mapped to U+0020 and thus | 352 | | | the full string is mapped to | 353 | | | | 354 
+------------------------------------+------------------------------+ 356 3.6. Code components 358 The RFC Editor encourages the use of the U+ notation except within a 359 code component where you must follow the rules of the programming 360 language in which you are writing the code. 362 3.7. Bibliographic text 364 The reference entry must be in English; whatever subfields are 365 present must be available in ASCII-encoded characters. As long as 366 good sense is used, the reference entry may also include non-ASCII 367 characters at the author's discretion and as provided by the author. 368 The RFC Editor may request a review of the non-ASCII reference entry. 369 This applies to both normative and informative references. 371 Example: 373 [GOST3410] "Information technology. Cryptographic data security. 374 Signature and verification processes of [electronic] 375 digital signature.", GOST R 34.10-2001, Gosudarstvennyi 376 Standard of Russian Federation, Government Committee of 377 Russia for Standards, 2001. (In Russian) 379 Allowable addition to the above citation: 380 "Информационная технология. Криптографическая защита 381 информации. Процессы формирования и проверки 382 электронной цифровой подписи", GOST R 34.10-2001, 383 Государственный стандарт Российской Федерации, 2001. 385 3.8. Keywords and Citation Tags 387 Keywords and citation tags must be ASCII only. 389 3.9. Address Information 391 The purpose of providing address information, either postal or 392 e-mail, is to assist readers of an RFC to contact the author or 393 authors. Authors may include the official postal address as 394 recognized by their company or local postal service without 395 additional non-ASCII character escapes. If the email address 396 includes non-ASCII characters and is a valid email address at the 397 time of publication, non-ASCII character escapes are not required. 
399 Example: 401 Qin Wu (editor) 402 Huawei 403 101 Software Avenue, Yuhua District 404 Nanjing, Jiangsu 210012 405 China 407 Alternate contact information: 408 吴钦 (editor) 409 华为技术有限公司 410 雨花区软件大道101号 411 江苏南京 210012 412 中国 414 ------ 416 Roni Even 417 Huawei 418 14 David Hamelech 419 Tel Aviv 64953 420 Israel 422 Alternate contact information: 423 רוני אבן 424 וואווי 425 דוד המלך 14 426 תל אביב 64953 427 ישראל 429 4. Normalization Forms 431 Authors should not expect normalization forms to be preserved. If a 432 particular normalization form is expected, note that in the text of 433 the RFC. 435 5. XML Markup 437 As described above, use of non-ASCII characters in areas such as 438 email, company name, addresses, and name is allowed. In order to 439 make it easier for code to identify the appropriate ASCII 440 alternatives, authors must include an "ascii" attribute in their XML 441 markup when an ASCII alternative is required. See [I-D.iab-xml2rfc] 442 for more detail on how to tag ASCII alternatives. 444 6. IANA Considerations 446 This document makes no request of IANA. 448 Note to RFC Editor: this section may be removed on publication as an 449 RFC. 451 7. Internationalization Considerations 453 The ability to use non-ASCII characters in RFCs in a clear and 454 consistent manner will improve the ability to describe 455 internationalized protocols and will recognize the diversity of 456 authors. However, the goal of readability will override the use of 457 non-ASCII characters within the text. 459 8. Security Considerations 461 Valid Unicode that matches the expected text must be verified in 462 order to preserve expected behavior and protocol information. 464 9. Change log - to be removed by the RFC Editor 466 9.1. draft-flanagan-nonascii to draft-iab-rfc-nonascii-00 468 Changed the requirement that all non-ASCII names (including company names) 469 have an ASCII equivalent so that it applies only to names with non-Latin 470 characters. 
Extended Latin is also acceptable without an ASCII 471 equivalent. 473 9.2. -04 to -05 475 Keywords: expanded section to include citation tags. 477 Internationalization considerations: reiterated that the use of non- 478 ASCII characters is not automatically guaranteed. 480 9.3. -04 to -05 482 Introduction: added statement that the document is subject to change. 484 Tables: added example. 486 Code: removed placeholder for example. 488 9.4. -02 to -04 490 Introduction and Abstract: changed to be clearer about what non- 491 ASCII characters are being allowed and why. 493 XML Markup: section added. 495 10. References 497 [ANSI.X3-4.1986] 498 American National Standards Institute, "Coded Character 499 Set - 7-bit American Standard Code for Information 500 Interchange", ANSI X3.4, 1986. 502 [I-D.iab-xml2rfc] 503 Hoffman, P., "The 'XML2RFC' version 3 Vocabulary", draft- 504 iab-xml2rfc-01 (work in progress), January 2016. 506 [MerrWeb] Merriam-Webster, Inc., "Merriam-Webster's Collegiate 507 Dictionary, 11th Edition", 2009. 509 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 510 Jacobson, "RTP: A Transport Protocol for Real-Time 511 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 512 July 2003. 514 [RFC4475] Sparks, R., Ed., Hawrylyshen, A., Johnston, A., Rosenberg, 515 J., and H. Schulzrinne, "Session Initiation Protocol (SIP) 516 Torture Test Messages", RFC 4475, DOI 10.17487/RFC4475, 517 May 2006. 519 [RFC5137] Klensin, J., "ASCII Escaping of Unicode Characters", 520 BCP 137, RFC 5137, DOI 10.17487/RFC5137, February 2008. 523 [RFC6949] Flanagan, H. and N. Brownlee, "RFC Series Format 524 Requirements and Future Development", RFC 6949, 525 DOI 10.17487/RFC6949, May 2013. 528 [RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322, 529 DOI 10.17487/RFC7322, September 2014. 532 [RFC7564] Saint-Andre, P. and M. 
Blanchet, "PRECIS Framework: 533 Preparation, Enforcement, and Comparison of 534 Internationalized Strings in Application Protocols", 535 RFC 7564, DOI 10.17487/RFC7564, May 2015. 538 [RFC7613] Saint-Andre, P. and A. Melnikov, "Preparation, 539 Enforcement, and Comparison of Internationalized Strings 540 Representing Usernames and Passwords", RFC 7613, 541 DOI 10.17487/RFC7613, August 2015. 544 [UnicodeCurrent] 545 The Unicode Consortium, "The Unicode Standard", 546 2014-present. 548 Appendix A. Acknowledgements 550 With many thanks to the members of the IAB i18n program and the RFC 551 Format Design Team. 553 Author's Address 555 Heather Flanagan (editor) 556 RFC Editor 558 Email: rse@rfc-editor.org 559 URI: http://orcid.org/0000-0002-2647-2220
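
Illustrative sketch (not part of the draft): Section 3.4 asks that a non-ASCII character needed for protocol understanding be identified by its Unicode code point, and encourages also showing the character itself and its all-capitals Unicode name. The Python below is one possible way an author might generate such identifications. The function names code_points and identify are hypothetical and are not RFC Editor tooling; the names returned depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def code_points(text: str) -> str:
    """Return the U+ notation for each character in text, space separated."""
    # At least four uppercase hex digits, e.g. U+03C0; longer for larger values.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def identify(ch: str) -> str:
    """Identify one character as: code point ("character", NAME)."""
    # unicodedata.name() returns the Unicode character name in capitals and
    # raises ValueError for characters that have no name.
    name = unicodedata.name(ch)
    return f'{code_points(ch)} ("{ch}", {name})'


if __name__ == "__main__":
    # The character cited in the Temperature Control Protocol examples.
    print(identify("\u2206"))
    # The first three code points of the Cherokee example string.
    print(code_points("\u13da\u13a2\u13b5"))
```

The output of identify follows the layout of option (3) in Section 3.4; the other options are reorderings of the same three pieces. Inside a code component such as this one, the programming language's own escape syntax (here "\u2206") is used instead of the U+ notation, as Section 3.6 requires.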