idnits 2.17.1

draft-klensin-unicode-escapes-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 613.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 624.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 631.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 637.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 17, 2007) is 5997 days in the past. Is this
     intentional?
  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Working Group J. Klensin
3 Internet-Draft November 17, 2007
4 Intended status: Best Current
5 Practice
6 Expires: May 20, 2008

8 ASCII Escaping of Unicode Characters
9 draft-klensin-unicode-escapes-07.txt

11 Status of this Memo

13 By submitting this Internet-Draft, each author represents that any
14 applicable patent or other IPR claims of which he or she is aware
15 have been or will be disclosed, and any of which he or she becomes
16 aware will be disclosed, in accordance with Section 6 of BCP 79.

18 Internet-Drafts are working documents of the Internet Engineering
19 Task Force (IETF), its areas, and its working groups. Note that
20 other groups may also distribute working documents as Internet-
21 Drafts.

23 Internet-Drafts are draft documents valid for a maximum of six months
24 and may be updated, replaced, or obsoleted by other documents at any
25 time. It is inappropriate to use Internet-Drafts as reference
26 material or to cite them other than as "work in progress."

28 The list of current Internet-Drafts can be accessed at
29 http://www.ietf.org/ietf/1id-abstracts.txt.

31 The list of Internet-Draft Shadow Directories can be accessed at
32 http://www.ietf.org/shadow.html.

34 This Internet-Draft will expire on May 20, 2008.
36 Copyright Notice

38 Copyright (C) The IETF Trust (2007).

40 Abstract

42 There are a number of circumstances in which an escape mechanism is
43 needed in conjunction with a protocol to encode characters that
44 cannot be represented or transmitted directly. With ASCII coding the
45 traditional escape has been either the decimal or hexadecimal numeric
46 value of the character, written in a variety of different ways. The
47 move to Unicode, where characters occupy two or more octets and may
48 be coded in several different forms, has further complicated the
49 question of escapes. This document discusses some options now in use
50 and discusses considerations for selecting one for use in new IETF
51 protocols and protocols that are now being internationalized.

53 Table of Contents

55 1. Introduction
56 1.1. Context and Background
57 1.2. Terminology
58 1.3. Discussion List
59 2. Encodings that Represent Unicode Code Points: Code
60 Position versus UTF-8 or UTF-16 Octets
61 3. Referring to Unicode Characters
62 4. Syntax for Code Point Escapes
63 5. Recommended Presentation Variants for Unicode Code Point
64 Escapes
65 5.1. Backslash-U with Delimiters
66 5.2. XML and HTML
67 6. Forms that are Normally Not Recommended
68 6.1. The C Programming Language: Backslash-U
69 6.2. Perl: A Hexadecimal String
70 6.3. Java: Escaped UTF-16
71 7. IANA Considerations
72 8. Security Considerations
73 9. Acknowledgments
74 10. Change log
75 10.1. Changes in -01
76 10.2. Major Changes in -02
77 10.3. Major Changes in -03
78 10.4. Major Changes in -04
79 10.5. Changes in -05
80 10.6. Changes in -06
81 10.7. Changes in -07
82 Appendix A. Formal Syntax for Forms Not Recommended
83 Appendix A.1. The C Programming Language Form
84 Appendix A.2. Perl Form
85 Appendix A.3. Java Form
86 11. References
87 11.1. Normative References
88 11.2. Informative References
89 Author's Address
90 Intellectual Property and Copyright Statements

92 1. Introduction

94 1.1. Context and Background

96 There are a number of circumstances in which an escape mechanism is
97 needed in conjunction with a protocol to encode characters that
98 cannot be represented or transmitted directly. With ASCII [ASCII]
99 coding the traditional escape has been either the decimal or
100 hexadecimal numeric value of the character, written in a variety of
101 different ways. For example, in different contexts, we have seen
102 %dNN or %NN for the decimal form, %NN, %xNN, X'nn', and %X'NN' for
103 the hexadecimal form. "%NN" has become popular in recent years to
104 represent a hexadecimal value without further qualification, perhaps
105 as a consequence of its use in URLs and their prevalence. There are
106 even some applications around in which octal forms are used and,
107 while they do not generalize well, the MIME Quoted-Printable and
108 Encoded-word forms can be thought of as yet another set of escapes.
109 So, even for the fairly simple cases of ASCII and standards built by
110 extending ASCII, such as the ISO 8859 family, we have been living
111 with several different escaping forms, each the result of some
112 history.

114 When one moves to Unicode [Unicode] [ISO10646], where characters
115 occupy two or more octets and may be coded in several different
116 forms, the question of escapes becomes even more complicated.
117 Unicode represents characters as code points: numeric values from 0
118 to hex 10FFFF. When referencing code points in flowing text, they
119 are represented using the so-called "U+" notation, as values from
120 U+0000 to U+10FFFF. When serialized into octets, these code points
121 can be represented in different forms:

123 o in UTF-8 with one to four octets [RFC3629]

125 o in UTF-16 with two or four octets (or one or two seizets - 16-bit
126 units)

128 o in UTF-32 with exactly four octets (or one 32-bit unit)

130 When escaping characters, we have seen fairly extensive use of
131 hexadecimal representations of both the serialized forms and
132 variations on the U+ notation, known as code point escapes.

134 In accordance with existing best-practices recommendations [RFC2277],
135 new protocols that are required to carry textual content for human
136 use SHOULD be designed in such a way that the full repertoire of
137 Unicode characters may be represented in that text.
139 This document proposes that existing protocols being
140 internationalized, and that need an escape mechanism, SHOULD use some
141 contextually-appropriate variation on references to code points as
142 described in Section 2 unless other considerations outweigh those
143 described here.

145 This recommendation is not applicable to protocols that already
146 accept native UTF-8 or some other encoding of Unicode. In general,
147 when protocols are internationalized, it is preferable to accept
148 those forms rather than using escapes. This recommendation applies
149 to cases, including transition arrangements, in which that is not
150 practical.

152 In addition to the protocol contexts addressed in this specification,
153 escapes to represent Unicode characters also appear in presentations
154 to users, i.e., in user interfaces (UI). The formats specified in,
155 and the reasoning of, this document may be applicable in UI contexts
156 as well, but this is not a proposal to standardize UI or presentation
157 forms.

159 This document does not make general recommendations for processing
160 Unicode strings or for their contents. It assumes that the strings
161 that one might want to escape are valid and reasonable and that the
162 definition of "valid and reasonable" is the province of other
163 documents. Recommendations about general treatment of Unicode
164 strings may be found in many places, including the Unicode Standard
165 itself and the W3C Character Model [W3C-CharMod] as well as specific
166 rules in individual protocols.

168 1.2. Terminology

170 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
171 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
172 document are to be interpreted as described in [RFC2119].

174 Additional Unicode-specific terminology appears in [UnicodeGlossary],
175 but is not necessary for understanding this specification.

177 1.3.
Discussion List

179 Discussion of this document should be addressed to the
180 discuss@apps.ietf.org mailing list.

182 2. Encodings that Represent Unicode Code Points: Code Position versus
183 UTF-8 or UTF-16 Octets

185 There are two major families of ways to escape Unicode characters.
186 One uses the code point in some representation (see the next
187 section), the other encodes the octets of the UTF-8 encoding or some
188 other encoding in some representation. Some other options are
189 possible, but they have been rare in practice. This specification
190 recommends that, in the absence of compelling reasons to do
191 otherwise, the Unicode code points SHOULD be used rather than a
192 representation of UTF-8 (or UTF-16) octets. There are several
193 reasons for this, including:

195 o One reason for the success of many IETF protocols is that they use
196 human-interpretable text forms to communicate, rather than
197 encodings that generally require computer programs (or hand
198 simulation of algorithms) to decode. This suggests that the
199 presentation form should reference the Unicode tables for
200 characters and do so as simply as possible.

202 o Because of the nature of UTF-8, for a human to interpret a decimal
203 or hexadecimal numeral representation of UTF-8 octets requires one
204 or more decoding steps to determine a Unicode code point that can
205 be used to look up the character in a table. That may be appropriate
206 in some cases where the goal is really to represent the UTF-8 form
207 but, in general, it just obscures desired information and makes
208 errors more likely and debugging harder.

210 o Except for characters in the ASCII subset of Unicode (U+0000
211 through U+007F), the code point form is generally more compact
212 than forms based on coding UTF-8 octets, sometimes much more
213 compact.
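
As an illustration of the comparison in the list above, the following Python sketch escapes one character both ways. The function names are hypothetical and are not part of this specification. The code point form borrows the delimited backslash-U syntax of Section 5.1, and the octet form uses the "%NN" style mentioned in Section 1.1 as one possible way of writing UTF-8 octets; both choices are assumptions made only for the example.

```python
def code_point_escape(ch: str) -> str:
    """Escape one character by its code point, using the \\u'NNNN' form."""
    return "\\u'{:04X}'".format(ord(ch))


def utf8_octet_escape(ch: str) -> str:
    """Escape one character as one %NN group per octet of its UTF-8 form."""
    return "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))


def decode_utf8_octet_escape(escaped: str) -> int:
    """Recover the code point from a %NN octet escape.

    This is the extra decoding step a human reader would have to perform
    by hand before looking the character up in the Unicode tables.
    """
    octets = bytes.fromhex(escaped.replace("%", ""))
    return ord(octets.decode("utf-8"))


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600"):
        by_code_point = code_point_escape(ch)
        by_octets = utf8_octet_escape(ch)
        print(by_code_point, len(by_code_point), by_octets, len(by_octets))
```

For U+20AC the code point form can be read off directly as 20AC, while the octet form E2 82 AC has to be decoded first. For ASCII characters such as "A" the octet form is the shorter of the two, which matches the exception noted in the last bullet.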
215 The same considerations that apply to representation of the octets of
216 UTF-8 encoding also apply to more compact ACE encodings such as the
217 "bootstring" encoding [RFC3492] with or without its "Punycode"
218 profile.

220 Similar considerations apply to UTF-16 encoding, such as the \uNNNN
221 form used in Java (see Section 6.3). While those forms are
222 equivalent to code point references for the Basic Multilingual Plane
223 (BMP, Plane 0), a two-stage decoding process is needed to handle
224 surrogates to access higher planes.

226 3. Referring to Unicode Characters

228 Regardless of what decisions are made about escapes for Unicode
229 characters in protocol or similar contexts, text referring to a
230 Unicode code point SHOULD use the U+NNNN[N[N]] syntax, as specified
231 in the Unicode Standard, where the NNNN... string consists of
232 hexadecimal digits. Text actually containing a Unicode character
233 SHOULD use a syntax more suitable for automated processing.

235 4. Syntax for Code Point Escapes

237 There are many options for code point escapes, some of which are
238 summarized below. All are equivalent in content and semantics -- the
239 differences lie in syntax. The best choice of syntax for a
240 particular protocol or other application depends on that application:
241 one form may simply "fit" better in a given context than others. It
242 is clear, however, that hexadecimal values are preferable to other
243 alternatives: Systems based on decimal or octal offsets SHOULD NOT be
244 used.

246 Since this specification does not recommend one specific syntax,
247 protocol specifications that use escapes MUST define the syntax they
248 are using, including any necessary escapes to permit the escape
249 sequence to be used literally.
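
The two-stage decoding mentioned at the end of Section 2 for UTF-16-based forms, and the U+ notation of Section 3, can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm; the function names are hypothetical, and the surrogate arithmetic is the standard UTF-16 calculation rather than anything defined by this document.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Second decoding stage: turn a UTF-16 surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def u_plus(code_point: int) -> str:
    """Format a code point in U+NNNN[N[N]] notation (four to six hex digits)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return "U+{:04X}".format(code_point)


if __name__ == "__main__":
    # First stage (not shown): parse two \uNNNN escapes into 0xD83D and 0xDE00.
    print(u_plus(combine_surrogates(0xD83D, 0xDE00)))
    # A BMP character needs no second stage.
    print(u_plus(0x00E9))
```

A reader who sees the pair D83D DE00 has to carry out the arithmetic in combine_surrogates before the character can be looked up, whereas a code point escape carries the looked-up value 1F600 directly.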
251 The application designer selecting a format should consider at least
252 the following factors:

254 o If similar or related protocols already use one form, it may be
255 best to select that form for consistency and predictability.

257 o A Unicode code point can fall in the range from U+0000 to
258 U+10FFFF. Different escape systems may use four, five, six, or
259 eight hexadecimal digits. To avoid clever syntax tricks and the
260 consequent risk of confusion and errors, forms that use explicit
261 string delimiters are generally preferred over other alternatives.
262 In many contexts, symmetric paired delimiters are easier to
263 recognize and understand than visually-unrelated ones.

265 o Syntax forms starting in "\u", without explicit delimiters, have
266 been used in several different escape systems, including the four
267 or eight digit syntax of C [ISO-C] (see Section 6.1), the UTF-16
268 encoding of Java [Java] (see Section 6.3), and some arrangements
269 that may follow the "\u" with four, five, or six digits. The
270 possible confusion about which option is actually being used may
271 argue against use of any of these forms.

273 o Forms that require decoding surrogate pairs share most of the
274 problems that appear with encoding of UTF-8 octets. Internet
275 protocols SHOULD NOT use surrogate pairs.

277 5. Recommended Presentation Variants for Unicode Code Point Escapes

279 There are a number of different ways to represent a Unicode code
280 point position. No one of them appears to be "best" for all
281 contexts. In addition, when an escape is needed for the escape
282 mechanism itself, the optimal one of those might differ from one
283 context to another.

285 Some forms that are in popular use and that might reasonably be
286 considered for use in a given protocol are described below and
287 identified with a current-use context when feasible. The two in this
288 section are recommended for use in Internet Protocols.
Other popular
289 ones appear in Section 6 with some discussion of their disadvantages.

291 5.1. Backslash-U with Delimiters

293 One of the recommended forms is a variation of the many forms that
294 start in "\u" (see, e.g., Section 6.1 below), but uses explicit
295 delimiters for the reasons discussed elsewhere.

297 Specifically, in ABNF [RFC4234],

299 EmbeddedUnicodeChar = %x5C.75.27 4*6HEXDIG %x27
300 ; starting with lower case "\u" and "'" and ending with "'".
301 ; Note that the encodings are considered to be abstractions
302 ; for the relevant characters, not designations of specific
303 ; octets.

305 HEXDIG = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
306 "A" / "B" / "C" / "D" / "E" / "F"
307 ; effectively identical with definition in RFC 4234.

309 Protocol designers of applications using this form should specify a
310 way to escape the introducing backslash ("\") if needed. "\\" is one
311 obvious possibility, but not the only one.

313 5.2. XML and HTML

315 The other recommended form is the one used in XML. It uses the form
316 "&#xNNNN;". Like the Perl form (Section 6.2), this form has a clear
317 ending delimiter, reducing ambiguity. HTML uses a similar form, but
318 the semicolon may be omitted in some cases. If that is done, the
319 advantages of the delimiter disappear so the HTML form without
320 the semicolon SHOULD NOT be used. However, this format is often
321 considered ugly and awkward outside of its native HTML, XML, and
322 similar contexts.

324 In ABNF:

326 EmbeddedUnicodeChar = %x26.23.78 2*6HEXDIG %x3B
327 ; starts with "&#x" and ends with ";"

329 Note that a literal "&" can be expressed by "&amp;" when using this
330 style.

332 6. Forms that are Normally Not Recommended

334 6.1.
The C Programming Language: Backslash-U

336 The forms

338 \UNNNNNNNN (for any Unicode character) and

340 \uNNNN (for Unicode characters in plane 0)

342 are utilized in the C Programming Language [ISO-C] when an ASCII
343 escape for embedded Unicode characters is needed.

345 There are disadvantages of this form which may be significant.
346 First, the use of a case variation (between "u" for the four digit
347 form and "U" for the eight digit form) may not seem natural in
348 environments in which upper and lower case characters are generally
349 considered equivalent and might be confusing to people who are not
350 very familiar with Latin-based alphabets (although those people might
351 have even more trouble reading relevant English text and
352 explanations). Second, as discussed in Section 4, the very fact that
353 there are several different conventions that start in \u or \U may
354 become a source of confusion as people make incorrect assumptions
355 about what they are looking at.

357 6.2. Perl: A Hexadecimal String

359 Perl uses the form \x{NNNN...}. The advantage of this form is that
360 there are explicit delimiters, resolving the issue of having
361 variable-length strings or using the case-change mechanism of the
362 C form to distinguish between Plane 0 and more general forms.
363 Some other programming languages would tend to favor X'NNNN...' forms
364 for hexadecimal strings and perhaps U'NNNN...' for Unicode-specific
365 strings, but those forms do not seem to be in use around the IETF.

367 Note that there is a possible ambiguity in how two-character or low-
368 numbered sequences in this notation are understood, i.e., that octets
369 in the range \x{00} through \x{FF} may be construed as being in the
370 local character set, not as Unicode code points.
Because of this
371 apparent ambiguity, and because IETF documents do not contain
372 provision for pragmas (see [PERLUniIntro] for more information about
373 the "encoding" pragma in Perl and other details) the Perl forms
374 should be used with extreme caution if at all.

376 6.3. Java: Escaped UTF-16

378 Java [Java] uses the form \uNNNN, but as a reference to UTF-16
379 values, not Unicode code points. While it uses a syntax similar to
380 that described in Section 6.1, this relationship to UTF-16 makes it,
381 in many respects, more similar to the encodings of UTF-8 discussed
382 above than to an escape that designates Unicode code points. Note
383 that the UTF-16 form, and hence the Java escape notation, can
384 represent characters outside Plane 0 (i.e., above U+FFFF) only by the
385 use of surrogate pairs, raising some of the same issues as the use of
386 UTF-8 octets discussed above. For characters in Plane 0, the Java
387 form is indistinguishable from the Plane 0-only form described in
388 Section 6.1. If only for that reason, it SHOULD NOT be used as an
389 escape except in those Java contexts in which it is natural.

391 7. IANA Considerations

393 This document specifies no actions for IANA.

395 8. Security Considerations

397 This document proposes a set of rules for encoding Unicode characters
398 when other considerations do not apply. Since all of the recommended
399 encodings are unambiguous and normalization issues are not involved,
400 it should not introduce any security issues that are not present as a
401 result of simple use of non-ASCII characters, no matter how they are
402 encoded. The mechanisms suggested should slightly lower the risks of
403 confusing users with encoded characters by making the identity of the
404 characters being used somewhat more obvious than some of the
405 alternatives.

407 An escape mechanism such as the one specified in this document can
408 allow characters to be represented in more than one way.
Where
409 software interprets the escaped form, there is a risk that security
410 checks, and any necessary checks for, e.g., minimal or normalized
411 forms, are done at the wrong point.

413 9. Acknowledgments

415 This document was produced in response to a series of discussions
416 within the IETF Applications Area and as part of work on email
417 internationalization and internationalized domain name updates. It
418 is a synthesis of a large number of discussions, the comments of the
419 participants in which are gratefully acknowledged. The help of Mark
420 Davis in constructing a list of alternative presentations and
421 selecting among them was especially important.

423 Tim Bray, Peter Constable, Stephane Bortzmeyer, Chris Newman, Frank
424 Ellermann, Clive D.W. Feather, Philip Guenther, Bjoern Hoehrmann,
425 Simon Josefsson, Bill McQuillan, der Mouse, Phil Pennock, and Julian
426 Reschke provided careful reading and some corrections and suggestions
427 on the various drafts. Taken together, their suggestions motivated
428 the significant revision of this document and its recommendations
429 between version -00 and version -01 and further improvements in the
430 subsequent versions.

432 10. Change log

434 [[anchor9: RFC Editor: Please remove this section before
435 publication.]]

437 10.1. Changes in -01

439 o Corrected ABNF syntax for Hex-quad and Full-form.

441 10.2. Major Changes in -02

443 This version removes the recommendation of a particular format,
444 discussing several of them and indicating considerations in making a
445 choice.

447 10.3. Major Changes in -03

449 This version improves the ABNF and adds it for more of the escape
450 techniques. It also contains several editorial and contextual
451 changes.

453 10.4. Major Changes in -04

455 o Updated this section to reflect the changes in -02 and -03.
457 o Modified the structure of the document to explicitly recommend the
458 "\u'[N[N]]NNNN'" and XML forms (still trying to make a
459 recommendation, not just a list).

461 o Clarified the description of the Perl form, added a reference, and
462 warned about the ambiguity with single octets.

464 o Some additional editorial changes for clarity.

466 10.5. Changes in -05

468 Moved syntax for the "not recommended" forms to an appendix.

470 10.6. Changes in -06

472 o Added syntax for Java to appendix, per Clive Feather.

474 o Added discussion of escapes for \ in the backslash-U case
475 (Section 5.1) per Frank Ellermann.

477 o Moved the definition of HEXDIG from the appendix to the normative
478 Section 5.1.

480 o Small editorial and layout corrections.

482 10.7. Changes in -07

484 Version 06 was the IETF Last Call version. This version reflects
485 changes made as the result of comments made during and after Last
486 Call.

488 o Changed reference name for the Perl document to conform to
489 conventional Perl usage.

491 o Changed terminology in Section 2 to better align with Unicode
492 Standard terminology.

494 Appendix A. Formal Syntax for Forms Not Recommended

496 While the syntax for the escape forms that are not recommended above
497 (see Section 6) is not given inline in the hope of discouraging
498 their use, it is provided in this appendix in the hope that those
499 who choose to use them will do so consistently. The reader is
500 cautioned that some of these forms are not defined precisely in the
501 original specifications and that others have evolved over time in
502 ways that are not precisely consistent. Consequently, these
503 definitions are not normative and may not even precisely match
504 reasonable interpretations of their sources.

506 The definition of "HEXDIG" for the forms that follow appears in
507 Section 5.1.

509 Appendix A.1.
The C Programming Language Form

511 Specifically, in ABNF [RFC4234],
512 EmbeddedUnicodeChar = BMP-form / Full-form

514 BMP-form = %x5C.75 4HEXDIG ; starting with lower case "\u"
515 ; The encodings are considered to be abstractions for the
516 ; relevant characters, not designations of specific octets.

518 Full-form = %x5C.55 8HEXDIG ; starting with upper case "\U"

520 Appendix A.2. Perl Form

522 EmbeddedUnicodeChar = %x5C.78 "{" 2*6HEXDIG "}" ; starts with "\x"

524 Appendix A.3. Java Form

526 EmbeddedUnicodeChar = %x5C.75 4HEXDIG ; starts with "\u"

528 11. References

530 11.1. Normative References

532 [ISO10646]
533 International Organization for Standardization,
534 "Information Technology - Universal Multiple-Octet Coded
535 Character Set (UCS)", ISO/IEC 10646:2003, December 2003.

537 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
538 Requirement Levels", BCP 14, RFC 2119, March 1997.

540 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
541 10646", STD 63, RFC 3629, November 2003.

543 [RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
544 Specifications: ABNF", RFC 4234, October 2005.

546 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
547 5.0", 2006.

549 (Addison-Wesley, 2006. ISBN 0-321-48091-0).

551 11.2. Informative References

553 [ASCII] American National Standards Institute (formerly United
554 States of America Standards Institute), "USA Code for
555 Information Interchange", ANSI X3.4-1968, 1968.

557 ANSI X3.4-1968 has been replaced by newer versions with
558 slight modifications, but the 1968 version remains
559 definitive for the Internet.

561 [ISO-C] International Organization for Standardization,
562 "Information technology -- Programming languages -- C",
563 ISO/IEC 9899:1999, 1999.

565 [Java] Sun Microsystems, Inc., "Java Language Specification,
566 Third Edition", 2005, .

569 [PERLUniIntro]
570 Hietaniemi, J., "perluniintro", Perl documentation 5.8.8,
571 2002, .
573 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
574 Languages", BCP 18, RFC 2277, January 1998.

576 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
577 for Internationalized Domain Names in Applications
578 (IDNA)", RFC 3492, March 2003.

580 [UnicodeGlossary]
581 The Unicode Consortium, "Glossary of Unicode Terms",
582 June 2007, .

584 [W3C-CharMod]
585 Duerst, M., "Character Model for the World Wide Web 1.0",
586 W3C Recommendation, February 2005,
587 .

589 Author's Address

591 John C Klensin
592 1770 Massachusetts Ave, #322
593 Cambridge, MA 02140
594 USA

596 Phone: +1 617 245 1457
597 Email: john-ietf@jck.com

599 Full Copyright Statement

601 Copyright (C) The IETF Trust (2007).

603 This document is subject to the rights, licenses and restrictions
604 contained in BCP 78, and except as set forth therein, the authors
605 retain all their rights.

607 This document and the information contained herein are provided on an
608 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
609 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
610 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
611 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
612 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
613 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

615 Intellectual Property

617 The IETF takes no position regarding the validity or scope of any
618 Intellectual Property Rights or other rights that might be claimed to
619 pertain to the implementation or use of the technology described in
620 this document or the extent to which any license under such rights
621 might or might not be available; nor does it represent that it has
622 made any independent effort to identify any such rights. Information
623 on the procedures with respect to rights in RFC documents can be
624 found in BCP 78 and BCP 79.
626 Copies of IPR disclosures made to the IETF Secretariat and any
627 assurances of licenses to be made available, or the result of an
628 attempt made to obtain a general license or permission for the use of
629 such proprietary rights by implementers or users of this
630 specification can be obtained from the IETF on-line IPR repository at
631 http://www.ietf.org/ipr.

633 The IETF invites any interested party to bring to its attention any
634 copyrights, patents or patent applications, or other proprietary
635 rights that may cover technology that may be required to implement
636 this standard. Please address the information to the IETF at
637 ietf-ipr@ietf.org.

639 Acknowledgment

641 Funding for the RFC Editor function is provided by the IETF
642 Administrative Support Activity (IASA).