idnits 2.17.1

draft-reschke-rfc5987bis-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([2], [1]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  -- The draft header indicates that this document obsoletes RFC5987, but the
     abstract doesn't seem to directly say this.  It does mention RFC5987
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 8, 2011) is 4611 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'USASCII'

  -- Duplicate reference: RFC2978, mentioned in 'Err1912', was also mentioned
     in 'RFC2978'.

  -- Obsolete informational reference (is this intentional?): RFC 2388
     (Obsoleted by RFC 7578)

  -- Obsolete informational reference (is this intentional?): RFC 5987
     (Obsoleted by RFC 8187)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         J. Reschke
Internet-Draft                                                greenbytes
Obsoletes: 5987 (if approved)                          September 8, 2011
Intended status: Standards Track
Expires: March 11, 2012

    Indicating Character Encoding and Language for HTTP Header Field
                               Parameters
                      draft-reschke-rfc5987bis-01

Abstract

   By default, message header field parameters in Hypertext Transfer
   Protocol (HTTP) messages cannot carry characters outside the
   ISO-8859-1 character set.  RFC 2231 defines an encoding mechanism for
   use in Multipurpose Internet Mail Extensions (MIME) headers.  This
   document specifies an encoding suitable for use in HTTP header fields
   that is compatible with a profile of the encoding defined in RFC
   2231.

Editorial Note (To be removed by RFC Editor before publication)

   Distribution of this document is unlimited.  Although this is not a
   work item of the HTTPbis Working Group, comments should be sent to
   the Hypertext Transfer Protocol (HTTP) mailing list at
   ietf-http-wg@w3.org [1], which may be joined by sending a message
   with subject "subscribe" to ietf-http-wg-request@w3.org [2].

   Discussions of the HTTPbis Working Group are archived at
   .

   XML versions, latest edits and the issues list for this document are
   available from
   .  A
   collection of test cases is available at
   .

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 11, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  Comparison to RFC 2231 and Definition of the Encoding
     3.1.  Parameter Continuations
     3.2.  Parameter Value Character Set and Language Information
       3.2.1.  Definition
       3.2.2.  Examples
     3.3.  Language Specification in Encoded Words
   4.  Guidelines for Usage in HTTP Header Field Definitions
     4.1.  When to Use the Extension
     4.2.  Error Handling
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Changes from RFC 5987
   Appendix B.  Change Log (to be removed by RFC Editor before
                publication)
     B.1.  Since RFC5987
     B.2.  Since draft-reschke-rfc5987bis-00
   Appendix C.  Resolved issues (to be removed by RFC Editor before
                publication)
     C.1.  iso-8859-1
     C.2.  title
     C.3.  historic5987
   Appendix D.  Open issues (to be removed by RFC Editor prior to
                publication)
     D.1.  edit
     D.2.  impls

1.  Introduction

   By default, message header field parameters in HTTP ([RFC2616])
   messages cannot carry characters outside the ISO-8859-1 character set
   ([ISO-8859-1]).  RFC 2231 ([RFC2231]) defines an encoding mechanism
   for use in MIME headers.  This document specifies an encoding
   suitable for use in HTTP header fields that is compatible with a
   profile of the encoding defined in RFC 2231.

   This document obsoletes [RFC5987] and moves it to "historic" status;
   the changes are summarized in Appendix A.
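
   As a rough, non-normative illustration of the encoding that
   Section 3.2.1 defines, the following sketch builds and parses an
   extended parameter value.  It assumes Python's standard urllib.parse
   module; the names encode_ext_value, decode_ext_value, and
   ATTR_CHAR_PUNCT are hypothetical and not part of this specification.
   The language argument is assumed to be a valid language tag already,
   and substituting U+FFFD on decoding errors is only one of several
   acceptable recipient strategies.

```python
from typing import Tuple
from urllib.parse import quote, unquote_to_bytes

# Punctuation that the attr-char production (Section 3.2.1) allows unescaped.
# quote() never escapes ASCII letters, digits, or "_", ".", "-", "~".
ATTR_CHAR_PUNCT = "!#$&+-.^_`|~"


def encode_ext_value(text: str, language: str = "") -> str:
    """Build an ext-value (charset, optional language, value-chars)."""
    # Characters outside attr-char are UTF-8 encoded, then percent-encoded.
    value_chars = quote(text, safe=ATTR_CHAR_PUNCT, encoding="utf-8")
    return "UTF-8'" + language + "'" + value_chars


def decode_ext_value(ext_value: str) -> Tuple[str, str]:
    """Split an ext-value and return (language, decoded text)."""
    # A value with fewer than two single quotes raises ValueError here.
    charset, language, value_chars = ext_value.split("'", 2)
    if charset.lower() != "utf-8":  # charset names match case-insensitively
        raise ValueError("unsupported charset: " + charset)
    # Assumed strategy: replace non-decodable octets with U+FFFD.
    text = unquote_to_bytes(value_chars).decode("utf-8", errors="replace")
    return language, text


# U+00A3 is the pound sign used in the examples of Section 3.2.2.
print(encode_ext_value("\u00a3 rates", "en"))  # UTF-8'en'%C2%A3%20rates
```

   The sketch always emits "UTF-8" as the charset, in line with the
   requirement on producers, and accepts either case of hexadecimal
   digits when decoding.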

      Note: in the remainder of this document, RFC 2231 is only
      referenced for the purpose of explaining the choice of features
      that were adopted; these references are therefore purely
      informative.

      Note: this encoding does not apply to message payloads transmitted
      over HTTP, such as when using the media type "multipart/form-data"
      ([RFC2388]).

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This specification uses the ABNF (Augmented Backus-Naur Form)
   notation defined in [RFC5234].  The following core rules are included
   by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
   DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
   (linear whitespace).

   Note that this specification uses the term "character set" for
   consistency with other IETF specifications such as RFC 2277 (see
   [RFC2277], Section 3).  A more accurate term would be "character
   encoding" (a mapping of code points to octet sequences).

3.  Comparison to RFC 2231 and Definition of the Encoding

   RFC 2231 defines several extensions to MIME.  The sections below
   discuss if and how they apply to HTTP header fields.

   In short:

   o  Parameter Continuations aren't needed (Section 3.1),

   o  Character Set and Language Information are useful, therefore a
      simple subset is specified (Section 3.2), and

   o  Language Specifications in Encoded Words aren't needed
      (Section 3.3).

3.1.  Parameter Continuations

   Section 3 of [RFC2231] defines a mechanism that deals with the length
   limitations that apply to MIME headers.  These limitations do not
   apply to HTTP ([RFC2616], Section 19.4.7).

   Thus, parameter continuations are not part of the encoding defined by
   this specification.

3.2.  Parameter Value Character Set and Language Information

   Section 4 of [RFC2231] specifies how to embed language information
   into parameter values, and also how to encode non-ASCII characters,
   dealing with restrictions both in MIME and HTTP header parameters.

   However, RFC 2231 does not specify a mandatory-to-implement character
   set, making it hard for senders to decide which character set to use.
   Thus, recipients implementing this specification MUST support the
   "UTF-8" character set [RFC3629].

   Furthermore, RFC 2231 allows the character set information to be left
   out.  The encoding defined by this specification does not allow that.

3.2.1.  Definition

   The syntax for parameters is defined in Section 3.6 of [RFC2616]
   (with RFC 2616 implied LWS translated to RFC 5234 LWSP):

     parameter     = attribute LWSP "=" LWSP value

     attribute     = token
     value         = token / quoted-string

     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     token         = <token, defined in [RFC2616], Section 2.2>

   In order to include character set and language information, this
   specification modifies the RFC 2616 grammar to be:

     parameter     = reg-parameter / ext-parameter

     reg-parameter = parmname LWSP "=" LWSP value

     ext-parameter = parmname "*" LWSP "=" LWSP ext-value

     parmname      = 1*attr-char

     ext-value     = charset  "'" [ language ] "'" value-chars
                   ; like RFC 2231's <extended-initial-value>
                   ; (see [RFC2231], Section 7)

     charset       = "UTF-8" / mime-charset

     mime-charset  = 1*mime-charsetc
     mime-charsetc = ALPHA / DIGIT
                   / "!" / "#" / "$" / "%" / "&"
                   / "+" / "-" / "^" / "_" / "`"
                   / "{" / "}" / "~"
                   ; as in Section 2.3 of [RFC2978]
                   ; except that the single quote is not included
                   ; SHOULD be registered in the IANA charset registry

     language      = <Language-Tag, defined in [RFC5646], Section 2.1>

     value-chars   = *( pct-encoded / attr-char )

     pct-encoded   = "%" HEXDIG HEXDIG
                   ; see [RFC3986], Section 2.1

     attr-char     = ALPHA / DIGIT
                   / "!" / "#" / "$" / "&" / "+" / "-" / "."
                   / "^" / "_" / "`" / "|" / "~"
                   ; token except ( "*" / "'" / "%" )

   Thus, a parameter is either a regular parameter (reg-parameter), as
   previously defined in Section 3.6 of [RFC2616], or an extended
   parameter (ext-parameter).

   Extended parameters are those where the left-hand side of the
   assignment ends with an asterisk character.

   The value part of an extended parameter (ext-value) is a token that
   consists of three parts: the REQUIRED character set name (charset),
   the OPTIONAL language information (language), and a character
   sequence representing the actual value (value-chars), separated by
   single quote characters.  Note that both character set names and
   language tags are restricted to the US-ASCII character set, and are
   matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646],
   Section 2.1.1).

   Inside the value part, characters not contained in attr-char are
   encoded into an octet sequence using the specified character set.
   That octet sequence is then percent-encoded as specified in Section
   2.1 of [RFC3986].

   Producers MUST use the "UTF-8" ([RFC3629]) character set.  Extension
   character sets (mime-charset) are reserved for future use.

      Note: recipients should be prepared to handle encoding errors,
      such as malformed or incomplete percent escape sequences, or non-
      decodable octet sequences, in a robust manner.  This specification
      does not mandate any specific behavior; for instance, the
      following strategies are all acceptable:

      *  ignoring the parameter,

      *  stripping a non-decodable octet sequence,

      *  substituting a non-decodable octet sequence by a replacement
         character, such as the Unicode character U+FFFD (Replacement
         Character).

      Note: the RFC 2616 token production ([RFC2616], Section 2.2)
      differs from the production used in RFC 2231 (imported from
      Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are
      excluded.
      Thus, these two characters are excluded from the attr-char
      production as well.

      Note: the mime-charset ABNF defined here differs from the one in
      Section 2.3 of [RFC2978] in that it does not allow the single
      quote character (see also RFC Errata ID 1912 [Err1912]).  In
      practice, no character set names using that character have been
      registered at the time of this writing.

      Note: [RFC5987] did require support for ISO-8859-1, too; for
      compatibility with legacy code, recipients are encouraged to
      support this encoding as well.

3.2.2.  Examples

   Non-extended notation, using "token":

     foo: bar; title=Economy

   Non-extended notation, using "quoted-string":

     foo: bar; title="US-$ rates"

   Extended notation, using the Unicode character U+00A3 (POUND SIGN):

     foo: bar; title*=utf-8'en'%C2%A3%20rates

   Note: the Unicode pound sign character U+00A3 was encoded into the
   octet sequence C2 A3 using the UTF-8 character encoding, then
   percent-encoded.  Also, note that the space character was encoded as
   %20, as it is not contained in attr-char.

   Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
   and U+20AC (EURO SIGN):

     foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates

   Note: the Unicode pound sign character U+00A3 was encoded into the
   octet sequence C2 A3 using the UTF-8 character encoding, then
   percent-encoded.  Likewise, the Unicode euro sign character U+20AC
   was encoded into the octet sequence E2 82 AC, then percent-encoded.
   Also note that HEXDIG allows both lowercase and uppercase characters,
   so recipients must understand both, and that the language information
   is optional, while the character set is not.

3.3.  Language Specification in Encoded Words

   Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to
   also support language specification in encoded words.
   Although the HTTP/1.1 specification does refer to RFC 2047
   ([RFC2616], Section 2.2), it is not clear exactly which header fields
   it applies to, or whether it is implemented in practice (see
   for details).

   Thus, this specification does not include this feature.

4.  Guidelines for Usage in HTTP Header Field Definitions

   Specifications of HTTP header fields that use the extensions defined
   in Section 3.2 ought to clearly state that.  A simple way to achieve
   this is to normatively reference this specification, and to include
   the ext-value production into the ABNF for that header field.

   For instance:

     foo-header  = "foo" LWSP ":" LWSP token ";" LWSP title-param
     title-param = "title" LWSP "=" LWSP value
                 / "title*" LWSP "=" LWSP ext-value
     ext-value   =

      Note: The Parameter Value Continuation feature defined in Section
      3 of [RFC2231] makes it impossible to have multiple instances of
      extended parameters with identical parmname components, as the
      processing of continuations would become ambiguous.  Thus,
      specifications using this extension are advised to disallow this
      case for compatibility with RFC 2231.

4.1.  When to Use the Extension

   Section 4.2 of [RFC2277] requires that protocol elements containing
   human-readable text are able to carry language information.  Thus,
   the ext-value production ought always to be used when the parameter
   value is of textual nature and its language is known.

   Furthermore, the extension ought to also be used whenever the
   parameter value needs to carry characters not present in the US-ASCII
   ([USASCII]) character set (note that it would be unacceptable to
   define a new parameter that would be restricted to a subset of the
   Unicode character set).

4.2.  Error Handling

   Header field specifications need to define whether multiple instances
   of parameters with identical parmname components are allowed, and how
   they should be processed.  This specification suggests that a
   parameter using the extended syntax takes precedence.  This would
   allow producers to use both formats without breaking recipients that
   do not understand the extended syntax yet.

   Example:

     foo: bar; title="EURO exchange rates";
               title*=utf-8''%e2%82%ac%20exchange%20rates

   In this case, the sender provides an ASCII version of the title for
   legacy recipients, but also includes an internationalized version for
   recipients understanding this specification -- the latter obviously
   ought to prefer the new syntax over the old one.

      Note: at the time of this writing, many implementations failed to
      ignore the form they did not understand, or prioritized the ASCII
      form although the extended syntax was present.

5.  Security Considerations

   The format described in this document makes it possible to transport
   non-ASCII characters, and thus enables character "spoofing"
   scenarios, in which a displayed value appears to be something other
   than it is.

   Furthermore, there are known attack scenarios relating to decoding
   UTF-8.

   See Section 10 of [RFC3629] for more information on both topics.

   In addition, the extension specified in this document makes it
   possible to transport multiple language variants for a single
   parameter, and such use might allow spoofing attacks, where different
   language versions of the same parameter are not equivalent.  Whether
   this is useful as an attack depends on the parameter specified.

6.  Acknowledgements

   Thanks to Martin Duerst and Frank Ellermann for help figuring out
   ABNF details, to Graham Klyne and Alexey Melnikov for general review,
   to Chris Newman for pointing out an RFC 2231 incompatibility, and to
   Benjamin Carlyle, Roar Lauritzsen, and Eric Lawrence for
   implementer's feedback.

7.  References

7.1.  Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]     Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
                 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
                 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2978]     Freed, N. and J. Postel, "IANA Charset Registration
                 Procedures", BCP 19, RFC 2978, October 2000.

   [RFC3629]     Yergeau, F., "UTF-8, a transformation format of ISO
                 10646", STD 63, RFC 3629, November 2003.

   [RFC3986]     Berners-Lee, T., Fielding, R., and L. Masinter,
                 "Uniform Resource Identifier (URI): Generic Syntax",
                 STD 66, RFC 3986, January 2005.

   [RFC5234]     Crocker, D., Ed. and P. Overell, "Augmented BNF for
                 Syntax Specifications: ABNF", STD 68, RFC 5234,
                 January 2008.

   [RFC5646]     Phillips, A., Ed. and M. Davis, Ed., "Tags for
                 Identifying Languages", BCP 47, RFC 5646,
                 September 2009.

   [USASCII]     American National Standards Institute, "Coded Character
                 Set -- 7-bit American Standard Code for Information
                 Interchange", ANSI X3.4, 1986.

7.2.  Informative References

   [Err1912]     RFC Errata, "Errata ID 1912, RFC 2978",
                 .

   [ISO-8859-1]  International Organization for Standardization,
                 "Information technology -- 8-bit single-byte coded
                 graphic character sets -- Part 1: Latin alphabet No.
                 1", ISO/IEC 8859-1:1998, 1998.

   [RFC2045]     Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC2047]     Moore, K., "MIME (Multipurpose Internet Mail
                 Extensions) Part Three: Message Header Extensions for
                 Non-ASCII Text", RFC 2047, November 1996.

   [RFC2231]     Freed, N. and K. Moore, "MIME Parameter Value and
                 Encoded Word Extensions: Character Sets, Languages, and
                 Continuations", RFC 2231, November 1997.

   [RFC2277]     Alvestrand, H., "IETF Policy on Character Sets and
                 Languages", BCP 18, RFC 2277, January 1998.

   [RFC2388]     Masinter, L., "Returning Values from Forms: multipart/
                 form-data", RFC 2388, August 1998.

   [RFC5987]     Reschke, J., "Character Set and Language Encoding for
                 Hypertext Transfer Protocol (HTTP) Header Field
                 Parameters", RFC 5987, August 2010.

URIs

   [1]

   [2]

Appendix A.  Changes from RFC 5987

   This section summarizes the changes compared to [RFC5987]:

   o  The document title was changed to "Indicating Character Encoding
      and Language for HTTP Header Field Parameters".

   o  The requirement to support the "ISO-8859-1" encoding was removed.

Appendix B.  Change Log (to be removed by RFC Editor before publication)

B.1.  Since RFC5987

   Only editorial changes for the purpose of starting the revision
   process (obs5987).

B.2.  Since draft-reschke-rfc5987bis-00

   Resolved issues "iso-8859-1" and "title" (title simplified).  Added
   and resolved issue "historic5987".

Appendix C.  Resolved issues (to be removed by RFC Editor before
             publication)

   Issues that were either rejected or resolved in this version of this
   document.

C.1.  iso-8859-1

   Type: change

   julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
   support ISO-8859-1?  It doesn't really help, and it is not
   implemented in IE9.

   Resolution (2011-09-07): Removed requirement; adjusted examples;
   explain that RFC 5987 required this so recipients may want to support
   it anyway.

C.2.  title

   Type: edit

   duerst@it.aoyama.ac.jp (2011-04-17): Proposed title: "Indicating
   Character Encoding and Language for HTTP Header Field Parameters"

   Resolution (2011-09-07): Done.

C.3.  historic5987

   In Section 1:

   Type: change

   julian.reschke@greenbytes.de (2011-09-08): Point out that RFC 5987
   should be moved to "historic".

   Resolution (2011-09-08): Done.

Appendix D.  Open issues (to be removed by RFC Editor prior to
             publication)

D.1.  edit

   Type: edit

   julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for
   editorial fixes/enhancements.

D.2.  impls

   Type: change

   julian.reschke@greenbytes.de (2011-04-15): Add implementation report.

Author's Address

   Julian F. Reschke
   greenbytes GmbH
   Hafenweg 16
   Muenster, NW  48155
   Germany

   EMail: julian.reschke@greenbytes.de
   URI:   http://greenbytes.de/tech/webdav/