Network Working Group                                         J. Reschke
Internet-Draft                                                greenbytes
Obsoletes: 5987 (if approved)                             April 15, 2011
Intended status: Standards Track
Expires: October 17, 2011


                Character Set and Language Encoding for
       Hypertext Transfer Protocol (HTTP) Header Field Parameters
                      draft-reschke-rfc5987bis-00

Abstract

   By default, message header field parameters in Hypertext Transfer
   Protocol (HTTP) messages cannot carry characters outside the
   ISO-8859-1 character set.  RFC 2231 defines an encoding mechanism for
   use in Multipurpose Internet Mail Extensions (MIME) headers.  This
   document specifies an encoding suitable for use in HTTP header fields
   that is compatible with a profile of the encoding defined in
   RFC 2231.

Editorial Note (To be removed by RFC Editor before publication)

   Distribution of this document is unlimited.  Although this is not a
   work item of the HTTPbis Working Group, comments should be sent to
   the Hypertext Transfer Protocol (HTTP) mailing list at
   ietf-http-wg@w3.org [1], which may be joined by sending a message
   with subject "subscribe" to ietf-http-wg-request@w3.org [2].

   Discussions of the HTTPbis Working Group are archived at
   .

   XML versions, latest edits and the issues list for this document are
   available from
   .  A
   collection of test cases is available at
   .

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 17, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  Comparison to RFC 2231 and Definition of the Encoding
     3.1.  Parameter Continuations
     3.2.  Parameter Value Character Set and Language Information
       3.2.1.  Definition
       3.2.2.  Examples
     3.3.  Language Specification in Encoded Words
   4.  Guidelines for Usage in HTTP Header Field Definitions
     4.1.  When to Use the Extension
     4.2.  Error Handling
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Changes from RFC 5987
   Appendix B.  Change Log (to be removed by RFC Editor before
                publication)
     B.1.  Since RFC5987
   Appendix C.  Resolved issues (to be removed by RFC Editor
                before publication)
     C.1.  obs5987
   Appendix D.  Open issues (to be removed by RFC Editor prior to
                publication)
     D.1.  edit
     D.2.  impls
     D.3.  iso-8859-1

1.  Introduction

   By default, message header field parameters in HTTP ([RFC2616])
   messages cannot carry characters outside the ISO-8859-1 character set
   ([ISO-8859-1]).  RFC 2231 ([RFC2231]) defines an encoding mechanism
   for use in MIME headers.  This document specifies an encoding
   suitable for use in HTTP header fields that is compatible with a
   profile of the encoding defined in RFC 2231.

   This document obsoletes [RFC5987]; the changes are summarized in
   Appendix A.

      Note: in the remainder of this document, RFC 2231 is only
      referenced for the purpose of explaining the choice of features
      that were adopted; these references are therefore purely
      informative.

      Note: this encoding does not apply to message payloads transmitted
      over HTTP, such as when using the media type "multipart/form-data"
      ([RFC2388]).

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This specification uses the ABNF (Augmented Backus-Naur Form)
   notation defined in [RFC5234].  The following core rules are included
   by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
   DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
   (linear whitespace).

   Note that this specification uses the term "character set" for
   consistency with other IETF specifications such as RFC 2277 (see
   [RFC2277], Section 3).  A more accurate term would be "character
   encoding" (a mapping of code points to octet sequences).

3.  Comparison to RFC 2231 and Definition of the Encoding

   RFC 2231 defines several extensions to MIME.  The sections below
   discuss whether and how they apply to HTTP header fields.

   In short:

   o  Parameter Continuations aren't needed (Section 3.1),

   o  Character Set and Language Information are useful; therefore, a
      simple subset is specified (Section 3.2), and

   o  Language Specifications in Encoded Words aren't needed
      (Section 3.3).

3.1.  Parameter Continuations

   Section 3 of [RFC2231] defines a mechanism that deals with the length
   limitations that apply to MIME headers.  These limitations do not
   apply to HTTP ([RFC2616], Section 19.4.7).

   Thus, parameter continuations are not part of the encoding defined by
   this specification.

3.2.  Parameter Value Character Set and Language Information

   Section 4 of [RFC2231] specifies how to embed language information
   into parameter values, and also how to encode non-ASCII characters,
   dealing with restrictions both in MIME and HTTP header parameters.

   However, RFC 2231 does not specify a mandatory-to-implement character
   set, making it hard for senders to decide which character set to use.
   Thus, recipients implementing this specification MUST support the
   character sets "ISO-8859-1" [ISO-8859-1] and "UTF-8" [RFC3629].

   Furthermore, RFC 2231 allows the character set information to be left
   out.  The encoding defined by this specification does not allow that.

3.2.1.  Definition

   The syntax for parameters is defined in Section 3.6 of [RFC2616]
   (with RFC 2616 implied LWS translated to RFC 5234 LWSP):

     parameter     = attribute LWSP "=" LWSP value

     attribute     = token
     value         = token / quoted-string

     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     token         = <token, defined in [RFC2616], Section 2.2>

   In order to include character set and language information, this
   specification modifies the RFC 2616 grammar to be:

     parameter     = reg-parameter / ext-parameter

     reg-parameter = parmname LWSP "=" LWSP value

     ext-parameter = parmname "*" LWSP "=" LWSP ext-value

     parmname      = 1*attr-char

     ext-value     = charset  "'" [ language ] "'" value-chars
                   ; like RFC 2231's <extended-initial-value>
                   ; (see [RFC2231], Section 7)

     charset       = "UTF-8" / "ISO-8859-1" / mime-charset

     mime-charset  = 1*mime-charsetc
     mime-charsetc = ALPHA / DIGIT
                   / "!" / "#" / "$" / "%" / "&"
                   / "+" / "-" / "^" / "_" / "`"
                   / "{" / "}" / "~"
                   ; as in Section 2.3 of [RFC2978]
                   ; except that the single quote is not included
                   ; SHOULD be registered in the IANA charset registry

     language      = <Language-Tag, defined in [RFC5646], Section 2.1>

     value-chars   = *( pct-encoded / attr-char )

     pct-encoded   = "%" HEXDIG HEXDIG
                   ; see [RFC3986], Section 2.1

     attr-char     = ALPHA / DIGIT
                   / "!" / "#" / "$" / "&" / "+" / "-" / "."
                   / "^" / "_" / "`" / "|" / "~"
                   ; token except ( "*" / "'" / "%" )

   Thus, a parameter is either a regular parameter (reg-parameter), as
   previously defined in Section 3.6 of [RFC2616], or an extended
   parameter (ext-parameter).
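
   As an illustration only (it is not part of this specification), the
   ext-value production can be checked with a regular expression
   transcribed from the ABNF above.  The following Python sketch splits
   a value into its charset, language, and value-chars parts.  The names
   EXT_VALUE_RE and parse_ext_value are hypothetical, and the language
   pattern is a deliberate simplification of the RFC 5646 Language-Tag
   grammar.

```python
import re

# attr-char: ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "."
#            / "^" / "_" / "`" / "|" / "~"
ATTR_CHAR = r"[A-Za-z0-9!#$&+\-.^_`|~]"

# mime-charsetc: ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "+" / "-"
#                / "^" / "_" / "`" / "{" / "}" / "~"
MIME_CHARSETC = r"[A-Za-z0-9!#$%&+\-^_`{}~]"

# Simplification (assumption): 1-8 character subtags joined by "-",
# instead of the full RFC 5646 Language-Tag grammar.  Optional.
LANGUAGE = r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)?"

# ext-value = charset "'" [ language ] "'" value-chars
# value-chars = *( pct-encoded / attr-char ), pct-encoded = "%" HEXDIG HEXDIG
EXT_VALUE_RE = re.compile(
    "(?P<charset>" + MIME_CHARSETC + "+)"
    "'(?P<language>" + LANGUAGE + ")'"
    "(?P<value_chars>(?:%[0-9A-Fa-f]{2}|" + ATTR_CHAR + ")*)"
)


def parse_ext_value(text):
    """Split an ext-value into (charset, language, value_chars).

    Returns None when the text does not match the (simplified) grammar,
    for example when the character set is missing or a character outside
    attr-char appears unencoded.  No percent-decoding is performed.
    """
    match = EXT_VALUE_RE.fullmatch(text)
    if match is None:
        return None
    return (
        match.group("charset"),
        match.group("language"),
        match.group("value_chars"),
    )
```

   For instance, "UTF-8''%c2%a3" splits into ("UTF-8", "", "%c2%a3"),
   while a value lacking the character set, such as "'en'foo", is
   rejected.  Decoding the octets is a separate step.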

   Extended parameters are those where the left-hand side of the
   assignment ends with an asterisk character.

   The value part of an extended parameter (ext-value) is a token that
   consists of three parts: the REQUIRED character set name (charset),
   the OPTIONAL language information (language), and a character
   sequence representing the actual value (value-chars), separated by
   single quote characters.  Note that both character set names and
   language tags are restricted to the US-ASCII character set, and are
   matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646],
   Section 2.1.1).

   Inside the value part, characters not contained in attr-char are
   encoded into an octet sequence using the specified character set.
   That octet sequence is then percent-encoded as specified in
   Section 2.1 of [RFC3986].

   Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1"
   ([ISO-8859-1]) character set.  Extension character sets
   (mime-charset) are reserved for future use.

      Note: recipients should be prepared to handle encoding errors,
      such as malformed or incomplete percent escape sequences, or
      non-decodable octet sequences, in a robust manner.  This
      specification does not mandate any specific behavior; for
      instance, the following strategies are all acceptable:

      *  ignoring the parameter,

      *  stripping a non-decodable octet sequence,

      *  replacing a non-decodable octet sequence with a replacement
         character, such as the Unicode character U+FFFD (Replacement
         Character).

      Note: the RFC 2616 token production ([RFC2616], Section 2.2)
      differs from the production used in RFC 2231 (imported from
      Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are
      excluded.  Thus, these two characters are excluded from the
      attr-char production as well.
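
   The encoding and decoding steps described above can be sketched in
   Python with the standard library functions urllib.parse.quote and
   urllib.parse.unquote_to_bytes.  The function names are hypothetical.
   The decoder shows only one combination of the acceptable
   error-handling strategies (ignoring a parameter whose character set
   is missing or unsupported, and substituting U+FFFD for undecodable
   octets); it is not a mandated behavior.

```python
from urllib.parse import quote, unquote_to_bytes

# Punctuation allowed by attr-char.  quote() never escapes ASCII letters,
# digits and "_.-~"; adding the rest as "safe" leaves exactly attr-char
# unescaped, so "*", "'", "%", space and every non-ASCII octet become %XX.
ATTR_CHAR_PUNCT = "!#$&+-.^_`|~"

SUPPORTED_CHARSETS = ("utf-8", "iso-8859-1")


def encode_ext_value(text, charset="UTF-8", language=""):
    """Encode text as: charset "'" [ language ] "'" value-chars."""
    # Step 1: characters -> octets in the chosen character set
    # (raises UnicodeEncodeError if a character is not representable).
    octets = text.encode(charset)
    # Step 2: percent-encode every octet that is not an attr-char.
    return charset + "'" + language + "'" + quote(octets, safe=ATTR_CHAR_PUNCT)


def decode_ext_value(ext_value):
    """Decode an ext-value into (language, text), or None to ignore it.

    unquote_to_bytes() passes malformed escapes such as "%G1" through
    literally instead of rejecting them; a stricter recipient could check
    the syntax first.
    """
    charset, sep1, rest = ext_value.partition("'")
    language, sep2, value_chars = rest.partition("'")
    if not charset or not sep1 or not sep2:
        return None  # character set or delimiters missing: ignore
    if charset.lower() not in SUPPORTED_CHARSETS:
        return None  # extension character sets are reserved: ignore
    octets = unquote_to_bytes(value_chars)
    # Undecodable octet sequences become U+FFFD (REPLACEMENT CHARACTER).
    return language, octets.decode(charset, errors="replace")
```

   Note that the sketch emits uppercase hex digits (for example "%C2%A3"
   rather than the "%c2%a3" used in the examples in Section 3.2.2); both
   are valid HEXDIG, and the decoder accepts either.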

      Note: the ABNF defined here differs from the one in
      Section 2.3 of [RFC2978] in that it does not allow the single
      quote character (see also RFC Errata ID 1912 [Err1912]).  In
      practice, no character set names using that character have been
      registered at the time of this writing.

3.2.2.  Examples

   Non-extended notation, using "token":

     foo: bar; title=Economy

   Non-extended notation, using "quoted-string":

     foo: bar; title="US-$ rates"

   Extended notation, using the Unicode character U+00A3 (POUND SIGN):

     foo: bar; title*=iso-8859-1'en'%A3%20rates

   Note: the Unicode pound sign character U+00A3 was encoded into the
   single octet A3 using the ISO-8859-1 character encoding, then
   percent-encoded.  Also note that the space character was encoded as
   %20, because it is not contained in attr-char.

   Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
   and U+20AC (EURO SIGN):

     foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates

   Note: the Unicode pound sign character U+00A3 was encoded into the
   octet sequence C2 A3 using the UTF-8 character encoding, then
   percent-encoded.  Likewise, the Unicode euro sign character U+20AC
   was encoded into the octet sequence E2 82 AC, then percent-encoded.
   Also note that HEXDIG allows both lowercase and uppercase characters,
   so recipients must understand both, and that the language information
   is optional, while the character set is not.

3.3.  Language Specification in Encoded Words

   Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to
   also support language specification in encoded words.  Although the
   HTTP/1.1 specification does refer to RFC 2047 ([RFC2616],
   Section 2.2), it is not clear to which header fields exactly it
   applies, and whether it is implemented in practice (see
   for details).

   Thus, this specification does not include this feature.

4.  Guidelines for Usage in HTTP Header Field Definitions

   Specifications of HTTP header fields that use the extensions defined
   in Section 3.2 ought to state this clearly.  A simple way to achieve
   this is to normatively reference this specification, and to include
   the ext-value production in the ABNF for that header field.

   For instance:

     foo-header  = "foo" LWSP ":" LWSP token ";" LWSP title-param
     title-param = "title" LWSP "=" LWSP value
                 / "title*" LWSP "=" LWSP ext-value
     ext-value   = <see Section 3.2>

      Note: The Parameter Value Continuation feature defined in
      Section 3 of [RFC2231] makes it impossible to have multiple
      instances of extended parameters with identical parmname
      components, as the processing of continuations would become
      ambiguous.  Thus, specifications using this extension are advised
      to disallow this case for compatibility with RFC 2231.

4.1.  When to Use the Extension

   Section 4.2 of [RFC2277] requires that protocol elements containing
   human-readable text are able to carry language information.  Thus,
   the ext-value production ought always to be used when the parameter
   value is textual in nature and its language is known.

   Furthermore, the extension ought also to be used whenever the
   parameter value needs to carry characters not present in the US-ASCII
   ([USASCII]) character set (note that it would be unacceptable to
   define a new parameter that would be restricted to a subset of the
   Unicode character set).

4.2.  Error Handling

   Header field specifications need to define whether multiple instances
   of parameters with identical parmname components are allowed, and how
   they should be processed.  This specification suggests that a
   parameter using the extended syntax takes precedence.  This would
   allow producers to use both formats without breaking recipients that
   do not understand the extended syntax yet.
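
   A recipient implementing the suggested precedence might look like the
   following Python sketch.  It parses a simplified form of the example
   "foo" header field value from Section 4 and prefers a decodable
   "title*" over "title", falling back to the plain parameter otherwise.
   The helper names are illustrative; the token and quoted-string
   patterns only approximate RFC 2616 (any whitespace, including folded
   lines, is treated as LWSP), and duplicates resolve to the first match.

```python
import re
from urllib.parse import unquote_to_bytes

# Approximations of the RFC 2616 token and quoted-string productions.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
PARAM_RE = re.compile(
    r"\s*;\s*(" + TOKEN + r")\s*=\s*(" + TOKEN + "|" + QUOTED_STRING + ")"
)


def parse_params(field_value):
    """Parse 'token *( ";" parameter )' into (token, [(name, value), ...])."""
    match = re.match(r"\s*(" + TOKEN + ")", field_value)
    if match is None:
        return None, []
    params, pos = [], match.end()
    while True:
        param = PARAM_RE.match(field_value, pos)
        if param is None:
            break  # stop at the first thing that is not a parameter
        name, value = param.group(1).lower(), param.group(2)
        if value.startswith('"'):
            # Strip the quotes and unescape quoted-pairs.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params.append((name, value))
        pos = param.end()
    return match.group(1), params


def decode_ext_text(ext_value):
    """Decoded text of a UTF-8 or ISO-8859-1 ext-value, else None."""
    charset, sep1, rest = ext_value.partition("'")
    _language, sep2, value_chars = rest.partition("'")
    if not (sep1 and sep2) or charset.lower() not in ("utf-8", "iso-8859-1"):
        return None
    return unquote_to_bytes(value_chars).decode(charset, errors="replace")


def pick_param(params, name):
    """Prefer a decodable "name*" parameter over the plain "name"."""
    for pname, value in params:
        if pname == name + "*":
            decoded = decode_ext_text(value)
            if decoded is not None:
                return decoded
    for pname, value in params:
        if pname == name:
            return value
    return None
```

   Applied to a value carrying both forms, such as the example that
   follows, the sketch returns the internationalized title; a legacy
   recipient that ignores "title*" would still find the ASCII "title".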

   Example:

     foo: bar; title="EURO exchange rates";
               title*=utf-8''%e2%82%ac%20exchange%20rates

   In this case, the sender provides an ASCII version of the title for
   legacy recipients, but also includes an internationalized version for
   recipients understanding this specification; the latter ought to
   prefer the new syntax over the old one.

      Note: at the time of this writing, many implementations failed to
      ignore the form they did not understand, or prioritized the ASCII
      form even though the extended syntax was present.

5.  Security Considerations

   The format described in this document makes it possible to transport
   non-ASCII characters, and thus enables character "spoofing"
   scenarios, in which a displayed value appears to be something other
   than it is.

   Furthermore, there are known attack scenarios relating to decoding
   UTF-8.

   See Section 10 of [RFC3629] for more information on both topics.

   In addition, the extension specified in this document makes it
   possible to transport multiple language variants for a single
   parameter, and such use might allow spoofing attacks, where different
   language versions of the same parameter are not equivalent.  Whether
   this can be exploited depends on the parameter in question.

6.  Acknowledgements

   Thanks to Martin Duerst and Frank Ellermann for help figuring out
   ABNF details, to Graham Klyne and Alexey Melnikov for general review,
   to Chris Newman for pointing out an RFC 2231 incompatibility, and to
   Benjamin Carlyle and Roar Lauritzsen for implementer's feedback.

7.  References

7.1.  Normative References

   [ISO-8859-1]  International Organization for Standardization,
                 "Information technology -- 8-bit single-byte coded
                 graphic character sets -- Part 1: Latin alphabet
                 No. 1", ISO/IEC 8859-1:1998, 1998.

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]     Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
                 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
                 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2978]     Freed, N. and J. Postel, "IANA Charset Registration
                 Procedures", BCP 19, RFC 2978, October 2000.

   [RFC3629]     Yergeau, F., "UTF-8, a transformation format of
                 ISO 10646", STD 63, RFC 3629, November 2003.

   [RFC3986]     Berners-Lee, T., Fielding, R., and L. Masinter,
                 "Uniform Resource Identifier (URI): Generic Syntax",
                 STD 66, RFC 3986, January 2005.

   [RFC5234]     Crocker, D., Ed. and P. Overell, "Augmented BNF for
                 Syntax Specifications: ABNF", STD 68, RFC 5234,
                 January 2008.

   [RFC5646]     Phillips, A., Ed. and M. Davis, Ed., "Tags for
                 Identifying Languages", BCP 47, RFC 5646,
                 September 2009.

   [USASCII]     American National Standards Institute, "Coded Character
                 Set -- 7-bit American Standard Code for Information
                 Interchange", ANSI X3.4, 1986.

7.2.  Informative References

   [Err1912]     RFC Errata, "Errata ID 1912, RFC 2978",
                 .

   [RFC2045]     Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC2047]     Moore, K., "MIME (Multipurpose Internet Mail
                 Extensions) Part Three: Message Header Extensions for
                 Non-ASCII Text", RFC 2047, November 1996.

   [RFC2231]     Freed, N. and K. Moore, "MIME Parameter Value and
                 Encoded Word Extensions: Character Sets, Languages, and
                 Continuations", RFC 2231, November 1997.

   [RFC2277]     Alvestrand, H., "IETF Policy on Character Sets and
                 Languages", BCP 18, RFC 2277, January 1998.

   [RFC2388]     Masinter, L., "Returning Values from Forms:
                 multipart/form-data", RFC 2388, August 1998.

   [RFC5987]     Reschke, J., "Character Set and Language Encoding for
                 Hypertext Transfer Protocol (HTTP) Header Field
                 Parameters", RFC 5987, August 2010.

URIs

   [1]

   [2]

Appendix A.  Changes from RFC 5987

   This section summarizes the changes compared to [RFC5987]:

   [[anchor8: None yet.]]

Appendix B.  Change Log (to be removed by RFC Editor before publication)

B.1.  Since RFC5987

   Only editorial changes for the purpose of starting the revision
   process (obs5987).

Appendix C.  Resolved issues (to be removed by RFC Editor before
             publication)

   Issues that were either rejected or resolved in this version of this
   document.

C.1.  obs5987

   Type: change

   julian.reschke@greenbytes.de (2011-04-15): Obsolete RFC 5987,
   summarize differences.

Appendix D.  Open issues (to be removed by RFC Editor prior to
             publication)

D.1.  edit

   Type: edit

   julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for
   editorial fixes/enhancements.

D.2.  impls

   Type: change

   julian.reschke@greenbytes.de (2011-04-15): Add implementation report.

D.3.  iso-8859-1

   Type: change

   julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
   support ISO-8859-1?  It doesn't really help, and it is not
   implemented in IE9.

Author's Address

   Julian F. Reschke
   greenbytes GmbH
   Hafenweg 16
   Muenster, NW  48155
   Germany

   EMail: julian.reschke@greenbytes.de
   URI:   http://greenbytes.de/tech/webdav/