idnits 2.17.1

draft-reschke-rfc2231-in-http-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices. See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([2], [1]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 19, 2009) is 5427 days in the past. Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC4646' is defined on line 358, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-8859-1'

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646)

     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                         J. Reschke
3   Internet-Draft                                                greenbytes
4   Intended status: Standards Track                            May 19, 2009
5   Expires: November 20, 2009

7                     Application of RFC 2231 Encoding to
8                  Hypertext Transfer Protocol (HTTP) Headers
9                       draft-reschke-rfc2231-in-http-02

11  Status of this Memo

13     This Internet-Draft is submitted to IETF in full conformance with the
14     provisions of BCP 78 and BCP 79.

16     Internet-Drafts are working documents of the Internet Engineering
17     Task Force (IETF), its areas, and its working groups. Note that
18     other groups may also distribute working documents as
19     Internet-Drafts.

21     Internet-Drafts are draft documents valid for a maximum of six months
22     and may be updated, replaced, or obsoleted by other documents at any
23     time. It is inappropriate to use Internet-Drafts as reference
24     material or to cite them other than as "work in progress."

26     The list of current Internet-Drafts can be accessed at
27     http://www.ietf.org/ietf/1id-abstracts.txt.

29     The list of Internet-Draft Shadow Directories can be accessed at
30     http://www.ietf.org/shadow.html.

32     This Internet-Draft will expire on November 20, 2009.

34  Copyright Notice

36     Copyright (c) 2009 IETF Trust and the persons identified as the
37     document authors. All rights reserved.

39     This document is subject to BCP 78 and the IETF Trust's Legal
40     Provisions Relating to IETF Documents in effect on the date of
41     publication of this document (http://trustee.ietf.org/license-info).
42     Please review these documents carefully, as they describe your rights
43     and restrictions with respect to this document.

45  Abstract

47     By default, message header parameters in Hypertext Transfer Protocol
48     (HTTP) messages can not carry characters outside the ISO-8859-1
49     character set. RFC 2231 defines an escaping mechanism for use in
50     Multipurpose Internet Mail Extensions (MIME) headers. This document
51     specifies a profile of that encoding suitable for use in HTTP.

53  Editorial Note (To be removed by RFC Editor before publication)

55     There are multiple HTTP headers that already use RFC 2231 encoding in
56     practice (Content-Disposition) or might use it in the future (Link).
57     The purpose of this document is to provide a single place where the
58     generic aspects of RFC 2231 encoding in HTTP headers are defined.

60     Distribution of this document is unlimited. Although this is not a
61     work item of the HTTPbis Working Group, comments should be sent to
62     the Hypertext Transfer Protocol (HTTP) mailing list at
63     ietf-http-wg@w3.org [1], which may be joined by sending a message
64     with subject "subscribe" to ietf-http-wg-request@w3.org [2].

66     Discussions of the HTTPbis Working Group are archived at
67     .

69     XML versions, latest edits and the issues list for this document are
70     available from
71     . A
72     collection of test cases is available at
73     .

75  Table of Contents

77     1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
78     2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 4
79     3. A Profile of RFC 2231 for Use in HTTP . . . . . . . . . . . . 4
80       3.1. Parameter Continuations . . . . . . . . . . . . . .
         . . . 4
81       3.2. Parameter Value Character Set and Language Information . . 5
82         3.2.1. Examples . . . . . . . . . . . . . . . . . . . . . . . 7
83       3.3. Language specification in Encoded Words . . . . . . . . . 7
84     4. Guidelines for Usage in HTTP Header Definitions . . . . . . . 8
85       4.1. When to Use the Extension . . . . . . . . . . . . . . . . 8
86       4.2. Error Handling . . . . . . . . . . . . . . . . . . . . . . 8
87     5. Security Considerations . . . . . . . . . . . . . . . . . . . 9
88     6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
89     7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9
90     8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9
91       8.1. Normative References . . . . . . . . . . . . . . . . . . . 9
92       8.2. Informative References . . . . . . . . . . . . . . . . . . 10
93     Appendix A. Change Log (to be removed by RFC Editor before
94                 publication) . . . . . . . . . . . . . . . . . . . . 10
95       A.1. Since draft-reschke-rfc2231-in-http-00 . . . . . . . . . . 10
96       A.2. Since draft-reschke-rfc2231-in-http-01 . . . . . . . . . . 10
97     Appendix B. Open issues (to be removed by RFC Editor prior to
98                 publication) . . . . . . . . . . . . . . . . . . . . 10
99       B.1. edit . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
100    Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 11

102 1. Introduction

104    By default, message header parameters in HTTP ([RFC2616]) messages
105    can not carry characters outside the ISO-8859-1 character set
106    ([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an escaping mechanism
107    for use in MIME headers. This document specifies a profile of that
108    encoding for use in HTTP.

110 2. Notational Conventions

112    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
113    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
114    document are to be interpreted as described in [RFC2119].

116    This specification uses the ABNF (Augmented Backus-Naur Form)
117    notation defined in [RFC5234]. The following core rules are included
118    by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
119    DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f) and LWSP
120    (linear white space).

122    Note that this specification uses the term "character set" for
123    consistency with other IETF specifications such as RFC 2277 (see
124    [RFC2277], Section 3). A more accurate term would be "character
125    encoding" (a mapping of code points to octet sequences).

127 3. A Profile of RFC 2231 for Use in HTTP

129    RFC 2231 defines several extensions to MIME. The sections below
130    discuss if and how they apply to HTTP.

132    In short:

134    o Parameter Continuations aren't needed (Section 3.1),

136    o Character Set and Language Information are useful, therefore a
137      simple subset is specified (Section 3.2), and

139    o Language Specifications in Encoded Words aren't needed
140      (Section 3.3).

142 3.1. Parameter Continuations

144    Section 3 of [RFC2231] defines a mechanism that deals with the length
145    limitations that apply to MIME headers. These limitations do not
146    apply to HTTP ([RFC2616], Section 19.4.7).

148    Thus in HTTP, senders MUST NOT use parameter continuations, and
149    therefore recipients do not need to support them.

151 3.2. Parameter Value Character Set and Language Information

153    Section 4 of [RFC2231] specifies how to embed language information
154    into parameter values, and also how to encode non-ASCII characters,
155    dealing with restrictions both in MIME and HTTP header parameters.

157    However, RFC 2231 does not specify a mandatory-to-implement character
158    encoding, making it hard for senders to decide which character set to
159    use. Thus, recipients implementing this specification MUST support
160    the character sets "ISO-8859-1" [ISO-8859-1] and "UTF-8" [RFC3629].

162    Furthermore, RFC 2231 allows leaving out the character encoding
163    information.
       The profile defined by this specification does not
164    allow that.

166    The syntax for parameters is defined in Section 3.6 of [RFC2616]
167    (with RFC 2616 implied LWS translated to RFC 5234 LWSP):

169      parameter     = attribute LWSP "=" LWSP value

171      attribute     = token
172      value         = token / quoted-string

174      quoted-string =
175      token         =

177    This specification extends the grammar to:

179      parameter     = reg-parameter / ext-parameter

181      reg-parameter = attribute LWSP "=" LWSP value

183      ext-parameter = attribute "*" LWSP "=" LWSP ext-value

185      ext-value     = charset "'" [ language ] "'" value-chars
186                    ; extended-initial-value,
187                    ; defined in [RFC2231], Section 7

189      charset       = %x55.54.46.2D.38 ; "UTF-8"
190                    / %x49.53.4F.2D.38.38.35.39.2D.31 ; "ISO-8859-1"
191                    / ext-charset

193      ext-charset   = token ; see IANA charset registry
194                    ; ()

196      language      =

198      value-chars   = *( pct-encoded / attr-char )

200      pct-encoded   = "%" HEXDIG HEXDIG
201                    ; see [RFC3986], Section 2.1

203      attr-char     = ALPHA / DIGIT
204                    / "-" / "." / "_" / "~" / ":"
205                    / "!" / "$" / "&" / "+"

207    Thus, a parameter is either a regular parameter (reg-parameter), as
208    previously defined in Section 3.6 of [RFC2616], or an extended
209    parameter (ext-parameter).

211    Extended parameters are those where the left hand side of the
212    assignment ends with an asterisk character.

214    The value part of an extended parameter (ext-value) is a token that
215    consists of three parts: the REQUIRED character set name (charset),
216    the OPTIONAL language information (language), and a character
217    sequence representing the actual value (value-chars), separated by
218    single quote characters.

220    Inside the value part, characters not contained in attr-char are
221    encoded into an octet sequence using the specified character set.
222    That octet sequence then is percent-encoded as specified in Section
223    2.1 of [RFC3986].

225    Producers MUST NOT use character sets other than "UTF-8" ([RFC3629])
226    or "ISO-8859-1" ([ISO-8859-1]).
       Extension character sets (ext-charset)
227    are reserved for future use.

229 3.2.1. Examples

231    Non-extended notation, using "token":

233      foo: bar; title=Economy

235    Non-extended notation, using "quoted-string":

237      foo: bar; title="US-$ rates"

239    Extended notation, using the Unicode character U+00A3 (POUND SIGN):

241      foo: bar; title*=iso-8859-1'en'%A3%20rates

243    Note: the Unicode pound sign character U+00A3 was encoded using
244    ISO-8859-1 into the single octet A3, then percent-encoded. Also note
245    that the space character was encoded as %20, as it is not contained
246    in attr-char.

248    Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
249    and U+20AC (EURO SIGN):

251      foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates

253    Note: the Unicode pound sign character U+00A3 was encoded using UTF-8
254    into the octet sequence C2 A3, then percent-encoded. Likewise, the
255    Unicode euro sign character U+20AC was encoded into the octet
256    sequence E2 82 AC, then percent-encoded. Also note that HEXDIG
257    allows both lower-case and upper-case characters, so recipients must
258    understand both, and that the language information is optional, while
259    the character set is not.

261 3.3. Language specification in Encoded Words

263    Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to
264    also support language specification in encoded words. Although the
265    HTTP/1.1 specification does refer to RFC 2047 ([RFC2616], Section
266    2.2), it's not clear to which header field exactly it applies, and
267    whether it is implemented in practice (see
268    for details).

270    Thus, the RFC 2231 profile defined by this specification does not
271    include this feature.

273 4. Guidelines for Usage in HTTP Header Definitions

275    Specifications of HTTP headers that use the extensions defined in
276    Section 3.2 should clearly state that.
       A simple way to achieve this
277    is to normatively reference this specification, and to include the
278    ext-value production in the ABNF for that header.

280    For instance:

282      foo-header  = "foo" LWSP ":" LWSP token ";" LWSP title-param
283      title-param = "title" LWSP "=" LWSP value
284                  / "title*" LWSP "=" LWSP ext-value
285      ext-value   =

287    [[rfcno: Note to RFC Editor: in the figure above, please replace
288    "xxxx" by the RFC number assigned to this specification.]]

290 4.1. When to Use the Extension

292    Section 4.2 of [RFC2277] requires that protocol elements containing
293    text can carry language information. Thus, the ext-value production
294    should always be used when the parameter value is of textual nature.

296    Furthermore, the extension should also be used whenever the parameter
297    value needs to carry characters not present in the US-ASCII
298    ([USASCII]) character set (note that it would be unacceptable to
299    define a new parameter that would be restricted to a subset of the
300    Unicode character set).

302 4.2. Error Handling

304    Header specifications that include parameters should also specify
305    whether same-named parameters can occur multiple times. If
306    repetitions are not allowed (and this is believed to be the common
307    case), the specification should state whether the regular or the
308    extended syntax takes precedence. In the latter case, this could be
309    used by producers to use both formats without breaking recipients that
310    do not understand the syntax. [[anchor6: Does not work as expected, see
311    and
312    .]]

314    Example:

316      foo: bar; title="EURO exchange rates";
317                title*=utf-8''%e2%82%ac%20exchange%20rates

319    In this case, the sender provides an ASCII version of the title for
320    legacy recipients, but also includes an internationalized version for
321    recipients understanding this specification -- the latter obviously
322    should prefer the new syntax over the old one.

324 5.
Security Considerations

326    This document does not discuss security issues and is not believed to
327    raise any security issues not already endemic in HTTP.

329 6. IANA Considerations

331    There are no IANA Considerations related to this specification.

333 7. Acknowledgements

335    Thanks to Frank Ellermann for help figuring out ABNF details, and to
336    Roar Lauritzsen for implementer's feedback.

338 8. References

340 8.1. Normative References

342    [ISO-8859-1]
343               International Organization for Standardization,
344               "Information technology -- 8-bit single-byte coded graphic
345               character sets -- Part 1: Latin alphabet No. 1",
346               ISO/IEC 8859-1:1998, 1998.

348    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
349               Requirement Levels", BCP 14, RFC 2119, March 1997.

351    [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
352               Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
353               Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

355    [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
356               10646", RFC 3629, STD 63, November 2003.

358    [RFC4646]  Phillips, A. and M. Davis, "Tags for Identifying
359               Languages", BCP 47, RFC 4646, September 2006.

361    [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
362               Specifications: ABNF", STD 68, RFC 5234, January 2008.

364 8.2. Informative References

366    [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
367               Part Three: Message Header Extensions for Non-ASCII Text",
368               RFC 2047, November 1996.

370    [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
371               Word Extensions:
372               Character Sets, Languages, and Continuations", RFC 2231,
373               November 1997.

375    [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
376               Languages", BCP 18, RFC 2277, January 1998.

378    [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
379               Resource Identifier (URI): Generic Syntax", RFC 3986,
380               STD 66, January 2005.

382    [USASCII]  American National Standards Institute, "Coded Character
383               Set -- 7-bit American Standard Code for Information
384               Interchange", ANSI X3.4, 1986.

386 URIs

388    [1]

390    [2]

392 Appendix A. Change Log (to be removed by RFC Editor before publication)

394 A.1. Since draft-reschke-rfc2231-in-http-00

396    Use RFC5234-style ABNF, closer to the one used in RFC 2231.

398    Make RFC 2231 dependency informative, so this specification can
399    evolve independently.

401    Explain the ABNF in prose.

403 A.2. Since draft-reschke-rfc2231-in-http-01

405    Remove unneeded RFC5137 notation (code point vs character).

407 Appendix B. Open issues (to be removed by RFC Editor prior to
408             publication)

410 B.1. edit

412    Type: edit

414    julian.reschke@greenbytes.de (2009-04-17): Umbrella issue for
415    editorial fixes/enhancements.

417 Author's Address

419    Julian F. Reschke
420    greenbytes GmbH
421    Hafenweg 16
422    Muenster, NW 48155
423    Germany

425    Email: julian.reschke@greenbytes.de
426    URI: http://greenbytes.de/tech/webdav/
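
The ext-value syntax of Section 3.2 can be illustrated with a short Python sketch. It is not taken from the draft: the function names encode_ext_value and decode_ext_value are hypothetical, and matching charset names case-insensitively is an assumption suggested by the examples in Section 3.2.1, which use both "iso-8859-1" and "UTF-8". The sketch encodes the text into octets with the chosen character set, percent-encodes every octet that is not an attr-char, and reverses those steps on decoding. The language tag is passed through unvalidated.

```python
import re

# attr-char from Section 3.2: ALPHA / DIGIT and a few punctuation characters.
ATTR_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-._~:!$&+"
)

# Only the two character sets the profile allows producers to use.
# Assumption: names are compared case-insensitively.
SUPPORTED_CHARSETS = {"utf-8": "utf-8", "iso-8859-1": "iso-8859-1"}

# value-chars = *( pct-encoded / attr-char )
_VALUE_CHARS = re.compile(r"(?:%[0-9A-Fa-f]{2}|[A-Za-z0-9._~:!$&+-])*")


def encode_ext_value(text, charset="UTF-8", language=""):
    """Build an ext-value: charset "'" [ language ] "'" value-chars."""
    codec = SUPPORTED_CHARSETS.get(charset.lower())
    if codec is None:
        raise ValueError("unsupported charset: %r" % charset)
    pieces = []
    for octet in text.encode(codec):
        ch = chr(octet)
        # Octets that map to attr-char stay literal; all others are escaped.
        pieces.append(ch if ch in ATTR_CHARS else "%%%02X" % octet)
    return "%s'%s'%s" % (charset, language, "".join(pieces))


def decode_ext_value(ext_value):
    """Split an ext-value and return (charset, language, decoded text)."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3:
        raise ValueError("expected charset'language'value-chars")
    charset, language, value_chars = parts
    codec = SUPPORTED_CHARSETS.get(charset.lower())
    if codec is None:
        raise ValueError("missing or unsupported charset: %r" % charset)
    if not _VALUE_CHARS.fullmatch(value_chars):
        raise ValueError("value-chars contains characters outside the grammar")
    octets = bytearray()
    i = 0
    while i < len(value_chars):
        if value_chars[i] == "%":
            # HEXDIG accepts upper and lower case; int(..., 16) handles both.
            octets.append(int(value_chars[i + 1:i + 3], 16))
            i += 3
        else:
            octets.append(ord(value_chars[i]))
            i += 1
    return charset, language, octets.decode(codec)


# The two extended-notation examples from Section 3.2.1:
print(encode_ext_value("\u00a3 rates", "iso-8859-1", "en"))
# -> iso-8859-1'en'%A3%20rates
print(decode_ext_value("UTF-8''%c2%a3%20and%20%e2%82%ac%20rates"))
```

The encoder emits upper-case hex digits, while the decoder accepts either case, in line with the note in Section 3.2.1 that recipients must understand both. An empty charset is rejected because the profile makes the character set mandatory.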