idnits 2.17.1
draft-reschke-rfc5987bis-00.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** The document seems to lack an IANA Considerations section. (See Section
2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
when there are no actions for IANA.)
** The abstract seems to contain references ([2], [1]), which it shouldn't.
Please replace those with straight textual mentions of the documents in
question.
-- The draft header indicates that this document obsoletes RFC5987, but the
abstract doesn't seem to directly say this. It does mention RFC5987
though, so this could be OK.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (April 15, 2011) is 4759 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
-- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-8859-1'
** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
RFC 7232, RFC 7233, RFC 7234, RFC 7235)
-- Possible downref: Non-RFC (?) normative reference: ref. 'USASCII'
-- Duplicate reference: RFC2978, mentioned in 'Err1912', was also mentioned
in 'RFC2978'.
-- Obsolete informational reference (is this intentional?): RFC 2388
(Obsoleted by RFC 7578)
-- Obsolete informational reference (is this intentional?): RFC 5987
(Obsoleted by RFC 8187)
Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 7 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group J. Reschke
3 Internet-Draft greenbytes
4 Obsoletes: 5987 (if approved) April 15, 2011
5 Intended status: Standards Track
6 Expires: October 17, 2011
8 Character Set and Language Encoding for
9 Hypertext Transfer Protocol (HTTP) Header Field Parameters
10 draft-reschke-rfc5987bis-00
12 Abstract
14 By default, message header field parameters in Hypertext Transfer
15 Protocol (HTTP) messages cannot carry characters outside the ISO-
16 8859-1 character set. RFC 2231 defines an encoding mechanism for use
17 in Multipurpose Internet Mail Extensions (MIME) headers. This
18 document specifies an encoding suitable for use in HTTP header fields
19 that is compatible with a profile of the encoding defined in RFC
20 2231.
22 Editorial Note (To be removed by RFC Editor before publication)
24 Distribution of this document is unlimited. Although this is not a
25 work item of the HTTPbis Working Group, comments should be sent to
26 the Hypertext Transfer Protocol (HTTP) mailing list at
27 ietf-http-wg@w3.org [1], which may be joined by sending a message
28 with subject "subscribe" to ietf-http-wg-request@w3.org [2].
30 Discussions of the HTTPbis Working Group are archived at
31 .
33 XML versions, latest edits and the issues list for this document are
34 available from
35 . A
36 collection of test cases is available at
37 .
39 Status of This Memo
41 This Internet-Draft is submitted in full conformance with the
42 provisions of BCP 78 and BCP 79.
44 Internet-Drafts are working documents of the Internet Engineering
45 Task Force (IETF). Note that other groups may also distribute
46 working documents as Internet-Drafts. The list of current Internet-
47 Drafts is at http://datatracker.ietf.org/drafts/current/.
49 Internet-Drafts are draft documents valid for a maximum of six months
50 and may be updated, replaced, or obsoleted by other documents at any
51 time. It is inappropriate to use Internet-Drafts as reference
52 material or to cite them other than as "work in progress."
54 This Internet-Draft will expire on October 17, 2011.
56 Copyright Notice
58 Copyright (c) 2011 IETF Trust and the persons identified as the
59 document authors. All rights reserved.
61 This document is subject to BCP 78 and the IETF Trust's Legal
62 Provisions Relating to IETF Documents
63 (http://trustee.ietf.org/license-info) in effect on the date of
64 publication of this document. Please review these documents
65 carefully, as they describe your rights and restrictions with respect
66 to this document. Code Components extracted from this document must
67 include Simplified BSD License text as described in Section 4.e of
68 the Trust Legal Provisions and are provided without warranty as
69 described in the Simplified BSD License.
71 Table of Contents
73 1. Introduction
74 2. Notational Conventions
75 3. Comparison to RFC 2231 and Definition of the Encoding
76 3.1. Parameter Continuations
77 3.2. Parameter Value Character Set and Language Information
78 3.2.1. Definition
79 3.2.2. Examples
80 3.3. Language Specification in Encoded Words
81 4. Guidelines for Usage in HTTP Header Field Definitions
82 4.1. When to Use the Extension
83 4.2. Error Handling
84 5. Security Considerations
85 6. Acknowledgements
86 7. References
87 7.1. Normative References
88 7.2. Informative References
89 Appendix A. Changes from RFC 5987
90 Appendix B. Change Log (to be removed by RFC Editor before
91 publication)
92 B.1. Since RFC5987
93 Appendix C. Resolved issues (to be removed by RFC Editor
94 before publication)
95 C.1. obs5987
96 Appendix D. Open issues (to be removed by RFC Editor prior to
97 publication)
98 D.1. edit
99 D.2. impls
100 D.3. iso-8859-1
102 1. Introduction
104 By default, message header field parameters in HTTP ([RFC2616])
105 messages cannot carry characters outside the ISO-8859-1 character set
106 ([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an encoding mechanism
107 for use in MIME headers. This document specifies an encoding
108 suitable for use in HTTP header fields that is compatible with a
109 profile of the encoding defined in RFC 2231.
111 This document obsoletes [RFC5987]; the changes are summarized in
112 Appendix A.
114 Note: in the remainder of this document, RFC 2231 is only
115 referenced for the purpose of explaining the choice of features
116 that were adopted; these references are therefore purely
informative.
118 Note: this encoding does not apply to message payloads transmitted
119 over HTTP, such as when using the media type "multipart/form-data"
120 ([RFC2388]).
122 2. Notational Conventions
124 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
125 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
126 document are to be interpreted as described in [RFC2119].
128 This specification uses the ABNF (Augmented Backus-Naur Form)
129 notation defined in [RFC5234]. The following core rules are included
130 by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
131 DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
132 (linear whitespace).
134 Note that this specification uses the term "character set" for
135 consistency with other IETF specifications such as RFC 2277 (see
136 [RFC2277], Section 3). A more accurate term would be "character
137 encoding" (a mapping of code points to octet sequences).
139 3. Comparison to RFC 2231 and Definition of the Encoding
141 RFC 2231 defines several extensions to MIME. The sections below
142 discuss if and how they apply to HTTP header fields.
144 In short:
146 o Parameter Continuations aren't needed (Section 3.1),
148 o Character Set and Language Information are useful, therefore a
149 simple subset is specified (Section 3.2), and
151 o Language Specifications in Encoded Words aren't needed
152 (Section 3.3).
154 3.1. Parameter Continuations
156 Section 3 of [RFC2231] defines a mechanism that deals with the length
157 limitations that apply to MIME headers. These limitations do not
158 apply to HTTP ([RFC2616], Section 19.4.7).
160 Thus, parameter continuations are not part of the encoding defined by
161 this specification.
163 3.2. Parameter Value Character Set and Language Information
165 Section 4 of [RFC2231] specifies how to embed language information
166 into parameter values, and also how to encode non-ASCII characters,
167 dealing with restrictions both in MIME and HTTP header parameters.
169 However, RFC 2231 does not specify a mandatory-to-implement character
170 set, making it hard for senders to decide which character set to use.
171 Thus, recipients implementing this specification MUST support the
172 character sets "ISO-8859-1" [ISO-8859-1] and "UTF-8" [RFC3629].
174 Furthermore, RFC 2231 allows the character set information to be left
175 out. The encoding defined by this specification does not allow that.
177 3.2.1. Definition
179 The syntax for parameters is defined in Section 3.6 of [RFC2616]
180 (with RFC 2616 implied LWS translated to RFC 5234 LWSP):
182 parameter = attribute LWSP "=" LWSP value
184 attribute = token
185 value = token / quoted-string
187 quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
188 token = <token, defined in [RFC2616], Section 2.2>
190 In order to include character set and language information, this
191 specification modifies the RFC 2616 grammar to be:
193 parameter = reg-parameter / ext-parameter
195 reg-parameter = parmname LWSP "=" LWSP value
197 ext-parameter = parmname "*" LWSP "=" LWSP ext-value
199 parmname = 1*attr-char
201 ext-value = charset "'" [ language ] "'" value-chars
202 ; like RFC 2231's <extended-initial-value>
203 ; (see [RFC2231], Section 7)
205 charset = "UTF-8" / "ISO-8859-1" / mime-charset
207 mime-charset = 1*mime-charsetc
208 mime-charsetc = ALPHA / DIGIT
209 / "!" / "#" / "$" / "%" / "&"
210 / "+" / "-" / "^" / "_" / "`"
211 / "{" / "}" / "~"
212 ; as in Section 2.3 of [RFC2978]
213 ; except that the single quote is not included
214 ; SHOULD be registered in the IANA charset registry
216 language = <Language-Tag, defined in [RFC5646], Section 2.1>
218 value-chars = *( pct-encoded / attr-char )
220 pct-encoded = "%" HEXDIG HEXDIG
221 ; see [RFC3986], Section 2.1
223 attr-char = ALPHA / DIGIT
224 / "!" / "#" / "$" / "&" / "+" / "-" / "."
225 / "^" / "_" / "`" / "|" / "~"
226 ; token except ( "*" / "'" / "%" )
228 Thus, a parameter is either a regular parameter (reg-parameter), as
229 previously defined in Section 3.6 of [RFC2616], or an extended
230 parameter (ext-parameter).
232 Extended parameters are those where the left-hand side of the
233 assignment ends with an asterisk character.
235 The value part of an extended parameter (ext-value) is a token that
236 consists of three parts: the REQUIRED character set name (charset),
237 the OPTIONAL language information (language), and a character
238 sequence representing the actual value (value-chars), separated by
239 single quote characters. Note that both character set names and
240 language tags are restricted to the US-ASCII character set, and are
241 matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646],
242 Section 2.1.1).
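The three-part structure described above can be checked mechanically. The following is a minimal sketch in Python, not part of this specification; the names ATTR_CHAR, MIME_CHARSETC, EXT_VALUE and split_ext_value are hypothetical, and the language part is matched with a deliberately loose pattern (letters, digits and hyphens) rather than the full Language-Tag grammar of [RFC5646].

```python
import re

# Character classes transcribed from the ABNF in this section.
ATTR_CHAR = r"[A-Za-z0-9!#$&+\-.^_`|~]"
MIME_CHARSETC = r"[A-Za-z0-9!#$%&+\-^_`{}~]"

# ext-value = charset "'" [ language ] "'" value-chars
# Assumption: the language part is approximated, not a full Language-Tag.
EXT_VALUE = re.compile(
    rf"(?P<charset>{MIME_CHARSETC}+)'"
    rf"(?P<language>[A-Za-z0-9-]*)'"
    rf"(?P<value>(?:%[0-9A-Fa-f]{{2}}|{ATTR_CHAR})*)"
)


def split_ext_value(ext_value):
    """Return (charset, language, value-chars), or None on a syntax error."""
    match = EXT_VALUE.fullmatch(ext_value)
    if match is None:
        return None
    return match.group("charset", "language", "value")
```

A strict matcher like this rejects anything outside the grammar; recipients that prefer to recover from malformed input would need a more lenient approach.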
244 Inside the value part, characters not contained in attr-char are
245 encoded into an octet sequence using the specified character set.
246 That octet sequence is then percent-encoded as specified in Section
247 2.1 of [RFC3986].
249 Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1"
250 ([ISO-8859-1]) character set. Extension character sets (mime-
251 charset) are reserved for future use.
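As an illustration of the producer side, the sketch below encodes a string into an ext-value using Python's standard urllib.parse.quote. It is one possible approach under stated assumptions, not a normative algorithm; the names ATTR_CHAR_PUNCT and encode_ext_value are hypothetical.

```python
from urllib.parse import quote

# Punctuation allowed by attr-char. quote() never escapes ASCII letters,
# digits, or "_.-~", so listing the attr-char punctuation as "safe" leaves
# exactly the attr-char set unescaped.
ATTR_CHAR_PUNCT = "!#$&+-.^_`|~"


def encode_ext_value(text, charset="UTF-8", language=""):
    """Build charset "'" [ language ] "'" value-chars for the given text."""
    if charset.upper() not in ("UTF-8", "ISO-8859-1"):
        raise ValueError("producers are limited to UTF-8 or ISO-8859-1")
    # Step 1: encode characters into octets using the chosen character set.
    octets = text.encode(charset)
    # Step 2: percent-encode every octet that is not an attr-char.
    return f"{charset}'{language}'{quote(octets, safe=ATTR_CHAR_PUNCT)}"
```

Note that quote() emits uppercase hexadecimal digits; recipients have to accept both cases. Text that cannot be represented in the chosen character set (for instance, the euro sign in ISO-8859-1) raises UnicodeEncodeError in this sketch.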
253 Note: recipients should be prepared to handle encoding errors,
254 such as malformed or incomplete percent escape sequences, or non-
255 decodable octet sequences, in a robust manner. This specification
256 does not mandate any specific behavior; for instance, the
257 following strategies are all acceptable:
259 * ignoring the parameter,
261 * stripping a non-decodable octet sequence,
263 * substituting a non-decodable octet sequence by a replacement
264 character, such as the Unicode character U+FFFD (Replacement
265 Character).
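The three strategies listed above might be realized as in the following Python sketch. It is illustrative only; the function name decode_ext_value and the strategy labels "ignore-parameter", "strip" and "replace" are hypothetical, and the choice to drop a stray "%" under the two lenient strategies is an assumption of this sketch rather than anything this specification requires.

```python
import re
from urllib.parse import unquote_to_bytes

# A "%" that is not followed by two hexadecimal digits.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Map the lenient strategies onto Python codec error handlers.
_CODEC_ERRORS = {"strip": "ignore", "replace": "replace"}


def decode_ext_value(ext_value, strategy="replace"):
    """Return (text, language) or None if the parameter is to be ignored."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3:
        return None
    charset, language, value_chars = parts
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        return None  # only the two mandatory character sets are handled
    if _BAD_ESCAPE.search(value_chars):
        if strategy == "ignore-parameter":
            return None
        # Assumption: lenient strategies simply drop the stray "%".
        value_chars = _BAD_ESCAPE.sub("", value_chars)
    octets = unquote_to_bytes(value_chars)
    try:
        text = octets.decode(charset, errors=_CODEC_ERRORS.get(strategy, "strict"))
    except UnicodeDecodeError:
        return None  # non-decodable octets under "ignore-parameter"
    return text, language or None
```

With "replace", a truncated UTF-8 sequence yields U+FFFD in the result; with "strip", the offending octets disappear; with "ignore-parameter", the caller gets None and can fall back to another form of the parameter.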
267 Note: the RFC 2616 token production ([RFC2616], Section 2.2)
268 differs from the production used in RFC 2231 (imported from
269 Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are
270 excluded. Thus, these two characters are excluded from the attr-
271 char production as well.
273 Note: the ABNF defined here differs from the one in
274 Section 2.3 of [RFC2978] in that it does not allow the single
275 quote character (see also RFC Errata ID 1912 [Err1912]). In
276 practice, no character set names using that character have been
277 registered at the time of this writing.
279 3.2.2. Examples
281 Non-extended notation, using "token":
283 foo: bar; title=Economy
285 Non-extended notation, using "quoted-string":
287 foo: bar; title="US-$ rates"
289 Extended notation, using the Unicode character U+00A3 (POUND SIGN):
291 foo: bar; title*=iso-8859-1'en'%A3%20rates
293 Note: the Unicode pound sign character U+00A3 was encoded into the
294 single octet A3 using the ISO-8859-1 character encoding, then
295 percent-encoded. Also, note that the space character was encoded as
296 %20, as it is not contained in attr-char.
298 Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
299 and U+20AC (EURO SIGN):
301 foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates
303 Note: the Unicode pound sign character U+00A3 was encoded into the
304 octet sequence C2 A3 using the UTF-8 character encoding, then
305 percent-encoded. Likewise, the Unicode euro sign character U+20AC
306 was encoded into the octet sequence E2 82 AC, then percent-encoded.
307 Also note that HEXDIG allows both lowercase and uppercase characters,
308 so recipients must understand both, and that the language information
309 is optional, while the character set is not.
311 3.3. Language Specification in Encoded Words
313 Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to
314 also support language specification in encoded words. Although the
315 HTTP/1.1 specification does refer to RFC 2047 ([RFC2616], Section
316 2.2), it's not clear to which header field exactly it applies, and
317 whether it is implemented in practice (see
318 for details).
320 Thus, this specification does not include this feature.
322 4. Guidelines for Usage in HTTP Header Field Definitions
324 Specifications of HTTP header fields that use the extensions defined
325 in Section 3.2 ought to clearly state that. A simple way to achieve
326 this is to normatively reference this specification, and to include
327 the ext-value production into the ABNF for that header field.
329 For instance:
331 foo-header = "foo" LWSP ":" LWSP token ";" LWSP title-param
332 title-param = "title" LWSP "=" LWSP value
333 / "title*" LWSP "=" LWSP ext-value
334 ext-value = <ext-value, see Section 3.2>
335 Note: The Parameter Value Continuation feature defined in Section
336 3 of [RFC2231] makes it impossible to have multiple instances of
337 extended parameters with identical parmname components, as the
338 processing of continuations would become ambiguous. Thus,
339 specifications using this extension are advised to disallow this
340 case for compatibility with RFC 2231.
342 4.1. When to Use the Extension
344 Section 4.2 of [RFC2277] requires that protocol elements containing
345 human-readable text are able to carry language information. Thus,
346 the ext-value production ought always to be used when the parameter
347 value is of textual nature and its language is known.
349 Furthermore, the extension ought to also be used whenever the
350 parameter value needs to carry characters not present in the US-ASCII
351 ([USASCII]) character set (note that it would be unacceptable to
352 define a new parameter that would be restricted to a subset of the
353 Unicode character set).
355 4.2. Error Handling
357 Header field specifications need to define whether multiple instances
358 of parameters with identical parmname components are allowed, and how
359 they should be processed. This specification suggests that a
360 parameter using the extended syntax takes precedence. This would
361 allow producers to use both formats without breaking recipients that
362 do not understand the extended syntax yet.
364 Example:
366 foo: bar; title="EURO exchange rates";
367 title*=utf-8''%e2%82%ac%20exchange%20rates
369 In this case, the sender provides an ASCII version of the title for
370 legacy recipients, but also includes an internationalized version for
371 recipients understanding this specification -- the latter obviously
372 ought to prefer the new syntax over the old one.
374 Note: at the time of this writing, many implementations either
375 failed to ignore the form they did not understand or prioritized
376 the ASCII form even though the extended syntax was present.
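The suggested precedence rule can be sketched as follows in Python. This is a simplified illustration, assuming that the header field has already been split into raw (name, value) pairs, that parameter names are compared case-insensitively, and that quoted-string values need no backslash unescaping; the name select_parameter is hypothetical.

```python
from urllib.parse import unquote


def select_parameter(params, name):
    """Pick the text for `name`, preferring the extended form (name*)."""
    plain = None
    for key, raw in params:
        key = key.lower()
        if key == name + "*":
            parts = raw.split("'", 2)
            if len(parts) == 3 and parts[0].lower() in ("utf-8", "iso-8859-1"):
                # Extended syntax understood: it takes precedence.
                return unquote(parts[2], encoding=parts[0], errors="replace")
        elif key == name and plain is None:
            # Assumption: strip surrounding quotes only, no unescaping.
            plain = raw.strip('"')
    # No usable extended form: fall back to the regular parameter, if any.
    return plain
```

Given the example above, select_parameter would return the internationalized title; a recipient that cannot handle the extended value falls back to the ASCII form.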
378 5. Security Considerations
380 The format described in this document makes it possible to transport
381 non-ASCII characters, and thus enables character "spoofing"
382 scenarios, in which a displayed value appears to be something other
383 than it is.
385 Furthermore, there are known attack scenarios relating to decoding
386 UTF-8.
388 See Section 10 of [RFC3629] for more information on both topics.
390 In addition, the extension specified in this document makes it
391 possible to transport multiple language variants for a single
392 parameter, and such use might allow spoofing attacks, where different
393 language versions of the same parameter are not equivalent. Whether
394 this is useful as an attack depends on the parameter
395 specified.
397 6. Acknowledgements
399 Thanks to Martin Duerst and Frank Ellermann for help figuring out
400 ABNF details, to Graham Klyne and Alexey Melnikov for general review,
401 to Chris Newman for pointing out an RFC 2231 incompatibility, and to
402 Benjamin Carlyle and Roar Lauritzsen for implementer's feedback.
404 7. References
406 7.1. Normative References
408 [ISO-8859-1] International Organization for Standardization,
409 "Information technology -- 8-bit single-byte coded
410 graphic character sets -- Part 1: Latin alphabet No.
411 1", ISO/IEC 8859-1:1998, 1998.
413 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
414 Requirement Levels", BCP 14, RFC 2119, March 1997.
416 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
417 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
418 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
420 [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration
421 Procedures", BCP 19, RFC 2978, October 2000.
423 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
424 10646", STD 63, RFC 3629, November 2003.
426 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
427 "Uniform Resource Identifier (URI): Generic Syntax",
428 STD 66, RFC 3986, January 2005.
430 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for
431 Syntax Specifications: ABNF", STD 68, RFC 5234,
432 January 2008.
434 [RFC5646] Phillips, A., Ed. and M. Davis, Ed., "Tags for
435 Identifying Languages", BCP 47, RFC 5646,
436 September 2009.
438 [USASCII] American National Standards Institute, "Coded Character
439 Set -- 7-bit American Standard Code for Information
440 Interchange", ANSI X3.4, 1986.
442 7.2. Informative References
444 [Err1912] RFC Errata, "Errata ID 1912, RFC 2978",
445 .
447 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet
448 Mail Extensions (MIME) Part One: Format of Internet
449 Message Bodies", RFC 2045, November 1996.
451 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
452 Extensions) Part Three: Message Header Extensions for
453 Non-ASCII Text", RFC 2047, November 1996.
455 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
456 Encoded Word Extensions: Character Sets, Languages, and
457 Continuations", RFC 2231, November 1997.
459 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
460 Languages", BCP 18, RFC 2277, January 1998.
462 [RFC2388] Masinter, L., "Returning Values from Forms: multipart/
463 form-data", RFC 2388, August 1998.
465 [RFC5987] Reschke, J., "Character Set and Language Encoding for
466 Hypertext Transfer Protocol (HTTP) Header Field
467 Parameters", RFC 5987, August 2010.
469 URIs
471 [1]
473 [2]
475 Appendix A. Changes from RFC 5987
477 This section summarizes the changes compared to [RFC5987]:
479 [[anchor8: None yet.]]
481 Appendix B. Change Log (to be removed by RFC Editor before publication)
483 B.1. Since RFC5987
485 Only editorial changes for the purpose of starting the revision
486 process (obs5987).
488 Appendix C. Resolved issues (to be removed by RFC Editor before
489 publication)
491 Issues that were either rejected or resolved in this version of this
492 document.
494 C.1. obs5987
496 Type: change
498 julian.reschke@greenbytes.de (2011-04-15): Obsolete RFC 5987,
499 summarize differences.
501 Appendix D. Open issues (to be removed by RFC Editor prior to
502 publication)
504 D.1. edit
506 Type: edit
508 julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for
509 editorial fixes/enhancements.
511 D.2. impls
513 Type: change
515 julian.reschke@greenbytes.de (2011-04-15): Add implementation report.
517 D.3. iso-8859-1
519 Type: change
521 julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
522 support ISO-8859-1? It doesn't really help, and it is not
523 implemented in IE9.
525 Author's Address
527 Julian F. Reschke
528 greenbytes GmbH
529 Hafenweg 16
530 Muenster, NW 48155
531 Germany
533 EMail: julian.reschke@greenbytes.de
534 URI: http://greenbytes.de/tech/webdav/