idnits 2.17.1 

draft-hoehrmann-urlencoded-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 25, 2010) is 4962 days in the past.  Is
     this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 1866
     (Obsoleted by RFC 2854)

  -- Obsolete informational reference (is this intentional?): RFC 4288
     (Obsoleted by RFC 6838)


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                       B. Hoehrmann
3	Internet-Draft                                        September 25, 2010
4	Expires: March 29, 2011

6	               The application/www-form-urlencoded format
7	                     draft-hoehrmann-urlencoded-01

9	Abstract

11	   This memo defines the application/www-form-urlencoded format, a
12	   compact data format that encodes ordered data sets of name-value
13	   pairs of character data.  The format is similar to the format
14	   application/x-www-form-urlencoded first defined in RFC 1866, but
15	   addresses some of that format's shortcomings.

17	Status of This Memo

19	   This Internet-Draft is submitted in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF).  Note that other groups may also distribute
24	   working documents as Internet-Drafts.  The list of current Internet-
25	   Drafts is at http://datatracker.ietf.org/drafts/current/.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   This Internet-Draft will expire on March 29, 2011.

34	Copyright Notice

36	   Copyright (c) 2010 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents
41	   (http://trustee.ietf.org/license-info) in effect on the date of
42	   publication of this document.  Please review these documents
43	   carefully, as they describe your rights and restrictions with respect
44	   to this document.  Code Components extracted from this document must
45	   include Simplified BSD License text as described in Section 4.e of
46	   the Trust Legal Provisions and are provided without warranty as
47	   described in the Simplified BSD License.

49	Table of Contents

51	   1.  Introduction
52	   2.  Terminology and Conformance
53	   3.  Format syntax
54	   4.  Format semantics
55	   5.  Examples
56	   6.  Security considerations
57	   7.  IANA Considerations
58	   8.  Media type registration
59	   9.  References
60	     9.1.  Normative References
61	     9.2.  Informative References
62	   Appendix A.  Acknowledgements

64	1.  Introduction

66	   RFC 1866 [RFC1866] introduced the application/x-www-form-urlencoded
67	   media type to facilitate the encoding and transmission of form data
68	   sets.  Formats based on RFC 1866 continued to use this media type as
69	   the default encoding format, and other protocols adopted the type for
70	   similar purposes.  The format defined in this document addresses some
71	   of the RFC 1866 format's shortcomings.

73	   The application/www-form-urlencoded format defined in this document
74	   encodes ordered data sets of pairs consisting of a name and a
75	   (possibly undefined) value as a string, with pairs separated by
76	   semicolons and names and values separated by the equals sign.
77	   Special characters are escaped using the percent-encoding scheme also
78	   used for resource identifiers.  Issues of internationalization are
79	   addressed through the use of the UTF-8 character encoding scheme.

81	   For compatibility with the RFC 1866 format, the ampersand character is
82	   tolerated as an alternative separator character, and the plus sign may
83	   be used to represent space characters.  The new format accepts any
84	   string as a valid representation of a data set, except for character
85	   encoding errors, in keeping with typical implementations of the RFC
86	   1866 format.

88	2.  Terminology and Conformance

90	   A character string is a sequence of Unicode scalar values.  An octet
91	   string is a sequence of octets.

93	   A character string conforms to this specification if and only if
94	   encoding it using the UTF-8 character encoding yields an octet string
95	   that conforms to this specification.

97	   An octet string conforms to this specification if and only if it is,
98	   after replacing all sequences that match pct-encoded [RFC3986] by the
99	   corresponding octets, a valid UTF-8 sequence.
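
   The two conformance rules above can be sketched as a pair of
   predicates.  This is only an illustration, not part of the format:
   the names `octets_conform` and `string_conforms` are hypothetical,
   and the sketch assumes that Python's strict UTF-8 decoder is an
   acceptable validity check for the purposes of RFC 3629.

```python
import re

# Matches the pct-encoded production of RFC 3986: "%" HEXDIG HEXDIG.
_PCT_ENCODED = re.compile(rb"%([0-9A-Fa-f]{2})")


def octets_conform(data: bytes) -> bool:
    """True if `data` is valid UTF-8 once pct-encoded sequences are replaced."""
    # Single left-to-right pass: the octet produced by "%25" is not rescanned.
    unescaped = _PCT_ENCODED.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    try:
        unescaped.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def string_conforms(text: str) -> bool:
    """A character string conforms if its UTF-8 encoding conforms."""
    try:
        octets = text.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates are not Unicode scalar values.
        return False
    return octets_conform(octets)
```

   A percent sign that is not followed by two hex digits is left in
   place by this check, so a string such as 'Cipher=c=(m^e)%n' is still
   treated as conforming.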

101	   A software module that encodes data sets into character strings
102	   conforms to this specification if and only if it does so as defined
103	   in section 3.

105	   A software module that decodes character or octet strings into data
106	   sets conforms to this specification if and only if it does so as
107	   defined in section 3.

109	3.  Format syntax

111	   The syntax of the application/www-form-urlencoded format is defined
112	   by the following ABNF [RFC5234] grammar.  The grammar is ambiguous:
113	   the empty string matches both `empty-set` and `pairs` and percent-
114	   encoded sequences match `escape` and `percent` followed by other
115	   characters.  A match for `escape` takes precedence over a match
116	   involving `percent`.  The choice between interpreting the empty
117	   string as an empty data set or a pair consisting of the empty string
118	   as name and an undefined value is made by individual applications.

120	     data-set  = empty-set / pairs
121	     pairs     = pair *(separator pair)
122	     pair      = name [ "=" value ]
123	     name      = *(namechar / escape / percent / plus)
124	     value     = *(valuechar / escape / percent / plus)
125	     namechar  = <any octet except ";", "&", "+", "%", "=">
126	     valuechar = <any octet except ";", "&", "+", "%">
127	     escape    = "%" 2hexdig
128	     separator = ";" / "&"
129	     percent   = "%"
130	     plus      = "+"
131	     empty-set = ""

133	   A character string is decoded by encoding it using the UTF-8
134	   character encoding and then decoding the resulting octet string.  An
135	   octet string is decoded by replacing any instance of `escape` by the
136	   corresponding octet, replacing any instance of `plus` by the U+0020
137	   SPACE character, and then decoding the resulting `name` and `value`
138	   instances using the UTF-8 character encoding.  If that results in an
139	   error, the data set is malformed and represents nothing.
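
   The decoding procedure above might be sketched as follows.  The
   function names and the `Pair` type are hypothetical, `None` stands in
   for an undefined value, and the sketch arbitrarily treats the empty
   string as a single pair with an empty name, which is only one of the
   two interpretations the grammar permits.

```python
import re
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # None models an undefined value

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def _unescape(raw: bytes) -> str:
    # Replace `plus` first so that an escaped plus sign (%2B) survives as "+".
    raw = raw.replace(b"+", b" ")
    # `escape` takes precedence; a lone "%" is left in place as `percent`.
    raw = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode("utf-8")  # raises UnicodeDecodeError on malformed input


def decode(data: bytes) -> Optional[List[Pair]]:
    """Decode an octet string; return None if the data set is malformed."""
    pairs: List[Pair] = []
    try:
        for chunk in re.split(rb"[;&]", data):
            # Only the first "=" separates name and value.
            name, sep, value = chunk.partition(b"=")
            pairs.append((_unescape(name), _unescape(value) if sep else None))
    except UnicodeDecodeError:
        return None
    return pairs


def decode_text(text: str) -> Optional[List[Pair]]:
    """Decode a character string by way of its UTF-8 encoding."""
    return decode(text.encode("utf-8"))
```

   Returning `None` for malformed input is one way to model "represents
   nothing"; an application could equally raise an exception.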

141	   A data set is encoded by encoding the names and values using the
142	   UTF-8 character encoding, replacing any octet not matching `namechar`
143	   in the names and replacing any octet not matching `valuechar` in the
144	   values by their percent-encoded equivalent and concatenating them
145	   using "=" and ";" as separators.  The ampersand can be used as an
146	   alternative separator, but doing so is discouraged.  Similarly, "%"
147	   only has to be escaped when it is followed by two hex digits, but
148	   keeping it unescaped is discouraged.  Spaces may additionally be
149	   replaced by the plus sign.  Implementations are free to percent-
150	   encode additional octets.
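
   A minimal encoder along the lines of the paragraph above could look
   like this.  The names are hypothetical and `None` again stands in for
   an undefined value.  The sketch follows the recommended choices: it
   uses ";" as the separator, always escapes "%", never emits "+" for a
   space, and escapes no more octets than the grammar requires.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # None models an undefined value

_NAME_SPECIAL = frozenset(b";&+%=")  # octets excluded from `namechar`
_VALUE_SPECIAL = frozenset(b";&+%")  # octets excluded from `valuechar`


def _escape(text: str, special: FrozenSet[int]) -> bytes:
    out = bytearray()
    for octet in text.encode("utf-8"):
        if octet in special:
            out += b"%%%02X" % octet  # e.g. 0x26 becomes b"%26"
        else:
            out.append(octet)
    return bytes(out)


def encode(pairs: Iterable[Pair]) -> bytes:
    """Encode a data set as an octet string."""
    encoded = []
    for name, value in pairs:
        item = _escape(name, _NAME_SPECIAL)
        if value is not None:
            item += b"=" + _escape(value, _VALUE_SPECIAL)
        encoded.append(item)
    return b";".join(encoded)
```

   Because only the reserved octets are escaped, the output may contain
   literal spaces and non-ASCII octets; an application that needs a
   7-bit-clean result would percent-encode additional octets, which the
   text above explicitly allows.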

152	4.  Format semantics

154	   This specification defines only the mapping between data sets and
155	   their encoded form.  It is up to individual applications using this
156	   format to define, for instance, whether the ordering of pairs is
157	   significant or how multiple pairs with the same name are handled.
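
   As an illustration of such application-level choices, the sketch
   below shows two policies an application might adopt for repeated
   names once a data set has been decoded.  Both function names are
   hypothetical, and neither policy is mandated by this specification.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # None models an undefined value


def group_by_name(pairs: List[Pair]) -> Dict[str, List[Optional[str]]]:
    """Keep every value, preserving the order in which pairs appeared."""
    grouped: Dict[str, List[Optional[str]]] = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    return grouped


def last_wins(pairs: List[Pair]) -> Dict[str, Optional[str]]:
    """Keep only the value of the last pair carrying each name."""
    return dict(pairs)
```

   For the data set [('tag', 'a'), ('q', undefined), ('tag', 'b')], the
   first policy keeps both 'a' and 'b' under 'tag', while the second
   keeps only 'b'.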

159	5.  Examples

161	   This section provides a number of examples that illustrate encoding
162	   and decoding of data sets as defined in this specification.  At the
163	   beginning of each example is the data set under consideration; it is
164	   followed by equivalent encoded data sets (==) and different ones
165	   (!!).  The notation <U+XXXX> is used to refer to Unicode scalar
166	   values.  The equivalence rules here are only those that all
167	   implementations must recognize; individual applications may define
168	   additional rules.

170	   There are multiple ways to represent space characters: they can occur
171	   literally, as a plus sign, or as percent-encoded sequences.  All
172	   white space is considered significant and retained unmodified.

174	     [(' a ', ' 1 ')]
175	       == ' a = 1 '
176	       == '+a+=+1+'
177	       == '%20a%20=%201%20'
178	       !! 'a=1'

180	   Characters typically used to represent the end of a line are not
181	   considered special, and no normalization of such characters is
182	   performed.

184	     [('text', 'x<U+000A>y')]
185	       == 'text=x<U+000A>y'
186	       == 'text=x%0Ay'
187	       !! 'text=x%0D%0Ay'
188	       !! 'text=x%0Dy'

190	   Similarly, characters outside the repertoire of US-ASCII are not
191	   handled in any special manner:

193	     [('constellation', 'Bo<U+00F6>tes')]
194	       == 'constellation=Bo<U+00F6>tes'
195	       == 'constellation=Bo%C3%B6tes'
196	       !! 'constellation=Boo<U+0308>tes'

198	   The character U+0000 can occur in data sets, and encoders and decoders
199	   have to be prepared to handle it unless applications that employ
200	   them guarantee otherwise.  It is incorrect to truncate the data set at
201	   the first occurrence of such a character.

203	     [('name', '<U+0000>value')]
204	       == 'name=<U+0000>value'
205	       == 'name=%00value'
206	       !! 'name='

208	   The following example illustrates handling of percent-encoding.
209	   While it is discouraged to have percent signs in encoded data sets
210	   that are not followed by two hex digits, decoders have to be prepared
211	   to handle them.

213	     [('Cipher', 'c=(m^e)%n')]
214	       == 'Cipher=c%3D(m%5Ee)%25n'
215	       == 'Cipher=c=(m%5Ee)%25n'
216	       == 'Cipher=c=(m^e)%n'
217	       == '%43%69%70%68%65%72=%63%3d%28%6D%5E%65%29%25%6e'
218	       !! 'Cipher%3Dc%3D(m%5Ee)%25n'
219	       !! 'Cipher=c=(m^e)'
220	       !! 'Cipher=c'

222	   The following seven examples illustrate handling of empty name fields,
223	   empty value fields, and undefined value fields.  The empty string is
224	   ambiguous as noted earlier in this document.

226	     [('', undefined), ('', undefined)] == ';'
227	     [('', undefined), ('', '')]        == ';='
228	     [('', ''), ('', undefined)]        == '=;'
229	     [('', ''), ('', '')]               == '=;='
230	     [('', undefined)]                  == ''
231	     []                                 == ''
232	     [('', '')]                         == '='

234	   The separator characters ";" and "&" can both be used in encoded data
235	   sets; they always separate pairs if not escaped, even if both of them
236	   occur in a single string.

238	     [('a&b', '1'), ('c', '2;3'), ('e', '4')]
239	       == 'a%26b=1;c=2%3B3;e=4'
240	       == 'a%26b=1&c=2%3B3&e=4'
241	       == 'a%26b=1;c=2%3B3&e=4'
242	       == 'a%26b=1&c=2%3B3;e=4'
243	       !! 'a&b=1;c=2%3B3;e=4'
244	       !! 'a%26b=1&c=2;3&e=4'

246	   Undefined values allow certain information to be represented in a more
247	   compact form.  A filter that selects columns in a product listing, for
248	   instance, could be encoded as follows:

250	     [('image', undefined), ('title', undefined), ('price', undefined)]
251	       == 'image;title;price'

253	   The following examples do not conform to this specification due to
254	   character encoding errors and consequently represent nothing.

256	     * 'Lookup=%ED%AD%80%ED%B1%BF'
257	     * 'Lookup=%FE%83%9E%AB%9B%BB%AF'
258	     * 'Lookup=%C0%80'
259	     * 'Lookup=%C3'
260	     * 'Lookup=Bo%F6tes'
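
   To see why each of these strings is rejected, one can run them
   through a strict UTF-8 decoder.  The sketch below uses
   `unquote_to_bytes` from Python's standard `urllib.parse` module; the
   helper name `utf8_error` is hypothetical, and the wording of the
   reported reasons is specific to Python rather than to this
   specification.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

MALFORMED = [
    "Lookup=%ED%AD%80%ED%B1%BF",     # UTF-8-style encoding of a surrogate pair
    "Lookup=%FE%83%9E%AB%9B%BB%AF",  # 0xFE never occurs in UTF-8
    "Lookup=%C0%80",                 # overlong encoding of U+0000
    "Lookup=%C3",                    # truncated two-octet sequence
    "Lookup=Bo%F6tes",               # ISO-8859-1 octet for o-umlaut, not UTF-8
]


def utf8_error(encoded: str) -> Optional[str]:
    """Return the decoder's complaint, or None if the result is valid UTF-8."""
    try:
        unquote_to_bytes(encoded).decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    for example in MALFORMED:
        print(example, "->", utf8_error(example))
```

   Every entry in the list yields an error, whereas a correctly encoded
   string such as 'Lookup=Bo%C3%B6tes' does not.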

262	6.  Security considerations

264	   None beyond those inherent in the processing of the UTF-8 character
265	   encoding [RFC3629] and the handling of percent-encoded sequences
266	   [RFC3986].  Depending on how the format defined in this document is
267	   being used, the security considerations of the aforementioned RFCs,
268	   [RFC3987], and [RFC3875] might inform security decisions.

270	7.  IANA Considerations

272	   This memo registers application/www-form-urlencoded as per [RFC4288].

274	8.  Media type registration

276	   Type name:               application
277	   Subtype name:            www-form-urlencoded
278	   Required parameters:     none
279	   Optional parameters:     none

281	      Note: The media type does not have a 'charset' parameter; it
282	      is incorrect to specify one and to attach any significance to
283	      it if specified. The character encoding is always UTF-8. The
284	      Unicode encoding form signature is not supported; a leading
285	      U+FEFF character will be considered part of a <name>.

287	   Encoding considerations: 8bit

289	   Security considerations: See section 6.
290	   Interoperability considerations:
291	      None, except as noted in other sections of this document.

293	   Published specification: RFC XXXX
294	   Applications that use this media type:
295	      Systems that interchange data sets of name-value pairs.

297	   Additional information:

299	      Magic number(s):             n/a
300	      File extension(s):           n/a
301	      Macintosh file type code(s): TEXT
302	      Fragment identifiers:        n/a

304	   Person & email address to contact for further information:
305	      See Author's Address section.

307	   Intended usage:          COMMON
308	   Restrictions on usage:   n/a
309	   Author:                  See Author's Address section.
310	   Change controller:       The IESG.

312	9.  References

314	9.1.  Normative References

316	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
317	              10646", STD 63, RFC 3629, November 2003.

319	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
320	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

322	9.2.  Informative References

324	   [RFC1866]  Berners-Lee, T. and D. Connolly, "Hypertext Markup
325	              Language - 2.0", RFC 1866, November 1995.

327	   [RFC3875]  Robinson, D. and K. Coar, "The Common Gateway Interface
328	              (CGI) Version 1.1", RFC 3875, October 2004.

330	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
331	              Resource Identifier (URI): Generic Syntax", STD 66,
332	              RFC 3986, January 2005.

334	   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
335	              Identifiers (IRIs)", RFC 3987, January 2005.

337	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
338	              Registration Procedures", BCP 13, RFC 4288, December 2005.

340	Appendix A.  Acknowledgements

342	   Mark Nottingham pointed out a serious omission in the first draft of
343	   this document.

345	Author's Address

347	   Bjoern Hoehrmann
348	   Mittelstrasse 50
349	   39114 Magdeburg
350	   Germany

352	   EMail: mailto:bjoern@hoehrmann.de
353	   URI:   http://bjoern.hoehrmann.de

355	   Note: Please write "Bjoern Hoehrmann" with o-umlaut (U+00F6) wherever
356	   possible, e.g., as "Bj&#246;rn H&#246;hrmann" in HTML and XML.