Network Working Group                                       B. Hoehrmann
Internet-Draft                                        September 25, 2010
Expires: March 29, 2011


               The application/www-form-urlencoded format
                      draft-hoehrmann-urlencoded-01

Abstract

This memo defines the application/www-form-urlencoded format, a
compact data format that encodes ordered data sets of name-value
pairs of character data. The format is similar to the format
application/x-www-form-urlencoded first defined in RFC 1866, but
addresses some of that format's shortcomings.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 29, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology and Conformance
   3. Format syntax
   4. Format semantics
   5. Examples
   6. Security considerations
   7. IANA Considerations
   8. Media type registration
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Acknowledgements

1. Introduction

RFC 1866 [RFC1866] introduced the application/x-www-form-urlencoded
media type to facilitate the encoding and transmission of form data
sets. Formats based on RFC 1866 continued to use this media type as
their default encoding format, and other protocols adopted the type
for similar purposes. The format defined in this document addresses
some of the RFC 1866 format's shortcomings.

The application/www-form-urlencoded format defined in this document
encodes an ordered data set of pairs, each consisting of a name and a
(possibly undefined) value, as a string in which pairs are separated
by semicolons and each name is separated from its value by the equals
sign. Special characters are escaped using the percent-encoding
scheme also used for resource identifiers. Internationalization is
addressed through the use of the UTF-8 character encoding scheme.

For compatibility with the RFC 1866 format, the ampersand character
is tolerated as an alternative separator character, and the plus sign
may be used to represent space characters.
The new format accepts
any string as a valid representation of a data set, except for
strings containing character encoding errors, in keeping with typical
implementations of the RFC 1866 format.

2. Terminology and Conformance

A character string is a sequence of Unicode scalar values. An octet
string is a sequence of octets.

A character string conforms to this specification if and only if
encoding it using the UTF-8 character encoding yields an octet string
that conforms to this specification.

An octet string conforms to this specification if and only if it is,
after replacing all sequences that match pct-encoded [RFC3986] by the
corresponding octets, a valid UTF-8 sequence.

A software module that encodes data sets into character strings
conforms to this specification if and only if it does so as defined
in Section 3.

A software module that decodes character or octet strings into data
sets conforms to this specification if and only if it does so as
defined in Section 3.

3. Format syntax

The syntax of the application/www-form-urlencoded format is defined
by the following ABNF [RFC5234] grammar. The grammar is ambiguous in
two respects: the empty string matches both `empty-set` and `pairs`,
and a percent-encoded sequence matches both `escape` and `percent`
followed by other characters. A match for `escape` takes precedence
over a match involving `percent`. Whether the empty string is
interpreted as an empty data set or as a single pair consisting of
the empty string as name and an undefined value is decided by
individual applications.
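
The decoding procedure defined below, including how the two
ambiguities just described are resolved, can be sketched in Python.
This is a minimal, non-normative sketch under stated assumptions: the
names `decode` and `MalformedDataSet`, the `empty_is_pair` flag, and
the use of None for an undefined value are hypothetical and not part
of this specification, and the strict "utf-8" codec of the Python
standard library is relied on to reject the character encoding errors
that make a data set malformed.

```python
import re
from typing import List, Optional, Tuple, Union

# Hypothetical representation: a data set is a list of (name, value)
# pairs, with None standing for an undefined value.
DataSet = List[Tuple[str, Optional[str]]]

# Unescaped ";" and "&" both separate pairs.
_SEPARATOR = re.compile(rb"[;&]")
# An `escape` ("%" followed by two hex digits) or a `plus`. Matching
# both in a single pass keeps "%2B" from later turning into a space.
_TOKEN = re.compile(rb"%([0-9A-Fa-f]{2})|\+")


class MalformedDataSet(ValueError):
    """Hypothetical error: the input has a character encoding error."""


def _replace_token(match: "re.Match[bytes]") -> bytes:
    if match.group(1) is not None:
        # `escape` wins over `percent`: "%41" becomes the octet 0x41.
        return bytes([int(match.group(1), 16)])
    # `plus` stands for U+0020 SPACE.
    return b" "


def _decode_field(field: bytes) -> str:
    # A "%" not followed by two hex digits matches `percent` and is
    # left untouched by the substitution.
    octets = _TOKEN.sub(_replace_token, field)
    # The strict codec rejects overlong forms, surrogates, invalid
    # lead octets and truncated sequences.
    return octets.decode("utf-8")


def decode(data: Union[str, bytes], empty_is_pair: bool = False) -> DataSet:
    # A character string is first encoded as UTF-8.
    if isinstance(data, str):
        data = data.encode("utf-8")
    # The empty string is ambiguous; the application picks a reading.
    if not data:
        return [("", None)] if empty_is_pair else []
    pairs: DataSet = []
    for pair in _SEPARATOR.split(data):
        # The first "=" separates the name from the value; any further
        # "=" belongs to the value.
        name, eq, value = pair.partition(b"=")
        try:
            pairs.append(
                (_decode_field(name), _decode_field(value) if eq else None)
            )
        except UnicodeDecodeError as err:
            # A malformed data set represents nothing.
            raise MalformedDataSet(str(err)) from err
    return pairs
```

Splitting on separators and on the first "=" happens before any
escape is replaced, so "%3B" and "%3D" never split anything; escapes
and plus signs are replaced in a single pass, so "%2B" decodes to "+"
rather than to a space. With this sketch,
decode("Cipher=c=(m^e)%n") returns [("Cipher", "c=(m^e)%n")], and
decode("Lookup=%C0%80") raises MalformedDataSet.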

     data-set   = empty-set / pairs
     pairs      = pair *(separator pair)
     pair       = name [ "=" value ]
     name       = *(namechar / escape / percent / plus)
     value      = *(valuechar / escape / percent / plus)
     namechar   = <any octet except ";", "&", "=", "%", and "+">
     valuechar  = <any octet except ";", "&", "%", and "+">
     escape     = "%" 2HEXDIG
     separator  = ";" / "&"
     percent    = "%"
     plus       = "+"
     empty-set  = ""

A character string is decoded by encoding it using the UTF-8
character encoding and then decoding the resulting octet string. An
octet string is decoded by splitting it into `name` and `value`
instances according to the grammar, replacing each instance of
`escape` by the corresponding octet and each instance of `plus` by
the U+0020 SPACE character, and then decoding the resulting octet
sequences using the UTF-8 character encoding. If that results in an
error, the data set is malformed and represents nothing.

A data set is encoded by encoding each name and value using the UTF-8
character encoding, replacing every octet in a name that does not
match `namechar` and every octet in a value that does not match
`valuechar` by its percent-encoded equivalent, joining each name to
its value with "=" (a pair whose value is undefined is represented by
its name alone), and concatenating the pairs with ";" as the
separator. The ampersand can be used as an alternative separator,
but doing so is discouraged. Similarly, "%" only has to be escaped
when it is followed by two hex digits, but leaving it unescaped is
discouraged. Spaces may additionally be replaced by the plus sign.
Implementations are free to percent-encode additional octets.

4. Format semantics

This specification defines only the mapping between data sets and
their encoded form. It is up to individual applications using this
format to define, for instance, whether the ordering of pairs is
significant or how multiple pairs with the same name are handled.

5. Examples

This section provides a number of examples that illustrate encoding
and decoding of data sets as defined in this specification.
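
The encoding procedure of Section 3 can likewise be sketched in
Python, which makes it easy to reproduce several of the encodings in
the examples below. This minimal, non-normative sketch deliberately
avoids depending on the exact `namechar` and `valuechar` sets: as an
assumption of the sketch, it leaves only the RFC 3986 "unreserved"
characters (letters, digits, "-", ".", "_", "~") literal and
percent-encodes every other octet, which Section 3 allows since
implementations are free to percent-encode additional octets. The
names `encode` and `plus_for_space`, and the use of None for an
undefined value, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (name, value) pairs, None = undefined.
DataSet = List[Tuple[str, Optional[str]]]

# Assumption of this sketch: only RFC 3986 "unreserved" octets are
# left literal; every other octet is percent-encoded.
_UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _encode_field(text: str, plus_for_space: bool) -> str:
    out = []
    for octet in text.encode("utf-8"):
        if octet in _UNRESERVED:
            out.append(chr(octet))
        elif octet == 0x20 and plus_for_space:
            # Optional: a space may be written as "+".
            out.append("+")
        else:
            # Everything else, including ";", "&", "=", "%" and "+",
            # is percent-encoded with uppercase hex digits.
            out.append("%{:02X}".format(octet))
    return "".join(out)


def encode(data_set: DataSet, plus_for_space: bool = False) -> str:
    fields = []
    for name, value in data_set:
        field = _encode_field(name, plus_for_space)
        # An undefined value is encoded by omitting "=" and the value.
        if value is not None:
            field += "=" + _encode_field(value, plus_for_space)
        fields.append(field)
    # ";" is the preferred separator; "&" is allowed but discouraged.
    return ";".join(fields)
```

For instance, encode([("a&b", "1"), ("c", "2;3"), ("e", "4")])
returns 'a%26b=1;c=2%3B3;e=4' and
encode([("image", None), ("title", None), ("price", None)]) returns
'image;title;price', matching encodings listed below. Because it
escapes more than required, the sketch encodes
[('Cipher', 'c=(m^e)%n')] as 'Cipher=c%3D%28m%5Ee%29%25n', an
equivalent form that is not listed below.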

At the beginning of each example is the data set under consideration;
it is followed by strings that encode the same data set (==) and
strings that encode a different data set (!!). The <U+XXXX> notation
is used to refer to Unicode scalar values. The equivalence rules
shown here are only those that all implementations must recognize;
individual applications may define additional rules.

There are multiple ways to represent space characters: they can occur
literally, as a plus sign, or as percent-encoded sequences. All white
space is considered significant and is retained unmodified.

     [(' a ', ' 1 ')]
     == ' a = 1 '
     == '+a+=+1+'
     == '%20a%20=%201%20'
     !! 'a=1'

Characters typically used to represent the end of a line are not
considered special, and no normalization of such characters is
performed.

     [('text', 'x<U+000A>y')]
     == 'text=x<U+000A>y'
     == 'text=x%0Ay'
     !! 'text=x%0D%0Ay'
     !! 'text=x%0Dy'

Similarly, characters outside the repertoire of US-ASCII are not
handled in any special manner:

     [('constellation', 'Bo<U+00F6>tes')]
     == 'constellation=Bo<U+00F6>tes'
     == 'constellation=Bo%C3%B6tes'
     !! 'constellation=Bootes'

The character U+0000 can occur in data sets, and encoders and
decoders have to be prepared to handle it unless the applications
that employ them guarantee otherwise. It is incorrect to truncate the
data set at the first occurrence of such a character.

     [('name', '<U+0000>value')]
     == 'name=<U+0000>value'
     == 'name=%00value'
     !! 'name='

The following example illustrates the handling of percent-encoding.
While percent signs that are not followed by two hex digits are
discouraged in encoded data sets, decoders have to be prepared to
handle them.

     [('Cipher', 'c=(m^e)%n')]
     == 'Cipher=c%3D(m%5Ee)%25n'
     == 'Cipher=c=(m%5Ee)%25n'
     == 'Cipher=c=(m^e)%n'
     == '%43%69%70%68%65%72=%63%3d%28%6D%5E%65%29%25%6e'
     !! 'Cipher%3Dc%3D(m%5Ee)%25n'
     !! 'Cipher=c=(m^e)'
     !! 'Cipher=c'

The following examples illustrate the handling of empty name fields,
empty value fields, and undefined value fields. The empty string is
ambiguous, as noted in Section 3.

     [('', undefined), ('', undefined)]  == ';'
     [('', undefined), ('', '')]         == ';='
     [('', ''), ('', undefined)]         == '=;'
     [('', ''), ('', '')]                == '=;='
     [('', undefined)]                   == ''
     []                                  == ''
     [('', '')]                          == '='

The separator characters ";" and "&" can both be used in encoded data
sets; unless escaped, they always separate pairs, even when both of
them occur in a single string.

     [('a&b', '1'), ('c', '2;3'), ('e', '4')]
     == 'a%26b=1;c=2%3B3;e=4'
     == 'a%26b=1&c=2%3B3&e=4'
     == 'a%26b=1;c=2%3B3&e=4'
     == 'a%26b=1&c=2%3B3;e=4'
     !! 'a&b=1;c=2%3B3;e=4'
     !! 'a%26b=1&c=2;3&e=4'

Undefined values allow certain information to be represented more
compactly. For instance, a filter that selects columns in a product
listing could be encoded as follows:

     [('image', undefined), ('title', undefined), ('price', undefined)]
     == 'image;title;price'

The following examples do not conform to this specification due to
character encoding errors and consequently represent nothing.

     * 'Lookup=%ED%AD%80%ED%B1%BF'
     * 'Lookup=%FE%83%9E%AB%9B%BB%AF'
     * 'Lookup=%C0%80'
     * 'Lookup=%C3'
     * 'Lookup=Bo%F6tes'

6. Security considerations

None beyond those already inherent in the processing of the UTF-8
character encoding [RFC3629] and the handling of percent-encoded
sequences [RFC3986]. Depending on how the format defined in this
document is used, the security considerations of the aforementioned
RFCs, [RFC3987], and [RFC3875] might inform security decisions.

7. IANA Considerations

This memo registers application/www-form-urlencoded as per [RFC4288].

8. Media type registration

   Type name: application

   Subtype name: www-form-urlencoded

   Required parameters: none

   Optional parameters: none

      Note: The media type does not have a 'charset' parameter; it is
      incorrect to specify one, and incorrect to associate any
      significance with it if specified. The character encoding is
      always UTF-8. The Unicode encoding form signature is not
      supported; a leading U+FEFF character is considered part of the
      first `name`.

   Encoding considerations: 8bit

   Security considerations: See Section 6.

   Interoperability considerations:
      None, except as noted in other sections of this document.

   Published specification: RFC XXXX

   Applications that use this media type:
      Systems that interchange data sets of name-value pairs.

   Additional information:

      Magic number(s): n/a
      File extension(s): n/a
      Macintosh file type code(s): TEXT
      Fragment identifiers: n/a

   Person & email address to contact for further information:
      See Author's Address section.

   Intended usage: COMMON

   Restrictions on usage: n/a

   Author: See Author's Address section.

   Change controller: The IESG.

9. References

9.1. Normative References

[RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
           10646", STD 63, RFC 3629, November 2003.

[RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
           Specifications: ABNF", STD 68, RFC 5234, January 2008.

9.2. Informative References

[RFC1866]  Berners-Lee, T. and D. Connolly, "Hypertext Markup
           Language - 2.0", RFC 1866, November 1995.

[RFC3875]  Robinson, D. and K. Coar, "The Common Gateway Interface
           (CGI) Version 1.1", RFC 3875, October 2004.

[RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
           Resource Identifier (URI): Generic Syntax", STD 66,
           RFC 3986, January 2005.

[RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
           Identifiers (IRIs)", RFC 3987, January 2005.

[RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
           Registration Procedures", BCP 13, RFC 4288, December 2005.

Appendix A. Acknowledgements

Mark Nottingham pointed out a serious omission in the first draft of
this document.

Author's Address

   Bjoern Hoehrmann
   Mittelstrasse 50
   39114 Magdeburg
   Germany

   EMail: mailto:bjoern@hoehrmann.de
   URI:   http://bjoern.hoehrmann.de

Note: Please write "Bjoern Hoehrmann" with o-umlaut (U+00F6) wherever
possible, e.g., as "Björn Höhrmann" in HTML and XML.