idnits 2.17.1 draft-masinter-multipart-form-data-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 116: '... names SHOULD be unique. After a fo...' RFC 2119 keyword, line 121: '... MUST contain a "Content-Disposition" header [RFC2183] where the...' RFC 2119 keyword, line 145: '... the file SHOULD be supplied as well...' RFC 2119 keyword, line 146: '...on header. (The SHOULD is to allow fi...' RFC 2119 keyword, line 152: '...lename parameter MUST be restricted to...' (8 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 21, 2013) is 3863 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC1806' is defined on line 359, but no explicit reference was found in the text == Unused Reference: 'RFC2184' is defined on line 375, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1806 (Obsoleted by RFC 2183) ** Obsolete normative reference: RFC 2184 (Obsoleted by RFC 2231) -- Obsolete informational reference (is this intentional?): RFC 1867 (Obsoleted by RFC 2854) -- Obsolete informational reference (is this intentional?): RFC 2388 (Obsoleted by RFC 7578) Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group L. Masinter 3 Internet-Draft Adobe 4 Obsoletes: 2388 (if approved) September 21, 2013 5 Intended status: Standards Track 6 Expires: March 23, 2014 8 Returning Values from Forms: multipart/form-data 9 draft-masinter-multipart-form-data-03 11 Abstract 13 This specification (re)defines the multipart/form-data Internet Media 14 Type, which can be used by a wide variety of applications and 15 transported by a wide variety of protocols as a way of returning a 16 set of values as the result of a user filling out a form. It 17 replaces RFC 2388. 19 NOTE 21 There is a GitHub repository for this draft at https://github.com/ 22 masinter/multipart-form-data along with an issue tracker. This 23 specification has been proposed as a work item of the APPSAWG 24 Applications Area working group, apps-discuss@ietf.org. Please raise 25 issues in the tracker, or send to the apps-discuss list. 
27 Status of this Memo 29 This Internet-Draft is submitted in full conformance with the 30 provisions of BCP 78 and BCP 79. 32 Internet-Drafts are working documents of the Internet Engineering 33 Task Force (IETF). Note that other groups may also distribute 34 working documents as Internet-Drafts. The list of current Internet- 35 Drafts is at http://datatracker.ietf.org/drafts/current/. 37 Internet-Drafts are draft documents valid for a maximum of six months 38 and may be updated, replaced, or obsoleted by other documents at any 39 time. It is inappropriate to use Internet-Drafts as reference 40 material or to cite them other than as "work in progress." 42 This Internet-Draft will expire on March 23, 2014. 44 Copyright Notice 46 Copyright (c) 2013 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents (http://trustee.ietf.org/ 51 license-info) in effect on the date of publication of this document. 52 Please review these documents carefully, as they describe your rights 53 and restrictions with respect to this document. Code Components 54 extracted from this document must include Simplified BSD License text 55 as described in Section 4.e of the Trust Legal Provisions and are 56 provided without warranty as described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 61 2. Definition of multipart/form-data . . . . . . . . . . . . . . 3 62 2.1. Boundary . . . . . . . . . . . . . . . . . . . . . . . . . 3 63 2.2. filename attribute . . . . . . . . . . . . . . . . . . . . 3 64 2.3. Multiple files for one form field . . . . . . . . . . . . 4 65 2.4. Content-Type . . . . . . . . . . . . . . . . . . . . . . . 4 66 2.5. The charset parameter . . . . . . . . . . . . . . . . . . 4 67 2.6. The _charset_ field . . . . . . . . . . . . . . . . . . . 4 68 2.7. 
Content-Transfer-Encoding . . . . . . . . . . . . . . . . 4 69 2.8. Other Content- headers . . . . . . . . . . . . . . . . . . 4 70 3. Operability considerations . . . . . . . . . . . . . . . . . . 5 71 3.1. Non-ASCII field names and values . . . . . . . . . . . . . 5 72 3.1.1. Avoid creating forms with non-ASCII field names . . . 5 73 3.1.2. Ampersand hash encoding . . . . . . . . . . . . . . . 5 74 3.1.3. Interpreting forms and creating form-data . . . . . . 5 75 3.1.4. Parsing and interpreting form data . . . . . . . . . . 6 76 3.2. Ordered fields and duplicated field names . . . . . . . . 6 77 3.3. Interoperability with web applications . . . . . . . . . . 6 78 3.4. Correlating form data with the original form . . . . . . . 6 79 4. Security Considerations . . . . . . . . . . . . . . . . . . . 7 80 5. Media type registration for multipart/form-data . . . . . . . 7 81 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 82 6.1. Normative References . . . . . . . . . . . . . . . . . . . 8 83 6.2. Informative References . . . . . . . . . . . . . . . . . . 8 84 Appendix A. Changes from RFC 2388 . . . . . . . . . . . . . . . . 8 85 Appendix B. Alternatives . . . . . . . . . . . . . . . . . . . . . 9 86 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 9 88 1. Introduction 90 In many applications, it is possible for a user to be presented with 91 a form. The user will fill out the form, including information that 92 is typed, generated by user input, or included from files that the 93 user has selected. When the form is filled out, the data from the 94 form is sent from the user to the receiving application. 96 The definition of "multipart/form-data" is derived from one of those 97 applications, originally set out in [RFC1867] and subsequently 98 incorporated into [HTML3.2] and [HTML4], where forms are expressed in 99 HTML, and in which the form values are sent via HTTP or electronic 100 mail. 
This representation is widely implemented in web 101 browsers and web servers. 103 However, multipart/form-data can be used for forms that are presented 104 using representations other than HTML (spreadsheets, Portable 105 Document Format, etc.), and for transport using other means than 106 electronic mail or HTTP. This document defines the representation of 107 form values independently of the application for which it is used. 109 2. Definition of multipart/form-data 111 The media-type multipart/form-data generally follows the model of 112 multipart MIME data streams as described in [RFC2046] Section 5.1. 114 In forms, there is a series of fields to be supplied by the user who 115 fills out the form. Each field has a name. Within a given form, the 116 names SHOULD be unique. After a form has been "filled out" and 117 "submitted" (processes defined by the form), the result is a set of 118 values for each field: the form-data. 120 A "multipart/form-data" body contains a series of parts. Each part 121 MUST contain a "Content-Disposition" header [RFC2183] where the 122 disposition type is "form-data", and where the disposition contains 123 an (additional) parameter of "name"; the value of the parameter is 124 the original field name from the form (encoded, see Section 3.1). 125 For example, a part might contain a header: 127 Content-Disposition: form-data; name="user" 129 with the value corresponding to the entry of the "user" field. 131 2.1. Boundary 133 As with other multipart types, the parts are delimited with a 134 boundary, selected such that it does not occur in any of the data. 135 Each field of the form is sent, in the order defined by the sending 136 application and form, as a part of the multipart stream. The 137 boundary is supplied as a "boundary" parameter to the multipart/form- 138 data type, e.g., 140 multipart/form-data;boundary="-AaB03x" 142 2.2. 
filename attribute 144 For form data that represents the content of a local file, a name for 145 the file SHOULD be supplied as well, by using a "filename" parameter 146 of the Content-Disposition header. (The SHOULD is to allow file 147 uploads that result from drag-and-drop in systems where the file name 148 is meaningless or private, where the uploaded content is streamed 149 directly from a device, or where the file name is not user visible 150 and would be unrecognized.) 151 For compatibility with other multipart types, the value of the 152 filename parameter MUST be restricted to US-ASCII. File names 153 normally visible to users which contain non-ASCII characters SHOULD 154 be encoded using the &#nn; method described in Section 3.1.2. 156 2.3. Multiple files for one form field 158 If the value of a form field is a set of files rather than a single 159 file, that value MUST be transmitted by supplying each in a separate 160 part, but all with the same "name" parameter. 162 2.4. Content-Type 164 Each part has an (optional) "Content-Type", which defaults to "text/ 165 plain". If the contents of a file are to be sent, the file data is 166 labeled with an appropriate media type, if known, or "application/ 167 octet-stream". 169 2.5. The charset parameter 171 In the case where a field value is text, the charset parameter for 172 the "text/plain" "Content-Type" may be used to indicate the character 173 encoding used in that part. For example, a form with a text field in 174 which a user typed "Joe owes <eur>100" where <eur> is the Euro symbol 175 might have form data returned as: 177 --AaB03x 178 content-disposition: form-data; name="field1" 179 content-type: text/plain;charset=windows-1250 180 content-transfer-encoding: quoted-printable 182 Joe owes =80100. 183 --AaB03x 185 2.6. The _charset_ field 187 Forms have the convention that the value of a form entry with entry 188 name "_charset_" and type "hidden" is automatically set to the name 189 of the form-charset. 
In this case, the default charset 190 of each text/plain part without a charset parameter is the supplied 191 value. 193 2.7. Content-Transfer-Encoding 195 When used in transports which do not allow arbitrary binary data, 196 each part that cannot be represented within the transport SHOULD be 197 encoded and the "Content-Transfer-Encoding" header supplied in that 198 part. For example, some email transports use a 7BIT encoding. (See 199 section 5 of [RFC2046] for more details.) When transferred via HTTP, 200 Content-Transfer-Encoding SHOULD NOT be used for the form-data values. 202 2.8. Other Content- headers 203 The "multipart/form-data" media type does not support any MIME 204 headers in the parts other than Content-Type, Content-Disposition, 205 and (when appropriate) Content-Transfer-Encoding. 207 3. Operability considerations 209 3.1. Non-ASCII field names and values 211 MIME headers in multipart/form-data are required to consist only of 212 7-bit data in the US-ASCII character set. While [RFC2388] suggested 213 that non-ASCII field names should be encoded according to the method 214 in [RFC2047] if they contain characters outside of US-ASCII, practice 215 varies. 217 This specification makes three recommendations for three different 218 stages of workflow. 220 3.1.1. Avoid creating forms with non-ASCII field names 222 For broadest interoperability with existing deployed software, those 223 creating forms SHOULD avoid non-ASCII field names. This should not 224 be a burden, because in general the field names are not visible to 225 users. 227 3.1.2. Ampersand hash encoding 229 Within this specification, the "ampersand hash encoding" is used for 230 representing characters that are not allowed in a context: replace 231 each disallowed character by a string consisting of an 232 ampersand (&), a hash mark (#), one or more ASCII digits representing 233 the Unicode code point of the character in base ten, and finally a 234 semicolon (;). 
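As an illustration only, the ampersand hash encoding of Section 3.1.2 might be sketched in Python as follows. The function name ampersand_hash_encode and the is_allowed hook are hypothetical names introduced for this example, and the default rule (every character outside US-ASCII is disallowed) is an assumption; a given context may disallow a different set of characters.

```python
from typing import Callable


def ampersand_hash_encode(
    text: str,
    is_allowed: Callable[[str], bool] = lambda ch: ord(ch) < 0x80,
) -> str:
    """Replace each disallowed character with "&#", its decimal code point, and ";".

    is_allowed is a hypothetical hook for this sketch; by default only
    US-ASCII characters are treated as allowed.
    """
    return "".join(ch if is_allowed(ch) else f"&#{ord(ch)};" for ch in text)


# U+00F6 (o with diaeresis) has the decimal code point 246.
print(ampersand_hash_encode("Utf\u00f6r"))
```

For the default US-ASCII rule, Python's built-in "xmlcharrefreplace" error handler, as in "Utf\u00f6r".encode("ascii", "xmlcharrefreplace"), yields the same bytes. Note that this sketch does not escape an ampersand that is already present in the input, so a field name that literally contains "&#246;" cannot be told apart from an encoded one; the text above does not define an escape for that case.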
236 3.1.3. Interpreting forms and creating form-data 238 Some applications of this specification will supply a character 239 encoding to be used for creation of the multipart/form-data result. 240 In particular, [HTML5] uses: 242 o the value of an accept-charset attribute of the <form>
element, if 243 there is one, 245 o the character encoding of the document containing the form, if it 246 is US-ASCII compatible, 248 o otherwise UTF-8. 250 Call this the form-charset. Any field name or file name which is not 251 in US-ASCII must be encoded using the &#nn; encoding in Section 3.1.2. 253 multipart/form-data parts which do not have a Content-Type header and 254 which are not the result of supplying a local file MUST be 255 transformed by the same algorithm. 257 3.1.4. Parsing and interpreting form data 259 While this specification provides guidance for creation of multipart/ 260 form-data, interpreters of multipart/form-data should be aware of the 261 variety of implementations. Currently, deployed browsers differ as 262 to how they encode multipart/form-data. For this reason, the matching 263 of form elements to form-data parts may rely on a fuzzier match. In 264 particular, some form-data generators might have followed the advice 265 of [RFC2388] and used the [RFC2047] "encoded-word" method of encoding 266 non-ASCII values: 268 encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" 270 Others have been known to follow [RFC2231] or to send unencoded UTF-8 271 or even unencoded strings in the form-charset. 273 Generally, interpreting "multipart/form-data" (even from conforming 274 generators) may require knowing the charset used in form encoding, in 275 cases where the _charset_ field value or a charset parameter of a 276 text/plain Content-Type header is not supplied. 278 3.2. Ordered fields and duplicated field names 280 Form processors given forms with a well-defined ordering SHOULD send 281 back results in the order received and preserve duplicate field 282 names, in order. Intermediaries MUST NOT reorder the results. (Note 283 that there are some forms which do not define a natural order of 284 appearance.) 286 3.3. 
Interoperability with web applications 288 Many web applications use the "application/x-www-form-urlencoded" method for 289 returning data from forms. This format is quite compact, e.g.: 291 name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send 293 However, there is no opportunity to label the enclosed data with 294 content type, apply a charset, or use other encoding mechanisms. 296 Many form-interpreting programs (primarily web browsers) now 297 implement and generate multipart/form-data, but an existing 298 application might need to optionally support the 299 application/x-www-form-urlencoded format as well. 301 3.4. Correlating form data with the original form 302 This specification provides no specific mechanism by which multipart/ 303 form-data can be associated with the form that caused it to be 304 transmitted. This separation is intentional; many different forms 305 might be used for transmitting the same data. In practice, 306 applications may supply a specific form processing resource (in HTML, 307 the ACTION attribute in a FORM tag) for each different form. 308 Alternatively, data about the form might be encoded in a "hidden 309 field" (a field which is part of the form but which has a fixed value 310 to be transmitted back to the form-data processor). 312 4. Security Considerations 314 It is important when interpreting the filename of the Content- 315 Disposition header to not overwrite files in the recipient's file 316 space inadvertently. 318 User applications that request form information from users must be 319 careful not to cause a user to send information to the requestor or a 320 third party unwillingly or unwittingly. For example, a form might 321 request 'spam' information to be sent to an unintended third party, 322 or private information to be sent to someone that the user might not 323 actually intend. 
While this is primarily an issue for the 324 representation and interpretation of forms themselves, rather than 325 the data representation of the result of form data, the 326 transportation of private information must be done in a way that does 327 not expose it to unwanted prying. 329 With the introduction of form-data that can reasonably send back the 330 content of files from a user's file space, the possibility arises 331 that a user might be sent an automated script that fills out a form 332 and then sends one of the user's local files to another address. 333 Thus, additional caution is required when executing automated 334 scripting where form-data might include a user's files. 336 5. Media type registration for multipart/form-data 338 Media Type name: multipart 340 Media subtype name: form-data 342 Required parameters: boundary 344 Optional parameters: none 346 Encoding considerations: For use in transports that restrict the 347 encoding to 7BIT or 8BIT, each part is encoded separately. 349 Security considerations: Applications which receive forms and process 350 them must be careful not to supply data back to the requesting 351 form processing site that was not intended to be sent by the 352 recipient. This is a consideration for any application that 353 generates a multipart/form-data. See Section 4 of this document. 355 6. References 357 6.1. Normative References 359 [RFC1806] Troost, R. and S. Dorner, "Communicating Presentation 360 Information in Internet Messages: The Content-Disposition 361 Header", RFC 1806, June 1995. 363 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 364 Extensions (MIME) Part Two: Media Types", RFC 2046, 365 November 1996. 367 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) 368 Part Three: Message Header Extensions for Non-ASCII Text", 369 RFC 2047, November 1996. 371 [RFC2183] Troost, R., Dorner, S. and K. 
Moore, "Communicating 372 Presentation Information in Internet Messages: The 373 Content-Disposition Header Field", RFC 2183, August 1997. 375 [RFC2184] Freed, N. and K. Moore, "MIME Parameter Value and Encoded 376 Word Extensions: Character Sets, Languages, and 377 Continuations", RFC 2184, August 1997. 379 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded 380 Word Extensions: Character Sets, Languages, and 381 Continuations", RFC 2231, November 1997. 383 6.2. Informative References 385 [HTML3.2] Raggett, D., "HTML 3.2 Reference Specification", World 386 Wide Web Consortium Recommendation REC-html32-19970114, 387 January 1997, . 389 [HTML4] Raggett, D., Hors, A. and I. Jacobs, "HTML 4.0 390 Recommendation", World Wide Web Consortium REC- 391 html40-971218, December 1997, . 394 [HTML5] Berjon, R., Faulkner, S., Leithead, T., Navara, E., 395 O'Connor, E. and S. Pfeiffer, "HTML5", September 2013, 396 . 398 [RFC1867] Nebel, E. and L. Masinter, "Form-based File Upload in 399 HTML", RFC 1867, November 1995. 401 [RFC2388] Masinter, L., "Returning Values from Forms: multipart/ 402 form-data", RFC 2388, August 1998. 404 Appendix A. Changes from RFC 2388 405 The handling of multiple files submitted as the result of a single 406 form field (e.g., HTML's <input type=file multiple> element) results 407 in each file having its own top level part with the same name 408 parameter; the method of using a nested "multipart/mixed" from 409 [RFC2388] is not recommended. 411 The _charset_ convention and use of an explicit form-data charset are 412 documented. 414 The handling of non-ASCII field names is changed significantly. Few 415 if any implemented the =?charset:string?= method of [RFC2047]. 417 The relationship of the ordering of fields within a form and the 418 ordering of returned values within multipart/form-data was not 419 defined before, nor was the handling of the case where a form has 420 multiple fields with the same name. 422 More prescriptive about order and duplicates. 
424 Remove obsolete discussion of alternatives. 426 Appendix B. Alternatives 428 There are numerous alternative ways in which form data can be 429 encoded; many are listed in [RFC2388] under "Other data encodings 430 rather than multipart." The multipart/form-data encoding is verbose, 431 especially if there are many fields with short values. In most use 432 cases, this overhead isn't significant. 434 More problematic is the ambiguity that arose because implementations 435 did not follow [RFC2388], in part because it used "may" instead of 436 "MUST" when specifying the encoding of field names and in part for 437 other, unknown reasons. As a result, parsers now need to be more 438 complex, fuzzily matching against the possible outputs of the various encoding methods. 440 Author's Address 442 Larry Masinter 443 Adobe 445 Email: masinter@adobe.com 446 URI: http://larry.masinter.net