idnits 2.17.1

draft-ietf-appsawg-multipart-form-data-07.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

   No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

   No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

** The document seems to lack a both a reference to RFC 2119 and the
   recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
   keywords.

   RFC 2119 keyword, line 166: '...undary delimiter MUST NOT appear insid...'
   RFC 2119 keyword, line 172: '... Each part MUST contain a "content-disposition" header [RFC2183] and...'
   RFC 2119 keyword, line 174: '... header MUST also contain an additio...'
   RFC 2119 keyword, line 187: '... file SHOULD be supplied as well, by...'
   RFC 2119 keyword, line 196: '...visible to users MAY be encoded (using...'
   (13 more instances...)

Miscellaneous warnings:
----------------------------------------------------------------------------

== The copyright year in the IETF Trust and authors Copyright Line does not
   match the current year

-- The document date (December 2, 2014) is 3426 days in the past. Is this
   intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

   (See RFCs 3967 and 4897 for information about using normative references
   to lower-maturity documents in RFCs)

-- Obsolete informational reference (is this intentional?): RFC 1867
   (Obsoleted by RFC 2854)

-- Obsolete informational reference (is this intentional?): RFC 2388
   (Obsoleted by RFC 7578)

-- Obsolete informational reference (is this intentional?): RFC 5987
   (Obsoleted by RFC 8187)

Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 4 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

2 Network Working Group                                     L. Masinter
3 Internet-Draft                                                  Adobe
4 Obsoletes: 2388 (if approved)                        December 2, 2014
5 Intended status: Standards Track
6 Expires: June 5, 2015

8 Returning Values from Forms: multipart/form-data
9 draft-ietf-appsawg-multipart-form-data-07

11 Abstract

13 This specification (re)defines the multipart/form-data Internet Media
14 Type, which can be used by a wide variety of applications and
15 transported by a wide variety of protocols as a way of returning a
16 set of values as the result of a user filling out a form. It
17 replaces RFC 2388.

19 Status of This Memo

21 This Internet-Draft is submitted in full conformance with the
22 provisions of BCP 78 and BCP 79.

24 Internet-Drafts are working documents of the Internet Engineering
25 Task Force (IETF). Note that other groups may also distribute
26 working documents as Internet-Drafts. The list of current Internet-
27 Drafts is at http://datatracker.ietf.org/drafts/current/.

29 Internet-Drafts are draft documents valid for a maximum of six months
30 and may be updated, replaced, or obsoleted by other documents at any
31 time. It is inappropriate to use Internet-Drafts as reference
32 material or to cite them other than as "work in progress."

34 This Internet-Draft will expire on June 5, 2015.

36 Copyright Notice

38 Copyright (c) 2014 IETF Trust and the persons identified as the
39 document authors. All rights reserved.

41 This document is subject to BCP 78 and the IETF Trust's Legal
42 Provisions Relating to IETF Documents
43 (http://trustee.ietf.org/license-info) in effect on the date of
44 publication of this document. Please review these documents
45 carefully, as they describe your rights and restrictions with respect
46 to this document. Code Components extracted from this document must
47 include Simplified BSD License text as described in Section 4.e of
48 the Trust Legal Provisions and are provided without warranty as
49 described in the Simplified BSD License.

51 Table of Contents

53 1. NOTE
54 2. Introduction
55 3. URL encoding non-ASCII values
56 4. Advice for Forms and Form Processing
57 5. Definition of multipart/form-data
58   5.1. Boundary parameter of multipart/form-data
59   5.2. Content-Disposition header for each part
60   5.3. filename attribute of content-disposition part header
61   5.4. Multiple files for one form field
62   5.5. Content-Type header for each part
63   5.6. The charset parameter for text/plain form data
64   5.7. The _charset_ field for default charset
65   5.8. Content-Transfer-Encoding deprecated
66   5.9. Other Content- headers
67 6. Operability considerations
68   6.1. Non-ASCII field names and values
69     6.1.1. Avoid non-ASCII field names
70     6.1.2. Interpreting forms and creating form-data
71     6.1.3. Parsing and interpreting form data
72   6.2. Ordered fields and duplicated field names
73   6.3. Interoperability with web applications
74   6.4. Correlating form data with the original form
75 7. IANA Considerations
76 8. Security Considerations
77 9. Media type registration for multipart/form-data
78 10. References
79   10.1. Normative References
80   10.2. Informative References
81 Appendix A. Changes from RFC 2388
82 Appendix B. Alternatives
83 Author's Address

85 1. NOTE

87 There is a GitHub repository for this draft at
88 https://github.com/masinter/multipart-form-data along with an issue
89 tracker. This specification is a work item of the APPSAWG
90 Applications Area working group, apps-discuss@ietf.org. Please raise
91 issues in the tracker, and/or send to the apps-discuss list.

93 2. Introduction

95 In many applications, it is possible for a user to be presented with
96 a form. The user will fill out the form, including information that
97 is typed, generated by user input, or included from files that the
98 user has selected. When the form is filled out, the data from the
99 form is sent from the user to the receiving application.

101 The definition of "multipart/form-data" is derived from one of those
102 applications, originally set out in [RFC1867] and subsequently
103 incorporated into HTML 3.2 [W3C.REC-html32-19970114], where forms are
104 expressed in HTML, and in which the form data is sent via HTTP or
105 electronic mail. This representation is widely implemented in
106 numerous web browsers and web servers.

108 However, "multipart/form-data" is also used for forms that are
109 presented using representations other than HTML (spreadsheets, PDF,
110 etc.), and for transport using means other than electronic mail or
111 HTTP; it is used in distributed applications which do not involve
112 forms at all, or do not have users filling out the form. For this
113 reason, this document defines a general syntax and semantics
114 independent of the application for which it is used, with specific
115 rules for web applications noted in context.

117 3. URL encoding non-ASCII values

119 Within this specification, "URL-encoding" is offered as a possible
120 way of encoding non-ASCII characters in file names. The encoding is
121 created by replacing each non-ASCII or disallowed character with a
122 sequence, where each byte of the UTF-8 encoding of the character is
123 represented by a percent-sign (%) followed by the (lower case)
124 hexadecimal of that byte.

126 4. Advice for Forms and Form Processing

128 The representation and interpretation of forms and the nature of form
129 processing are not specified by this document. However, for forms and
130 form-processing that result in generation of multipart/form-data,
131 some suggestions are included.

133 In a form, there is generally a sequence of fields, where each field
134 is expected to be supplied with a value, e.g. by a user who fills out
135 the form. Each field has a name. After a form has been filled out
136 and the form's data is "submitted", the form processing results in a
137 set of values for each field: the "form data".

139 In forms that work with multipart/form-data, field names could be
140 arbitrary Unicode strings; however, restricting field names to ASCII
141 will help avoid some interoperability issues (see Section 6.1).

143 Within a given form, ensuring field names are unique is also helpful.
144 Some fields may have default values or presupplied values in the form
145 itself. Fields with presupplied values might be hidden or invisible;
146 this allows using generic processing for form data from a variety of
147 actual forms.

149 5. Definition of multipart/form-data

151 The media-type "multipart/form-data" follows the model of multipart
152 MIME data streams as specified in [RFC2046] Section 5.1; changes are
153 noted in this document.

155 A "multipart/form-data" body contains a series of parts, separated by
156 a boundary.

158 5.1. Boundary parameter of multipart/form-data

160 As with other multipart types, the parts are delimited with a
161 boundary delimiter, constructed using CRLF, "--", and the value of the
162 boundary parameter. The form data for each field is sent, in
163 the order defined by the sending application and form, as a part of
164 the multipart stream. The boundary is supplied as a "boundary"
165 parameter to the "multipart/form-data" type. As noted in [RFC2046]
166 Section 5.1, the boundary delimiter MUST NOT appear inside any of the
167 encapsulated parts, and it is often necessary to enclose the boundary
168 parameter values in quotes on the Content-Type line.

170 5.2. Content-Disposition header for each part

172 Each part MUST contain a "content-disposition" header [RFC2183]
173 where the disposition type is "form-data". The "content-disposition"
174 header MUST also contain an additional parameter of "name"; the value
175 of the "name" parameter is the original field name from the form
176 (possibly encoded; see Section 6.1). For example, a part might
177 contain a header:

179 Content-Disposition: form-data; name="user"

181 with the body of the part corresponding to the form data of the
182 "user" field.

184 5.3. filename attribute of content-disposition part header

186 For form data that represents the content of a file, a name for the
187 file SHOULD be supplied as well, by using a "filename" parameter of
188 the "content-disposition" header. (A file name isn't mandatory; file
189 uploads might result from selection or drag-and-drop even in systems
190 where the file name is meaningless or private, where the form data
191 content is streamed directly from a device, or where the file name is
192 not user visible and would be unrecognized.)

194 In most multipart types, the MIME headers in each part are restricted
195 to US-ASCII; for compatibility with those systems, file names
196 normally visible to users MAY be encoded using the URL-encoding
197 method in Section 3, much as a "file:" URI might be encoded.

199 NOTE: the method in [RFC5987] for using a "filename*" parameter of the
200 "Content-Disposition" header SHOULD NOT be used.

202 Some commonly deployed systems use multipart/form-data with file
203 names directly encoded including octets outside the US-ASCII range.
204 The encoding used for the file names is typically UTF-8, although
205 HTML forms will use the charset associated with the form.

207 5.4. Multiple files for one form field

209 The form data for a form field might include multiple files.

211 [RFC2388] suggested that multiple files for a single form field be
212 transmitted using a nested multipart/mixed part.

214 To match widely deployed implementations, multiple files SHOULD be
215 sent by supplying each file in a separate part, but all with the same
216 "name" parameter.

218 Receiving applications intended for wide applicability (e.g.
219 multipart/form-data parsing libraries) SHOULD also support the older
220 method of supplying multiple files.

222 5.5. Content-Type header for each part

224 Each part MAY have an (optional) "content-type" header, which defaults to
225 "text/plain". If the contents of a file are to be sent, the file
226 data SHOULD be labeled with an appropriate media type, if known, or
227 "application/octet-stream".

229 5.6. The charset parameter for text/plain form data

231 In the case where the form data is text, the charset parameter for
232 the "text/plain" Content-Type MAY be used to indicate the character
233 encoding used in that part. For example, a form with a text field in
234 which a user typed "Joe owes <eurosymbol>100", where <eurosymbol> is the
235 Euro symbol, might have form data returned as:

237 --AaB03x
238 content-disposition: form-data; name="field1"
239 content-type: text/plain;charset=UTF-8
240 content-transfer-encoding: quoted-printable

242 Joe owes =E2=82=AC100.
243 --AaB03x

245 In practice, many widely deployed implementations do not supply a
246 charset parameter in each part, but, rather, they rely on the notion
247 of a "default charset" for a multipart/form-data instance.
248 Subsequent sections will explain how the default charset is
249 established.

251 5.7. The _charset_ field for default charset

253 Some form processing applications (including HTML) have the
254 convention that the value of a form entry with entry name "_charset_"
255 and type "hidden" is automatically set when the form is opened; the
256 value is used as the default charset of text field values (see form-
257 charset in Section 6.1.2). In such cases, the value of the default
258 charset for each text/plain part without a charset parameter is the
259 supplied value. For example:

261 --AaB03x
262 content-disposition: form-data; name="_charset_"

264 iso-8859-1
265 --AaB03x
266 content-disposition: form-data; name="field1"

268 ...text encoded in iso-8859-1 ...
269 --AaB03x--

271 5.8. Content-Transfer-Encoding deprecated

273 Previously, it was recommended that senders use a "Content-Transfer-
274 Encoding" encoding (such as "quoted-printable") for each non-ASCII
275 part of a multipart/form-data body, because that would allow use in
276 transports that only support a "7BIT" encoding. This practice is
277 deprecated in contexts that support binary data, such as HTTP.
278 Senders SHOULD NOT generate any parts with a "Content-Transfer-
279 Encoding" header.

281 Currently, no deployed implementations that send such bodies have
282 been discovered.

284 5.9. Other Content- headers

286 The "multipart/form-data" media type does not support any MIME
287 headers in the parts other than Content-Type, Content-Disposition,
288 and (in limited circumstances) Content-Transfer-Encoding. Other
289 headers MUST NOT be included and MUST be ignored.

291 6. Operability considerations

293 6.1. Non-ASCII field names and values

295 Normally, MIME headers in multipart bodies are required to consist
296 only of 7-bit data in the US-ASCII character set. While [RFC2388]
297 suggested that field names containing characters outside of US-ASCII
298 should be encoded according to the method in [RFC2047],
299 this practice doesn't seem to have been followed widely.

301 This specification makes three sets of recommendations for three
302 different states of workflow.

304 6.1.1. Avoid non-ASCII field names

306 For broadest interoperability with existing deployed software, those
307 creating forms SHOULD avoid non-ASCII field names. This should not
308 be a burden, because in general the field names are not visible to
309 users.

311 If non-ASCII field names are unavoidable, form or application
312 creators SHOULD use UTF-8 uniformly. This will minimize
313 interoperability problems.

315 6.1.2. Interpreting forms and creating form-data

317 Some applications of this specification will supply a character
318 encoding to be used for interpretation of the multipart/form-data
319 body. In particular, HTML 5 [W3C.REC-html5-20141028] uses:

321 o the content of a '_charset_' field, if there is one,

323 o the value of an accept-charset attribute of the <form> element, if
324 there is one,

326 o the character encoding of the document containing the form, if it
327 is US-ASCII compatible,

329 o otherwise UTF-8.

331 Call this value the form-charset. Any text, whether field name,
332 field value, or (text/plain) form data, that uses characters
333 outside the ASCII range MAY be represented directly encoded in the
334 form-charset.

336 6.1.3. Parsing and interpreting form data

338 While this specification provides guidance for creation of multipart/
339 form-data, parsers and interpreters should be aware of the variety of
340 implementations. File systems differ as to whether and how they
341 normalize Unicode names, for example. The matching of form elements
342 to form-data parts may rely on a fuzzier match. In particular, some
343 multipart/form-data generators might have followed the previous
344 advice of [RFC2388] and used the [RFC2047] "encoded-word" method of
345 encoding non-ASCII values:

347 encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

349 Others have been known to follow [RFC2231], to send unencoded UTF-8,
350 or even strings encoded in the form-charset.

352 For this reason, interpreting "multipart/form-data" (even from
353 conforming generators) may require knowing the charset used in form
354 encoding, in cases where the _charset_ field value or a charset
355 parameter of a text/plain Content-Type header is not supplied.

357 6.2. Ordered fields and duplicated field names

359 Form processors given forms with a well-defined ordering SHOULD send
360 back results in the order received and preserve duplicate field
361 names, in order. Intermediaries MUST NOT reorder the results. (Note
362 that there are some forms which do not define a natural order of
363 appearance.)

365 6.3. Interoperability with web applications

367 Many web applications use the "application/x-www-form-urlencoded"
368 method for returning data from forms. This format is quite compact, e.g.:

370 name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send

372 However, there is no opportunity to label the enclosed data with
373 content type, apply a charset, or use other encoding mechanisms.

375 Many form-interpreting programs (primarily web browsers) now
376 implement and generate multipart/form-data, but an existing
377 application might need to support the
378 "application/x-www-form-urlencoded" format as well.

380 6.4. Correlating form data with the original form

382 This specification provides no specific mechanism by which multipart/
383 form-data can be associated with the form that caused it to be
384 transmitted. This separation is intentional; many different forms
385 might be used for transmitting the same data. In practice,
386 applications may supply a specific form processing resource (in HTML,
387 the ACTION attribute in a FORM tag) for each different form.
388 Alternatively, data about the form might be encoded in a "hidden
389 field" (a field which is part of the form but which has a fixed value
390 to be transmitted back to the form-data processor).

392 7. IANA Considerations

394 Please update the Internet Media Type registration of multipart/form-
395 data to point to this document. In addition, please update the
396 registration of the "name" parameter in the "Content Disposition
397 Parameters" registry to point to this document.

399 8. Security Considerations

401 Applications which receive forms and process them must be careful not
402 to supply data back to the requesting form processing site that was
403 not intended to be sent.

405 It is important when interpreting the filename of the Content-
406 Disposition header to not overwrite files in the recipient's file
407 space inadvertently.

409 User applications that request form information from users must be
410 careful not to cause a user to send information to the requestor or a
411 third party unwillingly or unwittingly. For example, a form might
412 request 'spam' information to be sent to an unintended third party,
413 or private information to be sent to someone that the user might not
414 actually intend. While this is primarily an issue for the
415 representation and interpretation of forms themselves (rather than
416 the data representation of the form data), the transportation of
417 private information must be done in a way that does not expose it to
418 unwanted prying.

420 With the introduction of form-data that can reasonably send back the
421 content of files from a user's file space, the possibility arises
422 that a user might be sent an automated script that fills out a form
423 and then sends one of the user's local files to another address.
424 Thus, additional caution is required when executing automated
425 scripting where form-data might include a user's files.

427 Files sent via multipart/form-data may contain arbitrary executable
428 content, and precautions against malicious content are necessary.

430 All form processing software should treat user supplied form-data
431 with sensitivity, as it often contains confidential or personally
432 identifying information. Multipart/form-data does not supply any
433 features for checking integrity, ensuring confidentiality or other
434 security features; those concerns must be addressed by the form-
435 filling and form-data-interpreting applications.

437 9. Media type registration for multipart/form-data

439 Media Type name: multipart

441 Media subtype name: form-data

443 Required parameters: boundary

445 Optional parameters: none

447 Encoding considerations: Common use is BINARY.
448 In limited use (or transports that restrict the encoding to 7BIT
449 or 8BIT) each part is encoded separately using Content-Transfer-
450 Encoding (see Section 5.8).

452 Security considerations: See Section 8 of this document.

454 Interoperability considerations: This document makes several
455 recommendations for interoperability with deployed
456 implementations, including Section 5.8.

458 10. References

460 10.1. Normative References

462 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
463 Extensions (MIME) Part Two: Media Types", RFC 2046,
464 November 1996.

466 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
467 Part Three: Message Header Extensions for Non-ASCII Text",
468 RFC 2047, November 1996.

470 [RFC2183] Troost, R., Dorner, S., and K. Moore, "Communicating
471 Presentation Information in Internet Messages: The
472 Content-Disposition Header Field", RFC 2183, August 1997.

474 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded
475 Word Extensions:
476 Character Sets, Languages, and Continuations", RFC 2231,
477 November 1997.

479 10.2. Informative References

481 [RFC1867] Nebel, E. and L. Masinter, "Form-based File Upload in
482 HTML", RFC 1867, November 1995.

484 [RFC2388] Masinter, L., "Returning Values from Forms: multipart/
485 form-data", RFC 2388, August 1998.

487 [RFC5987] Reschke, J., "Character Set and Language Encoding for
488 Hypertext Transfer Protocol (HTTP) Header Field
489 Parameters", RFC 5987, August 2010.

491 [W3C.REC-html32-19970114]
492 Raggett, D., "HTML 3.2 Reference Specification", World
493 Wide Web Consortium Recommendation REC-html32-19970114,
494 January 1997.

496 [W3C.REC-html5-20141028]
497 Hickson, I., Berjon, R., Faulkner, S., Leithead, T.,
498 Navara, E., O'Connor, E., and S. Pfeiffer, "HTML5",
499 World Wide Web Consortium Recommendation REC-
500 html5-20141028, October 2014.

503 Appendix A. Changes from RFC 2388

505 The handling of non-ASCII field names changed: the RFC 2047 method
506 is no longer recommended; instead, senders are advised to send
507 UTF-8 field names directly, and file names directly in the form-
508 charset.

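As an illustration of the field-name change described above, the following Python sketch contrasts the two header styles for a non-ASCII field name. It is only a sketch under stated assumptions: the helper names name_rfc2047 and name_direct_utf8 are hypothetical and not defined by this specification, UTF-8 is assumed as the form-charset, and escaping of quote characters inside the name is omitted.

```python
from email.header import Header, decode_header

PREFIX = b'Content-Disposition: form-data; name="'

def name_rfc2047(field_name: str) -> bytes:
    """Older RFC 2388 advice: wrap the name in an RFC 2047 encoded-word."""
    # Header.encode() returns an ASCII-only string such as "=?utf-8?...?=".
    encoded_word = Header(field_name, charset="utf-8").encode()
    return PREFIX + encoded_word.encode("ascii") + b'"'

def name_direct_utf8(field_name: str) -> bytes:
    """Approach suggested here: send the UTF-8 octets of the name as-is."""
    # Assumption: the name contains no '"' or CR/LF; real senders must
    # handle those characters before building the header.
    return PREFIX + field_name.encode("utf-8") + b'"'

field = "Utf\u00f6r"  # non-ASCII field name used in Section 6.3's example
old_style = name_rfc2047(field)
new_style = name_direct_utf8(field)

# A tolerant receiver may need to recognise both forms (see Section 6.1.3).
word = old_style[len(PREFIX):-1].decode("ascii")
decoded_bytes, decoded_charset = decode_header(word)[0]
recovered = decoded_bytes.decode(decoded_charset)
```

Both headers carry the same field name; only the first is restricted to US-ASCII octets, which is why parsers intended for wide applicability may have to accept either representation.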
510 The handling of multiple files submitted as the result of a single
511 form field (e.g. HTML's <input type="file" multiple> element) results
512 in each file having its own top-level part with the same name
513 parameter; the method of using a nested "multipart/mixed" from
514 [RFC2388] is no longer recommended for creators, and not required for
515 receivers as there are no known implementations of senders.

517 The _charset_ convention and use of an explicit form-data charset is
518 documented.

520 'boundary' is a required parameter in Content-Type.

522 The relationship of the ordering of fields within a form and the
523 ordering of returned values within multipart/form-data was not
524 defined before, nor was the handling of the case where a form has
525 multiple fields with the same name.

527 Editorial: Removed obsolete discussion of alternatives in appendix.
528 Updated references. Moved outline of form processing into
529 Introduction.

531 Appendix B. Alternatives

533 There are numerous alternative ways in which form data can be
534 encoded; many are listed in [RFC2388] section 5.2. The multipart/
535 form-data encoding is verbose, especially if there are many fields
536 with short values. In most use cases, this overhead isn't
537 significant.

539 More problematic is the ambiguity introduced because implementations
540 did not follow [RFC2388], partly because it used "may" instead of
541 "MUST" when specifying encoding of field names, and partly for other,
542 unknown reasons. As a result, parsers now need to be more complex,
543 fuzzy-matching against the possible outputs of various encoding methods.

545 Author's Address

547 Larry Masinter
548 Adobe

550 Email: masinter@adobe.com
551 URI: http://larry.masinter.net