idnits 2.17.1

draft-ietf-appsawg-multipart-form-data-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 10, 2015) is 3303 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-appsawg-file-scheme-00

  -- Obsolete informational reference (is this intentional?): RFC 1867
     (Obsoleted by RFC 2854)

  -- Obsolete informational reference (is this intentional?): RFC 2388
     (Obsoleted by RFC 7578)

  -- Obsolete informational reference (is this intentional?): RFC 5987
     (Obsoleted by RFC 8187)

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

APPSAWG                                                      L. Masinter
Internet-Draft                                                     Adobe
Obsoletes: 2388 (if approved)                             April 10, 2015
Intended status: Standards Track
Expires: October 12, 2015

            Returning Values from Forms: multipart/form-data
                draft-ietf-appsawg-multipart-form-data-11

Abstract

   This specification defines the multipart/form-data Internet Media
   Type, which can be used by a wide variety of applications and
   transported by a wide variety of protocols as a way of returning a
   set of values as the result of a user filling out a form.  It
   obsoletes RFC 2388.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 12, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  percent-encoding option
   3.  Advice for Forms and Form Processing
   4.  Definition of multipart/form-data
     4.1.  Boundary parameter of multipart/form-data
     4.2.  Content-Disposition header for each part
     4.3.  filename attribute of content-disposition part header
     4.4.  Multiple files for one form field
     4.5.  Content-Type header for each part
     4.6.  The charset parameter for text/plain form data
     4.7.  The _charset_ field for default charset
     4.8.  Content-Transfer-Encoding deprecated
     4.9.  Other Content- headers
   5.  Operability considerations
     5.1.  Non-ASCII field names and values
       5.1.1.  Avoid non-ASCII field names
       5.1.2.  Interpreting forms and creating form-data
       5.1.3.  Parsing and interpreting form data
     5.2.  Ordered fields and duplicated field names
     5.3.  Interoperability with web applications
     5.4.  Correlating form data with the original form
   6.  IANA Considerations
   7.  Security Considerations
   8.  Media type registration for multipart/form-data
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Changes from RFC 2388
   Appendix B.  Alternatives
   Author's Address

1.  Introduction

   In many applications, it is possible for a user to be presented with
   a form.  The user will fill out the form, including information that
   is typed, generated by user input, or included from files that the
   user has selected.  When the form is filled out, the data from the
   form is sent from the user to the receiving application.

   The definition of "multipart/form-data" is derived from one of those
   applications, originally set out in [RFC1867] and subsequently
   incorporated into HTML 3.2 [W3C.REC-html32-19970114], where forms are
   expressed in HTML, and in which the form data is sent via HTTP or
   electronic mail.  This representation is widely implemented in
   numerous web browsers and web servers.

   However, "multipart/form-data" is also used for forms that are
   presented using representations other than HTML (spreadsheets, PDF,
   etc.), and for transport using means other than electronic mail or
   HTTP; it is used in distributed applications which do not involve
   forms at all, or do not have users filling out the form.  For this
   reason, this document defines a general syntax and semantics
   independent of the application for which it is used, with specific
   rules for web applications noted in context.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.  percent-encoding option

   Within this specification, "percent-encoding" (as defined in
   [RFC3986]) is offered as a possible way of encoding characters in
   file names that are otherwise disallowed, including non-ASCII
   characters, spaces, control characters and so forth.
   The encoding is created by replacing each non-ASCII or disallowed
   character with a sequence, where each byte of the UTF-8 encoding of
   the character is represented by a percent-sign (%) followed by the
   (case-insensitive) hexadecimal value of that byte.

3.  Advice for Forms and Form Processing

   The representation and interpretation of forms and the nature of form
   processing is not specified by this document.  However, for forms and
   form-processing that result in generation of multipart/form-data,
   some suggestions are included.

   In a form, there is generally a sequence of fields, where each field
   is expected to be supplied with a value, e.g. by a user who fills out
   the form.  Each field has a name.  After a form has been filled out
   and its data is "submitted", the form processing results in a set of
   values for each field: the "form data".

   In forms that work with multipart/form-data, field names could be
   arbitrary Unicode strings; however, restricting field names to ASCII
   will help avoid some interoperability issues (see Section 5.1).

   Within a given form, ensuring field names are unique is also helpful.
   Some fields may have default values or presupplied values in the form
   itself.  Fields with presupplied values might be hidden or invisible;
   this allows using generic processing for form data from a variety of
   actual forms.

4.  Definition of multipart/form-data

   The media-type "multipart/form-data" follows the model of multipart
   MIME data streams as specified in [RFC2046] Section 5.1; changes are
   noted in this document.

   A "multipart/form-data" body contains a series of parts, separated by
   a boundary.

4.1.  Boundary parameter of multipart/form-data

   As with other multipart types, the parts are delimited with a
   boundary delimiter, constructed using CRLF, "--", and the value of
   the boundary parameter.
   The boundary is supplied as a "boundary" parameter to the
   "multipart/form-data" type.  As noted in [RFC2046] Section 5.1, the
   boundary delimiter MUST NOT appear inside any of the encapsulated
   parts, and it is often necessary to enclose the boundary parameter
   values in quotes on the Content-type line.

4.2.  Content-Disposition header for each part

   Each part MUST contain a "content-disposition" header [RFC2183] in
   which the disposition type is "form-data".  The "content-disposition"
   header MUST also contain an additional parameter of "name"; the value
   of the "name" parameter is the original field name from the form
   (possibly encoded; see Section 5.1).  For example, a part might
   contain a header:

        Content-Disposition: form-data; name="user"

   with the body of the part containing the form data of the "user"
   field.

4.3.  filename attribute of content-disposition part header

   For form data that represents the content of a file, a name for the
   file SHOULD be supplied as well, by using a "filename" parameter of
   the "content-disposition" header.  The file name isn't mandatory for
   cases where the file name isn't available or is meaningless or
   private; this might result, for example, from selection or
   drag-and-drop or where the form data content is streamed directly
   from a device.

   If a filename parameter is supplied, the requirements of [RFC2183]
   Section 2.3 for "receiving MUA" apply to receivers of
   "multipart/form-data" as well: do not use the file name blindly;
   check it and possibly change it to match local filesystem
   conventions if applicable; and do not use directory path information
   that may be present.
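
   The receiver-side advice on the "filename" parameter can be
   illustrated with a minimal sketch.  It is not part of this
   specification: the helper name safe_filename, the fallback name
   "upload.bin", and the set of replaced characters are all assumptions
   chosen for illustration, and a real receiver would apply its own
   local filesystem conventions.

```python
import re


def safe_filename(raw: str, fallback: str = "upload.bin") -> str:
    """Reduce an untrusted "filename" parameter to a bare file name.

    Hypothetical helper for illustration; the draft does not define it.
    """
    # Keep only the last path component, treating both "/" and "\" as
    # separators, so any directory path information is discarded.
    name = re.split(r"[\\/]", raw)[-1]
    # Replace control characters and characters that are reserved on
    # common file systems (an assumed, deliberately conservative set).
    name = re.sub(r'[\x00-\x1f\x7f<>:"|?*]', "_", name)
    # Trim leading/trailing dots and spaces, which also rules out the
    # special names "." and "..".
    name = name.strip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return name or fallback


if __name__ == "__main__":
    for raw in ("report.pdf", "../../etc/passwd", r"C:\Users\me\cv.doc", ".."):
        print(repr(raw), "->", safe_filename(raw))
```

   A sanitized name is only a starting point; the receiver would still
   need to avoid overwriting existing files when storing the content
   (see Section 7).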

   In most multipart types, the MIME headers in each part are restricted
   to US-ASCII; for compatibility with those systems, file names
   normally visible to users MAY be encoded using the percent-encoding
   method in Section 2, following how a "file:" URI
   [I-D.ietf-appsawg-file-scheme] might be encoded.

   NOTE: The encoding method described in [RFC5987], which would add a
   "filename*" parameter to the "Content-Disposition" header, MUST NOT
   be used.

   Some commonly deployed systems use multipart/form-data with file
   names directly encoded including octets outside the US-ASCII range.
   The encoding used for the file names is typically UTF-8, although
   HTML forms will use the charset associated with the form.

4.4.  Multiple files for one form field

   The form data for a form field might include multiple files.

   [RFC2388] suggested that multiple files for a single form field be
   transmitted using a nested multipart/mixed part.  This usage is
   deprecated.

   To match widely deployed implementations, multiple files MUST be sent
   by supplying each file in a separate part, but all with the same
   "name" parameter.

   Receiving applications intended for wide applicability (e.g.
   multipart/form-data parsing libraries) SHOULD also support the older
   method of supplying multiple files.

4.5.  Content-Type header for each part

   Each part MAY have an (optional) "content-type", which defaults to
   "text/plain".  If the contents of a file are to be sent, the file
   data SHOULD be labeled with an appropriate media type, if known, or
   "application/octet-stream".

4.6.  The charset parameter for text/plain form data

   In the case where the form data is text, the charset parameter for
   the "text/plain" Content-Type MAY be used to indicate the character
   encoding used in that part.
   For example, a form with a text field in which a user typed
   "Joe owes <eurosymbol>100", where <eurosymbol> is the Euro symbol,
   might have form data returned as:

       --AaB03x
       content-disposition: form-data; name="field1"
       content-type: text/plain;charset=UTF-8
       content-transfer-encoding: quoted-printable

       Joe owes =E2=82=AC100.
       --AaB03x

   In practice, many widely deployed implementations do not supply a
   charset parameter in each part, but, rather, they rely on the notion
   of a "default charset" for a multipart/form-data instance.
   Subsequent sections will explain how the default charset is
   established.

4.7.  The _charset_ field for default charset

   Some form processing applications (including HTML) have the
   convention that the value of a form entry with entry name "_charset_"
   and type "hidden" is automatically set when the form is opened; the
   value is used as the default charset of text field values (see
   form-charset in Section 5.1.2).  In such cases, the value of the
   default charset for each text/plain part without a charset parameter
   is the supplied value.  For example:

       --AaB03x
       content-disposition: form-data; name="_charset_"

       iso-8859-1
       --AaB03x
       content-disposition: form-data; name="field1"

       ...text encoded in iso-8859-1 ...
       --AaB03x--

4.8.  Content-Transfer-Encoding deprecated

   Previously, it was recommended that senders use a
   "Content-Transfer-Encoding" encoding (such as "quoted-printable") for
   each non-ASCII part of a multipart/form-data body, because that would
   allow use in transports that only support a "7BIT" encoding.  This
   practice is deprecated in contexts that support binary data, such as
   HTTP.  Senders SHOULD NOT generate any parts with a
   "Content-Transfer-Encoding" header.

   Currently, no deployed implementations that send such bodies have
   been discovered.

4.9.  Other Content- headers

   The "multipart/form-data" media type does not support any MIME
   headers in the parts other than Content-Type, Content-Disposition,
   and (in limited circumstances) Content-Transfer-Encoding.  Other
   headers MUST NOT be included and MUST be ignored.

5.  Operability considerations

5.1.  Non-ASCII field names and values

   Normally, MIME headers in multipart bodies are required to consist
   only of 7-bit data in the US-ASCII character set.  While [RFC2388]
   suggested that non-ASCII field names be encoded according to the
   method in [RFC2047], this practice doesn't seem to have been followed
   widely.

   This specification makes three sets of recommendations for three
   different states of workflow.

5.1.1.  Avoid non-ASCII field names

   For broadest interoperability with existing deployed software, those
   creating forms SHOULD avoid non-ASCII field names.  This should not
   be a burden, because in general the field names are not visible to
   users.  The underlying field names need not match what the user sees
   on the screen.

   If non-ASCII field names are unavoidable, form or application
   creators SHOULD use UTF-8 uniformly.  This will minimize
   interoperability problems.

5.1.2.  Interpreting forms and creating form-data

   Some applications of this specification will supply a character
   encoding to be used for interpretation of the multipart/form-data
   body.  In particular, HTML 5 [W3C.REC-html5-20141028] uses:

   o  the content of a '_charset_' field, if there is one;

   o  the value of an accept-charset attribute of the <form> element, if
      there is one;

   o  the character encoding of the document containing the form, if it
      is US-ASCII compatible;

   o  otherwise, UTF-8.

   Call this value the form-charset.  Any text, whether field name,
   field value, or (text/plain) form data, that uses characters outside
   the ASCII range MAY be represented directly encoded in the
   form-charset.

5.1.3.  Parsing and interpreting form data

   While this specification provides guidance for creation of
   multipart/form-data, parsers and interpreters should be aware of the
   variety of implementations.  File systems differ as to whether and
   how they normalize Unicode names, for example.  The matching of form
   elements to form-data parts may rely on a fuzzier match.  In
   particular, some multipart/form-data generators might have followed
   the previous advice of [RFC2388] and used the [RFC2047]
   "encoded-word" method of encoding non-ASCII values:

       encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   Others have been known to follow [RFC2231], to send unencoded UTF-8,
   or even strings encoded in the form-charset.

   For this reason, interpreting "multipart/form-data" (even from
   conforming generators) may require knowing the charset used in form
   encoding, in cases where the _charset_ field value or a charset
   parameter of a text/plain Content-Type header is not supplied.

5.2.  Ordered fields and duplicated field names

   Form processors given forms with a well-defined ordering SHOULD send
   back results in order (note that there are some forms which do not
   define a natural order).  Intermediaries MUST NOT reorder the
   results.  Form parts with identical field names MUST NOT be
   coalesced.

5.3.  Interoperability with web applications

   Many web applications use the "application/x-www-form-urlencoded"
   method for returning data from forms.
   This format is quite compact, e.g.:

   name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send

   However, there is no opportunity to label the enclosed data with
   content type, apply a charset, or use other encoding mechanisms.

   Many form-interpreting programs (primarily web browsers) now
   implement and generate multipart/form-data, but an existing
   application might also need to support the
   application/x-www-form-urlencoded format.

5.4.  Correlating form data with the original form

   This specification provides no specific mechanism by which
   multipart/form-data can be associated with the form that caused it to
   be transmitted.  This separation is intentional; many different forms
   might be used for transmitting the same data.  In practice,
   applications may supply a specific form processing resource (in HTML,
   the ACTION attribute in a FORM tag) for each different form.
   Alternatively, data about the form might be encoded in a "hidden
   field" (a field which is part of the form but which has a fixed value
   to be transmitted back to the form-data processor).

6.  IANA Considerations

   Please update the Internet Media Type registration of
   multipart/form-data to point to this document, using the template in
   Section 8.  In addition, please update the registrations of the
   "name" parameter and the "form-data" value in the "Content
   Disposition Values and Parameters" registry to both point to this
   document.

7.  Security Considerations

   All form processing software should treat user supplied form-data
   with sensitivity, as it often contains confidential or personally
   identifying information.  There is widespread use of form "auto-fill"
   features in web browsers; these might be used to trick users into
   unknowingly sending confidential information when completing
   otherwise innocuous tasks.
   Multipart/form-data does not supply any features for checking
   integrity, ensuring confidentiality, avoiding user confusion, or
   other security features; those concerns must be addressed by the
   form-filling and form-data-interpreting applications.

   Applications which receive forms and process them must be careful not
   to supply data back to the requesting form processing site that was
   not intended to be sent.

   It is important when interpreting the filename of the
   Content-Disposition header to not overwrite files in the recipient's
   file space inadvertently.

   User applications that request form information from users must be
   careful not to cause a user to send information to the requestor or a
   third party unwillingly or unwittingly.  For example, a form might
   request 'spam' information to be sent to an unintended third party,
   or private information to be sent to someone that the user might not
   actually intend.  While this is primarily an issue for the
   representation and interpretation of forms themselves (rather than
   the data representation of the form data), the transportation of
   private information must be done in a way that does not expose it to
   unwanted prying.

   With the introduction of form-data that can reasonably send back the
   content of files from a user's file space, the possibility arises
   that a user might be sent an automated script that fills out a form
   and then sends one of the user's local files to another address.
   Thus, additional caution is required when executing automated
   scripting where form-data might include a user's files.

   Files sent via multipart/form-data may contain arbitrary executable
   content, and precautions against malicious content are necessary.

   The considerations of [RFC2183] Sections 2.3 and 5 with respect to
   the filename parameter of the Content-Disposition header also apply
   to its usage here.

8.  Media type registration for multipart/form-data

   This section is the [RFC6838] media type registration.

   Type name:  multipart

   Subtype name:  form-data

   Required parameters:  boundary

   Optional parameters:  none

   Encoding considerations:  Common use is BINARY.
      In limited use (or transports that restrict the encoding to 7BIT
      or 8BIT), each part is encoded separately using
      Content-Transfer-Encoding; see Section 4.8.

   Security considerations:  See Section 7 of this document.

   Interoperability considerations:  This document makes several
      recommendations for interoperability with deployed
      implementations, including Section 4.8.

   Published specification:  This document.

   Applications that use this media type:  Numerous web browsers,
      servers, and web applications.

   Fragment identifier considerations:  None: Fragment identifiers are
      not defined for this type.

   Additional information:  None: no deprecated alias names, magic
      numbers, file extensions or Macintosh file type codes.

   Person & email address to contact for further information:
      Author of this document.

   Intended Usage:  COMMON

   Restrictions on usage:  none

   Author:  Author of this document.

   Change controller:  IETF

   Provisional registration:  N/A

9.  References

9.1.  Normative References

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2183]  Troost, R., Dorner, S., and K. Moore, "Communicating
              Presentation Information in Internet Messages: The
              Content-Disposition Header Field", RFC 2183, August 1997.

   [RFC2231]  Freed, N. and K.
              Moore, "MIME Parameter Value and Encoded Word Extensions:
              Character Sets, Languages, and Continuations", RFC 2231,
              November 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

9.2.  Informative References

   [I-D.ietf-appsawg-file-scheme]
              Kerwin, M., "The file URI Scheme",
              draft-ietf-appsawg-file-scheme-00 (work in progress),
              January 2015.

   [RFC1867]  Nebel, E. and L. Masinter, "Form-based File Upload in
              HTML", RFC 1867, November 1995.

   [RFC2388]  Masinter, L., "Returning Values from Forms:
              multipart/form-data", RFC 2388, August 1998.

   [RFC5987]  Reschke, J., "Character Set and Language Encoding for
              Hypertext Transfer Protocol (HTTP) Header Field
              Parameters", RFC 5987, August 2010.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13, RFC
              6838, January 2013.

   [W3C.REC-html32-19970114]
              Raggett, D., "HTML 3.2 Reference Specification", World
              Wide Web Consortium Recommendation REC-html32-19970114,
              January 1997.

   [W3C.REC-html5-20141028]
              Hickson, I., Berjon, R., Faulkner, S., Leithead, T.,
              Navara, E., O'Connor, E., and S. Pfeiffer, "HTML5",
              World Wide Web Consortium Recommendation
              REC-html5-20141028, October 2014.

Appendix A.  Changes from RFC 2388

   The handling of non-ASCII field names changed: the RFC 2047 method is
   no longer recommended; instead, senders are advised to send UTF-8
   field names directly, and file names directly in the form-charset.

   The handling of multiple files submitted as the result of a single
   form field (e.g. HTML's <input type=file multiple> element) results
   in each file having its own top level part with the same name
   parameter; the method of using a nested "multipart/mixed" from
   [RFC2388] is no longer recommended for creators, and not required for
   receivers as there are no known implementations of senders.

   The _charset_ convention and use of an explicit form-data charset is
   documented.

   'boundary' is a required parameter in Content-Type.

   The relationship of the ordering of fields within a form and the
   ordering of returned values within multipart/form-data was not
   defined before, nor was the handling of the case where a form has
   multiple fields with the same name.

   Editorial: Removed obsolete discussion of alternatives in appendix.
   Updated references.  Moved outline of form processing into
   Introduction.

Appendix B.  Alternatives

   There are numerous alternative ways in which form data can be
   encoded; many are listed in [RFC2388] section 5.2.  The
   multipart/form-data encoding is verbose, especially if there are many
   fields with short values.  In most use cases, this overhead isn't
   significant.

   More problematic are the differences introduced when implementors
   opted to not follow [RFC2388] when encoding non-ASCII field names
   (perhaps because "may" should have been "MUST").  As a result,
   parsers need to be more complex for matching against the possible
   outputs of various encoding methods.

Author's Address

   Larry Masinter
   Adobe

   Email: masinter@adobe.com
   URI:   http://larry.masinter.net