<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [
  <!ENTITY rfc.2629 PUBLIC '' 'reference.RFC.2629.xml'>
  <!ENTITY rfc.2046 PUBLIC '' 'reference.RFC.2046.xml'>
  <!ENTITY rfc.2047 PUBLIC '' 'reference.RFC.2047.xml'>
  <!ENTITY rfc.2231 PUBLIC '' 'reference.RFC.2231.xml'>
  <!ENTITY rfc.1806 PUBLIC '' 'reference.RFC.1806.xml'>
  <!ENTITY rfc.1867 PUBLIC '' 'reference.RFC.1867.xml'>
  <!ENTITY rfc.2183 PUBLIC '' 'reference.RFC.2183.xml'>
  <!ENTITY rfc.2184 PUBLIC '' 'reference.RFC.2184.xml'>
  <!ENTITY rfc.2388 PUBLIC '' 'reference.RFC.2388.xml'>
  <!ENTITY w3c.html5 PUBLIC '' 'reference.W3C.CR-html5-20121217.xml'>
]>

<rfc ipr='trust200902' docName='draft-ietf-appsawg-multipart-form-data-00'
     obsoletes='2388' category='std'
     >
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
  <?rfc toc='yes' ?>
  <?rfc symrefs='yes' ?>
  <?rfc sortrefs='yes'?>
  <?rfc iprnotified='no' ?>
  <?rfc strict='yes' ?>
  <front>
    <title abbrev='multipart/form-data'>Returning Values from Forms:
    multipart/form-data</title>
    <author initials='L.'
	    surname='Masinter'
	    fullname='Larry Masinter'>
      <organization>Adobe</organization>
      <address>
	<email>masinter@adobe.com</email>
	<uri>http://larry.masinter.net</uri>
      </address>
    </author>
    <date/>
    <abstract>
      <t>
	This specification (re)defines the multipart/form-data
	Internet Media Type, which can be used by a wide variety of
	applications and transported by a wide variety of protocols as
	a way of returning a set of values as the result of a user
	filling out a form.  It replaces RFC 2388.
      </t>
    </abstract>
    <note title='NOTE'>
      <t>There is a GitHub repository for this draft at
      https://github.com/masinter/multipart-form-data along with an
      issue tracker. This specification has been proposed as a work
      item of the APPSAWG Applications Area working group,
      apps-discuss@ietf.org. Please raise issues in the tracker, or
      send to the apps-discuss list.</t>
    </note>
  </front>
  <middle>
    <section title='Introduction'>
      <t>
	In many applications, it is possible for a user to be
	presented with a form. The user will fill out the form,
	including information that is typed, generated by user input,
	or included from files that the user has selected. When the
	form is filled out, the data from the form is sent from the
	user to the receiving application.
      </t>
      <t>
	The definition of <spanx style='verb'>multipart/form-data</spanx> is derived from one of
	those applications, originally set out in <xref
	target='RFC1867'/> and subsequently incorporated into <xref
	target='HTML3.2'/> and <xref target='HTML4'/>, where forms
	are expressed in HTML, and in which the form values are sent
	via HTTP or electronic mail. This representation is widely
	implemented in numerous web browsers and web servers.
      </t>
      
      <t>
	However, multipart/form-data can be used for forms that are
	presented using representations other than HTML (spreadsheets,
	Portable Document Format, etc.), and for transport using other
	means than electronic mail or HTTP. This document defines the
	representation of form values independently of the application
	for which it is used.
      </t>
    </section>
    <section title='Definition of multipart/form-data'>
      <t>
	The media-type multipart/form-data generally follows the model
	of multipart MIME data streams as described in <xref
	target='RFC2046'/> Section 5.1.
      </t>
      <t>
	In forms, there are a series of fields to be supplied by the
	user who fills out the form.  Each field has a name. Within a
	given form, the names SHOULD be unique. After a form
	has been filled out and submitted (processes defined
	by the form), the result is a set of values for each field-- 
	the form-data. 
      </t>
      <t>
	A <spanx style='verb'>multipart/form-data</spanx> body
	contains a series of parts. Each part MUST contain a <spanx
	style='verb'>Content-Disposition</spanx> header <xref
	target='RFC2183'/> where the disposition type is <spanx
	style='verb'>form-data</spanx>, and where the disposition
	contains an (additional) parameter of <spanx
	style='verb'>name</spanx>; the value of the parameter is the
	original field name from the form (encoded, see <xref
	target='non-ascii-fields'/>).  For example, a part might
	contain a header:
      

      <figure><artwork>
        Content-Disposition: form-data; name="user"
      </artwork></figure>
      with the value corresponding to the entry of the 
      <spanx style='verb'>user</spanx> field.
      </t>
      <section title="Boundary">
	<t>
	  As with other multipart types, the parts are delimited with
	  a boundary, selected such that it does not occur in any of
	  the data. Each field of the form is sent, in the order
	  defined by the sending application and form, as a part of the
	  multipart stream. The boundary is supplied as a "boundary"
	  parameter to the multipart/form-data type, e.g.,
      	</t>
	<t>
	  <figure><artwork>
	    multipart/form-data;boundary="-AaB03x"
	  </artwork></figure>
	</t>
      </section>
      
      <section title="filename attribute">
	<t>
	  For form data that represents the content of a local file,
	  a name for the file SHOULD be supplied as well, by using a
	  "filename" parameter of the Content-Distribution header.
	  (The SHOULD is to allow file uploads that result from 
	  drag-and-drop in systems where the file name is meaningless
	  or private, where the uploaded content is streamed directly
	  from a device, or where the file name is not user visible
	  and would be unrecognized.)
	</t>

	<t>
	  For compatibility with other multipart types, the value
	  of the <spanx style="verb">filename</spanx>
	  parameter MUST be restricted to US-ASCII.
	  File names normally visible to users which contain
	  non-ASCII characters SHOULD
	  be encoded using the &amp;#nn; method described in
	  <xref target="amphash"/>. 
	</t>
      </section>
	  
      <section title="Multiple files for one form field">
	<t>
	  If the value of a form field is a set of files rather than a
	  single file, that value MUST be transmitted by supplying
	  each in a separate part, but all with the same 
	  <spanx style='verb'>name</spanx> parameter. 
	</t>
      </section>
	
      <section title='Content-Type'>
	<t>
	  Each part has an (optional)
	  <spanx style='verb'>Content-Type</spanx>, which defaults
	  to <spanx style='verb'>text/plain</spanx>.
	  If the contents of a file are to be sent,
	  the file data is labeled with an appropriate media type, if
	  known, or <spanx style='verb'>application/octet-stream</spanx>.
	</t>
      </section>
      <section title="The charset parameter">
	<t>
	  In the case where a field value is text,
	  the charset parameter for the <spanx style='verb'>text/plain</spanx>
	  Content-Type
	  may be used to indicate the character encoding used in that part.
	  For example, a form with a text field in which a user typed
	  "Joe owes &lt;eu&gt;100" where &lt;eu&gt; is the Euro symbol
	  might have form data returned as:
	</t>
	<figure><artwork>
	  --AaB03x
	  content-disposition: form-data; name="field1"
	  content-type: text/plain;charset=UTF-8
	  content-transfer-encoding: quoted-printable
	    
	  Joe owes =E2=82=AC100.
	  --AaB03x
	</artwork></figure>
      </section>
	
      <section title="The _charset_ field">
        <t>Forms have the convention that the value of a form entry
        with entry name <spanx style='verb'>_charset_</spanx>
	and type <spanx style='verb'>hidden</spanx> is automatically
        set to the name of the form-charset. In this case, the value
        of the default charset of each text/plain part without a
        charset parameter is the supplied value.</t>
      </section>

      <section title="Content-Transfer-Encoding">
	<t>
	  When used in transports which do not allow
	  arbitrary binary data, each part that cannot be represented
	  within the transport SHOULD be encoded and the
	  <spanx style='verb'>Content-Transfer-Encoding</spanx>
	  header supplied in that part. 

	  For example, some email transports use a 7BIT
	  encoding. (See section 5 of <xref target="RFC2046"/> for
	  more details.)
	  
	  When transferred via HTTP, Content-Transfer-Encoding
	  the form-data values SHOULD NOT be used.
	</t>
      </section>
      <section title="Other Content- headers">
	<t>The <spanx style='verb'>multipart/form-data</spanx>
	media type does not support any
	MIME headers in the parts other than Content-Type,
	Content-Disposition, and (when appropriate),
	Content-Transfer-Encoding.
	</t>
      </section>
    </section>
	    
    <section title="Operability considerations">
      <section title="Non-ASCII field names and values"
	       anchor="non-ascii-fields">
	<t>
	  MIME headers in multipart/form-data are required to consist
	  only of 7-bit data in the US-ASCII character set.
          While <xref target="RFC2388"/> suggested that non-ASCII
          field names should be encoded according to the method in
	  <xref target="RFC2047"/> if they contain characters outside
	  of US-ASCII, practice varies.
	</t>
	<t>This specification makes three recommendations for
	   three different states of workflow.</t>
	<section title="Avoid creating forms with non-ASCII field names">
          <t>For broadest interoperability with existing deployed
          software, those creating forms SHOULD avoid non-ASCII field
          names. This should not be a burden, because in general the
          field names are not visible to users.</t>
        </section>
	<section title="Ampersand hash encoding" anchor="amphash">
	  <t>Within this specification, the "ampersand hash encoding"
	  is used for representing characters that are not allowed in
	  a context: replace each disallowed character
	  character by a string consisting of an ampersand
	  (&amp;), a hash mark (#), one or more ASCII digits
	  representing the Unicode code point of the character in base
	  ten, and finally a semicolon (;).
	  </t>
	</section>

	<section title="Interpreting forms and creating form-data">
	  <t>Some applications of this specification will supply a
	  character encoding to be used for creation of the
	  multipart/form-data result. In particular, <xref
	  target="HTML5"/> uses:
	  <list style='symbols'>
	    <t>the value of an accept-charset attribute of
	    the &lt;form&gt; element, if there is one,</t>
	    <t>the character encoding of the document containing
	    the form, if it is US-ASCII compatible,</t>
	    <t>otherwise UTF-8.</t>
	  </list>
	  Call this the form-charset. Any field name or
	  file name which is not in US-ASCII must be encoded
	  using the &amp;#nn; encoding in <xref target="amphash"/></t>
	  <t>multipart/form-data parts which do not have
	  a Content-Type header and which are not the result
	  of supplying a local file MUST be transformed by
	  the same algorithm.
	  </t>
	</section>
	    
	<section title="Parsing and interpreting form data">
	  <t> 
            While this specification provides guidance for
	    creation of multipart/form-data, interpreters of
	    multipart/form-data should be aware of the variety
	    of implementations. 
	    Currently, deployed browsers differ as to how they
            encode multipart/form-data. For this reason the matching
            of form elements to form-data parts may rely on a fuzzier
            match.  In particular, some form-data generators might have
            followed the advice of <xref target="RFC2388"/> and used
            the <xref target="RFC2047"/> "encoded-word" method of
            encoding non-ASCII values:
	    <figure><artwork>
     encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" 
	    </artwork></figure>
	    Others have been known to follow <xref target="RFC2231"/> 
	    or to send unencoded UTF-8 or even unencoded strings
	    in the form-charset.
	  </t>
          <t>
	    Generally, interpreting <spanx
	    style='verb'>multipart/form-data</spanx> (even from
	    conforming generators) may require knowing the charset
	    used in form encoding, in cases where the _charset_ field
	    value or a charset parameter of a text/plain Content-Type
	    header is not supplied.
          </t>
	</section>
      </section>
      <section title="Ordered fields and duplicated field names">
	<t>
	  Form processors given forms with a
	  well-defined ordering SHOULD send back results in the order
	  received and preserve duplicate field names, in order.
	  Intermediaries MUST NOT reorder the results.(Note that
	  there are some forms which do not define a natural order
          of appearance.)
	</t>
      </section>	
      <section title="Interoperability with web applications">
	<t>
	  Many web applications use the "application/x-url-encoded"
	  method for returning data from forms. This format is quite
	  compact, e.g.:
	</t>
	<figure><artwork>
	  name=Xavier+Xantico&amp;verdict=Yes&amp;colour=Blue&amp;happy=sad&amp;Utf%F6r=Send
	</artwork></figure>
	<t>
	  However, there is no opportunity to label the enclosed
	  data with content type, apply a charset, or use other
	  encoding mechanisms.
	</t>
	<t>
	  Many form-interpreting programs (primarily web browsers)
	  now implement and generate multipart/form-data, but an
	  existing application might need to optionally support both
	  the application/x-url-encoded format as well.
	</t>
      </section>
      <section title="Correlating form data with the original form">
      <t>
	This specification provides no specific mechanism by which
	multipart/form-data can be associated with the form that
	caused it to be transmitted. This separation is intentional;
	many different forms might be used for transmitting the same
	data. In practice, applications may supply a specific form
	processing resource (in HTML, the ACTION attribute in a FORM
	tag) for each different form.  Alternatively, data about the
	form might be encoded in a "hidden field" (a field which is
	part of the form but which has a fixed value to be transmitted
	back to the form-data processor.)
      </t>
      </section>
    </section>
    <section title="Security Considerations" anchor="security">
      <t>
	It is important when interpreting the filename of the
	Content-Disposition header to not overwrite files in the
	recipient's file space inadvertently.
      </t>
      <t>
	User applications that request form information from users
	must be careful not to cause a user to send information to the
	requestor or a third party unwillingly or unwittingly. For
	example, a form might request 'spam' information to be sent to
	an unintended third party, or private information to be sent
	to someone that the user might not actually intend. While this
	is primarily an issue for the representation and
	interpretation of forms themselves (rather than the data
	representation of the form data),  the
	transportation of private information must be done in a way
	that does not expose it to unwanted prying.
      </t>
      <t>
	With the introduction of form-data that can reasonably send
	back the content of files from a user's file space, the
	possibility arises that a user might be sent an automated script that
	fills out a form and then sends one of the user's local files to
	another address. Thus, additional caution is required
	when executing automated scripting where form-data might
	include a user's files.
      </t>
    </section>
    <section title="Media type registration for multipart/form-data">
      <t>
	<list style='hanging'>
	<t hangText='Media Type name:'>
	  multipart
	</t>
	<t hangText='Media subtype name:'>
	  form-data
	</t>
	<t hangText='Required parameters:'>
	  boundary
	</t>
	<t hangText='Optional parameters:'>
	  none
	</t>
	<t hangText='Encoding considerations:'>
	  For use in transports that
	  restrict the encoding to 7BIT or 8BIT, each part
	  is encoded separately. 
	</t>
	<t hangText='Security considerations:'>
           Applications which receive forms and process them must be
           careful not to supply data back to the requesting form
           processing site that was not intended to be sent by the
           recipient. This is a consideration for any application that
           generates a multipart/form-data.
	   See <xref target="security"/> of this document.
	</t>
      </list>
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      &rfc.2046;
      &rfc.2047;
      &rfc.2231;
      &rfc.1806;
      &rfc.2183;
      &rfc.2184;
      
    </references>
    <references title="Informative References">
      &rfc.1867;
      &rfc.2388;
      <reference anchor='HTML5'
		 target='http://www.w3.org/html/wg/drafts/html/CR/'>
	<front>
	  <title>HTML5</title>
	  <author initials='R.' surname='Berjon' fullname='Robin Berjon'>
	    <organization />
	  </author>
	  <author initials='S.' surname='Faulkner'>
	    <organization />
	  </author>
	  <author initials='T.' surname='Leithead' fullname='Travis Leithead'>
	    <organization />
	  </author>
	  <author initials='E.' surname='Navara' fullname='Erika Doyle Navara'>
	    <organization />
	  </author>
	  <author initials='E.' surname='O&apos;Connor' fullname='Edward O&apos;Connor'>
	    <organization />
	  </author>
	  <author initials='S.' surname='Pfeiffer' fullname='Silvia Pfeiffer'>
	    <organization />
	  </author>
	  <date month='September' day='17' year='2013' />
	</front>
      </reference>	    
	    
      <reference anchor="HTML3.2" target="http://www.w3.org/TR/REC-html32-19970114">
	<front>
	  <title>HTML 3.2 Reference Specification</title>
	  <author initials="D." surname="Raggett" fullname="David Raggett">
	    <organization/>
	  </author>
	  <date month="January" day="14" year="1997"/>
	</front>
	<seriesInfo name="World Wide Web Consortium Recommendation" value="REC-html32-19970114"/>
	<format type="HTML" target="http://www.w3.org/TR/REC-html32-19970114"/>
      </reference>
      <reference anchor="HTML4" target="http://www.w3.org/TR/REC-html40-971218">
	<front>
	  <title>HTML 4.0 Recommendation</title>
	  <author initials="D." surname="Raggett" fullname="David Raggett">
	    <organization/>
	  </author>
	  <author initials="A." surname="Hors" fullname="Arnaud Le Hors">
	    <organization/>
	  </author>
	  <author initials="I." surname="Jacobs" fullname="Ian Jacobs">
	    <organization/>
	  </author>
	  <date month="December" day="18" year="1997"/>
	</front>
	<seriesInfo name="World Wide Web Consortium" value="REC-html40-971218"/>
	<format type="HTML" target="http://www.w3.org/TR/REC-html40-971218"/>
      </reference>

    </references>
    <section title="Changes from RFC 2388">

      <t>The handling of multiple files submitted as the result of a
      single form field (e.g., HTML's &lt;input type=file multiple&gt;
      element) results in each file having its own top level part with
      the same name parameter; the method of using a nested
      <spanx style='verb'>multipart/mixed</spanx>
      from <xref target="RFC2388"/> is not
      recommended.</t>

      <t>The _charset_ convention and use of an explicit form-data
      charset is documented.</t>

      <t>The handling of non-ASCII field names is changed significantly.
      Few if any implemented the =?charset:string?= method
      of <xref target="RFC2047"/>.
      </t>

      <t>The relationship of the ordering of fields within a form and
      the ordering of returned values within multipart/form-data was
      not defined before, nor was the handling of the case where a
      form has multiple fields with the same name. </t>

      <t>More prescriptive about order and duplicates.</t>

      <t>Remove obsolete discussion of alternatives.</t>


    </section>

    <section title="Alternatives">
      <t>There are numerous alternative ways in which form data can be
      encoded; many are listed in <xref target="RFC2388"/> under
      "Other data encodings rather than multipart."  The
      multipart/form-data encoding is verbose, especially if there are
      many fields with short values. In most use cases, this
      overhead isn't significant. </t>
      <t>More problematic is the ambiguity introduced because
      implementations did not follow <xref target="RFC2388"/>
      because it used "may" instead of "MUST" when specifying
      encoding of field names, and for other unknown reasons,
      so now, parsers need to be more complex for fuzzy matching
      against the possible outputs of various encoding methods.
      </t>
    </section>
  </back>
</rfc>
       
