INTERNET-DRAFT                                                  E. Nebel
Form-based File Upload in HTML                               L. Masinter
draft-ietf-html-fileupload-02.txt                      Xerox Corporation
Expires in 6 months                                       April 19, 1995

                     Form-based File Upload in HTML

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

1. Abstract

   Currently, HTML forms allow the producer of the form to request
   information from the user reading the form. These forms have proven
   useful in a wide variety of applications in which input from the
   user is necessary. However, this capability is limited because HTML
   forms don't provide a way to ask the user to submit files of data.
   Service providers who need to get files from the user have had to
   implement custom user applications. (Examples of these custom
   browsers have appeared on the www-talk mailing list.) Since
   file-upload is a feature that will benefit many applications, this
   draft proposes an extension to HTML to allow information providers
   to express file upload requests uniformly, and a MIME compatible
   representation for file upload responses.
   This draft also includes a description of a backward compatibility
   strategy that allows new servers to interact with the current HTML
   user agents.

   The proposal is independent of the version of HTML of which it
   becomes a part.

2. HTML forms with file submission

   The current draft HTML specification defines eight possible values
   for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
   PASSWORD, RADIO, RESET, SUBMIT, TEXT.

   In addition, it defines the ENCTYPE attribute of the FORM element
   using the POST METHOD to have the default value
   "application/x-www-form-urlencoded".

   This proposal makes three changes:
   1) Add a FILE option for the TYPE attribute of INPUT.
   2) Allow an ACCEPT attribute for the INPUT tag, which is a list of
      media types or type patterns allowed for the input.
   3) Allow the ENCTYPE of a FORM to be "multipart/form-data".

   These changes might be considered independently, but are all
   necessary for reasonable file upload.

   The author of an HTML form who wants to request one or more files
   from a user would write (for example):

      <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

      File to process: <INPUT NAME="userfile1" TYPE="file">

      <INPUT TYPE="submit" VALUE="Send File">

      </FORM>
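   Change 2 above lets a form restrict uploads with an ACCEPT list of
   media types or type patterns. As a rough illustration only, a user
   agent might test a candidate file against such a list along the
   following lines. The helper name accepts(), the comma-separated
   syntax with "*" wildcards, and the reliance on file-extension
   guessing are all assumptions made for this sketch; the proposal does
   not prescribe any of them.

```python
import fnmatch
import mimetypes


def accepts(accept_attr, filename):
    """Return True if the media type guessed for filename matches any
    pattern in a comma-separated ACCEPT list (hypothetical helper)."""
    guessed, _ = mimetypes.guess_type(filename)
    if guessed is None:
        # Unknown extension: fall back to the generic binary type.
        guessed = "application/octet-stream"
    patterns = [p.strip().lower() for p in accept_attr.split(",") if p.strip()]
    # Assumption: "*" in a pattern acts as a wildcard, e.g. "image/*".
    return any(fnmatch.fnmatchcase(guessed.lower(), p) for p in patterns)
```

   For instance, accepts("image/*, text/plain", "photo.gif") would be
   expected to succeed on a platform that maps ".gif" to image/gif,
   while accepts("image/*", "notes.txt") would not.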
   The change to the HTML DTD is to add one item to the entity
   "InputType". In addition, it is proposed that the INPUT tag have an
   ACCEPT attribute, which is a list of comma-separated media types.

      ... (other elements) ...

      <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                              RADIO | SUBMIT | RESET |
                              IMAGE | HIDDEN | FILE )">
      <!ELEMENT INPUT - O EMPTY>
      <!ATTLIST INPUT
              TYPE %InputType TEXT
              NAME CDATA #IMPLIED  -- required for all but submit and reset --
              VALUE CDATA #IMPLIED
              SRC %URI #IMPLIED  -- for image inputs --
              CHECKED (CHECKED) #IMPLIED
              SIZE CDATA #IMPLIED  -- like NUMBERS, but delimited with
                                      comma, not space --
              MAXLENGTH NUMBER #IMPLIED
              ALIGN (top|middle|bottom) #IMPLIED
              ACCEPT CDATA #IMPLIED  -- list of content types --
              >

      ... (other elements) ...

3. Suggested implementation

   While user agents that interpret HTML have wide leeway to choose the
   most appropriate mechanism for their context, this section suggests
   how one class of user agent, WWW browsers, might implement file
   upload.

3.1 Display of FILE widget

   When an INPUT tag of type FILE is encountered, the browser might show
   a display of (previously selected) file names, and a "Browse" button
   or selection method. Selecting the "Browse" button would cause the
   browser to enter into a file selection mode appropriate for the
   platform. Window-based browsers might pop up a file selection
   window, for example. In such a file selection dialog, the user would
   have the option of replacing a current selection, adding a new file
   selection, etc. Browser implementors might choose to let the list of
   file names be manually edited.

   If an ACCEPT attribute is present, the browser might constrain the
   file patterns prompted for to match those with the corresponding
   appropriate file extensions for the platform.

3.2 Action on submit

   When the user completes the form, and selects the SUBMIT element,
   the browser should send the form data and the content of the
   selected files. The encoding type application/x-www-form-urlencoded
   is inefficient for sending large quantities of binary data. Thus, a
   new media type, multipart/form-data, is proposed as a way of
   efficiently sending the values associated with a filled-out form
   from client to server.

3.3 Use of multipart/form-data

   The definition of multipart/form-data is included in section 7.
   The media-type multipart/form-data follows the rules of all
   multipart MIME data streams as outlined in RFC 1521: a boundary is
   selected that does not occur in any of the data. Each field of the
   form is sent, in the order in which it occurs in the form, as a part
   of the multipart stream. Each part identifies the INPUT name within
   the original HTML form using a "content-disposition: form-data" header
   with a name attribute specifying the field name. Each part has an
   optional Content-Type (which defaults to text/plain). File inputs
   should be identified as either application/octet-stream or the
   appropriate media type, if known. If multiple files were selected,
   they should be transferred together using the multipart/mixed
   format.

   The "content-transfer-encoding" header should be supplied for all
   fields whose values do not conform to the default 7BIT encoding
   (that is, all characters are 7-bit US-ASCII and no line is longer
   than 1000 characters). Otherwise, file data and longer field values
   may be transferred using a content-transfer-encoding appropriate to
   the protocol of the ACTION in the form. For HTTP applications, a
   content-transfer-encoding of "binary" may be used. If the ACTION is
   a "mailto:" URL, then the user agent may encode the data
   appropriately to the mail transport mechanism. [See section 5 of
   RFC 1521 for more details.]

   File inputs may optionally identify the file name using the
   "filename" attribute on the content-disposition header. This is not
   required, but is a convenience for those cases where, for
   example, the uploaded files might contain references to each other,
   e.g., a TeX file and its .sty auxiliary style description.

   On the server end, the ACTION might point to an HTTP URL that
   implements the form's action via CGI.
   In such a case, the CGI program would note that the content-type is
   multipart/form-data and parse the various fields (checking for
   validity, writing the file data to local files for subsequent
   processing, etc.).

3.4 Interpretation of other attributes

   The VALUE attribute might be used with <INPUT TYPE=FILE> tags for
   a default file name. This use is probably platform dependent.
   It might be useful, however, in sequences of more than one
   transaction, e.g., to avoid having the user prompted for the same
   file name over and over again.

   The SIZE attribute might be specified using SIZE=width,height, where
   width is some default for file name width, while height is the
   expected size showing the list of selected files. For example, this
   would be useful for forms designers who expect to get several files
   and who would like to show a multiline file input field in the
   browser (with a "browse" button beside it, hopefully). It would be
   useful to show a one line text field when no height is specified
   (when the forms designer expects one file, only) and to show a
   multiline text area with scrollbars when the height is greater than
   1 (when the forms designer expects multiple files).

4. Backward compatibility issues

   While not necessary for successful adoption of an enhancement to the
   current WWW form mechanism, it is useful to also plan for a
   migration strategy: users with older browsers can still participate
   in file upload dialogs, using a helper application. Most current web
   browsers, when given <INPUT TYPE=FILE>, will treat it as
   <INPUT TYPE=TEXT> and give the user a text box. The user can type a
   file name into this text box. In addition, current browsers seem to
   ignore the ENCTYPE parameter in the <FORM>
   element, and always transmit the data as
   application/x-www-form-urlencoded.

   Thus, the server CGI might be written in a way that would note that
   the form data returned had content-type
   application/x-www-form-urlencoded instead of
   multipart/form-data, and know that the user was using a browser
   that didn't implement file upload.

   In this case, rather than replying with a "text/html" response, the
   CGI on the server could instead send back a data stream that a helper
   application might process instead; this would be a data stream of
   type "application/x-please-send-files", which contains:

   * The (fully qualified) URL to which the actual form data should
     be posted (terminated with CRLF)
   * The list of field names that were supposed to be file contents
     (space separated, terminated with CRLF)
   * The entire original application/x-www-form-urlencoded form data
     as originally sent from client to server.

   In this case, the browser needs to be configured to process
   application/x-please-send-files to launch a helper application.

   The helper would read the form data, note which fields contained
   'local file names' that needed to be replaced with their data
   content, might itself prompt the user for changing or adding to the
   list of files available, and then repackage the data & file contents
   in multipart/form-data for retransmission back to the server.

   The helper would generate the kind of data that a 'new' browser should
   actually have sent in the first place, with the intention that the URL
   to which it is sent corresponds to the original ACTION URL. The point
   of this is that the server can use the *same* CGI to implement the
   mechanism for dealing with both old and new browsers.
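
   The server side of this fallback can be sketched as follows. This is
   a minimal illustration in Python, not a definitive implementation:
   the function names build_please_send_files() and fallback_response(),
   their arguments, and the example URL are hypothetical, and the body
   layout simply follows the three items listed above (URL line,
   space-separated field names line, then the original urlencoded data).

```python
def build_please_send_files(action_url, file_fields, urlencoded_body):
    """Assemble an application/x-please-send-files body (hypothetical helper).

    action_url      -- fully qualified URL the helper should post to
    file_fields     -- names of the fields that were meant to carry files
    urlencoded_body -- the original form data, as bytes, exactly as received
    """
    head = action_url + "\r\n" + " ".join(file_fields) + "\r\n"
    return head.encode("ascii") + urlencoded_body


def fallback_response(request_content_type, request_body, action_url, file_fields):
    """Return None when the client already sent multipart/form-data;
    otherwise return (content_type, body) asking a helper to resend."""
    media_type = request_content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/form-data":
        return None  # a 'new' browser: process the upload directly
    body = build_please_send_files(action_url, file_fields, request_body)
    return ("application/x-please-send-files", body)
```

   A CGI using this sketch would call fallback_response() first and fall
   through to its normal multipart handling when the result is None, so
   the same program serves both old and new browsers.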

   The helper need not display the form data, but *should* ensure that
   the user is actually prompted about the suitability of sending the
   files requested. (This is to avoid a security problem with malicious
   servers that ask for files that weren't actually promised by the
   user.) It would be useful if the status of the transfer of the files
   involved could be displayed.

5. Other considerations

5.1 Compression, encryption

   This scheme doesn't address the possible compression of files.
   After some consideration, it seemed that the optimization issues of
   file compression were too complex to try to automatically have
   browsers decide that files should be compressed. Many link-layer
   transport mechanisms (e.g., high-speed modems) perform data
   compression over the link, and optimizing for compression at this
   layer might not be appropriate. It might be possible for browsers to
   optionally produce a content-transfer-encoding of x-compress for
   file data, and for servers to decompress the data before processing,
   if desired; this was left out of the proposal, however.

   Similarly, the proposal does not contain a mechanism for encryption
   of the data; this should be handled by whatever other mechanisms are
   in place for secure transmission of data, whether via secure HTTP or
   mail.

5.2 Deferred file transmission

   In some situations, it might be advisable to have the server
   validate various elements of the form data (user name, account,
   etc.) before actually preparing to receive the data. However,
   after some consideration, it seemed best to require that servers
   that wish to do this should implement this as a series of forms,
   where some of the data elements that were previously validated might
   be sent back to the client as 'hidden' fields, or by arranging the
   form so that the elements that need validation occur first.
   This puts the onus of maintaining the state of a transaction only on
   those servers that wish to build a complex application, while
   allowing those cases that have simple input needs to be built
   simply. Clients are encouraged to supply content-length for overall
   file input so that a busy server could detect if the proposed file
   data is too large to be processed reasonably and just return an
   error code and close the connection without waiting to process all
   of the incoming data.

   If the INPUT tag includes the attribute MAXLENGTH, the user agent
   should consider its value to represent the maximum Content-Length
   (in bytes) which the server will accept for transferred files. In
   this way, servers can hint to the client how much space they have
   available for a file upload, before that upload takes place. It is
   important to note, however, that this is only a hint, and the actual
   requirements of the server may change between form creation and file
   submission.

5.3 Other choices for return transmission of binary data

   Various people have suggested using a new MIME top-level type
   "aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of
   "packet" to express indeterminate-length binary data, rather than
   relying on the multipart-style boundaries. While we are not opposed
   to doing so, this would require additional design and
   standardization work to get acceptance of "aggregate". On the other
   hand, the 'multipart' mechanisms are well established, simple to
   implement on both the sending client and receiving server, and as
   efficient as other methods of dealing with multiple combinations of
   binary data.

5.4 Not overloading <INPUT>:

   Various people have wondered about the advisability of overloading
   'INPUT' for this function, rather than merely providing a different
   type of FORM element.
   Among other considerations, the migration strategy which is allowed
   when using <INPUT> is important. In addition, the <INPUT> field *is*
   already overloaded to contain most kinds of data input; rather than
   creating multiple kinds of <INPUT> tags, it seems most reasonable to
   enhance <INPUT>. The 'type' of INPUT is not the content-type of what
   is returned, but rather the 'widget-type'; i.e., it identifies the
   interaction style with the user. The description here is carefully
   written to allow <INPUT TYPE=FILE> to work for text browsers or
   audio-markup.

5.5 Default content-type of field data

   Many input fields in HTML are to be typed in. There has been some
   ambiguity as to how form data should be transmitted back to servers.
   Making the default content-type of fields text/plain makes it
   unambiguous that the client should encode the data properly, with
   CRLF line breaks, before sending it back to the server.

5.6 Allow form ACTION to be "mailto:"

   Independent of this proposal, it would be very useful for HTML
   interpreting user agents to allow an ACTION in a form to be a
   "mailto:" URL. This seems like a good idea, with or without this
   proposal. Similarly, the ACTION for an HTML form which is received
   via mail should probably default to the "reply-to:" of the message.
   These two proposals would allow HTML forms to be served via HTTP
   servers but sent back via mail, or, alternatively, allow HTML forms
   to be sent by mail, filled out by HTML-aware mail recipients, and
   the results mailed back.

5.7 Remote files with third-party transfer

   In some scenarios, the user operating the client software might want
   to specify a URL for remote data rather than a local file. In this
   case, is there a way to allow the client to send to the server a
   pointer to the external data rather than the entire contents?
   This capability could be implemented, for example, by having the
   client send to the server data of type "message/external-body" with
   "access-type" set to, say, "uri", and the URL of the remote data in
   the body of the message.

5.8 File transfer with ENCTYPE=x-www-form-urlencoded

   If a form contains <INPUT TYPE=FILE> elements but does not contain
   an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
   It is probably inappropriate to attempt to URL-encode large
   quantities of data to servers that don't expect it.

5.9 CRLF used as line separator

   As with all MIME transmissions, CRLF is used as the separator for
   lines in a POST of the data in multipart/form-data.

6. Examples

   Suppose the server supplies the following HTML:

      <FORM ACTION="_URL_"
            ENCTYPE="multipart/form-data"
            METHOD=POST>
      What is your name? <INPUT TYPE=TEXT NAME=field1>
      What files are you sending? <INPUT TYPE=FILE NAME=pics>
      </FORM>

   and the user types "Joe Blow" in the name field, and selects
   a text file "file1.txt" and also an image file "file2.gif" for
   the answer to 'What files are you sending?'.

   The client would send back the following data:

      Content-type: multipart/form-data; boundary=AaB03x

      --AaB03x
      content-disposition: form-data; name="field1"

      Joe Blow
      --AaB03x
      content-disposition: form-data; name="pics"
      Content-type: multipart/mixed; boundary=BbC04y

      --BbC04y
      Content-disposition: attachment; filename="file1.txt"
      Content-Type: text/plain
      Content-Transfer-Encoding: binary

      ... contents of file1.txt ...
      --BbC04y
      Content-disposition: attachment; filename="file2.gif"
      Content-type: image/gif
      Content-Transfer-Encoding: binary

      ... contents of file2.gif ...
      --BbC04y--
      --AaB03x--

7. Registration of multipart/form-data

   The media-type multipart/form-data follows the rules of all
   multipart MIME data streams as outlined in RFC 1521. It is intended
   for use in returning the data that comes about from filling out a
   form. In a form (in HTML, although other applications may also use
   forms), there are a series of fields to be supplied by the user who
   fills out the form. Each field has a name. The name of the field
   is restricted to be a set of US-ASCII graphic characters; within a
   given form, the names are unique.

   multipart/form-data contains a series of parts. Each part is expected
   to contain a content-disposition header where the value is
   "form-data" and a name attribute specifies the field name within the
   form, e.g., 'content-disposition: form-data; name="xxxxx"', where
   xxxxx is the field name corresponding to that field. As with all
   multipart MIME types, each part has an optional Content-Type which
   defaults to text/plain.

   Note that MIME headers are generally required to consist only of
   7-bit data in the US-ASCII character set.
   This specification thus requires that the field names used consist
   of 7-bit US-ASCII characters.

   If the contents of a file are returned via filling out a form, then
   the file input is identified as application/octet-stream or the
   appropriate media type, if known. If multiple files are to be
   returned as the result of a single form entry, they can be returned
   as multipart/mixed embedded within the multipart/form-data.

   The "content-transfer-encoding" header should be supplied for all
   fields whose values do not conform to the default 7BIT encoding
   (that is, all characters are 7-bit US-ASCII and no line is longer
   than 1000 characters).

   Otherwise, file data and longer field values may be
   transferred using a content-transfer-encoding appropriate to the
   protocol of the ACTION in the form. For HTTP applications, a
   content-transfer-encoding of "binary" may be used. If the ACTION is
   a "mailto:" URL, then the user agent may encode the data
   appropriately to the mail transport mechanism. [See section 5 of
   RFC 1521 for more details.]

   File inputs may also identify the file name. The file name may be
   described using the 'filename' parameter of the
   "content-disposition" header. This is not required, but is strongly
   recommended in any case where the original filename is known. This
   is useful or necessary in many applications.

8. Security Considerations

   It is important that a user agent not send any file that the user
   has not explicitly asked to be sent. Thus, HTML interpreting agents
   are expected to confirm any default file names that might be
   suggested with the VALUE attribute of <INPUT TYPE=FILE>. Hidden
   fields must never be able to specify a file.

9. Conclusion

   The suggested implementation gives the client a lot of flexibility in
   the number and types of files it can send to the server; it gives the
   server control of the decision to accept the files; and it gives
   servers a chance to interact with browsers which do not support INPUT
   TYPE "file".

   The change to the HTML DTD is very simple, but very powerful. It
   enables a much greater variety of services to be implemented via the
   World-Wide Web than is currently possible due to the lack of a file
   submission facility. This would be an extremely valuable addition to
   the capabilities of the World-Wide Web.

A. Authors' Addresses

   Larry Masinter                         masinter@parc.xerox.com
   Xerox Palo Alto Research Center        Voice: (415) 812-4365
   3333 Coyote Hill Road                  Fax:   (415) 812-4333
   Palo Alto, CA 94304

   Ernesto Nebel                          nebel@xsoft.sd.xerox.com
   XSoft, Xerox Corporation               Voice: (619) 676-7817
   10875 Rancho Bernardo Road, Suite 200  Fax:   (619) 676-7865
   San Diego, CA 92127-2116

B. Media type registration for multipart/form-data

   Media Type name:
      multipart

   Media subtype name:
      form-data

   Required parameters:
      none

   Optional parameters:
      none

   Encoding considerations:
      No additional considerations other than as for other multipart
      types.

   Published specification:
      draft-ietf-html-fileupload-02.txt

   Security Considerations:
      The multipart/form-data type introduces no new security
      considerations beyond what might occur with any of the enclosed
      parts.

   Person & email address to contact for further information:
      Larry Masinter
      masinter@parc.xerox.com
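
   To make the wire format registered above more concrete, the following
   Python sketch assembles a multipart/form-data body in the style of
   the example in section 6. It is an illustration under stated
   assumptions rather than a reference encoder: the function name
   encode_form_data() and its tuple layout are hypothetical, it handles
   only one file per field (the multipart/mixed nesting for several
   files is not shown), and it rejects a boundary that occurs in the
   data instead of choosing a new one.

```python
CRLF = b"\r\n"


def encode_form_data(fields, boundary):
    """Encode fields as a multipart/form-data body (hypothetical helper).

    fields   -- list of (name, value_bytes, filename_or_None, content_type_or_None)
    boundary -- ASCII boundary string; must not occur in any value
    """
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for name, value, filename, content_type in fields:
        if delimiter in value:
            # The boundary must not appear in any of the data.
            raise ValueError("boundary occurs in data for field %r" % name)
        disposition = 'content-disposition: form-data; name="%s"' % name
        if filename is not None:
            disposition += '; filename="%s"' % filename
        out.append(delimiter)
        out.append(disposition.encode("ascii"))
        if content_type is not None:
            # Omitted Content-Type means the part defaults to text/plain.
            out.append(("Content-Type: %s" % content_type).encode("ascii"))
        out.append(b"")      # blank line separates part headers from content
        out.append(value)
    out.append(delimiter + b"--")  # closing delimiter
    out.append(b"")
    return CRLF.join(out)
```

   The result would be sent as the body of a POST whose own header
   carries "Content-type: multipart/form-data" together with the same
   boundary parameter.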