INTERNET-DRAFT                                              J. Salsman
Filename: draft-salsman-www-device-upload-01.txt        Bovik Research
Expiration date: 1 July 1998                            4 January 1998

              Form-based Device Input and Upload in HTML

Status of this Memo

   This draft extends an experimental protocol for the Internet
   community. This draft does not specify an Internet standard of any
   kind. Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than
   as "work in progress."

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

1. Abstract

   Currently, HTML forms allow the producer of the form to request
   information -- including files of data -- from the operator reading
   the form. However, this capability is limited because HTML forms
   don't provide a way to ask the operator to submit input from
   arbitrary sources such as audio devices like microphones. Since
   input and upload from various devices would benefit many
   applications, this draft proposes an extension to the HTML
   INPUT TYPE=FILE form element specified in RFC 1867 to allow
   information providers to express requests for uploads from audio
   and other devices uniformly. A discussion of MIME audio data
   types to facilitate useful audio upload responses follows, along
   with security considerations, audio usability and quality
   concerns, and a backward compatibility strategy that allows new
   user agents to use HTML written with earlier proposals for audio
   input in mind. The draft concludes with motivations, including
   language instruction assistance, voice transcription, and other
   applications.

2. HTML forms with device input file upload submission

   Section 3.1 of RFC 1867 provides for the presentation of an
   arbitrary "widget" to specify input for file uploads.
   When an INPUT tag of type FILE is encountered with a DEVICE
   attribute, the associated value (such as MICROPHONE, or MIC) might
   select the use of a widget capable of buffering and editing
   real-time input (such as speech) instead of entering a file
   selection mode.

   If an ACCEPT attribute is present in a device file input element,
   the browser might constrain the MIME type of the uploaded data to
   one of the types in the specified list. If the value of the DEVICE
   attribute is FILESYSTEM or FILES, then the INPUT element might be
   treated as usual according to RFC 1867, except that the subset of
   files presented to the operator to choose from may be constrained
   by the specified list of MIME types instead of a pattern of file
   names or extensions.

   Device input has no original filename of the kind specified in
   section 3.3 of RFC 1867 as a parameter of the 'content-disposition:
   form-data' and 'content-disposition: file' HTTP headers. Those
   headers might instead be provided with a 'type' parameter
   representing the MIME type of the encoded data, if known, and a
   'device' parameter with the same value as the DEVICE attribute of
   the associated form input element. If the device or the MIME
   type(s) specified are unsupported, the value of the 'device'
   parameter might be 'unsupported'; if the device is unavailable,
   the value might be 'unavailable'. If the MIME types requested are
   unsupported, an additional parameter 'alternates' might be included
   with a space-separated list of MIME types of the same content-type
   which may be supported as alternatives for the specified device.
   The content-disposition header parameter syntax is described in
   RFC 1806.

   There may be significant limitations on the client browser's
   ability to buffer input for upload.
   Browsers might provide an estimate of the default MAXLENGTH
   available for device input and upload through the HTTP header
   'Pragma: DEVICE-MAXLENGTH=BYTES', where BYTES represents the
   content-length available to the browser for buffering (see section
   14.32 of RFC 2068).

   Furthermore, the VALUE attribute may be used to distinguish between
   multiple similar devices when more than one is present.

   If real-time events, such as those described and proposed by
   Gregory S. Aist in "A General Architecture for a Real-Time
   Discourse Agent and a Case Study in Computerized Oral Reading
   Tutoring" (Carnegie Mellon University Computational Linguistics
   Program, 6 December 1996), are required, then the Real-time
   Transport Protocol (RTP, currently RFC 1889) should be used
   instead. Because of the security concerns discussed in section 4
   below, HTML scripts might not be able to invoke a form submission
   when the form involves any kind of file upload without explicit
   instructions from the session operator to the contrary.

2.1. Examples
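
   As a hedged illustration of the header parameters described in
   section 2, the following Python sketch builds the
   'content-disposition' and 'Pragma' header lines a browser might
   send. The function names, the quoting of parameter values, and the
   sample field name are assumptions made for illustration; the draft
   names the 'type', 'device', and 'alternates' parameters but does
   not fix their exact syntax.

```python
def device_disposition(name, device, mime_type=None, *,
                       supported=True, available=True, alternates=()):
    """Build a 'content-disposition: form-data' line for device input.

    Hypothetical helper: the parameter quoting is an assumption, not
    something the draft specifies.
    """
    # The 'device' parameter echoes the DEVICE attribute unless the
    # device or MIME type is unsupported, or the device is unavailable.
    if not supported:
        device_value = "unsupported"
    elif not available:
        device_value = "unavailable"
    else:
        device_value = device

    params = ['name="%s"' % name]
    if mime_type:
        # 'type' is only sent when the MIME type of the data is known.
        params.append('type="%s"' % mime_type)
    params.append('device="%s"' % device_value)
    if not supported and alternates:
        # Space-separated list of MIME types offered as alternatives.
        params.append('alternates="%s"' % " ".join(alternates))
    return "content-disposition: form-data; " + "; ".join(params)


def device_maxlength_pragma(buffer_bytes):
    """Build the 'Pragma: DEVICE-MAXLENGTH=BYTES' line for a buffer size."""
    return "Pragma: DEVICE-MAXLENGTH=%d" % buffer_bytes
```

   For instance, device_disposition("speech", "MIC", "audio/L16")
   yields a header carrying type="audio/L16" and device="MIC", while
   passing supported=False replaces the device value with
   'unsupported' and appends any alternates offered.
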
      Say something:
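
   A minimal sketch, in Python, of how a user agent might choose a
   widget for such a form under the rules of section 2. The form
   markup in HYPOTHETICAL_FORM is an assumed example written for this
   sketch (its NAME and ACTION values are invented), as are the class,
   function, and widget names; treating a missing DEVICE attribute as
   FILES is likewise an assumption.

```python
from html.parser import HTMLParser

# DEVICE values that keep the ordinary RFC 1867 file selection mode.
FILE_DEVICES = {"FILESYSTEM", "FILES"}

# Assumed markup for illustration only; not taken from the draft.
HYPOTHETICAL_FORM = (
    '<FORM ENCTYPE="multipart/form-data" ACTION="/upload" METHOD=POST>'
    'Say something: <INPUT NAME="speech" TYPE=FILE DEVICE=MIC>'
    '<INPUT TYPE=SUBMIT VALUE="Send"></FORM>'
)


class DeviceInputFinder(HTMLParser):
    """Collect (name, device, widget) for each INPUT of type FILE."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").upper() != "FILE":
            return
        # Assumption: no DEVICE attribute means plain file selection.
        device = (attributes.get("device") or "FILES").upper()
        if device in FILE_DEVICES:
            widget = "file-selection"
        else:
            widget = "device-capture"
        self.inputs.append((attributes.get("name"), device, widget))


def find_device_inputs(markup):
    """Return the file inputs found in an HTML fragment."""
    finder = DeviceInputFinder()
    finder.feed(markup)
    finder.close()
    return finder.inputs
```

   Here find_device_inputs(HYPOTHETICAL_FORM) reports one file input
   named "speech" whose DEVICE value MIC selects the capture widget;
   the submit button is ignored.
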

   In this simple form, the HTML author has requested the upload of
   sampled microphone input from the operator upon form submission.

   Here MIC is not used as an abbreviation. The author of the HTML has
   requested that the data input from the microphone be encoded as
   either the MIME type Audio/L16 -- sixteen-bit signed linear audio
   samples (most-significant byte first) -- as specified in RFC 1890
   section 4.4.8, with a single (monaural) channel and a sample rate
   of 11,025 samples per second, or an unspecified extended MIME Audio
   type named 'x-cepstral-voc'.

   Here the form element may be used to upload a file as usual, except
   that the files to select from might be constrained to text files,
   without explicit regard to their filenames or extensions.

   The final example shows how these extensions may be used to request
   input from other kinds of devices, such as the second of two or
   more cameras connected to the system running the browser. Under
   most conditions the operator should be allowed to select the device
   input, or re-select it if it was specified with a VALUE attribute.

3. User interface usability and quality concerns for audio

   An audio sample is customarily recorded on computer equipment with
   a dialog routine capable of allowing the user to record, pause,
   play back, erase, or otherwise edit the recording. Browsers might
   provide the operator with the same kind of dialog routine for audio
   device input. If a MAXLENGTH has been specified or is in force
   because of limited buffer size, the buffer space used and remaining
   might be displayed as a dynamic bar graph (or as a percentage if
   graphics are unavailable). A display of time in seconds used and
   remaining in the buffer may also be provided.

   Most MIME types defined for audio do not provide high-quality audio
   encodings.
   The 'audio/basic' and other types which use a sample rate of 8,000
   samples per second truncate the audio spectrum at 4,000 Hz
   according to the Nyquist theorem, discarding information important
   for discerning consonants. Also, audio/basic and other MIME Audio
   types use a sample size of eight bits, which does not usually
   provide enough dynamic range for accurate automatic speech
   recognition unless published automatic gain control algorithms are
   reliably used. If sixteen-bit signed audio encodings are used
   according to section 4.4.8 of RFC 1890, the sample rate --
   specified as the 'rate' parameter of the MIME type 'audio/l16' --
   might be at least 11,025 or 16,000 samples per second to provide
   sufficient information for automatic speech recognition. Otherwise,
   the audio feature extraction encoding of the speech recognition
   algorithm might be used to provide a more compact representation to
   shorten the upload.

4. Security considerations

   Browser operators may not want to send their files, recordings,
   pictures, video, or other device inputs to arbitrary sites without
   their explicit permission and direction. Therefore, browser
   authors are encouraged to disallow the submission of forms which
   include any kind of file upload by any means other than the
   standard HTML operator-controlled buttons for form submission,
   unless the session operator has given explicit instruction to the
   contrary. Accordingly, the SIZE attribute, document style sheets,
   and document layers may be prevented from obscuring any kind of
   file upload widget, especially those capable of accepting a default
   filename.
   Furthermore, just as the operator may take direct action to
   initiate, terminate, review, and edit recording as described in
   the previous section, browser authors are encouraged to prevent
   HTML scripts from taking those and similar actions, unless, for
   example, the operator has specifically enabled such script actions
   with a security option. Even then, such preferences might be
   specified by the operator to reset after an interval or at the end
   of the session. Finally, explicit information might be provided to
   ensure that the operator is informed when files are being uploaded.

5. Compatibility with earlier forms of audio input

   Audio device input has been proposed before and implemented from a
   microphone at least as early as 1994 in experimental versions of
   common Web browsers. To accommodate the syntax of these earlier
   extensions, a browser might interpret a valid XML statement such as

   as the device input form

   with all other attribute/value pairs of the original INPUT element
   kept the same as specified. This would retain compatibility for
   all implementations of which the author of this draft is aware.

6. HTML Document Type Definition changes

   Along with the extension to the HTML InputType entity described in
   the previous section, this proposal makes an addition to the HTML
   DTD for the INPUT element ATTLIST of an #IMPLIED attribute DEVICE
   of type CDATA.

7. Motivations and conclusion

   The primary motivation for these extensions is to add the
   capability of speech input to Web-based educational systems. For
   example, the "Test of English as a Foreign Language," or TOEFL
   assessment consists of multiple choice questions based on media
   composed of text and audio recordings, so it would be possible to
   represent the TOEFL with current HTML multimedia content and forms.
   However, the TOEFL makes no provision whatsoever for assessing the
   accuracy of pronunciation by the subjects of the assessment, except
   that provided by the ability to accurately identify the terms in
   the text of the assessment. So while scoring the important ability
   to listen, the TOEFL does not make provisions to assess the
   important ability to speak with correct pronunciation. But with
   form-based audio input and upload, and speech recognition servers
   capable of aligning and scoring the pronunciation of words and
   phonemes, such a Web-based TOEFL could be extended to reduce the
   number of graduate teaching assistants whose speech is difficult
   to understand, for example. These possibilities for language
   instruction are not limited to the graduate level or to the
   English language.

   Other motivations include the development of "dictation servers"
   capable of transforming spoken audio uploaded through an HTTP
   session to the corresponding text suitable for sending in email or
   including in another document, for example. Natural language
   continuous speech recognition software conforming to standard APIs
   for automatic dictation is as of this writing available from retail
   outlets for less than US$90, so there is ample reason to believe
   that dictation servers might soon become commonplace on the Web
   with these extensions.

   Furthermore, this could be a great help for hearing impaired people
   who want to use a "phonology server" (similar to the servers
   described above) to practice improving their pronunciation without
   depending on a human speech coach.

   Finally, Larry Masinter, author of RFC 1867, has indicated that
   graphical paper scanners might be used for applications such as
   OCR and bar-code upload. "DEVICE=SCANNER" is suggested.

   The change to the HTML DTD is very simple, but very powerful.
   It enables a much greater variety of services to be implemented via
   the World-Wide Web than is currently possible due to the lack of a
   peripheral input upload submission facility. This would be a very
   valuable addition to the capabilities of the World-Wide Web.

8. Author's address and acknowledgments

   James Salsman
   Bovik Research
   575 S. Rengstorff Avenue #175
   Mountain View, CA 94040-1982

   Email: jps@bovik.org
   Phone: (650) 938-1440

   Larry Masinter and Harald Alvestrand contributed excellent advice.
   Ed Tecot contributed the means of device and media independence.
   David McMillian contributed to the description of capabilities of
   the audio widget. Syracuse Language Systems, The Learning Co.,
   and EduSoft, Ltd., contributed much of the inspiration; Jack Mostow
   and Greg Aist filled in the real-time details for younger grades.

   "TOEFL" and "Test Of English as a Foreign Language" are
   registered trademarks of Educational Testing Service.

References

   [RFC 1867] Form-based File Upload in HTML. E. Nebel & L. Masinter,
              November 1995.

   [RFC 1806] Communicating Presentation Information in Internet
              Messages: The Content-Disposition Header. R. Troost &
              S. Dorner, June 1995.

   [RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
              J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
              January 1997.

   [RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
              H. Schulzrinne, S. Casner, R. Frederick, & V. Jacobson,
              January 1996.

   [RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
              Control. H. Schulzrinne, January 1996.

END OF INTERNET-DRAFT
Filename: draft-salsman-www-device-upload-01.txt
Expiration date: 1 July 1998