idnits 2.17.1 draft-ietf-822ext-mime-imt-04.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-23) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? 
** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 311: '...y MIME text type MUST represent a line...' RFC 2119 keyword, line 313: '...in text MUST represent a line break. ...' RFC 2119 keyword, line 399: '...racter encodings MUST use an appropria...' RFC 2119 keyword, line 472: '...CII characters, it SHOULD be marked as...' RFC 2119 keyword, line 864: '...undary delimiter MUST NOT appear insid...' (8 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 483 has weird spacing: '...of text is "p...' == Line 959 has weird spacing: '...F (line break...' == Line 1738 has weird spacing: '...ed, the defau...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 1996) is 10266 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC-1563' on line 324 looks like a reference -- Missing reference section? 'RFC-821' on line 368 looks like a reference -- Missing reference section? 'ISO-646' on line 380 looks like a reference -- Missing reference section? 'US-ASCII' on line 430 looks like a reference -- Missing reference section? 'ISO-8859' on line 433 looks like a reference -- Missing reference section? 'JPEG' on line 508 looks like a reference -- Missing reference section? 'PCM' on line 538 looks like a reference -- Missing reference section? 'MPEG' on line 556 looks like a reference -- Missing reference section? 'POSTSCRIPT' on line 646 looks like a reference -- Missing reference section? 'POSTSCRIPT2' on line 647 looks like a reference -- Missing reference section? 'MIME-IMB' on line 881 looks like a reference -- Missing reference section? 'RFC-959' on line 1735 looks like a reference -- Missing reference section? 'RFC-783' on line 1730 looks like a reference Summary: 9 errors (**), 0 flaws (~~), 4 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Nathaniel Borenstein 3 Internet Draft Ned Freed 4 6 Multipurpose Internet Mail Extensions 7 (MIME) Part Two: 9 Media Types 11 March 1996 13 Status of this Memo 15 This document is an Internet-Draft. Internet-Drafts are 16 working documents of the Internet Engineering Task Force 17 (IETF), its areas, and its working groups. Note that other 18 groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months. 
Internet-Drafts may be updated, replaced, or obsoleted 23 by other documents at any time. It is not appropriate to use 24 Internet-Drafts as reference material or to cite them other 25 than as a "working draft" or "work in progress". 27 To learn the current status of any Internet-Draft, please 28 check the 1id-abstracts.txt listing contained in the 29 Internet-Drafts Shadow Directories on ds.internic.net (US East 30 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), 31 or munnari.oz.au (Pacific Rim). 33 1. Abstract 35 STD 11, RFC 822 defines a message representation protocol 36 specifying considerable detail about US-ASCII message headers, 37 and leaves the message content, or message body, as flat 38 US-ASCII text. This set of documents, collectively called the 39 Multipurpose Internet Mail Extensions, or MIME, redefines the 40 format of messages to allow for 41 (1) textual message bodies in character sets other than 42 US-ASCII, 44 (2) an extensible set of different formats for non-textual 45 message bodies, 47 (3) multi-part message bodies, and 49 (4) textual header information in character sets other than 50 US-ASCII. 52 These documents are based on earlier work documented in RFC 53 934, STD 11, and RFC 1049, but extend and revise them. 54 Because RFC 822 said so little about message bodies, these 55 documents are largely orthogonal to (rather than a revision 56 of) RFC 822. 58 The initial document in this set, RFC MIME-IMB, specifies the 59 various headers used to describe the structure of MIME 60 messages. This second document defines the general structure 61 of the MIME media typing system and defines an initial set of 62 media types. The third document, RFC MIME-HEADERS, describes 63 extensions to RFC 822 to allow non-US-ASCII text data in 64 Internet mail header fields. The fourth document, RFC MIME- 65 REG, specifies various IANA registration procedures for MIME- 66 related facilities. 
The fifth and final document, RFC MIME- 67 CONF, describes MIME conformance criteria as well as providing 68 some illustrative examples of MIME message formats, 69 acknowledgements, and the bibliography. 71 These documents are revisions of RFCs 1521 and 1522, which 72 themselves were revisions of RFCs 1341 and 1342. An appendix 73 in RFC MIME-CONF describes differences and changes from 74 previous versions. 76 2. Table of Contents 78 1 Abstract .............................................. 1 79 2 Table of Contents ..................................... 3 80 3 Introduction .......................................... 4 81 4 Definition of a Top-Level Media Type .................. 5 82 5 Overview Of The Initial Top-Level Media Types ......... 5 83 6 Discrete Media Type Values ............................ 7 84 6.1 Text Media Type ..................................... 7 85 6.1.1 Representation of Line Breaks ..................... 8 86 6.1.2 Charset Parameter ................................. 8 87 6.1.3 Plain Subtype ..................................... 12 88 6.1.4 Unrecognized Subtypes ............................. 12 89 6.2 Image Media Type .................................... 12 90 6.3 Audio Media Type .................................... 13 91 6.4 Video Media Type .................................... 13 92 6.5 Application Media Type .............................. 14 93 6.5.1 Octet-Stream Subtype .............................. 15 94 6.5.2 PostScript Subtype ................................ 16 95 6.5.3 Other Application Subtypes ........................ 19 96 7 Composite Media Type Values ........................... 19 97 7.1 Multipart Media Type ................................ 20 98 7.1.1 Common Syntax ..................................... 21 99 7.1.2 Handling Nested Messages and Multiparts ........... 28 100 7.1.3 Mixed Subtype ..................................... 28 101 7.1.4 Alternative Subtype ............................... 
28 102 7.1.5 Digest Subtype .................................... 31 103 7.1.6 Parallel Subtype .................................. 32 104 7.1.7 Other Multipart Subtypes .......................... 33 105 7.2 Message Media Type .................................. 33 106 7.2.1 RFC822 Subtype .................................... 34 107 7.2.2 Partial Subtype ................................... 34 108 7.2.2.1 Message Fragmentation and Reassembly ............ 36 109 7.2.2.2 Fragmentation and Reassembly Example ............ 37 110 7.2.3 External-Body Subtype ............................. 39 111 7.2.4 Other Message Subtypes ............................ 47 112 8 Experimental Media Type Values ........................ 47 113 9 Summary ............................................... 48 114 10 Security Considerations .............................. 48 115 11 Authors' Addresses ................................... 49 116 A Collected Grammar ..................................... 50 117 3. Introduction 119 The first document in this set, RFC MIME-IMB, defines a number 120 of header fields, including Content-Type. The Content-Type 121 field is used to specify the nature of the data in the body of 122 a MIME entity, by giving media type and subtype identifiers, 123 and by providing auxiliary information that may be required 124 for certain media types. After the type and subtype names, 125 the remainder of the header field is simply a set of 126 parameters, specified in an attribute/value notation. The 127 ordering of parameters is not significant. 129 In general, the top-level media type is used to declare the 130 general type of data, while the subtype specifies a specific 131 format for that type of data. Thus, a media type of 132 "image/xyz" is enough to tell a user agent that the data is an 133 image, even if the user agent has no knowledge of the specific 134 image format "xyz". 
Such information can be used, for 135 example, to decide whether or not to show a user the raw data 136 from an unrecognized subtype -- such an action might be 137 reasonable for unrecognized subtypes of text, but not for 138 unrecognized subtypes of image or audio. For this reason, 139 registered subtypes of text, image, audio, and video should 140 not contain embedded information that is really of a different 141 type. Such compound formats should be represented using the 142 "multipart" or "application" types. 144 Parameters are modifiers of the media subtype, and as such do 145 not fundamentally affect the nature of the content. The set 146 of meaningful parameters depends on the media type and 147 subtype. Most parameters are associated with a single 148 specific subtype. However, a given top-level media type may 149 define parameters which are applicable to any subtype of that 150 type. Parameters may be required by their defining media type 151 or subtype or they may be optional. MIME implementations must 152 also ignore any parameters whose names they do not recognize. 154 MIME's Content-Type header field and media type mechanism has 155 been carefully designed to be extensible, and it is expected 156 that the set of media type/subtype pairs and their associated 157 parameters will grow significantly over time. Several other 158 MIME facilities, such as transfer encodings and 159 message/external-body access types, are likely to have new 160 values defined over time. In order to ensure that the set of 161 such values is developed in an orderly, well-specified, and 162 public manner, MIME sets up a registration process which uses 163 the Internet Assigned Numbers Authority (IANA) as a central 164 registry for MIME's various areas of extensibility. The 165 registration process for these areas is described in a 166 companion document, RFC MIME-REG. 
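The Content-Type structure described in this section (a type, a subtype, and an unordered set of attribute/value parameters, with unrecognized parameter names ignored) can be illustrated with a minimal Python sketch. The function names and the KNOWN_TEXT_PARAMS table are hypothetical and exist only for this illustration. Lowercasing of type, subtype, and parameter names assumes the case-insensitivity rules of the companion document RFC MIME-IMB. The parser is deliberately simplified: it does not handle quoted values containing ";" or "=", nor RFC 822 comments.

```python
def parse_content_type(value):
    """Split a Content-Type value into (type, subtype, params).

    Simplified sketch: quoted-string values containing ';' or '=' and
    RFC 822 comments are NOT handled.
    """
    media, *raw_params = value.split(";")
    top, _, sub = media.strip().partition("/")
    params = {}
    for item in raw_params:
        name, sep, val = item.partition("=")
        if sep:  # skip fragments that are not attribute=value pairs
            # Parameter names are folded to lower case; values are kept
            # as written, minus surrounding whitespace and quotes.
            params[name.strip().lower()] = val.strip().strip('"')
    return top.lower(), sub.lower(), params


# Hypothetical table of parameter names this implementation understands
# for the "text" top-level type.
KNOWN_TEXT_PARAMS = {"charset"}


def recognized_params(params, known):
    """Drop parameters whose names are not recognized."""
    return {k: v for k, v in params.items() if k in known}


top, sub, params = parse_content_type(
    "Text/Plain; x-example=1; charset=ISO-8859-1"
)
usable = recognized_params(params, KNOWN_TEXT_PARAMS)
```

Because the parameters are collected into a dictionary, their order in the header field has no effect on the result, which matches the rule that parameter ordering is not significant.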
168 The initial seven standard top-level media types are defined 169 and described in the remainder of this document. 171 4. Definition of a Top-Level Media Type 173 The definition of a top-level media type consists of: 175 (1) a name and a description of the type, including 176 criteria for whether a particular type would qualify 177 under that type, 179 (2) the names and definitions of parameters, if any, which 180 are defined for all subtypes of that type (including 181 whether such parameters are required or optional), 183 (3) how a user agent and/or gateway should handle unknown 184 subtypes of this type, 186 (4) general considerations on gatewaying entities of this 187 top-level type, if any, and 189 (5) any restrictions on content-transfer-encodings for 190 entities of this top-level type. 192 5. Overview Of The Initial Top-Level Media Types 194 The five discrete top-level media types are: 196 (1) text -- textual information. The subtype "plain" in 197 particular indicates plain text containing no 198 formatting commands or directives of any sort. Plain 199 text is intended to be displayed "as-is". No special 200 software is required to get the full meaning of the 201 text, aside from support for the indicated character 202 set. Other subtypes are to be used for enriched text in 203 forms where application software may enhance the 204 appearance of the text, but such software must not be 205 required in order to get the general idea of the 206 content. Possible subtypes of text thus include any 207 word processor format that can be read without 208 resorting to software that understands the format. In 209 particular, formats that employ embedded binary 210 formatting information are not considered directly 211 readable. A very simple and portable subtype, 212 "richtext", was defined in RFC 1341, with a further 213 revision in RFC 1563 under the name "enriched". 215 (2) image -- image data. 
Image requires a display device 216 (such as a graphical display, a graphics printer, or a 217 FAX machine) to view the information. An initial 218 subtype is defined for the widely-used image format 219 JPEG. 221 (3) audio -- audio data. Audio requires an audio output 222 device (such as a speaker or a telephone) to "display" 223 the contents. An initial subtype "basic" is defined in 224 this document. 226 (4) video -- video data. Video requires the capability to 227 display moving images, typically including specialized 228 hardware and software. An initial subtype "mpeg" is 229 defined in this document. 231 (5) application -- some other kind of data, typically 232 either uninterpreted binary data or information to be 233 processed by an application. The subtype "octet- 234 stream" is to be used in the case of uninterpreted 235 binary data, in which case the simplest recommended 236 action is to offer to write the information into a file 237 for the user. The "PostScript" subtype is also defined 238 for the transport of PostScript material. Other 239 expected uses for "application" include spreadsheets, 240 data for mail-based scheduling systems, and languages 241 for "active" (computational) messaging, and word 242 processing formats that are not directly readable. 243 Note that security considerations may exist for some 244 types of application data, most notably 245 application/PostScript and any form of active 246 messaging. These issues are discussed later in this 247 document. 249 The two composite top-level media types are: 251 (1) multipart -- data consisting of multiple entities of 252 independent data types. 
Four subtypes are initially 253 defined, including the basic "mixed" subtype specifying 254 a generic mixed set of parts, "alternative" for 255 representing the same data in multiple formats, 256 "parallel" for parts intended to be viewed 257 simultaneously, and "digest" for multipart entities in 258 which each part has a default type of "message/rfc822". 260 (2) message -- an encapsulated message. A body of media 261 type "message" is itself all or a portion of some kind 262 of message object. Such objects may or may not in turn 263 contain other entities. The "rfc822" subtype is used 264 when the encapsulated content is itself an RFC 822 265 message. The "partial" subtype is defined for partial 266 RFC 822 messages, to permit the fragmented transmission 267 of bodies that are thought to be too large to be passed 268 through transport facilities in one piece. Another 269 subtype, "external-body", is defined for specifying 270 large bodies by reference to an external data source. 272 It should be noted that the list of media type values given 273 here may be augmented in time, via the mechanisms described 274 above, and that the set of subtypes is expected to grow 275 substantially. 277 6. Discrete Media Type Values 279 Five of the seven initial media type values refer to discrete 280 bodies. The content of these types must be handled by non-MIME 281 mechanisms; they are opaque to MIME processors. 283 6.1. Text Media Type 285 The text media type is intended for sending material which is 286 principally textual in form. A "charset" parameter may be 287 used to indicate the character set of the body text for text 288 subtypes, notably including the subtype "text/plain", which 289 indicates plain text that doesn't contain any formatting 290 commands or directives. 292 Beyond plain text, there are many formats for representing 293 what might be known as "extended text" -- text with embedded 294 formatting and presentation information. 
An interesting 295 characteristic of many such representations is that they are 296 to some extent readable even without the software that 297 interprets them. It is useful, then, to distinguish them, at 298 the highest level, from such unreadable data as images, audio, 299 or text represented in an unreadable form. In the absence of 300 appropriate interpretation software, it is reasonable to show 301 subtypes of text to the user, while it is not reasonable to do 302 so with most nontextual data. 304 Such formatted textual data should be represented using 305 subtypes of text. Plausible subtypes of text are typically 306 given by the common name of the representation format, e.g., 307 "text/enriched" [RFC-1563]. 309 6.1.1. Representation of Line Breaks 311 The canonical form of any MIME text type MUST represent a line 312 break as a CRLF sequence. Similarly, any occurrence of CRLF 313 in text MUST represent a line break. Use of CR and LF outside 314 of line break sequences is also forbidden. 316 This rule applies regardless of format or character set or 317 sets involved. 319 NOTE: The proper interpretation of line breaks when a body is 320 displayed depends on the media type. In particular, while it 321 is appropriate to treat a line break as a transition to a new 322 line when displaying a text/plain body, this treatment is 323 actually incorrect for other subtypes of text like 324 text/enriched [RFC-1563]. Similarly, whether or not line 325 breaks should be added during display operations is also a 326 function of the media type. It should not be necessary to add 327 any line breaks to display text/plain correctly, whereas 328 proper display of text/enriched requires the appropriate 329 addition of line breaks. 331 6.1.2. Charset Parameter 333 A critical parameter that may be specified in the Content-Type 334 field for text/plain data is the character set. 
This is 335 specified with a "charset" parameter, as in: 337 Content-type: text/plain; charset=iso-8859-1 339 Unlike some other parameter values, the values of the charset 340 parameter are NOT case sensitive. The default character set, 341 which must be assumed in the absence of a charset parameter, 342 is US-ASCII. 344 The specification for any future subtypes of "text" must 345 specify whether or not they will also utilize a "charset" 346 parameter, and may possibly restrict its values as well. When 347 used with a particular body, the semantics of the "charset" 348 parameter should be identical to those specified here for 349 "text/plain", i.e., the body consists entirely of characters 350 in the given charset. In particular, definers of future text 351 subtypes should pay close attention to the implications of 352 multioctet character sets for their subtype definitions. 354 This RFC specifies the definition of the charset parameter for 355 the purposes of MIME to be the name of a character set, as 356 "character set" as defined in MIME-IMB. The rules regarding 357 line breaks detailed in the previous section must also be 358 observed -- a character set whose definition does not conform 359 to these rules cannot be used in a MIME text type. 361 An initial list of predefined character set names can be found 362 at the end of this section. Additional character sets may be 363 registered with IANA. 365 Note that if the specified character set includes 8bit data, a 366 Content-Transfer-Encoding header field and a corresponding 367 encoding on the data are required in order to transmit the 368 body via some mail transfer protocols, such as SMTP [RFC-821]. 370 The default character set, US-ASCII, has been the subject of 371 some confusion and ambiguity in the past. Not only were there 372 some ambiguities in the definition, there have been wide 373 variations in practice. 
In order to eliminate such ambiguity 374 and variations in the future, it is strongly recommended that 375 new user agents explicitly specify a character set as a media 376 type parameter in the Content-Type header field. "US-ASCII" 377 does not indicate an arbitrary 7-bit character code, but 378 specifies that the body uses character coding that uses the 379 exact correspondence of octets to characters specified in US- 380 ASCII. National use variations of ISO 646 [ISO-646] are NOT 381 US-ASCII and their use in Internet mail is explicitly 382 discouraged. The omission of the ISO 646 character set from 383 this document is deliberate in this regard. The character set 384 name of "US-ASCII" explicitly refers to ANSI X3.4-1986 [US- 385 ASCII] only. The character set name "ASCII" is reserved and 386 must not be used for any purpose. 388 NOTE: RFC 821 explicitly specifies "ASCII", and references an 389 earlier version of the American Standard. Insofar as one of 390 the purposes of specifying a media type and character set is 391 to permit the receiver to unambiguously determine how the 392 sender intended the coded message to be interpreted, assuming 393 anything other than "strict ASCII" as the default would risk 394 unintentional and incompatible changes to the semantics of 395 messages now being transmitted. This also implies that 396 messages containing characters coded according to national 397 variations on ISO 646, or using code-switching procedures 398 (e.g., those of ISO 2022), as well as 8bit or multiple octet 399 character encodings MUST use an appropriate character set 400 specification to be consistent with this specification. 402 The complete US-ASCII character set is listed in ANSI X3.4- 403 1986. Note that the control characters including DEL (0-31, 404 127) have no defined meaning apart from the combination CRLF 405 (US-ASCII values 13 and 10) indicating a new line. 
Two of the 406 characters have de facto meanings in wide use: FF (12) often 407 means "start subsequent text on the beginning of a new page"; 408 and TAB or HT (9) often (though not always) means "move the 409 cursor to the next available column after the current position 410 where the column number is a multiple of 8 (counting the first 411 column as column 0)." Aside from these conventions, any use 412 of the control characters or DEL in a body must occur within 413 the context of a private agreement between the sender and 414 recipient. Such private agreements are discouraged and should 415 be replaced by the other capabilities of this document. 417 NOTE: Beyond US-ASCII, an enormous proliferation of character 418 sets is possible. It is the opinion of the IETF working group 419 that a large number of character sets is NOT a good thing. We 420 would prefer to specify a SINGLE character set that can be 421 used universally for representing all of the world's languages 422 in Internet mail. Unfortunately, existing practice in several 423 communities seems to point to the continued use of multiple 424 character sets in the near future. For this reason, we define 425 names for a small number of character sets for which a strong 426 constituent base exists. 428 The defined charset values are: 430 (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII]. 432 (2) ISO-8859-X -- where "X" is to be replaced, as 433 necessary, for the parts of ISO-8859 [ISO-8859]. Note 434 that the ISO 646 character sets have deliberately been 435 omitted in favor of their 8859 replacements, which are 436 the designated character sets for Internet mail. As of 437 the publication of this document, the legitimate values 438 for "X" are the digits 1 through 9. 440 All of these character sets are used as pure 7bit or 8bit sets 441 without any shift or escape functions. The meaning of shift 442 and escape sequences in these character sets is not defined. 
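The charset rules given so far (values are not case sensitive, the default is US-ASCII when no charset parameter is present, and the initially defined values are US-ASCII and ISO-8859-1 through ISO-8859-9) can be illustrated with a minimal Python sketch. The function name resolve_charset and the choice to return None for any other name are assumptions made for this illustration only; the sketch does not cover character sets registered later and does not map the names onto any decoder.

```python
import re

# Matches the ISO-8859-X family for X in 1..9 only, as listed above.
_ISO_8859 = re.compile(r"iso-8859-([1-9])")


def resolve_charset(params):
    """Return the canonical charset name for a dict of Content-Type
    parameters, or None if the value is not one of the initial names.

    `params` is assumed to map lower-cased parameter names to values.
    """
    # Absence of the parameter means US-ASCII.
    name = params.get("charset", "US-ASCII").strip().lower()
    if name == "us-ascii":
        return "US-ASCII"
    match = _ISO_8859.fullmatch(name)
    if match:
        return "ISO-8859-" + match.group(1)
    # Anything else, including the reserved name "ASCII", is not one of
    # the initially defined values.
    return None
```

For example, resolve_charset({}) yields "US-ASCII", resolve_charset({"charset": "iso-8859-1"}) yields "ISO-8859-1", and resolve_charset({"charset": "ASCII"}) yields None.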
444 The character sets specified above are the ones that were 445 relatively uncontroversial during the drafting of MIME. This 446 document does not endorse the use of any particular character 447 set other than US-ASCII, and recognizes that the future 448 evolution of world character sets remains unclear. It is 449 expected that in the future, additional character sets will be 450 registered for use in MIME. 452 Note that the character set used, if anything other than US- 453 ASCII, must always be explicitly specified in the Content-Type 454 field. 456 No other character set name may be used in Internet mail 457 without the publication of a formal specification and its 458 registration with IANA, or by private agreement, in which case 459 the character set name must begin with "X-". 461 Implementors are discouraged from defining new character sets 462 unless absolutely necessary. 464 The "charset" parameter has been defined primarily for the 465 purpose of textual data, and is described in this section for 466 that reason. However, it is conceivable that non-textual data 467 might also wish to specify a charset value for some purpose, 468 in which case the same syntax and values should be used. 470 In general, composition software should always use the "lowest 471 common denominator" character set possible. For example, if a 472 body contains only US-ASCII characters, it SHOULD be marked as 473 being in the US-ASCII character set, not ISO-8859-1, which, 474 like all the ISO-8859 family of character sets, is a superset 475 of US-ASCII. More generally, if a widely-used character set 476 is a subset of another character set, and a body contains only 477 characters in the widely-used subset, it should be labelled as 478 being in that subset. This will increase the chances that the 479 recipient will be able to view the resulting entity correctly. 481 6.1.3. Plain Subtype 483 The simplest and most important subtype of text is "plain". 
484 This indicates plain text that does not contain any formatting 485 commands or directives. Plain text is intended to be displayed 486 "as-is", that is, no formatting operations of any sort other 487 than support for the indicated character set should be 488 necessary for proper display. The default media type of 489 "text/plain; charset=us-ascii" for Internet mail describes 490 existing Internet practice. That is, it is the type of body 491 defined by RFC 822. 493 No other text subtype is defined by this document. 495 6.1.4. Unrecognized Subtypes 497 Unrecognized subtypes of text should be treated as subtype 498 "plain" as long as the MIME implementation knows how to handle 499 the charset. Unrecognized subtypes which also specify an 500 unrecognized charset should be treated as "application/octet- 501 stream". 503 6.2. Image Media Type 505 A media type of "image" indicates that the body contains an 506 image. The subtype names the specific image format. These 507 names are not case sensitive. An initial subtype is "jpeg" for 508 the JPEG format using JFIF encoding [JPEG]. 510 The list of image subtypes given here is neither exclusive nor 511 exhaustive, and is expected to grow as more types are 512 registered with IANA, as described in RFC MIME-REG. 514 Unrecognized subtypes of image should at a minimum be treated 515 as "application/octet-stream". Implementations may optionally 516 elect to pass subtypes of image that they do not specifically 517 recognize to a secure and robust general-purpose image viewing 518 application, if such an application is available. 520 NOTE: Using a general-purpose image viewing application 521 this way inherits the security problems of the most dangerous 522 type supported by the application. 524 6.3. Audio Media Type 526 A media type of "audio" indicates that the body contains audio 527 data. 
Although there is not yet a consensus on an "ideal" 528 audio format for use with computers, there is a pressing need 529 for a format capable of providing interoperable behavior. 531 The initial subtype of "basic" is specified to meet this 532 requirement by providing an absolutely minimal lowest common 533 denominator audio format. It is expected that richer formats 534 for higher quality and/or lower bandwidth audio will be 535 defined by a later document. 537 The content of the "audio/basic" subtype is single channel 538 audio encoded using 8bit ISDN mu-law [PCM] at a sample rate of 539 8000 Hz. 541 Unrecognized subtypes of audio should at a minimum be treated 542 as "application/octet-stream". Implementations may optionally 543 elect to pass subtypes of audio that they do not specifically 544 recognize to a robust general-purpose audio playing 545 application, if such an application is available. 547 6.4. Video Media Type 549 A media type of "video" indicates that the body contains a 550 time-varying-picture image, possibly with color and 551 coordinated sound. The term "video" is used extremely 552 generically, rather than with reference to any particular 553 technology or format, and is not meant to preclude subtypes 554 such as animated drawings encoded compactly. The subtype 555 "mpeg" refers to video coded according to the MPEG standard 556 [MPEG]. 558 Note that although in general this document strongly 559 discourages the mixing of multiple media in a single body, it 560 is recognized that many so-called "video" formats include a 561 representation for synchronized audio, and this is explicitly 562 permitted for subtypes of "video". 564 Unrecognized subtypes of video should at a minimum be treated 565 as "application/octet-stream". Implementations may optionally 566 elect to pass subtypes of video that they do not specifically 567 recognize to a robust general-purpose video display 568 application, if such an application is available. 570 6.5. 
Application Media Type 572 The "application" media type is to be used for discrete data 573 which do not fit in any of the other categories, and 574 particularly for data to be processed by some type of 575 application program. This is information which must be 576 processed by an application before it is viewable or usable by 577 a user. Expected uses for the application media type include 578 file transfer, spreadsheets, data for mail-based scheduling 579 systems, and languages for "active" (computational) material. 580 (The latter, in particular, can pose security problems which 581 must be understood by implementors, and are considered in 582 detail in the discussion of the application/PostScript media 583 type.) 585 For example, a meeting scheduler might define a standard 586 representation for information about proposed meeting dates. 587 An intelligent user agent would use this information to 588 conduct a dialog with the user, and might then send additional 589 material based on that dialog. More generally, there have 590 been several "active" messaging languages developed in which 591 programs in a suitably specialized language are transported to 592 a remote location and automatically run in the recipient's 593 environment. 595 Such applications may be defined as subtypes of the 596 "application" media type. This document defines two subtypes: 597 octet-stream, and PostScript. 599 The subtype of application will often be the name of the 600 application for which the data are intended. This does not 601 mean, however, that any application program name may be used 602 freely as a subtype of application. 604 6.5.1. Octet-Stream Subtype 606 The "octet-stream" subtype is used to indicate that a body 607 contains arbitrary binary data. The set of currently defined 608 parameters is: 610 (1) TYPE -- the general type or category of binary data. 611 This is intended as information for the human recipient 612 rather than for any automatic processing. 
614 (2) PADDING -- the number of bits of padding that were 615 appended to the bit-stream comprising the actual 616 contents to produce the enclosed 8bit byte-oriented 617 data. This is useful for enclosing a bit-stream in a 618 body when the total number of bits is not a multiple of 619 8. 621 Both of these parameters are optional. 623 An additional parameter, "CONVERSIONS", was defined in RFC 624 1341 but has since been removed. RFC 1341 also defined the 625 use of a "NAME" parameter which gave a suggested file name to 626 be used if the data were to be written to a file. This has 627 been deprecated in anticipation of a separate Content- 628 Disposition header field, to be defined in a subsequent RFC. 630 The recommended action for an implementation that receives an 631 application/octet-stream entity is to simply offer to put the 632 data in a file, with any Content-Transfer-Encoding undone, or 633 perhaps to use it as input to a user-specified process. 635 To reduce the danger of transmitting rogue programs, it is 636 strongly recommended that implementations NOT implement a 637 path-search mechanism whereby an arbitrary program named in 638 the Content-Type parameter (e.g., an "interpreter=" parameter) 639 is found and executed using the message body as input. 641 6.5.2. PostScript Subtype 643 A media type of "application/postscript" indicates a 644 PostScript program. Currently two variants of the PostScript 645 language are allowed; the original level 1 variant is 646 described in [POSTSCRIPT] and the more recent level 2 variant 647 is described in [POSTSCRIPT2]. 649 PostScript is a registered trademark of Adobe Systems, Inc. 650 Use of the MIME media type "application/postscript" implies 651 recognition of that trademark and all the rights it entails. 653 The PostScript language definition provides facilities for 654 internal labelling of the specific language features a given 655 program uses. 
This labelling, called the PostScript document 656 structuring conventions, or DSC, is very general and provides 657 substantially more information than just the language level. 658 The use of document structuring conventions, while not 659 required, is strongly recommended as an aid to 660 interoperability. Documents which lack proper structuring 661 conventions cannot be tested to see whether or not they will 662 work in a given environment. As such, some systems may assume 663 the worst and refuse to process unstructured documents. 665 The execution of general-purpose PostScript interpreters 666 entails serious security risks, and implementors are 667 discouraged from simply sending PostScript bodies to "off- 668 the-shelf" interpreters. While it is usually safe to send 669 PostScript to a printer, where the potential for harm is 670 greatly constrained by typical printer environments, 671 implementors should consider all of the following before they 672 add interactive display of PostScript bodies to their MIME 673 readers. 675 The remainder of this section outlines some, though probably 676 not all, of the possible problems with the transport of 677 PostScript entities. 679 (1) Dangerous operations in the PostScript language 680 include, but may not be limited to, the PostScript 681 operators "deletefile", "renamefile", "filenameforall", 682 and "file". "File" is only dangerous when applied to 683 something other than standard input or output. 684 Implementations may also define additional nonstandard 685 file operators; these may also pose a threat to 686 security. "Filenameforall", the wildcard file search 687 operator, may appear at first glance to be harmless. 688 Note, however, that this operator has the potential to 689 reveal information about what files the recipient has 690 access to, and this information may itself be 691 sensitive. 
Message senders should avoid the use of 692 potentially dangerous file operators, since these 693 operators are quite likely to be unavailable in secure 694 PostScript implementations. Message receiving and 695 displaying software should either completely disable 696 all potentially dangerous file operators or take 697 special care not to delegate any special authority to 698 their operation. These operators should be viewed as 699 being done by an outside agency when interpreting 700 PostScript documents. Such disabling and/or checking 701 should be done completely outside of the reach of the 702 PostScript language itself; care should be taken to 703 insure that no method exists for re-enabling full- 704 function versions of these operators. 706 (2) The PostScript language provides facilities for exiting 707 the normal interpreter, or server, loop. Changes made 708 in this "outer" environment are customarily retained 709 across documents, and may in some cases be retained 710 semipermanently in nonvolatile memory. The operators 711 associated with exiting the interpreter loop have the 712 potential to interfere with subsequent document 713 processing. As such, their unrestrained use 714 constitutes a threat of service denial. PostScript 715 operators that exit the interpreter loop include, but 716 may not be limited to, the exitserver and startjob 717 operators. Message sending software should not 718 generate PostScript that depends on exiting the 719 interpreter loop to operate, since the ability to exit 720 will probably be unavailable in secure PostScript 721 implementations. Message receiving and displaying 722 software should completely disable the ability to make 723 retained changes to the PostScript environment by 724 eliminating or disabling the "startjob" and 725 "exitserver" operations. If these operations cannot be 726 eliminated or completely disabled the password 727 associated with them should at least be set to a hard- 728 to-guess value. 
730 (3) PostScript provides operators for setting system-wide 731 and device-specific parameters. These parameter 732 settings may be retained across jobs and may 733 potentially pose a threat to the correct operation of 734 the interpreter. The PostScript operators that set 735 system and device parameters include, but may not be 736 limited to, the "setsystemparams" and "setdevparams" 737 operators. Message sending software should not 738 generate PostScript that depends on the setting of 739 system or device parameters to operate correctly. The 740 ability to set these parameters will probably be 741 unavailable in secure PostScript implementations. 742 Message receiving and displaying software should 743 disable the ability to change system and device 744 parameters. If these operators cannot be completely 745 disabled the password associated with them should at 746 least be set to a hard-to-guess value. 748 (4) Some PostScript implementations provide nonstandard 749 facilities for the direct loading and execution of 750 machine code. Such facilities are quite obviously open 751 to substantial abuse. Message sending software should 752 not make use of such features. Besides being totally 753 hardware-specific, they are also likely to be 754 unavailable in secure implementations of PostScript. 755 Message receiving and displaying software should not 756 allow such operators to be used if they exist. 758 (5) PostScript is an extensible language, and many, if not 759 most, implementations of it provide a number of their 760 own extensions. This document does not deal with such 761 extensions explicitly since they constitute an unknown 762 factor. Message sending software should not make use 763 of nonstandard extensions; they are likely to be 764 missing from some implementations. Message receiving 765 and displaying software should make sure that any 766 nonstandard PostScript operators are secure and don't 767 present any kind of threat. 
769 (6) It is possible to write PostScript that consumes huge 770 amounts of various system resources. It is also 771 possible to write PostScript programs that loop 772 indefinitely. Both types of programs have the 773 potential to cause damage if sent to unsuspecting 774 recipients. Message-sending software should avoid the 775 construction and dissemination of such programs, which 776 is antisocial. Message receiving and displaying 777 software should provide appropriate mechanisms to abort 778 processing of a document after a reasonable amount of 779 time has elapsed. In addition, PostScript interpreters 780 should be limited to the consumption of only a 781 reasonable amount of any given system resource. 783 (7) It is possible to include raw binary information inside 784 PostScript in various forms. This is not recommended 785 for use in Internet mail, both because it is not 786 supported by all PostScript interpreters and because it 787 significantly complicates the use of a MIME Content- 788 Transfer-Encoding. (Without such binary, PostScript 789 may typically be viewed as line-oriented data. The 790 treatment of CRLF sequences becomes extremely 791 problematic if binary and line-oriented data are mixed 792 in a single Postscript data stream.) 794 (8) Finally, bugs may exist in some PostScript interpreters 795 which could possibly be exploited to gain unauthorized 796 access to a recipient's system. Apart from noting this 797 possibility, there is no specific action to take to 798 prevent this, apart from the timely correction of such 799 bugs if any are found. 801 6.5.3. Other Application Subtypes 803 It is expected that many other subtypes of application will be 804 defined in the future. MIME implementations must at a minimum 805 treat any unrecognized subtypes as being equivalent to 806 "application/octet-stream". 808 7. Composite Media Type Values 810 The remaining two of the seven initial Content-Type values 811 refer to composite entities. 
Composite entities are handled 812 using MIME mechanisms -- a MIME processor typically handles 813 the body directly. 815 7.1. Multipart Media Type 817 In the case of multipart entities, in which one or more 818 different sets of data are combined in a single body, a 819 "multipart" media type field must appear in the entity's 820 header. The body must then contain one or more body parts, 821 each preceded by a boundary delimiter line, and the last one 822 followed by a closing boundary delimiter line. After its 823 boundary delimiter line, each body part then consists of a 824 header area, a blank line, and a body area. Thus a body part 825 is similar to an RFC 822 message in syntax, but different in 826 meaning. 828 A body part is an entity and hence is NOT to be interpreted as 829 actually being an RFC 822 message. To begin with, NO header 830 fields are actually required in body parts. A body part that 831 starts with a blank line, therefore, is allowed and is a body 832 part for which all default values are to be assumed. In such 833 a case, the absence of a Content-Type header usually indicates 834 that the corresponding body has a content-type of "text/plain; 835 charset=US-ASCII". 837 The only header fields that have defined meaning for body 838 parts are those the names of which begin with "Content-". All 839 other header fields may be ignored in body parts. Although 840 they should generally be retained if at all possible, they may 841 be discarded by gateways if necessary. Such other fields are 842 permitted to appear in body parts but must not be depended on. 843 "X-" fields may be created for experimental or private 844 purposes, with the recognition that the information they 845 contain may be lost at some gateways. 847 NOTE: The distinction between an RFC 822 message and a body 848 part is subtle, but important. 
A gateway between Internet and 849 X.400 mail, for example, must be able to tell the difference 850 between a body part that contains an image and a body part 851 that contains an encapsulated message, the body of which is a 852 JPEG image. In order to represent the latter, the body part 853 must have "Content-Type: message/rfc822", and its body (after 854 the blank line) must be the encapsulated message, with its own 855 "Content-Type: image/jpeg" header field. The use of similar 856 syntax facilitates the conversion of messages to body parts, 857 and vice versa, but the distinction between the two must be 858 understood by implementors. (For the special case in which 859 parts actually are messages, a "digest" subtype is also 860 defined.) 862 As stated previously, each body part is preceded by a boundary 863 delimiter line that contains the boundary delimiter. The 864 boundary delimiter MUST NOT appear inside any of the 865 encapsulated parts, on a line by itself or as the prefix of 866 any line. This implies that it is crucial that the composing 867 agent be able to choose and specify a unique boundary 868 parameter value that does not contain the boundary parameter 869 value of an enclosing multipart as a prefix. 871 All present and future subtypes of the "multipart" type must 872 use an identical syntax. Subtypes may differ in their 873 semantics, and may impose additional restrictions on syntax, 874 but must conform to the required syntax for the multipart 875 type. This requirement ensures that all conformant user 876 agents will at least be able to recognize and separate the 877 parts of any multipart entity, even those of an unrecognized 878 subtype. 880 As stated in the definition of the Content-Transfer-Encoding 881 field [MIME-IMB], no encoding other than "7bit", "8bit", or 882 "binary" is permitted for entities of type "multipart". 
The 883 multipart boundary delimiters and header fields are always 884 represented as 7bit US-ASCII in any case (though the header 885 fields may encode non-US-ASCII header text as per RFC MIME- 886 HEADERS) and data within the body parts can be encoded on a 887 part-by-part basis, with Content-Transfer-Encoding fields for 888 each appropriate body part. 890 7.1.1. Common Syntax 892 This section defines a common syntax for subtypes of 893 multipart. All subtypes of multipart must use this syntax. A 894 simple example of a multipart message also appears in this 895 section. An example of a more complex multipart message is 896 given in RFC MIME-CONF. 898 The Content-Type field for multipart entities requires one 899 parameter, "boundary". The boundary delimiter line is then 900 defined as a line consisting entirely of two hyphen characters 901 ("-", decimal value 45) followed by the boundary parameter 902 value from the Content-Type header field, optional linear 903 whitespace, and a terminating CRLF. 905 NOTE: The hyphens are for rough compatibility with the 906 earlier RFC 934 method of message encapsulation, and for ease 907 of searching for the boundaries in some implementations. 908 However, it should be noted that multipart messages are NOT 909 completely compatible with RFC 934 encapsulations; in 910 particular, they do not obey RFC 934 quoting conventions for 911 embedded lines that begin with hyphens. This mechanism was 912 chosen over the RFC 934 mechanism because the latter causes 913 lines to grow with each level of quoting. The combination of 914 this growth with the fact that SMTP implementations sometimes 915 wrap long lines made the RFC 934 mechanism unsuitable for use 916 in the event that deeply-nested multipart structuring is ever 917 desired. 919 WARNING TO IMPLEMENTORS: The grammar for parameters on the 920 Content-type field is such that it is often necessary to 921 enclose the boundary parameter values in quotes on the 922 Content-type line. 
This is not always necessary, but never 923 hurts. Implementors should be sure to study the grammar 924 carefully in order to avoid producing invalid Content-type 925 fields. Thus, a typical multipart Content-Type header field 926 might look like this: 928 Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p 930 But the following is not valid: 932 Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p 934 (because of the colon) and must instead be represented as 936 Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p" 938 This Content-Type value indicates that the content consists of 939 one or more parts, each with a structure that is syntactically 940 identical to an RFC 822 message, except that the header area 941 is allowed to be completely empty, and that the parts are each 942 preceded by the line 944 --gc0pJq0M:08jU534c0p 946 The boundary delimiter MUST occur at the beginning of a line, 947 i.e., following a CRLF, and the initial CRLF is considered to 948 be attached to the boundary delimiter line rather than part of 949 the preceding part. The boundary may be followed by zero or 950 more characters of linear whitespace. It is then terminated by 951 either another CRLF and the header fields for the next part, 952 or by two CRLFs, in which case there are no header fields for 953 the next part. If no Content-Type field is present it is 954 assumed to be of message/rfc822 in a multipart/digest and 955 text/plain otherwise. 957 NOTE: The CRLF preceding the boundary delimiter line is 958 conceptually attached to the boundary so that it is possible 959 to have a part that does not end with a CRLF (line break). 960 Body parts that must be considered to end with line breaks, 961 therefore, must have two CRLFs preceding the boundary 962 delimiter line, the first of which is part of the preceding 963 body part, and the second of which is part of the 964 encapsulation boundary. 
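The quoting rule for the boundary parameter and the construction of the boundary delimiter line described above can be illustrated with a short sketch. It is not part of this specification. The function names are hypothetical, and the set of characters that force quoting is assumed to be the "tspecials" of the MIME Content-Type grammar, plus space and any character outside printable US-ASCII.

```python
# Characters that may not appear in an unquoted parameter value
# (assumed here to be the MIME "tspecials").
TSPECIALS = set('()<>@,;:\\"/[]?=')


def boundary_parameter(boundary: str) -> str:
    """Format the boundary parameter, quoting the value only when needed."""
    needs_quotes = any(
        ch in TSPECIALS or not 33 <= ord(ch) <= 126 for ch in boundary
    )
    if needs_quotes:
        return f'boundary="{boundary}"'
    return f"boundary={boundary}"


def delimiter_line(boundary: str) -> str:
    """Build the line that precedes each body part: two hyphens,
    the boundary parameter value, and a terminating CRLF."""
    return "--" + boundary + "\r\n"
```

Since quoting "never hurts", a composer could also quote unconditionally; the conditional form is shown only to mirror the valid and invalid header examples above.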
966 Boundary delimiters must not appear within the encapsulated 967 material, and must be no longer than 70 characters, not 968 counting the two leading hyphens. 970 The boundary delimiter line following the last body part is a 971 distinguished delimiter that indicates that no further body 972 parts will follow. Such a delimiter line is identical to the 973 previous delimiter lines, with the addition of two more 974 hyphens after the boundary parameter value. 976 --gc0pJq0M:08jU534c0p-- 978 NOTE TO IMPLEMENTORS: Boundary string comparisons must 979 compare the boundary value with the beginning of each 980 candidate line. An exact match of the entire candidate line 981 is not required; it is sufficient that the boundary appear in 982 its entirety following the CRLF. 984 There appears to be room for additional information prior to 985 the first boundary delimiter line and following the final 986 boundary delimiter line. These areas should generally be left 987 blank, and implementations must ignore anything that appears 988 before the first boundary delimiter line or after the last 989 one. 991 NOTE: These "preamble" and "epilogue" areas are generally not 992 used because of the lack of proper typing of these parts and 993 the lack of clear semantics for handling these areas at 994 gateways, particularly X.400 gateways. However, rather than 995 leaving the preamble area blank, many MIME implementations 996 have found this to be a convenient place to insert an 997 explanatory note for recipients who read the message with 998 pre-MIME software, since such notes will be ignored by MIME- 999 compliant software. 1001 NOTE: Because boundary delimiters must not appear in the body 1002 parts being encapsulated, a user agent must exercise care to 1003 choose a unique boundary parameter value. 
The boundary 1004 parameter value in the example above could have been the 1005 result of an algorithm designed to produce boundary delimiters 1006 with a very low probability of already existing in the data to 1007 be encapsulated without having to prescan the data. Alternate 1008 algorithms might result in more "readable" boundary delimiters 1009 for a recipient with an old user agent, but would require more 1010 attention to the possibility that the boundary delimiter might 1011 appear at the beginning of some line in the encapsulated part. 1012 The simplest boundary delimiter line possible is something 1013 like "---", with a closing boundary delimiter line of "-----". 1015 As a very simple example, the following multipart message has 1016 two parts, both of them plain text, one of them explicitly 1017 typed and one of them implicitly typed: 1019 From: Nathaniel Borenstein 1020 To: Ned Freed 1021 Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST) 1022 Subject: Sample message 1023 MIME-Version: 1.0 1024 Content-type: multipart/mixed; boundary="simple boundary" 1026 This is the preamble. It is to be ignored, though it 1027 is a handy place for composition agents to include an 1028 explanatory note to non-MIME conformant readers. 1030 --simple boundary 1032 This is implicitly typed plain US-ASCII text. 1033 It does NOT end with a linebreak. 1034 --simple boundary 1035 Content-type: text/plain; charset=us-ascii 1037 This is explicitly typed plain US-ASCII text. 1038 It DOES end with a linebreak. 1040 --simple boundary-- 1042 This is the epilogue. It is also to be ignored. 1044 The use of a media type of multipart in a body part within 1045 another multipart entity is explicitly allowed. In such 1046 cases, for obvious reasons, care must be taken to ensure that 1047 each nested multipart entity uses a different boundary 1048 delimiter. See RFC MIME-CONF for an example of nested 1049 multipart entities. 
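The way a receiver separates the parts of the sample message above can be sketched as follows. This is a minimal illustration, not a conformant parser. The function name `split_multipart` is hypothetical, nested multiparts are not handled, and the sketch requires the rest of a delimiter line to be empty or whitespace, which is stricter than the prefix comparison described in the note to implementors.

```python
import re


def split_multipart(body: bytes, boundary: str) -> list:
    """Return the raw body parts (header area, blank line, body area).

    The CRLF that precedes a delimiter line is treated as part of the
    delimiter, so a part keeps a trailing line break only if the sender
    wrote two CRLFs before the next delimiter line.
    """
    delim = re.compile(
        rb"(?:\A|\r\n)--"
        + re.escape(boundary.encode("ascii"))
        + rb"(--)?[ \t]*(?:\r\n|\Z)"
    )
    parts = []
    start = None  # offset where the current part begins; None in the preamble
    for match in delim.finditer(body):
        if start is not None:
            parts.append(body[start:match.start()])
        if match.group(1):  # closing delimiter: the rest is the epilogue
            break
        start = match.end()
    return parts
```

Applied to the sample message with the boundary "simple boundary", this yields two parts: the first begins with an empty header area and does not end with a line break, and the second begins with its Content-type field and does end with one. The preamble and epilogue are discarded.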
1051 The use of the multipart media type with only a single body 1052 part may be useful in certain contexts, and is explicitly 1053 permitted. 1055 NOTE: Experience has shown that a multipart media type with a 1056 single body part is useful for sending non-text media types. 1057 It has the advantage of providing the preamble as a place to 1058 include decoding instructions. In addition, a number of SMTP 1059 gateways move or remove the MIME headers, and a clever MIME 1060 decoder can take a good guess at multipart boundaries even in 1061 the absence of the Content-Type header and thereby successfully 1062 decode the message. 1064 The only mandatory global parameter for the multipart media 1065 type is the boundary parameter, which consists of 1 to 70 1066 characters from a set of characters known to be very robust 1067 through mail gateways, and NOT ending with white space. (If a 1068 boundary delimiter line appears to end with white space, the 1069 white space must be presumed to have been added by a gateway, 1070 and must be deleted.) It is formally specified by the 1071 following BNF: 1073 boundary := 0*69<bchars> bcharsnospace 1075 bchars := bcharsnospace / " " 1077 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / 1078 "+" / "_" / "," / "-" / "." / 1079 "/" / ":" / "=" / "?" 1081 Overall, the body of a multipart entity may be specified as 1082 follows: 1084 dash-boundary := "--" boundary 1085 ; boundary taken from the value of 1086 ; boundary parameter of the 1087 ; Content-Type field. 1089 multipart-body := [preamble CRLF] 1090 dash-boundary transport-padding CRLF 1091 body-part *encapsulation 1092 close-delimiter transport-padding 1093 [CRLF epilogue] 1095 transport-padding := *LWSP-char 1096 ; Composers MUST NOT generate 1097 ; non-zero length transport 1098 ; padding, but receivers MUST 1099 ; be able to handle padding 1100 ; added by message transports.
1102 encapsulation := delimiter transport-padding 1103 CRLF body-part 1105 delimiter := CRLF dash-boundary 1107 close-delimiter := delimiter "--" 1108 preamble := discard-text 1110 epilogue := discard-text 1112 discard-text := *(*text CRLF) *text 1113 ; May be ignored or discarded. 1115 body-part := MIME-part-headers [CRLF *OCTET] 1116 ; Lines in a body-part must not start 1117 ; with the specified dash-boundary and 1118 ; the delimiter must not appear anywhere 1119 ; in the body part. Note that the 1120 ; semantics of a body-part differ from 1121 ; the semantics of a message, as 1122 ; described in the text. 1124 OCTET := <any 0-255 octet value> 1126 IMPORTANT: The free insertion of linear-white-space and RFC 1127 822 comments between the elements shown in this BNF is NOT 1128 allowed since this BNF does not specify a structured header 1129 field. 1131 NOTE: In certain transport enclaves, RFC 822 restrictions 1132 such as the one that limits bodies to printable US-ASCII 1133 characters may not be in force. (That is, transport 1134 domains may exist that resemble standard Internet mail 1135 transport as specified in RFC 821 and assumed by RFC 822, but 1136 without certain restrictions.) The relaxation of these 1137 restrictions should be construed as locally extending the 1138 definition of bodies, for example to include octets outside of 1139 the US-ASCII range, as long as these extensions are supported 1140 by the transport and adequately documented in the Content- 1141 Transfer-Encoding header field. However, in no event are 1142 headers (either message headers or body part headers) allowed 1143 to contain anything other than US-ASCII characters. 1145 NOTE: Conspicuously missing from the multipart type is a 1146 notion of structured, related body parts.
It is recommended 1147 that those wishing to provide more structured or integrated 1148 multipart messaging facilities should define subtypes of 1149 multipart that are syntactically identical but define 1150 relationships between the various parts. For example, subtypes 1151 of multipart could be defined that include a distinguished 1152 part which in turn is used to specify the relationships 1153 between the other parts, probably referring to them by their 1154 Content-ID field. Old implementations will not recognize the 1155 new subtype if this approach is used, but will treat it as 1156 multipart/mixed and will thus be able to show the user the 1157 parts that are recognized. 1159 7.1.2. Handling Nested Messages and Multiparts 1161 The "message/rfc822" subtype defined in a subsequent section 1162 of this document has no terminating condition other than 1163 running out of data. Similarly, an improperly truncated 1164 multipart entity may not have any terminating boundary marker, 1165 and can turn up operationally due to mail system malfunctions. 1167 It is essential that such entities be handled correctly when 1168 they are themselves imbedded inside of another multipart 1169 structure. MIME implementations are therefore required to 1170 recognize outer level boundary markers at ANY level of inner 1171 nesting. It is not sufficient to only check for the next 1172 expected marker or other terminating condition. 1174 7.1.3. Mixed Subtype 1176 The "mixed" subtype of multipart is intended for use when the 1177 body parts are independent and need to be bundled in a 1178 particular order. Any multipart subtypes that an 1179 implementation does not recognize must be treated as being of 1180 subtype "mixed". 1182 7.1.4. Alternative Subtype 1184 The multipart/alternative type is syntactically identical to 1185 multipart/mixed, but the semantics are different. In 1186 particular, each of the body parts is an "alternative" version 1187 of the same information. 
1189 Systems should recognize that the content of the various parts 1190 are interchangeable. Systems should choose the "best" type 1191 based on the local environment and references, in some cases 1192 even through user interaction. As with multipart/mixed, the 1193 order of body parts is significant. In this case, the 1194 alternatives appear in an order of increasing faithfulness to 1195 the original content. In general, the best choice is the LAST 1196 part of a type supported by the recipient system's local 1197 environment. 1199 Multipart/alternative may be used, for example, to send a 1200 message in a fancy text format in such a way that it can 1201 easily be displayed anywhere: 1203 From: Nathaniel Borenstein 1204 To: Ned Freed 1205 Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST) 1206 Subject: Formatted text mail 1207 MIME-Version: 1.0 1208 Content-Type: multipart/alternative; boundary=boundary42 1210 --boundary42 1211 Content-Type: text/plain; charset=us-ascii 1213 ... plain text version of message goes here ... 1215 --boundary42 1216 Content-Type: text/enriched 1218 ... RFC 1563 text/enriched version of same message 1219 goes here ... 1221 --boundary42 1222 Content-Type: application/x-whatever 1224 ... fanciest version of same message goes here ... 1226 --boundary42-- 1228 In this example, users whose mail systems understood the 1229 "application/x-whatever" format would see only the fancy 1230 version, while other users would see only the enriched or 1231 plain text version, depending on the capabilities of their 1232 system. 1234 In general, user agents that compose multipart/alternative 1235 entities must place the body parts in increasing order of 1236 preference, that is, with the preferred format last. For 1237 fancy text, the sending user agent should put the plainest 1238 format first and the richest format last. Receiving user 1239 agents should pick and display the last format they are 1240 capable of displaying. 
In the case where one of the 1241 alternatives is itself of type "multipart" and contains 1242 unrecognized sub-parts, the user agent may choose either to 1243 show that alternative, an earlier alternative, or both. 1245 NOTE: From an implementor's perspective, it might seem more 1246 sensible to reverse this ordering, and have the plainest 1247 alternative last. However, placing the plainest alternative 1248 first is the friendliest possible option when 1249 multipart/alternative entities are viewed using a non-MIME- 1250 conformant viewer. While this approach does impose some 1251 burden on conformant MIME viewers, interoperability with older 1252 mail readers was deemed to be more important in this case. 1254 It may be the case that some user agents, if they can 1255 recognize more than one of the formats, will prefer to offer 1256 the user the choice of which format to view. This makes 1257 sense, for example, if a message includes both a nicely- 1258 formatted image version and an easily-edited text version. 1259 What is most critical, however, is that the user not 1260 automatically be shown multiple versions of the same data. 1261 Either the user should be shown the last recognized version or 1262 should be given the choice. 1264 THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each 1265 part of a multipart/alternative entity represents the same 1266 data, but the mappings between the two are not necessarily 1267 without information loss. For example, information is lost 1268 when translating ODA to PostScript or plain text. It is 1269 recommended that each part should have a different Content-ID 1270 value in the case where the information content of the two 1271 parts is not identical. 
And when the information content is 1272 identical -- for example, where several parts of type 1273 "message/external-body" specify alternate ways to access the 1274 identical data -- the same Content-ID field value should be 1275 used, to optimize any caching mechanisms that might be present 1276 on the recipient's end. However, the Content-ID values used 1277 by the parts should NOT be the same Content-ID value that 1278 describes the multipart/alternative as a whole, if there is 1279 any such Content-ID field. That is, one Content-ID value will 1280 refer to the multipart/alternative entity, while one or more 1281 other Content-ID values will refer to the parts inside it. 1283 7.1.5. Digest Subtype 1285 This document defines a "digest" subtype of the multipart 1286 Content-Type. This type is syntactically identical to 1287 multipart/mixed, but the semantics are different. In 1288 particular, in a digest, the default Content-Type value for a 1289 body part is changed from "text/plain" to "message/rfc822". 1290 This is done to allow a more readable digest format that is 1291 largely compatible (except for the quoting convention) with 1292 RFC 934. 1294 Note: Though it is possible to specify a Content-Type value 1295 for a body part in a digest which is other than 1296 "message/rfc822", such as a text/plain part containing a 1297 description of the material in the digest, actually doing so 1298 is undesirable. The "multipart/digest" Content-Type is 1299 intended to be used to send collections of messages. If a 1300 "text/plain" part is needed, it should be included as a 1301 separate part of a "multipart/mixed" message. 
1303 A digest in this format might, then, look something like this: 1305 From: Moderator-Address 1306 To: Recipient-List 1307 Date: Mon, 22 Mar 1994 13:34:51 +0000 1308 Subject: Internet Digest, volume 42 1309 MIME-Version: 1.0 1310 Content-Type: multipart/mixed; 1311 boundary="---- main boundary ----" 1313 ------ main boundary ---- 1315 ...Introductory text or table of contents... 1317 ------ main boundary ---- 1318 Content-Type: multipart/digest; 1319 boundary="---- next message ----" 1321 ------ next message ---- 1323 From: someone-else 1324 Date: Fri, 26 Mar 1993 11:13:32 +0200 1325 Subject: my opinion 1327 ...body goes here ... 1329 ------ next message ---- 1331 From: someone-else-again 1332 Date: Fri, 26 Mar 1993 10:07:13 -0500 1333 Subject: my different opinion 1335 ... another body goes here ... 1337 ------ next message ------ 1339 ------ main boundary ------ 1341 7.1.6. Parallel Subtype 1343 This document defines a "parallel" subtype of the multipart 1344 Content-Type. This type is syntactically identical to 1345 multipart/mixed, but the semantics are different. In 1346 particular, in a parallel entity, the order of body parts is 1347 not significant. 1349 A common presentation of this type is to display all of the 1350 parts simultaneously on hardware and software that are capable 1351 of doing so. However, composing agents should be aware that 1352 many mail readers will lack this capability and will show the 1353 parts serially in any event. 1355 7.1.7. Other Multipart Subtypes 1357 Other multipart subtypes are expected in the future. MIME 1358 implementations must in general treat unrecognized subtypes of 1359 multipart as being equivalent to "multipart/mixed". 1361 7.2. Message Media Type 1363 It is frequently desirable, in sending mail, to encapsulate 1364 another mail message. A special media type, "message", is 1365 defined to facilitate this. In particular, the "rfc822" 1366 subtype of "message" is used to encapsulate RFC 822 messages. 
1368 NOTE: It has been suggested that subtypes of message might be 1369 defined for forwarded or rejected messages. However, 1370 forwarded and rejected messages can be handled as multipart 1371 messages in which the first part contains any control or 1372 descriptive information, and a second part, of type 1373 message/rfc822, is the forwarded or rejected message. 1374 Composing rejection and forwarding messages in this manner 1375 will preserve the type information on the original message and 1376 allow it to be correctly presented to the recipient, and hence 1377 is strongly encouraged. 1379 Subtypes of message often impose restrictions on what 1380 encodings are allowed. These restrictions are described in 1381 conjunction with each specific subtype. 1383 Mail gateways, relays, and other mail handling agents are 1384 commonly known to alter the top-level header of an RFC 822 1385 message. In particular, they frequently add, remove, or 1386 reorder header fields. These operations are explicitly 1387 forbidden for the encapsulated headers embedded in the bodies 1388 of messages of type "message." 1389 7.2.1. RFC822 Subtype 1391 A media type of "message/rfc822" indicates that the body 1392 contains an encapsulated message, with the syntax of an RFC 1393 822 message. However, unlike top-level RFC 822 messages, the 1394 restriction that each message/rfc822 body must include a 1395 "From", "Date", and at least one destination header is removed 1396 and replaced with the requirement that at least one of "From", 1397 "Subject", or "Date" must be present. 1399 It should be noted that, despite the use of the numbers "822", 1400 a message/rfc822 entity isn't restricted to material in strict 1401 conformance to RFC822. Such entities can also include enhanced 1402 information as defined in this document. In other words, a 1403 message/rfc822 message could well be a News article or a MIME 1404 message. 
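The relaxed header requirement for message/rfc822 bodies stated above (at least one of "From", "Subject", or "Date") might be checked along the following lines. This is only a sketch: the function name and the choice of Python's standard email parser are assumptions of this illustration, not part of the specification.

```python
from email import message_from_string

# Header fields of which at least one must be present in a
# message/rfc822 body, per the text above.
REQUIRED_ANY = ("From", "Subject", "Date")


def has_minimum_rfc822_headers(encapsulated_text):
    """Return True if the encapsulated message carries at least one of
    From, Subject, or Date (hypothetical helper, for illustration)."""
    msg = message_from_string(encapsulated_text)
    return any(msg[name] is not None for name in REQUIRED_ANY)
```

A receiving agent could use such a check to decide whether a body part plausibly is an encapsulated message before presenting it as one.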
1406 No encoding other than "7bit", "8bit", or "binary" is 1407 permitted for the body of a "message/rfc822" entity. The 1408 message header fields are always US-ASCII in any case, and 1409 data within the body can still be encoded, in which case the 1410 Content-Transfer-Encoding header field in the encapsulated 1411 message will reflect this. Non-US-ASCII text in the headers 1412 of an encapsulated message can be specified using the 1413 mechanisms described in RFC MIME-HEADERS. 1415 7.2.2. Partial Subtype 1417 The "partial" subtype is defined to allow large entities to be 1418 delivered as several separate pieces of mail and automatically 1419 reassembled by a receiving user agent. (The concept is 1420 similar to IP fragmentation and reassembly in the basic 1421 Internet Protocols.) This mechanism can be used when 1422 intermediate transport agents limit the size of individual 1423 messages that can be sent. The media type "message/partial" 1424 thus indicates that the body contains a fragment of a larger 1425 entity. 1427 Because data of type "message" may never be encoded in base64 1428 or quoted-printable, a problem might arise if message/partial 1429 entities are constructed in an environment that supports 1430 binary or 8bit transport. The problem is that the binary data 1431 would be split into multiple message/partial messages, each of 1432 them requiring binary transport. If such messages were 1433 encountered at a gateway into a 7bit transport environment, 1434 there would be no way to properly encode them for the 7bit 1435 world, aside from waiting for all of the fragments, 1436 reassembling the inner message, and then encoding the 1437 reassembled data in base64 or quoted-printable. Since it is 1438 possible that different fragments might go through different 1439 gateways, even this is not an acceptable solution. 
For this 1440 reason, it is specified that entities of type message/partial 1441 must always have a content-transfer-encoding of 7bit (the 1442 default). In particular, even in environments that support 1443 binary or 8bit transport, the use of a content-transfer- 1444 encoding of "8bit" or "binary" is explicitly prohibited for 1445 MIME entities of type message/partial. This in turn implies 1446 that the inner message must not use "8bit" or "binary" 1447 encoding. 1449 Because some message transfer agents may choose to 1450 automatically fragment large messages, and because such agents 1451 may use very different fragmentation thresholds, it is 1452 possible that the pieces of a partial message, upon 1453 reassembly, may prove themselves to comprise a partial 1454 message. This is explicitly permitted. 1456 Three parameters must be specified in the Content-Type field 1457 of type message/partial: The first, "id", is a unique 1458 identifier, as close to a world-unique identifier as possible, 1459 to be used to match the fragments together. (In general, the 1460 identifier is essentially a message-id; if placed in double 1461 quotes, it can be ANY message-id, in accordance with the BNF 1462 for "parameter" given earlier in this specification.) The 1463 second, "number", an integer, is the fragment number, which 1464 indicates where this fragment fits into the sequence of 1465 fragments. The third, "total", another integer, is the total 1466 number of fragments. This third subfield is required on the 1467 final fragment, and is optional (though encouraged) on the 1468 earlier fragments. Note also that these parameters may be 1469 given in any order. 
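As an illustration of how a receiver might use the "id", "number", and "total" parameters, the following sketch reports which fragments of one partial message are still outstanding. It assumes the parameters have already been extracted into dictionaries of strings; the function name and the error handling are hypothetical. Fragment numbers are taken to start at 1, as this specification states.

```python
def missing_fragments(params_list):
    """Given the message/partial parameters of the fragments received
    so far (one dict per fragment), return the sorted list of fragment
    numbers still missing, or None if no fragment has supplied "total"
    yet (hypothetical helper, for illustration)."""
    ids = {p["id"] for p in params_list}
    if len(ids) != 1:
        raise ValueError("fragments belong to different messages")

    numbers = set()
    total = None
    for p in params_list:
        number = int(p["number"])
        if number < 1:
            raise ValueError("fragment numbering begins with 1")
        numbers.add(number)
        if "total" in p:  # optional except on the final fragment
            t = int(p["total"])
            if total is not None and t != total:
                raise ValueError("fragments disagree on total")
            total = t

    if total is None:
        return None  # the final fragment has not been seen yet
    if max(numbers) > total:
        raise ValueError("fragment number exceeds total")
    return sorted(set(range(1, total + 1)) - numbers)
```

An empty list means that every fragment has arrived and reassembly can proceed.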
1471 Thus, the second piece of a 3-piece message may have either of 1472 the following header fields: 1474 Content-Type: Message/Partial; number=2; total=3; 1475 id="oc=jpbe0M2Yt4s@thumper.bellcore.com" 1477 Content-Type: Message/Partial; 1478 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; 1479 number=2 1481 But the third piece MUST specify the total number of 1482 fragments: 1484 Content-Type: Message/Partial; number=3; total=3; 1485 id="oc=jpbe0M2Yt4s@thumper.bellcore.com" 1487 Note that fragment numbering begins with 1, not 0. 1489 When the fragments of an entity broken up in this manner are 1490 put together, the result is always a complete MIME entity, 1491 which may have its own Content-Type header field, and thus may 1492 contain any other data type. 1494 7.2.2.1. Message Fragmentation and Reassembly 1496 The semantics of a reassembled partial message must be those 1497 of the "inner" message, rather than of a message containing 1498 the inner message. This makes it possible, for example, to 1499 send a large audio message as several partial messages, and 1500 still have it appear to the recipient as a simple audio 1501 message rather than as an encapsulated message containing an 1502 audio message. That is, the encapsulation of the message is 1503 considered to be "transparent". 1505 When generating and reassembling the pieces of a 1506 message/partial message, the headers of the encapsulated 1507 message must be merged with the headers of the enclosing 1508 entities. In this process the following rules must be 1509 observed: 1511 (1) All of the header fields from the initial enclosing 1512 message, except those that start with "Content-" and 1513 the specific header fields "Subject", "Message-ID", 1514 "Encrypted", and "MIME-Version", must be copied, in 1515 order, to the new message. 
1517 (2) The header fields in the enclosed message which start 1518 with "Content-", plus the "Subject", "Message-ID", 1519 "Encrypted", and "MIME-Version" fields, must be 1520 appended, in order, to the header fields of the new 1521 message. Any header fields in the enclosed message 1522 which do not start with "Content-" (except for the 1523 "Subject", "Message-ID", "Encrypted", and "MIME- 1524 Version" fields) will be ignored and dropped. 1526 (3) All of the header fields from the second and any 1527 subsequent enclosing messages are discarded by the 1528 reassembly process. 1530 7.2.2.2. Fragmentation and Reassembly Example 1532 If an audio message is broken into two pieces, the first piece 1533 might look something like this: 1535 X-Weird-Header-1: Foo 1536 From: Bill@host.com 1537 To: joe@otherhost.com 1538 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 1539 Subject: Audio mail (part 1 of 2) 1540 Message-ID: 1541 MIME-Version: 1.0 1542 Content-type: message/partial; id="ABC@host.com"; 1543 number=1; total=2 1545 X-Weird-Header-1: Bar 1546 X-Weird-Header-2: Hello 1547 Message-ID: 1548 Subject: Audio mail 1549 MIME-Version: 1.0 1550 Content-type: audio/basic 1551 Content-transfer-encoding: base64 1553 ... first half of encoded audio data goes here ... 1555 and the second half might look something like this: 1557 From: Bill@host.com 1558 To: joe@otherhost.com 1559 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 1560 Subject: Audio mail (part 2 of 2) 1561 MIME-Version: 1.0 1562 Message-ID: 1563 Content-type: message/partial; 1564 id="ABC@host.com"; number=2; total=2 1566 ... second half of encoded audio data goes here ... 
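The three merging rules of section 7.2.2.1, applied to pieces such as the two shown above, can be sketched as follows. This is a minimal illustration under stated assumptions: lines are separated by a bare LF rather than CRLF, the pieces are already sorted by their "number" parameter, and the helper names (split_message, taken_from_inner, reassemble) are hypothetical.

```python
# Fields that are taken from the enclosed (inner) message in addition
# to every field whose name starts with "Content-".
INNER_FIELDS = ("subject", "message-id", "encrypted", "mime-version")


def split_message(text):
    """Split a message into (ordered list of (name, value), body).
    Folded continuation lines are joined onto the previous field."""
    head, _, body = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers, body


def taken_from_inner(name):
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in INNER_FIELDS


def reassemble(pieces):
    """Merge message/partial pieces into (headers, body)."""
    outer_headers, first_body = split_message(pieces[0])
    inner_headers, inner_body = split_message(first_body)
    # Rule 1: copy outer fields of the first piece, minus the listed ones.
    merged = [h for h in outer_headers if not taken_from_inner(h[0])]
    # Rule 2: append the listed fields from the enclosed message;
    # its other fields are dropped.
    merged += [h for h in inner_headers if taken_from_inner(h[0])]
    # Rule 3: headers of later pieces are discarded; bodies are joined.
    body = inner_body + "".join(split_message(p)[1] for p in pieces[1:])
    return merged, body
```

The sketch keeps the header fields as an ordered list rather than a mapping, because the rules require that order be preserved and a field name may occur more than once.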
1568 Then, when the fragmented message is reassembled, the 1569 resulting message to be displayed to the user should look 1570 something like this: 1572 X-Weird-Header-1: Foo 1573 From: Bill@host.com 1574 To: joe@otherhost.com 1575 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 1576 Subject: Audio mail 1577 Message-ID: 1578 MIME-Version: 1.0 1579 Content-type: audio/basic 1580 Content-transfer-encoding: base64 1582 ... first half of encoded audio data goes here ... 1583 ... second half of encoded audio data goes here ... 1585 The inclusion of a "References" field in the headers of the 1586 second and subsequent pieces of a fragmented message that 1587 references the Message-Id on the previous piece may be of 1588 benefit to mail readers that understand and track references. 1589 However, the generation of such "References" fields is 1590 entirely optional. 1592 Finally, it should be noted that the "Encrypted" header field 1593 has been made obsolete by Privacy Enhanced Messaging (PEM) 1594 [RFC1421, RFC1422, RFC1423, and RFC1424], but the rules above 1595 are nevertheless believed to describe the correct way to treat 1596 it if it is encountered in the context of conversion to and 1597 from message/partial fragments. 1599 7.2.3. External-Body Subtype 1601 The external-body subtype indicates that the actual body data 1602 are not included, but merely referenced. In this case, the 1603 parameters describe a mechanism for accessing the external 1604 data. 1606 When a MIME entity is of type "message/external-body", it 1607 consists of a header, two consecutive CRLFs, and the message 1608 header for the encapsulated message. If another pair of 1609 consecutive CRLFs appears, this of course ends the message 1610 header for the encapsulated message. However, since the 1611 encapsulated message's body is itself external, it does NOT 1612 appear in the area that follows. 
For example, consider the 1613 following message: 1615 Content-type: message/external-body; 1616 access-type=local-file; 1617 name="/u/nsb/Me.jpeg" 1619 Content-type: image/jpeg 1620 Content-ID: 1621 Content-Transfer-Encoding: binary 1623 THIS IS NOT REALLY THE BODY! 1625 The area at the end, which might be called the "phantom body", 1626 is ignored for most external-body messages. However, it may 1627 be used to contain auxiliary information for some such 1628 messages, as indeed it is when the access-type is "mail- 1629 server". The only access-type defined in this document that 1630 uses the phantom body is "mail-server", but other access-types 1631 may be defined in the future in other documents that use this 1632 area. 1634 The encapsulated headers in ALL message/external-body entities 1635 MUST include a Content-ID header field to give a unique 1636 identifier by which to reference the data. This identifier 1637 may be used for caching mechanisms, and for recognizing the 1638 receipt of the data when the access-type is "mail-server". 1640 Note that, as specified here, the tokens that describe 1641 external-body data, such as file names and mail server 1642 commands, are required to be in the US-ASCII character set. 1643 If this proves problematic in practice, a new mechanism may be 1644 required as a future extension to MIME, either as newly 1645 defined access-types for message/external-body or by some 1646 other mechanism. 1648 As with message/partial, MIME entities of type 1649 message/external-body MUST have a content-transfer-encoding of 1650 7bit (the default). In particular, even in environments that 1651 support binary or 8bit transport, the use of a content- 1652 transfer-encoding of "8bit" or "binary" is explicitly 1653 prohibited for entities of type message/external-body. 1655 7.2.3.1. 
General External-Body Parameters 1657 The parameters that may be used with any message/external-body 1658 are: 1660 (1) ACCESS-TYPE -- A word indicating the supported access 1661 mechanism by which the file or data may be obtained. 1662 This word is not case sensitive. Values include, but 1663 are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL- 1664 FILE", and "MAIL-SERVER". Future values, except for 1665 experimental values beginning with "X-", must be 1666 registered with IANA, as described in RFC MIME-REG. 1667 This parameter is unconditionally mandatory and MUST be 1668 present on EVERY message/external-body. 1670 (2) EXPIRATION -- The date (in the RFC 822 "date-time" 1671 syntax, as extended by RFC 1123 to permit 4 digits in 1672 the year field) after which the existence of the 1673 external data is not guaranteed. This parameter may be 1674 used with ANY access-type and is ALWAYS optional. 1676 (3) SIZE -- The size (in octets) of the data. The intent 1677 of this parameter is to help the recipient decide 1678 whether or not to expend the necessary resources to 1679 retrieve the external data. Note that this describes 1680 the size of the data in its canonical form, that is, 1681 before any Content-Transfer-Encoding has been applied 1682 or after the data have been decoded. This parameter 1683 may be used with ANY access-type and is ALWAYS 1684 optional. 1686 (4) PERMISSION -- A case-insensitive field that indicates 1687 whether or not it is expected that clients might also 1688 attempt to overwrite the data. By default, or if 1689 permission is "read", the assumption is that they are 1690 not, and that if the data is retrieved once, it is 1691 never needed again. If PERMISSION is "read-write", 1692 this assumption is invalid, and any local copy must be 1693 considered no more than a cache. "Read" and "Read- 1694 write" are the only defined values of permission. This 1695 parameter may be used with ANY access-type and is 1696 ALWAYS optional. 
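The constraints on the general parameters listed above might be checked as in the following sketch. It relies on the Content-Type parameter parsing of Python's standard email package; the function name and the wording of the problem reports are assumptions of this illustration, and only ACCESS-TYPE, SIZE, and PERMISSION are examined.

```python
from email import message_from_string


def check_external_body(header_block):
    """Return a list of problems found in the general parameters of a
    message/external-body Content-Type field (hypothetical helper)."""
    msg = message_from_string(header_block + "\n\n")
    problems = []

    # ACCESS-TYPE is mandatory on every message/external-body.
    if msg.get_param("access-type") is None:
        problems.append("access-type is mandatory")

    # SIZE, when present, is a count of octets.
    size = msg.get_param("size")
    if size is not None and not (size.isascii() and size.isdigit()):
        problems.append("size must be a number of octets")

    # PERMISSION is case-insensitive and defaults to "read".
    permission = msg.get_param("permission", "read")
    if permission.lower() not in ("read", "read-write"):
        problems.append("permission must be read or read-write")

    return problems
```

An empty list indicates that none of the checked constraints is violated; it does not establish that the parameters specific to a given access-type are present.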
1698 The precise semantics of the access-types defined here are 1699 described in the sections that follow. 1701 7.2.3.2. The 'ftp' and 'tftp' Access-Types 1703 An access-type of FTP or TFTP indicates that the message body 1704 is accessible as a file using the FTP [RFC-959] or TFTP [RFC- 1705 783] protocols, respectively. For these access-types, the 1706 following additional parameters are mandatory: 1708 (1) NAME -- The name of the file that contains the actual 1709 body data. 1711 (2) SITE -- A machine from which the file may be obtained, 1712 using the given protocol. This must be a fully 1713 qualified domain name, not a nickname. 1715 (3) Before any data are retrieved, using FTP, the user will 1716 generally need to be asked to provide a login id and a 1717 password for the machine named by the site parameter. 1718 For security reasons, such an id and password are not 1719 specified as content-type parameters, but must be 1720 obtained from the user. 1722 In addition, the following parameters are optional: 1724 (1) DIRECTORY -- A directory from which the data named by 1725 NAME should be retrieved. 1727 (2) MODE -- A case-insensitive string indicating the mode 1728 to be used when retrieving the information. The valid 1729 values for access-type "TFTP" are "NETASCII", "OCTET", 1730 and "MAIL", as specified by the TFTP protocol [RFC- 1731 783]. The valid values for access-type "FTP" are 1732 "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a 1733 decimal integer, typically 8. These correspond to the 1734 representation types "A" "E" "I" and "L n" as specified 1735 by the FTP protocol [RFC-959]. Note that "BINARY" and 1736 "TENEX" are not valid values for MODE and that "OCTET" 1737 or "IMAGE" or "LOCAL8" should be used instead. If MODE 1738 is not specified, the default value is "NETASCII" for 1739 TFTP and "ASCII" otherwise. 1741 7.2.3.3. 
The 'anon-ftp' Access-Type 1743 The "anon-ftp" access-type is identical to the "ftp" access 1744 type, except that the user need not be asked to provide a name 1745 and password for the specified site. Instead, the ftp 1746 protocol will be used with login "anonymous" and a password 1747 that corresponds to the user's mail address. 1749 7.2.3.4. The 'local-file' Access-Type 1751 An access-type of "local-file" indicates that the actual body 1752 is accessible as a file on the local machine. Two additional 1753 parameters are defined for this access type: 1755 (1) NAME -- The name of the file that contains the actual 1756 body data. This parameter is mandatory for the 1757 "local-file" access-type. 1759 (2) SITE -- A domain specifier for a machine or set of 1760 machines that are known to have access to the data 1761 file. This optional parameter is used to describe the 1762 locality of reference for the data, that is, the site 1763 or sites at which the file is expected to be visible. 1764 Asterisks may be used for wildcard matching to a part 1765 of a domain name, such as "*.bellcore.com", to indicate 1766 a set of machines on which the data should be directly 1767 visible, while a single asterisk may be used to 1768 indicate a file that is expected to be universally 1769 available, e.g., via a global file system. 1771 7.2.3.5. The 'mail-server' Access-Type 1773 The "mail-server" access-type indicates that the actual body 1774 is available from a mail server. Two additional parameters 1775 are defined for this access-type: 1777 (1) SERVER -- The addr-spec of the mail server from which 1778 the actual body data can be obtained. This parameter 1779 is mandatory for the "mail-server" access-type. 1781 (2) SUBJECT -- The subject that is to be used in the mail 1782 that is sent to obtain the data. Note that keying mail 1783 servers on Subject lines is NOT recommended, but such 1784 mail servers are known to exist. This is an optional 1785 parameter. 
1787 Because mail servers accept a variety of syntaxes, some of 1788 which are multiline, the full command to be sent to a mail 1789 server is not included as a parameter in the content-type 1790 header field. Instead, it is provided as the "phantom body" 1791 when the media type is message/external-body and the access- 1792 type is mail-server. 1794 Note that MIME does not define a mail server syntax. Rather, 1795 it allows the inclusion of arbitrary mail server commands in 1796 the phantom body. Implementations must include the phantom 1797 body in the body of the message they send to the mail server 1798 address to retrieve the relevant data. 1800 Unlike other access-types, mail-server access is asynchronous 1801 and will happen at an unpredictable time in the future. For 1802 this reason, it is important that there be a mechanism by 1803 which the returned data can be matched up with the original 1804 message/external-body entity. MIME mail servers must use the 1805 same Content-ID field on the returned message that was used in 1806 the original message/external-body entities, to facilitate 1807 such matching. 1809 7.2.3.6. External-Body Security Issues 1811 Message/external-body entities give rise to two important 1812 security issues: 1814 (1) Accessing data via a message/external-body reference 1815 effectively results in the message recipient performing 1816 an operation that was specified by the message 1817 originator. It is therefore possible for the message 1818 originator to trick a recipient into doing something 1819 they would not have done otherwise. For example, an 1820 originator could specify an action that attempts 1821 retrieval of material that the recipient is not 1822 authorized to obtain, causing the recipient to 1823 unwittingly violate some security policy. 
For this 1824 reason, user agents capable of resolving external 1825 references must always take steps to describe the 1826 action they are to take to the recipient and ask for 1827 explicit permission prior to performing it. 1829 The 'mail-server' access-type is particularly 1830 vulnerable, in that it causes the recipient to send a 1831 new message whose contents are specified by the 1832 original message's originator. Given the potential for 1833 abuse, any such request messages that are constructed 1834 should contain a clear indication that they were 1835 generated automatically (e.g. in a Comments: header 1836 field) in an attempt to resolve a MIME 1837 message/external-body reference. 1839 (2) MIME will sometimes be used in environments that 1840 provide some guarantee of message integrity and 1841 authenticity. If present, such guarantees may apply 1842 only to the actual direct content of messages -- they 1843 may or may not apply to data accessed through MIME's 1844 message/external-body mechanism. In particular, it may 1845 be possible to subvert certain access mechanisms even 1846 when the messaging system itself is secure. 1848 It should be noted that this problem exists either with 1849 or without the availability of MIME mechanisms. A 1850 casual reference to an FTP site containing a document 1851 in the text of a secure message brings up similar 1852 issues -- the only difference is that MIME provides for 1853 automatic retrieval of such material, and users may 1854 place unwarranted trust in such automatic retrieval 1855 mechanisms. 1857 7.2.3.7. Examples and Further Explanations 1859 When the external-body mechanism is used in conjunction with 1860 the multipart/alternative media type it extends the 1861 functionality of multipart/alternative to include the case 1862 where the same entity is provided in the same format but via 1863 different access mechanisms. 
When this is done the originator 1864 of the message must order the parts first in terms of 1865 preferred formats and then by preferred access mechanisms. 1866 The recipient's viewer should then evaluate the list both in 1867 terms of format and access mechanisms. 1869 With the emerging possibility of very wide-area file systems, 1870 it becomes very hard to know in advance the set of machines 1871 where a file will and will not be accessible directly from the 1872 file system. Therefore it may make sense to provide both a 1873 file name, to be tried directly, and the name of one or more 1874 sites from which the file is known to be accessible. An 1875 implementation can try to retrieve remote files using FTP or 1876 any other protocol, using anonymous file retrieval or 1877 prompting the user for the necessary name and password. If an 1878 external body is accessible via multiple mechanisms, the 1879 sender may include multiple entities of type 1880 message/external-body within the body parts of an enclosing 1881 multipart/alternative entity. 1883 However, the external-body mechanism is not intended to be 1884 limited to file retrieval, as shown by the mail-server 1885 access-type. Beyond this, one can imagine, for example, using 1886 a video server for external references to video clips. 1888 The embedded message header fields which appear in the body of 1889 the message/external-body data must be used to declare the 1890 media type of the external body if it is anything other than 1891 plain US-ASCII text, since the external body does not have a 1892 header section to declare its type. Similarly, any Content- 1893 transfer-encoding other than "7bit" must also be declared 1894 here. 
Thus a complete message/external-body message, 1895 referring to a document in PostScript format, might look like 1896 this: 1898 From: Whomever 1899 To: Someone 1900 Date: Whenever 1901 Subject: whatever 1902 MIME-Version: 1.0 1903 Message-ID: 1904 Content-Type: multipart/alternative; boundary=42 1905 Content-ID: 1907 --42 1908 Content-Type: message/external-body; name="BodyFormats.ps"; 1909 site="thumper.bellcore.com"; mode="image"; 1910 access-type=ANON-FTP; directory="pub"; 1911 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 1913 Content-type: application/postscript 1914 Content-ID: 1916 --42 1917 Content-Type: message/external-body; access-type=local-file; 1918 name="/u/nsb/writing/rfcs/RFC-MIME.ps"; 1919 site="thumper.bellcore.com"; 1920 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 1922 Content-type: application/postscript 1923 Content-ID: 1925 --42 1926 Content-Type: message/external-body; 1927 access-type=mail-server; 1928 server="listserv@bogus.bitnet"; 1929 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 1931 Content-type: application/postscript 1932 Content-ID: 1934 get RFC-MIME.DOC 1936 --42-- 1938 Note that in the above examples, the default Content- 1939 transfer-encoding of "7bit" is assumed for the external 1940 postscript data. 1942 Like the message/partial type, the message/external-body media 1943 type is intended to be transparent, that is, to convey the 1944 data type in the external body rather than to convey a message 1945 with a body of that type. Thus the headers on the outer and 1946 inner parts must be merged using the same rules as for 1947 message/partial. In particular, this means that the Content- 1948 type and Subject fields are overridden, but the From field is 1949 preserved. 1951 Note that since the external bodies are not transported along 1952 with the external body reference, they need not conform to 1953 transport limitations that apply to the reference itself. 
In 1954 particular, Internet mail transports may impose 7bit and line 1955 length limits, but these do not automatically apply to binary 1956 external body references. Thus a Content-Transfer-Encoding is 1957 not generally necessary, though it is permitted. 1959 Note that the body of a message of type "message/external- 1960 body" is governed by the basic syntax for an RFC 822 message. 1961 In particular, anything before the first consecutive pair of 1962 CRLFs is header information, while anything after it is body 1963 information, which is ignored for most access-types. 1965 7.2.4. Other Message Subtypes 1967 MIME implementations must in general treat unrecognized 1968 subtypes of message as being equivalent to 1969 "application/octet-stream". 1971 Future subtypes of message intended for use with email should 1972 be restricted to "7bit" encoding. A type other than message 1973 should be used if restriction to "7bit" is not possible. 1975 8. Experimental Media Type Values 1977 A media type value beginning with the characters "X-" is a 1978 private value, to be used by consenting systems by mutual 1979 agreement. Any format without a rigorous and public 1980 definition must be named with an "X-" prefix, and publicly 1981 specified values shall never begin with "X-". (Older versions 1982 of the widely used Andrew system use the "X-BE2" name, so new 1983 systems should probably choose a different name.) 1984 In general, the use of "X-" top-level types is strongly 1985 discouraged. Implementors should invent subtypes of the 1986 existing types whenever possible. In many cases, a subtype of 1987 application will be more appropriate than a new top-level 1988 type. 1990 9. Summary 1992 The five discrete media types provide a standardized 1993 mechanism for tagging entities as audio, image, or several 1994 other kinds of data. 
The composite "multipart" and "message" 1995 media types allow mixing and hierarchical structuring of 1996 entities of different types in a single message. A 1997 distinguished parameter syntax allows further specification of 1998 data format details, particularly the specification of 1999 alternate character sets. Additional optional header fields 2000 provide mechanisms for certain extensions deemed desirable by 2001 many implementors. Finally, a number of useful media types are 2002 defined for general use by consenting user agents, notably 2003 message/partial, and message/external-body. 2005 10. Security Considerations 2007 Security issues are discussed in the context of the 2008 application/postscript type, the message/external-body type, 2009 and in RFC MIME-REG. Implementors should pay special 2010 attention to the security implications of any media types that 2011 can cause the remote execution of any actions in the 2012 recipient's environment. In such cases, the discussion of the 2013 application/postscript type may serve as a model for 2014 considering other media types with remote execution 2015 capabilities. 2017 11. Authors' Addresses 2019 For more information, the authors of this document are best 2020 contacted via Internet mail: 2022 Nathaniel S. Borenstein 2023 First Virtual Holdings 2024 25 Washington Avenue 2025 Morristown, NJ 07960 2026 USA 2028 Email: nsb@nsb.fv.com 2029 Phone: +1 201 540 8967 2030 Fax: +1 201 993 3032 2032 Ned Freed 2033 Innosoft International, Inc. 2034 1050 East Garvey Avenue South 2035 West Covina, CA 91790 2036 USA 2038 Email: ned@innosoft.com 2039 Phone: +1 818 919 3600 2040 Fax: +1 818 919 3614 2042 MIME is a result of the work of the Internet Engineering Task 2043 Force Working Group on Email Extensions. The chairman of that 2044 group, Greg Vaudreuil, may be reached at: 2046 Gregory M. 
Vaudreuil 2047 Octel Network Services 2048 17080 Dallas Parkway 2049 Dallas, TX 75248-1905 2050 USA 2052 Email: Greg.Vaudreuil@Octel.Com 2053 Appendix A -- Collected Grammar 2055 This appendix contains the complete BNF grammar for all the 2056 syntax specified by this document. 2058 By itself, however, this grammar is incomplete. It refers by 2059 name to several syntax rules that are defined by RFC 822. 2060 Rather than reproduce those definitions here, and risk 2061 unintentional differences between the two, this document 2062 simply refers the reader to RFC 822 for the remaining 2063 definitions. Wherever a term is undefined, it refers to the 2064 RFC 822 definition. 2066 boundary := 0*69<bchars> bcharsnospace 2068 bchars := bcharsnospace / " " 2070 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / 2071 "+" / "_" / "," / "-" / "." / 2072 "/" / ":" / "=" / "?" 2074 body-part := <"message" as defined in RFC 822, with all 2075 header fields optional, not starting with the 2076 specified dash-boundary, and with the 2077 delimiter not occurring anywhere in the 2078 body part. Note that the semantics of a 2079 part differ from the semantics of a message, 2080 as described in the text.> 2082 close-delimiter := delimiter "--" 2084 dash-boundary := "--" boundary 2085 ; boundary taken from the value of 2086 ; boundary parameter of the 2087 ; Content-Type field. 2089 delimiter := CRLF dash-boundary 2091 discard-text := *(*text CRLF) 2092 ; May be ignored or discarded. 2094 encapsulation := delimiter transport-padding 2095 CRLF body-part 2097 epilogue := discard-text 2099 multipart-body := [preamble CRLF] 2100 dash-boundary transport-padding CRLF 2101 body-part *encapsulation 2102 close-delimiter transport-padding 2103 [CRLF epilogue] 2105 preamble := discard-text 2107 transport-padding := *LWSP-char 2108 ; Composers MUST NOT generate 2109 ; non-zero length transport 2110 ; padding, but receivers MUST 2111 ; be able to handle padding 2112 ; added by message transports.
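As an informal illustration of the "boundary" rule, the following sketch tests a candidate boundary value against the character sets given in the grammar. It reads the rule as up to 69 characters drawn from bchars followed by one character from bcharsnospace, so that a boundary is 1 to 70 characters long and never ends in a space. That reading, the function name, and the use of a regular expression are assumptions of this sketch.

```python
import re

# bcharsnospace from the grammar above, written as the inside of a
# regular-expression character class; bchars additionally allows space.
_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
BOUNDARY_RE = re.compile(r"[%s ]{0,69}[%s]" % (_NOSPACE, _NOSPACE))


def is_valid_boundary(value):
    """Return True if value is acceptable as a boundary parameter
    under the reading of the grammar described in the lead-in."""
    return BOUNDARY_RE.fullmatch(value) is not None
```

A composing agent might run such a check on a generated boundary before emitting the Content-Type field. Checking that the delimiter does not occur inside any body part is a separate matter and is not covered here.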