idnits 2.17.1 draft-ietf-822ext-mime-imt-02.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? 
** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 314: '...y MIME text type MUST represent a line...' RFC 2119 keyword, line 316: '...in text MUST represent a line break. ...' RFC 2119 keyword, line 397: '...racter encodings MUST use an appropria...' RFC 2119 keyword, line 470: '...CII characters, it SHOULD be marked as...' RFC 2119 keyword, line 860: '...undary delimiter MUST NOT appear insid...' (8 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 481 has weird spacing: '...of text is "p...' == Line 955 has weird spacing: '...F (line break...' == Line 1713 has weird spacing: '...ed, the defau...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 1995) is 10359 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC-1341' on line 215 looks like a reference -- Missing reference section? 'RFC-1563' on line 327 looks like a reference -- Missing reference section? 'RFC-821' on line 366 looks like a reference -- Missing reference section? 'ISO-646' on line 378 looks like a reference -- Missing reference section? 'US-ASCII' on line 428 looks like a reference -- Missing reference section? 'ISO-8859' on line 431 looks like a reference -- Missing reference section? 'JPEG' on line 502 looks like a reference -- Missing reference section? 'PCM' on line 532 looks like a reference -- Missing reference section? 'MPEG' on line 550 looks like a reference -- Missing reference section? 'POSTSCRIPT' on line 642 looks like a reference -- Missing reference section? 'POSTSCRIPT2' on line 643 looks like a reference -- Missing reference section? 'MIME-IMB' on line 877 looks like a reference -- Missing reference section? 'RFC-959' on line 1710 looks like a reference -- Missing reference section? 'RFC-783' on line 1705 looks like a reference Summary: 9 errors (**), 0 flaws (~~), 4 warnings (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Nathaniel Borenstein 2 Internet Draft Ned Freed 3 5 Multipurpose Internet Mail Extensions 6 (MIME) Part Two: 8 Media Types 10 December 1995 12 Status of this Memo 14 This document is an Internet-Draft. Internet-Drafts are 15 working documents of the Internet Engineering Task Force 16 (IETF), its areas, and its working groups. Note that other 17 groups may also distribute working documents as Internet- 18 Drafts. 
20 Internet-Drafts are draft documents valid for a maximum of six 21 months. Internet-Drafts may be updated, replaced, or obsoleted 22 by other documents at any time. It is not appropriate to use 23 Internet-Drafts as reference material or to cite them other 24 than as a "working draft" or "work in progress". 26 To learn the current status of any Internet-Draft, please 27 check the 1id-abstracts.txt listing contained in the 28 Internet-Drafts Shadow Directories on ds.internic.net (US East 29 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), 30 or munnari.oz.au (Pacific Rim). 32 1. Abstract 34 STD 11, RFC 822 defines a message representation protocol 35 that specifies considerable detail about US-ASCII message headers, 36 but leaves the message content, or message body, as flat 37 US-ASCII text. This set of documents, collectively called the 38 Multipurpose Internet Mail Extensions, or MIME, redefines the 39 format of messages to allow for 40 (1) textual message bodies in character sets other than 41 US-ASCII, 43 (2) non-textual message bodies, 45 (3) multi-part message bodies, and 47 (4) textual header information in character sets other than 48 US-ASCII. 50 These documents are based on earlier work documented in RFC 51 934, STD 11, and RFC 1049, but extend and revise that work. 52 Because RFC 822 said so little about message bodies, these 53 documents are largely orthogonal to (rather than a revision 54 of) RFC 822. 56 In particular, these documents are designed to provide 57 facilities to include multiple parts in a single message, to 58 represent body and header text in character sets other than 59 US-ASCII, to represent formatted multi-font text messages, to 60 represent non-textual material such as images and audio clips, 61 and generally to facilitate later extensions defining new 62 types of Internet mail for use by cooperating mail agents. 
64 The initial document in this set, RFC MIME-IMB, specifies the 65 various headers used to describe the structure of MIME 66 messages. This second document defines the general structure 67 of the MIME media typing system and defines an initial set of 68 media types. The third document, RFC MIME-HEADERS, describes 69 extensions to RFC 822 to allow non-US-ASCII text data in 70 Internet mail header fields. The fourth document, RFC MIME- 71 REG, specifies various IANA registration procedures for MIME- 72 related facilities. The fifth and final document, RFC MIME- 73 CONF, describes MIME conformance criteria as well as providing 74 some illustrative examples of MIME message formats, 75 acknowledgements, and the bibliography. 77 These documents are revisions of RFCs 1521 and 1522, which 78 themselves were revisions of RFCs 1341 and 1342. An appendix 79 in RFC MIME-CONF describes differences and changes from 80 previous versions. 82 2. Table of Contents 84 1 Abstract .............................................. 1 85 2 Table of Contents ..................................... 3 86 3 Introduction .......................................... 4 87 4 Definition of a Top-Level Media Type .................. 5 88 5 Overview Of The Initial Top-Level Media Types ......... 5 89 6 Discrete Media Type Values ............................ 7 90 6.1 Text Media Type ..................................... 7 91 6.1.1 Representation of Line Breaks ..................... 8 92 6.1.2 Charset Parameter ................................. 8 93 6.1.3 Plain Subtype ..................................... 12 94 6.1.4 Unrecognized Subtypes ............................. 12 95 6.2 Image Media Type .................................... 12 96 6.3 Audio Media Type .................................... 13 97 6.4 Video Media Type .................................... 13 98 6.5 Application Media Type .............................. 14 99 6.5.1 Octet-Stream Subtype .............................. 
15 100 6.5.2 PostScript Subtype ................................ 15 101 6.5.3 Other Application Subtypes ........................ 19 102 7 Composite Media Type Values ........................... 19 103 7.1 Multipart Media Type ................................ 19 104 7.1.1 Common Syntax ..................................... 21 105 7.1.2 Handling Nested Messages and Multiparts ........... 27 106 7.1.3 Mixed Subtype ..................................... 28 107 7.1.4 Alternative Subtype ............................... 28 108 7.1.5 Digest Subtype .................................... 30 109 7.1.6 Parallel Subtype .................................. 31 110 7.1.7 Other Multipart Subtypes .......................... 32 111 7.2 Message Media Type .................................. 32 112 7.2.1 RFC822 Subtype .................................... 32 113 7.2.2 Partial Subtype ................................... 33 114 7.2.2.1 Message Fragmentation and Reassembly ............ 34 115 7.2.2.2 Fragmentation and Reassembly Example ............ 35 116 7.2.3 External-Body Subtype ............................. 37 117 7.2.4 Other Message Subtypes ............................ 46 118 8 Experimental Media Type Values ........................ 46 119 9 Summary ............................................... 47 120 10 Security Considerations .............................. 47 121 11 Authors' Addresses ................................... 48 122 A Collected Grammar ..................................... 49 123 3. Introduction 125 The first document in this set, RFC MIME-IMB, defines a number 126 of header fields, including Content-Type. The Content-Type 127 field is used to specify the nature of the data in the body of 128 a MIME entity, by giving media type and subtype identifiers, 129 and by providing auxiliary information that may be required 130 for certain media types. 
After the type and subtype names, 131 the remainder of the header field is simply a set of 132 parameters, specified in an attribute/value notation. The 133 ordering of parameters is not significant. 135 In general, the top-level media type is used to declare the 136 general type of data, while the subtype specifies a specific 137 format for that type of data. Thus, a media type of 138 "image/xyz" is enough to tell a user agent that the data is an 139 image, even if the user agent has no knowledge of the specific 140 image format "xyz". Such information can be used, for 141 example, to decide whether or not to show a user the raw data 142 from an unrecognized subtype -- such an action might be 143 reasonable for unrecognized subtypes of text, but not for 144 unrecognized subtypes of image or audio. For this reason, 145 registered subtypes of text, image, audio, and video should 146 not contain embedded information that is really of a different 147 type. Such compound formats should be represented using the 148 "multipart" or "application" types. 150 Parameters are modifiers of the media subtype, and as such do 151 not fundamentally affect the nature of the content. The set 152 of meaningful parameters depends on the media type and 153 subtype. Most parameters are associated with a single 154 specific subtype. However, a given top-level media type may 155 define parameters which are applicable to any subtype of that 156 type. Parameters may be required by their defining media type 157 or subtype or they may be optional. MIME implementations must 158 also ignore any parameters whose names they do not recognize. 160 MIME's Content-Type header field and media type mechanism have 161 been carefully designed to be extensible, and it is expected 162 that the set of media type/subtype pairs and their associated 163 parameters will grow significantly over time. 
Several other 164 MIME facilities, most notably the list of names of 165 character sets registered for MIME usage, are likely to have 166 new values defined over time. In order to ensure that the set 167 of such values is developed in an orderly, well-specified, and 168 public manner, MIME sets up a registration process which uses 169 the Internet Assigned Numbers Authority (IANA) as a central 170 registry for MIME's extension areas. The registration process 171 is described in a companion document, RFC MIME-REG. 173 The initial seven standard top-level media types are defined 174 and described in the remainder of this document. 176 4. Definition of a Top-Level Media Type 178 The definition of a top-level media type consists of: 180 (1) a name and a description of the type, including 181 criteria for whether a particular type would qualify 182 under that type, 184 (2) the names and definitions of parameters, if any, which 185 are defined for all subtypes of that type (including 186 whether such parameters are required or optional), 188 (3) how a user agent and/or gateway should handle unknown 189 subtypes of this type, 191 (4) general considerations on gatewaying entities of this 192 top-level type, if any, and 194 (5) any restrictions on content-transfer-encodings for 195 entities of this top-level type. 197 5. Overview Of The Initial Top-Level Media Types 199 The five discrete top-level media types are: 201 (1) text -- textual information. The subtype "plain" in 202 particular indicates plain (unformatted) text. No 203 special software is required to get the full meaning of 204 the text, aside from support for the indicated 205 character set. Other subtypes are to be used for 206 enriched text in forms where application software may 207 enhance the appearance of the text, but such software 208 must not be required in order to get the general idea 209 of the content. 
Possible subtypes thus include any 210 word processor format that can be read without 211 resorting to software that understands the format. In 212 particular, formats that employ embedded binary 213 formatting information are not considered directly 214 readable. A very simple and portable subtype, 215 "richtext", was defined in RFC 1341 [RFC-1341], with a 216 further revision in RFC 1563 [RFC-1563] under the name 217 "enriched". 219 (2) image -- image data. Image requires a display device 220 (such as a graphical display, a graphics printer, or a 221 FAX machine) to view the information. An initial 222 subtype is defined for the widely-used image format 223 JPEG. 225 (3) audio -- audio data. Audio requires an audio output 226 device (such as a speaker or a telephone) to "display" 227 the contents. An initial subtype "basic" is defined in 228 this document. 230 (4) video -- video data. Video requires the capability to 231 display moving images, typically including specialized 232 hardware and software. An initial subtype "mpeg" is 233 defined in this document. 235 (5) application -- some other kind of data, typically 236 either uninterpreted binary data or information to be 237 processed by an application. The subtype "octet- 238 stream" is to be used in the case of uninterpreted 239 binary data, in which case the simplest recommended 240 action is to offer to write the information into a file 241 for the user. The "PostScript" subtype is also defined 242 for the transport of PostScript material. Other 243 expected uses for "application" include spreadsheets, 244 data for mail-based scheduling systems, languages 245 for "active" (computational) messaging, and word 246 processing formats that are not directly readable. 247 Note that security considerations may exist for some 248 types of application data, most notably 249 application/PostScript and any form of active 250 messaging. These issues are discussed later in this 251 document. 
253 The two composite top-level media types are: 255 (1) multipart -- data consisting of multiple entities of 256 independent data types. Four subtypes are initially 257 defined, including the basic "mixed" subtype specifying 258 a generic mixed set of parts, "alternative" for 259 representing the same data in multiple formats, 260 "parallel" for parts intended to be viewed 261 simultaneously, and "digest" for multipart entities in 262 which each part has a default type of "message/rfc822". 264 (2) message -- an encapsulated message. A body of media 265 type "message" is itself all or a portion of some kind 266 of message object. Such objects may or may not in turn 267 contain other entities. The "rfc822" subtype is used 268 when the encapsulated content is itself an RFC 822 269 message. The "partial" subtype is defined for partial 270 RFC 822 messages, to permit the fragmented transmission 271 of bodies that are thought to be too large to be passed 272 through transport facilities in one piece. Another 273 subtype, "external-body", is defined for specifying 274 large bodies by reference to an external data source. 276 It should be noted that the list of media type values given 277 here may be augmented in time, via the mechanisms described 278 above, and that the set of subtypes is expected to grow 279 substantially. 281 6. Discrete Media Type Values 283 Five of the seven initial media type values refer to discrete 284 bodies. The content of these types must be handled by non-MIME 285 mechanisms; they are opaque to MIME processors. 287 6.1. Text Media Type 289 The text media type is intended for sending material which is 290 principally textual in form. A "charset" parameter may be 291 used to indicate the character set of the body text for some 292 text subtypes, notably including the subtype "text/plain", 293 which indicates plain (unformatted) text. 
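As an illustration only, the structure of the Content-Type field described in the Introduction (a type and subtype followed by attribute/value parameters in any order) and the discrete/composite split of the seven initial top-level types can be sketched in Python. This is a minimal sketch under stated assumptions, not a conforming parser: the names parse_content_type and top_level_kind are hypothetical, type, subtype, and parameter names are lower-cased on the assumption that they are matched case-insensitively as described in RFC MIME-IMB, and quoted values that contain semicolons, as well as RFC 822 comments, are deliberately not handled.

```python
# Hypothetical helper names; a simplified sketch, not a full RFC 822 parser.
DISCRETE_TYPES = {"text", "image", "audio", "video", "application"}
COMPOSITE_TYPES = {"multipart", "message"}


def parse_content_type(value):
    """Split a Content-Type field value into (type, subtype, params)."""
    # Assumption: no ";" appears inside a quoted parameter value.
    head, *rest = value.split(";")
    media_type, _, subtype = head.strip().partition("/")
    params = {}
    for item in rest:
        name, sep, val = item.strip().partition("=")
        if not sep:
            continue  # skip anything that is not attribute=value
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]  # drop the surrounding quotes
        # A dict is used because parameter ordering is not significant.
        params[name.strip().lower()] = val
    return media_type.lower(), subtype.lower(), params


def top_level_kind(media_type):
    """Classify a top-level type name as discrete, composite, or unknown."""
    if media_type in DISCRETE_TYPES:
        return "discrete"
    if media_type in COMPOSITE_TYPES:
        return "composite"
    return "unknown"
```

A caller would look up only the parameters it understands in the returned mapping, which leaves unrecognized parameters ignored as the Introduction requires.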
295 Beyond plain text, there are many formats for representing 296 what might be known as "extended text" -- text with embedded 297 formatting and presentation information. An interesting 298 characteristic of many such representations is that they are 299 to some extent readable even without the software that 300 interprets them. It is useful, then, to distinguish them, at 301 the highest level, from such unreadable data as images, audio, 302 or text represented in an unreadable form. In the absence of 303 appropriate interpretation software, it is reasonable to show 304 subtypes of text to the user, while it is not reasonable to do 305 so with most nontextual data. 307 Such formatted textual data should be represented using 308 subtypes of text. Plausible subtypes of text are typically 309 given by the common name of the representation format, e.g., 310 "text/enriched" [RFC-1563]. 312 6.1.1. Representation of Line Breaks 314 The canonical form of any MIME text type MUST represent a line 315 break as a CRLF sequence. Similarly, any occurrence of CRLF 316 in text MUST represent a line break. Use of CR and LF outside 317 of line break sequences is also forbidden. 319 This rule applies regardless of format or character set or 320 sets involved. 322 NOTE: The proper interpretation of line breaks when a body is 323 displayed depends on the media type. In particular, while it 324 is appropriate to treat a line break as a transition to a new 325 line when displaying a text/plain body, this treatment is 326 actually incorrect for other subtypes of text like 327 text/enriched [RFC-1563]. 329 6.1.2. Charset Parameter 331 A critical parameter that may be specified in the Content-Type 332 field for text/plain data is the character set. This is 333 specified with a "charset" parameter, as in: 335 Content-type: text/plain; charset=iso-8859-1 337 Unlike some other parameter values, the values of the charset 338 parameter are NOT case sensitive. 
The default character set, 339 which must be assumed in the absence of a charset parameter, 340 is US-ASCII. 342 The specification for any future subtypes of "text" must 343 specify whether or not they will also utilize a "charset" 344 parameter, and may possibly restrict its values as well. When 345 used with a particular body, the semantics of the "charset" 346 parameter should be identical to those specified here for 347 "text/plain", i.e., the body consists entirely of characters 348 in the given charset. In particular, definers of future text 349 subtypes should pay close attention to the implications of 350 multioctet character sets for their subtype definitions. 352 This RFC specifies the definition of the charset parameter for 353 the purposes of MIME to be the name of a character set, as 354 "character set" is defined in RFC MIME-IMB. The rules regarding 355 line breaks detailed in the previous section must also be 356 observed -- a character set whose definition does not conform 357 to these rules cannot be used in a MIME text type. 359 An initial list of predefined character set names can be found 360 at the end of this section. Additional character sets may be 361 registered with IANA as described in RFC MIME-REG. 363 Note that if the specified character set includes 8bit data, a 364 Content-Transfer-Encoding header field and a corresponding 365 encoding on the data are required in order to transmit the 366 body via some mail transfer protocols, such as SMTP [RFC-821]. 368 The default character set, US-ASCII, has been the subject of 369 some confusion and ambiguity in the past. Not only were there 370 some ambiguities in the definition, but there have also been wide 371 variations in practice. In order to eliminate such ambiguity 372 and variations in the future, it is strongly recommended that 373 new user agents explicitly specify a character set as a media 374 type parameter in the Content-Type header field. 
"US-ASCII" 375 does not indicate an arbitrary 7-bit character code, but 376 specifies that the body uses a character coding with the 377 exact correspondence of octets to characters specified in US- 378 ASCII. National use variations of ISO 646 [ISO-646] are NOT 379 US-ASCII and their use in Internet mail is explicitly 380 discouraged. The omission of the ISO 646 character set from 381 this document is deliberate in this regard. The character set 382 name of "US-ASCII" explicitly refers to ANSI X3.4-1986 [US- 383 ASCII] only. The character set name "ASCII" is reserved and 384 must not be used for any purpose. 386 NOTE: RFC 821 explicitly specifies "ASCII", and references an 387 earlier version of the American Standard. Insofar as one of 388 the purposes of specifying a media type and character set is 389 to permit the receiver to unambiguously determine how the 390 sender intended the coded message to be interpreted, assuming 391 anything other than "strict ASCII" as the default would risk 392 unintentional and incompatible changes to the semantics of 393 messages now being transmitted. This also implies that 394 messages containing characters coded according to national 395 variations on ISO 646, or using code-switching procedures 396 (e.g., those of ISO 2022), as well as 8bit or multiple octet 397 character encodings MUST use an appropriate character set 398 specification to be consistent with this specification. 400 The complete US-ASCII character set is listed in ANSI X3.4- 401 1986. Note that the control characters including DEL (0-31, 402 127) have no defined meaning apart from the combination CRLF 403 (US-ASCII values 13 and 10) indicating a new line. 
Two of the 404 characters have de facto meanings in wide use: FF (12) often 405 means "start subsequent text on the beginning of a new page"; 406 and TAB or HT (9) often (though not always) means "move the 407 cursor to the next available column after the current position 408 where the column number is a multiple of 8 (counting the first 409 column as column 0)." Aside from these conventions, any use 410 of the control characters or DEL in a body must occur within 411 the context of a private agreement between the sender and 412 recipient. Such private agreements are discouraged and should 413 be replaced by the other capabilities of this document. 415 NOTE: Beyond US-ASCII, an enormous proliferation of character 416 sets is possible. It is the opinion of the IETF working group 417 that a large number of character sets is NOT a good thing. We 418 would prefer to specify a SINGLE character set that can be 419 used universally for representing all of the world's languages 420 in Internet mail. Unfortunately, existing practice in several 421 communities seems to point to the continued use of multiple 422 character sets in the near future. For this reason, we define 423 names for a small number of character sets for which a strong 424 constituent base exists. 426 The defined charset values are: 428 (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII]. 430 (2) ISO-8859-X -- where "X" is to be replaced, as 431 necessary, for the parts of ISO-8859 [ISO-8859]. Note 432 that the ISO 646 character sets have deliberately been 433 omitted in favor of their 8859 replacements, which are 434 the designated character sets for Internet mail. As of 435 the publication of this document, the legitimate values 436 for "X" are the digits 1 through 9. 438 All of these character sets are used as pure 7bit or 8bit sets 439 without any shift or escape functions. The meaning of shift 440 and escape sequences in these character sets is not defined. 
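The charset rules given so far in this section (values are not case sensitive, US-ASCII is assumed when no charset parameter is present, and the defined values are US-ASCII and ISO-8859-1 through ISO-8859-9) can be illustrated with a short Python sketch. The function names effective_charset and is_defined_charset are hypothetical, and the params argument is assumed to be a mapping whose parameter names have already been lower-cased by whatever code parsed the Content-Type field. This is an illustration under those assumptions, not a definitive implementation.

```python
import re

DEFAULT_CHARSET = "US-ASCII"


def effective_charset(params):
    """Return the charset to assume for a text/plain body, upper-cased."""
    # Charset values are not case sensitive, so normalize before comparing.
    # A missing charset parameter means the default, US-ASCII.
    return params.get("charset", DEFAULT_CHARSET).upper()


def is_defined_charset(name):
    """True only for US-ASCII and ISO-8859-1 through ISO-8859-9."""
    name = name.upper()
    if name == "US-ASCII":
        return True
    # "X" in ISO-8859-X is a single digit from 1 through 9.
    return re.fullmatch(r"ISO-8859-[1-9]", name) is not None
```

Note that the reserved name "ASCII" is rejected by is_defined_charset, as is any ISO-8859 part number outside the range 1 through 9.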
442 The character sets specified above are the ones that were 443 relatively uncontroversial during the drafting of MIME. This 444 document does not endorse the use of any particular character 445 set other than US-ASCII, and recognizes that the future 446 evolution of world character sets remains unclear. It is 447 expected that in the future, additional character sets will be 448 registered for use in MIME. 450 Note that the character set used, if anything other than US- 451 ASCII, must always be explicitly specified in the Content-Type 452 field. 454 No other character set name may be used in Internet mail 455 without the publication of a formal specification and its 456 registration with IANA, or by private agreement, in which case 457 the character set name must begin with "X-". 459 Implementors are discouraged from defining new character sets 460 unless absolutely necessary. 462 The "charset" parameter has been defined primarily for the 463 purpose of textual data, and is described in this section for 464 that reason. However, it is conceivable that non-textual data 465 might also wish to specify a charset value for some purpose, 466 in which case the same syntax and values should be used. 468 In general, composition software should always use the "lowest 469 common denominator" character set possible. For example, if a 470 body contains only US-ASCII characters, it SHOULD be marked as 471 being in the US-ASCII character set, not ISO-8859-1, which, 472 like all the ISO-8859 family of character sets, is a superset 473 of US-ASCII. More generally, if a widely-used character set 474 is a subset of another character set, and a body contains only 475 characters in the widely-used subset, it should be labelled as 476 being in that subset. This will increase the chances that the 477 recipient will be able to view the resulting entity correctly. 479 6.1.3. Plain Subtype 481 The simplest and most important subtype of text is "plain". 
482 This indicates plain (unformatted) text. The default media 483 type of "text/plain; charset=us-ascii" for Internet mail 484 describes existing Internet practice. That is, it is the type 485 of body defined by RFC 822. 487 No other text subtype is defined by this document. 489 6.1.4. Unrecognized Subtypes 491 Unrecognized subtypes of text should be treated as subtype 492 "plain" as long as the MIME implementation knows how to handle 493 the charset. Unrecognized subtypes which also specify an 494 unrecognized charset should be treated as "application/octet- 495 stream". 497 6.2. Image Media Type 499 A media type of "image" indicates that the body contains an 500 image. The subtype names the specific image format. These 501 names are not case sensitive. An initial subtype is "jpeg" for 502 the JPEG format using JFIF encoding [JPEG]. 504 The list of image subtypes given here is neither exclusive nor 505 exhaustive, and is expected to grow as more types are 506 registered with IANA, as described in RFC MIME-REG. 508 Unrecognized subtypes of image should at a minimum be treated 509 as "application/octet-stream". Implementations may optionally 510 elect to pass subtypes of image that they do not specifically 511 recognize to a secure and robust general-purpose image viewing 512 application, if such an application is available. 514 NOTE: Using a general-purpose image viewing application 515 this way inherits the security problems of the most dangerous 516 type supported by the application. 518 6.3. Audio Media Type 520 A media type of "audio" indicates that the body contains audio 521 data. Although there is not yet a consensus on an "ideal" 522 audio format for use with computers, there is a pressing need 523 for a format capable of providing interoperable behavior. 525 The initial subtype of "basic" is specified to meet this 526 requirement by providing an absolutely minimal lowest common 527 denominator audio format. 
It is expected that richer formats 528 for higher quality and/or lower bandwidth audio will be 529 defined by a later document. 531 The content of the "audio/basic" subtype is single channel 532 audio encoded using 8bit ISDN mu-law [PCM] at a sample rate of 533 8000 Hz. 535 Unrecognized subtypes of audio should at a minimum be treated 536 as "application/octet-stream". Implementations may optionally 537 elect to pass subtypes of audio that they do not specifically 538 recognize to a robust general-purpose audio playing 539 application, if such an application is available. 541 6.4. Video Media Type 543 A media type of "video" indicates that the body contains a 544 time-varying-picture image, possibly with color and 545 coordinated sound. The term "video" is used extremely 546 generically, rather than with reference to any particular 547 technology or format, and is not meant to preclude subtypes 548 such as animated drawings encoded compactly. The subtype 549 "mpeg" refers to video coded according to the MPEG standard 550 [MPEG]. 552 Note that although in general this document strongly 553 discourages the mixing of multiple media in a single body, it 554 is recognized that many so-called "video" formats include a 555 representation for synchronized audio, and this is explicitly 556 permitted for subtypes of "video". 558 Unrecognized subtypes of video should at a minimum be treated 559 as "application/octet-stream". Implementations may optionally 560 elect to pass subtypes of video that they do not specifically 561 recognize to a robust general-purpose video display 562 application, if such an application is available. 564 6.5. Application Media Type 566 The "application" media type is to be used for discrete data 567 which do not fit in any of the other categories, and 568 particularly for data to be processed by some type of 569 application program. This is information which must be 570 processed by an application before it is viewable or usable by 571 a user. 
Expected uses for the application media type include 572 file transfer, spreadsheets, data for mail-based scheduling 573 systems, and languages for "active" (computational) material. 574 (The latter, in particular, can pose security problems which 575 must be understood by implementors, and are considered in 576 detail in the discussion of the application/PostScript media 577 type.) 579 For example, a meeting scheduler might define a standard 580 representation for information about proposed meeting dates. 581 An intelligent user agent would use this information to 582 conduct a dialog with the user, and might then send additional 583 material based on that dialog. More generally, there have 584 been several "active" messaging languages developed in which 585 programs in a suitably specialized language are transported to 586 a remote location and automatically run in the recipient's 587 environment. 589 Such applications may be defined as subtypes of the 590 "application" media type. This document defines two subtypes: 591 octet-stream, and PostScript. 593 The subtype of application will often be the name of the 594 application for which the data are intended. This does not 595 mean, however, that any application program name may be used 596 freely as a subtype of application. Usage of any subtype 597 (other than subtypes beginning with "x-") must be registered 598 with IANA, as described in RFC MIME-REG. 600 6.5.1. Octet-Stream Subtype 602 The "octet-stream" subtype is used to indicate that a body 603 contains arbitrary binary data. The set of currently defined 604 parameters is: 606 (1) TYPE -- the general type or category of binary data. 607 This is intended as information for the human recipient 608 rather than for any automatic processing. 610 (2) PADDING -- the number of bits of padding that were 611 appended to the bit-stream comprising the actual 612 contents to produce the enclosed 8bit byte-oriented 613 data. 
This is useful for enclosing a bit-stream in a 614 body when the total number of bits is not a multiple of 615 8. 617 Both of these parameters are optional. 619 An additional parameter, "CONVERSIONS", was defined in RFC 620 1341 but has since been removed. RFC 1341 also defined the 621 use of a "NAME" parameter which gave a suggested file name to 622 be used if the data were to be written to a file. This has 623 been deprecated in anticipation of a separate Content- 624 Disposition header field, to be defined in a subsequent RFC. 626 The recommended action for an implementation that receives an 627 application/octet-stream entity is to simply offer to put the 628 data in a file, with any Content-Transfer-Encoding undone, or 629 perhaps to use it as input to a user-specified process. 631 To reduce the danger of transmitting rogue programs, it is 632 strongly recommended that implementations NOT implement a 633 path-search mechanism whereby an arbitrary program named in 634 the Content-Type parameter (e.g., an "interpreter=" parameter) 635 is found and executed using the message body as input. 637 6.5.2. PostScript Subtype 639 A media type of "application/postscript" indicates a 640 PostScript program. Currently two variants of the PostScript 641 language are allowed; the original level 1 variant is 642 described in [POSTSCRIPT] and the more recent level 2 variant 643 is described in [POSTSCRIPT2]. 645 PostScript is a registered trademark of Adobe Systems, Inc. 646 Use of the MIME media type "application/postscript" implies 647 recognition of that trademark and all the rights it entails. 649 The PostScript language definition provides facilities for 650 internal labelling of the specific language features a given 651 program uses. This labelling, called the PostScript document 652 structuring conventions, or DSC, is very general and provides 653 substantially more information than just the language level. 
654 The use of document structuring conventions, while not 655 required, is strongly recommended as an aid to 656 interoperability. Documents which lack proper structuring 657 conventions cannot be tested to see whether or not they will 658 work in a given environment. As such, some systems may assume 659 the worst and refuse to process unstructured documents. 661 The execution of general-purpose PostScript interpreters 662 entails serious security risks, and implementors are 663 discouraged from simply sending PostScript bodies to "off- 664 the-shelf" interpreters. While it is usually safe to send 665 PostScript to a printer, where the potential for harm is 666 greatly constrained by typical printer environments, 667 implementors should consider all of the following before they 668 add interactive display of PostScript bodies to their MIME 669 readers. 671 The remainder of this section outlines some, though probably 672 not all, of the possible problems with the transport of 673 PostScript entities. 675 (1) Dangerous operations in the PostScript language 676 include, but may not be limited to, the PostScript 677 operators "deletefile", "renamefile", "filenameforall", 678 and "file". "File" is only dangerous when applied to 679 something other than standard input or output. 680 Implementations may also define additional nonstandard 681 file operators; these may also pose a threat to 682 security. "Filenameforall", the wildcard file search 683 operator, may appear at first glance to be harmless. 684 Note, however, that this operator has the potential to 685 reveal information about what files the recipient has 686 access to, and this information may itself be 687 sensitive. Message senders should avoid the use of 688 potentially dangerous file operators, since these 689 operators are quite likely to be unavailable in secure 690 PostScript implementations. 
Message receiving and 691 displaying software should either completely disable 692 all potentially dangerous file operators or take 693 special care not to delegate any special authority to 694 their operation. These operators should be viewed as 695 being done by an outside agency when interpreting 696 PostScript documents. Such disabling and/or checking 697 should be done completely outside of the reach of the 698 PostScript language itself; care should be taken to 699 insure that no method exists for re-enabling full- 700 function versions of these operators. 702 (2) The PostScript language provides facilities for exiting 703 the normal interpreter, or server, loop. Changes made 704 in this "outer" environment are customarily retained 705 across documents, and may in some cases be retained 706 semipermanently in nonvolatile memory. The operators 707 associated with exiting the interpreter loop have the 708 potential to interfere with subsequent document 709 processing. As such, their unrestrained use 710 constitutes a threat of service denial. PostScript 711 operators that exit the interpreter loop include, but 712 may not be limited to, the exitserver and startjob 713 operators. Message sending software should not 714 generate PostScript that depends on exiting the 715 interpreter loop to operate, since the ability to exit 716 will probably be unavailable in secure PostScript 717 implementations. Message receiving and displaying 718 software should completely disable the ability to make 719 retained changes to the PostScript environment by 720 eliminating or disabling the "startjob" and 721 "exitserver" operations. If these operations cannot be 722 eliminated or completely disabled the password 723 associated with them should at least be set to a hard- 724 to-guess value. 726 (3) PostScript provides operators for setting system-wide 727 and device-specific parameters. 
These parameter 728 settings may be retained across jobs and may 729 potentially pose a threat to the correct operation of 730 the interpreter. The PostScript operators that set 731 system and device parameters include, but may not be 732 limited to, the "setsystemparams" and "setdevparams" 733 operators. Message sending software should not 734 generate PostScript that depends on the setting of 735 system or device parameters to operate correctly. The 736 ability to set these parameters will probably be 737 unavailable in secure PostScript implementations. 738 Message receiving and displaying software should 739 disable the ability to change system and device 740 parameters. If these operators cannot be completely 741 disabled the password associated with them should at 742 least be set to a hard-to-guess value. 744 (4) Some PostScript implementations provide nonstandard 745 facilities for the direct loading and execution of 746 machine code. Such facilities are quite obviously open 747 to substantial abuse. Message sending software should 748 not make use of such features. Besides being totally 749 hardware-specific, they are also likely to be 750 unavailable in secure implementations of PostScript. 751 Message receiving and displaying software should not 752 allow such operators to be used if they exist. 754 (5) PostScript is an extensible language, and many, if not 755 most, implementations of it provide a number of their 756 own extensions. This document does not deal with such 757 extensions explicitly since they constitute an unknown 758 factor. Message sending software should not make use 759 of nonstandard extensions; they are likely to be 760 missing from some implementations. Message receiving 761 and displaying software should make sure that any 762 nonstandard PostScript operators are secure and don't 763 present any kind of threat. 765 (6) It is possible to write PostScript that consumes huge 766 amounts of various system resources. 
It is also 767 possible to write PostScript programs that loop 768 indefinitely. Both types of programs have the 769 potential to cause damage if sent to unsuspecting 770 recipients. Message sending software should avoid the 771 construction and dissemination of such programs, which 772 is antisocial. Message receiving and displaying 773 software should provide appropriate mechanisms to abort 774 processing of a document after a reasonable amount of 775 time has elapsed. In addition, PostScript interpreters 776 should be limited to the consumption of only a 777 reasonable amount of any given system resource. 779 (7) It is possible to include raw binary information inside 780 PostScript in various forms. This is not recommended 781 for use in Internet mail, both because it is not 782 supported by all PostScript interpreters and because it 783 significantly complicates the use of a MIME Content- 784 Transfer-Encoding. (Without such binary, PostScript 785 may typically be viewed as line-oriented data. The 786 treatment of CRLF sequences becomes extremely 787 problematic if binary and line-oriented data are mixed 788 in a single PostScript data stream.) 790 (8) Finally, bugs may exist in some PostScript interpreters 791 which could possibly be exploited to gain unauthorized 792 access to a recipient's system. Beyond noting this 793 possibility, there is no specific action to take to 794 prevent this other than the timely correction of such 795 bugs if any are found. 797 6.5.3. Other Application Subtypes 799 It is expected that many other subtypes of application will be 800 defined in the future. MIME implementations must at a minimum 801 treat any unrecognized subtypes as being equivalent to 802 "application/octet-stream". 804 7. Composite Media Type Values 806 The remaining two of the seven initial Content-Type values 807 refer to composite entities.
Composite entities are handled 808 using MIME mechanisms -- a MIME processor typically handles 809 the body directly. 811 7.1. Multipart Media Type 813 In the case of multipart entities, in which one or more 814 different sets of data are combined in a single body, a 815 "multipart" media type field must appear in the entity's 816 header. The body must then contain one or more body parts, 817 each preceded by a boundary delimiter line, and the last one 818 followed by a closing boundary delimiter line. After its 819 boundary delimiter line, each body part then consists of a 820 header area, a blank line, and a body area. Thus a body part 821 is similar to an RFC 822 message in syntax, but different in 822 meaning. 824 A body part is an entity and hence is NOT to be interpreted as 825 actually being an RFC 822 message. To begin with, NO header 826 fields are actually required in body parts. A body part that 827 starts with a blank line, therefore, is allowed and is a body 828 part for which all default values are to be assumed. In such 829 a case, the absence of a Content-Type header usually indicates 830 that the corresponding body has a content-type of "text/plain; 831 charset=US-ASCII". 833 The only header fields that have defined meaning for body 834 parts are those the names of which begin with "Content-". All 835 other header fields are generally to be ignored in body parts. 836 Although they should generally be retained if at all possible, 837 they may be discarded by gateways if necessary. Such other 838 fields are permitted to appear in body parts but must not be 839 depended on. "X-" fields may be created for experimental or 840 private purposes, with the recognition that the information 841 they contain may be lost at some gateways. 843 NOTE: The distinction between an RFC 822 message and a body 844 part is subtle, but important. 
A gateway between Internet and 845 X.400 mail, for example, must be able to tell the difference 846 between a body part that contains an image and a body part 847 that contains an encapsulated message, the body of which is a 848 JPEG image. In order to represent the latter, the body part 849 must have "Content-Type: message/rfc822", and its body (after 850 the blank line) must be the encapsulated message, with its own 851 "Content-Type: image/jpeg" header field. The use of similar 852 syntax facilitates the conversion of messages to body parts, 853 and vice versa, but the distinction between the two must be 854 understood by implementors. (For the special case in which 855 most parts actually are messages, a "digest" subtype is also 856 defined.) 858 As stated previously, each body part is preceded by a boundary 859 delimiter line that contains the boundary delimiter. The 860 boundary delimiter MUST NOT appear inside any of the 861 encapsulated parts, on a line by itself or as the prefix of 862 any line. This implies that it is crucial that the composing 863 agent be able to choose and specify a unique boundary 864 parameter value that does not contain the boundary parameter 865 value of an enclosing multipart as a prefix. 867 All present and future subtypes of the "multipart" type must 868 use an identical syntax. Subtypes may differ in their 869 semantics, and may impose additional restrictions on syntax, 870 but must conform to the required syntax for the multipart 871 type. This requirement ensures that all conformant user 872 agents will at least be able to recognize and separate the 873 parts of any multipart entity, even those of an unrecognized 874 subtype. 876 As stated in the definition of the Content-Transfer-Encoding 877 field [MIME-IMB], no encoding other than "7bit", "8bit", or 878 "binary" is permitted for entities of type "multipart". 
The 879 multipart boundary delimiters and header fields are always 880 represented as 7bit US-ASCII in any case (though the header 881 fields may encode non-US-ASCII header text as per RFC MIME- 882 HEADERS) and data within the body parts can be encoded on a 883 part-by-part basis, with Content-Transfer-Encoding fields for 884 each appropriate body part. 886 7.1.1. Common Syntax 888 This section defines a common syntax for subtypes of 889 multipart. All subtypes of multipart must use this syntax. A 890 simple example of a multipart message also appears in this 891 section. An example of a more complex multipart message is 892 given in RFC MIME-CONF. 894 The Content-Type field for multipart entities requires one 895 parameter, "boundary". The boundary delimiter line is then 896 defined as a line consisting entirely of two hyphen characters 897 ("-", decimal value 45) followed by the boundary parameter 898 value from the Content-Type header field, optional linear 899 whitespace, and a terminating CRLF. 901 NOTE: The hyphens are for rough compatibility with the 902 earlier RFC 934 method of message encapsulation, and for ease 903 of searching for the boundaries in some implementations. 904 However, it should be noted that multipart messages are NOT 905 completely compatible with RFC 934 encapsulations; in 906 particular, they do not obey RFC 934 quoting conventions for 907 embedded lines that begin with hyphens. This mechanism was 908 chosen over the RFC 934 mechanism because the latter causes 909 lines to grow with each level of quoting. The combination of 910 this growth with the fact that SMTP implementations sometimes 911 wrap long lines made the RFC 934 mechanism unsuitable for use 912 in the event that deeply-nested multipart structuring is ever 913 desired. 915 WARNING TO IMPLEMENTORS: The grammar for parameters on the 916 Content-type field is such that it is often necessary to 917 enclose the boundary parameter values in quotes on the 918 Content-type line. 
This is not always necessary, but never 919 hurts. Implementors should be sure to study the grammar 920 carefully in order to avoid producing invalid Content-type 921 fields. Thus, a typical multipart Content-Type header field 922 might look like this: 924 Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p 926 But the following is not valid: 928 Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p 930 (because of the colon) and must instead be represented as 932 Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p" 934 This Content-Type value indicates that the content consists of 935 one or more parts, each with a structure that is syntactically 936 identical to an RFC 822 message, except that the header area 937 is allowed to be completely empty, and that the parts are each 938 preceded by the line 940 --gc0pJq0M:08jU534c0p 942 The boundary delimiter MUST occur at the beginning of a line, 943 i.e., following a CRLF, and the initial CRLF is considered to 944 be attached to the boundary delimiter line rather than part of 945 the preceding part. The boundary may be followed by zero or 946 more characters of linear whitespace. It is then terminated by 947 either another CRLF and the header fields for the next part, 948 or by two CRLFs, in which case there are no header fields for 949 the next part. If no Content-Type field is present it is 950 assumed to be of message/rfc822 in a multipart/digest and 951 text/plain otherwise. 953 NOTE: The CRLF preceding the boundary delimiter line is 954 conceptually attached to the boundary so that it is possible 955 to have a part that does not end with a CRLF (line break). 956 Body parts that must be considered to end with line breaks, 957 therefore, must have two CRLFs preceding the boundary 958 delimiter line, the first of which is part of the preceding 959 body part, and the second of which is part of the 960 encapsulation boundary. 
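As a rough illustration of the quoting rule in the WARNING TO IMPLEMENTORS above, the following Python sketch formats a boundary parameter and quotes the value only when it is not a valid token. The names is_token and boundary_param are hypothetical, and the tspecials set is assumed from the token grammar in [MIME-IMB] rather than taken from this document, so it should be checked against that grammar before use.

```python
# Characters that may not appear in an unquoted parameter value (token).
# Assumption: this is the tspecials set from the MIME token grammar.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """Return True if value is a non-empty run of printable US-ASCII
    characters containing no SPACE, control characters, or tspecials."""
    return bool(value) and all(
        0x20 < ord(ch) < 0x7F and ch not in TSPECIALS for ch in value
    )


def boundary_param(boundary: str) -> str:
    """Format a boundary parameter, quoting it only when required.

    Boundary characters never include a double quote or backslash,
    so no escaping inside the quoted string is attempted here.
    """
    if is_token(boundary):
        return "boundary=" + boundary
    return 'boundary="' + boundary + '"'


print("Content-Type: multipart/mixed; " + boundary_param("gc0pJq0M:08jU534c0p"))
```

Because quoting never hurts, a composer could also quote unconditionally; the conditional form is shown only to make the distinction between the valid and invalid examples above explicit.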
962 Boundary delimiters must not appear within the encapsulated 963 material, and must be no longer than 70 characters, not 964 counting the two leading hyphens. 966 The boundary delimiter line following the last body part is a 967 distinguished delimiter that indicates that no further body 968 parts will follow. Such a delimiter line is identical to the 969 previous delimiter lines, with the addition of two more 970 hyphens after the boundary parameter value. 972 --gc0pJq0M:08jU534c0p-- 974 NOTE TO IMPLEMENTORS: Boundary string comparisons must 975 compare the boundary value with the beginning of each 976 candidate line. An exact match of the entire candidate line 977 is not required; it is sufficient that the boundary appear in 978 its entirety following the CRLF. 980 There appears to be room for additional information prior to 981 the first boundary delimiter line and following the final 982 boundary delimiter line. These areas should generally be left 983 blank, and implementations must ignore anything that appears 984 before the first boundary delimiter line or after the last 985 one. 987 NOTE: These "preamble" and "epilogue" areas are generally not 988 used because of the lack of proper typing of these parts and 989 the lack of clear semantics for handling these areas at 990 gateways, particularly X.400 gateways. However, rather than 991 leaving the preamble area blank, many MIME implementations 992 have found this to be a convenient place to insert an 993 explanatory note for recipients who read the message with 994 pre-MIME software, since such notes will be ignored by MIME- 995 compliant software. 997 NOTE: Because boundary delimiters must not appear in the body 998 parts being encapsulated, a user agent must exercise care to 999 choose a unique boundary parameter value. 
The boundary 1000 parameter value in the example above could have been the 1001 result of an algorithm designed to produce boundary delimiters 1002 with a very low probability of already existing in the data to 1003 be encapsulated without having to prescan the data. Alternate 1004 algorithms might result in more "readable" boundary delimiters 1005 for a recipient with an old user agent, but would require more 1006 attention to the possibility that the boundary delimiter might 1007 appear at the beginning of some line in the encapsulated part. 1008 The simplest boundary delimiter line possible is something 1009 like "---", with a closing boundary delimiter line of "-----". 1011 As a very simple example, the following multipart message has 1012 two parts, both of them plain text, one of them explicitly 1013 typed and one of them implicitly typed: 1015 From: Nathaniel Borenstein 1016 To: Ned Freed 1017 Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST) 1018 Subject: Sample message 1019 MIME-Version: 1.0 1020 Content-type: multipart/mixed; boundary="simple boundary" 1022 This is the preamble. It is to be ignored, though it 1023 is a handy place for composition agents to include an 1024 explanatory note to non-MIME conformant readers. 1026 --simple boundary 1028 This is implicitly typed plain US-ASCII text. 1029 It does NOT end with a linebreak. 1030 --simple boundary 1031 Content-type: text/plain; charset=us-ascii 1033 This is explicitly typed plain US-ASCII text. 1034 It DOES end with a linebreak. 1036 --simple boundary-- 1038 This is the epilogue. It is also to be ignored. 1040 The use of a media type of multipart in a body part within 1041 another multipart entity is explicitly allowed. In such 1042 cases, for obvious reasons, care must be taken to ensure that 1043 each nested multipart entity uses a different boundary 1044 delimiter. See RFC MIME-CONF for an example of nested 1045 multipart entities. 
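The delimiter rules above can be sketched in Python as follows. This is a minimal illustration, not a complete MIME parser: the function names split_multipart and split_headers are hypothetical, the body is assumed to use CRLF line endings, and any line that begins with "--" plus the boundary value is treated as a delimiter, following the NOTE TO IMPLEMENTORS. No Content-Transfer-Encoding is undone, and nested multiparts are not parsed recursively.

```python
from typing import List, Tuple

CRLF = "\r\n"


def split_headers(part: str) -> Tuple[str, str]:
    """Split a body part into (header area, body area)."""
    if part.startswith(CRLF):
        # The part starts with a blank line: no header fields at all.
        return "", part[len(CRLF):]
    head, _, tail = part.partition(CRLF + CRLF)
    return head, tail


def split_multipart(body: str, boundary: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Return (preamble, [(headers, body), ...], epilogue).

    Splitting on CRLF and re-joining the lines between delimiters drops
    the CRLF that precedes each delimiter line, which matches the rule
    that this CRLF belongs to the delimiter and not to the part.
    """
    dash_boundary = "--" + boundary
    preamble: List[str] = []
    epilogue: List[str] = []
    parts: List[List[str]] = []
    current = preamble
    closed = False
    for line in body.split(CRLF):
        if not closed and line.startswith(dash_boundary):
            if line[len(dash_boundary):].startswith("--"):
                closed = True          # closing delimiter: epilogue follows
                current = epilogue
            else:
                parts.append([])       # ordinary delimiter: new body part
                current = parts[-1]
            continue
        current.append(line)
    return (
        CRLF.join(preamble),
        [split_headers(CRLF.join(lines)) for lines in parts],
        CRLF.join(epilogue),
    )
```

Applied to the body of the sample message above with the boundary "simple boundary", this sketch yields two parts: the first has an empty header area and a body without a trailing line break, and the second keeps its trailing CRLF because two CRLFs precede the closing delimiter.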
1047 The use of the multipart media type with only a single body 1048 part may be useful in certain contexts, and is explicitly 1049 permitted. 1051 NOTE: Experience has shown that a multipart media type with a 1052 single body part is useful for sending non-text media types. 1053 It has the advantage of providing the preamble as a place to 1054 include decoding instructions. In addition, a number of SMTP 1055 gateways move or remove the MIME headers, and a clever MIME 1056 decoder can take a good guess at multipart boundaries even in 1057 the absence of the Content-Type header and thereby successfully 1058 decode the message. 1060 The only mandatory global parameter for the multipart media 1061 type is the boundary parameter, which consists of 1 to 70 1062 characters from a set of characters known to be very robust 1063 through mail gateways, and NOT ending with white space. (If a 1064 boundary delimiter line appears to end with white space, the 1065 white space must be presumed to have been added by a gateway, 1066 and must be deleted.) It is formally specified by the 1067 following BNF: 1069 boundary := 0*69<bchars> bcharsnospace 1071 bchars := bcharsnospace / " " 1073 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / 1074 "+" / "_" / "," / "-" / "." / 1075 "/" / ":" / "=" / "?" 1077 Overall, the body of a multipart entity may be specified as 1078 follows: 1080 dash-boundary := "--" boundary 1081 ; boundary taken from the value of 1082 ; boundary parameter of the 1083 ; Content-Type field. 1085 multipart-body := [preamble CRLF] 1086 dash-boundary transport-padding CRLF 1087 body-part *encapsulation 1088 close-delimiter transport-padding 1089 [CRLF epilogue] 1091 transport-padding := *LWSP-char 1092 ; Composers MUST NOT generate 1093 ; non-zero length transport 1094 ; padding, but receivers MUST 1095 ; be able to handle padding 1096 ; added by message transports.
1098 encapsulation := delimiter transport-padding 1099 CRLF body-part 1101 delimiter := CRLF dash-boundary 1103 close-delimiter := delimiter "--" 1105 preamble := discard-text 1107 epilogue := discard-text 1109 discard-text := *(*text CRLF) *text 1110 ; To be ignored upon receipt. 1112 body-part := MIME-part-headers [CRLF *OCTET] 1113 ; Lines in a body-part must not start 1114 ; with the specified dash-boundary and 1115 ; the delimiter must not appear anywhere 1116 ; in the body part. Note that the 1117 ; semantics of a body-part differ from 1118 ; the semantics of a message, as 1119 ; described in the text. 1121 OCTET := <any 0-255 octet value> 1123 IMPORTANT: The free insertion of linear-white-space and RFC 1124 822 comments between the elements shown in this BNF is NOT 1125 allowed since this BNF does not specify a structured header 1126 field. 1128 NOTE: In certain transport enclaves, RFC 822 restrictions 1129 such as the one that limits bodies to printable US-ASCII 1130 characters may not be in force. (That is, transport 1131 domains may exist that resemble standard Internet mail 1132 transport as specified in RFC 821 and assumed by RFC 822, but 1133 without certain restrictions.) The relaxation of these 1134 restrictions should be construed as locally extending the 1135 definition of bodies, for example to include octets outside of 1136 the US-ASCII range, as long as these extensions are supported 1137 by the transport and adequately documented in the Content- 1138 Transfer-Encoding header field. However, in no event are 1139 headers (either message headers or body part headers) allowed 1140 to contain anything other than US-ASCII characters. 1142 NOTE: Conspicuously missing from the multipart type is a 1143 notion of structured, related body parts.
It is recommended 1144 that those wishing to provide more structured or integrated 1145 multipart messaging facilities should define subtypes of 1146 multipart that are syntactically identical but define 1147 relationships between the various parts. For example, subtypes 1148 of multipart could be defined that include a distinguished 1149 part which in turn is used to specify the relationships 1150 between the other parts, probably referring to them by their 1151 Content-ID field. Old implementations will not recognize the 1152 new subtype if this approach is used, but will treat it as 1153 multipart/mixed and will thus be able to show the user the 1154 parts that are recognized. 1156 7.1.2. Handling Nested Messages and Multiparts 1158 The "message/rfc822" subtype defined in a subsequent section 1159 of this document has no terminating condition other than 1160 running out of data. Similarly, an improperly truncated 1161 multipart entity may not have any terminating boundary marker, 1162 and can turn up operationally due to mail system malfunctions. 1164 It is essential that such entities be handled correctly when 1165 they are themselves imbedded inside of another multipart 1166 structure. MIME implementations are therefore required to 1167 recognize outer level boundary markers at ANY level of inner 1168 nesting. It is not sufficient to only check for the next 1169 expected marker or other terminating condition. 1171 7.1.3. Mixed Subtype 1173 The "mixed" subtype of multipart is intended for use when the 1174 body parts are independent and need to be bundled in a 1175 particular order. Any multipart subtypes that an 1176 implementation does not recognize must be treated as being of 1177 subtype "mixed". 1179 7.1.4. Alternative Subtype 1181 The multipart/alternative type is syntactically identical to 1182 multipart/mixed, but the semantics are different. In 1183 particular, each of the body parts is an "alternative" version 1184 of the same information. 
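As a rough sketch of how a receiving agent might select among such alternatives, the Python fragment below picks the LAST part whose media type the local system supports, on the assumption (stated in the remainder of this section) that parts are ordered from plainest to preferred. The function name choose_alternative and the idea of representing local capabilities as a set of type names are illustrative assumptions, not part of this specification.

```python
from typing import Iterable, Optional, Sequence


def choose_alternative(part_types: Sequence[str],
                       supported: Iterable[str]) -> Optional[int]:
    """Return the index of the last part whose media type is supported,
    or None if no part can be displayed.

    Media type and subtype names are not case sensitive, so both sides
    are lower-cased before comparison.
    """
    supported_lc = {name.lower() for name in supported}
    chosen = None
    for index, media_type in enumerate(part_types):
        if media_type.lower() in supported_lc:
            chosen = index  # keep going: a later match is preferred
    return chosen
```

An agent that prefers to let the user choose could instead collect every matching index and present the list; the sketch covers only the automatic case.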
1186 Systems should recognize that the content of the various parts 1187 are interchangeable. Systems should choose the "best" type 1188 based on the local environment and references, in some cases 1189 even through user interaction. As with multipart/mixed, the 1190 order of body parts is significant. In this case, the 1191 alternatives appear in an order of increasing faithfulness to 1192 the original content. In general, the best choice is the LAST 1193 part of a type supported by the recipient system's local 1194 environment. 1196 Multipart/alternative may be used, for example, to send a 1197 message in a fancy text format in such a way that it can 1198 easily be displayed anywhere: 1200 From: Nathaniel Borenstein 1201 To: Ned Freed 1202 Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST) 1203 Subject: Formatted text mail 1204 MIME-Version: 1.0 1205 Content-Type: multipart/alternative; boundary=boundary42 1207 --boundary42 1208 Content-Type: text/plain; charset=us-ascii 1210 ... plain text version of message goes here ... 1212 --boundary42 1213 Content-Type: text/enriched 1215 ... RFC 1563 text/enriched version of same message 1216 goes here ... 1218 --boundary42 1219 Content-Type: application/x-whatever 1221 ... fanciest version of same message goes here ... 1223 --boundary42-- 1225 In this example, users whose mail systems understood the 1226 "application/x-whatever" format would see only the fancy 1227 version, while other users would see only the enriched or 1228 plain text version, depending on the capabilities of their 1229 system. 1231 In general, user agents that compose multipart/alternative 1232 entities must place the body parts in increasing order of 1233 preference, that is, with the preferred format last. For 1234 fancy text, the sending user agent should put the plainest 1235 format first and the richest format last. Receiving user 1236 agents should pick and display the last format they are 1237 capable of displaying. 
In the case where one of the 1238 alternatives is itself of type "multipart" and contains 1239 unrecognized sub-parts, the user agent may choose either to 1240 show that alternative, an earlier alternative, or both. 1242 NOTE: From an implementor's perspective, it might seem more 1243 sensible to reverse this ordering, and have the plainest 1244 alternative last. However, placing the plainest alternative 1245 first is the friendliest possible option when 1246 multipart/alternative entities are viewed using a non-MIME- 1247 conformant viewer. While this approach does impose some 1248 burden on conformant MIME viewers, interoperability with older 1249 mail readers was deemed to be more important in this case. 1251 It may be the case that some user agents, if they can 1252 recognize more than one of the formats, will prefer to offer 1253 the user the choice of which format to view. This makes 1254 sense, for example, if a message includes both a nicely- 1255 formatted image version and an easily-edited text version. 1256 What is most critical, however, is that the user not 1257 automatically be shown multiple versions of the same data. 1258 Either the user should be shown the last recognized version or 1259 should be given the choice. 1261 THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each 1262 part of a multipart/alternative entity represents the same 1263 data, but the mappings between the two are not necessarily 1264 without information loss. For example, information is lost 1265 when translating ODA to PostScript or plain text. It is 1266 recommended that each part should have a different Content-ID 1267 value in the case where the information content of the two 1268 parts is not identical. 
And when the information content is
identical -- for example, where several parts of type
"message/external-body" specify alternate ways to access the
identical data -- the same Content-ID field value should be
used, to optimize any caching mechanisms that might be present
on the recipient's end. However, the Content-ID values used
by the parts should NOT be the same Content-ID value that
describes the multipart/alternative as a whole, if there is
any such Content-ID field. That is, one Content-ID value will
refer to the multipart/alternative entity, while one or more
other Content-ID values will refer to the parts inside it.

7.1.5. Digest Subtype

This document defines a "digest" subtype of the multipart
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a digest, the default Content-Type value for a
body part is changed from "text/plain" to "message/rfc822".
This is done to allow a more readable digest format that is
largely compatible (except for the quoting convention) with
RFC 934.

A digest in this format might, then, look something like this:

   From: Moderator-Address
   To: Recipient-List
   Date: Mon, 22 Mar 1994 13:34:51 +0000
   Subject: Internet Digest, volume 42
   MIME-Version: 1.0
   Content-Type: multipart/digest;
        boundary="---- next message ----"

   ------ next message ----

   From: someone-else
   Date: Fri, 26 Mar 1993 11:13:32 +0200
   Subject: my opinion

   ...body goes here ...

   ------ next message ----

   From: someone-else-again
   Date: Fri, 26 Mar 1993 10:07:13 -0500
   Subject: my different opinion

   ... another body goes here ...

   ------ next message ------

7.1.6. Parallel Subtype

This document defines a "parallel" subtype of the multipart
Content-Type.
This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a parallel entity, the order of body parts is
not significant.

A common presentation of this type is to display all of the
parts simultaneously on hardware and software that are capable
of doing so. However, composing agents should be aware that
many mail readers will lack this capability and will show the
parts serially in any event.

7.1.7. Other Multipart Subtypes

Other multipart subtypes are expected in the future. MIME
implementations must in general treat unrecognized subtypes of
multipart as being equivalent to "multipart/mixed".

7.2. Message Media Type

It is frequently desirable, in sending mail, to encapsulate
another mail message. A special media type, "message", is
defined to facilitate this. In particular, the "rfc822"
subtype of "message" is used to encapsulate RFC 822 messages.

NOTE: It has been suggested that subtypes of message might be
defined for forwarded or rejected messages. However,
forwarded and rejected messages can be handled as multipart
messages in which the first part contains any control or
descriptive information, and a second part, of type
message/rfc822, is the forwarded or rejected message.
Composing rejection and forwarding messages in this manner
will preserve the type information on the original message and
allow it to be correctly presented to the recipient, and hence
is strongly encouraged.

Subtypes of message often impose restrictions on what
encodings are allowed. These restrictions are described in
conjunction with each specific subtype.

Mail gateways, relays, and other mail handling agents are
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields.
Such alterations are explicitly
forbidden for the encapsulated headers embedded in the bodies
of messages of type "message".

7.2.1. RFC822 Subtype

A media type of "message/rfc822" indicates that the body
contains an encapsulated message, with the syntax of an RFC
822 message. However, unlike top-level RFC 822 messages, the
restriction that each message/rfc822 body must include a
"From", "Date", and at least one destination header is removed
and replaced with the requirement that at least one of "From",
"Subject", or "Date" must be present.

No encoding other than "7bit", "8bit", or "binary" is
permitted for the body of a "message/rfc822" entity. The
message header fields are always US-ASCII in any case, and
data within the body can still be encoded, in which case the
Content-Transfer-Encoding header field in the encapsulated
message will reflect this. Non-US-ASCII text in the headers
of an encapsulated message can be specified using the
mechanisms described in RFC MIME-HEADERS.

It should be noted that, despite the use of the numbers "822",
a message/rfc822 entity can include enhanced information as
defined in this document. In other words, a message/rfc822
message may be a MIME message.

7.2.2. Partial Subtype

The "partial" subtype is defined to allow large entities to be
delivered as several separate pieces of mail and automatically
reassembled by a receiving user agent. (The concept is
similar to IP fragmentation and reassembly in the basic
Internet Protocols.) This mechanism can be used when
intermediate transport agents limit the size of individual
messages that can be sent. The media type "message/partial"
thus indicates that the body contains a fragment of a larger
entity.
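
As a rough illustration of the fragmentation idea, the following
Python sketch splits a complete entity into message/partial pieces.
The function name fragment_entity, the line-based splitting policy,
and the max_lines argument are assumptions made for this example and
are not defined by this document; "id", "number", and "total" are the
parameters this section goes on to describe. The sketch assumes the
entity is already 7bit text with CRLF line endings, and it emits
"total" on every fragment even though it is only required on the
last one.

```python
def fragment_entity(entity, msg_id, max_lines):
    """Split a complete MIME entity (header + body, CRLF line endings)
    into a list of message/partial fragments.

    Hypothetical helper: the name and the line-based splitting policy
    are illustrative assumptions, not part of the specification.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")

    # Keep the line terminators so that concatenating the fragment
    # bodies reproduces the original entity exactly.
    lines = entity.splitlines(keepends=True)
    chunks = [lines[i:i + max_lines]
              for i in range(0, len(lines), max_lines)] or [[]]
    total = len(chunks)

    fragments = []
    # Fragment numbering begins with 1, not 0.
    for number, chunk in enumerate(chunks, start=1):
        header = (
            f'Content-Type: message/partial; id="{msg_id}";\r\n'
            f"\tnumber={number}; total={total}\r\n"
            "\r\n"  # blank line separating fragment header from body
        )
        fragments.append(header + "".join(chunk))
    return fragments
```

For instance, calling fragment_entity(entity, "ABC@host.com", 2) on a
five-line entity yields three fragments numbered 1 through 3. A real
composer would also add the usual top-level fields (From, To, Date,
and so on) to each fragment.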

Three parameters must be specified in the Content-Type field
of type message/partial: The first, "id", is a unique
identifier, as close to a world-unique identifier as possible,
to be used to match the fragments together. (In general, the
identifier is essentially a message-id; if placed in double
quotes, it can be ANY message-id, in accordance with the BNF
for "parameter" given earlier in this specification.) The
second, "number", an integer, is the fragment number, which
indicates where this fragment fits into the sequence of
fragments. The third, "total", another integer, is the total
number of fragments. This third subfield is required on the
final fragment, and is optional (though encouraged) on the
earlier fragments. Note also that these parameters may be
given in any order.

Thus, the second piece of a 3-piece message may have either of
the following header fields:

   Content-Type: Message/Partial; number=2; total=3;
        id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

   Content-Type: Message/Partial;
        id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
        number=2

But the third piece MUST specify the total number of
fragments:

   Content-Type: Message/Partial; number=3; total=3;
        id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Note that fragment numbering begins with 1, not 0.

When the fragments of an entity broken up in this manner are
put together, the result is always a complete MIME entity,
which may have its own Content-Type header field, and thus may
contain any other data type.

7.2.2.1. Message Fragmentation and Reassembly

The semantics of a reassembled partial message must be those
of the "inner" message, rather than of a message containing
the inner message.
This makes it possible, for example, to
send a large audio message as several partial messages, and
still have it appear to the recipient as a simple audio
message rather than as an encapsulated message containing an
audio message. That is, the encapsulation of the message is
considered to be "transparent".

When generating and reassembling the pieces of a
message/partial message, the headers of the encapsulated
message must be merged with the headers of the enclosing
entities. In this process the following rules must be
observed:

(1) All of the header fields from the initial enclosing
    message, except those that start with "Content-" and
    the specific header fields "Subject", "Message-ID",
    "Encrypted", and "MIME-Version", must be copied, in
    order, to the new message.

(2) The header fields in the enclosed message which start
    with "Content-", plus the "Subject", "Message-ID",
    "Encrypted", and "MIME-Version" fields, must be
    appended, in order, to the header fields of the new
    message. Any header fields in the enclosed message
    which do not start with "Content-" (except for the
    "Subject", "Message-ID", "Encrypted", and
    "MIME-Version" fields) will be ignored and dropped.

(3) All of the header fields from the second and any
    subsequent enclosing messages are discarded by the
    reassembly process.

7.2.2.2. Fragmentation and Reassembly Example

If an audio message is broken into two pieces, the first piece
might look something like this:

   X-Weird-Header-1: Foo
   From: Bill@host.com
   To: joe@otherhost.com
   Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
   Subject: Audio mail (part 1 of 2)
   Message-ID:
   MIME-Version: 1.0
   Content-type: message/partial; id="ABC@host.com";
        number=1; total=2

   X-Weird-Header-1: Bar
   X-Weird-Header-2: Hello
   Message-ID:
   Subject: Audio mail
   MIME-Version: 1.0
   Content-type: audio/basic
   Content-transfer-encoding: base64

   ... first half of encoded audio data goes here ...

and the second half might look something like this:

   From: Bill@host.com
   To: joe@otherhost.com
   Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
   Subject: Audio mail (part 2 of 2)
   MIME-Version: 1.0
   Message-ID:
   Content-type: message/partial;
        id="ABC@host.com"; number=2; total=2

   ... second half of encoded audio data goes here ...

Then, when the fragmented message is reassembled, the
resulting message to be displayed to the user should look
something like this:

   X-Weird-Header-1: Foo
   From: Bill@host.com
   To: joe@otherhost.com
   Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
   Subject: Audio mail
   Message-ID:
   MIME-Version: 1.0
   Content-type: audio/basic
   Content-transfer-encoding: base64

   ... first half of encoded audio data goes here ...
   ... second half of encoded audio data goes here ...

Because data of type "message" may never be encoded in base64
or quoted-printable, a problem might arise if message/partial
entities are constructed in an environment that supports
binary or 8bit transport. The problem is that the binary data
would be split into multiple message/partial messages, each of
them requiring binary transport.
If such messages were
encountered at a gateway into a 7bit transport environment,
there would be no way to properly encode them for the 7bit
world, aside from waiting for all of the fragments,
reassembling the inner message, and then encoding the
reassembled data in base64 or quoted-printable. Since it is
possible that different fragments might go through different
gateways, even this is not an acceptable solution. For this
reason, it is specified that entities of type message/partial
must always have a content-transfer-encoding of 7bit (the
default). In particular, even in environments that support
binary or 8bit transport, the use of a
content-transfer-encoding of "8bit" or "binary" is explicitly
prohibited for MIME entities of type message/partial.

Because some message transfer agents may choose to
automatically fragment large messages, and because such agents
may use very different fragmentation thresholds, it is
possible that the pieces of a partial message, upon
reassembly, may prove themselves to comprise a partial
message. This is explicitly permitted.

The inclusion of a "References" field in the headers of the
second and subsequent pieces of a fragmented message that
references the Message-Id on the previous piece may be of
benefit to mail readers that understand and track references.
However, the generation of such "References" fields is
entirely optional.

Finally, it should be noted that the "Encrypted" header field
has been made obsolete by Privacy Enhanced Mail (PEM)
[RFC1421, RFC1422, RFC1423, and RFC1424], but the rules above
are nevertheless believed to describe the correct way to treat
it if it is encountered in the context of conversion to and
from message/partial fragments.

7.2.3. External-Body Subtype

The external-body subtype indicates that the actual body data
are not included, but merely referenced. In this case, the
parameters describe a mechanism for accessing the external
data.

When a MIME entity is of type "message/external-body", it
consists of a header, two consecutive CRLFs, and the message
header for the encapsulated message. If another pair of
consecutive CRLFs appears, this of course ends the message
header for the encapsulated message. However, since the
encapsulated message's body is itself external, it does NOT
appear in the area that follows. For example, consider the
following message:

   Content-type: message/external-body;
        access-type=local-file;
        name="/u/nsb/Me.jpeg"

   Content-type: image/jpeg
   Content-ID:
   Content-Transfer-Encoding: binary

   THIS IS NOT REALLY THE BODY!

The area at the end, which might be called the "phantom body",
is ignored for most external-body messages. However, it may
be used to contain auxiliary information for some such
messages, as indeed it is when the access-type is
"mail-server". The only access-type defined in this document
that uses the phantom body is "mail-server", but other
access-types may be defined in the future in other documents
that use this area.

The encapsulated headers in ALL message/external-body entities
MUST include a Content-ID header field to give a unique
identifier by which to reference the data. This identifier
may be used for caching mechanisms, and for recognizing the
receipt of the data when the access-type is "mail-server".

Note that, as specified here, the tokens that describe
external-body data, such as file names and mail server
commands, are required to be in the US-ASCII character set.
If this proves problematic in practice, a new mechanism may be
required as a future extension to MIME, either as newly
defined access-types for message/external-body or by some
other mechanism.

As with message/partial, MIME entities of type
message/external-body MUST have a content-transfer-encoding of
7bit (the default). In particular, even in environments that
support binary or 8bit transport, the use of a
content-transfer-encoding of "8bit" or "binary" is explicitly
prohibited for entities of type message/external-body.

7.2.3.1. General External-Body Parameters

The parameters that may be used with any message/external-body
are:

(1) ACCESS-TYPE -- A word indicating the supported access
    mechanism by which the file or data may be obtained.
    This word is not case sensitive. Values include, but
    are not limited to, "FTP", "ANON-FTP", "TFTP",
    "LOCAL-FILE", and "MAIL-SERVER". Future values, except
    for experimental values beginning with "X-", must be
    registered with IANA, as described in RFC MIME-REG.
    This parameter is unconditionally mandatory and MUST be
    present on EVERY message/external-body.

(2) EXPIRATION -- The date (in the RFC 822 "date-time"
    syntax, as extended by RFC 1123 to permit 4 digits in
    the year field) after which the existence of the
    external data is not guaranteed. This parameter may be
    used with ANY access-type and is ALWAYS optional.

(3) SIZE -- The size (in octets) of the data. The intent
    of this parameter is to help the recipient decide
    whether or not to expend the necessary resources to
    retrieve the external data. Note that this describes
    the size of the data in its canonical form, that is,
    before any Content-Transfer-Encoding has been applied
    or after the data have been decoded. This parameter
    may be used with ANY access-type and is ALWAYS
    optional.

(4) PERMISSION -- A case-insensitive field that indicates
    whether or not it is expected that clients might also
    attempt to overwrite the data. By default, or if
    permission is "read", the assumption is that they are
    not, and that if the data is retrieved once, it is
    never needed again. If PERMISSION is "read-write",
    this assumption is invalid, and any local copy must be
    considered no more than a cache. "Read" and
    "Read-write" are the only defined values of permission.
    This parameter may be used with ANY access-type and is
    ALWAYS optional.

The precise semantics of the access-types defined here are
described in the sections that follow.

7.2.3.2. The 'ftp' and 'tftp' Access-Types

An access-type of FTP or TFTP indicates that the message body
is accessible as a file using the FTP [RFC-959] or TFTP
[RFC-783] protocols, respectively. For these access-types, the
following additional parameters are mandatory:

(1) NAME -- The name of the file that contains the actual
    body data.

(2) SITE -- A machine from which the file may be obtained,
    using the given protocol. This must be a fully
    qualified domain name, not a nickname.

(3) Before any data are retrieved, using FTP, the user will
    generally need to be asked to provide a login id and a
    password for the machine named by the site parameter.
    For security reasons, such an id and password are not
    specified as content-type parameters, but must be
    obtained from the user.

In addition, the following parameters are optional:

(1) DIRECTORY -- A directory from which the data named by
    NAME should be retrieved.

(2) MODE -- A case-insensitive string indicating the mode
    to be used when retrieving the information. The valid
    values for access-type "TFTP" are "NETASCII", "OCTET",
    and "MAIL", as specified by the TFTP protocol
    [RFC-783].
    The valid values for access-type "FTP" are
    "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
    decimal integer, typically 8. These correspond to the
    representation types "A", "E", "I", and "L n" as specified
    by the FTP protocol [RFC-959]. Note that "BINARY" and
    "TENEX" are not valid values for MODE and that "OCTET"
    or "IMAGE" or "LOCAL8" should be used instead. If MODE
    is not specified, the default value is "NETASCII" for
    TFTP and "ASCII" otherwise.

7.2.3.3. The 'anon-ftp' Access-Type

The "anon-ftp" access-type is identical to the "ftp" access
type, except that the user need not be asked to provide a name
and password for the specified site. Instead, the ftp
protocol will be used with login "anonymous" and a password
that corresponds to the user's mail address.

7.2.3.4. The 'local-file' Access-Type

An access-type of "local-file" indicates that the actual body
is accessible as a file on the local machine. Two additional
parameters are defined for this access type:

(1) NAME -- The name of the file that contains the actual
    body data. This parameter is mandatory for the
    "local-file" access-type.

(2) SITE -- A domain specifier for a machine or set of
    machines that are known to have access to the data
    file. This optional parameter is used to describe the
    locality of reference for the data, that is, the site
    or sites at which the file is expected to be visible.
    Asterisks may be used for wildcard matching to a part
    of a domain name, such as "*.bellcore.com", to indicate
    a set of machines on which the data should be directly
    visible, while a single asterisk may be used to
    indicate a file that is expected to be universally
    available, e.g., via a global file system.

7.2.3.5. The 'mail-server' Access-Type

The "mail-server" access-type indicates that the actual body
is available from a mail server.
Two additional parameters
are defined for this access-type:

(1) SERVER -- The addr-spec of the mail server from which
    the actual body data can be obtained. This parameter
    is mandatory for the "mail-server" access-type.

(2) SUBJECT -- The subject that is to be used in the mail
    that is sent to obtain the data. Note that keying mail
    servers on Subject lines is NOT recommended, but such
    mail servers are known to exist. This is an optional
    parameter.

Because mail servers accept a variety of syntaxes, some of
which are multiline, the full command to be sent to a mail
server is not included as a parameter in the content-type
header field. Instead, it is provided as the "phantom body"
when the media type is message/external-body and the
access-type is mail-server.

Note that MIME does not define a mail server syntax. Rather,
it allows the inclusion of arbitrary mail server commands in
the phantom body. Implementations must include the phantom
body in the body of the message they send to the mail server
address to retrieve the relevant data.

Unlike other access-types, mail-server access is asynchronous
and will happen at an unpredictable time in the future. For
this reason, it is important that there be a mechanism by
which the returned data can be matched up with the original
message/external-body entity. MIME mail servers must use the
same Content-ID field on the returned message that was used in
the original message/external-body entities, to facilitate
such matching.

7.2.3.6. External-Body Security Issues

Message/external-body entities give rise to two important
security issues:

(1) Accessing data via a message/external-body reference
    effectively results in the message recipient performing
    an operation that was specified by the message
    originator.
    It is therefore possible for the message
    originator to trick a recipient into doing something
    they would not have done otherwise. For example, an
    originator could specify an action that attempts
    retrieval of material that the recipient is not
    authorized to obtain, causing the recipient to
    unwittingly violate some security policy. For this
    reason, user agents capable of resolving external
    references must always take steps to describe to the
    recipient the action they are about to take and ask
    for explicit permission prior to performing it.

    The 'mail-server' access-type is particularly
    vulnerable, in that it causes the recipient to send a
    new message whose contents are specified by the
    original message's originator. Given the potential for
    abuse, any such request messages that are constructed
    should contain a clear indication that they were
    generated automatically (e.g. in a Comments: header
    field) in an attempt to resolve a MIME
    message/external-body reference.

(2) MIME will sometimes be used in environments that
    provide some guarantee of message integrity and
    authenticity. If present, such guarantees may apply
    only to the actual direct content of messages -- they
    may or may not apply to data accessed through MIME's
    message/external-body mechanism. In particular, it may
    be possible to subvert certain access mechanisms even
    when the messaging system itself is secure.

    It should be noted that this problem exists either with
    or without the availability of MIME mechanisms. A
    casual reference to an FTP site containing a document
    in the text of a secure message brings up similar
    issues -- the only difference is that MIME provides for
    automatic retrieval of such material, and users may
    place unwarranted trust in such automatic retrieval
    mechanisms.

7.2.3.7. Examples and Further Explanations

When the external-body mechanism is used in conjunction with
the multipart/alternative media type it extends the
functionality of multipart/alternative to include the case
where the same entity is provided in the same format but via
different access mechanisms. When this is done the originator
of the message must order the parts first in terms of
preferred formats and then by preferred access mechanisms.
The recipient's viewer should then evaluate the list both in
terms of format and access mechanisms.

With the emerging possibility of very wide-area file systems,
it becomes very hard to know in advance the set of machines
where a file will and will not be accessible directly from the
file system. Therefore it may make sense to provide both a
file name, to be tried directly, and the name of one or more
sites from which the file is known to be accessible. An
implementation can try to retrieve remote files using FTP or
any other protocol, using anonymous file retrieval or
prompting the user for the necessary name and password. If an
external body is accessible via multiple mechanisms, the
sender may include multiple entities of type
message/external-body within the body parts of an enclosing
multipart/alternative entity.

However, the external-body mechanism is not intended to be
limited to file retrieval, as shown by the mail-server
access-type. Beyond this, one can imagine, for example, using
a video server for external references to video clips.

The embedded message header fields which appear in the body of
the message/external-body data must be used to declare the
media type of the external body if it is anything other than
plain US-ASCII text, since the external body does not have a
header section to declare its type.
Similarly, any
Content-transfer-encoding other than "7bit" must also be
declared here. Thus a complete message/external-body message,
referring to a document in PostScript format, might look like
this:

   From: Whomever
   To: Someone
   Date: Whenever
   Subject: whatever
   MIME-Version: 1.0
   Message-ID:
   Content-Type: multipart/alternative; boundary=42
   Content-ID:

   --42
   Content-Type: message/external-body; name="BodyFormats.ps";
        site="thumper.bellcore.com"; mode="image";
        access-type=ANON-FTP; directory="pub";
        expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

   Content-type: application/postscript
   Content-ID:

   --42
   Content-Type: message/external-body; access-type=local-file;
        name="/u/nsb/writing/rfcs/RFC-MIME.ps";
        site="thumper.bellcore.com";
        expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

   Content-type: application/postscript
   Content-ID:

   --42
   Content-Type: message/external-body;
        access-type=mail-server;
        server="listserv@bogus.bitnet";
        expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

   Content-type: application/postscript
   Content-ID:

   get RFC-MIME.DOC

   --42--

Note that in the above examples, the default
Content-transfer-encoding of "7bit" is assumed for the external
postscript data.

Like the message/partial type, the message/external-body media
type is intended to be transparent, that is, to convey the
data type in the external body rather than to convey a message
with a body of that type. Thus the headers on the outer and
inner parts must be merged using the same rules as for
message/partial. In particular, this means that the
Content-type and Subject fields are overridden, but the From
field is preserved.
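
The merging rules given for message/partial, which also govern
message/external-body, can be sketched in Python as follows. This
is only an illustration: the function names and the representation
of a header as an ordered list of (name, value) pairs are
assumptions of this example, not part of the specification.

```python
# Fields that are always taken from the inner (enclosed) message, in
# addition to every field whose name starts with "Content-".
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def comes_from_inner(name):
    """Return True if the field must be taken from the inner message."""
    lowered = name.lower()  # header field names are case-insensitive
    return lowered.startswith("content-") or lowered in INNER_FIELDS


def merge_headers(outer, inner):
    """Merge two headers given as ordered lists of (name, value) pairs.

    `outer` is the header of the FIRST enclosing message only. Headers
    of any later enclosing messages are never passed in, which is how
    this sketch realizes rule (3).
    """
    # Rule (1): copy, in order, the outer fields that are not overridden.
    merged = [(n, v) for n, v in outer if not comes_from_inner(n)]
    # Rule (2): append, in order, the inner "Content-" fields plus the
    # four named fields; every other inner field is dropped.
    merged += [(n, v) for n, v in inner if comes_from_inner(n)]
    return merged
```

Applied to the headers of the audio example in the message/partial
discussion, this keeps "X-Weird-Header-1: Foo", "From", "To", and
"Date" from the outer message, takes "Subject: Audio mail" and the
Content- fields from the inner message, and drops the inner
"X-Weird-Header-1: Bar" and "X-Weird-Header-2: Hello" fields.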

Note that since the external bodies are not transported along
with the external body reference, they need not conform to
transport limitations that apply to the reference itself. In
particular, Internet mail transports may impose 7bit and line
length limits, but these do not automatically apply to binary
external bodies. Thus a Content-Transfer-Encoding is
not generally necessary, though it is permitted.

Note that the body of a message of type
"message/external-body" is governed by the basic syntax for an
RFC 822 message. In particular, anything before the first
consecutive pair of CRLFs is header information, while
anything after it is body information, which is ignored for
most access-types.

7.2.4. Other Message Subtypes

MIME implementations must in general treat unrecognized
subtypes of message as being equivalent to
"application/octet-stream".

8. Experimental Media Type Values

A media type value beginning with the characters "X-" is a
private value, to be used by consenting systems by mutual
agreement. Any format without a rigorous and public
definition must be named with an "X-" prefix, and publicly
specified values shall never begin with "X-". (Older versions
of the widely used Andrew system use the "X-BE2" name, so new
systems should probably choose a different name.)

In general, the use of "X-" top-level types is strongly
discouraged. Implementors should invent subtypes of the
existing types whenever possible. In many cases, a subtype of
application will be more appropriate than a new top-level
type.

9. Summary

The five discrete media types provide a standardized
mechanism for tagging entities as audio, image, or several
other kinds of data.
The composite "multipart" and "message"
media types allow mixing and hierarchical structuring of
entities of different types in a single message. A
distinguished parameter syntax allows further specification of
data format details, particularly the specification of
alternate character sets. Additional optional header fields
provide mechanisms for certain extensions deemed desirable by
many implementors. Finally, a number of useful media types are
defined for general use by consenting user agents, notably
message/partial and message/external-body.

10. Security Considerations

Security issues are discussed in the context of the
application/postscript type, the message/external-body type,
and in RFC MIME-REG. Implementors should pay special
attention to the security implications of any media types that
can cause the remote execution of any actions in the
recipient's environment. In such cases, the discussion of the
application/postscript type may serve as a model for
considering other media types with remote execution
capabilities.

11. Authors' Addresses

For more information, the authors of this document are best
contacted via Internet mail:

   Nathaniel S. Borenstein
   First Virtual Holdings
   25 Washington Avenue
   Morristown, NJ 07960
   USA

   Email: nsb@nsb.fv.com
   Phone: +1 201 540 8967
   Fax: +1 201 993 3032

   Ned Freed
   Innosoft International, Inc.
   1050 East Garvey Avenue South
   West Covina, CA 91790
   USA

   Email: ned@innosoft.com
   Phone: +1 818 919 3600
   Fax: +1 818 919 3614

MIME is a result of the work of the Internet Engineering Task
Force Working Group on Email Extensions. The chairman of that
group, Greg Vaudreuil, may be reached at:

   Gregory M. Vaudreuil
   Tigon Corporation
   17060 Dallas Parkway
   Dallas, Texas 75248

   Email: greg.vaudreuil@ons.octel.com
   Phone: +1 214 733 2722

Appendix A -- Collected Grammar

This appendix contains the complete BNF grammar for all the
syntax specified by this document.

By itself, however, this grammar is incomplete. It refers by
name to several syntax rules that are defined by RFC 822.
Rather than reproduce those definitions here, and risk
unintentional differences between the two, this document
simply refers the reader to RFC 822 for the remaining
definitions. Wherever a term is undefined, it refers to the
RFC 822 definition.

boundary := 0*69<bchars> bcharsnospace

bchars := bcharsnospace / " "

bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                 "+" / "_" / "," / "-" / "." /
                 "/" / ":" / "=" / "?"

body-part := <"message" as defined in RFC 822, with all
              header fields optional, not starting with the
              specified dash-boundary, and with the
              delimiter not occurring anywhere in the
              body part. Note that the semantics of a
              part differ from the semantics of a message,
              as described in the text.>

close-delimiter := delimiter "--"

dash-boundary := "--" boundary
                 ; boundary taken from the value of
                 ; boundary parameter of the
                 ; Content-Type field.

delimiter := CRLF dash-boundary

discard-text := *(*text CRLF)
                ; To be ignored upon receipt.

   encapsulation := delimiter transport-padding
                    CRLF body-part

   epilogue := discard-text

   multipart-body := [preamble CRLF]
                     dash-boundary transport-padding CRLF
                     body-part *encapsulation
                     close-delimiter transport-padding
                     [CRLF epilogue]

   preamble := discard-text

   transport-padding := *LWSP-char
                        ; Composers MUST NOT generate
                        ; non-zero length transport
                        ; padding, but receivers MUST
                        ; be able to handle padding
                        ; added by message transports.
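
   As an informal illustration of how a receiver might apply the
   multipart-body grammar above, the following Python sketch checks a
   boundary value and splits a multipart body into its body parts. It
   is not part of the grammar. The function names is_valid_boundary and
   split_multipart are hypothetical, and the sketch assumes a boundary
   is between 1 and 70 characters long and does not end in a space. It
   discards the preamble and epilogue, tolerates transport padding
   after a delimiter, and leaves header parsing of each body part to
   the caller.

```python
import re

# bcharsnospace: DIGIT / ALPHA plus the punctuation listed in the grammar.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# Assumed reading of the boundary rule: up to 69 bchars (space allowed)
# followed by exactly one bcharsnospace, i.e. 1 to 70 characters in total.
_BOUNDARY_RE = re.compile(rf"[{_BCHARS_NOSPACE} ]{{0,69}}[{_BCHARS_NOSPACE}]")


def is_valid_boundary(boundary: str) -> bool:
    """Return True if the string matches the boundary rule as read above."""
    return _BOUNDARY_RE.fullmatch(boundary) is not None


def split_multipart(body: bytes, boundary: str) -> list[bytes]:
    """Split a multipart body into raw body parts (headers not parsed).

    Hypothetical helper: preamble and epilogue are discarded, and
    transport padding (spaces or tabs) after a delimiter is ignored.
    """
    if not is_valid_boundary(boundary):
        raise ValueError("boundary does not match the grammar")

    # delimiter := CRLF dash-boundary. An optional trailing "--" marks the
    # close-delimiter. Padding may follow, then CRLF or the end of the data.
    delim = re.compile(
        rb"\r\n--" + re.escape(boundary.encode("ascii")) + rb"(--)?[ \t]*(?=\r\n|\Z)"
    )

    # Prepend CRLF so a body with no preamble still starts with a delimiter.
    data = b"\r\n" + body

    parts = []
    start = None
    for match in delim.finditer(data):
        if start is not None:
            parts.append(data[start:match.start()])
        if match.group(1):
            # close-delimiter reached; anything after it is epilogue.
            return parts
        # Skip the CRLF that terminates the delimiter line.
        start = match.end() + 2
    raise ValueError("close-delimiter not found")
```

   A caller would pass the raw body octets and the value of the
   boundary parameter from the Content-Type field. Each returned part
   still contains its own (possibly empty) header block, a blank line,
   and the part content, in that order.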