idnits 2.17.1

draft-ietf-822ext-mime-imb-06.txt:
  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-24) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 417: '... in accordance with this document MUST...'
     RFC 2119 keyword, line 946: '... MAY be represented as the US-AS...'
     RFC 2119 keyword, line 951: '... Octets with values of 9 and 32 MAY be...'
     RFC 2119 keyword, line 953: '...espectively, but MUST NOT be so repres...'
     RFC 2119 keyword, line 955: '... an encoded line MUST thus be followed...'
     (4 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 1453 has weird spacing: '... no inter...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 1996) is 10267 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'RFC821' on line 345 looks like a reference

  -- Missing reference section? 'ATK' on line 142 looks like a reference

  -- Missing reference section? 'X400' on line 147 looks like a reference

  -- Missing reference section? 'RFC-1741' on line 1122 looks like a reference

     Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1 Network Working Group                            Nathaniel Borenstein
2 Internet Draft                                              Ned Freed
3

5 Multipurpose Internet Mail Extensions
6 (MIME) Part One:

8 Format of Internet Message Bodies

10 March 1996

12 Status of this Memo

14 This document is an Internet-Draft. Internet-Drafts are
15 working documents of the Internet Engineering Task Force
16 (IETF), its areas, and its working groups. Note that other
17 groups may also distribute working documents as Internet-
18 Drafts.

20 Internet-Drafts are draft documents valid for a maximum of six
21 months. Internet-Drafts may be updated, replaced, or obsoleted
22 by other documents at any time. It is not appropriate to use
23 Internet-Drafts as reference material or to cite them other
24 than as a "working draft" or "work in progress".

26 To learn the current status of any Internet-Draft, please
27 check the 1id-abstracts.txt listing contained in the
28 Internet-Drafts Shadow Directories on ds.internic.net (US East
29 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast),
30 or munnari.oz.au (Pacific Rim).

32 1. Abstract

34 STD 11, RFC 822, defines a message representation protocol
35 specifying considerable detail about US-ASCII message headers,
36 and leaves the message content, or message body, as flat US-
37 ASCII text. This set of documents, collectively called the
38 Multipurpose Internet Mail Extensions, or MIME, redefines the
39 format of messages to allow for
40    (1) textual message bodies in character sets other than
41        US-ASCII,

43    (2) an extensible set of different formats for non-textual
44        message bodies,

46    (3) multi-part message bodies, and

48    (4) textual header information in character sets other than
49        US-ASCII.

51 These documents are based on earlier work documented in RFC
52 934, STD 11, and RFC 1049, but extend and revise them.
53 Because RFC 822 said so little about message bodies, these
54 documents are largely orthogonal to (rather than a revision
55 of) RFC 822.

57 This initial document specifies the various headers used to
58 describe the structure of MIME messages. The second document,
59 RFC MIME-IMT, defines the general structure of the MIME media
60 typing system and defines an initial set of media types. The
61 third document, RFC MIME-HEADERS, describes extensions to RFC
62 822 to allow non-US-ASCII text data in Internet mail header
63 fields. The fourth document, RFC MIME-REG, specifies various
64 IANA registration procedures for MIME-related facilities. The
65 fifth and final document, RFC MIME-CONF, describes MIME
66 conformance criteria as well as providing some illustrative
67 examples of MIME message formats, acknowledgements, and the
68 bibliography.

70 These documents are revisions of RFCs 1521, 1522, and 1590,
71 which themselves were revisions of RFCs 1341 and 1342. An
72 appendix in RFC MIME-CONF describes differences and changes
73 from previous versions.

75 2. Table of Contents

77 1 Abstract
78 2 Table of Contents
79 3 Introduction
80 4 Definitions, Conventions, and Generic BNF Grammar
81    4.1 CRLF
82    4.2 Character Set
83    4.3 Message
84    4.4 Entity
85    4.5 Body Part
86    4.6 Body
87    4.7 7bit Data
88    4.8 8bit Data
89    4.9 Binary Data
90    4.10 Lines
91 5 MIME Header Fields
92 6 MIME-Version Header Field
93 7 Content-Type Header Field
94    7.1 Syntax of the Content-Type Header Field
95    7.2 Content-Type Defaults
96 8 Content-Transfer-Encoding Header Field
97    8.1 Content-Transfer-Encoding Syntax
98    8.2 Content-Transfer-Encodings Semantics
99    8.3 New Content-Transfer-Encodings
100    8.4 Interpretation and Use
101    8.5 Translating Encodings
102    8.6 Canonical Encoding Model
103    8.7 Quoted-Printable Content-Transfer-Encoding
104    8.8 Base64 Content-Transfer-Encoding
105 9 Content-ID Header Field
106 10 Content-Description Header Field
107 11 Additional MIME Header Fields
108 12 Summary
109 13 Security Considerations
110 14 Authors' Addresses
111 A Collected Grammar

112 3. Introduction

114 Since its publication in 1982, RFC 822 has defined the
115 standard format of textual mail messages on the Internet. Its
116 success has been such that the RFC 822 format has been
117 adopted, wholly or partially, well beyond the confines of the
118 Internet and the Internet SMTP transport defined by RFC 821.
119 As the format has seen wider use, a number of limitations have
120 proven increasingly restrictive for the user community.

122 RFC 822 was intended to specify a format for text messages.
123 As such, non-text messages, such as multimedia messages that
124 might include audio or images, are simply not mentioned. Even
125 in the case of text, however, RFC 822 is inadequate for the
126 needs of mail users whose languages require the use of
127 character sets richer than US-ASCII. Since RFC 822 does not
128 specify mechanisms for mail containing audio, video, Asian
129 language text, or even text in most European languages,
130 additional specifications are needed.

132 One of the notable limitations of RFC 821/822 based mail
133 systems is the fact that they limit the contents of electronic
134 mail messages to relatively short lines (e.g. 1000 characters
135 or less [RFC821]) of 7bit US-ASCII. This forces users to
136 convert any non-textual data that they may wish to send into
137 seven-bit bytes representable as printable US-ASCII characters
138 before invoking a local mail UA (User Agent, a program with
139 which human users send and receive mail). Examples of such
140 encodings currently used in the Internet include pure
141 hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
142 RFC 1421, the Andrew Toolkit Representation [ATK], and many
143 others.

145 The limitations of RFC 822 mail become even more apparent as
146 gateways are designed to allow for the exchange of mail
147 messages between RFC 822 hosts and X.400 hosts. X.400 [X400]
148 specifies mechanisms for the inclusion of non-textual material
149 within electronic mail messages. The current standards for
150 the mapping of X.400 messages to RFC 822 messages specify
151 either that X.400 non-textual material must be converted to
152 (not encoded in) IA5Text format, or that they must be
153 discarded, notifying the RFC 822 user that discarding has
154 occurred. This is clearly undesirable, as information that a
155 user may wish to receive is lost. Even though a user agent
156 may not have the capability of dealing with the non-textual
157 material, the user might have some mechanism external to the
158 UA that can extract useful information from the material.
159 Moreover, it does not allow for the fact that the message may
160 eventually be gatewayed back into an X.400 message handling
161 system (i.e., the X.400 message is "tunneled" through Internet
162 mail), where the non-textual information would definitely
163 become useful again.

165 This document describes several mechanisms that combine to
166 solve most of these problems without introducing any serious
167 incompatibilities with the existing world of RFC 822 mail. In
168 particular, it describes:

170    (1) A MIME-Version header field, which uses a version
171        number to declare a message to be conformant with this
172        specification and allows mail processing agents to
173        distinguish between such messages and those generated
174        by older or non-conformant software, which are presumed
175        to lack such a field.

177    (2) A Content-Type header field, generalized from RFC 1049,
178        which can be used to specify the media type and subtype
179        of data in the body of a message and to fully specify
180        the native representation (canonical form) of such
181        data.

183    (3) A Content-Transfer-Encoding header field, which can be
184        used to specify both the encoding transformation that
185        was applied to the body and the domain of the result.
186        Encoding transformations other than the identity
187        transformation are usually applied to data in order to
188        allow it to pass through mail transport mechanisms
189        which may have data or character set limitations.

191    (4) Two additional header fields that can be used to
192        further describe the data in a body, the Content-ID and
193        Content-Description header fields.

195 All of the header fields defined in this document are subject
196 to the general syntactic rules for header fields specified in
197 RFC 822. In particular, all of these header fields except for
198 Content-Disposition can include RFC 822 comments, which have
199 no semantic content and should be ignored during MIME
200 processing.

202 Finally, to specify and promote interoperability, RFC MIME-
203 CONF provides a basic applicability statement for a subset of
204 the above mechanisms that defines a minimal level of
205 "conformance" with this document.

207 HISTORICAL NOTE: Several of the mechanisms described in this
208 set of documents may seem somewhat strange or even baroque at
209 first reading. It is important to note that compatibility
210 with existing standards AND robustness across existing
211 practice were two of the highest priorities of the working
212 group that developed this set of documents. In particular,
213 compatibility was always favored over elegance.

215 Please refer to the current edition of the "IAB Official
216 Protocol Standards" for the standardization state and status
217 of this protocol. RFC 822 and RFC 1123 also provide
218 essential background for MIME since no conforming
219 implementation of MIME can violate them. In addition, several
220 other informational RFC documents will be of interest to the
221 MIME implementor, in particular RFC 1344, RFC 1345, and RFC
222 1524.

224 4. Definitions, Conventions, and Generic BNF Grammar

226 Although the mechanisms specified in this set of documents are
227 all described in prose, most are also described formally in
228 the augmented BNF notation of RFC 822. Implementors will need
229 to be familiar with this notation in order to understand this
230 specification, and are referred to RFC 822 for a complete
231 explanation of the augmented BNF notation.

233 Some of the augmented BNF in this set of documents makes named
234 references to syntax rules defined in RFC 822. A complete
235 formal grammar, then, is obtained by combining the collected
236 grammar appendices in each document in this set with the BNF
237 of RFC 822 plus the modifications to RFC 822 defined in RFC
238 1123 (which specifically changes the syntax for `return',
239 `date' and `mailbox').

241 All numeric and octet values are given in decimal notation in
242 this set of documents. All media type values, subtype values,
243 and parameter names as defined are case-insensitive. However,
244 parameter values are case-sensitive unless otherwise specified
245 for the specific parameter.

247 FORMATTING NOTE: Notes, such as this one, provide additional
248 nonessential information which may be skipped by the reader
249 without missing anything essential. The primary purpose of
250 these non-essential notes is to convey information about the
251 rationale of this set of documents, or to place these
252 documents in the proper historical or evolutionary context.
253 Such information may in particular be skipped by those who are
254 focused entirely on building a conformant implementation, but
255 may be of use to those who wish to understand why certain
256 design choices were made.

258 4.1. CRLF

260 The term CRLF, in this set of documents, refers to the
261 sequence of octets corresponding to the two US-ASCII
262 characters CR (decimal value 13) and LF (decimal value 10)
263 which, taken together, in this order, denote a line break in
264 RFC 822 mail.

266 4.2. Character Set

268 The term "character set" is used in MIME to refer to a method
269 of converting a sequence of octets into a sequence of
270 characters. Note that unconditional and unambiguous
271 conversion in the other direction is not required, in that not
272 all characters may be representable by a given character set
273 and a character set may provide more than one sequence of
274 octets to represent a particular sequence of characters.

276 This definition is intended to allow various kinds of
277 character encodings, from simple single-table mappings such as
278 US-ASCII to complex table switching methods such as those that
279 use ISO 2022's techniques, to be used as character sets.
280 However, the definition associated with a MIME character set
281 name must fully specify the mapping to be performed. In
282 particular, use of external profiling information to determine
283 the exact mapping is not permitted.

285 NOTE: The term "character set" was originally used in MIME
286 with specifications such as US-ASCII and other 7bit and 8bit
287 schemes which have a simple mapping from single octets to
288 single characters. Multi-octet coded character sets and
289 switching techniques make the situation more complex. For
290 example, some communities use the term "character encoding"
291 for what MIME calls a "character set", while using the phrase
292 "coded character set" to denote an abstract mapping from
293 integers (not octets) to characters.

295 4.3. Message

297 The term "message", when not further qualified, means either a
298 (complete or "top-level") RFC 822 message being transferred on
299 a network, or a message encapsulated in a body of type
300 "message/rfc822" or "message/partial".

302 4.4. Entity

304 The term "entity" refers specifically to the MIME-defined
305 header fields and contents of either a message or one of the
306 parts in the body of a multipart entity. The specification of
307 such entities is the essence of MIME. Since the contents of
308 an entity are often called the "body", it makes sense to speak
309 about the body of an entity. Any sort of field may be present
310 in the header of an entity, but only those fields whose names
311 begin with "content-" actually have any MIME-related meaning.
312 Note that this does NOT imply that they have no meaning at all
313 -- an entity that is also a message has non-MIME header fields
314 whose meanings are defined by RFC 822.

316 4.5. Body Part

318 The term "body part" refers to an entity inside of a multipart
319 entity.

321 4.6. Body

323 The term "body", when not further qualified, means the body of
324 an entity, that is, the body of either a message or of a body
325 part.

327 NOTE: The previous four definitions are clearly circular.
328 This is unavoidable, since the overall structure of a MIME
329 message is indeed recursive.

331 4.7. 7bit Data

333 "7bit data" refers to data that is all represented as
334 relatively short lines with 998 octets or less between CRLF
335 line separation sequences [RFC821]. No octets with decimal
336 values greater than 127 are allowed and neither are NULs
337 (octets with decimal value 0). CR (decimal value 13) and LF
338 (decimal value 10) octets only occur as part of CRLF line
339 separation sequences.

341 4.8. 8bit Data

343 "8bit data" refers to data that is all represented as
344 relatively short lines with 998 octets or less between CRLF
345 line separation sequences [RFC821], but octets with decimal
346 values greater than 127 may be used. As with "7bit data", CR
347 and LF octets only occur as part of CRLF line separation
348 sequences and no NULs are allowed.

350 4.9. Binary Data

352 "Binary data" refers to data where any sequence of octets
353 whatsoever is allowed.

355 4.10. Lines

357 "Lines" are defined as sequences of octets separated by CRLF
358 sequences. This is consistent with both RFC 821 and RFC 822.
359 "Lines" only refers to a unit of data in a message, which may
360 or may not correspond to something that is actually displayed
361 by a user agent.

363 5. MIME Header Fields

365 MIME defines a number of new RFC 822 header fields that are
366 used to describe the content of a MIME entity. These header
367 fields occur in at least two contexts:

369    (1) As part of a regular RFC 822 message header.

371    (2) In a MIME body part header within a multipart
372        construct.

374 The formal definition of these header fields is as follows:

376    entity-headers := [ content CRLF ]
377                      [ encoding CRLF ]
378                      [ id CRLF ]
379                      [ description CRLF ]
380                      *( MIME-extension-field CRLF )

382    MIME-message-headers := entity-headers
383                            fields
384                            version CRLF
385                            ; The ordering of the header
386                            ; fields implied by this BNF
387                            ; definition should be ignored.

389    MIME-part-headers := entity-headers
390                         [ fields ]
391                         ; Any field not beginning with
392                         ; "content-" can have no defined
393                         ; meaning and may be ignored.
394                         ; The ordering of the header
395                         ; fields implied by this BNF
396                         ; definition should be ignored.

398 The syntax of the various specific MIME header fields will be
399 described in the following sections.

401 6. MIME-Version Header Field

403 Since RFC 822 was published in 1982, there has really been
404 only one format standard for Internet messages, and there has
405 been little perceived need to declare the format standard in
406 use. This document is an independent document that
407 complements RFC 822. Although the extensions in this document
408 have been defined in such a way as to be compatible with RFC
409 822, there are still circumstances in which it might be
410 desirable for a mail-processing agent to know whether a
411 message was composed with the new standard in mind.

413 Therefore, this document defines a new header field, "MIME-
414 Version", which is to be used to declare the version of the
415 Internet message body format standard in use.

417 Messages composed in accordance with this document MUST
418 include such a header field, with the following verbatim text:

420    MIME-Version: 1.0

422 The presence of this header field is an assertion that the
423 message has been composed in compliance with this document.

425 Since it is possible that a future document might extend the
426 message format standard again, a formal BNF is given for the
427 content of the MIME-Version field:

429    version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

431 Thus, future format specifiers, which might replace or extend
432 "1.0", are constrained to be two integer fields, separated by
433 a period. If a message is received with a MIME-Version value
434 other than "1.0", it cannot be assumed to conform with this
435 specification.

437 Note that the MIME-Version header field is required at the top
438 level of a message. It is not required for each body part of
439 a multipart entity. It is required for the embedded headers
440 of a body of type "message/rfc822" or "message/partial" if and
441 only if the embedded message is itself claimed to be MIME-
442 conformant.
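
ILLUSTRATIVE NOTE: The version grammar above can be checked
mechanically. The following Python sketch is not part of this
specification; the function names strip_comments and
parse_mime_version are hypothetical. It assumes that RFC 822
comments (parenthesized text, possibly nested, with backslash
quoting) carry no meaning and act like white space between
tokens, and that white space around the two integers and the
period is permitted.

```python
import re

def strip_comments(value: str) -> str:
    """Replace each RFC 822 comment "( ... )" with a single space.

    Handles nested comments and backslash quoted-pairs. Hypothetical helper.
    """
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it outside comments, drop it inside them
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # a comment separates tokens like white space
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)

# 1*DIGIT "." 1*DIGIT, with optional white space between the tokens
_VERSION = re.compile(r"^\s*([0-9]+)\s*\.\s*([0-9]+)\s*$")

def parse_mime_version(value: str):
    """Return (major, minor) for a MIME-Version field value, or None."""
    match = _VERSION.match(strip_comments(value))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def is_mime_1_0(value: str) -> bool:
    """True only when the field value denotes version 1.0."""
    return parse_mime_version(value) == (1, 0)
```

A receiver might call is_mime_1_0() on the field value (the
text after "MIME-Version:") and fall back to local conventions
when it returns False, since a value other than "1.0" cannot be
assumed to conform with this specification.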
444 It is not possible to fully specify how a mail reader that 445 conforms with MIME as defined in this document should treat a 446 message that might arrive in the future with some value of 447 MIME-Version other than "1.0". 449 It is also worth noting that version control for specific 450 media types is not accomplished using the MIME-Version 451 mechanism. In particular, some formats (such as 452 application/postscript) have version numbering conventions 453 that are internal to the media format. Where such conventions 454 exist, MIME does nothing to supersede them. Where no such 455 conventions exist, a MIME media type might use a "version" 456 parameter in the content-type field if necessary. 458 NOTE TO IMPLEMENTORS: When checking MIME-Version values any 459 RFC 822 comment strings that are present must be ignored. In 460 particular, the following four MIME-Version fields are 461 equivalent: 463 MIME-Version: 1.0 465 MIME-Version: 1.0 (produced by MetaSend Vx.x) 467 MIME-Version: (produced by MetaSend Vx.x) 1.0 469 MIME-Version: 1.(produced by MetaSend Vx.x)0 471 In the absence of a MIME-Version field, a receiving mail user 472 agent (whether conforming to MIME requirements or not) may 473 optionally choose to interpret the body of the message 474 according to local conventions. Many such conventions are 475 currently in use and it should be noted that in practice non- 476 MIME messages can contain just about anything. 478 It is impossible to be certain that a non-MIME mail message is 479 actually plain text in the US-ASCII character set since it 480 might well be a message that, using some set of nonstandard 481 local conventions that predate this document, includes text in 482 another character set or non-textual data presented in a 483 manner that cannot be automatically recognized (e.g., a 484 uuencoded compressed UNIX tar file). 486 7. 
Content-Type Header Field 488 The purpose of the Content-Type field is to describe the data 489 contained in the body fully enough that the receiving user 490 agent can pick an appropriate agent or mechanism to present 491 the data to the user, or otherwise deal with the data in an 492 appropriate manner. The value in this field is called a media 493 type. 495 HISTORICAL NOTE: The Content-Type header field was first 496 defined in RFC 1049. RFC 1049 used a simpler and less 497 powerful syntax, but one that is largely compatible with the 498 mechanism given here. 500 The Content-Type header field specifies the nature of the data 501 in the body of an entity by giving media type and subtype 502 identifiers, and by providing auxiliary information that may 503 be required for certain media types. After the media type and 504 subtype names, the remainder of the header field is simply a 505 set of parameters, specified in an attribute=value notation. 506 The ordering of parameters is not significant. 508 In general, the top-level media type is used to declare the 509 general type of data, while the subtype specifies a specific 510 format for that type of data. Thus, a media type of 511 "image/xyz" is enough to tell a user agent that the data is an 512 image, even if the user agent has no knowledge of the specific 513 image format "xyz". Such information can be used, for 514 example, to decide whether or not to show a user the raw data 515 from an unrecognized subtype -- such an action might be 516 reasonable for unrecognized subtypes of text, but not for 517 unrecognized subtypes of image or audio. For this reason, 518 registered subtypes of text, image, audio, and video should 519 not contain embedded information that is really of a different 520 type. Such compound formats should be represented using the 521 "multipart" or "application" types. 523 Parameters are modifiers of the media subtype, and as such do 524 not fundamentally affect the nature of the content. 
The set 525 of meaningful parameters depends on the media type and 526 subtype. Most parameters are associated with a single 527 specific subtype. However, a given top-level media type may 528 define parameters which are applicable to any subtype of that 529 type. Parameters may be required by their defining content 530 type or subtype or they may be optional. MIME implementations 531 must ignore any parameters whose names they do not recognize. 533 For example, the "charset" parameter is applicable to any 534 subtype of "text", while the "boundary" parameter is required 535 for any subtype of the "multipart" media type. 537 There are NO globally-meaningful parameters that apply to all 538 media types. Truly global mechanisms are best addressed, in 539 the MIME model, by the definition of additional Content-* 540 header fields. 542 An initial set of seven top-level media types is defined in 543 MIME-IMT. Five of these are discrete types whose content is 544 essentially opaque as far as MIME processing is concerned. 545 The remaining two are composite types whose contents require 546 additional handling by MIME processors. 548 This set of top-level media types is intended to be 549 substantially complete. It is expected that additions to the 550 larger set of supported types can generally be accomplished by 551 the creation of new subtypes of these initial types. In the 552 future, more top-level types may be defined only by a 553 standards-track extension to this standard. If another top- 554 level type is to be used for any reason, it must be given a 555 name starting with "X-" to indicate its non-standard status 556 and to avoid a potential conflict with a future official name. 558 7.1. 
Syntax of the Content-Type Header Field 560 In the Augmented BNF notation of RFC 822, a Content-Type 561 header field value is defined as follows: 563 content := "Content-Type" ":" type "/" subtype 564 *(";" parameter) 565 ; Matching of media type and subtype 566 ; is ALWAYS case-insensitive. 568 type := discrete-type / composite-type 570 discrete-type := "text" / "image" / "audio" / "video" / 571 "application" / extension-token 573 composite-type := "message" / "multipart" / extension-token 575 extension-token := ietf-token / x-token 577 ietf-token := 581 x-token := 584 subtype := extension-token / iana-token 586 iana-token := 590 parameter := attribute "=" value 591 attribute := token 592 ; Matching of attributes 593 ; is ALWAYS case-insensitive. 595 value := token / quoted-string 597 token := 1* 600 tspecials := "(" / ")" / "<" / ">" / "@" / 601 "," / ";" / ":" / "\" / <"> 602 "/" / "[" / "]" / "?" / "=" 603 ; Must be in quoted-string, 604 ; to use within parameter values 606 Note that the definition of "tspecials" is the same as the RFC 607 822 definition of "specials" with the addition of the three 608 characters "/", "?", and "=", and the removal of ".". 610 Note also that a subtype specification is MANDATORY -- it may 611 not be omitted from a Content-Type header field. As such, 612 there are no default subtypes. 614 The type, subtype, and parameter names are not case sensitive. 615 For example, TEXT, Text, and TeXt are all equivalent top-level 616 media types. Parameter values are normally case sensitive, 617 but sometimes are interpreted in a case-insensitive fashion, 618 depending on the intended use. (For example, multipart 619 boundaries are case-sensitive, but the "access-type" parameter 620 for message/External-body is not case-sensitive.) 622 Note that the value of a quoted string parameter does not 623 include the quotes. 
That is, the quotation marks in a 624 quoted-string are not a part of the value of the parameter, 625 but are merely used to delimit that parameter value. In 626 addition, comments are allowed in accordance with RFC 822 627 rules for structured header fields. Thus the following two 628 forms 630 Content-type: text/plain; charset=us-ascii (Plain text) 632 Content-type: text/plain; charset="us-ascii" 634 are completely equivalent. 636 Beyond this syntax, the only syntactic constraint on the 637 definition of subtype names is the desire that their uses must 638 not conflict. That is, it would be undesirable to have two 639 different communities using "Content-Type: application/foobar" 640 to mean two different things. The process of defining new 641 media subtypes, then, is not intended to be a mechanism for 642 imposing restrictions, but simply a mechanism for publicizing 643 their definition and usage. There are, therefore, two 644 acceptable mechanisms for defining new media subtypes: 646 (1) Private values (starting with "X-") may be defined 647 bilaterally between two cooperating agents without 648 outside registration or standardization. Such values 649 cannot be registered or standardized. 651 (2) New standard values should be registered with IANA as 652 described in RFC MIME-REG. 654 The second document in this set, RFC MIME-IMT, defines the 655 initial set of media types for MIME. 657 7.2. Content-Type Defaults 659 Default RFC 822 messages without a MIME Content-Type header 660 are taken by this protocol to be plain text in the US-ASCII 661 character set, which can be explicitly specified as: 663 Content-type: text/plain; charset=us-ascii 665 This default is assumed if no Content-Type header field is 666 specified. It is also recommend that this default be assumed 667 when a syntactically invalid Content-Type header field is 668 encountered. 
In the presence of a MIME-Version header field 669 and the absence of any Content-Type header field, a receiving 670 User Agent can also assume that plain US-ASCII text was the 671 sender's intent. Plain US-ASCII text may still be assumed in 672 the absence of a MIME-Version header field or the presence of a 673 syntactically invalid Content-Type header field, but the 674 sender's intent might have been otherwise. 676 8. Content-Transfer-Encoding Header Field 678 Many media types which could be usefully transported via email 679 are represented, in their "natural" format, as 8bit character 680 or binary data. Such data cannot be transmitted over some 681 transfer protocols. For example, RFC 821 (SMTP) restricts 682 mail messages to 7bit US-ASCII data with lines no longer than 683 1000 characters including any trailing CRLF line separator. 685 It is necessary, therefore, to define a standard mechanism for 686 encoding such data into a 7bit short-line format. Proper 687 labelling of unencoded material in less restrictive formats 688 for direct use over less restrictive transports is also 689 desirable. This document specifies that such encodings will 690 be indicated by a new "Content-Transfer-Encoding" header 691 field. This field has not been defined by any previous 692 standard. 694 8.1. Content-Transfer-Encoding Syntax 696 The Content-Transfer-Encoding field's value is a single token 697 specifying the type of encoding, as enumerated below. 698 Formally: 700 encoding := "Content-Transfer-Encoding" ":" mechanism 702 mechanism := "7bit" / "8bit" / "binary" / 703 "quoted-printable" / "base64" / 704 ietf-token / x-token 706 These values are not case-sensitive -- Base64 and BASE64 and 707 bAsE64 are all equivalent. An encoding type of 7BIT requires 708 that the body is already in a 7bit mail-ready representation. 709 This is the default value -- that is, "Content-Transfer- 710 Encoding: 7BIT" is assumed if the Content-Transfer-Encoding 711 header field is not present. 713 8.2.
Content-Transfer-Encodings Semantics 715 This single Content-Transfer-Encoding token actually provides 716 two pieces of information. It specifies what sort of encoding 717 transformation the body was subjected to, and it specifies 718 what the domain of the result is. 720 Three transformations are currently defined: identity, the 721 "quoted-printable" encoding, and the "base64" encoding. The 722 domains are "binary", "8bit" and "7bit". 724 The Content-Transfer-Encoding values "7bit", "8bit", and 725 "binary" all mean that the identity (i.e. NO) encoding 726 transformation has been performed. As such, they serve simply 727 as indicators of the domain of the body data, and provide 728 useful information about the sort of encoding that might be 729 needed for transmission in a given transport system. The 730 terms "7bit data", "8bit data", and "binary data" are all 731 defined in Section 4. 733 The quoted-printable and base64 encodings transform their 734 input from an arbitrary domain into material in the "7bit" 735 range, thus making it safe to carry over restricted 736 transports. The specific definitions of the transformations 737 are given below. 739 The proper Content-Transfer-Encoding label must always be 740 used. Labelling unencoded data containing 8bit characters as 741 "7bit" is not allowed, nor is labelling unencoded non-line- 742 oriented data as anything other than "binary" allowed. 744 Unlike media subtypes, a proliferation of Content-Transfer- 745 Encoding values is both undesirable and unnecessary. However, 746 establishing only a single transformation into the "7bit" 747 domain does not seem possible. There is a tradeoff between 748 the desire for a compact and efficient encoding of largely- 749 binary data and the desire for a readable encoding of data 750 that is mostly, but not entirely, 7bit. For this reason, at 751 least two encoding mechanisms are necessary: a "readable" 752 encoding (quoted-printable) and a "dense" encoding (base64).
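The two pieces of information carried by a single Content-Transfer-Encoding token (the transformation applied and the domain of the result) can be made concrete with a small lookup. The following is only a sketch in Python, not part of the specification; the names TRANSFORMATIONS and classify are hypothetical, and the sketch deliberately rejects ietf-token and x-token values because their semantics are not defined here.

```python
# Hypothetical helper: maps each of the five mechanisms defined in this
# section to the pair (transformation, domain of the result).
TRANSFORMATIONS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def classify(mechanism: str = "7bit") -> tuple[str, str]:
    """Return (transformation, domain) for a mechanism token.

    Matching is case-insensitive, and "7bit" is the default that applies
    when the header field is absent.
    """
    key = mechanism.strip().lower()
    if key not in TRANSFORMATIONS:
        raise ValueError(f"unrecognized mechanism: {mechanism!r}")
    return TRANSFORMATIONS[key]
```

For example, classify("BASE64") yields the base64 transformation with a 7bit result, while classify("8bit") yields the identity transformation with an 8bit result.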
754 Mail transport for unencoded 8bit data is defined in RFC 1652. 755 As of the initial publication of this document, there are no 756 standardized Internet mail transports for which it is 757 legitimate to include unencoded binary data in mail bodies. 758 Thus there are no circumstances in which the "binary" 759 Content-Transfer-Encoding is actually valid in Internet mail. 760 However, in the event that binary mail transport becomes a 761 reality in Internet mail, or when this document is used in 762 conjunction with any other binary-capable transport mechanism, 763 binary bodies should be labelled as such using this mechanism. 765 NOTE: The five values defined for the Content-Transfer- 766 Encoding field imply nothing about the media type other than 767 the algorithm by which it was encoded or the transport system 768 requirements if unencoded. 770 8.3. New Content-Transfer-Encodings 772 Implementors may, if necessary, define private Content- 773 Transfer-Encoding values, but must use an x-token, which is a 774 name prefixed by "X-", to indicate its non-standard status, 775 e.g., "Content-Transfer-Encoding: x-my-new-encoding". 776 Additional standardized Content-Transfer-Encoding values must 777 be specified by a standards-track RFC. Additional 778 requirements such specifications must meet are given in RFC 779 MIME-REG. As such, the entire content-transfer-encoding namespace except 780 that beginning with "X-" is explicitly reserved to the IETF 781 for future use. 783 Unlike media types and subtypes, the creation of new Content- 784 Transfer-Encoding values is STRONGLY discouraged, as it seems 785 likely to hinder interoperability with little potential 786 benefit. 788 8.4. Interpretation and Use 790 If a Content-Transfer-Encoding header field appears as part of 791 a message header, it applies to the entire body of that 792 message. If a Content-Transfer-Encoding header field appears 793 as part of an entity's headers, it applies only to the body of 794 that entity.
If an entity is of type "multipart", the 795 Content-Transfer-Encoding is not permitted to have any value 796 other than "7bit", "8bit" or "binary". Even more severe 797 restrictions apply to some subtypes of the "message" type. 799 It should be noted that most media types are defined in terms 800 of octets rather than bits, so that the mechanisms described 801 here are mechanisms for encoding arbitrary octet streams, not 802 bit streams. If a bit stream is to be encoded via one of 803 these mechanisms, it must first be converted to an 8bit byte 804 stream using the network standard bit order ("big-endian"), in 805 which the earlier bits in a stream become the higher-order 806 bits in an 8bit byte. A bit stream not ending at an 8bit 807 boundary must be padded with zeroes. RFC MIME-IMT provides a 808 mechanism for noting the addition of such padding in the case 809 of the application/octet-stream media type, which has a 810 "padding" parameter. 812 The encoding mechanisms defined here explicitly encode all 813 data in US-ASCII. Thus, for example, suppose an entity has 814 header fields such as: 816 Content-Type: text/plain; charset=ISO-8859-1 817 Content-transfer-encoding: base64 819 This must be interpreted to mean that the body is a base64 820 US-ASCII encoding of data that was originally in ISO-8859-1, 821 and will be in that character set again after decoding. 823 Certain Content-Transfer-Encoding values may only be used on 824 certain media types. In particular, it is EXPRESSLY FORBIDDEN 825 to use any encodings other than "7bit", "8bit", or "binary" 826 with any composite media type, i.e. one that recursively 827 includes other Content-Type fields. Currently the only 828 composite media types are "multipart" and "message". All 829 encodings that are desired for bodies of type multipart or 830 message must be done at the innermost level, by encoding the 831 actual body that needs to be encoded.
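The restriction on composite media types stated above can be checked mechanically. The following Python sketch is illustrative only; the function name encoding_allowed and the two constant sets are hypothetical. It covers the general rule alone and does not model the stricter limits that apply to some subtypes of "message".

```python
# Hypothetical names; the sets mirror the values listed in the text.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def encoding_allowed(media_type: str, mechanism: str) -> bool:
    """Return False when a composite type carries a non-identity encoding.

    Both the media type and the mechanism are matched
    case-insensitively, as required for these tokens.
    """
    top_level = media_type.split("/", 1)[0].strip().lower()
    if top_level in COMPOSITE_TYPES:
        return mechanism.strip().lower() in IDENTITY_ENCODINGS
    # Discrete types may use any of the defined mechanisms.
    return True
```

A composer could run such a check before emitting an entity, and move any base64 or quoted-printable encoding down to the innermost body part when the check fails.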
833 It should also be noted that, by definition, if a composite 834 entity has a transfer-encoding value such as "7bit", but one 835 of the enclosed entities has a less restrictive value such as 836 "8bit", then either the outer "7bit" labelling is in error, 837 because 8bit data are included, or the inner "8bit" labelling 838 placed an unnecessarily high demand on the transport system 839 because the actual included data were actually 7bit-safe. 841 NOTE ON ENCODING RESTRICTIONS: Though the prohibition against 842 using content-transfer-encodings on composite body data may 843 seem overly restrictive, it is necessary to prevent nested 844 encodings, in which data are passed through an encoding 845 algorithm multiple times, and must be decoded multiple times 846 in order to be properly viewed. Nested encodings add 847 considerable complexity to user agents: Aside from the 848 obvious efficiency problems with such multiple encodings, they 849 can obscure the basic structure of a message. In particular, 850 they can imply that several decoding operations are necessary 851 simply to find out what types of bodies a message contains. 853 Banning nested encodings may complicate the job of certain 854 mail gateways, but this seems less of a problem than the 855 effect of nested encodings on user agents. 857 Any entity with an unrecognized Content-Transfer-Encoding must 858 be treated as if it has a Content-Type of "application/octet- 859 stream", regardless of what the Content-Type header field 860 actually says. 862 NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT- 863 TRANSFER-ENCODING: It may seem that the Content-Transfer- 864 Encoding could be inferred from the characteristics of the 865 media that is to be encoded, or, at the very least, that 866 certain Content-Transfer-Encodings could be mandated for use 867 with specific media types. There are several reasons why this 868 is not the case. 
First, given the varying types of transports 869 used for mail, some encodings may be appropriate for some 870 combinations of media types and transports but not for others. 871 (For example, in an 8bit transport, no encoding would be 872 required for text in certain character sets, while such 873 encodings are clearly required for 7bit SMTP.) 875 Second, certain media types may require different types of 876 transfer encoding under different circumstances. For example, 877 many PostScript bodies might consist entirely of short lines 878 of 7bit data and hence require no encoding at all. Other 879 PostScript bodies (especially those using Level 2 PostScript's 880 binary encoding mechanism) may only be reasonably represented 881 using a binary transport encoding. Finally, since the 882 Content-Type field is intended to be an open-ended 883 specification mechanism, strict specification of an 884 association between media types and encodings effectively 885 couples the specification of an application protocol with a 886 specific lower-level transport. This is not desirable since 887 the developers of a media type should not have to be aware of 888 all the transports in use and what their limitations are. 890 8.5. Translating Encodings 892 The quoted-printable and base64 encodings are designed so that 893 conversion between them is possible. The only issue that 894 arises in such a conversion is the handling of hard line 895 breaks in quoted-printable encoding output. When converting 896 from quoted-printable to base64 a hard line break must be 897 converted into a CRLF sequence. Similarly, a CRLF sequence in 898 base64 data must be converted to a quoted-printable hard line 899 break, but ONLY when converting text data. 901 8.6. 
Canonical Encoding Model 903 There was some confusion, in the previous versions of this 904 RFC, regarding the model for when email data was to be 905 converted to canonical form and encoded, and in particular how 906 this process would affect the treatment of CRLFs, given that 907 the representation of newlines varies greatly from system to 908 system, and the relationship between content-transfer- 909 encodings and character sets. A canonical model for encoding 910 is presented in RFC MIME-CONF for this reason. 912 8.7. Quoted-Printable Content-Transfer-Encoding 914 The Quoted-Printable encoding is intended to represent data 915 that largely consists of octets that correspond to printable 916 characters in the US-ASCII character set. It encodes the data 917 in such a way that the resulting octets are unlikely to be 918 modified by mail transport. If the data being encoded are 919 mostly US-ASCII text, the encoded form of the data remains 920 largely recognizable by humans. A body which is entirely US- 921 ASCII may also be encoded in Quoted-Printable to ensure the 922 integrity of the data should the message pass through a 923 character-translating, and/or line-wrapping gateway. 925 In this encoding, octets are to be represented as determined 926 by the following rules: 928 (1) (General 8bit representation) Any octet, except a CR or 929 LF that is part of a CRLF line break of the canonical 930 (standard) form of the data being encoded, may be 931 represented by an "=" followed by a two digit 932 hexadecimal representation of the octet's value. The 933 digits of the hexadecimal alphabet, for this purpose, 934 are "0123456789ABCDEF". Uppercase letters must be used 935 when sending hexadecimal data, though a robust 936 implementation may choose to recognize lowercase 937 letters on receipt. Thus, for example, the decimal 938 value 12 (US-ASCII form feed) can be represented by 939 "=0C", and the decimal value 61 (US-ASCII EQUAL SIGN) 940 can be represented by "=3D". 
This rule must be 941 followed except when the following rules allow an 942 alternative encoding. 944 (2) (Literal representation) Octets with decimal values of 945 33 through 60 inclusive, and 62 through 126, inclusive, 946 MAY be represented as the US-ASCII characters which 947 correspond to those octets (EXCLAMATION POINT through 948 LESS THAN, and GREATER THAN through TILDE, 949 respectively). 951 (3) (White Space) Octets with values of 9 and 32 MAY be 952 represented as US-ASCII TAB (HT) and SPACE characters, 953 respectively, but MUST NOT be so represented at the end 954 of an encoded line. Any TAB (HT) or SPACE characters 955 on an encoded line MUST thus be followed on that line 956 by a printable character. In particular, an "=" at the 957 end of an encoded line, indicating a soft line break 958 (see rule #5) may follow one or more TAB (HT) or SPACE 959 characters. It follows that an octet with decimal 960 value 9 or 32 appearing at the end of an encoded line 961 must be represented according to Rule #1. This rule is 962 necessary because some MTAs (Message Transport Agents, 963 programs which transport messages from one user to 964 another, or perform a portion of such transfers) are 965 known to pad lines of text with SPACEs, and others are 966 known to remove "white space" characters from the end 967 of a line. Therefore, when decoding a Quoted-Printable 968 body, any trailing white space on a line must be 969 deleted, as it will necessarily have been added by 970 intermediate transport agents. 972 (4) (Line Breaks) A line break in a text body, represented 973 as a CRLF sequence in the text canonical form, must be 974 represented by a (RFC 822) line break, which is also a 975 CRLF sequence, in the Quoted-Printable encoding. Since 976 the canonical representation of media types other than 977 text do not generally include the representation of 978 line breaks as CRLF sequences, no hard line breaks 979 (i.e. 
line breaks that are intended to be meaningful 980 and to be displayed to the user) should occur in the 981 quoted-printable encoding of such types. Sequences 982 like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely 983 appear in non-text data represented in quoted- 984 printable, of course. 986 Note that many implementations may elect to encode the 987 local representation of various content types directly 988 rather than converting to canonical form first, 989 encoding, and then converting back to local 990 representation. In particular, this may apply to plain 991 text material on systems that use newline conventions 992 other than a CRLF terminator sequence. Such an 993 implementation optimization is permissible, but only 994 when the combined canonicalization-encoding step is 995 equivalent to performing the three steps separately. 997 (5) (Soft Line Breaks) The Quoted-Printable encoding 998 REQUIRES that encoded lines be no more than 76 999 characters long. If longer lines are to be encoded 1000 with the Quoted-Printable encoding, "soft" line breaks 1001 must be used. An equal sign as the last character on an 1002 encoded line indicates such a non-significant ("soft") 1003 line break in the encoded text. 1005 Thus, if the "raw" form of the line is a single unencoded line 1006 that says: 1008 Now's the time for all folk to come to the aid of their country. 1010 it can be represented, in the Quoted-Printable encoding, as: 1012 Now's the time = 1013 for all folk to come= 1014 to the aid of their country. 1016 This provides a mechanism with which long lines are encoded in 1017 such a way as to be restored by the user agent. The 76 1018 character limit does not count the trailing CRLF, but counts 1019 all other characters, including any equal signs.
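Rules #1 through #5 can be combined into a short encoder for a single line of canonical text. The Python sketch below is one possible reading of those rules, not a reference implementation; the function name qp_encode_line is hypothetical. It assumes the input is one line with its CRLF already removed, always reserves a column for the soft-break "=", and so may break slightly earlier than strictly necessary.

```python
def qp_encode_line(line: bytes, limit: int = 76) -> list[bytes]:
    """Encode one canonical text line (without its CRLF) as a list of
    Quoted-Printable lines, each at most `limit` characters long."""
    tokens = []
    for i, octet in enumerate(line):
        is_last = i == len(line) - 1
        if 33 <= octet <= 60 or 62 <= octet <= 126:
            tokens.append(bytes([octet]))      # rule 2: literal
        elif octet in (9, 32) and not is_last:
            tokens.append(bytes([octet]))      # rule 3: inner TAB/SPACE
        else:
            tokens.append(b"=%02X" % octet)    # rule 1: "=" + uppercase hex

    encoded, current = [], b""
    for token in tokens:
        # Rule 5: keep one column free for the "=" of a soft line break,
        # and never split an "=XX" triplet across two lines.
        if len(current) + len(token) > limit - 1:
            encoded.append(current + b"=")
            current = b""
        current += token
    encoded.append(current)
    return encoded
```

The caller is expected to join the returned lines with CRLF. Because the final line is never followed by a soft break, any TAB or SPACE that ends the input is emitted as "=09" or "=20", which satisfies rule #3.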
1021 Since the hyphen character ("-") may be represented as itself 1022 in the Quoted-Printable encoding, care must be taken, when 1023 encapsulating a quoted-printable encoded body inside one or 1024 more multipart entities, to ensure that the boundary delimiter 1025 does not appear anywhere in the encoded body. (A good 1026 strategy is to choose a boundary that includes a character 1027 sequence such as "=_" which can never appear in a quoted- 1028 printable body. See the definition of multipart messages in 1029 MIME-IMT.) 1030 NOTE: The quoted-printable encoding represents something of a 1031 compromise between readability and reliability in transport. 1032 Bodies encoded with the quoted-printable encoding will work 1033 reliably over most mail gateways, but may not work perfectly 1034 over a few gateways, notably those involving translation into 1035 EBCDIC. A higher level of confidence is offered by the base64 1036 Content-Transfer-Encoding. A way to get reasonably reliable 1037 transport through EBCDIC gateways is to also quote the US- 1038 ASCII characters 1040 !"#$@[\]^`{|}~ 1042 according to rule #1. 1044 Because quoted-printable data is generally assumed to be 1045 line-oriented, it is to be expected that the representation of 1046 the breaks between the lines of quoted printable data may be 1047 altered in transport, in the same manner that plain text mail 1048 has always been altered in Internet mail when passing between 1049 systems with differing newline conventions. If such 1050 alterations are likely to constitute a corruption of the data, 1051 it is probably more sensible to use the base64 encoding rather 1052 than the quoted-printable encoding. 1054 WARNING TO IMPLEMENTORS: If binary data are encoded in 1055 quoted-printable, care must be taken to encode CR and LF 1056 characters as "=0D" and "=0A", respectively. In particular, a 1057 CRLF sequence in binary data should be encoded as "=0D=0A". 
1058 Otherwise, if CRLF were represented as a hard line break, it 1059 might be incorrectly decoded on platforms with different line 1060 break conventions. 1062 For formalists, the syntax of quoted-printable data is 1063 described by the following grammar: 1065 quoted-printable := qp-line *(CRLF qp-line) 1067 qp-line := *(qp-segment transport-padding CRLF) 1068 qp-part transport-padding 1070 qp-part := qp-section 1071 ; Maximum length of 76 characters 1073 qp-segment := qp-section *(SPACE / TAB) "=" 1074 ; Maximum length of 76 characters 1076 qp-section := [*(ptext / SPACE / TAB) ptext] 1078 ptext := hex-octet / safe-char 1080 safe-char := <any octet with decimal value of 33 through 1081 60 inclusive, and 62 through 126> 1082 ; Characters not listed as "mail-safe" in 1083 ; RFC MIME-CONF are also not recommended. 1085 hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") 1086 ; Octet must be used for characters > 127, =, 1087 ; SPACEs or TABs at the ends of lines, and is 1088 ; recommended for any character not listed in 1089 ; RFC MIME-CONF as "mail-safe". 1091 transport-padding := *LWSP-char 1092 ; Composers MUST NOT generate 1093 ; non-zero length transport 1094 ; padding, but receivers MUST 1095 ; be able to handle padding 1096 ; added by message transports. 1098 IMPORTANT: The addition of LWSP between the elements shown in 1099 this BNF is NOT allowed since this BNF does not specify a 1100 structured header field. 1102 8.8. Base64 Content-Transfer-Encoding 1104 The Base64 Content-Transfer-Encoding is designed to represent 1105 arbitrary sequences of octets in a form that need not be 1106 humanly readable. The encoding and decoding algorithms are 1107 simple, and the encoded data are consistently only about 33 1108 percent larger than the unencoded data. This encoding is 1109 virtually identical to the one used in Privacy Enhanced Mail 1110 (PEM) applications, as defined in RFC 1421. 1112 A 65-character subset of US-ASCII is used, enabling 6 bits to 1113 be represented per printable character.
(The extra 65th 1114 character, "=", is used to signify a special processing 1115 function.) 1117 NOTE: This subset has the important property that it is 1118 represented identically in all versions of ISO 646, including 1119 US-ASCII, and all characters in the subset are also 1120 represented identically in all versions of EBCDIC. Other 1121 popular encodings, such as the encoding used by the uuencode 1122 utility, Macintosh binhex 4.0 [RFC-1741], and the base85 1123 encoding specified as part of Level 2 PostScript, do not share 1124 these properties, and thus do not fulfill the portability 1125 requirements a binary transport encoding for mail must meet. 1127 The encoding process represents 24-bit groups of input bits as 1128 output strings of 4 encoded characters. Proceeding from left 1129 to right, a 24-bit input group is formed by concatenating 3 1130 8bit input groups. These 24 bits are then treated as 4 1131 concatenated 6-bit groups, each of which is translated into a 1132 single digit in the base64 alphabet. When encoding a bit 1133 stream via the base64 encoding, the bit stream must be 1134 presumed to be ordered with the most-significant-bit first. 1135 That is, the first bit in the stream will be the high-order 1136 bit in the first 8bit byte, and the eighth bit will be the 1137 low-order bit in the first 8bit byte, and so on. 1139 Each 6-bit group is used as an index into an array of 64 1140 printable characters. The character referenced by the index 1141 is placed in the output string. These characters, identified 1142 in Table 1, below, are selected so as to be universally 1143 representable, and the set excludes characters with particular 1144 significance to SMTP (e.g., ".", CR, LF) and to the multipart 1145 boundary delimiters defined in MIME-IMT (e.g., "-"). 
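The regrouping of three 8bit octets into four 6-bit indices can be sketched as follows. This Python fragment is illustrative only; the names ALPHABET and encode_group are hypothetical. It handles a complete 24-bit group and leaves final-group padding aside. The ALPHABET string is assumed to follow the order of Table 1: uppercase letters, lowercase letters, digits, then "+" and "/".

```python
import string

# Values 0..63 in the order given by Table 1.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/")


def encode_group(three_octets: bytes) -> str:
    """Encode exactly three octets (24 bits) as four base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("a complete input group is exactly 3 octets")
    # Concatenate the octets most-significant-bit first.
    group = int.from_bytes(three_octets, "big")
    # Split the 24 bits into four 6-bit indices, left to right.
    indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)
```

For instance, the octets of "Man" (77, 97, 110) form the bit string 010011 010110 000101 101110, which gives the indices 19, 22, 5 and 46 and therefore the characters "TWFu".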
1147 Table 1: The Base64 Alphabet 1149 Value Encoding Value Encoding Value Encoding Value Encoding 1150 0 A 17 R 34 i 51 z 1151 1 B 18 S 35 j 52 0 1152 2 C 19 T 36 k 53 1 1153 3 D 20 U 37 l 54 2 1154 4 E 21 V 38 m 55 3 1155 5 F 22 W 39 n 56 4 1156 6 G 23 X 40 o 57 5 1157 7 H 24 Y 41 p 58 6 1158 8 I 25 Z 42 q 59 7 1159 9 J 26 a 43 r 60 8 1160 10 K 27 b 44 s 61 9 1161 11 L 28 c 45 t 62 + 1162 12 M 29 d 46 u 63 / 1163 13 N 30 e 47 v 1164 14 O 31 f 48 w (pad) = 1165 15 P 32 g 49 x 1166 16 Q 33 h 50 y 1168 The encoded output stream must be represented in lines of no 1169 more than 76 characters each. All line breaks or other 1170 characters not found in Table 1 must be ignored by decoding 1171 software. In base64 data, characters other than those in 1172 Table 1, line breaks, and other white space probably indicate 1173 a transmission error, about which a warning message or even a 1174 message rejection might be appropriate under some 1175 circumstances. 1177 Special processing is performed if fewer than 24 bits are 1178 available at the end of the data being encoded. A full 1179 encoding quantum is always completed at the end of a body. 1180 When fewer than 24 input bits are available in an input group, 1181 zero bits are added (on the right) to form an integral number 1182 of 6-bit groups. Padding at the end of the data is performed 1183 using the "=" character. 
Since all base64 input is an 1184 integral number of octets, only the following cases can arise: 1185 (1) the final quantum of encoding input is an integral 1186 multiple of 24 bits; here, the final unit of encoded output 1187 will be an integral multiple of 4 characters with no "=" 1188 padding, (2) the final quantum of encoding input is exactly 8 1189 bits; here, the final unit of encoded output will be two 1190 characters followed by two "=" padding characters, or (3) the 1191 final quantum of encoding input is exactly 16 bits; here, the 1192 final unit of encoded output will be three characters followed 1193 by one "=" padding character. 1195 Because it is used only for padding at the end of the data, 1196 the occurrence of any "=" characters may be taken as evidence 1197 that the end of the data has been reached (without truncation 1198 in transit). No such assurance is possible, however, when the 1199 number of octets transmitted was a multiple of three and no 1200 "=" characters are present. 1202 Any characters outside of the base64 alphabet are to be 1203 ignored in base64-encoded data. 1205 Care must be taken to use the proper octets for line breaks if 1206 base64 encoding is applied directly to text material that has 1207 not been converted to canonical form. In particular, text 1208 line breaks must be converted into CRLF sequences prior to 1209 base64 encoding. The important thing to note is that this may 1210 be done directly by the encoder rather than in a prior 1211 canonicalization step in some implementations. 1213 NOTE: There is no need to worry about quoting potential 1214 boundary delimiters within base64-encoded bodies within 1215 multipart entities because no hyphen characters are used in 1216 the base64 encoding. 1218 9. Content-ID Header Field 1220 In constructing a high-level user agent, it may be desirable 1221 to allow one body to make reference to another. 
Accordingly, 1222 bodies may be labelled using the "Content-ID" header field, 1223 which is syntactically identical to the "Message-ID" header 1224 field: 1226 id := "Content-ID" ":" msg-id 1228 Like Message-ID values, Content-ID values must be 1229 generated to be world-unique. 1231 The Content-ID value may be used for uniquely identifying MIME 1232 entities in several contexts, particularly for caching data 1233 referenced by the message/external-body mechanism. Although 1234 the Content-ID header is generally optional, its use is 1235 MANDATORY in implementations which generate data of the 1236 optional MIME media type "message/external-body". That is, 1237 each message/external-body entity must have a Content-ID field 1238 to permit caching of such data. 1240 It is also worth noting that the Content-ID value has special 1241 semantics in the case of the multipart/alternative media type. 1242 This is explained in the section of RFC MIME-IMT dealing with 1243 multipart/alternative. 1245 10. Content-Description Header Field 1247 The ability to associate some descriptive information with a 1248 given body is often desirable. For example, it may be useful 1249 to mark an "image" body as "a picture of the Space Shuttle 1250 Endeavour." Such text may be placed in the Content-Description 1251 header field. This header field is always optional. 1253 description := "Content-Description" ":" *text 1255 The description is presumed to be given in the US-ASCII 1256 character set, although the mechanism specified in RFC MIME- 1257 HEADERS may be used for non-US-ASCII Content-Description 1258 values. 1260 11. Additional MIME Header Fields 1262 Future documents may elect to define additional MIME header 1263 fields for various purposes.
Any new header field that 1264 further describes the content of a message should begin with 1265 the string "Content-" so that such fields, when they appear in a 1266 message header, can be distinguished from ordinary RFC 822 1267 message header fields. 1269 MIME-extension-field := <Any RFC 822 header field which 1270 begins with the string 1271 "Content-"> 1273 12. Summary 1275 Using the MIME-Version, Content-Type, and Content-Transfer- 1276 Encoding header fields, it is possible to include, in a 1277 standardized way, arbitrary types of data with RFC 822 1278 conformant mail messages. No restrictions imposed by either 1279 RFC 821 or RFC 822 are violated, and care has been taken to 1280 avoid problems caused by additional restrictions imposed by 1281 the characteristics of some Internet mail transport mechanisms 1282 (see RFC MIME-CONF). 1284 The next document in this set, RFC MIME-IMT, specifies the 1285 initial set of media types that can be labelled and 1286 transported using these headers. 1288 13. Security Considerations 1290 Security issues are discussed in the second document in this 1291 set, RFC MIME-IMT. 1293 14. Authors' Addresses 1295 For more information, the authors of this document are best 1296 contacted via Internet mail: 1298 Nathaniel S. Borenstein 1299 First Virtual Holdings 1300 25 Washington Avenue 1301 Morristown, NJ 07960 1302 USA 1304 Email: nsb@nsb.fv.com 1305 Phone: +1 201 540 8967 1306 Fax: +1 201 993 3032 1308 Ned Freed 1309 Innosoft International, Inc. 1310 1050 East Garvey Avenue South 1311 West Covina, CA 91790 1312 USA 1314 Email: ned@innosoft.com 1315 Phone: +1 818 919 3600 1316 Fax: +1 818 919 3614 1318 MIME is a result of the work of the Internet Engineering Task 1319 Force Working Group on Email Extensions. The chairman of that 1320 group, Greg Vaudreuil, may be reached at: 1322 Gregory M.
Vaudreuil 1323 Octel Network Services 1324 17080 Dallas Parkway 1325 Dallas, TX 75248-1905 1326 USA 1328 Email: Greg.Vaudreuil@Octel.Com 1329 Appendix A -- Collected Grammar 1331 This appendix contains the complete BNF grammar for all the 1332 syntax specified by this document. 1334 By itself, however, this grammar is incomplete. It refers by 1335 name to several syntax rules that are defined by RFC 822. 1336 Rather than reproduce those definitions here, and risk 1337 unintentional differences between the two, this document 1338 simply refers the reader to RFC 822 for the remaining 1339 definitions. Wherever a term is undefined, it refers to the 1340 RFC 822 definition. 1342 attribute := token 1343 ; Matching of attributes 1344 ; is ALWAYS case-insensitive. 1346 composite-type := "message" / "multipart" / extension-token 1348 content := "Content-Type" ":" type "/" subtype 1349 *(";" parameter) 1350 ; Matching of media type and subtype 1351 ; is ALWAYS case-insensitive. 1353 description := "Content-Description" ":" *text 1355 discrete-type := "text" / "image" / "audio" / "video" / 1356 "application" / extension-token 1358 encoding := "Content-Transfer-Encoding" ":" mechanism 1360 entity-headers := [ content CRLF ] 1361 [ encoding CRLF ] 1362 [ id CRLF ] 1363 [ description CRLF ] 1364 *( MIME-extension-field CRLF ) 1366 extension-token := ietf-token / x-token 1367 hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") 1368 ; Octet must be used for characters > 127, =, 1369 ; SPACEs or TABs at the ends of lines, and is 1370 ; recommended for any character not listed in 1371 ; RFC MIME-CONF as "mail-safe". 
1373 iana-token := <A publicly-defined extension token. Tokens 1374 of this form must be registered with IANA 1375 as described in RFC MIME-REG.> 1377 ietf-token := <An extension token defined by a 1378 standards-track RFC and registered 1379 with IANA.> 1381 id := "Content-ID" ":" msg-id 1383 mechanism := "7bit" / "8bit" / "binary" / 1384 "quoted-printable" / "base64" / 1385 ietf-token / x-token 1387 MIME-extension-field := <Any RFC 822 header field which 1388 begins with the string 1389 "Content-"> 1391 MIME-message-headers := entity-headers 1392 fields 1393 version CRLF 1394 ; The ordering of the header 1395 ; fields implied by this BNF 1396 ; definition should be ignored. 1398 MIME-part-headers := entity-headers 1399 [fields] 1400 ; Any field not beginning with 1401 ; "content-" can have no defined 1402 ; meaning and may be ignored. 1403 ; The ordering of the header 1404 ; fields implied by this BNF 1405 ; definition should be ignored. 1407 parameter := attribute "=" value 1409 ptext := hex-octet / safe-char 1410 qp-line := *(qp-segment transport-padding CRLF) 1411 qp-part transport-padding 1413 qp-part := qp-section 1414 ; Maximum length of 76 characters 1416 qp-section := [*(ptext / SPACE / TAB) ptext] 1418 qp-segment := qp-section *(SPACE / TAB) "=" 1419 ; Maximum length of 76 characters 1421 quoted-printable := qp-line *(CRLF qp-line) 1423 safe-char := <any octet with decimal value of 33 through 1424 60 inclusive, and 62 through 126> 1425 ; Characters not listed as "mail-safe" in 1426 ; RFC MIME-CONF are also not recommended. 1428 subtype := extension-token / iana-token 1430 token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, 1431 or tspecials> 1433 transport-padding := *LWSP-char 1434 ; Composers MUST NOT generate 1435 ; non-zero length transport 1436 ; padding, but receivers MUST 1437 ; be able to handle padding 1438 ; added by message transports. 1440 tspecials := "(" / ")" / "<" / ">" / "@" / 1441 "," / ";" / ":" / "\" / <"> / 1442 "/" / "[" / "]" / "?" / "=" 1443 ; Must be in quoted-string, 1444 ; to use within parameter values 1446 type := discrete-type / composite-type 1448 value := token / quoted-string 1450 version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT 1452 x-token := <The two characters "X-" or "x-" followed, with 1453 no intervening white space, by any token>