idnits 2.17.1 draft-ietf-mhtml-spec-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 3 instances of too long lines in the document, the longest one being 1 character in excess of 72. ** The abstract seems to contain references ([RFC1866]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 142: '...the MUST requirements for the protocol...' RFC 2119 keyword, line 143: '...atisfies all the MUST and all the SHOU...' RFC 2119 keyword, line 145: '...all the MUST requirements but not all ...' RFC 2119 keyword, line 277: '... code. Its value MUST be an absolute U...' RFC 2119 keyword, line 372: '...tching relative URIs MUST be followed....' (23 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 791 has weird spacing: '...ization of th...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'SHOULD not' in this paragraph: When a sending MUA sends objects which were retrieved from the WWW, it SHOULD maintain their WWW URIs. It SHOULD not transform these URIs into some other URI form prior to transmitting them. This will allow the receiving MUA to both verify MICs included with the email message, as well as verify the documents against their WWW counterpoints. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: Some Security Considerations include the potential to mail someone an object, and claim that it is represented by a particular URI (by giving it a Content-Location: header). There can be no assurance that a WWW request for that same URI would normally result in that same object. It might be unsuitable to cache the data in such a way that the cached data can be used for retrieval of this URI from other messages or message parts than those included in the same message as the Content-Location header. Because of this problem, receiving User Agents SHOULD not cache this data in the same way that data that was retrieved through an HTTP or FTP request might be cached. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 1996) is 10115 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 1866' is mentioned on line 29, but not defined ** Obsolete undefined reference: RFC 1866 (Obsoleted by RFC 2854) == Missing Reference: 'HTML' is mentioned on line 189, but not defined == Unused Reference: 'CONDISP' is defined on line 780, but no explicit reference was found in the text == Unused Reference: 'MD5' is defined on line 797, but no explicit reference was found in the text == Unused Reference: 'MIME-IMB' is defined on line 812, but no explicit reference was found in the text == Unused Reference: 'NEWS' is defined on line 816, but no explicit reference was found in the text == Unused Reference: 'SMTP' is defined on line 837, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1806 (ref. 'CONDISP') (Obsoleted by RFC 2183) ** Obsolete normative reference: RFC 1866 (ref. 'HTML2') (Obsoleted by RFC 2854) ** Downref: Normative reference to an Historic draft: draft-ietf-html-i18n (ref. 'HTML-I18N') ** Downref: Normative reference to an Informational RFC: RFC 1945 (ref. 'HTTP') ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. 'MD5') == Outdated reference: A later version (-03) exists of draft-ietf-mhtml-cid-00 ** Obsolete normative reference: RFC 1521 (ref. 'MIME1') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) == Outdated reference: A later version (-04) exists of draft-ietf-822ext-mime-imt-02 -- Unexpected draft version: The latest known version of draft-ietf-822ext-mime-imb is -06, but you're referring to -07. ** Obsolete normative reference: RFC 1036 (ref. 'NEWS') (Obsoleted by RFC 5536, RFC 5537) -- Possible downref: Non-RFC (?) normative reference: ref. 'PDF' -- Possible downref: Non-RFC (?) normative reference: ref. 
'REL' ** Obsolete normative reference: RFC 1808 (ref. 'RELURL') (Obsoleted by RFC 3986) ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) -- Possible downref: Non-RFC (?) normative reference: ref. 'SGML' ** Obsolete normative reference: RFC 821 (ref. 'SMTP') (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 1738 (ref. 'URL') (Obsoleted by RFC 4248, RFC 4266) Summary: 23 errors (**), 0 flaws (~~), 13 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Jacob Palme 2 Internet Draft Stockholm University/KTH 3 draft-ietf-mhtml-spec-03.txt Alexander Hopmann 4 Category-to-be: Proposed standard ResNova Software, Inc. 5 Expires: February 1997 August 1996 7 MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML) 9 Status of this Document 11 This document is an Internet-Draft. Internet-Drafts are working 12 documents of the Internet Engineering Task Force (IETF), its areas, and 13 its working groups. Note that other groups may also distribute working 14 documents as Internet-Drafts. 16 Internet-Drafts are draft documents valid for a maximum of six months 17 and may be updated, replaced, or obsoleted by other documents at any 18 time. It is inappropriate to use Internet-Drafts as reference material 19 or to cite them other than as ``work in progress.'' 21 To learn the current status of any Internet-Draft, please check the 22 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow 23 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 24 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 25 ftp.isi.edu (US West Coast). 
27 Abstract 29 Although HTML [RFC 1866] was designed within the context of MIME, more 30 than the specification of HTML as defined in RFC 1866 is needed for two 31 electronic mail user agents to be able to interoperate using HTML as a 32 document format. These issues include the naming of objects that are 33 normally referred to by URIs, and the means of aggregating objects that 34 go together. This document describes a set of guidelines that will allow 35 conforming mail user agents to be able to send, deliver and display 36 these objects, such as HTML objects, that can contain links represented 37 by URIs. In order to be able to handle inter-linked objects, the 38 document proposes to use the MIME type multipart/related and specifies 39 the MIME content-headers "Content-Location" and "Content-Base". 41 Table of Contents 43 1. Introduction 44 2. Terminology 45 2.1 Conformance requirement terminology 46 2.2 Other terminology 47 4. The Content-Location and Content-Base MIME Content Headers 48 4.1 MIME content headers 49 4.2 The Content-Base header 50 4.3 The Content-Location Header 51 4.4 Encoding of URIs in e-mail headers 52 5. Base URIs for resolution of relative URIs 53 6. Sending documents without linked objects 54 7. Use of the Content-Type: Multipart/related 55 8. Format of Links to Other Body Parts 56 8.1 General principle 57 8.2 Use of the Content-Location header 58 8.3 Use of the Content-ID header and CID URLs 59 9 Examples 60 9.1 Example of a HTML body without included linked objects 61 9.3 Example with relative URIs to an embedded GIF picture 62 9.4 Example using CID URL and Content-ID header to an embedded GIF 63 picture 64 10. Content-Disposition header 65 11. Character encoding issues and end-of-line issues 66 12. Security Considerations 67 13. Acknowledgments 68 14. References 69 15. Author's Address 71 Mailing List Information 73 Further discussion on this document should be done through the mailing 74 list MHTML@SEGATE.SUNET.SE. 
76 To subscribe to this list, send a message to 77 LISTSERV@SEGATE.SUNET.SE 78 which contains the text 79 SUB MHTML 81 Archives of this list are available by anonymous ftp from 82 FTP://SEGATE.SUNET.SE/lists/mHTML/ 83 The archives are also available by e-mail. Send a message to 84 LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of 85 the archive files, and then a new message "GET " to retrieve 86 the archive files. 88 Comments on less important details may also be sent to the editor, Jacob 89 Palme . 91 More information may also be available at URL: 92 HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML 93 1. Introduction 95 There are a number of document formats, HTML [HTML2], PDF [PDF] and VRML 96 for example, which provide links using URIs for their resolution. There 97 is an obvious need to be able to send documents in these formats in e- 98 mail [RFC821=SMTP, RFC822]. This document gives additional 99 specifications on how to send such documents in MIME [RFC 1521=MIME1] e- 100 mail messages. This version of this standard was based on full 101 consideration only of the needs for objects with links in the Text/HTML 102 media type (as defined in RFC 1866 [HTML2]), but the standard may still 103 be applicable also to other formats for sets of interlinked objects, 104 linked by URIs. There is no conformance requirement that implementations 105 claiming conformance to this standard are able to handle URI-s in other 106 document formats than HTML. 108 URIs in documents in HTML and other similar formats reference other 109 objects and resources, either embedded or directly accessible through 110 hypertext links. When mailing such a document, it is often desirable to 111 also mail all of the additional resources that are referenced in it; 112 those elements are necessary for the complete interpretation of the 113 primary object. 
115 An alternative way for sending an HTML document or other object 116 containing URIs in e-mail is to only send the URL, and let the recipient 117 look up the document using HTTP. That method is described in [URLBODY] 118 and is not described in this document. 120 2. Terminology 122 2.1 Conformance requirement terminology 124 This specification uses the same words as RFC 1123 [HOSTS] for defining 125 the significance of each particular requirement. These words are: 127 MUST This word or the adjective "required" means that the item is 128 an absolute requirement of the specification. 130 SHOULD This word or the adjective "recommended" means that there may 131 exist valid reasons in particular circumstances to ignore this 132 item, but the full implications should be understood and the 133 case carefully weighed before choosing a different course. 135 MAY This word or the adjective "optional" means that this item is 136 truly optional. One vendor may choose to include the item 137 because a particular marketplace requires it or because it 138 enhances the product, for example; another vendor may omit the 139 same item. 141 An implementation is not compliant if it fails to satisfy one or more of 142 the MUST requirements for the protocols it implements. An implementation 143 that satisfies all the MUST and all the SHOULD requirements for its 144 protocols is said to be "unconditionally compliant"; one that satisfies 145 all the MUST requirements but not all the SHOULD requirements for its 146 protocols is said to be "conditionally compliant." 148 2.2 Other terminology 150 Most of the terms used in this document are defined in other RFCs. 152 Absolute URI, See RFC 1808 [RELURL]. 153 AbsoluteURI 155 CID See [MIDCID]. 157 Content-Base See section 4.2 below. 159 Content-ID See [MIDCID]. 161 Content-Location MIME message or content part header with the URI of 162 the MIME message or content part body, defined in 163 section 4.3 below. 
165 Content-Transfer- Conversion of a text into 7-bit octets as specified 166 Encoding in [MIME1]. 168 CR See [RFC822]. 170 CRLF See [RFC822]. 172 Displayed text The text shown to the user reading a document with 173 a web browser. This may be different from the HTML 174 markup, see the definition of HTML markup below. 176 Header Field in a message or content heading specifying 177 the value of one attribute. 179 Heading Part of a message or content before the first 180 CRLFCRLF, containing formatted fields with 181 attributes of the message or content. 183 HTML See RFC 1866 [HTML2]. 185 HTML Aggregate HTML objects together with some or all objects, to 186 objects which the HTML object contains hyperlinks. 188 HTML markup A file containing HTML encodings as specified in 189 [HTML] which may be different from the displayed 190 text which a person using a web browser sees. For 191 example, the HTML markup may contain "&lt;" where 192 the displayed text contains the character "<". 194 LF See [RFC822]. 196 MIC Message Integrity Codes, codes used to verify that a 197 message has not been illegally modified. 199 MIME See RFC 1521 [MIME1], [MIME2]. 201 MUA Messaging User Agent. 203 PDF Portable Document Format, see [PDF]. 205 Relative URI, See RFC 1866 [HTML2] and RFC 1808 [RELURL]. 206 RelativeURI 208 URI, absolute and See RFC 1866 [HTML2]. 209 relative 211 URL See RFC 1738 [URL]. 213 URL, relative See [RELURL]. 215 VRML Virtual Reality Markup Language. 217 3. Overview 219 An aggregate document is a MIME-encoded message that contains a root 220 document as well as other data that is required in order to represent 221 that document (inline pictures, style sheets, applets, etc.). Aggregate 222 documents can also include additional elements that are linked to the 223 first object. It is important to keep in mind the differing needs of 224 several audiences. Mail sending agents might send aggregate documents as 225 an encoding of normal day-to-day electronic mail. 
Mail sending agents 226 might also send aggregate documents when a user wishes to mail a 227 particular document from the web to someone else. Finally, mail sending 228 agents might send aggregate documents as automatic responders, providing 229 access to WWW resources for non-IP connected clients. 231 Mail receiving agents also have several differing needs. Some mail 232 receiving agents might be able to receive an aggregate document and 233 display it just as any other text content type would be displayed. 234 Others might have to pass this aggregate document to a browsing program, 235 and provisions need to be made to make this possible. 237 Finally, several other constraints on the problem arise. It is important 238 that it be possible for a document to be signed and for it to be able to 239 be transmitted to a client and displayed with a minimum risk of breaking 240 the message integrity (MIC) check that is part of the signature. 242 4. The Content-Location and Content-Base MIME Content Headers 244 4.1 MIME content headers 246 In order to resolve URI references to other body parts, two MIME content 247 headers are defined, Content-Location and Content-Base. Both these 248 headers can occur in any message or content heading, and will then be 249 valid within this heading and for its content. 251 In practice, at present only those URIs which are URLs are used, but it 252 is anticipated that other forms of URIs will in the future be used. 254 The syntax for these headers is, using the syntax definition tools from 255 [RFC822]: 257 content-location ::= "Content-Location:" ( absoluteURI | relativeURI 258 ) 260 content-base ::= "Content-Base:" absoluteURI 262 where URI is at present (June 1996) restricted to the syntax for URLs as 263 defined in RFC 1738 [URL]. 265 These two headers are valid only for exactly the content heading or 266 message heading where they occur and its text. 
They are thus not valid 267 for the parts inside multipart headings, and are thus meaningless in 268 multipart headings. 270 These two headers may occur both inside and outside of a 271 multipart/related part. 273 4.2 The Content-Base header 275 The Content-Base gives a base for relative URIs occurring in other 276 heading fields and in content which does not have any BASE element in its 277 HTML code. Its value MUST be an absolute URI. 279 Example showing which Content-Base is valid where: 281 Content-Type: Multipart/related; boundary="boundary-example-1"; 282 type=Text/HTML; start=foo2*foo3@bar2.net 283 ; A Content-Base header cannot be placed here, since this is a 284 ; multipart MIME object. 286 --boundary-example-1 288 Part 1: 289 Content-Type: Text/HTML; charset=US-ASCII 290 Content-ID: foo2*foo3@bar2.net 291 Content-Location: "http://www.ietf.cnri.reston.va.us/images/foo1.bar1"; 292 ; This Content-Location must contain an absolute URI, since no base 293 ; is valid here. 295 --boundary-example-1 297 Part 2: 298 Content-Type: Text/HTML; charset=US-ASCII 299 Content-ID: foo4*foo5@bar2.net 300 Content-Location: "foo1.bar1" ; The Content-Base below applies to 301 ; this relative URI 302 Content-Base: "http://www.ietf.cnri.reston.va.us/images/" 304 --boundary-example-1-- 306 4.3 The Content-Location Header 308 The Content-Location header specifies the URI that corresponds to the 309 content of the body part in whose heading the header is placed. Its 310 value can be an absolute or relative URI. Any URI or URL scheme may be 311 used, but use of non-standardized URI or URL schemes might entail some 312 risk that recipients cannot handle them correctly. 314 The Content-Location header can be used to indicate that the data sent 315 under this heading is also retrievable, in identical format, through 316 normal use of this URI. If used for this purpose, it must contain an 317 absolute URI or be resolvable, through a Content-Base header, into an 318 absolute URI. 
In this case, the information sent in the message can be 319 seen as a cached version of the original data. 321 The header can also be used for data which is not available to some or 322 all recipients of the message, for example if the header refers to an 323 object which is only retrievable using this URI in a restricted domain, 324 such as within a company-internal web space. The header can even contain 325 a fictitious URI and need in that case not be globally unique. 327 Example: 329 Content-Type: Multipart/related; boundary="boundary-example-1"; 330 type=Text/HTML 332 --boundary-example-1 334 Part 1: 335 Content-Type: Text/HTML; charset=US-ASCII 337 ... ... ... ... 339 --boundary-example-1 341 Part 2: 342 Content-Type: Text/HTML; charset=US-ASCII 343 Content-Location: "fiction1/fiction2" 345 --boundary-example-1-- 347 4.4 Encoding of URIs in e-mail headers 349 Since MIME header fields have a limited length and URIs can get quite 350 long, these lines may have to be folded. If such folding is done, the 351 algorithm defined in [URLBODY] section 3.1 should be employed. 353 5. Base URIs for resolution of relative URIs 355 Relative URIs inside contents of MIME body parts are resolved relative 356 to a base URI. In order to determine this base URI, the first applicable 357 method in the following list is used. 359 (a) There is a base specification inside the MIME body part 360 containing the link which resolves relative URIs into absolute 361 URIs. For example, HTML provides the BASE element for this. 363 (b) There is a Content-Base header (as defined in section 4.2), 364 specifying the base to be used. 366 (c) There is a Content-Location header in the heading of the body 367 part which can then serve as the base in the same way as the 368 request URI can serve as a base for relative URIs within a file 369 retrieved via HTTP [HTTP]. 371 When the methods above do not yield an absolute URI, the procedure in 372 section 8.2 for matching relative URIs MUST be followed. 
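
The precedence in section 5 can be illustrated with a short Python sketch. It is not part of the draft. The function names choose_base and resolve_link are hypothetical, and it is an assumption of this sketch that a Content-Location header only serves as a base under method (c) when it is itself absolute.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def is_absolute(uri: str) -> bool:
    """Treat a URI as absolute when it carries a scheme such as 'http'."""
    return bool(urlsplit(uri).scheme)


def choose_base(body_base: Optional[str],
                content_base: Optional[str],
                content_location: Optional[str]) -> Optional[str]:
    """Pick the base URI using methods (a), (b), (c) of section 5, in order."""
    if body_base:                      # (a) e.g. the HTML BASE element
        return body_base
    if content_base:                   # (b) the Content-Base header
        return content_base
    if content_location and is_absolute(content_location):
        return content_location        # (c) assumption: only if absolute
    return None                        # caller falls back to section 8.2


def resolve_link(link: str, base: Optional[str]) -> Optional[str]:
    """Resolve a link against the chosen base, or return None without one."""
    return urljoin(base, link) if base else None
```

With the Part 2 headers from the example in section 4.2, choose_base(None, "http://www.ietf.cnri.reston.va.us/images/", "foo1.bar1") selects the Content-Base value, and resolve_link("foo1.bar1", ...) then yields an absolute URI under /images/. When choose_base returns None, the textual matching of section 8.2 applies instead.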
374 6. Sending documents without linked objects 376 If a document, such as an HTML object, is sent without other objects, to 377 which it is linked, it MAY be sent as a Text/HTML body part by itself. 378 In this case, multipart/related need not be used. 380 Such a document may either not include any links, or contain links which 381 the recipient resolves via ordinary net look up, or contain links which 382 the recipient cannot resolve. 384 Inclusion of links which the recipient has to look up through the net 385 may not work for some recipients, since all e-mail recipients do not 386 have full internet connectivity. Also, such links may work for the 387 sender but not for the recipient, for example when the link refers to an 388 URI within a company-internal network not accessible from outside the 389 company. 391 Note that documents with links that the recipient cannot resolve MAY be 392 sent, although this is discouraged. For example, two persons developing 393 a new HTML page may exchange incomplete versions. 395 7. Use of the Content-Type: Multipart/related 397 If a message contains one or more MIME body parts containing links and 398 also contains as separate body parts, data, to which these links (as 399 defined, for example, in RFC 1866 [HTML2]) refers, then this whole set 400 of body parts (referring body parts and referred-to body parts) SHOULD 401 be sent within a multipart/related body part as defined in [REL]. 403 The root body part of the multipart/related SHOULD be the start object 404 for rendering the object, such as a text/html object, and which contains 405 links to objects in other body parts, or a multipart/alternative of 406 which at least one alternative resolves to such a start object. 407 Implementors are warned, however, that many mail programs treat 408 multipart/alternative as if it had been multipart/mixed (even though 409 MIME [MIME1] requires support for multipart/alternative). 
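
As an illustration of the structure described in section 7 so far, the following Python sketch assembles a multipart/related body with an HTML root and one linked image using the standard email package. It is not part of the draft. The function name build_aggregate and its parameters are hypothetical, and the image is assumed to be a GIF.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_aggregate(html: str, image_bytes: bytes,
                    image_uri: str) -> MIMEMultipart:
    """Wrap an HTML root and one linked GIF in a multipart/related body."""
    # The type parameter names the content type of the root body part.
    related = MIMEMultipart("related", type="text/html")

    # The root comes first, so no start parameter is needed.
    related.attach(MIMEText(html, "html", "us-ascii"))

    # The linked object carries the URI used by the link in the root.
    image = MIMEImage(image_bytes, "gif")
    image["Content-Location"] = image_uri
    related.attach(image)
    return related
```

Calling build_aggregate(html, data, "http://www.ietf.cnri.reston.va.us/images/ietflogo.gif").as_string() produces a message body comparable in shape to the example in section 9.2, with a boundary generated by the library.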
411 [REL] requires that the type attribute of the "Content-Type: 412 multipart/related" statement be the type of the root object, and this 413 value can thus be "multipart/alternative". If the root is not the first 414 body part within the multipart/related, [REL] further requires that its 415 Content-ID MUST be given in a start parameter to the "Content-Type: 416 multipart/related" header. 418 When presenting the root body part to the user, the additional body 419 parts within the multipart/related can be used: 421 (a) For those recipients who only have e-mail but not full Internet 422 access. 424 (b) For those recipients who for other reasons, such as firewalls 425 or the use of company-internal links, cannot retrieve the 426 linked body parts through the net. 428 Note that this means that you can, via e-mail, send HTML which 429 includes URIs which the recipient cannot resolve via HTTP or 430 other connectivity-requiring URIs. 432 (c) For items which are not available on the web. 434 (d) For any recipient to speed up access. 436 The type parameter of the Content-Type: multipart/related MUST be the 437 same as the Content-Type of its root. 439 When a sending MUA sends objects which were retrieved from the WWW, it 440 SHOULD maintain their WWW URIs. It SHOULD not transform these URIs into 441 some other URI form prior to transmitting them. This will allow the 442 receiving MUA to both verify MICs included with the email message, as 443 well as verify the documents against their WWW counterpoints. 445 This standard does not cover the case where a multipart/related contains 446 links to MIME body parts outside of the current multipart/related or in 447 other MIME messages, even if methods similar to those described in this 448 standard are used. Implementors who provide such links are warned that 449 mailers implementing this standard may not be able to resolve such 450 links. 
452 Within such a multipart/related, ALL different parts MUST have different 453 Content-Location or Content-ID values. 455 8. Format of Links to Other Body Parts 457 8.1 General principle 459 A body part, such as a text/HTML body part, may contain hyperlinks to 460 objects which are included as other body parts in the same message and 461 within the same multipart/related content. Often such linked objects are 462 meant to be displayed inline to the reader of the main document; for 463 example, objects referenced with the IMG tag in HTML [RFC 1866=HTML2]. 465 New tags with this property are proposed in the ongoing development of 466 HTML (example: applet, frame). 468 In order to send such messages, there is a need to indicate which other 469 body parts are referred to by the links in the body parts containing 470 such links. For example, a body part of Content-Type: Text/HTML often 471 has links to other objects, which might be included in other body parts 472 in the same MIME message. The referencing of other body parts is done in 473 the following way: For each body part containing links and each distinct 474 URI within it, which refers to data which is sent in the same MIME 475 message, there SHOULD be a separate body part within the current 476 multipart/related part of the message containing this data. Each such 477 body part SHOULD contain a Content-Location header (see section 8.2) or 478 a Content-ID header (see section 8.3). 480 An e-mail system which claims conformance to this standard MUST support 481 receipt of multipart/related (as defined in section 7) with links 482 between body parts using both the Content-Location (as defined in 483 section 8.2) and the Content-ID method (as defined in section 8.3). 
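
The requirement that all parts of a multipart/related carry different Content-Location or Content-ID values can be checked mechanically. The Python sketch below is an illustration only and not part of the draft. The function name duplicate_identifiers is hypothetical, and comparing header values after trimming surrounding white space is an assumption of this sketch.

```python
from collections import Counter
from email import message_from_string
from email.message import Message
from typing import List


def duplicate_identifiers(related: Message) -> List[str]:
    """List Content-Location / Content-ID values used by more than one part."""
    if not related.is_multipart():
        return []
    seen: Counter = Counter()
    for part in related.get_payload():
        for header in ("Content-Location", "Content-ID"):
            value = part.get(header)
            if value is not None:
                seen[(header.lower(), value.strip())] += 1
    return [f"{name}: {value}"
            for (name, value), count in seen.items() if count > 1]
```

A sender could run such a check on message_from_string(raw_text) before transmission and refuse to send when the returned list is not empty.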
485 8.2 Use of the Content-Location header 487 If there is a Content-Base header, then the recipient MUST employ 488 relative to absolute resolution as defined in RFC 1808 [RELURL] of 489 relative URIs in both the HTML markup and the Content-Location header 490 before matching a hyperlink in the HTML markup to a Content-Location 491 header. The same applies if the Content-Location contains an absolute 492 URI, and the HTML markup contains a BASE element so that relative URIs 493 in the HTML markup can be resolved. 495 If there is NO Content-Base header, and the Content-Location header 496 contains a relative URI, then NO relative to absolute resolution SHOULD 497 be performed when matching Content-Location headers (even if there is a 498 BASE specification, such as the BASE element in HTML, in the body part 499 containing the URI), and exact textual match of the relative URI-s in 500 the Content-Location and the HTML markup is performed instead (after 501 removal of LWSP introduced as described in section 4.4 above). Note that 502 this only applies for matching Content-Location headers, not for URL-s 503 in the HTML document which are resolved through network look up at read 504 time. 506 If there is NO Content-Base header, and the Content-Location header 507 contains a relative URI, then NO relative to absolute resolution SHOULD 508 be performed. Matching the relative URI in the Content-Location header 509 to a hyperlink in an HTML markup text is in this case a two step 510 process. First remove any LWSP from the relative URI which may have been 511 introduced as described in section 4.4. Then perform an exact textual 512 match against the HTML URIs. For this matching process, ignore BASE 513 specifications, such as the BASE element in HTML. Note that this only 514 applies for matching Content-Location headers, not for URL-s in the HTML 515 document which are resolved through network look up at read time. 
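
The matching rules of section 8.2 up to this point can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the draft. The names strip_lwsp and link_matches are hypothetical. The sketch assumes that header values are passed without surrounding quotes, that removing all spaces, tabs, CR and LF undoes the folding of section 4.4, and that a single Content-Base value applies to both the linking part and the linked part.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit


def strip_lwsp(value: str) -> str:
    """Remove white space that header folding may have introduced."""
    return re.sub(r"[ \t\r\n]+", "", value)


def link_matches(link: str, content_location: str,
                 content_base: Optional[str] = None,
                 html_base: Optional[str] = None) -> bool:
    """Decide whether a link in the markup refers to a given body part."""
    location = strip_lwsp(content_location)

    # No Content-Base and a relative Content-Location: exact textual
    # match, ignoring any BASE specification in the markup.
    if content_base is None and not urlsplit(location).scheme:
        return link == location

    # Otherwise resolve both sides to absolute form before comparing.
    # A BASE in the markup takes precedence over Content-Base (section 5).
    link_base = html_base or content_base
    resolved_link = urljoin(link_base, link) if link_base else link
    resolved_location = (urljoin(content_base, location)
                         if content_base else location)
    return resolved_link == resolved_location
```

For the message in section 9.3, link_matches("/images/ietflogo.gif", "/images/ietflogo.gif", content_base="http://www.ietf.cnri.reston.va.us") resolves both sides against the Content-Base before comparing, whereas the "fiction1/fiction2" example of section 4.3 is compared purely textually.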
517 The URI in the Content-Location header need not refer to an object which 518 is actually available globally for retrieval using this URI (after 519 resolution of relative URIs). However, URI-s in Content-Location headers 520 (if absolute, or resolvable to absolute URIs) SHOULD still be globally 521 unique. 523 8.3 Use of the Content-ID header and CID URLs 525 When CID (Content-ID) URLs as defined in RFC 1738 [URL] and RFC 1873 526 [MIDCID] are used for links between body parts, the Content-Location 527 statement will normally be replaced by a Content-ID header. Thus, the 528 following two headers are identical in meaning: 530 Content-ID: foo@bar.net 531 Content-Location: CID: foo@bar.net 533 Note: Content-IDs MUST be globally unique [MIME1]. It is thus not 534 permitted to make them unique only within this message or within this 535 multipart/related. 537 9 Examples 539 9.1 Example of a HTML body without included linked objects 541 The first example is the simplest form of an HTML email message. This is 542 not an aggregate HTML object, but simply a message with a single HTML 543 body part. This message contains a hyperlink but does not provide the 544 ability to resolve the hyperlink. To resolve the hyperlink, the receiving 545 client would need either IP access to the Internet, or an electronic 546 mail web gateway. 548 From: foo1@bar.net 549 To: foo2@bar.net 550 Subject: A simple example 551 Mime-Version: 1.0 552 Content-Type: Text/HTML; charset=US-ASCII 554 555 556 557

   Hi there!

   An example of an HTML message.

   Try clicking here.

9.2 Example with absolute URIs to an embedded GIF picture

   From: foo1@bar.net
   To: foo2@bar.net
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type=Text/HTML; start=foo3*foo1@bar.net

   --boundary-example-1
   Content-Type: Text/HTML;charset=US-ASCII
   Content-ID: foo3*foo1@bar.net

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
    ALT="IETF logo">

   --boundary-example-1
   Content-Location:
         "http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

   --boundary-example-1--

9.3 Example with relative URIs to an embedded GIF picture

   From: foo1@bar.net
   To: foo2@bar.net
   Subject: A simple example
   Mime-Version: 1.0
   Content-Base: "http://www.ietf.cnri.reston.va.us"
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

   --boundary-example-1
   Content-Type: Text/HTML; charset=ISO-8859-1
   Content-Transfer-Encoding: QUOTED-PRINTABLE

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="/images/ietflogo.gif" ALT="IETF logo">
   Example of a copyright sign encoded with Quoted-Printable: =A9
   Example of a copyright sign mapped onto HTML markup: &#169;

   --boundary-example-1
   Content-Location: "/images/ietflogo.gif"
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

   --boundary-example-1--

9.4 Example using CID URL and Content-ID header to an embedded GIF
    picture

   From: foo1@bar.net
   To: foo2@bar.net
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

   --boundary-example-1
   Content-Type: Text/HTML; charset=US-ASCII

   ... text of the HTML document, which might contain a hyperlink
   to the other body part, for example through a statement such as:
   <IMG SRC="cid:foo4*foo1@bar.net" ALT="IETF logo">

   --boundary-example-1
   Content-ID: foo4*foo1@bar.net
   Content-Type: IMAGE/GIF
   Content-Transfer-Encoding: BASE64

   R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
   NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
   etc...

   --boundary-example-1--

10. Content-Disposition header

Note the specification in [REL] on the relations between
Content-Disposition and multipart/related.

11. Character encoding issues and end-of-line issues

For the encoding of characters in HTML documents and other text
documents into a MIME-compatible octet stream, the following mechanisms
are relevant:

- HTML [HTML2, HTML-I18N] as an application of SGML [SGML] allows
  characters to be denoted by character entities as well as by numeric
  character references (e.g. "Latin small letter a with acute accent"
  may be represented by "&aacute;" or "&#225;") in the HTML markup.

- HTML documents, in common with other documents of the MIME
  content-type "text", can be represented in MIME using one of several
  character encodings. The MIME content-type "charset" parameter value
  indicates the particular encoding used. For the exact meaning and use
  of the "charset" parameter, please see [MIME-IMB section 4.2].

Note that the "charset" parameter refers only to the MIME character
encoding.
For example, the string "&aacute;" can be sent in MIME with
"charset=US-ASCII", while the raw character "Latin small letter a with
acute accent" cannot.

The above mechanisms are well defined and documented, and therefore not
further explained here. In sending a message, all the above mentioned
mechanisms MAY be used, and any mixture of them MAY occur when sending
the document via e-mail. Receiving mail user agents (together with the
Web browser they may use to display the document) MUST be capable of
handling any combinations of these mechanisms.

Also note that:

- Any documents including HTML documents that contain octet values
  outside the 7-bit range need a content-transfer-encoding applied
  before transmission over certain transport protocols [MIME1, chapter
  5].

- The MIME standard [MIME1] requires that documents of "Content-Type:
  text" MUST be in canonical form before Content-Transfer-Encoding,
  i.e. that line breaks are encoded as CRLFs, not as bare CRs or bare
  LFs or something else. This is in contrast to [HTTP] where section
  3.6.1 allows other representations of line breaks.

Note that this might cause problems with integrity checks based on
checksums, which might not be preserved when moving a document from the
HTTP to the MIME environment. If a document has to be converted in such
a way that a checksum integrity check becomes invalid, then this
integrity check header SHOULD be removed from the document.

Other sources of problems are "Content-Encoding" used in HTTP but not
allowed in MIME, and charsets that are not able to represent line
breaks as CRLF. A good overview of the differences between HTTP and MIME
with regards to "Content-Type: text" can be found in [HTTP], appendix C.

If the original document has line breaks in the canonical form (CRLF),
then the document SHOULD remain unconverted so that integrity check sums
are not invalidated.
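
The handling of line breaks and integrity checks described above can be
sketched in Python as follows. This is a minimal, non-normative sketch.
The function names are hypothetical, and the checksum is treated as an
opaque string taken from whatever integrity check header accompanied
the document. The sketch assumes a text body whose charset represents
line breaks as CR and LF octets.

```python
import re
from typing import Optional, Tuple


def to_canonical_crlf(body: bytes) -> bytes:
    """Encode every line break (CRLF, bare CR or bare LF) as CRLF."""
    # CRLF is listed first so that an existing CRLF is kept as one break.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


def prepare_text_for_mail(
    body: bytes, checksum: Optional[str]
) -> Tuple[bytes, Optional[str]]:
    """Return the canonical body and the checksum that may still be sent.

    If the body is already canonical it is left unconverted and the
    checksum is kept. If conversion changed the body, the checksum no
    longer describes it and is dropped (returned as None).
    """
    canonical = to_canonical_crlf(body)
    if canonical == body:
        return body, checksum
    return canonical, None
```

A body of b"a\nb" is converted to b"a\r\nb" and loses its checksum,
while a body that already uses CRLF passes through unchanged with its
checksum intact.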

A provider of HTML documents who wants those documents to be
transferable via both HTTP and SMTP without invalidating checksum
integrity checks should always provide original documents in the
canonical form with CRLF for line breaks.

Some transport mechanisms may specify a default "charset" parameter if
none is supplied [HTTP, MIME1]. Because the default differs for
different mechanisms, when HTML is transferred through mail, the charset
parameter SHOULD be included, rather than relying on the default.

12. Security Considerations

One security consideration is the potential to mail someone an object
and claim that it is represented by a particular URI (by giving it a
Content-Location: header). There can be no assurance that a WWW request
for that same URI would normally result in that same object. It might be
unsuitable to cache the data in such a way that the cached data can be
used for retrieval of this URI from other messages or message parts than
those included in the same message as the Content-Location header.
Because of this problem, receiving User Agents SHOULD NOT cache this
data in the same way that data that was retrieved through an HTTP or FTP
request might be cached.

URLs, especially File URLs, may in their name contain company-internal
information, which may then inadvertently be revealed to recipients of
documents containing such URLs.

One way of implementing messages with linked body parts is to handle the
linked body parts in a combined mail and WWW proxy server. The mail
client is only given the start body part, which it passes to a web
browser. This web browser requests the linked parts from the proxy
server. If this method is used, and if the combined server is used by
more than one user, then methods must be employed to ensure that body
parts of a message to one person are not retrievable by another person.
Use of passwords (also known as tickets or magic cookies) is one way of
achieving this. Note that some caching WWW proxy servers may not
distinguish between cached objects from e-mail and HTTP, which may be a
security risk.

In addition, by allowing people to mail aggregate objects, we are
opening the door to other potential security problems that until now
were only problems for WWW users. For example, some HTML documents now
either themselves contain executable content (JavaScript) or contain
links to executable content (The "INSERT" specification, Java). It would
be exceedingly dangerous for a receiving User Agent to execute content
received through a mail message without careful attention to
restrictions on the capabilities of that executable content.

13. Acknowledgments

Harald T. Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
Lewis Geer, Roy Fielding, Al Gilman, Paul Hoffman, Richard W. Jesmajian,
Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed
Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin
Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski, Steve
Zilles and several other people have helped us with preparing this
document. I alone take responsibility for any errors which may still be
in the document.

14. References

Ref.        Author, title
---------   --------------------------------------------------------

[CONDISP]   R. Troost, S. Dorner: "Communicating Presentation
            Information in Internet Messages: The
            Content-Disposition Header", RFC 1806, June 1995.

[HOSTS]     R. Braden (editor): "Requirements for Internet Hosts --
            Application and Support", STD-3, RFC 1123, October 1989.

[HTML2]     T. Berners-Lee, D. Connolly: "Hypertext Markup Language
            - 2.0", RFC 1866, November 1995.

[HTML-I18N] F. Yergeau, G. Nicol, G. Adams, & M. Duerst:
            "Internationalization of the Hypertext Markup
            Language", draft-ietf-html-i18n-04.txt, May 1996.

[HTTP]      T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
            Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[MD5]       R. Rivest: "The MD5 Message-Digest Algorithm", RFC 1321,
            April 1992.

[MIDCID]    E. Levinson: "Message/External-Body Content-ID Access
            Type", draft-ietf-mhtml-cid-00.txt, August 1996.

[MIME1]     N. Borenstein & N. Freed: "MIME (Multipurpose Internet
            Mail Extensions) Part One: Mechanisms for Specifying and
            Describing the Format of Internet Message Bodies", RFC
            1521, Sept 1993.

[MIME2]     N. Borenstein & N. Freed: "Multipurpose Internet Mail
            Extensions (MIME) Part Two: Media Types",
            draft-ietf-822ext-mime-imt-02.txt, December 1995.

[MIME-IMB]  N. Freed & N. Borenstein: "Multipurpose Internet Mail
            Extensions (MIME) Part One: Format of Internet Message
            Bodies", draft-ietf-822ext-mime-imb-07.txt, June 1996.

[NEWS]      M.R. Horton, R. Adams: "Standard for interchange of
            USENET messages", RFC 1036, December 1987.

[PDF]       T. Bienz, R. Cohn, J. Meehan: "Portable Document
            Format Reference Manual, Version 1.1", Adobe Systems
            Inc.

[REL]       Harald Tveit Alvestrand, Edward Levinson: "The MIME
            Multipart/Related Content-type", May 1995.

[RELURL]    R. Fielding: "Relative Uniform Resource Locators", RFC
            1808, June 1995.

[RFC822]    D. Crocker: "Standard for the format of ARPA Internet
            text messages", STD 11, RFC 822, August 1982.

[SGML]      ISO 8879. Information Processing -- Text and Office -
            Standard Generalized Markup Language (SGML),
            1986.

[SMTP]      J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
            821, August 1982.

[URL]       T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
            Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]   N. Freed and Keith Moore: "Definition of the URL MIME
            External-Body Access-Type",
            draft-ietf-mailext-acc-url-01.txt, November 1995.

15. Author's Address

For contacting the editors, preferably write to Jacob Palme rather than
Alex Hopmann.

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         E-mail: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Alex Hopmann
President
ResNova Software, Inc.               E-mail: alex.hopmann@resnova.com
5011 Argosy Dr. #13
Huntington Beach, CA 92649

Working group chairman:

Einar Stefferud