idnits 2.17.1 draft-ietf-mhtml-rev-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == There is 1 instance of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1414 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 7 instances of too long lines in the document, the longest one being 1 character in excess of 72. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 522 has weird spacing: '...ocation that ...' == Line 598 has weird spacing: '...another multi...' == Line 648 has weird spacing: '...sources which...' == Line 1144 has weird spacing: '...visible field...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: When a sending MUA sends objects which were retrieved from the WWW, it SHOULD maintain their WWW URIs. It SHOULD not transform these URIs into some other URI form prior to transmitting them. This will allow the receiving MUA to both verify MICs included with the message, as well as verify the documents against their WWW counterpoints, if this is appropriate. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 1998) is 9566 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 1866' is mentioned on line 32, but not defined ** Obsolete undefined reference: RFC 1866 (Obsoleted by RFC 2854) == Missing Reference: 'HTML' is mentioned on line 328, but not defined == Missing Reference: 'ABNF' is mentioned on line 391, but not defined == Missing Reference: 'FWS' is mentioned on line 411, but not defined == Missing Reference: 'CFWS' is mentioned on line 413, but not defined == Missing Reference: 'RFC 1428' is mentioned on line 510, but not defined == Unused Reference: 'CONDISP' is defined on line 1271, but no explicit reference was found in the text == Unused Reference: 'HOSTS' is defined on line 1275, but no explicit reference was found in the text == Unused Reference: 'MD5' is defined on line 1300, but no explicit reference was found in the text == Unused Reference: 'MIME4' is defined on line 1318, but no explicit reference was found in the text == Unused Reference: 'MIME5' is defined on line 1322, but no explicit reference was found in the text == Unused Reference: 'NEWS' is defined on line 1326, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2070 (ref. 'HTML-I18N') (Obsoleted by RFC 2854) ** Obsolete normative reference: RFC 1866 (ref. 'HTML2') (Obsoleted by RFC 2854) ** Downref: Normative reference to an Informational RFC: RFC 1945 (ref. 'HTTP') -- Possible downref: Non-RFC (?) normative reference: ref. 'INFO' ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. 'MD5') ** Obsolete normative reference: RFC 2048 (ref. 'MIME4') (Obsoleted by RFC 4288, RFC 4289) ** Obsolete normative reference: RFC 1036 (ref. 'NEWS') (Obsoleted by RFC 5536, RFC 5537) -- Possible downref: Non-RFC (?) normative reference: ref. 
'PDF' -- No information found for draft-ietf-mhtml-re-v2 - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'REL' ** Obsolete normative reference: RFC 1808 (ref. 'RELURL') (Obsoleted by RFC 3986) ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) -- Possible downref: Non-RFC (?) normative reference: ref. 'SGML' ** Obsolete normative reference: RFC 821 (ref. 'SMTP') (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 1738 (ref. 'URL') (Obsoleted by RFC 4248, RFC 4266) -- Possible downref: Non-RFC (?) normative reference: ref. 'VRML' -- Possible downref: Non-RFC (?) normative reference: ref. 'XML' Summary: 21 errors (**), 0 flaws (~~), 21 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Jacob Palme 2 Internet Draft Stockholm University/KTH 3 draft-ietf-mhtml-rev-07.txt Alexander Hopmann 4 IETF status to be: Proposed standard Microsoft Corporation 5 Replaces: RFC 2110 Nick Shelness 6 Lotus Corporation 7 Expires: August 1998 February 1998 9 MIME Encapsulation of Aggregate Documents, such as HTML (MHTML) 11 Status of this Document 13 This document is an Internet-Draft. Internet-Drafts are working 14 documents of the Internet Engineering Task Force (IETF), its areas, and 15 its working groups. Note that other groups may also distribute working 16 documents as Internet-Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. 
It is inappropriate to use Internet-Drafts as reference material
21  or to cite them other than as ``work in progress.''

23  To learn the current status of any Internet-Draft, please check the
24  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
25  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
26  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
27  ftp.isi.edu (US West Coast).

29  Copyright (C) The Internet Society 1998. All Rights Reserved.
30  Abstract

32  HTML [RFC 1866] defines a powerful means of specifying multimedia
33  documents. These multimedia documents consist of a text/html root
34  resource (object) and other subsidiary resources (image, video clip,
35  applet, etc. objects) referenced by Uniform Resource Identifiers (URIs)
36  within the text/html root resource. When an HTML multimedia document is
37  retrieved by a browser, each of these component resources is
38  individually retrieved in real time from a location, and using a
39  protocol, specified by each URI.

41  In order to transfer a complete HTML multimedia document in a single
42  e-mail message, it is necessary to: a) aggregate a text/html root
43  resource and all of the subsidiary resources it references into a
44  single composite message structure, and b) define a means by which URIs
45  in the text/html root can reference subsidiary resources within that
46  composite message structure.

48  This document a) defines the use of a MIME multipart/related structure
49  to aggregate a text/html root resource and the subsidiary resources it
50  references, and b) specifies one MIME content header
51  (Content-Location) that allows URIs in a multipart/related text/html
52  root body part to reference subsidiary resources in other body parts of
53  the same multipart/related structure.
55 While initially designed to support e-mail transfer of complete 56 multi-resource HTML multimedia documents, these conventions can also be 57 employed by other transfer protocols such as HTTP and FTP to retrieve a 58 complete multi-resource HTML multimedia document in a single transfer 59 or for storage and archiving of complete HTML-documents. 61 Differences between this and a previous version of this standard, which 62 was published as RFC 2110, are summarized in chapter 12. 64 Table of Contents 66 1. Introduction 67 2. Terminology 68 2.1 Conformance requirement terminology 69 2.2 Other terminology 70 3. Overview 71 4. The Content-Location MIME Content Header 72 4.1 MIME content headers 73 4.2 The Content-Location Header 74 4.3 URIs of MHTML aggregates 75 4.4 Encoding and decoding of URIs in MIME header fields 76 5. Base URIs for resolution of relative URIs 77 6. Sending documents without linked objects 78 7. Use of the Content-Type "multipart/related" 79 8. Usage of Links to Other Body Parts 80 8.1 General principle 81 8.2 Resolution of URIs in text/html body parts 82 8.3 Use of the Content-ID header and CID URLs 83 9. Examples 84 9.1 Example of a HTML body without included linked objects 85 9.2 Example with an absolute URI to an embedded GIF picture 86 9.3 Example with relative URIs to embedded GIF pictures 87 9.4 Example with a relative URI and no BASE available 88 9.5 Example using CID URL and Content-ID header to an embedded GIF 89 picture 90 9.6 Example showing permitted and forbidden references between 91 nested body parts 92 10. Character encoding issues and end-of-line issues 93 11. Security Considerations 94 11.1 Security considerations not related to caching 95 11.2 Security considerations related to caching 96 12. Differences as compared to the previous version of this proposed 97 standard in RFC 2110 98 13. Copyright 99 14. Acknowledgments 100 15. References 101 16. 
Author's Addresses

103  Differences since version 06 of this draft

105  Changed the syntax of the start parameter in examples, to show that it
106  must always be quoted (since it contains the special character "@", and
107  all Content-Type parameters containing special characters must be
108  quoted according to MIME).

110  Also the list of references has been updated.

112  Differences since version 05 of this draft

114  The definition of "HTML aggregate objects" has been changed from
115     HTML objects together with some or all objects, to which the HTML
116     object contains hyperlinks.
117  to
118     HTML objects together with some or all objects, to which the HTML
119     object contains hyperlinks, directly or indirectly.

121  Erroneous quotes around "multipart/related" have been removed in the
122  example in section 4.2.

124  In section 8.2, the following sentence:
125     The resolution of URIs in text/html body parts is performed in the
126     following way:
127  has been changed to
128     The resolution of inline, retrieval and other kinds of URIs in
129     text/html body parts is performed in the following way:
130  in order to remind the reader that parts which are not inline can
131  also be sent with MHTML.

133  In section 8.2, the following text:
134     (d) For each referencing URI in a text/html body part, compare the
135     value of the referencing URI after resolution as described in (a)
136     and (b), with the URI derived from Content-ID and Content-Location
137     headers for other body parts within the same Multipart/related
138     structure.
139  has been changed to:
140     (d) For each referencing URI in a text/html body part, compare the
141     value of the referencing URI after resolution as described in (a)
142     and (b), with the URI derived from Content-ID and Content-Location
143     headers for other body parts within the same or a surrounding
144     Multipart/related structure.
146  In section 9.3, the following text:
147     ; Note - Relative Content-Location is resolved by base
148     ; specified in the Multipart/Related heading
149  has been changed to:
150     ; Note - Relative Content-Location is resolved by base
151     ; specified in the Multipart/Related Content-Location heading

153  In section 11.1, the following paragraph has been added:
154     HTML-formatted messages can be used to investigate user behaviour,
155     for example to break anonymity, in ways which invade the privacy of
156     individuals. If you send a message with an inline link to an object
157     which is not itself included in the message, the recipient's mailer
158     or browser may request that object through HTTP. The HTTP
159     transaction will then reveal who is reading the message. Example: A
160     person who wants to find out who is behind an anonymous user
161     identity, or from which workstation a user is reading his mail, can
162     do this by sending a message with an inline link and then observing
163     from where this link is used to request the object.

165  In all the examples, all indentation which was there to make the text
166  more legible, but which was not correct according to RFC822, has been
167  removed. In one case, indentation was missing on a continuation line
168  and has been added.

170  Mailing List Information

172  To write contributions

174     Further discussion on this document should be done through the
175     mailing list MHTML@SEGATE.SUNET.SE.

177     Comments on less important details may also be sent to the editor,
178     Jacob Palme .
180  To subscribe

182     To subscribe to this mailing list, send a message to
183        LISTSERV@SEGATE.SUNET.SE
184     which contains the text
185        SUB MHTML

187  To unsubscribe

189     To unsubscribe from this list, send a message to
190        LISTSERV@SEGATE.SUNET.SE
191     which contains the text
192        UNS MHTML

194  To access mailing list archives

196     Archives of this list are available for bulk downloading by
197     anonymous ftp from
198        FTP://SEGATE.SUNET.SE/lists/mhtml/

200     The archives are available for browsing from
201        HTTP://segate.sunet.se/archives/mhtml.html

203     and may be available in searchable format from

205        http://www.reference.com/cgi-bin/pn/listarch?list=MHTML@segate.sunet.se

207     Finally, the archives are available by email. Send a message to
208     LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list
209     of the archive files, and then a new message "GET " to
210     retrieve archive files.

212  More information

214  Information about the IETF work in developing this standard may also
215  be available at URL:
216     http://www.dsv.su.se/~jpalme/ietf/mhtml.html

218  A collection of test messages is available at
219     http://www.dsv.su.se/~jpalme/mimetest/MHTML-test-messages.html

221  An informational draft [INFO] with advice on how to implement this
222  standard is under development. You can find the most recent draft from
223  http://www.dsv.su.se/~jpalme/ietf/mhtml.html#drafts, or, after it has
224  been published, from
225  http://www.dsv.su.se/~jpalme/ietf/mhtml.html#published.

227  1. Introduction

229  There are a number of document formats (Hypertext Markup Language
230  [HTML2], Extensible Markup Language [XML], Portable Document Format [PDF]
231  and Virtual Reality Markup Language [VRML]) that specify documents
232  consisting of a root resource and a number of distinct subsidiary
233  resources referenced by URIs within that root resource. There is an
234  obvious need to be able to send such multi-resource documents in e-mail
235  [SMTP], [RFC822] messages.
237  The standard defined in this document specifies how to aggregate such
238  multi-resource documents in MIME-formatted [MIME1 to MIME5] messages
239  for precisely this purpose.

241  While this specification was developed to satisfy the specific
242  aggregation requirements of multi-resource HTML documents, it may also
243  be applicable to other multi-resource document representations linked
244  by URIs. Even so, there is no requirement that
245  implementations claiming conformance to this standard be able to handle
246  any URI linked document representations other than those whose root is
247  HTML.

249  This aggregation into a single message of a root resource and the
250  subsidiary resources it references may also be applicable to other
251  protocols such as HTTP or FTP, or to the archiving of complete web
252  pages as they appeared at a particular point in time.

254  An informational RFC will be published as a supplement to this
255  standard. The informational RFC will discuss implementation methods and
256  some implementation problems. Implementers are strongly recommended to
257  read this informational RFC when developing implementations of this
258  standard. You can find it through URL
259  http://www.dsv.su.se/~jpalme/ietf/mhtml.html.

261  This standard specifies that body parts to be referenced can be
262  identified either by a Content-ID (containing a Message-ID value) or by
263  a Content-Location (containing an arbitrary URL). The reason why this
264  standard does not only recommend the use of Content-IDs is that it
265  should be possible to forward existing web pages via e-mail without
266  having to rewrite the source text of the web pages. Such rewriting has
267  several disadvantages, one of them being that security checksums will
268  probably be invalidated.

270  2.
Terminology

272  2.1 Conformance requirement terminology

274  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
275  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
276  document are to be interpreted as described in [IETF-TERMS].

278  An implementation is not compliant if it fails to satisfy one or more
279  of the MUST requirements for the protocols it implements. An
280  implementation that satisfies all the MUST and all the SHOULD
281  requirements for its protocols is said to be "unconditionally
282  compliant"; one that satisfies all the MUST requirements but not all
283  the SHOULD requirements for its protocols is said to be "conditionally
284  compliant."

286  2.2 Other terminology

288  Most of the terms used in this document are defined in other RFCs.

290  Absolute URI,         See Relative Uniform Resource Locators [RELURL].
291  AbsoluteURI
292  CID                   See Message/External Body Content-ID [MIDCID].

294  Content-Base          This header was specified in RFC 2110, but has been
295                        removed in this new version of the MHTML standard.

297  Content-ID            See Message/External Body Content-ID [MIDCID].

299  Content-Location      MIME message or content part header with one URI of
300                        the MIME message or content part body, defined in
301                        section 4.2 below.

303  Content-Transfer-     Conversion of a text into 7-bit octets as specified
304  Encoding              in [MIME1] chapter 6.

306  CR                    See [RFC822].

308  CRLF                  See [RFC822].

310  Displayed text        The text shown to the user reading a document with
311                        a web browser. This may be different from the HTML
312                        markup, see the definition of HTML markup below.

314  Header                Field in a message or content heading specifying
315                        the value of one attribute.

317  Heading               Part of a message or content before the first
318                        CRLFCRLF, containing formatted fields with
319                        attributes of the message or content.

321  HTML                  See HTML 2 specification [HTML2].
323  HTML Aggregate        HTML objects together with some or all objects, to
324  objects               which the HTML object contains hyperlinks, directly
325                        or indirectly.

327  HTML markup           A file containing HTML encodings as specified in
328                        [HTML2] which may be different from the displayed
329                        text which a person using a web browser sees. For
330                        example, the HTML markup may contain "&lt;" where
331                        the displayed text contains the character "<".

333  LF                    See [RFC822].

335  MIC                   Message Integrity Codes, codes used to verify that a
336                        message has not been modified.

338  MIME                  See the MIME specifications [MIME1 to MIME5].

340  MUA                   Messaging User Agent.

342  PDF                   Portable Document Format, see [PDF].

344  Relative URI,         See HTML 2 [HTML2] and RFC 1808 [RELURL].
345  RelativeURI
346  URI, absolute and     See RFC 1866 [HTML2].
347  relative
348  URL                   See RFC 1738 [URL].

350  URL, relative         See Relative Uniform Resource Locators [RELURL].

352  VRML                  See Virtual Reality Markup Language [VRML].

354  3. Overview

356  An aggregate document is a MIME-encoded message that contains a root
357  resource (object) as well as other resources linked to it via URIs.
358  These other resources may be required to display a multimedia document
359  based on the root resource (inline pictures, style sheets, applets,
360  etc.), or be the root resources of other multimedia documents. It is
361  important to keep in mind that aggregate documents need to satisfy the
362  differing needs of several audiences.

364  Mail sending agents might send aggregate documents as an encoding of
365  normal day-to-day electronic mail. Mail sending agents might also send
366  aggregate documents when a user wishes to mail a particular document
367  from the web to someone else. Finally, mail sending agents might send
368  aggregate documents as automatic responders, providing access to WWW
369  resources for non-IP connected clients. Also with other protocols such
370  as HTTP or FTP, there may sometimes be a need to retrieve aggregate
371  documents.
Receiving agents also have several differing needs. Some
372  receiving agents might be able to receive an aggregate document and
373  display it just as any other text content type would be displayed.
374  Others might have to pass this aggregate document to a browsing
375  program, and provisions need to be made to make this possible.

377  Finally, several other constraints on the problem arise. It is important
378  that it be possible for a document to be signed and for it to be
379  transmitted and displayed without breaking the message integrity (MIC)
380  checksum that is part of the signature.

382  4. The Content-Location MIME Content Header

384  4.1 MIME content headers

386  In order to resolve URI references to resources in other body parts,
387  one MIME content header is defined, Content-Location. This header can
388  occur in any message or content heading.

390  The syntax for this header is, using the syntax definition tools from
391  [ABNF]:

393     quoted-pair = ("\" text)

395     text = %d1-9 /         ; Characters excluding CR and LF
396            %d11-12 /
397            %d14-127

399     WSP = SP / HTAB        ; Whitespace characters

401     FWS = ([*WSP CRLF] 1*WSP)   ; Folding white-space

403     ctext = NO-WS-CTL /    ; Non-white-space controls
404             %d33-39 /      ; The rest of the US-ASCII
405             %d42-91 /      ; characters not including "(",
406             %d93-127       ; ")", or "\"

408     comment = "(" *([FWS] (ctext / quoted-pair / comment))
409               [FWS] ")"

411     CFWS = *([FWS] comment) (([FWS] comment) / FWS)

413     content-location = "Content-Location:" [CFWS] URI [CFWS]

415     URI = absoluteURI / relativeURI

417  where URI is restricted to the syntax for URLs as defined in Uniform
418  Resource Locators [URL] until IETF specifies other kinds of URIs.

420  4.2 The Content-Location Header

422  A Content-Location header specifies a URI that labels the content of a
423  body part in whose heading it is placed. Its value can be an absolute
424  or a relative URI.
Any URI or URL scheme may be used, but use of
425  non-standardized URI or URL schemes might entail some risk that
426  recipients cannot handle them correctly.

428  A URI in a Content-Location header need not refer to a resource which
429  is globally available for retrieval using this URI (after resolution of
430  relative URIs). However, URIs in Content-Location headers (if
431  absolute, or resolvable to absolute URIs) SHOULD still be globally
432  unique.

434  A Content-Location header can thus be used to label a resource which is
435  not retrievable by some or all recipients of a message. For example, a
436  Content-Location header may label an object which is only retrievable
437  using this URI in a restricted domain, such as within a
438  company-internal web space. A Content-Location header can even contain
439  a fictitious URI. Such a URI need not be globally unique.

441  A single Content-Location header field is allowed in any message or
442  content heading, in addition to a Content-ID header (as specified in
443  [MIME1]) and, in Message headings, a Message-ID (as specified in
444  [RFC822]). All of these constitute different, equally valid body part
445  labels, and any of them may be used to satisfy a reference to a body
446  part. Multiple Content-Location header fields in the same message
447  heading are not allowed.

449  Example of a multipart/related structure containing body parts with
450  both Content-Location and Content-ID labels:

452  Content-Type: multipart/related; boundary="boundary-example";
453   type="text/html"

455  --boundary-example

457  Content-Type: text/html; charset=US-ASCII

459  ... ... ... ...
460  ... ... ... ...
462  --boundary-example
463  Content-Type: image/gif
464  Content-ID: <97116092511xyz@foo.bar.net>
465  Content-Location: fiction1/fiction2

467  --boundary-example
468  Content-Type: image/gif
469  Content-ID: <97116092811xyz@foo.bar.net>
470  Content-Location: fiction1/fiction3

472  --boundary-example--

474  4.3 URIs of MHTML aggregates

476  The URI of an MHTML aggregate is not the same as the URI of its root.
477  The URI of its root will directly retrieve only the root resource
478  itself, even if it may cause a web browser to separately retrieve
479  in-line linked resources. If a Content-Location header field is used in
480  the heading of a multipart/related, this Content-Location SHOULD apply
481  to the whole aggregate, not to its root part.

483  When a URI referring to an MHTML aggregate is used to retrieve this
484  aggregate, the set of resources retrieved can be different from the set
485  of resources retrieved using the Content-Locations of its parts. For
486  example, retrieving an MHTML aggregate may return an old version, while
487  retrieving the root URI and its in-line linked objects may return a
488  newer version.

490  4.4 Encoding and decoding of URIs in MIME header fields

492  4.4.1 Encoding of URIs containing inappropriate characters

494  Some documents may contain URIs with characters that are inappropriate
495  for an RFC 822 header, either because the URI itself has an incorrect
496  syntax according to [URL] or because the URI syntax standard has been
497  changed to allow characters not previously allowed in MIME headers. These
498  URIs cannot be sent directly in a message header. If such a URI occurs, all
499  spaces and other illegal characters in it must be encoded using one of
500  the methods described in [MIME3] section 4. This encoding MUST only be
501  done in the header, not in the HTML text. Receiving clients MUST decode
502  the [MIME3] encoding in the heading before comparing URIs in body text
503  to URIs in Content-Location headers.
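
The encode-before-sending and decode-before-comparing steps above can be
sketched in Python with the standard library's email.header module, which
implements the encoded-word syntax of [MIME3]. This is an illustrative
sketch only and is not part of this specification: the function names and
the example.com URI are hypothetical, and the sketch always labels the
encoded-word as utf-8, which is an assumption made for simplicity.

```python
from email.header import Header, decode_header, make_header


def needs_encoding(uri: str) -> bool:
    """True if uri holds a space, control character or non-ASCII character."""
    return any(not (0x21 <= ord(ch) <= 0x7E) for ch in uri)


def encode_location(uri: str) -> str:
    """Return a Content-Location value that is safe to place in a header."""
    if not needs_encoding(uri):
        return uri
    # Assumption: utf-8 is used as the charset label for every encoded URI.
    return Header(uri, "utf-8").encode()


def decode_location(value: str) -> str:
    """Undo any encoded-word so the URI can be compared with body-text URIs."""
    return str(make_header(decode_header(value)))


original = "http://www.example.com/my file.gif"  # hypothetical URI with a space
wire = encode_location(original)
print(wire)                                # header-safe form, no raw space
print(decode_location(wire) == original)   # receiver recovers the same URI
```

Only the header value is changed by this round trip; the URI inside the
HTML text is left untouched, as the paragraph above requires.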
505  The charset parameter value "US-ASCII" SHOULD be used if the URI
506  contains no octets outside of the 7-bit range. If such octets are
507  present, the correct charset parameter value (derived e.g. from
508  information about the HTML document the URI was found in) SHOULD be
509  used. If this cannot be safely established, the value "UNKNOWN-8BIT"
510  [RFC 1428] MUST be used.

512  Note that for the matching of URIs in text/html body parts to URIs in
513  Content-Location headers, the value of the charset parameter is
514  irrelevant, but that it may be relevant for other purposes, and that
515  incorrect labeling MUST, therefore, be avoided. Warning: Irrelevance of
516  the charset parameter may not be true in the future, if different
517  character encodings of the same non-English filename are used in HTML.

519  4.4.2 Folding of long URIs

521  Since MIME header fields have a limited length and long URIs can result
522  in Content-Location headers that exceed this length, Content-Location
523  headers may have to be folded.

525  Encoding as discussed in clause 4.4.1 MUST be done before such folding.
526  After that, the folding can be done, using the algorithm defined in
527  [URLBODY] section 3.1.

529  4.4.3 Unfolding and decoding of received URLs in MIME header fields

531  Upon receipt, folded MIME header fields should be unfolded, and then
532  any MIME encoding should be removed, to retrieve the original URI.

534  5. Base URIs for resolution of relative URIs

536  Relative URIs inside the contents of MIME body parts are resolved
537  relative to a base URI using the methods for resolving relative URIs
538  described in [RELURL]. In order to determine this base URI, the
539  first-applicable method in the following list applies.

541  (a) There is a base specification inside the MIME body part containing
542      the relative URI which resolves relative URIs into absolute URIs.
543      For example, HTML provides the BASE element for this purpose.
545  (b) There is a Content-Location header in the immediately surrounding
546      heading of the body part and it contains an absolute URI. This URI
547      can serve as a base in the same way as a requested URI can serve as
548      a base for relative URIs within a file retrieved via HTTP [HTTP].

550  (c) If necessary, step (b) can be repeated recursively to find a
551      suitable Content-Location header in a surrounding multi-part and
552      message heading.

554  (d) If the MIME object is returned in an HTTP response, use the
555      URI used to initiate the request.

557  (e) When the methods above do not yield an absolute URI, a base URL of
558      "this_message:/" MUST be employed. This base URL has been defined
559      for the sole purpose of resolving relative references within a
560      multipart/related structure when no other base URI is specified.

562  This is also described in other words in section 8.2 below.

564  6. Sending documents without linked objects

566  If a text/html resource (object) is sent without subsidiary resources,
567  to which it refers, it MAY be sent by itself. In this case, embedding
568  it in a multipart/related structure is not necessary.

570  Such a text/html resource may either contain no URIs, or URIs which the
571  recipient is expected to retrieve (if possible) via a URI specified
572  protocol. A text/html resource may also be sent with unresolvable links
573  in special cases, such as when two authors exchange drafts of
574  unfinished resources.

576  Inclusion of URIs referencing resources which the recipient has to
577  retrieve via a URI specified protocol may not work for some
578  recipients. This is because not all e-mail recipients have full
579  Internet connectivity, or because URIs which work for a sender will not
580  work for a recipient. This occurs, for example, when a URI refers to a
581  resource within a company-internal network that is not accessible from
582  outside the company.

584  7.
Use of the Content-Type "multipart/related"

586  If a message contains one or more MIME body parts containing URIs and
587  also contains, as separate body parts, the resources to which these URIs
588  (as defined, for example, in HTML 2.0 [HTML2]) refer, then this whole
589  set of body parts (referring body parts and referred-to body parts)
590  SHOULD be sent within a multipart/related structure as defined in
591  [REL].

593  Even though headers can occur in a message that lacks an associated
594  multipart/related structure, this standard only covers their use for
595  resolution of URIs between body parts inside a multipart/related
596  structure. This standard does cover the case where a resource in a
597  nested multipart/related structure contains URIs that reference MIME
598  body parts in another multipart/related structure, in which it is
599  enclosed. This standard does not cover the case where a resource in a
600  multipart/related structure contains URIs that reference MIME body
601  parts in another parallel or nested multipart/related structure, or in
602  another MIME message, even if methods similar to those described in
603  this standard are used. Implementers who employ such URIs are warned
604  that receiving agents implementing this standard may not be able to
605  process such references.

607  When the start body part of a multipart/related structure is an atomic
608  object, such as a text/html resource, it SHOULD be employed as the root
609  resource of that multipart/related structure. When the start body part
610  of a multipart/related structure is a multipart/alternative structure,
611  and that structure contains at least one alternative body part which is
612  a suitable atomic object, such as a text/html resource, then that body
613  part SHOULD be employed as the root resource of the aggregate document.
Implementers are warned, however, that some receiving agents treat
multipart/alternative as if it had been multipart/mixed (even though
MIME [MIME1] requires support for multipart/alternative).

[REL] specifies that a type parameter is mandatory in a "Content-Type:
multipart/related" header, and requires that it be employed to specify
the type of the multipart/related start object. Thus, the type
parameter value shall be "multipart/alternative" when the start part
is of "Content-Type: multipart/alternative", even if the actual root
resource is of type "text/html". In addition, if the multipart/related
start object is not the first body part in a multipart/related
structure, [REL] further requires that its Content-ID MUST be specified
as the value of a start parameter in the "Content-Type:
multipart/related" header.

When rendering a resource in a multipart/related structure, URI
references within that resource can be satisfied by body parts within
the same multipart/related structure. This is useful:

(a) For those recipients who only have email but not full Internet
    access.

(b) For those recipients who for other reasons, such as firewalls or
    the use of company-internal links, cannot retrieve URI-referenced
    resources via URI-specified protocols.

    Note that this means that you can, via e-mail, send text/html
    objects which include URIs that the recipient cannot resolve via
    HTTP or other connectivity-requiring URI schemes.

(c) To send a document whose content is preserved even if the
    resources to which embedded URIs refer are later changed
    or deleted.

(d) For resources which are not available for protocol-based
    retrieval.

(e) To speed up access.

When a sending MUA sends objects which were retrieved from the WWW, it
SHOULD maintain their WWW URIs. It SHOULD NOT transform these URIs into
some other URI form prior to transmitting them.
This will allow the receiving MUA both to verify MICs included with
the message and to verify the documents against their WWW
counterparts, if this is appropriate.

In certain cases this will not work - for example, if a resource
contains URIs as parameters to objects and applets. In such a case, it
might be better to rewrite the document before sending it. This problem
is discussed in more detail in the informational RFC which will be
published as a supplement to this standard.

Within a multipart/related structure, each body part MUST have, if
assigned, a different Content-ID header value and a Content-Location
header field value which resolves to a different URI.

Two body parts in the same multipart/related structure can have the
same relative Content-Location header value only if these values
become different when resolved to absolute URIs.

8. Usage of Links to Other Body Parts

8.1 General principle

A body part, such as a text/html body part, may contain URIs that
reference resources which are included as body parts in the same
message -- in detail, as body parts within the same multipart/related
structure. Often such URI-linked resources are meant to be displayed
inline to the viewer of the referencing body part; for example, objects
referenced with the SRC attribute of the IMG element in HTML 2.0
[HTML2]. New elements and attributes with this property are proposed in
the ongoing development of HTML (examples: applet, frame, profile,
OBJECT, classid, codebase, data, SCRIPT). A sender might also want to
send a set of HTML documents which the reader can traverse, and which
are related with the attribute href of the A element.
If a user retrieves and displays a web page formed from a text/html
resource, and the subsidiary resources it references, and merely saves
the text/html resource, that user may not at a later time be able to
retrieve and display the web page as it appeared when saved. The format
described in this standard can be used to archive and retrieve all of
the resources required to display the web page, as it originally
appeared at a certain moment of time, in one aggregate file.

In order to send or store such complete messages, there is a need to
specify how a URI in one body part can reference a resource in another
body part.

8.2 Resolution of URIs in text/html body parts

The resolution of inline, retrieval and other kinds of URIs in
text/html body parts is performed in the following way:

(a) Unfold multiple line header values according to [URLBODY]. Do NOT
    however translate character encodings of the kind described in
    [URL]. Example: Do not transform "a%2eb/c%20d" into "a.b/c d".

(b) Remove all MIME encodings, such as content-transfer encoding and
    header encodings as defined in MIME part 3 [MIME3]. Do NOT however
    translate character encodings of the kind described in [URL].
    Example: Do not transform "a%2eb/c%20d" into "a.b/c d".

(c) Try to resolve all relative URIs in the HTML content and in
    Content-Location headers using the procedure described in chapter
    5 above. The result of this resolution can be an absolute URI,
    or an absolute URI with the base "this_message:/" as specified
    in chapter 5.

(d) For each referencing URI in a text/html body part, compare the
    value of the referencing URI after resolution as described in (a)
    and (b), with the URI derived from Content-ID and Content-Location
    headers for other body parts within the same or a surrounding
    multipart/related structure.
    If the strings are identical, octet by octet, then the
    referencing URI references that body part. This comparison will
    only succeed if the two URIs are identical. This means that if
    one of the two URIs to be compared was a fictitious absolute URI
    with the base "this_message:/", the other must also be such a
    fictitious absolute URI, and not resolvable to a real absolute
    URI.

(e) If (d) fails, try to retrieve the URI-referenced resource
    through ordinary Internet lookup. Resolution of URIs of
    the URL-types "mid" or "cid" to other content-parts, outside the
    same multipart/related structure, or in other separately sent
    messages, is not covered by this standard, and is thus neither
    encouraged nor forbidden.

8.3 Use of the Content-ID header and CID URLs

When URIs employing a CID (Content-ID) scheme as defined in [URL] and
[MIDCID] are used to reference other body parts in an MHTML
multipart/related structure, they MUST only be matched against
Content-ID header values, and not against Content-Location headers
with CID: values. Thus, even though the following two headers are
identical in meaning, only the Content-ID value will be matched, and
the Content-Location value will be ignored.

    Content-ID: <foo@bar.net>
    Content-Location: CID: foo@bar.net

Note: Content-IDs MUST be globally unique [MIME1]. It is thus not
permitted to make them unique only within a message or within a single
multipart/related structure.

9. Examples

Warning: The examples are provided for illustrative purposes only. If
there is a contradiction between the explanatory text and the examples
in this standard, then the explanatory text is normative.

Notation: The examples contain indentation to show the structure; the
real objects should not be indented in this way.
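
As a non-normative illustration, the matching of Content-Location
values described in chapter 5 and section 8.2 might be sketched in
Python as follows. The function names resolve and find_part, and the
representation of body parts as (Content-Location, base) pairs, are
assumptions made for this sketch and are not defined by this standard.
The sketch covers Content-Location matching only, attaches the
"this_message:/" base by simple concatenation, and compares Python
strings rather than raw octets. Header unfolding, removal of MIME
encodings and Content-ID matching are omitted.

```python
from urllib.parse import urljoin, urlsplit

# Fallback base from step (e) of chapter 5.
FALLBACK_BASE = "this_message:/"


def resolve(reference, base=None):
    """Return an absolute URI for `reference`.

    `base` is the absolute base URI found for the body part, or None
    when no base could be found. Percent-escapes are left untouched.
    """
    if urlsplit(reference).scheme:
        # Already absolute: no base is required.
        return reference
    if base:
        return urljoin(base, reference)
    # No base at all: build a fictitious absolute URI.
    # Simplification: plain concatenation, no "../" processing.
    return FALLBACK_BASE + reference.lstrip("/")


def find_part(reference, reference_base, parts):
    """Return the index of the body part that `reference` points to.

    `parts` is a list of (content_location, base) pairs, one per body
    part. Returns None when no part matches, in which case step (e)
    of section 8.2 (ordinary Internet lookup) would apply.
    """
    target = resolve(reference, reference_base)
    for index, (location, base) in enumerate(parts):
        # Exact string comparison; no normalization is applied.
        if resolve(location, base) == target:
            return index
    return None
```

With no base available, find_part("ietflogo.gif", None, parts) matches
a part labeled "Content-Location: ietflogo.gif", because both sides
resolve to the same fictitious URI. It does not match a part labeled
with a real absolute URI ending in "ietflogo.gif".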
9.1 Example of an HTML body without included linked objects

The first example is the simplest form of an HTML email message. This
message does not contain an aggregate HTML object, but simply a message
with a single HTML body part. This body part contains a URI but the
message does not contain the resource referenced by that URI. To
retrieve the resource referenced by the URI the receiving client would
need either IP access to the Internet, or an electronic mail web
gateway.

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: text/html; charset=iso-8859-1
    Content-Transfer-Encoding: 8bit

Acute accent

The following two lines have the same screen rendering:

E with acute accent becomes É.
E with acute accent becomes &Eacute;.

Try clicking here.

9.2 Example with an absolute URI to an embedded GIF picture

The second example is an HTML message which includes a single image,
referenced using the Content-Location mechanism.

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: multipart/related; boundary="boundary-example";
            type="text/html"; start=""

    --boundary-example
    Content-Type: text/html;charset=US-ASCII
    Content-ID:

    ... text of the HTML document, which might contain a URI
    referencing a resource in another body part, for example
    through a statement such as:
    IETF logo

    --boundary-example
    Content-Location:
          http://www.ietf.cnri.reston.va.us/images/ietflogo.gif
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example--

9.3 Example with relative URIs to embedded GIF pictures

In this example, a Content-Location header field in the outermost
heading serves as the base for all relative URLs, including those
inside the HTML text being sent.

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Location: http://www.ietf.cnri.reston.va.us/
    Content-Type: multipart/related; boundary="boundary-example";
            type="text/html"

    --boundary-example
    Content-Type: text/html; charset=ISO-8859-1
    Content-Transfer-Encoding: QUOTED-PRINTABLE

    ...
    text of the HTML document, which might contain URIs
    referencing resources in other body parts, for example through
    statements such as:

    IETF logo1
    IETF logo2
    IETF logo3

    Example of a copyright sign encoded with Quoted-Printable: =A9
    Example of a copyright sign mapped onto HTML markup: &#169;

    --boundary-example
    Content-Location:
          http://www.ietf.cnri.reston.va.us/images/ietflogo1.gif
    ; Note - Absolute Content-Location does not require a
    ; base
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example
    Content-Location: ietflogo2.gif
    ; Note - Relative Content-Location is resolved by base
    ; specified in the Multipart/Related Content-Location heading
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example
    Content-Location:
          http://www.ietf.cnri.reston.va.us/images/ietflogo3.gif
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example--

9.4 Example with a relative URI and no BASE available

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: multipart/related; boundary="boundary-example";
            type="text/html"

    --boundary-example
    Content-Type: text/html; charset=iso-8859-1
    Content-Transfer-Encoding: QUOTED-PRINTABLE

    ...
    text of the HTML document, which might contain a URI
    referencing a resource in another body part, for example
    through a statement such as:
    IETF logo
    Example of a copyright sign encoded with Quoted-Printable: =A9
    Example of a copyright sign mapped onto HTML markup: &#169;

    --boundary-example
    Content-Location: ietflogo.gif
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example--

9.5 Example using CID URL and Content-ID header to an embedded GIF
    picture

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: multipart/related; boundary="boundary-example";
            type="text/html"

    --boundary-example
    Content-Type: text/html; charset=US-ASCII

    ... text of the HTML document, which might contain a URI
    referencing a resource in another body part, for example
    through a statement such as:
    IETF logo

    --boundary-example
    Content-Location: CID:something@else ; this header is disregarded
    Content-ID:
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example--

9.6 Example showing permitted and forbidden references between nested
    body parts

This example shows in which cases references are allowed between
multiple multipart/related body parts in a message.

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: multipart/related; boundary="boundary-example-1";
            type="text/html"

    --boundary-example-1
    Content-Type: text/html;charset=US-ASCII
    Content-ID:

    The image reference below will be resolved with the image
    in the next body part.
    IETF logo with white background

    The image reference below cannot be resolved within this
    MIME message, since it contains a reference from an outside
    body part to an inside body part, which is not supported
    by this standard.
    IETF logo with transparent background

    The anchor reference immediately below will be resolved with
    the nested text/html body part below:
    Even more info

    --boundary-example-1
    Content-Location:
          http://www.ietf.cnri.reston.va.us/images/ietflogo.gif
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
    etc...

    --boundary-example-1
    Content-Location:
          http://www.ietf.cnri.reston.va.us/more-info
    Content-Type: multipart/related; boundary="boundary-example-2";
            type="text/html"
    --boundary-example-2
    Content-Type: text/html;charset=US-ASCII
    Content-ID:

    The image reference below will be resolved with the image
    in the surrounding multipart/related above.
    IETF logo with white background

    The image reference below will be resolved with the image
    inside the current nested multipart/related below.
    IETF logo with transparent background

    --boundary-example-2
    Content-Location: http:images/ietflogo2e.gif
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgANX/ACkpKTExMTk5OUJCQkpKSlJSUlpaWmNjY2tra3Nzc3t7e4
    SEhIyMjJSUlJycnKWlpa2trbW1tcDAwM7Ozv/eQnNzjHNzlGtrjGNjhFpae1pa
    etc...

    --boundary-example-2--
    --boundary-example-1
    Content-Location:
          http://www.ietf.cnri.reston.va.us/more-info
    Content-Type: multipart/related; boundary="boundary-example-3";
            type="text/html"
    --boundary-example-3
    Content-Type: text/html;charset=US-ASCII
    Content-ID: <4@foo@bar.net>

    The image reference below will be resolved with the image
    inside the current nested multipart/related below.
    IETF logo with shadows

    The image reference below cannot be resolved according to
    this standard since references between parallel multipart/
    related structures are not supported.
    IETF logo with transparent background

    --boundary-example-3
    Content-Location: http:images/ietflogo2d.gif
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgANX/AMDAwCkpKTExMTk5OUJCQkpKSlJSUlpaWmNjY2tra3Nz
    c3t7e4SEhIyMjJSUlJycnKWlpa2trbW1tb29vcbGxs7OztbW1t7e3ufn5+/v
    etc...

    --boundary-example-3--
    --boundary-example-1--

10. Character encoding issues and end-of-line issues

For the encoding of characters in HTML documents and other text
documents into a MIME-compatible octet stream, the following mechanisms
are relevant:

- HTML [HTML2], [HTML-I18N] as an application of SGML [SGML] allows
  characters to be denoted by character entities as well as by numeric
  character references (e.g. "Latin small letter a with acute accent"
  may be represented by "&aacute;" or "&#225;") in the HTML markup.

- HTML documents, in common with other documents of the MIME
  Content-Type "text", can be represented in MIME using one of several
  character encodings. The MIME Content-Type "charset" parameter value
  indicates the particular encoding used. For the exact meaning and
  use of the "charset" parameter, please see [MIME2] chapter 4.

  Note that the "charset" parameter refers only to the MIME character
  encoding.
  For example, the string "&aacute;" can be sent in MIME
  with "charset=US-ASCII", while the raw character "Latin small letter
  a with acute accent" cannot.

The above mechanisms are well defined and documented, and therefore not
further explained here. In sending a message, all the above mentioned
mechanisms MAY be used, and any mixture of them MAY occur when sending
the document in MIME format. Receiving user agents (together with any
Web browser they may use to display the document) MUST be capable of
handling any combinations of these mechanisms.

Also note that:

- Any documents including HTML documents that contain octet values
  outside the 7-bit range need a content-transfer-encoding applied
  before transmission over certain transport protocols [MIME1,
  chapter 5].

- The MIME standard [MIME2] requires that e-mailed documents of
  Content-Type "text" be in canonical form before a
  Content-Transfer-Encoding is applied, i.e. that line breaks are
  encoded as CRLFs, not as bare CRs or bare LFs or something else.
  This is in contrast to [HTTP] where section 3.6.1 allows other
  representations of line breaks.

Note that this might cause problems with integrity checks based on
checksums, which might not be preserved when moving a document from the
HTTP to the MIME environment. If a document has to be converted in such
a way that a checksum based message integrity check becomes invalid,
then this integrity check header SHOULD be removed from the document.

Other sources of problems are Content-Encoding used in HTTP but not
allowed in MIME, and character sets that are not able to represent line
breaks as CRLF. A good overview of the differences between HTTP and
MIME with regard to Content-Type: "text" can be found in [HTTP],
appendix C.
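
As a non-normative illustration of the canonical form requirement and
its effect on checksums, the following Python sketch converts every
line break in a body to CRLF. The function name to_canonical_crlf is
an assumption made for this sketch. SHA-256 is used only as a
stand-in for whatever checksum an integrity check header might carry.
The sketch assumes a single-octet-compatible character encoding in
which CR and LF are the octets 13 and 10.

```python
import hashlib
import re


def to_canonical_crlf(body: bytes) -> bytes:
    """Rewrite CRLF, bare CR and bare LF line breaks as CRLF.

    The alternation tries CRLF first, so an existing CRLF pair is
    kept as a single line break rather than being doubled.
    """
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


# A document as it might arrive over HTTP, with bare LF line breaks.
original = b"<p>first line\n<p>second line\n"
canonical = to_canonical_crlf(original)

# The octets differ, so a checksum computed over the HTTP form no
# longer matches the canonical MIME form.
digest_before = hashlib.sha256(original).hexdigest()
digest_after = hashlib.sha256(canonical).hexdigest()
checksum_preserved = digest_before == digest_after
```

Here checksum_preserved is False, which is the situation in which the
integrity check header would need to be removed or recomputed.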

Some transport mechanisms may specify a default "charset" parameter if
none is supplied [HTTP, MIME1]. Because the default differs for
different mechanisms, when HTML is transferred through e-mail, the
charset parameter SHOULD be included, rather than relying on the
default.

11. Security Considerations

11.1 Security considerations not related to caching

It is possible for a message sender to misrepresent the source of a
multipart/related body part to a message recipient by labeling it with
a Content-Location URI that references another resource. Therefore,
message recipients should only interpret Content-Location URIs as
labeling a body part for the resolution of references from body parts
in the same multipart/related message structure, and not as the source
of a resource, unless this can be verified by other means.

URIs, especially File URIs, if used without change in a message, may
inadvertently reveal information that was not intended to be revealed
outside a particular security context. Message senders should take care
when constructing messages containing the new header fields, defined in
this standard, that they are not revealing information outside of any
security contexts to which they belong.

Some resource servers hide passwords and tickets (access tokens to
information which should not be revealed to others) and other sensitive
information in non-visible fields or URIs within a text/html resource.
If such a text/html resource is forwarded in an email message, this
sensitive information may be inadvertently revealed to others.

HTML documents can either directly contain executable content
(e.g., JavaScript) or indirectly reference executable content (the
"INSERT" specification, Java).
It is exceedingly dangerous for a
receiving User Agent to execute content received in a mail message
without careful attention to restrictions on the capabilities of that
executable content. (Why??? I do not understand this! What
restrictions of what capabilities???/jp)

HTML-formatted messages can be used to investigate user behaviour, for
example to break anonymity, in ways which invade the privacy of
individuals. If you send a message with an inline link to an object
which is not itself included in the message, the recipient's mailer or
browser may request that object through HTTP. The HTTP transaction will
then reveal who is reading the message. Example: A person who wants to
find out who is behind an anonymous user identity, or from which
workstation a user is reading his mail, can do this by sending a
message with an inline link and then observing from where this link is
used to request the object.

11.2 Security considerations related to caching

There is a well-known problem with the caching of directly retrieved
web resources. A resource retrieved from a cache may differ from that
re-retrieved from its source. This problem also manifests itself when
a copy of a resource is delivered in a multipart/related structure.

When processing (rendering) a text/html body part in an MHTML
multipart/related structure, all URIs in that text/html body part which
reference subsidiary resources within the same multipart/related
structure SHALL be satisfied by those resources and not by resources
from any other local or remote source.

Therefore, if a sender wishes a recipient to always retrieve a URI
referenced resource from its source, a URI-labeled copy of that
resource MUST NOT be included in the same multipart/related structure.
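
As a non-normative illustration, the lookup order implied by the rule
above might be sketched in Python as follows. The names load_resource,
embedded and fetch are hypothetical. The sketch assumes that embedded
maps the resolved absolute URI of each body part in the current
multipart/related structure to its decoded content, and that fetch is
whatever retrieval routine the receiving agent would otherwise use.

```python
from typing import Callable, Dict, Optional


def load_resource(
    uri: str,
    embedded: Dict[str, bytes],
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Return the content for `uri` while rendering one structure.

    A body part embedded in the same multipart/related structure
    always wins. `fetch` (cache or network) is consulted only when
    no embedded part carries that URI.
    """
    if uri in embedded:
        return embedded[uri]
    return fetch(uri)
```

Because the embedded copy is returned without calling fetch, a sender
who wants the recipient to retrieve the resource from its source has
to leave the copy out of the structure.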

In addition, since the source of a resource received in a
multipart/related structure can be misrepresented (see 11.1 above), if
a resource received in a multipart/related structure is stored in a
cache, it MUST NOT be retrieved from that cache other than by a
reference contained in a body part of the same multipart/related
structure. Failure to honor this directive will allow a
multipart/related structure to be employed as a Trojan Horse, for
example to inject bogus resources (e.g. a misrepresentation of a
competitor's Web site) into a recipient's generally accessible Web
cache.

12. Differences as compared to the previous version of this proposed
    standard in RFC 2110

The specification has been changed to show that the formats described
do not only apply to multipart MIME in email, but also to multipart
MIME transferred through other protocols such as HTTP or FTP.

In order to agree with [RELURL], Content-Location headers in multipart
Content-Headings can now be used as a base to resolve relative URIs in
their component parts, but only if no base URI can be derived from the
component part itself. Base URIs in Content-Location header fields in
inner headings have precedence over base URIs in outer multipart
headings.

The Content-Base header, which was present in RFC 2110, has been
removed. A conservative implementor may choose to accept this header in
input for compatibility with implementations of RFC 2110, but MUST
never send any Content-Base header, since this header is no longer
part of this standard.

A section 4.4.1 has been added, specifying how to handle the case of
sending a body part whose URI does not agree with the correct URI
syntax.

The handling of relative and absolute URIs for matching between body
parts has been merged into a single description, by specifying that
relative URIs, which cannot be resolved otherwise, should be handled as
if they had been given the base URL "this_message:/".

13. Copyright

Copyright (C) The Internet Society 1998. All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing the
copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.

14. Acknowledgments

Harald T. Alvestrand, Richard Baker, Isaac Chan, Dave Crocker, Martin
J. Duerst, Lewis Geer, Roy Fielding, Ned Freed, Al Gilman, Paul
Hoffman, Andy Jacobs, Richard W.
Jesmajian, Mark K. Joseph, Greg
Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed Levinson, Jay Levitt,
Albert Lunde, Larry Masinter, Keith Moore, Gavin Nicol, Martyn W. Peck,
Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski, Steve Zilles
and several other people have helped us with preparing this document. I
alone take responsibility for any errors which may still be in the
document.

15. References

Ref.         Author, title
---------    --------------------------------------------------------

[ABNF]       D. Crocker, P. Overell: "Augmented BNF for Syntax
             Specifications: ABNF", RFC 2234, November 1997.

[CONDISP]    R. Troost, S. Dorner: "Communicating Presentation
             Information in Internet Messages: The
             Content-Disposition Header", RFC 2183, August 1997.

[HOSTS]      R. Braden (editor): "Requirements for Internet Hosts --
             Application and Support", STD 3, RFC 1123, October 1989.

[HTML-I18N]  F. Yergeau, G. Nicol, G. Adams, & M. Duerst:
             "Internationalization of the Hypertext Markup Language",
             RFC 2070, January 1997.

[HTML2]      T. Berners-Lee, D. Connolly: "Hypertext Markup Language
             - 2.0", RFC 1866, November 1995.

[HTML3.2]    Dave Raggett: "HTML 3.2 Reference Specification", W3C
             Recommendation, January 1997, at URL
             http://www.w3.org/TR/REC-html32.html

[HTTP]       T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
             Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[IETF-TERMS] S. Bradner: "Key words for use in RFCs to Indicate
             Requirement Levels", RFC 2119, March 1997.

[INFO]       J. Palme: "Sending HTML in MIME, an informational
             supplement to the RFC: MIME Encapsulation of Aggregate
             Documents, such as HTML (MHTML)", work in progress within
             IETF in April 1998.

[MD5]        R. Rivest: "The MD5 Message-Digest Algorithm", RFC 1321,
             April 1992.

[MIDCID]     E.
             Levinson: "Content-ID and Message-ID Uniform Resource
             Locators", draft-ietf-mhtml-cid-v2-00.txt, July 1997.

[MIME1]      N. Freed, N. Borenstein: "Multipurpose Internet Mail
             Extensions (MIME) Part One: Format of Internet Message
             Bodies", RFC 2045, December 1996.

[MIME2]      N. Freed, N. Borenstein: "Multipurpose Internet Mail
             Extensions (MIME) Part Two: Media Types", RFC 2046,
             December 1996.

[MIME3]      K. Moore: "MIME (Multipurpose Internet Mail Extensions)
             Part Three: Message Header Extensions for Non-ASCII
             Text", RFC 2047, December 1996.

[MIME4]      N. Freed, J. Klensin, J. Postel: "Multipurpose Internet
             Mail Extensions (MIME) Part Four: Registration
             Procedures", RFC 2048, January 1997.

[MIME5]      "Multipurpose Internet Mail Extensions (MIME) Part Five:
             Conformance Criteria and Examples", RFC 2049, December
             1996.

[NEWS]       M.R. Horton, R. Adams: "Standard for interchange of
             USENET messages", RFC 1036, December 1987.

[PDF]        Tim Bienz and Richard Cohn: "Portable Document Format
             Reference Manual", Addison-Wesley, Reading, MA, USA,
             1993, ISBN 0-201-62628-4.

[REL]        Edward Levinson: "The MIME Multipart/Related
             Content-Type", draft-ietf-mhtml-re-v2-00.txt,
             September 1997.

[RELURL]     R. Fielding: "Relative Uniform Resource Locators", RFC
             1808, June 1995.

[RFC822]     D. Crocker: "Standard for the format of ARPA Internet
             text messages", STD 11, RFC 822, August 1982.

[SGML]       ISO 8879. Information Processing -- Text and Office
             Systems -- Standard Generalized Markup Language (SGML),
             1986.

[SMTP]       J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
             821, August 1982.

[URL]        T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
             Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]    N. Freed and Keith Moore: "Definition of the URL MIME
             External-Body Access-Type", RFC 2017, October 1996.

[VRML]       Gavin Bell, Anthony Parisi, Mark Pesce: "Virtual Reality
             Modeling Language (VRML) Version 1.0 Language
             Specification", May 1995,
             http://www.vrml.org/Specifications/.

[XML]        Extensible Markup Language, published by the World Wide
             Web Consortium, URL http://www.w3.org/XML/

16. Author's Addresses

For contacting the editors, preferably write to Jacob Palme.

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         Email: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Alex Hopmann                         Email: alexhop@microsoft.com
Microsoft Corporation                Phone: +1-425-703-8238
One Microsoft Way
Redmond WA 98052

Nick Shelness                        Email: Shelness@lotus.com
Lotus Development Corporation
55 Cambridge Parkway
Cambridge MA 02142-1295

Working group chairman:

Einar Stefferud                      Email: stef@nma.com