idnits 2.17.1

draft-freed-pvcsc-03.txt:

  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-03-28) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 206 has weird spacing: '...ginning of...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. (The document does seem to have the
     reference to RFC 2119 which the ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 1997) is 9783 days in the past. Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC-822' is defined on line 362, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-1766' is defined on line 366, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-2045' is defined on line 370, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-2046' is defined on line 376, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-2047' is defined on line 381, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-2048' is defined on line 387, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-2049' is defined on line 393, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC-CDISP' is defined on line 412, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822)

  ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282)

  ** Obsolete normative reference: RFC 2048 (Obsoleted by RFC 4288, RFC 4289)

  ** Obsolete normative reference: RFC 2060 (Obsoleted by RFC 3501)

  ** Downref: Normative reference to an Informational RFC: RFC 2130

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RFC-CDISP'


     Summary: 14 errors (**), 0 flaws (~~), 11 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Working Group Ned Freed
3 Internet Draft Keith Moore
4

6 MIME Parameter Value and Encoded Word Extensions:
7 Character Sets, Languages, and Continuations

9 June 1997

11 Status of this Memo

13 This document is an Internet-Draft.
Internet-Drafts are
14 working documents of the Internet Engineering Task Force
15 (IETF), its areas, and its working groups. Note that other
16 groups may also distribute working documents as
17 Internet-Drafts.

19 Internet-Drafts are draft documents valid for a maximum of six
20 months. Internet-Drafts may be updated, replaced, or obsoleted
21 by other documents at any time. It is not appropriate to use
22 Internet-Drafts as reference material or to cite them other
23 than as a "working draft" or "work in progress".

25 To learn the current status of any Internet-Draft, please
26 check the 1id-abstracts.txt listing contained in the
27 Internet-Drafts Shadow Directories on ds.internic.net (US East
28 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast),
29 or munnari.oz.au (Pacific Rim).

31 The current draft of this memo reflects comments received
32 during the last call period. In particular, a reference to RFC
33 2119 has been added, as have some directives on how to handle
34 character sets with embedded language tagging facilities.

36 1. Abstract

38 This memo defines extensions to the RFC 2045 media type and
39 RFC CDISP disposition parameter value mechanisms to provide

41 (1) a means to specify parameter values in character sets
42 other than US-ASCII,
43 (2) a means to specify the language to be used should the
44 value be displayed, and

46 (3) a continuation mechanism for long parameter values to
47 avoid problems with header line wrapping.

49 This memo also defines an extension to the encoded words
50 defined in RFC 2047 to allow the specification of the language
51 to be used for display as well as the character set.

53 2.
Introduction

55 The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
56 RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message
57 format that allows for

59 (1) textual message bodies in character sets other than
60 US-ASCII,

62 (2) non-textual message bodies,

64 (3) multi-part message bodies, and

66 (4) textual header information in character sets other than
67 US-ASCII.

69 MIME is now widely deployed and is used by a variety of
70 Internet protocols, including, of course, Internet email.
71 However, MIME's success has resulted in the need for
72 additional mechanisms that were not provided in the original
73 protocol specification.

75 In particular, existing MIME mechanisms provide for named
76 media type (content-type field) parameters as well as named
77 disposition (content-disposition field) parameters. A MIME media type
78 may specify any number of parameters associated with all of
79 its subtypes, and any specific subtype may specify additional
80 parameters for its own use. A MIME disposition value may
81 specify any number of associated parameters, the most important
82 of which is probably the attachment disposition's filename
83 parameter.

85 These parameter names and values end up appearing in the
86 content-type and content-disposition header fields in Internet
87 email. This inherently imposes three crucial limitations:

89 (1) Lines in Internet email header fields are folded
90 according to RFC 822 folding rules. This makes long
91 parameter values problematic.

93 (2) MIME headers, like the RFC 822 headers they often
94 appear in, are limited to 7bit US-ASCII, and the
95 encoded-word mechanisms of RFC 2047 are not available
96 to parameter values. This makes it impossible to have
97 parameter values in character sets other than US-ASCII
98 without specifying some sort of private per-parameter
99 encoding.

101 (3) It has recently become clear that character set
102 information is not sufficient to properly display some
103 sorts of information -- language information is also
104 needed [RFC-2130]. For example, support for
105 handicapped users may require reading text strings
106 aloud. The language the text is written in is needed
107 for this to be done correctly. Some parameter values
108 may need to be displayed, hence there is a need to
109 allow for the inclusion of language information.

111 The last problem on this list is also an issue for the encoded
112 words defined by RFC 2047, as encoded words are intended
113 primarily for display purposes.

115 This document defines extensions that address all of these
116 limitations. All of these extensions are implemented in a
117 fashion that is completely compatible at a syntactic level
118 with existing MIME implementations. In addition, the
119 extensions are designed to have as little impact as possible
120 on existing uses of MIME.

122 IMPORTANT NOTE: These mechanisms end up being somewhat
123 gibbous when they actually are used. As such, these
124 mechanisms should not be used lightly; they should be reserved
125 for situations where a real need for them exists.

127 2.1. Requirements notation

129 This document occasionally uses terms that appear in capital
130 letters. When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD
131 NOT", and "MAY" appear capitalized, they are being used to
132 indicate particular requirements of this specification. A
133 discussion of the meanings of these terms appears in
134 [RFC-2119].

136 3. Parameter Value Continuations

138 Long MIME media type or disposition parameter values do not
139 interact well with header line wrapping conventions.
In
140 particular, proper header line wrapping depends on there being
141 places where linear whitespace (LWSP) is allowed, which may or
142 may not be present in a parameter value, and even if present
143 may not be recognizable as such since specific knowledge of
144 parameter value syntax may not be available to the agent doing
145 the line wrapping. The result is that long parameter values
146 may end up getting truncated or otherwise damaged by incorrect
147 line wrapping implementations.

149 A mechanism is therefore needed to break up parameter values
150 into smaller units that are amenable to line wrapping. Any
151 such mechanism MUST be compatible with existing MIME
152 processors. This means that

154 (1) the mechanism MUST NOT change the syntax of MIME media
155 type and disposition lines, and

157 (2) the mechanism MUST NOT depend on parameter ordering
158 since MIME states that parameters are not order
159 sensitive. Note that while MIME does prohibit
160 modification of MIME headers during transport, it is
161 still possible that parameters will be reordered when
162 user agent level processing is done.

164 The obvious solution, then, is to use multiple parameters to
165 contain a single parameter value and to use some kind of
166 distinguished name to indicate when this is being done. And
167 this obvious solution is exactly what is specified here: The
168 asterisk character ("*") followed by a decimal count is
169 employed to indicate that multiple parameters are being used
170 to encapsulate a single parameter value. The count starts at
171 0 and increments by 1 for each subsequent section of the
172 parameter value. Decimal values are used and neither leading
173 zeroes nor gaps in the sequence are allowed.

175 The original parameter value is recovered by concatenating the
176 various sections of the parameter, in order.
For example, the
177 content-type field

179 Content-Type: message/external-body; access-type=URL;
180 URL*0="ftp://";
181 URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

183 is semantically identical to

185 Content-Type: message/external-body; access-type=URL;
186 URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

188 Note that quotes around parameter values are part of the value
189 syntax; they are NOT part of the value itself. Furthermore,
190 it is explicitly permitted to have a mixture of quoted and
191 unquoted continuation fields.

193 4. Parameter Value Character Set and Language Information

195 Some parameter values may need to be qualified with character
196 set or language information. It is clear that a distinguished
197 parameter name is needed to identify when this information is
198 present along with a specific syntax for the information in
199 the value itself. In addition, a lightweight encoding
200 mechanism is needed to accommodate 8 bit information in
201 parameter values.

203 Asterisks ("*") are reused to provide the indicator that
204 language and character set information is present and encoding
205 is being used. A single quote ("'") is used to delimit the
206 character set and language information at the beginning of
207 the parameter value. Percent signs ("%") are used as the
208 encoding flag, which agrees with RFC 2047.

210 Specifically, an asterisk at the end of a parameter name acts
211 as an indicator that character set and language information
212 may appear at the beginning of the parameter value. A single
213 quote is used to separate the character set, language, and
214 actual value information in the parameter value string, and a
215 percent sign is used to flag octets encoded in hexadecimal.
216 For example:

218 Content-Type: application/x-stuff;
219 title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

221 Note that it is perfectly permissible to leave either the
222 character set or language field blank.
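
As an illustration of the value syntax just described, the following Python sketch splits an extended parameter value at its two single quotes and decodes the percent-encoded octets. It is a minimal sketch and not part of this specification: the function name decode_extended_value is hypothetical, and returning the raw octets when the character set field is blank is a choice made for this example so that no default character set is implied.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(raw):
    """Split charset'language'encoded-text and decode the encoded text.

    Hypothetical helper for illustration only.
    """
    # The first two single quotes delimit the charset and language fields.
    charset, language, encoded = raw.split("'", 2)
    # "%" followed by two hexadecimal digits stands for one octet.
    octets = unquote_to_bytes(encoded)
    # Assumption of this sketch: with a blank charset field the octets are
    # returned undecoded rather than guessing a default character set.
    value = octets.decode(charset) if charset else octets
    return charset, language, value


print(decode_extended_value("us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"))
```

Applied to the title*= value of the example above, the sketch yields the character set "us-ascii", the language "en-us", and the text "This is ***fun***".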
Note also that the
223 single quote delimiters MUST be present even when one of the
224 field values is omitted. A field is left blank when either character
225 set, language, or both are not relevant to the parameter value
226 at hand. A field MUST NOT be left blank in order to indicate a default
227 character set or language -- parameter field definitions MUST
228 NOT assign a default character set or language.

230 4.1. Combining Character Set, Language, and Parameter
231 Continuations

233 Character set and language information may be combined with
234 the parameter continuation mechanism. For example:

236 Content-Type: application/x-stuff;
237 title*0*=us-ascii'en'This%20is%20even%20more%20;
238 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
239 title*2="isn't it!"

241 Note that:

243 (1) Language and character set information only appear at
244 the beginning of a given parameter value.

246 (2) Continuations do not provide a facility for using more
247 than one character set or language in the same
248 parameter value.

250 (3) A value presented using multiple continuations may
251 contain a mixture of encoded and unencoded segments.

253 (4) The first segment of a continuation MUST be encoded if
254 language and character set information are given.

256 (5) If the first segment of a continued parameter value is
257 encoded the language and character set field delimiters
258 MUST be present even when the fields are left blank.

260 5. Language specification in Encoded Words

262 RFC 2047 provides support for non-US-ASCII character sets in
263 RFC 822 message header comments, phrases, and any unstructured
264 text field. This is done by defining an encoded word
265 construct which can appear in any of these places. Given that
266 these are fields intended for display, it is sometimes
267 necessary to associate language information with encoded words
268 as well as just the character set. This specification extends
269 the definition of an encoded word to allow the inclusion of
270 such information.
This is simply done by suffixing the
271 character set specification with an asterisk followed by the
272 language tag. For example:

274 From: =?US-ASCII*EN?Q?Keith_Moore?=

276 6. IMAP4 Handling of Parameter Values

278 IMAP4 [RFC-2060] servers SHOULD decode parameter value
279 continuations when generating the BODY and BODYSTRUCTURE fetch
280 attributes.

282 7. Modifications to MIME ABNF

284 The ABNF for MIME parameter values given in RFC 2045 is:

286 parameter := attribute "=" value

288 attribute := token
289 ; Matching of attributes
290 ; is ALWAYS case-insensitive.

292 This specification changes this ABNF to:

294 parameter := regular-parameter / extended-parameter

296 regular-parameter := regular-parameter-name "=" value

298 regular-parameter-name := attribute [section]

300 attribute := 1*attribute-char
301 attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
302 "*", "'", "%", or tspecials>

304 section := initial-section / other-sections

306 initial-section := "*0"

308 other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
309 "6" / "7" / "8" / "9")
310 *DIGIT

312 extended-parameter := (extended-initial-name "="
313 extended-initial-value) /
314 (extended-other-names "="
315 extended-other-values)

317 extended-initial-name := attribute [initial-section] "*"

319 extended-other-names := attribute other-sections "*"

321 extended-initial-value := [charset] "'" [language] "'"
322 extended-other-values

324 extended-other-values := *(ext-octet / attribute-char)

326 ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

328 charset := <registered character set name>

330 language := <registered language tag [RFC-1766]>

332 The ABNF given in RFC 2047 for encoded-words is:

334 encoded-word := "=?" charset "?" encoding "?" encoded-text "?="

336 This specification changes this ABNF to:

338 encoded-word := "=?" charset ["*" language] "?" encoding "?" encoded-text "?="

340 8. Character sets which allow specification of language

342 In the future it is likely that some character sets will
343 provide facilities for inline language labelling.
Such
344 facilities are inherently more flexible than those defined
345 here as they allow for language switching in the middle of a
346 string.

348 If and when such facilities are developed they SHOULD be used
349 in preference to the language labelling facilities specified
350 here. Note that all the mechanisms defined here allow for the
351 omission of language labels so as to be able to accommodate
352 this possible future usage.

354 9. Security Considerations

356 This RFC does not discuss security issues and is not believed
357 to raise any security issues not already endemic in electronic
358 mail and present in fully conforming implementations of MIME.

360 10. References

362 [RFC-822]
363 Crocker, D., "Standard for the Format of ARPA Internet
364 Text Messages", RFC 822, August, 1982.

366 [RFC-1766]
367 Alvestrand, H., "Tags for the Identification of
368 Languages", RFC 1766, March, 1995.

370 [RFC-2045]
371 Freed, N. and Borenstein, N., "Multipurpose Internet Mail
372 Extensions (MIME) Part One: Format of Internet Message
373 Bodies", RFC 2045, Innosoft, First Virtual Holdings,
374 December 1996.

376 [RFC-2046]
377 Freed, N. and Borenstein, N., "Multipurpose Internet Mail
378 Extensions (MIME) Part Two: Media Types", RFC 2046,
379 Innosoft, First Virtual Holdings, December 1996.

381 [RFC-2047]
382 Moore, K., "Multipurpose Internet Mail Extensions (MIME)
383 Part Three: Representation of Non-ASCII Text in Internet
384 Message Headers", RFC 2047, University of Tennessee,
385 December 1996.

387 [RFC-2048]
388 Freed, N., Klensin, J., Postel, J., "Multipurpose
389 Internet Mail Extensions (MIME) Part Four: MIME
390 Registration Procedures", RFC 2048, Innosoft, MCI, ISI,
391 December 1996.

393 [RFC-2049]
394 Freed, N. and Borenstein, N., "Multipurpose Internet Mail
395 Extensions (MIME) Part Five: Conformance Criteria and
396 Examples", RFC 2049, Innosoft, First Virtual Holdings,
397 December 1996.

399 [RFC-2060]
400 Crispin, M., "Internet Message Access Protocol - Version
401 4rev1", RFC 2060, December 1996.

403 [RFC-2119]
404 Bradner, S., "Key words for use in RFCs to Indicate
405 Requirement Levels", RFC 2119, March 1997.

407 [RFC-2130]
408 Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
409 Atkinson, R., Crispin, M., Svanberg, P., "Report from the
410 IAB Character Set Workshop", RFC 2130, April 1997.

412 [RFC-CDISP]
413 Troost, R., Dorner, S., and Moore, K., "Communicating
414 Presentation Information in Internet Messages: The
415 Content-Disposition Header", Internet Draft, February
416 1997.

418 11. Authors' Addresses

420 Ned Freed
421 Innosoft International, Inc.
422 1050 East Garvey Avenue South
423 West Covina, CA 91790
424 USA
425 tel: +1 818 919 3600 fax: +1 818 919 3614
426 email: ned@innosoft.com

428 Keith Moore
429 Computer Science Dept.
430 University of Tennessee
431 107 Ayres Hall
432 Knoxville, TN 37996-1301
433 USA
434 email: moore@cs.utk.edu
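
As an illustration of Sections 3 and 4.1, the reassembly of continued parameter values can be sketched in Python as follows. This is a minimal sketch and not part of the specification. The function name reassemble, the input format (a list of name and value pairs with any quoting already removed), and the decision to return undecoded octets are all assumptions made for the example. Section numbers are taken to start at 0, as stated in Section 3, and leading zeroes in section numbers are not checked.

```python
import re
from urllib.parse import unquote_to_bytes

# attribute, optional "*N" section number, optional trailing "*" (encoded)
NAME_RE = re.compile(r"^([^*]+)(?:\*(\d+))?(\*)?$")


def reassemble(params):
    """Combine parameter sections into whole values.

    Hypothetical helper: returns {attribute: (charset, language, octets)},
    with charset and language None when no such information was given.
    """
    sections = {}
    for name, value in params:
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"malformed parameter name: {name!r}")
        attr, number, star = match.groups()
        index = int(number) if number is not None else 0
        # Attribute matching is case-insensitive, so fold the name.
        sections.setdefault(attr.lower(), {})[index] = (value, star is not None)

    result = {}
    for attr, parts in sections.items():
        # Sections must be numbered 0, 1, 2, ... without gaps; the order in
        # which they arrived does not matter.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in sections of {attr!r}")
        charset = language = None
        octets = b""
        for index in range(len(parts)):
            value, encoded = parts[index]
            if index == 0 and encoded:
                # Only the first section carries charset'language'.
                charset, language, value = value.split("'", 2)
            octets += unquote_to_bytes(value) if encoded else value.encode("ascii")
        result[attr] = (charset, language, octets)
    return result


# Sections deliberately listed out of order.
print(reassemble([
    ("title*2", "isn't it!"),
    ("title*0*", "us-ascii'en'This%20is%20even%20more%20"),
    ("title*1*", "%2A%2A%2Afun%2A%2A%2A%20"),
]))
```

Collecting the sections by number before concatenating them is what makes the result independent of parameter order. The octets are left undecoded so that the caller decides what to do when the character set field is blank.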