draft-hoffman-widetext-01.txt

Internet Draft                                              Paul Hoffman
                                                Internet Mail Consortium
                                                       December 12, 1998

Registration for the "widetext" Media Type

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other documents
at any time. It is not appropriate to use Internet-Drafts as reference
material or to cite them other than as a "working draft" or "work in
progress".

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Copyright (C) The Internet Society (1998). All Rights Reserved.

1. Introduction

This document defines a new MIME top-level media type, "widetext", which
can be used to carry text that employs the UTF-16 character encoding
scheme. The use of the "widetext" media type is limited to text-like MIME
bodies that cannot be represented using the "text" media type.

1.1 Terminology

This document uses the same definitions for "type" and "top-level" that are
used in the MIME media types document [MIMETYPES].

The internationalization community has a variety of definitions for many
terms that have to do with characters. The following definitions are used
in this document:

- A "character set" (more precisely called a "coded character set" or
  "CCS") is a mapping from a set of abstract characters to a set of
  integers. Examples of coded character sets include ISO 10646, US-ASCII,
  and the ISO 8859 series.

- A "character encoding scheme" or "CES" is a mapping from one or more
  coded character sets to a set of octets. Some CESs are associated with a
  single CCS; for example, UTF-16 applies only to ISO 10646. Other CESs,
  such as ISO 2022, are associated with many CCSs.

- A "charset" is a method of mapping a sequence of octets to a sequence of
  abstract characters. One way to construct a charset is to combine a CES
  with one or more CCSs.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [MUSTSHOULD].

2. Need for the "widetext" type

[MIMETYPES] describes the purpose for the "text" type. Section 4.1 of that
specification says:

   The "text" media type is intended for sending material which is
   principally textual in form.

However, not all character encoding schemes can be represented in "text"
body parts.
Section 4.1.1 of that specification says:

   The canonical form of any MIME "text" subtype MUST always represent a
   line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME
   "text" MUST represent a line break. Use of CR and LF outside of line
   break sequences is also forbidden.

This means that a CES used with the "text" type must ensure that octets
with the values 0x0D (CR) and 0x0A (LF) never appear by themselves, and
that when they appear in the sequence 0x0D 0x0A they indicate a line
break. Some popular CESs do not conform to this requirement.

In particular, the UTF-16 CES encodes many characters with bare 0x0D and
0x0A octets. The UTF-16 CES is optionally used by some document formats
such as XML [XML].

Note that the "widetext" media type is being defined for the first time in
this specification, whereas the "text" media type has been defined for many
years and is deployed in every MIME agent. It is much more likely that the
receiver of a MIME message will have an agent that understands the "text"
type than one that understands the "widetext" type.

Thus, if the creator of a MIME body part has a choice, the creator should
prefer a "text" type over a "widetext" type, even if that requires changing
from one CES to another (as long as that is allowed by the format
requirements of the object). The only time a creator should use the
"widetext" type is when the object needs a CES that cannot be used with
the "text" type.

3. Definition of the "widetext" type

The "widetext" media type MUST only be used for sending material which is
principally textual in form and uses the UTF-16 CES, as defined in
[ISO-10646]. (Note that other CESs that can be used with the "widetext"
media type may be specified in the future.) A "charset" parameter MAY be
used to indicate the character set of the body text for "widetext"
subtypes.
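
The observation in Section 2 that UTF-16 produces bare 0x0D and 0x0A
octets can be checked directly. The following Python sketch is
illustrative only and is not part of this specification; the helper name
has_bare_cr_or_lf and the sample characters (U+010D and U+0A05) are
assumptions chosen for the example.

```python
def has_bare_cr_or_lf(octets: bytes) -> bool:
    """Return True if 0x0D or 0x0A occurs outside a 0x0D 0x0A pair."""
    i = 0
    while i < len(octets):
        if octets[i] == 0x0D:
            if i + 1 < len(octets) and octets[i + 1] == 0x0A:
                i += 2  # a CRLF pair, which "text" permits as a line break
                continue
            return True  # CR not followed by LF
        if octets[i] == 0x0A:
            return True  # LF not preceded by CR
        i += 1
    return False


# U+010D (LATIN SMALL LETTER C WITH CARON) is 0x01 0x0D in UTF-16BE.
c_caron = "\u010d".encode("utf-16-be")
# U+0A05 (GURMUKHI LETTER A) is 0x0A 0x05 in UTF-16BE.
gurmukhi_a = "\u0a05".encode("utf-16-be")

print(c_caron.hex(), has_bare_cr_or_lf(c_caron))
print(gurmukhi_a.hex(), has_bare_cr_or_lf(gurmukhi_a))
# The same two characters encoded as UTF-8 contain no 0x0D or 0x0A octets.
print(has_bare_cr_or_lf("\u010d\u0a05".encode("utf-8")))
```

Even an ordinary line break is affected: in UTF-16BE the characters CR LF
become the octets 0x00 0x0D 0x00 0x0A, so the 0x0D is no longer followed
directly by 0x0A.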

It is noteworthy that the same set of characters is defined by the Unicode
standard [UNICODE], which further defines additional character properties
and other application details of great interest to implementors. Up to the
present time, changes in Unicode and amendments to ISO/IEC 10646 have
tracked each other, so that the character repertoires and code point
assignments have remained in sync. The relevant standardization committees
have committed to maintain this very useful synchronism.

3.1 Representation of line breaks

This specification explicitly leaves the line break characters in the
canonical form of any subtype of "widetext" undefined. Any charset that is
used with a "widetext" subtype MUST have a method for indicating the ends
of text lines.

3.2 Charset parameter

The "charset" parameter for the "widetext" type is similar to that for the
"text" type. There are two significant differences:

- There is no default character set for the "widetext" type. In the
  "text" type, there are enough restrictions that you can still perform
  many operations even if you don't recognize the charset. This is not
  true of text in the "widetext" type. Therefore, a "widetext" body that
  has no "charset" parameter SHOULD be treated as
  application/octet-stream.

- At the time of this writing, the only valid values for the "charset"
  parameter are "UTF-16", "UTF-16BE", and "UTF-16LE", as defined in
  [UTF16]. Note that each of these charsets has its byte order
  specified in the charset definition. Other valid values for the
  "charset" parameter may be registered in the future.
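
The charset rules in Section 3.2 could be applied by a receiving agent
roughly as follows. This Python sketch is an illustration under stated
assumptions, not a normative algorithm: the function name effective_type
is hypothetical, and treating an unlisted charset in the same way as a
missing one is an assumption that this section does not itself state.

```python
from email.message import Message

# Charset values listed in Section 3.2 at the time of writing.
WIDETEXT_CHARSETS = {"utf-16", "utf-16be", "utf-16le"}


def effective_type(content_type_header: str) -> str:
    """Return the media type an agent might act on for this header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    if msg.get_content_maintype() != "widetext":
        return msg.get_content_type()
    charset = msg.get_param("charset")
    if charset is None:
        # Section 3.2: no default charset, so fall back to opaque octets.
        return "application/octet-stream"
    if str(charset).lower() not in WIDETEXT_CHARSETS:
        # Assumption: an unlisted charset is handled like a missing one.
        return "application/octet-stream"
    return msg.get_content_type()


print(effective_type('widetext/plain; charset="UTF-16BE"'))
print(effective_type("widetext/plain"))
```

Keeping the list of charsets in one set makes it easy to extend if other
values are registered later.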

3.3 Default display semantics

For unrecognized subtypes in a known character set, a MIME displaying
program MAY offer to display the text uninterpreted and MUST have the
ability to save the text to a file (after removing any transfer encodings).

3.4 Encoding issues

UTF-16 text requires a binary-safe transport. Before sending a widetext
object over a 7-bit or 8-bit transport, the sender SHOULD use Base64
transfer encoding.

3.5 Media requirements

The "widetext" type is used when the recipient is expected to have a
processor to interpret UTF-16, and additionally have a display or printer
with facilities that render ISO 10646.

4. Subtypes of "widetext"

Many subtypes of the "text" type have been defined. Some of the
subtypes for "text" also apply to "widetext", while others do not.
Registrations for all subtypes appear in Appendix A.

Note that the "widetext" type does not inherit any subtypes from the
"text" type. All definitions of "widetext" subtypes must be specific
to the "widetext" type.

Unrecognized subtypes of "widetext" should be treated as subtype "plain" as
long as the MIME implementation knows how to handle the charset.
Unrecognized subtypes which also specify an unrecognized charset should be
treated as "application/octet-stream".

It is permitted to have a subtype of "widetext" that is not present in
"text" and vice versa. If a subtype name is registered under both
"widetext" and "text", the semantics MUST NOT differ in any way other than
in the charsets that are permitted. If a subtype is available in both
"widetext" and "text", an agent which generates the "widetext" form SHOULD
be capable of generating the "text" form with the UTF-8 charset.

In the canonical form of each subtype of "widetext", lines end with the
character sequence "CARRIAGE RETURN" "LINE FEED" (0x000D 0x000A).
Bare "CARRIAGE RETURN" (0x000D) or "LINE FEED" (0x000A) characters SHOULD
NOT appear in any subtype of "widetext".

4.1 widetext/plain

The simplest and most important subtype of "widetext" is "plain". This
indicates plain text that does not contain any formatting commands or
directives. Plain text is intended to be displayed "as-is", that is, no
interpretation of embedded formatting commands, font attribute
specifications, processing instructions, interpretation directives, or
content markup should be necessary for proper display.

In "widetext/plain", the character sequence "CARRIAGE RETURN" "LINE FEED"
(0x000D 0x000A) is equivalent to the character "LINE SEPARATOR" (0x2028). A
program creating "widetext/plain" text from primitive characters SHOULD use
"CARRIAGE RETURN" "LINE FEED" instead of "LINE SEPARATOR". "widetext/plain"
is permitted to carry columnar data such as formatted plain text tables
intended for a fixed-width font display.

4.2 widetext/paragraph

The "paragraph" subtype of "widetext" is similar to the "plain" subtype,
except that it can be used for text that is in paragraph form using ISO
10646 paragraph marks. Specifically:

- the character sequence "CARRIAGE RETURN" "LINE FEED" (0x000D 0x000A) is
  equivalent to the character "PARAGRAPH SEPARATOR" (0x2029)

- no column alignment or fixed-width display is presumed

4.3 Other allowed and disallowed subtypes

At the time of this specification, the following subtypes have been
registered for the "text" type (this list excludes subtypes in the "prs."
and "vnd." namespaces). Each subtype of "text" is analyzed for its ability
to be used as a subclass of "widetext".

4.3.1 Additional subtypes

The "widetext" subtypes "html", "sgml", "xml", and "css" are defined in
Appendix A.
Other subtypes of "widetext" may be registered in the future, as long as
they conform to the requirements in this specification.

4.3.2 Disallowed subtypes

directory -- MUST NOT be used as a subclass of widetext. Section 5.8.1 of
RFC 2425 requires CRLFs for line terminators.

enriched -- MUST NOT be used as a subclass of widetext. RFC 1896 specifies
that multi-byte character sets have to address internal conversion to an
ASCII-compatible character set for markup.

rfc822-headers -- MUST NOT be used as a subclass of widetext. Only used to
encode RFC 822 headers, which always use US-ASCII.

richtext -- MUST NOT be used as a subclass of widetext. This subtype is
little used and may be obsolete.

rtf -- MUST NOT be used as a subclass of widetext. RTF uses only 7 bits per
octet.

tab-separated-values -- MUST NOT be used as a subclass of widetext. This
subtype is little used and may be obsolete.

uri-list -- MUST NOT be used as a subclass of widetext. URIs are only
defined in US-ASCII.

5. Security considerations

The introduction of the "widetext" media type does not by itself raise any
security issues. However, using the UTF-16 charset does introduce security
issues, which are covered in [UTF16].

6. References

[ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1:
Architecture and Basic Multilingual Plane. Twelve amendments and two
technical corrigenda have been published up to now. UTF-16 is described in
Annex Q, published as Amendment 1. Many other amendments are currently at
various stages of standardization.

[MIMETYPES] Freed, N. and N. Borenstein, "MIME Part Two: Media Types", RFC
2046, November 1996.

[MUSTSHOULD] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[UTF16] "UTF-16, an encoding of ISO 10646", draft in progress,
draft-hoffman-utf16-xx.txt.

[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.1",
Unicode Technical Report #8.

[XML] Bray, T., Paoli, J., and C. M. Sperberg-McQueen, "Extensible Markup
Language (XML)", World Wide Web Consortium Recommendation REC-xml-19980210.

7. Acknowledgments

Chris Newman contributed a great deal of editing and writing to the early
drafts of this document. Other significant contributors include:
   Keith Moore
   Martin Duerst
   Ned Freed

8. Author's address

Paul Hoffman
Internet Mail Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org

9. Changes from -00 to -01

Small editorial changes throughout.

3.2: Added a bunch of text to the first bullet to explain why you should
default to application/octet-stream if there is no charset given.

4: Added the second paragraph. In the (now) fourth paragraph, downgraded
the MUST to SHOULD.

4.3.2: Removed "css" from the beginning of the list. Also updated the
reasoning for "rtf" from the current MIME registration.

A: Set all the Macintosh type codes to "none".

A.6: Added this because CSS does all UTF-16.

A. Subtype registrations

A.1 widetext/plain

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/plain

MIME media type name: widetext

MIME subtype name: plain

Required parameters: none

Optional parameters: charset

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/plain" can be converted
to "text/plain" only for applications that allow UTF-8, and only if the
input text uses the same line-ending semantics as "text/plain".

Published specification: This specification

Applications which use this media type: Any application that requires the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): .txt
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman

Other requirements for "widetext/plain" are given in the main body of this
specification.

A.2 widetext/paragraph

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/paragraph

MIME media type name: widetext

MIME subtype name: paragraph

Required parameters: none

Optional parameters: charset

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/paragraph" can be
converted to "text/plain" only for applications that allow UTF-8, and only
if the input text uses the same line-ending semantics as "text/plain".

Published specification: This specification

Applications which use this media type: Any application that requires the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): .txt
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman

Other requirements for "widetext/paragraph" are given in the main body of
this specification.
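
The interoperability considerations in A.1 and A.2 mention converting to
"text/plain" for applications that allow UTF-8. A minimal Python sketch
of that conversion for "widetext/plain" follows. The function name
widetext_plain_to_utf8 is hypothetical, and mapping "LINE SEPARATOR"
(0x2028) to CRLF relies on the equivalence stated in Section 4.1. Input
whose line-ending semantics differ from "text/plain", such as
"widetext/paragraph", is outside what this sketch handles.

```python
def widetext_plain_to_utf8(octets: bytes, charset: str) -> bytes:
    """Convert a "widetext/plain" body to UTF-8 octets with CRLF line breaks."""
    if charset.lower() not in ("utf-16", "utf-16be", "utf-16le"):
        raise ValueError("unsupported widetext charset: " + charset)
    # Python's "utf-16" codec consumes a leading byte order mark if present.
    text = octets.decode(charset)
    # Section 4.1 treats LINE SEPARATOR as equivalent to CR LF.
    text = text.replace("\u2028", "\r\n")
    return text.encode("utf-8")


body = "line one\u2028line two\r\n".encode("utf-16-be")
print(widetext_plain_to_utf8(body, "UTF-16BE"))
```

The resulting octets would be labeled "text/plain" with the UTF-8 charset.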

A.3 widetext/html

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/html

MIME media type name: widetext

MIME subtype name: html

Required parameters: none

Optional parameters: charset

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/html" can be converted
to "text/html".

Published specification: HTML is defined in RFC 1866. More recently, HTML
has been defined by the W3C.

Applications which use this media type: HTML applications that require the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): .htm or .html
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman

A.4 widetext/sgml

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/sgml

MIME media type name: widetext

MIME subtype name: sgml

Required parameters: none

Optional parameters: charset, SGML-bctf, SGML-boot

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/sgml" can be converted
to "text/sgml" and "application/sgml".

Published specification: This registration is based on RFC 1874.

Applications which use this media type: SGML applications that require the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): none
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman

A.5 widetext/xml

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/xml

MIME media type name: widetext

MIME subtype name: xml

Required parameters: none

Optional parameters: charset

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/xml" can be converted
to "text/xml" and "application/sgml".

Published specification: This registration is based on RFC 2376.

Applications which use this media type: XML applications that require the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): .xml
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman

A.6 widetext/css

To: ietf-types@iana.org
Subject: Registration of MIME media type widetext/css

MIME media type name: widetext

MIME subtype name: css

Required parameters: none

Optional parameters: charset

Encoding considerations: All allowed charsets require transfer encoding for
7-bit or 8-bit environments.

Security considerations: See security section of this specification.

Interoperability considerations: Text in "widetext/css" can be converted
to "text/css".

Published specification: This registration is based on RFC 2318.

Applications which use this media type: CSS applications that require the
use of UTF-16.

Additional information:

  Magic number(s): none
  File extension(s): .css
  Macintosh File Type Code(s): none

Person & email address to contact for further information:
  Paul Hoffman

Intended usage: COMMON

Author/Change controller:
  Paul Hoffman
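
Each registration in this appendix notes that the allowed charsets require
transfer encoding in 7-bit or 8-bit environments, and Section 3.4 suggests
Base64 for that purpose. The Python sketch below shows one way a sender
might prepare such a body part. It is illustrative only: the function name
build_widetext_part is hypothetical, and the header layout is a simplified
assumption rather than a complete MIME generator.

```python
import base64


def build_widetext_part(text: str, subtype: str = "plain") -> str:
    """Return header lines and a Base64 body for a widetext entity."""
    # The byte order used here matches the charset label in the header below.
    octets = text.encode("utf-16-be")
    # encodebytes() breaks its output into lines of at most 76 characters.
    encoded = base64.encodebytes(octets).decode("ascii")
    headers = (
        f'Content-Type: widetext/{subtype}; charset="UTF-16BE"\r\n'
        "Content-Transfer-Encoding: base64\r\n"
        "\r\n"
    )
    # Use CRLF between the encoded lines, as in the header lines above.
    return headers + encoded.replace("\n", "\r\n")


print(build_widetext_part("Hello, widetext\r\n"))
```

After Base64 encoding, the body consists only of US-ASCII characters, so
the bare 0x0D and 0x0A octets of the UTF-16 text never reach the transport.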