Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Informational                            March 13, 2017
Expires: September 14, 2017

                        The text/nfo Media Type
                       draft-seantek-text-nfo-04

Abstract

   This document registers the text/nfo media type for use with release
   iNFOrmation. While compatible with text/plain, ".NFO" files and
   content have distinguishing characteristics from typical plain text
   because they are meant to be output to IBM PC-compatible system
   consoles that support certain "ANSI" escape sequences.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. iNFOrmation

   Packagers of files or other bundled content commonly include a
   human-readable manifest that describes their packages. While an
   obvious solution is to include a README in an archive such as a ZIP
   file, READMEs are generally written for software applications and
   provide late-breaking instructions on how to configure and install
   the software, along with known bugs and changelogs. (Plain) text
   READMEs are also generally limited to printable US-ASCII characters.

   Starting circa 1990, packagers of various types of content settled
   upon the Release iNFOrmation format (NFO, commonly pronounced
   "EN-foe" or "info") to describe their releases. An NFO file serves
   similar purposes to a README, but with several nuanced differences.
   NFOs usually contain release information about the media, rather than
   about software per se. NFOs credit the releasers or packagers. Much
   like the Received: Internet Message header [RFC5322], intermediaries
   ("couriers") can also insert NFOs.

   Most distinctly, NFOs have come to contain elaborate ASCII or ANSI
   artwork that is remarkable in its own right in the pantheon of
   postmodern computing culture. Many NFOs have been authored with the
   intent of displaying them on a terminal display with monospaced,
   inverted text (black background, gray or off-white foreground); some
   NFOs even include escape sequences to generate animations or color.
   The widely accepted encoding for NFOs is "OEM Code Page 437", the
   character set of the original IBM PC and MS-DOS.

   When served in the same manner as plain text (text/plain), much of
   the elaborate artwork in NFOs is lost, garbled, or misaligned on
   display. As NFOs are still in considerable use, the goal of this
   registration is to rectify these interchange problems and reclaim
   this piece of living computer history.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Release iNFOrmation Media Type Registration Application

   Type name: text

   Subtype name: nfo

   Required parameters:

    charset: Per Section 4.2.1 of [RFC6838], charset is REQUIRED. Unlike
     most other text types, the default value is the character set of
     the original IBM PC and MS-DOS, called OEM Code Page 437, and named
     "oem437". Implementations MUST support OEM Code Page 437.
     Unfortunately, the simple application of the IANA-registered
     character set "IBM437" (aka "cp437") [RFC1345] will miss some
     important characters, so conformant implementations MUST support
     OEM Code Page 437 as specified in Section 3. NFOs authored for more
     modern computing environments are known to use ISO-8859-1,
     ISO-8859-15 (including support for the Euro sign), or UTF-8;
     however, for maximum interoperability, these or any other
     character sets MUST be declared by the sender. When absent, a
     receiver MAY guess, unless UTF-8 encoding is patently obvious. A
     RECOMMENDED detection algorithm is provided in Appendix A.

   Optional parameters:

    baud: A natural number (integer greater than 0) indicating the gross
     bit rate ("symbol rate") at which the NFO is supposed to be
     rendered to screen. This optional parameter provides a nostalgic
     effect from the days of dialup modems and fixed-speed serial lines.
     It also controls the animation rate, to the extent that the NFO
     employs optional escape sequences. While the term "bps" might be
     more accurate, this parameter is meant to be interpreted the way
     that an end user would experience the real-world conditions that a
     dialup modem would provide on the eve of Y2K. (The term "baud" is
     also used by a couple of popular modern viewers of this format.)
     For example, a conforming implementation could implement "57600" as
     if the data were being downloaded using a V.92 modem, replete with
     random stalls due to retransmission attempts on account of noise on
     the line.

   Encoding considerations:

     Text with 8-bit code points; all 8-bit combinations (including NUL)
     are possible.

   Security considerations:

     It's just text; this format provides no facilities for
     confidentiality or integrity. The ANSI escape sequence "CSI 5 m"
     could, however, blink you to death. Only a subset of ANSI escape
     sequences MUST be interpreted; interpreting a greater range than
     the subset prescribed in this registration may introduce other
     security issues, such as transmitting operating system commands.

     Some code points in oem437 have been used ambiguously in practice,
     so implementations SHOULD NOT assume that the mapping between this
     charset and Unicode is bijective. When displayed, codes 00, 20, and
     FF MAY appear to be similar, i.e., as a blank space.

   Interoperability considerations:

     NFOs are plain text but look best when read in a terminal view or
     with a dedicated NFO viewer that can emulate terminal features. As
     a result, they SHOULD be treated differently than text/plain files.
     The reference environment for NFO viewers to emulate is an IBM
     PC-compatible machine running MS-DOS 6.22 with the ANSI.SYS MS-DOS
     device driver loaded, where the NFO is displayed as if it were
     output to the terminal using the "TYPE" command.

   Published specification: [[Note to RFC Editor: Insert number here.]]

   Applications that use this media type:

     NFO viewers; text editors; terminals.

   Fragment identifier considerations:

     Same as text/plain [RFC5147].

   Additional information:

     Deprecated alias names for this type: text/x-nfo
     File extension(s): .nfo
     Macintosh file type code(s):
       TEXT. A uniform type identifier (UTI) of "public.nfo", which
       conforms to "public.plain-text", is RECOMMENDED.

   Person & email address to contact for further information:

     Sean Leonard

   Restrictions on usage: None.

   Author/Change controller: Sean Leonard

   Intended usage: COMMON

   Provisional registration? No

   "OEM Code Page 437" refers to the character set of the original IBM
   PC and MS-DOS. The code page actually represents two related things:
   the set of 256 graphemes stored in video read-only memory (ROM) that
   are accessed with a single 8-bit code, and an 8-bit encoding for text
   content that displays the graphemes or causes other behavior as
   defined by the code, the operating system, and the loaded device
   drivers. NFO is encoded with the aforementioned 8-bit encoding, which
   means that not all 256 graphemes are directly available for use.

   For example: the sequence 0D 0A (CR LF) identifies a new line; the
   code 1A (SUB) is the MS-DOS end-of-file marker. The code 0D cannot be
   used directly to express the grapheme U+266A EIGHTH NOTE; the code 0A
   cannot be used directly to express the grapheme U+25D9 INVERSE WHITE
   CIRCLE; the code 1A cannot be used directly to express the grapheme
   U+2192 RIGHTWARDS ARROW.

   The registration for IBM437 [RFC1345] is used as a basis for this
   specification, which only elaborates upon the differences. Suggested
   mappings to Unicode characters are included; however, the mapping is
   not bijective. Octets are in hexadecimal. The symbols below next to
   the octets match [RFC1345], although the actual character has the
   meaning described here rather than the [RFC1345] meaning.

3.1. Low-Order Codes (00-7F)

   The codes in the 20-7E range are the same as in US-ASCII and IBM437.

   01-06, 0B, 0C, 0E-19, and 1C-1F are displayed as their corresponding
   ROM graphemes.

   00 NUL is displayed (and treated) as a space. Depending on the output
          environment, an implementation MAY map this code to U+0000
          NULL, or U+0020 SPACE.

   07 BEL MAY cause an audible bell sound (beep) to be emitted. Actually
          emitting a sound is not required for conformance. However,
          implementations that progressively render the output MUST
          pause for this code as if a sound were emitted.

   08 BS causes the prior character to be erased: the prior grapheme is
          displayed and treated as a regular or non-breaking space (SP
          or NBSP), depending on whether the prior character would have
          been breaking or non-breaking.

   09 HT causes horizontal tabbing, which for purposes of conformance,
          SHOULD produce the equivalent spaces so that the subsequent
          text is aligned on the next 8-character boundary.

   0A LF causes a new line to be created and the text insertion point
          ("cursor") to be moved to the beginning of that line.

   0D CR causes the text insertion point ("cursor") to be moved to the
          beginning of the current line. Subsequent text will overwrite
          the characters on the current line, until the cursor moves
          somewhere else. (0A creates and moves the cursor to a new
          line; therefore, 0A in the middle of overwriting the current
          line will not insert or erase any characters that might
          otherwise be on that line.)

   1A SUB is the MS-DOS end-of-file (EOF) marker; it ends the display.
          Codes after 1A MUST NOT be displayed. 1A can be used to
          delimit metadata from the main NFO content, although this
          practice is rare for NFOs. A well-known metadata format
          in this technology area is SAUCE (Standard Architecture for
          Universal Comment Extensions) [SAUCE], which implementations
          MAY support. A SAUCE record can specify a different code page.
          An implementation that supports SAUCE SHOULD support following
          the code page directive in the SAUCE record when the MIME
          entity's charset is oem437.

   1B ESC may be the start of an ANSI ESC sequence. If no valid ESC
          sequence is recognized, output the corresponding ROM grapheme
          (U+2190 LEFTWARDS ARROW) and continue normal processing with
          the next code.

   7F DEL is displayed as the corresponding ROM grapheme (U+2302 HOUSE).

3.2. High-Order Codes (80-FF)

   The codes in the 80-AF range are a selection of Latin characters;
   they are the same as in IBM437. A conformant implementation MUST NOT
   treat these codes as C1 control characters.

   The codes in the B0-DF range are box drawing and block characters;
   they are the same as in IBM437.

   The codes in the E0-FF range are for mathematical symbols, which are
   the same as in IBM437, with the following exceptions. The preferred
   Unicode mapping in Microsoft's OEM Code Page 437 documentation is
   designated with [OEMCP437]:

   E1 b* can be either U+03B2 GREEK SMALL LETTER BETA, or U+00DF LATIN
          SMALL LETTER SHARP S (German Eszett) [OEMCP437]. The two were
          indistinguishable at low resolution on the original IBM
          hardware. Newer grapheme sets, including those of the IBM EGA
          and VGA graphics cards, display this code as the Eszett.
          Unfortunately, only context can determine the proper
          character to use.

   E3 p* can be U+03C0 GREEK SMALL LETTER PI [OEMCP437], U+03A0 GREEK
          CAPITAL LETTER PI, or U+220F N-ARY PRODUCT, depending on the
          particular grapheme used.

   E4 S* can be either U+03A3 GREEK CAPITAL LETTER SIGMA [OEMCP437] or
          U+2211 N-ARY SUMMATION.

   E6 m* can be either U+00B5 MICRO SIGN [OEMCP437] or U+03BC GREEK
          SMALL LETTER MU.

   EA W* can be either U+2126 OHM SIGN or U+03A9 GREEK CAPITAL LETTER
          OMEGA [OEMCP437].

   EB d* is U+03B4 GREEK SMALL LETTER DELTA [OEMCP437]. However, it can
          be used as a surrogate for U+00F0 LATIN SMALL LETTER ETH
          (Icelandic, Faroese, Old English, IPA) or U+2202 PARTIAL
          DIFFERENTIAL.

   ED /0 is U+03C6 GREEK SMALL LETTER PHI [OEMCP437], but in MS-DOS was
          mainly used as U+2205 EMPTY SET. Other possible meanings
          include U+03D5 GREEK PHI SYMBOL (used as a technical symbol,
          with a stroked glyph) (to name angles), U+2300 DIAMETER SIGN,
          or U+00F8 LATIN SMALL LETTER O WITH STROKE (as a surrogate).

   EE e* is U+03B5 GREEK SMALL LETTER EPSILON [OEMCP437] or U+2208
          ELEMENT OF.

   FF NS is NBSP, also known as U+00A0 NO-BREAK SPACE. The ROM grapheme
          is the same as SP (SPACE), i.e., it is blank.

3.3. ANSI Escape Sequences

   To support NFO content containing colors and other goodies, an NFO
   viewer MUST support a subset of "ANSI" escape sequences. (The
   required sequences are not directly related to ANSI, but rather to
   [ANSI.SYS].)

   [ANSI.SYS] supports cursor positioning, erasing, Set Graphics Mode
   (SGR), mode switching, and keyboard remapping. Of these functions, a
   conforming implementation MUST support the Set Graphics Mode (SGR)
   escape sequence. An implementation MUST support setting foreground
   colors (30-37) and background colors (40-47), which are also in
   [ISO6429]. An implementation MUST support all of the [ANSI.SYS] text
   attributes (0, 1, 4, (5 and/or 6), 7, and 8). Text attribute 5 is
   "Blink: Slow" (less than 150 per minute); text attribute 6 is "Blink:
   Fast" (more than 150 per minute). While [ANSI.SYS] does not document
   attribute 6, that was the behavior of the actual ANSI.SYS. An
   implementation SHOULD reproduce similar functionality.

   The other [ANSI.SYS] escape sequences are OPTIONAL. An implementation
   MAY support standard or vendor-specific escape sequences. For a list
   of standard sequences, see, e.g., [ISO6429] and [ISO8613].

3.4. Accessing Hidden Grapheme Codes

   There is no obvious way to encode the graphemes that are inaccessible
   at the values 07, 08, 09, 0A, 0D, 1A, and 1B. This specification
   provides a technique to access these graphemes in the context of OEM
   Code Page 437. This technique is RECOMMENDED, but not required.
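
   As a non-normative illustration of the technique that the rest of
   this section specifies, the following Python sketch wraps the UTF-8
   encoding of a hidden grapheme between the ESC % G and ESC % @
   sequences. The names HIDDEN_GRAPHEMES and encode_hidden_grapheme are
   hypothetical and are not defined by this document; the code-point
   table is copied from the list given later in this section.

```python
# Non-normative sketch; all names here are illustrative assumptions.
ESC = b"\x1b"
SWITCH_TO_UTF8 = ESC + b"%G"    # ESC % G  (octets 1B 25 47)
RETURN_TO_OEM437 = ESC + b"%@"  # ESC % @  (octets 1B 25 40)

# ROM grapheme number -> Unicode character, per the Section 3.4 list.
HIDDEN_GRAPHEMES = {
    0x07: "\u2022",  # BULLET
    0x08: "\u25d8",  # INVERSE BULLET
    0x09: "\u25cb",  # WHITE CIRCLE
    0x0A: "\u25d9",  # INVERSE WHITE CIRCLE
    0x0D: "\u266a",  # EIGHTH NOTE
    0x1A: "\u2192",  # RIGHTWARDS ARROW
    0x1B: "\u2190",  # LEFTWARDS ARROW
}


def encode_hidden_grapheme(rom_code: int) -> bytes:
    """Return octets that display the ROM grapheme behind a control code.

    The result switches to UTF-8, emits the grapheme's UTF-8 encoding,
    and switches back. Raises KeyError for codes not in the table.
    """
    char = HIDDEN_GRAPHEMES[rom_code]
    return SWITCH_TO_UTF8 + char.encode("utf-8") + RETURN_TO_OEM437
```

   For instance, encode_hidden_grapheme(0x0D) yields the octets
   1B 25 47 E2 99 AA 1B 25 40, which a viewer supporting this technique
   would render as an eighth note instead of performing a carriage
   return.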

   Although MS-DOS and ANSI.SYS did not conform to [ISO2022], that
   standard defines escape sequences to switch to other character sets.
   Unicode contains appropriate code points for all of the inaccessible
   graphemes (characters). Accordingly, the escape sequence:
      ESC % G
   switches the code to UTF-8 (with unspecified implementation level)
   [REG196]. While in UTF-8, the escape sequence:
      ESC % @
   reverts the code back to the original [ISO2022]. Normally the code
   would be [ISO2022], but given the starting context of OEM Code Page
   437, the code returns to OEM Code Page 437. The codes are as follows:

      ROM grapheme number
      |  IBM437 symbol
      |  |   Unicode code point
      |  |   |      Unicode name: UTF-8 encoding
      |  |   |      |
      07 BEL U+2022 BULLET: E2 80 A2

      08 BS  U+25D8 INVERSE BULLET: E2 97 98

      09 HT  U+25CB WHITE CIRCLE: E2 97 8B

      0A LF  U+25D9 INVERSE WHITE CIRCLE: E2 97 99

      0D CR  U+266A EIGHTH NOTE: E2 99 AA

      1A SUB U+2192 RIGHTWARDS ARROW: E2 86 92

      1B ESC U+2190 LEFTWARDS ARROW: E2 86 90

3.5. UTF-8/Unicode Processing

   When NFO content is encoded in UTF-8 or another Unicode encoding
   [UTF], the C0 and C1 code points may be present. These codes MUST be
   treated as control codes, not graphemes. They have the same behavior
   as specified for the special low-order codes described in Section
   3.1. For example, 1A ends the display, and 09 emits spaces sufficient
   for 8-column tabbing. 1B is ALWAYS treated as the start of an ESC
   sequence; if the sequence is not recognized, 1B does NOT revert to
   outputting a LEFTWARDS ARROW grapheme. Instead, nothing is displayed.
   For LEFTWARDS ARROW, encode U+2190 instead.

   The C1 control code 9B (CSI: Control Sequence Introducer) (Unicode
   code point U+009B) MUST be recognized as such; it is equivalent to 1B
   5B (ESC [).

3.6. Grapheme Reference

   The following figure is a reference of all 256 graphemes in the IBM
   PC ROM.
The figure is a MIME (base64)-encoded PNG image. 396 MIME-Version: 1.0 397 Content-Type: image/png 398 Content-Disposition: attachment; filename="Codepage-437.png" 399 Content-Transfer-Encoding: base64 401 iVBORw0KGgoAAAANSUhEUgAAASwAAACMCAMAAADxyGQdAAAAGXRFWHRTb2Z0d2FyZQBBZG9i 402 ZSBJbWFnZVJlYWR5ccllPAAAAAZQTFRFqKioAAAAmKDP8QAACYZJREFUeNrsXYt24zoIhP// 403 6XvO7daVmBlAfrRJq+xuk20SWxojBMPD5vvRftiGYIO1wdpgbbA2WJ3D/Xtkv9lg3QyWLY3y 404 u66G8emOL+4DKz4f5yDfmo8an4YR2ud/4kjki9vAss+nXCLUCArJsmNen6hNc/fh53yIGerj 405 3z+kyCin6/EcWPbxYxrKcT49lJ5keQQLJCvI3tfrQeQjWG4oV7g+lsAaJd/6kjUvoxksEL4B 406 mmFqg+igZI1vD3+nc9FlSC9nZxlqUbMBpYtgDc8fP4xJxPHjQOd4ikoHwApiFLQZ01l+RmeR 407 k8PAjGwdZBn+P4SvA+K4GVggR3ZcqeGAcfGiZI1HLiTLw0rsgcWmA8ti2pIysP5NNSLCJMth 408 o+uBxZ+jZFU66wBrVF2JqlqQrLgl9XdD9s3kuqlliGrdp43Kgs6Su+EsPuOBiJUySTOT5/lb 409 n0oV1XTXdBgwWjAdoqyhZM0XIei+hp01avYgWVSdt7fD6TirRqk1rJjadDBjYjiANb6VGDha 410 Z3WX4Vu7O1en8WPuznakN1gbrNcGy/42V2gpMNIs/pswmpw3gkWWE+M/zjj5rjkWj26aRRsE 411 bUj0Ok7umLm7g7TR7H2BtQjQ3GPXgE87ezrAjwn3iX6rXhnohhVgHXzF7KraaDeagcWNInZG 412 2sfrxY5IHFI6YwGx8Ba5z9ACK9r+UfadsW4IFljnK0KlpoQifwks7T/OE6p19SA+zGEB7xAv 413 zxKxOxOBiZcXuPjIRYYxt0hytRJPgAXaO9IDDKzAXpjwV1EcAawY0DDABsUnk7xCXZDVcA4s 414 uWFeliycKl3DKRcllU4lWWY41gIsxVfgpWqBxXSWRC1yOAVpcw6sTpSMjnBBsvh+3IgUdASf 415 6XK+oOBy0bcYWIBwuRteWIaHHh0VkkQrW3MSLGbuxchF4Oe55QkMvqUbtxDc5WX4gFPzrr7R 416 dqR/jqLZYO3HBmsdlw2WMi6bdhY6lWSDhuSXSOPEDX4IvaLP5CEUbGANqN/MobBg6tAXMbLJ 417 6SkzDRaGnanTjlQXMVGF404syq8Mgfh24zdjBHsFrGisGnPjedIX+QxJNHM0kscXM9W1ANZX 418 os4JsMbEAAFWnIV50+nMFBnxGNCfmUTM8eReghVphJjV1AGLe58KLCR+HdMBbBksHC4jbefF 419 Pidnucg2wNSG0Y/6yl6TZ5eZLQ2wSK4DPttZsEg2Ykey4mx7YPmQM2JMYxD9wMmJCFbGHaLL 420 b5clKwOrChAQhkaweaChK1rBPSUkQPHzxRde2TJYrsFKFPwCWPPXm2BZteGyMY9akBCKDsrP 421 FhV8QhAlpoN7ORNhcHTBsjZYLP1bU4lMsXfAQoXaNUqHS2Ni3FNSIISLpFEaB0ZzGGMkBTOa 422 nab2glGqSK93c3dehhh6D7Ds5848RZC207wpmmfBsg3bBuuZZYgZ7Q1Va8m3wktjBvMPKxET 423 LrVxQmslEMoQrfNCmA34IiE1XSciZOUqWG7d8b8PWAqgRvFEYjHPBvzMDcCyI27aWDkYKOPh 424 
N/IzmHuUE6SiJi8Dy6qINHVDnCY9kApCkjkJHMvkNlnFiwJqKpmnYkrxgJVWnvUMr91xRtFA 425 ITdLp+F5ftEVi0lYPbAwbbEGi0RSOkWuXNeyZIkaLOarfwdYnE+hYDG+VKZdviJYlPZbB0vX 426 1PFac1NRm4fA4rNZBotkWS6DxasYCZtkUme1s9qKgvJxX8O9z/qsU6jcZW8hWPyk4lzuccfM 427 +Czab8OSOM+jjnSWsLluELXDU99ql902itcH6/JxbhtHVgN1Biuovr+Bxtt81h8m/5oRvBcE 428 K/brwcVvueepJi0ahvAys3rQoiyMWkzMdDBqMeAUsS+HzAUgJhShfUxazMSqMCsYJuGI6VOw 429 8HNVohDK3ky0jhjjhq6oqayHkKW1K0K2aM1FZy/TmWYyUwagDjHKsfsSsV5pHjxtPXHXdrii 430 qngvptMWSLIMEd8saotgwVoxzXA5rKfCBEdEpkUn6BWHxMDUyk8zzajkoKvhQzJHEyxa9ZIR 431 sdkFLHyxJNdhaOgTE96KRM02WJhcR0oZK7B0TbZJypQ22yqZSTpBg5y31vwvgBVUPiE7gs7S 432 CTLpOSmfo/2WsgFYzCX0p8Ca9wSZqMR3w5Ng6druHliephy5PwCW44URXae8tLOKEYhkvFt1 433 1hSnuB8siJv0wYKIA6Wx1Av33m7IX7ApjQyZjXmnhQ7FEepuGFhqPu7/Umdd9sZeuTfP+ZEF 434 +9fupDB+G1bRQ/ntFM2tMrD5rKfBEkkm94/sI7H76HWq+6lkTn/tiJO3ihK6hBRix8Z9ras6 435 ek1Wjn8ff74azL4CWA2sUKC6Z6nN1LeSrBZWrApjESyb6SD96ZeWrNX9lJiDq4zS2+osFqU1 436 Uo0ythZfs2qokxQmOQx2QbKAoeBUtEEZ2lAngm/lkpXUpUVqgeSTYSS8T9HQq36XZLFazjQ/ 437 sidZfbC0rccpcFH5R4Z/4FFLFvcRUWoSyXJOvamuiqS5UwcspaOSjnKKun5GslBGkvLyrrN0 438 FqysnRlr2hkyaZDev6CzfiNYB0Hsbm5lydrrSpabGakTjCRRvfcX7IVqc3W7nfUoWN/Nljxt 439 Z70/WNbvlbgl6xtZhz8F1otL1lIfys4oBZFd/GbFy8tbrhf9Pg3a+sN/LHxmg7XB2mBtsP4W 440 WEf41mLDI4sNsqYPhwf9DJtSqGkU90QlX39Nyarp9YbBsiZZ8pZrj0kWf3TeYg24pPh0JCt5 441 aPLR3KuvpjNFsBY4+I6saRXTorobctTRYh2jlHrzG6wN1gZrg7XB2mBtsDZYPwLWGz2+J8Xz 442 14NVBJvGdhnr1fcm67IuhCmY87Uwp/rY2fosnVRsOtQES5UbdeHDe3tXhAp2Pa+lR7vvV8Aq 443 V3MNFuk/zj5N2lDojjkaK19q3NTgOloxX2fNxzVYhr07PBkRtvAtiybiPuSkeobGusUt8FIc 444 mynCpL1WA6wxBZL0AuJCw2pcst4vE6039SIy0rgJ1rV3Ou+E6SyChf3VJFjuuWTxNa7Ampqs 445 uZMWRKGQsFyGtWRZU+dVwrcAFtYuXpCsFKxYppgvwzpfL0VCphs+Ahaa0vTmiUlGs7otaApW 446 vhs6y3tsLcNrYFkOVriHi+fqI6kRL1Q33kK0vfWvmfLZTVqv7oZzDaAAi93/2sUtBufqgdS4 447 Mx7JaZd33GMJXzLRf7zo7psH8N6+4QZrg7XB+muPDdYGa4O1wdpg/dLHfwIMAORIgm4Mk35I 448 AAAAAElFTkSuQmCC 450 Figure 1: Code Page 437 Grapheme Reference 452 3.7. 
Charset Registration Template

   To: ietf-charsets@iana.org
   Subject: Registration of new charset oem437

   Charset name: oem437

   Charset aliases: None.

   Suitability for use in MIME text: Suitable.

   Published specification(s): This specification; [OEMCP437].

   ISO 10646 equivalency table:

     This table is taken from the IBM437 registration in [RFC1345],
     with modifications based on actual implementations of [OEMCP437],
     as discussed in this document. Character mnemonic symbols
     generally map to the Unicode code points listed in Section 3
     of [RFC1345], with the following exceptions. The symbol suffix
     $ (for example, HT$) means that the Unicode code point
     mapping is essentially correct, but an implementation might
     need to perform additional or special processing as discussed
     in this document, depending on the output environment.

     The symbol $$ means that this code point has special
     considerations as discussed in this document, so no
     single, definitive Unicode code point mapping can be given.
     Finally, three characters have no corresponding mnemonic
     symbols in Section 3 of [RFC1345], so symbols are defined here:

     $> 25ba BLACK RIGHT-POINTING POINTER
     $< 25c4 BLACK LEFT-POINTING POINTER
     $B 21a8 UP DOWN ARROW WITH BASE

     NU$ 0u  0U  cH- cD- cC  cS  BL$ BS$ HT$ LF$ Ml  Fm  CR$ M2  SU
     $>  $<  UD  !*2 PI  SE  SR  $B  -!  -v  $$  EC$ -L  <>  UT  Dt
     SP  !   "   Nb  DO  %   &   '   (   )   *   +   ,   -   .   /
     0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
     At  A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
     P   Q   R   S   T   U   V   W   X   Y   Z   <(  //  )>  '>  _
     '!  a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
     p   q   r   s   t   u   v   w   x   y   z   (!  !!  !)  '?  Eh
     C,  u:  e'  a>  a:  a!  aa  c,  e>  e:  e!  i:  i>  i!  A:  AA
     E'  ae  AE  o>  o:  o!  u>  u!  y:  O:  U:  Ct  Pd  Ye  Pt  Fl
     a'  i'  o'  u'  n?  N?  -a  -o  ?I  NI  NO  12  14  !I  <<  >>
     .S  :S  ?S  vv  vl  vL  Vl  Dl  dL  VL  VV  LD  UL  Ul  uL  dl
     ur  uh  dh  vr  hh  vh  vR  Vr  UR  DR  UH  DH  VR  HH  VH  uH
     Uh  dH  Dh  Ur  uR  dR  Dr  Vh  vH  ul  dr  FB  LB  lB  RB  TB
     a*  $$  G*  $$  $$  s*  $$  t*  F*  H*  $$  $$  00  $$  $$  (U
     =3  +-  >=  =<  Iu  Il  -:  ?2  Ob  .M  Sb  RT  nS  2S  fS  NS$

   Additional information:

     See this document for details on how to handle particular codes
     that correspond both to graphemes in the IBM PC ROM, and
     to control characters.

   Person & email address to contact for further information:

     Sean Leonard

   Intended usage: COMMON

4. Example

   The following example is a RELEASE.NFO file as an e-mail attachment,
   with base64 encoding. Note that the character set is (correctly)
   assumed to be OEM Code Page 437.

   MIME-Version: 1.0
   Content-Type: text/nfo
   Content-Disposition: attachment; filename="RELEASE.NFO"
   Content-Transfer-Encoding: base64

   TODO/PutInBase64EncodedContentHere==

5. IANA Considerations

   IANA is asked to register the media type text/nfo in the Standards
   tree using the application provided in Section 2 of this document.

   IANA is asked to register the charset oem437 in the Character Sets
   registry using the application provided in Section 3 of this
   document.

6. Security Considerations

   It's just text; this format provides no facilities for
   confidentiality or integrity. The ANSI escape sequence "CSI 5 m"
   could, however, blink you to death. Only a subset of ANSI escape
   sequences MUST be interpreted; interpreting a greater range than the
   subset prescribed in this registration may introduce other security
   issues, such as transmitting operating system commands.

   Some code points in oem437 have been used ambiguously in practice, so
   implementations SHOULD NOT assume that the mapping between this
   charset and Unicode is bijective. When displayed, codes 00, 20, and
   FF MAY appear to be similar, i.e., as a blank space.

7. References

7.1. Normative References

   [ANSI.SYS] Microsoft Corporation, "ANSI.SYS", MSDN ID cc722862, 1994.

   [OEMCP437] Microsoft Corporation, "OEM 437", MSDN ID cc305156, 2014.

   [RFC1345]  Simonsen, K., "Character Mnemonics and Character Sets",
              RFC 1345, June 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5147]  Wilde, E. and M. Duerst, "URI Fragment Identifiers for the
              text/plain Media Type", RFC 5147, April 2008.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13, RFC
              6838, January 2013.

   [UTF]      The Unicode Consortium, "The Unicode Standard, Version
              8.0.0", Chapter 3: "Conformance", The Unicode Consortium,
              August 2015.

7.2. Informative References

   [ISO2022]  International Organization for Standardization, "Character
              Code Structure and Extension Techniques, 6th edition", ISO
              Standard 2022, ECMA-35, December 1994.

   [ISO6429]  International Organization for Standardization,
              "Information Technology - Control Functions for Coded
              Character Sets, 3rd edition", ISO Standard 6429, December
              1992.

   [ISO8613]  International Organization for Standardization,
              "Information Technology - Open Document Architecture (ODA)
              and Interchange Format: Character Content Architectures",
              ISO Standard 8613-6, ITU-T T.416, March 1993.

   [REG196]   International Organization for Standardization,
              "International Register of Coded Character Sets: UTF-8
              without implementation level", Sec. 2.8.1, Reg. 196, April
              1996.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [SAUCE]    O. "Tasmaniac" Reubens / ACiD, "SAUCE--Standard
              Architecture for Universal Comment Extensions", 00.5,
              November 2013.

Appendix A. IBM Code Page 437 vs. UTF-8 Detection Algorithm

   In cases of ambiguity, the following algorithm SHOULD be used to
   detect UTF-8 encoded data in text/nfo content:

     If the octets EF BB BF are present at the beginning => UTF-8.

     Considering all octets in the content:

       If no octets are greater than 7F => oem437.
       If any octets are F5 - FF, C0, or C1 => oem437.
       If any UTF-8 encodings are "ill-formed" => oem437.
       If any UTF-8 encodings represent illegal code points
        (e.g., surrogate code points) => oem437.

     Ragged line tests:

       If display characters decoded with oem437
        result in identical line widths => oem437.
       If display characters decoded with UTF-8
        result in identical line widths => UTF-8.

     Finally:
       => UTF-8 or oem437; prefer oem437.

Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/
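
   The detection steps in Appendix A can be sketched in Python roughly
   as follows. This is a non-normative illustration under stated
   assumptions, not a reference implementation: the function name
   detect_nfo_charset is hypothetical; Python's strict UTF-8 decoder is
   assumed to cover both the "ill-formed" and the illegal-code-point
   checks; and the ragged line tests are simplified to compare the
   lengths of non-empty lines split on LF, without stripping escape
   sequences, and to require at least two such lines.

```python
# Non-normative sketch of the Appendix A detection steps.
def _uniform_width(lines) -> bool:
    """True if at least two non-empty lines exist and all share one length."""
    widths = [len(line) for line in lines if line]
    return len(widths) >= 2 and len(set(widths)) == 1


def detect_nfo_charset(data: bytes) -> str:
    """Guess "utf-8" or "oem437" for undeclared text/nfo content."""
    # A leading UTF-8 byte order mark is decisive.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"

    # Pure 7-bit content: nothing distinguishes it, so use the default.
    if all(octet <= 0x7F for octet in data):
        return "oem437"

    # Octets that can never occur in well-formed UTF-8.
    if any(octet >= 0xF5 or octet in (0xC0, 0xC1) for octet in data):
        return "oem437"

    # Ill-formed sequences and surrogate code points are rejected by
    # Python's strict decoder (assumption stated in the lead-in).
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "oem437"

    # Ragged line tests. In oem437 one octet is one display cell, so the
    # octet length of a line stands in for its display width.
    byte_lines = [line.rstrip(b"\r") for line in data.split(b"\n")]
    if _uniform_width(byte_lines):
        return "oem437"
    char_lines = [line.rstrip("\r") for line in text.split("\n")]
    if _uniform_width(char_lines):
        return "utf-8"

    # Still ambiguous: prefer oem437.
    return "oem437"
```

   A receiver would call such a function only when the charset
   parameter is absent. Because the oem437 line-width test runs first,
   content that decodes cleanly as UTF-8 is still reported as oem437
   whenever its lines have equal octet lengths, which mirrors the
   ordering of the tests above.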