idnits 2.17.1

draft-faltstrom-base45-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 1, 2021) is 1002 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '65 66' is mentioned on line 169, but not defined

  == Missing Reference: '105 101' is mentioned on line 195, but not defined

  == Missing Reference: '116 102' is mentioned on line 195, but not defined

  -- Looks like a reference, but probably isn't: '33' on line 195

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO18004'

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     Network Working Group                                       P. Faltstrom
3     Internet-Draft                                                    Netnod
4     Intended status: Standards Track                            F. Ljunggren
5     Expires: January 2, 2022                                           Kirei
6                                                                 D. van Gulik
7                                                                   Webweaving
8                                                                 July 1, 2021

10                            The Base45 Data Encoding
11                           draft-faltstrom-base45-07

13    Abstract

15       This document describes the Base45 encoding scheme which is built
16       upon the Base64, Base32 and Base16 encoding schemes.

18    Status of This Memo

20       This Internet-Draft is submitted in full conformance with the
21       provisions of BCP 78 and BCP 79.

23       Internet-Drafts are working documents of the Internet Engineering
24       Task Force (IETF).  Note that other groups may also distribute
25       working documents as Internet-Drafts.  The list of current
26       Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

28       Internet-Drafts are draft documents valid for a maximum of six months
29       and may be updated, replaced, or obsoleted by other documents at any
30       time.  It is inappropriate to use Internet-Drafts as reference
31       material or to cite them other than as "work in progress."

33       This Internet-Draft will expire on January 2, 2022.

35    Copyright Notice

37       Copyright (c) 2021 IETF Trust and the persons identified as the
38       document authors.  All rights reserved.

40       This document is subject to BCP 78 and the IETF Trust's Legal
41       Provisions Relating to IETF Documents
42       (https://trustee.ietf.org/license-info) in effect on the date of
43       publication of this document.  Please review these documents
44       carefully, as they describe your rights and restrictions with respect
45       to this document.  Code Components extracted from this document must
46       include Simplified BSD License text as described in Section 4.e of
47       the Trust Legal Provisions and are provided without warranty as
48       described in the Simplified BSD License.

50    Table of Contents

52       1.  Introduction
53       2.  Conventions Used in This Document
54       3.  Interpretation of Encoded Data
55       4.  The Base45 Encoding
56         4.1.  When to use Base45
57         4.2.  The alphabet used in Base45
58         4.3.  Encoding examples
59         4.4.  Decoding examples
60       5.  IANA Considerations
61       6.  Security Considerations
62       7.  Acknowledgements
63       8.  Normative References
64       Authors' Addresses

66    1.  Introduction

68       A QR-code is used to encode text as a graphical image.  Depending on
69       the characters used in the text, various encoding options for a
70       QR-code exist, e.g. Numeric, Alphanumeric and Byte mode.  Even in Byte
71       mode a typical QR-code reader tries to interpret a byte sequence as a
72       UTF-8 or ISO/IEC 8859-1 encoded text.  Thus QR-codes cannot be used
73       to encode arbitrary binary data directly.  Such data has to be
74       converted into an appropriate text before that text can be encoded
75       as a QR-code.  Compared to the already established Base64, Base32 and
76       Base16 encoding schemes, which are described in RFC 4648 [RFC4648],
77       the Base45 scheme described in this document offers a more compact
78       QR-code encoding.

80       Important differences between those schemes and Base45 are the key
81       table and that padding with '=' is not required.

83    2.  Conventions Used in This Document

85       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
86       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
87       document are to be interpreted as described in RFC 2119 [RFC2119].

89    3.  Interpretation of Encoded Data

91       Encoded data is to be interpreted as described in RFC 4648 [RFC4648]
92       with the exception that a different alphabet is selected.

94    4.  The Base45 Encoding

96       A 45-character subset of US-ASCII is used; the 45 characters usable
97       in a QR code in Alphanumeric mode.  Base45 encodes 2 bytes in 3
98       characters, compared to Base64, which encodes 3 bytes in 4
99       characters.

101      For encoding, two bytes [a, b] MUST be interpreted as a number n in
102      base 256, i.e. as an unsigned integer over 16 bits so that the number
103      n = (a*256) + b.

105      This number n is converted to base 45 [c, d, e] so that n = c +
106      (d*45) + (e*45*45).  Note the order of c, d and e which are chosen so
107      that the left-most [c] is the least significant.

109      The values c, d and e are then looked up in Table 1 to produce a
110      three character string.  The process is reversed when decoding.

112      For encoding a single byte [a], it MUST be interpreted as a base 256
113      number, i.e. as an unsigned integer over 8 bits.  That integer MUST
114      be converted to base 45 [c d] so that a = c + (45*d).  The values c
115      and d are then looked up in Table 1 to produce a two character
116      string.

118      A byte string [a b c d ... x y z] with arbitrary content and
119      arbitrary length MUST be encoded as follows: From left to right pairs
120      of bytes are encoded as described above.  If the number of bytes is
121      even, then the encoded form is a string with a length which is evenly
122      divisible by 3.  If the number of bytes is odd, then the last
123      (rightmost) byte is encoded as two characters as described above.

125      For decoding a Base45 encoded string the inverse operations are
126      performed.

128   4.1.  When to use Base45

130      If binary data is to be stored in a QR-Code, one possible way is to
131      use the Alphanumeric mode that uses 11 bits for 2 characters as
132      defined in section 7.3.4 in ISO/IEC 18004:2015 [ISO18004].  The ECI
133      mode indicator for this encoding is 0010.

135      If the data is to be sent via some other transport, a transport
136      encoding suitable for that transport should be used instead of
137      Base45.  It is not recommended to first encode data in Base45 and
138      then encode the resulting string in, for example, Base64 if the data
139      is to be sent via email.  Instead the Base45 encoding should be
140      removed, and the data itself should be encoded in Base64.

142   4.2.  The alphabet used in Base45

144      The Alphanumeric mode is defined to use 45 characters as specified in
145      this alphabet.

147                         Table 1: The Base45 Alphabet

149      Value Encoding  Value Encoding  Value Encoding  Value Encoding
150         00 0            12 C            24 O            36 Space
151         01 1            13 D            25 P            37 $
152         02 2            14 E            26 Q            38 %
153         03 3            15 F            27 R            39 *
154         04 4            16 G            28 S            40 +
155         05 5            17 H            29 T            41 -
156         06 6            18 I            30 U            42 .
157         07 7            19 J            31 V            43 /
158         08 8            20 K            32 W            44 :
159         09 9            21 L            33 X
160         10 A            22 M            34 Y
161         11 B            23 N            35 Z

163   4.3.  Encoding examples

165      It should be noted that although the examples are all text, Base45 is
166      an encoding for binary data where each octet can have any value
167      0-255.

169      Encoding example 1: The string "AB" is the byte sequence [65 66].
170      The 16 bit value is 65 * 256 + 66 = 16706.  16706 equals 11 + 45 * 11
171      + 45 * 45 * 8 so the sequence in base 45 is [11 11 8].  By looking up
172      these values in Table 1 we get the encoded string "BB8".

174      Encoding example 2: The string "Hello!!" as ASCII is the byte
175      sequence [72 101 108 108 111 33 33].  If we look at each 16 bit
176      value, it is [18533 27756 28449 33].  Note the 33 for the last byte.
177      Converting each value to base 45, we get [[38 6 9] [36 31 13]
178      [9 2 14] [33 0]] where the last byte is represented by two values.
179      By looking up these values in Table 1 we get the encoded string
180      "%69 VD92EX0".

182      Encoding example 3: The string "base-45" as ASCII is the byte
183      sequence [98 97 115 101 45 52 53].  If we look at each 16 bit value,
184      it is [25185 29541 11572 53].  Note the 53 for the last byte.
185      Converting each value to base 45, we get [[30 19 12] [21 26 14]
186      [7 32 5] [8 1]] where the last byte is represented by two values.  By
187      looking up these values in Table 1 we get "UJCLQE7W581".

189   4.4.  Decoding examples

191      Decoding example 1: The string "QED8WEX0" represents, when looked up
192      in Table 1, the values [26 14 13 8 32 14 33 0].  We arrange the
193      numbers in chunks of three, except for the last one which can be two,
194      and get [[26 14 13] [8 32 14] [33 0]].  In base 45 we get [26981
195      29798 33] where the bytes are [[105 101] [116 102] [33]].  If we look
196      at the ASCII values we get the string "ietf!".

198   5.  IANA Considerations

200      There are no considerations for IANA in this document.

202   6.  Security Considerations

204      When implementing encoding and decoding it is important to be very
205      careful so that buffer overflow or similar does not occur.  This of
206      course includes the calculations for modulo 45 and lookup in the
207      table of characters (Table 1).  A decoder must also be robust
208      regarding input, including proper handling of any octet value 0-255,
209      including the NUL character (ASCII 0).

211      It should be noted that Base64 and some other encodings pad the
212      string so that the encoding starts with an aligned number of
213      characters; Base45 specifically avoids padding.  Because of this,
214      special care has to be taken when an odd number of octets is to be
215      encoded, which results not in N*3 characters, but (N-1)*3+2
216      characters in the encoded string and similarly, at decoding, when the
217      number of encoded characters is not evenly divisible by 3.

219      Base encodings use a specific, reduced alphabet to encode binary
220      data.  Non-alphabet characters could exist within base-encoded data,
221      caused by data corruption or by design.  Non-alphabet characters may
222      be exploited as a "covert channel", where non-protocol data can be
223      sent for nefarious purposes.  Non-alphabet characters might also be
224      sent in order to exploit implementation errors leading to, e.g.,
225      buffer overflow attacks.

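         The concerns above (robust input handling, the missing padding,
         and non-alphabet characters) can be illustrated with a defensive
         decoder.  What follows is only a minimal Python sketch and is not
         part of the draft: the names ALPHABET, VALUES and b45decode are
         hypothetical, and rejecting a two-character group whose value
         exceeds 255 is an assumption made here by analogy with the 16-bit
         limit that this section places on triplets.

```python
# Table 1 written out as a string: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}


def b45decode(text: str) -> bytes:
    """Decode a Base45 string, raising ValueError on malformed input."""
    if len(text) % 3 == 1:
        # Groups are 3 characters (2 bytes) or a final 2 characters (1 byte),
        # so a single left-over character can never be valid.
        raise ValueError("length leaves a dangling single character")
    out = bytearray()
    for i in range(0, len(text), 3):
        chunk = text[i:i + 3]
        try:
            digits = [VALUES[ch] for ch in chunk]
        except KeyError as exc:
            raise ValueError("character outside the Base45 alphabet") from exc
        # The left-most character is the least significant base 45 digit.
        n = sum(d * 45 ** k for k, d in enumerate(digits))
        if len(chunk) == 3:
            if n > 0xFFFF:
                raise ValueError("triplet decodes to more than 65535")
            out += n.to_bytes(2, "big")
        else:
            # Assumption (not stated in the draft): a pair must fit in a byte.
            if n > 0xFF:
                raise ValueError("pair decodes to more than 255")
            out.append(n)
    return bytes(out)
```

         With this sketch, b45decode("QED8WEX0") returns b"ietf!" as in
         Section 4.4, while "GGW", a lowercase letter, or a string of
         length 1 raises ValueError instead of producing output.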
227      Implementations MUST reject the encoded data if it contains
228      characters outside the base alphabet (in Table 1) when interpreting
229      base-encoded data.

231      Even if a Base45 encoded string contains only characters from the
232      alphabet in Table 1, the following case has to be considered: The
233      string "FGW" represents 65535 (FFFF in base 16), which is a valid
234      encoding.  The string "GGW" would represent 65536 (10000 in base 16),
235      which is represented by more than 16 bits.

237      Implementations MUST reject the encoded data if it contains a triplet
238      of characters which, when decoded, results in an unsigned integer
239      which is greater than 65535 (FFFF in base 16).

241      It should be noted that the resulting string after encoding to Base45
242      might include non-URL-safe characters so if the URL including the
243      Base45 encoded data has to be URL safe, one has to use %-encoding.

245   7.  Acknowledgements

247      The authors thank Anders Ahl, Alan Barrett, Alfred Fiedler, Tomas
248      Harreveld, Erik Hellman, Joakim Jardenberg, Christian Landgren,
249      Anders Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl and Gaby
250      Whitehead for the feedback.  Thanks also go to everyone who has been
251      working with Base64 over a long period of years and has proven that
252      the implementations are stable.

254   8.  Normative References

256      [ISO18004]
257                 ISO/IEC JTC 1/SC 31, "ISO/IEC 18004:2015 Information
258                 technology - Automatic identification and data capture
259                 techniques - QR Code bar code symbology specification",
260                 ISO/IEC
261                 18004:2015 https://www.iso.org/standard/62021.html,
262                 February 2015.

264      [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
265                 Requirement Levels", BCP 14, RFC 2119,
266                 DOI 10.17487/RFC2119, March 1997,
267                 <https://www.rfc-editor.org/info/rfc2119>.

269      [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
270                 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
271                 <https://www.rfc-editor.org/info/rfc4648>.

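         For illustration only, the encoding procedure of Section 4 (pairs
         of bytes to three characters, a trailing single byte to two
         characters, least significant base 45 digit first) can be
         sketched in Python.  This is a hedged sketch rather than a
         reference implementation from the authors; the names ALPHABET and
         b45encode are hypothetical.

```python
# Table 1 written out as a string: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, a final 1 byte -> 2 chars."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        # Interpret the chunk as an unsigned big-endian integer (base 256).
        n = int.from_bytes(chunk, "big")
        ndigits = 3 if len(chunk) == 2 else 2
        for _ in range(ndigits):
            # divmod yields the least significant base 45 digit first,
            # which is also the left-most character of the group.
            n, digit = divmod(n, 45)
            out.append(ALPHABET[digit])
    return "".join(out)
```

         Applied to the inputs of Section 4.3, this sketch gives "BB8"
         for b"AB", "%69 VD92EX0" for b"Hello!!" and "UJCLQE7W581" for
         b"base-45".  No padding is emitted, so the output length alone
         tells a decoder whether the input had an odd number of bytes.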
273   Authors' Addresses

275      Patrik Faltstrom
276      Netnod

278      Email: paf@netnod.se

280      Fredrik Ljunggren
281      Kirei

283      Email: fredrik@kirei.se

284      Dirk-Willem van Gulik
285      Webweaving

287      Email: dirkx@webweaving.org