Network Working Group                                       P. Faltstrom
Internet-Draft                                                    Netnod
Intended status: Informational                              F. Ljunggren
Expires: December 18, 2022                                         Kirei
                                                            D. van Gulik
                                                              Webweaving
                                                           June 16, 2022


                        The Base45 Data Encoding
                       draft-faltstrom-base45-12

Abstract

   This document describes the Base45 encoding scheme, which is built
   upon the Base64, Base32 and Base16 encoding schemes.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 18, 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  Interpretation of Encoded Data
   4.  The Base45 Encoding
     4.1.  When to, and not to, use Base45
     4.2.  The alphabet used in Base45
     4.3.  Encoding examples
     4.4.  Decoding examples
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Normative References
   Authors' Addresses

1.  Introduction

   A QR code is used to encode text as a graphical image. Depending on
   the characters used in the text, various encoding options for a QR
   code exist, e.g., Numeric, Alphanumeric and Byte mode. Even in Byte
   mode, a typical QR code reader tries to interpret a byte sequence as
   text encoded in UTF-8 or ISO/IEC 8859-1. Thus, QR codes cannot be
   used to encode arbitrary binary data directly. Such data has to be
   converted into an appropriate text before that text can be encoded
   as a QR code. Compared to the already established Base64, Base32 and
   Base16 encoding schemes that are described in RFC 4648 [RFC4648],
   the Base45 scheme described in this document offers a more compact
   QR code encoding.

   One important difference between those schemes and Base45 is the key
   table (alphabet), and that padding with '=' is not required.

2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Interpretation of Encoded Data

   Encoded data is to be interpreted as described in RFC 4648 [RFC4648],
   with the exception that a different alphabet is selected.

4.  The Base45 Encoding

   QR codes have a limited ability to store binary data. In practice,
   binary data has to be encoded in characters according to one of the
   modes already defined in the standard for QR codes.
   The easiest mode to use is called Alphanumeric mode (see Section
   7.3.4 and Table 2 of ISO/IEC 18004:2015 [ISO18004]). Unfortunately,
   Alphanumeric mode uses 45 different characters, which implies that
   neither Base32 nor Base64 is a very effective encoding: Base64 needs
   characters, such as lowercase letters, that are not available in
   Alphanumeric mode, and Base32 uses only 32 of the 45 available
   characters.

   A 45-character subset of US-ASCII is used: the 45 characters usable
   in a QR code in Alphanumeric mode (see Section 7.3.4 and Table 2 of
   ISO/IEC 18004:2015 [ISO18004]). Base45 encodes 2 bytes in 3
   characters, compared to Base64, which encodes 3 bytes in 4
   characters.

   For encoding, two bytes [a, b] MUST be interpreted as a number n in
   base 256, i.e., as an unsigned integer over 16 bits so that the
   number n = (a*256) + b.

   This number n is converted to base 45 [c, d, e] so that n = c +
   (d*45) + (e*45*45). Note the order of c, d and e, which is chosen so
   that the leftmost [c] is the least significant.

   The values c, d and e are then looked up in Table 1 to produce a
   three-character string. The process is reversed when decoding.

   For encoding a single byte [a], it MUST be interpreted as a base 256
   number, i.e., as an unsigned integer over 8 bits. That integer MUST
   be converted to base 45 [c d] so that a = c + (45*d). The values c
   and d are then looked up in Table 1 to produce a two-character
   string.

   A byte string [a b c d ... x y z] with arbitrary content and
   arbitrary length MUST be encoded as follows: From left to right,
   pairs of bytes MUST be encoded as described above. If the number of
   bytes is even, then the encoded form is a string with a length that
   is evenly divisible by 3. If the number of bytes is odd, then the
   last (rightmost) byte MUST be encoded on two characters as described
   above.

   For decoding a Base45-encoded string, the inverse operations are
   performed.

4.1.  When to, and not to, use Base45

   If binary data is to be stored in a QR code, the suggested mechanism
   is to encode it in Base45 and store the result using the Alphanumeric
   mode, which uses 11 bits for 2 characters as defined in Section 7.3.4
   of ISO/IEC 18004:2015 [ISO18004]. The mode indicator for Alphanumeric
   mode is 0010.

   On the other hand, if the data is to be sent via some other
   transport, a transport encoding suitable for that transport should
   be used instead of Base45. For example, it is not recommended to
   first encode data in Base45 and then encode the resulting string in
   Base64 if the data is to be sent via email. Instead, the Base45
   encoding should be removed, and the data itself should be encoded in
   Base64.

4.2.  The alphabet used in Base45

   The Alphanumeric mode is defined to use the 45 characters specified
   in this alphabet.

                      Table 1: The Base45 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
   00    0         12    C         24    O         36    Space
   01    1         13    D         25    P         37    $
   02    2         14    E         26    Q         38    %
   03    3         15    F         27    R         39    *
   04    4         16    G         28    S         40    +
   05    5         17    H         29    T         41    -
   06    6         18    I         30    U         42    .
   07    7         19    J         31    V         43    /
   08    8         20    K         32    W         44    :
   09    9         21    L         33    X
   10    A         22    M         34    Y
   11    B         23    N         35    Z

4.3.  Encoding examples

   It should be noted that although the examples are all text, Base45
   is an encoding for binary data where each octet can have any value
   0-255.

   Encoding example 1: The string "AB" is the byte sequence [65 66].
   The 16-bit value is 65 * 256 + 66 = 16706. 16706 equals 11 + 45 * 11
   + 45 * 45 * 8, so the sequence in base 45 is [11 11 8]. By looking
   up these values in Table 1, we get the encoded string "BB8".

   Encoding example 2: The string "Hello!!" as ASCII is the byte
   sequence [72 101 108 108 111 33 33]. If we look at each 16-bit
   value, it is [18533 27756 28449 33]. Note the 33 for the last byte.
   When looking at the values in base 45, we get [[38 6 9] [36 31 13]
   [9 2 14] [33 0]], where the last byte is represented by two values.
   The resulting string "%69 VD92EX0" is created by looking up these
   values in Table 1. Note that it includes a space.

   Encoding example 3: The string "base-45" as ASCII is the byte
   sequence [98 97 115 101 45 52 53]. If we look at each 16-bit value,
   it is [25185 29541 11572 53]. Note the 53 for the last byte. When
   looking at the values in base 45, we get [[30 19 12] [21 26 14]
   [7 32 5] [8 1]], where the last byte is represented by two values.
   By looking up these values in Table 1, we get the encoded string
   "UJCLQE7W581".

4.4.  Decoding examples

   Decoding example 1: The string "QED8WEX0" represents, when looked up
   in Table 1, the values [26 14 13 8 32 14 33 0]. We arrange the
   numbers in chunks of three, except for the last one, which can be
   two, and get [[26 14 13] [8 32 14] [33 0]]. Converting these from
   base 45, we get [26981 29798 33], where the bytes are [[105 101]
   [116 102] [33]]. If we look at the ASCII values, we get the string
   "ietf!".

5.  IANA Considerations

   There are no considerations for IANA in this document.

6.  Security Considerations

   When implementing encoding and decoding, it is important to be very
   careful to avoid buffer overflows and similar errors. This includes
   the calculations in base 45 and the lookups in the table of
   characters (Table 1). A decoder must also be robust regarding input,
   including proper handling of any octet value 0-255, including the
   NUL character (ASCII 0).

   It should be noted that Base64 and some other encodings pad the
   encoded string so that its length is always a multiple of the group
   size (for example, 4 characters for Base64), while Base45
   specifically avoids padding. Because of this, special care has to be
   taken when an odd number of octets is to be encoded.
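
   As an illustration only, the following Python sketch shows one way
   to handle an odd number of octets without padding: the encoder
   writes a trailing single octet as two characters, and the decoder
   accepts a final two-character group. The decoder also rejects
   characters outside Table 1, lengths that leave a remainder of 1 when
   divided by 3, and triplets that decode to more than 65535. The names
   b45encode, b45decode, ALPHABET and DECODE_MAP are hypothetical and
   not defined by this document. Rejecting a final two-character group
   that decodes to more than 255 is an assumption, derived from the
   requirement in this section to reject any input that is not a valid
   encoding. This is a sketch, not a reference implementation.

```python
# Illustrative sketch only: b45encode, b45decode, ALPHABET and
# DECODE_MAP are hypothetical names, not defined by this document.

# Table 1, in value order 0..44.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
DECODE_MAP = {ch: value for value, ch in enumerate(ALPHABET)}


def b45encode(data: bytes) -> str:
    """Encode arbitrary octets as Base45, without any padding."""
    out = []
    # Each pair [a, b] becomes n = a*256 + b, written as [c, d, e]
    # with n = c + d*45 + e*45*45 (least significant value first).
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        e, rest = divmod(n, 45 * 45)
        d, c = divmod(rest, 45)
        out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
    # An odd trailing octet a becomes two characters [c, d]
    # with a = c + d*45, instead of being padded.
    if len(data) % 2 == 1:
        d, c = divmod(data[-1], 45)
        out.append(ALPHABET[c] + ALPHABET[d])
    return "".join(out)


def b45decode(text: str) -> bytes:
    """Decode Base45, raising ValueError for any invalid encoding."""
    # Encoding only produces lengths 3k or 3k + 2, never 3k + 1.
    if len(text) % 3 == 1:
        raise ValueError("invalid Base45 length")
    try:
        values = [DECODE_MAP[ch] for ch in text]
    except KeyError as exc:
        raise ValueError(f"not a Base45 character: {exc.args[0]!r}") from None
    out = bytearray()
    for i in range(0, len(values), 3):
        group = values[i:i + 3]
        if len(group) == 3:
            c, d, e = group
            n = c + d * 45 + e * 45 * 45
            if n > 0xFFFF:  # e.g. "GGW" decodes to 65536
                raise ValueError("triplet exceeds 16 bits")
            out += n.to_bytes(2, "big")
        else:
            # Final two-character group encodes a single octet.
            c, d = group
            n = c + d * 45
            if n > 0xFF:  # assumption: treated as an invalid encoding
                raise ValueError("final pair exceeds 8 bits")
            out.append(n)
    return bytes(out)
```

   For example, b45encode(b"Hello!!") returns "%69 VD92EX0", as in
   Encoding example 2; b45decode("QED8WEX0") returns b"ietf!", as in
   the decoding example; and b45decode("GGW") raises ValueError.
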
   Similarly, care must be taken if the number of characters to decode
   is not evenly divisible by 3; a valid encoding always has a length
   that leaves a remainder of 0 or 2 when divided by 3.

   Base encodings use a specific, reduced alphabet to encode binary
   data. Non-alphabet characters could exist within base-encoded data,
   caused by data corruption or by design. Non-alphabet characters may
   be exploited as a "covert channel", where non-protocol data can be
   sent for nefarious purposes. Non-alphabet characters might also be
   sent in order to exploit implementation errors leading to, e.g.,
   buffer overflow attacks.

   Implementations MUST reject any input that is not a valid encoding.
   For example, they MUST reject the input (encoded data) if it
   contains characters outside the base alphabet (in Table 1) when
   interpreting base-encoded data.

   Even though a Base45-encoded string contains only characters from
   the alphabet in Table 1, cases like the following have to be
   considered: The string "FGW" represents 65535 (FFFF in base 16),
   which is a valid encoding of 16 bits. A slightly different encoded
   string of the same length, "GGW", would represent 65536 (10000 in
   base 16), which needs more than 16 bits. Implementations MUST also
   reject the encoded data if it contains a triplet of characters that,
   when decoded, results in an unsigned integer greater than 65535
   (FFFF in base 16).

   It should be noted that the resulting string after encoding to
   Base45 might include non-URL-safe characters, so if the URL
   including the Base45-encoded data has to be URL safe, one has to use
   %-encoding.

7.  Acknowledgements

   The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam Spens
   Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman, Joakim
   Jardenberg, Michael Joost, Erik Kline, Christian Landgren, Anders
   Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl and Gaby
   Whitehead for the feedback.
   The authors also thank everyone who has worked with Base64 over many
   years and has proven that the implementations are stable.

8.  Normative References

   [ISO18004] ISO/IEC JTC 1/SC 31, "ISO/IEC 18004:2015 Information
              technology - Automatic identification and data capture
              techniques - QR Code bar code symbology specification",
              ISO/IEC 18004:2015, February 2015,
              <https://www.iso.org/standard/62021.html>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

Authors' Addresses

   Patrik Faltstrom
   Netnod

   Email: paf@netnod.se


   Fredrik Ljunggren
   Kirei

   Email: fredrik@kirei.se


   Dirk-Willem van Gulik
   Webweaving

   Email: dirkx@webweaving.org