idnits 2.17.1

draft-ietf-idn-lace-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == There are 12 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 860 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 8 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 373 has weird spacing: '... bits char...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 580

  -- Possible downref: Normative reference to a draft: ref. 'IDNComp'

  -- No information found for draft-ietf-idn-requirement - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNReq'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Downref: Normative reference to an Informational RFC: RFC 2781

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode3'

     Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Draft                                   Mark Davis
2    draft-ietf-idn-lace-01.txt                       IBM
3    January 5, 2001                                  Paul Hoffman
4    Expires July 5, 2001                             IMC & VPNC

6    LACE: Length-based ASCII Compatible Encoding for IDN

8    Status of this memo

10   This document is an Internet-Draft and is in full conformance with all
11   provisions of Section 10 of RFC2026.

13   Internet-Drafts are working documents of the Internet Engineering Task
14   Force (IETF), its areas, and its working groups. Note that other
15   groups may also distribute working documents as Internet-Drafts.

17   Internet-Drafts are draft documents valid for a maximum of six months
18   and may be updated, replaced, or obsoleted by other documents at any
19   time. It is inappropriate to use Internet-Drafts as reference
20   material or to cite them other than as "work in progress."

22   The list of current Internet-Drafts can be accessed at
23   http://www.ietf.org/ietf/1id-abstracts.txt

25   The list of Internet-Draft Shadow Directories can be accessed at
26   http://www.ietf.org/shadow.html.

28   Abstract

30   This document describes a transformation method for representing
31   non-ASCII characters in host name parts in a fashion that is completely
32   compatible with the current DNS. It is a potential candidate for an
33   ASCII-Compatible Encoding (ACE) for internationalized host names, as
34   described in the comparison document from the IETF IDN Working Group.
35   This method is based on the observation that many internationalized host
36   name parts will have a few substrings from a small number of rows of the
37   ISO 10646 repertoire. Run-length encoding for these types of
38   host names will be fairly compact, and is fairly easy to describe.

40   1. Introduction

42   There is a strong world-wide desire to use characters other than plain
43   ASCII in host names. Host names have become the equivalent of business
44   or product names for many services on the Internet, so there is a need
45   to make them usable by people whose native scripts are not representable
46   by ASCII. The requirements for internationalizing host names are
47   described in the IDN WG's requirements document, [IDNReq].

49   The IDN WG's comparison document [IDNComp] describes three potential
50   main architectures for IDN: arch-1 (just send binary), arch-2 (send
51   binary or ACE), and arch-3 (just send ACE). This document describes an
52   ACE, called Length-based ACE or LACE, that can be used with protocols
53   that match arch-2 or arch-3. LACE specifies an ACE format as specified
54   in ace-1 in [IDNComp]. Further, it specifies an identifying mechanism
55   for ace-2 in [IDNComp], namely ace-2.1.1 (add hopefully-unique legal tag
56   to the beginning of the name part).

58   In formal terms, LACE describes a character encoding scheme of the
59   ISO/IEC 10646 [ISO10646] coded character set (whose assignment of
60   characters is synchronized with Unicode [Unicode3]) and the rules for
61   using that scheme in the DNS. As such, it could also be called a
62   "charset" as defined in [IDNReq]. It can also be viewed as a specialized
63   UTF (transformation format), designed to work within the restrictions of
64   the DNS.

66   The LACE protocol has the following features:

68   - There is exactly one way to convert internationalized host parts to
69   and from LACE parts. Host name part uniqueness is preserved.

71   - Host parts that have no international characters are not changed.

73   - Names using LACE can include more internationalized characters than
74   with other ACE protocols that have been suggested to date. LACE-encoded
75   names are variable length, depending on the number of transitions
76   between rows in the ISO 10646 repertoire that appear in the name part.
77   Name parts that cannot be compressed using run-length encoding can have
78   up to 17 characters, and names that can be compressed can have up to 35
79   characters. Further, a name that has just a few row transitions
80   typically can have over 30 characters.

82   It is important to note that the following sections contain many
83   normative statements with "MUST" and "MUST NOT". Any implementation that
84   does not follow these statements exactly is likely to cause damage to
85   the Internet by creating non-unique representations of host names.

87   1.1 Terminology

89   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
90   "MAY" in this document are to be interpreted as described in RFC 2119
91   [RFC2119].

93   Hexadecimal values are shown preceded with an "0x". For example,
94   "0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
95   shown preceded with an "0b". For example, a nine-bit value might be
96   shown as "0b101101111".

98   Examples in this document use the notation for code points and names
99   from the Unicode Standard [Unicode3] and ISO 10646. For example, the
100  letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
101  A".

103  LACE converts strings with internationalized characters into
104  strings of US-ASCII that are acceptable as host name parts in current
105  DNS host naming usage. The former are called "pre-converted" and the
106  latter are called "post-converted".

108  1.2 IDN summary

110  Using the terminology in [IDNComp], LACE specifies an ACE format as
111  specified in ace-1. Further, it specifies an identifying mechanism for
112  ace-2, namely ace-2.1.1 (add hopefully-unique legal tag to the beginning
113  of the name part).

115  LACE has the following length characteristics.

117  - LACE-encoded names are variable length, depending on the number of
118  transitions between rows that appear in the name part.

120  - Name parts that cannot be compressed using run-length encoding can
121  have up to 17 characters.

123  - Names that can be compressed can have up to 35 characters.

125  - A name that has just a few row transitions typically can have over 30
126  characters.

128  2. Host Part Transformation

130  According to [STD13], host parts must be case-insensitive, start and
131  end with a letter or digit, and contain only letters, digits, and the
132  hyphen character ("-"). This, of course, excludes any internationalized
133  characters, as well as many other characters in the ASCII character
134  repertoire. Further, domain name parts must be 63 octets or shorter in
135  length.

137  2.1 Name tagging

139  All post-converted name parts that contain internationalized characters
140  begin with the string "lq--". (Of course, because host name parts are
141  case-insensitive, this might also be represented as "Lq--" or "lQ--" or
142  "LQ--".) The string "lq--" was chosen because it is extremely unlikely
143  to exist in host parts before this specification was produced. As a
144  historical note, in late October 2000, none of the second-level host
145  name parts in any of the .com, .edu, .net, and .org top-level domains
146  began with "lq--"; there are many tens of thousands of other strings of
147  three characters followed by a hyphen that have this property and could
148  be used instead. The string "lq--" will change to other strings with the
149  same properties in future versions of this draft.

151  Note that a zone administrator might still choose to use "lq--" at the
152  beginning of a host name part even if that part does not contain
153  internationalized characters. Zone administrators SHOULD NOT create host
154  part names that begin with "lq--" unless those names are post-converted
155  names. Creating host part names that begin with "lq--" but that are not
156  post-converted names may cause two distinct problems. Some display
157  systems, after converting the post-converted name part back to an
158  internationalized name part, might display the name parts in a
159  possibly-confusing fashion to users. More seriously, some resolvers,
160  after converting the post-converted name part back to an
161  internationalized name part, might reject the host name if it contains
162  illegal characters.

164  2.2 Converting an internationalized name to an ACE name part

166  To convert a string of internationalized characters into an ACE name
167  part, the following steps MUST be performed in the exact order of the
168  subsections given here.

170  If a name part consists exclusively of characters that conform to the
171  host name requirements in [STD13], the name MUST NOT be converted to
172  LACE. That is, a name part that can be represented without LACE MUST NOT
173  be encoded using LACE. This absolute requirement prevents a single DNS
174  host name from having two different encodings.

176  If any checking of name parts (such as checking for prohibited
177  characters, case-folding, or canonicalization) is to be done,
178  it MUST be done before doing the conversion to an ACE name part.

180  Characters outside the first plane of characters (those with codepoints
181  above U+FFFF) MUST be represented using surrogates, as described in
182  RFC 2781 [RFC2781].

184  The input name string consists of characters from the ISO 10646
185  character set in big-endian UTF-16 encoding. This is the pre-converted
186  string.

188  2.2.1 Check the input string for disallowed names

190  If the input string consists only of characters that conform to the host
191  name requirements in [STD13], the conversion MUST stop with an error.

193  2.2.2 Compress the pre-converted string

195  The entire pre-converted string MUST be compressed using the compression
196  algorithm specified in section 2.4. The result of this step is the
197  compressed string.

199  2.2.3 Check the length of the compressed string

201  The compressed string MUST be 36 octets or shorter. If the compressed
202  string is 37 octets or longer, the conversion MUST stop with an error.

204  2.2.4 Encode the compressed string with Base32

206  The compressed string MUST be converted using the Base32 encoding
207  described in section 2.5. The result of this step is the encoded string.

209  2.2.5 Prepend "lq--" to the encoded string and finish

211  Prepend the characters "lq--" to the encoded string. This is the host
212  name part that can be used in DNS resolution.
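
The steps in sections 2.2.1 through 2.2.5 can be sketched in JavaScript
as follows. This is an illustrative sketch under stated assumptions, not
part of the specification: the names isStd13Label, toUtf16be, and
encodeLabel are hypothetical, and the compression (section 2.4) and
Base32 (section 2.5) steps are passed in as functions rather than
implemented here.

```javascript
// Hypothetical helper (not from the draft): true if the label already
// satisfies the host name rules summarized in section 2, that is, only
// letters, digits, and hyphen, starting and ending with a letter or
// digit, and at most 63 octets long.
function isStd13Label(label) {
  return label.length <= 63 &&
         /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i.test(label);
}

// JavaScript strings are sequences of UTF-16 code units, so characters
// above U+FFFF already appear as surrogate pairs here.
function toUtf16be(label) {
  const octets = [];
  for (let i = 0; i < label.length; i++) {
    const unit = label.charCodeAt(i);
    octets.push(unit >>> 8, unit & 0xFF);   // high octet, then low octet
  }
  return octets;
}

// compress and base32Encode are assumed to implement sections 2.4.1
// and 2.5.1; they are parameters so this sketch stays self-contained.
function encodeLabel(label, compress, base32Encode) {
  if (isStd13Label(label)) {                        // step 2.2.1
    throw new Error("name part needs no LACE encoding");
  }
  const compressed = compress(toUtf16be(label));    // step 2.2.2
  if (compressed.length > 36) {                     // step 2.2.3
    throw new Error("compressed string longer than 36 octets");
  }
  return "lq--" + base32Encode(compressed);         // steps 2.2.4, 2.2.5
}
```

Passing the two algorithms in as arguments keeps the ordering of the
checks visible without repeating the compression and Base32 logic.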

214  2.3 Converting a host name part to an internationalized name

216  The input string for conversion is a valid host name part. Note that if
217  any checking of name parts (such as checking for prohibited characters,
218  case-folding, or canonicalization) is to be done, it MUST be done after
219  doing the conversion from an ACE name part.

221  If a decoded name part consists exclusively of characters that conform
222  to the host name requirements in [STD13], the conversion from LACE MUST
223  fail. Because a name part that can be represented without LACE MUST NOT
224  be encoded using LACE, the decoding process MUST check for name parts
225  that consist exclusively of characters that conform to the host name
226  requirements in [STD13]; if such a name part is found, it MUST
227  be considered an error (and possibly a security violation).

229  2.3.1 Strip the "lq--"

231  The input string MUST begin with the characters "lq--". If it does not,
232  the conversion MUST stop with an error. Otherwise, remove the characters
233  "lq--" from the input string. The result of this step is the stripped
234  string.

236  2.3.2 Decode the stripped string with Base32

238  The entire stripped string MUST be checked to see if it is valid Base32
239  output. The entire stripped string MUST be changed to all lower-case
240  letters and digits. If any resulting characters are not in Table 1, the
241  conversion MUST stop with an error; the input string is the
242  post-converted string. Otherwise, the entire resulting string MUST be
243  converted to a binary format using the Base32 decoding described in
244  section 2.5. The result of this step is the decoded string.

246  2.3.3 Decompress the decoded string

248  The entire decoded string MUST be converted to ISO 10646 characters
249  using the decompression algorithm described in section 2.4. The result
250  of this is the internationalized string.

252  2.3.4 Check the internationalized string for disallowed names

254  If the internationalized string consists only of characters that conform
255  to the host name requirements in [STD13], the conversion MUST stop with
256  an error.

258  2.4 Compression algorithm

260  The basic method for compression is to reduce a substring that consists
261  of characters all from a single row of the ISO 10646 repertoire to a
262  count octet followed by the row header followed by the lower octets of
263  the characters. If this ends up being longer than the input, the string
264  is not compressed, but instead has a unique one-octet header attached.

266  Although the uncompressed mode limits the number of characters in a LACE
267  name part to 17, this is still generally enough for all names in almost
268  all scripts. Also, this limit is close to the limits set by other
269  encoding proposals.

271  Note that the compression and decompression rules MUST be followed
272  exactly. This requirement prevents a single host name part from having
273  two encodings. Thus, for any input to the algorithm, there is only one
274  possible output. An implementation cannot choose to use one-octet mode
275  or two-octet mode using anything other than the logic given in this
276  section.

278  2.4.1 Compressing a string

280  The input string is in the UTF-16 encoding (big-endian UTF-16 with no
281  byte order mark).

283  Design note: No checking is done on the input to this algorithm. It is
284  assumed that all checking for valid ISO/IEC 10646 characters has already
285  been done by a previous step in the conversion process.

287  1) If the length (measured in octets) of the input is not even, or is
288  less than 2, stop with an error.

290  2) Set the input pointer, called IP, to the first octet of the input
291  string.

293  3) Set the variable called HIGH to the octet at IP.

295  4) Determine the number of contiguous pairs at or after IP that have
296  HIGH as the first octet; call this COUNT.

298  5) Put into an output buffer the single octet for COUNT followed by the
299  single octet for HIGH, followed by all those low octets. Move IP to the
300  end of those pairs; that is, set IP to IP+(2*COUNT).

302  6) If IP is not at the end of the input string, go to step 3.

304  7) If the length of the output buffer is less than or equal to the
305  length of the input buffer (in octets, not in characters), emit the
306  output buffer. Otherwise, output the octet 0xFF followed by the input
307  buffer. Note that there can only be one possible representation for a
308  name part, so that outputting the wrong name part is a serious security
309  error. Decompression schemes MUST accept only the valid form and MUST
310  NOT accept invalid forms.

312  2.4.2 Decompressing a string

314  1. Set the input pointer, called IP, to the first octet of the input
315  string. If there is no first octet, stop with an error.

317  2. If the octet at IP is 0xFF, set IP to IP+1, copy the rest of the
318  input buffer to the output buffer, and go to step 9.

320  3. Get the octet at IP, call it COUNT. If COUNT equals zero or is
321  greater than 36, stop with an error. Set IP to IP+1. If IP is now at the
322  end of the input string, stop with an error.

324  4. Get the octet at IP, call it HIGH. Set IP to IP+1.

326  5. If IP is now at the end of the input string, stop with an error. Get
327  the octet at IP, call it LOW. Set IP to IP+1.

329  6. Output HIGH, then LOW, to the output buffer.

331  7. Decrement COUNT. If COUNT is greater than 0, go to step 5.

333  8. If IP is not at the end of the input buffer, go to step 3.

335  9. If the length of the output buffer is odd, stop with an error.
336  Compress the output buffer into a separate comparison buffer following
337  the steps for compression above. If the contents of the comparison
338  buffer do not equal the input to the decompression step, stop with an
339  error. Otherwise, send out the output buffer and stop.
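
The compression and decompression steps above can be sketched in
JavaScript as follows. This is a minimal sketch, not a reference
implementation: the function names laceCompress and laceDecompress are
hypothetical, octet strings are modelled as plain arrays of numbers in
the range 0 to 255, and errors are reported by throwing.

```javascript
// Section 2.4.1: input is an array of octets in big-endian UTF-16.
function laceCompress(input) {
  if (input.length < 2 || input.length % 2 !== 0) {          // step 1
    throw new Error("input must be a non-empty, even number of octets");
  }
  const out = [];
  let ip = 0;                                                // step 2
  while (ip < input.length) {                                // step 6
    const high = input[ip];                                  // step 3
    let count = 0;                                           // step 4
    while (ip + 2 * count < input.length &&
           input[ip + 2 * count] === high) {
      count++;
    }
    out.push(count, high);                                   // step 5
    for (let i = 0; i < count; i++) out.push(input[ip + 2 * i + 1]);
    ip += 2 * count;
  }
  // Step 7: fall back to the 0xFF form only if compression made it longer.
  return out.length <= input.length ? out : [0xFF, ...input];
}

// Section 2.4.2: input is an array of octets in LACE compressed form.
function laceDecompress(input) {
  if (input.length === 0) throw new Error("empty input");    // step 1
  let out = [];
  if (input[0] === 0xFF) {                                   // step 2
    out = input.slice(1);
  } else {
    let ip = 0;
    while (ip < input.length) {                              // step 8
      let count = input[ip++];                               // step 3
      if (count === 0 || count > 36) throw new Error("bad COUNT");
      if (ip >= input.length) throw new Error("truncated before HIGH");
      const high = input[ip++];                              // step 4
      for (; count > 0; count--) {                           // steps 5-7
        if (ip >= input.length) throw new Error("truncated run");
        out.push(high, input[ip++]);
      }
    }
  }
  // Step 9: recompress and compare, so only the one valid form is accepted.
  if (out.length % 2 !== 0) throw new Error("odd-length output");
  const check = laceCompress(out);
  if (check.length !== input.length ||
      check.some((octet, i) => octet !== input[i])) {
    throw new Error("input is not in the unique compressed form");
  }
  return out;
}
```

The recompress-and-compare check in step 9 is what rejects inputs such
as a 0xFF-prefixed string that should have been run-length encoded.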

341  2.4.3 Compression examples

343  The five input characters <U+30E6 U+30CB U+30B3 U+30FC U+30C9> are
344  represented in big-endian UTF-16 as the ten octets <30 E6 30 CB 30 B3 30
345  FC 30 C9>. All the code units are in the same row (30). The output
346  buffer has seven octets <05 30 E6 CB B3 FC C9>, which is shorter than
347  the input string. Thus the output is <05 30 E6 CB B3 FC C9>.

349  The four input characters <U+012F U+0111 U+0149 U+00E5> are represented
350  in big-endian UTF-16 as the eight octets <01 2F 01 11 01 49 00 E5>. The
351  output buffer has eight octets <03 01 2F 11 49 01 00 E5>, which is the
352  same length as the input string. Thus, the output is <03 01 2F 11 49 01
353  00 E5>.

355  The three input characters <U+012F U+00E0 U+014B> are represented in
356  big-endian UTF-16 as the six octets <01 2F 00 E0 01 4B>. The output
357  buffer is nine octets <01 01 2F 01 00 E0 01 01 4B>, which is longer than
358  the input buffer. Thus, the output is <FF 01 2F 00 E0 01 4B>.

360  2.5 Base32

362  In order to encode non-ASCII characters in DNS-compatible host name parts,
363  they must be converted into legal characters. This is done with Base32
364  encoding, described here.

366  Table 1 shows the mapping between input bits and output characters in
367  Base32. Design note: the digits used in Base32 are "2" through "7"
368  instead of "0" through "5" in order to avoid digits "0" and "1". This
369  helps reduce errors for users who are entering a Base32 stream and may
370  misinterpret a "0" for an "O" or a "1" for an "l".
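
The mapping in Table 1 and the encoding and decoding procedures of
sections 2.5.1 and 2.5.2 can be sketched in JavaScript as follows. This
is an illustrative sketch only: the names ALPHABET, base32Encode, and
base32Decode are hypothetical, octets are modelled as an array of
numbers, and the decoder lower-cases its input on the assumption that
the case folding of section 2.3.2 has not yet been applied.

```javascript
// Table 1 as a string: index 0..25 maps to a..z, 26..31 maps to 2..7.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Section 2.5.1: treat the octets as a bit stream and emit one
// character per five bits, zero-padding the final group on the right.
function base32Encode(octets) {
  let bits = 0, bitCount = 0, out = "";
  for (const octet of octets) {
    bits = (bits << 8) | octet;
    bitCount += 8;
    while (bitCount >= 5) {
      bitCount -= 5;
      out += ALPHABET[(bits >> bitCount) & 0x1F];
    }
    bits &= (1 << bitCount) - 1;        // keep only the unconsumed bits
  }
  if (bitCount > 0) {
    out += ALPHABET[(bits << (5 - bitCount)) & 0x1F];
  }
  return out;
}

// Section 2.5.2: rebuild octets from five-bit groups and insist that
// any leftover padding bits are zero.
function base32Decode(text) {
  const inputCheck = text.length % 8;
  if (inputCheck === 1 || inputCheck === 3 || inputCheck === 6) {
    throw new Error("impossible Base32 length");
  }
  let bits = 0, bitCount = 0;
  const out = [];
  for (const ch of text.toLowerCase()) {
    const val = ALPHABET.indexOf(ch);
    if (val < 0) throw new Error("character not in Table 1: " + ch);
    bits = (bits << 5) | val;
    bitCount += 5;
    if (bitCount >= 8) {
      bitCount -= 8;
      out.push((bits >> bitCount) & 0xFF);
      bits &= (1 << bitCount) - 1;
    }
  }
  if (bits !== 0) throw new Error("non-zero padding bits");
  return out;
}

console.log(base32Encode([0x3a, 0x27, 0x0f, 0x93]));   // prints "hitq7ey"
```

Because the input is always a whole number of octets, no explicit
padding character is needed; the length check and the zero-padding
check together reject strings that no encoder could have produced.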

372  Table 1: Base32 conversion
373  bits   char  hex       bits   char  hex
374  00000  a     0x61      10000  q     0x71
375  00001  b     0x62      10001  r     0x72
376  00010  c     0x63      10010  s     0x73
377  00011  d     0x64      10011  t     0x74
378  00100  e     0x65      10100  u     0x75
379  00101  f     0x66      10101  v     0x76
380  00110  g     0x67      10110  w     0x77
381  00111  h     0x68      10111  x     0x78
382  01000  i     0x69      11000  y     0x79
383  01001  j     0x6a      11001  z     0x7a
384  01010  k     0x6b      11010  2     0x32
385  01011  l     0x6c      11011  3     0x33
386  01100  m     0x6d      11100  4     0x34
387  01101  n     0x6e      11101  5     0x35
388  01110  o     0x6f      11110  6     0x36
389  01111  p     0x70      11111  7     0x37

391  2.5.1 Encoding octets as Base32

393  The input is a stream of octets. However, the octets are then treated
394  as a stream of bits.

396  Design note: The assumption that the input is a stream of octets
397  (instead of a stream of bits) was made so that no padding was needed.
398  If you are reusing this algorithm for a stream of bits, you must add a
399  padding mechanism in order to differentiate different lengths of input.

401  1) Set the read pointer to the beginning of the input bit stream.

403  2) Look at the five bits after the read pointer. If there are not five
404  bits, go to step 5.

406  3) Look up the value of the set of five bits in the bits column of
407  Table 1, and output the character from the char column (whose hex value
408  is in the hex column).

410  4) Move the read pointer five bits forward. If the read pointer is at
411  the end of the input bit stream (that is, there are no more bits in the
412  input), stop. Otherwise, go to step 2.

414  5) Pad the bits seen with zero bits on the right until there are five
     bits.

416  6) Look up the value of the set of five bits in the bits column of
417  Table 1, and output the character from the char column (whose hex value
418  is in the hex column).

420  2.5.2 Decoding Base32 as octets

422  The input is octets in network byte order. The input octets MUST be
423  values from the second column in Table 1.

425  1) Count the number of octets in the input and divide it by 8; call the
426  remainder INPUTCHECK. If INPUTCHECK is 1 or 3 or 6, stop with an error.

428  2) Set the read pointer to the beginning of the input octet stream.

430  3) Look up the character value of the octet in the char column (or hex
431  value in hex column) of Table 1, and add the five bits from the bits
432  column to the output buffer.

434  4) Move the read pointer one octet forward. If the read pointer is not
435  at the end of the input octet stream (that is, there are more octets in
436  the input), go to step 3.

438  5) Count the number of bits that are in the output buffer and divide it
439  by 8; call the remainder PADDING. If the PADDING number of bits at the
440  end of the output buffer are not all zero, stop with an error.
441  Otherwise, emit the output buffer and stop.

443  2.5.3 Base32 example

445  Assume you want to encode the value 0x3a270f93. The bit string is:

447  3   a    2   7    0   f    9   3
448  00111010 00100111 00001111 10010011

450  Broken into chunks of five bits, this is:

452  00111 01000 10011 10000 11111 00100 11

454  Padding is added to make the last chunk five bits:

456  00111 01000 10011 10000 11111 00100 11000

458  The output of encoding is:

460  00111 01000 10011 10000 11111 00100 11000
461    h     i     t     q     7     e     y
462  or "hitq7ey".

464  3. Security Considerations

466  Much of the security of the Internet relies on the DNS. Thus, any
467  change to the characteristics of the DNS can change the security of
468  much of the Internet. For this reason, LACE makes no changes to the DNS
469  itself.

471  Host names are used by users to connect to Internet servers. The
472  security of the Internet would be compromised if a user entering a
473  single internationalized name could be connected to different servers
474  based on different interpretations of the internationalized host
475  name.

477  LACE is designed so that every internationalized host name part
478  can be represented as one and only one DNS-compatible string. If there
479  is any way to follow the steps in this document and get two or more
480  different results, it is a severe and fatal error in the protocol.

482  4. References

484  [IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name Proposals",
485  draft-ietf-idn-compare.

487  [IDNReq] James Seng, "Requirements of Internationalized Domain Names",
488  draft-ietf-idn-requirement.

490  [ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
491  technology -- Universal Multiple-Octet Coded Character Set (UCS) --
492  Part 1: Architecture and Basic Multilingual Plane. Five amendments and
493  a technical corrigendum have been published up to now. UTF-16 is
494  described in Annex Q, published as Amendment 1. 17 other amendments are
495  currently at various stages of standardization. [[[ THIS REFERENCE
496  NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]

498  [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
499  Requirement Levels", March 1997, RFC 2119.

501  [RFC2781] Paul Hoffman and Francois Yergeau, "UTF-16, an encoding of ISO
502  10646", February 2000, RFC 2781.

504  [STD13] Paul Mockapetris, "Domain names - implementation and
505  specification", November 1987, STD 13 (RFC 1035).

507  [Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
508  3.0", ISBN 0-201-61633-5. Described at
509  .

511  A. Acknowledgements

513  Rick Wesson pointed out some error conditions that need to be
514  tested for. Scott Hollenbeck pointed out some errors in the
515  compression.

517  Base32 is quite obviously inspired by the tried-and-true Base64
518  Content-Transfer-Encoding from MIME.

520  B. Sample code

522  The following is sample Javascript code for the LACE algorithm.
523  This code is believed to be correct, but there may be errors in
524  it. The code is provided as-is and comes with no warranty of
525  fitness, correctness, blah blah blah.

527  /**
528   * Converts to LACE compression format (without Base32) from
529   * UTF-16BE array
530   * @parameter iArray Array of bytes in UTF16-BE
531   * @parameter iCount Number of elements. Must be even and in 2..62
532   * @parameter oArray Array for output of LACE bytes.
533   * Must be at least 100 octets long to provide internal working space
534   * @return Length of output array used
535   * @parameter parseResult output error value if any
536   * @author Mark Davis
537   */

539  function toLACE(iArray, iCount, oArray, parseResult) {
540    //debugger;
541    if (iCount < 1 || iCount > 62) {
542      parseResult.set("Lace: count out of range", iCount);
543      return;
544    }
545    if ((iCount % 2) == 1) {
546      parseResult.set("Lace: odd length, can't be UTF-16", iCount);
547      return;
548    }
549    var op = 0; // output index
550    var ip = 0; // input index
551    var lastHigh = -1;
552    var lenp = 0;
553    while (ip < iCount) {
554      var high = iArray[ip++];
555      if (high != lastHigh) {
556        if (lastHigh != -1) { // store last length
557          var len = op - lenp - 2;
558          oArray[lenp] = len;
559        }
560        lenp = op++; // reserve space
561        oArray[op++] = high;
562        lastHigh = high;
563      }
564      oArray[op++] = iArray[ip++];
565    }

567    // store last len

569    var len = op - lenp - 2;
570    oArray[lenp] = len;

572    // see if the input is short, and we should
573    // just copy

575    if (op > iCount) {
576      if (op > 63) {
577        parseResult.set("Lace: output too long", op);
578        return;
579      }
580      oArray[0] = 0xFF;
581      copyTo(iArray, 0, iCount, oArray, 1);
582      op = iCount + 1;
583    }
584    return op;
585  }

587  /**
588   * Converts from LACE compressed format (without Base32) to
589   * UTF-16BE array
590   * @parameter iArray Array of bytes in LACE format
591   * @parameter iCount Number of elements
592   * @parameter oArray Array for output of bytes, UTF16-BE.
593   * Must be at least iCount+1 long
594   * @return Length of output array used
595   * @parameter parseResult output error value if any
596   * @author Mark Davis
597   */

599  function fromLACE(iArray, iCount, oArray, parseResult) {
600    var high;
601    if (iCount < 1 || iCount > 63) {
602      parseResult.set("fromLACE: count out of range", iCount);
603      return;
604    }
605    var op = 0;
606    var ip = 0;
607    var result = 0;
608    if (iArray[ip] == 0xFF) { // special case FF
609      copyTo(iArray, 1, iCount-1, oArray, 0);
610      result = iCount-1;
611    } else {
612      while (ip < iCount) { // loop over runs
613        var count = iArray[ip++];
614        if (ip == iCount) {
615          parseResult.set("fromLACE: truncated before high", ip);
616          return;
617        }
618        high = iArray[ip++];
619        for (var i = 0; i < count; ++i) {
620          oArray[op++] = high;
621          if (ip == iCount) {
622            parseResult.set("fromLACE: truncated from count", ip);
623            return;
624          }
625          oArray[op++] = iArray[ip++];
626        }
627      }
628      result = op;
629    }

631    // check for uniqueness

633    var checkArray = [];
634    var checkCount = toLACE(oArray, result, checkArray, parseResult);
635    if (!equals(iArray, iCount, checkArray, checkCount)) {
636      parseResult.set("fromLACE: illegal input form");
637      return;
638    }
639    return result;
640  }

642  /**
643   * Utility routine for comparing arrays
644   * @parameter array1 first array to compare
645   * @parameter count1 number of elements to compare in first array
646   * @parameter array2 second array to compare
647   * @parameter count2 number of elements to compare in second array
648   * @return true iff counts are same, and elements from 0 to count-1
649   * are the same
650   */

652  function equals(array1, count1, array2, count2) {
653    if (count1 != count2) return false;
654    for (var i = 0; i < count1; ++i) {
655      if (array1[i] != array2[i]) return false;
656    }
657    return true;
658  }

660  /**
661   * Utility routine for getting array of bytes from UTF-16 string
662   * @parameter str source string
663   * @parameter oArray output array to fill in
664   * @return count of bytes put into oArray
665   */

667  function utf16FromString(str, oArray) {
668    var op = 0;
669    for (var i = 0; i < str.length; ++i) {
670      var code = str.charCodeAt(i);
671      oArray[op++] = (code >>> 8);  // top byte
672      oArray[op++] = (code & 0xFF); // bottom byte
673    }
674    return op;
675  }

677  /**
678   * Utility routine to see if string doesn't need LACE
679   * @parameter str source string
680   * @return true if ok already
681   */

683  function okAlready(str) {
684    for (var i = 0; i < str.length; ++i) {
685      var c = str.charAt(i);
686      if (c == '-' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9')
687        continue;
688      return false;
689    }
690    return true;
691  }

693  /**
694   * Convert from bytes to base32
695   * @parameter input Input buffer of bytes with values 00 to FF
696   * @parameter inputLength Length of input buffer
697   * @parameter output Output buffer, to be filled with values from
698   * a-z2-7.
699   * Must be of at least length input*8/5 + 1
700   * @return Length of output buffer used
701   * @author Mark Davis
702   */

704  function toBase32(input, inputLength, output, parseResult) {
705    //debugger;
706    var bits = 0;
707    var bitCount = 0;
708    var ip = 0;
709    var op = 0;
710    var val = 0;
711    while (true) {

713      // get bits if we don't have enough

715      if (bitCount < 5) {
716        if (ip >= inputLength) break;
717        // get another input
718        bits <<= 8;
719        if (baseDebugTo) alert("byte: " + input[ip].toString(16) +
720          ", bitCount: " + (bitCount+8));

722        bits = bits | input[ip++];
723        bitCount += 8;
724      }

726      // emit and remove them

728      bitCount -= 5;
729      val = (bits >> bitCount);
730      if (baseDebugTo) alert("Val: " + val.toString(16) + ", bitCount: "
731        + bitCount);
732      output[op++] = toLetter(val);
733      //if (baseDebugTo) alert("out: " + output[op-1].toString(16));
734      bits &= ~(0x1F << bitCount);
735    }

737    // add padding and output if necessary

739    if (bitCount > 0) {
740      if (baseDebugTo) alert("bits*: " + bits.toString(16) +
741        ", bitCount: " + bitCount);
742      val = bits << (5 - bitCount);
743      if (baseDebugTo) alert("out*: " + val.toString(16));
744      output[op++] = toLetter(val);
745    }
746    return op;
747  }

749  /**
750   * Convert from base32 to bytes
751   * @parameter input Input buffer of bytes with values from a-z2-7
752   * @parameter inputLength Length of input buffer
753   * @parameter output Output buffer, to be filled with bytes from
754   * 00 to FF
755   * Must be of at least length input*5/8 + 1
756   * @return Length of output buffer used
757   * @author Mark Davis
758   */

760  function fromBase32(input, inputLength, output, parseResult) {
761    //debugger;
762    var inputCheck = inputLength % 8;
763    if (inputCheck == 1 || inputCheck == 3 || inputCheck == 6) {
764      parseResult.set("Base32 excess length", null, inputLength);
765      return;
766    }
767    var bits = 0;
768    var bitCount = 0;
769    var ip = 0;
770    var op = 0;
771    var val = 0;
772    while (ip < inputLength) {

774      // get more bits
775      val = input[ip++];
776      val = fromLetter(val);
777      if (val < 0 || val > 0x1F) { // only five-bit values are valid
778        parseResult.set("Bad Base32 byte", val, ip-1);
779        return;
780      }
781      if (baseDebugFrom) alert("base32: " + val.toString(16));
782      bits <<= 5;
783      bits = bits | val;
784      bitCount += 5;
785      if (baseDebugFrom) alert("from: " + val.toString(16) +
786        ", bitCount: " + bitCount);

788      // emit & remove if we can

790      if (bitCount >= 8) {
791        bitCount -= 8;
792        output[op++] = bits >> bitCount;
793        if (baseDebugFrom) alert("out2: " + (bits >> bitCount) +
794          ", bitCount: " + bitCount);
795        bits &= ~(0xFF << bitCount);
796      }
797    }

799    // check that padding is with zero!
800    if (bits != 0) return -ip;
801    return op;
802  }

804  function toLetter(val) {
805    if (val > 25) return val - 26 + 0x32;
806    return val + 0x61;
807    // return val + (val < 26 ? 0x61 : 0x18);
808  }

810  function fromLetter(val) {
811    if (val < 0x61) return val + 26 - 0x32;
812    return val - 0x61;
813  }

815  C. Differences between -00 and -01

817  1: Minor typos.

819  2.1: Changed the tag to 'lq--'.

821  2.2 and 2.3: Added check for all-STD13 names in the steps.

823  2.4.1: Clarified first sentence. Step 5: fixed the moving of the IP.

825  2.4.2: Moved the last sentence of step 4 to be the first sentence of
826  step 5. Added the check for odd-length output. Changed the exit
827  comparison to doing a full comparison (instead of looking for lengths).

829  2.5.2: Changed the sense of the test in step 3 and added step 4 to check
830  for malformed input. Also made the output a buffer. Also added new step
831  1.

833  Changed Appendix B from IANA Considerations (of which there are none) to
834  Javascript code sample.

836  D. Author Contact Information

838  Mark Davis
839  IBM
840  10275 N. De Anza Blvd
841  Cupertino, CA 95014
842  mark.davis@us.ibm.com and mark.davis@macchiato.com

844  Paul Hoffman
845  Internet Mail Consortium and VPN Consortium
846  127 Segre Place
847  Santa Cruz, CA 95060 USA
848  paul.hoffman@imc.org and paul.hoffman@vpnc.org