idnits 2.17.1

draft-ietf-idn-lace-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == There are 12 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 860 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 8 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 373 has weird spacing: '... bits char...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 580

  -- Possible downref: Normative reference to a draft: ref. 'IDNComp' 

  -- No information found for draft-ietf-idn-requirement - is the name
     correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNReq' 

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Downref: Normative reference to an Informational RFC: RFC 2781

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode3'


     Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                            Mark Davis
2	draft-ietf-idn-lace-01.txt                                       IBM
3	January 5, 2001                                         Paul Hoffman
4	Expires July 5, 2001                                      IMC & VPNC

6	        LACE: Length-based ASCII Compatible Encoding for IDN

8	Status of this memo

10	This document is an Internet-Draft and is in full conformance with all
11	provisions of Section 10 of RFC2026.

13	Internet-Drafts are working documents of the Internet Engineering Task
14	Force (IETF), its areas, and its working groups. Note that other
15	groups may also distribute working documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference
20	material or to cite them other than as "work in progress."

22	     The list of current Internet-Drafts can be accessed at
23	     http://www.ietf.org/ietf/1id-abstracts.txt

25	     The list of Internet-Draft Shadow Directories can be accessed at
26	     http://www.ietf.org/shadow.html.

28	Abstract

30	This document describes a transformation method for representing
31	non-ASCII characters in host name parts in a fashion that is completely
32	compatible with the current DNS. It is a potential candidate for an
33	ASCII-Compatible Encoding (ACE) for internationalized host names, as
34	described in the comparison document from the IETF IDN Working Group.
35	This method is based on the observation that many internationalized host
36	name parts will have a few substrings from a small number of rows of the
37	ISO 10646 repertoire. Run-length encoding for these types of
38	host names will be fairly compact, and is fairly easy to describe.

40	1. Introduction

42	There is a strong world-wide desire to use characters other than plain
43	ASCII in host names. Host names have become the equivalent of business
44	or product names for many services on the Internet, so there is a need
45	to make them usable by people whose native scripts are not representable
46	by ASCII. The requirements for internationalizing host names are
47	described in the IDN WG's requirements document, [IDNReq].

49	The IDN WG's comparison document [IDNComp] describes three potential
50	main architectures for IDN: arch-1 (just send binary), arch-2 (send
51	binary or ACE), and arch-3 (just send ACE). This draft defines an ACE,
52	Length-based ACE or LACE, that can be used with protocols that match arch-2
53	or arch-3. LACE specifies an ACE format as specified in ace-1 in
54	[IDNComp]. Further, it specifies an identifying mechanism for ace-2 in
55	[IDNComp], namely ace-2.1.1 (add hopefully-unique legal tag to the
56	beginning of the name part).

58	In formal terms, LACE describes a character encoding scheme of the
59	ISO/IEC 10646 [ISO10646] coded character set (whose assignment of
60	characters is synchronized with Unicode [Unicode3]) and the rules for
61	using that scheme in the DNS. As such, it could also be called a
62	"charset" as defined in [IDNReq]. It can also be viewed as a specialized
63	UTF (transformation format), designed to work within the restrictions of
64	the DNS.

66	The LACE protocol has the following features:

68	- There is exactly one way to convert internationalized host parts to
69	and from LACE parts. Host name part uniqueness is preserved.

71	- Host parts that have no international characters are not changed.

73	- Names using LACE can include more internationalized characters than
74	with other ACE protocols that have been suggested to date. LACE-encoded
75	names are variable length, depending on the number of transitions
76	between rows in the ISO 10646 repertoire that appear in the name part.
77	Name parts that cannot be compressed using run-length encoding can have
78	up to 17 characters, and names that can be compressed can have up to 35
79	characters. Further, a name that has just a few row transitions
80	typically can have over 30 characters.

82	It is important to note that the following sections contain many
83	normative statements with "MUST" and "MUST NOT". Any implementation that
84	does not follow these statements exactly is likely to cause damage to
85	the Internet by creating non-unique representations of host names.

87	1.1 Terminology

89	The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
90	"MAY" in this document are to be interpreted as described in RFC 2119
91	[RFC2119].

93	Hexadecimal values are shown preceded by "0x". For example,
94	"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
95	shown preceded by "0b". For example, a nine-bit value might be
96	shown as "0b101101111".

98	Examples in this document use the notation for code points and names
99	from the Unicode Standard [Unicode3] and ISO 10646. For example, the
100	letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
101	A".

103	LACE converts strings with internationalized characters into
104	strings of US-ASCII that are acceptable as host name parts in current
105	DNS host naming usage. The former are called "pre-converted" and the
106	latter are called "post-converted".

108	1.2 IDN summary

110	Using the terminology in [IDNComp], LACE specifies an ACE format as
111	specified in ace-1. Further, it specifies an identifying mechanism for
112	ace-2, namely ace-2.1.1 (add hopefully-unique legal tag to the beginning
113	of the name part).

115	LACE has the following length characteristics.

117	- LACE-encoded names are variable length, depending on the number of
118	transitions between rows that appear in the name part.

120	- Name parts that cannot be compressed using run-length encoding can
121	have up to 17 characters.

123	- Names that can be compressed can have up to 35 characters.

125	- A name that has just a few row transitions typically can have over 30
126	characters.

128	2. Host Part Transformation

130	According to [STD13], host parts must be case-insensitive, start and
131	end with a letter or digit, and contain only letters, digits, and the
132	hyphen character ("-"). This, of course, excludes any internationalized
133	characters, as well as many other characters in the ASCII character
134	repertoire. Further, domain name parts must be 63 octets or shorter in
135	length.
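
As a minimal sketch of the [STD13] constraints just described, the
following JavaScript checks a single name part. The function name
isStd13HostPart is hypothetical and is not part of this draft. The
63-octet limit is checked here as a character count, which is
equivalent only because every accepted character is ASCII.

```javascript
// Hypothetical helper (not defined by this draft): returns true if
// `part` satisfies the host name rules summarized above: letters,
// digits, and hyphen only; first and last character a letter or digit;
// at most 63 octets.
function isStd13HostPart(part) {
  if (part.length < 1 || part.length > 63) return false;
  // One leading letter/digit, then optionally any run of letters,
  // digits, and hyphens that ends with a letter/digit.
  return /^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$/.test(part);
}
```

A name part for which this check returns true is the kind of part that
section 2.2 says MUST NOT be converted to LACE.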

137	2.1 Name tagging

139	All post-converted name parts that contain internationalized characters
140	begin with the string "lq--". (Of course, because host name parts are
141	case-insensitive, this might also be represented as "Lq--" or "lQ--" or
142	"LQ--".) The string "lq--" was chosen because it is extremely unlikely
143	to exist in host parts before this specification was produced. As a
144	historical note, in late October 2000, none of the second-level host
145	name parts in any of the .com, .edu, .net, and .org top-level domains
146	began with "lq--"; there are many tens of thousands of other strings of
147	three characters followed by a hyphen that have this property and could
148	be used instead. The string "lq--" will change to other strings with the
149	same properties in future versions of this draft.

151	Note that a zone administrator might still choose to use "lq--" at the
152	beginning of a host name part even if that part does not contain
153	internationalized characters. Zone administrators SHOULD NOT create host
154	part names that begin with "lq--" unless those names are post-converted
155	names. Creating host part names that begin with "lq--" but that are not
156	post-converted names may cause two distinct problems. Some display
157	systems, after converting the post-converted name part back to an
158	internationalized name part, might display the name parts in a
159	possibly-confusing fashion to users. More seriously, some resolvers,
160	after converting the post-converted name part back to an
161	internationalized name part, might reject the host name if it contains
162	illegal characters.
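
Because host name parts are case-insensitive, a tag test has to ignore
case. A small sketch of such a test follows; the names LACE_TAG and
hasLaceTag are assumptions made for illustration and do not appear
elsewhere in this draft.

```javascript
// The tag string from section 2.1, in its lower-case form.
const LACE_TAG = "lq--";

// Hypothetical helper: true if `part` begins with "lq--" in any mix of
// upper and lower case ("Lq--", "lQ--", "LQ--", ...).
function hasLaceTag(part) {
  return part.slice(0, LACE_TAG.length).toLowerCase() === LACE_TAG;
}
```

Note that a true result only says the part carries the tag; it does not
by itself show that the rest of the part is valid LACE.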

164	2.2 Converting an internationalized name to an ACE name part

166	To convert a string of internationalized characters into an ACE name
167	part, the following steps MUST be performed in the exact order of the
168	subsections given here.

170	If a name part consists exclusively of characters that conform to the
171	host name requirements in [STD13], the name MUST NOT be converted to
172	LACE. That is, a name part that can be represented without LACE MUST NOT
173	be encoded using LACE. This absolute requirement prevents there from
174	being two different encodings for a single DNS host name.

176	If any checking for prohibited name parts (such as prohibited
177	characters, case-folding, or canonicalization) is to be done,
178	it MUST be done before doing the conversion to an ACE name part.

180	Characters outside the first plane of characters (those with codepoints
181	above U+FFFF) MUST be represented using surrogates, as described in
182	RFC 2781 [RFC2781].

184	The input name string consists of characters from the ISO 10646
185	character set in big-endian UTF-16 encoding. This is the pre-converted
186	string.

188	2.2.1 Check the input string for disallowed names

190	If the input string consists only of characters that conform to the host
191	name requirements in [STD13], the conversion MUST stop with an error.

193	2.2.2 Compress the pre-converted string

195	The entire pre-converted string MUST be compressed using the compression
196	algorithm specified in section 2.4. The result of this step is the
197	compressed string.

199	2.2.3 Check the length of the compressed string

201	The compressed string MUST be 36 octets or shorter. If the compressed
202	string is 37 octets or longer, the conversion MUST stop with an error.
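
One way to see how the 36-octet limit relates to the 63-octet limit on
name parts is to compute the length of the finished name part. The
sketch below is an inference from sections 2, 2.2.5, and 2.5, not a
derivation given in this draft; the function name laceLabelLength is
hypothetical.

```javascript
// Hypothetical helper: length in characters of the final name part for
// a compressed string of `compressedOctets` octets. Base32 emits one
// character per five bits, rounding the last partial group up, and the
// "lq--" tag adds four characters.
function laceLabelLength(compressedOctets) {
  const base32Chars = Math.ceil((compressedOctets * 8) / 5);
  return "lq--".length + base32Chars;
}
```

Under these assumptions, laceLabelLength(36) is 62, which fits in a
63-octet name part, while laceLabelLength(37) is 64, which does not.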

204	2.2.4 Encode the compressed string with Base32

206	The compressed string MUST be converted using the Base32 encoding
207	described in section 2.5. The result of this step is the encoded string.

209	2.2.5 Prepend "lq--" to the encoded string and finish

211	Prepend the characters "lq--" to the encoded string. This is the host
212	name part that can be used in DNS resolution.

214	2.3 Converting a host name part to an internationalized name

216	The input string for conversion is a valid host name part. Note that if
217	any checking for prohibited name parts (such as prohibited characters,
218	case-folding, or canonicalization) is to be done, it MUST be done after
219	doing the conversion from an ACE name part.

221	If a decoded name part consists exclusively of characters that conform
222	to the host name requirements in [STD13], the conversion from LACE MUST
223	fail. Because a name part that can be represented without LACE MUST NOT
224	be encoded using LACE, the decoding process MUST check for name parts
225	that consist exclusively of characters that conform to the host name
226	requirements in [STD13] and, if such a name part is found, it MUST be
227	considered an error (and possibly a security violation).

229	2.3.1 Strip the "lq--"

231	The input string MUST begin with the characters "lq--". If it does not,
232	the conversion MUST stop with an error. Otherwise, remove the characters
233	"lq--" from the input string. The result of this step is the stripped
234	string.

236	2.3.2 Decode the stripped string with Base32

238	The entire stripped string MUST be checked to see if it is valid Base32
239	output. The entire stripped string MUST be changed to all lower-case
240	letters and digits. If any resulting characters are not in Table 1, the
241	conversion MUST stop with an error; the input string is the
242	post-converted string. Otherwise, the entire resulting string MUST be
243	converted to a binary format using the Base32 decoding described in
244	section 2.5. The result of this step is the decoded string.

246	2.3.3 Decompress the decoded string

248	The entire decoded string MUST be converted to ISO 10646 characters
249	using the decompression algorithm described in section 2.4. The result
250	of this is the internationalized string.

252	2.3.4 Check the internationalized string for disallowed names

254	If the internationalized string consists only of characters that conform
255	to the host name requirements in [STD13], the conversion MUST stop with
256	an error.

258	2.4 Compression algorithm

260	The basic method for compression is to reduce a substring that consists
261	of characters all from a single row of the ISO 10646 repertoire to a
262	count octet followed by the row header followed by the lower octets of
263	the characters. If this ends up being longer than the input, the string
264	is not compressed, but instead has a unique one-octet header attached.

266	Although the uncompressed mode limits the number of characters in a LACE
267	name part to 17, this is still generally enough for all names in
268	almost all scripts. Also, this limit is close to the limits set by
269	other encoding proposals.

271	Note that the compression and decompression rules MUST be followed
272	exactly. This requirement prevents a single host name part from having
273	two encodings. Thus, for any input to the algorithm, there is only one
274	possible output. An implementation cannot choose to use one-octet mode
275	or two-octet mode using anything other than the logic given in this
276	section.

278	2.4.1 Compressing a string

280	The input string is in the UTF-16 encoding (big-endian UTF-16 with no
281	byte order mark).

283	Design note: No checking is done on the input to this algorithm. It is
284	assumed that all checking for valid ISO/IEC 10646 characters has already
285	been done by a previous step in the conversion process.

287	1) If the length (measured in octets) of the input is not even, or is
288	less than 2, stop with an error.

290	2) Set the input pointer, called IP, to the first octet of the input
291	string.

293	3) Set the variable called HIGH to the octet at IP.

295	4) Determine the number of contiguous pairs at or after IP that have
296	HIGH as the first octet; call this COUNT.

298	5) Put into an output buffer the single octet for COUNT followed by the
299	single octet for HIGH, followed by all those low octets. Move IP to the
300	end of those pairs; that is, set IP to IP+(2*COUNT).

302	6) If IP is not at the end of the input string, go to step 3.

304	7) If the length of the output buffer is less than or equal to the
305	length of the input buffer (in octets, not in characters), emit the
306	output buffer. Otherwise, output the octet 0xFF followed by the input
307	buffer. Note that there can only be one possible representation for a
308	name part, so that outputting the wrong name part is a serious security
309	error. Decompression schemes MUST accept only the valid form and MUST
310	NOT accept invalid forms.
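
The seven steps above can be sketched in JavaScript as follows. This is
an illustrative sketch only, separate from the sample code in Appendix
B; the name laceCompress and the use of plain arrays of octet values
are assumptions. It does not enforce the length limit of section 2.2.3,
which is left to the caller.

```javascript
// Hypothetical helper: compress an array of octets holding big-endian
// UTF-16 (no byte order mark) following steps 1-7 of section 2.4.1.
function laceCompress(input) {
  // Step 1: the input must be a non-empty, even number of octets.
  if (input.length < 2 || input.length % 2 !== 0) {
    throw new Error("input must be an even number of octets, at least 2");
  }
  const out = [];
  let ip = 0;                          // step 2: input pointer
  while (ip < input.length) {          // step 6: repeat until the end
    const high = input[ip];            // step 3: HIGH is the octet at IP
    // Step 4: count contiguous pairs whose first octet equals HIGH.
    let count = 0;
    while (ip + 2 * count < input.length &&
           input[ip + 2 * count] === high) {
      count++;
    }
    // Step 5: emit COUNT, HIGH, then the low octet of each pair.
    out.push(count, high);
    for (let i = 0; i < count; i++) {
      out.push(input[ip + 2 * i + 1]);
    }
    ip += 2 * count;
  }
  // Step 7: keep the compressed form only if it is not longer than the
  // input; otherwise emit 0xFF followed by the unchanged input.
  return out.length <= input.length ? out : [0xff, ...input];
}
```

For instance, the ten octets 30 E6 30 CB 30 B3 30 FC 30 C9 share one
high octet, so the sketch emits a single run of COUNT 5.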

312	2.4.2 Decompressing a string

314	1. Set the input pointer, called IP, to the first octet of the input
315	string. If there is no first octet, stop with an error.

317	2. If the octet at IP is 0xFF, set IP to IP+1, copy the rest of the
318	input buffer to the output buffer, and go to step 9.

320	3. Get the octet at IP, call it COUNT. If COUNT equals zero or is
321	greater than 36, stop with an error. Set IP to IP+1. If IP is now at the
322	end of the input string, stop with an error.

324	4. Get the octet at IP, call it HIGH. Set IP to IP+1.

326	5. If IP is now at the end of the input string, stop with an error. Get
327	the octet at IP, call it LOW. Set IP to IP+1.

329	6. Output HIGH, then LOW, to the output buffer.

331	7. Decrement COUNT. If COUNT is greater than 0, go to step 5.

333	8. If IP is not at the end of the input buffer, go to step 3.

335	9. If the length of the output buffer is odd, stop with an error.
336	Compress the output buffer into a separate comparison buffer following
337	the steps for compression above. If the contents of the comparison
338	buffer do not equal the input to the decompression step, stop with an
339	error. Otherwise, send out the output buffer and stop.
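
The decompression steps, including the re-compression check in step 9,
can be sketched as follows. This is an illustrative sketch under the
same assumptions as before (arrays of octet values, errors reported by
throwing); the names laceDecompress and recompress are hypothetical.
The compression logic of section 2.4.1 is repeated in compact form so
that the block stands alone.

```javascript
// Compact restatement of the section 2.4.1 compression steps, needed
// here only for the uniqueness check in step 9.
function recompress(utf16) {
  if (utf16.length < 2 || utf16.length % 2 !== 0) {
    throw new Error("cannot re-compress: bad UTF-16 length");
  }
  const out = [];
  for (let ip = 0; ip < utf16.length; ) {
    const high = utf16[ip];
    let count = 0;
    while (ip + 2 * count < utf16.length &&
           utf16[ip + 2 * count] === high) count++;
    out.push(count, high);
    for (let i = 0; i < count; i++) out.push(utf16[ip + 2 * i + 1]);
    ip += 2 * count;
  }
  return out.length <= utf16.length ? out : [0xff, ...utf16];
}

// Hypothetical helper: decompress an array of octets into big-endian
// UTF-16 octets following steps 1-9 of section 2.4.2.
function laceDecompress(input) {
  if (input.length < 1) throw new Error("empty input");          // step 1
  const out = [];
  if (input[0] === 0xff) {                                       // step 2
    out.push(...input.slice(1));
  } else {
    let ip = 0;
    while (ip < input.length) {                                  // step 8
      const count = input[ip++];                                 // step 3
      if (count === 0 || count > 36) throw new Error("bad COUNT");
      if (ip >= input.length) throw new Error("truncated before HIGH");
      const high = input[ip++];                                  // step 4
      for (let i = 0; i < count; i++) {                          // steps 5-7
        if (ip >= input.length) throw new Error("truncated run");
        out.push(high, input[ip++]);
      }
    }
  }
  // Step 9: the output must be whole 16-bit units, and compressing it
  // again must reproduce the input exactly.
  if (out.length % 2 !== 0) throw new Error("odd output length");
  const check = recompress(out);
  if (check.length !== input.length ||
      check.some((octet, i) => octet !== input[i])) {
    throw new Error("input is not the unique compressed form");
  }
  return out;
}
```

The final comparison is what rejects inputs that decode cleanly but are
not the form the compressor would have produced, such as a 0xFF-prefixed
string that should have been run-length encoded.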

341	2.4.3 Compression examples

343	The five input characters <U+30E6 U+30CB U+30B3 U+30FC U+30C9> are
344	represented in big-endian UTF-16 as the ten octets <30 E6 30 CB 30 B3 30
345	FC 30 C9>. All the code units are in the same row (30). The output
346	buffer has seven octets <05 30 E6 CB B3 FC C9>, which is shorter than
347	the input string. Thus the output is <05 30 E6 CB B3 FC C9>.

349	The four input characters <U+012F U+0111 U+0149 U+00E5> are represented
350	in big-endian UTF-16 as the eight octets <01 2F 01 11 01 49 00 E5>. The
351	output buffer has eight octets <03 01 2F 11 49 01 00 E5>, which is the
352	same length as the input string. Thus, the output is <03 01 2F 11 49 01
353	00 E5>.

355	The three input characters <U+012F U+00E0 U+014B> are represented in
356	big-endian UTF-16 as the six octets <01 2F 00 E0 01 4B>. The output
357	buffer is nine octets <01 01 2F 01 00 E0 01 01 4B>, which is longer than
358	the input buffer. Thus, the output is <FF 01 2F 00 E0 01 4B>.

360	2.5 Base32

362	In order to encode non-ASCII characters in DNS-compatible host name parts,
363	they must be converted into legal characters. This is done with Base32
364	encoding, described here.

366	Table 1 shows the mapping between input bits and output characters in
367	Base32. Design note: the digits used in Base32 are "2" through "7"
368	instead of "0" through "5" in order to avoid digits "0" and "1". This
369	helps reduce errors for users who are entering a Base32 stream and may
370	misinterpret a "0" for an "O" or a "1" for an "l".

372	                    Table 1: Base32 conversion
373	             bits   char  hex         bits   char  hex
374	             00000   a    0x61        10000   q    0x71
375	             00001   b    0x62        10001   r    0x72
376	             00010   c    0x63        10010   s    0x73
377	             00011   d    0x64        10011   t    0x74
378	             00100   e    0x65        10100   u    0x75
379	             00101   f    0x66        10101   v    0x76
380	             00110   g    0x67        10110   w    0x77
381	             00111   h    0x68        10111   x    0x78
382	             01000   i    0x69        11000   y    0x79
383	             01001   j    0x6a        11001   z    0x7a
384	             01010   k    0x6b        11010   2    0x32
385	             01011   l    0x6c        11011   3    0x33
386	             01100   m    0x6d        11100   4    0x34
387	             01101   n    0x6e        11101   5    0x35
388	             01110   o    0x6f        11110   6    0x36
389	             01111   p    0x70        11111   7    0x37

391	2.5.1 Encoding octets as Base32

393	The input is a stream of octets. However, the octets are then treated
394	as a stream of bits.

396	Design note: The assumption that the input is a stream of octets
397	(instead of a stream of bits) was made so that no padding was needed.
398	If you are reusing this algorithm for a stream of bits, you must add a
399	padding mechanism in order to differentiate different lengths of input.

401	1) Set the read pointer to the beginning of the input bit stream.

403	2) Look at the five bits after the read pointer. If there are not five
404	bits, go to step 5.

406	3) Look up the value of the set of five bits in the bits column of
407	Table 1, and output the character from the char column (whose hex value
408	is in the hex column).

410	4) Move the read pointer five bits forward. If the read pointer is at
411	the end of the input bit stream (that is, there are no more bits in the
412	input), stop. Otherwise, go to step 2.

414	5) Pad the bits seen with trailing zero bits to make five bits.

416	6) Look up the value of the set of five bits in the bits column of
417	Table 1, and output the character from the char column (whose hex value
418	is in the hex column).
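
The encoding steps can be sketched in JavaScript as follows. This is an
illustrative sketch, separate from the toBase32 sample in Appendix B;
the names BASE32_ALPHABET and toBase32 and the choice to return a
string are assumptions. The alphabet string lists the char column of
Table 1 in order of the bits column.

```javascript
// Characters of Table 1, indexed by their five-bit value.
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Hypothetical helper: encode an array of octets as a Base32 string,
// following steps 1-6 of section 2.5.1.
function toBase32(octets) {
  let bits = 0;       // bits read but not yet emitted
  let bitCount = 0;   // how many such bits there are
  let out = "";
  for (const octet of octets) {
    bits = (bits << 8) | octet;
    bitCount += 8;
    // Steps 2-4: emit one character for every complete five-bit group.
    while (bitCount >= 5) {
      bitCount -= 5;
      out += BASE32_ALPHABET[(bits >> bitCount) & 0x1f];
    }
    bits &= (1 << bitCount) - 1;   // keep only the leftover bits
  }
  // Steps 5-6: pad a final partial group with zero bits on the right.
  if (bitCount > 0) {
    out += BASE32_ALPHABET[(bits << (5 - bitCount)) & 0x1f];
  }
  return out;
}
```

With this sketch, toBase32([0x3a, 0x27, 0x0f, 0x93]) yields "hitq7ey",
matching the worked example in section 2.5.3.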

420	2.5.2 Decoding Base32 as octets

422	The input is octets in network byte order. The input octets MUST be
423	values from the second column in Table 1.

425	1) Count the number of octets in the input and divide it by 8; call the
426	remainder INPUTCHECK. If INPUTCHECK is 1 or 3 or 6, stop with an error.

428	2) Set the read pointer to the beginning of the input octet stream.

430	3) Look up the character value of the octet in the char column (or hex
431	value in hex column) of Table 1, and add the five bits from the bits
432	column to the output buffer.

434	4) Move the read pointer one octet forward. If the read pointer is not
435	at the end of the input octet stream (that is, there are more octets in
436	the input), go to step 3.

438	5) Count the number of bits that are in the output buffer and divide it
439	by 8; call the remainder PADDING. If the PADDING number of bits at the
440	end of the output buffer are not all zero, stop with an error.
441	Otherwise, emit the output buffer and stop.
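
The decoding steps can be sketched in the same way. The names
BASE32_DECODE_ALPHABET and fromBase32 are hypothetical, and the sketch
assumes the caller has already changed the input to lower case as
required by section 2.3.2, so upper-case letters are rejected here.

```javascript
// Characters of Table 1, indexed by their five-bit value.
const BASE32_DECODE_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Hypothetical helper: decode a Base32 string into an array of octets,
// following steps 1-5 of section 2.5.2.
function fromBase32(text) {
  // Step 1: these lengths cannot result from encoding whole octets.
  const inputCheck = text.length % 8;
  if (inputCheck === 1 || inputCheck === 3 || inputCheck === 6) {
    throw new Error("impossible Base32 length");
  }
  let bits = 0;       // bits decoded but not yet emitted as an octet
  let bitCount = 0;
  const out = [];
  for (const ch of text) {                       // steps 2-4
    const value = BASE32_DECODE_ALPHABET.indexOf(ch);
    if (value < 0) throw new Error("character not in Table 1");
    bits = (bits << 5) | value;
    bitCount += 5;
    if (bitCount >= 8) {
      bitCount -= 8;
      out.push((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1;               // keep leftover bits
    }
  }
  // Step 5: whatever is left over is padding and must be all zero.
  if (bits !== 0) throw new Error("non-zero padding bits");
  return out;
}
```

Rejecting non-zero padding matters for uniqueness: without that check,
"hitq7ey" and "hitq7ez" would decode to the same four octets.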

443	2.5.3 Base32 example

445	Assume you want to encode the value 0x3a270f93. The bit string is:

447	3   a    2   7    0   f    9   3
448	00111010 00100111 00001111 10010011

450	Broken into chunks of five bits, this is:

452	00111 01000 10011 10000 11111 00100 11

454	Padding is added to make the last chunk five bits:

456	00111 01000 10011 10000 11111 00100 11000

458	The output of encoding is:

460	00111 01000 10011 10000 11111 00100 11000
461	  h     i     t     q     7     e     y
462	or "hitq7ey".

464	3. Security Considerations

466	Much of the security of the Internet relies on the DNS. Thus, any
467	change to the characteristics of the DNS can change the security of
468	much of the Internet. Thus, LACE makes no changes to the DNS
469	itself.

471	Host names are used by users to connect to Internet servers. The
472	security of the Internet would be compromised if a user entering a
473	single internationalized name could be connected to different servers
474	based on different interpretations of the internationalized host
475	name.

477	LACE is designed so that every internationalized host name part
478	can be represented as one and only one DNS-compatible string. If there
479	is any way to follow the steps in this document and get two or more
480	different results, it is a severe and fatal error in the protocol.

482	4. References

484	[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name Proposals",
485	draft-ietf-idn-compare.

487	[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
488	draft-ietf-idn-requirement.

490	[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
491	technology -- Universal Multiple-Octet Coded Character Set (UCS) --
492	Part 1: Architecture and Basic Multilingual Plane.  Five amendments and
493	a technical corrigendum have been published up to now. UTF-16 is
494	described in Annex Q, published as Amendment 1. 17 other amendments are
495	currently at various stages of standardization. [[[ THIS REFERENCE
496	NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]

498	[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
499	Requirement Levels", March 1997, RFC 2119.

501	[RFC2781] Paul Hoffman and Francois Yergeau, "UTF-16, an encoding of ISO
502	10646", February 2000, RFC 2781.

504	[STD13] Paul Mockapetris, "Domain names - implementation and
505	specification", November 1987, STD 13 (RFC 1035).

507	[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
508	3.0", ISBN 0-201-61633-5. Described at
509	.

511	A. Acknowledgements

513	Rick Wesson pointed out some error conditions that need to be
514	tested for. Scott Hollenbeck pointed out some errors in the
515	compression.

517	Base32 is quite obviously inspired by the tried-and-true Base64
518	Content-Transfer-Encoding from MIME.

520	B. Sample code

522	The following is sample Javascript code for the LACE algorithm.
523	This code is believed to be correct, but there may be errors in
524	it. The code is provided as-is and comes with no warranty of
525	fitness, correctness, blah blah blah.

527	/**
528	 * Converts to LACE compression format (without Base32) from
529	 *   UTF-16BE array
530	 * @parameter iArray Array of bytes in UTF16-BE
531	 * @parameter iCount Number of elements. Must be 0..63
532	 * @parameter oArray Array for output of LACE bytes.
533	 *   Must be at least 100 octets long to provide internal working space
534	 * @return Length of output array used
535	 * @parameter parseResult output error value if any
536	 * @author Mark Davis
537	 */

539	function toLACE(iArray, iCount, oArray, parseResult) {
540	//debugger;
541	  if (iCount < 1 || iCount > 62) {
542	    parseResult.set("Lace: count out of range", iCount);
543	    return;
544	  }
545	  if ((iCount % 2) == 1) {
546	    parseResult.set("Lace: odd length, can't be UTF-16", iCount);
547	    return;
548	  }
549	  var op = 0;                     // output index
550	  var ip = 0;                     // input index
551	  var lastHigh = -1;
552	  var lenp = 0;
553	  while (ip < iCount) {
554	    var high = iArray[ip++];
555	    if (high != lastHigh) {
556	      if (lastHigh != -1) {       // store last length
557	        var len = op - lenp - 2;
558	        oArray[lenp] = len;
559	      }
560	      lenp = op++;                 // reserve space
561	      oArray[op++] = high;
562	      lastHigh = high;
563	    }
564	    oArray[op++] = iArray[ip++];
565	  }

567	  // store last len

569	  var len = op - lenp - 2;
570	  oArray[lenp] = len;

572	  // see if the input is short, and we should
573	  // just copy

575	  if (op > iCount) {
576	    if (op > 63) {
577	      parseResult.set("Lace: output too long", op);
578	      return;
579	    }
580	    oArray[0] = 0xFF;
581	    copyTo(iArray, 0, iCount, oArray, 1);
582	    op = iCount + 1;
583	  }
584	  return op;
585	}

587	/**
588	 * Converts from LACE compressed format (without Base32) to
589	 *   UTF-16BE array
590	 * @parameter iArray Array of bytes in LACE format
591	 * @parameter iCount Number of elements
592	 * @parameter oArray Array for output of bytes, UTF16-BE.
593	 *   Must be at least iCount+1 long
594	 * @return Length of output array used
595	 * @parameter parseResult output error value if any
596	 * @author Mark Davis
597	 */

599	function fromLACE(iArray, iCount, oArray, parseResult) {
600	  var high;
601	  if (iCount < 1 || iCount > 63) {
602	    parseResult.set("fromLACE: count out of range", iCount);
603	    return;
604	  }
605	  var op = 0;
606	  var ip = 0;
607	  var result = 0;
608	  if (iArray[ip] == 0xFF) { // special case FF
609	    copyTo(iArray, 1, iCount-1, oArray, 0);
610	    result = iCount-1;
611	  } else {
612	    while (ip < iCount) { // loop over runs
613	      var count = iArray[ip++];
614	      if (ip == iCount) {
615	        parseResult.set("fromLACE: truncated before high", ip);
616	        return;
617	      }
618	      high = iArray[ip++];
619	      for (var i = 0; i < count; ++i) {
620	        oArray[op++] = high;
621	        if (ip == iCount) {
622	          parseResult.set("fromLACE: truncated from count", ip);
623	          return;
624	        }
625	        oArray[op++] = iArray[ip++];
626	      }
627	    }
628	    result = op;
629	  }

631	  // check for uniqueness

633	  var checkArray = [];
634	  var checkCount = toLACE(oArray, result, checkArray, parseResult);
635	  if (!equals(iArray, iCount, checkArray, checkCount)) {
636	    parseResult.set("fromLACE: illegal input form");
637	    return;
638	  }
639	  return result;
640	}

642	/**
643	 * Utility routine for comparing arrays
644	 * @parameter array1 first array to compare
645	 * @parameter count1 number of elements to compare in first array
646	 * @parameter array2 second array to compare
647	 * @parameter count2 number of elements to compare in second array
648	 * @return true iff counts are same, and elements from 0 to count-1
649	 *   are the same
650	 */

652	function equals(array1, count1, array2, count2) {
653	  if (count1 != count2) return false;
654	  for (var i = 0; i < count1; ++i) {
655	    if (array1[i] != array2[i]) return false;
656	  }
657	  return true;
658	}

660	/**
661	 * Utility routine for getting array of bytes from UTF-16 string
662	 * @parameter str source string
663	 * @parameter oArray output array to fill in
664	 * @return count of bytes put into oArray
665	 */

667	function utf16FromString(str, oArray) {
668	  var op = 0;
669	  for (var i = 0; i < str.length; ++i) {
670	    var code = str.charCodeAt(i);
671	    oArray[op++] = (code >>> 8);  // top byte
672	    oArray[op++] = (code & 0xFF); // bottom byte
673	  }
674	  return op;
675	}
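
As a worked example of the byte order produced by utf16FromString, the
following sketch uses a hypothetical stand-alone variant, utf16Bytes,
that returns a new array rather than filling a caller-supplied one.
The hex helper is likewise an assumption added only for display.

```javascript
// Hypothetical variant of utf16FromString, not part of the appendix.
function utf16Bytes(str) {
  var out = [];
  for (var i = 0; i < str.length; ++i) {
    var code = str.charCodeAt(i);        // one UTF-16 code unit
    out.push(code >>> 8, code & 0xFF);   // top byte, then bottom byte
  }
  return out;
}

// Display helper (assumption): two-digit lowercase hex, space separated.
function hex(bytes) {
  return bytes.map(function (b) {
    return (b < 16 ? "0" : "") + b.toString(16);
  }).join(" ");
}

console.log(hex(utf16Bytes("a\u0430")));       // 00 61 04 30
console.log(hex(utf16Bytes("\uD800\uDC00")));  // d8 00 dc 00
```

Because charCodeAt returns UTF-16 code units, a character outside the
Basic Multilingual Plane contributes four bytes (one surrogate pair).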

677	/**
678	 * Utility routine to see if string doesn't need LACE
679	 * @parameter str source string
680	 * @return true if ok already
681	 */

683	function okAlready(str) {
684	  for (var i = 0; i < str.length; ++i) {
685	    var c = str.charAt(i);
686	    if (c == '-' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9')
687	       continue;
688	    return false;
689	  }
690	  return true;
691	}
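
The test made by okAlready accepts only lowercase letters, digits, and
the hyphen. A possible equivalent formulation with a regular
expression is sketched below; the name okAlreadyRegex is hypothetical
and the equivalence is an assumption based on reading the loop above.

```javascript
// Hypothetical regex formulation, not part of the appendix code.
function okAlreadyRegex(str) {
  // Every character must be '-', 'a'..'z', or '0'..'9'.
  return /^[-a-z0-9]*$/.test(str);
}

console.log(okAlreadyRegex("example-1"));    // true
console.log(okAlreadyRegex("Example"));      // false (uppercase letter)
console.log(okAlreadyRegex("b\u00FCcher"));  // false (non-ASCII letter)
```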

693	/**
694	 * Convert from bytes to base32
695	 * @parameter input Input buffer of bytes with values 00 to FF
696	 * @parameter inputLength Length of input buffer
697	 * @parameter output Output buffer, to be filled with values from
698	 *   a-z2-7.
699	 * Must be of at least length input*8/5 + 1
700	 * @return Length of output buffer used
701	 * @author Mark Davis
702	 */

704	function toBase32(input, inputLength, output, parseResult) {
705	  //debugger;
706	  var bits = 0;
707	  var bitCount = 0;
708	  var ip = 0;
709	  var op = 0;
710	  var val = 0;
711	  while (true) {

713	    // get bits if we don't have enough

715	    if (bitCount < 5) {
716	      if (ip >= inputLength) break;
717	      // get another input
718	      bits <<= 8;
719	      if (baseDebugTo) alert("byte: " + input[ip].toString(16) +
720	        ", bitCount: " + (bitCount+8));

722	      bits = bits | input[ip++];
723	      bitCount += 8;
724	    }

726	    // emit and remove them

728	    bitCount -= 5;
729	    val = (bits >> bitCount);
730	    if (baseDebugTo) alert("Val: " + val.toString(16) + ", bitCount: "
731	      + bitCount);
732	    output[op++] = toLetter(val);
733	    //if (baseDebugTo) alert("out: " + output[op-1].toString(16));
734	    bits &= ~(0x1F << bitCount);
735	  }

737	  // add padding and output if necessary

739	  if (bitCount > 0) {
740	    if (baseDebugTo) alert("bits*: " + bits.toString(16) +
741	      ", bitCount: " + bitCount);
742	    val = bits << (5 - bitCount);
743	    if (baseDebugTo) alert("out*: " + val.toString(16));
744	    output[op++] = toLetter(val);
745	  }
746	  return op;
747	}

749	/**
750	 * Convert from base32 to bytes
751	 * @parameter input Input buffer of bytes with values from a-z2-7
752	 * @parameter inputLength Length of input buffer
753	 * @parameter output Output buffer, to be filled with bytes from
754	 *   00 to FF
755	 * Must be of at least length input*5/8 + 1
756	 * @return Length of output buffer used
757	 * @author Mark Davis
758	 */

760	function fromBase32(input, inputLength, output, parseResult) {
761	  //debugger;
762	  var inputCheck = inputLength % 8;
763	  if (inputCheck == 1 || inputCheck == 3 || inputCheck == 6) {
764	    parseResult.set("Base32 excess length", null, inputLength);
765	    return;
766	  }
767	  var bits = 0;
768	  var bitCount = 0;
769	  var ip = 0;
770	  var op = 0;
771	  var val = 0;
772	  while (ip < inputLength) {

774	    // get more bits
775	    val = input[ip++];
776	    val = fromLetter(val);
777	    if (val < 0 || val > 0x1F) {
778	      parseResult.set("Bad Base32 byte", val, ip-1);
779	      return;
780	    }
781	    if (baseDebugFrom) alert("base32: " + val.toString(16));
782	    bits <<= 5;
783	    bits = bits | val;
784	    bitCount += 5;
785	    if (baseDebugFrom) alert("from: " + val.toString(16) +
786	      ", bitCount: " + bitCount);

788	    // emit & remove if we can

790	    if (bitCount >= 8) {
791	      bitCount -= 8;
792	      output[op++] = bits >> bitCount;
793	      if (baseDebugFrom) alert("out2: " + (bits >> bitCount) +
794	        ", bitCount: " + bitCount);
795	      bits &= ~(0xFF << bitCount);
796	    }
797	  }

799	  // check that padding is with zero!
800	  if (bits != 0) return -ip;
801	  return op;
802	}

804	function toLetter(val) {
805	  if (val > 25) return val - 26 + 0x32;
806	  return val + 0x61;
807	  // return val + (val < 26 ? 0x61 : 0x18);
808	}

810	function fromLetter(val) {
811	  if (val < 0x61) return val + 26 - 0x32;
812	  return val - 0x61;
813	}
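
The base32 routines above can be exercised with a compact round-trip
sketch. The names ALPHABET, b32Encode, and b32Decode are hypothetical
and not part of this appendix. The sketch works on strings rather than
byte buffers and throws on bad input instead of using parseResult. It
assumes the same a-z2-7 alphabet, the same rejection of input lengths
that are 1, 3, or 6 modulo 8, and the same zero-padding check.

```javascript
// Hypothetical stand-alone sketch, not part of the appendix code.
var ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

function b32Encode(bytes) {
  var out = "";
  var bits = 0, bitCount = 0;
  for (var i = 0; i < bytes.length; ++i) {
    bits = (bits << 8) | bytes[i];       // take in another byte
    bitCount += 8;
    while (bitCount >= 5) {              // emit full 5-bit groups
      bitCount -= 5;
      out += ALPHABET.charAt((bits >> bitCount) & 0x1F);
      bits &= (1 << bitCount) - 1;       // keep only the unused bits
    }
  }
  if (bitCount > 0) {                    // pad the final group with zeros
    out += ALPHABET.charAt((bits << (5 - bitCount)) & 0x1F);
  }
  return out;
}

function b32Decode(text) {
  var rem = text.length % 8;
  if (rem === 1 || rem === 3 || rem === 6) {
    throw new Error("base32 excess length");
  }
  var out = [];
  var bits = 0, bitCount = 0;
  for (var i = 0; i < text.length; ++i) {
    var val = ALPHABET.indexOf(text.charAt(i));
    if (val < 0) throw new Error("bad base32 character at " + i);
    bits = (bits << 5) | val;
    bitCount += 5;
    if (bitCount >= 8) {                 // emit a full byte
      bitCount -= 8;
      out.push(bits >> bitCount);
      bits &= (1 << bitCount) - 1;
    }
  }
  if (bits !== 0) throw new Error("nonzero padding bits");
  return out;
}

var encoded = b32Encode([0x66, 0x6F, 0x6F]);
console.log(encoded);                    // mzxw6
console.log(b32Decode(encoded));         // the bytes 66 6f 6f (decimal 102, 111, 111)
```

Masking with (1 << bitCount) - 1 after each emitted group keeps only
the bits that have not yet been output, which has the same effect as
the bits &= ~(... << bitCount) statements in the routines above.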

815	C. Differences between -00 and -01

817	1: Minor typos.

819	2.1: Changed the tag to 'lq--'.

821	2.2 and 2.3: Added check for all-STD13 names in the steps.

823	2.4.1: Clarified first sentence. Step 5: fixed the moving of the IP.

825	2.4.2: Moved the last sentence of step 4 to be the first sentence of
826	step 5. Added the check for odd-length output. Changed the exit check
827	to a full comparison (instead of comparing only lengths).

829	2.5.2: Changed the sense of the test in step 3 and added step 4 to check
830	for malformed input. Also made the output a buffer. Also added new step
831	1.

833	Changed Appendix B from IANA Considerations (of which there are none) to
834	a JavaScript code sample.

836	D. Author Contact Information

838	Mark Davis
839	IBM
840	10275 N. De Anza Blvd
841	Cupertino, CA 95014
842	mark.davis@us.ibm.com and mark.davis@macchiato.com

844	Paul Hoffman
845	Internet Mail Consortium and VPN Consortium
846	127 Segre Place
847	Santa Cruz, CA  95060 USA
848	paul.hoffman@imc.org and paul.hoffman@vpnc.org