INTERNET-DRAFT                                          Adam M. Costello
draft-costello-idn-amc-ace-z-00.txt                          2001-Jul-11
Expires 2002-Jan-11

                         AMC-ACE-Z version 0.2.1

Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note
    that other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

    Distribution of this document is unlimited.  Please send comments
    to the author at amc@cs.berkeley.edu, or to the idn working
    group at idn@ops.ietf.org.  A non-paginated (and possibly
    newer) version of this specification may be available at
    http://www.cs.berkeley.edu/~amc/charset/

Abstract

    AMC-ACE-Z is a simple and efficient ASCII-Compatible Encoding (ACE)
    designed for use with Internationalized Domain Names [IDN] [IDNA].
    It transforms a Unicode string [UNICODE] into a string of characters
    allowed in hostname labels (ASCII letters, digits, and hyphens)
    and back again.  AMC-ACE-Z is an instance of Bootstring that uses
    particular parameter values appropriate for IDNA and uses an IDNA
    signature prefix.  Bootstring allows a string of basic code points
    to uniquely represent any string of code points drawn from a larger
    set.  This document specifies Bootstring and the parameter values
    for AMC-ACE-Z.

Contents

    1. Introduction
    2. Terminology
    3. Bootstring description
        3.1 Basic code point segregation
        3.2 Insertion unsort coding
        3.3 Generalized variable-length integers
        3.4 Bias adaptation
    4. Bootstring parameters
    5. Parameter values for AMC-ACE-Z
    6. Bootstring algorithms
        6.1 Bias adaptation function
        6.2 Decoding procedure
        6.3 Encoding procedure
    7. AMC-ACE-Z example strings
    8. Security considerations
    9. References
    A. Author contact information
    B. Mixed-case annotation
    C. Sample implementation

1. Introduction

    The IDNA draft [IDNA] describes an architecture for supporting
    internationalized domain names.  Each label of a domain name may
    begin with a special prefix, in which case the remainder of the
    label is an ASCII-Compatible Encoding (ACE) of a Unicode string
    satisfying certain constraints.  For the details of the constraints,
    see [IDNA] and [NAMEPREP03].  The prefix has not yet been specified,
    but see http://www.i-d-n.net/ for prefixes to be used for testing
    and experimentation.

    Bootstring has been designed to have the following features:

      * Completeness:  Every extended string (sequence of arbitrary code
        points) can be represented by a basic string (sequence of basic
        code points).  Restrictions on what strings are allowed, and on
        length, may be imposed by higher layers.

      * Uniqueness:  Every extended string maps to at most one basic
        string.

      * Reversibility:  Any extended string mapped to a basic string can
        be recovered from that basic string.

      * Efficient encoding:  The ratio of basic string length to
        extended string length is small.  This is important in the
        context of domain names because RFC 1034 [RFC1034] restricts the
        length of a domain label to 63 characters.

      * Simplicity:  The encoding and decoding algorithms are reasonably
        simple to implement.  The goals of efficiency and simplicity are
        at odds; Bootstring aims at a good balance between them.

      * Readability:  Basic code points appearing in the extended
        string are represented as themselves in the basic string.  This
        comes for free because it makes the encoding more efficient on
        average.

    In addition, AMC-ACE-Z can support an optional feature described in
    appendix B "Mixed-case annotation".

    AMC-ACE-Z is a working name that should be changed if it is adopted.
    (The Z merely indicates that it is the twenty-sixth ACE devised by
    this author.  Most were not worth releasing.)

2. Terminology

    The key words "must", "shall", "required", "should", "recommended",
    and "may" in this document are to be interpreted as described in RFC
    2119 [RFC2119].

    As in the Unicode Standard [UNICODE], Unicode code points are
    denoted by "U+" followed by four to six hexadecimal digits, while a
    range of code points is denoted by two hexadecimal numbers separated
    by "..", with no prefixes.

    The operators div and mod perform integer division; (x div y) is the
    quotient of x divided by y, discarding the remainder, and (x mod y)
    is the remainder, so (x div y) * y + (x mod y) == x.  Bootstring
    uses these operators only with nonnegative operands, so the quotient
    and remainder are always nonnegative.

    The ?: operator is a conditional; (x ? y : z) means y if x is true,
    z if x is false.  It is just like "if x then y else z" except that y
    and z are expressions rather than statements.

    The "break" statement jumps out of the innermost loop (as in C).

3. Bootstring description

    Bootstring represents an arbitrary sequence of code points (the
    "extended string") as a sequence of basic code points (the
    "basic string").  This section describes the representation.
    Section 6 "Bootstring algorithms" presents the algorithms as
    pseudocode.  There is also commented C code in appendix C "Sample
    implementation".

3.1 Basic code point segregation

    All basic code points appearing in the extended string are
    represented literally at the beginning of the basic string, in their
    original order, followed by a delimiter if (and only if) the number
    of basic code points is nonzero.  The delimiter is a particular
    basic code point, which never appears in the remainder of the basic
    string.  The decoder can therefore find the end of the literal
    portion (if there is one) by scanning for the last delimiter.

3.2 Insertion unsort coding

    The remainder of the basic string (after the last delimiter if there
    is one) represents a sequence of nonnegative integral deltas as
    generalized variable-length integers, described in section 3.3.  The
    meaning of the deltas is best understood in terms of the decoder.

    The decoder builds the extended string incrementally.  Initially,
    the extended string is a copy of the literal portion of the basic
    string (excluding the last delimiter).  Each delta causes the
    decoder to insert a code point into the extended string according
    to the following procedure.  There are two state variables: a
    code point n, and an index i that ranges from zero (which is the
    first position of the extended string) to the current length of
    the extended string (which refers to a potential position beyond
    the current end).  The decoder advances the state monotonically
    (never returning to an earlier state) by taking steps only upward.
    Each step increments i, except when i already equals the length
    of the extended string, in which case a step resets i to zero
    and increments n.  For each delta (in order), the decoder takes
    delta steps upward, then inserts the value n into the extended
    string at position i, then increments i (to skip over the code
    point just inserted).  (An implementation should not take each
    step individually, but should instead use division and remainder
    calculations to advance by delta steps all at once.)
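
    Collapsing the steps into arithmetic works because, while the
    length of the extended string stays fixed, the pair (n, i) behaves
    like a single counter n * (length + 1) + i.  The C sketch below
    illustrates this; the names decoder_state, advance, and
    advance_slowly are hypothetical (they are not taken from appendix
    C), overflow is not checked, and the slow version is included only
    as a reference to compare against.

```c
#include <assert.h>

/* Hypothetical decoder state: the code point n to be inserted next
   and the insertion index i, where 0 <= i <= len. */
struct decoder_state {
    unsigned long n;
    unsigned long i;
};

/* Take delta upward steps at once.  A step increments i, except that
   a step from i == len resets i to 0 and increments n, so the pair
   acts like the counter n * (len + 1) + i.  Overflow is not checked. */
void advance(struct decoder_state *s, unsigned long len,
             unsigned long delta)
{
    assert(s->i <= len);
    unsigned long total = s->i + delta;  /* steps counted from i = 0 */
    s->n += total / (len + 1);           /* number of wraparounds */
    s->i = total % (len + 1);            /* resulting position */
}

/* Reference version that takes the steps one at a time. */
void advance_slowly(struct decoder_state *s, unsigned long len,
                    unsigned long delta)
{
    for (; delta > 0; delta--) {
        if (s->i == len) {
            s->i = 0;
            s->n++;
        } else {
            s->i++;
        }
    }
}
```

    For example, with a length of 3, ten steps from (n, i) = (U+00A1, 2)
    wrap i around three times and end at (U+00A4, 0).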

    The encoder's main task is to derive the sequence of deltas that
    will cause the decoder to construct the desired string.  It can do
    this by repeatedly scanning the extended string for the next code
    point that the decoder would need to insert, and counting the number
    of steps the decoder would need to take, mindful of the fact that
    the decoder will be stepping over only those code points that have
    already been inserted.  Section 6.3 "Encoding procedure" gives a
    precise algorithm.

3.3 Generalized variable-length integers

    In a conventional integer representation the base is the number of
    distinct symbols for digits, whose values are 0 through base-1.  Let
    digit_0 denote the least significant digit, digit_1 the next least
    significant, and so on.  The value represented is the sum over j of
    digit_j * w(j), where w(j) = base^j is the weight (scale factor)
    for position j.  For example, in the base 8 integer 437, the digits
    are 7, 3, and 4, and the weights are 1, 8, and 64, so the value is
    7 + 3*8 + 4*64 = 287.  This representation has two disadvantages:
    First, there are multiple encodings of each value (because there
    can be extra zeros in the most significant positions), which
    is inconvenient when unique encodings are required.  Second,
    the integer is not self-delimiting, so if multiple integers are
    concatenated the boundaries between them are lost.

    The generalized variable-length representation solves these two
    problems.  The digit values are still 0 through base-1, but now
    the integer is self-delimiting by means of thresholds t(j), each
    of which is in the range 0 through base-1.  Exactly one digit, the
    most significant, satisfies digit_j < t(j).  Therefore, if several
    integers are concatenated, it is easy to separate them, starting
    with the first if they are little-endian (least significant digit
    first), or starting with the last if they are big-endian (most
    significant digit first).  As before, the value is the sum over j of
    digit_j * w(j), but the weights are different:

        w(0) = 1
        w(j) = w(j-1) * (base - t(j-1)) for j > 0

    For example, consider the little-endian sequence of base 8 digits
    734251...  Suppose the thresholds are 2, 3, 5, 5, 5, 5...  This
    implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30,
    30*(8-5) = 90, 90*(8-5) = 270, and so on.  7 is not less than 2,
    and 3 is not less than 3, but 4 is less than 5, so 4 must be the
    last digit.  The value of 734 is 7*1 + 3*6 + 4*30 = 145.  The next
    integer is 251, with value 2*1 + 5*6 + 1*30 = 62.  Decoding this
    representation is very similar to decoding a conventional integer:
    Start with a current value of N = 0 and a weight w = 1.  Fetch the
    next digit d and increase N by d * w.  If d is less than the current
    threshold (t) then stop, otherwise increase w by a factor of
    (base - t), update t for the next position, and repeat.
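
    The decoding loop just described might be written in C as follows.
    This is a sketch under stated assumptions: decode_gvli is a
    hypothetical name, the thresholds come from a caller-supplied table
    (as in the worked example) rather than being computed from bias,
    and bounds and overflow checks are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one little-endian generalized variable-length integer
   starting at digits[*pos], using threshold t[j] for position j.
   Returns the value N and advances *pos past the last digit read. */
unsigned long decode_gvli(const int *digits, size_t *pos, int base,
                          const int *t)
{
    unsigned long value = 0, w = 1;      /* N and w in the text */
    for (size_t j = 0;; j++) {
        int d = digits[(*pos)++];
        assert(d >= 0 && d < base);
        value += (unsigned long)d * w;
        if (d < t[j])                    /* most significant digit */
            break;
        w *= (unsigned long)(base - t[j]);
    }
    return value;
}
```

    Called twice on the digits 7, 3, 4, 2, 5, 1 with base 8 and the
    thresholds 2, 3, 5, ..., it returns 145 and then 62, matching the
    example above.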

    Encoding this representation is similar to encoding a conventional
    integer:  If N < t then output one digit for N and stop, otherwise
    output the digit for t + ((N - t) mod (base - t)), then replace N
    with (N - t) div (base - t), update t for the next position, and
    repeat.
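
    A matching C sketch of the encoding rule follows.  The name
    encode_gvli is hypothetical, and the caller is assumed to supply
    threshold and output arrays long enough for the value being encoded.

```c
#include <assert.h>
#include <stddef.h>

/* Encode value as a little-endian generalized variable-length integer
   with threshold t[j] for position j, storing digit values in out[].
   Returns the number of digits written. */
size_t encode_gvli(unsigned long value, int base, const int *t, int *out)
{
    size_t j = 0;
    for (;;) {
        assert(t[j] >= 0 && t[j] < base);
        if (value < (unsigned long)t[j])
            break;                       /* N < t: last digit fits */
        unsigned long span = (unsigned long)(base - t[j]);
        out[j] = t[j] + (int)((value - (unsigned long)t[j]) % span);
        value = (value - (unsigned long)t[j]) / span;
        j++;
    }
    out[j] = (int)value;                 /* most significant digit */
    return j + 1;
}
```

    With the thresholds of the earlier example, 145 encodes to the
    digits 7, 3, 4 and 62 encodes to 2, 5, 1.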

    For any particular set of values of t(j), there is exactly one
    generalized variable-length representation of each nonnegative
    integral value.

    Bootstring uses little-endian ordering so that the deltas can be
    separated starting with the first.  The t(j) values are defined in
    terms of the constants base, tmin, and tmax, and a state variable
    called bias:

        t(j) = base * (j + 1) - bias,
        clamped to the range tmin through tmax

    (The clamping means that if the formula yields a value less than
    tmin or greater than tmax, then t(j) = tmin or tmax, respectively.)
    These t(j) values cause the representation to favor integers within
    a particular range determined by the bias.
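
    The formula translates directly into C.  In this sketch the name
    threshold is hypothetical, and signed arithmetic is used because
    base * (j + 1) - bias can be negative before clamping.

```c
#include <assert.h>

/* t(j) = base * (j + 1) - bias, clamped to the range tmin..tmax. */
long threshold(long j, long bias, long base, long tmin, long tmax)
{
    assert(tmin <= tmax);
    long t = base * (j + 1) - bias;
    if (t < tmin)
        return tmin;
    if (t > tmax)
        return tmax;
    return t;
}
```

    With the AMC-ACE-Z values of section 5 (base 36, tmin 1, tmax 26)
    and bias 72, this gives t(0) = 1, t(1) = 1, and t(2) = 26.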

3.4 Bias adaptation

    After each delta is encoded or decoded, bias is set for the next
    delta as follows:

     1. Delta is scaled in order to avoid overflow in the next step:

            let delta = delta div 2

        But when this is the very first delta, the divisor is not 2, but
        instead a constant called damp.  This compensates for the fact
        that the second delta is usually much smaller than the first.

     2. Delta is increased to compensate for the fact that the next
        delta will be inserting into a longer string:

            let delta = delta + (delta div numpoints)

        numpoints is the total number of code points encoded/decoded so
        far (including the one corresponding to this delta itself, and
        including the basic code points).

     3. Delta is repeatedly divided until it falls within a threshold,
        to predict the minimum number of digits needed to represent the
        next delta:

            while delta > ((base - tmin) * tmax) div 2
            do let delta = delta div (base - tmin)

     4. The bias is set:

            let bias =
              (base * the number of divisions performed in step 3) +
              (((base - tmin + 1) * delta) div (delta + skew))

    The motivation for this procedure is that the current delta provides
    a hint about the likely size of the next delta, and so t(j) is
    set to tmax for the more significant digits starting with the one
    expected to be last, tmin for the less significant digits up through
    the one expected to be third-last, and somewhere between tmin and
    tmax for the digit expected to be second-last (balancing the hope of
    the expected-last digit being unnecessary against the danger of it
    being insufficient).

4. Bootstring parameters

    Given a set of basic code points, one must be chosen as the
    delimiter.  The base is the number of distinguishable basic code
    points remaining.  They must be associated with the values in the
    range 0 through base-1.  In some cases multiple code points must
    represent the same value; for example, uppercase and lowercase
    versions of a letter must be equivalent if basic strings are
    case-insensitive.

    The initial value of n should be the minimum non-basic code point
    that is allowed in extended strings.

    The remaining five parameters (tmin, tmax, skew, damp, and the
    initial value of bias) must satisfy the following constraints:

        0 <= tmin <= tmax <= base-1
        skew >= 1
        damp >= 2
        initial_bias mod base <= base - tmin

    Provided the constraints are satisfied, these five parameters affect
    efficiency but not correctness.  They should be chosen empirically.
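
    These constraints are easy to check mechanically, for instance when
    experimenting with candidate values.  In this C sketch the function
    name is hypothetical, and the extra test initial_bias >= 0 is an
    assumption added so that mod is applied only to nonnegative
    operands, as section 2 requires.

```c
#include <assert.h>
#include <stdbool.h>

/* True if tmin, tmax, skew, damp, and initial_bias satisfy the
   constraints above for the given base.  The initial_bias >= 0 test
   is an extra assumption of this sketch. */
bool bootstring_params_ok(long base, long tmin, long tmax, long skew,
                          long damp, long initial_bias)
{
    assert(base > 0);
    return 0 <= tmin && tmin <= tmax && tmax <= base - 1
        && skew >= 1
        && damp >= 2
        && initial_bias >= 0
        && initial_bias % base <= base - tmin;
}
```

    The AMC-ACE-Z values in section 5 pass; for example 72 mod 36 = 0,
    which does not exceed 36 - 1.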

    If support for mixed-case annotation is desired (see appendix B),
    make sure that the code points corresponding to 0 through tmax-1 all
    have both uppercase and lowercase forms.

5. Parameter values for AMC-ACE-Z

    AMC-ACE-Z uses the following values for the Bootstring parameters:

        base         = 36
        tmin         = 1
        tmax         = 26
        skew         = 38
        damp         = 700
        initial_bias = 72
        initial_n    = U+00A1

    In AMC-ACE-Z, code points are Unicode code points [UNICODE], that
    is, integers in the range 0..10FFFF, but not D800..DFFF, which are
    reserved for use by UTF-16.  The basic code points, along with their
    values, are:

        U+002D (-)   = delimiter
        41..5A (A-Z) = 0 to 25, respectively
        61..7A (a-z) = 0 to 25, respectively
        30..39 (0-9) = 26 to 35, respectively

    Using hyphen-minus as the delimiter implies that the ACE can end
    with a hyphen-minus only if the Unicode string consists entirely
    of basic code points, but IDNA forbids such strings from being
    ACE-encoded.  And since IDNA prepends a prefix that does not begin
    with a hyphen-minus, AMC-ACE-Z conforms to the RFC 952 requirement
    that hostname labels neither begin nor end with a hyphen-minus
    [RFC952].

    A decoder must recognize the letters in both uppercase and lowercase
    forms (including mixtures of both forms).  An encoder should output
    only uppercase forms or only lowercase forms, unless it uses
    mixed-case annotation (see appendix B).
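
    The value table and the case rules above can be captured in two
    small C helpers.  The function names are hypothetical, and the
    encoder-side helper emits only lowercase letters, which is one of
    the two choices permitted here when mixed-case annotation is not
    used.  Hexadecimal constants are used so that the code does not
    depend on the execution character set.

```c
#include <assert.h>

/* Digit value of a basic code point, accepting letters in either
   case, or -1 for the delimiter and any other code point. */
int amc_z_digit_value(unsigned long cp)
{
    if (cp >= 0x41 && cp <= 0x5A) return (int)(cp - 0x41);       /* A-Z */
    if (cp >= 0x61 && cp <= 0x7A) return (int)(cp - 0x61);       /* a-z */
    if (cp >= 0x30 && cp <= 0x39) return (int)(cp - 0x30) + 26;  /* 0-9 */
    return -1;
}

/* Code point for a digit value 0..35; letters are always lowercase. */
unsigned long amc_z_digit_code_point(int d)
{
    assert(d >= 0 && d < 36);
    return d < 26 ? 0x61UL + (unsigned long)d
                  : 0x30UL + (unsigned long)(d - 26);
}
```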

    Presumably most users will not manually copy ACEs by writing or
    typing them (as opposed to letting computers do it via cut & paste),
    but those that do will need to be alert to the potential visual
    ambiguity between the following sets of characters:

        G 6
        I l 1
        O 0
        S 5
        U V
        Z 2

    Such ambiguities are usually resolved by context, but in an ACE
    there is no context apparent to humans.

6. Bootstring algorithms

6.1 Bias adaptation function

    function adapt(delta,numpoints,firsttime):
      let delta = delta div (firsttime ? damp : 2)
      let delta = delta + (delta div numpoints)
      let k = 0
      while delta > ((base - tmin) * tmax) div 2
      do let delta = delta div (base - tmin) and let k = k + base
      return k + (((base - tmin + 1) * delta) div (delta + skew))
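
    A direct C transcription of adapt, with the AMC-ACE-Z parameter
    values of section 5 fixed as constants, might look like this.  It
    is a sketch rather than the appendix C code, and it assumes that
    unsigned long is wide enough for the deltas involved.

```c
#include <assert.h>
#include <stdbool.h>

/* AMC-ACE-Z parameter values from section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700 };

/* Bias adaptation function, transcribed from the pseudocode above. */
unsigned long adapt(unsigned long delta, unsigned long numpoints,
                    bool firsttime)
{
    unsigned long k = 0;

    assert(numpoints > 0);
    delta /= firsttime ? DAMP : 2;        /* scale down */
    delta += delta / numpoints;           /* allow for longer string */
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;             /* each division adds base */
        k += BASE;
    }
    return k + ((BASE - TMIN + 1) * delta) / (delta + SKEW);
}
```

    For instance, adapt(1000, 2, false) halves 1000 to 500, adds
    500 div 2 = 250, divides 750 once by 35 to get 21 (so k = 36), and
    returns 36 + (36 * 21) div (21 + 38) = 48.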

6.2 Decoding procedure

    let n = initial_n
    let i = 0
    let bias = initial_bias
    let output = an empty string indexed from 0
    search the input for the last delimiter (do not consume the input)
    if one is found that is not at the very beginning then consume all
      preceding code points, copy them to output, consume the delimiter
    while the input is not exhausted do begin
      let oldi = i
      let w = 1
      for k = base to infinity in steps of base do begin
        consume a code point, fail on end-of-input or invalid code point
        let digit = the code point's value
        let i = i + digit * w, fail on overflow
        let t = k <= bias ? tmin : k - bias > tmax ? tmax : k - bias
        if digit < t then break
        let w = w * (base - t), fail on overflow
      end
      let bias = adapt(i - oldi, length(output) + 1, oldi == 0)
      let n = n + i div (length(output) + 1), fail on overflow
      let i = i mod (length(output) + 1)
      if n is a basic code point then fail  # see Note1 below
      insert n into output at position i
      increment i
    end

    Note1:  The check for whether n is a basic code point can be omitted
    if initial_n exceeds all basic code points (which is true for
    AMC-ACE-Z), because n only increases from initial_n.
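
    Putting the pieces together, the decoding procedure might be
    rendered in C roughly as follows.  This is a hedged sketch, not the
    appendix C sample implementation: the name amc_z_decode and its
    interface are hypothetical, the input is assumed to have had the
    IDNA prefix removed, mixed-case annotation is ignored, the
    basic-code-point test is omitted as Note1 allows, and code points
    above 10FFFF are not rejected.  The overflow checks use the
    technique given later in this section, with maxint = ULONG_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* AMC-ACE-Z parameter values (section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0xA1, DELIMITER = 0x2D };

/* Bias adaptation function (section 6.1). */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           bool firsttime)
{
    unsigned long k = 0;
    delta /= firsttime ? DAMP : 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + ((BASE - TMIN + 1) * delta) / (delta + SKEW);
}

/* Digit value of a basic code point in either case, or -1. */
static int digit_value(unsigned char c)
{
    if (c >= 0x41 && c <= 0x5A) return c - 0x41;
    if (c >= 0x61 && c <= 0x7A) return c - 0x61;
    if (c >= 0x30 && c <= 0x39) return c - 0x30 + 26;
    return -1;
}

/* Decode in[0..inlen) into out[], which has room for outcap code
   points.  Returns the number of code points produced, or -1 on
   invalid input, overflow, or lack of room. */
long amc_z_decode(const char *in, size_t inlen,
                  unsigned long *out, size_t outcap)
{
    unsigned long n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
    size_t outlen = 0, pos = 0, last = 0;

    assert(in != NULL && out != NULL);
    /* Copy the literal portion unless the last delimiter is absent or
       at the very beginning. */
    for (size_t j = 0; j < inlen; j++)
        if (in[j] == DELIMITER) last = j;
    if (last > 0) {
        if (last > outcap) return -1;
        for (size_t j = 0; j < last; j++) {
            if ((unsigned char)in[j] >= 0x80) return -1;
            out[outlen++] = (unsigned char)in[j];
        }
        pos = last + 1;
    }

    while (pos < inlen) {
        unsigned long oldi = i, w = 1;
        for (unsigned long k = BASE;; k += BASE) {
            if (pos >= inlen) return -1;              /* end of input */
            int digit = digit_value((unsigned char)in[pos++]);
            if (digit < 0) return -1;                 /* invalid */
            if ((unsigned long)digit > (ULONG_MAX - i) / w) return -1;
            i += (unsigned long)digit * w;
            unsigned long t = k <= bias ? TMIN
                            : k - bias > TMAX ? TMAX : k - bias;
            if ((unsigned long)digit < t) break;
            if (w > ULONG_MAX / (BASE - t)) return -1;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, outlen + 1, oldi == 0);
        if (i / (outlen + 1) > ULONG_MAX - n) return -1;
        n += i / (outlen + 1);
        i %= outlen + 1;
        if (outlen >= outcap) return -1;
        memmove(out + i + 1, out + i, (outlen - i) * sizeof *out);
        out[i++] = n;
        outlen++;
    }
    return (long)outlen;
}
```

    Tracing the procedure by hand, "ba" decodes to U+00A2, "abc-da" to
    a, b, c, U+00A1, and "abc-a" to U+00A1, a, b, c.  The longer strings
    in section 7 are natural additional test inputs.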

    Because the decoder state can only advance monotonically, and there
    is only one representation of any delta, there is only one
    encoded string that can represent a given sequence of integers.
    The only error conditions are invalid code points, unexpected
    end-of-input, overflow (attempts to compute values that exceed the
    maximum value of an integer variable), and basic code points encoded
    using deltas instead of appearing literally.  If the decoder fails
    on these errors as shown above, then it cannot produce the same
    output for two distinct inputs, and hence it need not re-encode its
    output to verify that it matches the input.

    The assignment of t, where t is clamped to the range tmin through
    tmax, does not handle the case where bias < k < bias + tmin, but
    that is impossible because of the way bias is computed and because
    of the constraints on the parameters.

    If the programming language does not provide overflow detection,
    the following technique can be used.  Suppose A, B, and C are
    representable nonnegative integers and C is nonzero.  Then A + B
    overflows if and only if B > maxint - A, and A + (B * C) overflows
    if and only if B > (maxint - A) div C.  See appendix C "Sample
    implementation" for demonstrations of this technique in AMC-ACE-Z.
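
    The two tests can be written as small C predicates that never
    compute the possibly overflowing result.  Taking maxint to be
    UINT_MAX, and the function names themselves, are assumptions of
    this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a + b would exceed UINT_MAX. */
bool add_overflows(unsigned a, unsigned b)
{
    return b > UINT_MAX - a;
}

/* True if a + b * c would exceed UINT_MAX; c must be nonzero. */
bool mul_add_overflows(unsigned a, unsigned b, unsigned c)
{
    assert(c != 0);
    return b > (UINT_MAX - a) / c;
}
```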

6.3 Encoding procedure

    let n = initial_n
    let delta = 0
    let bias = initial_bias
    let h = b = the number of basic code points in the input
    copy them to the output in order, followed by a delimiter if b > 0
    if the input contains a non-basic code point < n then fail
    while h < length(input) do begin
      let m = the minimum non-basic code point >= n in the input # Note2
      let delta = delta + (m - n) * (h + 1), fail on overflow
      let n = m
      for each integer m in the input (in order) do begin
        if m is a basic code point  # see Note2 below
        then increment delta, fail on overflow, and continue
        if m < n then increment delta, fail on overflow
        if m == n then begin
          let q = delta
          for k = base to infinity in steps of base do begin
            let t = k <= bias ? tmin : k - bias > tmax ? tmax : k - bias
            if q < t then break
            output the code point for digit t + ((q - t) mod (base - t))
            let q = (q - t) div (base - t)
          end
          output the code point for digit q
          let bias = adapt(delta, h + 1, h == b)
          let delta = 0
          increment h
        end
      end
      increment delta and n
    end

    Note2:  There are two places in the main loop where the encoder
    checks whether a code point is basic.  If initial_n exceeds all
    basic code points (which is true for AMC-ACE-Z) then m and n can
    never be basic code points, and the logic can be simplified.
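
    A C rendering of the encoding procedure might look roughly like the
    following.  As with any sketch here, it is illustrative rather than
    the appendix C code: the name amc_z_encode and its interface are
    hypothetical, output letters are always lowercase, the IDNA prefix
    is not added, and the basic-code-point test inside the loop is
    folded into the m < n test as Note2 allows (every basic code point
    is below initial_n).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* AMC-ACE-Z parameter values (section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0xA1, DELIMITER = 0x2D };

/* Bias adaptation function (section 6.1). */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           bool firsttime)
{
    unsigned long k = 0;
    delta /= firsttime ? DAMP : 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + ((BASE - TMIN + 1) * delta) / (delta + SKEW);
}

/* Basic code points: hyphen-minus, digits, and ASCII letters. */
static bool is_basic(unsigned long c)
{
    return c == DELIMITER || (c >= 0x30 && c <= 0x39)
        || (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

/* Lowercase code point for a digit value 0..35. */
static char digit_char(unsigned long d)
{
    return (char)(d < 26 ? 0x61 + d : 0x30 + (d - 26));
}

/* Encode in[0..inlen) into out (no terminating NUL), using at most
   outcap characters.  Returns the output length, or -1 on a
   disallowed code point, overflow, or lack of room. */
long amc_z_encode(const unsigned long *in, size_t inlen,
                  char *out, size_t outcap)
{
    unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t b = 0, h, outlen = 0;

    assert(in != NULL && out != NULL);
    for (size_t j = 0; j < inlen; j++) {
        if (is_basic(in[j])) {
            if (outlen >= outcap) return -1;
            out[outlen++] = (char)in[j];
            b++;
        } else if (in[j] < n) {
            return -1;                    /* non-basic code point < n */
        }
    }
    if (b > 0) {
        if (outlen >= outcap) return -1;
        out[outlen++] = (char)DELIMITER;
    }
    h = b;
    while (h < inlen) {
        unsigned long m = ULONG_MAX;      /* smallest code point >= n */
        for (size_t j = 0; j < inlen; j++)
            if (in[j] >= n && in[j] < m) m = in[j];
        if (m - n > (ULONG_MAX - delta) / (h + 1)) return -1;
        delta += (m - n) * (h + 1);
        n = m;
        for (size_t j = 0; j < inlen; j++) {
            if (in[j] < n) {              /* includes basic code points */
                if (++delta == 0) return -1;
            } else if (in[j] == n) {
                unsigned long q = delta;
                for (unsigned long k = BASE;; k += BASE) {
                    unsigned long t = k <= bias ? TMIN
                                    : k - bias > TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (outlen >= outcap) return -1;
                    out[outlen++] = digit_char(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (outlen >= outcap) return -1;
                out[outlen++] = digit_char(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    return (long)outlen;
}
```

    Tracing the procedure by hand, the single code point U+00A2 encodes
    as "ba" (a delta of 1 yields the digits 1 and 0), and the sequence
    a, b, c, U+00A1 encodes as "abc-da".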

    The checks for overflow are necessary to avoid producing invalid
    output when the input contains very large values or is very long.
    Wider integer variables can handle more extreme inputs.  For
    AMC-ACE-Z, 26-bit unsigned integers are sufficient, because in
    IDNA code points are limited to 0..10FFFF and ACEs are limited to
    59 characters (excluding the prefix).

    The increment of delta at the bottom of the outer loop cannot
    overflow because delta < length(input) before the increment, and
    length(input) is already assumed to be representable.  The increment
    of n could overflow, but only if h == length(input), in which case
    the procedure is finished anyway.

7. AMC-ACE-Z example strings

    In the AMC-ACE-Z encodings below, the IDNA signature prefix is not
    shown.  AMC-ACE-Z is abbreviated AMC-Z.  Backslashes show where line
    breaks have been inserted in strings too long for one line.

    The first several examples are all translations of the sentence "Why
    can't they just speak in <language>?" (courtesy of Michael Kaplan's
    "provincial" page [PROVINCIAL]).  Word breaks and punctuation have
    been removed, as is often done in domain names.

    (A) Arabic (Egyptian):
        u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
        u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
        AMC-Z:  gfbpdaj6bu4bxfgehfvwxn

    (B) Chinese (simplified):
        u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
        AMC-Z:  kgqwcrb4cv8a8dqg056pqjye

    (C) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky
        U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074
        u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D
        u+0065 u+0073 u+006B u+0079
        AMC-Z:  Proprostnemluvesky-xgb24dma41a

    (D) Hebrew:
        u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8
        u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2
        u+05D1 u+05E8 u+05D9 u+05EA
        AMC-Z:  6cbcagdahymbxekheh6e0a7fei0b

    (E) Hindi (Devanagari):
        u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D
        u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939
        u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947
        u+0939 u+0948 u+0902
        AMC-Z:  k0baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd

    (F) Japanese (kanji and hiragana):
        u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092
        u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B
        AMC-Z:  p7jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa

    (G) Korean (Hangul syllables):
        u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774
        u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74
        u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C
        AMC-Z:  c89aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5jps\
                d879ccm6fea98c

    (H) Russian (Cyrillic):
        U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
        u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
        u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
        u+0438
        AMC-Z:  d0abfaaepdrnnbgefbaDotcwatmq2g4l

    (I) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol
        U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070
        u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070
        u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061
        u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070
        u+0061 u+00F1 u+006F u+006C
        AMC-Z:  PorqunopuedensimplementehablarenEspaol-nkc56a

    (J) Taiwanese:
        u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
        AMC-Z:  kgqwctvzc91f659drss3x8bo0yb

    (K) Vietnamese:
        T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\
        <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t
        U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B
        u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068
        u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067
        U+0056 u+0069 u+1EC7 u+0074
        AMC-Z:  TisaohkhngthchnitingVit-xvbr8268qyxafd2f1b9g

    The next several examples are all names of Japanese music artists,
    song titles, and TV programs, just because the author happens to
    have them handy (but Japanese is useful for providing examples
    of single-row text, two-row text, ideographic text, and various
    mixtures thereof).

580	    (L) 3B
581	        u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F
582	        AMC-Z:  3B-2t4c5e180e575a65lsy2b

584	    (M) -with-SUPER-MONKEYS
585	        u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
586	        u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D
587	        U+004F U+004E U+004B U+0045 U+0059 U+0053
588	        AMC-Z:  -with-SUPER-MONKEYS-us48ag80a8qai00g7n9n

    (N) Hello-Another-Way-それぞれの場所
        U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F
        u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D
        u+305D u+308C u+305E u+308C u+306E u+5834 u+6240
        AMC-Z:  Hello-Another-Way--it3qua05auwb3674vfr0b

    (O) ひとつ屋根の下2
        u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032
        AMC-Z:  2-y7tlzr9756bt3uc0v

600	    (P) MajiKoi5
601	        U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059
602	        u+308B u+0035 u+79D2 u+524D
603	        AMC-Z:  MajiKoi5-q03gue6qz075azm5e
604	    (Q) de
605	        u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
606	        AMC-Z:  de-pd4avhby1noc0d

    (R) そのスピードで
        u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
        AMC-Z:  f8juau41awczczp

8. Security considerations

    Users expect each domain name in DNS to be controlled by a single
    authority.  If a Unicode string intended for use as a domain label
    could map to multiple ACE labels, then an internationalized domain
    name could map to multiple ACE domain names, each controlled by
    a different authority, some of which could be spoofs that hijack
    service requests intended for another authority.  Therefore
    AMC-ACE-Z is designed so that each Unicode string has a unique
    encoding.

    However, there can still be multiple Unicode representations of the
    "same" text, for various definitions of "same".  This problem is
    addressed to some extent by the Unicode standard under the topic of
    canonicalization, and this work is leveraged for domain names by
    "nameprep" [NAMEPREP03].

References

    [IDN] Internationalized Domain Names (IETF working group),
    http://www.i-d-n.net/, idn@ops.ietf.org.

    [IDNA] Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
    Names In Applications (IDNA)", 2001-Jun-16, draft-ietf-idn-idna-02.

    [NAMEPREP03] Paul Hoffman, Marc Blanchet, "Preparation
    of Internationalized Host Names", 2001-Feb-24,
    draft-ietf-idn-nameprep-03.

    [PROVINCIAL] Michael Kaplan, "The 'anyone can be provincial!' page",
    http://www.trigeminal.com/samples/provincial.html.

    [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
    Table Specification", 1985-Oct, RFC 952.

    [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
    1987-Nov, RFC 1034.

    [UNICODE] The Unicode Consortium, "The Unicode Standard",
    http://www.unicode.org/unicode/standard/standard.html.

A. Author contact information

    Adam M. Costello
    University of California, Berkeley
    http://www.cs.berkeley.edu/~amc/

B. Mixed-case annotation

    In order to use AMC-ACE-Z to represent case-insensitive strings,
    higher layers need to case-fold the strings prior to AMC-ACE-Z
    encoding.  The encoded string can, however, use mixed case as an
    annotation telling how to convert the original folded string into a
    mixed-case string for display purposes.

    Basic code points are represented literally, and can therefore use
    mixed case directly.  Each non-basic code point is represented by
    a delta, which is represented by a sequence of basic code points,
    the last of which provides the annotation.  If it is uppercase,
    it is a suggestion to map the non-basic code point to uppercase
    (if possible); if it is lowercase, it is a suggestion to map the
    non-basic code point to lowercase (if possible).

    AMC-ACE-Z encoders and decoders are not required to support these
    annotations, and higher layers need not use them.

C. Sample implementation

/******************************************/
/* amc-ace-z.c 0.2.1 (2001-Jul-11-Wed)    */
/* Adam M. Costello  */
/******************************************/

/* This is ANSI C code (C89) implementing AMC-ACE-Z version 0.2.x. */

/************************************************************/
/* Public interface (would normally go in its own .h file): */

#include <limits.h>

enum amc_ace_status {
  amc_ace_success,
  amc_ace_bad_input,   /* Input is invalid.                         */
  amc_ace_big_output,  /* Output would exceed the space provided.   */
  amc_ace_overflow     /* Input requires wider integers to process. */
};

#if UINT_MAX >= (1 << 26) - 1
typedef unsigned int amc_ace_z_uint;
#else
typedef unsigned long amc_ace_z_uint;
#endif

enum amc_ace_status amc_ace_z_encode(
  amc_ace_z_uint input_length,
  const amc_ace_z_uint input[],
  const unsigned char uppercase_flags[],
  amc_ace_z_uint *output_size,
  char output[] );
    /* amc_ace_z_encode() converts Unicode to AMC-ACE-Z (without      */
    /* any signature).  The input must be represented as an array     */
    /* of Unicode code points (not code units; surrogate pairs        */
    /* are not allowed), and the output will be represented as        */
    /* null-terminated ASCII.  The input_length is the number of      */
    /* code points in the input.  The output_size is an in/out        */
    /* argument: the caller must pass in the maximum number of        */
    /* characters that may be output (including the terminating       */
    /* null), and on successful return it will contain the number of  */
    /* characters actually output (including the terminating null,    */
    /* so it will be one more than strlen() would return, which is    */
    /* why it is called output_size rather than output_length).  The  */
    /* uppercase_flags array must hold input_length boolean values,   */
    /* where nonzero means the corresponding Unicode character should */
    /* be forced to uppercase after being decoded, and zero means it  */
    /* is caseless or should be forced to lowercase.  Alternatively,  */
    /* uppercase_flags may be a null pointer, which is equivalent     */
    /* to all zeros.  The letters a-z and A-Z are always encoded      */
    /* literally, regardless of the corresponding flags.  The return  */
    /* value may be any of the amc_ace_status values defined above;   */
    /* if not amc_ace_success, then output_size and output may        */
    /* contain garbage.                                               */

enum amc_ace_status amc_ace_z_decode(
  const char input[],
  amc_ace_z_uint *output_length,
  amc_ace_z_uint output[],
  unsigned char uppercase_flags[] );

    /* amc_ace_z_decode() converts AMC-ACE-Z (without any signature)  */
    /* to Unicode.  The input must be represented as null-terminated  */
    /* ASCII, and the output will be represented as an array of       */
    /* Unicode code points.  The output_length is an in/out argument: */
    /* the caller must pass in the maximum number of code points      */
    /* that may be output, and on successful return it will contain   */
    /* the actual number of code points output.  The uppercase_flags  */
    /* array must have room for at least output_length values, or it  */
    /* may be a null pointer if the case information is not needed.   */
    /* A nonzero flag indicates that the corresponding Unicode        */
    /* character should be forced to uppercase by the caller, while   */
    /* zero means it is caseless or should be forced to lowercase.    */
    /* The letters a-z and A-Z are output already in the proper case, */
    /* but their flags will be set appropriately so that applying the */
    /* flags would be harmless.  The return value may be any of the   */
    /* amc_ace_status values defined above; if not amc_ace_success,   */
    /* then output_length, output, and uppercase_flags may contain    */
    /* garbage.  On success, the decoder will never need to write     */
    /* an output_length greater than the length of the input (not     */
    /* counting the null terminator), because of how the encoding is  */
    /* defined.                                                       */

/**********************************************************/
/* Implementation (would normally go in its own .c file): */

#include <string.h>

/*** Bootstring parameters for AMC-ACE-Z ***/

enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
       initial_bias = 72, initial_n = 0xA1, delimiter = 0x2D };

/* encode_digit(d) returns the basic code point whose value  */
/* (when used for representing integers) is d, which must be */
/* in the range 0 to base-1.  The lowercase form is used.    */

static char encode_digit(amc_ace_z_uint d)
{
  return d + 22 + 75 * (d < 26);
  /*  0..25 map to ASCII a..z */
  /* 26..35 map to ASCII 0..9 */
}

/* decode_digit(cp) returns the numeric value of a basic code point */
/* (for use in representing integers) in the range 0 to base-1, or  */
/* base if cp is the delimiter, or base+1 otherwise.                */

static amc_ace_z_uint decode_digit(amc_ace_z_uint cp)
{
  return  cp - 48 < 10 ? cp - 22 :  cp - 65 < 26 ? cp - 65 :
          cp - 97 < 26 ? cp - 97 :  cp == delimiter ? base :  base + 1;
}

/*** Useful constants ***/

/* maxint is the maximum value of an amc_ace_z_uint variable: */
static const amc_ace_z_uint maxint = -1;

/* lobase and cutoff are used in the calculation of bias: */
enum { lobase = base - tmin, cutoff = lobase * tmax / 2 };

/*** Main encode function ***/

enum amc_ace_status amc_ace_z_encode(
  amc_ace_z_uint input_length,
  const amc_ace_z_uint input[],
  const unsigned char uppercase_flags[],
  amc_ace_z_uint *output_size,
  char output[] )
{
  amc_ace_z_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;
  char shift;

  /* Initialize the state: */

  n = initial_n;
  delta = out = 0;
  max_out = *output_size;
  bias = initial_bias;

  /* Handle the basic code points, and make sure     */
  /* that all code points < n are basic code points: */

  for (j = 0;  j < input_length;  ++j) {
    if (decode_digit(input[j]) <= base) {
      if (max_out - out < 2) return amc_ace_big_output;
      output[out++] = input[j];
    }
    else if (input[j] < n) return amc_ace_bad_input;
  }

  h = b = out;

  /* h is the number of code points that have been handled, b is the  */
  /* number of basic code points, and out is the number of characters */
  /* that have been output.                                           */

  if (b > 0) output[out++] = delimiter;

  /* Main encoding loop: */

  while (h < input_length) {
    /* All non-basic code points < n have been     */
    /* handled already.  Find the next larger one: */

    for (m = maxint, j = 0;  j < input_length;  ++j) {
      /* not needed for AMC-ACE-Z: */
      /* if (decode_digit(input[j]) <= base) continue; */
      if (input[j] >= n && input[j] < m) m = input[j];
    }

    /* Increase delta enough to advance the decoder's    */
    /* <n,i> state to <m,0>, but guard against overflow: */

    if (m - n > (maxint - delta) / (h + 1)) return amc_ace_overflow;
    delta += (m - n) * (h + 1);
    n = m;

    for (j = 0;  j < input_length;  ++j) {
      /* Not needed for AMC-ACE-Z: */
      #if 0
      if (decode_digit(input[j]) <= base) {
        if (++delta == 0) return amc_ace_overflow;
        continue;
      }
      #endif

      if (input[j] < n && ++delta == 0) return amc_ace_overflow;

      if (input[j] == n) {
        /* Represent delta as a generalized variable-length integer: */
        for (q = delta, k = base;  ;  k += base) {
          if (out >= max_out) return amc_ace_big_output;
          t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
          if (q < t) break;
          output[out++] = encode_digit(t + (q - t) % (base - t));
          q = (q - t) / (base - t);
        }

        shift = uppercase_flags && uppercase_flags[j] ? 32 : 0;
        /* shift controls the case of the terminal character: */
        output[out++] = encode_digit(q) - shift;

        /* Adapt the bias: */
        delta = h == b ? delta / damp : delta >> 1;
        delta += delta / (h + 1);
        for (bias = 0;  delta > cutoff;  bias += base) delta /= lobase;
        bias += (lobase + 1) * delta / (delta + skew);

        delta = 0;
        ++h;
      }
    }

    ++delta, ++n;
  }

  /* Append the null terminator: */
  if (out >= max_out) return amc_ace_big_output;
  output[out++] = 0;

  *output_size = out;
  return amc_ace_success;
}

/*** Main decode function ***/

enum amc_ace_status amc_ace_z_decode(
  const char input[],
  amc_ace_z_uint *output_length,
  amc_ace_z_uint output[],
  unsigned char uppercase_flags[] )
{
  amc_ace_z_uint n, out, i, oldi, max_out, bias, w, k, delta, digit, t;
  const char *in, *p;

  /* Initialize the state: */

  n = initial_n;
  out = i = 0;
  max_out = *output_length;
  bias = initial_bias;

  /* Handle the basic code points:  Let p point to the last */
  /* delimiter, or to the start if there is none, then copy */
  /* everything before p to the output.                     */

  for (p = in = input;  *in;  ++in) if (*in == delimiter) p = in;
  if (p - input > max_out) return amc_ace_big_output;
  for (in = input;  in < p;  ++in) {
    if (uppercase_flags) uppercase_flags[out] = *in >= 65 && *in <= 90;
    output[out++] = *in;
  }

  /* Main decoding loop:  Start just after p if any basic code */
  /* points were copied; start at the beginning otherwise.     */

  for (in = p > input ? p + 1 : input;  *in != 0;  ++out) {

    /* in points to the next character to be consumed, and   */
    /* out is the number of code points in the output array. */

    /* Decode a generalized variable-length integer into delta,  */
    /* which gets added to i.  The overflow checking is easier   */
    /* if we increase i as we go, then subtract off its starting */
    /* value at the end to obtain delta.                         */

    for (oldi = i, w = 1, k = base;  ;  k += base) {
      digit = decode_digit(*in++);
      if (digit >= base) return amc_ace_bad_input;
      if (digit > (maxint - i) / w) return amc_ace_overflow;
      i += digit * w;
      t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
      if (digit < t) break;
      if (w > maxint / (base - t)) return amc_ace_overflow;
      w *= (base - t);
    }

    /* Adapt the bias: */
    delta = oldi == 0 ? i / damp : (i - oldi) >> 1;
    delta += delta / (out + 1);
    for (bias = 0;  delta > cutoff;  bias += base) delta /= lobase;
    bias += (lobase + 1) * delta / (delta + skew);

    /* i was supposed to wrap around from out+1 to 0,   */
    /* incrementing n each time, so we'll fix that now: */

    if (i / (out + 1) > maxint - n) return amc_ace_overflow;
    n += i / (out + 1);
    i %= (out + 1);

    /* Insert n at position i of the output: */

    /* not needed for AMC-ACE-Z: */
    /* if (decode_digit(n) <= base) return amc_ace_bad_input; */
    if (out >= max_out) return amc_ace_big_output;

    if (uppercase_flags) {
      memmove(uppercase_flags + i + 1, uppercase_flags + i, out - i);
      /* Case of last character determines uppercase flag: */
      uppercase_flags[i] = in[-1] >= 65 && in[-1] <= 90;
    }

    memmove(output + i + 1, output + i, (out - i) * sizeof *output);
    output[i++] = n;
  }
  *output_length = out;
  return amc_ace_success;
}

/******************************************************************/
/* Wrapper for testing (would normally go in a separate .c file): */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a  */
/* command-line option.                                             */

enum {
  unicode_max_length = 256,
  ace_max_size = 256
};

static void usage(char **argv)
{
  fprintf(stderr,
    "%s -e reads code points and writes an AMC-ACE-Z string.\n"
    "%s -d reads an AMC-ACE-Z string and writes code points.\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "An AMC-ACE-Z string is a newline-terminated sequence of LDH\n"
    "characters (without any signature).\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}

static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}

static const char too_big[] =
  "input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char overflow[] = "arithmetic overflow\n";
static const char io_error[] = "I/O error\n";

/* The following string is used to convert LDH      */
/* characters between ASCII and the native charset: */
static const char ldh_ascii[] =
  "................"
  "................"
  ".............-.."
  "0123456789......"
  ".ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ....."
  ".abcdefghijklmno"
  "pqrstuvwxyz";

int main(int argc, char **argv)
{
  enum amc_ace_status status;
  int r;
  char *p;

  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][1] == 0 || argv[1][2] != 0) usage(argv);

  if (argv[1][1] == 'e') {
    amc_ace_z_uint input[unicode_max_length];
    unsigned long codept;
    unsigned char uppercase_flags[unicode_max_length];
    char output[ace_max_size], uplus[3];
    amc_ace_z_uint input_length, output_size;
    unsigned int i;

    /* Read the input code points: */

    input_length = 0;

    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
      if (r == EOF || r == 0) break;

      if (r != 2 || uplus[1] != '+' || codept > (amc_ace_z_uint)-1) {
        fail(invalid_input);
      }

      if (input_length == unicode_max_length) fail(too_big);

      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
      else fail(invalid_input);
      input[input_length++] = codept;
    }

    /* Encode: */

    output_size = ace_max_size;
    status = amc_ace_z_encode(input_length, input, uppercase_flags,
                              &output_size, output);
    if (status == amc_ace_bad_input) fail(invalid_input);
    if (status == amc_ace_big_output) fail(too_big);
    if (status == amc_ace_overflow) fail(overflow);
    assert(status == amc_ace_success);

    /* Convert to native charset and output: */

    for (p = output;  *p != 0;  ++p) {
      i = *p;
      assert(i <= 122 && ldh_ascii[i] != '.');
      *p = ldh_ascii[i];
    }

    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }

  if (argv[1][1] == 'd') {
    char input[ace_max_size], *pp;
    amc_ace_z_uint output[unicode_max_length];
    unsigned char uppercase_flags[unicode_max_length];
    amc_ace_z_uint output_length, i;
    size_t input_length;

    /* Read the AMC-ACE-Z input string and convert to ASCII: */

    fgets(input, ace_max_size, stdin);
    if (ferror(stdin)) fail(io_error);
    if (feof(stdin)) fail(invalid_input);
    input_length = strlen(input);
    if (input[input_length - 1] != '\n') fail(too_big);
    input[--input_length] = 0;

    for (p = input;  *p != 0;  ++p) {
      pp = strchr(ldh_ascii, *p);
      /* '.' is only a filler in ldh_ascii, not an LDH character: */
      if (pp == 0 || *p == '.') fail(invalid_input);
      *p = pp - ldh_ascii;
    }

    /* Decode: */

    output_length = unicode_max_length;
    status = amc_ace_z_decode(input, &output_length,
                              output, uppercase_flags);
    if (status == amc_ace_bad_input) fail(invalid_input);
    if (status == amc_ace_big_output) fail(too_big);
    if (status == amc_ace_overflow) fail(overflow);
    assert(status == amc_ace_success);

    /* Output the result: */

    for (i = 0;  i < output_length;  ++i) {
      r = printf("%s+%04lX\n",
                 uppercase_flags[i] ? "U" : "u",
                 (unsigned long) output[i] );
      if (r < 0) fail(io_error);
    }

    return EXIT_SUCCESS;
  }

  usage(argv);
  return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
}

                   INTERNET-DRAFT expires 2002-Jan-11