idnits 2.17.1 draft-costello-idn-amc-ace-z-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 3 longer pages, the longest (page 7) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([UNICODE], [IDNA], [IDN]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 786 has weird spacing: '... return cp - ...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'NAMEPREP' is mentioned on line 76, but not defined == Missing Reference: 'RFC2119' is mentioned on line 119, but not defined == Missing Reference: '-1' is mentioned on line 976, but not defined -- Looks like a reference, but probably isn't: '0' on line 1074 -- Looks like a reference, but probably isn't: '1' on line 1102 -- Looks like a reference, but probably isn't: '2' on line 1049 -- Looks like a reference, but probably isn't: '3' on line 1055 -- Possible downref: Non-RFC (?) normative reference: ref. 'IDN' == Outdated reference: A later version (-13) exists of draft-ietf-idn-idna-02 == Outdated reference: A later version (-10) exists of draft-ietf-idn-nameprep-03 -- Possible downref: Non-RFC (?) normative reference: ref. 'PROVINCIAL' ** Downref: Normative reference to an Unknown state RFC: RFC 952 -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' Summary: 6 errors (**), 0 flaws (~~), 8 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT Adam M. Costello 2 draft-costello-idn-amc-ace-z-00.txt 2001-Jul-11 3 Expires 2002-Jan-11 5 AMC-ACE-Z version 0.2.1 7 Status of this Memo 9 This document is an Internet-Draft and is in full conformance with 10 all provisions of Section 10 of RFC2026. 
12 Internet-Drafts are working documents of the Internet Engineering 13 Task Force (IETF), its areas, and its working groups. Note 14 that other groups may also distribute working documents as 15 Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six 18 months and may be updated, replaced, or obsoleted by other documents 19 at any time. It is inappropriate to use Internet-Drafts as 20 reference material or to cite them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 25 The list of Internet-Draft Shadow Directories can be accessed at 26 http://www.ietf.org/shadow.html 28 Distribution of this document is unlimited. Please send comments 29 to the author at amc@cs.berkeley.edu, or to the idn working 30 group at idn@ops.ietf.org. A non-paginated (and possibly 31 newer) version of this specification may be available at 32 http://www.cs.berkeley.edu/~amc/charset/ 34 Abstract 36 AMC-ACE-Z is a simple and efficient ASCII-Compatible Encoding (ACE) 37 designed for use with Internationalized Domain Names [IDN] [IDNA]. 38 It transforms a Unicode string [UNICODE] into a string of characters 39 allowed in hostname labels (ASCII letters, digits, and hyphens) 40 and back again. AMC-ACE-Z is an instance of Bootstring that uses 41 particular parameter values appropriate for IDNA and uses an IDNA 42 signature prefix. Bootstring allows a string of basic code points 43 to uniquely represent any string of code points drawn from a larger 44 set. This document specifies Bootstring and the parameter values 45 for AMC-ACE-Z. 47 Contents 49 1. Introduction 50 2. Terminology 51 3. Bootstring description 52 3.1 Basic code point segregation 53 3.2 Insertion unsort coding 54 3.3 Generalized variable-length integers 55 3.4 Bias adaptation 56 4. Bootstring parameters 57 5. Parameter values for AMC-ACE-Z 58 6. 
Bootstring algorithms 59 6.1 Bias adaptation function 60 6.2 Decoding procedure 61 6.3 Encoding procedure 62 7. AMC-ACE-Z example strings 63 8. Security considerations 64 9. References 65 A. Author contact information 66 B. Mixed-case annotation 67 C. Sample implementation 69 1. Introduction 71 The IDNA draft [IDNA] describes an architecture for supporting 72 internationalized domain names. Each label of a domain name may 73 begin with a special prefix, in which case the remainder of the 74 label is an ASCII-Compatible Encoding (ACE) of a Unicode string 75 satisfying certain constraints. For the details of the constraints, 76 see [IDNA] and [NAMEPREP03]. The prefix has not yet been specified, 77 but see http://www.i-d-n.net/ for prefixes to be used for testing 78 and experimentation. 80 Bootstring has been designed to have the following features: 82 * Completeness: Every extended string (sequence of arbitrary code 83 points) can be represented by a basic string (sequence of basic 84 code points). Restrictions on what strings are allowed, and on 85 length, may be imposed by higher layers. 87 * Uniqueness: Every extended string maps to at most one basic 88 string. 90 * Reversibility: Any extended string mapped to a basic string can 91 be recovered from that basic string. 93 * Efficient encoding: The ratio of basic string length to 94 extended string length is small. This is important in the context 95 of domain names because RFC 1034 [RFC1034] restricts the length 96 of a domain label to 63 characters. 98 * Simplicity: The encoding and decoding algorithms are reasonably 99 simple to implement. The goals of efficiency and simplicity are 100 at odds; Bootstring aims at a good balance between them. 102 * Readability: Basic code points appearing in the extended 103 string are represented as themselves in the basic string. This 104 comes for free because it makes the encoding more efficient on 105 average.
107 In addition, AMC-ACE-Z can support an optional feature described in 108 appendix B "Mixed-case annotation". 110 AMC-ACE-Z is a working name that should be changed if it is adopted. 111 (The Z merely indicates that it is the twenty-sixth ACE devised by 112 this author. Most were not worth releasing.) 114 2. Terminology 116 The key words "must", "shall", "required", "should", "recommended", 117 and "may" in this document are to be interpreted as described in RFC 118 2119 [RFC2119]. 120 As in the Unicode Standard [UNICODE], Unicode code points are 121 denoted by "U+" followed by four to six hexadecimal digits, while a 122 range of code points is denoted by two hexadecimal numbers separated 123 by "..", with no prefixes. 125 The operators div and mod perform integer division; (x div y) is the 126 quotient of x divided by y, discarding the remainder, and (x mod y) 127 is the remainder, so (x div y) * y + (x mod y) == x. Bootstring 128 uses these operators only with nonnegative operands, so the quotient 129 and remainder are always nonnegative. 131 The ?: operator is a conditional; (x ? y : z) means y if x is true, 132 z if x is false. It is just like "if x then y else z" except that y 133 and z are expressions rather than statements. 135 The "break" statement jumps out of the innermost loop (as in C). 137 3. Bootstring description 139 Bootstring represents an arbitrary sequence of code points (the 140 "extended string") as a sequence of basic code points (the 141 "basic string"). This section describes the representation. 142 Section 6 "Bootstring algorithms" presents the algorithms as 143 pseudocode. There is also commented C code in appendix C "Sample 144 implementation". 146 3.1 Basic code point segregation 148 All basic code points appearing in the extended string are 149 represented literally at the beginning of the basic string, in their 150 original order, followed by a delimiter if (and only if) the number 151 of basic code points is nonzero. 
The delimiter is a particular 152 basic code point, which never appears in the remainder of the basic 153 string. The decoder can therefore find the end of the literal 154 portion (if there is one) by scanning for the last delimiter. 156 3.2 Insertion unsort coding 158 The remainder of the basic string (after the last delimiter if there 159 is one) represents a sequence of nonnegative integral deltas as 160 generalized variable-length integers, described in section 3.3. The 161 meaning of the deltas is best understood in terms of the decoder. 163 The decoder builds the extended string incrementally. Initially, 164 the extended string is a copy of the literal portion of the basic 165 string (excluding the last delimiter). Each delta causes the 166 decoder to insert a code point into the extended string according 167 to the following procedure. There are two state variables: a 168 code point n, and an index i that ranges from zero (which is the 169 first position of the extended string) to the current length of 170 the extended string (which refers to a potential position beyond 171 the current end). The decoder advances the state monotonically 172 (never returning to an earlier state) by taking steps only upward. 173 Each step increments i, except when i already equals the length 174 of the extended string, in which case a step resets i to zero 175 and increments n. For each delta (in order), the decoder takes 176 delta steps upward, then inserts the value n into the extended 177 string at position i, then increments i (to skip over the code 178 point just inserted). (An implementation should not take each 179 step individually, but should instead use division and remainder 180 calculations to advance by delta steps all at once.) 182 The encoder's main task is to derive the sequence of deltas that 183 will cause the decoder to construct the desired string.
It can do 184 this by repeatedly scanning the extended string for the next code 185 point that the decoder would need to insert, and counting the number 186 of steps the decoder would need to take, mindful of the fact that 187 the decoder will be stepping over only those code points that have 188 already been inserted. Section 6.3 "Encoding procedure" gives a 189 precise algorithm. 191 3.3 Generalized variable-length integers 193 In a conventional integer representation the base is the number of 194 distinct symbols for digits, whose values are 0 through base-1. Let 195 digit_0 denote the least significant digit, digit_1 the next least 196 significant, and so on. The value represented is the sum over j of 197 digit_j * w(j), where w(j) = base^j is the weight (scale factor) 198 for position j. For example, in the base 8 integer 437, the digits 199 are 7, 3, and 4, and the weights are 1, 8, and 64, so the value is 200 7 + 3*8 + 4*64 = 287. This representation has two disadvantages: 201 First, there are multiple encodings of each value (because there 202 can be extra zeros in the most significant positions), which 203 is inconvenient when unique encodings are required. Second, 204 the integer is not self-delimiting, so if multiple integers are 205 concatenated the boundaries between them are lost. 207 The generalized variable-length representation solves these two 208 problems. The digit values are still 0 through base-1, but now 209 the integer is self-delimiting by means of thresholds t(j), each 210 of which is in the range 0 through base-1. Exactly one digit, the 211 most significant, satisfies digit_j < t(j). Therefore, if several 212 integers are concatenated, it is easy to separate them, starting 213 with the first if they are little-endian (least significant digit 214 first), or starting with the last if they are big-endian (most 215 significant digit first). 
As before, the value is the sum over j of 216 digit_j * w(j), but the weights are different: 218 w(0) = 1 219 w(j) = w(j-1) * (base - t(j-1)) for j > 0 221 For example, consider the little-endian sequence of base 8 digits 222 734251... Suppose the thresholds are 2, 3, 5, 5, 5, 5... This 223 implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30, 30*(8-5) 224 = 90, 90*(8-5) = 270, and so on. 7 is not less than 2, and 3 is 225 not less than 3, but 4 is less than 5, so 4 must be the last digit. 226 The value of 734 is 7*1 + 3*6 + 4*30 = 145. The next integer is 227 251, with value 2*1 + 5*6 + 1*30 = 62. Decoding this representation 228 is very similar to decoding a conventional integer: Start with a 229 current value of N = 0 and a weight w = 1. Fetch the next digit d 230 and increase N by d * w. If d is less than the current threshold 231 (t) then stop, otherwise increase w by a factor of (base - t), 232 update t for the next position, and repeat. 234 Encoding this representation is similar to encoding a conventional 235 integer: If N < t then output one digit for N and stop, otherwise 236 output the digit for t + ((N - t) mod (base - t)), then replace N 237 with (N - t) div (base - t), update t for the next position, and 238 repeat. 240 For any particular set of values of t(j), there is exactly one 241 generalized variable-length representation of each nonnegative 242 integral value. 244 Bootstring uses little-endian ordering so that the deltas can be 245 separated starting with the first. The t(j) values are defined in 246 terms of the constants base, tmin, and tmax, and a state variable 247 called bias: 249 t(j) = base * (j + 1) - bias, 250 clamped to the range tmin through tmax 252 (The clamping means that if the formula yields a value less than 253 tmin or greater than tmax, then t(j) = tmin or tmax, respectively.) 254 These t(j) values cause the representation to favor integers within 255 a particular range determined by the bias. 
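The base 8 worked example above can be reproduced with a short C sketch. This is only an illustration under stated assumptions, not the sample implementation from appendix C: the function names gvli_decode and gvli_encode are hypothetical, the thresholds are passed in as a fixed array rather than derived from bias, digits are handled as numeric values rather than characters, and overflow checking is omitted for brevity.

```c
#include <stddef.h>

/* Decode one little-endian generalized variable-length integer.
 * digits[] holds digit values (not characters) and t[] holds the
 * threshold for each digit position (at least ndigits entries).
 * Returns the number of digits consumed, or 0 if the input ends
 * before a terminating digit (digit < threshold) is seen.
 * Hypothetical helper; no overflow checking in this sketch. */
size_t gvli_decode(const unsigned digits[], size_t ndigits,
                   const unsigned t[], unsigned base,
                   unsigned long *value)
{
    unsigned long n = 0, w = 1;
    size_t j;

    for (j = 0; j < ndigits; ++j) {
        n += digits[j] * w;             /* N = N + d * w */
        if (digits[j] < t[j]) {         /* most significant digit */
            *value = n;
            return j + 1;
        }
        w *= base - t[j];               /* weight of the next position */
    }
    return 0;
}

/* Encode n as digit values using the same thresholds (at least
 * maxdigits entries).  Returns the number of digits written, or 0
 * if maxdigits is too small.  Hypothetical helper. */
size_t gvli_encode(unsigned long n, const unsigned t[], unsigned base,
                   unsigned digits[], size_t maxdigits)
{
    size_t j;

    for (j = 0; j < maxdigits; ++j) {
        if (n < t[j]) {                 /* final digit */
            digits[j] = (unsigned) n;
            return j + 1;
        }
        digits[j] = t[j] + (unsigned) ((n - t[j]) % (base - t[j]));
        n = (n - t[j]) / (base - t[j]);
    }
    return 0;
}
```

With the digits 7, 3, 4, 2, 5, 1 and thresholds 2, 3, 5, 5, 5, 5, the first call to gvli_decode consumes three digits and yields 145. A second call starting at the fourth digit yields 62, matching the values computed in the text.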
257 3.4 Bias adaptation 259 After each delta is encoded or decoded, bias is set for the next 260 delta as follows: 262 1. Delta is scaled in order to avoid overflow in the next step: 264 let delta = delta div 2 266 But when this is the very first delta, the divisor is not 2, but 267 instead a constant called damp. This compensates for the fact 268 that the second delta is usually much smaller than the first. 270 2. Delta is increased to compensate for the fact that the next 271 delta will be inserting into a longer string: 273 let delta = delta + (delta div numpoints) 275 numpoints is the total number of code points encoded/decoded so 276 far (including the one corresponding to this delta itself, and 277 including the basic code points). 279 3. Delta is repeatedly divided until it falls within a threshold, 280 to predict the minimum number of digits needed to represent the 281 next delta: 283 while delta > ((base - tmin) * tmax) div 2 284 do let delta = delta div (base - tmin) 286 4. The bias is set: 288 let bias = 289 (base * the number of divisions performed in step 3) + 290 (((base - tmin + 1) * delta) div (delta + skew)) 292 The motivation for this procedure is that the current delta provides 293 a hint about the likely size of the next delta, and so t(j) is 294 set to tmax for the more significant digits starting with the one 295 expected to be last, tmin for the less significant digits up through 296 the one expected to be third-last, and somewhere between tmin and 297 tmax for the digit expected to be second-last (balancing the hope of 298 the expected-last digit being unnecessary against the danger of it 299 being insufficient). 301 4. Bootstring parameters 303 Given a set of basic code points, one must be chosen as the 304 delimiter. The base is the number of distinguishable basic code 305 points remaining. They must be associated with the values in the 306 range 0 through base-1. 
In some cases multiple code points must 307 represent the same value; for example, uppercase and lowercase 308 versions of a letter must be equivalent if basic strings are 309 case-insensitive. 311 The initial value of n should be the minimum non-basic code point 312 that is allowed in extended strings. 314 The remaining five parameters (tmin, tmax, skew, damp, and the 315 initial value of bias) must satisfy the following constraints: 317 0 <= tmin <= tmax <= base-1 318 skew >= 1 319 damp >= 2 320 initial_bias mod base <= base - tmin 322 Provided the constraints are satisfied, these five parameters affect 323 efficiency but not correctness. They should be chosen empirically. 325 If support for mixed-case annotation is desired (see appendix B), 326 make sure that the code points corresponding to 0 through tmax-1 all 327 have both uppercase and lowercase forms. 329 5. Parameter values for AMC-ACE-Z 331 AMC-ACE-Z uses the following values for the Bootstring parameters: 333 base = 36 334 tmin = 1 335 tmax = 26 336 skew = 38 337 damp = 700 338 initial_bias = 72 339 initial_n = U+00A1 341 In AMC-ACE-Z, code points are Unicode code points [UNICODE], that 342 is, integers in the range 0..10FFFF, but not D800..DFFF, which are 343 reserved for use by UTF-16. The basic code points, along with their 344 values, are: 346 U+002D (-) = delimiter 347 41..5A (A-Z) = 0 to 25, respectively 348 61..7A (a-z) = 0 to 25, respectively 349 30..39 (0-9) = 26 to 35, respectively 351 Using hyphen-minus as the delimiter implies that the ACE can end 352 with a hyphen-minus only if the Unicode string consists entirely 353 of basic code points, but IDNA forbids such strings from being 354 ACE-encoded. And since IDNA prepends a prefix that does not begin 355 with a hyphen-minus, AMC-ACE-Z conforms to the RFC 952 requirement 356 that hostname labels neither begin nor end with a hyphen-minus 357 [RFC952]. 
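The parameter values and the digit table above might be expressed in C roughly as follows. This is a sketch, not the code from appendix C: the AMCZ_ macro names and the functions amcz_digit_value, amcz_digit_char, and amcz_threshold are hypothetical, code points are written as numeric ASCII values, and the encoder side is assumed to emit lowercase letters (the text permits either case). The threshold function follows the clamped formula t(j) = base * (j + 1) - bias from section 3.3.

```c
/* AMC-ACE-Z parameter values (hypothetical macro names). */
#define AMCZ_BASE         36u
#define AMCZ_TMIN         1u
#define AMCZ_TMAX         26u
#define AMCZ_SKEW         38u
#define AMCZ_DAMP         700u
#define AMCZ_INITIAL_BIAS 72u
#define AMCZ_INITIAL_N    0xA1u
#define AMCZ_DELIMITER    0x2D   /* hyphen-minus */

/* Map a basic code point to its digit value, accepting both
 * uppercase and lowercase letters.  Returns -1 for the delimiter
 * and for any code point that is not a digit. */
int amcz_digit_value(int cp)
{
    if (cp >= 0x41 && cp <= 0x5A) return cp - 0x41;        /* A-Z */
    if (cp >= 0x61 && cp <= 0x7A) return cp - 0x61;        /* a-z */
    if (cp >= 0x30 && cp <= 0x39) return cp - 0x30 + 26;   /* 0-9 */
    return -1;
}

/* Map a digit value 0..35 to a basic code point, using lowercase
 * letters (an assumption of this sketch).  Returns -1 if d is out
 * of range. */
int amcz_digit_char(int d)
{
    if (d >= 0 && d < 26) return 0x61 + d;
    if (d >= 26 && d < 36) return 0x30 + (d - 26);
    return -1;
}

/* Threshold for digit position j (counting from 0) given bias:
 * base * (j + 1) - bias, clamped to the range tmin..tmax. */
unsigned amcz_threshold(unsigned j, unsigned bias)
{
    unsigned k = AMCZ_BASE * (j + 1);

    if (k <= bias || k - bias < AMCZ_TMIN) return AMCZ_TMIN;
    if (k - bias > AMCZ_TMAX) return AMCZ_TMAX;
    return k - bias;
}
```

With the initial bias of 72, the first two digit positions get the threshold tmin = 1 and every later position gets tmax = 26.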
359 A decoder must recognize the letters in both uppercase and lowercase 360 forms (including mixtures of both forms). An encoder should output 361 only uppercase forms or only lowercase forms, unless it uses 362 mixed-case annotation (see appendix B). 364 Presumably most users will not manually copy ACEs by writing or 365 typing them (as opposed to letting computers do it via cut & paste), 366 but those that do will need to be alert to the potential visual 367 ambiguity between the following sets of characters: 369 G 6 370 I l 1 371 O 0 372 S 5 373 U V 374 Z 2 375 Such ambiguities are usually resolved by context, but in an ACE 376 there is no context apparent to humans. 378 6. Bootstring algorithms 380 6.1 Bias adaptation function 382 function adapt(delta,numpoints,firsttime): 383 let delta = delta div (firsttime ? damp : 2) 384 let delta = delta + (delta div numpoints) 385 let k = 0 386 while delta > ((base - tmin) * tmax) div 2 387 do let delta = delta div (base - tmin) and let k = k + base 388 return k + (((base - tmin + 1) * delta) div (delta + skew)) 390 6.2 Decoding procedure 392 let n = initial_n 393 let i = 0 394 let bias = initial_bias 395 let output = an empty string indexed from 0 396 search the input for the last delimiter (do not consume the input) 397 if one is found that is not at the very beginning then consume all 398 preceding code points, copy them to output, consume the delimiter 399 while the input is not exhausted do begin 400 let oldi = i 401 let w = 1 402 for k = base to infinity in steps of base do begin 403 consume a code point, fail on end-of-input or invalid code point 404 let digit = the code point's value 405 let i = i + digit * w, fail on overflow 406 let t = k <= bias ? tmin : k - bias > tmax ? 
tmax : k - bias 407 if digit < t then break 408 let w = w * (base - t), fail on overflow 409 end 410 let bias = adapt(i - oldi, length(output) + 1, oldi == 0) 411 let n = n + i div (length(output) + 1), fail on overflow 412 let i = i mod (length(output) + 1) 413 if n is a basic code point then fail # see Note1 below 414 insert n into output at position i 415 increment i 416 end 418 Note1: The check for whether n is a basic code point can be omitted 419 if initial_n exceeds all basic code points (which is true for 420 AMC-ACE-Z), because n only increases from initial_n. 422 Because the decoder state can only advance monotonically, and there 423 is only one representation of any delta, there is therefore only 424 one encoded string that can represent a given sequence of integers. 425 The only error conditions are invalid code points, unexpected 426 end-of-input, overflow (attempts to compute values that exceed the 427 maximum value of an integer variable), and basic code points encoded 428 using deltas instead of appearing literally. If the decoder fails 429 on these errors as shown above, then it cannot produce the same 430 output for two distinct inputs, and hence it need not re-encode its 431 output to verify that it matches the input. 433 The assignment of t, where t is clamped to the range tmin through 434 tmax, does not handle the case where bias < k < bias + tmin, but 435 that is impossible because of the way bias is computed and because 436 of the constraints on the parameters. 438 If the programming language does not provide overflow detection, 439 the following technique can be used. Suppose A, B, and C are 440 representable nonnegative integers and C is nonzero. Then A + B 441 overflows if and only if B > maxint - A, and A + (B * C) overflows 442 if and only if B > (maxint - A) div C. See appendix C "Sample 443 implementation" for demonstrations of this technique in AMC-ACE-Z. 
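The overflow tests just described can be written as small C helpers. This is a minimal sketch under stated assumptions: it uses unsigned long with ULONG_MAX standing in for maxint, and the names add_overflows, muladd_overflows, and advance_index are hypothetical, not taken from appendix C. Unsigned arithmetic in C wraps silently instead of failing, so the test has to come before the operation.

```c
#include <limits.h>

/* Nonzero if a + b would exceed ULONG_MAX. */
int add_overflows(unsigned long a, unsigned long b)
{
    return b > ULONG_MAX - a;
}

/* Nonzero if a + (b * c) would exceed ULONG_MAX.
 * c must be nonzero. */
int muladd_overflows(unsigned long a, unsigned long b, unsigned long c)
{
    return b > (ULONG_MAX - a) / c;
}

/* The decoder step "let i = i + digit * w, fail on overflow".
 * Returns 1 and updates *i on success; returns 0 and leaves *i
 * unchanged if the result is not representable (or if w is 0,
 * which the decoder never produces). */
int advance_index(unsigned long *i, unsigned long digit, unsigned long w)
{
    if (w == 0 || muladd_overflows(*i, digit, w))
        return 0;
    *i += digit * w;
    return 1;
}
```

The same pattern applies to the other "fail on overflow" steps of the decoding procedure, such as the update of w and the update of n.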
445 6.3 Encoding procedure 447 let n = initial_n 448 let delta = 0 449 let bias = initial_bias 450 let h = b = the number of basic code points in the input 451 copy them to the output in order, followed by a delimiter if b > 0 452 if the input contains a non-basic code point < n then fail 453 while h < length(input) do begin 454 let m = the minimum non-basic code point >= n in the input # Note2 455 let delta = delta + (m - n) * (h + 1), fail on overflow 456 let n = m 457 for each integer m in the input (in order) do begin 458 if m is a basic code point # see Note2 below 459 then increment delta, fail on overflow, and continue 460 if m < n then increment delta, fail on overflow 461 if m == n then begin 462 let q = delta 463 for k = base to infinity in steps of base do begin 464 let t = k <= bias ? tmin : k - bias > tmax ? tmax : k - bias 465 if q < t then break 466 output the code point for digit t + ((q - t) mod (base - t)) 467 let q = (q - t) div (base - t) 468 end 469 output the code point for digit q 470 let bias = adapt(delta, h + 1, h == b) 471 let delta = 0 472 increment h 473 end 474 end 475 increment delta and n 476 end 478 Note2: There are two places in the main loop where the encoder 479 checks whether a code point is basic. If initial_n exceeds all 480 basic code points (which is true for AMC-ACE-Z) then m and n can 481 never be basic code points, and the logic can be simplified. 483 The checks for overflow are necessary to avoid producing invalid 484 output when the input contains very large values or is very long. 485 Wider integer variables can handle more extreme inputs. For 486 AMC-ACE-Z, 26-bit unsigned integers are sufficient, because in 487 IDNA code points are limited to 0..10FFFF and ACEs are limited to 59 488 characters (excluding the prefix). 490 The increment of delta at the bottom of the outer loop cannot 491 overflow because delta < length(input) before the increment, and 492 length(input) is already assumed to be representable. 
The increment 493 of n could overflow, but only if h == length(input), in which case 494 the procedure is finished anyway. 496 7. AMC-ACE-Z example strings 498 In the AMC-ACE-Z encodings below, the IDNA signature prefix is not 499 shown. AMC-ACE-Z is abbreviated AMC-Z. Backslashes show where line 500 breaks have been inserted in strings too long for one line. 502 The first several examples are all translations of the sentence "Why 503 can't they just speak in ?" (courtesy of Michael Kaplan's 504 "provincial" page [PROVINCIAL]). Word breaks and punctuation have 505 been removed, as is often done in domain names. 507 (A) Arabic (Egyptian): 508 u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644 509 u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F 510 AMC-Z: gfbpdaj6bu4bxfgehfvwxn 512 (B) Chinese (simplified): 513 u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587 514 AMC-Z: kgqwcrb4cv8a8dqg056pqjye 516 (C) Czech: Proprostnemluvesky 517 U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074 518 u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D 519 u+0065 u+0073 u+006B u+0079 520 AMC-Z: Proprostnemluvesky-xgb24dma41a 522 (D) Hebrew: 523 u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8 524 u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2 525 u+05D1 u+05E8 u+05D9 u+05EA 526 AMC-Z: 6cbcagdahymbxekheh6e0a7fei0b 528 (E) Hindi (Devanagari): 529 u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D 530 u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939 531 u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947 532 u+0939 u+0948 u+0902 533 AMC-Z: k0baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd 535 (F) Japanese (kanji and hiragana): 536 u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092 537 u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B 538 AMC-Z: p7jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa 540 (G) Korean (Hangul syllables): 541 u+C138 u+ACC4 u+C758 
u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774 542 u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74 543 u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C 544 AMC-Z: c89aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5jps\ 545 d879ccm6fea98c 546 (H) Russian (Cyrillic): 547 U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E 548 u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440 549 u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A 550 u+0438 551 AMC-Z: d0abfaaepdrnnbgefbaDotcwatmq2g4l 553 (I) Spanish: PorqunopuedensimplementehablarenEspaol 554 U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070 555 u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070 556 u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061 557 u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070 558 u+0061 u+00F1 u+006F u+006C 559 AMC-Z: PorqunopuedensimplementehablarenEspaol-nkc56a 561 (J) Taiwanese: 562 u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587 563 AMC-Z: kgqwctvzc91f659drss3x8bo0yb 565 (K) Vietnamese: 566 Tisaohkhngthch\ 567 nitingVit 568 U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B 569 u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068 570 u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067 571 U+0056 u+0069 u+1EC7 u+0074 572 AMC-Z: TisaohkhngthchnitingVit-xvbr8268qyxafd2f1b9g 574 The next several examples are all names of Japanese music artists, 575 song titles, and TV programs, just because the author happens to 576 have them handy (but Japanese is useful for providing examples 577 of single-row text, two-row text, ideographic text, and various 578 mixtures thereof). 
580 (L) 3B 581 u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F 582 AMC-Z: 3B-2t4c5e180e575a65lsy2b 584 (M) -with-SUPER-MONKEYS 585 u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074 586 u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D 587 U+004F U+004E U+004B U+0045 U+0059 U+0053 588 AMC-Z: -with-SUPER-MONKEYS-us48ag80a8qai00g7n9n 590 (N) Hello-Another-Way- 591 U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F 592 u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D 593 u+305D u+308C u+305E u+308C u+306E u+5834 u+6240 594 AMC-Z: Hello-Another-Way--it3qua05auwb3674vfr0b 596 (O) 2 597 u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032 598 AMC-Z: 2-y7tlzr9756bt3uc0v 600 (P) MajiKoi5 601 U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059 602 u+308B u+0035 u+79D2 u+524D 603 AMC-Z: MajiKoi5-q03gue6qz075azm5e 604 (Q) de 605 u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0 606 AMC-Z: de-pd4avhby1noc0d 608 (R) 609 u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067 610 AMC-Z: f8juau41awczczp 612 8. Security considerations 614 Users expect each domain name in DNS to be controlled by a single 615 authority. If a Unicode string intended for use as a domain label 616 could map to multiple ACE labels, then an internationalized domain 617 name could map to multiple ACE domain names, each controlled by 618 a different authority, some of which could be spoofs that hijack 619 service requests intended for another. Therefore AMC-ACE-Z is 620 designed so that each Unicode string has a unique encoding. 622 However, there can still be multiple Unicode representations of the 623 "same" text, for various definitions of "same". This problem is 624 addressed to some extent by the Unicode standard under the topic of 625 canonicalization, and this work is leveraged for domain names by 626 "nameprep" [NAMEPREP03]. 
628 References 630 [IDN] Internationalized Domain Names (IETF working group), 631 http://www.i-d-n.net/, idn@ops.ietf.org. 633 [IDNA] Patrik Faltstrom, Paul Hoffman, "Internationalizing Host 634 Names In Applications (IDNA)", 2001-Jun-16, draft-ietf-idn-idna-02. 636 [NAMEPREP03] Paul Hoffman, Marc Blanchet, "Preparation 637 of Internationalized Host Names", 2001-Feb-24, 638 draft-ietf-idn-nameprep-03. 640 [PROVINCIAL] Michael Kaplan, "The 'anyone can be provincial!' page", 641 http://www.trigeminal.com/samples/provincial.html. 643 [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host 644 Table Specification", 1985-Oct, RFC 952. 646 [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities", 647 1987-Nov, RFC 1034. 649 [UNICODE] The Unicode Consortium, "The Unicode Standard", 650 http://www.unicode.org/unicode/standard/standard.html. 652 A. Author contact information 654 Adam M. Costello 655 University of California, Berkeley 656 http://www.cs.berkeley.edu/~amc/ 657 B. Mixed-case annotation 659 In order to use AMC-ACE-Z to represent case-insensitive strings, 660 higher layers need to case-fold the strings prior to AMC-ACE-Z 661 encoding. The encoded string can, however, use mixed case as an 662 annotation telling how to convert the original folded string into a 663 mixed-case string for display purposes. 665 Basic code points are represented literally, and can therefore use 666 mixed case directly. Each non-basic code point is represented by 667 a delta, which is represented by a sequence of basic code points, 668 the last of which provides the annotation. If it is uppercase, 669 it is a suggestion to map the non-basic code point to uppercase 670 (if possible); if it is lowercase, it is a suggestion to map the 671 non-basic code point to lowercase (if possible). 673 AMC-ACE-Z encoders and decoders are not required to support these 674 annotations, and higher layers need not use them. 676 C. 
Sample implementation 678 /******************************************/ 679 /* amc-ace-z.c 0.2.1 (2001-Jul-11-Wed) */ 680 /* Adam M. Costello */ 681 /******************************************/ 683 /* This is ANSI C code (C89) implementing AMC-ACE-Z version 0.2.x. */ 685 /************************************************************/ 686 /* Public interface (would normally go in its own .h file): */ 688 #include <limits.h> 690 enum amc_ace_status { 691 amc_ace_success, 692 amc_ace_bad_input, /* Input is invalid. */ 693 amc_ace_big_output, /* Output would exceed the space provided. */ 694 amc_ace_overflow /* Input requires wider integers to process. */ 695 }; 697 #if UINT_MAX >= (1 << 26) - 1 698 typedef unsigned int amc_ace_z_uint; 699 #else 700 typedef unsigned long amc_ace_z_uint; 701 #endif 703 enum amc_ace_status amc_ace_z_encode( 704 amc_ace_z_uint input_length, 705 const amc_ace_z_uint input[], 706 const unsigned char uppercase_flags[], 707 amc_ace_z_uint *output_size, 708 char output[] ); 709 /* amc_ace_z_encode() converts Unicode to AMC-ACE-Z (without */ 710 /* any signature). The input must be represented as an array */ 711 /* of Unicode code points (not code units; surrogate pairs */ 712 /* are not allowed), and the output will be represented as */ 713 /* null-terminated ASCII. The input_length is the number of */ 714 /* code points in the input. The output_size is an in/out */ 715 /* argument: the caller must pass in the maximum number of */ 716 /* characters that may be output (including the terminating */ 717 /* null), and on successful return it will contain the number of */ 718 /* characters actually output (including the terminating null, */ 719 /* so it will be one more than strlen() would return, which is */ 720 /* why it is called output_size rather than output_length).
The */ 721 /* uppercase_flags array must hold input_length boolean values, */ 722 /* where nonzero means the corresponding Unicode character should */ 723 /* be forced to uppercase after being decoded, and zero means it */ 724 /* is caseless or should be forced to lowercase. Alternatively, */ 725 /* uppercase_flags may be a null pointer, which is equivalent */ 726 /* to all zeros. The letters a-z and A-Z are always encoded */ 727 /* literally, regardless of the corresponding flags. The return */ 728 /* value may be any of the amc_ace_status values defined above; */ 729 /* if not amc_ace_success, then output_size and output may */ 730 /* contain garbage. */ 732 enum amc_ace_status amc_ace_z_decode( 733 const char input[], 734 amc_ace_z_uint *output_length, 735 amc_ace_z_uint output[], 736 unsigned char uppercase_flags[] ); 738 /* amc_ace_z_decode() converts AMC-ACE-Z (without any signature) */ 739 /* to Unicode. The input must be represented as null-terminated */ 740 /* ASCII, and the output will be represented as an array of */ 741 /* Unicode code points. The output_length is an in/out argument: */ 742 /* the caller must pass in the maximum number of code points */ 743 /* that may be output, and on successful return it will contain */ 744 /* the actual number of code points output. The uppercase_flags */ 745 /* array must have room for at least output_length values, or it */ 746 /* may be a null pointer if the case information is not needed. */ 747 /* A nonzero flag indicates that the corresponding Unicode */ 748 /* character should be forced to uppercase by the caller, while */ 749 /* zero means it is caseless or should be forced to lowercase. */ 750 /* The letters a-z and A-Z are output already in the proper case, */ 751 /* but their flags will be set appropriately so that applying the */ 752 /* flags would be harmless. 
The return value may be any of the */ 753 /* amc_ace_status values defined above; if not amc_ace_success, */ 754 /* then output_length, output, and uppercase_flags may contain */ 755 /* garbage. On success, the decoder will never need to write */ 756 /* an output_length greater than the length of the input (not */ 757 /* counting the null terminator), because of how the encoding is */ 758 /* defined. */ 759 /**********************************************************/ 760 /* Implementation (would normally go in its own .c file): */ 762 #include <string.h> 764 /*** Bootstring parameters for AMC-ACE-Z ***/ 766 enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700, 767 initial_bias = 72, initial_n = 0xA1, delimiter = 0x2D }; 769 /* encode_digit(d) returns the basic code point whose value */ 770 /* (when used for representing integers) is d, which must be */ 771 /* in the range 0 to base-1. The lowercase form is used. */ 773 static char encode_digit(amc_ace_z_uint d) 774 { 775 return d + 22 + 75 * (d < 26); 776 /* 0..25 map to ASCII a..z */ 777 /* 26..35 map to ASCII 0..9 */ 778 } 780 /* decode_digit(cp) returns the numeric value of a basic code point */ 781 /* (for use in representing integers) in the range 0 to base-1, or */ 782 /* base if cp is the delimiter, or base+1 otherwise. */ 784 static amc_ace_z_uint decode_digit(amc_ace_z_uint cp) 785 { 786 return cp - 48 < 10 ? cp - 22 : cp - 65 < 26 ? cp - 65 : 787 cp - 97 < 26 ? cp - 97 : cp == delimiter ? 
base : base + 1; 788 } 790 /*** Useful constants ***/ 792 /* maxint is the maximum value of an amc_ace_z_uint variable: */ 793 static const amc_ace_z_uint maxint = -1; 795 /* lobase and cutoff are used in the calculation of bias: */ 796 enum { lobase = base - tmin, cutoff = lobase * tmax / 2 }; 798 /*** Main encode function ***/ 800 enum amc_ace_status amc_ace_z_encode( 801 amc_ace_z_uint input_length, 802 const amc_ace_z_uint input[], 803 const unsigned char uppercase_flags[], 804 amc_ace_z_uint *output_size, 805 char output[] ) 806 { 807 amc_ace_z_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t; 808 char shift; 810 /* Initialize the state: */ 812 n = initial_n; 813 delta = out = 0; 814 max_out = *output_size; 815 bias = initial_bias; 816 /* Handle the basic code points, and make sure */ 817 /* that all code points < n are basic code points: */ 819 for (j = 0; j < input_length; ++j) { 820 if (decode_digit(input[j]) <= base) { 821 if (max_out - out < 2) return amc_ace_big_output; 822 output[out++] = input[j]; 823 } 824 else if (input[j] < n) return amc_ace_bad_input; 825 } 827 h = b = out; 829 /* h is the number of code points that have been handled, b is the */ 830 /* number of basic code points, and out is the number of characters */ 831 /* that have been output. */ 833 if (b > 0) output[out++] = delimiter; 835 /* Main encoding loop: */ 837 while (h < input_length) { 838 /* All non-basic code points < n have been */ 839 /* handled already. 
Find the next larger one: */ 841 for (m = maxint, j = 0; j < input_length; ++j) { 842 /* not needed for AMC-ACE-Z: */ 843 /* if (decode_digit(input[j]) <= base) continue; */ 844 if (input[j] >= n && input[j] < m) m = input[j]; 845 } 847 /* Increase delta enough to advance the decoder's */ 848 /* <n,i> state to <m,0>, but guard against overflow: */ 850 if (m - n > (maxint - delta) / (h + 1)) return amc_ace_overflow; 851 delta += (m - n) * (h + 1); 852 n = m; 854 for (j = 0; j < input_length; ++j) { 855 /* Not needed for AMC-ACE-Z: */ 856 #if 0 857 if (decode_digit(input[j]) <= base) { 858 if (++delta == 0) return amc_ace_overflow; 859 continue; 860 } 861 #endif 863 if (input[j] < n && ++delta == 0) return amc_ace_overflow; 865 if (input[j] == n) { 866 /* Represent delta as a generalized variable-length integer: */ 867 for (q = delta, k = base; ; k += base) { 868 if (out >= max_out) return amc_ace_big_output; 869 t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias; 870 if (q < t) break; 871 output[out++] = encode_digit(t + (q - t) % (base - t)); 872 q = (q - t) / (base - t); 873 } 875 shift = uppercase_flags && uppercase_flags[j] ? 32 : 0; 876 /* shift controls the case of the terminal character: */ 877 output[out++] = encode_digit(q) - shift; 879 /* Adapt the bias: */ 880 delta = h == b ? 
delta / damp : delta >> 1; 881 delta += delta / (h + 1); 882 for (bias = 0; delta > cutoff; bias += base) delta /= lobase; 883 bias += (lobase + 1) * delta / (delta + skew); 885 delta = 0; 886 ++h; 887 } 888 } 890 ++delta, ++n; 891 } 893 /* Append the null terminator: */ 894 if (out >= max_out) return amc_ace_big_output; 895 output[out++] = 0; 897 *output_size = out; 898 return amc_ace_success; 899 } 901 /*** Main decode function ***/ 903 enum amc_ace_status amc_ace_z_decode( 904 const char input[], 905 amc_ace_z_uint *output_length, 906 amc_ace_z_uint output[], 907 unsigned char uppercase_flags[] ) 908 { 909 amc_ace_z_uint n, out, i, oldi, max_out, bias, w, k, delta, digit, t; 910 const char *in, *p; 912 /* Initialize the state: */ 914 n = initial_n; 915 out = i = 0; 916 max_out = *output_length; 917 bias = initial_bias; 919 /* Handle the basic code points: Let p point to the last */ 920 /* delimiter, or to the start if there is none, then copy */ 921 /* everything before p to the output. */ 923 for (p = in = input; *in; ++in) if (*in == delimiter) p = in; 924 if (p - input > max_out) return amc_ace_big_output; 925 for (in = input; in < p; ++in) { 926 if (uppercase_flags) uppercase_flags[out] = *in >= 65 && *in <= 90; 927 output[out++] = *in; 928 } 930 /* Main decoding loop: Start just after p if any basic code */ 931 /* points were copied; start at the beginning otherwise. */ 933 for (in = p > input ? p + 1 : input; *in != 0; ++out) { 935 /* in points to the next character to be consumed, and */ 936 /* out is the number of code points in the output array. */ 938 /* Decode a generalized variable-length integer into delta, */ 939 /* which gets added to i. The overflow checking is easier */ 940 /* if we increase i as we go, then subtract off its starting */ 941 /* value at the end to obtain delta. 
*/ 943 for (oldi = i, w = 1, k = base; ; k += base) { 944 digit = decode_digit(*in++); 945 if (digit >= base) return amc_ace_bad_input; 946 if (digit > (maxint - i) / w) return amc_ace_overflow; 947 i += digit * w; 948 t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias; 949 if (digit < t) break; 950 if (w > maxint / (base - t)) return amc_ace_overflow; 951 w *= (base - t); 952 } 954 /* Adapt the bias: */ 955 delta = oldi == 0 ? i / damp : (i - oldi) >> 1; 956 delta += delta / (out + 1); 957 for (bias = 0; delta > cutoff; bias += base) delta /= lobase; 958 bias += (lobase + 1) * delta / (delta + skew); 960 /* i was supposed to wrap around from out+1 to 0, */ 961 /* incrementing n each time, so we'll fix that now: */ 963 if (i / (out + 1) > maxint - n) return amc_ace_overflow; 964 n += i / (out + 1); 965 i %= (out + 1); 967 /* Insert n at position i of the output: */ 969 /* not needed for AMC-ACE-Z: */ 970 /* if (decode_digit(n) <= base) return amc_ace_bad_input; */ 971 if (out >= max_out) return amc_ace_big_output; 973 if (uppercase_flags) { 974 memmove(uppercase_flags + i + 1, uppercase_flags + i, out - i); 975 /* Case of last character determines uppercase flag: */ 976 uppercase_flags[i] = in[-1] >= 65 && in[-1] <= 90; 977 } 979 memmove(output + i + 1, output + i, (out - i) * sizeof *output); 980 output[i++] = n; 981 } 982 *output_length = out; 983 return amc_ace_success; 984 } 986 /******************************************************************/ 987 /* Wrapper for testing (would normally go in a separate .c file): */ 989 #include <assert.h> 990 #include <stdio.h> 991 #include <stdlib.h> 992 #include <string.h> 994 /* For testing, we'll just set some compile-time limits rather than */ 995 /* use malloc(), and set a compile-time option rather than using a */ 996 /* command-line option.
*/ 998 enum { 999 unicode_max_length = 256, 1000 ace_max_size = 256 1001 }; 1003 static void usage(char **argv) 1004 { 1005 fprintf(stderr, 1006 "%s -e reads code points and writes an AMC-ACE-Z string.\n" 1007 "%s -d reads an AMC-ACE-Z string and writes code points.\n" 1008 "Input and output are plain text in the native character set.\n" 1009 "Code points are in the form u+hex separated by whitespace.\n" 1010 "An AMC-ACE-Z string is a newline-terminated sequence of LDH\n" 1011 "characters (without any signature).\n" 1012 "The case of the u in u+hex is the force-to-uppercase flag.\n" 1013 , argv[0], argv[0]); 1014 exit(EXIT_FAILURE); 1015 } 1017 static void fail(const char *msg) 1018 { 1019 fputs(msg,stderr); 1020 exit(EXIT_FAILURE); 1021 } 1023 static const char too_big[] = 1024 "input or output is too large, recompile with larger limits\n"; 1025 static const char invalid_input[] = "invalid input\n"; 1026 static const char overflow[] = "arithmetic overflow\n"; 1027 static const char io_error[] = "I/O error\n"; 1029 /* The following string is used to convert LDH */ 1030 /* characters between ASCII and the native charset: */ 1031 static const char ldh_ascii[] = 1032 "................" 1033 "................" 1034 ".............-.." 1035 "0123456789......" 1036 ".ABCDEFGHIJKLMNO" 1037 "PQRSTUVWXYZ....." 
1038 ".abcdefghijklmno" 1039 "pqrstuvwxyz"; 1041 int main(int argc, char **argv) 1042 { 1043 enum amc_ace_status status; 1044 int r; 1045 char *p; 1047 if (argc != 2) usage(argv); 1048 if (argv[1][0] != '-') usage(argv); 1049 if (argv[1][2] != 0) usage(argv); 1051 if (argv[1][1] == 'e') { 1052 amc_ace_z_uint input[unicode_max_length]; 1053 unsigned long codept; 1054 unsigned char uppercase_flags[unicode_max_length]; 1055 char output[ace_max_size], uplus[3]; 1056 amc_ace_z_uint input_length, output_size, i; 1058 /* Read the input code points: */ 1060 input_length = 0; 1062 for (;;) { 1063 r = scanf("%2s%lx", uplus, &codept); 1064 if (ferror(stdin)) fail(io_error); 1065 if (r == EOF || r == 0) break; 1067 if (r != 2 || uplus[1] != '+' || codept > (amc_ace_z_uint)-1) { 1068 fail(invalid_input); 1069 } 1071 if (input_length == unicode_max_length) fail(too_big); 1073 if (uplus[0] == 'u') uppercase_flags[input_length] = 0; 1074 else if (uplus[0] == 'U') uppercase_flags[input_length] = 1; 1075 else fail(invalid_input); 1076 input[input_length++] = codept; 1077 } 1079 /* Encode: */ 1081 output_size = ace_max_size; 1082 status = amc_ace_z_encode(input_length, input, uppercase_flags, 1083 &output_size, output); 1084 if (status == amc_ace_bad_input) fail(invalid_input); 1085 if (status == amc_ace_big_output) fail(too_big); 1086 if (status == amc_ace_overflow) fail(overflow); 1087 assert(status == amc_ace_success); 1089 /* Convert to native charset and output: */ 1091 for (p = output; *p != 0; ++p) { 1092 i = *p; 1093 assert(i <= 122 && ldh_ascii[i] != '.'); 1094 *p = ldh_ascii[i]; 1095 } 1097 r = puts(output); 1098 if (r == EOF) fail(io_error); 1099 return EXIT_SUCCESS; 1100 } 1102 if (argv[1][1] == 'd') { 1103 char input[ace_max_size], *pp; 1104 amc_ace_z_uint output[unicode_max_length]; 1105 unsigned char uppercase_flags[unicode_max_length]; 1106 amc_ace_z_uint input_length, output_length, i; 1108 /* Read the AMC-ACE-Z input string and convert to ASCII: */ 1110 fgets(input, 
ace_max_size, stdin); 1111 if (ferror(stdin)) fail(io_error); 1112 if (feof(stdin)) fail(invalid_input); 1113 input_length = strlen(input); 1114 if (input[input_length - 1] != '\n') fail(too_big); 1115 input[--input_length] = 0; 1117 for (p = input; *p != 0; ++p) { 1118 pp = strchr(ldh_ascii, *p); 1119 if (pp == 0) fail(invalid_input); 1120 *p = pp - ldh_ascii; 1121 } 1123 /* Decode: */ 1125 output_length = unicode_max_length; 1126 status = amc_ace_z_decode(input, &output_length, 1127 output, uppercase_flags); 1128 if (status == amc_ace_bad_input) fail(invalid_input); 1129 if (status == amc_ace_big_output) fail(too_big); 1130 if (status == amc_ace_overflow) fail(overflow); 1131 assert(status == amc_ace_success); 1132 /* Output the result: */ 1134 for (i = 0; i < output_length; ++i) { 1135 r = printf("%s+%04lX\n", 1136 uppercase_flags[i] ? "U" : "u", 1137 (unsigned long) output[i] ); 1138 if (r < 0) fail(io_error); 1139 } 1141 return EXIT_SUCCESS; 1142 } 1144 usage(argv); 1145 return EXIT_SUCCESS; /* not reached, but quiets compiler warning */ 1146 } 1148 INTERNET-DRAFT expires 2002-Jan-11