idnits 2.17.1

draft-ietf-idn-dude-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 899 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 32 instances of too long lines in the document, the longest one
     being 18 characters in excess of 72.

  == There are 5 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 02, 2001) is 8273 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'IDNRACE' is mentioned on line 116, but not defined

  == Missing Reference: '0123456789abcdef' is mentioned on line 256, but not
     defined

  -- Looks like a reference, but probably isn't: '0123456789' on line 262

  == Missing Reference: 'Arabic' is mentioned on line 350, but not defined

  == Missing Reference: 'Hindi' is mentioned on line 372, but not defined

  == Missing Reference: 'Chinese' is mentioned on line 389, but not defined

  == Missing Reference: 'Russian' is mentioned on line 408, but not defined

  -- Looks like a reference, but probably isn't: '512' on line 824

  -- Looks like a reference, but probably isn't: '128' on line 830

  == Unused Reference: 'IDNCOMP' is defined on line 467, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNrACE' is defined on line 470, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNNAMEPREP' is defined on line 479, but no explicit
     reference was found in the text

  -- Possible downref: Normative reference to a draft: ref. 'IDNCOMP' 

  -- Possible downref: Normative reference to a draft: ref. 'IDNrACE' 

  -- Possible downref: Normative reference to a draft: ref. 'IDNLACE' 

  -- No information found for draft-ietf-idn-requirement - is the name
     correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNREQ' 

  -- Possible downref: Normative reference to a draft: ref. 'IDNDUERST' 

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE3'


     Summary: 4 errors (**), 0 flaws (~~), 12 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force (IETF)                         Mark Welter
2	INTERNET-DRAFT                                          Brian W. Spolarich
3	draft-ietf-idn-dude-01.txt                                     WALID, Inc.
4	March 02, 2001                                  Expires September 02, 2001

6	              DUDE: Differential Unicode Domain Encoding

8	Status of this memo

10	This document is an Internet-Draft and is in full conformance with all
11	provisions of Section 10 of RFC2026.

13	Internet-Drafts are working documents of the Internet Engineering Task
14	Force (IETF), its areas, and its working groups. Note that other
15	groups may also distribute working documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference
20	material or to cite them other than as "work in progress."

22	     The list of current Internet-Drafts can be accessed at
23	     http://www.ietf.org/ietf/1id-abstracts.txt

25	     The list of Internet-Draft Shadow Directories can be accessed at
26	     http://www.ietf.org/shadow.html.

28	The distribution of this document is unlimited.

30	Copyright (c) The Internet Society (2000).  All Rights Reserved.

32	Abstract

34	This document describes a transformation method for representing
35	Unicode character codepoints in host name parts in a fashion that is
36	completely compatible with the current Domain Name System.  It provides
37	for very efficient representation of typical Unicode sequences as
38	host name parts, while preserving simplicity.  It is proposed as a
39	potential candidate for an ASCII-Compatible Encoding (ACE) for supporting
40	the deployment of an internationalized Domain Name System.

42	Table of Contents

44	1.        Introduction
45	1.1         Terminology
46	2.        Hostname Part Transformation
47	2.1         Post-Converted Name Prefix
48	2.2         Radix Selection
49	2.3         Hostname Preparation
50	2.4         Definitions
51	2.5         DUDE Encoding
52	2.5.1         Extended Variable Length Hex Encoding
53	2.5.2         DUDE Compression Algorithm
54	2.5.3         Forward Transformation Algorithm
55	2.6         DUDE Decoding
56	2.6.1         Extended Variable Length Hex Decoding
57	2.6.2         DUDE Decompression Algorithm
58	2.6.3         Reverse Transformation Algorithm
59	3.        Examples
60	4.        Optional Case Preservation
61	5.        Security Considerations
62	6.        Change History
63	7.        References

64	1. Introduction

66	DUDE describes an encoding scheme of the ISO/IEC 10646 [ISO10646]
67	character set (whose character code assignments are synchronized
68	with Unicode [UNICODE3]), and the procedures for using this scheme
69	to transform host name parts containing Unicode character sequences
70	into sequences that are compatible with the current DNS protocol
71	[STD13].  As such, it satisfies the definition of a 'charset' as
72	defined in [IDNREQ].

74	1.1 Terminology

76	The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
77	"MAY" in this document are to be interpreted as described in RFC 2119
78	[RFC2119].

80	Hexadecimal values are shown preceded with an "0x". For example,
81	"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
82	shown preceded with an "0b". For example, a nine-bit value might be
83	shown as "0b101101111".

85	Examples in this document use the notation from the Unicode Standard
86	[UNICODE3] as well as the ISO 10646 names. For example, the letter "a"
87	may be represented as either "U+0061" or "LATIN SMALL LETTER A".

89	DUDE converts strings with internationalized characters into
90	strings of US-ASCII that are acceptable as host name parts in current
91	DNS host naming usage. The former are called "pre-converted" and the
92	latter are called "post-converted".  This specification defines both
93	a forward and reverse transformation algorithm.

95	2. Hostname Part Transformation

97	According to [STD13], hostname parts must start and end with a letter
98	or digit, and contain only letters, digits, and the hyphen character
99	("-"). This, of course, excludes most characters used by non-English
100	speakers, as well as many other characters in the ASCII character
101	repertoire. Further, domain name parts must be 63 octets or shorter
102	in length.
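
The constraints above can be checked mechanically.  The following is a
minimal sketch, not part of the DUDE specification; the function name
is_ldh_label is a hypothetical helper introduced only for illustration.

```cpp
#include <string>

// Hypothetical helper: true if `label` obeys the host name part rules
// summarized above (letters, digits and hyphen only; no leading or
// trailing hyphen; between 1 and 63 octets).
bool is_ldh_label(const std::string& label)
{
    if (label.empty() || label.size() > 63)
        return false;
    if (label.front() == '-' || label.back() == '-')
        return false;
    for (unsigned char ch : label) {
        bool letter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
        bool digit  = (ch >= '0' && ch <= '9');
        if (!letter && !digit && ch != '-')
            return false;
    }
    return true;
}
```

Every post-converted name part produced by DUDE is expected to pass a
check of this kind.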

104	2.1  Post-Converted Name Prefix

106	This document defines the string 'dq--' as a prefix to identify
107	DUDE-encoded sequences.  For the purposes of comparison in the IDN
108	Working Group activities, the 'dq--' prefix should be used solely to
109	identify DUDE sequences.  However, should this document proceed beyond
110	draft status the prefix should be changed to whatever prefix, if any,
111	is the final consensus of the IDN working group.
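
As an illustration, a receiver might recognize the prefix as sketched
below.  The comparison ignores case because DNS comparisons do; the
function name has_dude_prefix is an assumption made for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive test for the 'dq--' prefix.
bool has_dude_prefix(const std::string& label)
{
    static const char tag[] = "dq--";
    if (label.size() < 4)
        return false;
    for (int i = 0; i < 4; ++i) {
        int c = std::tolower(static_cast<unsigned char>(label[i]));
        if (c != tag[i])
            return false;
    }
    return true;
}
```

If the working group settles on a different prefix, only the tag string
in such a helper would need to change.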

113	Note that the prepending of a fixed identifier sequence is only one
114	mechanism for differentiating ASCII character encoded international
115	domain names from 'ordinary' domain names.  One method, as proposed in
116	[IDNRACE], is to include a character prefix or suffix that does not
117	appear in any name in any zone file.  A second method is to insert a
118	domain component which pushes off any international names one or more
119	levels deeper into the DNS hierarchy.  There are trade-offs between
120	these two methods which are independent of the Unicode to ASCII
121	transcoding method finally chosen.  We do not address the international
122	vs. 'ordinary' name differentiation issue in this paper.

124	2.2  Radix Selection

126	There are many proposed methods for representing Unicode characters
127	within the allowed target character set, which can be split into groups
128	on the basis of the underlying radix.  We have chosen a method with
129	radix 16 because both UTF-32 and ASCII are represented by even multiples
130	of four bits.  This allows a Unicode character to be encoded as a
131	whole number of ASCII characters, and permits easier manipulation of
132	the resulting encoded data by humans.

134	2.3  Hostname Preparation

136	The hostname part is assumed to have at least one character disallowed
137	by [STD13], and to have been processed for logically equivalent
138	character mapping, filtering of disallowed characters (if any), and
139	compatibility composition/decomposition before presentation to the DUDE
140	conversion algorithm.

142	While it is possible to invent a transcoding mechanism that relies
143	on certain Unicode characters being deemed illegal within domain names
144	and hence available to the transcoding mechanism for improving encoding
145	efficiency, we feel that such a proposal would complicate matters
146	excessively.

148	2.4  Definitions

150	For clarity:

152	  'integer' is an unsigned binary quantity;
153	  'byte' is an 8-bit integer quantity;
154	  'nibble' is a 4-bit integer quantity.

156	2.5  DUDE Encoding

158	The idea behind this scheme is to provide compression by encoding only
159	the contiguous least significant nibbles of a character that differ from
160	the preceding character.  It uses a variant of the variable length hex
161	encoding described in [IDNDUERST] and elsewhere; because leading zero
162	nibbles are encoded rather than suppressed, the decoder can recover the
163	differential length.  The encoding is, with some practice, easy to
164	perform manually.

165	2.5.1  Extended Variable Length Hex Encoding

167	The variable length hex encoding algorithm was introduced by Duerst in
168	[IDNDUERST].  It encodes an integer value in a slight modification of
169	traditional hexadecimal notation, the difference being that the most
170	significant digit is represented with an alternate set of "digits":
171	'g' through 'v' are used to represent 0 through 15.  The result is a
172	variable length encoding which can efficiently represent integers of
173	arbitrary length.

175	This specification extends the variable length hex encoding algorithm
176	to support the compression scheme defined below by potentially not
177	suppressing leading zero nibbles.

179	The extended variable length nibble encoding of an integer, C,
180	to length N, is defined as follows:

182	  1.  Let I be the Nth nibble of C, counting from the least
183	      significant nibble (which is nibble 1);

185	  2.  Emit the Ith character (counting from zero) of the sequence
186	      [ghijklmnopqrstuv];

187	  3.  Continue from the most to least significant, encoding each
188	      remaining nibble J by emitting the Jth character (counting
189	      from zero) of the sequence [0123456789abcdef].
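
The steps above can be sketched as follows.  This is an illustrative
rendering only; the name evlh_encode and the use of std::string are
assumptions of the example, and the limit of 8 nibbles simply reflects
the 32-bit integer used here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode the n least significant nibbles of c.  The most significant
// of those nibbles is written with g..v, the rest with 0..9a..f, so a
// leading zero nibble is emitted rather than suppressed.
std::string evlh_encode(std::uint32_t c, int n)
{
    static const char lead[] = "ghijklmnopqrstuv";
    static const char hex[]  = "0123456789abcdef";
    assert(n >= 1 && n <= 8);   // assumption: values fit in 32 bits

    std::string out;
    out += lead[(c >> (4 * (n - 1))) & 0xF];
    for (int i = n - 2; i >= 0; --i)
        out += hex[(c >> (4 * i)) & 0xF];
    return out;
}
```

For example, 0x645 encoded to length 3 gives "m45", and 0x005 encoded
to length 3 gives "g05", with both zero nibbles kept.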

191	2.5.2  DUDE Compression Algorithm

193	  1.  Let PREV = 0;

195	  2.  If there are no more characters in the input, terminate successfully;

197	  3.  Let C be the next character in the input;

199	  4.  If C != '-' , then go to step 6;

201	  5.  Consume the input character, emit '-', and go to step 2;

203	  6.  Let D be the result of PREV exclusive ORed with C;

205	  7.  Find the least positive value N such that
206	        D bitwise ANDed with M is zero
207	        where M = the bitwise complement of (16**N) - 1;

209	  8.  Let V be C ANDed with the bitwise complement of M;

211	  9.  Variable length hex encode V to length N and emit the result;

213	 10.  Consume the input character, let PREV = C, and go to step 2.
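
A compact sketch of the compression loop follows.  It assumes the input
is one host name part held as UTF-32 in a std::u32string; the name
dude_compress is hypothetical.  As in the algorithm text, a hyphen is
passed through and leaves PREV unchanged.

```cpp
#include <cstdint>
#include <string>

// Compress one host name part: for each character emit only the low
// nibbles that differ from the previous character.
std::string dude_compress(const std::u32string& part)
{
    static const char lead[] = "ghijklmnopqrstuv";
    static const char hex[]  = "0123456789abcdef";
    std::string out;
    std::uint32_t prev = 0;

    for (char32_t ch : part) {
        if (ch == U'-') {           // hyphens are copied literally
            out += '-';
            continue;
        }
        std::uint32_t c = ch;
        std::uint32_t d = prev ^ c; // differing bits
        int n = 1;                  // least N with d < 16**N
        while (d >>= 4)
            ++n;
        out += lead[(c >> (4 * (n - 1))) & 0xF];
        for (int i = n - 2; i >= 0; --i)
            out += hex[(c >> (4 * i)) & 0xF];
        prev = c;
    }
    return out;
}
```

Applied to U+0645 U+0648 U+0642 U+0639 this yields "m45oij9": three
nibbles for the first character, then one, one and two nibbles for the
characters that follow.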

215	2.5.3  Forward Transformation Algorithm

217	The DUDE transformation algorithm accepts a string in UTF-32
218	[UNICODE3] format as input.  It is assumed that prior nameprep
219	processing has disallowed the private use code points in
220	0x100000 through 0x10FFFF, so that we are left with the task of
221	encoding 20 bit integers. The encoding algorithm is as follows:

223	  1.  Break the hostname string into dot-separated hostname parts.
224	      For each hostname part which contains one or more characters
225	      disallowed by [STD13], perform steps 2 and 3 below;

227	  2.  Compress the hostname part using the method described in section
228	      2.5.2 above, and encode using the encoding described in section
229	      2.5.1;

231	  3.  Prepend the post-converted name prefix 'dq--' (see section 2.1
232	      above) to the resulting string.
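
The three steps can be combined as in the sketch below.  It assumes
nameprep has already been applied, treats any character other than a
letter, digit or hyphen as "disallowed by [STD13]", and uses the
hypothetical names is_ldh, dude_label and dude_encode_hostname.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// True for the characters allowed in a host name part.
static bool is_ldh(char32_t c)
{
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           (c >= U'0' && c <= U'9') || c == U'-';
}

// Step 2: compress and encode one host name part.
static std::string dude_label(const std::u32string& part)
{
    static const char lead[] = "ghijklmnopqrstuv";
    static const char hex[]  = "0123456789abcdef";
    std::string out;
    std::uint32_t prev = 0;
    for (char32_t ch : part) {
        if (ch == U'-') { out += '-'; continue; }
        std::uint32_t c = ch;
        std::uint32_t d = prev ^ c;
        int n = 1;
        while (d >>= 4) ++n;
        out += lead[(c >> (4 * (n - 1))) & 0xF];
        for (int i = n - 2; i >= 0; --i)
            out += hex[(c >> (4 * i)) & 0xF];
        prev = c;
    }
    return out;
}

// Steps 1 and 3: split on dots, encode only the parts that need it,
// and prepend the 'dq--' prefix to each encoded part.
std::string dude_encode_hostname(const std::u32string& host)
{
    std::string out;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = host.find(U'.', start);
        std::size_t end = (dot == std::u32string::npos) ? host.size() : dot;
        std::u32string part = host.substr(start, end - start);

        bool plain = true;
        for (char32_t c : part) plain = plain && is_ldh(c);

        if (plain)
            for (char32_t c : part) out += static_cast<char>(c);
        else
            out += "dq--" + dude_label(part);

        if (dot == std::u32string::npos) break;
        out += '.';
        start = dot + 1;
    }
    return out;
}
```

Parts that are already valid host name parts, such as "com", pass
through unchanged and receive no prefix.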

234	2.6  DUDE Decoding

236	2.6.1  Extended Variable Length Hex Decoding

238	  Decoding extended variable length hex encoded strings is identical
239	to standard variable length hex decoding, and is defined as
240	follows:

242	  1.  Let CL be the lower case of the first input character,

244	      If CL is not in set [ghijklmnopqrstuv],
245	        return error,
246	      else
247	        consume the input character;

249	  2.  Let R = CL - 'g',
250	      Let N = 1;

252	  3.  If no more input characters exist, go to step 9.

254	  4.  Let CL be the lower case of the next input character;

256	  5.  If CL is not in the set [0123456789abcdef], go to Step 9;

258	  6.  Consume the next input character,
259	      Let N = N + 1;
260	      Let R = R * 16;

262	  7.  If CL is in set [0123456789],
263	        then let R = R + (CL - '0')
264	        else let R = R + (CL - 'a') + 10;

266	  8.  Go to step 3;

268	  9.  Let MASK be the bitwise complement of (16**N) - 1;

270	 10.  Return decoded result R as well as MASK.
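
The decoding steps might be rendered as below.  The struct Decoded and
the function evlh_decode are names invented for this sketch.  As an
added assumption, the sketch rejects tokens longer than five nibbles,
which keeps the mask computation within 32 bits and matches the 20 bit
range used by this document.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

struct Decoded {
    std::uint32_t value;   // R in the steps above
    std::uint32_t mask;    // complement of (16**N) - 1
};

// Decode one token starting at in[pos]; on success advance pos past it.
bool evlh_decode(const std::string& in, std::size_t& pos, Decoded& out)
{
    if (pos >= in.size())
        return false;
    int cl = std::tolower(static_cast<unsigned char>(in[pos]));
    if (cl < 'g' || cl > 'v')
        return false;                  // not a lead character
    ++pos;

    std::uint32_t r = static_cast<std::uint32_t>(cl - 'g');
    int n = 1;
    while (pos < in.size()) {
        cl = std::tolower(static_cast<unsigned char>(in[pos]));
        int digit;
        if (cl >= '0' && cl <= '9')      digit = cl - '0';
        else if (cl >= 'a' && cl <= 'f') digit = cl - 'a' + 10;
        else break;                    // next token or hyphen
        if (n == 5)
            return false;              // assumption: at most 20 bits
        r = r * 16 + static_cast<std::uint32_t>(digit);
        ++n;
        ++pos;
    }
    out.value = r;
    out.mask  = ~((1u << (4 * n)) - 1);
    return true;
}
```

Because the lead characters g..v and the hex digits are disjoint sets,
a token ends exactly where the next lead character or hyphen begins.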

272	2.6.2  DUDE Decompression Algorithm

274	  1.  Let PREV = 0;

276	  2.  If there are no more input characters then terminate successfully;

278	  3.  Let C be the next input character;

280	  4.  If C == '-', append '-' to the result string, consume the character,
281	        and go to step 2,

283	  5.  Let VPART, MASK be the next extended variable length hex decoded
284	        value and mask;

286	  6.  If VPART > 0xFFFFF then return error status,

288	  7.  Let CU = ( PREV bitwise-AND MASK) + VPART,
289	      Let PREV = CU;

291	  8.  Append the UTF-32 character CU to the result string;

293	  9.  Go to step 2.
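
A self-contained sketch of the decompression loop is given below.  The
name dude_decompress is hypothetical, and the five-nibble limit is an
assumption of the sketch that makes the 0xFFFFF range check implicit.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Decompress one DUDE label body (prefix already removed) into UTF-32.
// Returns false on malformed input.
bool dude_decompress(const std::string& in, std::u32string& out)
{
    out.clear();
    std::uint32_t prev = 0;
    std::size_t i = 0;

    while (i < in.size()) {
        if (in[i] == '-') {            // hyphen: copy, PREV unchanged
            out += U'-';
            ++i;
            continue;
        }
        int cl = std::tolower(static_cast<unsigned char>(in[i]));
        if (cl < 'g' || cl > 'v')
            return false;
        std::uint32_t v = static_cast<std::uint32_t>(cl - 'g');
        int n = 1;
        ++i;
        while (i < in.size()) {
            cl = std::tolower(static_cast<unsigned char>(in[i]));
            int digit;
            if (cl >= '0' && cl <= '9')      digit = cl - '0';
            else if (cl >= 'a' && cl <= 'f') digit = cl - 'a' + 10;
            else break;
            if (n == 5)
                return false;          // assumption: at most 20 bits
            v = v * 16 + static_cast<std::uint32_t>(digit);
            ++n;
            ++i;
        }
        std::uint32_t mask = ~((1u << (4 * n)) - 1);
        prev = (prev & mask) + v;      // keep high nibbles of PREV
        out += static_cast<char32_t>(prev);
    }
    return true;
}
```

Decompressing "m45oij9" recovers U+0645 U+0648 U+0642 U+0639.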

295	2.6.3  Reverse Transformation Algorithm

297	  1.  Break the string into dot-separated components and apply Steps
298	      2 and 3 to each component that begins with the post-converted
299	      name prefix; leave other components unchanged;

300	  2.  Remove the post converted name prefix 'dq--' (see Section 2.1);

302	  3.  Decompress the component using the decompression algorithm
303	      described above (which in turn invokes the decoding algorithm
304	      also described above);

306	  4.  Concatenate the decoded segments with dot separators and return.
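
The reverse transformation can be sketched end to end as follows.  The
function names are hypothetical, components without the 'dq--' prefix
are copied unchanged, and tokens longer than five nibbles are rejected
as an assumption of the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Decompress one label body, appending the result to out.
static bool dude_label_to_utf32(const std::string& in, std::u32string& out)
{
    std::uint32_t prev = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '-') { out += U'-'; ++i; continue; }
        int cl = std::tolower(static_cast<unsigned char>(in[i]));
        if (cl < 'g' || cl > 'v') return false;
        std::uint32_t v = static_cast<std::uint32_t>(cl - 'g');
        int n = 1;
        for (++i; i < in.size(); ++i, ++n) {
            cl = std::tolower(static_cast<unsigned char>(in[i]));
            int digit;
            if (cl >= '0' && cl <= '9')      digit = cl - '0';
            else if (cl >= 'a' && cl <= 'f') digit = cl - 'a' + 10;
            else break;
            if (n == 5) return false;  // assumption: at most 20 bits
            v = v * 16 + static_cast<std::uint32_t>(digit);
        }
        prev = (prev & ~((1u << (4 * n)) - 1)) + v;
        out += static_cast<char32_t>(prev);
    }
    return true;
}

// Split on dots, strip the prefix where present, and decompress.
bool dude_decode_hostname(const std::string& host, std::u32string& out)
{
    out.clear();
    std::size_t start = 0;
    while (true) {
        std::size_t dot = host.find('.', start);
        std::size_t end = (dot == std::string::npos) ? host.size() : dot;
        std::string part = host.substr(start, end - start);

        bool tagged = part.size() >= 4 &&
            std::tolower(static_cast<unsigned char>(part[0])) == 'd' &&
            std::tolower(static_cast<unsigned char>(part[1])) == 'q' &&
            part[2] == '-' && part[3] == '-';

        if (tagged) {
            if (!dude_label_to_utf32(part.substr(4), out))
                return false;
        } else {
            for (unsigned char c : part)
                out += static_cast<char32_t>(c);
        }
        if (dot == std::string::npos) break;
        out += U'.';
        start = dot + 1;
    }
    return true;
}
```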

308	3.  Examples

310	The examples below illustrate the encoding algorithm.  Allowed RFC1035
311	characters, including period [U+002E] and dash [U+002D] are shown as
312	literals in the UTF-16 version of the example.  DUDE is compared to
313	LACE as proposed in [IDNLACE].  A comprehensive comparison of ACE
314	proposals is outside of the scope of this document.  However we believe
315	that DUDE shows a good balance between efficiency (resulting in shorter
316	ACE sequences for typical names) and complexity.

318	3.1  'www.walid.com' [Arabic]:

320	  UTF-16:  U+0645 U+0648 U+0642 U+0639 . U+0648 U+0644 U+064A U+062F .
321	           U+0634 U+0631 U+0643 U+0629

323	  DUDE:    dq--m45oij9.dq--m48kqif.dq--m34hk3i9

325	  LACE:    bq--aqdekscche.bq--aqdeqrckf5.bq--aqddimkdfe
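
  The first label of this example can be traced by hand.  Starting
  from PREV = 0, U+0645 differs in three nibbles and is emitted as
  "m45"; U+0648 XOR U+0645 is 0x00D, so one nibble ("o") suffices;
  U+0642 XOR U+0648 is 0x00A, again one nibble ("i"); U+0639 XOR
  U+0642 is 0x07B, so two nibbles ("j9") are needed.  The helper
  below, whose name nibbles_needed is an assumption of this sketch,
  computes that count.

```cpp
#include <cstdint>

// Number of low-order nibbles that must be emitted for c, given the
// previously encoded character prev: the least positive N such that
// (prev XOR c) has no bits set at or above nibble N.
int nibbles_needed(std::uint32_t prev, std::uint32_t c)
{
    std::uint32_t d = prev ^ c;
    int n = 1;
    while (d >>= 4)
        ++n;
    return n;
}
```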

327	3.2  'Abugazalah-Intellectual-Property.com' [Arabic]:

329	  UTF-16:  U+0623 U+0628 U+0648 U+063A U+0632 U+0627 U+0644 U+0629 -
330	           U+0644 U+0644 U+0645 U+0644 U+0643 U+064A U+0629 - U+0627
331	           U+0644 U+0641 U+0643 U+0631 U+064A U+0629 . U+0634 U+0631
332	           U+0643 U+0629

334	  DUDE:    dq--m23ok8jaii7k4i9-m44klkjqi9-m27k4hjj1kai9.dq--m34hk3i9

336	  LACE:    bq--badcgkcihizcorbjaeac2bygircekrcdjiuqcabna4dcorcbimyuuki.
337	           bq--aqddimkdfe

339	3.3  'King-Hussain.person.jr' [Arabic]

341	  UTF-16:  U+0627 U+0644 U+0645 U+0644 U+0643 - U+062D U+0633 U+064A
342	           U+0646 . U+0634 U+062E U+0635 . U+0627 U+0644 U+0623 U+0631
343	           U+062F U+0646

345	  DUDE:    dq--m27k4lkj-m2dj3kam.dq--m34iej5.dq--m27k4i3j1ifk6

347	  LACE:    bq--audcorcfirbqcabnaudegljtjjda.bq--amddilrv.
348	           bq--aydcorbdgexum

350	3.4  'Jordanian-Dental-Center.com.jr' [Arabic]

352	  UTF-16:  U+0645 U+0631 U+0643 U+0632 - U+0627 U+0644 U+0623 U+0631 U+062F
353	           U+0646 - U+0644 U+0644 U+0623 U+0633 U+0646 U+0627 U+0646 .
354	           U+0634 U+0631 U+0643 U+0629 . U+0627 U+0644 U+0623 U+0631 U+062F
355	           U+0646

357	  DUDE:    dq--m45j1k3j2-m27k4i3j1ifk6-m44ki3j3k6i7k6.dq--m34hk3i9.
358	           dq--m27k4i3j1ifk6

360	  LACE:    bq--aqdekmkdgiaqaligaytuiizrf5dacabna4deirbdgndcorq.
361	           bq--aqddimkdfe.bq--aydcorbdgexum

363	3.5  'Mahindra.com' [Hindi]:

365	  UTF-16:  U+092E U+0939 U+093F U+0928 U+094D U+0926 U+094D U+0930
366	           U+093E . U+0935 U+094D U+092F U+093E U+092A U+093E U+0930

368	  DUDE:    dq--p2ej9vi8kdi6kdj0u.dq--p35kdifjeiajeg

370	  LACE:    bq--bees4oj7fbgsmtjqhy.bq--a4etktjphyvd4ma

372	3.6  'Webdunia.com' [Hindi]:

374	  UTF-16:  U+0935 U+0947 U+092C U+0926 U+0941 U+0928 U+093F U+092F
375	           U+093E . U+0935 U+094D U+092F U+093E U+092A U+093E U+0930

377	  DUDE:    dq--p35k7icmk1i8jfifje.dq--p35kdifjeiajeg

379	  LACE:    bq--beetkrzmezasqpzphy.bq--a4etktjphyvd4ma

381	3.7  'Chinese Finance.com' [Traditional Chinese]

383	  UTF-16:  U+4E2D U+83EF U+8CA1 U+7D93 . c o m

385	  DUDE:    dq--ke2do3efsa1nd93.com

387	  LACE:    bq--75hc3a7prsqx3ey.com

389	3.8  'Chinese Readers.net' [Chinese]

391	  UTF-16:  U+842C U+7DAD U+8B80 U+8005 . U+7DB2 U+7D61

393	  DUDE:    dq--o42cndadob80g05.dq--ndb2m1

395	  LACE:    bq--76ccy7nnroaiabi.bq--aj63eyi

397	3.9  'Russian-Standard.com.ru' [Russian]

399	  UTF-16:  U+0440 U+0443 U+0441 U+0441 U+043A U+0438 U+0439 -
400	           U+0441 U+0442 U+0430 U+043D U+0434 U+0430 U+0440 U+0442 .
401	           U+043A U+043E U+043C . U+0440 U+0444

403	  DUDE:    dq--k40jhhjaop-k3ausk1ij0tkgk0i.dq--k3aus.dq--k40k

405	  LACE:    bq--a4ceaq2bie5dqoibaawqqbcbiiyd2nbqibba.bq--amcdupr4.
406	           bq--aiceara

408	3.10  'Vladimir-Putin.person.ru' [Russian]

410	  UTF-16:  U+0432 U+043B U+0430 U+0434 U+0438 U+043C U+0438 U+0440 -
411	           U+043F U+0443 U+0442 U+0438 U+043D . U+043B U+0438 U+0447
412	           U+043D U+043E U+0441 U+0442 U+044C . U+0440 U+0444 U+0020

414	  DUDE:    dq--k32rgkosok0-k3fk3ij8t.dq--k3bok7jduk1is.dq--k40k

416	  LACE:    bq--bacdeozqgq4dyocaaeac2bieh5bueob5.
417	           bq--bacdwochhu7ecqsm.bq--aiceara

419	4. Optional Case Preservation

421	An extension to the DUDE concept recognizes that the first
422	character emitted by the variable length hex encoding algorithm is
423	always alphabetic.  We encode the case (if any) of the original Unicode
424	character in the case of the initial "hex" character.  Because the DNS
425	performs case-insensitive comparisons, mixed case international domain
426	names behave in exactly the same way as traditional domain names.
427	In particular, this enables reverse lookups to return names in the
428	preferred case.
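
This document does not specify the details of the extension.  Purely
as an illustration of the idea, and under the assumption that exactly
one case bit per character is carried, the case of the lead character
could be set and read as sketched below; both function names are
hypothetical.

```cpp
#include <cctype>

// Hypothetical: record the case of the original character in the case
// of the lead ("g".."v") character of its encoding.
char lead_with_case(char lead, bool original_is_upper)
{
    unsigned char u = static_cast<unsigned char>(lead);
    return static_cast<char>(original_is_upper ? std::toupper(u)
                                               : std::tolower(u));
}

// Hypothetical: recover the recorded case when decoding.
bool lead_says_upper(char lead)
{
    return std::isupper(static_cast<unsigned char>(lead)) != 0;
}
```

A decoder that lower-cases each lead character before use, as the
decoding algorithm in section 2.6.1 does, is unaffected by this marking.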

430	In contrast to other proposals as of this writing, such a case preserving
431	version of DUDE will interoperate with the non case preserving version.

433	Despite the foregoing, we feel that the additional complexity of tracking
434	character case through the nameprep processing is not warranted by the
435	marginal utility of the result.

437	5. Security Considerations

439	Much of the security of the Internet relies on the DNS and any
440	change to the characteristics of the DNS may change the security of
441	much of the Internet. Therefore DUDE makes no changes to the DNS itself.

443	DUDE is designed so that distinct Unicode sequences map to distinct
444	domain name sequences (modulo the Unicode and DNS equivalence rules).
445	Therefore use of DUDE with DNS will not negatively affect security below
446	the application level.

448	If an application has security reliance on the Unicode string S, produced
449	by an inverse ACE transformation of a name T, the application must verify
450	that the nameprepped and ACE encoded result of S is DNS-equivalent to T.
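
The verification described above amounts to a round trip.  In the
sketch below the nameprep and ACE encoding steps are supplied by the
caller as function objects, since neither is defined here; all names
are assumptions of the example.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// DNS-style comparison: ASCII case-insensitive equality.
static bool dns_equal(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

// True if re-encoding S (obtained by decoding T) gives a name that is
// DNS-equivalent to T.  `nameprep` and `to_ace` are caller-supplied.
bool ace_round_trip_ok(
    const std::string& t,
    const std::u32string& s,
    const std::function<std::u32string(const std::u32string&)>& nameprep,
    const std::function<std::string(const std::u32string&)>& to_ace)
{
    return dns_equal(to_ace(nameprep(s)), t);
}
```

An application would treat a false result as a reason not to rely on S
for any security decision.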

452	6. Change History

454	The statement that we intended to submit a Nameprep draft was removed in
455	light of the changes made between the first and second nameprep drafts.

457	The details of DUDE extensions for case preservation etc. have been
458	removed.  Basic DUDE was changed to operate over the relevant 20 bit
459	UTF-32 code points.

461	Examples have been extended.

463	ACE security issues were clarified.

465	7. References

467	[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
468	Proposals", draft-ietf-idn-compare;

470	[IDNRACE] Paul Hoffman, "RACE: Row-Based ASCII Compatible Encoding for
471	IDN", draft-ietf-idn-race;

473	[IDNLACE] Mark Davis, "LACE: Length-Based ASCII Compatible Encoding for
474	IDN", draft-ietf-idn-lace;

476	[IDNREQ] James Seng, "Requirements of Internationalized Domain Names",
477	draft-ietf-idn-requirement;

479	[IDNNAMEPREP] Paul Hoffman and Marc Blanchet, "Preparation of
480	Internationalized Host Names", draft-ietf-idn-nameprep;

482	[IDNDUERST] M. Duerst, "Internationalization of Domain Names",
483	draft-duerst-dns-i18n;

485	[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
486	technology -- Universal Multiple-Octet Coded Character Set (UCS) --
487	Part 1: Architecture and Basic Multilingual Plane.  Five amendments and
488	a technical corrigendum have been published up to now. UTF-16 is
489	described in Annex Q, published as Amendment 1. 17 other amendments are
490	currently at various stages of standardization;

492	[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
493	Requirement Levels", March 1997, RFC 2119;

495	[STD13] Paul Mockapetris, "Domain names - implementation and
496	specification", November 1987, STD 13 (RFC 1035);

498	[UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version
499	3.0", ISBN 0-201-61633-5. Described at
500	.

502	A. Acknowledgements

504	The structure (and some of the structural text) of this document is
505	intentionally borrowed from the LACE IDN draft (draft-ietf-idn-lace-00)
506	by Mark Davis and Paul Hoffman.

508	B. IANA Considerations

510	There are no IANA considerations in this document.

512	C. Author Contact Information

514	Mark Welter
515	Brian W. Spolarich
516	WALID, Inc.
517	State Technology Park
518	2245 S. State St.
519	Ann Arbor, MI  48104
520	+1-734-822-2020

522	mwelter@walid.com
523	briansp@walid.com

525	D. DUDE C++ Implementation

527	#include <ctype.h>    // tolower
528	#include <limits.h>   // INT_MIN
529	#include <string.h>   // strchr

532	#define IDN_ERROR INT_MIN

534	#define DUDETAG "dq--"

536	typedef unsigned int uchar_t;

538	bool idn_isRFC1035(const uchar_t * in, int len)
539	{
540	        const uchar_t * end = in + len;

542	        while (in < end)
543	        {
544	                if ((*in > 127) ||
545	                        !strchr("abcdefghijklmnopqrstuvwxyz0123456789-.", tolower(*in)))
546	                        return false;
547	                in++;
548	        }
549	        return true;
550	}

552	static const char *hexchar = "0123456789abcdef";
553	static const char *leadchar = "ghijklmnopqrstuv";

555	/*
556	        dudehex -- convert an integer, v, into n DUDE hex characters.
557	        The result is placed in ostr.  The buffer ends at the byte before
558	        eop, and false is returned to indicate insufficient buffer space.
559	*/
560	static bool dudehex(char * & ostr, const char * eop,
561	                                unsigned int v, int n)
562	{
563	        if ((ostr + n) >= eop)
564	                return false;

566	        n--; // convert to zero origin

568	        *ostr++ = leadchar[(v >> (n << 2)) & 0x0F];

570	        while (n > 0)
571	        {
572	                n--;
573	                *ostr++ = hexchar[(v >> (n << 2)) & 0x0F];
574	        }
575	        return true;
576	}

578	/*
579	        idn_dudeseg converts istr, a utf-32 domain name segment into DUDE.
580	        eip points at the character after the input segment.
581	        ostr points at an output buffer which ends just before eop.
582	        If there is insufficient buffer space, the function return is false.
583	        Invalid surrogate sequences will also cause a return of false.
584	*/
585	static bool idn_dudeseg(const uchar_t * istr, const uchar_t * eip,
586	                                char * & ostr, char * eop)
587	{
588	        const uchar_t * ip = istr;
589	        unsigned p = 0;

591	        while (ip < eip)
592	        {
593	                if (*ip == '-')
594	                        *ostr++ =  *ip;
595	                else  // if (validnc(*ip))
596	                {
597	                        unsigned int c = *ip;

599	                        unsigned d = p ^ c;  // d now has the difference (xor)
600	                                             // between the current and previous char

602	                        int n = 1;           // Count the number of significant nibbles
603	                        while (d >>= 4)
604	                                n++;

606	                        if (!dudehex(ostr, eop, c, n)) return false;
607	                        p = c;
608	                }
609	                ip++;
610	        }
611	        *ostr = 0;
612	        return true;
613	}

615	/*
616	        idn_UTF32toDUDE converts a UTF-32 domain name into DUDE.
617	        in, a UTF-32 vector of length inlen is the input domain name.
618	        outstr is a char output buffer of length outmax.
619	        On success, the number of output characters is returned.
620	        On failure, a negative number is returned.

622	        It is assumed that the input has been nameprepped.

624	        If this routine is used in a registration context, segment and
625	        overall length restrictions must be checked by the user.
626	*/

628	int idn_UTF32toDUDE(const uchar_t * in, int inlen, char *outstr, int outmax)
629	{
630	        const uchar_t *ip = in;
631	        const uchar_t *eip = in + inlen;
632	        const uchar_t *ep = ip;
633	        char *op = outstr;
634	        char *eop = outstr + outmax - 1;

636	        while (ip < eip)
637	        {
638	                ep = ip;
639	                while ((ep < eip) && (*ep != '.'))
640	                        ep++;

642	                if (idn_isRFC1035(ip, ep - ip))
643	                {                             // plain segment: copy verbatim
644	                        if ((ep - ip) >= (eop - op))
645	                        {
646	                                *outstr = '\0';
647	                                return IDN_ERROR;
648	                        }
649	                        while (ip < ep)
650	                                *op++ = *ip++;
651	                }
652	                else
653	                {
654	                        const char * tagp = DUDETAG;  // prefix the segment
655	                        while (*tagp)                 // with the tag (dq--)
656	                        {
657	                                if (op >= eop)
658	                                {
659	                                        *outstr = '\0';
660	                                        return IDN_ERROR;
661	                                }
662	                                *op++ = *tagp++;
663	                        }

665	                        if (!idn_dudeseg(ip, ep, op, eop))
666	                        {
667	                                *outstr = '\0';
668	                                return IDN_ERROR;
669	                        }
670	                }

672	                if (op >= eop)                  // check for output buffer overflow
673	                {
674	                        *outstr = '\0';
675	                        return IDN_ERROR;
676	                }
677	                if (ep < eip)
678	                        *op++ = *ep;            // copy '.'

680	                ip = ep + 1;
681	        }

683	        *op = '\0';

685	        return (op - outstr);   // number of output characters
686	}

688	/*
689	        idn_DUDEsegtoUTF32 converts instr, a DUDE encoded domain name
690	        segment of length inlen, into UTF-32.
691	        outstr points at an output buffer of maxlen elements.
692	        On success, the number of output characters is returned.
693	        On bad input or insufficient buffer space, IDN_ERROR is returned.
694	*/
695	static int idn_DUDEsegtoUTF32(const char * instr, int inlen,
696	                                        uchar_t * outstr, int maxlen)
697	{
698	        const char * ip = instr;
699	        const char * eip = instr + inlen;
700	        uchar_t * op = outstr;
701	        uchar_t * eop = op + maxlen - 1;

703	        unsigned prev = 0;

705	        while (ip < eip)
706	        {
707	                if (*ip == '-')
708	                        *op++ = *ip++;  // copy the hyphen and consume it
709	                else
710	                {
711	                        char c0 = tolower(*ip);
712	                        if ((c0 < 'g') || (c0 > 'v'))
713	                                return IDN_ERROR;  // not a lead character

715	                        ip++;

717	                        unsigned r = c0 - 'g';
718	                        int n = 1;
719	                        while (ip < eip)
720	                        {
721	                                char cl = tolower(*ip);
722	                                if ((cl >= '0') && (cl <= '9'))
723	                                {
724	                                        r <<= 4;
725	                                        r += cl - '0';
726	                                }
727	                                else if ((cl >= 'a') && (cl <= 'f'))
728	                                {
729	                                        r <<= 4;
730	                                        r += (cl - 'a') + 10;
731	                                }
732	                                else
733	                                        break;

735	                                ip++;
736	                                n++;
737	                        }

739	                        if ((n > 5) || (r > 0x0fffff))
740	                        {       // more than 20 bits: not a valid code point
741	                                return IDN_ERROR;
742	                        }
743	                        unsigned mask = ~0u << (n << 2);

745	                        unsigned cu = (prev & mask) + r;
746	                        prev = cu;

748	                        if (op >= eop)
749	                                return IDN_ERROR;
750	                        *op++ = cu;
751	                }
752	        }
753	        *op = '\0';
754	        return (op - outstr);
755	}

757	int idn_DUDEtoUTF32(const char * in, int inlen, uchar_t * outstr, int outmax)
758	{
759	        const char *ip = in;
760	        const char *eip = in + inlen;
761	        const char *ep = ip;
762	        uchar_t *op = outstr;
763	        uchar_t *eop = outstr + outmax - 1;

765	        while (ip < eip)
766	        {
767	                ep = ip;
768	                while ((ep < eip) && (*ep != '.'))
769	                        ep++;

771	                const char * tip = ip;
772	                const char * tagp = DUDETAG;
773	                while (*tagp && (tip < ep) && (tolower(*tagp) == tolower(*tip)))
774	                {
775	                        tip++;
776	                        tagp++;
777	                }

779	                if (*tagp)
780	                {                              // tag doesn't match, copy segment verbatim
781	                        while (ip < ep)
782	                        {
783	                                if (op >= eop)
784	                                        return IDN_ERROR;
785	                                *op++ = *ip++;
786	                        }
787	                }
788	                else
789	                {
790	                        ip = tip;
791	                        int rv = idn_DUDEsegtoUTF32(ip, ep - ip, op, eop - op);

793	                        if (rv < 0)
794	                                return IDN_ERROR;

796	                        op += rv;
797	                }

799	                *op++ = *ep;

801	                if (!*ep)
802	                        break;

804	                ip = ep + 1;
805	        }

807	        if (op >= eop)
808	                return IDN_ERROR;

810	        *op = '\0';

812	        return (op - outstr) - 1;
813	}

815	/*
816	        DUDE test driver
817	*/

819	void printres(char *title, int rv, char *buff);
820	void printres(char *title, int rv, uchar_t *buff);

822	int main(int argc, char *argv[])
823	{
824	        char inbuff[512];

826	        while (fgets(inbuff, sizeof(inbuff), stdin))
827	        {
828	                char cbuff[128];
829	                uchar_t wbuff[128];
830	                uchar_t iwbuff[128];
831	                uchar_t *wsp = wbuff;
832	                uchar_t wc;
833	                unsigned in;
834	                int nr;

836	                char * inp = inbuff;
837	                wsp = wbuff;
838	                while ((wsp < wbuff + sizeof(wbuff) / sizeof(wbuff[0])) && (sscanf(inp, "%x%n", &in, &nr) > 0))
839	                {
840	                        inp += nr;
841	                        *wsp++ = in;
842	                }
843	                fprintf(stdout, "\n");

845	                int rv;
846	                rv = idn_UTF32toDUDE(wbuff, wsp - wbuff, cbuff, sizeof(cbuff));
847	                printres("toDUDE", rv, cbuff);

849	                if (rv >= 0)
850	                {
851	                        rv = idn_DUDEtoUTF32(cbuff, rv, iwbuff, sizeof(iwbuff) / sizeof(iwbuff[0]));
852	                        printres("toUTF32", rv, iwbuff);
853	                }

855	        }
856	        return 0;
857	}

859	void printres(char *title, int rv, char *buff)
860	{
861	        fprintf(stdout, "%s (%d) : ", title, rv);
862	        if (rv >= 0)
863	        {
864	                unsigned char *dp = (unsigned char *) buff;
865	                while (*dp)
866	                {
867	                        fprintf(stdout, "%c", *dp++);
868	                }
869	        }
870	        fprintf(stdout, "\n");
871	}

873	void printres(char *title, int rv, uchar_t *buff)
874	{
875	        fprintf(stdout, "%s (%d) : ", title, rv);
876	        if (rv >= 0)
877	        {
878	                uchar_t *dp = buff;
879	                while (*dp)
880	                {
881	                        fprintf(stdout, " %05x", *dp++);
882	                }
883	        }
884	        fprintf(stdout, "\n");
885	}
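
The differential step at the heart of idn_DUDEsegtoUTF32 can be easier to follow in isolation. The following is a minimal sketch, not part of the draft: the helper name dude_decode_one and its signature are hypothetical, and it assumes an ASCII execution character set and a 32-bit unsigned. It reads one leading character in 'g'..'v' followed by hex digits, then replaces the low n nybbles of the previous code point with the value read. Unlike the listing above, it stops after 8 nybbles so the shift stays defined, and it does not apply the listing's 0x0fffff range check.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical helper (not from the draft): decode one code point from
// text[0..len) relative to prev. Returns the number of characters consumed,
// or 0 if text does not begin with a leading character in 'g'..'v'.
static std::size_t dude_decode_one(const char *text, std::size_t len,
                                   unsigned prev, unsigned *out)
{
    assert(out != nullptr);
    if (len == 0)
        return 0;

    int c0 = std::tolower(static_cast<unsigned char>(text[0]));
    if (c0 < 'g' || c0 > 'v')
        return 0;

    unsigned r = static_cast<unsigned>(c0 - 'g');  // leading nybble, 0..15
    std::size_t n = 1;                             // nybbles read so far

    // Remaining nybbles are plain hex digits; stop at 8 nybbles (32 bits).
    while (n < len && n < 8) {
        int c = std::tolower(static_cast<unsigned char>(text[n]));
        unsigned d;
        if (c >= '0' && c <= '9')
            d = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = static_cast<unsigned>(c - 'a') + 10u;
        else
            break;
        r = (r << 4) | d;
        n++;
    }

    // Keep the high part of prev and replace its low n nybbles with r.
    unsigned mask = (n >= 8) ? 0u : (~0u << (4 * n));
    *out = (prev & mask) + r;
    return n;
}
```

Under these assumptions, decoding "k3" with prev 0 consumes two characters and yields 0x43, and decoding "h" with prev 0x430 consumes one character and yields 0x431, since only the lowest nybble of prev is replaced.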