idnits 2.17.1 

draft-whistler-plane14-00.txt:
  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-24) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == Mismatching filename: the document gives the document name as
     'draft-whistler-plane14-01', but the file name used is
     'draft-whistler-plane14-00'

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 604 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 5 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** The abstract seems to contain references ([UNICODE], [ISO10646],
     [RFC1766]), which it shouldn't.  Please replace those with straight
     textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 15, 1998) is 9565 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'A-Z' is mentioned on line 364, but not defined

  == Unused Reference: 'RFC2070' is defined on line 553, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282)

  ** Obsolete normative reference: RFC 2070 (Obsoleted by RFC 2854)

  ** Downref: Normative reference to an Informational RFC: RFC 2130

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'


     Summary: 14 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                               Ken Whistler, Sybase
2	Internet Draft                                     Glenn Adams, Spyglass
3	                                         <draft-whistler-plane14-01.txt>

5	                 Language Tagging in Unicode Plain Text

7	                            February 15, 1998

9	                          Status of this Memo

11	This document is an Internet-Draft.  Internet-Drafts are working
12	documents of the Internet Engineering Task Force (IETF), its areas, and
13	its working groups. Note that other groups may also distribute working
14	documents as Internet-Drafts.

16	Internet-Drafts are draft documents valid for a maximum of six months.
17	Internet-Drafts may be updated, replaced, or obsoleted by other
18	documents at any time.  It is not appropriate to use Internet-Drafts as
19	reference material or to cite them other than as a "working draft" or
20	"work in progress".

22	To learn the current status of any Internet-Draft, please check the
23	1id-abstracts.txt listing contained in the Internet-Drafts Shadow
24	Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
25	ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

27	1.    Abstract

29	This document proposes a mechanism for language tagging in [UNICODE]
30	plain text. A set of special-use tag characters on Plane 14 of
31	[ISO10646] (accessible through UTF-8, UTF-16, and UCS-4 encoding forms)
32	is proposed for encoding to enable the spelling out of ASCII-based
33	string tags using characters which can be strictly separated from
34	ordinary text content characters in ISO10646 (or UNICODE).

36	One tag identification character and one cancel tag character are also
37	proposed. In particular, a language tag identification character is
38	proposed to identify a language tag string specifically; the language
39	tag itself makes use of [RFC1766] language tag strings spelled out
40	using the Plane 14 tag characters. Provision of a specific,
41	low-overhead mechanism for embedding language tags in plain text is
42	aimed at meeting the need of Internet Protocols such as ACAP, which
43	require a standard mechanism for marking language in UTF-8 strings.

45	The tagging mechanism and the characters proposed in this document
46	have been approved by the Unicode Consortium for inclusion in The
47	Unicode Standard.  However, implementation of this decision awaits
48	formal acceptance by ISO JTC1/SC2/WG2, the working group responsible
49	for ISO10646. Potential implementers should be aware that until this
50	formal acceptance occurs, any usage of the characters proposed herein
51	is strictly experimental and not sanctioned for standardized character
52	data interchange.

54	2.    Definitions and Notation

56	No attempt is made to define all terms used in this document. In
57	particular, the terminology pertaining to the subject of coded
58	character systems is not explicitly specified. See [UNICODE],
59	[ISO10646], and [RFC2130] for additional definitions in this area.

61	2.1   Requirements Notation

63	This document occasionally uses terms that appear in capital letters.
64	When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
65	appear capitalized, they are being used to indicate particular
66	requirements of this specification. A discussion of the meanings of
67	these terms appears in [RFC2119].

69	2.2   Definitions

71	The terms defined below are used in special senses and thus warrant
72	some clarification.

74	2.2.1 Tagging

76	The association of attributes of text with a point or range of the
77	primary text. (The value of a particular tag is not generally
78	considered to be a part of the "content" of the text. Typical examples
79	of tagging are marking the language or font of a portion of text.)

81	2.2.2 Annotation

83	The association of secondary textual content with a point or range of
84	the primary text. (The value of a particular annotation *is* considered
85	to be a part of the "content" of the text. Typical examples include
86	glossing, citations, exemplification, Japanese yomi, etc.)

88	2.2.3 Out-of-band

90	An out-of-band channel conveys a tag in such a way that the textual
91	content, as encoded, is completely untouched and unmodified. This is
92	typically done by metadata or hyperstructure of some sort.

94	2.2.4 In-band

96	An in-band channel conveys a tag along with the textual content, using
97	the same basic encoding mechanism as the text itself. This is done by
98	various means, but an obvious example is SGML markup, where the tags
99	are encoded in the same character set as the text and are interspersed
100	with and carried along with the text data.

102	3.0   Background

104	There has been much discussion over the last 8 years of
105	language tagging and of other kinds of tagging of Unicode plain
106	text. It is fair to say that there is more-or-less universal
107	agreement that language tagging of Unicode plain text is
108	required for certain textual processes. For example, language
109	"hinting" of multilingual text is necessary for multilingual
110	spell-checking based on multiple dictionaries to work well.
111	Language tagging provides a minimum level of required
112	information for text-to-speech processes to work correctly.
113	Language tagging is regularly done on web pages, to enable
114	selection of alternate content, for example.

116	However, there has been a great deal of controversy regarding
117	the appropriate placement of language tags. Some have
118	held that the only appropriate placement of language tags
119	(or other kinds of tags) is out-of-band, making use of
120	attributed text structures or metadata. Others have argued
121	that there are requirements for lower-complexity in-band
122	mechanisms for language tags (or other tags) in plain text.

124	The controversy has been muddied by the existence and widespread
125	use of a number of in-band text markup mechanisms (HTML,
126	text/enriched, etc.) which enable language tagging, but
127	which imply the use of general parsing mechanisms which
128	are deemed too "heavyweight" for protocol developers and
129	a number of other applications. The difficulty of using
130	general in-band text markup for simple protocols derives
131	from the fact that some characters are used both for textual
132	content and for the text markup; this makes it more difficult
133	to write simple, fast algorithms to find only the textual
134	content and ignore the tags, or vice versa. (Think of this
135	as the algorithmic equivalent of the difficulty the human
136	reader has attempting to read just the content of raw
137	HTML source text without a browser interpreting all the
138	markup tags.)

140	The Plane 14 proposal addresses the recurrent and persistent
141	call for a mechanism for text tagging in Unicode that is lighter in
142	weight than typical text markup mechanisms. It proposes a special set
143	of characters used *only* for tagging. These tag characters
144	can be embedded into plain text and can be identified and/or
145	ignored with trivial algorithms, since there is no overloading
146	of usage for these tag characters--they can only express
147	tag values and never textual content itself.

149	The Plane 14 proposal is not intended for general annotation
150	of text, such as textual citations, phonetic readings (e.g.
151	Japanese Yomi), etc. In its present form, its use is intended
152	to be restricted solely to specifying in-line language tags.
153	Future extensions may widen this scope of intended usage.

155	4.0   Proposal

157	This proposal suggests the use of 97 dedicated tag characters
158	encoded at the start of Plane 14 of ISO/IEC 10646 consisting of
159	a clone of the 94 printable 7-bit ASCII graphic characters and
160	ASCII SPACE, as well as a tag identification character and a tag
161	cancel character.

163	These tag characters are to be used to spell out any ASCII-
164	based tagging scheme which needs to be embedded in Unicode
165	plain text. In particular, they can be used to spell out
166	language tags in order to meet the expressed requirements
167	of the ACAP protocol and the likely requirements of other
168	new protocols following the guidelines of the IAB character
169	workshop (RFC 2130).

171	The suggested range in Plane 14 for the block reserved for
172	tag characters is as follows, expressed in each of the
173	three most generally used encoding schemes for ISO/IEC
174	10646:

176	UCS-4

178	U-000E0000 .. U-000E007F

180	UTF-16

182	U+DB40 U+DC00 .. U+DB40 U+DC7F

184	UTF-8

186	0xF3 0xA0 0x80 0x80 .. 0xF3 0xA0 0x81 0xBF

188	Of this range, U-000E0020 .. U-000E007E is the
189	suggested range for the ASCII clone tag characters themselves.
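
The UTF-16 and UTF-8 forms of the range follow mechanically from the
UCS-4 values. The following is a minimal sketch in C, assuming the
standard surrogate-pair and four-octet UTF-8 transformations; the
function names are illustrative and are not part of this proposal.

```c
#include <stdint.h>

/* Convert a code value in 0x10000..0x10FFFF to a UTF-16 surrogate
 * pair.  Hypothetical helper, not defined by this proposal. */
void ucs4_to_utf16(uint32_t c, uint16_t out[2])
{
    uint32_t v = c - 0x10000;                   /* 20-bit offset  */
    out[0] = (uint16_t)(0xD800 | (v >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low surrogate  */
}

/* Convert a code value in 0x10000..0x10FFFF to four UTF-8 octets. */
void ucs4_to_utf8(uint32_t c, uint8_t out[4])
{
    out[0] = (uint8_t)(0xF0 | (c >> 18));
    out[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (c & 0x3F));
}
```

For U-000E0000 these yield DB40 DC00 and F3 A0 80 80; for U-000E007F
they yield DB40 DC7F and F3 A0 81 BF, matching the ranges listed above.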

191	4.1   Names for the Tag Characters

193	The names for the ASCII clone tag characters should be exactly
194	the ISO 10646 names for 7-bit ASCII, prefixed with the word
195	"TAG".

197	In addition, there is one tag identification character
198	and a CANCEL TAG character. The use and syntax of these characters
199	are described in detail below.

201	The entire encoding for the proposed Plane 14 tag characters and
202	names of those characters can be derived from the following list.
203	(The encoded values here and throughout this proposal are listed
204	in UCS-4 form, which is easiest to interpret. It is assumed that
205	most Unicode applications will, however, be making use either
206	of UTF-16 or UTF-8 encoding forms for actual implementation.)

208	U-000E0000  <reserved>
209	U-000E0001  LANGUAGE TAG
210	U-000E0002  <reserved>
211	....
212	U-000E001F  <reserved>
213	U-000E0020  TAG SPACE
214	U-000E0021  TAG EXCLAMATION MARK
215	....
216	U-000E0041  TAG LATIN CAPITAL LETTER A
217	....
218	U-000E007A  TAG LATIN SMALL LETTER Z
219	....
220	U-000E007E  TAG TILDE
221	U-000E007F  CANCEL TAG
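
Because each ASCII clone sits at a fixed offset from its ASCII
original, the mapping in both directions is a single addition or
subtraction. A minimal sketch in C follows; the names TAG_BASE,
ascii_to_tag, and tag_to_ascii are assumptions made for illustration.

```c
#include <stdint.h>

#define TAG_BASE 0xE0000u   /* start of the proposed tag block */

/* Return the tag character cloned from ASCII character a, or 0 if a
 * is outside SPACE..TILDE (0x20..0x7E) and so has no tag clone. */
uint32_t ascii_to_tag(unsigned char a)
{
    return (a >= 0x20 && a <= 0x7E) ? TAG_BASE + a : 0;
}

/* Inverse mapping: return the ASCII character that tag character c
 * was cloned from, or -1 if c is not in U-000E0020 .. U-000E007E. */
int tag_to_ascii(uint32_t c)
{
    return (c >= TAG_BASE + 0x20 && c <= TAG_BASE + 0x7E)
               ? (int)(c - TAG_BASE) : -1;
}
```

The inverse mapping is also one way a debugging tool might render tag
characters with the corresponding ASCII glyphs.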

223	4.2   Range Checking for Tag Characters

225	The range checks required for code testing for tag characters
226	would be as follows. The same range check is expressed here
227	in C for each of the three significant encoding forms for 10646.

229	Range check expressed in UCS-4:

231	     if ( ( *s >= 0xE0000 ) && ( *s <= 0xE007F ) )

233	Range check expressed in UTF-16 (Unicode):

235	    if ( ( *s == 0xDB40 ) && ( *(s+1) >= 0xDC00 ) && ( *(s+1) <= 0xDC7F ) )

237	Range check expressed in UTF-8:

239	    if ( ( *s == 0xF3 ) && ( *(s+1) == 0xA0 ) && ( ( *(s+2) & 0xFE ) == 0x80 ) )

241	Because of the choice of the range for the tag characters, it would also
242	be possible to express the range check for UCS-4 or UTF-16 in terms of
243	bitmask operations, as well.
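
The bitmask form can be sketched as follows. This is an illustration
only: it relies on the block starting at a multiple of 128 and holding
exactly 128 values, and the function names are hypothetical.

```c
#include <stdint.h>

/* UCS-4: all 128 values of the block share the same upper 25 bits. */
int is_tag_ucs4(uint32_t c)
{
    return (c & 0xFFFFFF80u) == 0x000E0000u;
}

/* UTF-16: hi and lo are two consecutive 16-bit code units. */
int is_tag_utf16(uint16_t hi, uint16_t lo)
{
    return hi == 0xDB40 && (lo & 0xFF80) == 0xDC00;
}
```

Each test clears the low seven bits and compares the remainder against
the first value of the block, so a single comparison replaces the pair
of bounds checks.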

245	4.3   Syntax for Embedding Tags

247	The use of the Plane 14 tag characters is very simple. In order
248	to embed any ASCII-derived tag in Unicode plain text, the tag
249	is simply spelled out with the tag characters instead, prefixed
250	with the relevant tag identification character. The
251	resultant string is embedded directly in the text.

253	The tag identification character is used as a mechanism for
254	identifying tags of different types. This enables multiple
255	types of tags to coexist amicably embedded in plain text and
256	solves the problem of delimitation if a tag is concatenated
257	directly onto another tag. Although only one type of tag is
258	currently specified, namely the language tag, the encoding
259	of other tag identification characters in the future would
260	allow for distinct tag types to be used.

262	No termination character is required for a tag. A tag terminates
263	either when the first non Plane 14 Tag Character (i.e. any
264	other normal Unicode value) is encountered, or when the next
265	tag identification character is encountered.

267	All tag arguments must be encoded only with the tag characters
268	U-000E0020 .. U-000E007E. No other characters are valid for
269	expressing the tag argument.
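
The termination rule above can be sketched in C as a function that
measures one tag in a buffer of UCS-4 values. This is a minimal sketch
under stated assumptions: the function name is hypothetical, and
treating every value outside U-000E0020 .. U-000E007E as a terminator
is a choice made here, not a requirement stated by this proposal.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u

/* True for the tag characters that may spell out a tag argument. */
static int is_tag_arg(uint32_t c)
{
    return c >= 0xE0020u && c <= 0xE007Eu;
}

/* If s[0] is LANGUAGE TAG, return the number of values in the tag,
 * counting the identification character; otherwise return 0.  The
 * tag ends at the first value that is not a tag argument character,
 * which covers both ordinary text and a following tag
 * identification character. */
size_t tag_length(const uint32_t *s, size_t n)
{
    size_t i = 1;
    if (n == 0 || s[0] != LANGUAGE_TAG)
        return 0;
    while (i < n && is_tag_arg(s[i]))
        i++;
    return i;
}
```

No terminator has to be searched for; the scan stops as soon as the
argument characters run out.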

271	A detailed BNF syntax for tags is listed below.

273	4.4   Tag Scope and Nesting

275	The value of an established tag continues from the point the
276	tag is embedded in text until either:

278	   A. The text itself goes out of scope, as defined by the
279	      application. (E.g. for line-oriented protocols, when
280	      reaching the end-of-line or end-of-string; for text
281	      streams, when reaching the end-of-stream; etc.)

283	or

285	   B. The tag is explicitly cancelled by the CANCEL TAG
286	      character.

288	Tags of the same type cannot be nested in any way. The appearance
289	of a new embedded language tag, for example, after text which
290	was already language tagged, simply changes the tagged value for
291	subsequent text to that specified in the new tag.

293	Tags of different type can have interdigitating scope, but
294	not hierarchical scope. In effect,
295	tags of different type completely ignore each other, so that
296	the use of language tags can be completely asynchronous with the
297	use of character set source tags (or any other tag type) in the
298	same text in the future.

300	4.5   Cancelling Tag Values

302	U-000E007F CANCEL TAG is provided to allow the specific cancelling
303	of a tag value. The use of CANCEL TAG has the following syntax.
304	To cancel a tag value of a particular type, prefix the CANCEL
305	TAG character with the tag identification character of the
306	appropriate type. For example, the complete string to cancel
307	a language tag is:

309	U-000E0001 U-000E007F

311	The value of the relevant tag type returns to the default state
312	for that tag type, namely: no tag value specified, the same as
313	untagged text.

315	The use of CANCEL TAG without a prefixed tag identification
316	character cancels *any* Plane 14 tag values which may be
317	defined. Since only language tags are currently provided with
318	an explicit tag identification character, only language tags
319	are currently affected.
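
The scope and cancellation rules can be combined into a single scan
that reports which language tag, if any, is in effect at the end of a
text. The following C sketch is illustrative only; the function name
and the MAX_LANG limit are assumptions, and input is taken to be a
buffer of UCS-4 values.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u
#define CANCEL_TAG   0xE007Fu
#define MAX_LANG     32   /* arbitrary limit chosen for this sketch */

/* Scan n UCS-4 values and leave in lang, as ASCII, the language tag
 * in effect at the end of the text.  An empty string means that the
 * text is untagged at that point. */
void language_in_effect(const uint32_t *s, size_t n,
                        char lang[MAX_LANG])
{
    size_t i = 0;
    lang[0] = '\0';
    while (i < n) {
        if (s[i] == CANCEL_TAG) {          /* bare cancel: cancel all */
            lang[0] = '\0';
            i++;
        } else if (s[i] == LANGUAGE_TAG) {
            size_t len = 0;
            i++;
            if (i < n && s[i] == CANCEL_TAG) { /* cancel language tag */
                lang[0] = '\0';
                i++;
                continue;
            }
            /* A new language tag replaces the old one; no nesting. */
            while (i < n && s[i] >= 0xE0020u && s[i] <= 0xE007Eu) {
                if (len < MAX_LANG - 1)
                    lang[len++] = (char)(s[i] - 0xE0000u);
                i++;
            }
            lang[len] = '\0';
        } else {
            i++;                           /* ordinary text */
        }
    }
}
```

Because tags of the same type do not nest, a single current value is
enough state; no stack is needed.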

321	The main function of CANCEL TAG is to make possible such
322	operations as blind concatenation of strings in a tagged context
323	without the propagation of inappropriate tag values across the
324	string boundaries. For example, a string tagged with a Japanese
325	language tag can have its tag value "sealed off" with a terminating
326	CANCEL TAG before another string of unknown language value is
327	concatenated to it. This would prevent the string of unknown
328	language from being erroneously marked as being Japanese simply
329	because of a concatenation to a Japanese string.
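
A concatenation routine that seals off the first string in this way
might look like the following C sketch. The function name and the
buffer convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANCEL_TAG 0xE007Fu

/* Copy a (na values) and then b (nb values) into dst, inserting a
 * bare CANCEL TAG between them so that no tag value from a carries
 * over into b.  dst must have room for na + nb + 1 values.  Returns
 * the number of values written. */
size_t concat_sealed(uint32_t *dst, const uint32_t *a, size_t na,
                     const uint32_t *b, size_t nb)
{
    memcpy(dst, a, na * sizeof *a);
    dst[na] = CANCEL_TAG;
    memcpy(dst + na + 1, b, nb * sizeof *b);
    return na + nb + 1;
}
```

A bare CANCEL TAG is used here because the concatenating process may
not know which tag types appear in the first string. A process that
wants to cancel only the language tag would write U-000E0001
U-000E007F instead.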

331	4.6   Tag Syntax Description

333	An extended BNF (Backus-Naur Form) description of the tags specified
334	in this proposal is found below.  Note the following BNF extensions
335	used in this formalism:

337	1. Semantic constraints are specified by rules in the form of an
338	   assertion specified between double braces; the variable $$ denotes
339	   the string consisting of all terminal symbols matched by
340	   this non-terminal.

342	   Example:   {{ Assert ( $$[0] == '?' ); }}

344	   Meaning:   The first character of the string matched by this
345	              non-terminal must be '?'

347	2. A number of predicate functions are employed in semantic constraint
348	   rules which are not otherwise defined; their name is sufficient for
349	   determining their predication.

351	   Example:   IsRFC1766LanguageIdentifier ( tag-argument )

353	   Meaning:   tag-argument is a valid RFC1766 language identifier

355	3. A lexical expander function, TAG, is employed to denote the tag
356	   form of an ASCII character; the argument to this function is either
357	   a character or a character set specified by a range or enumeration
358	   expression.

360	   Example:   TAG('-')

362	   Meaning:   TAG HYPHEN-MINUS

364	   Example:   TAG([A-Z])

366	   Meaning:   TAG LATIN CAPITAL LETTER A ...
367	              TAG LATIN CAPITAL LETTER Z

369	4. A macro is employed to denote terminal symbols that are character
370	   literals which can't be directly represented in ASCII. The argument
371	   to the macro is the UNICODE (ISO/IEC 10646) character name.

373	   Example:   '${CANCEL TAG}'

375	   Meaning:   character literal whose code value is U-000E007F

377	5. Occurrence indicators used are '+' (one or more) and '*' (zero
378	   or more); optional occurrence is indicated by enclosure in '['
379	   and ']'.

381	4.6.1 Formal Tag Syntax

383	tag                     :   language-tag
384	                        |   cancel-all-tag
385	                        ;

387	language-tag            :   language-tag-introducer language-tag-argument
388	                        ;

390	language-tag-argument   :   tag-argument
391	              {{ Assert ( IsRFC1766LanguageIdentifier ( $$ ) ); }}
392	                        |   tag-cancel
393	                        ;

395	cancel-all-tag          :   tag-cancel
396	                        ;

398	tag-argument            :   tag-character+
399	                        ;

401	tag-character           :   { c : c in
402	              TAG( { a : a in printable ASCII characters or SPACE } ) }
403	                        ;

405	language-tag-introducer :   '${LANGUAGE TAG}'
406	                        ;

408	tag-cancel              :   '${CANCEL TAG}'
409	                        ;
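
The predicate IsRFC1766LanguageIdentifier is left undefined by the
grammar. As an illustration, the syntactic part of such a check could
be sketched in C as below, assuming the RFC 1766 form of one or more
hyphen-separated subtags of 1 to 8 letters each. The function name is
hypothetical, the input is the tag argument already mapped back to
ASCII, and registration of the tag value is not checked.

```c
#include <ctype.h>

/* Return 1 if the NUL-terminated ASCII string s has the shape of an
 * RFC 1766 language tag, otherwise 0.  isalpha() is assumed to run
 * in the "C" locale, where it accepts only A-Z and a-z. */
int has_rfc1766_syntax(const char *s)
{
    for (;;) {
        int len = 0;
        while (isalpha((unsigned char)*s)) {
            s++;
            len++;
        }
        if (len < 1 || len > 8)
            return 0;              /* empty or overlong subtag */
        if (*s == '\0')
            return 1;              /* end of a well-formed tag */
        if (*s != '-')
            return 0;              /* character not allowed    */
        s++;                       /* skip the hyphen          */
    }
}
```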

411	5.0   Tag Types

413	5.1   Language Tags

415	Language tags are of general interest and should have a high
416	degree of interoperability for protocol usage. To this end, a
417	specific LANGUAGE TAG tag identification character is provided.
418	A Plane 14 tag string prefixed by U-000E0001 LANGUAGE TAG is
419	specified to constitute a language tag. Furthermore, the tag values
420	for the language tag are to be spelled out as specified in RFC
421	1766, making use only of registered tag values or of user-defined
422	language tags starting with the characters "x-".

424	For example, to embed a language tag for Japanese, the Plane 14
425	characters would be used as follows. The Japanese tag from RFC 1766
426	is "ja" (composed of ISO 639 language id) or, alternatively,
427	"ja-JP" (composed of ISO 639 language id plus ISO 3166 country id).
428	Since RFC 1766 specifies that language tags are not case significant,
429	it is recommended that for language tags, the entire tag be
430	lowercased before conversion to Plane 14 tag characters. (This
431	would not be required for Unicode conformance, but should be followed
432	as general practice by protocols making use of RFC 1766 language tags,
433	to simplify and speed up the processing for operations which need to
434	identify or ignore language tags embedded in text.) Lowercasing,
435	rather than uppercasing, is recommended because it follows the majority
436	practice of expressing language tag values in lowercase letters.

438	Thus the entire language tag (in its longer form) would be converted
439	to Plane 14 tag characters as follows:

441	U-000E0001 U-000E006A U-000E0061 U-000E002D U-000E006A U-000E0070

443	The language tag (in its shorter, "ja" form) could be expressed
444	as follows:

446	U-000E0001 U-000E006A U-000E0061

448	The value of this string is then expressed in whichever encoding
449	form (UCS-4, UTF-16, UTF-8) is required and embedded in text at
450	the relevant point.
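
The conversion just described, including the recommended lowercasing,
can be sketched in C as follows. The function name and the buffer
convention are assumptions made for illustration; output is a
sequence of UCS-4 values.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u

/* Write LANGUAGE TAG followed by the lowercased tag characters for
 * the ASCII string ascii into out (capacity cap values).  Returns
 * the number of values written, or 0 if ascii contains a character
 * with no tag clone or the result does not fit. */
size_t make_language_tag(const char *ascii, uint32_t *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    out[n++] = LANGUAGE_TAG;
    for (; *ascii != '\0'; ascii++) {
        unsigned char a = (unsigned char)*ascii;
        if (a < 0x20 || a > 0x7E || n == cap)
            return 0;
        out[n++] = 0xE0000u + (uint32_t)tolower(a);
    }
    return n;
}
```

Given "ja-JP", the function produces the six values shown above for
the longer form of the tag.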

452	5.2   Additional Tags

454	Additional tag identification characters might be defined in the
455	future. An example would be a CHARACTER SET SOURCE TAG, or a
456	GENERIC TAG for private definition of tags.

458	In each case, when a specific tag identification character is encoded,
459	a corresponding reference standard for the values of the tags associated
460	with the identifier should be designated, so that interoperating
461	parties which make use of the tags will know how to interpret the
462	values the tags may take.

464	6.0   Display Issues

466	All characters in the tag character block are considered to have
467	no visible rendering in normal text. A process which interprets
468	tags may choose to modify the rendering of text based on the tag
469	values (for example, changing the font to the preferred style for
470	rendering Chinese versus Japanese). The tag characters
471	themselves have no display; they may be considered similar to
472	a U+200B ZERO WIDTH SPACE in that regard. The tag characters also
473	do not affect breaking, joining, or any other format or layout
474	properties, except insofar as the process interpreting the
475	tag chooses to impose such behavior based on the tag value.

477	For debugging or other operations which must render the tags
478	themselves visible, it is advisable that the tag characters be
479	rendered using the corresponding ASCII character glyphs (perhaps
480	modified systematically to differentiate them from normal ASCII
481	characters). But because each tag character value is the
482	corresponding ASCII value plus 0xE0000, the tag characters will be
483	interpretable in most debuggers even without display support.

485	8.0   Unicode Conformance Issues

487	The basic rules for Unicode conformance for the tag characters are
488	exactly the same as for any other Unicode characters. A conformant
489	process is not required to interpret the tag characters. If it does
490	not interpret tag characters, it should leave their values undisturbed
491	and do whatever it does with any other uninterpreted characters. If
492	it does interpret them, it should interpret them according to the
493	standard, i.e. as spelled-out tags.

495	So for a non-TagAware Unicode application, any language tag characters
496	(or any other kind of tag expressed with Plane 14 tag characters)
497	encountered would be handled exactly as for uninterpreted Tibetan
498	from the BMP, uninterpreted Linear B from Plane 1, or uninterpreted
499	Egyptian hieroglyphics from private use space in Plane 15.

501	A TagAware but TagPhobic Unicode application can recognize the tag
502	character range in Plane 14 and choose to deliberately strip them
503	out completely to produce plain text with no tags.
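
Such stripping needs only the range check. A minimal sketch in C,
assuming a buffer of UCS-4 values and a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Remove every value in U-000E0000 .. U-000E007F from s in place
 * and return the new length. */
size_t strip_tags(uint32_t *s, size_t n)
{
    size_t i, out = 0;
    for (i = 0; i < n; i++) {
        if (s[i] < 0xE0000u || s[i] > 0xE007Fu)
            s[out++] = s[i];   /* keep ordinary text */
    }
    return out;
}
```

No parsing of tag structure is involved, since the tag characters are
never used for textual content.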

505	The presence of a correctly formed tag cannot be taken as a
506	guarantee that the data so tagged is correctly tagged. For example,
507	nothing prevents an application from erroneously labelling French
508	data as Spanish, or from labelling JIS-derived data as Japanese, even
509	if it contains Greek or Cyrillic characters.

511	8.1   Note on Encoding Language Tags

513	The fact that this proposal for encoding tag characters in
514	Unicode includes a mechanism for specifying language tag values
515	does not mean that Unicode is departing from one of its
516	basic encoding principles:

518	    Unicode encodes scripts, not languages.

520	This is still true of the Unicode encoding (and ISO/IEC 10646), even
521	in the presence of a mechanism for specifying language tags
522	in plain text.  There is nothing obligatory about the use of Plane 14
523	tags, whether for language tags or any other kind of tags.

525	Language tagging in no way impacts current encoded characters
526	or the encoding of future scripts.

528	It is fully anticipated that implementations of Unicode which
529	already make use of out-of-band mechanisms for language tagging
530	or "heavy-weight" in-band mechanisms such as HTML will continue
531	to do exactly what they are doing and will ignore Plane 14
532	tag characters completely.

534	9.0   Security Considerations

536	Security issues are not discussed in this memo.

538	************************************************************************

540	References

542	[ISO10646]

544	    ISO/IEC 10646-1:1993 International Organization for Standardization.
545	    "Information Technology -- Universal Multiple-Octet Coded Character
546	    Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane",
547	    Geneva, 1993.

549	[RFC1766]

551	    H. Alvestrand, "Tags for the Identification of Languages", RFC 1766.

553	[RFC2070]

555	    F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Internationalization
556	    of the Hypertext Markup Language", RFC 2070, January 1997.

558	[RFC2119]

560	    S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels",
561	    RFC 2119, March 1997.

563	[RFC2130]

565	    C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R. Atkinson,
566	    M. Crispin, and P. Svanberg, "The Report of the IAB Character Set
567	    Workshop held 29 February - 1 March, 1996", RFC 2130, April 1997.

569	[UNICODE]

571	    The Unicode Standard, Version 2.0, The Unicode Consortium,
572	    Addison-Wesley, July 1996.

574	Acknowledgements

576	The following people also contributed to this document, directly or
577	indirectly: Chris Newman, Mark Crispin, Rick McGowan, Joe Becker,
578	John Jenkins, and Asmus Freytag. This document also was reviewed by
579	the Unicode Technical Committee, and the authors wish to thank all
580	of the UTC representatives for their input. The authors are, of course,
581	responsible for any errors or omissions which may remain in the text.

583	Authors' Addresses

585	Ken Whistler
586	Sybase, Inc.
587	6475 Christie Ave.
588	Emeryville, CA 94608-1050
589	Phone: +1 510 922 3611
590	Email: kenw@sybase.com

592	Glenn Adams
593	Spyglass, Inc.
594	One Cambridge Center
595	Cambridge, MA 02142
596	Phone: +1 617 679 4652
597	Email: glenn@spyglass.com