idnits 2.17.1 

draft-ietf-ltru-matching-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 698.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 675.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 682.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 688.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC3066], [19], [1]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 170 has weird spacing: '...schemes  that ...'

  == Line 171 has weird spacing: '...ing and  looku...'

  == Line 373 has weird spacing: '...age tag  being...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 13, 2005) is 6922 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'RFC 3066' on line 46

  -- Looks like a reference, but probably isn't: 'RFC 2119' on line 101

  == Unused Reference: '2' is defined on line 546, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 549, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 554, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 560, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 564, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 567, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 571, but no explicit reference
     was found in the text

  == Unused Reference: '11' is defined on line 579, but no explicit reference
     was found in the text

  == Unused Reference: '12' is defined on line 583, but no explicit reference
     was found in the text

  == Unused Reference: '13' is defined on line 588, but no explicit reference
     was found in the text

  == Unused Reference: '14' is defined on line 592, but no explicit reference
     was found in the text

  == Unused Reference: '15' is defined on line 596, but no explicit reference
     was found in the text

  == Unused Reference: '16' is defined on line 599, but no explicit reference
     was found in the text

  == Unused Reference: '17' is defined on line 603, but no explicit reference
     was found in the text

  == Unused Reference: '18' is defined on line 608, but no explicit reference
     was found in the text

  == Unused Reference: '20' is defined on line 614, but no explicit reference
     was found in the text

  == Outdated reference: A later version (-14) exists of
     draft-ietf-ltru-registry-01

  ** Obsolete normative reference: RFC 1327 (ref. '2') (Obsoleted by RFC 2156)

  ** Obsolete normative reference: RFC 1521 (ref. '3') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 2028 (ref. '4') (Obsoleted by RFC 9281)

  ** Obsolete normative reference: RFC 2234 (ref. '7') (Obsoleted by RFC 4234)

  ** Obsolete normative reference: RFC 2396 (ref. '8') (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2434 (ref. '9') (Obsoleted by RFC 5226)

  ** Obsolete normative reference: RFC 2616 (ref. '10') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 2860 (ref.
     '11')

  -- Obsolete informational reference (is this intentional?): RFC 1766 (ref.
     '18') (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 3066 (ref.
     '19') (Obsoleted by RFC 4646, RFC 4647)


     Summary: 12 errors (**), 0 flaws (~~), 23 warnings (==), 11 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                            Quest Software
4	Expires: November 14, 2005                                 M. Davis, Ed.
5	                                                                     IBM
6	                                                            May 13, 2005

8	                     Matching Language Identifiers
9	                      draft-ietf-ltru-matching-00

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on November 14, 2005.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   This document describes different mechanisms for comparing and
43	   matching the tags for the identification of languages defined by [RFC
44	   3066bis] [1].  Possible algorithms for language negotiation and
45	   content selection are described.  Portions of this document obsolete
46	   [RFC 3066] [19].

48	Table of Contents

50	   1.  Introduction
51	   2.  The Language Range
52	     2.1   Basic Language Range
53	       2.1.1   Matching
54	       2.1.2   Lookup
55	     2.2   Extended Language Range
56	       2.2.1   Extended Range Matching
57	       2.2.2   Extended Range Lookup
58	       2.2.3   Scored Matching
59	     2.3   Meaning of Language Tags and Ranges
60	     2.4   Choosing Between Alternate Matching Schemes
61	     2.5   Considerations for Private Use Subtags
62	   3.  IANA Considerations
63	   4.  Changes
64	   5.  Security Considerations
65	   6.  Character Set Considerations
66	   7.  References
67	     7.1   Normative References
68	     7.2   Informative References
69	       Authors' Addresses
70	   A.  Acknowledgements
71	       Intellectual Property and Copyright Statements

73	1.  Introduction

75	   Human beings on our planet have, past and present, used a number of
76	   languages.  There are many reasons why one would want to identify the
77	   language used when presenting or requesting information.

79	   Information about a user's language preferences commonly needs to be
80	   identified so that appropriate processing can be applied.  For
81	   example, the user's language preferences in a browser can be used to
82	   select web pages appropriately.  A choice of language preference can
83	   also be used to select among tools (such as dictionaries) to assist
84	   in the processing or understanding of content in different languages.

86	   Given a set of language identifiers, such as those defined in
87	   RFC3066bis, various mechanisms can be envisioned for performing
88	   language negotiation and tag matching.  The suitability of a
89	   particular mechanism to a particular application depends on the needs
90	   of that application.

92	   This document defines language ranges and syntax for specifying user
93	   preferences in a request for language content.  It also specifies a
94	   default algorithm for matching language ranges to content (language
95	   tags), as well as alternate mechanisms suitable for certain
96	   applications.

98	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
99	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
100	   document are to be interpreted as described in [RFC 2119] [5].

102	2.  The Language Range

104	   Language Tags are used to identify the language of some information
105	   item or content.  Applications that use language tags are often faced
106	   with the problem of identifying sets of content that share certain
107	   language attributes.  For example, HTTP 1.1 [10] describes language
108	   ranges in its discussion of the Accept-Language header (Section
109	   14.4), which is used for selecting content from servers based on the
110	   language of that content.

112	   When selecting content according to its language, it is useful to
113	   have a mechanism for identifying sets of language tags that share
114	   specific attributes.  This allows users to select or filter content
115	   based on specific requirements.  Such an identifier is called a
116	   "Language Range".

118	2.1  Basic Language Range

120	   A basic language range (such as described in RFC 3066 [19] and HTTP
121	   1.1 [10]) is a set of languages whose tags all begin with the same
122	   sequence of subtags.  A basic language range can be represented by a
123	   'language-range' tag, by using the definition from HTTP/1.1 [10] :
124	   language-range = language-tag / "*"

126	   That is, a language-range has the same syntax as a language-tag or is
127	   the single character "*".  This definition of language-range implies
128	   that there is a semantic relationship between tags that share the
129	   same prefix.

131	   In particular, the set of language tags that match a specific
132	   language-range may not all be mutually intelligible.  The use of a
133	   prefix when matching tags to language ranges does not imply that
134	   language tags are assigned to languages in such a way that it is
135	   always true that if a user understands a language with a certain tag,
136	   then this user will also understand all languages with tags for which
137	   this tag is a prefix.  The prefix rule simply allows the use of
138	   prefix tags if this is the case.

140	   When working with tags and ranges you should also note the following:

142	   1.  Private-use and Extension subtags are normally orthogonal to
143	       language tag fallback.  Implementations should ignore
144	       unrecognized private-use and extension subtags when performing
145	       language tag fallback.  Since these subtags are always at the end
146	       of the sequence of subtags, they naturally fall out of the
147	       default fallback pattern (above).  Thus a request to match the
148	       tag "en-US-boont-x-1943" would produce exactly the same
149	       information content as the example above.

151	   2.  Implementations that choose not to interpret one or more private-
152	       use or extension subtags should not remove or modify these
153	       extensions in content that they are processing.  When a language
154	       tag instance is to be used in a specific, known protocol, and is
155	       not being passed through to other protocols, language tags may be
156	       filtered to remove subtags and extensions that are not supported
157	       by that protocol.  This should be done with caution, since it
158	       is removing information that may be relevant if services on the
159	       other end of the protocol would make use of that information.

161	   3.  Some applications of language tags may want or need to consider
162	       extensions and private-use subtags when matching tags.  If
163	       extensions and private-use subtags are included in a matching
164	       process that utilizes the default fallback mechanism, then the
165	       implementation should canonicalize the language tags and/or
166	       ranges before performing the matching.  Note that language tag
167	       processors that claim to be "well-formed" processors as defined
168	       in [1] generally fall into this category.

170	   There are two matching schemes  that are commonly associated with
171	   basic language ranges:  matching and  lookup.

173	2.1.1  Matching

175	   Language tag matching is used to select all content that matches a
176	   given prefix.  In matching, the language range represents the least
177	   specific tag which is an acceptable match and every piece of content
178	   that matches is returned.

180	   For example, if an application is applying a style to all content in
181	   a web page in a particular language, it might use language tag
182	   matching to select that content.

184	   A language-range matches a language-tag if it exactly equals the tag,
185	   or if it exactly equals a prefix of the tag such that the first
186	   character following the prefix is "-".  (That is, the language-range
187	   "en-de" matches the language tag "en-DE-boont", but not the language
188	   tag "en-Deva".)

190	   The special range "*" matches any tag.  A protocol which uses
191	   language ranges may specify additional rules about the semantics of
192	   "*"; for instance, HTTP/1.1 specifies that the range "*" matches only
193	   languages not matched by any other range within an "Accept-Language:"
194	   header.
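
   The prefix rule above can be sketched in a few lines of Python.
   This is an illustration only, not part of the specification: the
   function names are hypothetical, and case-insensitive comparison is
   an assumption drawn from the "en-de" / "en-DE-boont" example.
   Protocol-specific treatment of "*" (such as the HTTP/1.1 rule) is
   not modelled.

```python
def basic_match(language_range: str, language_tag: str) -> bool:
    """Return True if a basic language range matches a tag by prefix."""
    if language_range == "*":
        return True  # the special range "*" matches any tag
    rng = language_range.lower()
    tag = language_tag.lower()
    # Exact match, or the range is a prefix that ends on a subtag boundary
    # (the first character after the prefix is "-").
    return tag == rng or tag.startswith(rng + "-")


def filter_tags(language_range: str, tags: list[str]) -> list[str]:
    """Matching returns every tag that matches, not only the closest one."""
    return [t for t in tags if basic_match(language_range, t)]
```

   With these definitions, basic_match("en-de", "en-DE-boont") is true
   and basic_match("en-de", "en-Deva") is false, because "en-Deva"
   continues the prefix with "v" rather than "-".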

196	2.1.2  Lookup

198	   Content lookup is used to select the single information item that
199	   best matches the language range for a given request.  In lookup, the
200	   language range represents the most specific tag which is an
201	   acceptable match and only the closest matching item is returned.

203	   For example, if an application inserts some dynamic content into a
204	   web page, returning an empty string if there is no exact match is not
205	   an option.  Instead, the application "falls back".

207	   When performing lookup, the language range is progressively truncated
208	   from the end until a matching piece of content is located.  For
209	   example, starting with the range "zh-Hant-CN-x-wadegile", the lookup
210	   would progressively search for content as shown below:

212	   Range to match: zh-Hant-CN-x-wadegile
213	   1. zh-Hant-CN-x-wadegile
214	   2. zh-Hant-CN
215	   3. zh-Hant
216	   4. zh
217	   5. (default content or the empty tag)

219	                Figure 2: Default Fallback Pattern Example

221	   This scheme allows some flexibility in finding content.  It also
222	   typically provides better results than falling straight back to the
223	   default language for the system or content when data is not available
224	   at a specific level of tag granularity or is sparsely populated.
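
   The truncation steps of the fallback pattern might be sketched as
   follows.  The names fallback_chain and lookup are hypothetical, and
   the rule that a dangling single-character subtag (such as "x") is
   dropped together with the subtag after it is an assumption inferred
   from the example, which goes directly from "zh-Hant-CN-x-wadegile"
   to "zh-Hant-CN".

```python
def fallback_chain(language_range: str) -> list[str]:
    """List the ranges tried by lookup, from most to least specific."""
    chain = []
    subtags = language_range.split("-")
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()  # truncate from the end
        # Assumption: never leave a singleton such as "x" dangling at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(language_range: str, content: dict, default=None):
    """Return the single closest item, or the default if nothing matches."""
    by_tag = {tag.lower(): item for tag, item in content.items()}
    for candidate in fallback_chain(language_range):
        if candidate.lower() in by_tag:
            return by_tag[candidate.lower()]
    return default
```

   Here "content" is assumed to be a mapping from language tag to the
   item tagged with it; the default stands in for step 5 of the
   pattern (default content or the empty tag).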

226	2.2  Extended Language Range

228	   Prefix matching using a Basic Language Range, as described above, is
229	   not always the most appropriate way to access the information
230	   contained in language tags when selecting or filtering content.  Some
231	   applications may wish to define a more granular matching scheme and
232	   such a matching scheme requires the ability to specify the various
233	   attributes of a language tag in the language range.  An extended
234	   language range can be represented by the following ABNF:
235	   extended-language-range = grandfathered / privateuse / range
236	   range   = ( lang [ "-" script ] [ "-" region ] *( "-" variant )
237	                [ "-" privateuse ] )
238	   lang    = ( 2*8ALPHA *( "-" extlang ) ) / "*"
239	   extlang = 3ALPHA / "*"
240	   script  = 4ALPHA / "*"
241	   region  = 2ALPHA / 3DIGIT / "*"
242	   variant = 5*8alphanum / ( DIGIT 3alphanum ) / "*"
243	   privateuse    = ( "x" / "X" ) 1*( "-" ( 1*8alphanum ) )
244	   grandfathered = 1*3ALPHA 1*2( "-" ( 2*8alphanum ) )
245	   alphanum      = ( ALPHA / DIGIT )
246	   In an extended language range, the identifier takes the form of a
247	   series of subtags which must consist of well-formed subtags or the
248	   special subtag "*".  For example, the language range "en-*-US"
249	   specifies a primary language of 'en', followed by any script subtag,
250	   followed by the region subtag 'US'.

252	   A field not present in the middle of an extended language range MAY
253	   be treated as if the field contained a "*".  For example, the range
254	   "en-US" MAY be considered to be equivalent to the range "en-*-US".

256	   There are several matching algorithms or schemes which may be applied
257	   when matching extended language ranges to language tags.

259	2.2.1  Extended Range Matching

261	   In extended range matching, the subtags in a language tag are
262	   compared to the corresponding subtags in the extended language range.
263	   A subtag is considered to match if it exactly matches the
264	   corresponding subtag in the range or the range contains a subtag with
265	   the value "*" (which matches all subtags, including the empty
266	   subtag).  Extended Range Matching is an extension of basic matching
267	   (Section 2.1.1): the language range represents the least specific tag
268	   which is an acceptable match.

270	   By default all extensions and their subtags are ignored for extended
271	   language range matching.

273	   Private use subtags may be specified in the language range and MUST
274	   NOT be ignored when matching.

276	   Subtags not specified, including those at the end of the language
277	   range, are assigned the value "*".  This makes each range into a
278	   prefix much like that used in basic language range matching.  For
279	   example, the extended language range "zh-*-CN" matches all of the
280	   following tags because the unspecified variant field is expanded to
281	   "*":

283	      zh-Hant-CN

285	      zh-CN

287	      zh-Hans-CN

289	      zh-CN-x-wadegile

291	      zh-Latn-CN-boont

293	2.2.2  Extended Range Lookup

295	   In extended range lookup, the subtags in a language tag are compared
296	   to the corresponding subtags in the extended language range.  The
297	   subtag is considered to match if it exactly matches the corresponding
298	   subtag in the range or the range contains a subtag with the value "*"
299	   (which matches all subtags, including the empty subtag).  Extended
300	   language range lookup is an extension of basic lookup
301	   (Section 2.1.2): the language range represents the most specific tag
302	   which will form an acceptable match.

304	   Subtags not specified are assigned the value "*" prior to performing
305	   tag matching.  Unlike in extended range matching, however, fields at
306	   the end of the range MUST NOT be expanded in this manner.  For
307	   example, "en-US" must not be considered to be the same as the range
308	   "en-US-*".  This allows ranges to be specific.  The "*" wildcard MUST
309	   be used at the end of the range to indicate that all tags with the
310	   range as a prefix are allowable matches.  That is, the range "zh-*"
311	   matches the tags "zh-Hant" and "zh-Hant-CN", while the range "zh"
312	   matches neither of those tags.

314	   The wildcard "*" at the end of a range SHOULD be considered to match
315	   any private use subtag sequences (making extended language range
316	   lookup function exactly like extended range matching, Section 2.2.1).

318	   By default all extensions and their subtags SHOULD be ignored for
319	   extended language range lookup.  Private use subtags may be specified
320	   in the language range and MUST NOT be ignored when performing lookup.
321	   The wildcard "*" at the end of a range SHOULD be considered to match
322	   any private use subtag sequences in addition to variants.

324	   For example, the range "*-US" matches all of the following tags:

326	      en-US

328	      en-Latn-US

330	      en-US-r-extends (extensions are ignored)

332	      fr-US

334	   For example, the range "en-*-US" matches _none_ of the following
335	   tags:

337	      fr-US

339	      en (missing region US)
340	      en-Latn (missing region US)

342	      en-Latn-US-scouse (variant field is present)

344	   For example, the range "en-*" matches all of the following tags:

346	      en-Latn

348	      en-Latn-US

350	      en-Latn-US-scouse

352	      en-US

354	      en-scouse

356	   It should be noted that the ability to be specific in extended range
357	   lookup may make this matching scheme a more appropriate replacement
358	   for basic matching than the extended range matching scheme.
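
   The difference between extended range matching (Section 2.2.1) and
   extended range lookup can be made concrete with a rough Python
   sketch.  Everything here is an assumption made for illustration:
   the names parse_fields and extended_match are hypothetical, fields
   are recognized by subtag shape alone (no registry), a "*" in a
   range fills the next open field, extensions are skipped, and
   grandfathered tags are not handled.

```python
def parse_fields(value: str) -> list:
    """Split a tag or range into [language, script, region, variant, private].

    None marks an absent field; extlang subtags are folded into the language.
    """
    parts = value.lower().split("-")
    fields = [parts[0], None, None, None, None]
    i, slot = 1, 1
    while i < len(parts) and len(parts[i]) == 3 and parts[i].isalpha():
        fields[0] += "-" + parts[i]  # extended language subtag
        i += 1
    while i < len(parts):
        p = parts[i]
        i += 1
        if p == "x":  # private use runs to the end of the tag
            fields[4] = "-".join(parts[i - 1:])
            break
        if p == "*":  # a wildcard fills the next open field
            if slot <= 3:
                fields[slot] = "*"
                slot += 1
        elif len(p) == 1:  # extension singleton: skip it and its subtags
            while i < len(parts) and len(parts[i]) > 1:
                i += 1
        elif len(p) == 4 and p.isalpha() and slot <= 1:
            fields[1], slot = p, 2
        elif slot <= 2 and ((len(p) == 2 and p.isalpha())
                            or (len(p) == 3 and p.isdigit())):
            fields[2], slot = p, 3
        else:  # anything else is treated as a variant
            fields[3] = p if fields[3] is None else fields[3] + "-" + p
            slot = 4
    return fields


def extended_match(language_range: str, tag: str, lookup: bool = False) -> bool:
    """Compare field by field; lookup=True applies the Section 2.2.2 rule."""
    rng, candidate = parse_fields(language_range), parse_fields(tag)
    # Index of the last field the range actually specifies.
    last = max(i for i, f in enumerate(rng) if f is not None)
    for i, (want, have) in enumerate(zip(rng, candidate)):
        if i > last:
            # Trailing unspecified fields act as "*" in matching.  In lookup
            # they must be empty unless the range itself ended in "*".
            if lookup and rng[last] != "*" and have is not None:
                return False
        elif want not in (None, "*") and want != have:
            return False
    return True
```

   Under these assumptions the range "zh" matches "zh-Hant" in
   matching mode but not in lookup mode, while "zh-*" matches it in
   both, which is the specificity the paragraph above describes.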

360	2.2.3  Scored Matching

362	   In the "scored matching" scheme, the extended language range and the
363	   language tags are pre-normalized by mapping grandfathered and
364	   obsolete tags into modern equivalents.

366	   The language range and the language tags are normalized into
367	   quadruples of the form (language, script, region, variant), where
368	   extended language is considered part of language and x-private-codes
369	   are considered part of the language if they are initial and part of
370	   the variant if not initial.  Missing components are set to "*".  An
371	   "*" pattern becomes the quadruple ("*", "*", "*", "*").

373	   Each language tag  being matched or filtered is assigned a "quality
374	   value" such that higher values indicate better matches and lower
375	   values indicate worse ones.  If the language matches, add 8 to the
376	   quality value.  If the script matches, add 4 to the quality value.
377	   If the region matches, add 2 to the quality value.  If the variant
378	   matches, add 1 to the quality value.  Elements of the quadruples are
379	   considered to match if they are the same or if one of them is "*".

381	   A value of 15 is a perfect match; 0 is no match at all.  Different
382	   values may be more or less appropriate for different applications and
383	   implementations should probably allow users to choose the most
384	   appropriate selection value.
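
   The scoring arithmetic can be illustrated with a short Python
   sketch.  It assumes the range and the tags have already been
   normalized into (language, script, region, variant) quadruples
   with "*" for missing components; the names WEIGHTS, quality and
   rank are hypothetical.

```python
WEIGHTS = (8, 4, 2, 1)  # language, script, region, variant


def quality(range_fields: tuple, tag_fields: tuple) -> int:
    """Sum the weights of the quadruple elements that match."""
    score = 0
    for weight, wanted, actual in zip(WEIGHTS, range_fields, tag_fields):
        # Elements match if they are the same or if either one is "*".
        if wanted == actual or "*" in (wanted, actual):
            score += weight
    return score


def rank(range_fields: tuple, tagged: list) -> list:
    """Order tag quadruples from best match to worst."""
    return sorted(tagged, key=lambda f: quality(range_fields, f), reverse=True)
```

   For instance, the range ("en", "*", "US", "*") scores 15 against
   the tag ("en", "Latn", "US", "*") and 7 against ("fr", "*", "US",
   "*"), since only the language component (weight 8) differs.  An
   application would then apply its own selection threshold to these
   values.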

386	2.3  Meaning of Language Tags and Ranges

388	   A language tag defines a language as spoken (or written, signed or
389	   otherwise signaled) by human beings for communication of information
390	   to other human beings.

392	   If a language tag B contains language tag A as a prefix, then B is
393	   typically "narrower" or "more specific" than A. For example, "zh-
394	   Hant-TW" is more specific than "zh-Hant".

396	   This relationship is not guaranteed in all cases: specifically,
397	   languages that begin with the same sequence of subtags are NOT
398	   guaranteed to be mutually intelligible, although they may be.  For
399	   example, the tag "az" shares a prefix with both "az-Latn"
400	   (Azerbaijani written using the Latin script) and "az-Cyrl"
401	   (Azerbaijani written using the Cyrillic script).  A person fluent in
402	   one script may not be able to read the other, even though the text
403	   might be otherwise identical.  Content tagged as "az" most probably
404	   is written in just one script and thus might not be intelligible to a
405	   reader familiar with the other script.

407	   The relationship between the tag and the information it relates to is
408	   defined by the standard describing the context in which it appears.
409	   Accordingly, this section can only give possible examples of its
410	   usage.

412	   o  For a single information object, the associated language tags
413	      might be interpreted as the set of languages that is required for
414	      a complete comprehension of the complete object.  Example: Plain
415	      text documents.

417	   o  For an aggregation of information objects, the associated language
418	      tags could be taken as the set of languages used inside components
419	      of that aggregation.  Examples: Document stores and libraries.

421	   o  For information objects whose purpose is to provide alternatives,
422	      the associated language tags could be regarded as a hint that the
423	      content is provided in several languages, and that one has to
424	      inspect each of the alternatives in order to find its language or
425	      languages.  In this case, the presence of multiple tags might not
426	      mean that one needs to be multi-lingual to get complete
427	      understanding of the document.  Example: MIME multipart/
428	      alternative.

430	   o  In markup languages, such as HTML and XML, language information
431	      can be added to each part of the document identified by the markup
432	      structure (including the whole document itself).  For example, one
433	      could write <span lang="FR">C'est la vie.</span> inside a
434	      Norwegian document; the Norwegian-speaking user could then access
435	      a French-Norwegian dictionary to find out what the marked section
436	      meant.  If the user were listening to that document through a
437	      speech synthesis interface, this information could be used to signal
438	      the synthesizer to appropriately apply French text-to-speech
439	      pronunciation rules to that span of text, instead of misapplying
440	      the Norwegian rules.

442	2.4  Choosing Between Alternate Matching Schemes

444	   Implementations MAY choose to implement different styles of matching
445	   for different kinds of processing.  For example, an implementation
446	   could treat an absent script subtag as a "wildcard" field; thus
447	   "az-AZ" would match "az-AZ", "az-Cyrl-AZ", "az-Latn-AZ", etc. but not
448	   "az" (this is extended range lookup).  If one item is to be chosen,
449	   the implementation could pick among those matches based on other
450	   information, such as the most likely script used in the language/
451	   region in question or the script used by other content selected.

453	   Because the primary language subtag cannot be absent in a language
454	   tag, the 'und' subtag may sometimes be used as a 'wildcard' in basic
455	   matching.  For example, in a query where you want to select all
456	   language tags that contain 'Latn' as the script code and 'AZ' as the
457	   region code, you could use the range "und-Latn-AZ".  This requires an
458	   implementation to examine the actual values of the subtags, though.
459	   The matching schemes described elsewhere in this document do not
460	   require implementations to examine the values supplied and, except
461	   for scored matching, they do not require access to the Language
462	   Subtag Registry nor the use of valid subtags in language tags or
463	   ranges.  This has great benefit for speed and simplicity of
464	   implementation.

466	   Implementations may also wish to use semantic information external to
467	   the language tags when performing fallback.  For example, the primary
468	   language subtags 'nn' (Nynorsk Norwegian) and 'nb' (Bokmal Norwegian)
469	   might both be usefully matched to the more general subtag 'no'
470	   (Norwegian).  Or an application might infer that content labeled
471	   "zh-CN" is more likely to match the range "zh-Hans" than equivalent
472	   content labeled "zh-TW".

474	2.5  Considerations for Private Use Subtags

476	   Private-use subtags require private agreement between the parties
477	   that intend to use or exchange language tags that use them and great
478	   caution should be used in employing them in content or protocols
479	   intended for general use.  Private-use subtags are simply useless for
480	   information exchange without prior arrangement.

482	   The value and semantic meaning of private-use tags and of the subtags
483	   used within such a language tag are not defined.  Matching private
484	   use tags using language ranges or extended language ranges may result
485	   in unpredictable content being returned.

487	3.  IANA Considerations

489	   This document presents no new or existing considerations for IANA.

491	4.  Changes

493	   This is the first version of this document.  Changes from the
494	   reference work (draft-phillips-matching-00) are too numerous to
495	   record.

497	5.  Security Considerations

499	   The only security issue that has been raised with language tags since
500	   the publication of RFC 1766, which stated that "Security issues are
501	   believed to be irrelevant to this memo", is a concern with language
502	   ranges used in content negotiation - that they may be used to infer
503	   the nationality of the sender, and thus identify potential targets
504	   for surveillance.

506	   This is a special case of the general problem that anything you send
507	   is visible to the receiving party.  It is useful to be aware that
508	   such concerns can exist in some cases.

510	   The evaluation of the exact magnitude of the threat, and any possible
511	   countermeasures, is left to each application protocol.

513	   Although the specification of valid subtags for an extension MUST be
514	   available over the Internet, implementations SHOULD NOT mechanically
515	   depend on it being always accessible, to prevent denial-of-service
516	   attacks.
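
   One way to avoid a mechanical dependency is to fall back to a locally
   stored copy whenever the specification cannot be retrieved.  The
   sketch below is an assumption about how an implementation might do
   this; the function name and the shape of the data are hypothetical.

```python
from typing import Callable, Optional, Set


def load_extension_subtags(fetch: Callable[[], Set[str]],
                           cached: Optional[Set[str]]) -> Optional[Set[str]]:
    """Return the freshly fetched subtag set, or the cached copy when the
    specification cannot be reached.

    `fetch` is any caller-supplied callable that retrieves the subtags.
    Network failures (including urllib.error.URLError and timeouts) are
    subclasses of OSError, so they are handled here.
    """
    try:
        return fetch()
    except OSError:
        return cached
```

   A caller that receives no result at all (no network and no cached
   copy) can still process tags; it simply cannot validate the subtags
   of that extension.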

518	6.  Character Set Considerations

520	   The syntax in this document requires that language ranges use only
521	   the characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are legal in
522	   language tags.  These characters are present in most character sets,
523	   so presentation of language tags should not have any character set
524	   issues.
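
   A minimal sketch of a check for this character repertoire follows.
   The function name is hypothetical.  The check covers only the
   characters listed above, not the full grammar of a language range,
   and it therefore rejects any wildcard character a range syntax may
   also permit.

```python
import re

# ASCII letters, digits, and HYPHEN-MINUS only. This tests the character
# repertoire, not the complete language-range grammar.
_ALLOWED = re.compile(r"[A-Za-z0-9-]+")


def uses_only_range_characters(language_range: str) -> bool:
    """True if the string is non-empty and uses only A-Z, a-z, 0-9, '-'."""
    return _ALLOWED.fullmatch(language_range) is not None
```

   For example, "en-US" and "zh-Hans" pass this check, while "fr_CA"
   (which uses LOW LINE instead of HYPHEN-MINUS) does not.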

526	   Rendering of characters based on the content of a language tag is not
527	   addressed in this memo.  Historically, some languages have relied on
528	   the use of specific character sets or other information in order to
529	   infer how a specific character should be rendered (notably this
530	   applies to language and culture specific variations of Han ideographs
531	   as used in Japanese, Chinese, and Korean).  When language tags are
532	   applied to spans of text, rendering engines may use that information
533	   in deciding which font to use in the absence of other information,
534	   particularly where languages with distinct writing traditions use the
535	   same characters.

537	7.  References

539	7.1  Normative References

541	   [1]   Phillips, A., Ed. and M. Davis, Ed., "Tags for the
542	         Identification of Languages (Internet-Draft)", February 2005, <
543	         http://www.ietf.org/internet-drafts/
544	         draft-ietf-ltru-registry-01.txt>.

546	   [2]   Hardcastle-Kille, S., "Mapping between X.400(1988) / ISO 10021
547	         and RFC 822", RFC 1327, May 1992.

549	   [3]   Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
550	         Extensions) Part One: Mechanisms for Specifying and Describing
551	         the Format of Internet Message Bodies", RFC 1521,
552	         September 1993.

554	   [4]   Hovey, R. and S. Bradner, "The Organizations Involved in the
555	         IETF Standards Process", BCP 11, RFC 2028, October 1996.

557	   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
558	         Levels", BCP 14, RFC 2119, March 1997.

560	   [6]   Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word
561	         Extensions: Character Sets, Languages, and Continuations",
562	         RFC 2231, November 1997.

564	   [7]   Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
565	         Specifications: ABNF", RFC 2234, November 1997.

567	   [8]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
568	         Resource Identifiers (URI): Generic Syntax", RFC 2396,
569	         August 1998.

571	   [9]   Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
572	         Considerations Section in RFCs", BCP 26, RFC 2434,
573	         October 1998.

575	   [10]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
576	         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
577	         HTTP/1.1", RFC 2616, June 1999.

579	   [11]  Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
580	         Understanding Concerning the Technical Work of the Internet
581	         Assigned Numbers Authority", RFC 2860, June 2000.

583	   [12]  Yergeau, F., "UTF-8, a transformation format of ISO 10646",
584	         STD 63, RFC 3629, November 2003.

586	7.2  Informative References

588	   [13]  International Organization for Standardization, "ISO
589	         639-1:2002, Codes for the representation of names of languages
590	         -- Part 1: Alpha-2 code", ISO Standard 639, 2002.

592	   [14]  International Organization for Standardization, "ISO 639-2:1998
593	         - Codes for the representation of names of languages -- Part 2:
594	         Alpha-3 code - edition 1", 1998.

596	   [15]  ISO TC46/WG3, "ISO 15924:2003 (E/F) - Codes for the
597	         representation of names of scripts", January 2004.

599	   [16]  International Organization for Standardization, "Codes for the
600	         representation of names of countries, 3rd edition",
601	         ISO Standard 3166, August 1988.

603	   [17]  Statistical Division, United Nations, "Standard Country or Area
604	         Codes for Statistical Use", UN Standard Country or Area Codes
605	         for Statistical Use, Revision 4 (United Nations publication,
606	         Sales No. 98.XVII.9), June 1999.

608	   [18]  Alvestrand, H., "Tags for the Identification of Languages",
609	         RFC 1766, March 1995.

611	   [19]  Alvestrand, H., "Tags for the Identification of Languages",
612	         BCP 47, RFC 3066, January 2001.

614	   [20]  Klyne, G. and C. Newman, "Date and Time on the Internet:
615	         Timestamps", RFC 3339, July 2002.

617	Authors' Addresses

619	   Addison Phillips (editor)
620	   Quest Software

622	   Email: addison dot phillips at quest dot com

624	   Mark Davis (editor)
625	   IBM

627	   Email: mark dot davis at ibm dot com

629	Appendix A.  Acknowledgements

631	   Any list of contributors is bound to be incomplete; please regard the
632	   following as only a selection from the group of people who have
633	   contributed to make this document what it is today.

635	   The contributors to RFC 3066 and RFC 1766, the precursors of this
636	   document, made enormous contributions directly or indirectly to this
637	   document and are generally responsible for the success of language
638	   tags.

640	   The following people (in alphabetical order) contributed to this
641	   document or to RFCs 1766 and 3066:

643	   Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
644	   Nathaniel Borenstein, Eric Brunner, Sean M. Burke, Jeremy Carroll,
645	   John Clews, Jim Conklin, Peter Constable, John Cowan, Mark Crispin,
646	   Dave Crocker, Martin Duerst, Michael Everson, Doug Ewell, Ned Freed,
647	   Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpern,
648	   Elliotte Rusty Harold, Paul Hoffman, Richard Ishida, Olle Jarnefors,
649	   Kent Karlsson, John Klensin, Alain LaBonte, Eric Mader, Keith Moore,
650	   Chris Newman, Masataka Ohta, George Rhoten, Markus Scherer, Keld Jorn
651	   Simonsen, Thierry Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys
652	   Weatherley, Misha Wolf, Francois Yergeau and many, many others.

654	   Very special thanks must go to Harald Tveit Alvestrand, who
655	   originated RFCs 1766 and 3066, and without whom this document would
656	   not have been possible.  Special thanks must go to Michael Everson,
657	   who has served as language tag reviewer for almost the complete
658	   period since the publication of RFC 1766.  Special thanks to Doug
659	   Ewell, for his production of the first complete subtag registry, and
660	   his work in producing a test parser for verifying language tags.

662	   For this particular document, John Cowan originated the scheme
663	   described in Section 2.2.3.  Mark Davis originated the scheme
664	   described in Section 2.1.2.

666	Intellectual Property Statement

668	   The IETF takes no position regarding the validity or scope of any
669	   Intellectual Property Rights or other rights that might be claimed to
670	   pertain to the implementation or use of the technology described in
671	   this document or the extent to which any license under such rights
672	   might or might not be available; nor does it represent that it has
673	   made any independent effort to identify any such rights.  Information
674	   on the procedures with respect to rights in RFC documents can be
675	   found in BCP 78 and BCP 79.

677	   Copies of IPR disclosures made to the IETF Secretariat and any
678	   assurances of licenses to be made available, or the result of an
679	   attempt made to obtain a general license or permission for the use of
680	   such proprietary rights by implementers or users of this
681	   specification can be obtained from the IETF on-line IPR repository at
682	   http://www.ietf.org/ipr.

684	   The IETF invites any interested party to bring to its attention any
685	   copyrights, patents or patent applications, or other proprietary
686	   rights that may cover technology that may be required to implement
687	   this standard.  Please address the information to the IETF at
688	   ietf-ipr@ietf.org.

690	Disclaimer of Validity

692	   This document and the information contained herein are provided on an
693	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
694	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
695	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
696	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
697	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
698	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

700	Copyright Statement

702	   Copyright (C) The Internet Society (2005).  This document is subject
703	   to the rights, licenses and restrictions contained in BCP 78, and
704	   except as set forth therein, the authors retain all their rights.

706	Acknowledgment

708	   Funding for the RFC Editor function is currently provided by the
709	   Internet Society.