idnits 2.17.1 

draft-ietf-ltru-matching-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 833.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 810.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 817.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 823.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 4, 2006) is 6620 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2616errata' is defined on line 753, but no
     explicit reference was found in the text

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)

  -- Obsolete informational reference (is this intentional?): RFC 1766
     (Obsoleted by RFC 3066, RFC 3282)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Duplicate reference: RFC2616, mentioned in 'RFC2616errata', was also
     mentioned in 'RFC2616'.

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 3066
     (Obsoleted by RFC 4646, RFC 4647)


     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                                Yahoo! Inc
4	Obsoletes: 3066 (if approved)                              M. Davis, Ed.
5	Expires: September 5, 2006                                        Google
6	                                                           March 4, 2006

8	                       Matching of Language Tags
9	                      draft-ietf-ltru-matching-11

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on September 5, 2006.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2006).

40	Abstract

42	   This document describes different mechanisms for comparing and
43	   matching language tags.  Possible algorithms for language negotiation
44	   or content selection, filtering, and lookup are described.  This
45	   document, in combination with RFC 3066bis (Ed.: replace "3066bis"
46	   with the RFC number assigned to draft-ietf-ltru-registry-14),
47	   replaces RFC 3066, which replaced RFC 1766.

49	Table of Contents

51	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
52	   2.  The Language Range . . . . . . . . . . . . . . . . . . . . . .  4
53	     2.1.  Basic Language Range . . . . . . . . . . . . . . . . . . .  4
54	     2.2.  Extended Language Range  . . . . . . . . . . . . . . . . .  5
55	     2.3.  The Language Priority List . . . . . . . . . . . . . . . .  5
56	   3.  Types of Matching  . . . . . . . . . . . . . . . . . . . . . .  7
57	     3.1.  Choosing a Type of Matching  . . . . . . . . . . . . . . .  7
58	     3.2.  Filtering  . . . . . . . . . . . . . . . . . . . . . . . .  9
59	       3.2.1.  Basic Filtering  . . . . . . . . . . . . . . . . . . .  9
60	       3.2.2.  Extended Filtering . . . . . . . . . . . . . . . . . . 10
61	     3.3.  Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 11
62	   4.  Other Considerations . . . . . . . . . . . . . . . . . . . . . 15
63	     4.1.  Choosing Language Ranges . . . . . . . . . . . . . . . . . 15
64	     4.2.  Meaning of Language Tags and Ranges  . . . . . . . . . . . 16
65	     4.3.  Considerations for Private Use Subtags . . . . . . . . . . 16
66	     4.4.  Length Considerations for Language Ranges  . . . . . . . . 17
67	   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 18
68	   6.  Changes  . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
69	   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 20
70	   8.  Character Set Considerations . . . . . . . . . . . . . . . . . 21
71	   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 22
72	     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 22
73	     9.2.  Informative References . . . . . . . . . . . . . . . . . . 22
74	   Appendix A.  Acknowledgements  . . . . . . . . . . . . . . . . . . 23
75	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
76	   Intellectual Property and Copyright Statements . . . . . . . . . . 25

78	1.  Introduction

80	   Human beings on our planet have, past and present, used a number of
81	   languages.  There are many reasons why one would want to identify the
82	   language used when presenting or requesting information, or the
83	   language of some specific set of information items or "content".

85	   One use for language identifiers, such as those defined in
86	   [RFC3066bis], is to select content by matching the associated
87	   language tags to a user's language preferences.

89	   This document defines a syntax (called a language range (Section 2))
90	   for specifying items in the user's list of language preferences
91	   (called a language priority list (Section 2.3)), as well as several
92	   schemes for selecting or filtering sets of content by comparing the
93	   content's language tags to the user's preferences.  Applications,
94	   protocols, or specifications will have varying needs and requirements
95	   that affect the choice of a suitable matching scheme.  Depending on
96	   the choice of scheme, there are various options left to the
97	   implementation.  Protocols that implement a matching scheme either
98	   need to specify each particular choice or indicate the options that
99	   are left to the implementation to decide.

101	   This document is divided into three main sections.  One describes how
102	   to indicate a user's preferences using language ranges.  Then a
103	   section describes various schemes for matching these ranges to a set
104	   of language tags.  There is also a section that deals with various
105	   practical considerations that apply to implementing and using these
106	   schemes.

108	   This document, in combination with [RFC3066bis] (Ed.: replace
109	   "3066bis" globally in this document with the RFC number assigned to
110	   draft-ietf-ltru-registry-14), replaces [RFC3066], which replaced
111	   [RFC1766].

113	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
114	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
115	   document are to be interpreted as described in [RFC2119].

117	2.  The Language Range

119	   Language Tags [RFC3066bis] are used to identify the language of some
120	   information item or "content".  Applications or protocols that use
121	   language tags are often faced with the problem of identifying sets of
122	   content that share certain language attributes.  For example,
123	   HTTP/1.1 [RFC2616] describes one such mechanism in its discussion of
124	   the Accept-Language header (Section 14.4), which is used when
125	   selecting content from servers based on the language of that content.

127	   When selecting content according to its language, it is useful to
128	   have a mechanism for identifying sets of language tags that share
129	   specific attributes.  This allows users to select or filter content
130	   based on specific requirements.  Such an identifier is called a
131	   "language range".

133	   There are different types of language range, whose specific
134	   attributes vary according to their application.  Language ranges are
135	   similar to language tags: they consist of a sequence of subtags
136	   separated by hyphens.  In a language range, each subtag MUST either
137	   be a sequence of ASCII alphanumeric characters or the single
138	   character '*' (%x2A, ASTERISK).  The character '*' is a "wildcard"
139	   that matches any sequence of subtags.  The meaning and uses of
140	   wildcards vary according to the type of language range.

142	   Language tags and thus language ranges are to be treated as case-
143	   insensitive: there exist conventions for the capitalization of some
144	   of the subtags, but these MUST NOT be taken to carry meaning.
145	   Matching of language tags to language ranges MUST be done in a case-
146	   insensitive manner.

148	2.1.  Basic Language Range

150	   A "basic language range" describes a user's language preference as a
151	   specific, uninterrupted, sequence of subtags.  Each range consists of
152	   a sequence of alphanumeric subtags separated by hyphens.  The basic
153	   language range is defined by the following ABNF [RFC4234]:

155	   language-range   = (1*8ALPHA *("-" 1*8alphanum)) / "*"
156	   alphanum         = ALPHA / DIGIT

158	   Basic language ranges (originally described by HTTP/1.1 [RFC2616] and
159	   later [RFC3066]) have the same syntax as an [RFC3066] language tag or
160	   are the single character "*".  They differ from the language tags
161	   defined in [RFC3066bis] only in that there is no requirement that
162	   they be "well-formed" or be validated against the IANA Language
163	   Subtag Registry (although such ill-formed ranges will probably not
164	   match anything).  (Note that the ABNF [RFC4234] in [RFC2616] is
165	   incorrect, since it disallows the use of digits anywhere in the
166	   'language-range': this is mentioned in the errata [RFC2616errata].)
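
	   As an illustration only, the basic language range ABNF above can
	   be transcribed into a regular expression.  The names BASIC_RANGE
	   and is_basic_range are hypothetical and are not defined by this
	   document; the sketch checks syntax only and does not consult the
	   IANA Language Subtag Registry.

```python
import re

# Regex transcription of the ABNF (an assumption of this sketch):
#   language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
BASIC_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*)")


def is_basic_range(text):
    """Return True if text is syntactically a basic language range."""
    return BASIC_RANGE.fullmatch(text) is not None
```

	   Under these assumptions, "de-CH-1996" and "*" are accepted, while
	   "en-*-US" is rejected because a wildcard is only permitted as the
	   entire basic range.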

168	   Use of a basic language range seems to imply that there is a semantic
169	   relationship between language tags that share the same prefix.  While
170	   this is often the case, it is not always true and users should note
171	   that the set of language tags that match a specific language range
172	   may not represent mutually intelligible languages.

174	2.2.  Extended Language Range

176	   Occasionally users will wish to select a set of language tags based
177	   on the presence of specific subtags.  An "extended language range"
178	   describes a user's language preference as an ordered sequence of
179	   subtags.  For example, a user might wish to select all language tags
180	   that contain the region subtag 'CH' (Switzerland).  Extended language
181	   ranges are useful in specifying a particular sequence of subtags that
182	   appear in the set of matching tags without having to specify all of
183	   the intervening subtags.

185	   An extended language range can be represented by the following ABNF:

187	   extended-language-range = (1*8ALPHA / "*")
188	                             *("-" (1*8alphanum / "*"))

190	   Figure 2: Extended Language Range

192	   The wildcard subtag '*' can occur in any position in the extended
193	   language range, where it matches any sequence of subtags that might
194	   occur in that position in a language tag.  However, wildcards outside
195	   the first position in an extended language range are ignored by most
196	   matching schemes.  Use of one or more wildcards SHOULD NOT be taken
197	   to imply that a certain number of subtags will appear in the matching
198	   set of language tags.
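
	   A syntax check for extended language ranges might be sketched in
	   the same way.  The names EXTENDED_RANGE and is_extended_range are
	   hypothetical; the pattern is a direct transcription of the ABNF
	   in Figure 2 and says nothing about how wildcards are matched.

```python
import re

# Regex transcription of Figure 2 (an assumption of this sketch):
#   extended-language-range = (1*8ALPHA / "*")
#                             *("-" (1*8alphanum / "*"))
EXTENDED_RANGE = re.compile(
    r"(?:[A-Za-z]{1,8}|\*)(?:-(?:[A-Za-z0-9]{1,8}|\*))*"
)


def is_extended_range(text):
    """Return True if text is syntactically an extended language range."""
    return EXTENDED_RANGE.fullmatch(text) is not None
```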

200	   Implementations that specify basic ranges MAY map extended language
201	   ranges to basic language ranges: if the first subtag is a "*" then
202	   the entire range is treated as "*", otherwise each wildcard subtag is
203	   removed.  For example, if the language range were "en-*-US", then the
204	   range would be mapped to "en-US".
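
	   The mapping from extended to basic language ranges described
	   above can be sketched as follows.  The function name
	   to_basic_range is hypothetical, and the input is assumed to be a
	   syntactically valid extended language range.

```python
def to_basic_range(extended_range):
    """Map an extended language range to a basic language range."""
    subtags = extended_range.split("-")
    # A leading wildcard turns the entire range into "*".
    if subtags[0] == "*":
        return "*"
    # Otherwise every wildcard subtag is simply removed.
    return "-".join(subtag for subtag in subtags if subtag != "*")
```

	   With this sketch, "en-*-US" maps to "en-US" and "*-CH" maps to
	   "*".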

206	2.3.  The Language Priority List

208	   A user's language preferences will often need to specify more than
209	   one language range and thus users often need to specify a prioritized
210	   list of language ranges in order to best reflect their language
211	   preferences.  This is especially true for speakers of minority
212	   languages.  A speaker of Breton in France, for example, may specify
213	   "br" followed by "fr", meaning that if Breton is available, it is
214	   preferred, but otherwise French is the best alternative.  It can get
215	   more complex: a user may wish to fall back from Skolt Sami to
216	   Northern Sami to Finnish.

218	   A "language priority list" is a prioritized or weighted list of
219	   language ranges.  One well known example of such a list is the
220	   "Accept-Language" header defined in RFC 2616 [RFC2616] (see Section
221	   14.4) and RFC 3282 [RFC3282].

223	   The various matching operations described in this document include
224	   considerations for using a language priority list.  This document
225	   does not define the syntax for a language priority list; defining
226	   such a syntax is the responsibility of the protocol, application, or
227	   specification that uses it.  When given as examples in this document,
228	   language priority lists will be shown as a quoted sequence of ranges
229	   separated by commas, like this: "en, fr, zh-Hant" (which would be
230	   read as "English before French before Chinese as written in the
231	   Traditional script").

233	   A simple list of ranges is considered to be in descending order of
234	   priority.  Other language priority lists provide "quality weights"
235	   for the language ranges in order to specify the relative priority of
236	   the user's language preferences.  An example of this would be the use
237	   of "q" values in the syntax of the "Accept-Language" header (defined
238	   in [RFC2616], Section 14.4, and [RFC3282]).
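
	   This document does not define a syntax for language priority
	   lists.  Purely as an illustration, a weighted list written in a
	   simplified "Accept-Language" style could be turned into a simple
	   list in descending order of priority as sketched below.  The
	   function name parse_priority_list is hypothetical, and the parser
	   is a simplification that assumes at most one "q" parameter per
	   range; it is not a complete implementation of the header grammar.

```python
def parse_priority_list(header):
    """Return language ranges ordered by descending quality weight."""
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        language_range, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0  # a range without a "q" value gets full weight
        if params.lower().startswith("q="):
            quality = float(params[2:])
        # Assumption: a weight of zero marks a range as not acceptable.
        if quality > 0:
            weighted.append((quality, language_range.strip()))
    # The sort is stable, so ranges of equal weight keep their order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [language_range for _, language_range in weighted]
```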

240	3.  Types of Matching

242	   Matching language ranges to language tags can be done in a number of
243	   different ways.  This section describes several different matching
244	   schemes, as well as the considerations for choosing between them.
245	   Protocols and specifications SHOULD clearly indicate the particular
246	   mechanism used in selecting or matching language tags.

248	   There are several types of matching scheme.  This document presents
249	   two types: those that produce zero or more information items (called
250	   "filtering") and those that produce a single information item for a
251	   given request (called "lookup").

253	   Implementations or protocols MAY use different matching schemes from
254	   the ones described in this document, as long as those mechanisms are
255	   clearly specified.

257	3.1.  Choosing a Type of Matching

259	   Applications, protocols, and specifications are faced with the
260	   decision of what type of matching to use.  Sometimes, different
261	   styles of matching are suited to different kinds of processing within
262	   a particular application or protocol.

264	   Language tag matching is a tool, and does not by itself specify a
265	   complete procedure for the use of language tags.  Such procedures are
266	   intimately tied to the application protocol in which they occur.
267	   When specifying a protocol operation using matching, the protocol
268	   MUST specify:

270	   o  Which type(s) of language tag matching it uses

272	   o  Whether the operation returns a single result (lookup) or a
273	      possibly empty set of results (filtering)

275	   o  For lookup, what the result is when no matching tag is found.  For
276	      instance, a protocol might define the result as failure of the
277	      operation, an empty value, returning some protocol defined or
278	      implementation defined default, or returning i-default [RFC2277].

280	   This document describes three types of matching:

282	   1.  Basic Filtering (Section 3.2.1) matches a language priority list
283	       consisting of basic language ranges (Section 2.1) to sets of
284	       language tags.

286	   2.  Extended Filtering (Section 3.2.2) matches a language priority
287	       list consisting of extended language ranges (Section 2.2) to sets
288	       of language tags.

290	   3.  Lookup (Section 3.3) matches a language priority list consisting
291	       of basic language ranges to sets of language tags to find the one
292	       _exact_ language tag that best matches the range.

294	   Filtering can be used to produce a set of results (such as a
295	   collection of documents) by comparing the user's preferences to
296	   language tags associated with the set of content.  For example, when
297	   performing a search, one might use filtering to limit the results to
298	   items tagged as being in the French language.  Filtering can also be
299	   used when deciding whether to perform a language-sensitive process on
300	   some content.  For example, a process might cause paragraphs whose
301	   language tag matched the language range "nl" to be displayed in
302	   italics within a document.

304	   Lookup produces the single result that best matches the user's
305	   preferences, so it is useful in cases in which only a single item can
306	   be returned.  For example, if a process were to insert a human
307	   readable error message into a protocol header, it might select the
308	   text based on the user's language priority list.  Since the process
309	   can return only one item, it must choose a single item and it must
310	   return some item, even if none of the content's language tags match
311	   the language priority list supplied by the user.

313	   The types of matching in this document are designed so that
314	   implementations are not required to validate or understand any of the
315	   semantics of the language tags or ranges or of the subtags in them.
316	   None of them require access to the IANA Language Subtag Registry (see
317	   Section 3 in [RFC3066bis]).  This simplifies implementation of these
318	   schemes.  An implementation MAY choose to check if either the
319	   language ranges or language tags being matched are "well-formed" or
320	   "valid" (see [RFC3066bis], Section 2.2.9) and MAY choose not to
321	   process invalid ranges.

323	   Regardless of the matching scheme chosen, protocols and
324	   implementations MAY canonicalize language tags and ranges by mapping
325	   grandfathered and obsolete tags or subtags into modern equivalents.
326	   If an implementation canonicalizes either ranges or tags, then the
327	   implementation will require the IANA Language Subtag Registry
328	   information for that purpose.  Implementations MAY also use semantic
329	   information external to the registry when matching tags.  For
330	   example, the primary language subtags 'nn' (Nynorsk Norwegian) and
331	   'nb' (Bokmal Norwegian) might both be usefully matched to the more
332	   general subtag 'no' (Norwegian).  Or an implementation might infer
333	   that content labeled "zh-Hans" (Chinese as written in the Simplified
334	   script) is more likely to match the range "zh-CN" (Chinese as used in
335	   China, where the Simplified script is predominant) than equivalent
336	   content labeled "zh-TW" (Chinese as used in Taiwan, where the
337	   Traditional script is predominant).

339	3.2.  Filtering

341	   Filtering is used to select the set of language tags that matches a
342	   given language priority list and return the associated content.  It
343	   is called "filtering" because this set might contain no items at all
344	   or it might return an arbitrarily large number of matching items: as
345	   many items as match the language priority list, thus "filtering out"
346	   the non-matching items.

348	   In filtering, each language range represents the _least_ specific
349	   language tag (that is, the language tag with the fewest
350	   subtags) which is an acceptable match.  All of the language tags in
351	   the matching set of tags will have an equal or greater number of
352	   subtags than the language range.  Every non-wildcard subtag in the
353	   language range will appear in every one of the matching language
354	   tags.  For example, if the language priority list consists of the
355	   range "de-CH", one might see tags such as "de-CH-1996" but one will
356	   never see a tag such as "de" (because the 'CH' subtag is missing).

358	   If the language priority list (see Section 2.3) contains more than
359	   one range, the content returned is typically ordered in descending
360	   level of preference, but it MAY be unordered, according to the needs
361	   of the application or protocol.

363	   Some examples of applications where filtering might be appropriate
364	   include:

366	   o  Applying a style to sections of a document in a particular set of
367	      languages.

369	   o  Displaying the set of documents containing a particular set of
370	      keywords written in a specific set of languages.

372	   o  Selecting all email items written in a specific set of languages.

374	   o  Selecting audio files spoken in a particular language.

376	3.2.1.  Basic Filtering

378	   When filtering using basic language ranges, each basic language range
379	   in the language priority list is considered in turn, according to
380	   priority.  A language range matches a particular language tag if, in
381	   a case-insensitive comparison, it exactly equals the tag, or if it
382	   exactly equals a prefix of the tag such that the first character
383	   following the prefix is "-".  For example, the language-range "de-de"
384	   matches the language tag "de-DE-1996", but not the language tags "de-
385	   Deva" or "de-Latn-DE".

387	   The special range "*" in a language priority list matches any tag.  A
388	   protocol which uses language ranges MAY specify additional rules
389	   about the semantics of "*"; for instance, HTTP/1.1 [RFC2616]
390	   specifies that the range "*" matches only languages not matched by
391	   any other range within an "Accept-Language" header.

393	   Basic filtering is identical to the type of matching described in
394	   [RFC3066], Section 2.5 (Language-range).
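
	   A minimal sketch of basic filtering, under the rules given in
	   this section, follows.  The names basic_match and basic_filter
	   are hypothetical.  The sketch treats "*" as matching any tag and
	   does not implement protocol-specific rules such as the HTTP/1.1
	   interpretation of "*".

```python
def basic_match(language_range, tag):
    """Return True if a basic language range matches a language tag."""
    range_lower = language_range.lower()
    tag_lower = tag.lower()
    if range_lower == "*":
        return True
    # Exact match, or the range is a prefix followed directly by "-".
    return tag_lower == range_lower or tag_lower.startswith(range_lower + "-")


def basic_filter(priority_list, tags):
    """Return the matching tags, ordered by the priority of the ranges."""
    result = []
    for language_range in priority_list:
        for tag in tags:
            if tag not in result and basic_match(language_range, tag):
                result.append(tag)
    return result
```

	   Returning the tags in priority order is a choice made for this
	   sketch; as noted in Section 3.2, the result MAY be unordered.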

396	3.2.2.  Extended Filtering

398	   When filtering using extended language ranges, each extended language
399	   range in the language priority list is considered in turn, according
400	   to priority.  A particular language range is compared to each
401	   language tag using the following process:

403	   Compare the first subtag in the extended language range to the first
404	   subtag in the language tag in a case-insensitive manner.  If the
405	   first subtag in the range is "*", it matches any value.  Otherwise
406	   the two values must match or the overall match fails.

408	   Take each non-wildcard subtag in the language range and compare it in
409	   a case-insensitive manner to the next subtag in the language tag.  If
410	   the range's subtag exactly matches the tag's subtag, proceed to the
411	   next non-wildcard subtag in the language range (and beginning with
412	   the next subtag in the language tag) until the list of subtags in the
413	   language range is exhausted or the match fails.  If the tag's subtag
414	   is a "singleton" (a single letter or digit, which, in this case,
415	   includes the private-use subtag 'x') and the range's subtag does not
416	   match or if the language tag's list of subtags is exhausted, the
417	   match fails.  If the language range's list of subtags is exhausted,
418	   the match succeeds.

420	   Subtags not specified, including those at the end of the language
421	   range, are thus treated as if assigned the wildcard value "*".  Much
422	   like basic filtering, extended filtering selects content with
423	   arbitrarily long tags that share the same initial subtags as the
424	   language range.  In addition extended filtering selects content with
425	   any intermediate subtags unspecified in the language range.  For
426	   example, the extended language range "de-*-DE" matches all of the
427	   following tags:

429	      de-DE
430	      de-Latn-DE

432	      de-Latf-DE

434	      de-de

436	      de-DE-x-goethe

438	      de-Latn-DE-1996

440	   The same range does not match any of the following tags for the
441	   reasons shown:

443	      de (missing 'DE')

445	      de-x-DE (singleton 'x' occurs before 'DE')

447	      de-Deva ('Deva' not equal to 'DE')

449	   Note: The structure of language tags defined by [RFC3066bis] defines
450	   each type of subtag (language, script, region, and so forth)
451	   according to position, size, and content.  This means that subtags in
452	   a language range can only match specific types of subtags in a
453	   language tag.  For example, a subtag such as 'Latn' is always a
454	   script subtag (unless it follows a singleton) while a subtag such as
455	   'nedis' can only match the equivalent variant subtag.
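
	   The comparison process described in this section can be sketched
	   as follows.  The function name extended_match is hypothetical,
	   and the code is one possible reading of the prose above rather
	   than a normative algorithm.  Both inputs are assumed to be
	   non-empty, hyphen-separated strings.

```python
def extended_match(language_range, tag):
    """Return True if an extended language range matches a language tag."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # Step 1: the first subtags must be equal unless the range has "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    # Step 2: each remaining non-wildcard subtag of the range must be
    # found, in order, among the remaining subtags of the tag.
    wanted = [subtag for subtag in range_subtags[1:] if subtag != "*"]
    position = 1
    for subtag in wanted:
        while True:
            if position >= len(tag_subtags):
                return False  # the tag's subtags are exhausted
            current = tag_subtags[position]
            position += 1
            if current == subtag:
                break  # proceed to the next subtag in the range
            if len(current) == 1:
                return False  # a non-matching singleton stops the search
    return True  # the range's subtags are exhausted
```

	   Applied to the range "de-*-DE", this sketch accepts "de-DE",
	   "de-Latn-DE", and "de-DE-x-goethe", and rejects "de", "de-x-DE",
	   and "de-Deva", in agreement with the examples above.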

457	3.3.  Lookup

459	   Lookup is used to select the single language tag that best matches
460	   the language priority list for a given request and return the
461	   associated content.  When performing lookup, each language range in
462	   the language priority list is considered in turn, according to
463	   priority.  By contrast with filtering, each language range represents
464	   the _most_ specific tag which is an acceptable match.  The first
465	   content found with a matching tag, according to the user's priority,
466	   is considered the closest match and is the content returned.  For
467	   example, if the language range is "de-ch", a lookup operation might
468	   produce content with the tags "de" or "de-CH" but never one with the
469	   tag "de-CH-1996".  Usually if no content matches the request, the
470	   "default" content is returned.

472	   For example, if an application inserts some dynamic content into a
473	   document, returning an empty string if there is no exact match is not
474	   an option.  Instead, the application "falls back" until it finds a
475	   matching language tag associated with a suitable piece of content to
476	   insert.  Examples of lookup might include:

478	   o  Selection of a template containing the text for an automated email
479	      response.

481	   o  Selection of an item containing some text for inclusion in a
482	      particular Web page.

484	   o  Selection of a string of text for inclusion in an error log.

486	   o  Selection of an audio file to play as a prompt in a phone system.

488	   In the lookup scheme, the language range is progressively truncated
489	   from the end until a matching piece of content is located.  Single
490	   letter or digit subtags (including both the letter 'x' which
491	   introduces private-use sequences, and the subtags that introduce
492	   extensions) are removed at the same time as their closest trailing
493	   subtag.  For example, starting with the range "zh-Hant-CN-x-private1-
494	   private2", the lookup progressively searches for content as shown
495	   below:

497	   Range to match: zh-Hant-CN-x-private1-private2
498	   1. zh-Hant-CN-x-private1-private2
499	   2. zh-Hant-CN-x-private1
500	   3. zh-Hant-CN
501	   4. zh-Hant
502	   5. zh
503	   6. (default content)

505	   Figure 3: Example of a Lookup Fallback Pattern
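
	   The progressive truncation shown in Figure 3 can be sketched as
	   follows.  The function name fallback_sequence is hypothetical.
	   The final step, selection of the default content, is left to the
	   caller and is not part of the returned list.

```python
def fallback_sequence(language_range):
    """List the ranges tried by lookup, from most to least specific."""
    subtags = language_range.split("-")
    sequence = []
    while subtags:
        sequence.append("-".join(subtags))
        subtags.pop()  # truncate from the end
        # A single letter or digit subtag left at the end is removed
        # together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return sequence
```

	   For the range "zh-Hant-CN-x-private1-private2", this sketch
	   yields entries 1 through 5 of Figure 3, in that order.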

507	   This allows some flexibility in finding a match.  For example, when
508	   no content exactly matches the user's request, lookup provides better
509	   results than immediately returning the default language for the
510	   system or content.  Content is usually not available at every level
511	   of tag granularity, and language content may be sparsely populated.
512	   "Falling back" through the subtag
513	   sequence provides more opportunity to find a match between available
514	   language tags and the user's request.

516	   The default behavior when no tag matches the language priority list
517	   is implementation defined.  An implementation might, for example,
518	   return content:

520	   o  with no language tag

522	   o  of a non-linguistic nature, such as an image or sound

524	   o  with an empty language tag value, in cases where the protocol
525	      permits the empty value (see, for example, "xml:lang" in [XML10],
526	      which indicates that the element contains non-linguistic content)

528	   o  in a particular language designated for the bit of content being
529	      selected

531	   o  labelled with the tag "i-default" (see [RFC2277])

533	   When performing lookup using a language priority list, the
534	   progressive search MUST process each language range in the list
535	   before finding the default content or empty tag.

537	   One common way for an application or implementation to provide for a
538	   default is to allow a specific language range to be set as the
539	   default for a specific type of request.  This language range is then
540	   treated as if it were appended to the end of the language priority
541	   list as a whole, rather than after each item in the language priority
542	   list.

544	   For example, if a particular user's language priority list were
545	   "fr-FR, zh-Hant" and the program doing the matching had a default
546	   language range of "ja-JP", the program would search for content as
547	   follows:
548	   1. fr-FR
549	   2. fr
550	   3. zh-Hant // next language
551	   4. zh
552	   5. (search for the default content)
553	      a. ja-JP
554	      b. ja
555	      c. (implementation defined default)

557	   Figure 4: Lookup Using a Language Priority List
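
   The search order in Figure 4 can be sketched as follows.  This is
   a minimal illustration under stated assumptions: the names
   "truncations", "search_order", and "lookup" are hypothetical, tags
   are compared case-insensitively, and a return value of None stands
   for the implementation-defined default.

```python
def truncations(language_range):
    """Yield a range followed by its progressively shorter forms."""
    subtags = language_range.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # A single-character subtag left at the end is removed as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def search_order(priority_list, default_range=None):
    """List every range tried, in order, before the final default."""
    ranges = list(priority_list)
    if default_range is not None:
        # The default range is appended once, to the list as a whole.
        ranges.append(default_range)
    return [step for r in ranges for step in truncations(r)]


def lookup(priority_list, available_tags, default_range=None):
    """Return the first available tag, or None for the final default."""
    available = {tag.lower(): tag for tag in available_tags}
    for candidate in search_order(priority_list, default_range):
        if candidate.lower() in available:
            return available[candidate.lower()]
    return None


print(search_order(["fr-FR", "zh-Hant"], "ja-JP"))
print(lookup(["fr-FR", "zh-Hant"], ["ja", "zh"], "ja-JP"))
```

   Because the default range is appended only once, "zh" is selected
   in the second call even though content tagged "ja" is also
   available.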

559	   Implementations SHOULD ignore extensions and unrecognized private-use
560	   subtags when performing lookup, since these subtags are usually
561	   orthogonal to the user's request.

563	   The special language range "*" matches any language tag.  In the
564	   lookup scheme, this range does not convey enough information by
565	   itself to determine which content is most appropriate, since it
566	   matches everything.  If the language range "*" is followed by other
567	   language ranges, it SHOULD be skipped.  If the language range "*" is
568	   the only one in the language priority list or if no other language
569	   range follows, the default content SHOULD be returned.
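
   One way to apply the "*" rule is to remove the wildcard from the
   language priority list before searching.  The sketch below assumes
   a hypothetical helper named "ranges_to_search"; an empty result
   means that the default content is returned.

```python
def ranges_to_search(priority_list):
    """Drop "*" entries; an empty result means 'return the default content'."""
    return [language_range for language_range in priority_list
            if language_range != "*"]


print(ranges_to_search(["*", "fr-FR"]))
print(ranges_to_search(["*"]))
```

   A "*" in the final position needs no special handling in this
   sketch: by the time it would be reached, every other range has
   already been tried, so the search falls through to the default.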

571	   In some cases, the language priority list might contain one or more
572	   extended language ranges (as, for example, when the same language
573	   priority list is used as input for both lookup and filtering
574	   operations).  Wildcard values in an extended language range normally
575	   match any value that occurs in that position in a language tag.
576	   Since only one item can be returned for any given lookup request,
577	   wildcards in a language range have to be processed in a consistent
578	   manner or the same request will produce widely varying results.
579	   Implementations that accept extended language ranges MUST define
580	   which content is returned when more than one item matches the
581	   extended language range.

583	   For example, an implementation could return the matching tag that is
584	   first in ASCII-order.  If the language range were "*-CH" and the set
585	   of tags included "de-CH", "fr-CH", and "it-CH", then the tag "de-CH"
586	   would be returned.  Another possibility would be for an
587	   implementation to map the extended language ranges to basic ranges.
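
   The ASCII-order policy described above could be sketched as
   follows.  The position-by-position comparison in
   "matches_positionally" is a simplifying assumption made for this
   example, not a matching algorithm defined by this document; both
   function names are hypothetical.

```python
def matches_positionally(extended_range, tag):
    """Compare subtags position by position; "*" matches any subtag."""
    range_subtags = extended_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    if len(range_subtags) > len(tag_subtags):
        return False
    return all(r == "*" or r == t
               for r, t in zip(range_subtags, tag_subtags))


def lookup_extended(extended_range, tags):
    """Return the matching tag that sorts first, or None if none match."""
    candidates = sorted(t for t in tags
                        if matches_positionally(extended_range, t))
    return candidates[0] if candidates else None


print(lookup_extended("*-CH", ["it-CH", "fr-CH", "de-CH"]))
```

   Sorting makes the result independent of the order in which the
   available tags happen to be stored, which is the consistency the
   requirement above is meant to secure.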

589	4.  Other Considerations

591	   When working with language ranges and matching schemes, there are
592	   some additional points that may influence the choice of either.

594	4.1.  Choosing Language Ranges

596	   Users indicate their language preferences via the choice of a
597	   language range or the list of language ranges in a language priority
598	   list.  The type of matching affects what the best choice is for a
599	   user.

601	   Most matching schemes make no attempt to process the semantic meaning
602	   of the subtags; instead, the language range is compared, in a case-
603	   insensitive manner, to each language tag being matched, using basic
604	   string processing.  Users SHOULD select language ranges that are
605	   well-formed, valid language tags according to [RFC3066bis]
606	   (substituting wildcards as appropriate in extended language ranges).

608	   Users SHOULD replace tags or subtags which have been deprecated with
609	   the Preferred-Value from the IANA Language Subtag Registry.  If the
610	   user is working with content that might use the older form, the user
611	   might include both the new and old forms in a language priority list.
612	   For example, the tag "art-lojban" is deprecated.  The subtag 'jbo' is
613	   supposed to be used instead, so the user might use it to form the
614	   language range.  Or the user might include both in a language
615	   priority list: "jbo, art-lojban".
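
   A sketch of this substitution follows.  The table
   "PREFERRED_VALUE" is a hypothetical one-entry excerpt standing in
   for the IANA Language Subtag Registry, and the function name
   "expand_range" is likewise an assumption made for this example.

```python
# Hypothetical excerpt; a real implementation would load the registry.
PREFERRED_VALUE = {"art-lojban": "jbo"}


def expand_range(language_range, include_deprecated=False):
    """Replace a deprecated range, optionally keeping the old form too."""
    preferred = PREFERRED_VALUE.get(language_range.lower(), language_range)
    if include_deprecated and preferred != language_range:
        # The preferred form comes first, then the older form.
        return [preferred, language_range]
    return [preferred]


print(expand_range("art-lojban"))
print(expand_range("art-lojban", include_deprecated=True))
```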

617	   Users SHOULD avoid subtags that add no distinguishing value to a
618	   language range.  When filtering, the fewer the number of subtags that
619	   appear in the language range, the more content the range will
620	   probably match, while in lookup unnecessary subtags might cause
621	   "better", more-specific content to be skipped in favor of less
622	   specific content.  For example, the range "de-Latn-DE" would return
623	   content tagged "de" instead of content tagged "de-DE", even though
624	   the latter is probably a better match.
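
   The effect of the unnecessary script subtag can be seen in a
   simplified lookup.  The sketch below is an assumption-laden
   illustration: it compares tags by exact string equality and
   ignores the special handling of single-character subtags.

```python
def simple_lookup(language_range, available):
    """Truncate the range from the end until an available tag is found."""
    subtags = language_range.split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return candidate
        subtags.pop()
    return None


available = {"de", "de-DE"}
print(simple_lookup("de-Latn-DE", available))  # tries de-Latn-DE, de-Latn, de
print(simple_lookup("de-DE", available))
```

   The range "de-Latn-DE" never produces the candidate "de-DE",
   because truncation only removes subtags from the end of the range.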

626	   Many languages are written predominantly in a single script.  This is
627	   usually recorded in the Suppress-Script field in that language
628	   subtag's registry entry.  For these languages, script subtags SHOULD
629	   NOT be used to form a language range.  Thus the language range "en-
630	   Latn" is inappropriate in most cases (because the vast majority of
631	   English documents are written in the Latin script and thus the 'en'
632	   language subtag has a Suppress-Script field for 'Latn' in the
633	   registry).
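
   A tool that helps users build language ranges might remove such a
   script subtag automatically.  In the sketch below, the table
   "SUPPRESS_SCRIPT" is a hypothetical one-entry excerpt of the
   registry, and the code assumes that the script subtag, if present,
   is the second subtag of the range.

```python
# Hypothetical excerpt; a real implementation would load the registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def drop_suppressed_script(language_range):
    """Remove a script subtag that the language's registry entry suppresses."""
    subtags = language_range.split("-")
    if len(subtags) >= 2:
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("en-Latn"))
print(drop_suppressed_script("en-Latn-US"))
```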

635	   When working with tags and ranges, note that extensions and most
636	   private-use subtags are orthogonal to language tag matching, in that
637	   they specify additional attributes of the text not related to the
638	   goals of most matching schemes.  Users SHOULD avoid using these
639	   subtags in language ranges, since they interfere with the selection
640	   of available content.  When used in language tags (as opposed to
641	   ranges), these subtags normally do not interfere with filtering
642	   (Section 3), since they appear at the end of the tag and will match
643	   all prefixes.  Lookup (Section 3.3) implementations often ignore
644	   unrecognized private-use and extension subtags when performing
645	   language tag fallback.

647	   Applications, specifications, or protocols that choose not to
648	   interpret one or more private-use or extension subtags SHOULD NOT
649	   remove or modify these extensions in content that they are
650	   processing.  When a language tag instance is to be used in a
651	   specific, known protocol, and is not being passed through to other
652	   protocols, language tags MAY be altered to remove subtags and
653	   extensions that are not supported by that protocol.  Such alterations
654	   SHOULD be avoided, if possible, since they remove information that
655	   might be relevant to other processes that would make use of it.

657	   Some applications of language tags might want or need to consider
658	   extensions and private-use subtags when matching tags.  If extensions
659	   and private-use subtags are included in a matching process that
660	   utilizes one of the schemes described in this document, then the
661	   implementation SHOULD canonicalize the language tags and/or ranges
662	   before performing the matching.  Note that language tag processors
663	   that claim to be "well-formed" processors as defined in [RFC3066bis]
664	   generally fall into this category.

666	4.2.  Meaning of Language Tags and Ranges

668	   Selecting content using language ranges requires some understanding
669	   by users of what they are selecting.  The meaning of the various
670	   subtags in a language range are identical to their meaning in a
671	   language tag (see Section 4.2 in [RFC3066bis]), with the addition
672	   that the wildcard "*" represents any matching sequence of values.

674	4.3.  Considerations for Private Use Subtags

676	   Private-use subtags require private agreement between the parties
677	   that intend to use or exchange language tags that use them, and great
678	   caution SHOULD be used in employing them in content or protocols
679	   intended for general use.  Private-use subtags are simply useless for
680	   information exchange without prior arrangement.

682	   The value and semantic meaning of private-use tags and of the subtags
683	   used within such a language tag are not defined.  Matching private-
684	   use tags using language ranges or extended language ranges can result
685	   in unpredictable content being returned.

687	4.4.  Length Considerations for Language Ranges

689	   Language ranges are very similar to language tags in terms of content
690	   and usage.  The same types of restrictions on length that apply to
691	   language tags can also apply to language ranges.  See [RFC3066bis]
692	   Section 4.3 (Length Considerations).

694	5.  IANA Considerations

696	   This document presents no new or existing considerations for IANA.

698	6.  Changes

700	   This is the first version of this document.

702	7.  Security Considerations

704	   Language ranges used in content negotiation might be used to infer
705	   the nationality of the sender, and thus identify potential targets
706	   for surveillance.  In addition, unique or highly unusual language
707	   ranges or combinations of language ranges might be used to track a
708	   specific individual's activities.

710	   This is a special case of the general problem that anything you send
711	   is visible to the receiving party.  It is useful to be aware that
712	   such concerns can exist in some cases.

714	   The evaluation of the exact magnitude of the threat, and any possible
715	   countermeasures, is left to each application or protocol.

717	8.  Character Set Considerations

719	   Language tags permit only the characters A-Z, a-z, 0-9, and HYPHEN-
720	   MINUS (%x2D).  Language ranges also use the character ASTERISK
721	   (%x2A).  These characters are present in most character sets, so
722	   presentation or exchange of language tags or ranges should not be
723	   constrained by character set issues.
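
   The character repertoire described above can be checked with a
   simple pattern.  This sketch tests only which characters appear; it
   does not test whether the range is well-formed.  The function name
   "uses_permitted_characters" is an assumption made for this example.

```python
import re

# A-Z, a-z, 0-9, HYPHEN-MINUS (%x2D), and ASTERISK (%x2A).
RANGE_CHARACTERS = re.compile(r"[A-Za-z0-9*-]+")


def uses_permitted_characters(language_range):
    """Return True if the range contains only the permitted characters."""
    return RANGE_CHARACTERS.fullmatch(language_range) is not None


print(uses_permitted_characters("zh-Hant-CN"))
print(uses_permitted_characters("fr_FR"))
```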

725	9.  References

727	9.1.  Normative References

729	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
730	              Requirement Levels", BCP 14, RFC 2119, March 1997.

732	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
733	              Languages", BCP 18, RFC 2277, January 1998.

735	   [RFC3066bis]
736	              Phillips, A., Ed. and M. Davis, Ed., "Tags for the
737	              Identification of Languages", October 2005, <http://
738	              www.ietf.org/internet-drafts/
739	              draft-ietf-ltru-registry-14.txt>.

741	   [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
742	              Specifications: ABNF", RFC 4234, October 2005.

744	9.2.  Informative References

746	   [RFC1766]  Alvestrand, H., "Tags for the Identification of
747	              Languages", RFC 1766, March 1995.

749	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
750	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
751	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

753	   [RFC2616errata]
754	              IETF, "HTTP/1.1 Specification Errata", October 2004,
755	              <http://purl.org/NET/http-errata>.

757	   [RFC3066]  Alvestrand, H., "Tags for the Identification of
758	              Languages", BCP 47, RFC 3066, January 2001.

760	   [RFC3282]  Alvestrand, H., "Content Language Headers", RFC 3282,
761	              May 2002.

763	   [XML10]    Bray, T., et al., "Extensible Markup Language (XML) 1.0",
764	              February 2004.

766	Appendix A.  Acknowledgements

768	   Any list of contributors is bound to be incomplete; please regard the
769	   following as only a selection from the group of people who have
770	   contributed to make this document what it is today.

772	   The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of
773	   which is a precursor to this document, made enormous contributions
774	   directly or indirectly to this document and are generally responsible
775	   for the success of language tags.

777	   The following people (in alphabetical order by family name)
778	   contributed to this document:

780	   Harald Alvestrand, Jeremy Carroll, John Cowan, Martin Duerst, Frank
781	   Ellermann, Doug Ewell, Marion Gunn, Kent Karlsson, Ira McDonald, M.
782	   Patton, Randy Presuhn, Eric van der Poel, Markus Scherer, and many,
783	   many others.

785	   Very special thanks must go to Harald Tveit Alvestrand, who
786	   originated RFCs 1766 and 3066, and without whom this document would
787	   not have been possible.

789	Authors' Addresses

791	   Addison Phillips (editor)
792	   Yahoo! Inc

794	   Email: addison at inter dash locale dot com

796	   Mark Davis (editor)
797	   Google

799	   Email: mark dot davis at macchiato dot com

801	Intellectual Property Statement

803	   The IETF takes no position regarding the validity or scope of any
804	   Intellectual Property Rights or other rights that might be claimed to
805	   pertain to the implementation or use of the technology described in
806	   this document or the extent to which any license under such rights
807	   might or might not be available; nor does it represent that it has
808	   made any independent effort to identify any such rights.  Information
809	   on the procedures with respect to rights in RFC documents can be
810	   found in BCP 78 and BCP 79.

812	   Copies of IPR disclosures made to the IETF Secretariat and any
813	   assurances of licenses to be made available, or the result of an
814	   attempt made to obtain a general license or permission for the use of
815	   such proprietary rights by implementers or users of this
816	   specification can be obtained from the IETF on-line IPR repository at
817	   http://www.ietf.org/ipr.

819	   The IETF invites any interested party to bring to its attention any
820	   copyrights, patents or patent applications, or other proprietary
821	   rights that may cover technology that may be required to implement
822	   this standard.  Please address the information to the IETF at
823	   ietf-ipr@ietf.org.

825	Disclaimer of Validity

827	   This document and the information contained herein are provided on an
828	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
829	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
830	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
831	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
832	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
833	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

835	Copyright Statement

837	   Copyright (C) The Internet Society (2006).  This document is subject
838	   to the rights, licenses and restrictions contained in BCP 78, and
839	   except as set forth therein, the authors retain all their rights.

841	Acknowledgment

843	   Funding for the RFC Editor function is currently provided by the
844	   Internet Society.