Re: [Ltru] Tag perversity text revised... (#944)



I feel slightly uncomfortable with the last chapter, because the examples given for shortening the subtag sequence (i.e., "en-a-abc-x-private1-private2" to "en-a-abc-x-private1" to "en-a-abc") use sequences that are too short to actually require truncation under the proposed limit, and the text does not point this out.

Sincerely,
Erkki I. Kolehmainen
coordinator, cultural diversity issues in ICT
Research Institute for the Languages of Finland

Addison Phillips wrote:

Results of all the suggested changes (I used 42 this time).

~Addison

--
2.1.1  Length Considerations

   RFC 3066 [23] did not provide for an upper limit on the size of
   language tags.  While the largest tag that could be generated from
   language and region codes defined under RFC 3066 could not exceed six
   characters in length, much larger registered tags were not only
   possible but were actually registered.

   The ABNF and other guidelines in this document also do not impose a
   fixed upper limit on the number of subtags in a Language Tag (and
   thus none on the size of a tag), and it is possible to envision
   quite long and complex subtag sequences.  Compared with RFC 3066,
   the maximum length of a tag generated from subtag combinations has
   grown and, depending on the specific language, a complete tag may
   require more characters.

   In practice, most tags will not require substantially more
   characters.  This is partly because additional granularity in tags
   seldom adds useful distinguishing information, and partly because
   longer, more granular tags interfere with the meaning, understanding,
   and processing of language tags.  The 'Prefix' and 'Suppress-Script'
   fields in the registry (see Section 3.1) limit the way in which
   subtags may be combined to form meaningful, valid tags.  For example,
   variant subtags SHOULD be used only with the prefix specified in the
   registry.
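   As a rough illustration of how the Prefix and Suppress-Script fields
   constrain composition, the Python sketch below checks a tag against
   a tiny hand-written table.  The tables VARIANT_PREFIXES and
   SUPPRESS_SCRIPT and the function composition_warnings are
   hypothetical; the entries are meant to mirror registry records
   ("1901" with Prefix "de", "rozaj" with Prefix "sl", Suppress-Script
   "Latn" for "de" and "en"), but a real checker would load the
   registry itself.  The sketch stops at the first singleton and does
   not attempt full validation.

```python
# Hypothetical excerpt of two registry fields, for illustration only.
# A real checker would load these from the Language Subtag Registry.
VARIANT_PREFIXES = {
    "1901": ["de"],    # variant -> permitted Prefix values
    "rozaj": ["sl"],
}
SUPPRESS_SCRIPT = {
    "de": "latn",      # language -> script that should not be added
    "en": "latn",
}


def composition_warnings(tag):
    """Flag Prefix and Suppress-Script problems in a tag (simplified).

    Comparison is case-insensitive.  Scanning stops at the first
    singleton, so extensions and private-use subtags are not checked.
    """
    subtags = tag.lower().split("-")
    language = subtags[0]
    warnings = []
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            break
        if len(subtag) == 4 and subtag.isalpha():
            # Four letters after the language is a script subtag.
            if SUPPRESS_SCRIPT.get(language) == subtag:
                warnings.append(f"script {subtag!r} is suppressed for {language!r}")
        elif subtag in VARIANT_PREFIXES:
            # The subtags before the variant must begin with a listed Prefix.
            preceding = subtags[:i]
            allowed = [p.split("-") for p in VARIANT_PREFIXES[subtag]]
            if not any(preceding[:len(p)] == p for p in allowed):
                warnings.append(f"variant {subtag!r} used without a registered prefix")
    return warnings


print(composition_warnings("de-CH-1901"))  # []
print(composition_warnings("en-1901"))
print(composition_warnings("en-Latn-US"))
```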

   Some applications and protocols must allocate fixed buffer sizes or
   otherwise limit the length of a language tag in a particular
   application.  A conformant implementation or specification MAY refuse
   to support the storage of language tags that exceed a specified
   length.  For an example, see RFC 2231 [22].  Any such limitation
   SHOULD be clearly documented, and such documentation SHOULD include
   the disposition of any longer tags (for example, whether an error
   value is generated or the language tag is truncated).  This limit
   SHOULD be at least 42 characters in length.  If truncation is
   permitted, it MUST NOT divide a subtag.  A protocol that allows tags
   to be truncated at an arbitrary limit, without giving any indication
   of what that limit is, can cause harm by changing the meaning of
   tags in substantial ways.  Here is how the 42-character length of
   the longest generated tag is derived:

   language      = 3
   extlang1      = 4 (each subsequent subtag includes '-')
   extlang2      = 4 (unlikely: needs prefix="language-extlang1")
   extlang3      = 4 (extremely unlikely)
   script        = 5 (must not be suppressed)
   region        = 4 (UN M.49; ISO 3166 requires 3)
   variant1      = 9 (must have language as a prefix)
   variant2      = 9 (must have language-variant1 as a prefix)

   total         = 42 characters

             Figure 2: Derivation of the Limit on Tag Length
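   The arithmetic in Figure 2 can be checked mechanically.  The Python
   sketch below uses illustrative names (FIGURE_2_SHAPE,
   figure_2_counts, max_tag_length, placeholder_tag) that are not part
   of any specification.  It lists each subtag's length without its
   separator and adds one "-" for every subtag after the first; the
   placeholder string only has the right shape and is not a valid tag.

```python
# Subtag lengths from Figure 2, excluding the separating '-'.
FIGURE_2_SHAPE = [
    ("language", 3),
    ("extlang1", 3),
    ("extlang2", 3),
    ("extlang3", 3),
    ("script", 4),
    ("region", 3),     # UN M.49 numeric code
    ("variant1", 8),
    ("variant2", 8),
]


def figure_2_counts(shape):
    """Per-subtag counts as in Figure 2: every subtag after the first
    also counts its preceding '-'."""
    return [n + (1 if i > 0 else 0) for i, (_, n) in enumerate(shape)]


def max_tag_length(shape):
    """Total length of the longest tag with this shape."""
    return sum(figure_2_counts(shape))


def placeholder_tag(shape):
    """Filler string with the same shape (not a registered tag)."""
    return "-".join("x" * n for _, n in shape)


print(figure_2_counts(FIGURE_2_SHAPE))       # [3, 4, 4, 4, 5, 4, 9, 9]
print(max_tag_length(FIGURE_2_SHAPE))        # 42
print(len(placeholder_tag(FIGURE_2_SHAPE)))  # 42
```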

   Since language tags may be truncated by an application or protocol
   that limits tag sizes, users and applications choosing language tags
   SHOULD avoid adding subtags that add no distinguishing value and MUST
   follow fields in the registry (such as Prefix and Suppress-Script)
   that limit subtag composition.  See Section 4.1.

   Applications or protocols that must truncate a tag MUST do so by
   progressively removing subtags, along with their preceding "-", from
   the right side of the language tag until the tag is short enough for
   the given buffer.  If the resulting tag ends with a single-character
   subtag, that subtag and its preceding "-" MUST also be removed.  For
   example, when shortening the subtag sequence
   "en-a-abc-x-private1-private2", the first truncation produces
   "en-a-abc-x-private1".  The second truncation removes "private1",
   which leaves the singleton "x" at the end, so "x" is removed as
   well, producing "en-a-abc".
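   A minimal Python sketch of the truncation rule above, assuming a
   well-formed tag that starts with a language subtag (grandfathered
   and private-use-only tags are not handled).  The helpers
   truncation_steps and truncate_tag are hypothetical names:
   truncation_steps reproduces the step-by-step example, and
   truncate_tag applies the rule until the tag fits a given limit,
   raising an error if even the bare language subtag does not fit.

```python
def _drop_last_subtag(subtags):
    """Remove the rightmost subtag, then any singleton left at the end."""
    subtags = subtags[:-1]
    while len(subtags) > 1 and len(subtags[-1]) == 1:
        subtags = subtags[:-1]
    return subtags


def truncation_steps(tag):
    """Yield each successively shorter tag produced by the rule."""
    subtags = tag.split("-")
    while len(subtags) > 1:
        subtags = _drop_last_subtag(subtags)
        yield "-".join(subtags)


def truncate_tag(tag, limit):
    """Shorten `tag` to at most `limit` characters without splitting a subtag."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > limit:
        subtags = _drop_last_subtag(subtags)
    result = "-".join(subtags)
    if len(result) > limit:
        # Even the language subtag alone is too long for the buffer.
        raise ValueError(f"{tag!r} cannot be truncated to {limit} characters")
    return result


for step in truncation_steps("en-a-abc-x-private1-private2"):
    print(step)
```

   With a 42-character limit, truncate_tag("en-a-abc-x-private1-private2",
   42) returns the tag unchanged, since it is only 28 characters long.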
--

Addison P. Phillips
Globalization Architect, Quest Software
Chair, W3C Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture.



------------------------------------------------------------------------

_______________________________________________
Ltru mailing list
Ltru at lists.ietf.org
https://www1.ietf.org/mailman/listinfo/ltru




Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.