
Re: [Ltru] solving the chinese thing... some text (part 1)



> >
> > <t> Historically subtags for the various encompassed Chinese
> > languages were not available, so content in these languages has
> > historically used either 'zh' or a tag (now grandfathered) beginning
> > with 'zh'.
>
> Not necessarily true. Definitely not true in the data I've seen.
> Indicating that Cantonese docs typically use "zh" is different from the
> statement that the bulk of "zh" documents are Mandarin.

Ignoring illegal tags for a minute, items that were correctly tagged as being in Cantonese (or some other encompassed Chinese language) used a tag starting with 'zh', because the (now grandfathered) 'zh' tags were the only ones available.

> My experience is
> that the users who need to specify Cantonese most often make up an
> illegal tag. Not saying that's what we should recommend, but I believe
> my experience does not support the statement as worded.

Nothing we can do about illegal tags. Not much we can say about them either, except "don’t do that".

> Typo on
> "deliniate" FWIW.

Thanks.

To address the above comments, I rewrote that paragraph to read:

--
<t>Historically, subtags for the various encompassed Chinese languages were not available. To overcome this deficiency, some content in these languages used grandfathered tags registered for specific languages (such as "zh-yue" for Cantonese or "zh-xiang" for Xiang) or used regional subtag combinations such as "zh-HK" (Chinese, Hong Kong) to imply a specific encompassed language. The grandfathered tags were deprecated in the registry upon adoption of this document, but remain valid for use. The other tag combinations remain valid as well, but their meaning is not sufficiently precise. Instead, tags using the encompassed language subtags are preferred. Since implementations might not be able to associate the grandfathered tags with these modern equivalents, they are encouraged to canonicalize tags for comparison purposes.</t>
--
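
To make the "canonicalize tags for comparison purposes" advice concrete, here is a rough sketch in Python. The mapping table and the function names (PREFERRED, canonicalize, same_language_tag) are made up for illustration and are not part of the draft. The two entries are the examples from the paragraph above; the "hsn" value for Xiang is my assumption of what the registry's Preferred-Value would be. A real implementation would presumably load the Preferred-Value fields from the IANA registry rather than hard-code them.

```python
# Hypothetical, hand-written mapping for illustration only. A real
# implementation would read Preferred-Value from the IANA Language
# Subtag Registry instead of hard-coding entries.
PREFERRED = {
    "zh-yue": "yue",    # Cantonese
    "zh-xiang": "hsn",  # Xiang (assumed preferred value)
}


def canonicalize(tag):
    """Return the preferred form of a deprecated tag, else the tag unchanged."""
    return PREFERRED.get(tag.lower(), tag)


def same_language_tag(a, b):
    """Compare two tags case-insensitively after canonicalizing both."""
    return canonicalize(a).lower() == canonicalize(b).lower()
```

Note that "zh-HK" passes through unchanged: as the paragraph says, such combinations only imply an encompassed language, so there is nothing precise to map them to.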

> > <t> Finally, macrolanguage information can be usefully applied when
> > searching for content or when providing fallbacks in language
> > negotiation.  For example, the information that 'yue' has a
> > macrolanguage of 'zh' could be used in the Lookup algorithm (<xref
> > target="RFC4647"></xref>) to include the range "zh-Hans-CN" as a
> > fallback from a request for "yue-Hans-CN" (preserving the script and
> > region information) even though the user did not specify "zh-Hans-
> > CN" in their request. For the Chinese languages in particular, this
> > practice of conflating the encompassed language with its
> > macrolanguage is RECOMMENDED.</t>
>
> Add "with written content" to the end of the last sentence? Useless
> with audio.

Not necessarily. See Peter's scenario for older content classifications... or for mixed-media content... or even because people "make up their own tagging rules" :-). It wouldn't hurt if people completely eschew the use of 'zh-*' for their audio. But my guess is that they won't all "get the memo", so even with audio content it may help. Finding "some Chinese" after you have failed to find any Cantonese might be a better choice for some applications than finding, say, French instead.
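
For what it's worth, the fallback being discussed could be sketched roughly like this in Python. This is only one possible reading, not text from the draft: the names (MACROLANGUAGE, truncations, expand_priority_list, lookup) are hypothetical, the table holds just the 'yue' example, and placing the macrolanguage range immediately after the original range in the priority list is my assumption. The truncation step follows the progressive truncation of RFC 4647 Lookup, including dropping a trailing single-character subtag.

```python
# Illustrative table; only the 'yue' -> 'zh' pair comes from the text.
MACROLANGUAGE = {"yue": "zh"}


def truncations(language_range):
    """Yield the range and its progressively truncated forms."""
    subtags = language_range.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # Lookup also removes a single-character subtag left at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()


def expand_priority_list(ranges):
    """Insert a macrolanguage fallback after each range that has one."""
    expanded = []
    for r in ranges:
        expanded.append(r)
        primary, _, rest = r.partition("-")
        macro = MACROLANGUAGE.get(primary.lower())
        if macro:
            # Keep script and region, swap only the primary language subtag.
            fallback = macro + ("-" + rest if rest else "")
            seen = [x.lower() for x in ranges + expanded]
            if fallback.lower() not in seen:
                expanded.append(fallback)
    return expanded


def lookup(ranges, available, default=None):
    """Return the first available tag matched by the expanded priority list."""
    by_lower = {t.lower(): t for t in available}
    for r in expand_priority_list(ranges):
        for candidate in truncations(r):
            hit = by_lower.get(candidate.lower())
            if hit is not None:
                return hit
    return default
```

With this sketch, a request for "yue-Hans-CN" against content tagged only "fr" and "zh-Hans-CN" would land on "zh-Hans-CN" rather than falling through to the default, which is the "some Chinese rather than French" outcome described above.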

Addison
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru

Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.