From: "Phillips, Addison" <addison at amazon.com>
To: Leif Halvard Silli <lhs at malform.no>
Cc: LTRU Working Group <ltru at ietf.org>
Date: Fri, 30 May 2008 09:49:35 -0700
Subject: Re: [Ltru] a modest proposal...

Leif opined:
> >
> > It means: it is not linked to "macrolanguage", a feature of ISO
> > 639-3. It is strictly recognition that past tagging practice
> > has used "zh-*" and "sgn-*". It has nothing to do with anything
> > else.
>
> So the currently grandfathered "zh-cmn" etc will remain
> grandfathered.

No. It will be reclassified as redundant under this proposal.

> The tagger will find "zh" and "cmn" separately, in
> order to construct "zh-cmn".

Yes.

> But when something has been tagged
> "zh-cmn", then it may be treated either as 'zh' with an extlang,
> or as an atomic tag. When treated as atomic, it is in reality
> treated as one of the grandfathered tags -- as long as it is a
> grandfathered tag.

No. It isn't "in reality treated as one of the grandfathered tags". What I meant by atomic is that a sequence such as "zh-cmn" could be treated as if it were a single subtag. It is not required to be treated this way. But it is permitted. Thus the tag "zh-cmn-Hans-CN" contains four subtags (zh, cmn, Hans, CN), but some operations (notably the Lookup algorithm in RFC 4647) could treat the "zh-cmn" sequence as if it were a single subtag meaning "Mandarin Chinese".
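
A minimal Python sketch of what treating "zh-cmn" (or "zh-yue") as a single subtag could mean for Lookup-style fallback. It assumes a simplified form of the RFC 4647 truncation rule (drop the last subtag; if that leaves a singleton such as "x" at the end, drop it too) and a shape-only heuristic for spotting an extlang (three ASCII letters in second position) rather than a registry check. The function names are hypothetical. For "zh-yue-Hans-CN" it reproduces the three fallback orders given in this message: per-subtag truncation, an atomic chain that stops at "zh-yue", and an atomic chain that then falls back to "zh".

```python
def truncate(units):
    """Truncation chain in the style of RFC 4647 Lookup (simplified sketch).

    `units` is a list of subtags, or of atomic subtag groups such as
    "zh-yue". The last unit is removed at each step; if that exposes a
    singleton (e.g. "x") at the end, the singleton is removed as well.
    """
    units = list(units)
    chain = []
    while units:
        chain.append("-".join(units))
        units.pop()
        if units and len(units[-1]) == 1:
            units.pop()
    return chain


def looks_like_extlang(subtag):
    # Heuristic assumption: an extlang is exactly three ASCII letters.
    return len(subtag) == 3 and subtag.isascii() and subtag.isalpha()


def lookup_fallback(tag):
    """Current behavior: every subtag is truncated independently."""
    return truncate(tag.split("-"))


def atomic_fallback(tag, fall_back_to_primary=False):
    """Hypothetical behavior: primary-extlang is treated as one unit.

    With fall_back_to_primary=False the chain stops at the primary-extlang
    unit; with True it then retries using the primary language subtag
    followed by the remaining subtags.
    """
    subtags = tag.split("-")
    if len(subtags) < 2 or not looks_like_extlang(subtags[1]):
        return truncate(subtags)
    unit = subtags[0] + "-" + subtags[1]
    rest = subtags[2:]
    chain = truncate([unit] + rest)
    if fall_back_to_primary:
        chain += truncate([subtags[0]] + rest)
    return chain


for chain in (
    lookup_fallback("zh-yue-Hans-CN"),
    atomic_fallback("zh-yue-Hans-CN"),
    atomic_fallback("zh-yue-Hans-CN", fall_back_to_primary=True),
):
    print(chain + ["(default)"])
```

Here "(default)" stands for whatever the application returns when nothing in the chain matches. Tags without an extlang-shaped second subtag (for example "en-US") fall through to plain truncation.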

>
> > You are proposing having TWO methods tagging the same language
> > variety, an idea that is antithetical to language tagging and
> > which presents more problems than it solves
> >
> I meant that zh-* should be allowed not for strict compatibility
> reasons, as you said, but in order to tag macrolanguage
> relationship. Still one method, though. Of course, this means that
> sgn-* would have to be allowed for another reason -- for the
> compatibility reason.

I know that's what *you* mean, but in my proposal we are cherry-picking 'zh', and the cherry-picking has nothing to do with its status as a macrolanguage. We are picking it purely because people have a long-standing association between the subtag 'zh' and the various Chinese languages, and because those same people think that maintaining this tag-level relationship is of some value.

I think you might be ascribing too much meaning to the concept of macrolanguage. Macrolanguage records a particular relationship that exists between language *codes* assigned by ISO 639: some language codes assigned by ISO 639-3 are encompassed by codes assigned previously to a broader range of languages. The macrolanguage has little to do with the *actual* linguistic relationship between the encompassed languages (save that the association tends to exist because the languages are related somehow). Many languages that do not have a macrolanguage mapping are actually more closely related, or more mutually intelligible, than some pairs of macrolanguage-related languages.

>
> The text below serves as background for the following question: Is
> your proposal, that the language production might be read as
> atomic, only relevant for the Chinese tags? Or did you also have
> 'ar-*' and 'sgn-*' in mind?

:: sigh ::

My proposal was to allow/permit *some* processes to treat the primary-extlang combination as if it were atomic in *some* circumstances.
In particular, I had in mind the lookup algorithm, where the fallback would be permitted to go like this:

   zh-yue-Hans-CN
   zh-yue-Hans
   zh-yue
   (default)

Or like this:

   zh-yue-Hans-CN
   zh-yue-Hans
   zh-yue
   zh-Hans-CN
   zh-Hans
   zh
   (default)

Whereas the current behavior (which applies even to grandfathered tags) is:

   zh-yue-Hans-CN
   zh-yue-Hans
   zh-yue
   zh
   (default)

>
> Background:
>
> The macrolanguage field came up, as I understood it, because there
> were no extlang. In Doug's draft, all encompassed language subtags
> have that field. Except the grandfathered tags "zh-cmn" etc.

The Macrolanguage field in the registry is an informational field determined solely by the ISO 639-3 standard. All encompassed subtags have it, regardless of whether they would be permitted as extlang. *IF* we restore extlang, the grandfathered items that match extlang would become redundant. However, under the current draft (no-extlang), these are just grandfathered tags, similar to "no-nyn" and "i-enochian". In all cases, these tags would be deprecated in favor of non-atomic subtag sequences.

>
> But my question is: What macrolanguage information comes with
> 'zh-cmn' when it is read as an atomic?

You are confusing processing the tag with the meaning of specific subtags.

> The answer should be, that
> when read as an atomic --- grandfathered zh-cmn -- the registry
> says that it is a synonym of 'cmn'. The 'cmn' then has a
> Macrolanguage field which might be used by the application in
> order to find 'zh' etc.

I think you are making a series of possibly erroneous assumptions here. First, any extlangs will have a field called "Prefix" that indicates what subtag they are required to follow. Thus, 'cmn' and its extlang kin will have a Prefix field with the value 'zh'.
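
To make the Prefix versus Macrolanguage distinction concrete, here is a small Python sketch that parses a few registry-style records (records separated by "%%", one "Field: value" per line, the layout used by the IANA Language Subtag Registry) and decides extlang placement from the Prefix field alone. The records are a simplified, illustrative excerpt written for this example, not copied from the registry, and `parse_records` and `extlang_allowed` are hypothetical helpers.

```python
# Illustrative, simplified records (hypothetical excerpt, not the real registry).
SAMPLE_REGISTRY = """\
Type: extlang
Subtag: cmn
Description: Mandarin Chinese
Prefix: zh
Macrolanguage: zh
%%
Type: extlang
Subtag: ase
Description: American Sign Language
Prefix: sgn
%%
Type: language
Subtag: nn
Description: Norwegian Nynorsk
Macrolanguage: no
"""


def parse_records(text):
    """Split registry-style text into a list of {field: [values]} dicts.

    Simplification: folded continuation lines are not handled.
    """
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            record.setdefault(name.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def extlang_allowed(primary, extlang, records):
    """True if `extlang` is an extlang record whose Prefix lists `primary`.

    Only the Prefix field is consulted; Macrolanguage plays no role.
    """
    for rec in records:
        if rec.get("Type") == ["extlang"] and rec.get("Subtag") == [extlang]:
            return primary in rec.get("Prefix", [])
    return False


records = parse_records(SAMPLE_REGISTRY)
print(extlang_allowed("zh", "cmn", records))   # "zh-cmn"
print(extlang_allowed("sgn", "ase", records))  # "sgn-ase"
print(extlang_allowed("no", "nn", records))    # "nn" is not an extlang

# 'nn' carries a Macrolanguage field but no Prefix.
nn = next(r for r in records if r["Subtag"] == ["nn"])
print(nn.get("Macrolanguage"), nn.get("Prefix"))
```

The three checks print True, True and False, and the last line prints `['no'] None`: the Macrolanguage field is present for 'nn', yet it has no bearing on where the subtag may appear.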
There is no need to use the Macrolanguage field to find out which prefix an extlang requires, nor is there any dependency on any grandfathered tag. Similarly, a subtag such as 'ase' (American Sign Language) could have a prefix of 'sgn'. Some non-extlangs will have a Macrolanguage field. For example, 'nn' (Nynorsk) would have one, even though it is not an extlang and has no Prefix.

Secondly, the registry will NOT say that "zh-cmn" (considered as a redundant tag) is a synonym for 'cmn'. Redundant tags in the registry can be ignored because their meaning is otherwise assigned by individual subtags. If we were to keep the current setup (no extlang), the tag "zh-cmn" would remain grandfathered and would be deprecated in favor of 'cmn'. But this is *not* the same thing as being a synonym for it.

>
> However, if you read 'ar-*' or 'sgn-*' as an atomic, then there is
> no grandfathered tags that they match. And thus no place to find
> the macrolanguage information (for 'ar-*' -- unrelevant to 'sgn-*').

Grandfathered tags have nothing to do with this proposal. The fact that some would become redundant is a Good Thing. In any event, many more grandfathered tags would be either redundant or deprecated with any of the proposals on the table at present. Being valid (which these tags will remain forever) is not the same as being a good tagging choice (which most of the affected grandfathered tags will cease to be; redundant tags, of course, could be removed from the registry, as they are merely historical curiosities).

Addison

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.