
Re: [Ltru] I'm really confused by chinese in 3066bis



> You can easily programme into implementations the scenarios for when a
> user has used an extlang as a primary language subtag and you don't need
> a type 2 on the macrolanguage to say that it can be used as a standalone
> primary language subtag - that's covered by the text.

No, actually we can't "program implementations" without breaking existing "well-formed" implementations (which I think represent the majority of implementations).

Reliance on metadata in the registry is, of course, something you are familiar and comfortable with: ISO 639-6 relies on it. But BCP 47, and the matching schemes in particular, has historically eschewed such reliance. If we introduce reliance on the registry, it breaks the implementations that rely solely on tag structure. Arguments between extlangistas and non-extlangistas have largely revolved around this particular problem.
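
To make "rely solely on tag structure" concrete, here is a minimal sketch of that kind of implementation. It is an illustration only, not any particular product's code: the function name `classify_by_structure` is hypothetical, and the rules are a simplified reading of the tag syntax (an optional run of 3-letter extlang subtags after the primary language, then a 4-letter script, then a 2-letter or 3-digit region). Variants, extensions, private use and grandfathered tags are ignored.

```python
import re


def classify_by_structure(tag):
    """Label subtags using only position, length and character class.

    Simplified sketch: no registry lookup of any kind is performed.
    """
    subtags = tag.split("-")
    primary = subtags[0]
    if not re.fullmatch(r"[A-Za-z]{2,8}", primary):
        raise ValueError(f"not a well-formed primary language subtag: {primary!r}")

    labels = [("language", primary.lower())]
    stage = "extlang"  # the earliest slot the next subtag may still fill
    for sub in subtags[1:]:
        if stage == "extlang" and re.fullmatch(r"[A-Za-z]{3}", sub):
            labels.append(("extlang", sub.lower()))
        elif stage in ("extlang", "script") and re.fullmatch(r"[A-Za-z]{4}", sub):
            labels.append(("script", sub.title()))
            stage = "region"
        elif stage != "done" and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub):
            labels.append(("region", sub.upper()))
            stage = "done"
        else:
            # Anything this sketch does not model (variants, extensions, ...)
            labels.append(("other", sub))
            stage = "done"
    return labels


print(classify_by_structure("zh-cmn-Hans-CN"))
print(classify_by_structure("cmn-Hans"))
```

With this approach 'cmn' in "zh-cmn-Hans-CN" is an extlang purely because of where it sits, while 'cmn' in "cmn-Hans" is just a primary language. Nothing in the structure says that 'cmn' belongs to 'zh'; learning that requires the registry, which is exactly the dependency such implementations do not have.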

If this were just a one-time thing, such as the grandfathered list, it might not be so bad, since we could publish the list in the RFC for implementers to hardcode. But there is no indication that the macrolanguage list is closed or particularly stable (even if entries are stable, new ones will probably be created).
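
Because the list is open-ended, an implementation that wanted this information would have to read it from the registry each time the registry changes, rather than hardcode it. A rough sketch of what that involves, assuming records shaped like the ones proposed in this thread (a "Macrolanguage" field on each encompassed language) and separated by "%%" lines; the helper names `parse_records` and `derive_macrolanguages` are made up for illustration:

```python
def parse_records(text):
    """Split registry-style text into a list of {field: value} dicts."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            if ":" not in line:
                continue  # skip blank or malformed lines in this sketch
            name, value = line.split(":", 1)
            record[name.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def derive_macrolanguages(records):
    """Map each macrolanguage to the subtags whose records point at it."""
    encompassed = {}
    for record in records:
        macro = record.get("Macrolanguage")
        subtag = record.get("Subtag")
        if macro and subtag:
            encompassed.setdefault(macro, []).append(subtag)
    return encompassed


# Sample data modelled on the records quoted in this thread.
SAMPLE = """\
Subtag: zh
Type: language
%%
Subtag: cmn
Type: language
Macrolanguage: zh
%%
Subtag: yue
Type: language
Macrolanguage: zh
%%
Subtag: nn
Type: language
Macrolanguage: no
"""

print(derive_macrolanguages(parse_records(SAMPLE)))
```

For the sample data this yields 'zh' encompassing 'cmn' and 'yue', and 'no' encompassing 'nn'. The point is not that the code is hard, but that it has to exist and be fed fresh registry data, which structure-only implementations never needed before.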

BCP 47 is not the perfect language identifier system that suits all needs everywhere. It is a compromise that suits many applications.

Of course, I'm also pretty sure that a "perfect" language identifier system is not likely to appear, since any such system must make choices about how to split or lump languages and thus will ill-serve some application.

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: Debbie Garside [mailto:debbie at ictmarketing.co.uk]
> Sent: Saturday, May 10, 2008 1:49 PM
> To: Phillips, Addison; 'Doug Ewell'; 'LTRU Working Group'
> Subject: RE: [Ltru] I'm really confused by chinese in 3066bis
>
> Actually, logically perhaps it should read:
>
> Subtag: zh
> Type: Macrolanguage
>
>
> and...
>
> Subtag: cmn
> Type: extlang
> Macrolanguage: zh;1
>
> Subtag: yue
> Type: extlang
> Macrolanguage: zh
>
> You can easily programme into implementations the scenarios for when a
> user has used an extlang as a primary language subtag and you don't need
> a type 2 on the macrolanguage to say that it can be used as a standalone
> primary language subtag - that's covered by the text.
>
> Debbie
>
>
> > -----Original Message-----
> > From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of Phillips, Addison
> > Sent: 10 May 2008 20:02
> > To: Doug Ewell; LTRU Working Group
> > Subject: Re: [Ltru] I'm really confused by chinese in 3066bis
> >
> > If we need to indicate that a language is a macrolanguage in
> > the registry, we should use a separate field. Presumably it
> > would be something like:
> >
> > Subtag: zh
> > Type: language
> > Encompasses: cmn, yue, wuu, ... etc.
> >
> > Or perhaps:
> >
> > Subtag: zh
> > Type: language
> > Encompasses: cmn
> > Encompasses: yue
> > Encompasses: wuu
> > ... etc ...
> >
> > And the encompassed languages would look like:
> >
> > Subtag: cmn
> > Type: language // or extlang
> > Macrolanguage: zh
> >
> > If we were to restore the use of 'extlang', perhaps the above
> > wouldn't be necessary, since enclosed languages would not be
> > of the same type and the 'cmn' record would look more like:
> >
> > Subtag: cmn
> > Type: extlang
> > Macrolanguage: zh
> > Prefix: zh
> >
> > ... but we would have some enclosed languages grandfathered
> > into the language slot:
> >
> > Subtag: nn
> > Type: language
> > Macrolanguage: no
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect -- Lab126
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> >
> > > -----Original Message-----
> > > From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of Doug Ewell
> > > Sent: Saturday, May 10, 2008 11:48 AM
> > > To: LTRU Working Group
> > > Subject: Re: [Ltru] I'm really confused by chinese in 3066bis
> > >
> > > Debbie Garside <debbie at ictmarketing dot co dot uk> wrote:
> > >
> > > > Yes, I could see that you could get a reverse match from the
> > > > languages containing the Macrolanguage field but I still feel that
> > > > the actual macrolanguage should be labelled.  If only to stop people
> > > > from assuming that they should tag all Arabic as 'ar'.  If I was
> > > > tagging a document and looking for the right code I might well do a
> > > > search for "Arabic" or "Chinese".  The result would not identify the
> > > > subtag as a Macrolanguage and thus I may not look any further.
> > > >
> > > > For the sake of 50 or so fields, I think it is worth putting a
> > > > Macrolanguage field in the Registry - for humans to use :-)
> > >
> > > I see three particular problems with using the Macrolanguage field to
> > > mean two opposite concepts, "this language HAS a macrolanguage" and
> > > "this language IS a macrolanguage":
> > >
> > > 1.  The software problem.  In beginning programming you learn to use
> > > values like -1 to mean "not a valid value" or "end of list" or similar.
> > > This works OK when the real values are non-negative, such as the
> > > population of a town, but not so well when the values could be
> > > negative, such as its elevation.  Also, other parts of the software
> > > have to know to treat -1 as a special case, not like an ordinary
> > > value.  Sometimes this gets confusing and you see -1 pop up in places
> > > it shouldn't.  In intermediate programming you learn to stop doing
> > > this, and represent special situations in other ways.
> > >
> > > Similarly, it is possible to imagine software looking fruitlessly for
> > > a language subtag 'True' that is the macrolanguage of 'ar' instead of
> > > remembering that 'True' is a special case.  Remember that 4-letter
> > > language subtags, though "reserved for future use" (for some standard,
> > > I forget which ;-), are valid in the ABNF, and that the casing of
> > > subtags doesn't matter, though we're supposed to get it right in the
> > > Registry.
> > >
> > > 2.  The human problem.  I can easily imagine readers of the Registry
> > > becoming confused over this dual usage of a single field.  They might
> > > wonder why other subtags don't have "Macrolanguage: False", or why 'ar'
> > > is considered the opposite of 'True'.  They might also experience
> > > confusion similar to problem 1, as 'tru' is a valid RFC 4646bis
> > > language subtag for Turoyo.  (At least the proposal wasn't to use
> > > "Macrolanguage: yes", thus causing instant confusion with Yeskwa.)
> > >
> > > 3.  The maintenance problem.  Technically the 'True' value is
> > > redundant information; it can be derived from the records for the
> > > encompassed languages.  Any time you have to maintain redundant
> > > information, especially in a different place, there is a much greater
> > > chance of making a human mistake.
> > >
> > > Suppose your friendly team of Designated Experts, presented with a new
> > > batch of several dozen ISO 639-3 changes including a new
> > > classification of macrolanguage 'qma' with encompassed languages 'qea'
> > > and 'qeb', remembers to put "Macrolanguage: qma" on the records for
> > > 'qea' and 'qeb' but forgets to put "Macrolanguage: True" on the record
> > > for 'qma'.  Suppose further that the ietf-languages list didn't catch
> > > this during the 1-week review.  We would end up with an internal
> > > inconsistency within the Registry.  Gosh, your friendly Experts would
> > > hate that.  They would also hate the inevitable e-mail flames about
> > > "process failure" and the possibility of removal or replacement at the
> > > IESG's discretion.
> > >
> > > If it is really felt necessary to indicate that 'ar' is a
> > > macrolanguage in both ways, with a special value on the macrolanguage
> > > record as well as the encompassed languages (ignoring problem 3), then
> > > we should have two fields, something like Is-A-Macrolanguage and
> > > My-Macrolanguage-Is.  (Suggestions for better names are solicited.)
> > > As with the Comments field, I don't support overloading a single field
> > > for fundamentally different purposes.
> > >
> > > --
> > > Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
> > > http://www.ewellic.org
> > > http://www1.ietf.org/html.charters/ltru-charter.html
> > > http://www.alvestrand.no/mailman/listinfo/ietf-languages  ^
> > >
> > > _______________________________________________
> > > Ltru mailing list
> > > Ltru at ietf.org
> > > https://www.ietf.org/mailman/listinfo/ltru


Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.