
Re: [Ltru] I'm really confused by chinese in 3066bis



In terms of process and data integrity, I'm with Doug on this: don't conflate a data category; if you want pointers in both directions, have separate fields, or just treat your "true" data point as an informative annotation, adding "This is a macrolanguage entry" (or something like that) as a comment.


Peter

> -----Original Message-----
> From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of
> Doug Ewell
> Sent: Saturday, May 10, 2008 11:48 AM
> To: LTRU Working Group
> Subject: Re: [Ltru] I'm really confused by chinese in 3066bis
>
> Debbie Garside <debbie at ictmarketing dot co dot uk> wrote:
>
> > Yes, I could see that you could get a reverse match from the languages
> > containing the Macrolanguage field but I still feel that the actual
> > macrolanguage should be labelled.  If only to stop people from
> > assuming that they should tag all Arabic as 'ar'.  If I was tagging a
> > document and looking for the right code I might well do a search for
> > "Arabic" or "Chinese".  The result would not identify the subtag as a
> > Macrolanguage and thus I may not look any further.
> >
> > For the sake of 50 or so fields, I think it is worth putting a
> > Macrolanguage field in the Registry - for humans to use :-)
>
> I see three particular problems with using the Macrolanguage field to
> mean two opposite concepts, "this language HAS a macrolanguage" and
> "this language IS a macrolanguage":
>
> 1.  The software problem.  In beginning programming you learn to use
> values like -1 to mean "not a valid value" or "end of list" or similar.
> This works OK when the real values are non-negative, such as the
> population of a town, but not so well when the values could be negative,
> such as its elevation.  Also, other parts of the software have to know
> to treat -1 as a special case, not like an ordinary value.  Sometimes
> this gets confusing and you see -1 pop up in places it shouldn't.  In
> intermediate programming you learn to stop doing this, and represent
> special situations in other ways.
>
> Similarly, it is possible to imagine software looking fruitlessly for a
> language subtag 'True' that is the macrolanguage of 'ar' instead of
> remembering that 'True' is a special case.  Remember that 4-letter
> language subtags, though "reserved for future use" (for some standard, I
> forget which ;-), are valid in the ABNF, and that the casing of subtags
> doesn't matter, though we're supposed to get it right in the Registry.
>
> 2.  The human problem.  I can easily imagine readers of the Registry
> becoming confused over this dual usage of a single field.  They might
> wonder why other subtags don't have "Macrolanguage: False", or why 'ar'
> is considered the opposite of 'True'.  They might also experience
> confusion similar to problem 1, as 'tru' is a valid RFC 4646bis language
> subtag for Turoyo.  (At least the proposal wasn't to use "Macrolanguage:
> yes", thus causing instant confusion with Yeskwa.)
>
> 3.  The maintenance problem.  Technically the 'True' value is redundant
> information; it can be derived from the records for the encompassed
> languages.  Any time you have to maintain redundant information,
> especially in a different place, there is a much greater chance of
> making a human mistake.
>
> Suppose your friendly team of Designated Experts, presented with a new
> batch of several dozen ISO 639-3 changes including a new classification
> of macrolanguage 'qma' with encompassed languages 'qea' and 'qeb',
> remembers to put "Macrolanguage: qma" on the records for 'qea' and 'qeb'
> but forgets to put "Macrolanguage: True" on the record for 'qma'.
> Suppose further that the ietf-languages list didn't catch this during
> the 1-week review.  We would end up with an internal inconsistency
> within the Registry.  Gosh, your friendly Experts would hate that.  They
> would also hate the inevitable e-mail flames about "process failure" and
> the possibility of removal or replacement at the IESG's discretion.
>
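Problem 3 above notes that the 'True' value can be derived from the records for the encompassed languages. A minimal Python sketch of that derivation follows. It assumes a simplified record layout (a list of dicts, not the Registry's actual file format) and reuses the made-up subtags 'qma', 'qea' and 'qeb' from the example; the function names are hypothetical.

```python
# Simplified, hypothetical records. 'qma', 'qea' and 'qeb' are the made-up
# subtags from the example above; 'tru' stands in for an ordinary language.
RECORDS = [
    {"Subtag": "qma"},
    {"Subtag": "qea", "Macrolanguage": "qma"},
    {"Subtag": "qeb", "Macrolanguage": "qma"},
    {"Subtag": "tru"},
]


def macrolanguages(records):
    """Subtags that at least one record names as its macrolanguage."""
    return {r["Macrolanguage"].lower() for r in records if "Macrolanguage" in r}


def encompassed_by(subtag, records):
    """Subtags whose Macrolanguage field points at `subtag` (case-insensitive)."""
    wanted = subtag.lower()
    return sorted(
        r["Subtag"] for r in records
        if r.get("Macrolanguage", "").lower() == wanted
    )


def is_macrolanguage(subtag, records):
    """Derived answer to "this language IS a macrolanguage"."""
    return subtag.lower() in macrolanguages(records)
```

Because the answer is computed from the encompassed records, there is no second value on the 'qma' record for anyone to forget; a lookup tool could still display the derived status to a human searching for "Arabic" or "Chinese".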
> If it is really felt necessary to indicate that 'ar' is a macrolanguage
> in both ways, with a special value on the macrolanguage record as well
> as the encompassed languages (ignoring problem 3), then we should have
> two fields, something like Is-A-Macrolanguage and My-Macrolanguage-Is.
> (Suggestions for better names are solicited.)  As with the Comments
> field, I don't support overloading a single field for fundamentally
> different purposes.
>
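To illustrate the two-field suggestion, here is a rough Python sketch in which each concept gets its own typed field, so no sentinel such as 'True' can be mistaken for a subtag. The attribute names mirror the proposed Is-A-Macrolanguage and My-Macrolanguage-Is fields; the class, the helper function and the two-entry registry are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LanguageRecord:
    subtag: str
    # "This language HAS a macrolanguage": the subtag of that macrolanguage,
    # or None when there is none. Never a sentinel string.
    my_macrolanguage_is: Optional[str] = None
    # "This language IS a macrolanguage": a plain boolean.
    is_a_macrolanguage: bool = False


def parent_of(record: LanguageRecord,
              registry: Dict[str, LanguageRecord]) -> Optional[LanguageRecord]:
    """Look up the encompassing macrolanguage record, if any."""
    if record.my_macrolanguage_is is None:
        return None
    # Subtag casing does not matter, so normalise before the lookup.
    return registry.get(record.my_macrolanguage_is.lower())


# Tiny illustrative registry: 'ar' (Arabic) and one encompassed language.
registry = {
    "ar": LanguageRecord("ar", is_a_macrolanguage=True),
    "arb": LanguageRecord("arb", my_macrolanguage_is="ar"),
}
```

With this shape, `parent_of(registry["ar"], registry)` returns None instead of sending software off to search for a language subtag 'true'.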
> --
> Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
> http://www.ewellic.org
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www.ietf.org/mailman/listinfo/ltru

Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.