In terms of process and data integrity, I'm with Doug on this: don't overload one field with two data categories. If you want pointers in both directions, have separate fields, or just treat your "True" value as an informative annotation by adding "This is a macrolanguage entry" (or something like that) as a comment.

Peter

> -----Original Message-----
> From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of
> Doug Ewell
> Sent: Saturday, May 10, 2008 11:48 AM
> To: LTRU Working Group
> Subject: Re: [Ltru] I'm really confused by chinese in 3066bis
>
> Debbie Garside <debbie at ictmarketing dot co dot uk> wrote:
>
> > Yes, I could see that you could get a reverse match from the languages
> > containing the Macrolanguage field, but I still feel that the actual
> > macrolanguage should be labelled, if only to stop people from
> > assuming that they should tag all Arabic as 'ar'. If I was tagging a
> > document and looking for the right code, I might well do a search for
> > "Arabic" or "Chinese". The result would not identify the subtag as a
> > macrolanguage, and thus I may not look any further.
> >
> > For the sake of 50 or so fields, I think it is worth putting a
> > Macrolanguage field in the Registry - for humans to use :-)
>
> I see three particular problems with using the Macrolanguage field to
> mean two opposite concepts, "this language HAS a macrolanguage" and
> "this language IS a macrolanguage":
>
> 1. The software problem. In beginning programming you learn to use
> values like -1 to mean "not a valid value" or "end of list" or similar.
> This works OK when the real values are non-negative, such as the
> population of a town, but not so well when the values could be negative,
> such as its elevation. Also, other parts of the software have to know
> to treat -1 as a special case, not like an ordinary value. Sometimes
> this gets confusing and you see -1 pop up in places it shouldn't.
> In intermediate programming you learn to stop doing this, and represent
> special situations in other ways.
>
> Similarly, it is possible to imagine software looking fruitlessly for a
> language subtag 'True' that is the macrolanguage of 'ar' instead of
> remembering that 'True' is a special case. Remember that 4-letter
> language subtags, though "reserved for future use" (for some standard, I
> forget which ;-), are valid in the ABNF, and that the casing of subtags
> doesn't matter, though we're supposed to get it right in the Registry.
>
> 2. The human problem. I can easily imagine readers of the Registry
> becoming confused over this dual usage of a single field. They might
> wonder why other subtags don't have "Macrolanguage: False", or why 'ar'
> is considered the opposite of 'True'. They might also experience
> confusion similar to problem 1, as 'tru' is a valid RFC 4646bis language
> subtag for Turoyo. (At least the proposal wasn't to use "Macrolanguage:
> yes", thus causing instant confusion with Yeskwa.)
>
> 3. The maintenance problem. Technically the 'True' value is redundant
> information; it can be derived from the records for the encompassed
> languages. Any time you have to maintain redundant information,
> especially in a different place, there is a much greater chance of
> making a human mistake.
>
> Suppose your friendly team of Designated Experts, presented with a new
> batch of several dozen ISO 639-3 changes including a new classification
> of macrolanguage 'qma' with encompassed languages 'qea' and 'qeb',
> remembers to put "Macrolanguage: qma" on the records for 'qea' and 'qeb'
> but forgets to put "Macrolanguage: True" on the record for 'qma'.
> Suppose further that the ietf-languages list didn't catch this during
> the 1-week review. We would end up with an internal inconsistency
> within the Registry. Gosh, your friendly Experts would hate that.
> They would also hate the inevitable e-mail flames about "process failure"
> and the possibility of removal or replacement at the IESG's discretion.
>
> If it is really felt necessary to indicate that 'ar' is a macrolanguage
> in both ways, with a special value on the macrolanguage record as well
> as on the records for the encompassed languages (ignoring problem 3),
> then we should have two fields, something like Is-A-Macrolanguage and
> My-Macrolanguage-Is. (Suggestions for better names are solicited.) As
> with the Comments field, I don't support overloading a single field for
> fundamentally different purposes.
>
> --
> Doug Ewell * Arvada, Colorado, USA * RFC 4645 * UTN #14
> http://www.ewellic.org
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www.ietf.org/mailman/listinfo/ltru
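To make the software and maintenance problems concrete, here is a minimal Python sketch. It assumes registry records are modeled as plain dicts keyed by field name (Subtag, Description, Macrolanguage); it is not a real registry parser. The function names, the sample records, and the use of Is-A-Macrolanguage as an actual field (it is only Doug's placeholder name) are all illustrative assumptions. The sketch derives macrolanguage status from the encompassed records instead of storing it twice, shows an overloaded "Macrolanguage: True" surfacing as a pointer to a nonexistent subtag 'true', and checks a hand-maintained flag against the derived status, as in the 'qma' scenario.

```python
from typing import Dict, List, Set

# Hypothetical record model: one dict per registry record, keyed by field name.
Record = Dict[str, str]

# Illustrative sample data: 'arb' and 'arz' name 'ar' in their Macrolanguage
# field; the 'ar' record itself carries no special marker.
RECORDS: List[Record] = [
    {"Subtag": "ar", "Description": "Arabic"},
    {"Subtag": "arb", "Description": "Standard Arabic", "Macrolanguage": "ar"},
    {"Subtag": "arz", "Description": "Egyptian Arabic", "Macrolanguage": "ar"},
    {"Subtag": "en", "Description": "English"},
]


def macrolanguages(records: List[Record]) -> Set[str]:
    """Derive the macrolanguages: every subtag that some record names in its
    Macrolanguage field. Subtag case is not significant, so values are
    lowercased."""
    return {r["Macrolanguage"].lower() for r in records if "Macrolanguage" in r}


def encompassed(macro: str, records: List[Record]) -> List[str]:
    """Return the subtags whose Macrolanguage field points at `macro`."""
    return sorted(
        r["Subtag"]
        for r in records
        if r.get("Macrolanguage", "").lower() == macro.lower()
    )


def dangling_references(records: List[Record]) -> Set[str]:
    """Macrolanguage values that match no record's subtag. Under the
    overloaded scheme, code reading the field as a pointer finds 'true'."""
    known = {r["Subtag"].lower() for r in records}
    return macrolanguages(records) - known


def flag_mismatches(records: List[Record],
                    flag: str = "Is-A-Macrolanguage") -> Set[str]:
    """If a separate (hypothetical) flag field is maintained by hand, report
    subtags where the flag's presence disagrees with the derived status."""
    derived = macrolanguages(records)
    flagged = {r["Subtag"].lower() for r in records if flag in r}
    return derived ^ flagged  # symmetric difference: missing or spurious flags


if __name__ == "__main__":
    print(sorted(macrolanguages(RECORDS)))          # ['ar']
    print(encompassed("ar", RECORDS))               # ['arb', 'arz']

    # Overloaded scheme: the 'ar' record also gets "Macrolanguage: True".
    overloaded = [dict(r) for r in RECORDS]
    overloaded[0]["Macrolanguage"] = "True"
    print(sorted(dangling_references(overloaded)))  # ['true']

    # Separate-field scheme where the flag on 'qma' was forgotten.
    qma_batch = [
        {"Subtag": "qma"},
        {"Subtag": "qea", "Macrolanguage": "qma"},
        {"Subtag": "qeb", "Macrolanguage": "qma"},
    ]
    print(sorted(flag_mismatches(qma_batch)))       # ['qma']
```

The point of deriving rather than storing is that the macrolanguage record can never disagree with its encompassed languages; a flag field, if kept, becomes something to validate rather than a second source of truth.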
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.