> From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of
> Martin Hosken
>
> <section title="Canonicalization of Language Tags" anchor="canonical">
>
> I would propose adding (to draft-15):

Presumably this would be in 16.

> 6. Redundant script tags which are the same as the suppressed script
> for a language are not present.

IIUC, you're saying that a canonical tag won't have a script subtag if
that subtag is given in the Suppress-Script field in the subtag record
for the language subtag used in that tag. (E.g., suppose language subtag
'aa' has Suppress-Script: 'Bbbb'; then tags of the form
"aa-(xxx-)Bbbb..." are not canonical.) Is that right?

Two problems: first, you don't suggest how to produce a canonical
equivalent for any given tag. Secondly, while "en-Latn" is not generally
recommended, there may be application scenarios in which it is
appropriate, perhaps used in contrast to something like "en-Hang", in
which case the canonical form should include the normally-suppressed
script subtag.

> In addition, I would like to ask about fonipa and fonupa. Currently in
> 4645 they have no prefix, which makes sense since they can be used with
> any language. But they can only be used with Latn script. In fact, they
> imply Latn script. So what should the canonical form of Eastern Bru in
> IPA be?
>
> bru-Latn-fonipa
>
> or can we in some way indicate the implication of Latn by fonipa and
> just use:
>
> bru-fonipa

By that line of reasoning, the canonical form for "fr-1694acad" could be
simply "1694acad" -- except that that's not even well formed.

You could remove script or region subtags in deriving a canonical tag if
they were specified as the prefix for a variant, but they would have to
be the *only* prefix for that variant (else the canonical form wouldn't
be deterministic), and you'd end up with a canonical tag that is not
valid, and not likely the same as what is used anywhere else.
Comparisons would not be any easier, and we couldn't suggest that these
canonical forms actually be what gets generated and used in interchange.
That is, if these got a Prefix field and you were eliminating the prefix.

Alternatively, I suppose you could allow Suppress-Script or
Suppress-Region fields for variant subtags and use that in deriving
canonical tags, but that just means adding more machinery. Either way, I
think this would just get complicated for no particular gain.

My inclination would be to say that 'fonipa' should require a prefix of
'Latn' (probably specified as an extended language range "*-Latn"), and
always require that the script subtag be included, even if that seems
redundant. It's clear that, in many cases such as "fr-1694acad" or
"sl-nedis", we already end up requiring some degree of redundancy in the
tag; I don't see a problem doing that in these cases.

> I would suggest that if we want to do this, then specifying a prefix in
> the record for fonipa, of Latn would be sufficient although we may want
> some wording somewhere along the lines of:
>
> Where the prefix tag for a variant consists of a script, that variant
> implies the presence of that prefix and the prefix is required not to
> exist in the canonical form of the tag.

But there's the rub: we'd be saying in one place that the prefix SHOULD
be included in tags with the given variant, but in another place that it
not be used in the canonical form, and that the canonical form SHOULD be
used. Not very self-consistent.


Peter
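
For illustration only, the proposed rule 6 as read in the message above
might be sketched as follows. This is a minimal Python sketch under
stated assumptions, not text from any draft: the function name
strip_suppressed_script and the SUPPRESS_SCRIPT table are hypothetical,
the table holds only the made-up 'aa'/'Bbbb' example plus an assumed
'en'/'Latn' entry, and subtag recognition is simplified (three-letter
subtags after the language are treated as extended language subtags,
and a four-letter subtag after those is treated as the script).

```python
# Hypothetical registry excerpt: language subtag -> Suppress-Script value.
# 'aa' -> 'Bbbb' is the made-up example from the message; 'en' -> 'Latn'
# is assumed for the sake of the sketch.
SUPPRESS_SCRIPT = {"aa": "Bbbb", "en": "Latn"}


def strip_suppressed_script(tag, suppress=SUPPRESS_SCRIPT):
    """Drop the script subtag if it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    language = subtags[0].lower()

    # Skip extended language subtags (three letters) after the language.
    i = 1
    while i < len(subtags) and len(subtags[i]) == 3 and subtags[i].isalpha():
        i += 1

    # Simplification: a four-letter subtag in this position is the script.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        suppressed = suppress.get(language)
        if suppressed is not None and subtags[i].lower() == suppressed.lower():
            del subtags[i]

    return "-".join(subtags)


for example in ("en-Latn-US", "en-Hang", "aa-xxx-Bbbb", "bru-Latn-fonipa"):
    print(example, "->", strip_suppressed_script(example))
```

Note that a function like this always strips the matching script subtag.
It has no way of knowing that an application is deliberately using
"en-Latn" in contrast to "en-Hang", which is the second problem raised
above. It also leaves "bru-Latn-fonipa" alone, since nothing in the
language record for 'bru' is assumed to suppress 'Latn'.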