
Re: [Ltru] Private Use Tags



Dear Dylan,
This request makes a lot of sense. It raises many issues; here are a few.

1. In a way, you characterise the use of a document. I am not sure this fits directly with the characterisation of a language, but it does characterise a relation channel. By that I mean the way the document is intended to be received, sent or exchanged, and from there classified. Today the proposed Draft leaves this undefined in at least two ways:

- first paragraph. It says "Human beings on our planet have, past and present, used a number of languages. There are many reasons why one would want to identify the language used when presenting or requesting information.". One could say that "used" may relate to exchanged, "presenting" to sent and "requesting" to received, with some variation because, for example, requesting does not mean that something was received.

- part 2. "The language tag always defines a language as used (which includes being spoken, written, signed, or otherwise signaled) by human beings for communication of information to other human beings. Computer languages such as programming languages are explicitly excluded."

The problem here is that a language is not defined (what is it? how is it identified? etc.), yet the langtag is normative of that something. The usage of the proposed langtags can only be subjective (the perception of the users), and the languages discussed remain rather undefined concepts. Your proposition creates language _values_ (the instantiation of the Kentucky press). I am not sure you can really qualify it within the Draft framework. This is because it is filtered by a medium (the Kentucky press) and not by a community of speakers (unless you mean the readers, or the authors, of the Kentucky press). Please recall that they do not want to accept man/computer and computer/computer languages. This obviously creates a classification problem with Star Wars: what is R2-D2 speaking? And what about Yoda, who is not a human being? This year's Japanese robotics fair showed droids interacting in Japanese or in English. There is also a vacuum for computer-generated texts, alarms, etc. Would you introduce "r" for robots? But would that not oppose the spirit of the Draft?

2. You want to permit organisations and persons to define their personal namespace (cf. John Klensin's recent Draft on IANA) and define their own format. This was recently proposed and denied. The first problem you meet with this is the size of the namespace you need and its structure. You consider that Microsoft would register "mcrsoft"; why that? Mr. Sungil Yoon, who owns McrSoft.com, has the right to use it. You will say that "Microsoft" is longer than 8alpha. Right, but RFC 1766 set that limit, and by consensus of this WG it is to be respected. Obviously you can object to this consensus; I will too, others probably will too, and it will no longer be a consensus. But short of that, only users owning universal rights (every class, every country) can claim a tag (like "mercedes" or "cocacola"); otherwise we have conflicts.

I note that RFC 2860 can also create a problem for the Draft. Your naming part is by essence an ICANN part of the IANA: the Registrar and Examiner must be designated by the ICANN BoD and appeals probably subject to GAC. This should be reflected in the Draft. Is this really what you want?

Another problem you may not have investigated is that Microsoft could have different branches and needs, for example "microsoft.corp", "microsoft.us", etc. the languages spoken in its different branches being certainly different. This is why we have introduced three warnings you can find on http://rfc3066.org:

- there is no subtag size limit in the x-tags part
- the "." and the ":" are accepted characters, "." introducing a comment or an additional part and ":" permitting to use URNs? For us "x-en.microsoft.us-Latn-de" qualifies the language of a Mr. Gates visiting Germany.

But please note that although this proposition, initially presented by F. Charles, cannot conflict with any previous format since it lives in a private area, it is not supported by this WG, which is deemed to have opposed it by consensus (with one or two objections).

3. you consider the notions of referent (ex.: commonly accepted reading level -L-6) and context (to know about the Kentucky cities and life). These are two levels which are very important to the support of a relation (together with their dates - as per ISO 11179 - to know which version is to be used). Other referents can be Dictionaries, Grammars, publishers, etc. Other contexts can be style, mimics, accents, etc. These notions are most probably too complex for the Draft and can be multiplied and need priorities in case of conflicts when two referential systems have different descriptions. Please accept that the Draft only supports one single mode (script) and has no provision (yet) to support other modes.

All this can/should certainly be supported. But this would call for a general framework introduction of language support (BCP 47) within the Internet architecture, as a continuation/extension of the RFC 3066. In this case the Draft would be an application of this framework. The sentence "This document replaces RFC 3066" should then be replaced by "this document complements RFC 3066": this is a part of the debate over the Charter, this WG consensus does seem to want to engage.

4. You say you do not consider that using domain names would be adequate, but you do not document it. This is one of the solutions to support individual/avatar and context grids. I would therefore be interested in seeing you document your position. This is a point which is hotly debated in some ISO committees, and belongs to what is qualified as the "pulverisation" of a user-centric Internet (i.e. its ultimate granularity). Work currently carried out on coreboxes and OPES (WG-OPES) goes down to this degree and even below (the individual relation level and context: the way you speak when you are with someone else specific, under some identified circumstances. ex: the language you use with a cop who stopped you on the road).

Thank you for this interesting thinking.
jfc



At 00:04 06/07/2005, Dylan N. Pierce wrote:
(This is a re-send of an e-mail I originally sent to the authors of a previous draft; I have since been educated as to the proper way to comment.)

Dear Mr. Phillips and Mr. Davis,

First, please forgive me if I'm not following proper procedure in commenting on this draft; while I do have a strong programmer's interest in this standard, I admit that I'm not typically a participant in these procedures and haven't thoroughly educated myself on the policies for submitting comments.

I would like to recommend an addition to this draft, for which I think I can make a rather compelling case based on hypothetical but quite reasonable scenarios. Personally, I hope very much that your draft becomes a standard, as the problems with a canonical parsing of current RFC 3066 language tags are well-known and bothersome to developers everywhere. Your draft strikes me as an excellent way to finally standardize the practice in a way which will be accessible to all developers without having to investigate thirty different standards and documents from ten different organizations.

Regarding Section 3.4 on extensions and extension namespace: You already have here a mechanism in place for extending this specification. I would like to suggest an extension which should probably be incorporated into the main specification. I believe you should define an "organization convention" extension for use by private companies and organizations for their own purposes.

I realize that a "private use" extension is already defined in section 2.2.7. However, I maintain that the private use extension is not sufficient for potential development and interdevelopment among important organizations, as there is no way a parsing agent could assume anything significant about the tags which follow. And yet, the registration of 3.4 extensions is also insufficient because, frankly, you'll rapidly run out of letters if you make a sincere effort to define namespace for private companies and organizations.

Let's take a concrete example. Let's say that the American Library Association (ALA) decides to define an extension to help them classify books by reading level. As your specification stands, they have two choices: they can register a 3.4 extension (we'll say they register "L") and then use their subtags as follows:

en-US-L-g6: A book written in English as spoken in the United States at the sixth-grade reading level.

The ALA would have excellent reasons for wanting such a tag, as it would greatly facilitate the database querying and transfer of material to public schools.
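To make this first option concrete, here is a minimal Python sketch of how a parsing agent might separate the base subtags from singleton-introduced extensions. It is a simplification under stated assumptions: the function name split_extensions is hypothetical, every one-character subtag is treated as a singleton, and everything after "x" is treated as private use. It does not attempt the full grammar of the draft.

```python
def split_extensions(tag):
    """Split a tag into (base subtags, {singleton: [extension subtags]}).

    Assumption: any one-character subtag opens an extension, except after
    the private-use singleton "x", which swallows the rest of the tag.
    """
    base, extensions = [], {}
    current = None
    for subtag in tag.split("-"):
        if len(subtag) == 1 and current != "x":
            current = subtag.lower()
            extensions[current] = []
        elif current is None:
            base.append(subtag)
        else:
            extensions[current].append(subtag)
    return base, extensions


# "L" is the hypothetical singleton the ALA registers in this scenario.
print(split_extensions("en-US-L-g6"))
```

An agent with no interest in the "L" extension would look only at the first element of the result.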

However, we see the first problem: the ALA has their tag, which many schools would use. Then, Associated Press would want their tag to indicate regional assumptions. We'll give them "P" (for "press"):

en-US-P-ky: An article written in English as spoken in the United States which assumes readers are already familiar with names, cities, politics, etc., in Kentucky. (They would use this to distribute versions to Kentucky press where they don't have to explain that Frankfort is the capital, distinguishing them from national or international versions which would make no such assumption and explicitly specify that Frankfort is the capital.)

If we keep up like this, as I mentioned, we'll rapidly run out of singleton letters. Everyone will want one, some for valid reasons, others for silly reasons, and then your registration authority would be in the unenviable position of having to make value judgments regarding what is valid and what is silly, given such limited real estate.

Furthermore, you'll be putting the organizations themselves in a difficult position. For example, if the ALA decides to modify their convention, this is something that is only of interest to them and the people who use their specification. However, in order to make their own internal changes, they will technically have to go through the entire process of revising a stable specification through the registration authority (according to 3.4, which requires stability and canonical representation), something which is never recommendable.

And finally, parsing agents which have no interest in the ALA's tag (which will be most of them) will nonetheless have the burden of checking conformance.

If we take the other approach, and say, "We have the 'x' tag for private use. The ALA and AP can take that tag and follow it up however they want," then we're creating another problem. All of the parsing agents which do have an interest in those tags cannot be guaranteed that they mean what they think they mean.

For example, if the ALA decides to go with:

en-US-x-ala-g6

But subsequently the Associated Press decides that their private tag "x-ala" means articles of interest to Alabamans, then what is the ALA to do when they want to classify articles written by the AP? The problem is that parsing user agents will be unable to assume anything about the tag that follows, and once a conflict occurs, both tags become either useless, or subject to the type of interpretation that a human might perform easily but a machine cannot.
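The collision can be sketched in a few lines of Python. Both lookup tables below are hypothetical stand-ins for the two organizations' private conventions; nothing here comes from a real registry, and the function names are illustrative only.

```python
# Hypothetical private conventions; neither table comes from a real registry.
ALA_CONVENTION = {"ala": "American Library Association reading level"}
AP_CONVENTION = {"ala": "articles of interest to Alabamans"}


def private_subtags(tag):
    """Return the subtags that follow the private-use singleton 'x'."""
    parts = tag.lower().split("-")
    return parts[parts.index("x") + 1:] if "x" in parts else []


def readings(tag):
    """List every convention that claims the first private subtag."""
    subtags = private_subtags(tag)
    if not subtags:
        return []
    conventions = {"ALA": ALA_CONVENTION, "AP": AP_CONVENTION}
    return [(owner, table[subtags[0]])
            for owner, table in conventions.items()
            if subtags[0] in table]


# Two owners claim "x-ala", so a machine has no basis for choosing one.
print(readings("en-US-x-ala-g6"))
```

With two entries in the result, the agent would have to guess, which is exactly the situation described above.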

The solution is simply to define an organizational namespace. We take a random tag--we'll say "P" for private--and then allow companies and organizations to register their own namespace. Everything that follows their namespace tag is then interpreted according to their standard, whatever that may be. For example, the ALA would register "ala," the AP would register "ap," Microsoft would register "mcrsoft," Adobe would register "adobe" and so on.

Then, anyone seeing a tag like this:

en-US-P-ala-g6

could know unambiguously that whatever follows the P-ala is to be interpreted by the ALA's own convention, whatever that might be. Each registering organization could then be responsible for the stability and canonical representations of their own namespace without affecting the stability of the specification as a whole.

Parsing agents which are not interested in the AP's tags simply know to ignore anything after the "P" tag that isn't an organization in which they have an interest. Parsing agents that are interested can now know with assurance that the information is what they're looking for. Companies and organizations can establish their own standards which can easily evolve to suit their needs. Private companies can establish compatibility standards between themselves which won't affect the specification as a whole.
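A rough Python sketch of that behaviour follows. The "P" singleton and the organization names are the hypothetical ones used in this message, and the function organization_subtags is an illustration under those assumptions, not part of any published specification.

```python
def organization_subtags(tag, interested_in, singleton="p"):
    """Return (organization, subtags) if the tag carries a namespace this
    agent cares about, else None.

    Assumption: the organizational singleton is "P" and the subtag right
    after it names the registered organization.
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if singleton not in lowered:
        return None
    start = lowered.index(singleton) + 1
    if start >= len(parts):
        return None
    organization = lowered[start]
    if organization not in interested_in:
        return None  # someone else's namespace: ignore it
    rest = []
    for subtag in parts[start + 1:]:
        if len(subtag) == 1:  # the next singleton starts another extension
            break
        rest.append(subtag)
    return organization, rest


# An agent that only understands the ALA convention:
print(organization_subtags("en-US-P-ala-g6", {"ala"}))
print(organization_subtags("en-US-P-ap-ky", {"ala"}))
```

The agent never needs to validate the AP's subtags; it only checks whether the namespace name is one it knows.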

This could be infinitely extensible merely by setting aside one of the organizational tags to mean "check the next set." For example, if the American Library Association registers "ala" as above, and then later the Association of Libertarians and Anarchists shows up, finds that all the mnemonic representations of their name are already used and there's not much space left on the registry (and with 36^8 alphanumeric possibilities, that's not likely, but let's pretend), they could define their namespace as "set2-ala" (assuming we've already decided that "set2" is the tag which means "check the next set").
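As a small illustration, the Python sketch below counts the eight-character alphanumeric names (assuming the figure above refers to 36 possible characters in each of 8 positions) and resolves the "check the next set" escape. The escape subtag "set2" is the hypothetical value from this message, the function resolve_namespace is an assumption of mine, and allowing the escape to repeat is one possible reading of "infinitely extensible".

```python
ESCAPE = "set2"  # hypothetical subtag meaning "check the next set"

# Number of distinct 8-character names over a-z and 0-9.
print(36 ** 8)


def resolve_namespace(subtags):
    """Return (set number, organization, remaining subtags).

    Each leading escape subtag moves the lookup to the next set, so the
    same mnemonic can be registered once per set.
    """
    set_number = 1
    index = 0
    while index < len(subtags) and subtags[index].lower() == ESCAPE:
        set_number += 1
        index += 1
    if index >= len(subtags):
        raise ValueError("no organization subtag after the escape")
    return set_number, subtags[index].lower(), subtags[index + 1:]


print(resolve_namespace(["ala", "g6"]))
print(resolve_namespace(["set2", "ala", "g6"]))
```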

This allows all companies and organizations which have a need to define their own namespaces and then use them as the needs of their particular domain indicate in a way that is nonetheless unambiguously established for parsing agents which can then make error-free decisions about whether or not the information which follows is useful to their needs, all done without sacrificing the stability of the main specification.

This is the extent of my speculation on the issue. I did consider the possibility of using Java-package-name-like identifiers tied to domain registration, so that Microsoft could have the "com-microsoft" tag and the ALA could have the "org-ala" tag, but this would end up violating the eight-character rule and allow just any yahoo with a website to include whatever he sees fit (en-US-com-sexychicks-38D comes to mind), which I don't think is a desirable solution at all.
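The eight-character objection is easy to check mechanically. The following Python sketch flags subtags longer than eight characters; the limit is the one mentioned above, while the function name oversized_subtags is purely illustrative.

```python
MAX_SUBTAG_LENGTH = 8  # the eight-character rule mentioned above


def oversized_subtags(tag):
    """Return the subtags that exceed the eight-character limit."""
    return [s for s in tag.split("-") if len(s) > MAX_SUBTAG_LENGTH]


# "microsoft" has nine characters, so a domain-style tag breaks the rule,
# while the shorter "org-ala" form happens to fit.
print(oversized_subtags("en-US-com-microsoft"))
print(oversized_subtags("en-US-org-ala"))
```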

If you have found this comment at all useful, I would appreciate hearing back.

Sincerely,
Dylan N. Pierce
IT Coordinator, TykeTek

TykeTek/Diapositivas Gloria
Salvador Quevedo y Zubieta #821 Int. 6
Col. la Perla
C.P. 44360 Guadalajara, Jal.
MEXICO

E-Mail: dylanpierce at megared.net.mx
Telephone: +52 (33) 3617.3660
Cellular: +52 (33) 1149.7057

_______________________________________________
Ltru mailing list
Ltru at lists.ietf.org
https://www1.ietf.org/mailman/listinfo/ltru







Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.