
[Ltru] draft review



Gentlemen,
This is my review of the current draft: its text and the way it addresses the Charter. I accept that the Charter poses a challenge, which is to reconcile two of its assigned goals. The draft's authors chose to address one of them (identifying the role of the subtags in the langtag), while my approach is to start from the other (stability) and to address both in a more general way.

As a result, my review contains only a very limited number of positive remarks. That does not mean that I fully oppose the draft. It confirms the only two possibilities already offered on the ietf at ietf.org mailing list:

- either accept the draft within the strict area quoted by the Charter (XML, HTML and CLDR), to describe the language human readers should know in order to read a tagged text;

- or write another one, from a totally different perspective, to support the architectural consistency we need for the multilingual Internet.

In any case, the first solution would only be a temporary patch until the second one can support the world priority of a multilingual Internet.

I only hope this will permit Addison Phillips and Mark Davis to improve their text so that it can become an RFC.




DRAFT RELATED COMMENTS AND QUESTIONS

This review only concerns the points discussed in the proposed draft, not the points that are missing.

1. Abstract: "indicate the language": does that mean qualify (tell about), define (show the way), or both?
2. Abstract: "information object": does that include programs, services, user communities?
3. Abstract: "interchange": is there a special reason to use the word "interchange" (quoted 13 times in the IETF RFC database) rather than "exchanges" (38 times)? What differences are intended?

4. Introduction: the word identify/identifier is used 6 times, while indicate/indicating appears only twice, to avoid repeating "identify". Why not use the word "identify" in the Abstract (cf. question 1)?
5. Introduction: the introduction describes the use of the tag to name a menu of references (dictionaries), and documents that it is often necessary to document style-related elements (dialect, orthography) and the writing system. These are the 5 elements I want to see documented.
6. Introduction: it documents that knowing the language is useful (qualifies) or required (defines) for some processes. What is the intended meaning?
7. Introduction: it indicates that labels are one of the means of indicating (meaning?) the languages used. No other means is alluded to, and no comment about their mutual consistency is provided.
8. Introduction: it identifies only two functions for the document: an "identifier mechanism" and a "registration function". It does not discuss the dissemination of that information, nor the way applications should use it.
9. Introduction: the sentence should say "This document is intended to replace RFC 3066 and as such to become the new BCP 47", if this is the intent.

10. The Language Tag: the introduction talks about labels. What is the difference between the labels that have been documented as necessary and the language tags now documented?

11. Syntax: the sentence "this makes it possible to construct a parser ... even if specific subtag values are not recognised" is quite obscure. What is the exact meaning of "recognised": understood, known, accepted, authoritative, canonical, identified?
12. Syntax: "a parser need not have an up-to-date copy .. to perform .. most .. searching and matching": what tells the parser that the values it uses are up to date? How can we quantify the "most", and the number of occurrences of the remaining cases, for one billion users and more?

13. 2.1.1: how can complex subtag sequences, which add more precision, "seldom add useful distinguishing information" when that is obviously what they are intended to do? Where is the authority documented for the following "because", which says that more granular tags interfere with the meaning, etc., intended by the user? I do not dispute that users can be clumsy, but I think the idea that they are seldom smart should be explained. I feel the problem lies more with filtering/analysis limitations.
14. 2.1.1: that subtags SHOULD be limited to four is not documented. The reference to 2.3 for more information (which provides guidance about the best choice of subtag content) does not seem to document this at all.
15. 2.1.1: is an application that does not support tags beyond some unspecified length (1, 10, 100 chars?) accepted as a "conformant implementation"? The consequences for usage are not documented.

16. 2.2: it is noted that the language used for the "language tags namespace" and its registry is quite similar to that of the domain name system. However, the proposed semantics are not consistent with other Internet spaces such as DNS, IPv4, OID, etc., where dot separation is used, something users and parsers are accustomed to and which existing processes have identified in different scripts. On the contrary, it relies on "-" as a separator, which is more confusing and may have less well identified homographs.
17. 2.2: the proposed design of language tags mixes identification of subtags by their position with identification by their length. This can probably work in the particular case of the legacy and initial situation. But such a format, combining two possibly contradictory systems, will never scale, is far too dependent on external changes, and is unable to support innovation. It cannot be made a world-wide standard through an IETF BCP, except in the cases defined by the Charter, unless the IESG wants to run the risk of endless conflicts and quick obsolescence. It should be noted that the ISO standards referred to are less than 30 years old, and yet ISO 3166-2 cannot be supported.
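
To make point 17 concrete, here is a minimal Python sketch of how a parser might assign a role to each subtag from its position and length alone. The length rules are a simplified reading of the subtag sizes mentioned in this review (2 or 3 letters for the language, 3 letters for an extended language, 4 letters for a script, 2 letters or 3 digits for a region). They are assumptions for illustration, not the draft's normative grammar, and `classify_subtags` is a hypothetical name.

```python
def classify_subtags(tag):
    """Guess the role of each subtag from position and length only.

    Simplified, assumed rules (not the draft's normative grammar):
      first subtag, 2-3 letters -> language
      3 letters after language  -> extlang
      4 letters                 -> script
      2 letters or 3 digits     -> region
      anything else             -> variant
    """
    parts = tag.lower().split("-")
    roles = []
    # stage: 0 = expect language, 1 = extlang allowed, 2 = script allowed,
    # 3 = region allowed, 4 = only variants remain
    stage = 0
    for part in parts:
        n = len(part)
        if stage == 0:
            role = "language" if part.isalpha() and 2 <= n <= 3 else "unknown"
            stage = 1
        elif stage <= 1 and n == 3 and part.isalpha():
            role = "extlang"
        elif stage <= 2 and n == 4 and part.isalpha():
            role, stage = "script", 3
        elif stage <= 3 and ((n == 2 and part.isalpha()) or (n == 3 and part.isdigit())):
            role, stage = "region", 4
        else:
            role, stage = "variant", 4
        roles.append((part, role))
    return roles
```

Under these assumed rules, any future four-letter code element placed after the language would be read as a script, which is the kind of rigidity points 17 and 21 are concerned with.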

18. 2.2.1: primary language: identification by fixed length starts with 2 or 3 characters, and possibly up to 8 (but discouraged), with the language ID coming this way only from ISO 639. This removes the possibility of considering computer-related (not only programming) languages, dialects, etc., and of adapting to evolutions, adjustments, and past languages. For example, ISO 3166 could go back to 1800, possibly to 1000, and even much earlier, either as an ISO document or as a consistent table. Support for historic languages will be required. Blocking the historical consistency of documents is unthinkable: just consider the current efforts by Google and various libraries.
19. 2.2.1: the 2-letter code for languages is an oddity inherited from the earlier times of RFC 1766 and ISO 639. I have nothing against this being the default in some legacy or private applications. It is likely that at some point it will be timed out by ISO and/or by usage, maybe even by anti-racist laws. The time is now to update existing applications rather than to increase complexity for the years and centuries to come. This makes me think of "UK" instead of "GB". I know why Mr. Peter Jones made the world use ".uk"; I will also be able to tell my great-grandson why they are to riot against the discrimination between 2- and 3-letter cultures. This also seems to contradict the spirit of the quoted ISO 639/RA-JAC statement, which says "users are directed in Internet applications to employ the alpha-3 code", which sounds like the part of the statement that will remain universal .. for a short while (?).

20. 2.2.2: extended language subtags are permitted only if they are 3 characters long (a permanent, rigid position), based on the anticipation of undocumented ISO 639 work. This is also a violation of the Internet standards process: the document referred to should be cited and cannot be a draft. Extended language subtags are the most active part of languages, yet the rigidity imposed by the chosen format makes it necessary to prevent their registration with IANA (which is the very purpose of the document: to permit the flexibility needed to support real network life, where ISO would be too slow).

21. 2.2.3: script subtags follow the same rigid logic and format constraints. What happens if the memory waste of ISO 15924 (3 bytes lost) is corrected, or if another code element gets a fixed length of 4 characters in the future?

22. 2.2.4: I understand that all the regional language differences of the world are to be supported by the ISO 3166 alpha-3/digit-3 list. Does this mean that regions like NY, TX or California are not entitled to a code, but the 56 people of Pitcairn Island are? I doubt that this disparity can hold for very long, all the more so since ISO 3166-2 offers every possibility for a far more adequate granularity.

23. 2.2.9: there is a MUST in "there MUST be an attempt to register" which cannot be enforced unless there is a non-delaying procedure to verify that registration of a language was attempted with ISO 639. Otherwise this part is to be understood as a disguised way, concerted with ISO, to block names. The concern on this point is high enough to see the Draft blocked. The second paragraph seems to be a smoky, verbose replay of the same idea, without any procedural description or any request for, or provision of, formal proof. The general idea is precisely in opposition to the purpose of the proposed RFC: to be able to register names not registered by ISO. This amounts to a legitimisation of censorship, and of censorship against the very intent of this document.
24. 2.2.9: registrations are left to a decision on appropriateness by someone debating with undefined others, on a matter of no importance to network stability and security (documented in part 4) or to end-to-end interoperability. This seems to amount to pure intellectual censorship.

25. 2.3: recommendation 3 seems inappropriate. Aliases are aliases. All aliases must be equally supported (a) because they are aliases and (b) to make sure developers write correct code.

26. 2.4: "language tags always define a language as spoken by human being for communications of information to other human beings. Computer languages ... are explicitly excluded" has no basis in the Charter or in reality. This excludes Web Services relations, which may speak limited languages. Coded human languages should be supported: they fit the definition.

27. 2.4.1: in the canonicalization part, "" is recalled as a deprecation indicator, yet this is not documented earlier. It seems to be an external ISO practice. It should be documented in the format description part, all the more so since this practice is counter-intuitive, "" being intuitively understood as "-(nul)-". And since "" is used in IDN, there could be some homograph confusion to investigate.

28. 3: the reference to RFC 2434 is correct, but the rest of part 3 seems inappropriate. RFC 2434 says "If the IANA is expected to play a role in the management of a name-space the IANA must be given clear and concise instructions describing that role". Part 3 is neither clear nor concise, and it contradicts the rest of the document, which describes an IANA file to be maintained by an IESG reviewer. Since the IESG has authority over the IANA, the role of the IANA is to store and disseminate the current version of the file as maintained by the reviewer.

29. 3.1: the description of "description" is clueless. It is a description that does not intend to be an English description, but it is one. The additions made in the IANA file are intended to be additions to the corresponding documented ISO tables. They MUST comply with the format of these tables; otherwise they create a disparity between the tables and their IANA "appendix".
30. 3.1: includes a registry format description (OK), but also considerations on the way tags should be formed, which have no place in a file description. They should be moved into 2.4.1.
31. 3.1: also includes directions to the Reviewer, which should be presented in a part separate from the format description.

32. 3.2: this part is not an IANA procedure but lengthy guidance for the Reviewer and the participants in the reviewing process, limited to the currently possible cases.

33. 3.3: understanding the meaning of "Subtags required for stability and to keep the registry synchronised" will probably be a source of long debates. It should be documented.
34. 3.3: why is the "MAY" concerning the "description, note and prefix fields" not documented with conditions? Is it not a "CAN"?
35. 3.3: the registration procedure is extremely confusing. It mixes: the form to use; the lack of definition of the requester; the iana.org list, which is not introduced; the registration request, which must be guessed; an uncommented MAY; registration tricks; comments on the probable behaviour of the reviewing list; a digression on Slovenian; the designation of the reviewer by the IESG; what should happen when the review period has elapsed, without any guidance to the reviewer; the fact that IANA list members and an IESG-designated reviewer make an IETF decision; the claim that the initial registrant has some moral pre-eminence (in the form of a comment); and the fact that languages are considered for registration not because they actually exist, but on their own (undocumented) merits.

36. 3.4: difficult to understand. The first sentence is probably inherited from former versions of the draft. "compatible with applications that process language tags according to this specification" seems to refer to filtering, which should be part of the second document produced by the WG-ltru.
37. 3.4: the description of the information to be maintained is clear, but the format is not described. This permits IANA to change it freely or to present it in HTML form, which does not help automated reading.

38. 4: security considerations should not deal with users' political security outside of their network usage. Otherwise tons of such considerations would have to be presented.
39. 4: an important security consideration is homographs. It is certainly possible to include a part of a text in a foreign language which looks, when printed, as if it were in another language, or which has a different meaning or rendering (phishing). The double "-", which is specifically used by the IANA "xn" code, is also a concern.
40. 4: the fourth paragraph tends to say that the specification of valid subtags MUST be available over the Internet, but that applications should take possible DoS into consideration. This is an important indication of the way the Draft proposes the registry file be used and accessed. It can be read as meaning that applications can freely access it and the proposed mirrors: this may impose on the IANA a load that results in a permanent inability to provide service.

41. 5: the character set considerations are contradictory: they say that the characters a-z exist in most character sets (good news) [which means that there are some where they do not exist], so there should be no character set presentation issue [in the character sets where they do not exist?]. Also, the consideration only concerns "display", which is of limited interest if the a-z characters do not exist on the keyboard. But maybe this supposes that "intelligent people" use ASCII-compatible keyboards (see below).

42. 6: compatibility is preserved with RFC 3066 but not with the evolution of ISO code elements. The XML Schema version 1.0 requirements are quoted but not documented.
43. 6, Stability: confusion between documents. This document does not provide a mechanism but a format that can be used by the mechanism described in the next document. This text has not been adapted after the split.
44. 6, Validity: this document should define the IQ of the "intelligent people" being considered, or the collective IQ augmentation necessary to understand the system ?? Please see the ideas of the one who created the NIC and grandfathered the RFC system (http://bootstrap.org).
45. 6, Extensibility: as presented, it actually results (in a very limited way) from the underlying ISO codes. This is not the target of the Charter, which is to permit scalability even when a code element is not supported by ISO.
46. 6: the document uses the term "extlang" several times but does not define it.
47. 6, last: the added text for "" is not sufficient, or is missing in my version.


CHARTER VS DRAFT RELATED COMMENTS AND QUESTIONS

48. Language preferences are uniquely understood in HTML and XML only. CLDR is quoted in the Charter and not quoted in the Draft. The Charter does not prevent other applications or systems from being supported. The Draft does not allude to them.

49. The Charter lists RFC 3066 problems. These problems are: (a) stability: there is a paragraph on the matter; (b) accessibility of the underlying ISO standards: this is definitely impeded by the format (no ISO 3166-2, no ISO 639 format other than 2 or 3 characters, no script description format other than 4 characters, etc., as if the current ISO presentation would never improve); (c) difficulty with registration and acceptance: this could be improved by the subtag registration system, but it seems to be made worse by the censoring rules introduced to prevent non-ISO entries from being entered in the IANA non-ISO table; (e) lack of clear guidance for identifying script and region: scripts are Unicode only, regions are 2-letter Telex codes; (f) lack of parseability and well-formedness: this has certainly been addressed [it seems to be both the major improvement of the Draft and the source of most of its problems, due to the rigidity it introduces].

50. The main purpose of this Draft, according to the Charter, is to describe the IANA registry that supports the resolution of the above problems, and how to transition from RFC 3066. This is to be done in a clear and concise way. RFC 3066 is roughly 17,000 characters long and the draft 70,000 (excluding the IETF formatting and verbiage). This makes it confusing. From what I understand it includes 3 parts: (a) the subtags file, with a clear format; (b) the accompanying registration/update forms; (c) the variant tables, with a clear yet less precise format. From what I understand, (a) and (b) are the real responsibility of the old aliased distribution list and of a Reviewer designated by the IESG with unlimited veto powers; (c) is the responsibility of the IESG when reviewing RFCs requesting entries, and of the updating mechanism defined by these RFCs.

51. It lists challenges to be addressed. Stability: "how the language tags remains stable even if the underlying references should change". This means a process in which the tag name is unrelated to its underlying components, just as a domain name remains stable even if the underlying IP address changes. This is not provided.

52. It lists challenges to be addressed. Accessibility: "a simple way to determine if a subtag is valid as of a given date. Like receiving a 404 when calling an expired domain name". Such a mechanism is not provided.
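
As an illustration of what the mechanism asked for in point 52 might look like, here is a minimal Python sketch of a dated validity check. Everything in it is hypothetical: the `REGISTRY` records, the subtags "foo" and "old", their dates, and the function name `is_valid_on` are made up for the example and do not come from the Draft or from any IANA file.

```python
from datetime import date

# Hypothetical records: subtag -> (date added, date deprecated or None).
# Subtags and dates are invented for illustration only.
REGISTRY = {
    "foo": (date(2000, 1, 1), None),
    "old": (date(2000, 1, 1), date(2004, 6, 1)),
}

def is_valid_on(subtag, when, registry=REGISTRY):
    """Return True if `subtag` was registered and not yet deprecated on `when`."""
    record = registry.get(subtag.lower())
    if record is None:
        return False  # never registered: the "404" case
    added, deprecated = record
    if when < added:
        return False  # not yet registered on that date
    return deprecated is None or when < deprecated
```

A caller would ask, for example, `is_valid_on("old", date(2005, 1, 1))` and get a plain yes/no answer, without having to interpret the whole registry file.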

53. It lists challenges to be addressed. Extensibility: this meant not having to record millions of combinations. This is provided, at the price of format rigidity, the impossibility of using foreseen or existing ISO code elements, and a censoring of non-ISO extensions which may lead to more harassment. It also meant the addition of the script to language tags. This is permitted by the proposed format, but to the detriment of other 4-letter entries. Registration of non-ISO scripts is not permitted.

54. It lists challenges to be addressed. "provide mechanism to support the evolution of the underlying standards, in particular ISO 639-3, mechanisms to support variant registration and format extensions, as well as allowing generative private use when necessary": I am not sure what "generative" may mean here, but I feel it is not supported; the rest is certainly opposed by the chosen format.

55. It lists challenges to be addressed: "to specify a mechanism for easily identifying the role of each subtag in the language tag". This is addressed by the Draft. But this challenge contradicts the stability challenge above. If a language tag displays an identifiable subtag, it becomes by nature dependent on the underlying value of the subtag.


I will carefully study the responses to this review before introducing my own Draft, in order to build, if possible, on the largest possible number of consensual elements.

My current thinking is totally different. It is an open framework which respects the XML, HTML and CLDR requirements by welcoming your own (adapted) Draft, the ISO evolution, and the requirements of an Internet for the people of the world by the people of the world, at an affordable cost, with a highly innovative technical approach, great care for operational security and stability, and in total continuity with the founding concept which gave us thirty years of international public network stability.

But I think the issues it raises are important enough to call for understanding, comments, and support from all those concerned with a "multilingual cyberspace", equal cultural dignity and empowerment in the digital ecosystem, and open e-commerce. This is because language tags are by nature the basic building blocks of the multilingual Internet, which is also to be user-centric, multitechnology (convergence), multicontent (information society), and multilateral, as the WSIS shows.

jfcm
