Hello Addison, others,
On 2009/06/09 7:53, Phillips, Addison wrote:
I think this is a potentially serious problem, although we did address it during document development (of RFC 4646). Some of the changes in this document have watered down the way we addressed it in that document. There is also a general recognition that mechanically retrieving the registry is a Bad Thing.
14). Language Tag validity verification requires applications to keep up-to-date copies of the Language Tag registry.
Actually, this is false. The validity text is careful to say "as of the particular registry date" when referring to validity. There is an expectation that most implementations will use a specific registry version.
I think the problem here is with "there is an expectation". Of course we expect people to understand things our way, and to be reasonable. Yet still that's not always the case. An example story can be found at http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic. Also, I know that W3C staff at one point was contacted by staff from the Unicode consortium about a similar issue.
Such applications might be designed to automatically fetch new versions of the registry. This has three problems:
I). IANA is generally not happy about applications automatically downloading registries, unless data is located in DNS. This is primarily due to scaling concerns.
Obviously this would be a concern. In fact, Section 6 (Security Considerations) says:
-- Although the specification of valid subtags for an extension (see Section 3.7 (Extensions and the Extensions Registry)) MUST be available over the Internet, implementations SHOULD NOT mechanically depend on it being always accessible, to prevent denial-of-service attacks. --
This should also address the IANA registry. I think that's an oversight. I actually thought it said both.
Okay. Here is proposed text for insertion just before the currently last paragraph in section 6:
>>>>Although the Language Subtag Registry and the Language Tag Extensions Registry are available over the Internet, applications SHOULD NOT mechanically depend on them being always accessible, to prevent denial-of-service attacks.
>>>>
II). There is also the issue of when applications should be fetching the updated registry. I.e. what is the mechanism(s) for determining when to fetch.
Usually: when you make a new or updated version of your software. Infrequent polling of the registry itself would also not be harmful. Neither would monitoring the list of registration forms (which IANA archives separately). However, there is also a specific mechanism in Section 5.1 for knowing when a change is made:
-- Developers who are dependent upon the language subtag registry sometimes would like to be informed of changes in the registry so that they can update their implementations. When any change is made to the language subtag registry, IANA will send an announcement message to "ietf-languages-announcements at iana.org" (a self-subscribing list that only IANA can post to). --
I think this is well and good for the scenarios you are used to, similar to shrink-wrapped software. But somebody may think that they want to be more up-to-date, and build some automatic updating mechanism into their software.
III). The registry format is not designed for figuring out the minimal list of changes between any 2 versions. For example there is no way of only fetching changes since a given File-Date value.
No, there isn't. The records do not contain a "last modified" date. Is this necessarily important? When I update one of my implementations, I just reparse the latest registry. Stability rules exist to protect my doing it this way. Since the records are not in any particular order, there isn't necessarily utility to using diff to reconstruct changes either.
BTW, are people aware of implementations that try to download the Language Tag registry automatically?
Yes, see above.
While I don't expect this issue to be fully fixed at such a late stage, it needs to be discussed in the document. Some thoughts about what can be said on the topic: Regarding I): applications should be advised against frequent polling of the registry. For example: The registries specified in this document are not suitable for applications that require real-time access to, or retrieval of, the full registry contents.
There are some notes already in place. It was considered during document development (of 4646). There is no evidence, btw, that implementations do rely on active access and, with the size expansion envisioned with this document, it seems less likely that an implementer would want to create such a linkage.
I agree that there is no evidence that a language tag implementation currently does frequent downloads. But there is evidence for other cases of frequent downloads, where the frequency of changes is way smaller or non-existent.
I therefore suggest continuing the text I proposed above as follows:
>>>>The registries specified in this document are not suitable for frequent or real-time access to, or retrieval of, the full registry contents. Most applications do not need registry data at all. For the others, being able to validate or canonicalize language tags as of a particular registry date will be sufficient. Also, the registry contents change only occasionally. Changes are announced to ietf-languages-announcements at iana.org. Changes, or the absence thereof, can also easily be detected by looking at the File-Date record at the start of the registry, or by using features of the protocol used for downloading, without having to download the full registry.
>>>>
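To illustrate the last point (this is not text for the draft): a minimal Python sketch of checking the File-Date without downloading the whole registry. It assumes the registry is served at the URL below, that the server honours HTTP/1.1 Range requests, and that the File-Date record sits within the first few dozen bytes. The function names are made up for the example, and the request is only built, not sent.

```python
import re
import urllib.request
from datetime import date
from typing import Optional

# Assumed registry location; not taken from the draft text.
REGISTRY_URL = "https://www.iana.org/assignments/language-subtag-registry"

# The registry starts with a record of the form "File-Date: YYYY-MM-DD".
FILE_DATE_RE = re.compile(
    r"^File-Date:\s*(\d{4})-(\d{2})-(\d{2})\s*$", re.MULTILINE
)


def parse_file_date(head: str) -> Optional[date]:
    """Extract the File-Date from the first lines of the registry, if present."""
    match = FILE_DATE_RE.search(head)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def build_head_request(url: str = REGISTRY_URL,
                       nbytes: int = 64) -> urllib.request.Request:
    """Build (but do not send) a GET asking only for the first nbytes bytes."""
    return urllib.request.Request(
        url, headers={"Range": f"bytes=0-{nbytes - 1}"}
    )


def registry_is_newer(head: str, local_date: date) -> bool:
    """True if the remote File-Date is later than the local copy's date."""
    remote = parse_file_date(head)
    return remote is not None and remote > local_date
```

A server is free to ignore the Range header and return the full body, so a careful client would still only read the first few bytes of the response. Even this check should run rarely (e.g. at build time), not on every validation.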
It might be nice to have the File-Date value available as a separate short piece of data (a separate web page on IANA's website, a thing in DNS, etc.). But even with the existing registry format it might be possible to minimize load on IANA's website by describing how HTTP/1.1 features can be used to only download necessary information, such as downloading the beginning of the registry which contains the File-Date field.
One can subscribe to ietf-languages-announcements at . Also one can look at the list of registrations (http://www.iana.org/assignments/lang-subtags-templates/index.html).
Regarding II): I think this needs some advice on which operations need and which don't need fetching of the registry.
*No* operations depend on fetching the registry at runtime. Only validation and canonicalization depend on a copy of the registry being available. Wording in the I-D is always in terms of "as of a particular registry date".
Also this might mention that any application keeping a copy of the registry needs to preserve the File-Date value in some form.
It is actually already a requirement of being a validating processor. Section 2.2.9 says:
-- An implementation that claims to be validating MUST: // ... # Specify the particular registry date for which the implementation performs validation of subtags. --
That is the File-Date. Note that the expectation (but not requirement) is that validating implementations are tied to a registry version, not obtaining it mechanically.
Regarding III): I think it is Ok for now to explicitly acknowledge that this is an issue. It might also be worth discussing how the registry format facilitates "diffing" 2 versions of the registry.
It's possible, although once you've parsed the new registry, the diff doesn't matter so much. Note that stability rules generally prevent breaking changes.
I think what Alex means is that the diff would significantly reduce the amount of data that needs to be downloaded to update the registry to a new version.
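For comparison, diffing two copies that are already local is easy with the record format. A rough Python sketch follows; it assumes records are separated by "%%" lines, that fields are "Name: value" lines with folded continuation lines starting with whitespace, and that a record is identified by its Type plus Subtag (or Tag). The function names are made up for the example. Note that this works on two full copies, so it does nothing to reduce the amount of data downloaded.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, str]  # (Type, Subtag or Tag)


def split_records(text: str) -> Iterator[List[str]]:
    """Yield the lines of each record; records are separated by '%%' lines."""
    chunk: List[str] = []
    for line in text.splitlines():
        if line.strip() == "%%":
            yield chunk
            chunk = []
        else:
            chunk.append(line)
    yield chunk


def parse_record(lines: List[str]) -> Record:
    """Parse 'Name: value' lines, unfolding continuation lines."""
    fields: Record = {}
    last = None
    for line in lines:
        if line[:1] in (" ", "\t") and last is not None:
            # Folded continuation of the previous field.
            fields[last] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            value = value.strip()
            # Repeated fields (e.g. Description) are joined with newlines.
            fields[last] = (fields[last] + "\n" + value) if last in fields else value
    return fields


def parse_registry(text: str) -> Tuple[str, Dict[Key, Record]]:
    """Return (File-Date value, records keyed by (Type, Subtag-or-Tag))."""
    file_date = ""
    records: Dict[Key, Record] = {}
    for lines in split_records(text):
        fields = parse_record(lines)
        if "File-Date" in fields:
            file_date = fields["File-Date"]
        elif "Type" in fields:
            ident = fields.get("Subtag") or fields.get("Tag", "")
            records[(fields["Type"], ident)] = fields
    return file_date, records


def diff_registries(old_text: str, new_text: str) -> Dict[str, List[Key]]:
    """List the keys of records added, changed, or removed between two copies."""
    _, old = parse_registry(old_text)
    _, new = parse_registry(new_text)
    return {
        "added": sorted(k for k in new if k not in old),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
        "removed": sorted(k for k in old if k not in new),
    }
```

Given the stability rules, the "removed" list would normally be expected to stay empty; it is kept in the sketch only as a sanity check.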
Overall, I agree that we have discussed this issue before, but I also understand that Alex is concerned, probably not only for himself, but also about the fact that his fellow IESG members might easily bring up this issue as a "discuss". So I think it's better to have it clearly documented in the security section.
Regards, Martin.
Addison
-- #-# Martin J. Dürst, Professor, Aoyama Gakuin University #-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp