Addison Phillips <addison at yahoo dash inc dot com> wrote:
Your best bet is probably to generate subtag sequences based on the ABNF. Some particular problem cases to check would be:
Just a few minor nits here. This is slightly more complex than meets the eye, unfortunately.
- singletons in the first position (except for 'x' and the grandfathered list)
Sadly, the existence of the grandfathered list means that all well-formed processors must also do a limited amount of validity checking. I don't question the importance of maintaining support for the grandfathered tags, but this is a side effect.
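To illustrate why the first-position singleton check drags in a lookup against the grandfathered list, here is a minimal sketch. The function name first_subtag_ok is hypothetical, and GRANDFATHERED_SAMPLE is a deliberately tiny illustrative subset, not the real list; a real processor would need to load the complete list from the registry.

```python
# Illustrative subset only, NOT the full grandfathered list (assumption:
# a real processor would load the complete list from the registry).
GRANDFATHERED_SAMPLE = {"i-default", "i-klingon"}


def first_subtag_ok(tag, grandfathered=GRANDFATHERED_SAMPLE):
    """Check only the rule about singletons in the first position.

    A one-character first subtag is accepted if it is 'x' or if the
    whole tag appears in the grandfathered list. Nothing else about
    the tag is checked here.
    """
    lowered = tag.lower()  # tags are compared case-insensitively
    if lowered in grandfathered:
        return True
    first = lowered.split("-", 1)[0]
    return len(first) != 1 or first == "x"
```

Note that the whole tag has to be compared against the list before the singleton rule can be applied, which is the limited validity checking described above.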
- misplaced extlang (3ALPHA in the third or later position following any of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alpha)[note: stop at singleton]
Replace "following" with "immediately following." 2ALPHA followed by 3ALPHA followed by 3ALPHA is fine.
"DIGIT 3alpha" here should be "DIGIT 3alphanum".
- misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum)[note: stop at singleton]
Needs another qualifier. 2ALPHA followed by 4ALPHA is fine: "zh-Hans".
- missing subtag ("--")
- a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
The second is really just a special case of the first: a missing subtag at the end or beginning, respectively. One thing I found useful, when building my validator, was to parse out the subtags first and check them for validity afterward, so the hyphens never become part of the validity checking per se.
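The split-first approach can be sketched roughly as follows. This is not my actual validator, just a minimal illustration; the names split_subtags and malformed_subtags are hypothetical, and the only check applied is the assumption that every subtag must be 1 to 8 ASCII letters or digits.

```python
import re

# Assumption: every subtag is 1 to 8 ASCII letters or digits (1*8alphanum).
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_subtags(tag):
    """Split on hyphens; after this the hyphens play no further role."""
    return tag.split("-")


def malformed_subtags(tag):
    """Return (position, subtag) pairs for subtags that are empty,
    too long, or contain characters other than letters and digits."""
    return [
        (i, s)
        for i, s in enumerate(split_subtags(tag))
        if not SUBTAG.fullmatch(s)
    ]
```

With this arrangement "foo--bar", "foo-bar-baz-" and "-foo-bar-baz" all fail for the same reason: the split yields an empty subtag in the middle, at the end, or at the beginning, respectively.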
Thus, these are all errors: "a-foo" "abcdefghi-012345678" "ab-abc-abc-abc-abc" "ab-abcd-abc" "ab-ab-abc"
Why pick on Abkhasian so much? :-)
"ab-x-abc-x-abc" // anything goes after x
Not quite anything, of course: 1*("-" (1*8alphanum))
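As a rough sketch, that production can be checked with a regular expression applied to the portion of the tag starting at the 'x'. The name is_private_use_sequence is hypothetical, and the pattern assumes the sequence has already been isolated from the rest of the tag.

```python
import re

# "x" followed by 1*("-" (1*8alphanum)); matching is case-insensitive.
PRIVATE_USE = re.compile(r"x(?:-[A-Za-z0-9]{1,8})+", re.IGNORECASE)


def is_private_use_sequence(sequence):
    """True if the sequence is 'x' plus one or more 1*8alphanum subtags."""
    return PRIVATE_USE.fullmatch(sequence) is not None
```

So "x-abc-x-abc" passes, but a bare "x", an overlong subtag such as "x-abcdefghi", or an empty subtag as in "x-a--b" does not.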
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/