> I just wrote a non-validating parser for language tags and I'm looking
> for test data. I want to test bizarre tags to see if the parser does
> classify them properly.

Good for you!

> I'm specially interested in badly-formed tags: the I-D contains mostly
> well-formed tags.

Your best bet is probably to generate subtag sequences based on the ABNF. Some particular problem cases to check would be:
- singletons in the first position (except for 'x' and the grandfathered list)
- overlong subtags (longer than 8 characters)
- more than three extlangs
- misplaced extlang (3ALPHA in the third or later position following any of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum) [note: stop at singleton]
- misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum) [note: stop at singleton]
- misplaced variant (five or more characters, or four or more starting with a digit; either occurring before an extlang/script/region is an error)
- non-x singleton followed immediately by a singleton (including 'x')
- missing subtag ("--")
- a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
- digits in the primary (first) subtag
- repeated singleton (note case insensitivity)
Thus, these are all errors:
"a-foo"
"abcdefghi-012345678"
"ab-abc-abc-abc-abc"
"ab-abcd-abc"
"ab-ab-abc"
"ab-123-abc"
"ab-abcde-abc"
"ab-1abc-abc"
"ab-ab-abcd"
"ab-123-abcd"
"ab-abcde-abcd"
"ab-1abc-abcd"
"ab-a-b"
"ab-a-x"
"ab--ab"
"ab-abc-"
"-ab-abc"
"ab-a-abc-a-abc"
These are not errors:
"ab-x-abc-x-abc" // anything goes after x
"ab-x-abc-a-a" // ditto
"i-default" // grandfathered
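As a rough illustration of how the cases above might be checked mechanically, here is a minimal Python sketch. It encodes the subtag patterns as described in this message (2-3 letter language with up to three extlangs, 4ALPHA script, 2ALPHA/3DIGIT region, and so on) and has not been verified against any particular revision of the draft. The names `looks_well_formed`, `LANGTAG`, `PRIVATEUSE` and `GRANDFATHERED` are made up for the example, and the grandfathered set is deliberately partial: it holds only the one tag used above, where a real harness would load the full list.

```python
import re

# Assumption: partial grandfathered list, just enough for the example above.
GRANDFATHERED = {"i-default"}

# A whole tag that is nothing but private use, e.g. "x-whatever".
PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+\Z")

# Ordinary tag: language, then optional script, region, variants,
# extensions and a private-use tail, in that order.
LANGTAG = re.compile(r"""
    (?: [a-z]{2,3} (?: -[a-z]{3} ){0,3}              # language + up to three extlangs
      | [a-z]{4,8} )                                 # or a 4-8 letter language
    (?: -[a-z]{4} )?                                 # script
    (?: -(?: [a-z]{2} | [0-9]{3} ) )?                # region
    (?: -(?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )*   # variants
    (?: -[a-wy-z0-9] (?: -[a-z0-9]{2,8} )+ )*        # extensions (non-x singleton)
    (?: -x (?: -[a-z0-9]{1,8} )+ )?                  # private use: anything goes
    \Z
""", re.VERBOSE)


def looks_well_formed(tag: str) -> bool:
    """Return True if the tag passes the checks sketched in this message."""
    if not tag.isascii():
        return False
    t = tag.lower()  # subtags are case-insensitive
    if t in GRANDFATHERED or PRIVATEUSE.match(t):
        return True
    if not LANGTAG.match(t):
        return False
    # Repeated singletons are an error; stop looking once 'x' is reached.
    seen = set()
    for subtag in t.split("-")[1:]:
        if subtag == "x":
            break
        if len(subtag) == 1:
            if subtag in seen:
                return False
            seen.add(subtag)
    return True


if __name__ == "__main__":
    for tag in ["ab-a-abc-a-abc", "ab-x-abc-x-abc", "i-default"]:
        print(tag, looks_well_formed(tag))
```

Under these assumptions, every tag in the error list above should come back False and the three non-error tags should come back True. The regular expression handles ordering and length; the repeated-singleton check is done separately because it is awkward to express in a pattern.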
Hope that helps,
Addison
Addison Phillips
Globalization Architect - Yahoo! Inc.
Internationalization is an architecture.
It is not a feature.
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.