
Re: [Ltru] Test suite for language tags?



> I just wrote a non-validating parser for language tags and I'm looking
> for test data. I want to test bizarre tags to see if the parser does
> classify them properly.

Good for you!

> I'm specially interested in badly-formed tags: the I-D contains mostly
> well-formed tags.

Your best bet is probably to generate subtag sequences based on the ABNF. Some particular problem cases to check would be:

- singletons in the first position (except for 'x' and the grandfathered list)
- overlong subtags (longer than 8 characters)
- more than three extlangs
- misplaced extlang (3ALPHA in the third or later position following any of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum) [note: stop checking at the first singleton]
- misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum) [note: stop checking at the first singleton]
- misplaced variant (five or more characters, or exactly four starting with a digit; either one occurring before an extlang, script, or region is an error)
- non-x singleton followed immediately by a singleton (including 'x')
- missing subtag ("--")
- a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
- digits in the primary (first) subtag
- repeated singleton (note case insensitivity)
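
One rough way to generate such subtag sequences is to pick a representative subtag for each shape in the list above and join every combination with hyphens. The sketch below is only an illustration: the SHAPES list and the candidate_tags name are made up for this example, and the output is raw test input, not a list of known-bad tags (some of the combinations are perfectly well-formed).

```python
from itertools import product

# Hypothetical representatives, one per subtag shape discussed above:
# singleton, 'x', 2ALPHA, 3ALPHA, 4ALPHA, 5*8alphanum, DIGIT 3alphanum,
# 3DIGIT, an overlong subtag, and an empty subtag (yields "--" and
# leading or trailing hyphens once joined).
SHAPES = ["a", "x", "ab", "abc", "abcd", "abcde", "1abc", "123", "abcdefghi", ""]


def candidate_tags(max_subtags=3):
    """Yield every hyphen-joined sequence of 1..max_subtags representative subtags."""
    for length in range(1, max_subtags + 1):
        for combo in product(SHAPES, repeat=length):
            yield "-".join(combo)


if __name__ == "__main__":
    for tag in candidate_tags(2):
        print(repr(tag))
```

Raising max_subtags grows the output by a factor of ten per extra position, so three or four positions is probably enough to reach the misplaced-subtag cases.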

Thus, these are all errors:

"a-foo"
"abcdefghi-012345678"
"ab-abc-abc-abc-abc"
"ab-abcd-abc"
"ab-ab-abc"
"ab-123-abc"
"ab-abcde-abc"
"ab-1abc-abc"
"ab-ab-abcd"
"ab-123-abcd"
"ab-abcde-abcd"
"ab-1abc-abcd"
"ab-a-b"
"ab-a-x"
"ab--ab"
"ab-abc-"
"-ab-abc"
"ab-a-abc-a-abc"

These are not errors:

"ab-x-abc-x-abc" // anything goes after x
"ab-x-abc-a-a"   // ditto
"i-default"      // grandfathered
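
For what it's worth, here is a minimal well-formedness check you could use as an oracle for the examples above. It is a sketch, not a reference implementation: it assumes the grammar as it was later published in RFC 4646 (up to three extlangs, variants of 5*8alphanum or DIGIT 3alphanum), the GRANDFATHERED set is a placeholder holding only the one tag mentioned here (a real parser would load the full registered list), and the names LANGTAG and is_well_formed are invented for the example.

```python
import re

# Placeholder: only the grandfathered tag mentioned in this message.
GRANDFATHERED = {"i-default"}

# Assumed grammar (RFC 4646 style); tags are lowercased before matching.
LANGTAG = re.compile(
    r"""
    (?:
        (?:
            [a-z]{2,3}(?:-[a-z]{3}){0,3}    # 2*3ALPHA plus up to three extlangs
          | [a-z]{4}                         # 4ALPHA
          | [a-z]{5,8}                       # 5*8ALPHA
        )
        (?:-[a-z]{4})?                       # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?          # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*                  # variants
        (?P<extensions>(?:-[a-wy-z0-9](?:-[a-z0-9]{2,8})+)*)      # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?           # private use: anything goes after x
      |
        x(?:-[a-z0-9]{1,8})+                 # private-use-only tag
    )
    """,
    re.VERBOSE,
)


def is_well_formed(tag):
    """Return True if tag is well-formed under the assumptions stated above."""
    lowered = tag.lower()  # tags are case-insensitive
    if lowered in GRANDFATHERED:
        return True
    match = LANGTAG.fullmatch(lowered)
    if match is None:
        return False
    # The grammar alone allows a repeated singleton, so check it separately.
    extensions = match.group("extensions") or ""
    singletons = [s for s in extensions.split("-") if len(s) == 1]
    return len(singletons) == len(set(singletons))
```

Note that the repeated-singleton rule cannot be expressed in the grammar itself, which is why it is a separate step after the pattern match.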

Hope that helps,

Addison

Addison Phillips
Globalization Architect - Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru



