I updated the CLDR language tag tests for BCP 47 2008(exp).
The BNF used both for testing and for generating random strings is at
The canned test data is at:
In so doing, I found one small textual issue. The first term of the language production in the ABNF no longer needs parentheses, so for consistency they should be removed:

language = (2*3ALPHA)   ; shortest ISO 639 code
         / 4ALPHA       ; reserved for future use
         / 5*8ALPHA     ; registered language subtag
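
To illustrate why the parentheses are redundant, here is a minimal Python sketch of the three alternatives in the language production. A group around a single ABNF element does not change what it matches, so 2*3ALPHA and (2*3ALPHA) translate to the same pattern. The function name classify_language_subtag and the table layout are made up for this example; they are not part of the CLDR test code.

```python
import re

# language = 2*3ALPHA / 4ALPHA / 5*8ALPHA
# One pattern per ABNF alternative, labeled with the ABNF comment.
ALTERNATIVES = [
    ("shortest ISO 639 code", re.compile(r"[A-Za-z]{2,3}")),
    ("reserved for future use", re.compile(r"[A-Za-z]{4}")),
    ("registered language subtag", re.compile(r"[A-Za-z]{5,8}")),
]


def classify_language_subtag(subtag):
    """Return the label of the alternative that matches, or None."""
    for label, pattern in ALTERNATIVES:
        if pattern.fullmatch(subtag):
            return label
    return None


if __name__ == "__main__":
    for candidate in ["en", "abcd", "klingon", "e"]:
        print(candidate, "->", classify_language_subtag(candidate))
```

The alternatives are disjoint by length, so the order in which they are tried does not matter.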
For those interested, the regular expression generated from the BNF is the following:
Regex: (?: ((?: [a-z A-Z]{2,3} | [a-z A-Z]{4,8} ))(?: [-] ((?: [a-z A-Z]{4} )) )? (?: [-] ((?: [a-z A-Z]{2} | [0-9]{3} )) )? (?: [-] ((?: (?: [a-z A-Z 0-9]{5,8} | [0-9] [a-z A-Z 0-9]{3} ) (?: [-] (?: [a-z A-Z 0-9]{5,8} | [0-9] [a-z A-Z 0-9]{3} ) )* )) )? (?: [-] ((?: (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) (?: [-] (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) )* )) )? (?: [-] ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ )) )? ) | ((?: (?i)en [-] GB [-] oed| i [-] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo | navajo | pwn | tao | tay | tsu )| no [-] (?: bok | nyn )| sgn [-] (?: BE [-] (?: fr | nl) | CH [-] de)| zh [-] (?: cmn (?: [-] Hans | [-] Hant )? | gan | min (?: [-] nan)? | wuu | yue))) | ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ ))
Note that this could be somewhat simpler: it does a few extra things to get the capture groups to work right, and it includes both lowercase and uppercase since those are used in generating the random test data.
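
As a rough idea of what the simpler form might look like, here is a hand-written Python sketch, not the generated regex. It assumes case-insensitive ASCII matching in place of the doubled letter classes and uses named groups instead of the extra non-capturing wrappers. It covers only the regular langtag branch; the grandfathered tags and the standalone private-use branch are deliberately omitted. The names LANGTAG and parse_langtag are invented for the example.

```python
import re

# Simplified langtag pattern following the structure of the regex above:
# language, optional script, optional region, variants, extensions,
# optional private use. Grandfathered and standalone x-... tags are omitted.
LANGTAG = re.compile(
    r"""
    (?P<language> [a-z]{2,8} )
    (?: - (?P<script> [a-z]{4} ) )?
    (?: - (?P<region> [a-z]{2} | [0-9]{3} ) )?
    (?P<variants>   (?: - (?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )* )
    (?P<extensions> (?: - [a-wy-z] (?: - [a-z0-9]{2,8} )+ )* )
    (?: - (?P<privateuse> x (?: - [a-z0-9]{1,8} )+ ) )?
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def parse_langtag(tag):
    """Return a dict of subtag fields, or None if the tag does not match."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Variants are captured as "-a-b"; split them into a list.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    # Extensions are kept as one raw string without the leading hyphen.
    parts["extensions"] = parts["extensions"].lstrip("-")
    return parts


if __name__ == "__main__":
    for tag in ["en-Latn-US", "de-1996", "en-US-u-co-phonebk-x-priv", "e"]:
        print(tag, "->", parse_langtag(tag))
```

Because the whole tag must match, a subtag such as 1996 in de-1996 falls through the script and region slots and ends up as a variant.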
--
Mark
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru