
Re: [Ltru] ABNF, plus small fix



Whoops, bad link. The BNF is at:

http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagRegex.txt

And here is the code, if anyone cares:

http://unicode.org/cldr/data/tools/java/org/unicode/cldr/tool/CheckLangTagBNF.java

Mark

On Fri, Mar 21, 2008 at 4:44 PM, Mark Davis <mark.davis at icu-project.org> wrote:
I updated the CLDR language tag tests for BCP 47 2008(exp).

The BNF used both for testing and for generating random strings is at

http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagTest.txt

The canned test data is at:

http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagTest.txt

In doing so, I found the following small textual issue. The first alternative in this ABNF rule no longer needs parentheses, so for consistency they should be removed:

language      = (2*3ALPHA)             ; shortest ISO 639 code
              / 4ALPHA                 ; reserved for future use
              / 5*8ALPHA               ; registered language subtag

For those interested, the regular expression generated from the BNF is the following:

Regex: (?: ((?: [a-z A-Z]{2,3} | [a-z A-Z]{4,8} ))(?: [-] ((?: [a-z A-Z]{4} )) )? (?: [-] ((?: [a-z A-Z]{2} | [0-9]{3} )) )? (?: [-] ((?: (?: [a-z A-Z 0-9]{5,8} | [0-9] [a-z A-Z 0-9]{3} ) (?: [-] (?: [a-z A-Z 0-9]{5,8} | [0-9] [a-z A-Z 0-9]{3} ) )* )) )? (?: [-] ((?: (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) (?: [-] (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) )* )) )? (?: [-] ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ )) )? ) |   ((?: (?i)en [-] GB [-] oed|   i [-] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo | navajo | pwn | tao | tay | tsu )|   no [-] (?: bok | nyn )|    sgn [-] (?: BE [-] (?: fr | nl) | CH [-] de)|   zh [-] (?: cmn (?: [-] Hans | [-] Hant )? | gan | min (?: [-] nan)? | wuu | yue))) |    ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ ))

Note that this could be somewhat simpler: it does a few extra things to get the capture groups to work right, and it includes both lowercase and uppercase since those are used in generating the random test data.
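As a rough illustration of what "somewhat simpler" might look like, here is a minimal Java sketch. It is not the generated regex and not the code in CheckLangTagBNF.java. It assumes matching is done case-insensitively (so only one case is listed), uses non-capturing groups only, and leaves out the grandfathered alternatives entirely. The class name LangTagSketch and the method isWellFormed are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a hand-simplified form of the generated regex.
// Assumptions: case-insensitive matching, no capture groups, and the
// grandfathered tags (en-GB-oed, i-klingon, ...) are deliberately omitted.
class LangTagSketch {
    // Whitespace inside the pattern is ignored because of Pattern.COMMENTS.
    private static final String LANGTAG =
          "[a-z]{2,8}"                                       // language
        + "(?: - [a-z]{4} )?"                                // script
        + "(?: - (?: [a-z]{2} | [0-9]{3} ) )?"               // region
        + "(?: - (?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )*"  // variants
        + "(?: - [a-wy-z] (?: - [a-z0-9]{2,8} )+ )*"         // extensions
        + "(?: - x (?: - [a-z0-9]{1,8} )+ )?";               // private use

    // A tag may also consist of a private-use sequence alone.
    private static final String PRIVATE_USE = "x (?: - [a-z0-9]{1,8} )+";

    static final Pattern WELL_FORMED = Pattern.compile(
        "(?: " + LANGTAG + " ) | (?: " + PRIVATE_USE + " )",
        Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // True if the whole string matches the simplified pattern.
    static boolean isWellFormed(String tag) {
        return WELL_FORMED.matcher(tag).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"en", "zh-Hant-TW", "de-1996", "x-private", "en-", "a"};
        for (String tag : samples) {
            System.out.println(tag + " -> " + isWellFormed(tag));
        }
    }
}
```

With this sketch, tags such as en, zh-Hant-TW, de-1996 and x-private are accepted, while en- and a are rejected. Because the capture groups are gone, it cannot be used to pull out the individual subtags the way the generated regex can.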

--
Mark



_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru

Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.