[Ltru] Regex
I just redid my regex for the new version. The only tweak I made was that it only allows one extlang (as per the description, item 4), which means that I had to retain zh-min-nan as irregular.
(?i)
(?:
(?: ( [a-z]{2,8} | [a-z]{2,3} [-_] [a-z]{3} )
(?: [-_] ( [a-z]{4} ) )?
(?: [-_] ( [a-z]{2} | [0-9]{3} ) )?
(?: [-_] ( (?: [a-z 0-9]{5,8} | [0-9] [a-z 0-9]{3} ) (?: [-_] (?: [a-z 0-9]{5,8} | [0-9] [a-z 0-9]{3} ) )* ) )?
(?: [-_] ( [a-w y-z] (?: [-_] [a-z 0-9]{2,8} )+ (?: [-_] [a-w y-z] (?: [-_] [a-z 0-9]{2,8} )+ )* ) )?
(?: [-_] ( x (?: [-_] [a-z 0-9]{1,8} )+ ) )? )
| ( x (?: [-_] [a-z 0-9]{1,8} )+ )
| ( en [-_] GB [-_] oed
| i [-_] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo | navajo | pwn | tao | tay | tsu )
| no [-_] (?: bok | nyn )
| sgn [-_] (?: BE [-_] (?: fr | nl) | CH [-_] de )
| zh [-_] min [-_] nan ) )
As before,
- the (?i) is for case insensitive matching,
- the [-_] is for implementations (like Unicode) that allow alternate separators (it can be replaced by [-] otherwise), and
- the (?: is Perl/Java syntax for non-capturing groups. [That is, the plain ( ... ) groups capture the main components of the tag, for extraction later.]
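To show how the capturing groups might be used for extraction, here is a rough sketch in Python. The post targets Perl/Java, so the choice of Python is an assumption, as are the names parse_tag and FIELD_NAMES and the labels given to the eight groups. Python's verbose mode treats a space inside a character class such as [a-z 0-9] as a literal, so the sketch strips all whitespace from the pattern before compiling, and it passes the case-insensitive flag directly rather than using the leading (?i).

```python
import re

# The pattern from the post, copied as written, minus the leading (?i),
# which is supplied as a flag when compiling.
PATTERN_SOURCE = r"""
(?:
(?: ( [a-z]{2,8} | [a-z]{2,3} [-_] [a-z]{3} )
(?: [-_] ( [a-z]{4} ) )?
(?: [-_] ( [a-z]{2} | [0-9]{3} ) )?
(?: [-_] ( (?: [a-z 0-9]{5,8} | [0-9] [a-z 0-9]{3} ) (?: [-_] (?: [a-z 0-9]{5,8} | [0-9] [a-z 0-9]{3} ) )* ) )?
(?: [-_] ( [a-w y-z] (?: [-_] [a-z 0-9]{2,8} )+ (?: [-_] [a-w y-z] (?: [-_] [a-z 0-9]{2,8} )+ )* ) )?
(?: [-_] ( x (?: [-_] [a-z 0-9]{1,8} )+ ) )? )
| ( x (?: [-_] [a-z 0-9]{1,8} )+ )
| ( en [-_] GB [-_] oed
| i [-_] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo | navajo | pwn | tao | tay | tsu )
| no [-_] (?: bok | nyn )
| sgn [-_] (?: BE [-_] (?: fr | nl) | CH [-_] de )
| zh [-_] min [-_] nan ) )
"""

# Remove every whitespace character so that classes like [a-z 0-9]
# become [a-z0-9]; the pattern contains no meaningful whitespace.
TAG_RE = re.compile("".join(PATTERN_SOURCE.split()), re.IGNORECASE)

# Hypothetical labels for the eight capturing groups, in order of appearance.
FIELD_NAMES = (
    "language",         # primary language, possibly with one extlang
    "script",
    "region",
    "variants",
    "extensions",
    "privateuse",       # private-use suffix on a regular tag
    "privateuse_only",  # tag that is entirely private use
    "irregular",        # en-GB-oed, i-*, no-*, sgn-*, zh-min-nan
)

def parse_tag(tag):
    """Return a dict of the components that matched, or None if the tag does not match."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    return {name: value
            for name, value in zip(FIELD_NAMES, m.groups())
            if value is not None}

if __name__ == "__main__":
    for example in ("zh-Hant-TW", "zh-yue-HK", "zh-min-nan", "x-whatever", "en-"):
        print(example, "->", parse_tag(example))
```

With this labelling, a tag such as zh-yue-HK comes back with the extlang folded into the language component ("zh-yue"), while zh-min-nan only matches through the irregular alternative, which is consistent with the one-extlang restriction described above.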
Mark
Note Well: Messages sent to this mailing list are the opinions
of the senders and do not imply endorsement by the IETF.