Re: [Ltru] Re: Test suite for language tags?
"Mark Davis" <mark.davis@icu-project.org> Sat, 16 September 2006 23:28 UTC
Message-ID: <30b660a20609161628t22ab3c4flc81ea92f40800a09@mail.gmail.com>
Date: Sat, 16 Sep 2006 16:28:01 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: [Ltru] Re: Test suite for language tags?
In-Reply-To: <6.0.0.20.2.20060901024806.109a6d90@localhost>
MIME-Version: 1.0
References: <20060801203351.GA8854@sources.org> <20060802072709.GA17404@nic.fr> <44D21ACD.4040707@yahoo-inc.com> <20060804165720.GA24037@sources.org> <44D4AC42.79E0@xyzzy.claranet.de> <20060830093000.GA31895@nic.fr> <44F6313D.2070000@yahoo-inc.com> <6.0.0.20.2.20060831201004.101ab8d0@localhost> <44F6EF0E.20602@yahoo-inc.com> <6.0.0.20.2.20060901024806.109a6d90@localhost>
Cc: Frank Ellermann <nobody@xyzzy.claranet.de>, ltru@lists.ietf.org
BTW, I have updated my regex to the final spec for 4646. Here is a single Perl or Java regex that does most of the parse. The whitespace is only there for readability, so it has to be compiled with whitespace ignored (Pattern.COMMENTS in Java).

Regex:

  ((?: [a-z A-Z]{2,3} (?: [-] [a-z A-Z]{3} ){0,3} | [a-z A-Z]{4,8} ))
  (?: [-] ((?: [a-z A-Z]{4} )) )?
  (?: [-] ((?: [a-z A-Z]{2} | [0-9]{3} )) )?
  (?: [-] ((?: (?: [0-9] [a-z A-Z 0-9]{3} | [a-z A-Z 0-9]{5,8} ) (?: [-] (?: [0-9] [a-z A-Z 0-9]{3} | [a-z A-Z 0-9]{5,8} ) )* )) )?
  (?: [-] ((?: (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) (?: [-] (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) )* )) )?
  (?: [-] ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ )) )?
  | ( (?i) art [-] lojban
  | cel [-] gaulish
  | en [-] (?: boont | GB [-] oed | scouse )
  | i [-] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo | navajo | pwn | tao | tay | tsu )
  | no [-] (?: bok | nyn)
  | sgn [-] (?: BE [-] fr | BE [-] nl | CH [-] de)
  | zh [-] (?: cmn | cmn [-] Hans | cmn [-] Hant | gan | guoyu | hakka | min | min [-] nan | wuu | xiang | yue))
  | ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ ))

It checks for the grandfathered tags explicitly, since otherwise too much cruft sneaks in.

You can't check in a regex that each singleton extension occurs only once. (In retrospect we could have allowed repeated singletons: we could have accepted en-a-bcdef-ghijk-b-123-a-lmnop as equivalent to the canonical form en-a-bcdef-ghijk-lmnop-b-123, but that's water under the bridge at this point.)

Of course, I didn't put this together by hand. The table used to build it is much more readable, at

  http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagRegex.txt

and a test file that includes strings mentioned on this list is at

  http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagTest.txt

Mark