Mark wrote:

> A good example.

Thank you!

> Here's what I suggest you do:
> The following are all expressible in BCP47, and need to be tossed out:
> 1. All those already with language codes (ang, enm).

These are all ISO 639-2/3/5 codes used within the hierarchical system of ISO 639-6 in order to relate the languages, so they are already within the LSR. It is a simple procedure to remove them.

> 2. All of the written vs non-written differences

Why?

> 3. All "written in <script>" variants

As previously discussed: those that are expressible within BCP 47, yes; the others, of which there are many, no.

> Most of the rest are associated with geographic designations.

Language does tend to be linked to where people live... I will agree with that... but...

> Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.
> - unlXXXXX is a UN/LOCODE minus the country designator (see for this case, http://www.unece.org/cefact/locode/gb.htm)
> - isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB
> So toss out all those that could be expressed in this way (using case just to make the derivation clear):
> en-GB-unlHMA for Helmsdale
> en-GB-unlNCS for Newcastle
> en-GB-isoSCT for Scots
> en-GB-isoZET for Shetlandic

I don't think so... and here's why. Let us look at the Staffordshire dialect as a case in point.

The Staffordshire dialect area, as designated within ISO 639-6, includes the whole of Staffordshire itself plus most of Cheshire, northern Shropshire, and parts of southern Derbyshire, northwestern Warwickshire and northeastern Worcestershire. It could be said that the "Potteries" dialect is within this dialect, but I believe it is a distinct variant. There is no way of defining the "Potteries", or the Staffordshire dialect area for that matter, via ISO 3166-2 or UN/LOCODE; it would be a complete mess. The following example shows some distinctions within this dialect's vowel sounds:

bait is pronounced like beat
beat is pronounced like bait

In fact, I believe that in this dialect the words "It seems the same" are pronounced "It sames the seem"! Quite distinct and very apt to my comments, I think you will agree.

Similarly, the Lincolnshire dialect area consists only of central Lincolnshire, while the Leicestershire area includes most of Leicestershire (other than a bit linked to the Nottinghamshire dialect) plus part of south Nottinghamshire and parts of western Lincolnshire.

I think you get my drift...

Best regards
Debbie

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: 15 July 2009 22:15
To: debbie at ictmarketing.co.uk
Cc: Broome, Karen; Kent Karlsson; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomaly in upcoming registry)

A good example. Here's what I suggest you do:

The following are all expressible in BCP47, and need to be tossed out:
- All those already with language codes (ang, enm).
- All of the written vs non-written differences
- All "written in <script>" variants

Most of the rest are associated with geographic designations. Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.
- unlXXXXX is a UN/LOCODE minus the country designator (see for this case, http://www.unece.org/cefact/locode/gb.htm)
- isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB

So toss out all those that could be expressed in this way (using case just to make the derivation clear):
en-GB-unlHMA for Helmsdale
en-GB-unlNCS for Newcastle
en-GB-isoSCT for Scots
en-GB-isoZET for Shetlandic
...
Then what you have left would be useful to review.
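
A minimal sketch of how the convention above could be applied mechanically. The `unl` and `iso` prefixes are the ones proposed in this message; the function name `geo_variant_tag` is made up for illustration, and none of the resulting subtags are registered variants. The only outside fact used is the BCP 47 (RFC 5646) rule that a variant subtag is 5 to 8 alphanumerics, or 4 characters starting with a digit.

```python
import re

# Well-formedness of a variant subtag per RFC 5646:
# 5-8 alphanumerics, or 4 characters when the first is a digit.
VARIANT_RE = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")


def geo_variant_tag(language, region, scheme, code):
    """Build a tag such as en-GB-unlHMA from a location code.

    scheme is "unl" (UN/LOCODE minus the country designator) or
    "iso" (ISO 3166-2 subdivision minus the country part).
    Both prefixes are the hypothetical convention from this thread.
    """
    if scheme not in ("unl", "iso"):
        raise ValueError("unknown scheme: " + scheme)
    # Upper case is used only to make the derivation visible.
    variant = scheme + code.upper()
    if not VARIANT_RE.match(variant):
        raise ValueError("not a well-formed variant subtag: " + variant)
    return "-".join([language.lower(), region.upper(), variant])


print(geo_variant_tag("en", "GB", "unl", "HMA"))  # en-GB-unlHMA
print(geo_variant_tag("en", "GB", "iso", "SCT"))  # en-GB-isoSCT
```

One consequence of the length rule: a one-character subdivision code would give a four-letter subtag such as `isoX`, which is not a well-formed variant, so the sketch rejects it.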
Mark
On Wed, Jul 15, 2009 at 13:40, Debbie Garside <debbie at ictmarketing.co.uk> wrote:
A more in-depth version of en-GB dialects, representing the draft data for ISO 639-6. And I personally have to take full responsibility for this particular piece of research, so corrections/suggestions etc. are most welcome!

Code  Parent  Name
gmcw  grmc    Germanic West
nsea  gmcw    North Sea
angl  nsea    Anglic
ango  angl    Anglo Saxon
meng  ango    Middle English
enen  meng    Early Northern Middle English
emsc  enen    Early Scots Northern Middle English
msco  emsc    Middle Scots
sco   engc    Scots
scow  sco     Scots Written
scol  scow    Scots Written Latin Script
scos  sco     Scots Spoken
sctl  sco     Scots-L
sctw  sctl    Scots-L Written
sotl  sctw    Scots-L Written Latin Script
llan  sco     Lallans
llaw  llan    Lallans Written
llnl  llaw    Lallans Written Latin Script
budo  sco     Buchan-Doric
wbud  budo    Buchan-Doric Written
lbud  wbud    Buchan-Doric Written Latin Script
stld  scos    Shetlandic
orcd  scos    Orcadian
cthn  scos    Caithness
helm  scos    Helmsdale
bkil  scos    Black-Isle
nirn  scos    Nairn
mray  scos    Moray
bchn  scos    Buchan
abrd  scos    Aberdonian
munh  scos    Mounth
csct  scos    Central-Scots
glca  scos    Glesca
swst  scos    Southwest-Scots
brsc  scos    Border-Scots
ulla  llan    Ullans
wull  ulla    Ullans Written
ulll  wull    Ullans Written Latin Script
sull  ulla    Ullans Spoken
dngl  sull    Donegal
drya  sull    Derry-Antrim
cntd  sull    County-Down
mlde  scos    Madeleine-Scots
otgo  scos    Otago-Scots
nubn  sco     Northumbrian
nrbn  nubn    Northumbrian Spoken
nond  nrbn    Northumberland
grdi  nrbn    Geordie
wrde  nrbn    Wearside
teed  nrbn    Tees-Side
dumc  nrbn    Durham-County
swwe  nrbn    Swaledale-Wensleydale
yrkm  nrbn    Yorkshire-Moors
hmbn  nrbn    Humberside-N
lxyn  nrbn    Lyne
crle  nrbn    Carlisle
vled  nrbn    Vale-Of-Eden
lkln  nrbn    Lakeland-N
lknd  nrbn    Lakeland-S
esse  meng    Early Southern And South Western Middle English
emse  meng    Early Midland And South Eastern Middle English
emen  emse    Early Modern English
aeng  eng     Anglo-English
waen  aeng    Anglo-English Written
laen  waen    Anglo-English Written Latin Script
seng  aeng    Anglo-English Spoken
aenn  seng    Anglo-English North Cluster
naen  aenn    Northern Anglo-English
nann  naen    Northern Anglo-English Northeast
newc  nann    Newcastle
sdld  nann    Sunderland
ddbh  nann    Middlesborough
tyne  nann    Tyneside
naln  naen    Northern Anglo-English Lower North
nalc  naln    Northern Anglo-English Lower North Central
clsl  nalc    Carlisle
shff  nalc    Sheffield
cumb  nalc    Cumbria
angy  nalc    Yorkshire
wngy  angy    Yorkshire Western
hlfx  wngy    Halifax
hddf  wngy    Huddersfield
brdd  wngy    Bradford
lddd  wngy    Leeds
yrkk  wngy    York
lanc  nalc    Lancashire
lacc  lanc    Lancashire Central
hmbe  lanc    Humberside
huuu  nalc    Hull
gmby  nalc    Grimsby
cang  seng    Anglo-English Central Cluster
aenc  cang    Anglo-English West Central Cluster
scou  aenc    Scouse
nwme  aenc    North West Midlands English
mctr  nwme    Manchester
chsr  nwme    Cheshire
shpn  nwme    Shropshire North
drby  nwme    Derbyshire
stff  nwme    Staffordshire
pttr  nwme    Potteries
wmen  cang    West Midlands English
brmm  wmen    Birmingham
blco  wmen    Black Country
ecen  cang    East Central English
cece  ecen    Central Midlands English
nnle  cece    Northern Nottinghamshire-Leicester
nttm  cece    Nottingham
leic  cece    Leicester
neme  ecen    North-East Midlands English
lncn  neme    Lincoln
eden  ecen    East Midlands English
nttt  eden    Nottinghamshire-S
slcn  eden    Lincolnshire-S
snrt  eden    Northamptonshire-S
cmbn  eden    Cambridgeshire-N
grnk  cmbr    Grantham-Newark
pbom  cmbr    Peterborough-Oakham
crby  cmbr    Corby
bnor  cmbr    Northamptonshire Borders
saen  seng    Anglo-English South Cluster
swen  saen    Anglo-English South Southwest
uswe  swen    Upper Southwest English
hrfd  uswe    Herefordshire
shrp  uswe    Shropshire
gloc  uswe    Gloucestershire
wrrc  uswe    Worcestershire
gowp  uswe    Gower Peninsula
spem  uswe    South Pembrokeshire
cswe  swen    Central Southwest English
glct  cswe    Gloucestershire
bucc  cswe    Buckinghamshire-S
ofrd  cswe    Oxfordshire
brke  cswe    Berkshire-S
hmpe  cswe    Hampshire-S
wilt  cswe    Wiltshire
avnn  cswe    Avon
sstt  cswe    Somerset
brst  cswe    Bristol
lswe  swen    Lower Southwest English
dvon  lswe    Devon
lswc  lswe    Cornwall
eacr  lswc    East Cornwall
wecr  lswc    West Cornwall
anec  seng    Anglo-English East Cluster
smen  anec    South Midlands English
nthh  smen    Northamptonshire
bedd  smen    Bedfordshire
camb  smen    Cambridgeshire
buck  smen    Buckinghamshire-NW
eaan  anec    East Anglia
sufn  eaan    Suffolk-NE
nflk  eaan    Norfolk
nfol  nflk    Norfolk Northern
nnoe  nflk    Norfolk Eastern
fenn  eaan    Fens
snea  eaan    Southern-East Anglia
hcen  anec    Home Counties English
cckn  hcen    Cockney
lndn  hcen    London Counties
hant  hcen    Hampshire
nrks  hcen    Berkshire
bckk  hcen    Buckinghamshire

Best regards
Debbie

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: 15 July 2009 18:31
To: Broome, Karen
Cc: Kent Karlsson; debbie at ictmarketing.co.uk; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomaly in upcoming registry)

I think the difference among dialects is important, but 639-6 doesn't work for incorporation into BCP47. The advantage of the BCP47 structure is that it allows reasonable behavior for applications that don't recognize the dialectical distinctions. That is, if we have:
en-US-southern
en-US-newengla
en-US-philly (pronouncing "huge" as /judʒ/)
en-US-general
en-GB-scots
...
(cf http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language)
Then applications that don't recognize the variants can fall back to en-US and en-GB. If these all had different atomic primary language codes, then applications would be forced to keep all the relationship data for all the codes hanging around - which, frankly, they are not going to do.
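
The contrast can be sketched in a few lines. This is only an illustration under stated assumptions: `SUPPORTED` is a made-up set of tags an application has resources for, the truncation loop follows the general shape of the lookup scheme in RFC 4647, and the `PARENT` entries are copied from the draft 639-6 rows quoted elsewhere in this thread.

```python
# Tags the (hypothetical) application has resources for.
SUPPORTED = {"en", "en-US", "en-GB"}


def fallback(tag, supported=SUPPORTED):
    """Drop trailing subtags until a supported tag is found."""
    subtags = tag.split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in supported:
            return candidate
        subtags.pop()
        # RFC 4647 lookup also drops a single-character subtag
        # that would otherwise be left dangling at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


# With atomic codes there is nothing to truncate, so the application
# has to carry the relationship data itself (draft 639-6 rows).
PARENT = {
    "pttr": "nwme", "nwme": "aenc", "aenc": "cang",
    "cang": "seng", "seng": "aeng", "aeng": "eng",
}


def fallback_atomic(code, supported, parent=PARENT):
    """Walk up the parent table until a supported code is found."""
    seen = set()
    while code is not None and code not in seen:
        if code in supported:
            return code
        seen.add(code)  # guard against cycles in the table
        code = parent.get(code)
    return None


print(fallback("en-US-philly"))           # en-US
print(fallback("en-GB-scots"))            # en-GB
print(fallback_atomic("pttr", {"eng"}))   # eng
```

The first function needs no data beyond the tag itself; the second returns nothing useful unless the whole parent chain ships with the application.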
Mark
On Wed, Jul 15, 2009 at 08:37, Broome, Karen <Karen.Broome at am.sony.com> wrote:
For what it's worth, the film and television world does have a pretty heavy requirement for dialect distinctions. We also have a need to identify spoken and written variants. ISO 639-6 also provides a fixed-length tag, which can be advantageous in some situations. While I tend to see ISO 639-6 as an interesting alternative to xml:lang and not necessarily something I'd use within xml:lang, I wanted to correct the assumption that dialect tagging is obscure and the distinction between spoken and written variants is not useful.
Regards,
Karen Broome
-----Original Message-----
From: ltru-bounces at ietf.org on behalf of Kent Karlsson
Sent: Wed 7/15/2009 6:27 AM
To: debbie at ictmarketing.co.uk; 'LTRU Working Group'
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomaly in upcoming registry)

On 2009-07-15 09.00, "Debbie Garside" <debbie at ictmarketing.co.uk> wrote:
> Well for starters, there are separate codes for Catalan and Valencian :-)
So does BCP 47 (well, nearly):
ca
ca-valencia
There is nothing in principle hindering a registration of a variant subtag
specifically for "true" Catalan (no value judgement implied).
> And, I rather like the way ISO 639-6 deals with variants of Chinese.
639-3 also deals with "variants" of Chinese (separate languages, really).
How does 639-6 do it differently (apart from using 4-letter codes instead of
3-letter codes)?
> Perhaps you would like to tell me how many of the 7000+ codes of ISO 639-3
> will be used. My guess is approximately 2-300 at present but over time more
> and more. The answer is the same for ISO 639-6.
>
> Essentially, all the reasons for including ISO 639-6 are the same as for
> including ISO 639-3. Unless of course, you think that ISO 639-3 is perfect
> and defines all languages distinctly and that anything else cannot, is not,
> and definitely is not a language. Then of course you have to decide that
> BCP 47 will only deal with languages and not dialects.
BCP 47 does deal with dialects, using variant subtags. However, it is very,
very far from systematic or comprehensive. It requires individual
registration of each variant. I would venture to guess that that process
will never result in a systematic or (in some sense) comprehensive set
of variant subtags for dialects. On the other hand, the call for tagging
dialects separately currently seems fairly small amongst the consumers of
BCP 47, IMHO.
/kent k
> Then, and only then,
> may you exclude ISO 639-6.
>
>
> Debbie
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.