
Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)




Mark


On Wed, Jul 15, 2009 at 15:26, Debbie Garside <debbie at ictmarketing.co.uk> wrote:
Mark wrote:
 
> A good example.
 
Thank you!
 
> Here's what I suggest you do:

> The following are all expressible in BCP47, and need to be tossed out:
  1. All those already with language codes (ang, enm).
These are all ISO 639-2/3/5 codes used within the hierarchical system of ISO 639-6 in order to relate the languages, so they are already within the LSR.  It is a simple procedure to remove them.
 
> 2.  All of the written vs non-written differences
Why? 

Because that distinction is already expressible in BCP47, as I said.
 
> 3. All "written in <script>" variants
 
As previously discussed: those that are expressible within BCP 47, yes; the others, of which there are many, no.
 
> Most of the rest are associated with geographic designations.
 
Language does tend to be linked to where people live... I will agree with that... but...
 
>  Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.
> So toss out all those that could be expressed in this way (using case just to make the derivation clear):

en-GB-unlHMA for Helmsdale
en-GB-unlNCS for Newcastle
en-GB-isoSCT for Scots
en-GB-isoZET for Shetlandic
 
 
I don't think so... and here's why. Let us look at the Staffordshire dialect as a case in point...
 
The Staffordshire dialect area, as designated within ISO 639-6, includes the whole of Staffordshire itself plus most of Cheshire, northern Shropshire, and parts of Southern Derbyshire, northwestern Warwickshire and northeastern Worcestershire. It could be said that the "Potteries" dialect is within this dialect, but I believe it is a distinct variant. There is no way of defining the "Potteries", or the Staffordshire dialect area for that matter, via ISO or UN/LOCODE codes; it would be a complete mess. The following example shows some distinctions within this dialect's vowel sounds:
 
bait is pronounced like beat
beat is pronounced like bait
 
In fact, I believe that in this dialect the words "It seems the same" are pronounced "It sames the seem"!  Quite distinct and very apt to my comments, I think you will agree.
 
Similarly, the Lincolnshire dialect area consists only of central Lincolnshire, while the Leicestershire area includes most of Leicestershire (other than a bit linked to the Nottinghamshire dialect) plus part of south Nottinghamshire and parts of western Lincolnshire.
 
I think you get my drift...

It doesn't matter. The use of a region code in BCP 47 means that the variant is associated with that region. That does not mean that the region includes all instances, nor that the entire region only uses that variant. And no matter how you "define" a region (e.g. "whole of Staffordshire itself plus most of Cheshire, northern Shropshire, and parts of Southern Derbyshire, northwestern Warwickshire and northeastern Worcestershire"), that will be the case.

 
Best regards
 
Debbie

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: 15 July 2009 22:15
To: debbie at ictmarketing.co.uk
Cc: Broome, Karen; Kent Karlsson; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

A good example. Here's what I suggest you do:

The following are all expressible in BCP47, and need to be tossed out:
  1. All those already with language codes (ang, enm).
  2. All of the written vs non-written differences
  3. All "written in <script>" variants
Most of the rest are associated with geographic designations. Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.
So toss out all those that could be expressed in this way (using case just to make the derivation clear):

en-GB-unlHMA for Helmsdale
en-GB-unlNCS for Newcastle
en-GB-isoSCT for Scots
en-GB-isoZET for Shetlandic
...

Then what you have left would be useful to review.
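
A minimal sketch of the variant convention above, assuming the "unl" and "iso" prefixes stand for a place-code scheme (apparently UN/LOCODE and an ISO code list) followed by a three-letter place code. The function name make_tag and its parameters are hypothetical. The only fixed rule used is the BCP 47 syntax for variant subtags (5 to 8 alphanumerics, or 4 starting with a digit); none of these subtags is registered, so the check is for well-formedness only.

```python
import re

# BCP 47 variant subtag syntax: 5-8 alphanumerics, or a digit plus 3 alphanumerics.
VARIANT = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")


def make_tag(language, region, scheme, place_code):
    """Build language-region-variant, e.g. en-GB-unlHMA (hypothetical convention)."""
    # Mixed case only makes the derivation visible; tags are case-insensitive.
    variant = scheme.lower() + place_code.upper()
    if not VARIANT.match(variant):
        raise ValueError("not a well-formed variant subtag: " + variant)
    return "-".join([language, region, variant])


print(make_tag("en", "GB", "unl", "HMA"))  # en-GB-unlHMA
print(make_tag("en", "GB", "iso", "SCT"))  # en-GB-isoSCT
```

With a three-letter prefix and a three-letter place code the variant is always six characters, which fits the variant length limits.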

Mark


On Wed, Jul 15, 2009 at 13:40, Debbie Garside <debbie at ictmarketing.co.uk> wrote:
Mark wrote:
 
 
A more in-depth version of en-GB dialects, representing the draft data for ISO 639-6. I personally have to take full responsibility for this particular piece of research, so corrections, suggestions etc. are most welcome!
 
gmcw grmc Germanic West
nsea gmcw North Sea
angl nsea Anglic
ango angl Anglo Saxon
meng ango Middle English
enen meng Early Northern Middle English
emsc enen Early Scots Northern Middle English
msco emsc Middle Scots
sco engc Scots
scow sco Scots Written
scol scow Scots Written Latin Script
scos sco Scots Spoken
sctl sco Scots-L
sctw sctl Scots-L Written
sotl sctw Scots-L Written Latin Script
llan sco Lallans
llaw llan Lallans Written
llnl llaw Lallans Written Latin Script
budo sco Buchan-Doric
wbud budo Buchan-Doric Written
lbud wbud Buchan-Doric Written Latin Script
stld scos Shetlandic
orcd scos Orcadian
cthn scos Caithness
helm scos Helmsdale
bkil scos Black-Isle
nirn scos Nairn
mray scos Moray
bchn scos Buchan
abrd scos Aberdonian
munh scos Mounth
csct scos Central-Scots
glca scos Glesca
swst scos Southwest-Scots
brsc scos Border-Scots
ulla llan Ullans
wull ulla Ullans Written
ulll wull Ullans Written Latin Script
sull ulla Ullans Spoken
dngl sull Donegal
drya sull Derry-Antrim
cntd sull County-Down
mlde scos Madeleine-Scots
otgo scos Otago-Scots
nubn sco Northumbrian 
nrbn nubn Northumbrian Spoken
nond nrbn Northumberland
grdi nrbn Geordie
wrde nrbn Wearside
teed nrbn Tees-Side
dumc nrbn Durham-County
swwe nrbn Swaledale-Wensleydale
yrkm nrbn Yorkshire-Moors
hmbn nrbn Humberside-N
lxyn nrbn Lyne
crle nrbn Carlisle
vled nrbn Vale-Of-Eden
lkln nrbn Lakeland-N
lknd nrbn Lakeland-S
esse meng Early Southern And South Western Middle English
emse meng Early Midland And South Eastern Middle English
emen emse Early Modern English
aeng eng Anglo-English
waen aeng Anglo-English Written
laen waen Anglo-English Written Latin Script
seng aeng Anglo-English Spoken
aenn seng Anglo-English North Cluster
naen aenn Northern Anglo-English
nann naen Northern Anglo-English Northeast
newc nann Newcastle
sdld nann Sunderland
ddbh nann Middlesborough
tyne nann Tyneside
naln naen Northern Anglo-English Lower North
nalc naln Northern Anglo-English Lower North Central
clsl nalc Carlisle
shff nalc Sheffield
cumb nalc Cumbria
angy nalc Yorkshire
wngy angy Yorkshire Western
hlfx wngy Halifax
hddf wngy Huddersfield
brdd wngy Bradford
lddd wngy Leeds
yrkk wngy York
lanc nalc Lancashire
lacc lanc Lancashire Central
hmbe lanc Humberside
huuu nalc Hull
gmby nalc Grimsby
cang seng Anglo-English Central Cluster
aenc cang Anglo-English West Central Cluster
scou aenc Scouse
nwme aenc North West Midlands English
mctr nwme Manchester
chsr nwme Cheshire
shpn nwme Shropshire North
drby nwme Derbyshire
stff nwme Staffordshire
pttr nwme Potteries
wmen cang West Midlands English
brmm wmen Birmingham
blco wmen Black Country
ecen cang East Central English 
cece ecen Central Midlands English
nnle cece Northern Nottinghamshire-Leicester
nttm cece Nottingham
leic cece Leicester
neme ecen North-East Midlands English
lncn neme Lincoln
eden ecen East Midlands English
nttt eden Nottinghamshire-S
slcn eden Lincolnshire-S
snrt eden Northamptonshire-S
cmbn eden Cambridgeshire-N
grnk cmbr Grantham-Newark
pbom cmbr Peterborough-Oakham
crby cmbr Corby
bnor cmbr Northamptonshire Borders
saen seng Anglo-English South Cluster
swen saen Anglo-English South Southwest
uswe swen Upper Southwest English
hrfd uswe Herefordshire
shrp uswe Shropshire
gloc uswe Gloucestershire
wrrc uswe Worcestershire
gowp uswe Gower Peninsula
spem uswe South Pembrokeshire
cswe swen Central Southwest English
glct cswe Gloucestershire
bucc cswe Buckinghamshire-S
ofrd cswe Oxfordshire
brke cswe Berkshire-S
hmpe cswe Hampshire-S
wilt cswe Wiltshire
avnn cswe Avon
sstt cswe Somerset
brst cswe Bristol
lswe swen Lower Southwest English
dvon lswe Devon
lswc lswe Cornwall
eacr lswc East Cornwall
wecr lswc West Cornwall
anec seng Anglo-English East Cluster
smen anec South Midlands English
nthh smen Northamptonshire
bedd smen Bedfordshire
camb smen Cambridgeshire
buck smen Buckinghamshire-NW
eaan anec East Anglia
sufn eaan Suffolk-NE
nflk eaan Norfolk
nfol nflk Norfolk Northern
nnoe nflk Norfolk Eastern
fenn eaan Fens
snea eaan Southern-East Anglia
hcen anec Home Counties English
cckn hcen Cockney
lndn hcen London Counties
hant hcen Hampshire
nrks hcen Berkshire
bckk hcen Buckinghamshire
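
The listing above reads as "code parent name" triples. A minimal sketch of how an application might load such rows and recover the chain of parents for one code, assuming that layout holds for every row. The sample rows are copied from the list; the function names parse and ancestors are hypothetical.

```python
SAMPLE = """\
sco engc Scots
scos sco Scots Spoken
helm scos Helmsdale
stld scos Shetlandic
"""


def parse(text):
    """Map each code to (parent code, name) from 'code parent name' rows."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # the name may itself contain spaces
        if len(parts) == 3:
            code, parent, name = parts
            entries[code] = (parent, name.strip())
    return entries


def ancestors(code, entries):
    """Follow parent links upward until a parent has no row of its own."""
    chain = []
    seen = {code}  # guards against accidental cycles in draft data
    parent = entries.get(code, (None, None))[0]
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = entries.get(parent, (None, None))[0]
    return chain


entries = parse(SAMPLE)
print(ancestors("helm", entries))  # ['scos', 'sco', 'engc']
```

The chain stops at the first parent that has no row in the loaded data, which is the case for "engc" in this excerpt.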
 
 
Best regards
 
Debbie


From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: 15 July 2009 18:31
To: Broome, Karen
Cc: Kent Karlsson; debbie at ictmarketing.co.uk; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

I think the difference among dialects is important, but 639-6 doesn't work for incorporation into BCP47. The advantage of the BCP47 structure is that it allows reasonable behavior for applications that don't recognize the dialectal distinctions. That is, if we have:

en-US-southern
en-US-newengla
en-US-philly (pronouncing "huge" as /judʒ/)
en-US-general
en-GB-scots
...
(cf http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language)

Then applications that don't recognize the variants can fall back to en-US and en-GB. If these all had different atomic primary language codes, then applications would be forced to keep all the relationship data for all the codes hanging around - which, frankly, they are not going to do.
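
A minimal sketch of that fallback, in the style of the RFC 4647 "lookup" scheme: trailing subtags are dropped until a supported tag is found. The function name lookup and the supported list are hypothetical, and the variant subtags are the illustrative ones from the list above rather than registered subtags.

```python
def lookup(tag, supported, default=None):
    """Return the closest supported tag by truncating subtags from the end."""
    by_lower = {s.lower(): s for s in supported}  # tags compare case-insensitively
    subtags = tag.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # RFC 4647 lookup also removes a single-character subtag left at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


supported = ["en-US", "en-GB", "ca"]
print(lookup("en-US-philly", supported))  # en-US
print(lookup("en-GB-scots", supported))   # en-GB
print(lookup("ca-valencia", supported))   # ca
```

An application needs no table relating the variants to their parents here; the tag structure itself carries the relationship. With atomic four-letter codes, the same result would need parent data for every code.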

Mark


On Wed, Jul 15, 2009 at 08:37, Broome, Karen <Karen.Broome at am.sony.com> wrote:

For what it's worth, the film and television world does have a pretty heavy requirement for dialect distinctions. We also have a need to identify spoken and written variants. ISO 639-6 also provides a fixed-length tag, which can be advantageous in some situations. While I tend to see ISO 639-6 as an interesting alternative to xml:lang and not necessarily something I'd use within xml:lang, I wanted to correct the assumption that dialect tagging is obscure and the distinction between spoken and written variants is not useful.

Regards,

Karen Broome




-----Original Message-----
From: ltru-bounces at ietf.org on behalf of Kent Karlsson
Sent: Wed 7/15/2009 6:27 AM
To: debbie at ictmarketing.co.uk; 'LTRU Working Group'
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)


On 2009-07-15 09.00, "Debbie Garside" <debbie at ictmarketing.co.uk> wrote:

> Well for starters, there are separate codes for Catalan and Valencian :-)

BCP 47 has that too (well, nearly):
    ca
    ca-valencia

There is nothing in principle hindering a registration of a variant subtag
specifically for "true" Catalan (no value judgement implied).

> And, I rather like the way ISO 639-6 deals with variants of Chinese.

639-3 also deals with "variants" of Chinese (separate languages, really).
How does 639-6 do it differently (apart from using 4-letter codes instead of
3-letter codes)?

> Perhaps you would like to tell me how many of the 7000+ codes of ISO 639-3
> will be used.  My guess is approximately 2-300 at present but over time more
> and more.  The answer is the same for ISO 639-6.
>
> Essentially, all the reasons for including ISO 639-6 are the same as for
> including ISO 639-3.  Unless of course, you think that ISO 639-3 is perfect
> and defines all languages distinctly and that anything else cannot, is not,
> and definitely is not a language.  Then of course you have to decide that
> BCP 47 will only deal with languages and not dialects.

BCP 47 does deal with dialects, using variant subtags. However, it is very,
very far from systematic or comprehensive. It requires individual
registration of each variant. I would venture to guess that that process
will never result in a systematic or (in some sense) comprehensive set
of variant subtags for dialects. On the other hand, the call for tagging
dialects separately currently seems fairly small amongst the consumers of
BCP 47, IMHO.

    /kent k

> Then, and only then,
> may you exclude ISO 639-6.
>
>
> Debbie


_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru








Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.