John Cowan wrote:
Well, and then interpreted as Shift-JIS in the case of my mailer :-(. Our spec as well as the HTTP headers will say that it's UTF-8. That should be enough. People who are okay to see garbage will be as satisfied with Bokmå as they are with Bokm?$B%F!&l, or any other weird rendering, but people who like to see the real thing will be best served by UTF-8.

Bokmål is at least reconstructible without knowing anything except the SGML character reference conventions and the codepoints of Unicode characters; Bokm?$B%F!&l (which is the way it got to me, sans ESC characters) is nothing but rubbish.
Well, that's mozibake: it's the ISO-2022-JP encoding of the Shift-JIS characters produced by misinterpreting the UTF-8 bytes. Lovely.
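For anyone who wants to retrace that chain, here is a rough Python sketch of how the garbled string could arise. It is an illustration under assumptions, not a trace of what any particular mailer did. In particular, the half-width to full-width katakana folding is approximated with NFKC normalization, and the dropping of ESC bytes is simulated by hand.

```python
import unicodedata

original = "Bokmål"

# Step 1: the sender's bytes are UTF-8; "å" becomes the two bytes 0xC3 0xA5.
utf8_bytes = original.encode("utf-8")

# Step 2: a mailer wrongly decodes those bytes as Shift-JIS, where 0xC3 and
# 0xA5 are each a single-byte half-width katakana character.
misread = utf8_bytes.decode("shift_jis")

# Step 3 (assumption): the mailer folds half-width katakana to full-width
# before sending. NFKC normalization stands in for that folding here.
folded = unicodedata.normalize("NFKC", misread)

# Step 4: the folded text is re-encoded as ISO-2022-JP for transport.
wire = folded.encode("iso2022_jp")

# Step 5 (assumption): a later hop drops the ESC bytes that switch
# character sets, leaving only printable ASCII residue.
stripped = wire.replace(b"\x1b", b"").decode("ascii")

print(repr(misread))
print(wire)
print(stripped)
```

The residue contains the same `$B%F!&` sequence quoted above, which is consistent with the UTF-8 to Shift-JIS to ISO-2022-JP explanation.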
This is an *excellent* example of why we need explicit escaping (for which SGML is as good a convention as any) rather than encoding, given the present state of email.
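To make the escaping idea concrete, here is a minimal Python sketch of writing non-ASCII characters as SGML-style numeric character references, so the text survives an ASCII-only mail path and can be reconstructed from the code points alone. The helper name `escape_non_ascii` is hypothetical and chosen for illustration; nothing here is prescribed by the spec under discussion.

```python
import html


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    # "xmlcharrefreplace" is a standard codec error handler that emits &#NNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = escape_non_ascii("Bokmål")
print(escaped)

# Anyone who knows the reference convention can recover the original text.
# html.unescape accepts both decimal and hexadecimal references.
print(html.unescape(escaped))
print(html.unescape("Bokm&#xE5;l"))
```

The escaped form is pure ASCII, so a mailer that guesses the wrong charset has nothing to mangle.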
But that consideration doesn't apply to the registry file. It only applies to the discussion of a registry entry on ietf-languages. Admittedly MUAs vary greatly in their support for encodings, but we can certainly specify directions for how to send the registration request to the list so that everyone can tell what code points are intended and other instructions for how to encode those characters in the registry.
Suggestions to limit the character repertoire strike me as counterproductive too. Why would we do that? I could see, for example, that we might end up with a few Chinese descriptions or comments in the registry to disambiguate Chinese variations. Why artificially restrict it 'ab initio'?
I find this fascination with ASCII slightly quaint.

Addison

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.