Minutes of the Character Set Policy BOF
Chair: Harald Tveit Alvestrand, <Harald.T.Alvestrand@uninett.no>
Reported by: Roland Hedberg
Harald presented the proposed IETF policy for the handling of character sets, which in brief consists of the following: all information in the form of strings for human consumption that is transported using IETF protocols must have its character set and language declared. The default character set should be ISO 10646 (Unicode), with UTF-8 as the transport encoding. Language tags according to RFC 1766 should be used. A short discussion followed, which made it clear that there was rough consensus among the people in the room that this was a good thing.
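As an illustration only, the requirement that human-readable text carry a declared character set and language might be sketched as below. The names TaggedText, make_tagged_text and read_tagged_text are hypothetical and do not come from the proposal; the language tag "no" is an RFC 1766 style tag chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical container: text for human consumption plus its declarations."""
    payload: bytes   # the encoded text as it would travel in a protocol
    charset: str     # declared character set / encoding
    language: str    # declared language tag in the style of RFC 1766


def make_tagged_text(text: str, language: str, charset: str = "utf-8") -> TaggedText:
    # UTF-8 is the default here, mirroring the default named in the policy.
    return TaggedText(text.encode(charset), charset, language)


def read_tagged_text(tagged: TaggedText) -> str:
    # The receiver decodes using the declared charset rather than guessing.
    return tagged.payload.decode(tagged.charset)


msg = make_tagged_text("Blåbærsyltetøy", "no")
```

Because the character set is declared alongside the payload, a receiver never has to guess how to decode the bytes; how a real protocol would carry the two declarations is left open here.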
After a run-through of the reasoning behind the IETF adopting a character set policy, some of the points were discussed. It was concluded that the policy should not deal with glyphs, since that is the business of the application client, nor should the IETF deal with problems inside ISO 10646. Open issues that should be dealt with within the IETF include character set registration and how to define comparison between strings. It was concluded that normalization is a very hard thing to do; it is really a research topic. So is ordering, since it is language dependent. Therefore, we should initially deal only with comparison between strings. In addition, a proposal was made that protocol element names should be in ASCII as long as we don't have rules for name comparisons.
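A small sketch of why string comparison needs a definition: the same visible character can be represented by different code point sequences in ISO 10646. The meeting did not select any comparison rule; the use of Unicode normalization form NFC below is an assumption made purely for illustration, and the function names are hypothetical.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def equal_codepoints(a: str, b: str) -> bool:
    # Plain code-point-by-code-point comparison.
    return a == b


def equal_after_nfc(a: str, b: str) -> bool:
    # Comparison after normalizing both strings to NFC (illustrative choice only).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


same_raw = equal_codepoints(composed, decomposed)
same_normalized = equal_after_nfc(composed, decomposed)
```

The two strings differ under a raw comparison but match once normalized, so two implementations that disagree on the rule would disagree on whether two names are equal.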
Regarding language tags, we do not know which language tags we will need, but we do need one tag with the meaning "the language is unknown." It was also discussed whether we should look to either ISO or the Unicode Consortium for maintenance of the language tags.
A straw poll among the people present showed a rough consensus about the four bullets in Harald's proposal.
Roster Not Received