Re: [Sip] language info in SIP headers
This is actually a fairly thorny problem. From a previous private
discussion, there are only a few possible solutions:
- RFC 2482 or http://www.unicode.org/unicode/reports/tr27/#tag provide
language tagging, as you noted. While I have not seen an official
statement from the IESG on this, I believe 10646 language tagging has
generally been deprecated. It's not quite clear whether the reasons for
that (nesting, I believe) apply to short text such as Subject or
Organization.
- For header fields that allow parameters, I believe the best option is
to add a language parameter, as in
From: "Somebody" <sip:somebody@somewhere.com> ;language="en"
This doesn't allow multiple languages within the same header field, but
is probably useful and sufficient as a hint for text-to-speech and other
UI rendering. It is also backward-compatible. This works for From, To
and a few others, such as the NAI header fields.
- For header fields that don't allow parameters (Subject, Organization,
etc.), there are no good options. In general, I believe we should avoid
such header fields in the future. One possible kludge suggested in
private conversation would be to add a descriptive header that tells you
what language each header field contains. Or, simpler but less
space-efficient, something like
Subject-Language: ja
Organization-Language: en-us
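To make the two ideas above a bit more concrete, here is a rough sketch
(in Python) of how a receiving UA might look up the language hint for a
header field. Everything in it is an assumption for illustration: the
"language" parameter and the "<Field>-Language" companion headers are
only proposals from this thread, not part of any SIP specification, and
the function name header_language is made up. It assumes the headers
have already been parsed into a dict keyed by exact field name.

```python
import re

# Hypothetical: matches the proposed ;language="en" header parameter.
_LANG_PARAM = re.compile(r';\s*language\s*=\s*"?([A-Za-z0-9-]+)"?', re.IGNORECASE)


def header_language(headers, name):
    """Return the language tag hinted for header `name`, or None."""
    value = headers.get(name, "")

    # Look only at what follows the closing '>' of a name-addr, so that
    # URI parameters and the display name are not mistaken for the tag.
    _, sep, tail = value.rpartition(">")
    if sep:
        match = _LANG_PARAM.search(tail)
        if match:
            return match.group(1).lower()

    # Fall back to the proposed companion header, e.g. Subject-Language.
    companion = headers.get(name + "-Language")
    return companion.strip().lower() if companion else None


headers = {
    "From": '"Somebody" <sip:somebody@somewhere.com> ;language="en"',
    "To": "<sip:other@somewhere.com>",
    "Subject": "Lunch",
    "Subject-Language": "ja",
    "Organization": "Example",
    "Organization-Language": "en-us",
}

for field in ("From", "To", "Subject", "Organization"):
    print(field, header_language(headers, field))
```

A UA that finds no hint (the To field here) would simply fall back to
whatever default font or text-to-speech voice it uses today, which is
why both variants stay backward-compatible.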
Any other suggestions?
Shingo Fujimoto wrote:
> Hello. I would like to raise a new issue.
>
> The SIP RFC specifies a UTF-8 friendly transport, but I do not think
> it specifies how to use multi-byte characters in the header part of
> SIP messages.
>
> If I use Japanese Kanji characters in the From: header (as the
> display name), the receiving UA cannot determine which language the
> UTF-8 encoded text is in. Additionally, since the CJK ideographs in
> Unicode are shared among Chinese, Japanese, and Korean, the UA
> sometimes cannot determine which font should be used to display
> them.
>
> MIME Part 3 and RFC 2231, which updates it, introduced a solution for
> this problem that combines a character set and a language tag.
> However, RFC 2231 always requires escaping characters with the 8th
> bit set, so expressing multi-byte characters carries a large overhead.
>
> As another option, the recent Unicode 3.x specification includes
> "language tag" characters to solve this problem, but support for
> this new feature is not common in UTF-8 handling libraries. I
> suspect that would result in corrupted display in UAs.
>
> My suggestion is that we prepare a supplemental document for SIP
> that specifies how to express international characters with language
> information in the header part (a modified RFC 2231, or UTF-8 with
> "lang tag" support).
>
> I would like to hear your opinion.
>
> Thank you.
_______________________________________________
Sip mailing list https://www1.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors@cs.columbia.edu for questions on current sip
Use sipping@ietf.org for new developments on the application of sip