Re: [Sip] language info in SIP headers



I am happy to hear you already had some discussions.

Henning Schulzrinne wrote:

> - RFC 2482 or http://www.unicode.org/unicode/reports/tr27/#tag provide
> language tagging, as you noted. While I have not seen an official
> statement from the IESG on this, I believe 10646 language tagging has
> generally been deprecated. It's not quite clear whether the reasons (I
> believe, nesting) for that apply to short text such as Subject or
> Organization.


You are probably right.

Here is the comment from Ned Freed, one of the RFC2231 authors:

> Language indicators in a pure UTF-8 environment like SIP headers have to be
> stored as part of the UTF-8 string itself. The plane 14 language tags described
> in
> 
>    http://www.unicode.org/unicode/reports/tr27/#tag
> 
> are the only means of doing this that I'm aware of. However, the use
> of such tags is, as the text indicates, strongly discouraged.
> 
> It is the Unicode Technical Committee's belief that
> 
>   The requirement for language information embedded in plain text data is often
>   overstated.
> 
> As such, I see little chance for support for such tagging in the context of SIP
> headers being widely deployed.
> 
> 				Ned
> 

The reason I prefer this approach is that SIP messages must be small
when they run over UDP. A "language tag" would add only a few bytes.

The problem is backward compatibility with widely deployed SIP implementations.
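
To make the size argument concrete, here is a minimal sketch of RFC 2482
style tagging. It assumes the Plane 14 layout (U+E0001 LANGUAGE TAG followed
by tag characters at U+E0000 plus the ASCII code); the function names are
mine, for illustration only. Note that each tag character takes 4 bytes in
UTF-8, so tagging with "en" costs 12 bytes.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at U+E0000 + code


def plane14_tag(lang: str) -> str:
    """Return the Plane 14 tag sequence for an ASCII language code."""
    if not lang.isascii():
        raise ValueError("language code must be ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tag_text(text: str, lang: str) -> str:
    """Prefix text with its language tag (hypothetical helper)."""
    return plane14_tag(lang) + text


tagged = tag_text("Hello", "en")
overhead = len(tagged.encode("utf-8")) - len("Hello".encode("utf-8"))
print(overhead)  # 3 tag characters x 4 bytes each = 12
```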


> - For header fields that allow parameters, I believe the best option is
> to add a language parameter, as in
> 
> From: "Somebody" <sip:somebody@somewhere.com> ;language="en"
> 
> This doesn't allow multiple languages within the same header field, but
> is probably useful and sufficient as a hint for text-to-speech and other
> UI rendering. It is also backward-compatible. This works for From, To
> and a few others, such as the NAI header fields.


I think this approach is fair. I guess it is quite rare for multiple
languages to be used in a short text field.
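
A receiver could pick up such a parameter roughly as follows. This is only
a sketch: the parameter name "language" comes from the example above and is
not a defined SIP parameter, and the function name is made up. It is not a
full SIP header parser.

```python
import re

# Matches ;language="en" or ;language=en (hypothetical parameter name).
_LANGUAGE_PARAM = re.compile(
    r';\s*language\s*=\s*"?([A-Za-z0-9-]+)"?', re.IGNORECASE
)


def header_language(header_value, default=None):
    """Return the language hint of a From/To style header value, if any."""
    # Only look after the closing '>' so that text inside the URI or the
    # display name is not mistaken for a header parameter.
    _, _, params = header_value.rpartition(">")
    match = _LANGUAGE_PARAM.search(params)
    return match.group(1) if match else default


value = '"Somebody" <sip:somebody@somewhere.com> ;language="en"'
print(header_language(value))
```

An implementation that does not know the parameter simply ignores it, which
is why this option is backward-compatible.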


> - For header fields that don't allow parameters (Subject, Organization,
> etc.), there are no good options. In general, I believe we should avoid
> such header fields in the future. One possible kludge suggested in
> private conversation would be to add a descriptive header that tells you
> what language each header field contains. Or, simpler but less
> space-efficient, something like
> 
> Subject-Language: jp
> Organization-Language: en-us


I think these are as good as the previous one.
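
For illustration, looking up the language of a parameterless header with
such companion fields might look like this. The "<Name>-Language" headers
are the suggestion quoted above, not defined SIP headers, and the helper
name is an assumption of mine.

```python
def language_for(headers, name, default=None):
    """Find the '<name>-Language' companion header, case-insensitively."""
    wanted = f"{name}-Language".lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value.strip()
    return default


headers = {
    "Subject": "Meeting",
    "Subject-Language": "jp",
    "Organization-Language": "en-us",
}
print(language_for(headers, "Subject"))
```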

> Any other suggestions?


Another option is to use MIME part 3 with a small modification.
Because MIME part 3 currently does not allow any characters with the
8th bit set in MIME headers, the encoding always escapes UTF-8 characters.

Here is an example of a MIME part 3 header:

From: =?UTF-8*JP?B?<base64 encoded utf-8 string>?= <sip:foo@bar.org>

The base64 part is about 4/3 the size of the raw UTF-8 string, and the
encoded-word wrapper adds further overhead.

Suppose we define a new encoding, say 'U', which escapes only the special
characters ('=', '?', and '%') using '%xx' escaping:

From: =?UTF-8*JP?U?<raw utf-8 string>?= <sip:foo@bar.org>

This extension avoids the overhead on 8-bit characters, and should not
break any older implementations.
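
Here is a rough sketch of what I mean, comparing the size of the same text
under the existing 'B' encoding and the proposed 'U' encoding. The 'U'
encoding does not exist in any specification; the function names and the
choice of uppercase hex digits are my assumptions.

```python
import base64
import re

SPECIAL = "=?%"  # characters the proposed 'U' encoding would escape


def u_encode(text: str) -> str:
    """Escape only '=', '?' and '%' as %XX; leave everything else raw."""
    return "".join(f"%{ord(c):02X}" if c in SPECIAL else c for c in text)


def u_decode(payload: str) -> str:
    """Reverse u_encode in a single pass, so '%25' is never decoded twice."""
    return re.sub(
        r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), payload
    )


def encoded_word(text: str, lang: str, encoding: str) -> bytes:
    """Build '=?UTF-8*<lang>?<encoding>?<payload>?=' as UTF-8 bytes."""
    if encoding == "B":
        payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    elif encoding == "U":
        payload = u_encode(text)
    else:
        raise ValueError("unsupported encoding")
    return f"=?UTF-8*{lang}?{encoding}?{payload}?=".encode("utf-8")


# Five hiragana characters, 15 bytes in UTF-8.
text = "\u3053\u3093\u306b\u3061\u306f"
size_b = len(encoded_word(text, "JP", "B"))
size_u = len(encoded_word(text, "JP", "U"))
print(size_b, size_u)  # 35 30
```

Because '?' is always escaped, the payload can never contain the closing
"?=" sequence by accident.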

Additionally, if we can assume UTF-8 and the 'U' encoding, then we can
shorten the escaping syntax too.

From: =?JP?<raw utf-8 string>?= <sip:foo@bar.org>

This is still valid as a display-name value.
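
A receiver could extract the language and text from this shortened form
along these lines. Again, this is only a sketch of the idea: the pattern and
the function name are my assumptions, and it relies on the '%xx' escaping
described earlier so that '?' never appears raw inside the text.

```python
import re

# =?<lang>?<raw utf-8 text with %xx escapes>?=
SHORT_FORM = re.compile(r"=\?([A-Za-z0-9-]+)\?(.*?)\?=", re.DOTALL)


def parse_short_form(value):
    """Return (lang, text) from the shortened form, or None if absent."""
    match = SHORT_FORM.search(value)
    if match is None:
        return None
    lang, payload = match.groups()
    # Undo the %xx escaping in a single pass.
    text = re.sub(
        r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), payload
    )
    return lang, text


print(parse_short_form("=?JP?50%25 off?= <sip:foo@bar.org>"))
```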

Both Keith and Ned suggested to me that if we define the 'U' encoding, it
should be limited to the SIP specification.

-- 
======================================
         SHINGO FUJIMOTO
    FUJITSU LABORATORIES LIMITED
E-mail: shingo_fujimoto@jp.fujitsu.com


