
Re: [Sip] language info in SIP headers



> I think the point Ned is making here, is that language
> tagging is not all that useful in practice. With UTF-8,
> you can render Kanji characters. You need the language
> tagging IFF some automata needs to do something special
> based on knowledge of the language, such as translation
> or text to speech. However, it is far from clear whether
> that information is truly needed. Translation of names
> (like those in Subject, From, To) is not likely to work
> in any case, nor is it clear that the language tag is
> needed for it...

You're about to step into the Han unification minefield. From my limited
understanding, the same Unicode/10646 code point is rendered differently
depending on the language (Japanese, Chinese, etc.). I don't know if
this is also true for text-to-speech. Certainly, text-to-speech is
useful for all three headers, e.g., in a voicemail system. Jonathan
Rosenberg ;lang=de and Jonathan Rosenberg ;lang=en sound very different.
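
To make the voicemail case concrete, here is a rough sketch of how a
receiver might split a trailing ;lang= parameter off a display name and
pick a text-to-speech voice from it. The parameter syntax follows the
informal ;lang=de notation above and is not taken from any SIP
specification; the voice names and the split_lang/pick_voice helpers are
hypothetical.

```python
import re

# Hypothetical mapping from primary language subtag to a TTS voice name.
VOICES = {"de": "german-voice", "en": "english-voice"}

# Matches a trailing ";lang=xx" or ";lang=xx-YY" parameter (assumed syntax).
PARAM_RE = re.compile(r";\s*lang\s*=\s*([A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)\s*$")


def split_lang(value, default="en"):
    """Return (text, lang) for a value with an optional trailing ;lang= parameter."""
    match = PARAM_RE.search(value)
    if match is None:
        return value.strip(), default
    return value[:match.start()].strip(), match.group(1).lower()


def pick_voice(value):
    """Return (text, voice), falling back to the English voice for unknown tags."""
    text, lang = split_lang(value)
    primary = lang.split("-")[0]
    return text, VOICES.get(primary, VOICES["en"])


if __name__ == "__main__":
    for header in ("Jonathan Rosenberg ;lang=de", "Jonathan Rosenberg ;lang=en"):
        print(pick_voice(header))
```

The same display name ends up with two different voices, which is the
whole point of carrying the tag outside the text itself.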

> Well, as Ned indicated, and Henning repeated, the in-band
> language tagging provided by UTF-8 is not considered
> acceptable, and we would not be able to write a specification
> which recommends it as the solution for sip.

I think he meant the ?=jp? marking, not UTF-8 language tagging. (That
said, it isn't clear to me why this would be inherently better, except
maybe that it avoids tripping up 10646 renderers that don't understand
language tags and annoying the user with random characters. The arguments
against tagging seem to be about the same in either syntax: nesting and
all that.)

A pragmatic choice is to say that To/From don't need UTF-8 or ?=? tagging,
since ;language=jp will work just fine, without annoying users with
random ASCII art. The Subject header is more difficult, but there a
smart text-to-speech renderer can usually guess the language by
checking its dictionary. From my vague recollection, very few words are
needed for fairly accurate language recognition, except for deliberate
attempts to create phrases that are meaningful in two languages. Google
seems to do an OK job of guessing on most web pages, for example. (Guessing
isn't pretty, but many users aren't going to bother marking their
Subject line anyway, and they routinely use multiple languages, so
device configuration alone isn't sufficient.)
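
As a rough illustration of the dictionary idea, a guesser can count how
many words of a Subject line appear in a short list of common function
words per language and take the best score. The word lists below are tiny
and purely illustrative, and guess_language is a hypothetical helper, not
how any particular renderer or search engine does it.

```python
import re

# Tiny illustrative function-word lists; a real system would use far larger dictionaries.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "for", "in", "on", "about"},
    "de": {"der", "die", "das", "und", "ist", "für", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "pour", "une", "des", "dans"},
}


def guess_language(text, default=None):
    """Guess the language of text by counting known function words.

    Returns default when nothing matches or the top score is tied.
    """
    # Runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return default  # no evidence, or ambiguous between languages
    return best


if __name__ == "__main__":
    print(guess_language("Notes for the meeting on Monday"))
    print(guess_language("Die Besprechung ist für Montag"))
```

A tie or a zero score returns the default rather than a guess, which
covers the phrases that are deliberately meaningful in two languages.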

> 
> -Jonathan R.


_______________________________________________
Sip mailing list  https://www1.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors@cs.columbia.edu for questions on current sip
Use sipping@ietf.org for new developments on the application of sip