
Re: [Sip] language info in SIP headers



I don't think this is a major issue, but there is a semantic difference. 
Accept-Language lists *all* the languages I'm willing to deal with, in 
terms of error messages, so it might be something like

Accept-Language: en, de, fr

This doesn't tell the receiver's text-to-speech engine anything about how 
to pronounce my name. As noted, for personal names, even the same ASCII 
string (ignoring the much more complicated CJK sets) can be pronounced 
quite differently depending on the language, particularly where the name 
has been "imported". I believe the problem is more pronounced with CJK 
since Chinese names are rendered in "classical" Chinese, rather than 
simplified. (And I'm probably getting the details wrong.)
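
To make the difference concrete, here is a rough Python sketch, not taken 
from any specification. It assumes the ;lang= parameter on a From/To value 
that was floated earlier in this thread (it is not a standardized SIP 
parameter), and the function names parse_accept_language and name_language 
are made up for illustration. The first function only yields an ordered 
list of languages the sender accepts; only the second says anything about 
one particular name.

```python
def parse_accept_language(value):
    """Return the language tags of an Accept-Language value, highest q first."""
    ranked = []
    for position, item in enumerate(value.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat a malformed q as least preferred
        # Sort by descending q, then by original position for ties.
        ranked.append((-q, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(ranked)]


def name_language(header_value, default=None):
    """Extract the hypothetical ;lang= parameter from a From/To value.

    Assumption: header parameters follow the closing '>' of the URI, or the
    display name when no angle brackets are present. Quoted display names
    containing ';' are not handled in this sketch.
    """
    _, _, tail = header_value.rpartition(">")
    for param in tail.split(";")[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "lang":
            return val.strip().lower()
    return default
```

With these, parse_accept_language("en, de, fr") gives the list of acceptable 
languages in order, while name_language("Jonathan Rosenberg ;lang=de") gives 
the one language a text-to-speech engine would use for that name.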


Sean Olson wrote:
> Is there a problem to be solved here that cannot
> be handled by Accept-Language and Content-Language?
> I realize these only cover the payload and not the
> headers. My assumption is that most of the information
> that you would need to extract from the headers
> can either be treated as a string of octets by an
> automaton
> -or-
> is for human consumption, in which case it is
> sufficient that it can be rendered appropriately
> to the end user (ignoring the Han unification 
> problem for now), which UTF-8 should cover. Proper
> names as in the From/To are a good example of this.
> 
> What automated application cannot make use of the 
> Accept-Language header to do its job? (I'm not
> saying one does not exist, I would just like
> clarification of the problem we are trying to solve 
> here.)
> 
> Regards,
> Sean Olson
> 
> --- Henning Schulzrinne <hgs@cs.columbia.edu> wrote:
> 
>>>I think the point Ned is making here, is that language
>>>tagging is not all that useful in practice. With UTF-8,
>>>you can render Kanji characters. You need the language
>>>tagging IFF some automata needs to do something special
>>>based on knowledge of the language, such as translation
>>>or text to speech. However, it is far from clear whether
>>>that information is truly needed. Translation of names
>>>(like those in Subject, From, To) is not likely to work
>>>in any case, nor is it clear that the language tag is
>>>needed for it...
>>
>>You're about to step into the Han unification minefield. From my limited
>>understanding, the same Unicode/10646 code point is rendered differently
>>depending on the language (Japanese, Chinese, etc.). I don't know if
>>this is also true for text-to-speech. Certainly, text-to-speech is
>>useful for all three headers, e.g., in a voicemail system. Jonathan
>>Rosenberg ;lang=de and Jonathan Rosenberg ;lang=en sound very different.
>>
>>
>>>Well, as Ned indicated, and Henning repeated, the in-band
>>>language tagging provided by UTF-8 is not considered
>>>acceptable, and we would not be able to write a specification
>>>which recommends it as the solution for sip.
>>
>>I think he meant the ?=jp? marking, not UTF-8 language tagging. (That
>>said, it isn't clear to me why this would be inherently better, except
>>maybe to avoid tripping non-language-tag-capable 10646 renderers and
>>annoying the user with random characters instead... The arguments against
>>tagging seem to be about the same in either syntax - nesting, and all that.)
>>
>>A pragmatic choice is to say that To/From don't need UTF or ?=? tagging,
>>since ;language=jp will work just fine, without annoying users with
>>random ASCII art. The Subject header is more difficult, but there, a
>>smart text-to-speech renderer can usually guess at the language by
>>checking its dictionary. From my vague recollection, very few words are
>>needed to do fairly accurate language recognition, excepting deliberate
>>attempts to create phrases that are meaningful in two languages. Google
>>seems to do an ok guessing job on most web pages, for example. (Guessing
>>isn't pretty, but many users aren't going to bother marking their
>>Subject line anyway and routinely use multiple languages, so that device
>>configuration isn't sufficient.)
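
The dictionary check described above for the Subject header could look 
roughly like the following Python sketch. It is only an illustration under 
stated assumptions: the COMMON_WORDS lists are tiny hand-picked samples 
rather than real dictionaries, guess_language is a made-up name, and a tie 
or a zero score is reported as "unknown" (the default) instead of a guess.

```python
import re

# Tiny illustrative word lists (an assumption of this sketch); a real
# text-to-speech system would consult much larger dictionaries.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "for", "you", "on"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "für", "mit"},
    "fr": {"le", "la", "les", "et", "est", "pas", "pour", "avec"},
}


def guess_language(text, default=None):
    """Guess the language of a short text by counting common-word hits."""
    # Split into runs of letters, so accented characters stay inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in COMMON_WORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    # No hits at all, or a tie between the top two: refuse to guess.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return default
    return max(scores, key=scores.get)
```

A Subject line such as "Das Treffen ist nicht mit Anna" scores four hits 
for German and none elsewhere, while a bare name such as "Rosenberg" scores 
nothing and falls back to the default, which is why the name itself still 
needs explicit marking.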
>>
>>
>>>-Jonathan R.
>>
>>
>>_______________________________________________
>>Sip mailing list  https://www1.ietf.org/mailman/listinfo/sip
>>This list is for NEW development of the core SIP Protocol
>>Use sip-implementors@cs.columbia.edu for questions on current sip
>>Use sipping@ietf.org for new developments on the application of sip

