
With Respect to MRCPv2
A media server contains media processing components that manipulate RTP streams. Typical processing includes mixing multiple streams, transcoding a stream (e.g., from G.711 to MS-GSM), storing or retrieving a stream (e.g., from RTP to HTTP), detecting tones (e.g., DTMF), converting text to speech, and performing speech recognition. Note that an MRCPv2 server may offer the low-level processing for the last two services, in which case the media server acts as a client of the MRCPv2 server. Also note that the combination of detecting user input, recording media, and playing media is commonly called "Interactive Voice Response" (IVR).
Media services offered by the media server are addressed using SIP mechanisms, such as those described in RFC 4240. Media servers commonly have a built-in VoiceXML interpreter. VoiceXML describes the elements of the user interaction and is a proven model for separating application logic (which runs on the clients of the media server) from the user interface (which the media server renders). Note that this is a fundamentally different interaction model from MRCPv2, where media processing engines offer raw, low-level speech services.
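
To make the SIP addressing concrete, here is a minimal sketch of how a client might build RFC 4240 style Request-URIs for the announcement, conference, and VoiceXML dialog services. The user parts (annc, conf=, dialog) and the play and voicexml parameters come from RFC 4240; the helper names, host names, and the choice of which characters to percent-escape are assumptions made for illustration only.

```python
from urllib.parse import quote


def _escape(value: str) -> str:
    # Assumption: keep ":" and "/" readable and percent-escape the rest
    # (";", "?", "=", ...), which would otherwise clash with SIP URI syntax.
    return quote(value, safe=":/")


def annc_uri(host: str, prompt_url: str) -> str:
    """Announcement service: play the prompt found at prompt_url."""
    return f"sip:annc@{host};play={_escape(prompt_url)}"


def conf_uri(host: str, conf_id: str) -> str:
    """Conference service: join the mixing session named conf_id."""
    return f"sip:conf={_escape(conf_id)}@{host}"


def dialog_uri(host: str, script_url: str) -> str:
    """Dialog service: run the VoiceXML script found at script_url."""
    return f"sip:dialog@{host};voicexml={_escape(script_url)}"


if __name__ == "__main__":
    # Hypothetical hosts and resources, used only to show the URI shapes.
    print(annc_uri("ms.example.net", "http://audio.example.net/busy.g711"))
    print(conf_uri("ms.example.net", "team42"))
    print(dialog_uri("ms.example.net", "http://apps.example.net/menu.vxml?lang=en"))
```

In this model the application selects a service simply by choosing the Request-URI of its INVITE, and the VoiceXML script named in the dialog URI carries the user-interface description that the media server renders.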