Working Group AVT                                               C. Hoene
Internet Draft                                   University of Tuebingen
Intended status: Standards Track                              F. de Bont
Expires: December 2010                               Philips Electronics
                                                            June 9, 2010


           RTP Payload Format for Bluetooth's SBC audio codec
                     draft-ietf-avt-rtp-sbc-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 9, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document specifies a Real-time Transport Protocol (RTP) payload
   format for the low-complexity subband codec (SBC), which is the
   mandatory audio codec of the Advanced Audio Distribution Profile
   (A2DP) Specification written by the Bluetooth(r) Special Interest
   Group (SIG).  The payload format is designed to interoperate with
   existing Bluetooth A2DP devices and to provide high-quality audio
   streaming and interactive audio transmission over the Internet.  It
   also supports ultra-low-delay coding for jam sessions over the
   Internet.  This document also contains a media type registration
   that specifies the use of the RTP payload format.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Background
   4. Usage Scenarios
      4.1. Scenario 1: Interconnection of A2DP devices
      4.2. Scenario 2: High quality interactive audio transmissions
      4.3. Scenario 3: Ensembles performing over a network
   5. Header Usage
   6. Payload Format
      6.1. Media payload format header
      6.2. SBC Frame Structure
      6.3. Frame header
      6.4. Remaining frame
   7. Payload Format Parameters
      7.1. SBC Media Type Registration
         7.1.1. Capabilities: A2DP modes
         7.1.2. Capabilities: other modes
      7.2. Mapping to SDP Parameters
         7.2.1. Offer-Answer Model Considerations
         7.2.2. Declarative SDP Considerations
   8. Congestion Control
   9. Packet loss concealment
   10. Security Considerations
   11. IANA Considerations
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Acknowledgments

1. Introduction

   In the Advanced Audio Distribution Profile (A2DP) [A2DPV10], the
   Bluetooth(r) Special Interest Group (SIG) specifies a high-quality
   subband codec (SBC) for mono and stereo audio.  This document
   specifies the payload format for encapsulating SBC-encoded audio
   frames in the Real-time Transport Protocol (RTP).

   SBC has low computational complexity and achieves moderate
   compression ratios.  Its bit rate can be controlled over a wide
   range.  The recommended operational modes range from 127 to
   345 kb/s for mono and stereo audio signals.  SBC's algorithmic delay
   can be as low as 16 samples.  This makes it well suited for ensembles
   playing music over the network, where ultra-low acoustic delays are
   required.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The following acronyms are used in this document:

      A2DP  - Advanced Audio Distribution Profile
      AAC   - Advanced Audio Coding
      ATRAC - Adaptive Transform Acoustic Coding
      DCCP  - Datagram Congestion Control Protocol
      MP3   - MPEG-1 Audio Layer 3
      SBC   - SubBand Codec
      SIG   - Special Interest Group

3. Background

   The A2DP specification is intended for streaming music content to
   headphones, headsets, or speakers over Bluetooth wireless channels.
   A2DP supports multiple audio codecs, including MP3, AAC, and ATRAC,
   all of which are optional.  To ensure interoperability, the SBC
   codec has been specified, and it shall be included in all A2DP
   Bluetooth devices.

   SBC is a low-complexity subband codec based on earlier work presented
   in [Bon1995] and [Rault1989].  It has a moderate compression ratio.
   The SBC encoder uses a filter bank to split the audio signal into 4
   or 8 subbands.  The codec then decides how many bits are used to
   encode each subband and finally quantizes the subband signals
   blockwise.  An SBC frame contains 4, 8, 12, or 16 blocks.  This
   number is called the block length.  Both decoder and encoder shall
   support all four block lengths.

   SBC can operate at four sampling frequencies: 16, 32, 44.1, and
   48 kHz.  Every SBC decoder shall be able to operate at 44.1 and
   48 kHz.  Every SBC encoder shall support at least one of the
   sampling frequencies 44.1 and 48 kHz.

   Four channel modes are supported: mono, dual channel, stereo, and
   joint stereo.  The decoder shall support all four of them; the
   encoder shall support mono and at least one additional mode.

   SBC can use four or eight subbands.  The decoder shall support both;
   the encoder shall support at least 8 subbands.

   The bit allocation of SBC can be based either on the signal-to-noise
   ratio or on loudness.  The decoder shall support both modes; the
   encoder shall support at least the loudness mode.

   The SBC encoder reduces each block to a given number of bits.  The
   bit-pool variable defines how many bits are used per block.  A2DP
   devices define the range of valid bit-pool values by providing
   minimum and maximum bit-pool values.
   The bit-pool values shall range from 2 to 250.  They shall not be
   larger than the number of subbands times 16 for the mono and dual
   channel modes, or times 32 for the stereo and joint stereo channel
   modes.

   SBC encoders inside A2DP devices may be capable of changing the
   bit-pool parameter dynamically during the encoding process.  For
   example, algorithms have been developed that change the number of
   bits depending on the current acoustic content [Pilati2008].

   The decoder shall support all bit-pool values that do not result in
   a bit rate exceeding the maximum.  The maximum is 320 kb/s for mono
   and 512 kb/s for two-channel modes.  The encoder is required to
   support at least one bit-pool value.  The A2DP specification
   recommends the encoding parameters given in Table 1.

   +------------------------------------------------------------+
   |           SBC encoder settings at Medium Quality           |
   +--------------------------------+-------------+-------------+
   |                                |    Mono     | Joint Stereo|
   | Sampling frequency (kHz)       | 44.1 |  48  | 44.1 |  48  |
   | Bitpool value                  |  19  |  18  |  35  |  33  |
   | Resulting frame length (bytes) |  46  |  44  |  83  |  79  |
   | Resulting bit rate (kb/s)      | 127  | 132  | 229  | 237  |
   +--------------------------------+------+------+------+------+
   |            SBC encoder settings at High Quality            |
   +--------------------------------+-------------+-------------+
   |                                |    Mono     | Joint Stereo|
   | Sampling frequency (kHz)       | 44.1 |  48  | 44.1 |  48  |
   | Bitpool value                  |  31  |  29  |  53  |  51  |
   | Resulting frame length (bytes) |  70  |  66  | 119  | 115  |
   | Resulting bit rate (kb/s)      | 193  | 198  | 328  | 345  |
   +--------------------------------+------+------+------+------+
   | Other settings: Block length = 16, loudness, subbands = 8  |
   +------------------------------------------------------------+

   Table 1: Recommended sets of SBC parameters in the source (SRC)
            device as given in [A2DPV10]

   This document adopts the media payload format described in the A2DP
   V1.0 specification without any change.

4. Usage Scenarios

   Compared with many other encoding schemes, SBC is general enough to
   support multiple, quite diverse usage scenarios.  Therefore, the
   encoding and transmission behavior might need to be adapted to
   achieve good performance in a given usage scenario.  The following
   sections describe three main scenarios, their quality requirements,
   and their impact on encoding and transmission.

4.1. Scenario 1: Interconnection of A2DP devices

   This scenario interconnects Bluetooth A2DP devices.  Media packets
   generated by an A2DP device can be transmitted directly using this
   RTP payload format.  Conversely, an A2DP device should be able to
   receive this payload format by default.  Thus, the payload format
   described in this document MUST be fully interoperable with any A2DP
   device.

   The transmission between two A2DP devices has a constant frame rate
   with a sender-controlled bit rate.  The transmission is not expected
   to adapt to congestion or bandwidth variations.

4.2. Scenario 2: High quality interactive audio transmissions

   The second scenario is a telephone call with very good audio quality
   at modest acoustic one-way latencies of 50 to 150 ms [ITUG107].  This
   allows two persons to listen to music over the telephone while
   talking interactively.

   In addition, the audio transmission should remain reliable even when
   the available bandwidth is low or varying.

   This scenario assumes that SBC is transmitted on top of a transport
   protocol that implements a congestion control algorithm.  The
   sampling, bit, and frame rates of the SBC stream should be
   controlled to cope with congestion.
   For example, if the available transmission bandwidth is too low to
   transmit SBC audio at high quality, the application can lower the
   sampling, bit, or frame rate of the stream.  This comes at the cost
   of a higher algorithmic delay or a degraded audio quality.  Changing
   the sampling or frame rate may cause a short acoustic artifact
   because SBC's internal filters must be reset.

   The A2DP media format does not allow dynamic changes of the encoding
   parameters other than the bit-pool value.  The other encoding
   parameters can only be altered with the "Change Parameters"
   procedure, which is defined in [GAVDPV12].  Such a change causes an
   audible interruption and should therefore be avoided.

   An application using RTP may want to switch between different sets
   of encoding parameters.  In that case, these sets MAY either be
   negotiated beforehand (as described in Section 7.2) or be
   renegotiated using a procedure similar to "Change Parameters".  An
   application MUST NOT change the sampling frequency, block length,
   channel mode, allocation method, or number of subbands within one
   RTP session while keeping the same RTP payload type.

4.3. Scenario 3: Ensembles performing over a network

   In some usage scenarios, users want to act simultaneously and not
   just interactively.  For example, persons singing in a choir,
   musicians jamming, or e-sports players gaming as a team need to
   communicate acoustically.

   In these scenarios, the latency requirements are much stricter than
   for interactive use.  For example, if two musicians are placed more
   than 10 meters apart, they can hardly stay synchronized.  Empirical
   studies [Gurevich2004] have shown that for ensembles playing over
   networks, the optimal acoustic latency is around 11.5 ms, with a
   target range of 10 to 25 ms.
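
   Scenarios 2 and 3 both adjust the encoding parameters.  These
   parameters determine the frame size, the bit rate, and the audio
   time carried by each frame.  The following Python sketch computes
   these quantities.  It assumes the SBC frame-length formula given in
   the A2DP specification [A2DPV10], which this document does not
   reproduce.  Its function names are hypothetical.  For the settings
   of Table 1, it reproduces the listed frame lengths and bit rates.

```python
# Sketch: SBC frame length, bit rate, and frame duration.
# ASSUMPTION: the frame-length formula is taken from the A2DP
# specification [A2DPV10]; all function names here are hypothetical.

# Number of audio channels per channel mode (Section 6.3).
CHANNELS = {"MONO": 1, "DUAL_CHANNEL": 2, "STEREO": 2, "JOINT_STEREO": 2}

# Maximum bit rate (b/s) a decoder must handle, by channel count (Section 3).
MAX_RATE_BPS = {1: 320_000, 2: 512_000}


def max_bitpool(subbands: int, channel_mode: str) -> int:
    """Largest allowed bit-pool value: 16 x subbands for MONO and
    DUAL_CHANNEL, 32 x subbands for STEREO and JOINT_STEREO, at most 250."""
    factor = 16 if channel_mode in ("MONO", "DUAL_CHANNEL") else 32
    return min(250, factor * subbands)


def frame_length(blocks: int, subbands: int, channel_mode: str,
                 bitpool: int) -> int:
    """SBC frame length in bytes, including the frame header."""
    if not 2 <= bitpool <= max_bitpool(subbands, channel_mode):
        raise ValueError("bit-pool value out of range")
    channels = CHANNELS[channel_mode]
    # 4 header bytes plus a 4-bit scale factor per subband and channel.
    length = 4 + (4 * subbands * channels) // 8
    if channel_mode in ("MONO", "DUAL_CHANNEL"):
        # Each channel is coded with its own bit pool in every block.
        audio_bits = blocks * channels * bitpool
    else:
        # STEREO and JOINT_STEREO share one bit pool per block;
        # JOINT_STEREO adds nrof_subbands bits for the JOIN fields and RFA.
        join_bits = subbands if channel_mode == "JOINT_STEREO" else 0
        audio_bits = join_bits + blocks * bitpool
    return length + (audio_bits + 7) // 8  # round up to whole bytes


def bit_rate(fs: int, blocks: int, subbands: int, channel_mode: str,
             bitpool: int) -> float:
    """Bit rate in b/s: frame size in bits times frames per second."""
    samples_per_frame = blocks * subbands
    frame_bits = 8 * frame_length(blocks, subbands, channel_mode, bitpool)
    return frame_bits * fs / samples_per_frame


def within_rate_limit(fs: int, blocks: int, subbands: int,
                      channel_mode: str, bitpool: int) -> bool:
    """True if the configuration does not exceed the maximum bit rate."""
    limit = MAX_RATE_BPS[CHANNELS[channel_mode]]
    return bit_rate(fs, blocks, subbands, channel_mode, bitpool) <= limit


def frame_duration_ms(fs: int, blocks: int, subbands: int) -> float:
    """Audio time carried by one SBC frame, in milliseconds."""
    return 1000.0 * blocks * subbands / fs


if __name__ == "__main__":
    # Table 1, high quality, joint stereo at 48 kHz.
    print(frame_length(16, 8, "JOINT_STEREO", 51),
          bit_rate(48000, 16, 8, "JOINT_STEREO", 51))  # 115 345000.0
    # Same bit pool, 16 vs. 4 blocks per frame (mono, 44.1 kHz).
    print(bit_rate(44100, 16, 8, "MONO", 19),
          bit_rate(44100, 4, 8, "MONO", 19))  # 126787.5 198450.0
    print(round(frame_duration_ms(44100, 16, 8), 2),
          round(frame_duration_ms(44100, 4, 8), 2))  # 2.9 0.73
```

   Keep the same bit pool and reduce the block length from 16 to 4.
   The audio per frame drops from about 2.90 ms to about 0.73 ms, but
   the mono bit rate rises from about 127 kb/s to about 198 kb/s.  This
   happens mainly because the header and scale factors are sent four
   times as often.  Frame duration is only one component of the
   algorithmic delay.  With 16 blocks and 8 subbands, a frame carries
   128 samples, whereas this section gives an algorithmic delay of 208
   samples for that configuration.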

   To meet latency targets of 10 to 25 ms, it might be necessary to
   further reduce the algorithmic coding delay by varying the block
   length parameter.  The default value of the block length parameter
   is chosen such that the coding efficiency is maximized.  For example,
   at 44.1 kHz with 8 subbands and a block length of 16, the algorithmic
   delay is 4.72 ms (208 samples).  The block length can be decreased to
   meet the very stringent latency requirements of this scenario, at
   the expense of a higher bit rate or lower quality.

   Still, the speed of light fundamentally limits the speed of
   information exchange.  Distributed ensembles can therefore perform
   only regionally if a latency budget of 25 ms must be kept.  An
   optical fiber typically has a refractive index of about 1.46.  Thus,
   in 25 ms, bits travel only about 5130 km one-way through a fiber.

5. Header Usage

   The format of the RTP header is specified in [RFC3550].  The payload
   format defined in this document uses the header fields in a manner
   fully consistent with that specification.

   marker (M):    In accordance with [A2DPV10], the marker bit MUST be
                  set to zero.

   payload type (PT): The assignment of an RTP payload type for this
                  packet format is outside the scope of this document
                  and is not specified here.  It is expected that the
                  RTP profile under which this payload format is used
                  will assign a payload type for this codec.
                  Alternatively, the profile may specify that the
                  payload type is bound dynamically (see Section 7.2).

   timestamp (TS): The RTP timestamp clock frequency MUST be the same
                  as the sampling frequency that has been negotiated for
                  the current RTP session (see Section 7.2).  If a media
                  payload consists of multiple SBC frames, the TS of the
                  media packet header represents the TS of the first SBC
                  frame.
                  The TS of the following SBC frames MUST be calculated
                  using the sampling rate and the number of samples per
                  frame per channel.  A change in sampling frequency
                  MUST NOT occur within one media packet.

                  An SBC frame may be fragmented into multiple media
                  packets to reduce the packetisation delay.  In that
                  case, all packets that make up a fragmented SBC frame
                  MUST use the same TS.

6. Payload Format

   The format of the payload MUST follow exactly the description given
   in the appendix of [A2DPV10].  For clarity, the payload format
   definition is repeated below.

   The payload MUST consist of one media payload format header,
   described in Section 6.1, followed by SBC frames, described in
   Section 6.2.  Either an integral number of SBC frames or one fragment
   of an SBC frame can be transmitted:

      (a) When the payload contains an integral number of SBC frames

      +--------+-----------+---------------+
      | Header | SBC frame | SBC frame ... |
      +--------+-----------+---------------+

      (b) When the SBC frame is fragmented

      +--------+---------------------------------------+
      | Header | First fragment of SBC frame           |
      +--------+---------------------------------------+

      +--------+---------------------------------------+
      | Header | Subsequent fragments of the SBC frame |
      +--------+---------------------------------------+

   A media payload always starts with an 8-bit header, which is placed
   before the SBC data.

   An SBC frame can be fragmented across several media payloads.  All
   packets of a fragmented frame, except the last one, MUST have the
   same total data packet size.

   Payload fragmentation MAY be preferred over the fragmentation
   mechanisms of lower layers (e.g., IP).  It reduces the packetisation
   delay, and thus the acoustic latency.  It also increases error
   robustness, because the received parts of an SBC frame can still be
   considered for decoding.

6.1. Media payload format header

   The following figure shows the format of the media payload header,
   which consists of one byte.

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |F|S|L|R|#frames|
      +-+-+-+-+-+-+-+-+

   F bit - Set to 1 if the SBC frame is fragmented, otherwise set to 0.

   S bit - Set to 1 for the starting packet of a fragmented SBC frame,
      otherwise set to 0.

   L bit - Set to 1 for the last packet of a fragmented SBC frame,
      otherwise set to 0.

   RFA (R, 1 bit) - Reserved for future additions; SHOULD be zero.

   #frames (4 bits) - If the F bit is set to 0, this field indicates the
      number of frames contained in this packet.  If the F bit is set to
      1, this field indicates the number of remaining fragments,
      including the current fragment.  Thus, the last counter value
      MUST be one.  For example, if there are three fragments, the
      counter has the values 3, 2, and 1 in subsequent fragments.

6.2. SBC Frame Structure

   A complete SBC frame consists of a frame header, scale factors,
   audio samples, and padding bits.  The following diagram shows the
   general layout of an SBC frame:

      +--------------+---------------+---------------+---------+
      | frame_header | scale_factors | audio_samples | padding |
      +--------------+---------------+---------------+---------+

   The following sections describe the frame format.  Its fields are
   packed bitwise in a compact, bandwidth-efficient form.

6.3. Frame header

   The frame header consists of the following fields defined in
   [A2DPV10]: SYNCWORD, SAMPLING_FREQUENCY, BLOCKS, CHANNEL_MODE,
   ALLOCATION_METHOD, SUBBANDS, BITPOOL, CRC_CHECK, and optionally JOIN
   bit fields and an RFA bit.  The layout of the first four bytes of the
   frame header is given in the following figure.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   SYNCWORD    |SF.|BL.|CM.|A|S|    BITPOOL    |   CRC_CHECK   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Legend: SF.=SAMPLING_FREQUENCY, BL.=BLOCKS, CM.=CHANNEL_MODE,
              A=ALLOCATION_METHOD, S=SUBBANDS

   SYNCWORD (8 bits): The first field is the 8-bit synchronization
      word, which is always set to 156 (0x9C).

   SAMPLING_FREQUENCY (2 bits): This field indicates the sampling
      frequency with which the SBC frame has been encoded.  The table
      below specifies the sampling frequency for each bit pattern.  The
      sampling frequency MUST NOT be changed without also changing the
      payload type.

      +--------------------+----------------+
      | SAMPLING_FREQUENCY |    sampling    |
      |      bit 0 1       | frequency (Hz) |
      +--------------------+----------------+
      |          0 0       |     16000      |
      |          0 1       |     32000      |
      |          1 0       |     44100      |
      |          1 1       |     48000      |
      +--------------------+----------------+

   BLOCKS (2 bits): This field indicates the block length with which
      the stream has been encoded, i.e., the number of blocks per
      frame, as given in the table below.  The block length MUST NOT be
      changed without also changing the payload type.

      +---------+-----------+
      | BLOCKS  | Number of |
      | bit 0 1 |  blocks   |
      +---------+-----------+
      |     0 0 |     4     |
      |     0 1 |     8     |
      |     1 0 |    12     |
      |     1 1 |    16     |
      +---------+-----------+

   CHANNEL_MODE (2 bits): These two bits indicate the channel mode with
      which the frame has been encoded.  The number of channels depends
      on the channel mode.  The channel mode MUST NOT be changed without
      also changing the payload type.

      +--------------+--------------+-----------+
      | CHANNEL_MODE | channel mode | number of |
      |   bit 0 1    |              | channels  |
      +--------------+--------------+-----------+
      |       0 0    | MONO         |     1     |
      |       0 1    | DUAL_CHANNEL |     2     |
      |       1 0    | STEREO       |     2     |
      |       1 1    | JOINT_STEREO |     2     |
      +--------------+--------------+-----------+

   ALLOCATION_METHOD (1 bit): This bit indicates how the bit pool is
      allocated to the subbands.  The allocation is based either on the
      loudness of the subband signals or on the signal-to-noise ratio.
      The allocation method MUST NOT be changed without also changing
      the payload type.

      +-------------------+------------+
      | ALLOCATION_METHOD | allocation |
      |       bit 0       |   method   |
      +-------------------+------------+
      |           0       | LOUDNESS   |
      |           1       | SNR        |
      +-------------------+------------+

   SUBBANDS (1 bit): This bit indicates the number of subbands with
      which the frame has been encoded.  The number of subbands MUST NOT
      be changed without also changing the payload type.

      +----------+-----------+
      | SUBBANDS | number of |
      |  bit 0   | subbands  |
      +----------+-----------+
      |      0   |     4     |
      |      1   |     8     |
      +----------+-----------+

   BITPOOL (8 bits): This unsigned integer indicates the size of the bit
      allocation pool that has been used for encoding the blocks of the
      current frame.  The value of the BITPOOL field MUST NOT exceed 16
      times the number of subbands for the MONO and DUAL_CHANNEL channel
      modes.  It MUST NOT exceed 32 times the number of subbands for the
      STEREO and JOINT_STEREO channel modes.  The bit-pool value MAY
      change from one SBC frame to the next.  In addition, the bit-pool
      value MUST be restricted such that the bit rate does not exceed
      the maximum, which is 320 kb/s for mono and 512 kb/s for
      two-channel modes.

   The CRC_CHECK field is optionally followed by JOIN bit fields and an
   RFA bit, which complete the frame header.

6.4. Remaining frame

   The remaining part of the frame contains the scale factors and the
   audio sample data, which are processed by the codec as described in
   [A2DPV10].

7. Payload Format Parameters

   This section defines the parameters that MAY be used to configure
   optional features of the SBC payload format for RTP transmission.

   The parameters are defined here as part of the media subtype
   registration for SBC.  A mapping of the parameters into the Session
   Description Protocol (SDP) [RFC4566] is also provided for those
   applications that use SDP.  In control protocols that do not use MIME
   or SDP, the media type parameters must be mapped to the appropriate
   format used with that control protocol.

7.1. SBC Media Type Registration

   [Note to RFC Editor: Please replace all occurrences of RFC XXXX with
   the RFC number assigned to this document.]

   This registration is done using the template defined in [RFC4288]
   and following [RFC4855].

   MIME media type name: audio

   MIME subtype name: SBC

   Required parameters: none

   Optional parameters:

      capabilities: The capabilities of the encoder and decoder are
         described by a parameter string that MUST start with an octet
         written as two hexadecimal digits.  This octet is called
         VERSION and MUST be identical to the SYNCWORD that will be used
         in the SBC frames.  It is used to distinguish different
         negotiation procedures.  The interpretation of the following
         characters depends on the value of the VERSION octet; see
         Sections 7.1.1 and 7.1.2.

   Encoding considerations: This media type is framed and contains
      binary data; see Section 4.8 of RFC 4288.

   Security considerations: See Section 10 of RFC XXXX.

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications that use this media type: Audio and video conferencing
      tools, distributed orchestras

   Additional information: none

   Person & email address to contact for further information:
      Christian Hoene, hoene@uni-tuebingen.org

   Intended usage: COMMON

   Restrictions on usage: none

   Author: Christian Hoene, Frans de Bont

   Change controller: IETF Audio/Video Transport working group delegated
      from the IESG

7.1.1. Capabilities: A2DP modes

   The capabilities of the encoder and decoder MUST start with the
   hexadecimal value 9C, followed by a comma and four comma-separated
   hexadecimal octets.  These four octets are called Octet 1, 2, 3, and
   4.  Their meaning is similar to that of the octets defined in
   Section 4.3.2 of [A2DPV10].  However, the sampling frequency and the
   number of channels are already given in the SDP attribute
   "a=rtpmap".  Therefore, bits 0 to 3 (inclusive) of Octet 1 MUST be
   ignored if received.  The meaning of the bits and octets is described
   in the following list.  The bit numbering follows network bit order,
   with the most significant bit first.

   o  Octet 1: Bit 0 (aka 2^7): If one, then the sampling frequency
      16000 Hz is supported (ignored during SDP negotiations but SHOULD
      be set if the clock rate is 16000 and MAY be cleared otherwise).

   o  Octet 1: Bit 1: If one, then the sampling frequency 32000 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 32000 and MAY be cleared otherwise).

   o  Octet 1: Bit 2: If one, then the sampling frequency 44100 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 44100 and MAY be cleared otherwise).

   o  Octet 1: Bit 3: If one, then the sampling frequency 48000 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 48000 and MAY be cleared otherwise).

   o  Octet 1: Bit 4: If one, then the channel mode MONO is supported
      (ignored during SDP negotiations but SHOULD be set if the number
      of channels is one and MAY be cleared otherwise).

   o  Octet 1: Bit 5: If one, then the channel mode DUAL_CHANNEL is
      supported (*).

   o  Octet 1: Bit 6: If one, then the channel mode STEREO is supported
      (*).

   o  Octet 1: Bit 7 (aka 2^0): If one, then the channel mode
      JOINT_STEREO is supported (*).

   o  Octet 2: Bit 0: If one, the block length can be 4.

   o  Octet 2: Bit 1: If one, the block length can be 8.

   o  Octet 2: Bit 2: If one, the block length can be 12.

   o  Octet 2: Bit 3: If one, the block length can be 16.

   o  Octet 2: Bit 4: If one, the number of subbands can be 4.

   o  Octet 2: Bit 5: If one, the number of subbands can be 8.

   o  Octet 2: Bit 6: If one, the allocation method based on the
      signal-to-noise ratio is supported.

   o  Octet 2: Bit 7: If one, the allocation method based on loudness
      is supported.

   o  Octet 3: Unsigned integer: The minimum bit-pool value that the
      device supports.  It MUST be greater than or equal to 2 and less
      than or equal to the maximum bit-pool value.

   o  Octet 4: Unsigned integer: The maximum bit-pool value that the
      device supports.  It MUST be less than or equal to 250.

   (*) At least one of bits 5, 6, or 7 of Octet 1 MUST be set if the
   number of channels is set to two in the SDP attribute "a=rtpmap".

7.1.2. Capabilities: other modes

   If the value of the VERSION octet is not equal to a known SYNCWORD
   value, then the capabilities MUST be ignored.

7.2. Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
   is used to specify sessions employing the SBC codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("SBC") goes in SDP "a=rtpmap" as the encoding
      name.

   o  The RTP clock rate in "a=rtpmap" MUST be set to the selected
      sampling frequency.

   o  The encoding parameters in "a=rtpmap" specify the number of audio
      channels: 2 for stereo material (see [RFC4566]) and 1 for mono.
      If one channel is used, the encoding parameters can be omitted.

   o  The parameter "capabilities" goes in the SDP "a=fmtp" attribute,
      with its value formatted as described in Section 7.1.

7.2.1. Offer-Answer Model Considerations

   The Bluetooth standard document [AVDTPV12] describes how an A2DP
   source and an A2DP sink negotiate their capabilities.  Before the
   audio stream is established, one A2DP device can query the service
   capabilities of the other device using the "Get Capabilities"
   procedure.  In any case, the coding mode is set using the "Set
   Configuration" procedure.  Only after a successful configuration can
   the stream connection be established.

   In contrast to the Bluetooth negotiation procedure, the SDP
   negotiation does not need to agree on one single configuration.  It
   MAY agree that multiple configuration modes are supported, each
   identified by a different payload type value.

   The following considerations apply when using SDP offer-answer
   procedures [RFC3264] to negotiate the use of the SBC payload format
   in RTP:

   o  The "capabilities" parameter is bi-directional, i.e., the
      restricted mode set applies to media both received and sent by the
      declaring entity.
If the capabilities were supplied in the 694 offer, the answerer MUST return either the same mode-set or a 695 subset of this mode-set. If no capabilities were supplied in the 696 offer, the answerer MAY return capabilities to restrict the 697 possible modes. In any case, the capabilities in the answer then 698 apply for both offerer and answerer. The offerer MUST NOT send 699 frames of a mode that has been removed by the answerer. The 700 negotiation is finished if the offerer and the answerer have 701 agreed upon explicit capabilities for each payload type number. 702 The number of blocks and subbands and the kind of allocation 703 method and channel mode MUST haven been negotiated unambiguously. 705 o Any unknown parameter in an offer MUST be ignored by the receiver 706 and MUST NOT be included in the answer. 708 Below are some example parts of SDP offer-answer exchanges. 710 o Example 1 711 Offer: SBC all A2DP modes 712 m=audio 54874 RTP/AVP 96 713 a=rtpmap:96 SBC/48000/2 714 a=fmtp:96 capabilities=9C,17,FF,02,FA 715 m=audio 54874 RTP/AVP 97 716 a=rtpmap:97 SBC/48000 717 a=fmtp:97 capabilities=9C,18,FF,02,FA 718 m=audio 54874 RTP/AVP 98 719 a=rtpmap:98 SBC/44100/2 720 a=fmtp:98 capabilities=9C,27,FF,02,FA 721 m=audio 54874 RTP/AVP 99 722 a=rtpmap:99 SBC/44100 723 a=fmtp:99 capabilities=9C,28,FF,02,FA 724 m=audio 54874 RTP/AVP 100 725 a=rtpmap:100 SBC/32000/2 726 a=fmtp:101 capabilities=9C,47,FF,02,FA 727 m=audio 54874 RTP/AVP 102 728 a=rtpmap:102 SBC/32000 729 a=fmtp:102 capabilities=9C,48,FF,02,FA 730 m=audio 54874 RTP/AVP 103 731 a=rtpmap:103 SBC/16000/2 732 a=fmtp:103 capabilities=9C,87,FF,02,FA 733 m=audio 54874 RTP/AVP 104 734 a=rtpmap:104 SBC/48000 735 a=fmtp:104 capabilities=9C,88,FF,02,FA 737 Answer: 48 kHz, JOINT_STEREO, 16 blocks, 8 subbands, LOUDNESS 738 m=audio 59452 RTP/AVP 96 739 a=rtpmap:96 SBC/48000/2 740 a=fmtp:96 capabilities=9C,11,15,02,FA 742 o Example 2 743 Offer: The A2DP SBC 48 kHz modes with mono or joint stereo, 8 744 subbands, loudness 
      allocation method. In addition, an unknown mode called AD is
      offered.

        m=audio 54874 RTP/AVP 96 97 98
        a=rtpmap:96 SBC/48000/2
        a=fmtp:96 capabilities=9C,11,F5,02,FA
        a=rtpmap:97 SBC/48000/1
        a=fmtp:97 capabilities=9C,18,F5,02,FA
        a=rtpmap:98 SBC/16000/1
        a=fmtp:98 capabilities=AD

      Answer: Both A2DP modes are accepted, but the unknown mode AD is
      ignored.

        m=audio 59452 RTP/AVP 96 97
        a=rtpmap:96 SBC/48000/2
        a=fmtp:96 capabilities=9C,11,F5,02,FA
        a=rtpmap:97 SBC/48000/1
        a=fmtp:97 capabilities=9C,18,F5,02,FA

7.2.2. Declarative SDP Considerations

   For declarative use of SDP, nothing specific is defined for this
   payload format. The configuration given by the SDP MUST be used when
   sending and/or receiving media in the session.

8. Congestion Control

   On Bluetooth links, bandwidth can be reserved, and thus the A2DP
   specification does not consider any kind of congestion control.
   However, congestion control is an important issue for any usage in
   non-dedicated networks such as the Internet. Thus, congestion
   control for RTP MUST be used in accordance with [RFC3550] and any
   appropriate profile (for example, [RFC3551]). An additional
   requirement if best-effort service is being used is: users of this
   payload format MUST monitor packet loss to ensure that the packet
   loss rate is within acceptable parameters.

   The session bandwidth can be reduced by one or more of the following
   means. All of them have a negative impact on the user's experience,
   because the user may notice a higher latency or a degraded audio
   quality. Which of these means to select depends on the current usage
   scenario, the congestion control protocol, and the perceptual
   assessment of the audio transmission, and is not the subject of this
   specification.

   1.  If the bandwidth and the frame rate are to be reduced, the
       sampling rate can be lowered [Boutremans2004], [Hoene2005].

   2.  If the gross bandwidth and the frame rate are to be reduced, more
       blocks can be put into one SBC frame, and more SBC frames can be
       placed in one RTP payload.

   3.  If the bandwidth is to be reduced, the bitpool value can be
       lowered so that the frames get smaller, or the mono mode can be
       selected.

   4.  If the available bandwidth is very low, a push-to-talk-like
       service with temporary transmission interruptions and a high
       delay can be used instead of a continuous transmission.

   5.  If the packet loss rate is very high, the session should be
       terminated, because the quality of the audio transmission is too
       poor to be useful [Widmer2002].

   Because the SBC encoding can be tuned with many parameters, it is
   especially useful for rate-adaptive transport protocols such as DCCP
   [RFC4340] or TCP [RFC4571]. The report [Hoene2009] describes which
   SBC coding mode gives the best speech and audio quality under given
   bandwidth and time constraints.

9. Packet Loss Concealment

   In order to cope with packet losses, the SBC decoder SHOULD be
   extended by a packet loss concealment algorithm. The packet loss
   concealment algorithm SHOULD provide good audio quality in case of
   losses. Otherwise, the congestion control algorithm cannot properly
   trade off the quality impairment due to packet losses against the
   quality impairment caused by different encoding modes. It is
   RECOMMENDED that at least the reverse order replicated pitch periods
   (RORPP) algorithm as defined in [Hoene2009], or a better one, be
   used.

   If this requirement is not met, the congestion control cannot
   predict the impact of packet loss on the audio quality and thus will
   not be able to control the encoding parameters optimally.

10. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in the
   RTP specification [RFC3550] and any appropriate profile (for
   example, [RFC3551]).

   As this format transports encoded speech/audio, the main security
   issues include confidentiality, integrity protection, and
   authentication of the speech/audio itself. The payload format itself
   does not have any built-in security mechanisms. Any suitable
   external mechanisms, such as SRTP [RFC3711], MAY be used.

   This payload format and the SBC encoding do not exhibit any large
   non-uniformity in the receiver-end computational load and thus are
   unlikely to pose a denial-of-service threat due to the receipt of
   pathological datagrams.

11. IANA Considerations

   It is requested that one new media subtype (audio/SBC) and one
   optional parameter for this media subtype ("capabilities") be
   registered by IANA; see Section 5.1 and Section 5.2.

12. References

12.1. Normative References

   [A2DPV10]  Bluetooth SIG, "Advanced Audio Distribution Profile",
              Audio Video WG, adopted specification, revision V1.0,
              May 22, 2003.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65,
              RFC 3551, July 2003.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288,
              December 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

12.2. Informative References

   [AVDTPV12] Bluetooth SIG, "Audio/Video Distribution Transport
              Protocol Specification", Audio Video WG, adopted
              specification, revision V1.2, April 16, 2007.

   [Bon1995]  de Bont, F., Groenewegen, M., and W. Oomen, "A High
              Quality Audio-Coding System at 128 kb/s", 98th AES
              Convention, February 25-28, 1995.

   [Boutremans2004]
              Boutremans, C., Le Boudec, J.-Y., and J. Widmer,
              "End-to-end congestion control for TCP-friendly flows with
              variable packet size", ACM Computer Communication Review,
              Vol. 31, No. 2, pp. 137-151, 2004.

   [Pilati2008]
              Pilati, L. and M. Zadissa, "Enhancements to the SBC CODEC
              for Voice Communication in Mobile Devices", 124th AES
              Convention, Paper No. 7347, May 2008.

   [Hoene2009]
              Hoene, C. and M. Hyder, "Considering Bluetooth's Subband
              Codec (SBC) for Wideband Speech and Audio on the
              Internet", Technical Report WSI-2009-3, Universitaet
              Tuebingen - WSI, 72076 Tuebingen, Germany, October 2009.

   [GAVDPV12] Bluetooth SIG, "Generic Audio/Video Distribution Profile",
              Audio Video WG, adopted specification, revision V1.2,
              April 16, 2007.

   [Gurevich2004]
              Gurevich, M., Chafe, C., Leslie, G., and S. Tyan,
              "Simulation of Networked Ensemble Performance with Varying
              Time Delays: Characterization of Ensemble Accuracy",
              Proceedings of the 2004 International Computer Music
              Conference, Miami, USA, 2004.

   [Hoene2005]
              Hoene, C., Karl, H., and A. Wolisz, "A perceptual quality
              model intended for adaptive VoIP applications",
              International Journal of Communication Systems, Wiley,
              August 2005.

   [ITUG107]  ITU-T, "The E-model, a computational model for use in
              transmission planning", ITU-T Recommendation G.107,
              May 2000.

   [Rault1989]
              Rault, J., Dehery, Y., Roudaut, J., Bruekers, A., and R.
              Veldhuis, "Digital transmission system using subband
              coding of a digital signal", Publication number: EP0400755
              (B1).

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

   [RFC4571]  Lazzaro, J., "Framing Real-time Transport Protocol (RTP)
              and RTP Control Protocol (RTCP) Packets over Connection-
              Oriented Transport", RFC 4571, July 2006.

   [Widmer2002]
              Widmer, J., Mauve, M., and J. Damm, "Probabilistic
              congestion control for non-adaptable flows", 12th
              International Workshop on Network and Operating Systems
              Support for Digital Audio and Video (NOSSDAV), Miami, FL,
              USA, May 2002.

13. Acknowledgments

   Funding for this draft has been provided by the University of
   Tuebingen within the "Projektfoerderung fuer
   Nachwuchswissenschaftler".

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Christian Hoene
   University of Tuebingen
   Wilhelm-Schickard-Institute
   Sand 13
   72076 Tuebingen
   DE

   Phone: +49 7071 29 70532
   Email: hoene@uni-tuebingen.de

   Frans de Bont
   Philips Electronics
   High Tech Campus 5
   5656 AE Eindhoven
   NL

   Phone: +31 40 2740234
   Email: frans.de.bont@philips.com