idnits 2.17.1

draft-ietf-payload-rtp-sbc-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 10, 2013) is 4124 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'A2DPV12'

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Working Group PAYLOAD                                           C. Hoene
Internet Draft                                             Symonics GmbH
Intended status: Standards Track                              F. de Bont
Expires: July 2013                                   Philips Electronics
                                                        January 10, 2013

           RTP Payload Format for Bluetooth's SBC Audio Codec
                     draft-ietf-payload-rtp-sbc-04

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on July 10, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document specifies a Real-time Transport Protocol (RTP) payload
   format to be used for the low complexity subband codec (SBC), which
   is the mandatory audio codec of the Advanced Audio Distribution
   Profile (A2DP) Specification written by the Bluetooth(r) Special
   Interest Group (SIG).
   The payload format is designed to interoperate with existing
   Bluetooth A2DP devices and to provide high streaming audio quality,
   interactive audio transmission over the Internet, and ultra-low
   delay coding for jam sessions on the Internet.  This document also
   contains a media type registration that specifies the use of the
   RTP payload format.

Table of Contents

   1. Introduction
   2. Conventions used in this Document
   3. Background
      3.1. SBC Frame Structure
      3.2. Frame Header
      3.3. Remaining Frame Part
   4. Usage Scenarios
      4.1. Scenario 1: Interconnection of A2DP Devices
      4.2. Scenario 2: High Quality Interactive Audio Transmissions
      4.3. Scenario 3: Ensembles performing over a Network
   5. Header Usage
   6. Payload Format
   7. Payload Format Parameters
      7.1. Media Type Registration for SBC
         7.1.1. Capabilities: A2DP Modes
         7.1.2. Capabilities: Other Modes
      7.2. Mapping to SDP Parameters
         7.2.1. Offer-Answer Model Considerations
         7.2.2. Declarative SDP Considerations
   8. Congestion Control
   9. Packet Loss Concealment
   10. Security Considerations
   11. IANA Considerations
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Acknowledgments

1. Introduction

   The Bluetooth(r) Special Interest Group (SIG) specifies in the
   Advanced Audio Distribution Profile (A2DP) [A2DPV12] a mono and
   stereo high quality audio subband codec (SBC).  This document
   specifies the payload format for the encapsulation of SBC encoded
   audio frames into the Real-time Transport Protocol (RTP).

   SBC has a low computational complexity at modest compression rates.
   Its bit rate can be controlled over a wide range.  Recommended
   operational modes range from 127 to 345 kb/s for mono and stereo
   audio signals.  SBC's algorithmic delay can be as low as 16 samples,
   making it well suited for ensembles playing music over the network,
   which require ultra-low acoustic delays.

2. Conventions used in this Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The following acronyms are used in this document:

      A2DP  - Advanced Audio Distribution Profile
      AAC   - Advanced Audio Coding
      ATRAC - Adaptive Transform Acoustic Coding
      DCCP  - Datagram Congestion Control Protocol
      MP3   - MPEG-1 Audio Layer 3
      SBC   - SubBand Codec
      SIG   - Special Interest Group

3. Background

   The A2DP specification [A2DPV12] is intended for streaming of music
   content to headphones, headsets, or speakers over Bluetooth wireless
   channels.  A2DP supports multiple audio codecs, including MP3, AAC,
   and ATRAC, all of which are optional.  To ensure interoperability,
   the SBC codec has been specified in appendix B of the A2DP
   specification, and it shall be included in all A2DP Bluetooth
   devices.

   SBC is a low complexity subband codec based on earlier work
   presented in [Bon1995] and [Rault1989].  It has a moderate
   compression ratio.  The SBC encoder has filter banks that split the
   audio signal into 4 or 8 subbands.  The codec then decides how many
   bits are used to encode each subband and finally quantizes the
   subband signals blockwise.  An SBC frame can have different block
   sizes.  The size of a block can be 4, 8, 12, or 16.  Both decoder
   and encoder shall support all four block sizes.

   SBC can operate at four different sampling frequencies: 16, 32,
   44.1, and 48 kHz.  It is mandatory that each SBC decoder can operate
   at the frequencies 44.1 and 48 kHz.  Each SBC encoder shall work at
   least at a sampling rate of 44.1 or 48 kHz.

   Four channel modes are supported, which are mono, dual channel,
   stereo, and joint-stereo.  The decoder shall support all four of
   them; the encoder shall support mono and at least one additional
   mode.

   SBC can use four or eight subbands.  The decoder shall support both;
   the encoder shall support at least 8 subbands.

   The bit allocation mode of SBC can be based either on signal to
   noise ratio or on loudness.  The decoder shall support both modes;
   the encoder shall support at least the loudness mode.

   The SBC encoder reduces one block to a given number of bits.  The
   bit-pool variable defines how many bits are used per block.  The
   A2DP profile defines the range of valid bit-pool values by providing
   minimum and maximum bit-pool values.  The bit-pool values shall
   range from 2 to 250 but shall not be larger than the number of
   subbands times 16 for the mono and dual channel modes, and times 32
   for the stereo and joint-stereo channel modes.

   SBC encoders according to the A2DP profile may be capable of
   changing the bit-pool parameter dynamically during the encoding
   process.
   For example, algorithms have been developed that change the number
   of bits depending on the current acoustic content [Pilati2008].

   An SBC decoder according to the A2DP profile shall support all
   possible bit-pool values that do not exceed the maximum bit rate,
   which is 320 kb/s for mono and 512 kb/s for two-channel modes.
   The encoder is required to support at least one possible bit-pool
   value.  The A2DP profile recommends the encoding parameters given in
   Table 1.

   +------------------------------------------------------------+
   |           SBC encoder settings at Medium Quality           |
   +--------------------------------+-------------+-------------+
   |                                |    Mono     | Joint Stereo|
   | Sampling frequency (kHz)       | 44.1 |  48  | 44.1 |  48  |
   | Bitpool value                  |  19  |  18  |  35  |  33  |
   | Resulting frame length (bytes) |  46  |  44  |  83  |  79  |
   | Resulting bit rate (kb/s)      | 127  | 132  | 229  | 237  |
   +--------------------------------+------+------+------+------+
   |            SBC encoder settings at High Quality            |
   +--------------------------------+-------------+-------------+
   |                                |    Mono     | Joint Stereo|
   | Sampling frequency (kHz)       | 44.1 |  48  | 44.1 |  48  |
   | Bitpool value                  |  31  |  29  |  53  |  51  |
   | Resulting frame length (bytes) |  70  |  66  | 119  | 115  |
   | Resulting bit rate (kb/s)      | 193  | 198  | 328  | 345  |
   +--------------------------------+------+------+------+------+
   | Other settings: Block length = 16, loudness, subbands = 8  |
   +------------------------------------------------------------+

   Table 1: Recommended sets of SBC parameters in the SRC device as
   given in [A2DPV12]

3.1. SBC Frame Structure

   An SBC frame consists of a frame header, scale factors, audio
   samples, and padding bits.
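
   The bit-pool limits of Section 3 and the frame lengths and bit rates
   of Table 1 can be cross-checked with a short calculation.  The
   sketch below is illustrative only.  The frame length and bit rate
   formulas are not quoted from this document; they are written down
   here as we understand them from the SBC appendix of [A2DPV12] and
   should be treated as an assumption.  The function names are
   hypothetical.

```python
def max_bitpool(subbands, channel_mode):
    """Upper bit-pool bound: 16x (mono/dual) or 32x (stereo/joint) subbands, capped at 250."""
    factor = 16 if channel_mode in ("MONO", "DUAL_CHANNEL") else 32
    return min(250, factor * subbands)


def frame_length(subbands, blocks, channel_mode, bitpool):
    """SBC frame length in bytes (formula assumed from the A2DP SBC appendix)."""
    channels = 1 if channel_mode == "MONO" else 2
    # 4 header bytes plus 4 bits of scale factor per subband and channel.
    length = 4 + (4 * subbands * channels) // 8
    if channel_mode in ("MONO", "DUAL_CHANNEL"):
        sample_bits = blocks * channels * bitpool
    else:
        # JOINT_STEREO carries one extra join bit per subband.
        join_bits = subbands if channel_mode == "JOINT_STEREO" else 0
        sample_bits = join_bits + blocks * bitpool
    # Audio samples are padded up to a whole number of bytes.
    return length + (sample_bits + 7) // 8


def bit_rate(frame_len, sampling_hz, subbands, blocks):
    """Bit rate in bit/s: one frame carries subbands * blocks samples per channel."""
    return 8 * frame_len * sampling_hz / (subbands * blocks)


if __name__ == "__main__":
    # Medium quality, mono, 44.1 kHz (first column of Table 1).
    n = frame_length(8, 16, "MONO", 19)
    print(n, round(bit_rate(n, 44100, 8, 16) / 1000))   # 46 127
    # High quality, joint stereo, 48 kHz (last column of Table 1).
    n = frame_length(8, 16, "JOINT_STEREO", 51)
    print(n, round(bit_rate(n, 48000, 8, 16) / 1000))   # 115 345
```

   Under these assumptions the calculation reproduces all eight columns
   of Table 1, which suggests that the table and the assumed formulas
   are consistent with each other.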
   The following diagram shows the general SBC frame format layout:

   +--------------+---------------+---------------+---------+
   | frame_header | scale_factors | audio_samples | padding |
   +--------------+---------------+---------------+---------+

   The following sections describe the audio format, which consists of
   bits stored in a bandwidth-efficient, compact mode.

3.2. Frame Header

   The frame header consists of fields defined in [A2DPV12], which are
   SYNCWORD, SAMPLING_FREQUENCY, BLOCKS, CHANNEL_MODE,
   ALLOCATION_METHOD, SUBBANDS, BITPOOL, CRC_CHECK, optionally JOIN bit
   fields, and an RFA field.  The layout of the first four bytes of the
   frame header is given in the following diagram.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   SYNCWORD    |SF.|BL.|CM.|A|S|    BITPOOL    |   CRC_CHECK   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Legend: SF.=SAMPLING_FREQUENCY, BL.=BLOCKS, CM.=CHANNEL_MODE,
   A=ALLOCATION_METHOD, S=SUBBANDS

   SYNCWORD (8 bits): The first field is the 8 bit synchronization
         word, which is always set to 156 (0x9C).

   SAMPLING_FREQUENCY (2 bits): The sampling frequency field indicates
         the sampling frequency with which the SBC frame has been
         encoded.  The table below specifies the corresponding
         sampling frequencies for the bit patterns.  The sampling
         frequency MUST NOT be changed without also changing the
         payload type.

         +--------------------+----------------+
         | SAMPLING_FREQUENCY | sampling       |
         | bit 0 1            | frequency (Hz) |
         +--------------------+----------------+
         |     0 0            | 16000          |
         |     0 1            | 32000          |
         |     1 0            | 44100          |
         |     1 1            | 48000          |
         +--------------------+----------------+

   BLOCKS (2 bits): This field indicates the block size with which the
         stream has been encoded.  The block size is selected according
         to the table below.
         The block size MUST NOT be changed without also changing the
         payload type.

         +---------+-----------+
         | BLOCKS  | Number of |
         | bit 0 1 | blocks    |
         +---------+-----------+
         |     0 0 | 4         |
         |     0 1 | 8         |
         |     1 0 | 12        |
         |     1 1 | 16        |
         +---------+-----------+

   CHANNEL_MODE (2 bits): These two bits indicate the channel mode
         with which the frame has been encoded.  The number of channels
         depends on this information.  The channel mode MUST NOT be
         changed without also changing the payload type.

         +--------------+--------------+-----------+
         | CHANNEL_MODE | channel mode | number of |
         | bit 0 1      |              | channels  |
         +--------------+--------------+-----------+
         |     0 0      | MONO         | 1         |
         |     0 1      | DUAL_CHANNEL | 2         |
         |     1 0      | STEREO       | 2         |
         |     1 1      | JOINT_STEREO | 2         |
         +--------------+--------------+-----------+

   ALLOCATION_METHOD (1 bit): This bit indicates how the bit pool is
         allocated to the different subbands.  The allocation is based
         either on the loudness of the subband signal or on the signal
         to noise ratio.  The allocation method MUST NOT be changed
         without also changing the payload type.

         +-------------------+------------+
         | ALLOCATION_METHOD | allocation |
         | bit 0             | method     |
         +-------------------+------------+
         |     0             | LOUDNESS   |
         |     1             | SNR        |
         +-------------------+------------+

   SUBBANDS (1 bit): This bit indicates the number of subbands with
         which the frame has been encoded.  The number of subbands
         MUST NOT be changed without also changing the payload type.

         +----------+-----------+
         | SUBBANDS | number of |
         | bit 0    | subbands  |
         +----------+-----------+
         |     0    | 4         |
         |     1    | 8         |
         +----------+-----------+

   BITPOOL (8 bits): This unsigned integer indicates the size of the
         bit allocation pool that has been used for encoding the
         current block.
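
         The header fields described in this section can be unpacked
         from the first four bytes of a frame as sketched below.  This
         is a minimal illustration under the bit layout shown in the
         diagram above (bit 0 is the most significant bit); the
         function and constant names are hypothetical, and the
         CRC_CHECK value is returned without being verified.

```python
SYNCWORD = 0x9C  # 156 decimal
SAMPLING_HZ = (16000, 32000, 44100, 48000)
BLOCKS = (4, 8, 12, 16)
CHANNEL_MODES = ("MONO", "DUAL_CHANNEL", "STEREO", "JOINT_STEREO")
ALLOCATION_METHODS = ("LOUDNESS", "SNR")
SUBBANDS = (4, 8)


def parse_sbc_header(data: bytes) -> dict:
    """Decode the first four bytes of an SBC frame header."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    if data[0] != SYNCWORD:
        raise ValueError("bad SYNCWORD: 0x%02X" % data[0])
    b1 = data[1]
    return {
        # The two most significant bits of byte 1 come first on the wire.
        "sampling_hz": SAMPLING_HZ[(b1 >> 6) & 0x3],
        "blocks": BLOCKS[(b1 >> 4) & 0x3],
        "channel_mode": CHANNEL_MODES[(b1 >> 2) & 0x3],
        "allocation_method": ALLOCATION_METHODS[(b1 >> 1) & 0x1],
        "subbands": SUBBANDS[b1 & 0x1],
        "bitpool": data[2],
        "crc_check": data[3],  # not verified in this sketch
    }


if __name__ == "__main__":
    # 48 kHz, 16 blocks, JOINT_STEREO, LOUDNESS, 8 subbands, bitpool 51.
    print(parse_sbc_header(bytes([0x9C, 0xFD, 0x33, 0x00])))
```

         A receiver could compare the decoded values against the mode
         negotiated for the payload type and drop frames that do not
         match.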
         The value of the bit-pool field MUST NOT exceed 16 times the
         number of subbands for the MONO and DUAL_CHANNEL channel
         modes and 32 times the number of subbands for the STEREO and
         JOINT_STEREO channel modes.  The bitpool value MAY change
         from one SBC frame to the next.  In addition, the bitpool
         value MUST be restricted such that the maximum bit rate is
         not exceeded, which is 320 kb/s for mono and 512 kb/s for
         two-channel modes.

   The remaining part of the header consists of CRC_CHECK, optionally
   JOIN bit fields, and an RFA field.

3.3. Remaining Frame Part

   The remaining part of the frame includes scale factors and audio
   sample data, which are processed by the codec as described in
   [A2DPV12].

4. Usage Scenarios

   Compared to many other encoding schemes, the SBC codec is general
   enough to support multiple, quite diverse usage scenarios.  It may
   therefore be necessary to change the behavior of the encoding and
   transmission to achieve good performance for a given usage
   scenario.  The three main scenarios are listed below, together with
   their quality requirements and their impact on encoding and
   transmission.

4.1. Scenario 1: Interconnection of A2DP Devices

   This scenario is intended for interconnecting Bluetooth A2DP
   devices.  RTP frames generated by an A2DP device can be transmitted
   directly via this RTP profile.  Conversely, an A2DP device should be
   able to receive the RTP profile by default.  Thus, the payload
   format described in this document MUST be fully interoperable with
   any A2DP device.

   The transmission between two A2DP devices has a constant frame rate
   with a sender-controlled bit rate.  It is not anticipated that the
   transmission is adapted to congestion and bandwidth variation.

4.2. Scenario 2: High Quality Interactive Audio Transmissions

   The second scenario considers a telephone call with very good audio
   quality at modest acoustic one-way latencies ranging from 50 to
   150 ms [ITUG107], so that music can be listened to over the
   telephone while two persons talk together interactively.

   In addition, the reliability of the audio transmission should be
   high, even in cases of low and varying bandwidth.

   This second scenario assumes that the SBC transmission is used on
   top of a transport protocol that implements a congestion control
   algorithm.  Using the SBC encoding, the sampling, bit, and frame
   rates should be controlled to cope with congestion.  For example, if
   the available transmission bandwidth is too low to allow SBC to
   transmit audio at a high quality, the application can lower the
   sampling, bit, or frame rate of the stream at the cost of higher
   algorithmic delay or degraded audio quality.  In this case,
   changing the sampling or frame rate may cause a short acoustic
   artifact because SBC's internal filters must be reset.

   The A2DP media format does not allow a dynamic change of the
   encoding parameters other than the bit-pool value.  The encoding
   parameters can only be altered with the "Change Parameters"
   procedure, which is defined in [GAVDPV12].  Such a change will cause
   an audible interruption and thus shall be avoided.

   If an application using RTP wants to switch between different sets
   of encoding parameters, then these sets of parameters can either be
   negotiated beforehand (as described in Section 7.2) or a
   renegotiation similar to the "Change Parameters" procedure can take
   place.  An application MUST NOT change the sampling frequency, block
   length, encoding mode, or the number of subbands within one RTP
   session having the same RTP payload identifier.

4.3. Scenario 3: Ensembles performing over a Network

   In some usage scenarios, users want to act simultaneously and not
   just interactively.  For example, if persons sing in a chorus, if
   musicians jam, or if e-sports players play computer games together
   in a team, they need to communicate acoustically.

   In these scenarios, the latency requirements are much stricter than
   for interactive usage.  For example, if two musicians are placed
   more than 10 meters apart, they can hardly stay synchronized.
   Empirical studies [Gurevich2004] have shown that for ensembles
   playing over networks, the optimal acoustic latency is around 11.5
   ms, with a target range from 10 to 25 ms.

   To fulfill such requirements, it might be necessary to further
   reduce the algorithmic coding delay by varying the block length
   parameter.  The default value of the block length parameter is
   chosen such that the coding efficiency is maximized.  For example,
   at 44.1 kHz and using 8 subbands and a block length of 16, the
   algorithmic delay is 4.72 ms (208 samples).  The value of the block
   length parameter can be decreased, at the expense of a higher bit
   rate or lower quality, to lower the latency and meet the very
   stringent latency requirements of this scenario.

   Still, given the speed of light as the fundamental limit on the
   speed of information exchange, distributed ensembles can perform
   only regionally if a latency budget of 25 ms must be kept.
   Typically, an optical fiber has a refractive index of 1.46, and
   thus bits in an optical fiber travel about 5133 km one-way in 25 ms.

5. Header Usage

   The format of the RTP header is specified in [RFC3550].  The payload
   format defined in this document uses the fields of the header in a
   manner fully consistent with that specification.

   marker (M): In accordance with [A2DPV12] the marker bit MUST be set
         to zero.

   payload type (PT): The assignment of an RTP payload type for this
         packet format is outside the scope of the document, and
         will not be specified here.  It is expected that the RTP
         profile under which this payload format is being used will
         assign a payload type for this codec or specify that the
         payload type is to be bound dynamically (see Section 7.2).

   timestamp (TS): The RTP timestamp clock frequency MUST be the same
         as the sampling frequency, which has been negotiated for
         the current RTP session (see Section 7.2).  If a media
         payload consists of multiple SBC frames, the TS of the
         media packet header represents the TS of the first SBC
         frame.  The TS of the following SBC frames MUST be
         calculated using the sampling rate and the number of
         samples per frame per channel.  A change in sampling
         frequency MUST NOT occur within one media packet.
         An SBC frame may be fragmented into multiple media packets
         to reduce the packetisation delay.  In that case, all packets
         that make up a fragmented SBC frame MUST use the same TS.

6. Payload Format

   The format of the payload MUST follow exactly the description given
   in Section 4.3.4, "Media Payload Format", of [A2DPV12].

   If the payload format parameters have been negotiated and a
   restricted set of encoding and decoding modes has been selected,
   then any SBC frame that describes a coding mode that has not been
   chosen MUST be ignored.

7. Payload Format Parameters

   This section defines the parameters that MAY be used to configure
   optional features in the SBC payload format over RTP transmission.

   The parameters are defined here as part of the media subtype
   registrations for the SBC codec.  A mapping of the parameters into
   the Session Description Protocol (SDP) [RFC4566] is also provided
   for those applications that use SDP.
   In control protocols that do not use MIME or SDP, the media type
   parameters must be mapped to the appropriate format used with that
   control protocol.

7.1. Media Type Registration for SBC

   [Note to RFC Editor: Please replace all occurrences of RFC XXXX by
   the RFC number assigned to this document]

   This registration is done using the template defined in [RFC4288]
   and following [RFC4855].

   Media type name: audio

   Subtype name: SBC

   Required parameters:

      Rate: The RTP timestamp clock rate.  See Section 5 for usage
            details.

   Optional parameters:

      Channels: Specifies the number of audio channels: 2 for stereo
            (refer to RFC 4566 [RFC4566]) and 1 for mono,
            according to the SBC channel mode.  If one channel is
            used, this parameter can be omitted.

      Capabilities: The capabilities of the encoder and decoder are
            described by a parameter string that MUST start with an
            octet written as two hexadecimal digits.  This octet is
            called VERSION and MUST be identical to the SYNCWORD
            that will be used in the SBC frames.  It is used to
            distinguish different negotiation procedures.
            The interpretation of the following characters depends
            on the value of the VERSION octet.  See Section 7.1.1
            and Section 7.1.2 for a description.

   Encoding considerations: This media type is framed and contains
   binary data; see Section 4.8 of RFC 4288.

   Security considerations: See Section 10 of RFC XXXX

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications which use this media type: Audio and video conferencing
   tools, distributed orchestras

   Additional information: none

   Person & email address to contact for further information:
   See Authors' Addresses at the end of RFC XXXX

   Intended usage: COMMON

   Restrictions on usage: none

   Author: See Authors' Addresses at the end of RFC XXXX

   Change controller: IETF Audio/Video Transport Payloads working group
   delegated from the IESG

7.1.1. Capabilities: A2DP Modes

   The capabilities of the encoder and decoder MUST start with the
   hexadecimal value 9C, followed by a comma and four comma-separated
   hexadecimal octets.  These four octets, called Octet 1, 2, 3, and 4,
   have a meaning similar to those defined in Section 4.3.2 of
   [A2DPV12].  However, because the sampling frequency and the number
   of channels are already given in the SDP parameter "a=rtpmap", bit 0
   up to and including bit 3 of Octet 1 MUST be ignored if received.
   The meaning of the bits and the octets is described in the
   following enumeration.  The bit numbering follows the network bit
   order, with the highest bit first.

   o Octet 1: Bit 0 (aka 2^7): If one, then the sampling frequency
      16000 Hz is supported (ignored during SDP negotiations but SHOULD
      be set if the clock rate is 16000 and MUST be cleared otherwise).

   o Octet 1: Bit 1: If one, then the sampling frequency 32000 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 32000 and MUST be cleared otherwise).

   o Octet 1: Bit 2: If one, then the sampling frequency 44100 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 44100 and MUST be cleared otherwise).

   o Octet 1: Bit 3: If one, then the sampling frequency 48000 Hz is
      supported (ignored during SDP negotiations but SHOULD be set if
      the clock rate is 48000 and MUST be cleared otherwise).

   o Octet 1: Bit 4: If one, then the channel mode MONO is supported
      (ignored during SDP negotiations but SHOULD be set if the number
      of channels is one and MUST be cleared otherwise).

   o Octet 1: Bit 5: If one, then the channel mode DUAL_CHANNEL is
      supported (*).

   o Octet 1: Bit 6: If one, then the channel mode STEREO is supported
      (*).

   o Octet 1: Bit 7 (aka 2^0): If one, then the channel mode
      JOINT_STEREO is supported (*).

   o Octet 2: Bit 0: If one, the block length can be 4.

   o Octet 2: Bit 1: If one, the block length can be 8.

   o Octet 2: Bit 2: If one, the block length can be 12.

   o Octet 2: Bit 3: If one, the block length can be 16.

   o Octet 2: Bit 4: If one, the number of subbands can be 4.

   o Octet 2: Bit 5: If one, the number of subbands can be 8.

   o Octet 2: Bit 6: If one, the allocation mode based on signal to
      noise ratio is supported.

   o Octet 2: Bit 7: If one, the allocation mode based on loudness is
      supported.

   o Octet 3: Unsigned integer: The minimal bit-pool value that the
      device supports.  MUST be greater than or equal to 2 and less
      than or equal to the maximal bit-pool value.

   o Octet 4: Unsigned integer: The maximal bit-pool value that the
      device supports.  MUST be less than or equal to 250.

   (*) At least one of the bits 5, 6, or 7 of Octet 1 MUST be set if
   the number of channels is set to two in the SDP parameter
   "a=rtpmap".

7.1.2. Capabilities: Other Modes

   If the value of the VERSION octet is not equal to a known SYNCWORD
   value, then the capabilities MUST be ignored.

7.2. Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC4566], which is commonly used to describe RTP sessions.  When
   SDP is used to specify sessions employing the SBC codec, the mapping
   is as follows:

   o The media type ("audio") goes in SDP "m=" as the media name.

   o The media subtype ("SBC") goes in SDP "a=rtpmap" as the encoding
      name.

   o The required parameter "rate" goes in SDP "a=rtpmap" as the RTP
      clock rate.

   o The optional parameter "channels", if present, goes in SDP
      "a=rtpmap" as the RTP encoding parameters.

   o The optional parameter "capabilities", if present, goes in the SDP
      "a=fmtp" attribute as the capabilities description defined in
      Section 7.1.

7.2.1. Offer-Answer Model Considerations

   The Bluetooth standard document [AVDTPV12] describes how an A2DP
   source and an A2DP sink negotiate their capabilities.  Prior to the
   establishment of the audio stream, one A2DP device can query the
   service capabilities of the other device using the "Get Capabilities
   Procedure".  In any case, the coding mode is set using the "Set
   Configuration" procedure.  Only after a successful configuration can
   the stream connection be established.

   In contrast to the Bluetooth negotiation procedure, the SDP
   negotiation need not agree on one single configuration but can
   agree that multiple configuration modes, which are identified by
   different payload type values, are supported.

   The following considerations apply when using SDP offer-answer
   procedures [RFC3264] to negotiate the use of SBC payload in RTP:

   o The "capabilities" parameter is bi-directional, i.e., the
      restricted mode set applies to media both to be received and sent
      by the declaring entity.  If the capabilities were supplied in the
      offer, the answerer MUST return either the same mode-set or a
      subset of this mode-set.
      If no capabilities were supplied in the offer, the answerer MAY
      return capabilities to restrict the possible modes.  In any case,
      the capabilities in the answer then apply for both offerer and
      answerer.  The offerer MUST NOT send frames of a mode that has
      been removed by the answerer.  The negotiation is finished if the
      offerer and the answerer have agreed upon explicit capabilities
      for each payload type number.  The number of blocks and subbands
      and the kind of allocation method and channel mode MUST have been
      negotiated unambiguously.

   o Any unknown parameter in an offer MUST be ignored by the receiver
      and MUST NOT be included in the answer.

   Below are some example parts of SDP offer-answer exchanges.

   o Example 1
      Offer: SBC all A2DP modes
         m=audio 54874 RTP/AVP 96
         a=rtpmap:96 SBC/48000/2
         a=fmtp:96 capabilities=9C,17,FF,02,FA
         m=audio 54874 RTP/AVP 97
         a=rtpmap:97 SBC/48000
         a=fmtp:97 capabilities=9C,18,FF,02,FA
         m=audio 54874 RTP/AVP 98
         a=rtpmap:98 SBC/44100/2
         a=fmtp:98 capabilities=9C,27,FF,02,FA
         m=audio 54874 RTP/AVP 99
         a=rtpmap:99 SBC/44100
         a=fmtp:99 capabilities=9C,28,FF,02,FA
         m=audio 54874 RTP/AVP 100
         a=rtpmap:100 SBC/32000/2
         a=fmtp:100 capabilities=9C,47,FF,02,FA
         m=audio 54874 RTP/AVP 102
         a=rtpmap:102 SBC/32000
         a=fmtp:102 capabilities=9C,48,FF,02,FA
         m=audio 54874 RTP/AVP 103
         a=rtpmap:103 SBC/16000/2
         a=fmtp:103 capabilities=9C,87,FF,02,FA
         m=audio 54874 RTP/AVP 104
         a=rtpmap:104 SBC/16000
         a=fmtp:104 capabilities=9C,88,FF,02,FA

      Answer: 48 kHz, JOINT_STEREO, 16 blocks, 8 subbands, LOUDNESS
         m=audio 59452 RTP/AVP 96
         a=rtpmap:96 SBC/48000/2
         a=fmtp:96 capabilities=9C,11,15,02,FA

   o Example 2
      Offer: The A2DP SBC 48 kHz modes with mono or joint stereo, 8
      subbands, loudness allocation method.  In addition, an unknown
      mode called AD is offered.
697 m=audio 54874 RTP/AVP 96 698 a=rtpmap:96 SBC/48000/2 699 a=fmtp:96 capabilities=9C,11,F5,02,FA 700 m=audio 54874 RTP/AVP 97 701 a=rtpmap:97 SBC/48000/1 702 a=fmtp:97 capabilities=9C, 18,F5,02,FA 703 m=audio 54874 RTP/AVP 98 704 a=rtpmap:98 SBC/16000/1 705 a=fmtp:98 capabilities=AD 707 Answer: both A2DP modes are accepted but the unknown mode AD is 708 ignored. 709 m=audio 59452 RTP/AVP 96 710 a=rtpmap:96 SBC/48000/2 711 a=fmtp:96 capabilities=9C,11,F5,02,FA 712 m=audio 59452 RTP/AVP 9 713 a=rtpmap:97 SBC/48000/1 714 a=fmtp:97 capabilities=9C,18,F5,02,FA 716 7.2.2. Declarative SDP Considerations 718 For declarative use of SDP nothing specific is defined for this 719 payload format. The configuration given by the SDP MUST be used when 720 sending and/or receiving media in the session. 722 8. Congestion Control 724 One Bluetooth links, bandwidth can be reserved and thus the A2DP 725 specification does not consider any kind of congestion control. 726 However, congestion control is an important issue for any usage in 727 non-dedicated networks such as the Internet. Thus, congestion 728 control for RTP MUST be used in accordance with [RFC3550] and any 729 appropriate profile (for example, [RFC3551]). An additional 730 requirement if best-effort service is being used is: users of this 731 payload format MUST monitor packet loss to ensure that the packet 732 loss rate is within acceptable parameters. 734 Reducing the session bandwidth is possible by one or more of the 735 following means, which all will have negative impact to the users' 736 experience as he can notice a higher latency or a degraded audio 737 quality. The selection of the following means depends on current 738 usage scenario, the congestion control protocol, and the perceptual 739 assessment of the audio transmission and is not subject of this 740 specification. 742 1. If the bandwidth and frame rate shall be reduced, the sampling 743 rate can be lowered [Boutremans2004,Hoene2005]. 745 2. 
If the gross bandwidth and the frame rate shall be reduced, more 746 blocks can be put into one SBC frame and more SBC frames can be 747 placed in one RTP payload. 749 3. If the bandwidth shall be reduced, then the bit-pool value can be 750 reduced, so that the frames get smaller, or the mono mode can be 751 selected. 753 4. If the bandwidth is very low, instead of an ongoing transmission, 754 a push-to-talk-like service with temporary transmission 755 interruptions and a high delay can be applied. 757 5. If the packet loss rate is very high, the session shall be 758 terminated because the quality of the audio transmission is too 759 poor to be useful [Widmer2002]. 761 Because the SBC encoding can be tuned with many parameters, it is 762 especially useful for rate-adaptive transport protocols such as DCCP 763 [RFC4340] or TCP [RFC4571]. The report [Hoene2009] describes which 764 SBC coding mode gives the best speech and audio quality under known 765 bandwidth and time constraints. 767 9. Packet Loss Concealment 769 In order to cope with packet losses, the SBC decoder SHOULD be 770 extended by a packet loss concealment algorithm. The packet loss 771 concealment algorithm SHOULD provide a good audio quality in case of 772 losses. Otherwise, the congestion control algorithm cannot properly trade 773 off the quality impairment due to packet losses against the 774 quality impairment caused by different encoding modes. It is 775 RECOMMENDED that at least the reverse order replicated pitch 776 periods (RORPP) algorithm as defined in [Hoene2009], or a better one, is 777 used. 779 If this requirement is not met, then the congestion control cannot 780 predict the impact of packet loss on the audio quality and thus will 781 not be able to control the encoding parameters optimally. 783 10.
Security Considerations 785 RTP packets using the payload format defined in this specification 786 are subject to the general security considerations discussed in the 787 RTP specification [RFC3550] and any appropriate profile (for 788 example, [RFC3551]). 790 As this format transports encoded speech/audio, the main security 791 issues include confidentiality, integrity protection, and 792 authentication of the speech/audio itself. The payload format 793 itself does not have any built-in security mechanisms. Any suitable 794 external mechanisms, such as SRTP [RFC3711], MAY be used. 796 This payload format and the SBC encoding do not exhibit any large 797 non-uniformity in the receiver-end computational load and thus are 798 unlikely to pose a denial-of-service threat due to the receipt of 799 pathological datagrams. 801 11. IANA Considerations 803 It is requested that one new media subtype (audio/SBC) and one 804 optional parameter for this media subtype ("capabilities") be 805 registered by IANA; see Section 7.1 and Section 7.2. 807 12. References 809 12.1. Normative References 811 [A2DPV12] Bluetooth SIG, "Advanced Audio Distribution Profile", 812 Audio Video WG, adopted specification, revision V1.2, 813 April 16th, 2007. 815 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 816 Requirement Levels", BCP 14, RFC 2119, March 1997. 818 [RFC3264] Rosenberg, J. and Schulzrinne, H., "An Offer/Answer 819 Model with Session Description Protocol (SDP)", RFC 3264, 820 June 2002. 822 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 823 Jacobson, "RTP: A Transport Protocol for Real-Time 824 Applications", STD 64, RFC 3550, July 2003. 826 [RFC3551] Schulzrinne, H. and Casner, S., "RTP Profile for Audio and 827 Video Conferences with Minimal Control", STD 65, RFC 3551, 828 July 2003. 830 [RFC4288] Freed, N. and Klensin, J., "Media Type Specifications and 831 Registration Procedures", BCP 13, RFC 4288, December 2005.
833 [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 834 Description Protocol", RFC 4566, July 2006. 836 [RFC4855] Casner, S., "Media Type Registration of RTP Payload 837 Formats", RFC 4855, February 2007. 839 12.2. Informative References 841 [AVDTPV12] Bluetooth SIG, "Audio/Video Distribution Transport 842 Protocol Specification", Audio Video WG, adopted 843 specification, revision V12, April 16th, 2007. 845 [Bon1995] de Bont, F., Groenewegen, M., and Oomen, W., "A High 846 Quality Audio-Coding System at 128 kb/s", 98th AES 847 Convention, February 25 - 28, 1995. 849 [Boutremans2004] Boutremans, C., Le Boudec, J.-Y., and Widmer, J., 850 "End-to-end congestion control for tcp-friendly flows with 851 variable packet size", ACM Computer Communication Review, 852 Vol. 31, No. 2, pp. 137-151, 2004. 854 [Pilati2008] Pilati, L. and Zadissa, M., "Enhancements to the SBC CODEC 855 for Voice Communication in Mobile Devices", AES Convention 856 124, No. 7347, May 2008. 858 [Hoene2009] Hoene, C. and Hyder, M., "Considering bluetooth's subband 859 codec (SBC) for wideband speech and audio on the 860 internet", Technical Report WSI-2009-3, Universitaet 861 Tuebingen - WSI, 72076 Tuebingen, Germany, October 2009. 863 [GAVDPV12] Bluetooth SIG, "Generic Audio/Video Distribution 864 Profile", Audio Video WG, adopted specification, revision 865 V12, April 16th, 2007. 867 [Gurevich2004] Gurevich, M., Chafe, C., Leslie, G., and Tyan, S., 868 "Simulation of Networked Ensemble Performance with Varying 869 Time Delays: Characterization of Ensemble Accuracy", 870 Proceedings of the 2004 International Computer Music 871 Conference, Miami, USA, 2004. 873 [Hoene2005] Hoene, C., Karl, H., and Wolisz, A., "A perceptual 874 quality model intended for adaptive VoIP applications", 875 International Journal of Communication Systems, Wiley, 876 August 2005.
878 [ITUG107] ITU-T G.107, "The E-model, a computational model for use 879 in transmission planning", ITU-T Recommendation G.107, May 880 2000. 882 [Rault1989] Rault, J., Dehery, Y., Roudaut, J., Bruekers, A., and 883 Veldhuis, R., "Digital transmission system using subband 884 coding of a digital signal", Publication number: EP0400755 885 (B1). 887 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 888 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 889 RFC 3711, March 2004. 891 [RFC4340] Kohler, E., Handley, M., and Floyd, S., "Datagram 892 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 894 [RFC4571] Lazzaro, J., "Framing Real-time Transport Protocol (RTP) 895 and RTP Control Protocol (RTCP) Packets over Connection- 896 Oriented Transport", RFC 4571, July 2006. 898 [Widmer2002] Widmer, J., Mauve, M., and Damm, J., "Probabilistic 899 congestion control for non-adaptable flows", In 12th 900 International Workshop on Network and Operating Systems 901 Support for Digital Audio and Video (NOSSDAV), Miami, FL, 902 USA, May 2002. 904 13. Acknowledgments 906 Funding for this draft has been provided by the University of 907 Tuebingen within the "Projektfoerderung fuer 908 Nachwuchswissenschaftler". 910 This document was prepared using 2-Word-v2.0.template.dot. 912 Authors' Addresses 914 Christian Hoene 915 Symonics GmbH 916 Sand 13 917 72076 Tuebingen 918 DE 920 Phone: +49 7071 568 1300 921 Email: Christian.hoene@symonics.com 923 Frans de Bont 924 Philips Electronics 925 High Tech Campus 36 926 5656 AE Eindhoven 927 NL 929 Phone: +31 40 2740234 930 Email: frans.de.bont@philips.com
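Illustrative note (not part of the specification): the "capabilities" values in the SDP offer-answer examples of Section 7.2 can be read with a small decoder such as the sketch below. The bit layout is an assumption taken from the SBC codec information elements of [A2DPV12]; it agrees with the example answer "9C,11,15,02,FA" (48 kHz, JOINT_STEREO, 16 blocks, 8 subbands, LOUDNESS). The first octet is treated as opaque because its meaning is not described in this section. The names decode_capabilities and is_unambiguous are hypothetical helpers introduced only for illustration.

```python
# Sketch only. The flag tables below are an ASSUMPTION based on the A2DP
# SBC codec information element layout; they are not defined in this section.
SAMPLING_HZ = {0x80: 16000, 0x40: 32000, 0x20: 44100, 0x10: 48000}
CHANNEL_MODE = {0x08: "MONO", 0x04: "DUAL_CHANNEL",
                0x02: "STEREO", 0x01: "JOINT_STEREO"}
BLOCKS = {0x80: 4, 0x40: 8, 0x20: 12, 0x10: 16}
SUBBANDS = {0x08: 4, 0x04: 8}
ALLOCATION = {0x02: "SNR", 0x01: "LOUDNESS"}


def _select(octet, table):
    """Return the values whose flag bit is set in octet, highest bit first."""
    return [value for bit, value in sorted(table.items(), reverse=True)
            if octet & bit]


def decode_capabilities(text):
    """Decode a five-octet capabilities string such as '9C,11,15,02,FA'."""
    octets = [int(part.strip(), 16) for part in text.split(",")]
    if len(octets) != 5:
        raise ValueError("expected five comma-separated hex octets")
    first, freq_mode, block_sub_alloc, min_bitpool, max_bitpool = octets
    return {
        "first_octet": first,  # kept as-is; treated as opaque in this sketch
        "sampling_hz": _select(freq_mode, SAMPLING_HZ),
        "channel_mode": _select(freq_mode, CHANNEL_MODE),
        "blocks": _select(block_sub_alloc, BLOCKS),
        "subbands": _select(block_sub_alloc, SUBBANDS),
        "allocation": _select(block_sub_alloc, ALLOCATION),
        "min_bitpool": min_bitpool,
        "max_bitpool": max_bitpool,
    }


def is_unambiguous(caps):
    """True if exactly one value remains for every negotiated property."""
    keys = ("sampling_hz", "channel_mode", "blocks", "subbands", "allocation")
    return all(len(caps[key]) == 1 for key in keys)
```

Under these assumptions, decoding the Example 1 answer leaves exactly one value per property, whereas the Example 1 offer "9C,17,FF,02,FA" still lists several channel modes, block lengths, subband counts, and allocation methods. A value such as "AD" from Example 2 is rejected with ValueError, which a receiver could map to ignoring that payload type.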