idnits 2.17.1

draft-ietf-avt-rtp-atrac-family-24.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'Sender' on line 453

  -- Looks like a reference, but probably isn't: 'Receiver' on line 463

  -- Looks like a reference, but probably isn't: '9-11' on line 1355

  ** Obsolete normative reference: RFC 4566 (ref. '2') (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 4288 (ref. '5') (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 3388 (ref. '7') (Obsoleted by RFC 5888)

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

  -- Possible downref: Non-RFC (?) normative reference: ref. '9'

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'

  -- Possible downref: Non-RFC (?) normative reference: ref. '11'

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '15') (Obsoleted by RFC 7826)


     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Audio/Video Transport                                        M. Hatanaka
2    Internet-Draft                                              J. Matsumoto
3    Expires: November 17, 2009                              Sony Corporation
4                                                                   May, 2009

6    RTP Payload Format for Adaptive TRansform Acoustic Coding (ATRAC) Family
7                       draft-ietf-avt-rtp-atrac-family-24

9    Status of this Memo

11      Internet-Drafts are working documents of the Internet Engineering
12      Task Force (IETF), its areas, and its working groups.  Note that
13      other groups may also distribute working documents as Internet-
14      Drafts.

16      Internet-Drafts are draft documents valid for a maximum of six
17      months and may be updated, replaced, or obsoleted by other
18      documents at any time.  It is inappropriate to use Internet-Drafts
19      as reference material or to cite them other than as "work in
20      progress."

22      The list of current Internet-Drafts can be accessed at
23      http://www.ietf.org/ietf/1id-abstracts.txt.

25      The list of Internet-Draft Shadow Directories can be accessed at
26      http://www.ietf.org/shadow.html.

28      This Internet-Draft will expire on November 17, 2009.

30   Submission Compliance for Internet-Drafts

32      This Internet-Draft is submitted to IETF in full conformance with
33      the provisions of BCP 78 and BCP 79.

35   Copyright and License Notice

37      Copyright (c) 2009 IETF Trust and the persons identified as the
38      document authors.  All rights reserved.

40      This document is subject to BCP 78 and the IETF Trust's Legal
41      Provisions Relating to IETF Documents in effect on the date of
42      publication of this document (http://trustee.ietf.org/license-info).
43      Please review these documents carefully, as they describe your rights
44      and restrictions with respect to this document.

46   Hatanaka, et al.                                                [Page 1]
47   Abstract

49      This document describes an RTP payload format for efficient and
50      flexible transporting of audio data encoded with the Adaptive
51      TRansform Acoustic Coding (ATRAC) family of codecs.  Recent
52      enhancements to the ATRAC family of codecs support high quality audio
53      coding with multiple channels.  The RTP payload format as presented in
54      this document also includes support for data fragmentation, elementary
55      redundancy measures, and a variation on scalable streaming.

57   Table of Contents

59      1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
60      2. Conventions Used in This Document . . . . . . . . . . . . . . 4
61      3. Codec Specific Details . . . . . . . . . . . . . . . . . . . . 4
62      4. RTP Packetization and Transport of ATRAC-Family Streams . . . 5
63         4.1 ATRAC Frames . . . . . . . . . . . . . . . . . . . . . . . 5
64         4.2 Concatenation of Frames . . . . . . . . . . . . . . . . . 5
65         4.3 Frame Fragmentation . . . . . . . . . . . . . . . . . . . 5
66         4.4 Transmission of Redundant Frames . . . . . . . . . . . . . 6
67         4.5 Scalable Lossless Streaming (High-Speed Transfer mode) . . 6
68           4.5.1 Scalable Multiplexed Streaming . . . . . . . . . . . . 6
69           4.5.2 Scalable Multi-Session Streaming . . . . . . . . . . . 7
70      5. Payload Format . . . . . . . . . . . . . . . . . . . . . . . . 8
71         5.1 Global Structure of Payload Format . . . . . . . . . . . . 8
72         5.2 Usage of RTP Header Fields . . . . . . . . . . . . . . . . 9
73         5.3 RTP Payload Structure . . . . . . . . . . . . . . . . . . 10
74           5.3.1 ATRAC Header Section . . . . . . . . . . . . . . . . . 10
75           5.3.2 ATRAC Frames Section . . . . . . . . . . . . . . . . . 11
76           5.3.2.1 Support of redundancy . . . . . . . . . . . . . . . 11
77           5.3.2.2 Frame Fragmentation . . . . . . . . . . . . . . . . 13
78      6. Packetization Examples . . . . . . . . . . . . . . . . . . . . 14
79         6.1 Example Multi-frame Packet . . . . . . . . . . . . . . . . 14
80         6.2 Example Fragmented ATRAC Frame . . . . . . . . . . . . . . 15
81      7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16
82         7.1 ATRAC3 Media type Registration . . . . . . . . . . . . . . 17
83         7.2 ATRAC-X Media type Registration . . . . . . . . . . . . . 19
84         7.3 ATRAC Advanced Lossless Media type Registration . . . . . 21
85         7.4 Channel Mapping Configuration Table . . . . . . . . . . . 23
86         7.5 Mapping Media type Parameters into SDP . . . . . . . . . . 24
87           7.5.1 For Media subtype ATRAC3 . . . . . . . . . . . . . . . 24
88           7.5.2 For Media subtype ATRAC-X . . . . . . . . . . . . . . 24
89           7.5.3 For Media subtype ATRAC Advanced Lossless . . . . . . 25
90         7.6 Offer-Answer Model Considerations . . . . . . . . . . . . 26
91           7.6.1 For All Three Media Subtypes . . . . . . . . . . . . . 26
92           7.6.2 For Media subtype ATRAC3 . . . . . . . . . . . . . . . 26
93           7.6.3 For Media subtype ATRAC-X . . . . . . . . . . . . . . 27
94           7.6.4 For Media subtype ATRAC Advanced Lossless . . . . . . 27
95         7.7 Usage of declarative SDP . . . . . . . . . . . . . . . . . 28
96         7.8 Example SDP Session Descriptions . . . . . . . . . . . . . 28
97         7.9 Example Offer-Answer Exchange . . . . . . . . . . . . . . 30
100     8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
101     9. Security Considerations . . . . . . . . . . . . . . . . . . . 32
102     10. Considerations on Correct Decoding . . . . . . . . . . . . . . 33
103        10.1 Verification of the Packets . . . . . . . . . . . . . . . 33
104        10.2 Validity Checking of the Packets . . . . . . . . . . . . . 33
105     11. References . . . . . . . . . . . . . . . . . . . . . . . . . 34
106        11.1 Normative References . . . . . . . . . . . . . . . . . . . 34
107        11.2 Informative References . . . . . . . . . . . . . . . . . . 35
108     Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 35
109     Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . 35

112  1. Introduction

114     The ATRAC family of perceptual audio codecs is designed to address
115     numerous needs for high-quality, low bit-rate audio transfer.  ATRAC
116     technology can be found in many consumer and professional products
117     and applications, including MD players, CD players, voice recorders,
118     and mobile phones.

120     Recent advances in ATRAC technology allow for multiple channels of
121     audio to be encoded in customizable groupings.  This should allow
122     for future expansions in scaled streaming.  To provide the greatest
123     flexibility in streaming any one of the ATRAC family member codecs,
124     however, this payload format does not distinguish between the codecs
125     on a packet level.

127     This simplified payload format contains only the basic information
128     needed to disassemble a packet of ATRAC audio in order to decode it.
129     There is also basic support for fragmentation and redundancy.

131  2. Conventions Used in This Document

133     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
134     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
135     document are to be interpreted as described in RFC 2119 [4].

137  3. Codec Specific Details

139     Early versions of the ATRAC codec handled only two channels of audio
140     at 44.1kHz sampling frequency, with typical bit-rates between 66kbps
141     and 132kbps.  The latest version allows for a maximum of 8 channels of
142     audio, up to 96kHz in sampling frequency, and a lossless encoding
143     option which can be transmitted in either a scalable (also known as
144     High-Speed Transfer mode) or standard (aka Standard mode) format.
145     The feasible bit-rate range has also expanded, ranging from a low of
146     8kbps up to 1400kbps in lossy encoding modes.

148     Depending on the version of ATRAC used, the sample-frame size is
149     either 512, 1024 or 2048 samples.  While the lossy and Standard mode
150     lossless formats are encoded as sequential single audio frames,
151     High-Speed Transfer mode lossless data comprises two layers -- a
152     lossy base layer and an enhancement layer.
153     Although streaming of multi-channel audio is supported depending on
154     the ATRAC version used, all encoded audio for a given time period is
155     contained within a single frame.  Therefore, there is neither
156     interleaving nor splitting of audio data on a per-channel basis to be
157     concerned with.

160  4. RTP Packetization and Transport of ATRAC-Family Streams

162  4.1 ATRAC Frames

164     For transportation of compressed audio data, ATRAC uses the concept
165     of frames.  ATRAC frames are the smallest data unit to which timing
166     information is attributed.  Frames are octet-aligned by definition.

168  4.2 Concatenation of Frames

170     It is often possible to carry multiple frames in one RTP packet.
171     This can be useful in audio, where on a LAN with a 1500-byte MTU, an
172     average of 7 complete 64kbps ATRAC frames could be carried in a
173     single RTP packet, as each ATRAC frame would be approximately 200
174     bytes.  ATRAC frames may be of fixed or variable length.  To
175     facilitate parsing in the case of multiple frames in one RTP packet,
176     the size of each frame is made known to the receiver by carrying "in
177     band" the frame size for each contained frame in an RTP packet.
178     However, to simplify the implementation of RTP receivers, it is
179     required that when multiple frames are carried in an RTP packet, each
180     frame MUST be complete, i.e., the number of frames in an RTP packet
181     MUST be integral.

183  4.3 Frame Fragmentation

185     The ATRAC codec can handle very large frames.  As most IP networks
186     have significantly smaller MTU sizes than the frame sizes ATRAC can
187     handle, this payload format allows for the fragmentation of an ATRAC
188     frame over multiple RTP packets.  However, to simplify the
189     implementation of RTP receivers, an RTP packet MUST either carry one
190     or more complete ATRAC frames or a single fragment of one ATRAC
191     frame.  In other words, RTP packets MUST NOT contain fragments of
192     multiple ATRAC frames and MUST NOT contain a mix of complete and
193     fragmented frames.

196  4.4 Transmission of Redundant Frames

198     As RTP does not guarantee reliable transmission, receipt of data is
199     not assured.  Loss of a packet can result in a "decoding gap" at the
200     receiver.  One method to remedy this problem is to allow time-shifted
201     copies of ATRAC frames to be sent along with current data.  For a
202     modest cost in latency and implementation complexity, error
203     resiliency to packet loss can be achieved.  For further details, see
204     section 5.3.2.1 and reference [12].
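
The time-shifted copies described in Section 4.4 can be pictured as a
sliding window over the frame sequence.  The following is a minimal sketch
for illustration only and is not part of the payload format: the function
names, the choice of three frames per packet with two of them repeated,
and the 1024-sample frame size are all assumptions made for this example.
Each packet's timestamp is taken from the oldest frame it carries.

```python
def build_redundant_packets(num_frames, frames_per_packet=3, redundant=2,
                            samples_per_frame=1024):
    """Return a list of (timestamp, frame_indices) tuples.

    Hypothetical sender model: each packet repeats the last `redundant`
    frames of the previous packet and appends new ones.  The timestamp is
    the sample time of the oldest frame carried in the packet.
    """
    if not 0 <= redundant < frames_per_packet:
        raise ValueError("redundant must be in [0, frames_per_packet)")
    step = frames_per_packet - redundant  # new frames added per packet
    packets = []
    first = 0
    while first + frames_per_packet <= num_frames:
        frames = list(range(first, first + frames_per_packet))
        packets.append((first * samples_per_frame, frames))
        first += step
    return packets


def recovered_frames(received_packets):
    """Return the sorted frame indices present in the received packets."""
    seen = set()
    for _timestamp, frames in received_packets:
        seen.update(frames)
    return sorted(seen)


packets = build_redundant_packets(7)
# Simulate the loss of the third and fourth packets.
received = packets[:2] + packets[4:]
print(recovered_frames(received))
```

With two redundant frames per packet, losing two consecutive packets in
this sketch still leaves every frame available to the receiver; the cost
is the extra bandwidth spent on the repeated frames.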
206  4.5 Scalable Lossless Streaming (High-Speed Transfer mode)

208     As ATRAC supports a variation on scalable encoding, this payload
209     format provides a mechanism for transmitting essential data (also
210     referred to as the base layer) with its enhancement data in two ways
211     -- multiplexed through one session or separated over two sessions.
212     In either method, only the base layer is essential in producing audio
213     data.  The enhancement layer carries the remaining audio data needed
214     to decode lossless audio data.  So in situations of limited
215     bandwidth, the sender may choose not to transmit enhancement data yet
216     still provide a client with enough data to generate lossily-encoded
217     audio through the base layer.

219  4.5.1 Scalable Multiplexed Streaming

221     In multiplexed streaming, the base layer and enhancement layer are
222     coupled together in each packet, utilizing only one session as
223     illustrated in Figure 1.
224     The packet MUST begin with the base layer, and the two layer types
225     MUST interleave if both layers exist in a packet (only a base or an
226     enhancement layer is included in a packet at the beginning of a
227     stream, or during fragmentation).

229     +----------------+  +----------------+  +----------------+
230     |Base|Enhancement|--|Base|Enhancement|--|Base|Enhancement| ...
231     +----------------+  +----------------+  +----------------+
232             N                  N+1                 N+2         : Packet

234                    Figure 1. Multiplexed structure

237  4.5.2 Scalable Multi-Session Streaming

239     In multi-session streaming, the base layer and enhancement layer are
240     sent over two separate sessions, allowing clients with certain
241     bandwidth limitations to receive just the base layer for decoding as
242     illustrated in Figure 2.

244     In this case, the receiver is REQUIRED to determine which sessions
245     are paired together.  For paired base and enhancement layer
246     sessions, the CNAME bindings in the RTCP sessions MUST use the
247     same CNAME to ensure correct mapping to the RTP source.

249     While there may be alternative methods for synchronization of the
250     layers, the timestamp SHOULD be used for synchronizing the base layer
251     with its enhancement.  The two sessions MUST be synchronized
252     using the information in RTCP SR packets to align the RTP timestamps.

254     If the enhancement layer's session data does not arrive by
255     the presentation time, the decoder MUST decode the base layer
256     session's data only, ignoring the enhancement layer's data.

258     Session 1:
259     +------+  +------+  +------+  +------+
260     | Base |--| Base |--| Base |--| Base | ...
261     +------+  +------+  +------+  +------+
262        N        N+1       N+2       N+3     : Packet

264     Session 2:
265     +-------------+  +-------------+  +-------------+
266     | Enhancement |--| Enhancement |--| Enhancement | ...
267     +-------------+  +-------------+  +-------------+
268            N               N+1              N+2       : Packet
269                    Figure 2. Multi-Session Streaming

272  5. Payload Format

274  5.1 Global Structure of Payload Format

276     The structure of the ATRAC payload is illustrated in Figure 3.
277     The RTP payload following the RTP header contains two
278     octet-aligned data sections.

280     +------+--------------+-----------------------------+
281     |RTP   | ATRAC Header |  ATRAC Frames Section       |
282     |Header| Section      |  (including redundant data) |
283     +------+--------------+-----------------------------+
284     < ---------------- RTP Packet Payload ------------- >

286            Figure 3. Structure of RTP Payload of ATRAC family

288     The first data section is the ATRAC Header, containing just one
289     header with information for the whole packet.  The second
290     section is where the encoded ATRAC frames are stored.  This may
291     contain either a single fragment of one ATRAC frame, or one or more
292     complete ATRAC frames.  The ATRAC Frames Section MUST NOT be empty.
293     When using the redundancy mechanism described in section 5.3.2.1, the
294     redundant frame data can be included in this section and the RTP
295     timestamp MUST be set to the oldest redundant frame's timestamp.

297     To benefit from ATRAC's High-Speed Transfer mode lossless encoding
298     capability, the RTP payload can be split across two sessions, with
299     one transmitting an essential base layer and the other transmitting
300     enhancement data.  However, in either case, the above structure still
301     applies.

304  5.2 Usage of RTP Header Fields

306      0                   1                   2                   3
307      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
308     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
309     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
310     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
311     |                           timestamp                           |
312     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
313     |           synchronization source (SSRC) identifier            |
314     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
315     |            contributing source (CSRC) identifiers             |
316     |                             .....                             |
317     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
318                    Figure 4. RTP Standard Header Part

320     The structure of the RTP Standard Header Part is illustrated in
        Figure 4.

322     Version (V): 2 bits
323       Set to 2.

325     Padding (P): 1 bit
326       If the padding bit is set, the packet contains one or more
327       additional padding octets at the end which are not part of the
328       payload.  The last octet of the padding contains a count of how
329       many padding octets should be ignored, including itself.  Padding
330       may be needed by some encryption algorithms with fixed block sizes
331       or for carrying several RTP packets in a lower-layer protocol data
332       unit (see [1]).

334     Extension (X): 1 bit
335       Defined by the RTP profile used.

337     CSRC count (CC): 4 bits
338       See RFC 3550 [1].

340     Marker (M): 1 bit
341       Set to 1 if the packet is the first packet after a silence period;
342       otherwise, it MUST be set to 0.

344     Payload Type (PT): 7 bits
345       The assignment of an RTP payload type for this packet format is
346       outside the scope of this document; it is specified by the RTP
347       profile under which this payload format is used, or signaled
348       dynamically out-of-band (e.g., using SDP).

351     Sequence number: 16 bits
352       A sequential number for the RTP packet.  It ranges from 0 to 65535
353       and repeats itself periodically.

355     Timestamp: 32 bits
356       A timestamp representing the sampling time of the first sample of
357       the first ATRAC frame in the current RTP packet.
358       When using SDP, the clock rate of the RTP timestamp MUST be
359       expressed using the "rtpmap" attribute.
360       For ATRAC3 and ATRAC Advanced Lossless, the RTP timestamp rate
361       MUST be 44100Hz.  For ATRAC-X, the RTP timestamp rate is 44100Hz or
362       48000Hz, and it will be selected by out-of-band signaling.

364     SSRC: 32 bits
365       See RFC 3550 [1].

367     CSRC list: 0 to 15 items, 32 bits each
368       See RFC 3550 [1].

370  5.3 RTP Payload Structure

372  5.3.1 Usage of ATRAC Header Section

374     The ATRAC header section has a fixed length of one byte as
375     illustrated in Figure 5.

377      0 1 2 3 4 5 6 7
378     +-+-+-+-+-+-+-+-+
379     |C|FrgNo|NFrames|
380     +-+-+-+-+-+-+-+-+
381     Figure 5. ATRAC RTP Header

383     Continuation Flag (C): 1 bit
384     The packet that carries the last part of a fragmented audio frame
385     MUST have this bit set to 0; the preceding fragments have it set to 1.

387     Fragment Number (FrgNo): 3 bits
388     In the event of data fragmentation, this value is one for the first
389     packet, and increases sequentially for the remaining fragmented data
390     packets.  This value MUST be zero for an unfragmented frame.  (Note:
391     3 bits is sufficient to avoid Fragment Number rollover given the
392     current maximum supported bit-rate in the ATRAC specification.  If
393     that changes, the choice of 3 bits for the Fragment Number should be
394     revisited.)

396     Number of Frames (NFrames): 4 bits
397     The number of audio frames in this packet is the field value + 1.
398     This allows for a maximum of 16 ATRAC-encoded audio frames per
399     packet, with 0 indicating one audio frame.  Each audio frame MUST be
400     complete in the packet if fragmentation is not applied.  In case of
401     fragmentation, the data for only one audio frame is allowed to be
402     fragmented, and this value MUST be 0.

404  5.3.2 Usage of ATRAC Frames Section

406     The ATRAC Frames Section contains an integer number of complete
407     ATRAC frames or a single fragment of one ATRAC frame as
408     illustrated in Figure 6.  Each ATRAC frame is preceded by a one-bit
409     flag indicating the layer type and a Block Length field indicating
410     the size in bytes of the ATRAC frame.  If more than one ATRAC frame
411     is present, then the frames are concatenated into a contiguous
412     string of bit-flag, Block Length, and ATRAC frame in order of their
413     frame number.  This section MUST NOT be empty.

415      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
416     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417     |E|        Block Length         |          ATRAC frame          |...
418     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419                    Figure 6. ATRAC Frame Section Format

421     Layer Type Flag (E): 1 bit
422     Set to 1 if the corresponding ATRAC frame is from an enhancement
423     layer.  0 indicates a base layer encoded frame.

425     Block Length: 15 bits
426     The byte length of encoded audio data for the following frame.  This
427     is so that in the case of fragmentation, if only a subsequent packet
428     is received, decoding can still occur.  15 bits allows for a maximum
429     block length of 32,767 bytes.

431     ATRAC frame: The encoded ATRAC audio data.

433  5.3.2.1 Support of redundancy

435     This payload format provides a rudimentary scheme to compensate
436     for occasional packet loss.  As every packet's timestamp corresponds
437     to the first audio frame regardless of whether it is redundant or
438     not, and because we know how many frames of audio each packet
439     encapsulates, if two successive packets are successfully transmitted,
440     we can calculate the number of redundant frames being sent.  The
441     result gives the client a sense of how the server is responding to
442     RTCP reports and warns it to expand its buffer size if necessary.
443     For an example of using redundant data, refer to Figures 7 and 8.

445     In this example, the server has determined that for the next few
446     packets, it should send the last two frames from the
447     previous packet due to recent RTCP reports.  Thus, between packets
448     N and N+1, there is a redundancy of two frames (which the client
449     may choose to dispose of).  The benefit arises when packets N+2
450     and N+3 do not arrive at all, after which eventually packet N+4
451     arrives with successive necessary audio frame data.

453     [Sender]

455     |-Fr0-|-Fr1-|-Fr2-|                         Packet: N,   TS=0
456           |-Fr1-|-Fr2-|-Fr3-|                   Packet: N+1, TS=1024
457                 |-Fr2-|-Fr3-|-Fr4-|             Packet: N+2, TS=2048
458                       |-Fr3-|-Fr4-|-Fr5-|       Packet: N+3, TS=3072
459                             |-Fr4-|-Fr5-|-Fr6-| Packet: N+4, TS=4096

461     -----------> Packet "N+2" and "N+3" not arrived ------------->

463     [Receiver]

465     |-Fr0-|-Fr1-|-Fr2-|                         Packet: N,   TS=0
466           |-Fr1-|-Fr2-|-Fr3-|                   Packet: N+1, TS=1024
467                             |-Fr4-|-Fr5-|-Fr6-| Packet: N+4, TS=4096

469     The receiver can decode from Fr4 to Fr6 by using Packet "N+4" data
470     even if packets "N+2" and "N+3" are lost.

472                    Figure 7. Redundant Example

474      0                   1                   2                   3
475      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
476     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
477     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
478     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
479     |            timestamp (= start sample time of Fr1)             |
480     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
481     |           synchronization source (SSRC) identifier            |
482     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
483     |            contributing source (CSRC) identifiers             |
484     |                             .....                             |
485     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
486     |0|  0  |   2   |0|        Block Length         |               |
487     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
488     |            (redundant) ATRAC frame (Fr1) data ...             |
489     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
490     |0|        Block Length         |(redundant) ATRAC frame (Fr2)  |
491     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
492     | (cont.) |0|        Block Length         | ATRAC frame (Fr3)   |
493     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
494     |                            (cont.)                            |
495     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

497          Figure 8. Packet structure example with Redundant data
498                         (case of Packet "N+1")

500  5.3.2.2 Frame Fragmentation

502     Each RTP packet MUST contain either an integer number of ATRAC
503     encoded audio frames (with a maximum of 16), or one ATRAC frame
504     fragment.  In the former case, as many complete ATRAC frames as can
505     fit in a single path-MTU SHOULD be placed in an RTP packet.  However,
506     if even a single ATRAC frame will not fit into a complete RTP packet,
507     the ATRAC frame MUST be fragmented.

509     The start of a fragmented frame gets placed in its own RTP packet
510     with its Continuation bit (C) set to one, and its Fragment Number
511     (FrgNo) set to one.  As the frame must be the only one in the
512     packet, the Number of Frames field is zero.
Subsequent packets are 513 to contain the remaining fragmented frame data, with the Fragment 514 Number increasing sequentially and the Continuation bit (C) 515 consistently set to one. As subsequent packets do not contain any new 516 frames, the Number of Frames field MUST be ignored. The last packet 517 of fragmented data MUST have the Continuation bit (C) set to zero. 519 Packets containing related fragmented frames MUST have identical 520 timestamps. Thus, while the Continuous bit and Fragment Number fields 521 indicate fragmentation and a means to reorder the packets, the 522 timestamp can be used to determine which packets go together." 524 6. Packetization Examples 526 6.1 Example Multi-frame Packet 528 Multiple encoded audio frames are combined into one packet. Note 529 how for this example, only base layer frames are sent redundantly, 530 but are followed by interleaved base layer and enhancement layer 531 frames as illustrated in Figure 9. 533 0 1 2 3 534 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 535 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 536 |V=2|P|X| CC |M| PT | sequence number | 537 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 538 | timestamp | 539 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 | synchronization source (SSRC) identifier | 541 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 542 | contributing source (CSRC) identifiers | 543 | ..... | 544 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 545 |0| 0 | 5 |0| Block Length | | 546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 547 | (redundant) base layer frame 1 data... | 548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 549 |0| Block Length |(redundant) base layer frame 2 | 550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 551 | (cont.) 
|0| Block Length | base layer frame 3 | 552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 553 | (cont.) |1| Block Length | enhancement frame 3 | 554 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 555 | (cont.) |0| Block Length | base layer frame 4 | 556 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 557 | (cont.) |1| Block Length | enhancement frame 4 | 558 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 560 Figure 9. Example Multi-frame Packet 562 6.2 Example Fragmented ATRAC Frame 564 The encoded audio data frame is split over three RTP packets as 565 illustrated in Figure 10. The following points are highlighted 566 in the example below: 568 o transition from one to zero of the Continuation bit (C) 570 o sequential increase in the Fragment Number 572 Packet 1: 573 0 1 2 3 574 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 576 |V=2|P|X| CC |M| PT | sequence number | 577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 578 | timestamp | 579 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 580 | synchronization source (SSRC) identifier | 581 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 582 | contributing source (CSRC) identifiers | 583 | ..... | 584 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 585 |1| 1 | 0 |1| Block Length | | 586 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 587 | enhancement data... 
| 588 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 590 Packet 2: 591 0 1 2 3 592 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 594 |V=2|P|X| CC |M| PT | sequence number | 595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 596 | timestamp | 597 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 598 | synchronization source (SSRC) identifier | 599 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 600 | contributing source (CSRC) identifiers | 601 | ..... | 602 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 603 |1| 2 | 0 |1| Block Length | | 604 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 605 | ...more enhancement data... | 606 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 607 Packet 3: 608 0 1 2 3 609 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 610 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 611 |V=2|P|X| CC |M| PT | sequence number | 612 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 613 | timestamp | 614 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 615 | synchronization source (SSRC) identifier | 616 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 617 | contributing source (CSRC) identifiers | 618 | ..... | 619 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 620 |0| 3 | 0 |1| Block Length | | 621 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 622 | ...the last of the enhancement data | 623 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 625 Figure 10. Example Fragmented ATRAC Frame 627 7. Payload Format Parameters 629 Certain parameters will need to be defined before ATRAC family 630 encoded content can be streamed. 
   Other optional parameters may also be defined to take advantage of
   specific features relevant to certain ATRAC versions. Parameters
   for ATRAC3, ATRAC-X, and ATRAC Advanced Lossless are defined here as
   part of the media subtype registration process. A mapping of these
   parameters into the Session Description Protocol (SDP) (RFC 4566)
   [2] is also provided for applications that utilize SDP. These
   registrations use the template defined in RFC 4288 [5] and follow
   RFC 4855 [6].

   The data format and parameters are specified for real-time transport
   in RTP.

7.1 ATRAC3 Media type Registration

   The media subtype for the Adaptive TRansform Codec version 3
   (ATRAC3) uses the template defined in RFC 4855 [6].

   Note, any unknown parameter MUST be ignored by the receiver.

   Type name: audio

   Subtype name: atrac3

   Required parameters:

      rate: Represents the sampling frequency in Hz of the original
      audio data. Permissible value is 44100 only.

      baseLayer: Indicates the encoded bit-rate in kbps for the audio
      data to be streamed. Permissible values are 66, 105 and 132.

   Optional parameters:

      ptime: see RFC 4566 [2]

      maxptime: see RFC 4566 [2]
      The frame length of ATRAC3 is 1024/44100 = 23.22... (ms), and a
      fractional value may not be applicable for the SDP definition.
      The value of the parameter therefore MUST be a multiple of 24
      (ms), considering safe transmission.
      If this parameter is not present, the sender MAY encapsulate a
      maximum of 6 encoded frames into one RTP packet when streaming
      ATRAC3.

      maxRedundantFrames: The maximum number of redundant frames that
      may be sent during a session in any given packet under the
      redundant framing mechanism detailed in this document. Allowed
      values are integers in the range of 0 to 15, inclusive. If this
      parameter is not used, a default of 15 MUST be assumed.

   Encoding considerations: This media type is framed and contains
   binary data.

   Security considerations: This media type does not carry active
   content. See Section 9 of this document.

   Interoperability considerations: none

   Published specification: ATRAC3 Standard Specification [9]

   Applications that use this media type:
      Audio and video streaming and conferencing tools.

   Additional information: none
      Magic number(s): none
      File extension(s): 'at3', 'aa3', and 'omg'
      Macintosh file type code(s): none

   Person & email address to contact for further information:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing,
   and hence is only defined for transfer via RTP.

   Author:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Change controller: IETF AVT WG delegated from the IESG

7.2 ATRAC-X Media type Registration

   The media subtype for the Adaptive TRansform Codec version X
   (ATRAC-X) uses the template defined in RFC 4855 [6].

   Note, any unknown parameter MUST be ignored by the receiver.

   Type name: audio

   Subtype name: atrac-x

   Required parameters:

      rate: Represents the sampling frequency in Hz of the original
      audio data. Permissible values are 44100 and 48000.

      baseLayer: Indicates the encoded bit-rate in kbps for the audio
      data to be streamed. Permissible values are 32, 48, 64, 96, 128,
      160, 192, 256, 320 and 352.

      channelID: Indicates the number of channels and channel layout
      according to Table 1 in Section 7.4. Note that this layout is
      different from that proposed in RFC 3551 [3]. However, as
      channelID = 0 defines an ambiguous channel layout, the channel
      mapping defined in Section 4.1 of [3] could be used. Permissible
      values are 0, 1, 2, 3, 4, 5, 6, 7.

   Optional parameters:

      ptime: see RFC 4566 [2]

      maxptime: see RFC 4566 [2]
      The frame length of ATRAC-X is 2048/44100 = 46.44... (ms) or
      2048/48000 = 42.67... (ms), but a fractional value may not be
      applicable for the SDP definition. The value of the parameter
      therefore MUST be a multiple of 47 (ms) or 43 (ms),
      respectively, considering safe transmission.

      If this parameter is not present, the sender MAY encapsulate a
      maximum of 16 encoded frames into one RTP packet when streaming
      ATRAC-X.

      maxRedundantFrames: The maximum number of redundant frames that
      may be sent during a session in any given packet under the
      redundant framing mechanism detailed in this document. Allowed
      values are integers in the range 0 to 15, inclusive. If this
      parameter is not used, a default of 15 MUST be assumed.

      delayMode: Indicates a desire to use low-delay features, in which
      case the decoder will process received data accordingly based on
      this value. Permissible values are 2 and 4.

   Encoding considerations: This media type is framed and contains
   binary data.

   Security considerations: This media type does not carry active
   content. See Section 9 of this document.

   Interoperability considerations: none

   Published specification: ATRAC-X Standard Specification [10]

   Applications that use this media type:
      Audio and video streaming and conferencing tools.

   Additional information: none

      Magic number(s): none
      File extension(s): 'atx', 'aa3', and 'omg'
      Macintosh file type code(s): none

   Person & email address to contact for further information:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing,
   and hence is only defined for transfer via RTP.

   Author:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Change controller: IETF AVT WG delegated from the IESG

7.3 ATRAC Advanced Lossless Media type Registration

   The media subtype for the Adaptive TRansform Codec Lossless version
   (ATRAC Advanced Lossless) uses the template defined in RFC 4855 [6].

   Note, any unknown parameter MUST be ignored by the receiver.

   Type name: audio

   Subtype name: atrac-advanced-lossless

   Required parameters:

      rate: Represents the sampling frequency in Hz of the original
      audio data. Permissible value is 44100 only for High-Speed
      Transfer mode. Any value of 24000, 32000, 44100, 48000, 64000,
      88200, 96000, 176400 and 192000 can be used for Standard mode.

      baseLayer: Indicates the encoded bit-rate in kbps for the base
      layer in High-Speed Transfer mode lossless encodings.
      For Standard lossless mode this value MUST be 0.
      The permissible values for an ATRAC3 base layer are 66, 105 and
      132. For an ATRAC-X base layer, they are 32, 48, 64, 96, 128,
      160, 192, 256, 320 and 352.

      blockLength: Indicates the block length. In High-Speed Transfer
      mode, the values 1024 and 2048 are used for ATRAC3-based and
      ATRAC-X-based ATRAC Advanced Lossless streaming, respectively.
      Any value of 512, 1024 and 2048 can be used for Standard mode.

      channelID: Indicates the number of channels and channel layout
      according to Table 1 in Section 7.4. Note that this layout is
      different from that proposed in RFC 3551 [3]. However, as
      channelID = 0 defines an ambiguous channel layout, the channel
      mapping defined in Section 4.1 of [3] could be used in this case.
      Permissible values are 0, 1, 2, 3, 4, 5, 6, 7.

      ptime: see RFC 4566 [2]

      maxptime: see RFC 4566 [2]
      In streaming of ATRAC Advanced Lossless, multiple frames cannot
      be transmitted in a single RTP packet, as the frame size is large.
      The value SHOULD therefore be regarded as the time of one encoded
      frame on both the sender and the receiver side. The frame length
      of ATRAC Advanced Lossless is 512/44100 = 11.6... (ms),
      1024/44100 = 23.22... (ms) or 2048/44100 = 46.44... (ms), but a
      fractional value may not be applicable for the SDP definition.
      The value of the parameter therefore MUST be 12 (ms), 24 (ms) or
      47 (ms), considering safe transmission.

   Encoding considerations: This media type is framed and contains
   binary data.

   Security considerations: This media type does not carry active
   content. See Section 9 of this document.

   Interoperability considerations: none

   Published specification:
      ATRAC Advanced Lossless Standard Specification [11]

   Applications that use this media type:
      Audio and video streaming and conferencing tools.

   Additional information: none

      Magic number(s): none
      File extension(s): 'aal', 'aa3', and 'omg'
      Macintosh file type code(s): none

   Person & email address to contact for further information:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing,
   and hence is only defined for transfer via RTP.

   Author:
      Mitsuyuki Hatanaka
      Jun Matsumoto
      actech@jp.sony.com

   Change controller: IETF AVT WG delegated from the IESG

7.4 Channel Mapping Configuration Table

   Table 1 explains the mapping between the channelID as passed during
   SDP negotiations, and the speaker mapping the value represents.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | channelID | Number of | Default Speaker     |
   |           | Channels  | Mapping             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     0     |  max 64   | undefined           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     1     |     1     | front: center       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     2     |     2     | front: left, right  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     3     |     3     | front: left, right  |
   |           |           | front: center       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     4     |     4     | front: left, right  |
   |           |           | front: center       |
   |           |           | rear: surround      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     5     |    5+1    | front: left, right  |
   |           |           | front: center       |
   |           |           | rear: left, right   |
   |           |           | LFE                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     6     |    6+1    | front: left, right  |
   |           |           | front: center       |
   |           |           | rear: left, right   |
   |           |           | rear: center        |
   |           |           | LFE                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     7     |    7+1    | front: left, right  |
   |           |           | front: center       |
   |           |           | rear: left, right   |
   |           |           | side: left, right   |
   |           |           | LFE                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Table 1. Channel Configuration

7.5 Mapping Media type Parameters into SDP

   The information carried in the Media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [2], which is commonly used to describe RTP sessions. When SDP is
   used to specify sessions employing the ATRAC family of codecs, the
   following mapping rules apply, depending on the ATRAC codec:

7.5.1 For Media subtype ATRAC3

   o  The Media type ("audio") goes in SDP "m=" as the media name.

   o  The Media subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name. ATRAC3 supports only mono or stereo signals,
      so a corresponding number of channels (1 or 2) MUST also be
      specified in this attribute.

   o  The "baseLayer" parameter goes in SDP "a=fmtp". This parameter
      MUST be present. "maxRedundantFrames" may follow, but if no value
      is transmitted, the receiver SHOULD assume a default value of
      "15".

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

7.5.2 For Media subtype ATRAC-X

   o  The Media type ("audio") goes in SDP "m=" as the media name.

   o  The Media subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name. This SHOULD be followed by the "sampleRate"
      (as the RTP clock rate), and then the actual number of channels
      regardless of the channelID parameter.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the Media type string as a
      semicolon-separated list of parameter=value pairs. The
      "baseLayer" parameter MUST be the first entry on this line.
      The "channelID" parameter MUST be the next entry. The receiver
      MUST assume a default value of "15" for "maxRedundantFrames"
      when that parameter is absent.

7.5.3 For Media subtype ATRAC Advanced Lossless

   o  The Media type ("audio") goes in SDP "m=" as the media name.

   o  The Media subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name. This MUST be followed by the "sampleRate"
      (as the RTP clock rate), and then the actual number of channels
      regardless of the channelID parameter.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the Media type string as a
      semicolon-separated list of parameter=value pairs.
      On this line, the parameters "baseLayer" and "blockLength"
      MUST be present in this order.
      The value of "blockLength" MUST be 1024 or 2048 when ATRAC3 or
      ATRAC-X, respectively, is used as the base layer.
      If "baseLayer=0" (meaning Standard mode), "blockLength" MUST be
      one of 512, 1024, or 2048. The "channelID" parameter MUST be
      the next entry. The receiver MUST assume a default value of "15"
      for "maxRedundantFrames" when that parameter is absent.

7.6 Offer-Answer Model Considerations

   Some options for encoding and decoding ATRAC audio data will require
   either or both of the sender and receiver to comply with certain
   specifications. In order to establish an interoperable transmission
   framework, an Offer-Answer negotiation in SDP MUST observe the
   following considerations (see reference [14]):

7.6.1 For All Three Media Subtypes

   o  Each combination of the RTP payload transport format
      configuration parameters (baseLayer and blockLength, sampleRate,
      channelID) is unique in its bit-pattern and not compatible with
      any other combination. When creating an offer in an application
      desiring to use the more advanced features (sample rates above
      44100 Hz, more than two channels), the offerer SHOULD also offer
      a payload type containing only the lowest set of necessary
      requirements. If multiple configurations are of interest to the
      application, they may all be offered.

   o  The parameters "maxptime" and "ptime" will in most cases not
      affect interoperability; however, the setting of the parameters
      can affect the performance of the application. The SDP
      offer-answer handling of the "ptime" parameter is described in
      RFC 3264 [14]. The "maxptime" parameter MUST be handled in the
      same way.

7.6.2 For Media subtype ATRAC3

   o  In response to an offer, downgraded subsets of "baseLayer" are
      possible. However, for best performance, we suggest the answer
      contain the highest possible values offered.
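
   As an illustration of the suggestion in Section 7.6.2, the Python
   sketch below picks the highest "baseLayer" value that was offered
   and that the answerer can also handle. The function name and the
   idea of passing the offered and locally supported values as plain
   lists are assumptions made for this example only; they are not part
   of the payload format.

```python
# Permissible ATRAC3 baseLayer values in kbps (Section 7.1).
ATRAC3_BASE_LAYERS = (66, 105, 132)


def choose_atrac3_base_layer(offered, supported):
    """Hypothetical helper: return the highest offered baseLayer that the
    answerer also supports, or None when there is no usable overlap."""
    usable = [
        rate for rate in offered
        if rate in supported and rate in ATRAC3_BASE_LAYERS
    ]
    return max(usable) if usable else None


if __name__ == "__main__":
    # The offer lists all three rates; the answerer handles only two.
    print(choose_atrac3_base_layer([66, 105, 132], [66, 105]))
```

   A return value of None would correspond to an answer that rejects
   the ATRAC3 payload types, since no common configuration exists.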

7.6.3 For Media subtype ATRAC-X

   o  In response to an offer, downgraded subsets of "sampleRate",
      "baseLayer", and "channelID" are possible. For best performance,
      an answer MUST NOT contain any values requiring further
      capabilities than the offer contains, but it SHOULD provide
      values as close as possible to those in the offer.

   o  The "maxRedundantFrames" is a suggested minimum. This value MAY
      be increased in an answer (with a maximum of 15), but MUST NOT be
      reduced.

   o  The optional parameter "delayMode" is non-negotiable. If the
      Answerer cannot comply with the offered value, the session MUST
      be deemed inoperable.

7.6.4 For Media subtype ATRAC Advanced Lossless

   o  In response to an offer, downgraded subsets of "sampleRate",
      "baseLayer", and "channelID" are possible. For best performance,
      an answer MUST NOT contain any values requiring further
      capabilities than the offer contains, but it SHOULD provide
      values as close as possible to those in the offer.

   o  There are no requirements when negotiating "blockLength", other
      than that both parties must be in agreement.

   o  The "maxRedundantFrames" is a suggested minimum. This value MAY
      be increased in an answer (with a maximum of 15), but MUST NOT be
      reduced.

   o  For scalable multi-session streaming of ATRAC Advanced Lossless
      content, the media stream identification, the group information,
      and the decoding dependency between the base layer stream and the
      enhancement layer stream MUST be signaled in SDP using the
      offer/answer model. In this case, the "group", "mid" and "depend"
      attributes, each followed by the appropriate parameters, MUST be
      used in SDP [7][8] to indicate the layered coding dependency.
      The "group" attribute with the "DDP" parameter indicates the
      decoding-dependency relationship between the base layer stream
      and the enhancement layer stream.
      Each stream is identified by its "mid" attribute, and the
      dependency of the enhancement layer stream is defined by the
      "depend" attribute, as the enhancement layer is only useful when
      the base layer is available. Examples of signaling the ATRAC
      Advanced Lossless decoding dependency are given in Sections 7.8
      and 7.9.

7.7 Usage of declarative SDP

   In declarative usage, like SDP in RTSP [15] or SAP [16], the
   parameters MUST be interpreted as follows:

   o  The payload format configuration parameters (baseLayer,
      sampleRate, channelID) are all declarative and a participant MUST
      use the configuration(s) provided for the session. More than one
      configuration may be provided if necessary by declaring multiple
      RTP payload types; however, the number of types SHOULD be kept
      small.

   o  Any "maxptime" and "ptime" values SHOULD be selected with care to
      ensure that the session's participants can achieve reasonable
      performance.

   o  The "mid", "group" and "depend" attributes MUST be used to
      indicate the relationship and dependency of the base layer and
      the enhancement layer in scalable multi-session streaming of
      ATRAC Advanced Lossless content, as described in Sections 7.6,
      7.8 and 7.9.
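
   The "maxptime" values given in Sections 7.1 to 7.3 follow from
   rounding the frame duration up to a whole millisecond. The Python
   sketch below reproduces that arithmetic. The helper names are
   hypothetical and chosen for this example; only the frame sizes,
   sampling rates, and resulting millisecond values come from this
   document.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Frame duration rounded up to a whole millisecond (integer math)."""
    return (samples_per_frame * 1000 + sample_rate_hz - 1) // sample_rate_hz


def maxptime_ms(samples_per_frame, sample_rate_hz, frames_per_packet=1):
    """A maxptime value that covers the given number of frames."""
    return frame_duration_ms(samples_per_frame, sample_rate_hz) * frames_per_packet


if __name__ == "__main__":
    print(frame_duration_ms(1024, 44100))  # ATRAC3: 24
    print(frame_duration_ms(2048, 44100))  # ATRAC-X at 44100 Hz: 47
    print(frame_duration_ms(2048, 48000))  # ATRAC-X at 48000 Hz: 43
    print(frame_duration_ms(512, 44100))   # Advanced Lossless, 512: 12
    print(maxptime_ms(1024, 44100, 6))     # six ATRAC3 frames: 144
```

   Integer arithmetic is used so that the rounding does not depend on
   floating-point behavior.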

7.8 Example SDP Session Descriptions

   Example usage of ATRAC-X with stereo at 44100 Hz:

      v=0
      o=atrac 2465317890 2465317890 IN IP4 service.example.com
      s=ATRAC-X Streaming
      c=IN IP4 192.0.2.1/127
      t=3409539540 3409543140
      m=audio 49120 RTP/AVP 99
      a=rtpmap:99 ATRAC-X/44100/2
      a=fmtp:99 baseLayer=128; channelID=2; delayMode=2
      a=maxptime:47

   Example usage of ATRAC-X with 5.1 setup at 48000 Hz:

      v=0
      o=atrac 2465317890 2465317890 IN IP4 service.example.com
      s=ATRAC-X 5.1ch Streaming
      c=IN IP4 192.0.2.1/127
      t=3409539540 3409543140
      m=audio 49120 RTP/AVP 99
      a=rtpmap:99 ATRAC-X/48000/6
      a=fmtp:99 baseLayer=320; channelID=5
      a=maxptime:43

   Example usage of ATRAC-Advanced-Lossless in multiplexed
   High-Speed Transfer mode:

      v=0
      o=atrac 2465317890 2465317890 IN IP4 service.example.com
      s=AAL Multiplexed Streaming
      c=IN IP4 192.0.2.1/127
      t=3409539540 3409543140
      m=audio 49200 RTP/AVP 96
      a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
      a=maxptime:47

   Example usage of ATRAC-Advanced-Lossless in multi-session
   High-Speed Transfer mode. In this case, the base layer and the
   enhancement layer stream are identified by L1 and L2 respectively,
   and L2 depends on L1 in decoding.

      v=0
      o=atrac 2465317890 2465317890 IN IP4 service.example.com
      s=AAL Multi Session Streaming
      c=IN IP4 192.0.2.1/127
      t=3409539540 3409543140
      a=group:DDP L1 L2
      m=audio 49200 RTP/AVP 96
      a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
      a=maxptime:47
      a=mid:L1
      m=audio 49202 RTP/AVP 97
      a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
      a=maxptime:47
      a=mid:L2
      a=depend:97 lay L1:96

   Example usage of ATRAC-Advanced-Lossless in Standard mode:

      m=audio 49200 RTP/AVP 99
      a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:99 baseLayer=0; blockLength=1024; channelID=2
      a=maxptime:24

7.9 Example Offer-Answer Exchange

   The following Offer/Answer example shows how a desire to stream
   multi-channel content is turned down by the receiver, who answers
   with only the ability to receive stereo content:

   Offer:

      m=audio 49170 RTP/AVP 98 99
      a=rtpmap:98 ATRAC-X/44100/6
      a=fmtp:98 baseLayer=320; channelID=5
      a=rtpmap:99 ATRAC-X/44100/2
      a=fmtp:99 baseLayer=160; channelID=2

   Answer:

      m=audio 49170 RTP/AVP 99
      a=rtpmap:99 ATRAC-X/44100/2
      a=fmtp:99 baseLayer=160; channelID=2

   The following Offer/Answer example shows the receiver answering with
   a selection of supported parameters:

   Offer:

      m=audio 49170 RTP/AVP 97 98 99
      a=rtpmap:97 ATRAC-X/44100/2
      a=fmtp:97 baseLayer=128; channelID=2
      a=rtpmap:98 ATRAC-X/44100/6
      a=fmtp:98 baseLayer=128; channelID=5
      a=rtpmap:99 ATRAC-X/48000/6
      a=fmtp:99 baseLayer=320; channelID=5

   Answer:

      m=audio 49170 RTP/AVP 97 98
      a=rtpmap:97 ATRAC-X/44100/2
      a=fmtp:97 baseLayer=128; channelID=2
      a=rtpmap:98 ATRAC-X/44100/6
      a=fmtp:98 baseLayer=128; channelID=5

   The following Offer/Answer example shows an exchange in trying to
   resolve using
   ATRAC-Advanced-Lossless. The offer contains three options:
   multi-session High-Speed Transfer mode, multiplexed High-Speed
   Transfer mode, and Standard mode.

   Offer:

      // Multi-session High-Speed Transfer mode. L1 and L2 correspond
      // to the base layer and the enhancement layer respectively, and
      // L2 depends on L1 in decoding.

      a=group:DDP L1 L2
      m=audio 49200 RTP/AVP 96
      a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:96 baseLayer=132; blockLength=1024; channelID=2
      a=maxptime:24
      a=mid:L1

      m=audio 49202 RTP/AVP 97
      a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
      a=maxptime:24
      a=mid:L2
      a=depend:97 lay L1:96

      // Multiplexed High-Speed Transfer mode
      m=audio 49200 RTP/AVP 98
      a=rtpmap:98 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:98 baseLayer=256; blockLength=2048; channelID=2
      a=maxptime:47

      // Standard mode
      m=audio 49200 RTP/AVP 99
      a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:99 baseLayer=0; blockLength=2048; channelID=2
      a=maxptime:47

   Answer:

      a=group:DDP L1 L2
      m=audio 49200 RTP/AVP 94
      a=rtpmap:94 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:94 baseLayer=132; blockLength=1024; channelID=2
      a=maxptime:24
      a=mid:L1

      m=audio 49202 RTP/AVP 95
      a=rtpmap:95 ATRAC-ADVANCED-LOSSLESS/44100/2
      a=fmtp:95 baseLayer=0; blockLength=2048; channelID=2
      a=maxptime:24
      a=mid:L2
      a=depend:95 lay L1:94

   Note that payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case.
   These names are case-insensitive in both places. Similarly,
   parameter names are case-insensitive both in Media types and in
   the default mapping to the SDP a=fmtp attribute.

8. IANA Considerations

   Three new Media subtypes, audio/ATRAC3, audio/ATRAC-X, and
   audio/ATRAC-ADVANCED-LOSSLESS, are requested to be registered
   (see Section 7).

9. Security Considerations

   The payload format as described in this document is subject to the
   security considerations defined in RFC 3550 [1] and any applicable
   profile, for example RFC 3551 [3]. The security of media type
   registration MUST also be taken into account, as described in
   Section 5 of RFC 4855 [6].

   The payload for the ATRAC family consists solely of compressed audio
   data to be decoded and presented as sound, and the standard
   specifications of ATRAC3, ATRAC-X and ATRAC Advanced Lossless
   [9][10][11] strictly define the bit stream syntax and the
   decoder-side buffer model for each codec. The payloads therefore
   cannot carry "active content" that could impose malicious
   side-effects upon the receiver, and they do not cause illegal
   resource consumption on the receiver side, as long as the bit
   streams conform to their standard specifications.

   This payload format does not implement any security mechanisms of
   its own. Confidentiality, integrity protection, and authentication
   have to be provided by a mechanism external to this payload format,
   e.g., SRTP (RFC 3711) [13].

10. Considerations on Correct Decoding

10.1 Verification of the Packets

   Verification of the received encoded audio packets MUST be performed
   so as to ensure correct decoding of the packets. As the most basic
   check, the packet size can be compared with the payload length. If
   the UDP packet length is longer than the RTP packet length, the
   packet can be accepted, but the extra bytes MUST be ignored. If a
   shorter UDP packet or an improperly encoded packet is received, the
   packet MUST be discarded.
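
   The length check of Section 10.1 can be sketched in Python as
   follows. This is a minimal illustration, not a complete
   depacketizer: the function name is hypothetical, and the expected
   RTP packet length is assumed to have been derived elsewhere by the
   receiver (for example, from the parsed headers and the Block Length
   fields).

```python
def verify_packet_length(datagram, expected_len):
    """Apply the basic length check to one received UDP datagram.

    datagram     -- bytes received from UDP
    expected_len -- RTP packet length in bytes, computed by the caller
                    (how it is derived is an assumption of this sketch)

    Returns the packet trimmed to expected_len, or None when the
    datagram is too short and has to be discarded.
    """
    if expected_len <= 0 or len(datagram) < expected_len:
        return None  # shorter than expected: discard
    return datagram[:expected_len]  # ignore any extra trailing bytes


if __name__ == "__main__":
    print(verify_packet_length(b"abcdef", 4))  # extra bytes are dropped
    print(verify_packet_length(b"abc", 4))     # too short: None
```

   Packets that pass this check are still subject to the validity
   checking performed by the decoder.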

10.2 Validity Checking of the Packets

   Validity checking of the received audio packets MUST also be
   performed. It can be carried out by the decoding process, as the
   ATRAC format is designed so that the validity of data frames can be
   determined by the decoding algorithm. The required decoder response
   to a malformed frame is to discard the malformed data and conceal
   the errors in the audio output until a valid frame is detected and
   decoded. This is expected to prevent crashes and other abnormal
   decoder behavior in response to errors or attacks.

11. References

11.1 Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications",
        RFC 3550, STD 64, July 2003.

   [2]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
        Description Protocol", RFC 4566, July 2006.

   [3]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
        Video Conferences with Minimal Control", RFC 3551, STD 65,
        July 2003.

   [4]  Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

   [5]  Freed, N. and J. Klensin, "Media Type Specifications and
        Registration Procedures", BCP 13, RFC 4288, December 2005.

   [6]  Casner, S., "Media Type Registration of RTP Payload Formats",
        RFC 4855, February 2007.

   [7]  Camarillo, G., Eriksson, G., Holler, J., and H. Schulzrinne,
        "Grouping of Media Lines in the Session Description Protocol
        (SDP)", RFC 3388, December 2002.

   [8]  Schierl, T. and S. Wenger,
        "draft-ietf-mmusic-decoding-dependency-04.txt",
        Internet draft, February 25 2008.

   [9]  ATRAC3 Standard Specification ver.1.1,
        Sony Corporation, 2003.

   [10] ATRAC-X Standard Specification ver.1.2,
        Sony Corporation, 2004.

   [11] ATRAC Advanced Lossless Standard Specification ver.1.1,
        Sony Corporation, 2007.

   ATRAC specifications [9][10][11] are provided for the ATRAC licensed
   users. See https://datatracker.ietf.org/ipr/ for the details
   of the license.

11.2 Informative References

   [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
        Handley, M., Bolot, J.C., Vega-Garcia, A., and
        S. Fosse-Parisis, "RTP Payload for Redundant Audio Data",
        RFC 2198, September 1997.

   [13] Baugher, M., Carrara, E., McGrew, D., Naslund, M., and
        K. Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.

   [14] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        the Session Description Protocol (SDP)", RFC 3264, June 2002.

   [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
        Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [16] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
        Protocol", RFC 2974, October 2000.

Authors' Addresses

   Mitsuyuki Hatanaka
   Sony Corporation, Japan
   1-7-1 Konan
   Minato-ku
   Tokyo 108-0075
   Japan

   Jun Matsumoto
   Sony Corporation, Japan
   1-7-1 Konan
   Minato-ku
   Tokyo 108-0075
   Japan

   Email: actech@jp.sony.com

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.