idnits 2.17.1 draft-hatanaka-avt-rtp-atrac-family-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3667, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5 on line 722. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 699. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 706. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 712. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line 728), which is fine, but *also* found old RFC 2026, Section 10.4C, paragraph 1 text on line 37. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: Number of Frames (NFrames): 4 bits The number of frames in this packet. This allows for a maximum of 16 ATRAC-encoded audio encapsulations per packet, with 0 indicating one frame. Keep in mind only the first frame is allowed to be fragmented. Additionally, this MUST not be anything other than 0 for subsequent packets containing the fragmented frame. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) 
-- The document date (July 8, 2004) is 7204 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '6' is defined on line 651, but no explicit reference was found in the text == Unused Reference: '7' is defined on line 656, but no explicit reference was found in the text == Unused Reference: '8' is defined on line 659, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2327 (ref. '2') (Obsoleted by RFC 4566) -- Obsolete informational reference (is this intentional?): RFC 3267 (ref. '6') (Obsoleted by RFC 4867) Summary: 7 errors (**), 0 flaws (~~), 8 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Audio/Video Transport M. Romaine 2 Internet-Draft M. Hatanaka 3 Expires: January 6, 2005 J. Matsumoto 4 SONY 5 July 8, 2004 7 RTP Payload Format for ATRAC Family 8 draft-hatanaka-avt-rtp-atrac-family-02 10 Status of this Memo 12 By submitting this Internet-Draft, I certify that any applicable 13 patent or other IPR claims of which I am aware have been disclosed, 14 or will be disclosed, and any of which I become aware will be 15 disclosed, in accordance with RFC 3668. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as 20 Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. 
It is inappropriate to use Internet-Drafts as reference
25    material or to cite them other than as "work in progress."

27    The list of current Internet-Drafts can be accessed at
28    http://www.ietf.org/ietf/1id-abstracts.txt.

30    The list of Internet-Draft Shadow Directories can be accessed at
31    http://www.ietf.org/shadow.html.

33    This Internet-Draft will expire on January 6, 2005.

35 Copyright Notice

37    Copyright (C) The Internet Society (2004). All Rights Reserved.

39 Abstract

41    This document describes an RTP payload format for efficient and
42    flexible transporting of audio data encoded with the Adaptive
43    TRansform Audio Codec (ATRAC) family of codecs. Recent enhancements
44    to the ATRAC family of codecs support high quality audio coding with
45    multiple channels. The RTP payload format as presented in this
46    document includes support for data fragmentation and elementary
47    redundancy measures.

49 Table of Contents

51    1. Introduction
52       1.1 ATRAC Details
53    2. Conventions
54    3. Payload Format
55       3.1 RTP Header
56       3.2 Payload Header
57       3.3 Payload Data
58    4. Frame Packetization
59       4.1 Example Fragmented ATRAC Frame
60    5. Payload Format Parameters
61       5.1 ATRAC3 MIME Registration
62       5.2 ATRAC-X MIME Registration
63       5.3 Channel Mapping Configuration Table
64       5.4 Mapping MIME Parameters into SDP
65          5.4.1 For MIME subtype ATRAC3
66          5.4.2 For MIME subtype ATRAC-X
67       5.5 Offer-Answer Model Considerations
68          5.5.1 For MIME subtype ATRAC3
69          5.5.2 For MIME subtype ATRAC-X
70       5.6 Example SDP Session Descriptions
71       5.7 Example Offer-Answer Exchange
72    6. IANA Considerations
73    7. Security Considerations
74       7.1 Confidentiality
75       7.2 Authentication
76       7.3 Decoding Validation
77    8. References
78       8.1 Normative References
79       8.2 Informative References
80    Authors' Addresses
81    Intellectual Property and Copyright Statements

83 1. Introduction

85    The ATRAC family of perceptual audio codecs is designed to address
86    numerous needs for high-quality, low bit-rate audio transfer. ATRAC
87    technology can be found in many consumer and professional products
88    and applications, including MD players, voice recorders, mobile
89    phones, and CD players. The need for real-time streaming of audio
90    data has grown, and this document details our efforts in increasing
91    the product and application space for the ATRAC family of codecs.

93    Recent advances in ATRAC technology allow for multiple channels of
94    audio to be encoded in customizable groupings. This should allow for
95    future expansions in scaled streaming. However, to provide the
96    greatest flexibility in streaming any member of the ATRAC family,
97    this payload format does not distinguish between the codecs
98    at the packet level.
100    This simplified payload format contains only the basic information
101    needed to disassemble a packet of ATRAC audio in order to decode it.
102    Timestamps are in sample units, with audio data currently encoded
103    into frames of 1024 or 2048 samples depending on the ATRAC version.
104    There is also basic support for fragmentation and redundancy, as
105    ATRAC frames MAY exceed an MTU size of 1500 octets.

107    Although streaming of multi-channel audio is supported depending on
108    the ATRAC version used, all encoded audio for a given time period is
109    contained within a single frame. Therefore, there is no interleaving
110    or splitting of audio data on a per-channel basis to be concerned with.

112 1.1 ATRAC Details

114    Early versions of the ATRAC codec handled only two channels of audio
115    at 44.1kHz sampling frequency, with typical bit-rates between 66kbps
116    and 132kbps. The latest version allows for up to 8 channels of audio
117    at 96kHz sampling frequency. The feasible bit-rate range has also
118    expanded, ranging from 8kbps to 1400kbps.

120    Depending on the version of ATRAC used, the sample-frame size is
121    either 1024 or 2048. Actual bit-rates are determined by specifying a
122    fixed encoded frame-size. In other words, instead of requesting a
123    stereo 44.1kHz stream at, say, 64kbps, one would tell the encoder to
124    create encoded frame-sizes of 364 bytes.

126 2. Conventions

128    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
129    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
130    document are to be interpreted as described in RFC 2119 [4].

132 3. Payload Format

134 3.1 RTP Header

136     0                   1                   2                   3
137     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
138    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
139    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
140    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
141    |                           timestamp                           |
142    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
143    |           synchronization source (SSRC) identifier            |
144    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
145    |            contributing source (CSRC) identifiers             |
146    |                             .....                             |
147    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

149    Marker (M): 1 bit
150    Set to zero as silence suppression is currently not used.

152    Payload Type (PT): 7 bits
153    The payload type can either be dynamically allocated at the
154    application level, or an RTP profile for a class of applications is
155    expected to assign the payload type for this format. A dynamic
156    allocation SHOULD designate this format as ATRAC-Family.

158    Sequence number: 16 bits
159    This field is as defined in RFC 3550 [1].

161    Timestamp: 32 bits
162    A timestamp representing the sampling time of the first sample of the
163    first ATRAC frame in the RTP packet. The clock frequency MUST be set
164    to the sample rate of the encoded audio data, and is conveyed
165    out-of-band.

167 3.2 Payload Header

169    The ATRAC family payload header is only two octets long, which
170    keeps processing simple.

172     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
173    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
174    |C|FrgNo| Rsrvd |NFrames| FrOff |
175    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

177    Continuous flag (C): 1 bit
178    Set to one if the fragmented frame continues in the next packet,
       that is, on every fragment except the last one (see Section 4).

180    Fragment Number (FrgNo): 3 bits
181    In the event of data fragmentation, this value is 1 for the first
182    packet, and increases sequentially for the remaining fragmented data
183    packets.

185    Number of Frames (NFrames): 4 bits
186    The number of frames in this packet.
This allows for a maximum of 16
187    ATRAC-encoded audio encapsulations per packet, with 0 indicating one
188    frame. Keep in mind only the first frame is allowed to be
189    fragmented. Additionally, this MUST NOT be anything other than 0 for
190    subsequent packets containing the fragmented frame.

192    Frame Offset (FrOff): 4 bits
193    The purpose of frame offsets is to provide a basic mechanism for the
194    transmission of redundant data. Redundant frames are sent
195    sequentially before any new frames in the same packet. The timestamp
196    also reflects the playback time of the first frame in a packet, even
197    if the first frame is a redundant frame. Frame-size lengths are
198    determined during SDP negotiations (one of either 1024 or 2048
199    samples), and are fixed for a given session. A "maxRedundantFrames"
200    parameter is also sent during SDP negotiations; this allows for the
201    necessary buffer size to be calculated in advance.

203    As an example of using Frame Offsets, refer to Figure 1, which
204    considers a situation when FrOff is 2. If a packet has 4 frames of
205    audio, with each frame representing 1024 samples of audio, then we
206    can calculate that playback begins with 2 frames (2048 samples) of
207    redundant data, and can allocate buffer space as necessary. (The
208    only other necessary variable is sampling frequency, which MUST have
209    been established during SDP negotiations). This field SHOULD NOT be
210    used in packets containing fragmented data.

212    |-Fr1-|-Fr2-|-Fr3-|-Fr4-|                          Nth Packet, TS=1
213                |-Fr3-|-Fr4-|-Fr5-|-Fr6-|              N+1th Packet, TS=3
214                            |-Fr5-|-Fr6-|-Fr7-|-Fr8-|  N+2th Packet, TS=5

216 3.3 Payload Data

218    ATRAC payload data begins with 2 octets that carry the byte-length
219    of the encoded audio data (a 12-bit Block Length followed by 4
220    reserved bits). After that, the actual audio data follows.

222     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
223    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
224    |     Block Length      | Rsrvd |         ATRAC data...         |
225    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

227    Block length: 12 bits
228    The byte length of encoded audio data until the end of the current
229    packet. This is so that in the case of fragmentation, if only a
230    subsequent packet is received, decoding can still occur. 12 bits
231    allows for a maximum block length of 4095 bytes. In the event a data
232    block is larger than 4095 bytes but would still fit within MTU
233    limits, fragmentation MUST occur.

235 4. Frame Packetization

237    Each RTP packet contains either an integer number of ATRAC encoded
238    audio frames, with a maximum of 16, or one ATRAC frame fragment.

240    As many complete ATRAC frames as can fit in a single path-MTU SHOULD
241    be placed in an RTP packet, with the aforementioned maximum of 16.
242    However, if an ATRAC frame will not fit into an RTP packet, it MUST
243    be fragmented.

245    The start of a fragmented frame gets placed in its own RTP packet,
246    its Continuous bit (C) set to one, and its Fragment Number (FrgNo)
247    set to one. As the frame must be the only one in the packet, the
248    Number of Frames field is zero. Subsequent packets are to contain
249    the remaining fragmented frame data, with the Fragment Number
250    increasing sequentially and the Continuous bit (C) consistently set
251    to one. As subsequent packets do not contain any new frames, the
252    Number of Frames field SHOULD be ignored. The last packet of
253    fragmented data MUST have the Continuous bit (C) set to zero.

255    In the event of fragmentation, the basic redundancy measures
256    SHOULD NOT be used.

258 4.1 Example Fragmented ATRAC Frame

260    An example of a fragmented ATRAC frame is presented below. The
261    encoded audio data frame is split over three RTP packets. For
262    brevity, the RTP packet header details have been excluded.
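As an informal aside before the packet diagrams, the fragmentation rules of Section 4 can be sketched in Python. This is only an illustration under stated assumptions, not part of the payload format: the function names are hypothetical, fields are assumed to be packed most-significant-bit first in network byte order as the diagrams in Sections 3.2 and 3.3 suggest, and a fragment is conservatively capped at 4095 octets, the largest value a plain 12-bit Block Length can hold.

```python
import struct

# Assumption: fields are packed most-significant-bit first, in network byte
# order, exactly as drawn in the Section 3.2 and 3.3 diagrams.
MAX_BLOCK_LENGTH = 0xFFF  # largest value a plain 12-bit field can hold
MAX_FRAGMENTS = 7         # FrgNo is 3 bits wide and starts counting at 1


def pack_payload_header(c, frg_no, n_frames=0, fr_off=0):
    """Pack C(1) | FrgNo(3) | Rsrvd(4) | NFrames(4) | FrOff(4) into 2 octets."""
    word = (((c & 0x1) << 15) | ((frg_no & 0x7) << 12)
            | ((n_frames & 0xF) << 4) | (fr_off & 0xF))
    return struct.pack("!H", word)


def unpack_payload_header(data):
    """Return the payload header fields found at the start of `data`."""
    (word,) = struct.unpack("!H", data[:2])
    return {
        "C": word >> 15,
        "FrgNo": (word >> 12) & 0x7,
        "NFrames": (word >> 4) & 0xF,
        "FrOff": word & 0xF,
    }


def fragment_frame(frame, max_chunk):
    """Split one encoded frame into payloads (RTP headers are not built)."""
    if not 0 < max_chunk <= MAX_BLOCK_LENGTH:
        raise ValueError("max_chunk must fit the 12-bit Block Length field")
    chunks = [frame[i:i + max_chunk] for i in range(0, len(frame), max_chunk)]
    if len(chunks) > MAX_FRAGMENTS:
        raise ValueError("frame needs more fragments than FrgNo can number")
    payloads = []
    for frg_no, chunk in enumerate(chunks, start=1):
        is_last = frg_no == len(chunks)
        # C stays 1 on every fragment except the last; NFrames and FrOff are 0.
        header = pack_payload_header(0 if is_last else 1, frg_no)
        # Block Length (12 bits) followed by 4 reserved bits set to zero.
        block = struct.pack("!H", len(chunk) << 4)
        payloads.append(header + block + chunk)
    return payloads


if __name__ == "__main__":
    for payload in fragment_frame(bytes(3000), 1024):
        print(unpack_payload_header(payload), len(payload) - 4)
```

Splitting a 3000-octet frame into 1024-octet chunks yields three payloads whose C and FrgNo values follow the same pattern as the three packets drawn in this section. A receiver could reassemble the frame by concatenating the data that follows the first four octets of each payload, in FrgNo order.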
264    Packet 1:
265     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
266    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
267    |1|  1  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
268    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
269    |                         ATRAC data...                         |
270    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

272    Packet 2:
273     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
274    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
275    |1|  2  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
276    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
277    |                     ...more ATRAC data...                     |
278    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

280    Packet 3:
281     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
282    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
283    |0|  3  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
284    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
285    |                 ...the last of the ATRAC data                 |
286    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

288    The following points highlight important characteristics of the
289    example above:
290    o the transition from one to zero of the Continuous bit (C)
291    o a sequential increase in the Fragment Number

293 5. Payload Format Parameters

295    Certain parameters will need to be defined before ATRAC family
296    encoded content can be streamed. Other optional parameters may also
297    be defined to take advantage of specific features relevant to certain
298    ATRAC versions. Parameters for ATRAC3 and ATRAC-X are defined here
299    as part of the MIME subtype registration process. A mapping of these
300    parameters into the Session Description Protocol (SDP) (RFC 2327) [2]
301    is also provided for applications that utilize SDP.

303    The data format and parameters are specified for real-time transport
304    in RTP.
306 5.1 ATRAC3 MIME Registration

308    The MIME subtype for the Adaptive TRansform Codec version 3 (ATRAC3)
309    is allocated from the Vendor tree since this codec is intended to be
310    used with commercial products, and use of any ATRAC family codec
311    requires a license from Sony Corporation, the vendor.

313    Note, any unspecified parameter MUST be ignored by the receiver.

315    Media Type name: audio

317    Media subtype name: vnd.sony.atrac3

319    Required parameters:

321    frameLength: Indicates the size in bytes of an encoded audio frame.
322    In essence, this value determines the bit-rate of the encoded audio.
323    Permissible values are 192 (66kbps), 304 (105kbps), and 384
324    (132kbps).

326    Optional parameters:

328    maxRedundantFrames: The maximum number of redundant frames that may
329    be sent during a session in any given packet under the redundant
330    framing mechanism detailed in the draft.

332    maxptime: The maximum amount of media which can be encapsulated in a
333    payload packet, expressed as time in milliseconds. The time is
334    calculated as the sum of the time the media present in the packet
335    represents. The time SHOULD be a multiple of the frame size. If
336    this parameter is not present, the sender MAY encapsulate a maximum
337    of 16 encoded frames into one RTP packet.

339    ptime: see RFC 2327 [2]

340    Encoding considerations: This type is defined for transfer via RTP
341    (RFC 3550) [1].

343    Security considerations: Audio data is believed to offer no security
344    risks.

346    Public specifications: Please refer to section 7 of this draft.

348    Macintosh file type code: none
349    Object identifier or OID: none

351    Person & email address to contact for further information:
352    Mitsuyuki Hatanaka
353    hatanaka@av.crl.sony.co.jp

355    Intended usage: LIMITED USE
356    Only licensees of ATRAC technology may use this type.
358    Author/Change controller:
359    hatanaka@av.crl.sony.co.jp

361 5.2 ATRAC-X MIME Registration

363    The MIME subtype for the Adaptive TRansform Codec version X (ATRAC-X)
364    is allocated from the Vendor tree since this codec is intended to be
365    used with commercial products, and use of any ATRAC family codec
366    requires a license from Sony Corporation, the vendor.

368    Note, any unspecified parameter MUST be ignored by the receiver.

370    Media Type name: audio

372    Media subtype name: vnd.sony.atrac-x

374    Required parameters:

376    sampleRate: Represents the sampling frequency in Hz of the original
377    audio data. Permissible values are 32000, 44100, 48000, 88200, and
378    96000.

380    frameLength: Indicates the size in bytes of an encoded audio frame.
381    In essence, this value determines the bitrate of the encoded audio.
382    Permissible values lie in the range 8 to 8192.

384    channelID: Indicates the number of channels and channel layout
385    according to the table in Section 5.3. Note that this layout is
386    different from that proposed in RFC 3551 [3]. However, as channelID
387    = 0 defines an ambiguous channel layout, the channel mapping defined
388    in Section 4.1 of [3] could be used. Permissible values are 0, 1, 2,
389    3, 4, 5, 6, 7.

391    Optional parameters:

393    maxRedundantFrames: The maximum number of redundant frames that may
394    be sent during a session in any given packet under the redundant
395    framing mechanism detailed in the draft. If this parameter is not
396    used, a default of "16" SHOULD be assumed.

398    delayMode: Indicates a desire to use low-delay features, in which
399    case the decoder will process received data accordingly based on this
400    value. Permissible values are 2 and 4.

402    encryptionMode: Indicates whether the audio frames have been
403    encrypted using OpenMG ("OpenMG") or a third party method ("Other").
404    If "Other", the specific mode MUST be determined at the application
405    level. Permissible values are "OpenMG" and "Other".
407    maxptime: The maximum amount of media which can be encapsulated in a
408    payload packet, expressed as time in milliseconds. The time is
409    calculated as the sum of the time the media present in the packet
410    represents. The time SHOULD be a multiple of the frame size. If
411    this parameter is not present, the sender MAY encapsulate a maximum
412    of 16 encoded frames into one RTP packet.

414    ptime: see RFC 2327 [2]

416    Encoding considerations: This type is defined for transfer via RTP
417    (RFC 3550) [1].

419    Security considerations:
420    Audio data is believed to offer no security risks.

422    Public specifications:
423    Please refer to section 7 of this draft.

425    Macintosh file type code: none
426    Object identifier or OID: none

428    Person & email address to contact for further information:
429    Mitsuyuki Hatanaka
430    hatanaka@av.crl.sony.co.jp

432    Intended usage: LIMITED USE
433    Only licensees of ATRAC technology may use this type.

435    Author/Change controller:

437    hatanaka@av.crl.sony.co.jp

439 5.3 Channel Mapping Configuration Table

441    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
442    | channelID | Number of | Default Speaker     |
443    |           | Channels  | Mapping             |
444    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
445    |     0     |  max 64   | undefined           |
446    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
447    |     1     |     1     | front: center       |
448    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
449    |     2     |     2     | front: left, right  |
450    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
451    |     3     |     3     | front: left, right  |
452    |           |           | front: center       |
453    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454    |     4     |     4     | front: left, right  |
455    |           |           | front: center       |
456    |           |           | rear: surround      |
457    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458    |     5     |    5+1    | front: left, right  |
459    |           |           | front: center       |
460    |           |           | rear: left, right   |
461    |           |           | LFE                 |
462    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
463    |     6     |    6+1    | front: left, right  |
464    |           |           | front: center       |
465    |           |           | rear: left, right   |
466    |           |           | rear: center        |
467    |           |           | LFE                 |
468    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
469    |     7     |    7+1    | front: left, right  |
470    |           |           | front: center       |
471    |           |           | rear: left, right   |
472    |           |           | side: left, right   |
473    |           |           | LFE                 |
474    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

476 5.4 Mapping MIME Parameters into SDP

478    The information carried in the MIME media type specification has a
479    specific mapping to fields in the Session Description Protocol (SDP)
480    [2], which is commonly used to describe RTP sessions. When SDP is
481    used to specify sessions employing the ATRAC family of codecs, the
482    following mapping rules according to the ATRAC codec apply:

484 5.4.1 For MIME subtype ATRAC3
485    o The MIME type ("audio") goes in SDP "m=" as the media name.
486    o The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
487      the encoding name.
488    o The "frameLength" parameter goes in SDP "a=fmtp". This parameter
489      MUST be present. "maxRedundantFrames" may follow, but if no value
490      is transmitted, the receiver SHOULD assume a default value of
491      "16".
492    o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
493      "a=maxptime" attributes, respectively.

495 5.4.2 For MIME subtype ATRAC-X
496    o The MIME type ("audio") goes in SDP "m=" as the media name.
497    o The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
498      the encoding name. This should be followed by the "sampleRate"
499      (as the RTP clock rate), and then the total number of channels.
500    o Any remaining parameters go in the SDP "a=fmtp" attribute by
501      copying them directly from the MIME media type string as a
502      semicolon-separated list of parameter=value pairs. The
503      "frameLength" parameter must be the first entry on this line. It
504      is recommended that the "channelID" parameter be the next entry.
505      If "maxRedundantFrames" is absent, the receiver MUST assume a
506      default value of "16" for it.
507    o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
508      "a=maxptime" attributes, respectively.

510 5.5 Offer-Answer Model Considerations

512    Some options for encoding and decoding ATRAC audio data will require
513    the sender, the receiver, or both to comply with certain
514    specifications. In order to establish an interoperable transmission
515    framework, an Offer-Answer negotiation in SDP should observe the
516    following considerations:

518 5.5.1 For MIME subtype ATRAC3
519    o Downgraded subsets of "frameLength" are possible. However, for
520      best performance, we suggest the Answerer respond with the highest
521      possible values offered.

523 5.5.2 For MIME subtype ATRAC-X
524    o When creating an offer with considerably high requirements (such
525      as 8 channels at 96kHz), it is RECOMMENDED that the offerer also
526      propose a configuration with lower requirements, such as a stereo
527      only option. Although multiple alternative configurations may be
528      offered, care should be taken to not offer too many payload types.
529    o Downgraded subsets of "sampleRate", "frameLength", and "channelID"
530      are possible. However, for best performance, we suggest the
531      Answerer respond with the highest possible values offered.

533    o The "maxRedundantFrames" is a suggested minimum. The Answerer MAY
534      use a higher value, but MUST NOT use a lower value.
535    o The optional parameters "delayMode" and "encryptionMode" are
536      non-negotiable. Thus, if the Answerer cannot comply with the
537      offered value, the session must be deemed inoperable.
538    o The parameters "maxptime" and "ptime" should not, in most cases,
539      affect the interoperability. However, the parameter settings can
540      affect application performance.
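The ATRAC-X mapping rules of Section 5.4.2 and the considerations above can be sketched as follows. This is a minimal illustration, not a complete SDP implementation: the function names are hypothetical, parameter values are kept as strings, and the sketch assumes that "complying" with a non-negotiable parameter means the answer carries the same value as the offer.

```python
DEFAULT_MAX_REDUNDANT_FRAMES = "16"               # default from Section 5.4.2
NON_NEGOTIABLE = ("delayMode", "encryptionMode")  # per Section 5.5.2


def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; name=value' into (pt, params)."""
    prefix, _, rest = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp attribute")
    payload_type = int(prefix[len("a=fmtp:"):])
    pairs = []
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        pairs.append((key.strip(), value.strip()))
    # "frameLength" must be the first entry on the fmtp line.
    if not pairs or pairs[0][0] != "frameLength":
        raise ValueError("frameLength must be the first fmtp entry")
    params = dict(pairs)
    # An absent maxRedundantFrames is treated as the default of 16.
    params.setdefault("maxRedundantFrames", DEFAULT_MAX_REDUNDANT_FRAMES)
    return payload_type, params


def answer_is_acceptable(offer, answer):
    """Check an answer's parameters against the offer's (assumed semantics)."""
    for name in NON_NEGOTIABLE:
        # Assumption: the answer has to echo the offered value unchanged.
        if offer.get(name) != answer.get(name):
            return False
    # The Answerer may raise maxRedundantFrames but must not lower it.
    return (int(answer["maxRedundantFrames"])
            >= int(offer["maxRedundantFrames"]))


if __name__ == "__main__":
    pt, offer = parse_fmtp("a=fmtp:99 frameLength=312; channelID=2; delayMode=2")
    print(pt, offer)
```

Keeping the values as strings avoids guessing at types for parameters such as "encryptionMode"; an application would convert "frameLength" and "channelID" to integers and range-check them against the registrations in Section 5.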
542 5.6 Example SDP Session Descriptions

544    Example usage of ATRAC-X with stereo at 44100Hz:

546    m=audio 49120 RTP/AVP 99
547    a=rtpmap:99 ATRAC-X/44100/2
548    a=fmtp:99 frameLength=312; channelID=2; delayMode=2
549    a=maxptime:20

551    Example usage of ATRAC-X with 5.1 setup at 48000Hz:

553    m=audio 49120 RTP/AVP 99
554    a=rtpmap:99 ATRAC-X/48000/6
555    a=fmtp:99 frameLength=1156; channelID=5
556    a=maxptime:30

558 5.7 Example Offer-Answer Exchange

560    An example Offer/Answer exchange (assuming dynamic payload types 98
       and 99 are used for the ATRAC family).

562    Alice's Offer:

564    m=audio 49170 RTP/AVP 98 99
565    a=rtpmap:98 ATRAC-X/44100/6
566    a=fmtp:98 frameLength=1156; channelID=5
567    a=rtpmap:99 ATRAC-X/44100/6
568    a=fmtp:99 frameLength=386; channelID=5

570    Bob's Answer:

572    m=audio 49170 RTP/AVP 99
573    a=rtpmap:99 ATRAC-X/44100/2
574    a=fmtp:99 frameLength=386; channelID=2

576 6. IANA Considerations

578    New MIME subtypes for ATRAC3 and ATRAC-X are currently being
579    registered (see Section 5).

581 7. Security Considerations

583    Certain security precautions may be desired to protect copyrighted
584    material. The payload format as described in this document is
585    subject to the security considerations defined in RFC 3550 [1]. This
586    payload format, however, does not implement any security mechanisms
587    of its own. External means, such as SRTP [7], MAY be used since the
588    audio compression scheme follows an end-to-end model.

590    Since the data transported is audio that is already encoded, the main
591    security issues are confidentiality, integrity, and authentication of
592    the actual audio.

594 7.1 Confidentiality

596    To ensure confidentiality of ATRAC encoded audio, the audio frames
597    will have to be encrypted. Encryption of the payload header,
598    however, is not as necessary, and in fact may not be preferable if
599    the information could be useful to some third party application.
601    Because the audio compression scheme follows an end-to-end model,
602    encryption may be performed after packet encapsulation. As
603    multi-channel transmissions are contained in single encoded audio
604    frames, there is no concern about encryption affecting interleaved
605    data.

607 7.2 Authentication

609    Transmitted data may be tampered with or altered by malicious
610    parties, for example through man-in-the-middle attacks. Such attacks
611    may result in depacketization and/or decoding errors that could
612    severely degrade audio quality.

614    As this payload format does not include its own means for sender
615    authentication and integrity protection, an external mechanism must
616    be used. It is RECOMMENDED, however, that the chosen mechanism
617    protect more than just the audio data bits. For example, to protect
618    against a man-in-the-middle attack, the payload header and RTP header
619    SHOULD be protected.

621 7.3 Decoding Validation

623    Verification of the received encoded audio packets should be
624    performed so as to ensure a minimal level of audio quality. At a
625    minimum, if the packet size that the receiver calculates from the
626    payload header fields differs from the actual payload length,
627    the receiver SHOULD discard the packet.

629 8. References

631 8.1 Normative References

633    [1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
634        "RTP: A Transport Protocol for Real-Time Applications", RFC
635        3550, July 2003.

637    [2] Handley, M. and V. Jacobson, "SDP: Session Description
638        Protocol", RFC 2327, April 1998.

640    [3] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
641        with Minimal Control", RFC 3551, July 2003.

643    [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
644        Levels, BCP 14", RFC 2119, March 1997.

646 8.2 Informative References

648    [5] Kerr, P., "RTP Payload Format for Vorbis Encoded Audio", October
649        2003.
651    [6] Sjoberg, J., "Real-Time Transport Protocol (RTP) Payload Format
652        and File Storage Format for the Adaptive Multi-Rate (AMR) and
653        Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267,
654        June 2002.

656    [7] Baugher, M., Carrara, E., McGrew, D., Naslund, M. and K. Norrman,
657        "The Secure Real Time Transport Protocol", July 2003.

659    [8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with the
660        Session Description Protocol (SDP)", RFC 3264, June 2002.

662 Authors' Addresses

664    Matthew Romaine
665    Sony Corporation, Japan
666    6-7-35 Kitashinagawa
667    Shinagawa-ku
668    Tokyo 141-0001
669    Japan

671    EMail: Matthew.Romaine@jp.sony.com

672    Mitsuyuki Hatanaka
673    Sony Corporation, Japan
674    6-7-35 Kitashinagawa
675    Shinagawa-ku
676    Tokyo 141-0001
677    Japan

679    EMail: hatanaka@av.crl.sony.co.jp

681    Jun Matsumoto
682    Sony Corporation, Japan
683    6-7-35 Kitashinagawa
684    Shinagawa-ku
685    Tokyo 141-0001
686    Japan

688    EMail: jun@av.crl.sony.co.jp

690 Intellectual Property Statement

692    The IETF takes no position regarding the validity or scope of any
693    Intellectual Property Rights or other rights that might be claimed to
694    pertain to the implementation or use of the technology described in
695    this document or the extent to which any license under such rights
696    might or might not be available; nor does it represent that it has
697    made any independent effort to identify any such rights. Information
698    on the procedures with respect to rights in RFC documents can be
699    found in BCP 78 and BCP 79.

701    Copies of IPR disclosures made to the IETF Secretariat and any
702    assurances of licenses to be made available, or the result of an
703    attempt made to obtain a general license or permission for the use of
704    such proprietary rights by implementers or users of this
705    specification can be obtained from the IETF on-line IPR repository at
706    http://www.ietf.org/ipr.
708 The IETF invites any interested party to bring to its attention any 709 copyrights, patents or patent applications, or other proprietary 710 rights that may cover technology that may be required to implement 711 this standard. Please address the information to the IETF at 712 ietf-ipr@ietf.org. 714 Disclaimer of Validity 716 This document and the information contained herein are provided on an 717 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 718 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 719 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 720 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 721 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 722 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 724 Copyright Statement 726 Copyright (C) The Internet Society (2004). This document is subject 727 to the rights, licenses and restrictions contained in BCP 78, and 728 except as set forth therein, the authors retain all their rights. 730 Acknowledgment 732 Funding for the RFC Editor function is currently provided by the 733 Internet Society.