codec                                                        J. Skoglund
Internet-Draft                                                Google LLC
Updates: 7845 (if approved)                                   M. Graczyk
Intended status: Standards Track                         August 13, 2018
Expires: February 14, 2019


                  Ambisonics in an Ogg Opus Container
                     draft-ietf-codec-ambisonics-09

Abstract

   This document defines an extension to the Opus audio codec to
   encapsulate coded ambisonics using the Ogg format. It also contains
   updates to RFC 7845 to reflect necessary changes in the description
   of channel mapping families.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 14, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Ambisonics With Ogg Opus
     3.1.  Channel Mapping Family 2
     3.2.  Channel Mapping Family 3
     3.3.  Allowed Numbers of Channels
   4.  Downmixing
   5.  Updates to RFC 7845
     5.1.  Format of the Channel Mapping Table
     5.2.  Unknown Mapping Families
   6.  Experimental Mapping Families
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Ambisonics is a representation format for three-dimensional sound
   fields which can be used for surround sound and immersive virtual
   reality playback. See [gerzon75] and [daniel04] for technical
   details on the ambisonics format. For the purposes of this
   document, ambisonics can be considered a multichannel audio stream.
   A separate stereo stream can be used alongside the ambisonics in a
   head-tracked virtual reality experience to provide so-called
   non-diegetic audio, that is, audio which should remain unchanged by
   listener head rotation (e.g., narration or stereo music). Ogg is a
   general purpose container, supporting audio, video, and other media.
   It can be used to encapsulate audio streams coded using the Opus
   codec.
   See [RFC6716] and [RFC7845] for technical details on the Opus codec
   and its encapsulation in the Ogg container, respectively.

   This document extends the Ogg Opus format by defining two new channel
   mapping families for encoding ambisonics. The Ogg Opus format is
   extended indirectly by adding items with values 2 and 3 to the IANA
   "Opus Channel Mapping Families" registry. When 2 or 3 is used as
   the Channel Mapping Family Number in an Ogg stream, the semantic
   meaning of the channels in the multichannel Opus stream is one of the
   ambisonics layouts defined in this document. This mapping can also
   be used in other contexts which make use of the channel mappings
   defined by the Opus Channel Mapping Families registry. Furthermore,
   mapping families 240 through 254 (inclusive) are reserved for
   experimental use.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Ambisonics With Ogg Opus

   Ambisonics can be encapsulated in the Ogg format by encoding with the
   Opus codec and setting the channel mapping family value to 2 or 3 in
   the Ogg identification header (ID). A demuxer implementation
   encountering Channel Mapping Family 2 or Family 3 MUST interpret the
   Opus stream as containing ambisonics with the format described in
   Section 3.1 or Section 3.2, respectively.

3.1.  Channel Mapping Family 2

   This channel mapping uses the same channel mapping table format used
   by channel mapping family 1. The output channels are ambisonic
   components ordered in Ambisonic Channel Number (ACN) order, defined
   in Figure 1, followed by two optional channels of non-diegetic stereo
   indexed (left, right).
   The terms order and degree are defined according to [ambix].

               ACN = n * (n + 1) + m,
               for order n and degree m.

              Figure 1: Ambisonic Channel Number (ACN)

   For the ambisonic channels, the ACN component corresponds to the
   channel index as k = ACN. The reverse correspondence can also be
   computed for an ambisonic channel with index k.

               order   n = floor(sqrt(k)),
               degree  m = k - n * (n + 1).

             Figure 2: Ambisonic Degree and Order from ACN

   Note that channel mapping family 2 allows for a so-called mixed order
   ambisonic representation, where only a subset of the full ambisonic
   order number of channels is encoded. By specifying the full number
   in the channel count field, the inactive ACNs can then be indicated
   in the channel mapping field using the index 255.

   Ambisonic channels are normalized with Schmidt Semi-Normalization
   (SN3D). The interpretation of the ambisonics signal as well as
   detailed definitions of ACN channel ordering and SN3D normalization
   are described in [ambix] Section 2.1.

3.2.  Channel Mapping Family 3

   In this mapping, C output channels (the channel count) are generated
   at the decoder by multiplying K = N + M decoded channels with a
   designated demixing matrix, D, having C rows and K columns (C and K
   do not have to be equal). Here, N denotes the number of streams
   encoded and M the number of these which are coupled to produce two
   channels. As with channel mapping family 2, this mapping family
   allows for encoding and decoding of full order ambisonics, mixed
   order ambisonics, and non-diegetic stereo channels, but it also has
   the added flexibility of mixing channels. Let X denote a column
   vector containing K decoded channels X1, X2, ..., XK (from N
   streams), and let S denote a column vector containing C output
   channels S1, S2, ..., SC. Then S = D X, i.e.,

             /     \   /                    \ /     \
             | S1  |   | D11  D12  ...  D1K | | X1  |
             | S2  |   | D21  D22  ...  D2K | | X2  |
             | ... | = | ...  ...  ...  ... | | ... |
             | SC  |   | DC1  DC2  ...  DCK | | XK  |
             \     /   \                    / \     /

             Figure 3: Demixing in Channel Mapping Family 3

   The matrix MUST be provided in the channel mapping table part of the
   identification header (see Section 5.1.1 of [RFC7845]). The matrix
   replaces the need for a channel mapping field, and for channel
   mapping family 3 the mapping table has the following layout:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                      +-+-+-+-+-+-+-+-+
                                                      | Stream Count  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Coupled Count | Demixing Matrix                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: Channel Mapping Table for Channel Mapping Family 3

   The fields in the channel mapping table have the following meaning:

   1.  Stream Count 'N' (8 bits, unsigned):

       This is the total number of streams encoded in each Ogg packet.

   2.  Coupled Stream Count 'M' (8 bits, unsigned):

       This is the number of the N streams whose decoders are to be
       configured to produce two channels (stereo).

   3.  Demixing Matrix (16*K*C bits, signed):

       The coefficients of the demixing matrix stored in column-major
       order as 16-bit, signed, two's complement fixed-point values with
       15 fractional bits (Q15), little endian. If needed, the output
       gain field can be used for a normalization scale. For mixed
       order ambisonic representations, the silent ACN channels are
       indicated by all zeros in the corresponding rows of the demixing
       matrix. This also allows for mixed order with non-diegetic
       stereo, as the number of columns implies the presence of
       non-diegetic channels.

   Note that [RFC7845] specifies that the identification header cannot
   exceed one "page", which is 65,025 octets.
   This limits the ambisonic order, which then MUST be lower than 12,
   if full order is utilized and the number of coded streams is the
   same as the ambisonic order plus the two non-diegetic channels. The
   total output channel number, C, MUST be set in the 3rd field of the
   identification header.

3.3.  Allowed Numbers of Channels

   For both channel mapping family 2 and family 3, the allowed numbers
   of channels are (1 + n)^2 + 2j for n = 0, 1, ..., 14 and j = 0 or 1,
   where n denotes the (highest) ambisonic order and j denotes whether
   or not there is a separate non-diegetic stereo stream. This
   corresponds to periphonic ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo. Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, and 227. Note again that if full ambisonic
   order is used and the number of coded streams is the same as the
   ambisonic order plus the two non-diegetic channels, due to the
   identification header length limit, the order MUST be lower than 12.

4.  Downmixing

   The downmixing matrices in this section are only examples known to
   give acceptable results for stereo downmixing from ambisonics, but
   other mixing strategies will be allowed, e.g., to emphasize a certain
   panning.

   An Ogg Opus player MAY use the matrix in Figure 5 to implement
   downmixing from multichannel files using Channel Mapping Family 2 and
   3, when there is no non-diegetic stereo. The first and second
   ambisonic channels are known as "W" and "Y", respectively. The
   omitted coefficients in the matrix in the figure have the value 0.0.

                 /   \   /                  \ /     \
                 | L |   | 0.5  0.5 0.0 ... | |  W  |
                 | R | = | 0.5 -0.5 0.0 ... | |  Y  |
                 \   /   \                  / | ... |
                                              \     /

   Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
                        - only Ambisonic Channels

   The first ambisonic channel (W) is a mono audio stream which
   represents the average audio signal over all directions. Since W is
   not directional, Ogg Opus players MAY use W directly for mono
   playback.

   If a non-diegetic stereo track is present, the player MAY use the
   matrix in Figure 6 for downmixing. Ls and Rs denote the two
   non-diegetic stereo channels.

            /   \   /                            \ /     \
            | L |   | 0.25  0.25 0.0 ... 0.5 0.0 | |  W  |
            | R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | |  Y  |
            \   /   \                            / | ... |
                                                   | Ls  |
                                                   | Rs  |
                                                   \     /

   Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
          - Ambisonic Channels Plus a Non-diegetic Stereo Stream

5.  Updates to RFC 7845

5.1.  Format of the Channel Mapping Table

   The language in Section 5.1.1 of [RFC7845] implies that the channel
   mapping table, when present, has a fixed format for all channel
   mapping families:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and,
      for channel mapping families other than family 0, a 'channel
      mapping table', as illustrated in Figure 3.

   This document updates [RFC7845] to clarify that the format of the
   channel mapping table may depend on the channel mapping family:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and,
      for channel mapping families other than family 0, a 'channel
      mapping table'.

      The format of the channel mapping table depends on the channel
      mapping family. Unless the channel mapping family requires a
      custom format for its channel mapping table, the RECOMMENDED
      channel mapping table format for new mapping families is
      illustrated in Figure 3.

   The change above is not meant to change how families 1 and 255
   currently work. To ensure that, the first paragraph of
   Section 5.1.1.2 is changed from:

      Allowed numbers of channels: 1...8. Vorbis channel order (see
      below).

   to

      Allowed numbers of channels: 1...8, with the mapping specified
      according to Figure 3. Vorbis channel order (see below).

   Similarly, the first paragraph of Section 5.1.1.3 is changed from:

      Allowed numbers of channels: 1...255. No defined channel meaning.

   to

      Allowed numbers of channels: 1...255, with the mapping specified
      according to Figure 3. No defined channel meaning.

5.2.  Unknown Mapping Families

   The treatment of unknown mapping families is changed slightly.
   Section 5.1.1.4 of [RFC7845] states:

      The remaining channel mapping families (2...254) are reserved. A
      demuxer implementation encountering a reserved 'channel mapping
      family' value SHOULD act as though the value is 255.

   This is changed to:

      The remaining channel mapping families (2...254) are reserved. A
      demuxer implementation encountering a 'channel mapping family'
      value that it does not recognize SHOULD NOT attempt to decode the
      packets and SHOULD NOT use any information except for the first 19
      octets of the ID header packet (Fig. 2) and the comment header
      (Fig. 10).

6.  Experimental Mapping Families

   To make development of new mapping families easier while reducing the
   risk of creating compatibility issues with non-final versions of
   mapping families, mapping families 240 through 254 (inclusive) are
   now reserved for experiments and implementations of in-development
   families. Note that these mapping family experiments are not
   restricted to ambisonics. Implementers SHOULD attempt to use
   experimental family numbers that have not recently been used and
   SHOULD advertise what experimental numbers they use (e.g., for
   Internet-Drafts).

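   As a non-normative illustration, the family ranges described so far
   could be checked by a demuxer roughly as sketched below. This is
   only a sketch under stated assumptions: the names MappingFamily,
   classify_mapping_family, is_recognized, and known_experimental are
   hypothetical and do not come from this document or from any Opus
   library.

```python
from enum import Enum


class MappingFamily(Enum):
    """Coarse classes of the one-octet 'channel mapping family' value."""
    RFC7845_DEFINED = "defined by RFC 7845 (0, 1, 255)"
    AMBISONICS_CHANNELS = "family 2: ambisonics as individual channels"
    AMBISONICS_MATRIX = "family 3: ambisonics with demixing matrix"
    EXPERIMENTAL = "experimental use (240 through 254)"
    RESERVED = "reserved"


def classify_mapping_family(family: int) -> MappingFamily:
    """Classify a channel mapping family value (hypothetical helper)."""
    if not 0 <= family <= 255:
        raise ValueError("channel mapping family must fit in one octet")
    if family in (0, 1, 255):
        return MappingFamily.RFC7845_DEFINED
    if family == 2:
        return MappingFamily.AMBISONICS_CHANNELS
    if family == 3:
        return MappingFamily.AMBISONICS_MATRIX
    if 240 <= family <= 254:
        return MappingFamily.EXPERIMENTAL
    return MappingFamily.RESERVED


def is_recognized(family: int, known_experimental=frozenset()) -> bool:
    """Return True if this (hypothetical) demuxer may decode the stream.

    Assumption: an experimental family counts as recognized only when the
    implementation has explicitly opted in via known_experimental. An
    unrecognized family means the packets should not be decoded
    (Section 5.2).
    """
    kind = classify_mapping_family(family)
    if kind is MappingFamily.RESERVED:
        return False
    if kind is MappingFamily.EXPERIMENTAL:
        return family in known_experimental
    return True
```

   For example, is_recognized(253) is false for a demuxer that has not
   opted in to experimental family 253, while
   is_recognized(253, frozenset({253, 254})) is true.
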
   The ambisonics mapping experiments that led to this document used
   experimental family 254 for family 2 and experimental family 253 for
   family 3.

7.  Security Considerations

   Implementations of the Ogg container need to take appropriate
   security considerations into account, as outlined in Section 10 of
   [RFC7845]. The extension defined in this document requires that
   semantic meaning be assigned to more channels than the existing Ogg
   format requires. Since more allocations will be required to encode
   and decode these semantically meaningful channels, care should be
   taken in any new allocation paths. Implementations MUST NOT overrun
   their allocated memory nor read from uninitialized memory when
   managing the ambisonic channel mapping.

8.  IANA Considerations

   This document updates the IANA Media Types registry "Opus Channel
   Mapping Families" to add 17 new assignments.

   +---------+------------------------------+--------------------------+
   | Value   | Description                  | Reference                |
   +---------+------------------------------+--------------------------+
   | 0       | Mono, L/R stereo             | Section 5.1.1.1 of       |
   |         |                              | [RFC7845]                |
   |         |                              |                          |
   | 1       | 1-8 channel surround         | Section 5.1.1.2 of       |
   |         |                              | [RFC7845]                |
   |         |                              |                          |
   | 2       | Ambisonics as individual     | Section 3.1 of this      |
   |         | channels                     | document                 |
   |         |                              |                          |
   | 3       | Ambisonics with demixing     | Section 3.2 of this      |
   |         | matrix                       | document                 |
   |         |                              |                          |
   | 240-254 | Experimental use             | Section 6 of this        |
   |         |                              | document                 |
   |         |                              |                          |
   | 255     | Discrete channels            | Section 5.1.1.3 of       |
   |         |                              | [RFC7845]                |
   +---------+------------------------------+--------------------------+

9.  Acknowledgments

   Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin
   Gorzel, and Andrew Allen for their guidance and valuable
   contributions to this document.

10.  References

10.1.  Normative References

   [ambix]    Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
              "AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC7845]  Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
              for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845,
              April 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [daniel04]
              Daniel, J. and S. Moreau, "Further Study of Sound Field
              Coding with Higher Order Ambisonics", May 2004.

   [gerzon75]
              Gerzon, M., "Ambisonics. Part one: General system
              description", August 1975.

Authors' Addresses

   Jan Skoglund
   Google LLC
   345 Spear Street
   San Francisco, CA 94105
   USA

   Email: jks@google.com


   Michael Graczyk

   Email: michael@mgraczyk.com