codec                                                        J. Skoglund
Internet-Draft                                               Google Inc.
Updates: 7845 (if approved)                                   M. Graczyk
Intended status: Standards Track                            May 01, 2018
Expires: November 2, 2018


                  Ambisonics in an Ogg Opus Container
                     draft-ietf-codec-ambisonics-05

Abstract

   This document defines an extension to the Opus audio codec to
   encapsulate coded ambisonics using the Ogg format.  It also contains
   updates to RFC 7845 to reflect necessary changes in the description
   of channel mapping families.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 2, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Ambisonics With Ogg Opus
     3.1.  Channel Mapping Family 2
     3.2.  Channel Mapping Family 3
   4.  Downmixing
   5.  Updates to RFC 7845
     5.1.  Format of the Channel Mapping Table
     5.2.  Unknown Mapping Families
   6.  Experimental Mapping Families
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Ambisonics is a representation format for three-dimensional sound
   fields that can be used for surround sound and immersive virtual
   reality playback.  See [gerzon75] and [daniel04] for technical
   details on the ambisonics format.  For the purposes of this
   document, ambisonics can be considered a multichannel audio stream.
   A separate stereo stream can be used alongside the ambisonics in a
   head-tracked virtual reality experience to provide so-called non-
   diegetic audio, that is, audio that should remain unchanged by
   listener head rotation (e.g., narration or stereo music).  Ogg is a
   general-purpose container, supporting audio, video, and other media.
   It can be used to encapsulate audio streams coded using the Opus
   codec.  See [RFC6716] and [RFC7845] for technical details on the Opus
   codec and its encapsulation in the Ogg container, respectively.

   This document extends the Ogg Opus format by defining two new channel
   mapping families for encoding ambisonics.  The Ogg Opus format is
   extended indirectly by adding items with values 2 and 3 to the IANA
   "Opus Channel Mapping Families" registry.  When 2 or 3 is used as
   the Channel Mapping Family Number in an Ogg stream, the semantic
   meaning of the channels in the multichannel Opus stream is one of the
   ambisonics layouts defined in this document.  This mapping can also
   be used in other contexts that make use of the channel mappings
   defined by the Opus Channel Mapping Families registry.  Furthermore,
   mapping families 240 through 254 (inclusive) are reserved for
   experimental use.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Ambisonics With Ogg Opus

   Ambisonics can be encapsulated in the Ogg format by encoding with the
   Opus codec and setting the channel mapping family value to 2 or 3 in
   the Ogg identification header (ID).  A demuxer implementation
   encountering Channel Mapping Family 2 or Family 3 MUST interpret the
   Opus stream as containing ambisonics with the format described in
   Section 3.1 or Section 3.2, respectively.

3.1.  Channel Mapping Family 2

   Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
   or 1, where n denotes the (highest) ambisonic order and j indicates
   whether or not there is a separate non-diegetic stereo stream.  This
   corresponds to periphonic ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo.  Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, 227.
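
   The channel count rule above can be checked mechanically.  The
   following is a minimal, non-normative sketch in Python; the names
   ALLOWED_CHANNEL_COUNTS and parse_channel_count are hypothetical
   helpers introduced here for illustration and are not defined by
   this document or by any Opus library.

```python
import math

# All counts of the form (1 + n)^2 + 2j for n = 0...14 and j in {0, 1}.
ALLOWED_CHANNEL_COUNTS = sorted(
    (1 + n) ** 2 + 2 * j for n in range(15) for j in (0, 1)
)


def parse_channel_count(channels: int) -> tuple:
    """Return (order, has_non_diegetic_stereo) for an allowed count.

    Hypothetical helper: raises ValueError when the count is not of the
    form (1 + n)^2 + 2j with 0 <= n <= 14 and j in {0, 1}.
    """
    for has_stereo in (False, True):
        ambisonic = channels - (2 if has_stereo else 0)
        if ambisonic < 1:
            continue
        n = math.isqrt(ambisonic) - 1
        # The ambisonic part must be a perfect square within the order range.
        if (n + 1) ** 2 == ambisonic and 0 <= n <= 14:
            return n, has_stereo
    raise ValueError(f"{channels} is not an allowed channel count")
```

   For example, under these assumptions parse_channel_count(6) yields
   (1, True), that is, first-order ambisonics plus non-diegetic stereo,
   while a count such as 2 is rejected.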

   This channel mapping uses the same channel mapping table format used
   by channel mapping family 1.  The output channels are ambisonic
   components ordered in Ambisonic Channel Number (ACN) order, defined
   in Figure 1, followed by two optional channels of non-diegetic stereo
   indexed (left, right).

      ACN = n * (n + 1) + m,
      for order n and degree m.

               Figure 1: Ambisonic Channel Number (ACN)

   For the ambisonic channels, the ACN component corresponds to the
   channel index as k = ACN.  The reverse correspondence can also be
   computed for an ambisonic channel with index k.

      order   n = floor(sqrt(k)),
      degree  m = k - n * (n + 1).

             Figure 2: Ambisonic Degree and Order from ACN

   Note that channel mapping family 2 allows for so-called mixed-order
   ambisonic representations, where only a subset of the full ambisonic
   order number of channels is active.  By specifying the full number in
   the channel count field, the inactive ACNs can then be indicated in
   the channel mapping field using the index 255.

   Ambisonic channels are normalized with Schmidt Semi-Normalization
   (SN3D).  The interpretation of the ambisonics signal as well as
   detailed definitions of ACN channel ordering and SN3D normalization
   are described in [ambix] Section 2.1.

3.2.  Channel Mapping Family 3

   Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
   or 1, where n denotes the (highest) ambisonic order and j indicates
   whether or not there is a separate non-diegetic stereo stream.  This
   corresponds to periphonic ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo.  Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, 227.
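
   The ACN relations given in Figures 1 and 2 of Section 3.1 can be
   sketched as a pair of small functions.  This is an illustrative,
   non-normative example; the function names acn and order_degree are
   assumptions made here and do not come from this document.

```python
import math


def acn(order: int, degree: int) -> int:
    """Figure 1: ACN = n * (n + 1) + m for order n and degree m."""
    if order < 0 or abs(degree) > order:
        raise ValueError("degree must satisfy -order <= degree <= order")
    return order * (order + 1) + degree


def order_degree(k: int) -> tuple:
    """Figure 2: recover (order n, degree m) from channel index k = ACN."""
    if k < 0:
        raise ValueError("channel index must be non-negative")
    n = math.isqrt(k)  # integer floor(sqrt(k)), exact for any size of k
    return n, k - n * (n + 1)
```

   With these definitions, order_degree(acn(n, m)) returns (n, m) for
   every valid pair; for instance, index 4 corresponds to order 2 and
   degree -2.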

   In this mapping, C output channels (the channel count) are generated
   at the decoder by multiplying K = N + M decoded channels with a
   designated demixing matrix, D, having C rows and K columns.  Here, N
   denotes the number of streams encoded and M the number of these that
   are coupled to produce two channels.  As with channel mapping family
   2, this mapping family allows for encoding and decoding of full-order
   ambisonics, mixed-order ambisonics, and non-diegetic stereo channels,
   but it adds the flexibility of mixing channels.  Let X denote a
   column vector containing K decoded channels X1, X2, ..., XK (from N
   streams), and let S denote a column vector containing C output
   channels S1, S2, ..., SC.  Then S = D X, i.e.,

      /     \   /                   \ /     \
      | S1  |   | D11  D12  ... D1K | | X1  |
      | S2  |   | D21  D22  ... D2K | | X2  |
      | ... | = | ...  ...  ... ... | | ... |
      | SC  |   | DC1  DC2  ... DCK | | XK  |
      \     /   \                   / \     /

            Figure 3: Demixing in Channel Mapping Family 3

   The matrix MUST be provided as side information and MUST be stored in
   the channel mapping table part of the identification header (see
   Section 5.1.1 of [RFC7845]).  The matrix replaces the need for a
   channel mapping field, and for channel mapping family 3 the mapping
   table has the following layout:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                     +-+-+-+-+-+-+-+-+
                                                     | Stream Count  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Coupled Count | Demixing Matrix                               :
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: Channel Mapping Table for Channel Mapping Family 3

   The fields in the channel mapping table have the following meaning:

   1.  Stream Count 'N' (8 bits, unsigned):

       This is the total number of streams encoded in each Ogg packet.

   2.  Coupled Stream Count 'M' (8 bits, unsigned):

       This is the number of the N streams whose decoders are to be
       configured to produce two channels (stereo).

   3.  Demixing Matrix (16*K*C bits, signed):

       The coefficients of the demixing matrix stored column-wise as
       16-bit, signed, two's complement fixed-point values with 15
       fractional bits (Q15), little endian.  If needed, the output gain
       field can be used for a normalization scale.  For mixed-order
       ambisonic representations, the silent ACN channels are indicated
       by all zeros in the corresponding rows of the demixing matrix.
       This also allows for mixed order with non-diegetic stereo, as the
       number of columns implies the presence of non-diegetic channels.

   Note that [RFC7845] specifies that the identification header cannot
   exceed one "page", which is 65,025 octets.  This limits the ambisonic
   order to be lower than 12, if full order is utilized and the number
   of coded streams is the same as the ambisonic order plus the two non-
   diegetic channels.  Also note that the total output channel number,
   C, MUST be set in the 3rd field of the identification header.

4.  Downmixing

   An Ogg Opus player MAY use the matrix in Figure 5 to implement
   downmixing from multichannel files using Channel Mapping Family 2 or
   3, when there is no non-diegetic stereo.  This downmixing is known to
   give acceptable results for stereo downmixing from ambisonics.  The
   first and second ambisonic channels are known as "W" and "Y",
   respectively.

      /   \   /                  \ /     \
      | L |   | 0.5  0.5 0.0 ... | | W   |
      | R | = | 0.5 -0.5 0.0 ... | | Y   |
      \   /   \                  / | ... |
                                   \     /

   Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
                       - only Ambisonic Channels

   The first ambisonic channel (W) is a mono audio stream that
   represents the average audio signal over all directions.  Since W is
   not directional, Ogg Opus players MAY use W directly for mono
   playback.
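
   The stereo and mono downmixes described above can be sketched as
   follows.  This is a minimal, non-normative illustration that assumes
   the decoded ambisonic channels are available as equal-length
   sequences of float samples in ACN order (index 0 is W, index 1 is
   Y); the function names are hypothetical.  Treating a missing Y
   channel as silence for zeroth-order input is an assumption of this
   sketch, not something this document specifies.

```python
from typing import List, Sequence, Tuple


def downmix_stereo(
    channels: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Apply the Figure 5 matrix: L = 0.5*W + 0.5*Y, R = 0.5*W - 0.5*Y."""
    w = channels[0]
    # Assumption: zeroth-order input has no Y channel, so use silence.
    y = channels[1] if len(channels) > 1 else [0.0] * len(w)
    left = [0.5 * a + 0.5 * b for a, b in zip(w, y)]
    right = [0.5 * a - 0.5 * b for a, b in zip(w, y)]
    return left, right


def downmix_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Use W (the first ambisonic channel) directly for mono playback."""
    return list(channels[0])
```

   All higher-order channels are multiplied by 0.0 in Figure 5, so the
   sketch simply ignores them rather than forming the full matrix
   product.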

   If a non-diegetic stereo track is present, the player MAY use the
   matrix in Figure 6 for downmixing.  Ls and Rs denote the two non-
   diegetic stereo channels.

      /   \   /                             \ /     \
      | L |   | 0.25  0.25 0.0 ... 0.5 0.0  | | W   |
      | R | = | 0.25 -0.25 0.0 ... 0.0 0.5  | | Y   |
      \   /   \                             / | ... |
                                              | Ls  |
                                              | Rs  |
                                              \     /

   Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
         - Ambisonic Channels Plus a Non-diegetic Stereo Stream

5.  Updates to RFC 7845

5.1.  Format of the Channel Mapping Table

   The language in section 5.1.1 in [RFC7845] implies that the channel
   mapping table, when present, has a fixed format for all channel
   mapping families:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and,
      for channel mapping families other than family 0, a 'channel
      mapping table', as illustrated in Figure 3.

   This document updates [RFC7845] to clarify that the format of the
   channel mapping table may depend on the channel mapping family:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and
      for channel mapping families other than family 0, a 'channel
      mapping table'.

      The format of the channel mapping table depends on the channel
      mapping family.  Unless the channel mapping family requires a
      custom format for its channel mapping table, the RECOMMENDED
      channel mapping table format for new mapping families is
      illustrated in Figure 3.

   The change above is not meant to change how families 1 and 255
   currently work.  To ensure that, the first paragraph of
   Section 5.1.1.2 is changed from:

      Allowed numbers of channels: 1...8.  Vorbis channel order (see
      below).

   to

      Allowed numbers of channels: 1...8, with the mapping specified
      according to Figure 3.  Vorbis channel order (see below).

   Similarly, the first paragraph of Section 5.1.1.4 is changed from:

      Allowed numbers of channels: 1...255.  No defined channel meaning.

   to

      Allowed numbers of channels: 1...255, with the mapping specified
      according to Figure 3.  No defined channel meaning.

5.2.  Unknown Mapping Families

   Treatment of unknown mapping families is changed slightly.
   Section 5.1.1.4 of [RFC7845] states:

      The remaining channel mapping families (2...254) are reserved.  A
      demuxer implementation encountering a reserved 'channel mapping
      family' value SHOULD act as though the value is 255.

   This is changed to:

      The remaining channel mapping families (2...254) are reserved.  A
      demuxer implementation encountering a 'channel mapping family'
      value that it does not recognize SHOULD NOT attempt to decode the
      packets and SHOULD NOT use any information except for the first 19
      octets of the ID header packet (Fig. 2) and the comment header
      (Fig. 10).

6.  Experimental Mapping Families

   To make development of new mapping families easier while reducing the
   risk of creating compatibility issues with non-final versions of
   mapping families, mapping families 240 through 254 (inclusive) are
   now reserved for experiments and implementations of in-development
   families.  Implementers SHOULD attempt to use experimental family
   numbers that have not recently been used and SHOULD advertise what
   experimental numbers they use (e.g., for Internet-Drafts).

   The ambisonics mapping experiments that led to this document used
   experimental family 254 for family 2 and experimental family 253 for
   family 3.

7.  Security Considerations

   Implementations of the Ogg container need to take appropriate
   security considerations into account, as outlined in Section 10 of
   [RFC7845].  The extension defined in this document requires that
   semantic meaning be assigned to more channels than the existing Ogg
   format requires.  Since more allocations will be required to encode
   and decode these semantically meaningful channels, care should be
   taken in any new allocation paths.  Implementations MUST NOT overrun
   their allocated memory nor read from uninitialized memory when
   managing the ambisonic channel mapping.

8.  IANA Considerations

   This document updates the IANA Media Types registry "Opus Channel
   Mapping Families" to add 17 new assignments.

                  +---------+---------------------------+
                  | Value   | Reference                 |
                  +---------+---------------------------+
                  | 2       | This Document Section 3.1 |
                  |         |                           |
                  | 3       | This Document Section 3.2 |
                  |         |                           |
                  | 240-254 | This Document Section 6   |
                  +---------+---------------------------+

9.  Acknowledgments

   Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin
   Gorzel, and Andrew Allen for their guidance and valuable
   contributions to this document.

10.  References

10.1.  Normative References

   [ambix]    Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
              "AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC7845]  Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
              for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845,
              April 2016.

10.2.  Informative References

   [daniel04]
              Daniel, J. and S. Moreau, "Further Study of Sound Field
              Coding with Higher Order Ambisonics", May 2004.

   [gerzon75]
              Gerzon, M., "Ambisonics. Part one: General system
              description", August 1975.

Authors' Addresses

   Jan Skoglund
   Google Inc.
   345 Spear Street
   San Francisco, CA  94105
   USA

   Email: jks@google.com


   Michael Graczyk

   Email: michael@graczyk.com