codec                                                        J. Skoglund
Internet-Draft                                               Google Inc.
Intended status: Standards Track                              M. Graczyk
Expires: May 3, 2018                                    October 30, 2017


                  Ambisonics in an Ogg Opus Container
                     draft-ietf-codec-ambisonics-04

Abstract

   This document defines an extension to the Opus audio codec to
   encapsulate coded ambisonics using the Ogg format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 3, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Ambisonics With Ogg Opus
     3.1.  Channel Mapping Family 2
     3.2.  Channel Mapping Family 3
   4.  Downmixing
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Ambisonics is a representation format for three-dimensional sound
   fields that can be used for surround sound and immersive virtual
   reality playback. See [gerzon75] and [daniel04] for technical
   details on the ambisonics format. For the purposes of this
   document, ambisonics can be considered a multichannel audio stream.
   A separate stereo stream can be used alongside the ambisonics in a
   head-tracked virtual reality experience to provide so-called
   non-diegetic audio, that is, audio that should remain unchanged by
   listener head rotation, e.g., narration or stereo music. Ogg is a
   general-purpose container, supporting audio, video, and other media.
   It can be used to encapsulate audio streams coded using the Opus
   codec. See [RFC6716] and [RFC7845] for technical details on the Opus
   codec and its encapsulation in the Ogg container, respectively.

   This document extends the Ogg Opus format by defining two new channel
   mapping families for encoding ambisonics. The Ogg Opus format is
   extended indirectly by adding items with values 2 and 3 to the IANA
   "Opus Channel Mapping Families" registry. When 2 or 3 is used as
   the Channel Mapping Family Number in an Ogg stream, the semantic
   meaning of the channels in the multichannel Opus stream is one of the
   ambisonics layouts defined in this document. This mapping can also
   be used in other contexts that make use of the channel mappings
   defined by the Opus Channel Mapping Families registry.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Ambisonics With Ogg Opus

   Ambisonics can be encapsulated in the Ogg format by encoding with the
   Opus codec and setting the channel mapping family value to 2 or 3 in
   the Ogg identification (ID) header. A demuxer implementation
   encountering Channel Mapping Family 2 or Family 3 MUST interpret the
   Opus stream as containing ambisonics with the format described in
   Section 3.1 or Section 3.2, respectively.

3.1.  Channel Mapping Family 2

   Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
   or 1, where n denotes the (highest) ambisonic order and j denotes
   whether or not there is a separate non-diegetic stereo stream. This
   corresponds to periphonic ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo. Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, 227.

   This channel mapping uses the same channel mapping table format used
   by channel mapping family 1. The output channels are ambisonic
   components ordered in Ambisonic Channel Number (ACN) order, defined
   in Figure 1, followed by two optional channels of non-diegetic stereo
   indexed (left, right).

      ACN = n * (n + 1) + m,
      for order n and degree m.

              Figure 1: Ambisonic Channel Number (ACN)

   For the ambisonic channels, the ACN corresponds to the channel index
   k, i.e., k = ACN. The reverse correspondence can also be computed
   for an ambisonic channel with index k.

      order   n = floor(sqrt(k)),
      degree  m = k - n * (n + 1).

            Figure 2: Ambisonic Degree and Order from ACN

   Note that channel mapping family 2 allows for so-called mixed order
   ambisonic representation, where only a subset of the full ambisonic
   order number of channels is encoded.
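
   As an illustration of Figures 1 and 2 and of the channel-count rule
   above, the following Python sketch computes the ACN from order and
   degree, inverts it, and enumerates the allowed channel counts. The
   function names and the MAX_ORDER constant are hypothetical and not
   part of this specification. The range check -n <= m <= n is the
   usual range of the degree for a given order and is assumed here.

```python
import math
from typing import List, Tuple

MAX_ORDER = 14  # highest order n permitted by the channel-count rule


def acn(order: int, degree: int) -> int:
    """ACN = n * (n + 1) + m for order n and degree m (Figure 1)."""
    if order < 0 or not -order <= degree <= order:
        raise ValueError("expected n >= 0 and -n <= m <= n")
    return order * (order + 1) + degree


def order_and_degree(k: int) -> Tuple[int, int]:
    """Recover (n, m) from channel index k = ACN (Figure 2)."""
    if k < 0:
        raise ValueError("channel index must be non-negative")
    n = math.isqrt(k)  # exact integer floor(sqrt(k))
    return n, k - n * (n + 1)


def allowed_channel_counts() -> List[int]:
    """All (1 + n)^2 + 2j for n = 0..14 and j in {0, 1}."""
    return [(1 + n) ** 2 + 2 * j
            for n in range(MAX_ORDER + 1)
            for j in (0, 1)]
```

   For example, order_and_degree(1) yields order 1 and degree -1, and
   allowed_channel_counts() reproduces the explicit list given above.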
   By specifying the full number in the channel count field, the
   inactive ACNs can then be indicated in the channel mapping field
   using the index 255.

   Ambisonic channels are normalized with Schmidt Semi-Normalization
   (SN3D). The interpretation of the ambisonics signal as well as
   detailed definitions of ACN channel ordering and SN3D normalization
   are described in Section 2.1 of [ambix].

3.2.  Channel Mapping Family 3

   Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
   or 1, where n denotes the (highest) ambisonic order and j denotes
   whether or not there is a separate non-diegetic stereo stream. This
   corresponds to periphonic ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo. Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, 227.

   In this mapping, C output channels (the channel count) are generated
   at the decoder by multiplying K = N + M decoded channels with a
   designated demixing matrix, D, having C rows and K columns. Here, N
   denotes the number of streams encoded and M the number of these that
   are coupled to produce two channels. As with channel mapping family
   2, this mapping family allows for encoding and decoding of full order
   ambisonics, mixed order ambisonics, and non-diegetic stereo channels,
   but it also has the added flexibility of mixing channels. Let X
   denote a column vector containing K decoded channels X1, X2, ..., XK
   (from N streams), and let S denote a column vector containing C
   output channels S1, S2, ..., SC. Then S = D X, i.e.,

      /     \   /                     \ /     \
      | S1  |   | D11  D12  ...  D1K  | | X1  |
      | S2  |   | D21  D22  ...  D2K  | | X2  |
      | ... | = | ...  ...  ...  ...  | | ... |
      | SC  |   | DC1  DC2  ...  DCK  | | XK  |
      \     /   \                     / \     /

           Figure 3: Demixing in Channel Mapping Family 3

   The matrix MUST be provided as side information and MUST be stored in
   the channel mapping table part of the identification header, cf.
   Section 5.1.1 of [RFC7845]. The matrix replaces the need for a
   channel mapping field, and for channel mapping family 3 the mapping
   table has the following layout:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                      +-+-+-+-+-+-+-+-+
                                                      | Stream Count  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Coupled Count |                Demixing Matrix                :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 4: Channel Mapping Table for Channel Mapping Family 3

   The fields in the channel mapping table have the following meaning:

   1.  Stream Count 'N' (8 bits, unsigned):

       This is the total number of streams encoded in each Ogg packet.

   2.  Coupled Stream Count 'M' (8 bits, unsigned):

       This is the number of the N streams whose decoders are to be
       configured to produce two channels (stereo).

   3.  Demixing Matrix (16*K*C bits, signed):

       The coefficients of the demixing matrix are stored column-wise
       as 16-bit, signed, two's complement fixed-point values with 15
       fractional bits (Q15), little endian. If needed, the output gain
       field can be used for a normalization scale. For mixed order
       ambisonic representations, the silent ACN channels are indicated
       by all zeros in the corresponding rows of the demixing matrix.
       This also allows for mixed order with non-diegetic stereo, as the
       number of columns implies the presence of non-diegetic channels.

   Note that [RFC7845] specifies that the identification header cannot
   exceed one "page", which is 65,025 octets.
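
   The table layout of Figure 4 and the demixing step S = D X of
   Figure 3 can be sketched in Python as follows. This is a minimal
   illustration, not a reference implementation: the function names
   are hypothetical, the input is assumed to be the channel mapping
   table bytes starting at the Stream Count field, and the channel
   count C is assumed to have been read from the identification header
   beforehand.

```python
import struct
from typing import List, Sequence, Tuple

Q15_SCALE = 32768.0  # 2^15, because the values carry 15 fractional bits


def parse_family3_table(table: bytes,
                        channel_count: int
                        ) -> Tuple[int, int, List[List[float]]]:
    """Return (N, M, D), where D has C rows and K = N + M columns."""
    if len(table) < 2:
        raise ValueError("table must hold Stream Count and Coupled Count")
    n_streams, n_coupled = table[0], table[1]
    k = n_streams + n_coupled
    expected = 2 + 2 * k * channel_count  # two count octets + 16*K*C bits
    if len(table) != expected:
        raise ValueError("expected %d octets, got %d" % (expected, len(table)))
    # Little-endian signed 16-bit coefficients.
    raw = struct.unpack_from("<%dh" % (k * channel_count), table, 2)
    # Column-wise storage: entry (row r, column c) is at index c * C + r.
    d = [[raw[c * channel_count + r] / Q15_SCALE for c in range(k)]
         for r in range(channel_count)]
    return n_streams, n_coupled, d


def demix(d: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute S = D X for one sample instant (K inputs, C outputs)."""
    return [sum(coef * sample for coef, sample in zip(row, x)) for row in d]
```

   The length check runs before any coefficient is read, so a table
   whose size does not match N, M, and C is rejected rather than read
   past its end.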
   This limits the ambisonic order to be lower than 12, if full order
   is utilized and the number of coded streams is the same as the
   ambisonic order plus the two non-diegetic channels. Also note that
   the total output channel number, C, MUST be set in the third field
   of the identification header.

4.  Downmixing

   An Ogg Opus player MAY use the matrix in Figure 5 to implement
   downmixing from multichannel files using Channel Mapping Families 2
   and 3 when there is no non-diegetic stereo. This downmixing is known
   to give acceptable results for stereo downmixing from ambisonics.
   The first and second ambisonic channels are known as "W" and "Y",
   respectively.

      /   \   /                    \ /     \
      | L |   | 0.5   0.5  0.0 ... | |  W  |
      | R | = | 0.5  -0.5  0.0 ... | |  Y  |
      \   /   \                    / | ... |
                                     \     /

   Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
                       - only Ambisonic Channels

   The first ambisonic channel (W) is a mono audio stream that
   represents the average audio signal over all directions. Since W is
   not directional, Ogg Opus players MAY use W directly for mono
   playback.

   If a non-diegetic stereo track is present, the player MAY use the
   matrix in Figure 6 for downmixing. Ls and Rs denote the two
   non-diegetic stereo channels.

      /   \   /                               \ /     \
      | L |   | 0.25   0.25  0.0 ... 0.5  0.0 | |  W  |
      | R | = | 0.25  -0.25  0.0 ... 0.0  0.5 | |  Y  |
      \   /   \                               / | ... |
                                                | Ls  |
                                                | Rs  |
                                                \     /

   Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
         - Ambisonic Channels Plus a Non-diegetic Stereo Stream

5.  Security Considerations

   Implementations of the Ogg container need to take appropriate
   security considerations into account, as outlined in Section 10 of
   [RFC7845]. The extension defined in this document requires that
   semantic meaning be assigned to more channels than the existing Ogg
   format requires.
   Since more allocations will be required to encode and decode these
   semantically meaningful channels, care should be taken in any new
   allocation paths. Implementations MUST NOT overrun their allocated
   memory nor read from uninitialized memory when managing the ambisonic
   channel mapping.

6.  IANA Considerations

   This document updates the IANA Media Types registry "Opus Channel
   Mapping Families" to add two new assignments.

                  +-------+---------------------------+
                  | Value | Reference                 |
                  +-------+---------------------------+
                  | 2     | This Document Section 3.1 |
                  |       |                           |
                  | 3     | This Document Section 3.2 |
                  +-------+---------------------------+

7.  Acknowledgments

   Thanks to Timothy Terriberry, Marcin Gorzel and Andrew Allen for
   their guidance and valuable contributions to this document.

8.  References

8.1.  Normative References

   [ambix]    Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
              "AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC7845]  Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
              for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845,
              April 2016.

8.2.  Informative References

   [daniel04]
              Daniel, J. and S. Moreau, "Further Study of Sound Field
              Coding with Higher Order Ambisonics", May 2004.

   [gerzon75]
              Gerzon, M., "Ambisonics. Part one: General system
              description", August 1975.

Authors' Addresses

   Jan Skoglund
   Google Inc.
   345 Spear Street
   San Francisco, CA 94105
   USA

   Email: jks@google.com


   Michael Graczyk

   Email: michael@graczyk.com
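
   As an illustration of the optional downmixing in Section 4, the
   following Python sketch applies the matrices of Figures 5 and 6 to
   one sample instant. The function name and the frame layout (one
   value per output channel in ACN order, followed by Ls and Rs when
   present) are assumptions made for this example. The sketch also
   assumes at least two ambisonic channels, since both W and Y are
   needed.

```python
from typing import List, Sequence


def stereo_downmix(frame: Sequence[float],
                   has_non_diegetic: bool) -> List[float]:
    """Return [L, R] for one sample instant of the output channels."""
    ambisonic_count = len(frame) - (2 if has_non_diegetic else 0)
    if ambisonic_count < 2:
        raise ValueError("W and Y are both required for this downmix")
    w, y = frame[0], frame[1]  # first and second ambisonic channels
    if not has_non_diegetic:
        # Figure 5: only ambisonic channels.
        return [0.5 * w + 0.5 * y, 0.5 * w - 0.5 * y]
    # Figure 6: the last two channels are the non-diegetic Ls and Rs.
    ls, rs = frame[-2], frame[-1]
    return [0.25 * w + 0.25 * y + 0.5 * ls,
            0.25 * w - 0.25 * y + 0.5 * rs]
```

   For mono playback, Section 4 allows a player to use W (frame[0])
   directly, so no matrix is needed in that case.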