Internet Engineering Task Force                          C. Gunther, Ed.
Internet-Draft                                                    HARMAN
Intended status: Informational                          E. Grossman, Ed.
Expires: October 2, 2015                                           DOLBY
                                                          March 31, 2015

        Deterministic Networking Professional Audio Requirements
                  draft-gunther-detnet-proaudio-req-01

Abstract

   This draft documents the needs in the professional audio and video
   industry to establish multi-hop paths and optional redundant paths
   for characterized flows with deterministic properties.
   In this context, deterministic implies that streams providing
   guaranteed bandwidth and latency can be established from a Layer 3
   (IP) interface.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 2, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Fundamental Stream Requirements
     3.1.  Guaranteed Bandwidth
     3.2.  Bounded and Consistent Latency
       3.2.1.  Optimizations
   4.  Additional Stream Requirements
     4.1.  Deterministic Time to Establish Streaming
     4.2.  Use of Unused Reservations by Best-Effort Traffic
     4.3.  Layer 3 Interconnecting Layer 2 Islands
     4.4.  Secure Transmission
     4.5.  Redundant Paths
     4.6.  Link Aggregation
     4.7.  Traffic Segregation
       4.7.1.  Packet Forwarding Rules, VLANs and Subnets
       4.7.2.  Multicast Addressing (IPv4 and IPv6)
   5.  Integration of Reserved Streams into IT Networks
   6.  Security Considerations
     6.1.  Denial of Service
     6.2.  Control Protocols
   7.  A State-of-the-Art Broadcast Installation Hits Technology
       Limits
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   The professional audio and video industry includes music and film
   content creation, broadcast, cinema, and live exposition as well as
   public address, media and emergency systems at large venues
   (airports, stadiums, churches, theme parks).
   These industries have already gone through the transition of audio
   and video signals from analog to digital; however, the interconnect
   systems remain primarily point-to-point with a single (or small
   number of) signals per link, interconnected with purpose-built
   hardware.

   These industries are now attempting to transition to packet-based
   infrastructure for distributing audio and video in order to reduce
   cost, increase routing flexibility, and integrate with existing IT
   infrastructure.

   However, there are several requirements for making a network the
   primary infrastructure for audio and video which are not met by
   today's networks, and these are our concern in this draft.

   The principal requirement is that pro audio and video applications
   become able to establish streams that provide guaranteed (bounded)
   bandwidth and latency from the Layer 3 (IP) interface.  Such streams
   can be created today within standards-based Layer 2 islands; however,
   these are not sufficient to enable effective distribution over wider
   areas (for example, broadcast events that span wide geographical
   areas).

   Some proprietary systems have been created which enable deterministic
   streams at Layer 3; however, they are engineered networks in that
   they require careful configuration to operate, often require that the
   system be over-designed, and assume that all devices on the network
   voluntarily play by the rules of that network.  Enabling these
   industries to successfully transition to an interoperable
   multi-vendor packet-based infrastructure requires effective open
   standards, and we believe that establishing relevant IETF standards
   is a crucial factor.

   It would be highly desirable if such streams could be routed over the
   open Internet; however, even intermediate solutions with more limited
   scope (such as enterprise networks) can provide a substantial
   improvement over today's networks, and a solution that only provides
   for the enterprise network scenario is an acceptable first step.

   We also present more fine-grained requirements of the audio and video
   industries such as safety and security, redundant paths, devices with
   limited computing resources on the network, and the availability of
   reserved stream bandwidth to other best-effort traffic when that
   stream is not currently in use.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Fundamental Stream Requirements

   The fundamental stream properties are guaranteed bandwidth and
   deterministic latency as described in this section.  Additional
   stream requirements are described in a subsequent section.

3.1.  Guaranteed Bandwidth

   Transmitting audio and video streams is unlike common file transfer
   activities because guaranteed delivery cannot be achieved by
   retrying the transmission; by the time the missing or corrupt packet
   has been identified it is too late to execute a retry operation and
   stream playback is interrupted, which is unacceptable in, for
   example, a live concert.  In some contexts large amounts of buffering
   can be used to provide enough delay to allow time for one or more
   retries; however, this is not an effective solution when live
   interaction is involved, and is not considered an acceptable general
   solution for pro audio and video.  (Have you ever tried speaking into
   a microphone through a sound system that has an echo coming back at
   you?
   It makes it almost impossible to speak clearly.)

   Providing a way to reserve a specific amount of bandwidth for a given
   stream is a key requirement.

3.2.  Bounded and Consistent Latency

   Latency in this context means the amount of time that passes between
   when a signal is sent over a stream and when it is received, for
   example the amount of time delay between when you speak into a
   microphone and when your voice emerges from the speaker.  Any delay
   longer than about 10-15 milliseconds is noticeable by most live
   performers, and greater latency makes the system unusable because it
   prevents them from playing in time with the other players (see slide
   6 of [SRP_LATENCY]).

   The 15 ms latency bound is made even more challenging because
   network-based music production with live electric instruments often
   uses multiple stages of signal processing connected in series (for
   example, from a guitar through a series of digital effects
   processors).  In that case the latencies add, so the sum of the
   latencies of all stages must remain less than 15 ms.

   In some situations it is acceptable at the local location for content
   from the live remote site to be delayed to allow for a statistically
   acceptable amount of latency in order to reduce jitter.  However,
   once the content begins playing in the local location any audio
   artifacts caused by the local network are unacceptable, especially in
   those situations where a live local performer is mixed into the feed
   from the remote location.

   In addition to being bounded to within some predictable and
   acceptable amount of time (which may be 15 milliseconds or more or
   less depending on the application) the latency also has to be
   consistent.
   For example, when playing a film consisting of a video stream and
   audio stream over a network, those two streams must be synchronized
   so that the voice and the picture match up.  A common tolerance for
   audio/video sync is one NTSC video frame (about 33 ms), and to
   maintain the audience perception of correct lip sync the latency
   needs to be consistent within some reasonable tolerance, for
   example 10%.

   A common architecture for synchronizing multiple streams that have
   different paths through the network (and thus potentially different
   latencies) is to enable measurement of the latency of each path, and
   have the data sinks (for example speakers) buffer (delay) all packets
   on all but the slowest path.  Each packet of each stream is assigned
   a presentation time which is based on the longest required delay.
   This implies that all sinks must maintain a common time reference of
   sufficient accuracy, which can be achieved by any of various
   techniques.

   This type of architecture is commonly implemented using a central
   controller that determines path delays and arbitrates buffering
   delays.

3.2.1.  Optimizations

   The controller might also perform optimizations based on the
   individual path delays, for example sinks that are closer to the
   source can inform the controller that they can accept greater latency
   since they will be buffering packets to match presentation times of
   farther away sinks.  The controller might then move a stream
   reservation on a short path to a longer path in order to free up
   bandwidth for other critical streams on that short path.  See slides
   3-5 of [SRP_LATENCY].

   Additional optimization can be achieved in cases where sinks have
   differing latency requirements, for example in a live outdoor concert
   the speaker sinks have stricter latency requirements than the
   recording hardware sinks.  See slide 7 of [SRP_LATENCY].
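
   The buffering architecture described in Section 3.2, in which every
   sink delays packets to match the slowest path, can be sketched as
   follows.  This is only an illustration under stated assumptions: the
   function names, the use of microseconds, and the example path delays
   are hypothetical and are not taken from any standard, and the sketch
   assumes that path delays have already been measured and that all
   sinks share a common time reference.

```python
def presentation_offset_us(path_delays_us):
    """Return the common presentation offset: the delay of the slowest path."""
    if not path_delays_us:
        raise ValueError("at least one path delay is required")
    return max(path_delays_us.values())


def sink_buffer_delays_us(path_delays_us):
    """Return the extra buffering each sink needs so all sinks present together."""
    offset = presentation_offset_us(path_delays_us)
    # A sink on a faster path buffers for the difference to the slowest path.
    return {sink: offset - delay for sink, delay in path_delays_us.items()}


# Hypothetical measured source-to-sink path delays, in microseconds.
measured = {"front_left": 250, "front_right": 400, "rear": 1200}
delays = sink_buffer_delays_us(measured)
```

   In this example the "rear" sink is on the slowest path and adds no
   buffering, while the other two sinks delay presentation by 950 and
   800 microseconds respectively.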

   Device cost can be reduced in a system with guaranteed reservations
   with a small bounded latency due to the reduced requirements for
   buffering (i.e. memory) on sink devices.  For example, a theme park
   might broadcast a live event across the globe via a Layer 3 protocol;
   in such cases the size of the buffers required is proportional to the
   latency bounds and jitter caused by delivery, which depends on the
   worst case segment of the end-to-end network path.  For example, on
   today's open Internet the latency is typically unacceptable for audio
   and video streaming without many seconds of buffering.  In such
   scenarios a single gateway device at the local network that receives
   the feed from the remote site would provide the expensive buffering
   required to mask the latency and jitter issues associated with long
   distance delivery.  Sink devices in the local location would have no
   additional buffering requirements, and thus no additional costs,
   beyond those required for delivery of local content.  The sink device
   would be receiving the identical packets as those sent by the source
   and would be unaware that there were any latency or jitter issues
   along the path.

4.  Additional Stream Requirements

   The requirements in this section are more specific yet are common to
   multiple audio and video industry applications.

4.1.  Deterministic Time to Establish Streaming

   Some audio systems installed in public environments (airports,
   hospitals) have unique requirements with regard to health, safety
   and fire concerns.  One such requirement is a maximum of 3 seconds
   for a system to respond to an emergency detection and begin sending
   appropriate warning signals and alarms without human intervention.
   For this requirement to be met, the system must support a bounded and
   acceptable time from a notification signal to specific stream
   establishment.  For further details see [ISO7240-16].

   Similar requirements apply when the system is restarted after a power
   cycle, cable re-connection, or system reconfiguration.

   In many cases such re-establishment of streaming state must be
   achieved by the peer devices themselves, i.e. without a central
   controller (since such a controller may only be present during
   initial network configuration).

   Video systems introduce related requirements, for example when
   transitioning from one camera feed to another.  Such systems
   currently use purpose-built hardware to switch feeds smoothly;
   however, there is a current initiative in the broadcast industry to
   switch to a packet-based infrastructure (see [STUDIO_IP] and the ESPN
   DC2 use case described below).

4.2.  Use of Unused Reservations by Best-Effort Traffic

   In cases where stream bandwidth is reserved but not currently used
   (or is under-utilized) that bandwidth must be available to
   best-effort (i.e. non-time-sensitive) traffic.  For example, a single
   stream may be nailed up (reserved) for specific media content that
   needs to be presented at different times of the day, ensuring timely
   delivery of that content, yet in between those times the full
   bandwidth of the network can be utilized for best-effort tasks such
   as file transfers.

   This also addresses a concern of IT network administrators who are
   considering adding reserved-bandwidth traffic to their networks:
   that users will reserve a large amount of bandwidth and then never
   release it even though they are not using it, so that soon no
   bandwidth is left.

4.3.  Layer 3 Interconnecting Layer 2 Islands

   As an intermediate step (short of providing guaranteed bandwidth
   across the open Internet) it would be valuable to provide a way to
   connect multiple Layer 2 networks.
   For example, Layer 2 techniques could be used to create a LAN for a
   single broadcast studio, and several such studios could be
   interconnected via Layer 3 links.

4.4.  Secure Transmission

   Digital Rights Management (DRM) is very important to the audio and
   video industries.  Any time protected content is introduced into a
   network there are DRM requirements that must be honored (see
   [CONTENT_PROTECTION]).  Many aspects of DRM are outside the scope of
   network technology; however, there are cases when a secure link
   supporting authentication and encryption is required by content
   owners to carry their audio or video content when it is outside their
   own secure environment (for example see [DCI]).

   As an example, two techniques are Digital Transmission Content
   Protection (DTCP) and High-Bandwidth Digital Content Protection
   (HDCP).  HDCP content is not approved for retransmission within any
   other type of DRM, while DTCP may be retransmitted under HDCP.
   Therefore if the source of a stream is outside of the network and it
   uses HDCP protection it is only allowed to be placed on the network
   with that same HDCP protection.

4.5.  Redundant Paths

   On-air and other live media streams must be backed up with redundant
   links that seamlessly act to deliver the content when the primary
   link fails for any reason.  In point-to-point systems this is
   provided by an additional point-to-point link; the analogous
   requirement in a packet-based system is to provide an alternate path
   through the network such that no individual link can bring down the
   system.

4.6.  Link Aggregation

   For transmitting streams that require more bandwidth than a single
   link in the target network can support, link aggregation is a
   technique for combining (aggregating) the bandwidth available on
   multiple physical links to create a single logical link of the
   required bandwidth.
   However, if aggregation is to be used, the network controller (or
   equivalent) must be able to determine the maximum latency of any path
   through the aggregate link (see Section 3.2, Bounded and Consistent
   Latency).

4.7.  Traffic Segregation

   Sink devices may be low cost devices with limited processing power.
   In order to not overwhelm the CPUs in these devices it is important
   to limit the amount of traffic that these devices must process.

   As an example, consider the use of individual seat speakers in a
   cinema.  These speakers are typically required to be cost-reduced
   since the quantities in a single theater can reach hundreds of seats.
   Discovery protocols alone in a one thousand seat theater can generate
   enough broadcast traffic to overwhelm a low-powered CPU.  Thus an
   installation like this will benefit greatly from some type of traffic
   segregation that can define groups of seats to reduce traffic within
   each group.  All seats in the theater must still be able to
   communicate with a central controller.

   There are many techniques that can be used to support this
   requirement including (but not limited to) the following examples.

4.7.1.  Packet Forwarding Rules, VLANs and Subnets

   Packet forwarding rules can be used to keep some extraneous
   streaming traffic from reaching potentially low-powered sink devices;
   however, there may be other types of broadcast traffic that should be
   eliminated using other means, for example VLANs or IP subnets.

4.7.2.  Multicast Addressing (IPv4 and IPv6)

   Multicast addressing is commonly used to keep bandwidth utilization
   of shared links to a minimum.

   Because of the MAC address forwarding nature of Layer 2 bridges it is
   important that a multicast MAC address is only associated with one
   stream.
   This will prevent reservations from forwarding packets from one
   stream down a path that has no interested sinks simply because there
   is another stream on that same path that shares the same multicast
   MAC address.

   Since each multicast MAC address can represent 32 different IPv4
   multicast addresses, there must be a process put in place to make
   sure this does not occur.  Requiring the use of IPv6 addresses can
   achieve this; however, because IPv4 remains prevalent, solutions that
   are effective for IPv4 installations are also required.

5.  Integration of Reserved Streams into IT Networks

   A commonly cited goal of moving to a packet-based media
   infrastructure is that costs can be reduced by using off-the-shelf,
   commodity network hardware.  In addition, economy of scale can be
   realized by combining media infrastructure with IT infrastructure.
   In keeping with these goals, stream reservation technology should be
   compatible with existing protocols, and not compromise use of the
   network for best-effort (non-time-sensitive) traffic.

6.  Security Considerations

   Many industries that are moving from the point-to-point world to the
   digital network world have little understanding of the pitfalls that
   they can create for themselves with improperly implemented network
   infrastructure.  DetNet should consider ways to provide security
   against DoS attacks in solutions directed at these markets.  Some
   considerations are given here as examples of ways that we can help
   new users avoid common pitfalls.

6.1.  Denial of Service

   One security pitfall that this author is aware of involves the use of
   technology that allows a presenter to throw the content from their
   tablet or smart phone onto the A/V system that is then viewed by all
   those in attendance.  The facility introducing this technology was
   quite excited to allow such modern flexibility to those who came to
   speak.
   One thing they hadn't realized was that since no security was put in
   place around this technology it left a hole in the system that
   allowed other attendees to "throw" their own content onto the A/V
   system.

6.2.  Control Protocols

   Professional audio systems can include amplifiers that are capable of
   generating hundreds or thousands of watts of audio power which, if
   used incorrectly, can cause hearing damage to those in the vicinity.
   Apart from the usual care required by the system's operators to
   prevent such incidents, the network traffic that controls these
   devices must be secured (as with any sensitive application traffic).
   In addition, it would be desirable if the configuration protocols
   that are used to create the network paths used by the professional
   audio traffic could be designed to protect devices that are not meant
   to receive high-amplitude content from having such potentially
   damaging signals routed to them.

7.  A State-of-the-Art Broadcast Installation Hits Technology Limits

   ESPN recently constructed a state-of-the-art 194,000 sq ft, $125
   million broadcast studio called DC2.  The DC2 network is capable of
   handling 46 Tbps of throughput with 60,000 simultaneous signals.
   Inside the facility are 1,100 miles of fiber feeding four audio
   control rooms.  (See details at [ESPN_DC2].)

   In designing DC2 they replaced as much point-to-point technology as
   they possibly could with packet-based technology.  They constructed
   seven individual studios using Layer 2 LANs (using IEEE 802.1 AVB)
   that were entirely effective at routing audio within the LANs, and
   they were very happy with the results; however, to interconnect these
   Layer 2 LAN islands they ended up using dedicated links because
   there is no standards-based routing solution available.

   This is the kind of motivation we have to develop these standards
   because customers are ready and able to use them.

8.  Acknowledgements

   The editors would like to acknowledge the help of the following
   individuals and the companies they represent:

   Jeff Koftinoff, Meyer Sound

   Jouni Korhonen, Associate Technical Director, Broadcom

   Pascal Thubert, CTAO, Cisco

   Kieran Tyrrell, Sienda New Media Technologies GmbH

9.  IANA Considerations

   This memo includes no request to IANA.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [CONTENT_PROTECTION]
              Olsen, D., "1722a Content Protection", 2012.

   [DCI]      Digital Cinema Initiatives, LLC, "DCI Specification,
              Version 1.2", 2012.

   [ESPN_DC2]
              Daley, D., "ESPN's DC2 Scales AVB Large", 2014.

   [ISO7240-16]
              ISO, "ISO 7240-16:2007 Fire detection and alarm systems --
              Part 16: Sound system control and indicating equipment",
              2007.

   [SRP_LATENCY]
              Gunther, C., "Specifying SRP Latency", 2014.

   [STUDIO_IP]
              Mace, G., "IP Networked Studio Infrastructure for
              Synchronized & Real-Time Multimedia Transmissions", 2007.

Authors' Addresses

   Craig Gunther (editor)
   Harman International
   10653 South River Front Parkway
   South Jordan, UT  84095
   USA

   Phone: +1 801 568-7675
   Email: craig.gunther@harman.com
   URI:   http://www.harman.com

   Ethan Grossman (editor)
   Dolby Laboratories, Inc.
   100 Potrero Ave
   San Francisco, CA  94103
   USA

   Phone: +1 415 645 4726
   Email: ethan.grossman@dolby.com
   URI:   http://www.dolby.com