Internet Engineering Task Force                          C. Gunther, Ed.
Internet-Draft                                                    HARMAN
Intended status: Informational                          E. Grossman, Ed.
Expires: October 2, 2015                                           DOLBY
                                                          March 31, 2015

        Deterministic Networking Professional Audio Requirements
                  draft-gunther-detnet-proaudio-req-01

Abstract

   This draft documents the needs of the professional audio and video
   industry to establish multi-hop paths and optional redundant paths
   for characterized flows with deterministic properties.  In this
   context, deterministic means that streams providing guaranteed
   bandwidth and latency can be established from a Layer 3 (IP)
   interface.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 2, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Fundamental Stream Requirements
     3.1.  Guaranteed Bandwidth
     3.2.  Bounded and Consistent Latency
       3.2.1.  Optimizations
   4.  Additional Stream Requirements
     4.1.  Deterministic Time to Establish Streaming
     4.2.  Use of Unused Reservations by Best-Effort Traffic
     4.3.  Layer 3 Interconnecting Layer 2 Islands
     4.4.  Secure Transmission
     4.5.  Redundant Paths
     4.6.  Link Aggregation
     4.7.  Traffic Segregation
       4.7.1.  Packet Forwarding Rules, VLANs and Subnets
       4.7.2.  Multicast Addressing (IPv4 and IPv6)
   5.  Integration of Reserved Streams into IT Networks
   6.  Security Considerations
     6.1.  Denial of Service
     6.2.  Control Protocols
   7.  A State-of-the-Art Broadcast Installation Hits Technology Limits
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   The professional audio and video industry includes music and film
   content creation, broadcast, cinema, and live exposition, as well as
   public address, media, and emergency systems at large venues
   (airports, stadiums, churches, theme parks).  These industries have
   already gone through the transition of audio and video signals from
   analog to digital; however, the interconnect systems remain primarily
   point-to-point, with a single (or small number of) signals per link,
   interconnected with purpose-built hardware.

   These industries are now attempting to transition to packet-based
   infrastructure for distributing audio and video in order to reduce
   cost, increase routing flexibility, and integrate with existing IT
   infrastructure.

   However, there are several requirements for making a network the
   primary infrastructure for audio and video that are not met by
   today's networks, and these are the concern of this draft.

   The principal requirement is that pro audio and video applications
   be able to establish streams that provide guaranteed (bounded)
   bandwidth and latency from the Layer 3 (IP) interface.  Such streams
   can be created today within standards-based Layer 2 islands;
   however, these are not sufficient to enable effective distribution
   over wider areas (for example, broadcast events that span wide
   geographical areas).

   Some proprietary systems have been created that enable deterministic
   streams at Layer 3; however, they are "engineered networks": they
   require careful configuration to operate, often require that the
   system be over-designed, and implicitly assume that all devices on
   the network voluntarily play by the rules of that network.  Enabling
   these industries to transition successfully to an interoperable
   multi-vendor packet-based infrastructure requires effective open
   standards, and we believe that establishing relevant IETF standards
   is a crucial factor.

   It would be highly desirable if such streams could be routed over the
   open Internet; however, even intermediate solutions with more limited
   scope (such as enterprise networks) can provide a substantial
   improvement over today's networks, and a solution that only provides
   for the enterprise network scenario is an acceptable first step.

   We also present more fine-grained requirements of the audio and video
   industries, such as safety and security, redundant paths, devices
   with limited computing resources on the network, and the ability of
   other best-effort traffic to use reserved stream bandwidth when that
   stream is not currently in use.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Fundamental Stream Requirements

   The fundamental stream properties are guaranteed bandwidth and
   deterministic latency, as described in this section.  Additional
   stream requirements are described in Section 4.

3.1.  Guaranteed Bandwidth

   Transmitting audio and video streams is unlike common file transfer
   activities because guaranteed delivery cannot be achieved by
   retrying the transmission: by the time the missing or corrupt packet
   has been identified, it is too late to execute a retry, and stream
   playback is interrupted, which is unacceptable in, for example, a
   live concert.  In some contexts large amounts of buffering can be
   used to provide enough delay to allow time for one or more retries;
   however, this is not an effective solution when live interaction is
   involved, and it is not considered an acceptable general solution
   for pro audio and video.  (Have you ever tried speaking into a
   microphone through a sound system that has an echo coming back at
   you?  It makes it almost impossible to speak clearly.)

   Providing a way to reserve a specific amount of bandwidth for a given
   stream is a key requirement.
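
   To make reserving "a specific amount of bandwidth" concrete, the
   following Python sketch computes the payload bit rate of an
   uncompressed PCM audio stream and applies a simple admission check
   on one link.  It is illustrative only: the names are hypothetical,
   the 75% reservable fraction is an assumed policy value, and packet
   header overhead is ignored.

```python
from dataclasses import dataclass


def pcm_bitrate_bps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Payload bit rate of an uncompressed PCM stream (headers not included)."""
    return channels * sample_rate_hz * bits_per_sample


@dataclass
class Link:
    capacity_bps: int
    reservable_fraction: float = 0.75  # assumed policy limit for reserved traffic
    reserved_bps: int = 0

    def try_reserve(self, stream_bps: int) -> bool:
        """Admit the stream only if total reservations stay within the limit."""
        limit = self.capacity_bps * self.reservable_fraction
        if self.reserved_bps + stream_bps > limit:
            return False  # refuse rather than degrade streams already admitted
        self.reserved_bps += stream_bps
        return True


stream_bps = pcm_bitrate_bps(channels=8, sample_rate_hz=48_000, bits_per_sample=24)
link = Link(capacity_bps=100_000_000)  # a hypothetical 100 Mbit/s link
admitted = sum(link.try_reserve(stream_bps) for _ in range(10))
print(stream_bps, admitted)  # 9216000 8
```

   Refusing a reservation that would exceed the limit, rather than
   admitting it and degrading streams that are already running,
   reflects the point above that lost live content cannot be recovered
   by retrying.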

3.2.  Bounded and Consistent Latency

   Latency in this context means the time that elapses between when a
   signal is sent over a stream and when it is received; for example,
   the delay between when you speak into a microphone and when your
   voice emerges from the speaker.  Any delay longer than about 10-15
   milliseconds is noticeable by most live performers, and greater
   latency makes the system unusable because it prevents them from
   playing in time with the other players (see slide 6 of
   [SRP_LATENCY]).

   The 15 ms latency bound is made even more challenging by the fact
   that network-based music production with live electric instruments
   often uses multiple stages of signal processing connected in series
   (for example, from a guitar through a series of digital effects
   processors).  In that case the latencies add, so the sum of the
   latencies of all the individual stages must remain less than 15 ms.
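
   The additive effect can be illustrated with a short sketch.  The
   stage names and per-stage latencies are hypothetical values chosen
   only to show the arithmetic; they are not measurements.

```python
def total_latency_ms(stage_latencies_ms):
    """Stages connected in series: their latencies simply add."""
    return sum(stage_latencies_ms)


def within_bound(stage_latencies_ms, bound_ms=15.0):
    """True if the whole chain stays strictly below the latency bound."""
    return total_latency_ms(stage_latencies_ms) < bound_ms


# Hypothetical chain:
# guitar -> hop 1 -> distortion -> hop 2 -> reverb -> hop 3 -> speaker
chain_ms = {
    "network hop 1": 2.0,
    "distortion unit": 3.0,
    "network hop 2": 2.0,
    "reverb unit": 4.5,
    "network hop 3": 2.0,
}
print(total_latency_ms(chain_ms.values()), within_bound(chain_ms.values()))  # 13.5 True
```

   With these assumed values only 1.5 ms of headroom remains, and any
   further stage, whether network or processing, must fit within it.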

   In some situations it is acceptable for content from a live remote
   site to be delayed at the local location by a statistically
   acceptable amount of latency in order to reduce jitter.  However,
   once the content begins playing at the local location, any audio
   artifacts caused by the local network are unacceptable, especially
   in situations where a live local performer is mixed into the feed
   from the remote location.

   In addition to being bounded to within some predictable and
   acceptable amount of time (which may be more or less than 15
   milliseconds depending on the application), the latency also has to
   be consistent.  For example, when a film consisting of a video
   stream and an audio stream is played over a network, the two streams
   must be synchronized so that the voice and the picture match up.  A
   common tolerance for audio/video sync is one NTSC video frame (about
   33 ms), and to maintain the audience's perception of correct lip
   sync, the latency needs to be consistent within some reasonable
   tolerance, for example 10%.
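
   The following sketch expresses both checks under stated assumptions:
   audio/video sync is taken to hold when the two streams' latencies
   differ by no more than one NTSC frame (1001/30000 s), and the "10%"
   consistency figure is assumed, for illustration, to mean that each
   measured latency stays within 10% of its nominal value.  The
   function names are hypothetical.

```python
from fractions import Fraction

# One NTSC frame lasts 1001/30000 s (the frame rate is about 29.97 fps).
NTSC_FRAME_MS = float(Fraction(1001, 30000) * 1000)  # about 33.37 ms


def av_in_sync(audio_latency_ms, video_latency_ms, tolerance_ms=NTSC_FRAME_MS):
    """Lip sync holds if audio and video latencies differ by at most one frame."""
    return abs(audio_latency_ms - video_latency_ms) <= tolerance_ms


def latency_consistent(samples_ms, nominal_ms, fraction=0.10):
    """Assumed reading of the 10% figure: every sample within 10% of nominal."""
    return all(abs(s - nominal_ms) <= fraction * nominal_ms for s in samples_ms)


print(round(NTSC_FRAME_MS, 2))                                   # 33.37
print(av_in_sync(audio_latency_ms=20.0, video_latency_ms=45.0))  # True
print(latency_consistent([19.0, 20.5, 21.5], nominal_ms=20.0))   # True
```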

   A common architecture for synchronizing multiple streams that take
   different paths through the network (and thus potentially have
   different latencies) is to measure the latency of each path and have
   the data sinks (for example, speakers) buffer (delay) the packets on
   all but the slowest path.  Each packet of each stream is assigned a
   presentation time based on the longest required delay.  This implies
   that all sinks must maintain a common time reference of sufficient
   accuracy, which can be achieved by any of various techniques.
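
   A minimal sketch of this computation follows, assuming the per-path
   delays have already been measured and that all sinks share a common
   clock (for example, one distributed with IEEE 1588 or IEEE 802.1AS;
   the text above does not prescribe a technique).  The names and delay
   values are hypothetical.

```python
def presentation_offset_ms(path_delays_ms):
    """Offset added to the send time: the delay of the slowest path."""
    return max(path_delays_ms.values())


def presentation_time_ms(send_time_ms, path_delays_ms):
    """Presentation time carried with a packet, on the sinks' common clock."""
    return send_time_ms + presentation_offset_ms(path_delays_ms)


def sink_buffer_delays_ms(path_delays_ms):
    """How long each sink must hold packets so that all sinks play together."""
    offset = presentation_offset_ms(path_delays_ms)
    return {sink: offset - delay for sink, delay in path_delays_ms.items()}


# Hypothetical measured delays from one source to three speakers.
delays_ms = {"stage_left": 1.25, "stage_right": 1.5, "balcony": 4.0}
print(presentation_time_ms(100.0, delays_ms))  # 104.0
print(sink_buffer_delays_ms(delays_ms))
# {'stage_left': 2.75, 'stage_right': 2.5, 'balcony': 0.0}
```

   Only the sink on the slowest path (here, the balcony) plays packets
   as soon as they arrive; every other sink buffers for the difference.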

   This type of architecture is commonly implemented using a central
   controller that determines path delays and arbitrates buffering
   delays.

3.2.1.  Optimizations

   The controller might also perform optimizations based on the
   individual path delays.  For example, sinks that are closer to the
   source can inform the controller that they can accept greater
   latency, since they will be buffering packets to match the
   presentation times of sinks that are farther away.  The controller
   might then move a stream reservation from a short path to a longer
   path in order to free up bandwidth for other critical streams on that
   short path.  See slides 3-5 of [SRP_LATENCY].

   Additional optimization can be achieved in cases where sinks have
   differing latency requirements; for example, in a live outdoor
   concert the speaker sinks have stricter latency requirements than the
   recording hardware sinks.  See slide 7 of [SRP_LATENCY].
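
   One possible controller heuristic covering both optimizations is
   sketched below: among the paths that satisfy a given sink's latency
   bound and have enough spare bandwidth, reserve the slowest one,
   leaving shorter paths free for sinks with stricter bounds.  This
   heuristic, the Path type, and the numbers are assumptions for
   illustration, not a method defined by this draft or by
   [SRP_LATENCY].

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    delay_ms: float
    spare_bps: int


def reserve_on_slowest_feasible(paths, latency_bound_ms, stream_bps):
    """Hypothetical heuristic: use the slowest path that still meets the sink's
    latency bound, keeping faster paths free for sinks with stricter bounds."""
    feasible = [p for p in paths
                if p.delay_ms <= latency_bound_ms and p.spare_bps >= stream_bps]
    if not feasible:
        return None  # no path can serve this sink
    chosen = max(feasible, key=lambda p: p.delay_ms)
    chosen.spare_bps -= stream_bps  # commit the reservation
    return chosen


# Hypothetical candidate paths between one source and its sinks.
paths = [Path("short", delay_ms=1.0, spare_bps=20_000_000),
         Path("long", delay_ms=6.0, spare_bps=20_000_000)]
recorder = reserve_on_slowest_feasible(paths, latency_bound_ms=10.0, stream_bps=9_216_000)
speaker = reserve_on_slowest_feasible(paths, latency_bound_ms=2.0, stream_bps=9_216_000)
print(recorder.name, speaker.name)  # long short
```

   Because the recorder tolerates more latency, its stream is placed on
   the long path, and the short path remains available for the
   latency-critical speaker stream.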

   In a system with guaranteed reservations and a small bounded latency,
   device cost can be reduced because sink devices need less buffering
   (i.e., memory).  For example, a theme park might broadcast a live
   event across the globe via a Layer 3 protocol; in such cases the size
   of the buffers required is proportional to the latency bounds and
   jitter caused by delivery, which depend on the worst-case segment of
   the end-to-end network path.  On today's open Internet, for instance,
   audio and video streaming is typically unacceptable without many
   seconds of buffering.  In such scenarios, a single gateway device on
   the local network that receives the feed from the remote site would
   provide the expensive buffering required to mask the latency and
   jitter issues associated with long-distance delivery.  Sink devices
   at the local location would have no additional buffering
   requirements, and thus no additional costs, beyond those required
   for delivery of local content.  The sink devices would receive
   packets identical to those sent by the source and would be unaware
   of any latency or jitter issues along the path.
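
   The proportionality between buffer size and the latency and jitter
   bound can be shown with simple arithmetic.  The sketch below assumes
   a constant-bit-rate stream of 8 channels of 48 kHz, 24-bit PCM and
   compares a hypothetical 5 seconds of buffering at the gateway with 2
   milliseconds at a local sink.

```python
def buffer_bytes(bitrate_bps: int, buffer_ms: int) -> int:
    """Bytes needed to hold buffer_ms of a constant-bit-rate stream (rounded up)."""
    bit_ms = bitrate_bps * buffer_ms      # bits/s multiplied by milliseconds
    return (bit_ms + 7999) // 8000        # /1000 (ms -> s) and /8 (bits -> bytes)


STREAM_BPS = 8 * 48_000 * 24              # 8 channels, 48 kHz, 24-bit: 9.216 Mbit/s

gateway = buffer_bytes(STREAM_BPS, 5_000)  # assumed 5 s to mask Internet jitter
local_sink = buffer_bytes(STREAM_BPS, 2)   # assumed 2 ms bound on the local network
print(gateway, local_sink)                 # 5760000 2304
```

   With these assumed figures the gateway needs roughly 5.8 MB, while
   each local sink needs only about 2.3 kB for the same stream.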

4.  Additional Stream Requirements

   The requirements in this section are more specific, yet are common to
   multiple audio and video industry applications.

4.1.  Deterministic Time to Establish Streaming

   Some audio systems installed in public environments (airports,
   hospitals) have unique requirements with regard to health, safety,
   and fire concerns.  One such requirement is a maximum of 3 seconds
   for a system to respond to an emergency detection and begin sending
   appropriate warning signals and alarms without human intervention.
   For this requirement to be met, the system must support a bounded
   and acceptable time from a notification signal to the establishment
   of specific streams.  For further details see [ISO7240-16].
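
   Meeting a deadline like this depends on every step having a
   worst-case bound.  The sketch below sums hypothetical worst-case
   times for detection, decision, hop-by-hop reservation with a bounded
   number of retries, and stream start, and compares the total with the
   3-second limit.  The step model and all numbers are assumptions for
   illustration.

```python
EMERGENCY_DEADLINE_MS = 3_000  # the 3-second limit described above


def worst_case_setup_ms(detect_ms, decide_ms, hops, per_hop_reserve_ms,
                        max_retries, retry_timeout_ms, start_ms):
    """Worst-case time from emergency detection to the first warning audio.

    Assumed model: reservations are made hop by hop, and each hop may need
    up to max_retries further attempts, each costing one retry timeout.
    """
    per_hop_ms = per_hop_reserve_ms + max_retries * retry_timeout_ms
    return detect_ms + decide_ms + hops * per_hop_ms + start_ms


total = worst_case_setup_ms(detect_ms=500, decide_ms=200, hops=5,
                            per_hop_reserve_ms=50, max_retries=2,
                            retry_timeout_ms=150, start_ms=100)
print(total, total <= EMERGENCY_DEADLINE_MS)  # 2550 True
```

   If retries were unbounded, or if establishment waited on a controller
   that might be absent, no such worst-case total could be computed.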

   Similar requirements apply when the system is restarted after a power
   cycle, cable re-connection, or system reconfiguration.

   In many cases such re-establishment of streaming state must be
   achieved by the peer devices themselves, i.e., without a central
   controller (since such a controller may only be present during
   initial network configuration).

   Video systems introduce related requirements, for example when
   transitioning from one camera feed to another.  Such systems
   currently use purpose-built hardware to switch feeds smoothly;
   however, there is a current initiative in the broadcast industry to
   move to a packet-based infrastructure (see [STUDIO_IP] and the ESPN
   DC2 use case described in Section 7).

4.2.  Use of Unused Reservations by Best-Effort Traffic

   In cases where stream bandwidth is reserved but not currently used
   (or is under-utilized), that bandwidth must be available to
   best-effort (i.e., non-time-sensitive) traffic.  For example, a
   single stream may be nailed up (reserved) for specific media content
   that needs to be presented at different times of the day, ensuring
   timely delivery of that content, yet in between those times the full
   bandwidth of the network can be utilized for best-effort tasks such
   as file transfers.
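
   The sketch below computes how much of a link is available to
   best-effort traffic at a given moment.  It assumes a work-conserving
   scheduler, in which reserved traffic withholds only the capacity it
   is actually transmitting; the function name and rates are
   hypothetical.

```python
def best_effort_capacity_bps(link_bps, streams):
    """Capacity currently available to best-effort traffic.

    streams: (reserved_bps, in_use_bps) pairs.  With a work-conserving
    scheduler only the bandwidth actually being sent is withheld; an idle
    reservation is lent to best-effort traffic until its stream resumes.
    Anything sent above a reservation is not counted as reserved.
    """
    in_use = sum(min(used, reserved) for reserved, used in streams)
    return link_bps - in_use


link_bps = 1_000_000_000                   # hypothetical 1 Gbit/s link
streams = [(100_000_000, 0),               # nailed-up stream, idle between shows
           (50_000_000, 50_000_000)]       # stream currently playing
print(best_effort_capacity_bps(link_bps, streams))  # 950000000
```

   A static partition that withheld every reservation would instead
   leave only 850 Mbit/s for best-effort traffic in this example, even
   while the nailed-up stream is idle.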

   This also addresses a concern of IT network administrators who are
   considering adding reserved-bandwidth traffic to their networks: that
   users will reserve a large amount of bandwidth and never release it,
   even when they are not using it, until no bandwidth is left.

4.3.  Layer 3 Interconnecting Layer 2 Islands

   As an intermediate step (short of providing guaranteed bandwidth
   across the open Internet), it would be valuable to provide a way to
   connect multiple Layer 2 networks.  For example, Layer 2 techniques
   could be used to create a LAN for a single broadcast studio, and
   several such studios could be interconnected via Layer 3 links.

4.4.  Secure Transmission

   Digital Rights Management (DRM) is very important to the audio and
   video industries.  Any time protected content is introduced into a
   network, there are DRM concerns that must be addressed (see
   [CONTENT_PROTECTION]).  Many aspects of DRM are outside the scope of
   network technology; however, there are cases in which content owners
   require a secure link supporting authentication and encryption to
   carry their audio or video content when it is outside their own
   secure environment (for example, see [DCI]).

   Two examples of such techniques are Digital Transmission Content
   Protection (DTCP) and High-Bandwidth Digital Content Protection
   (HDCP).  HDCP content is not approved for retransmission within any
   other type of DRM, while DTCP content may be retransmitted under
   HDCP.  Therefore, if the source of a stream is outside the network
   and uses HDCP protection, the stream may only be placed on the
   network with that same HDCP protection.
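
   The retransmission rule stated above can be captured as a small
   lookup table.  The sketch is illustrative only; in particular, the
   entry allowing DTCP content to remain under DTCP is an assumption not
   stated in the text, and unknown schemes are conservatively refused.

```python
# Schemes under which content arriving with a given protection may be carried.
ALLOWED_CARRIERS = {
    "HDCP": {"HDCP"},          # HDCP content: only under HDCP (stated above)
    "DTCP": {"DTCP", "HDCP"},  # DTCP under HDCP stated above; DTCP -> DTCP assumed
}


def may_place_on_network(source_protection: str, network_protection: str) -> bool:
    """True if content protected by source_protection may be retransmitted on
    the network under network_protection.  Unknown schemes are refused."""
    return network_protection in ALLOWED_CARRIERS.get(source_protection, set())


print(may_place_on_network("HDCP", "DTCP"))  # False
print(may_place_on_network("DTCP", "HDCP"))  # True
```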

4.5.  Redundant Paths

   On-air and other live media streams must be backed up with redundant
   links that seamlessly act to deliver the content when the primary
   link fails for any reason.  In point-to-point systems this is
   provided by an additional point-to-point link; the analogous
   requirement in a packet-based system is to provide an alternate path
   through the network such that the failure of any individual link
   cannot bring down the system.
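
   One common way to make the switch-over seamless is to send a copy of
   every packet over each of two link-disjoint paths and have the
   receiver keep the first copy of each sequence number, in the spirit
   of SMPTE ST 2022-7 and IEEE 802.1CB.  The sketch below shows both the
   disjointness check and a simplified duplicate eliminator; the names,
   topology, and window size are hypothetical, and sequence-number
   wrap-around is ignored.

```python
def link_disjoint(path_a, path_b):
    """True if two paths (lists of node names) share no link, so that no
    single link failure can take down both."""
    def links(path):
        return {frozenset(hop) for hop in zip(path, path[1:])}
    return not (links(path_a) & links(path_b))


class DuplicateEliminator:
    """Receiver side: keep the first copy of each sequence number, drop the rest.

    Simplified sketch: remembers sequence numbers within a fixed window and
    ignores sequence-number wrap-around.
    """

    def __init__(self, window=1024):
        self.window = window
        self.highest = -1
        self.seen = set()

    def accept(self, seq):
        if seq in self.seen or seq <= self.highest - self.window:
            return False  # a duplicate, or too old to tell: drop it
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # forget sequence numbers that have fallen out of the window
            self.seen = {s for s in self.seen if s > self.highest - self.window}
        return True


primary = ["src", "sw1", "sw2", "sink"]  # hypothetical topology
backup = ["src", "sw3", "sw4", "sink"]
print(link_disjoint(primary, backup))  # True

rx = DuplicateEliminator()
arrivals = [1, 1, 2, 3, 2, 4, 3]  # copies from both paths, interleaved
print([seq for seq in arrivals if rx.accept(seq)])  # [1, 2, 3, 4]
```

   If either path fails, the receiver simply keeps accepting the copies
   from the surviving path, so no switch-over action is needed.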

4.6.  Link Aggregation

   For transmitting streams that require more bandwidth than a single
   link in the target network can support, link aggregation is a
   technique for combining (aggregating) the bandwidth available on
   multiple physical links to create a single logical link of the
   required bandwidth.  However, if aggregation is to be used, the
   network controller (or equivalent) must be able to determine the
   maximum latency of any path through the aggregate link (see
   Section 3.2).
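
   The sketch below shows the quantities a controller would need for an
   aggregate: total capacity is the sum over members, while the latency
   bound is set by the slowest member, since packets of a stream may be
   carried on any member.  The spread between the fastest and slowest
   members is also reported because it contributes to jitter.  Types and
   values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemberLink:
    capacity_bps: int
    max_latency_us: int  # worst-case latency through this physical member


def aggregate_properties(members):
    """Capacity, worst-case latency, and latency spread of an aggregate link.

    Packets of a stream may travel over any member, so the bound a controller
    can promise is that of the slowest member; the spread adds to jitter.
    """
    capacity = sum(m.capacity_bps for m in members)
    worst = max(m.max_latency_us for m in members)
    spread = worst - min(m.max_latency_us for m in members)
    return capacity, worst, spread


lag = [MemberLink(1_000_000_000, 12), MemberLink(1_000_000_000, 12),
       MemberLink(1_000_000_000, 40)]  # hypothetical: third member has a longer run
print(aggregate_properties(lag))  # (3000000000, 40, 28)
```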

4.7.  Traffic Segregation

   Sink devices may be low-cost devices with limited processing power.
   To avoid overwhelming the CPUs in these devices, it is important to
   limit the amount of traffic that they must process.

   As an example, consider the use of individual seat speakers in a
   cinema.  These speakers are typically required to be cost-reduced,
   since the number of seats in a single theater can reach the
   hundreds.  Discovery protocols alone in a one-thousand-seat theater
   can generate enough broadcast traffic to overwhelm a low-powered CPU.
   Thus an installation like this will benefit greatly from some type
   of traffic segregation that can define groups of seats to reduce
   traffic within each group.  All seats in the theater must still be
   able to communicate with a central controller.
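
   A rough estimate shows why grouping helps.  The sketch assumes,
   hypothetically, that each device sends two discovery announcements
   per second and that every device hears every other device in its
   broadcast domain; it ignores traffic to and from the central
   controller.

```python
import math


def broadcast_rx_per_device(devices_in_domain, msgs_per_device_per_s):
    """Broadcast messages per second each device must process when every
    device in the broadcast domain hears every other device's announcements."""
    return (devices_in_domain - 1) * msgs_per_device_per_s


SEATS = 1000
RATE = 2          # assumed announcements per device per second
GROUP_SIZE = 50   # assumed seats per segregated group

flat = broadcast_rx_per_device(SEATS, RATE)
groups = math.ceil(SEATS / GROUP_SIZE)
grouped = broadcast_rx_per_device(GROUP_SIZE, RATE)
print(flat, groups, grouped)  # 1998 20 98
```

   Under these assumptions, splitting 1000 seats into 20 groups of 50
   cuts the broadcast load on each seat from 1998 to 98 messages per
   second.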

   There are many techniques that can be used to support this
   requirement, including (but not limited to) the following examples.

4.7.1.  Packet Forwarding Rules, VLANs and Subnets

   Packet forwarding rules can be used to prevent some extraneous
   streaming traffic from reaching potentially low-powered sink devices;
   however, other types of broadcast traffic may need to be eliminated
   by other means, for example VLANs or IP subnets.
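
   As a hypothetical planning aid for the IP-subnet option, the sketch
   below uses Python's standard ipaddress module to give each group of
   seats its own subnet.  The address block and group size are
   assumptions; a router or Layer 3 switch between the subnets would
   keep every seat reachable from the central controller.

```python
import ipaddress
import math


def seat_subnets(block, seats, seats_per_group):
    """Give each group of seats its own IPv4 subnet (its own broadcast domain)."""
    net = ipaddress.ip_network(block)
    # room for the seats plus the subnet's network and broadcast addresses
    host_bits = math.ceil(math.log2(seats_per_group + 2))
    subnets = list(net.subnets(new_prefix=net.max_prefixlen - host_bits))
    groups = math.ceil(seats / seats_per_group)
    if groups > len(subnets):
        raise ValueError("address block too small for this many groups")
    return subnets[:groups]


# Hypothetical address plan for a 1000-seat theater.
plan = seat_subnets("10.20.0.0/21", seats=1000, seats_per_group=50)
print(len(plan), plan[0], plan[-1])  # 20 10.20.0.0/26 10.20.4.192/26
```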

4.7.2.  Multicast Addressing (IPv4 and IPv6)

   Multicast addressing is commonly used to keep bandwidth utilization
   of shared links to a minimum.

   Because Layer 2 bridges forward frames based on MAC addresses, it is
   important that a multicast MAC address be associated with only one
   stream.  This prevents reservations from forwarding packets of one
   stream down a path that has no interested sinks simply because
   another stream on that same path shares the same multicast MAC
   address.

   Since each multicast MAC address can represent 32 different IPv4
   multicast addresses, a process must be put in place to ensure that
   two streams never share a multicast MAC address.  Requiring the use
   of IPv6 addresses can achieve this; however, due to the continued
   prevalence of IPv4, solutions that are effective for IPv4
   installations are also required.
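
   The collision problem follows from the standard IPv4 mapping
   (RFC 1112), in which only the low-order 23 bits of the group address
   are copied into the MAC address 01:00:5e:00:00:00, so the remaining 5
   variable bits are lost.  The sketch below computes that mapping and
   flags groups that collide; an address-allocation process could run
   such a check before assigning a new group.  The function names and
   addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict


def ipv4_multicast_mac(group: str) -> str:
    """RFC 1112 mapping: 01:00:5e followed by the low-order 23 bits of the group."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    mac = 0x01005E000000 | (int(addr) & 0x7FFFFF)
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def mac_collisions(groups):
    """Return each multicast MAC that more than one group address maps to."""
    by_mac = defaultdict(list)
    for g in groups:
        by_mac[ipv4_multicast_mac(g)].append(g)
    return {mac: gs for mac, gs in by_mac.items() if len(gs) > 1}


# Hypothetical stream group addresses.
streams = ["239.1.1.1", "239.129.1.1", "224.1.1.1", "239.1.1.2"]
print(mac_collisions(streams))
# {'01:00:5e:01:01:01': ['239.1.1.1', '239.129.1.1', '224.1.1.1']}
```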

5.  Integration of Reserved Streams into IT Networks

   A commonly cited goal of moving to a packet-based media
   infrastructure is that costs can be reduced by using off-the-shelf,
   commodity network hardware.  In addition, economies of scale can be
   realized by combining media infrastructure with IT infrastructure.
   In keeping with these goals, stream reservation technology should be
   compatible with existing protocols and should not compromise use of
   the network for best-effort (non-time-sensitive) traffic.

6.  Security Considerations

   Many industries that are moving from the point-to-point world to the
   digital network world have little understanding of the pitfalls that
   they can create for themselves with improperly implemented network
   infrastructure.  DetNet should consider ways to provide security
   against denial-of-service (DoS) attacks in solutions directed at
   these markets.  Some considerations are given here as examples of
   ways that we can help new users avoid common pitfalls.

6.1.  Denial of Service

   One security pitfall that this author is aware of involves
   technology that allows presenters to "throw" content from their
   tablet or smartphone onto the A/V system, where it is then viewed by
   everyone in attendance.  The facility introducing this technology was
   quite excited to offer such modern flexibility to those who came to
   speak.  What they had not realized was that, since no security was
   put in place around this technology, it left a hole in the system
   that allowed other attendees to "throw" their own content onto the
   A/V system.

6.2.  Control Protocols

   Professional audio systems can include amplifiers that are capable of
   generating hundreds or thousands of watts of audio power, which, if
   used incorrectly, can cause hearing damage to those in the vicinity.
   Apart from the usual care required of system operators to prevent
   such incidents, the network traffic that controls these devices must
   be secured (as with any sensitive application traffic).  In
   addition, it would be desirable if the configuration protocols used
   to create the network paths for professional audio traffic could be
   designed to protect devices that are not meant to receive
   high-amplitude content from having such potentially damaging signals
   routed to them.
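
   One way such protection might work is for each endpoint to declare a
   level ceiling and for the path-configuration protocol to refuse any
   connection whose source can exceed the sink's ceiling, after first
   requiring that the control request be authorized.  The sketch below
   is purely hypothetical: the Endpoint type, the level model, and the
   connect() function are illustrative assumptions, not part of any
   existing protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    level_ceiling_db: float  # hypothetical: max level a source emits / a sink tolerates


class ConnectionRefused(Exception):
    """Raised when a requested audio path is not allowed."""


def connect(source: Endpoint, sink: Endpoint, request_authorized: bool):
    """Hypothetical guard a path-configuration protocol could apply."""
    if not request_authorized:
        raise ConnectionRefused("control request is not authenticated/authorized")
    if source.level_ceiling_db > sink.level_ceiling_db:
        raise ConnectionRefused(
            f"{source.name} can exceed the safe input level of {sink.name}")
    return (source.name, sink.name)


# Hypothetical endpoints and ceilings.
main_mix = Endpoint("main_mix", level_ceiling_db=0.0)
paging_mic = Endpoint("paging_mic", level_ceiling_db=-18.0)
headphones = Endpoint("headphone_zone", level_ceiling_db=-12.0)

print(connect(paging_mic, headphones, request_authorized=True))
try:
    connect(main_mix, headphones, request_authorized=True)
except ConnectionRefused as err:
    print("refused:", err)
```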

7.  A State-of-the-Art Broadcast Installation Hits Technology Limits

   ESPN recently constructed a state-of-the-art 194,000 sq ft, $125
   million broadcast studio called DC2.  The DC2 network is capable of
   handling 46 Tbps of throughput with 60,000 simultaneous signals.
   Inside the facility are 1,100 miles of fiber feeding four audio
   control rooms.  (See details at [ESPN_DC2].)

   In designing DC2, they replaced as much point-to-point technology as
   they possibly could with packet-based technology.  They constructed
   seven individual studios using Layer 2 LANs (using IEEE 802.1 AVB)
   that were entirely effective at routing audio within the LANs, and
   they were very happy with the results; however, to interconnect
   these Layer 2 LAN islands they ended up using dedicated links,
   because no standards-based routing solution was available.

   This is the kind of motivation we have to develop these standards:
   customers are ready and able to use them.

8.  Acknowledgements

   The editors would like to acknowledge the help of the following
   individuals and the companies they represent:

   Jeff Koftinoff, Meyer Sound

   Jouni Korhonen, Associate Technical Director, Broadcom

   Pascal Thubert, CTAO, Cisco

   Kieran Tyrrell, Sienda New Media Technologies GmbH

9.  IANA Considerations

   This memo includes no request to IANA.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [CONTENT_PROTECTION]
              Olsen, D., "1722a Content Protection", 2012,
              <http://grouper.ieee.org/groups/1722/contributions/2012/
              avtp_dolsen_1722a_content_protection.pdf>.

   [DCI]      Digital Cinema Initiatives, LLC, "DCI Specification,
              Version 1.2", 2012, <http://www.dcimovies.com/>.

   [ESPN_DC2]
              Daley, D., "ESPN's DC2 Scales AVB Large", 2014,
              <http://sportsvideo.org/main/blog/2014/06/
              espns-dc2-scales-avb-large>.

   [ISO7240-16]
              ISO, "ISO 7240-16:2007 Fire detection and alarm systems --
              Part 16: Sound system control and indicating equipment",
              2007, <http://www.iso.org/iso/
              catalogue_detail.htm?csnumber=42978>.

   [SRP_LATENCY]
              Gunther, C., "Specifying SRP Latency", 2014,
              <http://www.ieee802.org/1/files/public/docs2014/
              cc-cgunther-acceptable-latency-0314-v01.pdf>.

   [STUDIO_IP]
              Mace, G., "IP Networked Studio Infrastructure for
              Synchronized & Real-Time Multimedia Transmissions", 2007,
              <http://www.ieee802.org/1/files/public/docs2007/
              avb-mace-ip-networked-studio-infrastructure-0107.pdf>.

Authors' Addresses

   Craig Gunther (editor)
   Harman International
   10653 South River Front Parkway
   South Jordan, UT  84095
   USA

   Phone: +1 801 568-7675
   Email: craig.gunther@harman.com
   URI:   http://www.harman.com

   Ethan Grossman (editor)
   Dolby Laboratories, Inc.
   100 Potrero Ave
   San Francisco, CA  94103
   USA

   Phone: +1 415 645 4726
   Email: ethan.grossman@dolby.com
   URI:   http://www.dolby.com