idnits 2.17.1 

draft-rosenberg-rtcweb-rtpmux-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 4, 2011) is 4680 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-03) exists of
     draft-perkins-rtcweb-rtp-usage-01

  == Outdated reference: A later version (-16) exists of
     draft-ietf-rtcweb-use-cases-and-requirements-01

  -- Obsolete informational reference (is this intentional?): RFC 5245
     (Obsoleted by RFC 8445, RFC 8839)


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	RTCWEB                                                      J. Rosenberg
3	Internet-Draft                                                     Skype
4	Intended status: Informational                               C. Jennings
5	Expires: January 5, 2012                                           Cisco
6	                                                             J. Peterson
7	                                                                 Neustar
8	                                                              M. Kaufman
9	                                                                   Skype
10	                                                             E. Rescorla
11	                                                                    RTFM
12	                                                           T. Terriberry
13	                                                                 Mozilla
14	                                                            July 4, 2011

16	 Multiplexing of Real-Time Transport Protocol (RTP) Traffic for Browser
17	                  based Real-Time Communications (RTC)
18	                    draft-rosenberg-rtcweb-rtpmux-00

20	Abstract

22	   This document argues that multiplexing of voice and video traffic
23	   over a single RTP session should be specified as the baseline mode of
24	   operation for multimedia traffic in RTC web.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 5, 2012.

43	Copyright Notice

45	   Copyright (c) 2011 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  RTP Muxing with SSRC
62	   3.  Arguments in Favor of Multiplexing
63	     3.1.  NAT Resource Preservation
64	     3.2.  Improved Failure Modes
65	     3.3.  Setup Time
66	     3.4.  Complexity
67	   4.  Responding to draft-perkins-rtcweb-rtp-usage
68	     4.1.  Requires Additional Signaling
69	     4.2.  QoS and Traffic Engineering
70	     4.3.  Scalability
71	     4.4.  RTP Retransmission
72	     4.5.  Forward Error Correction
73	     4.6.  RTCP Issues
74	   5.  Arguing Against a Shim
75	   6.  Conclusion
76	   7.  Informative References
77	   Authors' Addresses

79	1.  Introduction

81	   The RTCweb working group is chartered to specify a framework and
82	   protocols for enabling real-time communications services within a
83	   browser, without the need for plugins
84	   [I-D.rosenberg-rtcweb-framework].  It is envisioned that this will
85	   enable many use cases [I-D.ietf-rtcweb-use-cases-and-requirements],
86	   the most basic of which is a video call between two users on the web.

88	   In order to enable this functionality, the specifications produced by
89	   the IETF will mandate a specific set of protocols that must be
90	   implemented within the browser.  It is anticipated that these
91	   protocols will include the Real-Time Transport Protocol [RFC3550],
92	   and either in full or in part, Interactive Connectivity Establishment
93	   (ICE) [RFC5245].

95	   The usage of RTP raises the question of multiplexing - whether or not
96	   RTCP and RTP should run on the same port, and furthermore, whether or
97	   not voice, video, and possibly data, should also run on the same
98	   port.  To provide guidance on this, Perkins et al. produced
99	   [I-D.perkins-rtcweb-rtp-usage], which recommends that voice and video
100	   utilize different RTP sessions, and thus different UDP ports.

102	   This document argues against this conclusion, and advocates that a
103	   single transport session (i.e., a single UDP port) be used to carry
104	   voice and video traffic, using the SSRC for demultiplexing.

106	2.  RTP Muxing with SSRC

108	   This document recommends that all of the associated media content of
109	   the call - the voice, video, and RTCP traffic for both the voice and
110	   video sessions - utilize a single transport session (i.e., a single
111	   UDP port).  In cases where there are multiple video streams (for
112	   example, screen sharing), the single transport session would carry
113	   all of the video.  Furthermore, it recommends that voice and video
114	   traffic be demultiplexed by assigning a different SSRC to each.  This
115	   recommendation applies to the case of a single unicast communications
116	   session between a pair of endpoints (i.e., this document does not
117	   consider the case of running a multi-user service like a gateway).

119	   To enable multiplexing, we propose that the 32-bit SSRC value in the
120	   RTP header be broken up into the following sub-fields:

122	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
123	     |          Magic Cookie         |Type |     StreamID          |x|
124	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
125	                                SSRC Field

127	   The Magic Cookie is two bytes, with a value of 0xf7b3.  It is meant
128	   to facilitate DPI applications, which can use its value to determine,
129	   with high confidence, that this RTP packet uses the encoding format
130	   defined here.  The Type is a 3-bit value, corresponding to the top-
131	   level MIME type of the media (mapping table TBD).  It too is meant to
132	   facilitate DPI applications which want to separate voice and video.
133	   The StreamID is a 12-bit field which represents the unique ID for
134	   this stream.  It is signaled between participants out of band.  The
135	   final bit, 'x', is set to zero and is reserved for future usage.
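
   As an illustration only, the sub-field layout above could be packed
   and unpacked as in the following Python sketch.  It assumes the
   diagram is drawn most significant bit first, as is usual for header
   diagrams.  The function names are hypothetical, and because the Type
   mapping table is still TBD, any Type value passed in is a
   placeholder rather than an assigned code point.

```python
MAGIC_COOKIE = 0xF7B3  # 16-bit value given in the text


def pack_ssrc(media_type: int, stream_id: int) -> int:
    """Build a 32-bit SSRC: cookie (16) | type (3) | stream ID (12) | x (1)."""
    if not 0 <= media_type <= 0x7:
        raise ValueError("type must fit in 3 bits")
    if not 0 <= stream_id <= 0xFFF:
        raise ValueError("stream ID must fit in 12 bits")
    # The reserved 'x' bit (least significant) is left as zero.
    return (MAGIC_COOKIE << 16) | (media_type << 13) | (stream_id << 1)


def unpack_ssrc(ssrc: int):
    """Return (type, stream_id), or None if the cookie does not match."""
    if (ssrc >> 16) & 0xFFFF != MAGIC_COOKIE:
        return None
    # The reserved 'x' bit is ignored on receipt in this sketch.
    return (ssrc >> 13) & 0x7, (ssrc >> 1) & 0xFFF
```

   A receiver or DPI device following this sketch would treat a None
   result as "not an SSRC in this format" and fall back to ordinary RTP
   handling.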

137	3.  Arguments in Favor of Multiplexing

139	   This section outlines several arguments in favor of multiplexing.

141	3.1.  NAT Resource Preservation

143	   Today's Internet is full of Network Address Translators (NAT), a
144	   situation which is likely to get worse as IPv4 address exhaustion
145	   continues.  When NAT is in use, the constraint on the number of
146	   endpoints behind the NAT is based on the number of parallel transport
147	   sessions that need to be supported.  If, for example, a NAT has a
148	   single external IP address, it can support 64k UDP sessions while
149	   having an endpoint-independent mapping behavior [RFC4787].  Thus, in
150	   the presence of NAT, parallel transport sessions become the scarce
151	   resource.

153	   If rtcweb specifies that audio and video run on separate ports, this
154	   will double the number of transport session resources consumed in
155	   intervening NATs.  While the usage of port as an application layer
156	   demux point made sense when RTP was designed back in 1992 (the year
157	   the first RTP draft was published), the Internet has changed
158	   substantially since then.  Continuing to perpetuate this design
159	   favors preservation of legacy over protection of resources in the
160	   modern Internet.  We feel that this optimizes in the wrong
161	   direction.
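
   The doubling can be made concrete with a small back-of-the-envelope
   calculation, sketched here in Python.  It takes the 64k figure from
   the text as the number of UDP mappings available per external
   address and ignores reserved ports and all other traffic, so the
   results are upper bounds, not measurements.  The function name is
   hypothetical.

```python
UDP_PORTS_PER_ADDRESS = 2 ** 16  # the "64k" figure used in the text


def max_concurrent_calls(ports_per_call: int, external_addresses: int = 1) -> int:
    """Upper bound on simultaneous calls through a NAT.

    Assumes every call consumes exactly `ports_per_call` UDP mappings
    and that nothing else competes for the NAT's port space.
    """
    if ports_per_call < 1:
        raise ValueError("a call needs at least one port")
    return (UDP_PORTS_PER_ADDRESS * external_addresses) // ports_per_call


single_port = max_concurrent_calls(1)  # voice and video multiplexed
two_ports = max_concurrent_calls(2)    # separate voice and video ports
```

   Under these assumptions, moving from one port per call to two halves
   the bound, from 65536 calls to 32768.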

163	   Given that we anticipate widespread usage of rtcweb, this design
164	   choice may create a non-trivial load on the transport session
165	   capacity of the Internet at large.  Real-time video communication on
166	   the Internet has seen huge growth in recent years.  For Skype,
167	   approximately 40% of its Skype-to-Skype calls are video based.  A
168	   recent report by Sandvine states that Skype alone is the third
169	   largest source of upload traffic on the Internet as a whole, largely
170	   attributed to Skype video calling <http://www.sandvine.com/
171	   downloads/documents/05-17-2011_phenomena/
172	   Sandvine%20Global%20Internet%20Phenomena%20Spotlight%20-%20Netflix%
173	   20Rising.pdf>.  The conclusion from this is that the costs of
174	   separate voice and video ports cannot be ignored.

176	   Simply put, the usage of transport ports for application
177	   demultiplexing should be considered harmful for the Internet.

179	3.2.  Improved Failure Modes

181	   The usage of separate transport sessions for the audio, video or
182	   other content of the call introduces a variety of partial failure
183	   modes.  The transport session for one type of media might get
184	   established, but a NAT capacity problem might cause the transport
185	   session for another type of media to fail.  Usage of a single
186	   transport session means that the conversation succeeds or fails
187	   atomically.  We consider this a feature.

189	3.3.  Setup Time

191	   The rtcweb group is considering the usage of ICE to create p2p
192	   sessions.  ICE provides firewall and NAT traversal in addition to
193	   providing a handshake necessary to assure mutual consent for
194	   communications.

196	   Unfortunately, ICE requires time to perform its setup operations.
197	   This time grows in proportion to the number of transport sessions
198	   which must be opened in order to support the call.  By using a
199	   different port for video traffic, call setup times will increase.
200	   The precise amount of this increase depends on the type of NAT and
201	   varies depending on packet loss.  However, in a simple, ideal case of
202	   no packet loss and direct connectivity between endpoints, this value
203	   is XXX [[fill in]].
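
   Without estimating the missing timing figure, the way the ICE
   workload scales can be sketched by counting components and candidate
   pairs.  The Python below is a rough illustration: it assumes every
   component has the same number of local and remote candidates and
   that no pairs are pruned, so it gives an upper bound only.  The
   function names and the candidate counts in the example are
   hypothetical.

```python
def ice_components(media_ports: int, rtcp_on_separate_port: bool) -> int:
    """Number of ICE components: one per port that must be opened."""
    return media_ports * (2 if rtcp_on_separate_port else 1)


def candidate_pairs_upper_bound(components: int,
                                local_per_component: int,
                                remote_per_component: int) -> int:
    """Pairs are formed per component, local candidate x remote candidate."""
    return components * local_per_component * remote_per_component


# Separate voice and video ports, each with its own RTCP port.
separate = candidate_pairs_upper_bound(ice_components(2, True), 3, 3)
# Everything multiplexed onto a single port.
muxed = candidate_pairs_upper_bound(ice_components(1, False), 3, 3)
```

   With three candidates on each side, the separate-port case has up to
   36 pairs to check against 9 for the multiplexed case.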

205	3.4.  Complexity

207	   ICE is not a simple protocol.  One of its significant complexities is
208	   its requirement to support calls for multiple media streams, each of
209	   which runs on a separate port, and multiple components for each
210	   stream (e.g., RTCP).  If the concept of streams and components were
211	   eliminated, ICE would be a simpler protocol.

213	   If, within rtcweb, a single transport connection were utilized,
214	   browsers could implement a simplified version of the ICE protocol.

216	4.  Responding to draft-perkins-rtcweb-rtp-usage

218	   [I-D.perkins-rtcweb-rtp-usage] outlines several arguments for
219	   continuing to use a separate port for audio and video.  In this
220	   section, we respond to those arguments.

222	4.1.  Requires Additional Signaling

224	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
225	   video on the same RTP session would require a demux point to be
226	   specified (for example, the SSRC), and require additional signaling
227	   to be specified to accomplish this.

229	   Firstly, this conclusion is only partly true.  For communications
230	   sessions between rtcweb users within the same domain, no signaling
231	   specifications are required.  This is true in general with rtcweb;
232	   one of its benefits is that it does not require standardized
233	   signaling.

235	   Secondly, it is not yet clear that rtcweb will be able to
236	   interoperate with existing VoIP endpoints without a media
237	   intermediary to terminate ICE traffic.  It is our position that
238	   interoperability without a media intermediary should only be provided
239	   for basic voice services, and even then, only when RTCP is supported.
240	   In the case of basic voice endpoints, where there is no video, RTP
241	   multiplexing of voice and video is irrelevant, and thus no signaling
242	   complexity is introduced.

244	   Thirdly, the primary place where there will be a need for signaling
245	   enhancements is for inter-domain calling between rtcweb endpoints in
246	   different domains.  In such a case, an SDP extension is required, and
247	   one can be specified.  It is trivial to do so.

249	   Finally, this document does recommend that it be possible to utilize
250	   a separate transport session for voice and for video, and that, in
251	   the worst case, this mode can be used for calls between an rtcweb
252	   endpoint and a legacy endpoint.

254	4.2.  QoS and Traffic Engineering

256	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
257	   video on the same RTP session would mean that it would not be
258	   possible to apply QoS techniques separately for voice and video which
259	   rely on the 5-tuple.

261	   Firstly, the public Internet lacks any QoS mechanism, so this
262	   argument is moot on the public Internet.

264	   Secondly, private enterprise networks which do provide QoS most often
265	   use diffserv.  Diffserv is compatible with utilization of a common
266	   port for voice and video traffic.  Typically, different DSCPs are
267	   used for voice and video (Cisco recommends EF for audio and AF41 for
268	   video in enterprise telephony deployments), and this practice is
269	   compatible with usage of the same port - each packet would be marked
270	   appropriately.  It is also possible to use the same DSCP for voice
271	   and video.
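
   As a sketch of per-packet marking on a shared port, the Python below
   sets the DS field before each send on a single UDP socket.  EF and
   AF41 are the code points named above, with their standard numeric
   values (46 and 34).  The sketch assumes IPv4 and an operating system
   that exposes the IP_TOS socket option; whether the host or network
   honors the marking is outside its scope.  The function names are
   hypothetical.

```python
import socket

DSCP_EF = 46    # Expedited Forwarding, suggested above for audio
DSCP_AF41 = 34  # Assured Forwarding 41, suggested above for video

DSCP_BY_MEDIA = {"audio": DSCP_EF, "video": DSCP_AF41}


def tos_byte(media: str) -> int:
    """DSCP occupies the upper six bits of the former IPv4 TOS byte."""
    return DSCP_BY_MEDIA[media] << 2


def send_marked(sock: socket.socket, payload: bytes, addr, media: str) -> None:
    """Send one packet on the shared socket with the DSCP for its media."""
    # Assumption: IPv4 socket on a platform that supports IP_TOS.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(media))
    sock.sendto(payload, addr)
```

   Because the marking is applied per packet, the 5-tuple is identical
   for voice and video while the DSCP still distinguishes them.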

273	   Carrier networks, such as mobile operator networks, typically provide
274	   QoS through traffic engineering, using a combination of MPLS tunnels
275	   and diffserv markings.  MPLS tunnels do use 5-tuples as classifiers
276	   to determine which traffic to put in what kind of tunnel.  If there
277	   is a need for using separate MPLS tunnels for voice and video, the
278	   DSCP codepoint itself can be used as a differentiator.

280	   It is true that it would not be possible to utilize RSVP to
281	   separately establish QoS treatment for the voice and the video
282	   traffic.  However, there is very little real deployment of RSVP:
283	   none within the public Internet and relatively little within
284	   corporate networks.  As such, this argument is mostly theoretical.

286	   Finally, DPI is used within some operator networks to perform traffic
287	   classification.  It would always be possible to use DPI to assign
288	   different treatment to voice and video traffic.

290	4.3.  Scalability

292	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
293	   video on the same RTP session would mean that layered coding using
294	   multicast for each layer would not be possible.

296	   Firstly, most layered coding today uses unicast and a switch or mixer
297	   of some sort to discard layers.  That architecture is completely
298	   compatible with the usage of a single transport session for voice and
299	   video.  The limitation applies only to the use of IP multicast for
300	   real-time communications.  The usage of multicast on the Internet has
301	   substantially diminished over time.  There is some usage today in
302	   private networks but primarily for streaming media distribution.  The
303	   usage for real-time communications is quite rare.  As such, we find
304	   this to be a theoretical corner case.

306	4.4.  RTP Retransmission

308	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
309	   video on the same RTP session would not be interoperable with
310	   endpoints doing RTP retransmission per [RFC4588].

312	   As pointed out above, interoperability with existing endpoints
313	   without the usage of a media intermediary is not a given at this
314	   point, and we argue it should only be supported for the common case -
315	   a basic, voice-only RTP-capable endpoint.  There is, to our
316	   knowledge, relatively little deployment of RFC4588, at least for
317	   real-time communications.  It is certainly not a common feature in
318	   basic RTP endpoints and never a baseline requirement for
319	   interoperability.  Consequently, if there is a need to interoperate
320	   with an endpoint supporting RFC4588, and it is desired to avoid a
321	   media intermediary, RFC4588 can just be turned off for the session.

323	   As such, we find the interoperability argument here not compelling.

325	4.5.  Forward Error Correction

327	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
328	   video on the same RTP session will limit the applicability of FEC
329	   [RFC5109] to when the RTP packets are half of the path MTU.

331	   There are two cases to consider - interoperability with existing
332	   endpoints and usage for calls between rtcweb endpoints.

334	   For interoperability with existing endpoints, we argue the same thing
335	   here as for retransmits.  FEC is not commonly used in legacy voice
336	   endpoints, and if it is supported, is never a required feature.
337	   Consequently, if present, its usage can be disabled when
338	   interoperating with an rtcweb endpoint.  If FEC is included as part
339	   of the rtcweb specifications, the lower bandwidth of voice means that
340	   FEC packets could be sent on the same port, using [RFC2198], without
341	   approaching the path MTU.

343	   For communications between rtcweb endpoints, this is only an issue if
344	   FEC is included as part of the rtcweb specification.  If the group
345	   decides to do that (there is some value for real-time video), it
346	   should define a mechanism which allows for FEC packets to be sent
347	   using a separate SSRC.

349	4.6.  RTCP Issues

351	   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
352	   video on the same RTP session will introduce complications in the
353	   usage of RTCP, primarily when considering RTCP extensions.

355	   It is our belief that normal RTCP operation as defined in the RTCP
356	   specification will work fine with multiplexed voice and video
357	   traffic.  SRs and RRs are already generated per SSRC to handle
358	   multiple senders, and RTCP in general supports feedback for multiple
359	   SSRCs within a session.  These mechanisms work as defined when each
360	   SSRC happens to represent a different media stream instead of a
361	   different user.
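
   A receiver on a single port can key all of its state on the SSRC, as
   the following Python sketch suggests.  It reads the SSRC from the
   fixed RTP header (bytes 8 to 11) or from the sender SSRC of an RTCP
   packet (bytes 4 to 7).  Telling RTP from RTCP by checking whether the
   second byte falls in the range 192 to 223 is an assumption borrowed
   from common practice for RTP/RTCP sharing a port; it is not
   something this document specifies.  The function name is
   hypothetical.

```python
import struct


def classify(packet: bytes):
    """Return ("rtp" | "rtcp", ssrc), or None if the packet is not RTP-like."""
    # Both RTP and RTCP carry version 2 in the top two bits of byte 0.
    if len(packet) < 8 or packet[0] >> 6 != 2:
        return None
    if 192 <= packet[1] <= 223:
        # RTCP: the sender's SSRC follows the 4-byte common header.
        (ssrc,) = struct.unpack_from("!I", packet, 4)
        return ("rtcp", ssrc)
    if len(packet) < 12:
        return None
    # RTP: the SSRC follows the sequence number and timestamp.
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ("rtp", ssrc)


# Example: an RTP packet and a receiver report for two different SSRCs.
rtp = struct.pack("!BBHII", 0x80, 96, 1, 0, 0xF7B3200A)
rr = struct.pack("!BBHI", 0x80, 201, 1, 0xF7B3400C)
```

   Per-stream statistics can then live in a table indexed by the
   returned SSRC, whether that SSRC stands for a different user or for
   a different media stream of the same user.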

363	   The only complication that arises is for RTCP extensions which are
364	   defined to be media dependent.  [I-D.perkins-rtcweb-rtp-usage] points
365	   out, as an example, the usage of RTCP extended report blocks (XR)
366	   [RFC3611].  However, XR works fine in conjunction with multiplexing
367	   of voice and video within the same port.  Each of the seven report
368	   blocks defined in [RFC3611] includes the SSRC of the source as part
369	   of the block, and thus will work.  [I-D.perkins-rtcweb-rtp-usage]
370	   indicates that "SSRC purpose tagging needs not only to be one the
371	   media side, but also on the RTCP reporting".  However, we do not
372	   believe this to be accurate.  Since the XR blocks report the SSRC
373	   source already, the specifications provide all that is needed.  The
374	   XR report is merely included when it is relevant.

376	   Furthermore, the discussion around XR assumes that we need to support
377	   it for interoperability with existing VoIP endpoints, or that we are
378	   utilizing it for rtcweb itself.  As with FEC and retransmissions, in
379	   the case of interoperability, if there is an issue, XR can simply be
380	   disabled.  [RFC3611] does specify that XR can be sent without prior
381	   signaling.  In the worst case, XR packets are received by an rtcweb
382	   endpoint and then discarded.  In terms of usage of RTCP XR for
383	   communications between rtcweb endpoints, we would argue that a much
384	   more flexible solution would be to provide JavaScript APIs which
385	   allow the application to have access to the same data used to
386	   generate the XR, and then the application itself can use this data as
387	   it sees fit, including sending it back to the sender through some
388	   kind of application data packet.

390	5.  Arguing Against a Shim

392	   It has been proposed on the mailing list that an alternative approach
393	   for multiplexing on the same port would be to specify a new
394	   multiplexing protocol that has a small shim, which could then be used
395	   to separate voice and video traffic as a layer between UDP and RTP.
396	   Such a shim could then also be used to enable non-RTP data traffic as
397	   well.

399	   We believe that such a shim would be a mistake, for the same reason
400	   that shims have been avoided in the multiplexing of RTCP, STUN, and
401	   DTLS on the same port as RTP:

403	   o  The shim would break interoperability with a great deal of
404	      existing network inspection gear - firewalls, packet sniffers,
405	      traffic analyzers, and so on - which know how to extract, parse,
406	      and process RTP packets.

408	   o  The shim would add complexity through yet another layer of
409	      multiplexing.

411	   o  The shim would increase packet overhead further.

413	   o  A shim is a mistake which cannot be undone later.  If multiplexing
414	      on a single port truly causes interoperability issues, clients can
415	      fall back to using multiple ports, possibly even in the
416	      preponderance of cases.  However, once a shim is inserted,
417	      interoperability will always require an intermediary to strip it
418	      out, forever.

420	6.  Conclusion

422	   In conclusion, we feel that the benefits of multiplexing voice and
423	   video on a single RTP session (and thus a single transport
424	   connection) outweigh the drawbacks.  The primary benefit is the
425	   impact on NAT capacity, which is becoming an important issue in the
426	   modern Internet.  Furthermore, the unique nature of backwards
427	   compatibility for rtcweb lessens many of the interoperability
428	   concerns, and the traditional arguments around multicast and RSVP are
429	   simply no longer relevant, as those technologies have faded from use.

431	7.  Informative References

433	   [I-D.perkins-rtcweb-rtp-usage]
434	              Perkins, C., Westerlund, M., and J. Ott, "RTP Requirements
435	              for RTC-Web", draft-perkins-rtcweb-rtp-usage-01 (work in
436	              progress), June 2011.

438	   [I-D.rosenberg-rtcweb-framework]
439	              Rosenberg, J., Kaufman, M., Hiie, M., and F. Audet, "An
440	              Architectural Framework for Browser based Real-Time
441	              Communications (RTC)", draft-rosenberg-rtcweb-framework-00
442	              (work in progress), February 2011.

444	   [I-D.ietf-rtcweb-use-cases-and-requirements]
445	              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
446	              Time Communication Use-cases and Requirements",
447	              draft-ietf-rtcweb-use-cases-and-requirements-01 (work in
448	              progress), July 2011.

450	   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
451	              (ICE): A Protocol for Network Address Translator (NAT)
452	              Traversal for Offer/Answer Protocols", RFC 5245,
453	              April 2010.

455	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
456	              Jacobson, "RTP: A Transport Protocol for Real-Time
457	              Applications", STD 64, RFC 3550, July 2003.

459	   [RFC4787]  Audet, F. and C. Jennings, "Network Address Translation
460	              (NAT) Behavioral Requirements for Unicast UDP", BCP 127,
461	              RFC 4787, January 2007.

463	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
464	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
465	              July 2006.

467	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
468	              Correction", RFC 5109, December 2007.

470	   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
471	              Protocol Extended Reports (RTCP XR)", RFC 3611,
472	              November 2003.

474	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
475	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
476	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
477	              September 1997.

479	Authors' Addresses

481	   Jonathan Rosenberg
482	   Skype

484	   Email: jdrosen@skype.net
485	   URI:   http://www.jdrosen.net

487	   Cullen Jennings
488	   Cisco

490	   Email: fluffy@cisco.com

492	   Jon Peterson
493	   Neustar

495	   Email: jon.peterson@neustar.biz

496	   Matthew Kaufman
497	   Skype

499	   Email: matthew.kaufman@skype.net

501	   Eric Rescorla
502	   RTFM

504	   Email: ekr@rtfm.com

506	   Tim Terriberry
507	   Mozilla

509	   Email: tterriberry@mozilla.com