idnits 2.17.1 draft-jennings-dispatch-new-media-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 3 instances of lines with non-RFC2606-compliant FQDNs in the document. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 368: '... The transport MAY provide a compres...' RFC 2119 keyword, line 565: '... Implementation MUST support at least...' RFC 2119 keyword, line 569: '... Implementation MUST support at least...' RFC 2119 keyword, line 573: '... Video codecs MUST support any aspec...' RFC 2119 keyword, line 587: '... Video codecs MUST support a min wid...' (1 more instance...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 18, 2018) is 2230 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-01) exists of draft-barnes-mls-protocol-00 == Outdated reference: A later version (-20) exists of draft-ietf-payload-flexible-fec-scheme-06 Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group C. Jennings 3 Internet-Draft Cisco 4 Intended status: Standards Track March 18, 2018 5 Expires: September 19, 2018 7 Modular Media Stack 8 draft-jennings-dispatch-new-media-01 10 Abstract 12 A sketch of a proposal for a modular media stack for interactive 13 communications. 15 Status of This Memo 17 This Internet-Draft is submitted in full conformance with the 18 provisions of BCP 78 and BCP 79. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF). Note that other groups may also distribute 22 working documents as Internet-Drafts. The list of current Internet- 23 Drafts is at http://datatracker.ietf.org/drafts/current/. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. It is inappropriate to use Internet-Drafts as reference 28 material or to cite them other than as "work in progress." 30 This Internet-Draft will expire on September 19, 2018. 32 Copyright Notice 34 Copyright (c) 2018 IETF Trust and the persons identified as the 35 document authors. All rights reserved. 
37 This document is subject to BCP 78 and the IETF Trust's Legal 38 Provisions Relating to IETF Documents 39 (http://trustee.ietf.org/license-info) in effect on the date of 40 publication of this document. Please review these documents 41 carefully, as they describe your rights and restrictions with respect 42 to this document. Code Components extracted from this document must 43 include Simplified BSD License text as described in Section 4.e of 44 the Trust Legal Provisions and are provided without warranty as 45 described in the Simplified BSD License. 47 Table of Contents 49 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 50 2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 51 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 4 52 4. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 53 5. Architecture . . . . . . . . . . . . . . . . . . . . . . . . 5 54 6. Connectivity Layer . . . . . . . . . . . . . . . . . . . . . 5 55 6.1. Snowflake - New ICE . . . . . . . . . . . . . . . . . . . 6 56 6.2. STUN2 . . . . . . . . . . . . . . . . . . . . . . . . . . 6 57 6.2.1. STUN2 Request . . . . . . . . . . . . . . . . . . . . 6 58 6.2.2. STUN2 Response . . . . . . . . . . . . . . . . . . . 6 59 6.3. TURN2 . . . . . . . . . . . . . . . . . . . . . . . . . . 7 60 7. Transport Layer . . . . . . . . . . . . . . . . . . . . . . . 8 61 8. Media Layer - RTP3 . . . . . . . . . . . . . . . . . . . . . 9 62 8.1. RTP Meta Data . . . . . . . . . . . . . . . . . . . . . . 12 63 8.2. Securing the messages . . . . . . . . . . . . . . . . . . 12 64 8.3. Sender requests . . . . . . . . . . . . . . . . . . . . . 12 65 8.4. Data Codecs . . . . . . . . . . . . . . . . . . . . . . . 13 66 8.5. Media Keep Alive . . . . . . . . . . . . . . . . . . . . 13 67 8.6. Forward Error Correction . . . . . . . . . . . . . . . . 13 68 8.7. MTI Codecs . . . . . . . . . . . . . . . . . . . . . . . 13 69 8.7.1. Audio . . . . . . . . . . . . . . . . 
. . . . . . . . 13 70 8.7.2. Video . . . . . . . . . . . . . . . . . . . . . . . . 13 71 8.7.3. Annotation . . . . . . . . . . . . . . . . . . . . . 14 72 8.7.4. Application Data Channels . . . . . . . . . . . . . . 14 73 8.7.5. Reverse Requests & Stats . . . . . . . . . . . . . . 14 74 8.8. Message Key Agreement . . . . . . . . . . . . . . . . . . 15 75 9. Control Layer . . . . . . . . . . . . . . . . . . . . . . . . 15 76 9.1. Transport Capabilities API . . . . . . . . . . . . . . . 15 77 9.2. Media Capabilities API . . . . . . . . . . . . . . . . . 15 78 9.3. Transport Configuration API . . . . . . . . . . . . . . . 16 79 9.4. Media Configuration API . . . . . . . . . . . . . . . . . 16 80 9.5. Transport Metrics . . . . . . . . . . . . . . . . . . . . 18 81 9.6. Flow Metrics API . . . . . . . . . . . . . . . . . . . . 18 82 9.7. Stream Metrics API . . . . . . . . . . . . . . . . . . . 19 83 10. Call Signalling - JABBER2 . . . . . . . . . . . . . . . . . . 19 84 11. Signalling Examples . . . . . . . . . . . . . . . . . . . . . 20 85 11.1. Simple Audio Example . . . . . . . . . . . . . . . . . . 20 86 11.1.1. simple audio advertisement . . . . . . . . . . . . . 20 87 11.1.2. simple audio proposal . . . . . . . . . . . . . . . 21 88 11.2. Simple Video Example . . . . . . . . . . . . . . . . . . 22 89 11.2.1. Proposal sent to camera . . . . . . . . . . . . . . 23 90 11.3. Simulcast Video Example . . . . . . . . . . . . . . . . 24 91 11.4. FEC Example . . . . . . . . . . . . . . . . . . . . . . 24 92 11.4.1. Advertisement includes a FEC codec. . . . . . . . . 24 93 11.4.2. Proposal sent to camera . . . . . . . . . . . . . . 25 94 12. Switched Forwarding Unit (SFU) . . . . . . . . . . . . . . . 26 95 12.1. Software Defined Networking . . . . . . . . . . . . . . 26 96 12.2. Vector Packet Processors . . . . . . . . . . . . . . . . 27 97 12.3. Information Centric Networking . . . . . . . . . . . . . 27 98 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 
27
   14. Other Work  . . . . . . . . . . . . . . . . . . . . . . . .  27
   15. Style of specification  . . . . . . . . . . . . . . . . . .  27
   16. Informative References  . . . . . . . . . . . . . . . . . .  28
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . .  28

1.  Introduction

   This draft is an accumulation of various ideas that some people are thinking about.  Most of them are fairly separable and could be morphed into existing protocols, but this draft takes a blank-sheet-of-paper approach and considers what the best design would be if we were starting from scratch.  With that in place, it is possible to ask which of these ideas make sense to back-patch into existing protocols.

2.  Goals

   o  Better connectivity by enabling situations where asymmetric media is possible.

   o  Design for SFUs (Switched Forwarding Units).  Design for multiparty calls first, then consider two-party calls as a specialized subcase of that.

   o  Designed for client-server deployments with server-based control of clients.

   o  Faster setup.

   o  Pluggable congestion control.

   o  Much, much simpler.

   o  End-to-end security.

   o  Remove the ability to use STUN / TURN in DDoS reflection attacks.

   o  Ability for the receiver of video to tell the sender about size changes of the display window so that the sender can match them.

   o  Eliminate the problems with the ROC in SRTP.

   o  Address the reasons people have not moved from SDES to DTLS-SRTP.

   o  Separation of call setup from ongoing call / conference control.

   o  Make codec negotiation more generic so that it works for future codecs.

   o  Remove ICE's need for global pacing, which is more or less impossible on general-purpose devices like PCs.

3.  Overview

   This draft proposes a new media stack to replace the existing stack of RTP, DTLS-SRTP, and SDP Offer/Answer.
The key parts of this stack are the connectivity layer, the transport layer, the media layer, a control API, and the signalling layer.

   The connectivity layer uses a simplified version of ICE, called snowflake [I-D.jennings-dispatch-snowflake], to find connectivity between endpoints and to move the connectivity from one address to another as different networks become available or disappear.  It is based on ideas from [I-D.jennings-mmusic-ice-fix].

   The transport layer uses QUIC to provide hop-by-hop encrypted, congestion-controlled transport of media.  Although QUIC does not currently have all of the partial reliability mechanisms to make this work, this draft assumes that they will be added to QUIC.

   The media layer uses existing codecs and packages them along with extra header information that indicates when the sequence needs to be played back, which camera it came from, and which media streams are to be synchronized.

   The control API is an abstract API that provides a way for the media stack to report its capabilities and features and a way for an application to tell the media stack how it should be configured.  Configuration includes which codec to use, the size and frame rate of video, and where to send the media.

   The signalling layer is based on an advertisement and proposal model.  Each endpoint can create an advertisement that describes what it supports, including things like supported codecs and maximum bitrates.  A proposal can be sent to an endpoint that tells the endpoint exactly what media to send and receive and where to send it.  The endpoint can accept or reject this proposal in total but cannot change any part of it.

4.  Terminology

   o  media stream: Stream of information from a single sensor.  For example, a video stream from a single camera.  A stream may have multiple encodings, for example video at different resolutions.
o  encoding: An encoded version of a stream.  A given stream may have several encodings at different resolutions.  One encoding may depend on other encodings, as with forward error correction or scalable video codecs.

   o  flow: A logical transport between two computers.  Many media streams can be transported over a single flow.  The actual IP addresses and ports used to transport data in the flow may change over time as connectivity changes.

   o  message: Some data or media to be sent across the network, along with metadata about it.  Similar to an RTP packet.

   o  media source: A camera, microphone, or other source of data on an endpoint.

   o  media sink: A speaker, screen, or other destination for data on an endpoint.

   o  TLV: Tag Length Value.  When used in this draft, the Tag, Length, and any integer values are coded as variable-length integers, similar to how this is done in CBOR.

5.  Architecture

   Much of the deployment architecture of IETF media designs is based on a distributed controller for the media stack running peer to peer in each client.  Yet nearly all deployments, be they cloud-based conferencing systems or enterprise PBXs, use a central controller that acts as an SBC to try to control each client.  The goal here is a deployment architecture that:

   o  supports a single controller that controls all the devices in a given conference or call.  The controller could be in the cloud or running on one of the endpoints.

   o  designs for multiparty conference calls first and treats two-party calls as a specialized subcase of that.

   o  designs with the assumption that a lightweight SFU (Switched Forwarding Unit) is used to distribute media for conference calls.

6.  Connectivity Layer

6.1.
Snowflake - New ICE

   All that is needed to discover connectivity is a way to:

   o  Gather some IP/ports that may work, using a TURN2 relay, STUN2, and local addresses.

   o  Have a controller, which might be running in the cloud, inform a client to send a STUN2 packet from a given source IP/port to a given destination IP/port.

   o  Have the receiver notify the controller about information on received STUN2 packets.

   o  Have the controller tell the sender the secret that was in the packet, proving the receiver's consent to receive data; the sending client can then allow media to flow over that connection.

   The actual algorithm used to decide which pairs of addresses are tested, and in what order, does not need to be agreed on by both sides of the call - only the controller needs to know it.  This allows the controller to use machine learning, past history, and heuristics to find an optimal connection much faster than something like ICE.

   The details of this approach are described in [I-D.jennings-dispatch-snowflake].  Many of the ideas in it can be traced back to [I-D.kaufman-rtcweb-traversal].

6.2.  STUN2

   The speed of setting up a new media flow is often determined by how many STUN2 checks need to be done.  If the STUN2 packets are smaller, then the STUN checks can be done faster without risk of causing congestion.

6.2.1.  STUN2 Request

   A STUN2 request consists of, well, really nothing.  The STUN client just opens a QUIC connection to the STUN server.

6.2.2.  STUN2 Response

   When the STUN2 server receives a new QUIC connection, it responds with the IP address and port that the connection came from.

   The client can check that it is talking to the correct STUN server by checking the fingerprint of the certificate.  Protocols like ICE would need to exchange these fingerprints instead of all the crazy STUN attributes.
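The request/response exchange above is small enough to sketch in a few lines.  This is a non-normative illustration: the draft defines no wire format for the response, so the JSON encoding and the function names here are assumptions, and a real implementation would sit on a QUIC stack rather than a bare function call.

```python
import json

def stun2_response(observed_ip: str, observed_port: int) -> bytes:
    # The response carries only the reflexive transport address as the
    # server observed it.  The JSON encoding is illustrative; the draft
    # does not specify one.
    return json.dumps({"ip": observed_ip, "port": observed_port}).encode()

def on_new_quic_connection(peer_addr):
    # The "request" is simply the act of opening a QUIC connection,
    # so there is nothing to parse before answering.
    ip, port = peer_addr
    return stun2_response(ip, port)

# A client behind a NAT learns its server-reflexive address:
reply = json.loads(on_new_quic_connection(("203.0.113.10", 43210)))
print(reply["ip"], reply["port"])  # 203.0.113.10 43210
```

Because there is nothing to parse, the entire "protocol" reduces to the server echoing back the 5-tuple it observed, which is why the check can be so much cheaper than a classic STUN binding request.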
Thanks to Peter Thatcher for proposing STUN over QUIC.

6.3.  TURN2

   TODO: make TURN2 run over QUIC.

   Out of band, the client tells the TURN2 server the fingerprint of the cert it uses to authenticate with.  The TURN2 server gives the client two public IP:port address pairs.  One is called inbound, and the other is called outbound.  The client connects to the outbound port and authenticates the TURN2 server using the TLS domain name of the server.  The TURN2 server authenticates the client using mutual TLS with the fingerprint of the cert provided by the client.  Any time a message or STUN packet is received on the matched inbound port, the TURN2 server forwards it to the client(s) connected to the outbound port.

   A single TURN2 connection can be used for multiple different calls or sessions at the same time, and a client could choose to allocate the TURN2 connection at the time it starts up.  This does not need to be done on a per-session basis.

   The client can not send from the TURN2 server.

   Client A            TURN Server           Client B
   (Media Receiver)                      (Media Sender)
       |                   |                   |
       |(1) OnInit Register (A's fingerprint)  |
       |------------------>|                   |
       |                   |                   |
       |(2) Register Response (Port Pair (L,R))|
       |<------------------|                   |
       |                   |                   |
       |   L (left of Server), R (Right of Server)
       |                   |                   |
       |(3) Setup TLS Connection (L port)      |
       |...................|                   |
       |                   |                   |
       |                   |        B sends media to A
       |                   |                   |
       |                   |(4) Media Tx (Received on Port R)
       |                   |<------------------|
       |                   |                   |
       |(5) Media Tx (Sent from Port L)        |
       |<------------------|                   |
       |                   |                   |

7.  Transport Layer

   The responsibility of the transport layer is to provide an end-to-end crypto layer equivalent to DTLS, and it must ensure adequate congestion control.  The transport layer brings up a flow between two computers.  This flow can be used by multiple media streams.
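The TURN2 port-pair rule from Section 6.3 is simple enough to sketch.  This is a non-normative illustration: the class and method names are invented for this sketch, and the TLS connection is abstracted to a send callback.

```python
class Turn2Relay:
    """Sketch of the TURN2 forwarding rule (Section 6.3): anything that
    arrives on the inbound port is forwarded, unmodified, to every
    client connected on the matched outbound port."""

    def __init__(self):
        self.pairs = {}  # inbound port -> pair state

    def register(self, fingerprint, inbound_port, outbound_port):
        # Out of band, the client registered the fingerprint of the
        # cert it will present over mutual TLS on the outbound port.
        self.pairs[inbound_port] = {"fingerprint": fingerprint,
                                    "outbound": outbound_port,
                                    "clients": []}

    def client_connect(self, inbound_port, presented_fingerprint, send):
        # The relay only accepts an outbound client whose cert matches
        # the registered fingerprint.
        pair = self.pairs[inbound_port]
        if presented_fingerprint != pair["fingerprint"]:
            raise PermissionError("client cert fingerprint mismatch")
        pair["clients"].append(send)

    def on_inbound(self, inbound_port, data):
        # Fan the packet out to all connected outbound clients; the
        # relay never originates traffic toward the inbound side.
        for send in self.pairs[inbound_port]["clients"]:
            send(data)
```

Because the relay does nothing but match an inbound port to its outbound clients and copy bytes, the same long-lived TURN2 allocation can serve many sessions, as the text above notes.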
The MTI transport layer is QUIC.  It assumes that QUIC has a way to deliver packets in an efficient unreliable mode as well as an optional way to deliver important metadata packets in a reliable mode.  It assumes that QUIC can report up to the rate adaptation layer a current maximum target bandwidth that QUIC can transmit at.  It is possible these are all unrealistic characteristics of QUIC, in which case a new transport protocol should be developed that provides them and is layered on top of DTLS for security.

   This is secured by checking that the fingerprints of the DTLS connection match the fingerprints provided at the control layer, or by checking that the names of the certificates match what was provided at the control layer.

   The transport layer needs to be able to set the DSCP values on transmitted packets as specified by the control layer.

   The transport MAY provide a compression mode to remove the redundancy of the non-encrypted portion of the media messages, such as the GlobalEncodingID.  For example, a GlobalEncodingID could be mapped to a QUIC channel and then removed before sending the message and added back on the receiving side.

   The transport needs to ensure that it has a very small chance of being confused with the STUN2 traffic it will be multiplexed with.  (Open issue: if STUN2 runs on top of the same transport, this becomes less of an issue.)

   The transport crypto needs to be able to export server state that can be passed out of band to the client, enabling the client to make a zero-RTT connection to the server.

8.  Media Layer - RTP3

   Each message consists of a set of TLV headers with metadata about the packet, followed by payload data such as the output of an audio or video codec.

   There are several message headers that help the receiver understand what to do with the media.
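The Terminology section says the Tag, Length, and integer values of these headers are coded as variable-length integers in the style of CBOR.  A simplified, non-normative sketch of such an encoding follows; it borrows CBOR's width-marker idea but is not the actual CBOR encoding, and the draft defines no exact format.

```python
def encode_varint(n: int) -> bytes:
    # CBOR-style variable-length integer: small values are coded in a
    # single byte, larger values get a marker byte naming their width.
    # This mirrors CBOR's approach but is a simplified sketch.
    if n < 24:
        return bytes([n])
    for marker, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([marker]) + n.to_bytes(size, "big")
    raise ValueError("integer too large")

def encode_tlv(tag: int, value: bytes) -> bytes:
    # One TLV header: variable-length tag, variable-length length,
    # then the value bytes.
    return encode_varint(tag) + encode_varint(len(value)) + value
```

The point of the variable-length coding is that the common small tags and short headers cost one byte each, while large values such as 64-bit hashes remain representable.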
The TLV headers are the following:

   o  Conference ID: Integer that is a globally unique identifier for all applications using a common call signalling system.  This is set by the proposal.

   o  Endpoint ID: Integer to uniquely identify the endpoint within the scope of the conference ID.  This is set by the proposal.

   o  Source ID: Integer to uniquely identify the input source within the scope of an endpoint ID.  A source could be a specific camera or a microphone.  This is set by the endpoint and included in the advertisement.

   o  Sink ID: Integer to uniquely identify the sink within the scope of an endpoint ID.  A sink could be a speaker or screen.  This is set by the endpoint and included in the advertisement.  An endpoint sending media can have this set.  If it is set, it should be transmitted for 3 frames any time it changes and once every 5 seconds.  An SFU can add, modify, or delete this from any media packet.  TODO - How to use this for SFU-controlled layout - for example, if there are 100 users in a conference and the 10 most recent speakers should be put in thumbnails.  Do we need this at all?

   o  Encoding ID: Integer to uniquely identify the encoding of the stream within the scope of the source ID.  Note there may be multiple encodings of data from the same source.  This is set by the proposal.

   o  Salt: Salt to use for forming the initialization vector for the AEAD.  The salt is sent as part of the packet but need not be sent in every packet.  This is created by the endpoint sending the message.

   o  GlobalEncodingID: 64-bit hash of the concatenation of conference ID, endpoint ID, source ID, and encoding ID.

   o  Capture time: Time when the first sample in the message was captured.  It is an NTP time in ms with the high-order bits discarded.  The number of bits in the capture time needs to be large enough that it does not wrap for the lifetime of this stream.
This is set by the endpoint sending the message.

   o  Sequence ID: When the data captured for a single point in time is too large to fit in a single message, it can be split into multiple chunks, which are sequentially numbered starting at 0 for the first chunk of the message.  This is set by the endpoint sending the message.

   o  GlobalMessageID: 64-bit hash of the concatenation of conference ID, endpoint ID, encoding ID, and sequence ID.

   o  Active level: A number from 0 to 100 that indicates the probability that the sender of this media wishes it to be considered active media.  For example, for voice it would be 100 if the person was clearly speaking, 0 if not, and perhaps a value in the middle if it was uncertain.  This allows a media switch to select the active speaker in a conference call.

   o  Location: Relative or absolute location, direction of view, and field of view.  With video coming from drones, 360 cameras, VR light-field cameras, and complex video conferencing rooms, this provides the information about the camera or microphone that the receiver can use to render the correct view.  This is end-to-end encrypted.

   o  Reference Frame: Bool to indicate whether this message is part of a reference frame.  Typically, an SFU will switch to a new video stream at the start of a reference frame.

   o  DSCP: DSCP to use on transmission of this message and future messages on this GlobalEncodingID.

   o  Layer ID: Integer indicating which layer this is, for scalable video codecs.  An SFU may use this to selectively drop a frame.

   The keys used for the AEAD are unique to a given conference ID and endpoint ID.

   If the message has any of the following headers, they must occur in the following order, followed by all other headers:

   1.  GlobalEncodingID,

   2.  GlobalMessageID,

   3.  conference ID,

   4.  endpoint ID,

   5.  encoding ID,

   6.  sequence ID,

   7.
active level,

   8.  DSCP

   Every second there must be at least one message in each encoding that contains the:

   o  conference ID,

   o  endpoint ID,

   o  encoding ID,

   o  salt,

   o  and sequence ID headers,

   but they are not needed in every packet.

   The sequence ID or GlobalMessageID is required in every message, and periodically there should be a message with the capture time.

8.1.  RTP Meta Data

   We tend to end up with a few categories of data associated with the media:

   o  Stuff you need at the same time you get the media.  For example, whether this is a reference frame.

   o  Stuff you need soon but not instantly.  For example, the name of the speaker in a given rectangle of a video stream.

   And it tends to change at different rates:

   o  Stuff that you need to process the media and that may change, but does not change quickly, and that you don't need with every frame.  For example, the salt for encryption.

   o  Stuff that you need to join the media but that may never change.  For example, the resolution of the video.

   TODO - think about how to optimize the design for each type of metadata.

8.2.  Securing the messages

   The whole message is end-to-end secured with an AEAD.  The headers are authenticated, while the payload data is authenticated and encrypted.  Similar to how the IV for AES-GCM is calculated in SRTP, the IV is computed by xor'ing the salt with the concatenation of the GlobalEncodingID and the low 64 bits of the sequence ID.  The message consists of the authenticated data, followed by the encrypted data, then the authentication tag.

8.3.  Sender requests

   The control layer supports requesting retransmission of a particular media message, identified by its IDs and the capture time it would contain.

   The control layer supports requesting a maximum rate for each given encoding ID.

8.4.
Data Codecs

   Data messages, including raw bytes, XML, and SenML, can all be sent just like media by selecting an appropriate codec and a software-based source or sink.  An additional parameter to the codec can indicate whether reliable delivery is needed and whether in-order delivery is needed.

8.5.  Media Keep Alive

   Provided by the transport.

8.6.  Forward Error Correction

   A new Reed-Solomon based FEC scheme, based on [I-D.ietf-payload-flexible-fec-scheme], that provides FEC over messages needs to be defined.

8.7.  MTI Codecs

8.7.1.  Audio

   Implementations MUST support at least G.711 and Opus.

8.7.2.  Video

   Implementations MUST support at least H.264 and AV1.

   Video codecs use square pixels.

   Video codecs MUST support any aspect ratio within the limits of their max width and height.

   Video codecs can specify a maximum pixel rate, maximum frame rate, and maximum image size.  They can also specify a list of binary flags for supported features, which are defined by the codec and may be supported by the codec for encode, decode, or neither, where each feature can be independently controlled.  They can not impose constraints beyond that.  Some existing codecs like VP8 may easily fit into this, while a codec like H.264 may need some subsets defined as new codecs to meet these requirements.  It is not expected that all the nuances that could be negotiated with SDP for H.264 would be supported in this new media stack.

   Video codecs MUST support a min width and min height of 1.

   All video on the wire is oriented such that the first scan line in the frame is up and the first pixel in the scan line is on the left.

   T.38 fax and DTMF are not supported.  Fax can be sent as a TIFF image over a data channel, and DTMF can be sent as application-specific information over a data channel.
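Taken together, the rules above reduce video negotiation to a handful of numeric limits.  A short non-normative sketch of the resulting admission check follows; the capability field names are invented for this sketch, since the draft does not define a capability structure.

```python
def video_params_acceptable(caps: dict, width: int, height: int,
                            frame_rate: float) -> bool:
    """Check a proposed video configuration against advertised codec
    capabilities, per the Section 8.7 rules: any aspect ratio is
    allowed within the max width/height, the minimum dimension is 1,
    and the optional frame-rate and pixel-rate (pixels/second) caps
    must be respected.  Field names are illustrative."""
    if width < 1 or height < 1:
        return False
    if width > caps["max_width"] or height > caps["max_height"]:
        return False
    if frame_rate > caps.get("max_frame_rate", float("inf")):
        return False
    pixel_rate = width * height * frame_rate
    return pixel_rate <= caps.get("max_pixel_rate", float("inf"))

caps = {"max_width": 1920, "max_height": 1080,
        "max_frame_rate": 30, "max_pixel_rate": 62_208_000}

# 1280x720 @ 30 fps fits, and an extreme aspect ratio such as 10x1000
# must also be accepted as long as it stays within the limits.
assert video_params_acceptable(caps, 1280, 720, 30)
assert video_params_acceptable(caps, 10, 1000, 30)
assert not video_params_acceptable(caps, 1920, 1080, 60)
```

Because codecs may not impose constraints beyond these limits, a controller building a proposal only needs this arithmetic, not a per-codec negotiation grammar.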
TODO: Capture the list of what metadata video encoders produce: * whether it is a reference frame or not * resolution * frame rate? * capture time of frame

   TODO: Capture the list of what metadata video encoders need: * capture timestamp * source and target resolution * source and target frame rate * target bitrate * max bitrate * max pixel rate

8.7.3.  Annotation

   Optional support for annotation-based overlay using vector graphics such as a subset of SVG.

8.7.4.  Application Data Channels

   Need support for application-defined data in both a reliable mode and an unreliable datagram mode.

8.7.5.  Reverse Requests & Stats

   The hope is that this is not needed.

   Much of what goes in the reverse direction of the media in RTCP is used for congestion control, diagnostics, or control of the codec, such as requesting that a frame be resent or that a new intra codec frame be sent for video.  This design reduces the need for all of these.

   The congestion control information, which is needed quickly, is all handled at the QUIC layer.

   The diagnostic type of information can be reported from the endpoint to the controller and does not need to flow at the media level.

   Information that needs to be delivered reliably can be sent that way at the QUIC level, removing the need for retransmit-type requests.  Systems that use selective retransmission to recover from packet loss of media tend not to work as well for interactive media as forward error correction schemes, because of the large latency they introduce.

   Information like requesting a new intra codec frame for video often needs to come from the controller and can be sent over the signalling and control layer.

8.8.  Message Key Agreement

   The secret for encrypting messages can be provided in the proposal by value or by a reference.
The reference approach allows the client to get the secret from a messaging system where the server creating the proposal may not have access to it.  For example, it might come from a system like [I-D.barnes-mls-protocol].

9.  Control Layer

   The control layer needs an API to find out what the capabilities of the device are, and then a way to set up sending and receiving streams.  All media flows are only in one direction.  The control is broken into control of connectivity and transports, and control of media streams.

9.1.  Transport Capabilities API

   An API to get information for remote connectivity, including:

   o  set the IP, port, and credential for each TURN2 server

   o  return the IP, port tuple for the remote side to send to on the TURN2 server

   o  gather local IP, port, protocol tuples for receiving media

   o  report the SHA256 fingerprint of the local TLS certificate

   o  encryption algorithms supported

   o  report an error for a bad TURN2 credential

9.2.  Media Capabilities API

   Send and receive codecs are considered separate codecs and can have separate capabilities, though they default to the same if not specified separately.

   For each send or receive audio codec, an API to learn:

   o  codec name

   o  the max sample rate

   o  the max sample size

   o  the max bitrate

   For each send or receive video codec, an API to learn:

   o  codec name

   o  the max width

   o  the max height

   o  the max frame rate

   o  the max pixel depth

   o  the max bitrate

   o  the max pixel rate (pixels / second)

9.3.
Transport Configuration API

   To create a new flow, the information that can be configured is:

   o  TURN server to use

   o  list of IP, Port, Protocol tuples to try connecting to

   o  encryption algorithm to use

   o  TLS fingerprint of the far side

   An API to allow modification of the following attributes of a flow:

   o  total max bandwidth for the flow

   o  forward error correction scheme for the flow

   o  FEC time window

   o  retransmission scheme for the flow

   o  additional IP, Port, Protocol pairs to send to that may improve connectivity

9.4.  Media Configuration API

   For all streams:

   o  set conference ID

   o  set endpoint ID

   o  set encoding ID

   o  salt and secret for the AEAD

   o  flag to pause transmission

   For each transmitted audio stream, a way to set the:

   o  audio codec to use

   o  media source to connect to

   o  max encoded bitrate

   o  sample rate

   o  sample size

   o  number of channels to encode

   o  packetization time

   o  processing, as one of: automatically set, raw, speech, music

   o  DSCP value to use

   o  flag indicating whether to use a constant bit rate

   o  optionally, a sinkID to periodically include in the media

   For each transmitted video stream, a way to set the:

   o  video codec to use

   o  media source to connect to

   o  max width and max height

   o  max encoded bitrate

   o  max pixel rate

   o  sample rate

   o  sample size

   o  processing, as one of: automatically set, rapidly changing video, fine detail video

   o  DSCP value to use

   o  for a layered codec, a layer ID and the set of layer IDs this depends on

   o  optionally, a sinkID to periodically include in the media

   For each transmitted video stream, a way to tell it to:

   o  encode the next frame as an intra frame

   For each transmitted data stream:

   o  a way to send a data message and indicate reliable or unreliable transmission

   For each received audio stream:

   o  audio codec to use

   o  media sink to connect to

   o  lip sync flag
   For each received video stream:

   o  video codec to use

   o  media sink to connect to

   o  lip sync flag

   For each received data stream:

   o  notification of received data messages

   Note on lip sync: for any streams that have the lip sync flag set
   to true, the renderer attempts to synchronize their playback.

9.5.  Transport Metrics

   o  report gathering state and completion

9.6.  Flow Metrics API

   For each flow, report:

   o  connectivity state

   o  bits sent

   o  packets lost

   o  estimated RTT

   o  SHA-256 fingerprint for the certificate of the far side

   o  current 5-tuple in use

9.7.  Stream Metrics API

   For sending streams:

   o  bits sent

   o  packets lost

   For receiving streams:

   o  capture time of the most recently received packet

   o  endpoint ID of the most recently received packet

   o  bits received

   o  packets lost

   For video streams (send & receive):

   o  current encoded width and height

   o  current encoded frame rate

10.  Call Signalling - JABBER2

   Call signalling is out of scope for usages like WebRTC, but other
   usages may want a common REST API they can use.

   Call signalling works by having the client connect to a server when
   it starts up, send its current advertisement, and open a web socket
   to receive proposals from the server.  A client can make a REST
   call indicating the party (or parties) it wishes to connect to, and
   the server will then send proposals to all the clients to connect
   them.  The proposal tells each client exactly how to configure its
   media stack and MUST be either completely accepted or completely
   rejected.

   The signalling is based on the advertisement/proposal ideas from
   [I-D.peterson-sipcore-advprop].

   We define one round trip of signalling to be a message going from a
   client up to a server in the cloud, then down to another client,
   which returns a response along the reverse path.
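   The advertise/propose exchange described above might be modeled as
   follows.  This is a sketch only: the message shapes and helper
   names are assumptions, and only the all-or-nothing rule for
   proposals comes from this draft.

```python
# Hypothetical sketch of a JABBER2-style client's message handling.
# The advertisement shape loosely follows the examples in Section 11.

def make_advertisement(source_id, codec_names):
    """Build the advertisement a client sends when it connects."""
    return {"sources": [{"sourceID": source_id,
                         "sourceType": "audio",
                         "codecs": [{"codecName": c} for c in codec_names]}]}

def handle_proposal(proposal, supported_codecs):
    """A proposal MUST be completely accepted or completely rejected:
    if any stream requires a codec this client lacks, reject the
    whole proposal rather than accepting a subset."""
    streams = (proposal.get("sendStreams", []) +
               proposal.get("receiveStreams", []))
    if all(s["codecName"] in supported_codecs for s in streams):
        return "accept"
    return "reject"

adv = make_advertisement(1, ["opus", "g711"])
proposal = {"sendStreams": [{"codecName": "opus"}],
            "receiveStreams": [{"codecName": "opus"}]}
print(handle_proposal(proposal, {"opus", "g711"}))  # prints "accept"
```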
   With this definition, SIP takes 1.5 round trips, or more if TURN is
   needed, to set up a call, while this protocol takes 0.5 round
   trips.

11.  Signalling Examples

11.1.  Simple Audio Example

11.1.1.  Simple audio advertisement

   {
      "receiveAt":[
         {
            "relay":"2001:db8::10:443",
            "stunSecret":"s8i739dk8",
            "tlsFingerprintSHA256":"1283938"
         },
         {
            "stun":"203.0.113.10:43210",
            "stunSecret":"s8i739dk8",
            "tlsFingerprintSHA256":"1283938"
         },
         {
            "local":"192.168.0.2:443",
            "stunSecret":"s8i739dk8",
            "tlsFingerprintSHA256":"1283938"
         }
      ],
      "sources":[
         {
            "sourceID":1,
            "sourceType":"audio",
            "codecs":[
               {
                  "codecName":"opus",
                  "maxBitrate":128000
               },
               {
                  "codecName":"g711"
               }
            ]
         }
      ],
      "sinks":[
         {
            "sinkID":1,
            "sourceType":"audio",
            "codecs":[
               {
                  "codecName":"opus",
                  "maxBitrate":256000
               },
               {
                  "codecName":"g711"
               }
            ]
         }
      ]
   }

11.1.2.  Simple audio proposal

   {
      "receiveAt":[
         {
            "relay":"2001:db8::10:443",
            "stunSecret":"s8i739dk8"
         },
         {
            "stun":"203.0.113.10:43210",
            "stunSecret":"s8i739dk8"
         },
         {
            "local":"192.168.0.10:443",
            "stunSecret":"s8i739dk8"
         }
      ],
      "sendTo":[
         {
            "relay":"2001:db8::20:443",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         },
         {
            "stun":"203.0.113.20:43210",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         },
         {
            "local":"192.168.0.20:443",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         }
      ],
      "sendStreams":[
         {
            "conferenceID":4638572387,
            "endpointID":23,
            "sourceID":1,
            "encodingID":1,
            "codecName":"opus",
            "AEAD":"AES128-GCM",
            "secret":"xy34",
            "maxBitrate":24000,
            "packetTime":20
         }
      ],
      "receiveStreams":[
         {
            "conferenceID":4638572387,
            "endpointID":23,
            "sinkID":1,
            "encodingID":1,
            "codecName":"opus",
            "AEAD":"AES128-GCM",
            "secret":"xy34"
         }
      ]
   }

11.2.  Simple Video Example

   Advertisement for a simple send-only camera with no audio:

   {
      "sources":[
         {
            "sourceID":1,
            "sourceType":"video",
            "codecs":[
               {
                  "codecName":"av1",
                  "maxBitrate":20000000,
                  "maxWidth":3840,
                  "maxHeight":2160,
                  "maxFrameRate":120,
                  "maxPixelRate":248832000,
                  "maxPixelDepth":8
               }
            ]
         }
      ]
   }

11.2.1.  Proposal sent to camera

   {
      "sendTo":[
         {
            "relay":"2001:db8::20:443",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         }
      ],
      "sendStreams":[
         {
            "conferenceID":0,
            "endpointID":0,
            "sourceID":0,
            "encodingID":0,
            "codecName":"av1",
            "AEAD":"NULL",
            "width":640,
            "height":480,
            "frameRate":30
         }
      ]
   }

11.3.  Simulcast Video Example

   The advertisement is the same as for the simple camera above, but
   the proposal has two streams with different encodingIDs.

   {
      "sendTo":[
         {
            "relay":"2001:db8::20:443",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         }
      ],
      "sendStreams":[
         {
            "conferenceID":0,
            "endpointID":0,
            "sourceID":0,
            "encodingID":1,
            "codecName":"av1",
            "AEAD":"NULL",
            "width":1920,
            "height":1080,
            "frameRate":30
         },
         {
            "conferenceID":0,
            "endpointID":0,
            "sourceID":0,
            "encodingID":2,
            "codecName":"av1",
            "AEAD":"NULL",
            "width":240,
            "height":240,
            "frameRate":15
         }
      ]
   }

11.4.  FEC Example

11.4.1.  Advertisement that includes a FEC codec

   {
      "sources":[
         {
            "sourceID":1,
            "sourceType":"video",
            "codecs":[
               {
                  "codecName":"av1",
                  "maxBitrate":20000000,
                  "maxWidth":3840,
                  "maxHeight":2160,
                  "maxFrameRate":120,
                  "maxPixelRate":248832000,
                  "maxPixelDepth":8
               },
               {
                  "codecName":"flex-fec-rs"
               }
            ]
         }
      ]
   }

11.4.2.  Proposal sent to camera

   {
      "sendTo":[
         {
            "relay":"2001:db8::20:443",
            "stunSecret":"20kdiu83kd8",
            "tlsFingerprintSHA256":"9389739"
         }
      ],
      "sendStreams":[
         {
            "conferenceID":0,
            "endpointID":0,
            "sourceID":0,
            "encodingID":1,
            "codecName":"av1",
            "AEAD":"NULL",
            "width":640,
            "height":480,
            "frameRate":30
         },
         {
            "conferenceID":0,
            "endpointID":0,
            "sourceID":0,
            "encodingID":2,
            "AEAD":"NULL",
            "codecName":"flex-fec-rs",
            "fecRepairWindow":200,
            "fecRepairEncodingIDs":[
               1
            ]
         }
      ]
   }

12.  Switched Forwarding Unit (SFU)

   When several clients are in a conference call, the SFU can forward
   packets based on which clients need a given GlobalEncodingID.
   By looking at the "active level", the SFU can figure out which
   endpoints are the active speakers and forward only those.  The SFU
   never changes anything in the message.

12.1.  Software Defined Networking

   Is it possible to use the packet recycling concepts in SDN to
   forward a single packet to multiple endpoints?  Can the way SDN
   forwarding works be adapted to use an SDN router as an SFU?

12.2.  Vector Packet Processors

   Can we use fast VPP systems like fd.io to create an SFU?

12.3.  Information Centric Networking

   What changes would be needed to map RTP2 into the prefix and suffix
   of hICN?

13.  Acknowledgements

   Thank you for input from: Harald Alvestrand, Espen Berger, Matthew
   Kaufman, Patrick Linskey, Eric Rescorla, Peter Thatcher, Malcolm
   Walters, and Martin Thomson.

14.  Other Work

   RFC 7016

   draft-kaufman-rtcweb-traversal

   Consider using terminology from RFC 7656.

   docs.google.com/presentation/
   d/1Sg_1TVCcKJvZ8Egz5oa0CP01TC2rNdv9HVu7W38Y4zA/
   edit#slide=id.g29a8672e18_22_120

   docs.google.com/presentation/d/1o-
   o5jZBLw3Py1OuenzWDkxDG6NigSmLHvGw5KemKWLw/
   edit#slide=id.g2f8f4acff1_1_249

   cs.chromium.org/chromium/src/third_party/webrtc/common_video/
   include/video_frame.h

15.  Style of Specification

   Fundamentally driven by experiments.  The proposal is to have a
   high-level overview document where we document some of the design;
   this document could be a start of that.  Then write a spec for each
   of the separable protocol parts, such as STUN2, TURN2, etc.

   The protocol specs would contain a high-level overview like you
   might find on a Wikipedia page, and the details of the protocol
   encoding would be provided in an open source reference
   implementation.  The test code for the reference implementation
   helps test the spec.
   The implementation is not optimized for performance but instead
   simply tries to clearly illustrate the protocol.  A particular
   version of the draft would be bound to a tagged version of the
   source code.  All the source code would be under the normal IETF
   IPR rules, just as if it were included directly in the draft.

16.  Informative References

   [I-D.barnes-mls-protocol]
              Barnes, R., Millican, J., Omara, E., Cohn-Gordon, K.,
              and R. Robert, "The Messaging Layer Security (MLS)
              Protocol", draft-barnes-mls-protocol-00 (work in
              progress), February 2018.

   [I-D.ietf-payload-flexible-fec-scheme]
              Zanaty, M., Singh, V., Begen, A., and G. Mandyam, "RTP
              Payload Format for Flexible Forward Error Correction
              (FEC)", draft-ietf-payload-flexible-fec-scheme-06 (work
              in progress), March 2018.

   [I-D.jennings-dispatch-snowflake]
              Jennings, C. and S. Nandakumar, "Snowflake - A
              Lightweight, Asymmetric, Flexible, Receiver Driven
              Connectivity Establishment", draft-jennings-dispatch-
              snowflake-01 (work in progress), March 2018.

   [I-D.jennings-mmusic-ice-fix]
              Jennings, C., "Proposal for Fixing ICE", draft-jennings-
              mmusic-ice-fix-00 (work in progress), July 2015.

   [I-D.kaufman-rtcweb-traversal]
              Kaufman, M. and J. Rosenberg, "NAT Traversal
              Requirements for RTCWEB", draft-kaufman-rtcweb-
              traversal-00 (work in progress), June 2011.

   [I-D.peterson-sipcore-advprop]
              Peterson, J. and C. Jennings, "The Advertisement/
              Proposal Model of Session Description", draft-peterson-
              sipcore-advprop-01 (work in progress), March 2011.

Author's Address

   Cullen Jennings
   Cisco

   Email: fluffy@iii.ca