idnits 2.17.1
draft-ietf-clue-rtp-mapping-06.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (January 17, 2016) is 3013 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Missing Reference: 'RFCXXXX' is mentioned on line 617, but not defined
== Outdated reference: A later version (-54) exists of
draft-ietf-mmusic-sdp-bundle-negotiation-24
== Outdated reference: A later version (-15) exists of
draft-ietf-clue-signaling-06
== Outdated reference: A later version (-14) exists of
draft-ietf-mmusic-sdp-simulcast-03
-- Obsolete informational reference (is this intentional?): RFC 4566
(Obsoleted by RFC 8866)
-- Obsolete informational reference (is this intentional?): RFC 5285
(Obsoleted by RFC 8285)
Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 CLUE WG R. Even
3 Internet-Draft Huawei Technologies
4 Intended status: Standards Track J. Lennox
5 Expires: July 20, 2016 Vidyo
6 January 17, 2016
8 Mapping RTP streams to CLUE media captures
9 draft-ietf-clue-rtp-mapping-06.txt
11 Abstract
13 This document describes how the Real-time Transport Protocol (RTP) is
14 used in the context of the CLUE protocol. It also describes the
15 mechanisms and recommended practice for mapping RTP media streams
16 defined in SDP to CLUE media captures.
18 Status of This Memo
20 This Internet-Draft is submitted in full conformance with the
21 provisions of BCP 78 and BCP 79.
23 Internet-Drafts are working documents of the Internet Engineering
24 Task Force (IETF). Note that other groups may also distribute
25 working documents as Internet-Drafts. The list of current Internet-
26 Drafts is at http://datatracker.ietf.org/drafts/current/.
28 Internet-Drafts are draft documents valid for a maximum of six months
29 and may be updated, replaced, or obsoleted by other documents at any
30 time. It is inappropriate to use Internet-Drafts as reference
31 material or to cite them other than as "work in progress."
33 This Internet-Draft will expire on July 20, 2016.
35 Copyright Notice
37 Copyright (c) 2016 IETF Trust and the persons identified as the
38 document authors. All rights reserved.
40 This document is subject to BCP 78 and the IETF Trust's Legal
41 Provisions Relating to IETF Documents
42 (http://trustee.ietf.org/license-info) in effect on the date of
43 publication of this document. Please review these documents
44 carefully, as they describe your rights and restrictions with respect
45 to this document. Code Components extracted from this document must
46 include Simplified BSD License text as described in Section 4.e of
47 the Trust Legal Provisions and are provided without warranty as
48 described in the Simplified BSD License.
50 Table of Contents
52 1. Introduction
53 2. Terminology
54 3. RTP topologies for CLUE
55 4. Mapping CLUE Capture Encodings to RTP streams
56 4.1. Review of RTP related documents relevant to CLUE work.
57 4.2. Requirements of a solution
58 4.3. Static Mapping
59 4.4. Dynamic mapping
60 4.5. Recommendations
61 5. Application to CLUE Media Requirements
62 6. CaptureID definition
63 6.1. RTCP CaptureId SDES Item
64 6.2. RTP Header Extension
65 7. Examples
66 8. Acknowledgements
67 9. IANA Considerations
68 10. Security Considerations
69 11. References
70 11.1. Normative References
71 11.2. Informative References
72 Authors' Addresses
74 1. Introduction
76 Telepresence systems can send and receive multiple media streams.
77 The CLUE framework [I-D.ietf-clue-framework] defines media captures
78 as a source of Media, such as from one or more Capture Devices. A
79 Media Capture may also be constructed from other Media streams. A
80 middle box can express conceptual Media Captures that it constructs
81 from Media streams it receives. A Multiple Content Capture (MCC) is
82 a special Media Capture composed of multiple Media Captures.
84 SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the
85 RTP [RFC3550] media streams. Each RTP stream has a unique SSRC within
86 its RTP session. The content of the RTP stream is created by an
87 encoder in the endpoint. This may be original content from a
88 camera or content created by an intermediary device like an MCU.
90 This document makes recommendations, for the telepresence
91 architecture, about how RTP and RTCP streams should be encoded and
92 transmitted, and how their relation to CLUE Media Captures should be
93 communicated. The proposed solution supports multiple RTP
94 topologies.
96 With regard to the media (audio and video), systems that support
97 CLUE use RTP for the media, SDP for codec and media transport
98 negotiation (CLUE individual encodings) and the CLUE protocol for
99 media Capture description and selection. In order to associate the
100 media in the different protocols there are three mappings that need to
101 be specified:
103 1. CLUE individual encodings to SDP
105 2. RTP media streams to SDP (this is not a CLUE specific mapping)
107 3. RTP media streams to MC, to map the received RTP stream to the
108 current MC in the MCC.
110 2. Terminology
112 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
113 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
114 document are to be interpreted as described in RFC 2119 [RFC2119] and
115 indicate requirement levels for compliant RTP implementations.
117 The definitions from the CLUE framework document
118 [I-D.ietf-clue-framework] section 3 are used by this document as
119 well.
121 3. RTP topologies for CLUE
123 The typical RTP topologies used by Telepresence systems specify
124 different behaviors for RTP and RTCP distribution. A number of RTP
125 topologies are described in [RFC7667]. For telepresence, the
126 relevant topologies include point-to-point, as well as media mixers,
127 media-switching mixers, and Selective Forwarding middleboxes.
129 In the point-to-point topology, one peer communicates directly with a
130 single peer over unicast. There can be one or more RTP sessions, and
131 each RTP session can carry multiple RTP streams identified by their
132 SSRC. All SSRCs will be recognized by the peers based on the
133 information in the RTCP SDES report that will include the CNAME and
134 SSRC of the sent RTP streams. There are different point-to-point use
135 cases as specified in the CLUE use cases [RFC7205]. There may be a
136 difference between the symmetric and asymmetric use cases. While in
137 the symmetric use case the typical mapping will be from a Media
138 capture device to a render device (e.g., camera to monitor), in the
139 asymmetric case the render device may receive different capture
140 information (RTP streams from different cameras) if it has fewer
141 rendering devices (monitors). In some cases, a CLUE session which,
142 at a high level, is point-to-point may nonetheless have RTP streams
143 which are best described by one of the mixer topologies. For example,
144 a CLUE endpoint can produce composite or switched captures for use by
145 a receiving system with fewer displays than the sender has cameras.
146 The Media capture may be described using MCC.
148 For the Media Mixer topology [RFC7667], the peers communicate only
149 with the mixer. The mixer provides mixed or composited media
150 streams, using its own SSRC for the sent streams. There are two
151 cases here. In the first case the mixer may have separate RTP
152 sessions with each peer (similar to the point to point topology)
153 terminating the RTCP sessions on the mixer; this is known as Topo-
154 RTCP-Terminating MCU in [RFC7667]. In the second case, the mixer can
155 use a conference-wide RTP session similar to [RFC7667] Topo-mixer or
156 Topo-Video-switching. The major difference is that for the second
157 case, the mixer uses conference-wide RTP sessions, and distributes
158 the RTCP reports to all the RTP session participants, enabling them
159 to learn all the CNAMEs and SSRCs of the participants and know the
160 contributing source or sources (CSRCs) of the original streams from
161 the RTP header. In the first case, the Mixer terminates the RTCP and
162 the participants cannot know all the available sources based on the
163 RTCP information. The conference roster information including
164 conference participants, endpoints, media and media-id (SSRC) can be
165 available using the conference event package [RFC4575] element.
167 In the Media-Switching Mixer topology [RFC7667], the peer to mixer
168 communication is unicast with mixer RTCP feedback. It is
169 conceptually similar to a compositing mixer as described in the
170 previous paragraph, except that rather than compositing or mixing
171 multiple sources, the mixer provides one or more conceptual sources
172 selecting one source at a time from the original sources. The Mixer
173 creates a conference-wide RTP session by sharing remote SSRC values
174 as CSRCs to all conference participants.
176 In the Selective Forwarding middlebox topology, the peer to mixer
177 communication is unicast with RTCP mixer feedback. Every potential
178 sender in the conference has a source which may be "projected" by the
179 mixer into every other RTP session in the conference; thus, every
180 original source is maintained with an independent RTP identity to
181 every receiver, maintaining separate decoding state and its original
182 RTCP SDES information. However, RTCP is terminated at the mixer,
183 which might also perform reliability, repair, rate adaptation, or
184 transcoding on the stream. Senders' SSRCs may be renumbered by the
185 mixer. The sender may turn the projected sources on and off at any
186 time, depending on which sources it thinks are most relevant for the
187 receiver; this is the primary reason why this topology must act as an
188 RTP mixer rather than as a translator, as otherwise these disabled
189 sources would appear to have enormous packet loss. Source switching
190 is accomplished through this process of enabling and disabling
191 projected sources, with the higher-level semantic assignment of
192 reason for the RTP streams assigned externally.
194 The above topologies demonstrate two major RTP/RTCP behaviors:
196 1. The mixer may either use the source SSRC when forwarding RTP
197 packets, or use its own created SSRC. Still the mixer will
198 distribute all RTCP information to all participants creating
199 conference-wide RTP session/s. This allows the participants to
200 learn the available RTP sources in each RTP session. The
201 original source information will be in the SSRC or in the CSRC,
202 depending on the topology. The point to point case behaves like
203 this.
205 2. The mixer terminates the RTCP from the source, creating separate
206 RTP sessions with the peers. In this case the participants will
207 not receive the source SSRC in the CSRC. Since this is usually a
208 mixer topology, the source information is available from the SIP
209 conference event package [RFC4575]. Subscribing to the
210 conference event package allows each participant to know the
211 SSRCs of all sources in the conference.
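The two behaviors above differ in where a receiver finds the original source: in the SSRC field, in the CSRC list, or not in the RTP header at all. The following is a minimal sketch of reading both fields from the fixed RTP header of [RFC3550]. The function names `parse_rtp_sources` and `build_header` are illustrative and not part of any specification; `build_header` exists only to make the sketch self-contained.

```python
import struct


def parse_rtp_sources(packet: bytes):
    """Return (ssrc, [csrc, ...]) read from the fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first_byte & 0x0F          # CC field: number of CSRC entries
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet truncated inside the CSRC list")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return ssrc, csrcs


def build_header(ssrc, csrcs, payload_type=96, seq=1, timestamp=0):
    """Build a bare RTP header for testing (hypothetical helper)."""
    first_byte = (2 << 6) | len(csrcs)      # V=2, P=0, X=0, CC=len(csrcs)
    header = struct.pack("!BBHII", first_byte, payload_type, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)
```

With a mixer that rewrites the SSRC and supplies CSRCs, the second element of the returned tuple carries the original sources; with an RTCP-terminating mixer that list is empty and the roster has to come from signaling instead.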
213 4. Mapping CLUE Capture Encodings to RTP streams
215 The different topologies described in Section 3 create different SSRC
216 distribution models and RTP stream multiplexing points.
218 Most video conferencing systems today can separate multiple RTP
219 sources by placing them into separate RTP sessions using the SDP
220 description. For example, main and slides video sources are
221 separated into separate RTP sessions based on the content attribute
222 [RFC4796]. This solution is straightforward if the multiplexing
223 point is at the UDP transport level, where each RTP stream uses a
224 separate RTP session. This will also be true for mapping the RTP
225 streams to Media Capture Encodings if each media capture encoding
226 uses a separate RTP session, and the consumer can identify it based
227 on the receiving RTP port. In this case, SDP only needs to label the
228 RTP session with an identifier that can be used to identify the media
229 capture in the CLUE description. The SDP label attribute serves as
230 this identifier. In this case, the mapping does not change even if
231 the RTP session is switched using the same or a different SSRC. (The
232 multiplexing is not at the SSRC level).
234 Even though Session multiplexing is supported by CLUE, for scaling
235 reasons, CLUE recommends using SSRC multiplexing in a single or
236 multiple sessions using [I-D.ietf-mmusic-sdp-bundle-negotiation]. So
237 we need to look at how to map RTP streams to Capture Encodings when
238 SSRC multiplexing is used.
240 When looking at SSRC multiplexing we can see that in various
241 topologies, the SSRC behavior may be different:
243 1. The SSRCs are static (assigned by the MCU/Mixer), and there is an
244 SSRC for each media capture encoding defined in the CLUE
245 protocol. Source information may be conveyed using CSRC, or, in
246 the case of topo-RTCP-Terminating MCU, is not conveyed.
248 2. The SSRCs are dynamic, representing the original source and are
249 relayed by the Mixer/MCU to the participants.
251 In the above two cases the MCU/Mixer may create an advertisement,
252 with a virtual room capture scene.
254 Another case we can envision is that the MCU / Mixer relays all the
255 capture scenes from all advertisements to all consumers. This means
256 that the advertisement will include multiple capture scenes, each
257 representing a separate TelePresence room with its own coordinate
258 system.
260 MCCs bring another mapping issue, in that an MCC represents multiple
261 Media Captures that can be sent as part of this MCC if configured by
262 the consumer. When receiving an RTP stream which is mapped to the
263 MCC, the consumer needs to know which original MC it is in order to
264 get the MC parameters from the advertisement. If a consumer
265 requested an MCC, the original MC does not have a capture encoding, so
266 it cannot be associated with an m-line using a label as described in
267 CLUE signaling [I-D.ietf-clue-signaling]. This is important, for
268 example, to get correct scaling information for the original MC,
269 which may be different for the various MCs that are contributing to
270 the MCC.
272 4.1. Review of RTP related documents relevant to CLUE work.
274 This section provides an overview of the RFCs and drafts that can
275 be used in a CLUE system and as a base for a mapping solution. This
276 section is for information only; the normative behavior is given in
277 the cited documents. Tools for SSRC multiplexing support are defined
278 for general conferencing applications; CLUE systems use the same
279 tools.
281 When looking at the available tools based on current work in the MMUSIC,
282 AVTCORE and AVTEXT Working Groups for supporting SSRC multiplexing,
283 the following documents are considered to be relevant.
285 Negotiating Media Multiplexing Using the Session Description Protocol
286 [I-D.ietf-mmusic-sdp-bundle-negotiation] defines a "bundle" SDP
287 grouping extension that can be used with the SDP Offer/Answer mechanism
288 to negotiate the usage of a single 5-tuple for sending and receiving
289 media associated with multiple SDP media descriptions ("m=").
290 It specifies how to associate a received RTP stream with the
291 m-line describing it. The assumption in the work is that each SDP
292 m-line represents a single media source.
293 [I-D.ietf-mmusic-sdp-bundle-negotiation] specifies using the SDP mid
294 value and sending it as RTCP SDES and an RTP header extension in
295 order to be able to map the RTP stream to the SDP m-line. This is
296 relevant when there are multiple RTP streams with the same payload
297 subtype number.
299 SDP Source attribute [RFC5576] defines mechanisms to describe specific
300 attributes of RTP sources based on their SSRC.
302 Negotiation of generic image attributes in SDP [RFC6236] provides the
303 means to negotiate the image size. The image attribute can be used
304 to offer different image parameters such as size, but offering
305 multiple RTP streams with different resolutions requires a
306 separate RTP session for each image option
307 ([I-D.ietf-mmusic-sdp-bundle-negotiation] provides the support of a
308 single RTP session but each image option will need a separate SDP
309 m-line).
311 The recommended way to support the simulcast case is to use
312 [I-D.ietf-mmusic-sdp-simulcast].
314 In the next sections, the document will propose mechanisms to map the
315 RTP streams to CLUE media captures.
317 4.2. Requirements of a solution
319 This section lists, more briefly, the requirements a media
320 architecture for CLUE telepresence needs to achieve, summarizing the
321 discussion of previous sections. In this section, RFC 2119 [RFC2119]
322 language refers to requirements on a solution, not an implementation;
323 thus, requirements keywords are not written in capital letters.
325 Media-1: It must not be necessary for a CLUE session to use more than
326 a single transport flow for transport of a given media type (video or
327 audio).
329 Media-2: It must, however, be possible for a CLUE session to use
330 multiple transport flows for a given media type where it is
331 considered valuable (for example, for distributed media, or
332 differential quality-of-service).
334 Media-3: It must be possible for a CLUE endpoint or MCU to
335 simultaneously send sources corresponding to static captures and to
336 both composited and switched multi-content captures in the same
337 transport flow. (Any given device might not necessarily be able to send
338 all of these source types; but for those that can, it must be
339 possible for them to be sent simultaneously.)
341 Media-4: It must be possible for an original source to move among
342 multi-content captures (i.e. at one time be sent for one MCC, and at
343 a later time be sent for another one).
345 Media-5: It must be possible for a source to be placed into an MCC
346 even if the source is a "late joiner", i.e. was added to the
347 conference after the receiver requested the MCC.
349 Media-6: Whenever a given source is assigned to a switched capture,
350 it must be immediately possible for a receiver to determine the MCC
351 it corresponds to, and thus that any previous source is no longer
352 being mapped to that switched capture.
354 Media-7: It must be possible for a receiver to identify the original
355 capture(s) that are currently being mapped to an MCC, and correlate
356 them with both the CLUE advertisement and out-of-band (non-CLUE)
357 information such as rosters.
359 Media-8: It must be possible for a source to move among MCCs without
360 requiring a refresh of decoder state (e.g., for video, a fresh
361 I-frame), when this is unnecessary. However, it must also be
362 possible for a receiver to indicate when a refresh of decoder state
363 is in fact necessary.
365 Media-9: If a given source is being sent on the same transport flow
366 for more than one reason (e.g. if it corresponds to more than one
367 switched capture at once, or to a static capture), it should be
368 possible for a sender to send only one copy of the source.
370 Media-10: On the network, media flows should, as much as possible,
371 look and behave like currently-defined usages of existing protocols;
372 established semantics of existing protocols must not be redefined.
374 Media-11: The solution should seek to minimize the processing burden
375 for boxes that distribute media to decoding hardware.
377 Media-12: If multiple sources from a single synchronization context
378 are being sent simultaneously, it must be possible for a receiver to
379 associate and synchronize them properly, even for sources that are
380 mapped to switched captures.
382 4.3. Static Mapping
384 Static mapping is widely used in current MCU implementations. It is
385 also common for a point to point symmetric use case when both
386 endpoints have the same capabilities. For capture encodings with
387 static SSRCs, it is most straightforward to indicate this mapping
388 outside the media stream, in the CLUE or SDP signaling. When using
389 SSRC multiplexing, [I-D.ietf-mmusic-sdp-bundle-negotiation] defines
390 the use of the SDP mid attribute value to associate the
391 received RTP stream with the SDP m-line. The mid is carried as an RTP
392 header extension and an RTCP SDES message, both defined in
393 [I-D.ietf-mmusic-sdp-bundle-negotiation].
395 4.4. Dynamic mapping
397 Dynamic mapping works by tagging each media packet with the SDP mid value.
398 This means that a receiver immediately knows how to interpret
399 received media, even when an unknown SSRC is seen. As long as the
400 media carries a known mid, it can be assumed that this media stream
401 will replace the stream currently being received with the same mid.
403 This gives significant advantages to switching latency, as a switch
404 between sources can be achieved without any form of negotiation with
405 the receiver.
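The receiver behavior described above can be sketched as a small lookup table keyed by mid. This is only an illustration under stated assumptions: the class name `MidDemultiplexer` and its methods are hypothetical, the mid is assumed to have already been extracted from the header extension, and packets without a mid are assumed to be mapped through the most recently learned SSRC.

```python
class MidDemultiplexer:
    """Track which SSRC currently feeds each negotiated mid (illustrative)."""

    def __init__(self, negotiated_mids):
        self.current_ssrc = {mid: None for mid in negotiated_mids}
        self.ssrc_to_mid = {}

    def on_packet(self, ssrc, mid=None):
        """Return the mid this packet belongs to, or None if it cannot be mapped."""
        if mid is None:
            # No tag on this packet: fall back to what was learned earlier.
            return self.ssrc_to_mid.get(ssrc)
        if mid not in self.current_ssrc:
            return None  # mid was never negotiated
        previous = self.current_ssrc[mid]
        if previous != ssrc:
            # A new SSRC with a known mid replaces the stream with that mid.
            if previous is not None:
                self.ssrc_to_mid.pop(previous, None)
            old_mid = self.ssrc_to_mid.get(ssrc)
            if old_mid is not None and old_mid != mid:
                self.current_ssrc[old_mid] = None  # source moved between mids
            self.current_ssrc[mid] = ssrc
            self.ssrc_to_mid[ssrc] = mid
        return mid
```

No signaling round trip is involved in the switch: the first tagged packet from the new SSRC is enough to update the table.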
407 However, the disadvantage of using a mid in the stream is that it
408 introduces additional processing costs for every media packet, as mids
409 are scoped only within one hop (i.e., within a cascaded conference a
410 mid that is used from the source to the first MCU is not meaningful
411 between two MCUs, or between an MCU and a receiver), and so they may
412 need to be added or modified at every stage.
414 An additional issue with putting mid in the RTP packets comes from
415 cases where a non-bundle aware endpoint is being switched by an MCU
416 to a bundle endpoint. In this case, we may require up to an
417 additional 12 bytes in the RTP header, which may push a media packet
418 over the MTU. However, as the MTU on either side of the switch may
419 not match, it is possible that this could happen even without adding
420 extra data into the RTP packet. The 12 additional bytes per packet
421 could also be a significant bandwidth increase in the case of very
422 low bandwidth audio codecs.
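To put the 12-byte figure in perspective, the following sketch works out the added bit rate. The packet rate of 50 packets per second (20 ms packetization) and the 8 kbit/s codec rate are assumptions chosen for illustration; they do not come from this document.

```python
def extension_overhead_bps(extra_bytes_per_packet, packets_per_second):
    """Added bit rate caused by extra header bytes on every packet."""
    return extra_bytes_per_packet * 8 * packets_per_second


def relative_overhead(extra_bytes_per_packet, packets_per_second, codec_bps):
    """Added bit rate as a fraction of the codec payload bit rate."""
    return extension_overhead_bps(extra_bytes_per_packet, packets_per_second) / codec_bps


# Assumed values: 12 extra bytes, 50 packets/s, 8000 bit/s audio codec.
overhead = extension_overhead_bps(12, 50)    # 12 * 8 * 50 = 4800 bit/s
fraction = relative_overhead(12, 50, 8000)   # 4800 / 8000 = 0.6
```

Under these assumed numbers the extension adds 4800 bit/s, which is 60% of the codec payload rate; for a video stream of several Mbit/s the same 4800 bit/s per stream would be negligible.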
424 4.5. Recommendations
426 The recommendation is that a CLUE endpoint using SSRC multiplexing MUST
427 support [I-D.ietf-mmusic-sdp-bundle-negotiation] and use the SDP mid
428 attribute for mapping.
430 5. Application to CLUE Media Requirements
432 Section 4.2 lists a number of requirements
433 that are believed to be necessary for a CLUE RTP mapping. The
434 solutions described in this document are believed to meet these
435 requirements, though some of them are only possible for some of the
436 topologies. (Since the requirements are generally of the form "it
437 must be possible for a sender to do something", this is adequate; a
438 sender which wishes to perform that action needs to choose a topology
439 which allows the behavior it wants.)
441 In this section we address only those requirements where the
442 topologies or the association mechanisms treat the requirements
443 differently.
445 Media-4: It must be possible for an original source to move among
446 switched captures (i.e. at one time be sent for one switched capture,
447 and at a later time be sent for another one).
449 This applies naturally for static sources with a Switched Mixer. For
450 dynamic sources with a Selective Forwarding middlebox, this just
451 requires the mid in the header extension element to be updated
452 appropriately.
454 Media-6: Whenever a given source is transmitted for a switched
455 capture, it must be immediately possible for a receiver to determine
456 the switched capture it corresponds to, and thus that any previous
457 source is no longer being mapped to that switched capture.
459 For a Switched Mixer, this applies naturally. For a Selective
460 Forwarding middlebox, this is done based on the mid.
462 Media-7: It must be possible for a receiver to identify the original
463 source that is currently being mapped to a switched capture, and
464 correlate it with out-of-band (non-CLUE) information such as rosters.
466 For a Switched Mixer, this is done based on the CSRC, if the mixer is
467 providing CSRCs; for a Selective Forwarding middlebox, this is done
468 based on the SSRC.
470 Media-8: It must be possible for a source to move among switched
471 captures without requiring a refresh of decoder state (e.g., for
472 video, a fresh I-frame), when this is unnecessary. However, it must
473 also be possible for a receiver to indicate when a refresh of decoder
474 state is in fact necessary.
476 This can be done by a Selective Forwarding middlebox, but not by a
477 Switching Mixer. The last requirement can be accomplished through an
478 FIR message [RFC5104], though potentially a faster mechanism (not
479 requiring a round-trip time from the receiver) would be preferable.
481 Media-9: If a given source is being sent on the same transport flow
482 to satisfy more than one capture (e.g. if it corresponds to more than
483 one switched capture at once, or to a static capture as well as a
484 switched capture), it should be possible for a sender to send only
485 one copy of the source.
487 For a Selective Forwarding middlebox, this may be a problem, since an
488 encoding can be used by only a single MC; it will require using the same
489 SDP label for multiple MCs (for example, a middle camera MC and an
490 active speaker MC). This can also be done for an environment with a
491 hybrid of mixer topologies and static and dynamic captures. It is not
492 possible for static captures from a Switched Mixer.
494 Media-12: If multiple sources from a single synchronization context
495 are being sent simultaneously, it must be possible for a receiver to
496 associate and synchronize them properly, even for sources that are
497 mapped to switched captures.
499 For a Mixed or Switched Mixer topology, receivers will see only a
500 single synchronization context (CNAME), corresponding to the mixer.
501 For a Selective Forwarding middlebox, separate projecting sources
502 keep separate synchronization contexts based on their original
503 CNAMEs, thus allowing independent synchronization of sources from
504 independent rooms without needing global synchronization. In hybrid
505 cases, however (e.g. if audio is mixed), all sources which need to be
506 synchronized with the mixed audio must get the same CNAME (and thus a
507 mixer-provided timebase) as the mixed audio.
509 6. CaptureID definition
511 For an MCC, which can represent multiple switched MCs, there is a need
512 to know which MC the current RTP stream represents; this requires a
513 mapping from an RTP stream to an MC. To address this mapping, this
514 document defines an RTP header extension that includes the CaptureID
515 in order to map to the original MC, allowing the consumer to use the
516 MC attributes such as the spatial information. For an MCC, the media
517 provider MUST send the CaptureID of the current MC in the RTP header
518 extension and as an RTCP SDES message.
520 6.1. RTCP CaptureId SDES Item
522 This document specifies a new RTCP SDES message:
523     0                   1                   2                   3
524     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
525    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
526    |CaptureId = XXX|     length    | CaptureId
527    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
528    | ....
530 This CaptureID is the same as in the CLUE MC and is also used in the
531 RTP header extension.
533 This SDES message MAY be sent in a compound RTCP packet based on the
534 application need.
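As an illustration of the item layout, the sketch below encodes a single SDES item using the general format of [RFC3550]: an 8-bit type, an 8-bit length counting the text octets, and the text itself. The item type for CaptureId is not yet assigned, so `PLACEHOLDER_CCID_TYPE` is a made-up value used only to make the example runnable, and `encode_sdes_item` is a hypothetical helper name.

```python
PLACEHOLDER_CCID_TYPE = 99  # hypothetical; the real value is to be assigned by IANA


def encode_sdes_item(item_type: int, text: str) -> bytes:
    """Encode one SDES item as type, length, and UTF-8 text."""
    data = text.encode("utf-8")
    if not 1 <= item_type <= 255:
        raise ValueError("item type must be 1..255 (0 terminates the item list)")
    if len(data) > 255:
        raise ValueError("SDES item text is limited to 255 octets")
    return bytes([item_type, len(data)]) + data


item = encode_sdes_item(PLACEHOLDER_CCID_TYPE, "VC3")
```

The item still has to be placed inside an SDES chunk for the relevant SSRC and padded as [RFC3550] requires; that framing is left out of this sketch.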
536 6.2. RTP Header Extension
538 The CaptureId is carried within the RTP header extension field, using
539 the two-byte header extension form defined in [RFC5285].
541 Support is negotiated within the SDP, for example:
543 a=extmap:1 urn:ietf:params:rtp-hdrext:CaptureId
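A receiver needs to learn which local identifier was bound to the CaptureId URI before it can interpret the header extension. The following is a minimal sketch of reading that binding from an extmap line; `parse_extmap` and `find_capture_id_ext` are hypothetical helper names, and the parser handles only the identifier, the optional direction suffix, and the URI.

```python
CAPTURE_ID_URI = "urn:ietf:params:rtp-hdrext:CaptureId"


def parse_extmap(sdp_line: str):
    """Parse 'a=extmap:<id>[/<direction>] <URI> ...' into (id, uri)."""
    prefix = "a=extmap:"
    if not sdp_line.startswith(prefix):
        raise ValueError("not an extmap attribute line")
    parts = sdp_line[len(prefix):].split()
    if len(parts) < 2:
        raise ValueError("extmap line is missing the URI")
    ext_id = int(parts[0].split("/")[0])  # drop an optional direction suffix
    return ext_id, parts[1]


def find_capture_id_ext(sdp_lines):
    """Return the local identifier bound to the CaptureId URI, or None."""
    for line in sdp_lines:
        if line.startswith("a=extmap:"):
            ext_id, uri = parse_extmap(line)
            if uri == CAPTURE_ID_URI:
                return ext_id
    return None
```

If `find_capture_id_ext` returns None, the peer did not offer the extension and the receiver has only the RTCP SDES item to rely on.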
545 Packets tagged by the sender with the CaptureId then contain a header
546 extension as shown below:
548 0 1 2 3
549 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
551 | ID | Len-1 | CaptureId
552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
553 | CaptureId .. |
554 +-+-+-+-+-+-+-+-+
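The sketch below builds a complete header extension block carrying one CaptureId element. It assumes the two-byte form of [RFC5285], in which the block starts with the 0x1000 pattern (application bits set to zero here), the block length is counted in 32-bit words, and each element has an 8-bit identifier followed by an 8-bit length holding the number of data bytes. The function name `encode_capture_id_extension` is illustrative.

```python
import struct


def encode_capture_id_extension(ext_id: int, capture_id: str) -> bytes:
    """Build a two-byte-form header extension block with one CaptureId element."""
    data = capture_id.encode("utf-8")
    if not 1 <= ext_id <= 255:
        raise ValueError("identifier must be 1..255 (0 is reserved for padding)")
    if len(data) > 255:
        raise ValueError("element data is limited to 255 bytes")
    element = bytes([ext_id, len(data)]) + data
    element += b"\x00" * ((-len(element)) % 4)   # pad to a 32-bit boundary
    # 0x1000: two-byte form marker with appbits = 0; length in 32-bit words.
    return struct.pack("!HH", 0x1000, len(element) // 4) + element


block = encode_capture_id_extension(1, "VC3")
```

For the identifier 1 and the CaptureId "VC3", the element occupies five bytes, is padded to eight, and the block length field is therefore 2.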
556 There is no need to send the CaptureId header extension with all RTP
557 packets. Senders MAY choose to send it only when a new MC is sent.
558 If such a mode is being used, the header extension SHOULD be sent in
559 the first few RTP packets to reduce the risk of losing it due to
560 packet loss.
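One way a sender might implement this mode is to count down a fixed number of tagged packets each time the MC changes. The sketch below is an assumption about how that could look: the class name `CaptureIdTagger` is hypothetical, and the default of three packets is an arbitrary reading of "the first few", not a value taken from this document.

```python
class CaptureIdTagger:
    """Decide per packet whether to attach the CaptureId header extension."""

    def __init__(self, repeat_count=3):
        self.repeat_count = repeat_count   # assumed meaning of "first few packets"
        self._current = None
        self._remaining = 0

    def should_tag(self, capture_id) -> bool:
        """Return True if the packet for this MC should carry the extension."""
        if capture_id != self._current:
            # A new MC is being sent: restart the countdown.
            self._current = capture_id
            self._remaining = self.repeat_count
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False
```

A larger `repeat_count` lowers the chance that every tagged packet is lost, at the cost of more header overhead after each switch.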
562 7. Examples
564 In this partial advertisement the Media Provider advertises a
565 composed capture VC7 made by a big picture representing the current
566 speaker (VC3) and two picture-in-picture boxes representing the
567 previous speakers (the previous one -VC5- and the oldest one -VC6).
569
571 CS1
572 true
573
574 VC3
575 VC5
576 VC6
577
578 true
579 true
580 1
581 false
582 big picture of the current speaker
583 pips about previous speakers
584 1
585 it
586 static
587 individual
588
590 In this case the media provider will send capture IDs VC3, VC5 or VC6
591 as an RTP header extension and RTCP SDES message for the RTP stream
592 associated with the MC.
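On the consumer side, the received CaptureId is used as a key into the advertisement. The sketch below assumes the consumer has already parsed the advertisement into dictionaries; the structures `MCC_CONTENT` and `ADVERTISED_CAPTURES`, the `role` field, and the function `resolve_capture` are all hypothetical and only mirror the roles described in this example.

```python
# Hypothetical, pre-parsed view of the advertisement in this example.
MCC_CONTENT = {"VC7": ["VC3", "VC5", "VC6"]}
ADVERTISED_CAPTURES = {
    "VC3": {"role": "current speaker"},
    "VC5": {"role": "previous speaker"},
    "VC6": {"role": "oldest speaker"},
}


def resolve_capture(mcc_id: str, received_capture_id: str) -> dict:
    """Return the advertised MC entry for a CaptureId received on an MCC stream."""
    if received_capture_id not in MCC_CONTENT.get(mcc_id, []):
        raise KeyError("%s is not part of %s" % (received_capture_id, mcc_id))
    return ADVERTISED_CAPTURES[received_capture_id]
```

A CaptureId that is not listed in the MCC content is rejected instead of being guessed at, since the consumer would otherwise apply the wrong MC attributes to the stream.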
594 8. Acknowledgements
596 The authors would like to thank Allyn Romanow and Paul Witty for
597 contributing text to this work.
599 9. IANA Considerations
601 This document defines a new extension URI in the RTP Compact Header
602 Extensions subregistry of the Real-Time Transport Protocol (RTP)
603 Parameters registry, according to the following data:
605 Extension URI: urn:ietf:params:rtp-hdrext:CaptureId
607 Description: CLUE CaptureId
609 Contact: roni.even@mail01.huawei.com
611 Reference: RFC XXXX
613 The IANA is requested to register one new RTCP SDES item in the
614 "RTCP SDES Item Types" registry, as follows:
616 Value Abbrev Name Reference
617 TBA CCID CLUE CaptureId [RFCXXXX]
619 10. Security Considerations
621 The security considerations of the RTP specification, the RTP/SAVPF
622 profile, and the various RTP/RTCP extensions and RTP payload formats
623 that form the complete protocol suite described in this memo apply.
624 It is not believed there are any new security considerations
625 resulting from the combination of these various protocol extensions.
627 The Extended Secure RTP Profile for Real-time Transport Control
628 Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides
629 handling of fundamental issues by offering confidentiality, integrity
630 and partial source authentication. A mandatory-to-support media
631 security solution is created by combining this secured RTP profile
632 and DTLS-SRTP keying [RFC5764].
634 RTCP packets convey a Canonical Name (CNAME) identifier that is used
635 to associate RTP packet streams that need to be synchronised across
636 related RTP sessions. Inappropriate choice of CNAME values can be a
637 privacy concern, since long-term persistent CNAME identifiers can be
638 used to track users across multiple calls. This memo mandates
639 generation of short-term persistent RTCP CNAMEs, as specified in
640 [RFC7022], resulting in untraceable CNAME values that
641 alleviate this risk.
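A minimal sketch of generating such a CNAME, assuming the random-value approach of [RFC7022]: a cryptographically random value of at least 96 bits, base64 encoded. The function name is hypothetical, and an implementation would typically generate a fresh value for each call or conference.

```python
import base64
import secrets

def generate_short_term_cname(num_bits=96):
    """Return a random, base64-encoded CNAME with at least 96 bits of entropy."""
    if num_bits < 96:
        raise ValueError("at least 96 random bits are expected")
    num_bytes = (num_bits + 7) // 8
    # secrets draws from the operating system's cryptographic random source.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")

cname = generate_short_term_cname()
```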
643 Some potential denial of service attacks exist if the RTCP reporting
644 interval is configured to an inappropriate value. This could be done
645 by configuring the RTCP bandwidth fraction to an excessively large or
646 small value using the SDP "b=RR:" or "b=RS:" lines [RFC3556], or some
647 similar mechanism, or by choosing an excessively large or small value
648 for the RTP/AVPF minimal receiver report interval (if using SDP, this
649 is the "a=rtcp-fb:... trr-int" parameter) [RFC4585]. The risks are as
650 follows:
652 1. the RTCP bandwidth could be configured to make the regular
653 reporting interval so large that effective congestion control
654 cannot be maintained, potentially leading to denial of service
655 due to congestion caused by the media traffic;
657 2. the RTCP interval could be configured to a very small value,
658 causing endpoints to generate high rate RTCP traffic, potentially
659 leading to denial of service due to the non-congestion controlled
660 RTCP traffic; and
662 3. RTCP parameters could be configured differently for each
663 endpoint, with some of the endpoints using a large reporting
664 interval and some using a smaller interval, leading to denial of
665 service due to premature participant timeouts due to mismatched
666 timeout periods which are based on the reporting interval (this
667 is a particular concern if endpoints use a small but non-zero
668 value for the RTP/AVPF minimal receiver report interval (trr-int)
669 [RFC4585], as discussed in [I-D.ietf-avtcore-rtp-multi-stream]).
671 Premature participant timeout can be avoided by using the fixed (non-
672 reduced) minimum interval when calculating the participant timeout
673 ([I-D.ietf-avtcore-rtp-multi-stream]). To address the other
674 concerns, endpoints SHOULD ignore parameters that configure the RTCP
675 reporting interval to be significantly longer than the default five
676 second interval specified in [RFC3550] (unless the media data rate is
677 so low that the longer reporting interval roughly corresponds to 5%
678 of the media data rate), or that configure the RTCP reporting
679 interval small enough that the RTCP bandwidth would exceed the media
680 bandwidth.
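The recommendation above can be sketched as a simple acceptance check. This is an illustration only: the function name, the factor used for "significantly longer", and the tolerance used for "roughly corresponds to 5%" are assumptions, since this document does not give numeric thresholds.

```python
DEFAULT_INTERVAL_S = 5.0  # default RTCP reporting interval in [RFC3550]
RTCP_FRACTION = 0.05      # RTCP share of the media data rate named above

def should_ignore_rtcp_config(interval_s, rtcp_bw_bps, media_bw_bps,
                              long_factor=2.0):
    """Return True if the configured RTCP parameters should be ignored."""
    # Interval so small that RTCP bandwidth would exceed media bandwidth.
    if rtcp_bw_bps > media_bw_bps:
        return True
    # Interval significantly longer than the default (assumed: 2x or more),
    # unless RTCP still gets roughly 5% of a low media data rate
    # (assumed tolerance: at least half of that share).
    if interval_s > long_factor * DEFAULT_INTERVAL_S:
        roughly_five_percent = rtcp_bw_bps >= 0.5 * RTCP_FRACTION * media_bw_bps
        return not roughly_five_percent
    return False
```

An endpoint that ignores the offered parameters would fall back to its default RTCP configuration.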
682 The guidelines in [RFC6562] apply when using variable bit rate (VBR)
683 audio codecs such as Opus. The use of the encryption of the header
684 extensions are RECOMMENDED, unless there are known reasons, like RTP
685 middleboxes performing voice activity based source selection or third
686 party monitoring that will greatly benefit from the information, and
687 this has been expressed using API or signalling. If further evidence
688 are produced to show that information leakage is significant from
689 audio level indications, then use of encryption needs to be mandated
690 at that time.
692 In multi-party communication scenarios using RTP middleboxes, a lot
693 of trust is placed on these middleboxes to preserve the session's
694 security. The middlebox needs to maintain confidentiality and
695 integrity and to perform source authentication. The middlebox can
696 perform checks that prevent any endpoint participating in a
697 conference from impersonating another. Some additional security
698 considerations regarding multi-party topologies can be found in
699 [RFC7667].
701 11. References
703 11.1. Normative References
705 [I-D.ietf-clue-framework]
706 Duckworth, M., Pepperell, A., and S. Wenger, "Framework
707 for Telepresence Multi-Streams", draft-ietf-clue-
708 framework-25 (work in progress), January 2016.
710 [I-D.ietf-mmusic-sdp-bundle-negotiation]
711 Holmberg, C., Alvestrand, H., and C. Jennings,
712 "Negotiating Media Multiplexing Using the Session
713 Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle-
714 negotiation-24 (work in progress), January 2016.
716 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
717 Requirement Levels", BCP 14, RFC 2119,
718 DOI 10.17487/RFC2119, March 1997.
721 11.2. Informative References
723 [I-D.ietf-avtcore-rtp-multi-stream]
724 Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
725 "Sending Multiple Media Streams in a Single RTP Session",
726 draft-ietf-avtcore-rtp-multi-stream-11 (work in progress),
727 December 2015.
729 [I-D.ietf-clue-signaling]
730 Kyzivat, P., Xiao, L., Groves, C., and R. Hansen, "CLUE
731 Signaling", draft-ietf-clue-signaling-06 (work in
732 progress), August 2015.
734 [I-D.ietf-mmusic-sdp-simulcast]
735 Westerlund, M., Nandakumar, S., and M. Zanaty, "Using
736 Simulcast in SDP and RTP Sessions", draft-ietf-mmusic-sdp-
737 simulcast-03 (work in progress), October 2015.
739 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
740 with Session Description Protocol (SDP)", RFC 3264,
741 DOI 10.17487/RFC3264, June 2002.
744 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
745 Jacobson, "RTP: A Transport Protocol for Real-Time
746 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
747 July 2003.
749 [RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth
750 Modifiers for RTP Control Protocol (RTCP) Bandwidth",
751 RFC 3556, DOI 10.17487/RFC3556, July 2003.
754 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
755 Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
756 July 2006.
758 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
759 Session Initiation Protocol (SIP) Event Package for
760 Conference State", RFC 4575, DOI 10.17487/RFC4575, August
761 2006.
763 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
764 "Extended RTP Profile for Real-time Transport Control
765 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
766 DOI 10.17487/RFC4585, July 2006.
769 [RFC4796] Hautakorpi, J. and G. Camarillo, "The Session Description
770 Protocol (SDP) Content Attribute", RFC 4796,
771 DOI 10.17487/RFC4796, February 2007.
774 [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
775 "Codec Control Messages in the RTP Audio-Visual Profile
776 with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
777 February 2008.
779 [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
780 Real-time Transport Control Protocol (RTCP)-Based Feedback
781 (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
782 2008.
784 [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP
785 Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July
786 2008.
788 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
789 Media Attributes in the Session Description Protocol
790 (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.
793 [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
794 Security (DTLS) Extension to Establish Keys for the Secure
795 Real-time Transport Protocol (SRTP)", RFC 5764,
796 DOI 10.17487/RFC5764, May 2010.
799 [RFC6236] Johansson, I. and K. Jung, "Negotiation of Generic Image
800 Attributes in the Session Description Protocol (SDP)",
801 RFC 6236, DOI 10.17487/RFC6236, May 2011.
804 [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
805 Variable Bit Rate Audio with Secure RTP", RFC 6562,
806 DOI 10.17487/RFC6562, March 2012.
809 [RFC7022] Begen, A., Perkins, C., Wing, D., and E. Rescorla,
810 "Guidelines for Choosing RTP Control Protocol (RTCP)
811 Canonical Names (CNAMEs)", RFC 7022, DOI 10.17487/RFC7022,
812 September 2013.
814 [RFC7205] Romanow, A., Botzko, S., Duckworth, M., and R. Even, Ed.,
815 "Use Cases for Telepresence Multistreams", RFC 7205,
816 DOI 10.17487/RFC7205, April 2014.
819 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
820 DOI 10.17487/RFC7667, November 2015.
823 Authors' Addresses
825 Roni Even
826 Huawei Technologies
827 Tel Aviv
828 Israel
830 Email: roni.even@mail01.huawei.com
832 Jonathan Lennox
833 Vidyo, Inc.
834 433 Hackensack Avenue
835 Seventh Floor
836 Hackensack, NJ 07601
837 US
839 Email: jonathan@vidyo.com