Network Working Group                                      M. Westerlund
Internet-Draft                                                  Ericsson
Intended status: Standards Track                           C. S. Perkins
Expires: April 24, 2014                            University of Glasgow
                                                        October 21, 2013


 Multiplexing Multiple RTP Sessions onto a Single Lower-Layer Transport
           draft-westerlund-avtcore-transport-multiplexing-07

Abstract

   This memo defines a mechanism to allow multiple RTP sessions to be
   multiplexed onto a single lower-layer transport flow (e.g., onto a
   single UDP 5-tuple).  Requirements for multiplexing RTP sessions are
   discussed, along with the trade-off between the different options.  A
   shim-based multiplexing layer is proposed, along with associated
   signalling.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 24, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation
   4.  Requirements
   5.  Design Considerations
     5.1.  Location of Multiplexing Shim Header
     5.2.  ICE and DTLS-SRTP Integration
     5.3.  Signalling Fall Back
   6.  Specification
     6.1.  Shim Layer
     6.2.  Signalling
     6.3.  SRTP Key Management
       6.3.1.  Security Description
       6.3.2.  DTLS-SRTP
       6.3.3.  MIKEY
     6.4.  Examples
       6.4.1.  Secure RTP Packet with Multiplexing Shim
       6.4.2.  Basic RTP Multiplex Negotiation in SDP
       6.4.3.  Advanced RTP Multiplex Negotiation in SDP
   7.  Open Issues
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Appendix A.  Possible Solutions
     A.1.  Header Extension
     A.2.  Multiplexing Shim
     A.3.  Single Session
     A.4.  Use the SRTP MKI field
     A.5.  Use an Octet in the Padding
     A.6.  Redefine the SSRC field
   Appendix B.  Comparison
     B.1.  Support of Multiple RTP Sessions Over Single Transport
     B.2.  Enable Same SSRC Value in Multiple RTP Sessions
       B.2.1.  Avoid SSRC Translation in Gateways/Translation
       B.2.2.  Support Existing Extensions
     B.3.  Ensure SRTP Functions
     B.4.  Don't Redefine Used Bits
     B.5.  Firewall Friendly
     B.6.  Monitoring and Reporting
     B.7.  Usable over Multicast
     B.8.  Incremental Deployment
     B.9.  Summary and Conclusion
   Authors' Addresses

1.  Introduction

   With the ongoing development of the WebRTC conferencing and CLUE
   telepresence standards, there is renewed interest in defining a
   mechanism that allows multiple RTP sessions [RFC3550] to share a
   single lower-layer transport, such as a bi-directional UDP flow.  The
   main problem driving this is the cost of doing NAT/firewall traversal
   for each individual RTP flow.  ICE and other NAT/firewall traversal
   solutions are clearly capable of attempting to open multiple flows.
   However, creating multiple flows carries both an increased risk of
   failure and an increased cost.  The increased cost comes as slightly
   higher delay in establishing the traversal, and as a larger amount
   of consumed NAT/firewall resources.  The latter might be an
   increasing problem in the IPv4 to IPv6 transition period.

   There is ongoing work on specifying how and when one RTP session can
   contain multiple media types
   [I-D.ietf-avtcore-multi-media-rtp-session].  That work addresses
   certain use cases, while this proposal addresses a different set of
   use cases and motivations (discussed further in Section 3).  The
   classical method of having each RTP session run over a specific
   transport flow is still well motivated for a number of use cases,
   especially when flow-based QoS is to be used for some media streams.

   This memo draws up some requirements for consideration on how to
   transport multiple RTP sessions over a single lower-layer transport.
   These requirements have to be weighed carefully, as no known
   solution exists that can fulfil the combined set of requirements
   completely.  A number of possible solutions were considered and
   discussed with respect to their properties.  Based on that, this memo
   defines a multiplexing shim, along with SDP signalling, and examples.
   The other considered proposals and the comparison are available as
   appendices.

2.  Terminology

   Unless specifically noted, all mentions of multiplexing in this memo
   refer to the multiplexing of multiple RTP sessions onto the same
   lower-layer transport.  It is important to make this distinction as
   RTP contains a number of multiplexing points for various purposes,
   such as media formats (Payload Type), media sources (SSRC), and RTP
   sessions.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Motivation

   RTP has always allowed applications to use multiple RTP sessions, by
   using different transport-layer flows for each session [RFC3550].
   The primary motivation was to support differential quality of service
   per session, using flow-level differentiated services mechanisms, but
   it also lets applications separate flows into several RTP sessions to
   better reflect application-level semantics where appropriate.

   More recently, there has been a desire to send multiple types of
   media in a single RTP session.  This uses one RTP session instead of
   several RTP sessions, giving up flow-level quality of service and
   semantic separation of traffic, but reducing the number of
   transport-level flows to ease NAT and firewall traversal.
   Clarifications to the RTP specification to support this can be found
   in [I-D.ietf-avtcore-multi-media-rtp-session].

   There is also a third option that can be useful in some cases.  This
   is to multiplex several RTP sessions onto a single transport-layer
   flow.  The motivations for why this alternative is needed are as
   follows.

   To Ease NAT and Firewall Traversal:  The existence of network address
      translation (NAT/NAPT) and firewalls on almost all Internet access
      has implications for protocols, such as RTP, that were designed to
      use multiple transport-layer flows.
      Any NAT or firewall traversal solution has to ensure that all the
      necessary transport-layer flows are established.  This has three
      impacts:

      1.  Increased delay to perform the transport flow establishment.

      2.  The more transport flows, the more state and the more resource
          consumption in the NATs and firewalls.  When the resource
          consumption in NATs/firewalls reaches their limits,
          unexpected behaviours usually occur, commonly resulting in
          service disruptions.

      3.  More transport flows mean a higher risk that some transport
          flow fails to be established, thus preventing the application
          from communicating.

      Using fewer transport-layer flows, by multiplexing several RTP
      sessions onto a single transport-layer flow, reduces the risk of
      communication failure, improves establishment behaviour, and
      reduces the load on NATs and firewalls.

   To Support Application-level Session-layer Semantics:  Applications
      can use multiple RTP sessions to separate media streams that have
      different uses or purposes.  For example, a group conferencing
      application might use one RTP session to distribute high-quality
      video of the active speaker, switching the source of that video as
      the conversation progresses, coupled with a second RTP session to
      send always-on low-quality views of the inactive speakers, making
      it easier for the MCU to manage the traffic.  Separation of flows
      into different RTP sessions also allows different processing based
      on the media type, such as audio and video, in end-points and
      middleboxes.  This can give middleboxes the knowledge that any
      SSRC within the session is supposed to be processed in a similar
      way, saving them the need to perform differential processing on a
      per-SSRC basis.

      Not all applications need to separate their traffic into different
      semantic classes.
      And, for those that do, it is clearly possible to find other
      multiplexing solutions for many simpler cases, for example based
      on signalled semantics for SSRCs, or on looking at the payload
      type and differences in encoding.  The lack of semantic
      separation in such solutions becomes more critical as the
      application semantics get more complex.  For example, an
      application that has one set of video streams showing session
      participants, and another set that shares an application or
      presentation slides, would likely want to separate those streams
      for reasons such as control, prioritization, QoS, methods for
      robustness, etc.  In those cases, using the RTP session for
      separation of flows with different semantics is a powerful tool
      that can ease the application design, and something that we would
      like to preserve when providing a solution for how to use only a
      single lower-layer transport.

      Multiplexing and the use of different RTP sessions are discussed
      further in [I-D.ietf-avtcore-multiplex-guidelines].

   To Allow Use of Certain RTP Extensions:  Different applications use
      different sets of RTP extensions.  Several of these extensions are
      known to have limitations that prevent them from being used in RTP
      sessions that carry different types of media.  This is discussed
      more in [I-D.ietf-avtcore-multi-media-rtp-session].  The
      extensions that are known to be problematic include parity FEC
      [RFC5109], RTP Retransmission in session mode [RFC4588], and some
      forms of layered coding.  This prevents some applications from
      sending multiple types of media in a single RTP session, forcing
      them to use multiple RTP sessions.  To prevent those applications
      from having to use several transport-layer flows for the different
      RTP sessions, it is desirable to have a way of multiplexing
      several RTP sessions on a single transport-layer flow.

   The centre of the motivation is to ensure that the use of multiple
   RTP sessions is available, and usable, for applications that have no
   need for transport-layer separation of their media streams and want
   to reduce their exposure to any NAT or firewall inconsistencies and
   minimize the resource consumption.  As a benefit, a well designed
   solution will remove the limitations on which existing RTP mechanisms
   or extensions can be used by the application, when compared to
   sending multiple media types in a single RTP session.

4.  Requirements

   This section lists and discusses a number of potential requirements.
   However, it is easy to state requirements that make the set of
   feasible solutions an empty set.  It is thus necessary to consider
   which requirements are essential to fulfil and which can be
   compromised on to arrive at a solution.

   Support Use of Multiple RTP Sessions:  As stated in the RTP
      specification [RFC3550], "The distinguishing feature of an RTP
      session is that each maintains a full, separate space of SSRC
      identifiers [...].  The set of participants included in one RTP
      session consists of those that can receive an SSRC identifier
      transmitted by any one of the participants either in RTP as the
      SSRC or a CSRC [...] or in RTCP".  Accordingly, any mechanism to
      multiplex several RTP sessions onto a single transport-layer flow
      needs to allow each RTP session to use the complete SSRC space,
      independent of any other RTP sessions multiplexed onto that
      transport-layer flow.

      As a corollary of the above, two different RTP sessions that are
      being multiplexed onto the same transport-layer flow need to be
      able to use the same SSRC value.  This is an absolute requirement,
      for two reasons.  First, it avoids mandating SSRC assignment
      rules that are coordinated between the sessions.
      If the RTP sessions multiplexed together need to have unique SSRC
      values, then additional code that works between RTP sessions is
      needed in the implementations, thus raising the bar for
      implementing this solution.  In addition, if a gateway connects
      parts of a system that use this multiplexing with parts that do
      not, the non-multiplexing part also needs to fulfil the
      requirements on how SSRCs are assigned, or else the gateway is
      forced to translate SSRCs.  Translating SSRCs is hard, as it
      requires the gateway to understand the semantics of all current
      and future RTP and RTCP extensions; otherwise a barrier to
      deploying new extensions is created.  Second, there are a few RTP
      extensions that currently rely on being able to use the same SSRC
      in different RTP sessions, including parity FEC [RFC5109], RTP
      Retransmission in session mode [RFC4588], and some forms of
      layered coding.

   Support the Secure RTP (SRTP) Profile:  SRTP [RFC3711] is one of the
      most commonly used security solutions for RTP.  In addition, it is
      the only one defined by the IETF that is integrated into RTP.
      This integration has several aspects that need to be considered
      when designing a solution for multiplexing RTP sessions on the
      same lower-layer transport.

      Determining Crypto Context:  SRTP first of all needs to know which
         session context a received or to-be-sent packet relates to.
         It normally relies on the lower-layer transport to identify
         the session.  It uses the Master Key Indicator (MKI), if
         present, to determine which key set is to be used.  Then the
         SSRC and sequence number are used by most crypto suites,
         including the most common use of AES Counter Mode, to actually
         generate the correct cipher stream.

      Unencrypted Headers:  SRTP has chosen to leave the RTP headers and
         the first two 32-bit words of the first RTCP header
         unencrypted, to allow both header compression and monitoring
         to work in the presence of encryption.  As these fields are in
         clear text, they are used in most crypto suites for SRTP to
         determine how to protect or recover the plain text.

      It is important here to contrast SRTP with other possible
      protection mechanisms.  DTLS, TLS, and IPsec all protect and
      encapsulate entire RTP and RTCP packets.  They don't perform any
      partial operations on the RTP and RTCP packets.  Any change that
      is considered to be part of the RTP and RTCP packet is transparent
      to them, but possibly not to SRTP.  Thus the impact on SRTP
      operations has to be considered when defining a mechanism.

   Support Legacy Implementations of RTP and RTCP:  The core of RTP is
      in use in many systems, and has an extremely large deployed base
      with numerous implementations.  Changing any of the RTP or RTCP
      packet definitions, outside of defined extension points, is highly
      problematic.  First of all, the implementations need to change to
      support the new semantics.  Secondly, there is a long transition
      period during which some session participants support the new
      semantics and some don't.  Combining the two behaviours in the
      same session can force the deployment of costly and less than
      perfect translation devices.

   Support NAT and Firewall Traversal:  It is desirable that current NAT
      devices, firewalls, and application-level gateways will accept
      multiplexed packets from several RTP sessions as they accept
      normal RTP packets.  However, in the authors' opinion we can't let
      the firewall stifle invention and evolution of the protocol.
      It is also necessary to be aware that a change that makes most
      deep-inspecting firewalls consider the packet as not valid
      RTP/RTCP will have a more difficult deployment story.

   Support Monitors and Reporting Tools:  It is desirable that a third
      party monitor can still operate on the multiplexed RTP sessions.
      It is however likely that such monitors will require an update to
      correctly monitor and report on multiplexed RTP sessions.

      Another type of function to consider is packet sniffers and their
      selector filters.  These can be impacted by a change of the
      fields.  An observation is that many such systems are usually
      quite rapidly updated to consider new types of standardized or
      simply common packet formats.

   Support Use of IP Multicast:  It is desirable that a solution can be
      used if RTP and RTCP packets are sent over multicast, both Any
      Source Multicast (ASM) and Source-Specific Multicast (SSM).  The
      reason for this requirement is to allow a system using RTP to use
      the same configuration regardless of whether transport is over
      unicast or multicast.  In addition, multicast can't be claimed to
      have an issue with using multiple ports, as each multicast group
      has a complete port space scoped by address.

   Support Incremental Deployment:  A good solution has the property
      that, in topologies that contain RTP mixers or translators, a
      single session participant can enable multiplexing without having
      any impact on any other session participants.  Thus a node ought
      to be able to take a multiplexed packet and then easily send it
      out with minimal or no modification on another leg of the session,
      where each RTP session is transported over its own lower-layer
      transport.  It also needs to be as easy to do the reverse
      forwarding operation.

5.  Design Considerations

   We propose a solution based around a shim layer, inserted between the
   transport layer headers and the RTP layer headers, to demultiplex
   separate RTP sessions.  The design rationale for using a shim layer
   header, as opposed to other demultiplexing points, is discussed in
   Appendix A.  In the following we discuss design considerations
   regarding placement and use of the shim layer header.

5.1.  Location of Multiplexing Shim Header

   A major question affecting the SHIM is the location of the SHIM
   header providing the identifier of the session the packet relates to.
   This section discusses in detail the impact of the different
   choices.

   Identified aspects to consider are:

   Possibility to Process:  A prefixed shim header, i.e., one placed
      between the transport protocol header and the RTP/RTCP packet
      header, has the advantage that any node on the network that wants
      to include the header in any per-packet processing can reach it.
      Reasons for per-packet processing are:

      a.  Quality of Service classification

      b.  SHIM ingress or egress

      c.  Monitoring

      Many routers or similar devices can only read and process the
      first N bytes of the whole packet, where N is commonly on the
      order of 64-128 bytes.  Any other type of processing means putting
      the packet on the slow path.  Thus a prefixed solution enables
      this processing, while a postfixed solution will most likely
      prevent this type of device from ever processing the header.

   Legacy Processing:  RTP packets contain very few fixed bits and are
      difficult to distinguish using deep packet inspection without
      access to the signalling channel, or without keeping per-flow
      state to correlate changes in the (presumed) RTP headers across
      packets to gain confidence that the flow is of the expected type.
      Firewalls, application-level gateways, and other network entities
      that concern themselves with trying to track RTP flows will need
      to be updated.  This can create a barrier to deployment.  Using a
      postfix shim likely gives the least resistance for initial
      deployment.  However, even with a postfix shim, deployment can be
      hindered when multiple RTP sessions use the same SSRC values,
      since this will appear to give irregular behaviour of the fields
      for what the third party believes is one media stream, when it is
      actually several streams.  The use of a prefixed shim will,
      however, maintain the long-term capabilities of such devices,
      assuming they can be updated to include the SHIM header as part of
      the classification.

   Header Compression:  The different header compression techniques that
      have been developed compress IP/UDP/RTP as a complete
      combination.  If one instead has IP/UDP/SHIM/RTP, then
      compression of the full set might work poorly or not at all.
      Instead, only IP/UDP header compression is likely to be applied.
      Thus a prefixed shim will lose some compression efficiency until
      compression profiles for IP/UDP/SHIM/RTP have been developed,
      implemented and deployed.  A postfixed shim does not have that
      issue, but nor can it ever gain anything from header compression,
      which a prefixed solution could once an updated profile is
      deployed.  A postfixed shim will also have reduced compression
      efficiency when the same SSRC is used in two different RTP
      sessions, as RTP header fields such as the sequence number will
      not behave as expected and will need frequent explicit updates.

   The question of a prefixed or a postfixed shim header comes down to a
   trade-off between long-term usability and deployment issues.  A
   prefixed shim offers a good long-term possibility to adapt any
   network function that needs to take the shim header into account, but
   at the same time any function that tries to analyse packets might
   block the packets and hinder deployment.  A postfixed shim will
   likely have the best short-term deployment possibilities, but in the
   long term this choice will likely prevent many network nodes that
   want to separate the multiplexed RTP sessions from successfully doing
   so.  After discussion in the working group it has been determined
   that a prefixed shim is the preferred solution.

5.2.  ICE and DTLS-SRTP Integration

   When using ICE [RFC5245] or DTLS-SRTP [RFC5764] or both with RTP,
   there is the issue that RTP, STUN [RFC5389] and DTLS-SRTP are
   simultaneously in use over the same lower-layer transport flow, such
   as UDP.  This multiplexing is based on the value of the first byte of
   the lower-layer transport payload, as discussed in Section 5.1.2 of
   DTLS-SRTP [RFC5764].

   When a single RTP session is replaced with multiple RTP sessions
   identified by a SHIM, the resulting packets ought not to be
   misidentified as STUN, DTLS-SRTP, or any other protocol intending to
   take the available free code-points in the range 193-255 (decimal).
   Thus a prefixed SHIM needs the first two bits of its first byte set
   to 10 (binary).  Having the SHIM share the identity of RTP is not an
   issue, as there has to be mutual agreement that the SHIM is used
   instead of RTP.

5.3.  Signalling Fall Back

   Both SIP and WebRTC applications use SDP signalling to describe the
   RTP sessions and transport-layer connections used in a call.  It is
   therefore necessary to consider how to signal multiple RTP sessions
   multiplexed onto a single lower-layer transport within SDP.
   It is also important to consider backwards compatibility with any
   legacy applications that do not understand any proposed SDP
   extension.

   An SDP session description is built up using media ("m=") lines
   describing media flows, with associated connection ("c=") lines
   describing the transport-layer flows.  In the usual offer/answer use
   of SDP the communicating parties use a single c= line to represent
   the IP-layer path, with one m= line per type of media, running each
   type of media on a separate transport-layer port, and hence a
   separate RTP session.  This gives a clean separation of RTP sessions,
   but requires multiple transport-layer flows to be used, complicating
   NAT/firewall traversal.

   The SDP bundle extension [I-D.ietf-mmusic-sdp-bundle-negotiation]
   provides a way to signal that several m= lines are to be bundled
   together into a single RTP session running on a single
   transport-layer port.  This is essentially the opposite semantic to
   the one we want: it combines seemingly disparate RTP sessions into
   one using a single transport-layer flow, while we seek to use a
   single transport-layer flow, but keep the sessions separate.
   Accordingly, we do not re-use the bundle mechanism.

   We do, however, want to allow the case where an application would
   prefer to use separate RTP sessions multiplexed over a single
   lower-layer transport, because that simplifies processing, but fall
   back to using the bundle mechanism if necessary.  Similarly, fall
   back to using separate RTP sessions on separate transport-layer flows
   needs to be supported.

6.  Specification

   This section contains the specification of the RTP session
   multiplexing SHIM, using an explicit session identifier of the
   encapsulated payload.

6.1.  Shim Layer

   This solution is based on a shim layer that is inserted in the stack
   between the RTP and RTCP packets and the transport layer being used
   by the RTP sessions.  Thus the layering is as shown in Figure 1.

   +-------------------------+
   |    RTP / RTCP Packet    |
   +-------------------------+
   |    Session ID Layer     |
   +-------------------------+
   | Transport Layer Header  |
   +-------------------------+
   |  Network Layer Header   |
   +-------------------------+

           Figure 1: Stack view with session ID layer shim

   The above stack is in fact a layered one, as it does allow multiple
   RTP sessions to be multiplexed on top of the Session ID shim layer.
   This enables the example presented in Figure 2, where four sessions,
   S1-S4, are sent over the same transport layer, and where the Session
   ID layer combines and encapsulates them with the session ID on
   transmission and separates and decapsulates them on reception.

   +-------------------------+
   |  S1  |  S2  |  S3 |  S4 |
   +-------------------------+
   |    Session ID Layer     |
   +-------------------------+
   | Transport Layer Header  |
   +-------------------------+
   |  Network Layer Header   |
   +-------------------------+

    Figure 2: Example with four RTP sessions on top of session ID layer

   The Session ID layer encapsulates one RTP or RTCP packet from a given
   RTP session and prefixes a 4-octet Session ID layer shim header to
   the packet.  The Session ID layer shim header is depicted in Figure 3
   and comprises a 2-bit fixed header (10b), 14 reserved bits, and a
   16-bit unsigned integer field with the Session ID (SID) value.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 0|         reserved          |       Session ID (SID)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 3: Session ID layer shim header

   Each RTP session being multiplexed on top of a given transport layer
   is assigned either a single SID or a pair of unique SIDs in the range
   0-65535.  A pair of SIDs is assigned so that RTP sessions that do
   not support "Multiplexing RTP Data and Control Packets on a Single
   Port" [RFC5761] are still able to use a single 5-tuple.  This extra
   functionality is supported because RTP and RTCP multiplexing based on
   the payload type/packet type fields imposes certain restrictions on
   the RTP sessions, and these restrictions might not be acceptable.  As
   this solution does not have these restrictions, performing RTP and
   RTCP multiplexing in this way has benefits.

   Each Session ID value space is scoped by the underlying transport
   protocol.  Common transport protocols like UDP [RFC0768], DCCP
   [RFC4340], TCP [RFC0793], and SCTP [RFC4960] can all be scoped by one
   or more 5-tuples (transport protocol, source address and port,
   destination address and port).  Multiple 5-tuples occur in
   multi-unicast topologies, also called meshed multiparty RTP
   sessions, or if an application needs more than 32768 RTP sessions.
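
   As an informal illustration, the shim header layout and the SID
   assignment described above can be sketched in Python.  This is a
   minimal sketch, not part of the specification.  The names add_shim,
   strip_shim, and SidTable are hypothetical, and two behaviours are
   assumptions rather than rules taken from this memo: the 14 reserved
   bits are sent as zero, and they are ignored on receipt.  One SidTable
   instance is meant to exist per transport flow (5-tuple), matching the
   scoping of the SID value space.

```python
import struct
from typing import Dict, Optional, Tuple

SHIM_LEN = 4        # the shim header is 4 octets
FIXED_BITS = 0b10   # two most significant bits of the first octet


def add_shim(sid: int, packet: bytes) -> bytes:
    """Prefix one RTP or RTCP packet with the Session ID shim header."""
    if not 0 <= sid <= 0xFFFF:
        raise ValueError("SID must fit in 16 bits")
    # Assumption: the 14 reserved bits are sent as zero.
    first_half = FIXED_BITS << 14
    return struct.pack("!HH", first_half, sid) + packet


def strip_shim(datagram: bytes) -> Tuple[int, bytes]:
    """Return (SID, inner packet), or raise ValueError if malformed."""
    if len(datagram) < SHIM_LEN:
        raise ValueError("datagram shorter than the shim header")
    first_half, sid = struct.unpack("!HH", datagram[:SHIM_LEN])
    if first_half >> 14 != FIXED_BITS:
        raise ValueError("first two bits are not 10")
    # Assumption: the reserved bits are ignored on receipt.
    return sid, datagram[SHIM_LEN:]


class SidTable:
    """SID-to-session map for a single transport flow (5-tuple)."""

    def __init__(self) -> None:
        self._by_sid: Dict[int, Tuple[str, str]] = {}

    def assign(self, session: str, rtp_sid: int,
               rtcp_sid: Optional[int] = None) -> None:
        """Assign a single SID, or a pair when RTCP needs its own SID."""
        if rtcp_sid is None:
            wanted = {rtp_sid: "rtp+rtcp"}
        elif rtcp_sid == rtp_sid:
            raise ValueError("a SID pair needs two distinct values")
        else:
            wanted = {rtp_sid: "rtp", rtcp_sid: "rtcp"}
        for sid in wanted:
            if not 0 <= sid <= 0xFFFF:
                raise ValueError("SID must fit in 16 bits")
            if sid in self._by_sid:
                raise ValueError("SID already in use on this flow")
        for sid, kind in wanted.items():
            self._by_sid[sid] = (session, kind)

    def dispatch(self, datagram: bytes) -> Tuple[str, str, bytes]:
        """Strip the shim and look up the session for the packet's SID.

        Raises KeyError if the SID has not been assigned.
        """
        sid, inner = strip_shim(datagram)
        session, kind = self._by_sid[sid]
        return session, kind, inner
```

   In this sketch, a session that multiplexes RTP and RTCP as in
   [RFC5761] is registered with one SID, e.g., assign("audio", 1),
   while a session that needs separate RTP and RTCP handling is
   registered with a pair, e.g., assign("video", 2, 3).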

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1 0|         reserved          |       Session ID (SID)        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
     |V=2|P|X|  CC   |M|     PT      |       sequence number         | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |                           timestamp                           | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |           synchronization source (SSRC) identifier            | |
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
     |            contributing source (CSRC) identifiers             | |
     |                               ....                            | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |                   RTP extension (OPTIONAL)                    | |
   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | |                          payload  ...                         | |
   | |                               +-------------------------------+ |
   | |                               | RTP padding   | RTP pad count | |
   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
   | ~                     SRTP MKI (OPTIONAL)                       ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | :                 authentication tag (RECOMMENDED)              : |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   +- Encrypted Portion*                      Authenticated Portion ---+

         Figure 4: SRTP Packet encapsulated by Session ID Layer

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1 0|         reserved          |       Session ID (SID)        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
     |V=2|P|    RC   |  PT=SR or RR  |            length             | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |                         SSRC of sender                        | |
   +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
   | ~                          sender info                          ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | ~                         report block 1                        ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | ~                         report block 2                        ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | ~                              ...                              ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | |V=2|P|    SC   |  PT=SDES=202  |             length            | |
   | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
   | |                          SSRC/CSRC_1                          | |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | ~                           SDES items                          ~ |
   | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
   | ~                              ...                              ~ |
   +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
   | |E|                         SRTCP index                         | |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
   | ~                     SRTCP MKI (OPTIONAL)                      ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | :                      authentication tag                       : |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   +-- Encrypted Portion                    Authenticated Portion -----+

        Figure 5: SRTCP packet encapsulated by Session ID layer

   When the Session ID layer is present, a receiver processes each
   incoming packet as follows:

   1.  Pick up the packet from the lower layer transport.

   2.  Inspect the SID field value.

   3.  Strip the Session ID layer shim header from the packet.

   4.  Forward the packet to the (S)RTP session context identified by
       the SID value.

6.2. Signalling

   There are several aspects to negotiating, within SDP, the
   multiplexing of multiple RTP sessions onto a single transport layer
   flow.  Firstly, the SDP offer needs to indicate the desire to use
   the shim-based multiplexing scheme and suggest a transport layer
   port for the multiplex.  Then, if the answering party agrees to use
   the shim, the two parties need to agree on the transport layer port
   to use, and assign session ID values for the individual RTP sessions.
   This all needs to be done in a manner that allows graceful fall back
   to separate RTP sessions, or to a single bundled RTP session.

   This section defines how to negotiate the use of the Session ID shim
   layer, using the SDP [RFC4566] offer/answer model [RFC3264].  A new
   SDP grouping semantics, "SHIM", is defined, along with a new media
   type to represent the shim layer.  The grouping semantics allow each
   media description ("m=" line) associated with a "SHIM" group to be
   identified, and associated with the multiplexed transport flow.

   When it is desired to use multiple RTP sessions multiplexed over a
   single lower layer transport flow, the SDP offer will contain one
   "m=" line for each RTP session, plus one additional "m=" line
   representing the transport layer flow to be used for the multiplex.
   The "m=" lines that represent the media flows will be created as if
   the multiplex was not present, with transport layer ports assigned in
   the usual manner.  The "m=" line representing the multiplex will also
   have a transport layer port assigned, and will use the
   "application/rtp-shim" media type running over UDP (i.e., it will be
   signalled as "m=application <port> udp rtp-shim" in the SDP).  All
   the "m=" lines representing the media flows and the multiplexing shim
   will be part of an SDP group, with "SHIM" semantics.

   There MUST be exactly one "m=" line representing an RTP multiplex in
   each "SHIM" group in the SDP offer.  If the offer contains more than
   one "m=" line representing an RTP multiplex in a single "SHIM" group,
   then the answering party MUST reject all the RTP multiplexes in that
   "SHIM" group.  A "SHIM" group that does not include any "m=" line
   representing an RTP multiplex is malformed; the answering party MUST
   reject all "m=" lines in that "SHIM" group.
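
   The "exactly one multiplex per SHIM group" rule above can be
   checked mechanically.  The following Python sketch is illustrative
   only: the helper names and the verdict strings are hypothetical, and
   the parser handles just the "m=", "a=mid:" and "a=group:SHIM" lines
   needed for the check rather than full SDP.

```python
from typing import Dict, List


def parse_media(sdp: str) -> List[Dict[str, str]]:
    """Collect the m= sections of an SDP body with their a=mid values."""
    sections: List[Dict[str, str]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port, proto, fmt = line[2:].split(None, 3)
            sections.append({"media": media, "port": port,
                             "proto": proto, "fmt": fmt, "mid": ""})
        elif line.startswith("a=mid:") and sections:
            sections[-1]["mid"] = line[len("a=mid:"):]
    return sections


def shim_groups(sdp: str) -> List[List[str]]:
    """Return the list of mid values for every a=group:SHIM line."""
    groups: List[List[str]] = []
    for raw in sdp.splitlines():
        tokens = raw.strip().split()
        if tokens and tokens[0] == "a=group:SHIM":
            groups.append(tokens[1:])
    return groups


def is_multiplex_line(section: Dict[str, str]) -> bool:
    """True for an 'm=application <port> udp rtp-shim' section."""
    return (section["media"] == "application"
            and section["proto"].lower() == "udp"
            and section["fmt"] == "rtp-shim")


def check_shim_groups(sdp: str) -> List[str]:
    """Give one verdict per SHIM group, following the rules above."""
    by_mid = {s["mid"]: s for s in parse_media(sdp)}
    verdicts: List[str] = []
    for mids in shim_groups(sdp):
        count = sum(1 for mid in mids
                    if mid in by_mid and is_multiplex_line(by_mid[mid]))
        if count == 1:
            verdicts.append("ok")
        elif count == 0:
            verdicts.append("malformed: reject all m= lines in group")
        else:
            verdicts.append("reject all multiplexes in group")
    return verdicts
```

   An answering party could run such a check on a received offer
   before deciding whether to accept the multiplex.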

   If the answering party does not understand, or does not want to use,
   the RTP multiplexing shim, it will reject the "m=" line for the flow
   representing the multiplex.  This is done by setting the port for
   that "m=" line to zero in the answer.  The endpoints will then fall
   back to using separate RTP sessions for each "m=" line, with separate
   transport layer flows for each on the assigned ports.

   If the answering party chooses to use the multiplexing shim, it will
   return an answer that includes a valid port for the multiplex.  The
   ports for the other media lines in the SHIM group that the answering
   party wants to accept MUST be set to port 9 (the discard port) to
   indicate that the media for those ports is to be sent as part of the
   multiplex (the intuition is that the separate port is discarded, and
   only the multiplex remains).  Ports for some "m=" lines in the SHIM
   group MAY be set to zero to reject some or all of the flows in the
   group.

      (tbd: it is an open issue whether the answering party is allowed
      to accept some "m=" lines from the SHIM group into the multiplex
      while sending others as separate flows on their own ports)

   If the multiplex was accepted, multiplexed media corresponding to the
   "m=" lines whose port was set to 9 in the answer will start to flow.
   This multiplexed media MUST use the shim on the transport layer ports
   corresponding to the "m=" line of the multiplexing shim.  The session
   identifiers used in the shim MUST match the ports that were included
   in the "m=" lines in the offer.  The transport layer ports included
   in those "m=" lines MUST NOT be used for media, and the offering
   party SHOULD issue a follow-up offer closing down the "m=" lines used
   for those ports (i.e., setting the ports in their "m=" line to 9) and
   keeping just the multiplex.
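
   The answer processing rules above can be summarised in a short
   Python sketch.  It is illustrative only: the (mid, port,
   is_multiplex) tuple representation and the result strings are
   hypothetical, and the handling of an accepted "m=" line that keeps
   its own port while the multiplex is accepted reflects the open issue
   noted above rather than a defined behaviour.

```python
from typing import Dict, List, Tuple

DISCARD_PORT = 9

# Hypothetical representation of one "m=" line in a SHIM group:
# (mid, port, True if this is the rtp-shim multiplex line).
Media = Tuple[str, int, bool]


def interpret_answer(offer: List[Media],
                     answer: List[Media]) -> Dict[str, str]:
    """Classify every m= line of the answer according to Section 6.2."""
    offer_ports = {mid: port for mid, port, _ in offer}
    # Port of the multiplex line in the answer; zero means rejected.
    mux_port = next((port for _, port, is_mux in answer if is_mux), 0)
    result: Dict[str, str] = {}
    for mid, port, is_mux in answer:
        if is_mux:
            result[mid] = ("multiplex rejected" if port == 0
                           else f"multiplex on port {port}")
        elif port == 0:
            result[mid] = "rejected"
        elif mux_port != 0 and port == DISCARD_PORT:
            # The SID is the port that the offer used for this m= line.
            result[mid] = f"in multiplex, SID {offer_ports[mid]}"
        else:
            # Fall back, or the open issue of mixed acceptance.
            result[mid] = f"separate flow on port {port}"
    return result
```

   Applied to the offer and the two answers of Section 6.4.2, this
   sketch classifies the audio and video lines as part of the
   multiplex with SIDs 10000 and 10002 in the first case, and as
   separate flows on ports 10000 and 10002 in the fall-back case.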

      (tbd: an alternative would be for the answer to reject all except
      the multiplex stream by setting their ports to zero, but include
      an attribute for each rejected "m=" line to indicate that it is
      to form part of the multiplex.  This can perhaps be expected to
      work better with middleboxes, but is a more significant change to
      offer/answer processing at the endpoints.)

6.3. SRTP Key Management

   Key management for SRTP needs discussion, as this solution causes
   multiple SRTP sessions to exist on the same underlying transport
   flow.  We need to ensure that each key management mechanism is
   still properly associated with the SRTP session context it intends
   to key.  To verify this, we look at the three SRTP key management
   mechanisms that the IETF has specified, one after another.

6.3.1. Security Description

   Session Description Protocol (SDP) Security Descriptions for Media
   Streams [RFC4568], being based on SDP, has no issue with the RTP
   session multiplexing on a lower layer specified here.  The reason is
   that the actual keying is done using a media-level SDP attribute,
   which is therefore already associated with a particular media
   description.  That media description will also have an instance of
   the "a=session-mux-id" attribute carrying the SID value or pair used
   with these particular crypto parameters.

6.3.2. DTLS-SRTP

   Datagram Transport Layer Security (DTLS) Extension to Establish Keys
   for the Secure Real-time Transport Protocol (SRTP) [RFC5764] is a
   keying mechanism that works on the media plane, on the same lower
   layer transport that SRTP/SRTCP will be transported over.

   The most direct solution would be to apply the shim and the SID
   context identifier to DTLS packets as well, that is, to use the same
   SID that is used with RTP and/or RTCP also for the DTLS messages
   intended to key that particular SRTP and/or SRTCP flow.  This of
   course requires independent usage of DTLS-SRTP for each RTP session.
   In addition, it requires changing the layering for DTLS-SRTP as well
   as for RTP.  Thus this approach gains nothing in regard to key
   management when using the shim, and has some costs.

   Instead we propose that a change to the DTLS-SRTP key derivation is
   introduced.  By including the Session ID value in the derivation of
   the keying material, a single DTLS-SRTP key-management operation
   could provide keys and parameters for all the RTP sessions in the
   same transport flow.  This significantly reduces the keying cost,
   especially in regard to network communication, delay, and
   vulnerability to packet loss.

   Details to be written up.

6.3.3. MIKEY

   MIKEY: Multimedia Internet KEYing [RFC3830] is a key management
   protocol that has several transports.  In some cases it is used
   directly on a transport protocol such as UDP, but there is also a
   specification for how MIKEY is used with SDP, "Key Management
   Extensions for Session Description Protocol (SDP) and Real Time
   Streaming Protocol (RTSP)" [RFC4567].

   Let us start with the latter, i.e., the SDP transport, which shares
   with Security Descriptions the property that it can be associated
   with a particular media description in an SDP.  As long as one
   avoids using the session-level attribute, one can be certain to
   correctly associate the key exchange with a given SRTP/SRTCP
   context.

   It does appear that MIKEY directly over a lower layer transport
   protocol will have similar issues to DTLS.

6.4. Examples

6.4.1. Secure RTP Packet with Multiplexing Shim

   The figure below contains an example Secure RTP packet with the RTP
   multiplexing shim header, encapsulated in a UDP packet.  The RTP
   multiplexing shim immediately follows the UDP header, and is followed
   by the encapsulated Secure RTP packet.
   The Secure RTP authentication tag protects the RTP packet only; it
   does not authenticate the RTP multiplexing shim or the UDP header.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |          Source Port          |       Destination Port        | U
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D
     |            Length             |           Checksum            | P
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1 0|         reserved          |       Session ID (SID)        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
     |V=2|P|X|  CC   |M|     PT      |       sequence number         | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |                           timestamp                           | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |           synchronization source (SSRC) identifier            | |
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
     |            contributing source (CSRC) identifiers             | |
     |                               ....                            | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     |                   RTP extension (OPTIONAL)                    | |
   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | |                          payload  ...                         | |
   | |                               +-------------------------------+ |
   | |                               | RTP padding   | RTP pad count | |
   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
   | ~                     SRTP MKI (OPTIONAL)                       ~ |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | :                 authentication tag (RECOMMENDED)              : |
   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   +- Encrypted Portion*                      Authenticated Portion ---+

              SRTP Packet Encapsulated by Session ID Layer

6.4.2. Basic RTP Multiplex Negotiation in SDP

   This section contains SDP offer/answer examples.  In the SDP offer
   below, one audio stream and one video stream are offered.  The audio
   uses session identifier 10000, and the video uses session identifier
   10002.
   If the answer were to reject the "m=application...rtp-shim" line,
   then separate RTP sessions would be set up for the audio and video
   on ports 10000 and 10002 respectively.

      v=0
      o=alice 2890844526 2890844526 IN IP4 atlanta.example.com
      s=
      c=IN IP4 atlanta.example.com
      t=0 0
      a=group:SHIM foo bar baz
      m=audio 10000 RTP/AVP 0 8 97
      b=AS:200
      a=mid:foo
      a=rtpmap:0 PCMU/8000
      a=rtpmap:8 PCMA/8000
      a=rtpmap:97 iLBC/8000
      m=video 10002 RTP/AVP 31 32
      b=AS:1000
      a=mid:bar
      a=rtpmap:31 H261/90000
      a=rtpmap:32 MPV/90000
      m=application 10004 udp rtp-shim
      a=mid:baz

   The SDP answer from an end-point that supports the RTP multiplexing
   shim follows.  Note that the ports on the audio and video lines are
   set to 9, to indicate that these flows are included in the multiplex.
   The port of the "m=" line corresponding to the multiplex is set to
   the transport port used for the multiplex.

      v=0
      o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
      s=
      c=IN IP4 biloxi.example.com
      t=0 0
      a=group:SHIM foo bar baz
      m=audio 9 RTP/AVP 0
      b=AS:200
      a=mid:foo
      a=rtpmap:0 PCMU/8000
      m=video 9 RTP/AVP 32
      b=AS:1000
      a=mid:bar
      a=rtpmap:32 MPV/90000
      m=application 10004 udp rtp-shim
      a=mid:baz

   The SDP answer from an end-point that does not support the shim
   follows.  The ports for the audio and video lines are kept, and the
   port is set to 0 in the "m=" line corresponding to the multiplex.

      v=0
      o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
      s=
      c=IN IP4 biloxi.example.com
      t=0 0
      a=group:SHIM foo bar baz
      m=audio 10000 RTP/AVP 0
      b=AS:200
      a=mid:foo
      a=rtpmap:0 PCMU/8000
      m=video 10002 RTP/AVP 32
      b=AS:1000
      a=mid:bar
      a=rtpmap:32 MPV/90000
      m=application 0 udp rtp-shim
      a=mid:baz

6.4.3. Advanced RTP Multiplex Negotiation in SDP

   (tbd: add more examples)

7. Open Issues

   This work is still at a relatively early phase.  This section
   contains a list of open issues where the author desires some input.

   1.  Section 6.2 discusses which parameters need to be configured.
       The scope of these rules, and whether they make sense, needs
       additional discussion.

   2.  Can we provide better control, so that an application that does
       not want to fall back to a single RTP session when the
       multiplexing shim is not supported but BUNDLE is supported ends
       up with a better alternative?

   3.  The details of how to do key derivation, preferably in a way
       that can be reused by multiple key-management solutions such as
       MIKEY and DTLS-SRTP, remain to be specified.

   4.  The signalling solution will be revisited when the BUNDLE
       solution discussion has yielded some result.

8. IANA Considerations

   (tbd: register the application/rtp-shim media type)

   (tbd: register the "SHIM" semantics for the SDP grouping framework)

9. Security Considerations

   The security properties of the Session ID layer depend on what
   mechanism is used to protect the RTP and RTCP packets of a given RTP
   session.  If IPsec or a transport layer security solution such as
   DTLS or TLS is being used, then both the encapsulated RTP/RTCP
   packets and the Session ID layer will be protected by that security
   mechanism, potentially providing confidentiality, integrity, and
   source authentication.  If SRTP is used, the Session ID layer will
   not be directly protected by SRTP.  However, it will be implicitly
   integrity protected (assuming the RTP/RTCP packet is integrity
   protected), as the only function of the field is to identify the
   session context.  Thus any modification of the SID field will cause
   an attempt to retrieve the wrong SRTP crypto context.  If that
   retrieval fails, the packet will be discarded anyway.
   If it is successful, the context will not lead to successful
   verification of the packet.

10. Acknowledgements

   This memo is based on input from various people, especially in the
   context of the RTCWEB discussion of how to use only a single lower
   layer transport.  The RTP and RTCP packet figures are borrowed from
   [RFC3711].  The SDP example is extended from the one present in
   [I-D.ietf-mmusic-sdp-bundle-negotiation].  Eric Rescorla contributed
   the basic idea of optimizing the DTLS-SRTP key management by
   modifying the key derivation process.

   The proposal in Appendix A.5 was originally suggested by Colin
   Perkins.  The idea in Appendix A.6 is from an Internet-Draft
   [I-D.rosenberg-rtcweb-rtpmux] written by Jonathan Rosenberg et al.
   The proposal in Appendix A.3 is a result of discussion by a group of
   people at IETF meeting #81 in Quebec.

11. References

11.1. Normative References

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Multiplexing Negotiation Using Session Description
              Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-05 (work in
              progress), October 2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5234]  Crocker, D. and P.
              Overell, "Augmented BNF for Syntax Specifications: ABNF",
              STD 68, RFC 5234, January 2008.

11.2. Informational References

   [I-D.ietf-avtcore-multi-media-rtp-session]
              Westerlund, M., Perkins, C., and J. Lennox, "Sending
              Multiple Types of Media in a Single RTP Session",
              draft-ietf-avtcore-multi-media-rtp-session-03 (work in
              progress), July 2013.

   [I-D.ietf-avtcore-multiplex-guidelines]
              Westerlund, M., Perkins, C., and H. Alvestrand,
              "Guidelines for using the Multiplexing Features of RTP to
              Support Multiple Media Streams",
              draft-ietf-avtcore-multiplex-guidelines-01 (work in
              progress), July 2013.

   [I-D.lennox-rtcweb-rtp-media-type-mux]
              Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media
              Types In a Single Real-Time Transport Protocol (RTP)
              Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work
              in progress), October 2011.

   [I-D.rosenberg-rtcweb-rtpmux]
              Rosenberg, J., Jennings, C., Peterson, J., Kaufman, M.,
              Rescorla, E., and T. Terriberry, "Multiplexing of
              Real-Time Transport Protocol (RTP) Traffic for Browser
              based Real-Time Communications (RTC)",
              draft-rosenberg-rtcweb-rtpmux-00 (work in progress),
              July 2011.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              August 1980.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              August 2004.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

   [RFC4567]  Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
              Carrara, "Key Management Extensions for Session
              Description Protocol (SDP) and Real Time Streaming
              Protocol (RTSP)", RFC 4567, July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
              October 2008.

   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
              Real-Time Transport Control Protocol (RTCP): Opportunities
              and Consequences", RFC 5506, April 2009.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761, April 2010.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the Secure
              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

Appendix A. Possible Solutions

   This section documents the solutions that were explored before a
   shim-based one was selected, and discusses their feasibility.

A.1. Header Extension

   One proposal is to define an RTP header extension [RFC5285] that
   explicitly enumerates the session identifier in each packet.
   This proposal has some merits regarding RTP, since it uses an
   existing extension mechanism; it explicitly enumerates the session,
   allowing third parties to associate the packet with a given RTP
   session; and it works with SRTP as currently defined, since a header
   extension is by default not encrypted and is thus readable by the
   receiving stack without needing to guess which session it belongs to
   and attempt to decrypt it.  This approach does, however, conflict
   with the requirement from [RFC5285] that "header extensions using
   this specification MUST only be used for data that can be safely
   ignored by the recipient", since correct processing of the received
   packet depends on using the header extension to demultiplex it to
   the correct RTP session.

   Using a header extension also results in the session ID being in the
   integrity-protected part of the packet.  Thus a translator between
   the multiplexed and non-multiplexed domains has the following
   options:

   1.  to be part of the security context to verify the field

   2.  to be part of the security context to verify the field and remove
       it before forwarding the packet

   3.  to be outside of the security context and leave the header
       extension in the packet.  However, that requires successful
       negotiation of the header extension, but not of the
       functionality, with the receiving end-points.

   The biggest existing hurdle for this solution is that there exists
   no header extension field in RTCP packets.  This requires defining a
   solution for RTCP that allows carrying the explicit indicator,
   preferably in a position that isn't encrypted by SRTCP.  However, the
   current SRTCP definition does not offer such a position in the
   packet.

   Modifying the RR or SR packets is possible using profile-specific
   extensions.
   However, that has deployment issues, and in addition any
   information placed there would end up in the encrypted part.

   Another alternative could be to define another RTCP packet type that
   only contains the common header, using the 5 bits in the first byte
   of the common header to carry a session ID.  That would allow SRTCP
   to work correctly as long as it accepts this new packet type being
   the first in the packet.  Allowing a non-SR/RR packet as the first
   packet in a compound RTCP packet is also needed if an implementation
   is to support Reduced-Size RTCP packets [RFC5506].  The remaining
   downside is that all stack implementations supporting multiplexing
   would need to modify their RTCP compound packet rules to include
   this packet type first.  Thus a translator box between supporting
   nodes and non-supporting nodes needs to be in the crypto context.

   This solution's per-packet overhead is expected to be 64 bits for
   RTCP.  For RTP it is 64 bits if no header extension was otherwise
   used, and an additional 16 bits (short header), or 24 bits plus (if
   needed) padding to the next 32-bit boundary, if other header
   extensions are used.

A.2. Multiplexing Shim

   This proposal is to prefix or postfix all RTP and RTCP packets with a
   session ID field.  This field would be outside of the normal RTP and
   RTCP packets, thus having no impact on the RTP and RTCP packets and
   their processing.  An additional demultiplexing step would be added
   prior to RTP stack processing to determine in which RTP session
   context the packet is to be included.  This also has no impact on
   SRTP/SRTCP, as the shim layer would be outside of its protection
   context.
   The shim layer's session ID is however implicitly integrity
   protected, as any error in the field will result in the packet being
   placed in the wrong or a non-existing context, thus resulting in an
   integrity failure if processed by SRTP/SRTCP.

   This proposal is quite simple to implement in any gateway or
   translating device that goes from a multiplexed to a non-multiplexed
   domain or vice versa, as only an additional field needs to be added
   to or removed from the packet.

   The main downside of this proposal is that it is very likely to
   trigger a firewall response from any deep packet inspection device.
   If the field is prefixed, the RTP fields will not match the
   heuristics (unless the shim is designed to look like an RTP header,
   in which case the payload length is unlikely to match the expected
   value), which is likely to prevent classification of the packet as
   an RTP packet.  If it is postfixed, the packet is likely to be
   classified as an RTP packet, but might not validate correctly if the
   content validation expects the payload length to match certain
   values.  It is expected that a postfixed shim will be less
   problematic than a prefixed shim in this regard, but we are lacking
   hard data on this.

   This solution's per-packet overhead depends on the size of the
   session ID field: 1 byte for a minimal field, or 4 octets for the
   shim header defined in Section 6.1.

A.3. Single Session

   Given the difficulty of multiplexing several RTP sessions onto a
   single lower-layer transport, it's tempting to send multiple media
   streams in a single RTP session.  Doing this avoids the need to
   demultiplex several sessions on a single transport, but at the cost
   of losing the RTP session as a separator for different types of
   streams.  Lacking different RTP sessions to demultiplex incoming
   packets, a receiver will have to dig deeper into the packet before
   determining what to do with it.  Care has to be taken in that
   inspection.  For example, it is important to ensure that each real
   media source uses its own SSRC in the session and that this SSRC
   doesn't change media type.

   The loss of the RTP session as a separator for different usages or
   purposes would be a minor issue if the only difference between the
   RTP sessions is the media type.  In this case, the application could
   use the Payload Type field to identify the media type.  The loss of
   the RTP session functionality is however severe if the application
   uses the RTP session for separating different treatments, contexts,
   etc.  In that case, additional signalling is needed to bind the
   different sources to groups that can help make the necessary
   distinctions.

   However, the loss of the RTP session as a separator is not the only
   issue with this approach.  The RTP Multiplexing Architecture
   [I-D.ietf-avtcore-multiplex-guidelines] discusses a number of issues
   in Section 6.7.  These include RTCP bandwidth differences,
   limitations in the number of payload types, media-aware RTP mixers,
   and interactions with legacy end-points.

   The interaction with legacy end-points deserves additional
   attention.  In multi-party situations using central nodes, there are
   difficulties in having a legacy implementation using multiple RTP
   sessions interwork, across the central node, with an end-point that
   has only a single RTP session.  The main reason is that the
   end-point using a single session with multiple media types has only
   one SSRC space, while the other end-points have multiple spaces.
   Thus translation might have to occur because several RTP sessions
   use the same SSRC value.  This brings limitations, processing
   overhead, and the possibility of becoming a deployment obstacle for
   new RTP/RTCP extensions.

   This approach has been proposed in the RTCWEB context in
   [I-D.lennox-rtcweb-rtp-media-type-mux] and
   [I-D.ietf-mmusic-sdp-bundle-negotiation].  These drafts describe how
   to signal multiple media streams multiplexed into a single RTP
   session, and address some of the issues raised here and in
   Section 6.7 of the RTP Multiplexing Architecture
   [I-D.ietf-avtcore-multiplex-guidelines] draft.

   This method has several limitations that restrict its usefulness as
   a solution for providing multiple RTP sessions on the same lower
   layer transport.  However, we acknowledge that there are some uses
   for which this method can be sufficient and which can accept the
   method's limitations and downsides.  The RTCWEB WG has a working
   assumption to support this method.  For more details of this method,
   see the relevant drafts under development.  We include this method
   in the comparison to provide a more complete picture of its pros and
   cons.

   This solution has no per-packet overhead.  The signalling overhead
   is a different question.

A.4. Use the SRTP MKI field

   This proposal is to overload the MKI SRTP/SRTCP identifier to not
   only identify a particular crypto context, but also identify the
   actual RTP session.  This clearly is a misuse of the MKI field;
   however, it appears to have few negative implications.  SRTP
   already supports handling of multiple crypto contexts.

   This proposal has two major downsides.  The first is that it
   requires using SRTP/SRTCP to multiplex multiple sessions on a single
   lower layer transport.  The second is that the session ID parameter
   needs to be put into the various key-management schemes, which need
   to understand that the reason to establish multiple crypto contexts
   is that they are connected to various RTP sessions.
Considering that SRTP has at least three keying mechanisms in use, DTLS-SRTP [RFC5764], Security Descriptions [RFC4568], and MIKEY [RFC3830], this is not an insignificant amount of work.

This solution has a 32-bit per-packet overhead, but only if the MKI was not already used.

A.5. Use an Octet in the Padding

The basis of this proposal is to have each RTP packet, and the last RTCP packet in a compound packet (as mandated by RFC 3550), include at least 2 bytes of padding: one byte for the padding count (the last byte) and, just before the padding count, one byte containing the session ID.

This proposal carries the session ID in bytes that have no defined value and are intended to be ignored by the receiver. From that perspective it only causes packet expansion that is supported and handled by all existing equipment. If an implementation fails to understand that it needs to interpret this padding byte to learn the session ID, it will see a mostly coherent RTP session except where SSRCs overlap or where the payload types overlap. However, reporting on the individual sources or forwarding the RTCP RR are not completely without merit.

There is one downside of this proposal and that has to do with SRTP. To be able to determine the crypto context, it is necessary to access the encrypted payload of the packet. Thus, the only mechanism available for a receiver to solve this issue is to try the existing crypto contexts for any session on the same lower layer transport and then use the one where the packet decrypts and verifies correctly. Thus for transport flows with many crypto contexts, an attacker could simply generate packets that don't validate to force the receiver to try all the crypto contexts it has rather than immediately discard the packet as not matching a context.
A receiver can mitigate this somewhat by using heuristics based on the RTP header fields to determine which context applies for a received packet, but this is not a complete solution.

This solution has a 16-bit per-packet overhead.

A.6. Redefine the SSRC field

The Rosenberg et al. Internet-Draft "Multiplexing of Real-Time Transport Protocol (RTP) Traffic for Browser based Real-Time Communications (RTC)" [I-D.rosenberg-rtcweb-rtpmux] proposed to redefine the SSRC field. This has the advantage of no packet expansion. It also looks like regular RTP. However, it has a number of implications. First of all, it prevents any RTP functionality that requires the same SSRC in multiple RTP sessions.

Secondly, its interoperability with end-points using multiple RTP sessions is problematic. Such interoperability requires an SSRC translator function in the gateway node to ensure that the SSRCs fulfil the semantic rules of the different domains. That translator is far from easy to build, as it needs to understand the semantics of all RTP and RTCP extensions that include an SSRC/CSRC. This is because it is necessary to know when a particular matching 32-bit pattern is an SSRC field and when it is just a combination of other fields that creates the same 32-bit pattern. Thus there is a possibility that such a translator becomes an obstacle to deploying future RTP/RTCP extensions. In addition, the translator has significant overhead when SRTP is in use, as it needs to verify that the packet is authentic, decrypt it, translate the SSRC, encrypt it again and finally generate new authentication tags. The translator also has to be part of the security context.

This solution has no per-packet overhead.

Appendix B. Comparison

This section compares the above potential solutions with the requirements.
Motivations are provided in addition to a high-level metric of successfully, partially or failing to meet each requirement. At the end, a summary table (Figure 6) of the high-level values is provided.

B.1. Support of Multiple RTP Sessions Over Single Transport

This one is easy to determine. Only the single session proposal fails this requirement as it is not at all designed to meet it. The rest fully support this requirement.

B.2. Enable Same SSRC Value in Multiple RTP Sessions

Based on the discussion in Section 4 two sub-requirements have been derived.

B.2.1. Avoid SSRC Translation in Gateways/Translation

This sub-requirement is derived from the desire to avoid having gateways or translators perform full SSRC translation, in order to minimize complexity, to avoid requiring gateways to be in the security context, and to avoid hindering long-term evolution. Two of the proposals have issues with this, due to their lack of support for multiple 32-bit SSRC spaces and the resulting impossibility of having the same SSRC value in multiple RTP sessions. The proposals that have these properties and thus are marked as failing are the Single Session and Redefine the SSRC field. The other proposals are all successful in meeting this requirement.

B.2.2. Support Existing Extensions

The second sub-requirement is how well the proposals support using the existing RTP mechanisms. Here both Single Session and Redefine the SSRC field will have clear issues as they cannot support the same full 32-bit SSRC value in two different RTP sessions. This is clearly an issue for the XOR based FEC. RTP retransmission and scalable encoding are minor issues as there exist alternatives to those mechanisms that work with the structure of these two proposals. Thus we give them a fail.
The Header Extension gets a partial due to unclear interaction between putting in a header extension and these mechanisms.

B.3. Ensure SRTP Functions

This requirement is about ensuring both secure and efficient usage of SRTP. The Octet in Padding field proposal gets a fail as the receiving end-point cannot determine the intended RTP session prior to decryption of the padding field. Thus a catch-22 arises which can only be resolved by trying all session contexts and seeing which one decrypts. This causes a security vulnerability as an attacker can inject a packet which does not match any of the session contexts. The receiver will then attempt decryption and authentication of it using all its session contexts, increasing the amount of wasted resources by a factor equal to the number of multiplexed sessions. Thus this proposal gets a fail.

The proposal of Overloading the SRTP MKI field as session identifier gets a partial due to the fact that it cannot use SRTP's key-management mechanism out of the box. It forces the key-management mechanism and the SRTP implementations to maintain the MKI-to-RTP session bindings to maintain secure and correct function.

The Redefine the SSRC field gets a partial due to its need to modify the key-management mechanisms to correctly identify the partial SSRC space the parameters apply to. Similarly, the SRTP implementation also needs to be updated to correctly support this security context differentiation.

The header extension based solution gets a less severe partial than Redefine the SSRC and the MKI. It will however have an issue when using a gateway to a domain that does not multiplex multiple RTP sessions over the same transport.
Then the gateway will need to be in the security context to be able to add or remove the header extension, as it is in the part of the packet that is integrity protected by SRTP.

The remaining two proposals do not affect SRTP mechanisms and thus successfully meet this requirement.

B.4. Don't Redefine Used Bits

This requirement is that RTP and RTCP header fields having a given definition ought not be changed, as that can cause interoperability problems between modified and non-modified implementations. This becomes especially problematic in RTP sessions used for multi-party sessions.

Redefine the SSRC field gets a big fail on this as it redefines the SSRC field, a core field in RTP. It has been identified that such a change will have issues: if a modified end-point gets connected to a non-modified end-point that randomly assigns the SSRC, as RFC 3550 expects, those SSRCs will be distributed over different RTP sessions at the modified end-point. Also other functions using the SSRC field, not understanding the additional semantics of the SSRC field, are likely to have issues.

Using the SRTP MKI field to identify a session is overloading that field with double semantics. This likely has minimal negative impact in RTP since it ought to be possible to have the SRTP stack use the MKI field to look up both the security context and which output RTP session the processed packet belongs to. However, this redefinition clearly creates issues with the key-management scheme. That will have to be modified to handle both this change and deal with the interoperability issues when negotiating its usage. This gets a full fail because it makes the problem someone else's, namely the RTP implementers.
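
As a rough illustration of the double lookup described above, the sketch below keys a single table on the MKI and returns both a crypto context and an RTP session identifier. All names (`MkiBinding`, `MkiTable`, the field layout) are assumptions made for this example; they are not part of SRTP or of any key-management scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MkiBinding:
    crypto_context: str  # stand-in for a real SRTP crypto context object
    rtp_session: int     # hypothetical identifier of the output RTP session


class MkiTable:
    """One table answers both questions: which keys, and which RTP session."""

    def __init__(self):
        self._bindings = {}

    def bind(self, mki: bytes, crypto_context: str, rtp_session: int) -> None:
        if mki in self._bindings:
            raise ValueError("MKI already bound on this transport")
        self._bindings[mki] = MkiBinding(crypto_context, rtp_session)

    def lookup(self, mki: bytes):
        # Unknown MKIs return None, so the caller can discard the packet
        # without trying any crypto context.
        return self._bindings.get(mki)


table = MkiTable()
table.bind(b"\x00\x00\x00\x01", "ctx-audio", 1)
table.bind(b"\x00\x00\x00\x02", "ctx-video", 2)
hit = table.lookup(b"\x00\x00\x00\x02")
miss = table.lookup(b"\x00\x00\x00\x09")
```

The lookup itself is simple. Establishing the bindings is the part that would fall to the key-management scheme, which is what the text identifies as needing modification.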

Defining an Octet in the Padding field redefines a field whose definition is to have zero value and which is expected to be ignored by the receiver according to the original semantics. Thus this is one of the more benign modifications one can do; however, it can still cause issues in implementations that unnecessarily check the field values, or in firewalls. This is judged to be partially meeting the requirement.

The Header Extension proposal does not in fact redefine any currently used bits in RTP. The header extension would be a correctly identified extension with its own definition. However, it does redefine a rule on what header extensions are for. The RTCP solution, however, would have a more severe impact as it would need to redefine the standard meaning of an RTCP packet header in addition to the default compound packet rules. Due to these issues the proposal fails to meet this requirement.

The multiplexing shim and the single session both successfully meet this requirement.

B.5. Firewall Friendly

This requirement is clearly difficult to judge as firewalls differ widely in implementation, in the scope of what they investigate in packets, and in the policies they are set with. A reasonable goal is to maximize the likelihood that rules and policies intended to let RTP media streams pass will also let these streams through when RTP sessions are multiplexed over a single transport. The analysis below shows that no solution is truly firewall friendly and all are judged as partially meeting this goal. However, the reasons why it is believed that a firewall might react to the streams are quite different.

The Single Session and Redefine the SSRC field are likely the least suspect solutions from a firewall perspective.
However, as their transport flows contain multiple SSRCs with payloads that likely indicate multiple different media types, they are still likely to make a picky firewall block the transport. This is especially true for firewalls that take signalling messages into account, where they will expect a particular media type in a given context. A non-upgraded firewall might in fact produce two different contexts with overlapping transport parameters where both rules will receive media streams of the other media type that are outside of the allowed rule. However, to be clear, if these proposals don't get through, none of the others will either as they all will have this behaviour.

The header extension proposal is potentially problematic for two reasons. The first reason, which other proposals also share, is that the same SSRC value can exist in two RTP sessions over the same underlying flow. Anyone tracking the sequence number and timestamp will react badly as the second media stream with the same SSRC causes constant jumps back and forth in these fields compared to the first stream, if packets are transmitted simultaneously for both SSRCs. This issue can likely only be solved by having the firewalls that like to track flows also use the session identifier to create context. This is possible as the header extension will be in the clear and at the front. The second issue is that the header extension itself can get the firewall to react, especially very picky ones that expect packets with certain media types to have certain packet lengths. They are not compatible with a header extension.

The Multiplexing Shim shares the issue with multiple flows for the same SSRC. Firewalls and deep packet inspection bring the shim placement into question.
If it is a prefixed shim, it prevents the packets from looking like regular IP/UDP/RTP packets and from being correctly classified in firewalls and DPI engines. However, if one puts it last, it is unlikely that any firewall or DPI will ever be able to take the session context into account as it is at the end of the packet. This is because many line rate processing devices only take a certain amount of the headers into account.

The SRTP MKI field is likely the solution that has the fewest firewall and DPI issues, after the single RTP session. There is no additional suspect field. The only difference from a single RTP session in the transport flow is the fact that multiple MKIs are guaranteed to be used. However, that can also occur in a single RTP session usage. Thus the only issues are the ones shared with single session and the one that several RTP media streams can use the same SSRC.

The octet in the padding field has, in addition to the issues the SRTP MKI field has, the single issue that it redefines something that is supposed to be zero into a value, thus potentially causing a deeply inspecting firewall to clamp down on the flow for fear of a covert channel or non-compliance.

B.6. Monitoring and Reporting

The monitoring and reporting requirement considers several aspects: first, how useful the monitoring is that one can get from an existing legacy monitor; secondly, any issues in upgrading monitors to handle the selected solution; and thirdly, concerns for packet selector filters and packet sniffers.

In general one can expect the proposals that have only a single SSRC space to work better with legacy. Thus both Single Session and Redefine SSRC space can most likely gather and report data on media flows. The only potential issue is that due to the different media types and clock rates, some failure can occur.
In particular a third party monitor can be targeted to a specific media type, like monitoring VoIP. That monitor will have problems processing any video packets correctly and will generate the VoIP specific metrics for any video sending SSRC. In general, no legacy solution for monitoring will be able to correctly create the sub-contexts that each RTP session has in the solutions without an update to handle the new semantics. Also, when it comes to packet filtering and selector filters, fine-grained control can only be accomplished by implementing the new semantics. Therefore only the Single Session meets this requirement fully.

Redefine the SSRC field is close to fully meeting the requirement; however, because there exists a session structure that is hidden to anyone who is not upgraded to understand the semantics, it only gets a partial.

The other proposals all can have multiple RTP sessions using the same SSRC. This will create significant issues for any legacy third party monitor. Only an updated monitor, or for that matter packet selector, can pick out the individual media streams and their associated RTCP traffic. Thus all these proposals fail to meet the requirement.

B.7. Usable over Multicast

As discussed earlier, the goal of having the option usable also over multicast is to remove the need to produce different media streams for transport over unicast and multicast. All of the proposals successfully meet the requirement.

B.8. Incremental Deployment

The possibility to incrementally deploy the multiplexing of multiple RTP sessions over a single transport, especially in the context of multi-party sessions, is a great benefit for any of the proposals. With it, not all end-point implementations need to be upgraded before one starts enabling it in the central node and any signalling.

Considering a centralized multi-party application where some participants are using multiple transport flows and you want to enable one particular participant to use the single transport to the central node, one criterion stands out: the possibility to have one RTP session per transport in one leg, and in the next multiplex them together with minimal complexity and packet changes. Here there are significant differences.

The Multiplexing Shim has the least overhead for this, as the central node or gateway between deployments only needs to either add or remove the shim identifier and then forward the packet over the corresponding transport, either a joint one on the single transport side, or the individual one on the multiple transport side.

The SRTP MKI field proposal is almost as good, as the only main difference is the need to coordinate the used MKIs on the non-multiplexed legs so that there is no overlap between the RTP sessions. And if there is, the MKI can be translated in the gateway as SRTP has no integrity protection over the MKI. Thus both multiplexing shim and SRTP MKI field successfully meet this requirement.

The Header Extension supports multiple full 32-bit SSRC spaces and can thus handle all the RTP sessions without need for any SSRC translation; however, this proposal does run into the problem that the gateway needs to be in the security context to be able to add or remove the header extension when SRTP is used. In addition to the security implications of that, there is a complexity overhead due to the need to redo the authentication tags on all RTP/RTCP packets. Thus it gets a partial.

The Octet in the Padding field shares issues with the header extension but has even higher complexity for this. The reason is that the padding field is also encrypted.
Thus adding or removing it (although removing it might be unnecessary) forces the end-point to encrypt at least that byte also, and for ciphers that are not stream ciphers, the whole packet needs to be re-encrypted. Thus this proposal gets a very weak partial.

The Single Session and Redefine the SSRC field do not allow several vanilla RTP sessions to be connected to these proposals. The reason is the single 32-bit SSRC space they have. Single Session only has one session and the Redefine the SSRC field uses some of the bits as session identifier. This forces the gateway to translate the SSRC whenever it does not fulfil the rules or semantics of the multiplexed side. For Redefine SSRC field this becomes almost constant as the session identifier part of the SSRC has to be the same over all SSRCs from the same session. For Single Session it might only be needed when there otherwise would be an SSRC collision between the sessions. This further assumes that the non-multiplexed side would never use any of the RTP mechanisms that require the same SSRC in multiple RTP sessions, as they cannot be gatewayed at all. When translating an SSRC there is first of all an overhead, which with SRTP includes a complete cycle of authenticating, decrypting, encrypting and creating a new authentication tag. In addition, the SSRC translation could potentially be a deployment obstacle for new RTP/RTCP extensions that have to be understood by the translator to be correctly translated. Therefore these two proposals fail to meet the requirements.

B.9. Summary and Conclusion

This section contains a summary table of the high-level outcome against the different requirements.

The ID numbers used in the table map to the requirements as follows:

1: Support multiple RTP sessions over one transport flow

2: Enable same SSRC value in multiple RTP sessions

2.1: Avoid SSRC translation in gateways/translators

2.2: Support existing extensions

3: Ensure SRTP functions

4: Don't Redefine used bits

5: Firewall Friendly

6: Monitoring and Reporting still needs to function

7: Usable over Multicast

8: Incremental deployment

OH: Overhead in Bytes. + means variable

---------------+---+---+---+---+---+---+---+---+---+----
Solution       | 1 |2.1|2.2| 3 | 4 | 5 | 6 | 7 | 8 | OH
---------------+---+---+---+---+---+---+---+---+---+----
Header Ext.    | S | S | P | P | F | P | F | S | P | 8+
Multiplex Shim | S | S | S | S | S | P | F | S | S | 1
Single Session | F | F | F | S | S | P | S | S | F | 0
SRTP MKI Field | S | S | S | P | F | P | F | S | S | 4
Padding Field  | S | S | S | F | P | P | F | S | P | 2
Redefine SSRC  | S | F | F | P | F | P | P | S | S | 0
---------------+---+---+---+---+---+---+---+---+---+----

Figure 6: Summary Table of Evaluation (Successfully (S), Partially (P) or Fails (F) to meet requirement)

Considering these options, the authors would recommend that AVTCORE standardize a solution based on a postfixed or prefixed multiplexing field, i.e. a shim approach combined with the appropriate signalling as described in Appendix A.2.

Authors' Addresses

Magnus Westerlund
Ericsson
Farogatan 6
SE-164 80 Kista
Sweden

Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com

Colin Perkins
University of Glasgow
School of Computing Science
Glasgow G12 8QQ
United Kingdom

Email: csp@csperkins.org
URI: http://csperkins.org/