MOPS                                                          J. Holland
Internet-Draft                                 Akamai Technologies, Inc.
Intended status: Informational                                  A. Begen
Expires: 13 January 2022                                 Networked Media
                                                              S. Dawkins
                                                     Tencent America LLC
                                                            12 July 2021

            Operational Considerations for Streaming Media
                  draft-ietf-mops-streaming-opcons-06

Abstract

   This document provides an overview of operational networking issues
   that pertain to quality of experience when streaming video and other
   high-bitrate media over the Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 13 January 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notes for Contributors and Reviewers
       1.1.1.  Venues for Contribution and Discussion
       1.1.2.  History of Public Discussion
   2.  Bandwidth Provisioning
     2.1.  Scaling Requirements for Media Delivery
       2.1.1.  Video Bitrates
       2.1.2.  Virtual Reality Bitrates
     2.2.  Path Requirements
     2.3.  Caching Systems
     2.4.  Predictable Usage Profiles
     2.5.  Unpredictable Usage Profiles
     2.6.  Extremely Unpredictable Usage Profiles
   3.  Latency Considerations
     3.1.  Ultra Low-Latency
     3.2.  Low-Latency Live
     3.3.  Non-Low-Latency Live
     3.4.  On-Demand
   4.  Adaptive Encoding, Adaptive Delivery, and Measurement
       Collection
     4.1.  Overview
     4.2.  Adaptive Encoding
     4.3.  Adaptive Segmented Delivery
     4.4.  Bitrate Detection Challenges
       4.4.1.  Idle Time between Segments
       4.4.2.  Head-of-Line Blocking
       4.4.3.  Wide and Rapid Variation in Path Capacity
     4.5.  Measurement Collection
       4.5.1.  CTA-2066: Streaming Quality of Experience Events,
               Properties and Metrics
       4.5.2.  CTA-5004: Common Media Client Data (CMCD)
     4.6.  Unreliable Transport
   5.  Evolution of Transport Protocols and Transport Protocol
       Behaviors
     5.1.  UDP and Its Behavior
     5.2.  TCP and Its Behavior
     5.3.  The QUIC Protocol and Its Behavior
   6.  Streaming Encrypted Media
     6.1.  General Considerations for Media Encryption
     6.2.  Considerations for "Hop-by-Hop" Media Encryption
     6.3.  Considerations for "End-to-End" Media Encryption
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses
1.  Introduction

   As the internet has grown, an increasingly large share of the
   traffic delivered to end users has become video.  Estimates put the
   total share of internet video traffic at 75% in 2019, expected to
   grow to 82% by 2022.  This estimate projects that the gross volume
   of video traffic will more than double during this time, based on a
   compound annual growth rate continuing at 34% (from Appendix D of
   [CVNI]).

   A substantial part of this growth is due to increased use of
   streaming video, although the amount of video traffic in real-time
   communications (for example, online videoconferencing) has also
   grown significantly.  While both streaming video and
   videoconferencing have real-time delivery and latency requirements,
   these requirements vary from one application to another.  For
   example, videoconferencing demands an end-to-end (one-way) latency
   of a few hundred milliseconds, whereas live streaming can tolerate
   latencies of several seconds.

   This document specifically focuses on streaming applications and
   defines streaming as follows:

   *  Streaming is the transmission of continuous media from a server
      to a client and its simultaneous consumption by the client.

   *  Here, continuous media refers to media and associated streams
      such as video, audio, and metadata.  In this definition, the
      critical term is "simultaneous", as it is not considered
      streaming if one downloads a video file and plays it after the
      download is completed, which would be called download-and-play.

   This has two implications.

   *  First, the server's transmission rate must (loosely or tightly)
      match the client's consumption rate in order to provide
      uninterrupted playback.  That is, the client must not run out of
      data (buffer underrun) or accept more data than it can buffer
      before playback (buffer overrun), as any excess media is simply
      discarded.

   *  Second, the client's consumption rate is limited not only by
      bandwidth availability but also by real-time constraints.  That
      is, the client cannot fetch media that is not yet available from
      a server.

   In many contexts, video traffic can be handled transparently as
   generic application-level traffic.  However, as the volume of video
   traffic continues to grow, it is becoming increasingly important to
   consider the effects of network design decisions on application-
   level performance, with considerations for the impact on video
   delivery.

   This document examines networking issues as they relate to quality
   of experience in internet video delivery.  The focus is on capturing
   characteristics of video delivery that have surprised network
   designers or transport experts without specific video expertise,
   since these highlight key differences between common assumptions in
   existing networking documents and observations of video delivery
   issues in practice.

   Making specific recommendations on operational practices aimed at
   mitigating these issues is out of scope, though some existing
   mitigations are mentioned in passing.  The intent is to provide a
   point of reference for future solution proposals to use in
   describing how new technologies address or avoid these existing
   observed problems.
1.1.  Notes for Contributors and Reviewers

   Note to RFC Editor: Please remove this section and its subsections
   before publication.

   This section provides references to make it easier to review the
   development and discussion of the draft so far.

1.1.1.  Venues for Contribution and Discussion

   This document is in the Github repository at:

   https://github.com/ietf-wg-mops/draft-ietf-mops-streaming-opcons

   Readers are welcome to open issues and send pull requests for this
   document.

   Substantial discussion of this document should take place on the
   MOPS working group mailing list (mops@ietf.org).

   *  Join: https://www.ietf.org/mailman/listinfo/mops

   *  Search: https://mailarchive.ietf.org/arch/browse/mops/

1.1.2.  History of Public Discussion

   Presentations:

   *  IETF 105 BOF:
      https://www.youtube.com/watch?v=4G3YBVmn9Eo&t=47m21s

   *  IETF 106 meeting:
      https://www.youtube.com/watch?v=4_k340xT2jM&t=7m23s

   *  MOPS Interim Meeting 2020-04-15:
      https://www.youtube.com/watch?v=QExiajdC0IY&t=10m25s

   *  IETF 108 meeting:
      https://www.youtube.com/watch?v=ZaRsk0y3O9k&t=2m48s

   *  MOPS 2020-10-30 Interim meeting:
      https://www.youtube.com/watch?v=vDZKspv4LXw&t=17m15s

2.  Bandwidth Provisioning

2.1.  Scaling Requirements for Media Delivery

2.1.1.  Video Bitrates

   Video bitrate selection depends on many variables, including
   resolution (height and width), frame rate, color depth, codec,
   encoding parameters, scene complexity, and amount of motion.
   Generally speaking, as the resolution, frame rate, color depth,
   scene complexity, and amount of motion increase, the encoding
   bitrate increases.  As newer codecs with better compression tools
   are used, the encoding bitrate decreases.  Similarly, multi-pass
   encoding generally produces better quality output than single-pass
   encoding at the same bitrate, or delivers the same quality at a
   lower bitrate.

   Table 1 shows a few common resolutions used for video content, with
   typical ranges of bitrates for the two most popular video codecs
   [Encodings].

      +============+================+============+============+
      | Name       | Width x Height | H.264      | H.265      |
      +============+================+============+============+
      | DVD        | 720 x 480      | 1.0 Mbps   | 0.5 Mbps   |
      +------------+----------------+------------+------------+
      | 720p (1K)  | 1280 x 720     | 3-4.5 Mbps | 2-4 Mbps   |
      +------------+----------------+------------+------------+
      | 1080p (2K) | 1920 x 1080    | 6-8 Mbps   | 4.5-7 Mbps |
      +------------+----------------+------------+------------+
      | 2160p (4K) | 3840 x 2160    | N/A        | 10-20 Mbps |
      +------------+----------------+------------+------------+

                                Table 1

2.1.2.  Virtual Reality Bitrates

   The bitrates given in Section 2.1.1 describe video streams that
   provide the user with a single, fixed point of view - so the user
   has no "degrees of freedom", and the user sees all of the video
   image that is available.
   Even basic virtual reality (360-degree) videos that allow users to
   look around freely (referred to as "three degrees of freedom", or
   3DoF) require substantially larger bitrates when they are captured
   and encoded, as such videos require multiple fields of view of the
   scene.  The typical multiplication factor is 8 to 10.  However, due
   to smart delivery methods such as viewport-based or tile-based
   streaming, the whole scene does not need to be sent to the user.
   Instead, the user needs only the portion corresponding to their
   viewpoint at any given time.

   In more immersive applications, where limited user movement ("three
   degrees of freedom plus", or 3DoF+) or full user movement ("six
   degrees of freedom", or 6DoF) is allowed, the required bitrate grows
   even further.  In this case, the immersive content is typically
   referred to as volumetric media.  One way to represent volumetric
   media is to use point clouds, where streaming a single object may
   easily require a bitrate of 30 Mbps or higher.  Refer to [MPEGI] and
   [PCC] for more details.

2.2.  Path Requirements

   The bitrate requirements in Section 2.1 are per end user actively
   consuming a media feed, so in the worst case, the bitrate demands
   can be multiplied by the number of simultaneous users to find the
   bandwidth requirements for a router on the delivery path with that
   number of users downstream.  For example, at a node with 10,000
   downstream users simultaneously consuming video streams,
   approximately 80 Gbps might be necessary in order for all of them to
   get typical content at 1080p resolution.

   However, when there is some overlap in the feeds being consumed by
   end users, it is sometimes possible to reduce the bandwidth
   provisioning requirements for the network by performing some kind of
   replication within the network.  This can be achieved via object
   caching with delivery of replicated objects over individual
   connections, and/or by packet-level replication using multicast.

   To the extent that replication of popular content can be performed,
   bandwidth requirements at peering or ingest points can be reduced to
   as low as a per-feed requirement instead of a per-user requirement,
   as the sketch below illustrates.
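   As a rough illustration of this arithmetic, the following sketch
   compares per-user and per-feed provisioning.  The user and feed
   counts are purely hypothetical, and the per-user rate is taken from
   the 1080p H.264 row of Table 1:

      # Back-of-the-envelope provisioning sketch (hypothetical numbers).
      users = 10_000
      per_user_bps = 8_000_000             # ~8 Mbps, 1080p H.264 (Table 1)

      # Worst case, no replication: every user pulls a distinct stream.
      per_user_total = users * per_user_bps           # 80 Gbps

      # With replication (caching or multicast), ingest/peering traffic
      # scales with the number of distinct feeds rather than users.
      distinct_feeds = 50                  # hypothetical overlap in demand
      per_feed_total = distinct_feeds * per_user_bps  # 0.4 Gbps

      print(f"{per_user_total / 1e9} Gbps vs {per_feed_total / 1e9} Gbps")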
2.3.  Caching Systems

   When demand for content is relatively predictable, and especially
   when that content is relatively static, caching content close to
   requesters and pre-loading caches to respond quickly to initial
   requests is often useful (for example, HTTP/1.1 caching is described
   in [RFC7234]).  This is subject to the usual considerations for
   caching - for example, how much data must be cached to make a
   significant difference to the requester, and how the benefits of
   caching and pre-loading caches balance against the costs of tracking
   "stale" content in caches and refreshing that content.

   It is worth noting that not all high-demand content is "live"
   content.  One popular example is when popular streaming content can
   be staged close to a significant number of requesters, as can happen
   when a new episode of a popular show is released.  This content may
   be largely stable, and therefore low-cost to maintain in multiple
   places throughout the Internet.  This can reduce demands for high
   end-to-end bandwidth without having to use mechanisms like
   multicast.

   Caching and pre-loading can also reduce exposure to peering point
   congestion, since less traffic crosses the peering point exchanges
   if the caches are placed in peer networks, especially when the
   content can be pre-loaded during off-peak hours, and especially if
   the transfer can make use of "Lower-Effort Per-Hop Behavior (LE PHB)
   for Differentiated Services" [RFC8622], "Low Extra Delay Background
   Transport (LEDBAT)" [RFC6817], or similar mechanisms.

   All of this depends, of course, on the ability of a content provider
   to predict usage and provision bandwidth, caching, and other
   mechanisms to meet the needs of users.  In some cases (Section 2.4),
   this is relatively routine, but in other cases, it is more difficult
   (Section 2.5, Section 2.6).

   And as with other parts of the ecosystem, new technology brings new
   challenges.  For example, with the emergence of ultra-low-latency
   streaming, responses have to start streaming to the end user while
   still being transmitted to the cache, and while the cache does not
   yet know the size of the object.  Some of the popular caching
   systems were designed around cache footprint and had deeply
   ingrained assumptions about knowing the size of objects that are
   being stored, so the change in design requirements in long-
   established systems caused some errors in production.  Incidents
   occurred where a transmission error in the connection from the
   upstream source to the cache could result in the cache holding a
   truncated segment and transmitting it to the end user's device.  In
   this case, players rendering the stream often had the video freeze
   until the player was reset.  In some cases, the truncated object was
   even cached that way and served later to other players as well,
   causing continued stalls at the same spot in the video for all
   players playing the segment delivered from that cache node.
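   A minimal sketch of the kind of completeness check involved is shown
   below, with hypothetical helper names (cache.store(),
   response.read()).  Note that this classic guard relies on a declared
   Content-Length, which is exactly what an ultra-low-latency chunked
   response does not provide up front:

      # Sketch: only cache an object whose received length matches the
      # declared Content-Length, to avoid serving truncated segments.
      def fill_cache(cache, key, response):
          declared = response.headers.get("Content-Length")
          body = response.read()
          if declared is not None and len(body) != int(declared):
              return body  # truncated upstream transfer; do not cache
          cache.store(key, body)
          return body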
2.4.  Predictable Usage Profiles

   Historical data shows that users consume more video, and at higher
   bitrates, than they did in the past on their connected devices.
   Improvements in codecs that reduce encoding bitrates through better
   compression have not offset the increased demand for higher-quality
   video (higher resolution, higher frame rate, better color gamut,
   better dynamic range, etc.).  In particular, mobile data usage has
   shown a large jump over the years due to increased consumption of
   entertainment as well as conversational video.

2.5.  Unpredictable Usage Profiles

   Although TCP/IP has been used with a number of widely used
   applications that have symmetric bandwidth requirements (similar
   bandwidth requirements in each direction between endpoints), many
   widely used Internet applications operate in client-server roles,
   with asymmetric bandwidth requirements.  A common example might be
   an HTTP GET operation, where a client sends a relatively small HTTP
   GET request for a resource to an HTTP server, and often receives a
   significantly larger response carrying the requested resource.  When
   HTTP is commonly used to stream movie-length video, the ratio
   between response size and request size can become arbitrarily large.

   For this reason, operators may pay more attention to downstream
   bandwidth utilization when planning and managing capacity.  In
   addition, operators have been able to deploy access networks for end
   users using underlying technologies that are inherently asymmetric,
   favoring downstream bandwidth (e.g., ADSL, cellular technologies,
   most IEEE 802.11 variants), assuming that users will need less
   upstream bandwidth than downstream bandwidth.  This strategy usually
   works, except when it fails because application bandwidth usage
   patterns have changed in ways that were not predicted.

   One example of this type of change was when peer-to-peer file
   sharing applications gained popularity in the early 2000s.  To take
   one well-documented case ([RFC5594]), the BitTorrent application
   created "swarms" of hosts, uploading and downloading files to each
   other, rather than communicating with a server.  BitTorrent favored
   peers who uploaded as much as they downloaded, so new BitTorrent
   users had an incentive to significantly increase their upstream
   bandwidth utilization.

   The combination of the large volume of "torrents" and the peer-to-
   peer characteristic of swarm transfers meant that end-user hosts
   were suddenly uploading higher volumes of traffic to more
   destinations than was the case before BitTorrent.  This caused at
   least one large ISP to attempt to "throttle" these transfers, to
   mitigate the load that these hosts placed on their network.  These
   efforts were met by increased use of encryption in BitTorrent,
   similar to an arms race, and set off discussions about "Net
   Neutrality" and calls for regulatory action.

   Especially as end users increase use of video-based social
   networking applications, it will be helpful for access network
   providers to watch for increasing numbers of end users uploading
   significant amounts of content.

2.6.  Extremely Unpredictable Usage Profiles

   The causes of unpredictable usage described in Section 2.5 were more
   or less the result of human choices, but we were reminded during a
   post-IETF 107 meeting that humans are not always in control, and
   forces of nature can cause enormous fluctuations in traffic
   patterns.

   In his talk, Sanjay Mishra [Mishra] reported that after the COVID-19
   pandemic broke out in early 2020:

   *  Comcast's streaming and web video consumption rose by 38%, with
      their reported peak traffic up 32% overall between March 1 and
      March 30,

   *  AT&T reported a 28% jump in core network traffic (single day in
      April, as compared to pre-stay-at-home daily average traffic),
      with video accounting for nearly half of all mobile network
      traffic, while social networking and web browsing remained the
      highest percentage (almost a quarter each) of overall mobility
      traffic, and

   *  Verizon reported similar trends, with video traffic up 36% over
      an average day (pre-COVID-19).

   We note that other operators saw similar spikes during this time
   period.  Craig Labovitz [Labovitz] reported:

   *  Weekday peak traffic increases of 45%-50% over pre-lockdown
      levels,

   *  A 30% increase in upstream traffic over pre-pandemic levels, and

   *  A steady increase in the overall volume of DDoS traffic, with
      amounts exceeding pre-pandemic levels by 40%.  (He attributed
      this increase to the significant rise in gaming-related DDoS
      attacks ([LabovitzDDoS]), as gaming usage also increased.)

   Subsequently, the Internet Architecture Board (IAB) held a COVID-19
   Network Impacts Workshop [IABcovid] in November 2020.
   Given a larger number of reports and more time to reflect, the
   following observations from the draft workshop report are worth
   considering.

   *  Participants describing different types of networks reported
      different kinds of impacts, but all types of networks saw
      impacts.

   *  Mobile networks saw traffic reductions and residential networks
      saw significant increases.

   *  Reported traffic increases from ISPs and IXPs over just a few
      weeks were as big as the traffic growth over the course of a
      typical year, representing a 15-20% surge in growth to land at a
      new normal that was much higher than anticipated.

   *  At DE-CIX Frankfurt, the world's largest Internet Exchange Point
      in terms of data throughput, the year 2020 saw the largest
      increase in peak traffic within a single year since the IXP was
      founded in 1995.

   *  The usage pattern changed significantly, as work-from-home and
      videoconferencing usage peaked during normal work hours, which
      would typically have been off-peak hours, with adults at work and
      children at school.  One might expect that the peak would have
      had more impact on networks if it had happened during typical
      evening peak hours for video streaming applications.

   *  The increase in daytime bandwidth consumption reflected both
      significant increases in "essential" applications such as
      videoconferencing and VPNs, and entertainment applications as
      people watched videos or played games.

   *  At the IXP level, it was observed that port utilization
      increased.  This phenomenon is mostly explained by a higher
      traffic demand from residential users.

3.  Latency Considerations

   Streaming media latency refers to the "glass-to-glass" time
   duration, which is the delay between the real-life occurrence of an
   event and the streamed media being appropriately displayed on an end
   user's device.  Note that this is different from network latency
   (defined as the time for a packet to cross a network from one end to
   the other end) because it includes video encoding/decoding and
   buffering time, and in most cases also ingest to an intermediate
   service such as a CDN or other video distribution service, rather
   than a direct connection to an end user.

   Streaming media can be usefully categorized according to the
   application's latency requirements into a few rough categories:

   *  ultra low-latency (less than 1 second)

   *  low-latency live (less than 10 seconds)

   *  non-low-latency live (10 seconds to a few minutes)

   *  on-demand (hours or more)

3.1.  Ultra Low-Latency

   Ultra low-latency delivery of media is defined here as having a
   glass-to-glass delay target under one second.

   This level of latency is sometimes necessary for real-time
   interactive applications such as video conferencing, operation of
   remote control devices or vehicles, or remotely hosted real-time
   gaming systems.  Some media content providers aim to achieve this
   level of latency for live media events involving sports, but so far
   have usually been unsuccessful over the internet at scale, though it
   is often possible within a localized environment with a controlled
   network, such as inside a specific venue connected to the event.
   Applications operating in this domain that encounter transient
   network events such as loss or reordering of some packets often
   experience user-visible artifacts in the media.
   Applications requiring ultra low latency for media delivery are
   usually tightly constrained on the available choices for media
   transport technologies, and sometimes may need to operate in
   controlled environments to reliably achieve their latency and
   quality goals.

   Most applications operating over IP networks and requiring latency
   this low use the Real-time Transport Protocol (RTP) [RFC3550] or
   WebRTC [RFC8825], which uses RTP for the media transport as well as
   several other protocols necessary for safe operation in browsers.

   It is worth noting that many applications for ultra low-latency
   delivery do not need to scale to more than one user at a time, which
   simplifies many delivery considerations relative to other use cases.
   For applications that need to replicate streams to multiple users,
   especially at a scale exceeding tens of users, this level of latency
   has historically been nearly impossible to achieve except with the
   use of multicast or planned provisioning in controlled networks.

   Recommended reading for applications adopting an RTP-based approach
   also includes [RFC7656].  For increasing the robustness of playback
   by implementing adaptive playout methods, refer to [RFC4733] and
   [RFC6843].

   Applications with further-specialized latency requirements are out
   of scope for this document.

3.2.  Low-Latency Live

   Low-latency live delivery of media is defined here as having a
   glass-to-glass delay target under 10 seconds.

   This level of latency is targeted to have a user experience similar
   to traditional broadcast TV delivery.  A frequently cited problem
   with failing to achieve this level of latency for live sporting
   events is the user experience failure from having crowds within
   earshot of one another who react audibly to an important play, or
   from users who learn of an event in the match via some other
   channel, for example social media, before it has happened on the
   screen showing the sporting event.

   Applications requiring low-latency live media delivery are generally
   feasible at scale with some restrictions.  This typically requires
   the use of a premium service dedicated to the delivery of live
   video, and some tradeoffs may be necessary relative to what's
   feasible in a higher-latency service.  The tradeoffs may include
   higher costs, delivering lower-quality video, reduced flexibility
   for adaptive bitrates, or reduced flexibility for available
   resolutions, so that fewer devices can receive an encoding tuned for
   their display.  Low-latency live delivery is also more susceptible
   to user-visible disruptions due to transient network conditions than
   higher-latency services.

   Implementation of a low-latency live video service can be achieved
   with the use of the low-latency extensions of HLS (called LL-HLS)
   [I-D.draft-pantos-hls-rfc8216bis] and DASH (called LL-DASH)
   [LL-DASH].  These extensions use the Common Media Application Format
   (CMAF) standard [MPEG-CMAF], which allows the media to be packaged
   into and transmitted in units smaller than segments, called chunks
   in CMAF language.  This way, the latency can be decoupled from the
   duration of the media segments.  Without CMAF-like packaging, lower
   latencies can only be achieved by using very short segment
   durations.  However, shorter segments mean more frequent intra-coded
   frames, which is detrimental to video encoding quality.  CMAF allows
   the use of longer segments (improving encoding quality) without
   penalizing latency.

   While an LL-HLS client retrieves each chunk with a separate HTTP GET
   request, an LL-DASH client uses the chunked transfer encoding
   feature of HTTP [CMAF-CTE], which allows the LL-DASH client to fetch
   all the chunks belonging to a segment with a single GET request.  An
   HTTP server can transmit the CMAF chunks to the LL-DASH client as
   they arrive from the encoder/packager.  A detailed comparison of
   LL-HLS and LL-DASH is given in [MMSP20].
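   As an illustration of the LL-DASH pattern, the sketch below shows a
   single (hypothetical) segment request answered with HTTP chunked
   transfer encoding, where each HTTP chunk carries a CMAF chunk as
   soon as the packager produces it; the chunk bodies are shown as
   placeholders rather than literal wire format:

      GET /live/channel1/segment42.m4s HTTP/1.1
      Host: cdn.example.com

      HTTP/1.1 200 OK
      Content-Type: video/mp4
      Transfer-Encoding: chunked

      <CMAF chunk 1 (moof+mdat), sent as soon as it is packaged>
      <CMAF chunk 2 (moof+mdat)>
      ...
      <final CMAF chunk, completing the segment>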
3.3.  Non-Low-Latency Live

   Non-low-latency live delivery of media is defined here as a live
   stream that does not have a latency target shorter than 10 seconds.

   This level of latency is the historically common case for segmented
   video delivery using HLS [RFC8216] and DASH [MPEG-DASH].  This level
   of latency is often considered adequate for content like news or
   pre-recorded content.  This level of latency is also sometimes
   achieved as a fallback state when some part of the delivery system
   or the client-side players do not support the features necessary for
   low-latency live streaming.

   This level of latency can typically be achieved at scale with
   commodity CDN services for HTTP(S) delivery, and in some cases the
   increased time window can allow for production of a wider range of
   encoding options relative to the requirements for a lower-latency
   service without the need for increasing the hardware footprint,
   which can allow for wider device interoperability.

3.4.  On-Demand

   On-demand media streaming refers to playback of pre-recorded media
   based on a user's action.  In some cases, on-demand media is
   produced as a by-product of a live media production, using the same
   segments as the live event, but freezing the manifest after the live
   event has finished.  In other cases, on-demand media is constructed
   out of pre-recorded assets with no streaming necessarily involved
   during the production of the on-demand content.

   On-demand media generally is not subject to latency concerns, but
   other timing-related considerations can still be as important or
   even more important to the user experience than the same
   considerations with live events.  These considerations include the
   startup time, the stability of the media stream's playback quality,
   and avoidance of stalls and video artifacts during playback under
   all but the most severe network conditions.

   In some applications, optimizations are available to on-demand video
   that are not always available to live events, such as pre-loading
   the first segment for a startup time that doesn't have to wait for a
   network download to begin.

4.  Adaptive Encoding, Adaptive Delivery, and Measurement Collection

4.1.  Overview

   Adaptive BitRate (ABR) is an application-level response strategy in
   which the streaming client attempts to detect the available
   bandwidth of the network path by observing the successful
   application-layer download speed, then chooses a bitrate for each of
   the video, audio, subtitles, and metadata (among the limited number
   of available options) that fits within that bandwidth, typically
   adjusting as changes in available bandwidth occur in the network or
   changes in capabilities occur during the playback (such as available
   memory, CPU, display size, etc.).
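   A minimal sketch of this selection step is shown below.  The ladder
   values and safety margin are hypothetical, and production players
   use considerably more elaborate rate-adaptation algorithms (see
   Section 4.3):

      # Sketch: pick the highest ladder rung that fits within the
      # measured throughput, with a hypothetical safety margin.
      LADDER_BPS = [1_000_000, 3_000_000, 6_000_000, 10_000_000]

      def select_bitrate(measured_bps, safety=0.8):
          budget = measured_bps * safety
          candidates = [b for b in LADDER_BPS if b <= budget]
          return max(candidates) if candidates else LADDER_BPS[0]

      print(select_bitrate(9_000_000))  # -> 6000000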
4.2.  Adaptive Encoding

   Media servers can provide media streams at various bitrates because
   the media has been encoded at various bitrates.  This is a so-called
   "ladder" of bitrates that can be offered to media players as part of
   the manifest that describes the media being requested by the media
   player, so that the media player can select among the available
   bitrate choices.

   The media server may also choose to alter which bitrates are made
   available to players by adding or removing bitrate options from the
   ladder delivered to the player in subsequent manifests built and
   sent to the player.  This way, both the player, through its
   selection of the bitrate to request from the manifest, and the
   server, through its construction of the bitrates offered in the
   manifest, are able to affect network utilization.

4.3.  Adaptive Segmented Delivery

   ABR playback is commonly implemented by streaming clients using HLS
   [RFC8216] or DASH [MPEG-DASH] to perform reliable segmented delivery
   of media over HTTP.  Different implementations use different
   strategies [ABRSurvey], often relying on proprietary algorithms
   (called rate adaptation or bitrate selection algorithms) to perform
   available bandwidth estimation/prediction and bitrate selection.

   Many server-player systems will do an initial probe or a very simple
   throughput speed test at the start of a video playback.  This is
   done to get a rough sense of the highest video bitrate in the ABR
   ladder that the network between the server and player will likely be
   able to provide under initial network conditions.  After the initial
   testing, clients tend to rely upon passive network observations and
   will make use of player-side statistics, such as buffer fill rates,
   to monitor and respond to changing network conditions.

   The choice of bitrate occurs within the context of optimizing for
   some metric monitored by the client, such as the highest achievable
   video quality or the lowest chance of a rebuffering event (playback
   stall).

4.4.  Bitrate Detection Challenges

   This kind of bandwidth-measurement system can experience trouble in
   several ways that are affected by networking issues.  Because
   adaptive application-level response strategies often use rates as
   observed by the application layer, there are sometimes inscrutable
   transport-level protocol behaviors that can produce surprising
   measurement values when the application-level feedback loop is
   interacting with a transport-level feedback loop.

   A few specific examples of surprising phenomena that affect bitrate
   detection measurements are described in the following subsections.
   As these examples will demonstrate, it is common to encounter cases
   that can deliver application-level measurements that are too low,
   too high, or (possibly) correct but varying more quickly than a lab-
   tested selection algorithm might expect.

   These effects, and others that cause transport behavior to diverge
   from lab modeling, can sometimes have a significant impact on ABR
   bitrate selection and on user quality of experience, especially
   where players use naive measurement strategies and selection
   algorithms that don't account for the likelihood of bandwidth
   measurements that diverge from the true path capacity.
4.4.1.  Idle Time between Segments

   When the selected bitrate is substantially below the available
   capacity of the network path, the response to a segment request will
   typically complete in much less absolute time than the duration of
   the requested segment, leaving significant idle time between segment
   downloads.  This can have a few surprising consequences:

   *  TCP slow-start when restarting after idle requires multiple RTTs
      to re-establish throughput at the network's available capacity.
      When the active transmission time for segments is substantially
      shorter than the time between segments, leaving an idle gap
      between segments that triggers a restart of TCP slow-start, the
      estimate of the successful download speed coming from the
      application-visible receive rate on the socket can thus end up
      much lower than the actual available network capacity.  This, in
      turn, can prevent a shift to the most appropriate bitrate.
      [RFC7661] provides some mitigations for this effect at the TCP
      transport layer, for senders who anticipate a high incidence of
      this problem.

   *  Mobile flow-bandwidth spectrum and timing mapping can be impacted
      by idle time in some networks.  The carrier capacity assigned to
      a link can vary with activity.  Depending on the idle-time
      characteristics, this can result in a lower available bitrate
      than would be achievable with a steadier transmission in the same
      network.

   Some receiver-side ABR algorithms, such as [ELASTIC], are designed
   to try to avoid this effect.

   Another way to mitigate this effect is with the help of two
   simultaneous TCP connections, as explained in [MMSys11] for
   Microsoft Smooth Streaming.  In some cases, the system-level TCP
   slow-start restart can also be disabled, for example as described in
   [OReilly-HPBN].
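   On Linux, for example, this system-level behavior is controlled by a
   kernel parameter; a sketch follows.  Note that disabling the restart
   trades the idle-restart measurement problem for potentially bursty
   sends after idle periods:

      # Show the current setting (1 = restart slow-start after idle).
      sysctl net.ipv4.tcp_slow_start_after_idle

      # Retain the congestion window across idle gaps between segments.
      sysctl -w net.ipv4.tcp_slow_start_after_idle=0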
4.4.2.  Head-of-Line Blocking

   In the event of a lost packet on a TCP connection with SACK support
   (a common case for segmented delivery in practice), a confusing
   bandwidth signal can be presented to the receiving application.
   Because of the sliding window in TCP, many packets may be accepted
   by the receiver without being available to the application until the
   missing packet arrives.  Upon arrival of the one missing packet
   after retransmission, the receiver will suddenly get access to a lot
   of data at the same time.

   To a receiver measuring bytes received per unit time at the
   application layer, and interpreting it as an estimate of the
   available network bandwidth, this appears as high jitter in the
   goodput measurement.  This can appear as a stall of some duration,
   followed by a sudden leap that can far exceed the actual capacity of
   the transport path from the server, when the hole in the received
   data is filled by a later retransmission.

   It is worth noting that more modern transport protocols, such as
   QUIC, have mitigation of head-of-line blocking as a protocol design
   goal.  See Section 5.3 for more details.

4.4.3.  Wide and Rapid Variation in Path Capacity

   As many end devices have moved to wireless connectivity for the
   final hop (Wi-Fi, 5G, or LTE), new problems in bandwidth detection
   have emerged from radio interference and signal strength effects.

   Each of these technologies can experience sudden changes in capacity
   as the end-user device moves from place to place and encounters new
   sources of interference.  Microwave ovens, for example, can cause a
   throughput degradation of more than a factor of 2 while active
   [Micro].  5G and LTE likewise can easily see rate variation by a
   factor of 2 or more over a span of seconds as users move around.

   These swings in actual transport capacity can result in user
   experience issues that can be exacerbated by insufficiently
   responsive ABR algorithms.

4.5.  Measurement Collection

   In addition to the measurements media players use to guide their
   segment-by-segment adaptive streaming requests, streaming media
   providers may also rely on measurements collected from media players
   to provide analytics that can be used for decisions such as whether
   the adaptive encoding bitrates in use are the best ones to provide
   to media players, or whether current media content caching is
   providing the best experience for viewers.  To that effect, the
   Consumer Technology Association (CTA), which owns the Web
   Application Video Ecosystem (WAVE) project, has published two
   important specifications.

4.5.1.  CTA-2066: Streaming Quality of Experience Events, Properties
        and Metrics

   [CTA-2066] specifies a set of media player events, properties,
   quality of experience (QoE) metrics, and associated terminology for
   representing streaming media quality of experience across systems,
   media players, and analytics vendors.  While all these events,
   properties, metrics, and associated terminology are used across a
   number of proprietary analytics and measurement solutions, they were
   used in slightly (or vastly) different ways that led to
   interoperability issues.  CTA-2066 attempts to address this issue by
   defining a common terminology as well as how each metric should be
   computed for consistent reporting.

4.5.2.  CTA-5004: Common Media Client Data (CMCD)

   Many assume that CDNs have a holistic view into the health and
   performance of the streaming clients.  However, this is not the
   case.  CDNs produce millions of log lines per second across hundreds
   of thousands of clients, and they have no concept of a "session" as
   a client would have, so CDNs are decoupled from the metrics the
   clients generate and report.  A CDN cannot tell which request
   belongs to which playback session, the duration of any media object,
   the bitrate, or whether any of the clients have stalled and are
   rebuffering or are about to stall and will rebuffer.  The
   consequence of this decoupling is that a CDN cannot prioritize
   delivery for when the client needs it most, prefetch content, or
   trigger alerts when the network itself may be underperforming.  One
   approach to coupling the CDN to the playback sessions is for the
   clients to communicate standardized media-relevant information to
   the CDNs while they are fetching data.  [CTA-5004] was developed
   exactly for this purpose.
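   As an illustration only (hypothetical hostname, path, and values;
   [CTA-5004] defines the normative keys and transmission options), a
   client might attach CMCD data to a segment request via HTTP request
   headers, reporting for example its buffer length, measured
   throughput, encoded bitrate, and session ID:

      GET /video/seg_1042.m4s HTTP/1.1
      Host: cdn.example.com
      CMCD-Request: bl=12000,mtp=18000
      CMCD-Object: br=6000,ot=v
      CMCD-Session: sid="6e2fb550-c457-11e9-bb97-0800200c9a66"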
4.6.  Unreliable Transport

   In contrast to segmented delivery, several applications use
   unreliable UDP, or SCTP with its "partial reliability" extension
   [RFC3758], to deliver media encapsulated in RTP [RFC3550] or raw
   MPEG Transport Stream ("MPEG-TS")-formatted video [MPEG-TS] when the
   media is being delivered in situations, such as broadcast and live
   streaming, that better tolerate occasional packet loss without
   retransmission.

   Under congestion and loss, this approach generally experiences more
   video artifacts, but fewer delay or head-of-line blocking effects.
   Often one of the key goals is to reduce latency, to better support
   applications like videoconferencing, or other live-action video with
   interactive components, such as some sporting events.

   The Secure Reliable Transport protocol [SRT] also uses UDP in an
   effort to achieve lower latency for streaming media, although it
   adds reliability at the application layer.

   Congestion avoidance strategies for deployments using unreliable
   transport protocols vary widely in practice, ranging from being
   entirely unresponsive to congestion, to using feedback signaling to
   change encoder settings (as in [RFC5762]), to using fewer
   enhancement layers (as in [RFC6190]), to using proprietary methods
   to detect "quality of experience" issues and turn off video in order
   to allow less bandwidth-intensive media, such as audio, to be
   delivered.

   More details about congestion avoidance strategies used with
   unreliable transport protocols are included in Section 5.1.

5.  Evolution of Transport Protocols and Transport Protocol Behaviors

   Because networking resources are shared between users, a good place
   to start our discussion is how contention between users, and
   mechanisms to resolve that contention in ways that are "fair"
   between users, impact streaming media users.  These topics are
   closely tied to transport protocol behaviors.

   As noted in Section 4, Adaptive Bitrate response strategies such as
   HLS [RFC8216] or DASH [MPEG-DASH] attempt to respond to changing
   path characteristics, and underlying transport protocols are also
   attempting to respond to changing path characteristics.

   For most of the history of the Internet, these transport protocols,
   described in Section 5.1 and Section 5.2, have had relatively
   consistent behaviors that have changed slowly, if at all, over time.
   Newly standardized transport protocols like QUIC [RFC9000] can
   behave differently from existing transport protocols, and these
   behaviors may evolve over time more rapidly than those of currently
   used transport protocols.

   For this reason, we have included a description of how the path
   characteristics that streaming media providers may see are likely to
   evolve over time.

5.1.  UDP and Its Behavior

   For most of the history of the Internet, we have trusted UDP-based
   applications to limit their impact on other users.  One of the
   strategies used was to use UDP for simple query-response application
   protocols, such as DNS, which is often used to send a single-packet
   request to look up the IP address for a DNS name and return a
   single-packet response containing the IP address.
   Although it is possible to saturate a path between a DNS client and
   a DNS server with DNS requests, in practice that was rare enough
   that DNS included few mechanisms to resolve contention between DNS
   users and other users (whether they were also using DNS or using
   other application protocols).

   In recent times, the usage of UDP-based applications that are not
   simple query-response protocols has grown substantially, and since
   UDP does not provide any feedback mechanism to senders to help limit
   impacts on other users, application-level protocols such as RTP
   [RFC3550] have been responsible for the decisions that TCP-based
   applications have delegated to TCP - what to send, how much to send,
   and when to send it.  So, the way some UDP-based applications
   interact with other users has changed.

   It is also worth pointing out that because UDP has no transport-
   layer feedback mechanisms, UDP-based applications that send and
   receive substantial amounts of information are expected to provide
   their own feedback mechanisms.  This expectation is most recently
   codified in Best Current Practice [RFC8085].

   RTP relies on RTCP Sender and Receiver Reports [RFC3550] as its own
   feedback mechanism, and even includes Circuit Breakers for Unicast
   RTP Sessions [RFC8083] for situations when normal RTP congestion
   control has not been able to react sufficiently to RTP flows sending
   at rates that result in sustained packet loss.

   The notion of "Circuit Breakers" has also been applied to other UDP
   applications in [RFC8084], such as tunneling packets over UDP that
   are potentially not congestion-controlled (for example,
   "Encapsulating MPLS in UDP", as described in [RFC7510]).  If
   streaming media is carried in tunnels encapsulated in UDP, these
   media streams may encounter "tripped circuit breakers", with
   resulting user-visible impacts.

5.2.  TCP and Its Behavior

   For most of the history of the Internet, we have trusted the TCP
   protocol to limit the impact of applications that sent a significant
   number of packets, in either or both directions, on other users.
   Although early versions of TCP were not particularly good at
   limiting this impact [RFC0793], the addition of Slow Start and
   Congestion Avoidance, as described in [RFC2001], was critical in
   allowing TCP-based applications to "use as much bandwidth as
   possible, but to avoid using more bandwidth than was possible".
   Although dozens of RFCs have been written refining TCP decisions
   about what to send, how much to send, and when to send it, since
   1988 [Jacobson-Karels] the signals available to TCP senders have
   remained unchanged - end-to-end acknowledgments for packets that
   were successfully sent and received, and packet timeouts for packets
   that were not.

   The success of the largely TCP-based Internet is evidence that the
   mechanisms TCP used to achieve equilibrium quickly, at a point where
   TCP senders do not interfere with other TCP senders for sustained
   periods of time, have been largely successful.  The Internet
   continued to work even when the specific mechanisms used to reach
   equilibrium changed over time.  Because TCP provides a common tool
   to avoid contention, as some TCP-based applications like FTP were
   largely replaced by other TCP-based applications like HTTP, the
   transport behavior remained consistent.

   In recent times, the TCP goal of probing for available bandwidth and
   "backing off" when a network path is saturated has been supplanted
   by the goal of avoiding growing queues along network paths, which
   prevent TCP senders from reacting quickly when a network path is
   saturated.  Congestion control mechanisms such as COPA [COPA18] and
   BBR [I-D.cardwell-iccrg-bbr-congestion-control] make these decisions
   based on measured path delays, assuming that if the measured path
   delay is increasing, the sender is injecting packets onto the
   network path faster than the receiver can accept them, so the sender
   should adjust its sending rate accordingly.
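   The core intuition can be sketched as follows; this is an
   illustration only, not BBR or COPA, and the threshold and gain
   constants are hypothetical:

      # Sketch: delay-based rate adjustment driven by RTT samples.
      class DelayState:
          def __init__(self):
              self.min_rtt = float("inf")

      def adjust_rate(state, rtt_sample, rate):
          state.min_rtt = min(state.min_rtt, rtt_sample)
          if rtt_sample > 1.25 * state.min_rtt:
              return rate * 0.9   # delay rising: queues building, back off
          return rate * 1.01      # delay near base RTT: probe for more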
   Although TCP protocol behavior has changed over time, the common
   practice of implementing TCP as part of an operating system kernel
   has acted to limit how quickly TCP behavior can change.  Even with
   the widespread use of automated operating system update installation
   on many end-user systems, streaming media providers could have a
   reasonable expectation that they could understand TCP transport
   protocol behaviors and that those behaviors would remain relatively
   stable in the short term.

5.3.  The QUIC Protocol and Its Behavior

   The QUIC protocol, developed from a proprietary protocol into an
   IETF standards-track protocol [RFC9000], turns many of the
   statements made in Section 5.1 and Section 5.2 on their heads.

   Although QUIC provides an alternative to the TCP and UDP transport
   protocols, QUIC is itself encapsulated in UDP.  As noted in
   Section 6.1, the QUIC protocol encrypts almost all of its transport
   parameters, and all of its payload, so any intermediaries that
   network operators may be using to troubleshoot HTTP streaming media
   performance issues, perform analytics, or even intercept exchanges
   in current applications will not work for QUIC-based applications
   without changes to their networks.  Section 6 describes the
   implications of media encryption in more detail.

   While QUIC is designed as a general-purpose transport protocol and
   can carry different application-layer protocols, the current
   standardized mapping is for HTTP/3 [I-D.ietf-quic-http], which
   describes how QUIC transport features are used for HTTP.  The
   convention is for HTTP/3 to run over UDP port 443 [Port443], but
   this is not a strict requirement.

   When HTTP/3 is encapsulated in QUIC, which is then encapsulated in
   UDP, streaming operators (and network operators) might see UDP
   traffic patterns that are similar to HTTP(S) over TCP.  Since
   earlier versions of HTTP(S) rely on TCP, UDP may be blocked for all
   but a few commonly used port numbers, such as UDP port 53 for DNS.

   Even when UDP ports are not blocked and HTTP/3 can flow, streaming
   operators (and network operators) may severely rate-limit this
   traffic because they do not expect to see legitimate high-bandwidth
   traffic, such as streaming media, over the UDP ports that HTTP/3 is
   using.

   As noted in Section 4.4.2, because TCP provides a reliable, in-order
   delivery service for applications, any packet loss on a TCP
   connection causes "head-of-line blocking", so that no TCP segments
   arriving after a packet is lost will be delivered to the receiving
   application until the lost packet is retransmitted, allowing in-
   order delivery to the application to continue.
   As described in [RFC9000], QUIC connections can carry multiple
   streams, and when packet losses do occur, only the streams carried
   in the lost packet are delayed.

   A QUIC extension currently being specified
   ([I-D.ietf-quic-datagram]) adds the capability for "unreliable"
   delivery, similar to the service provided by UDP, but these
   datagrams are still subject to the QUIC connection's congestion
   controller, providing some transport-level congestion avoidance
   measures, which UDP does not.

   As noted in Section 5.2, there is increasing interest in transport
   protocol behaviors that respond to delay measurements instead of
   responding to packet loss.  These behaviors may deliver improved
   user experience, but in some cases they have not responded to
   sustained packet loss, which exhausts available buffers along the
   end-to-end path and may affect other users sharing that path.  The
   QUIC protocol provides a set of congestion control hooks that can be
   used for algorithm agility, and [RFC9002] defines a basic algorithm
   with transport behavior that is roughly similar to TCP NewReno
   [RFC6582].  However, QUIC senders can and do unilaterally choose to
   use different algorithms, such as loss-based CUBIC [RFC8312], delay-
   based COPA or BBR, or even something completely different.

   We do have experience with deploying new congestion controllers
   without melting the Internet (CUBIC is one example), but the point
   mentioned in Section 5.2 about TCP being implemented in operating
   system kernels is also different with QUIC.  Although QUIC can be
   implemented in operating system kernels, one of the design goals
   when this work was chartered was "QUIC is expected to support rapid,
   distributed development and testing of features", and to meet this
   expectation, many implementers have chosen to implement QUIC in user
   space, outside the operating system kernel, and even to distribute
   QUIC libraries with their own applications.

   The decision to deploy a new version of QUIC is relatively
   uncontrolled, compared to other widely used transport protocols, and
   this can include new transport behaviors that appear without much
   notice except to the QUIC endpoints.  At IETF 105, Christian Huitema
   and Brian Trammell presented a talk on "Congestion Defense in Depth"
   [CDiD] that explored potential concerns about new QUIC congestion
   controllers being broadly deployed without the testing and
   instrumentation that current major content providers routinely
   include.  The sense of the room at IETF 105 was that the current
   major content providers understood what is at stake when they deploy
   new congestion controllers, but this presentation, and the related
   discussion in the TSVAREA minutes from IETF 105 ([tsvarea-105]), are
   still worth a look for new and rapidly growing content providers.

   It is worth considering that if TCP-based HTTP traffic and UDP-based
   HTTP/3 traffic are allowed to enter operator networks on roughly
   equal terms, questions of fairness and contention will be heavily
   dependent on interactions between the congestion controllers in use
   for TCP-based HTTP traffic and UDP-based HTTP/3 traffic.

   More broadly, [I-D.ietf-quic-manageability] discusses manageability
   of the QUIC transport protocol, focusing on the implications of
   QUIC's design and wire image on network operations involving QUIC
   traffic.
6. Streaming Encrypted Media

"Encrypted Media" has at least three meanings:

*  Media encrypted at the application layer, typically using some sort of Digital Rights Management (DRM) system, and typically remaining encrypted "at rest", when senders and receivers store it,

*  Media encrypted by the sender at the transport layer, and remaining encrypted until it reaches the ultimate media consumer (in this document, referred to as "end-to-end media encryption"), and

*  Media encrypted by the sender at the transport layer, and remaining encrypted until it reaches some intermediary that is _not_ the ultimate media consumer, but has credentials allowing decryption of the media content. This intermediary may examine and even transform the media content in some way, before forwarding re-encrypted media content (in this document, referred to as "hop-by-hop media encryption").

Both "hop-by-hop" and "end-to-end" encrypted transport may carry media that is, in addition, encrypted at the application layer.

Each of these encryption strategies is intended to achieve a different goal. For instance, application-level encryption may be used for business purposes, such as avoiding piracy or enforcing geographic restrictions on playback, while transport-layer encryption may be used to prevent media stream manipulation or to protect manifests.

This document does not take a position on whether those goals are "valid" (whatever that might mean).

In this document, we will focus on media encrypted at the transport layer, whether encrypted "hop-by-hop" or "end-to-end". Because media encrypted at the application layer will only be processed by application-level entities, this encryption does not have transport-layer implications.

Both "end-to-end" and "hop-by-hop" media encryption have specific implications for streaming operators. These are described in Section 6.2 and Section 6.3.

6.1. General Considerations for Media Encryption

The use of strong encryption does provide confidentiality for encrypted streaming media, from the sender to either an intermediary or the ultimate media consumer, and this does prevent Deep Packet Inspection by any intermediary that does not possess credentials allowing decryption. However, even encrypted content streams may be vulnerable to traffic analysis. An intermediary that can observe an encrypted media stream without decrypting it may be able to "fingerprint" encrypted media streams of known content, and then match a targeted media stream against those fingerprints. This protection is further weakened if a media provider repeatedly encrypts the same content. [CODASPY17] is an example of what is possible when identifying HTTPS-protected videos over TCP transport, based either on the length of entire resources being transferred, or on characteristic packet patterns at the beginning of a resource being transferred.

If traffic analysis is successful at identifying encrypted content and associating it with specific users, this breaks privacy as certainly as examining decrypted traffic would.
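As a hypothetical illustration of this traffic-analysis risk (and not a description of the specific method in [CODASPY17]), the sketch below matches the sequence of encrypted segment sizes observed on the wire against fingerprints precomputed from known content. The titles, sizes, and matching rule are all invented for illustration.

   # Encrypted payloads hide content, but per-segment transfer sizes
   # remain observable. If known content reliably produces the same
   # size sequence, an observer can identify it without decryption.

   def matches(observed, known, tolerance=0.02):
       """True when every observed size is within `tolerance`
       (fractional) of the corresponding known segment size."""
       if len(observed) != len(known):
           return False
       return all(abs(o - k) <= tolerance * k
                  for o, k in zip(observed, known))

   # Fingerprint library: hypothetical title -> per-segment sizes in
   # bytes, built by fetching known content and recording the sizes.
   library = {
       "title-a": [641000, 644200, 639800, 650100],
       "title-b": [981300, 975600, 990400, 969200],
   }

   observed = [640500, 644900, 639900, 650000]  # sizes seen on the wire
   print([t for t, fp in library.items() if matches(observed, fp)])
   # ['title-a']: the content is identified without any decryption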
Because HTTPS has historically layered HTTP on top of TLS, which is in turn layered on top of TCP, intermediaries do have access to unencrypted TCP-level transport information, such as retransmissions, and some carriers exploited this information in attempts to improve transport-layer performance [RFC3135]. The most recent standardized version of HTTPS, HTTP/3 [I-D.ietf-quic-http], uses the QUIC protocol [RFC9000] as its transport layer. QUIC relies on the TLS 1.3 initial handshake [RFC8446] only for key exchange [RFC9001], and encrypts almost all transport parameters itself, with the exception of a few invariant header fields. In the QUIC short header, the only transport-level parameter that is sent "in the clear" is the Destination Connection ID [RFC8999], and even in the QUIC long header, the only transport-level parameters sent "in the clear" are the Version, Destination Connection ID, and Source Connection ID. For these reasons, HTTP/3 is significantly more "opaque" than HTTPS with HTTP/1 or HTTP/2.

6.2. Considerations for "Hop-by-Hop" Media Encryption

Although the IETF has put considerable emphasis on end-to-end streaming media encryption, there are still important use cases that require the insertion of intermediaries.

There are a variety of ways to involve intermediaries, and some are much more intrusive than others.

From a content provider's perspective, a number of considerations are in play. The first question is likely whether the content provider intends intermediaries to be explicitly addressed from endpoints, or whether the content provider is willing to allow intermediaries to "intercept" streaming content transparently, with no awareness or permission from either endpoint.

If a content provider does not actively work to avoid interception by intermediaries, the effect will be indistinguishable from "impersonation attacks", and endpoints cannot be assured of any level of privacy.

Assuming that a content provider does intend to allow intermediaries to participate in content streaming, and does intend to provide some level of privacy for endpoints, there are a number of possible tools, either already available or still being specified. These include:

*  Server And Network assisted DASH [MPEG-DASH-SAND] - this specification introduces explicit messaging between DASH clients and network elements, or between various network elements, for the purpose of improving the efficiency of streaming sessions by providing information about real-time operational characteristics of networks, servers, proxies, caches, and CDNs, as well as a DASH client's performance and status.

*  "Double Encryption Procedures for the Secure Real-Time Transport Protocol (SRTP)" [RFC8723] - this specification provides a cryptographic transform for the Secure Real-time Transport Protocol that provides both hop-by-hop and end-to-end security guarantees (a conceptual sketch of this kind of layered encryption follows this list).

*  Secure Media Frames [SFRAME] - [RFC8723] is closely tied to SRTP, and this close association impeded widespread deployment, because it could not be used for the most common media content delivery mechanisms. A more recent proposal, Secure Media Frames [SFRAME], also provides both hop-by-hop and end-to-end security guarantees, but can be used with transport protocols other than SRTP.
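To make the layered approach in the [RFC8723] item above concrete, the following sketch shows a generic two-layer AEAD construction: an inner, end-to-end layer that only the ultimate endpoints can remove, wrapped in an outer, hop-by-hop layer that an authorized intermediary can remove and re-apply. This is a conceptual sketch using the Python "cryptography" package, not the actual SRTP double transform defined in [RFC8723]; all function names here are illustrative.

   # Conceptual "double encryption": the intermediary holds only the
   # hop key, so it can unwrap and re-wrap the outer layer but never
   # sees the media plaintext protected by the end-to-end key.
   import os
   from cryptography.hazmat.primitives.ciphers.aead import AESGCM

   e2e_key = AESGCM.generate_key(bit_length=128)  # endpoints only
   hop_key = AESGCM.generate_key(bit_length=128)  # endpoints + hop

   def wrap(media, e2e_key, hop_key):
       n1, n2 = os.urandom(12), os.urandom(12)
       inner = AESGCM(e2e_key).encrypt(n1, media, None)
       outer = AESGCM(hop_key).encrypt(n2, n1 + inner, None)
       return n2 + outer

   def intermediary_rewrap(packet, hop_key):
       # An authorized hop may remove and re-apply its own layer,
       # but the inner payload remains opaque to it.
       blob = AESGCM(hop_key).decrypt(packet[:12], packet[12:], None)
       n2 = os.urandom(12)
       return n2 + AESGCM(hop_key).encrypt(n2, blob, None)

   def unwrap(packet, e2e_key, hop_key):
       blob = AESGCM(hop_key).decrypt(packet[:12], packet[12:], None)
       n1, inner = blob[:12], blob[12:]
       return AESGCM(e2e_key).decrypt(n1, inner, None)

   pkt = wrap(b"media frame", e2e_key, hop_key)
   pkt = intermediary_rewrap(pkt, hop_key)
   assert unwrap(pkt, e2e_key, hop_key) == b"media frame"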
If a content provider chooses not to involve intermediaries, this choice should be carefully considered. As an example, if media manifests are encrypted end-to-end, network providers who had been able to lower offered quality and reduce the load on their networks will no longer be able to do that. Some resources that might inform this consideration are in [RFC8825] (for WebRTC) and [I-D.ietf-quic-manageability] (for HTTP/3 and QUIC).

6.3. Considerations for "End-to-End" Media Encryption

"End-to-end" media encryption offers the potential of providing privacy for streaming media consumers, with the idea being that if an unauthorized intermediary can't decrypt streaming media, the intermediary can't use Deep Packet Inspection (DPI) to examine HTTP request and response headers and identify the media content being streamed.

"End-to-end" media encryption has become much more widespread in the years since the IETF issued "Pervasive Monitoring Is an Attack" [RFC7258] as a Best Current Practice, describing pervasive monitoring as a much greater threat than previously appreciated. After the Snowden disclosures, many content providers made the decision to use HTTPS protection - HTTP over TLS - for most or all content being delivered as a routine practice, rather than in exceptional cases for content that was considered "sensitive".

Unfortunately, as noted in [RFC7258], there is no way to prevent pervasive monitoring by an "attacker", while allowing monitoring by a more benign entity who "only" wants to use DPI to examine HTTP requests and responses in order to provide a better user experience. If a modern encrypted transport protocol is used for end-to-end media encryption, intermediary streaming operators are unable to examine transport and application protocol behavior. As described in Section 6.2, only an intermediary streaming operator who is explicitly authorized to examine packet payloads, rather than intercepting packets and examining them without authorization, can continue these practices.

[RFC7258] said that "The IETF will strive to produce specifications that mitigate pervasive monitoring attacks", so streaming operators should expect the IETF's direction toward preventing unauthorized monitoring of IETF protocols to continue for the foreseeable future.

7. IANA Considerations

This document requires no actions from IANA.

8. Security Considerations

This document introduces no new security issues.

9. Acknowledgments

Thanks to Alexandre Gouaillard, Aaron Falk, Dave Oran, Glenn Deen, Kyle Rose, Leslie Daigle, Lucas Pardue, Mark Nottingham, Matt Stock, Mike English, Roni Even, and Will Law for very helpful suggestions, reviews and comments.

(If we missed your name, please let us know!)

10. Informative References

[ABRSurvey] Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R., and A. Bentaleb et al., "A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP", IEEE Communications Surveys & Tutorials, 2019.

[CDiD] Huitema, C. and B. Trammell, "(A call for) Congestion Defense in Depth", July 2019.

[CMAF-CTE] Law, W., "Ultra-Low-Latency Streaming Using Chunked-Encoded and Chunked Transferred CMAF", October 2018.

[CODASPY17] Reed, A. and M. Kranch, "Identifying HTTPS-Protected Netflix Videos in Real-Time", ACM CODASPY, March 2017.
Kranch, "Identifying HTTPS-Protected 1313 Netflix Videos in Real-Time", ACM CODASPY , March 2017, 1314 . 1316 [COPA18] Arun, V. and H. Balakrishnan, "Copa: Practical Delay-Based 1317 Congestion Control for the Internet", USENIX NSDI , April 1318 2018, . 1320 [CTA-2066] Consumer Technology Association, "Streaming Quality of 1321 Experience Events, Properties and Metrics", March 2020, 1322 . 1325 [CTA-5004] CTA, ., "Common Media Client Data (CMCD)", September 2020, 1326 . 1329 [CVNI] "Cisco Visual Networking Index: Forecast and Trends, 1330 2017-2022 White Paper", 27 February 2019, 1331 . 1335 [ELASTIC] De Cicco, L., Caldaralo, V., Palmisano, V., and S. 1336 Mascolo, "ELASTIC: A client-side controller for dynamic 1337 adaptive streaming over HTTP (DASH)", Packet Video 1338 Workshop , December 2013, 1339 . 1341 [Encodings] 1342 Apple, Inc, ., "HLS Authoring Specification for Apple 1343 Devices", June 2020, 1344 . 1348 [I-D.cardwell-iccrg-bbr-congestion-control] 1349 Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson, 1350 "BBR Congestion Control", Work in Progress, Internet- 1351 Draft, draft-cardwell-iccrg-bbr-congestion-control-00, 3 1352 July 2017, . 1355 [I-D.draft-pantos-hls-rfc8216bis] 1356 Pantos, R., "HTTP Live Streaming 2nd Edition", Work in 1357 Progress, Internet-Draft, draft-pantos-hls-rfc8216bis-09, 1358 27 April 2021, . 1361 [I-D.ietf-quic-datagram] 1362 Pauly, T., Kinnear, E., and D. Schinazi, "An Unreliable 1363 Datagram Extension to QUIC", Work in Progress, Internet- 1364 Draft, draft-ietf-quic-datagram-02, 16 February 2021, 1365 . 1368 [I-D.ietf-quic-http] 1369 Bishop, M., "Hypertext Transfer Protocol Version 3 1370 (HTTP/3)", Work in Progress, Internet-Draft, draft-ietf- 1371 quic-http-34, 2 February 2021, 1372 . 1375 [I-D.ietf-quic-manageability] 1376 Kuehlewind, M. and B. Trammell, "Manageability of the QUIC 1377 Transport Protocol", Work in Progress, Internet-Draft, 1378 draft-ietf-quic-manageability-11, 21 April 2021, 1379 . 1382 [IABcovid] Arkko, J., Farrel, S., Kühlewind, M., and C. Perkins, 1383 "Report from the IAB COVID-19 Network Impacts Workshop 1384 2020", November 2020, . 1387 [Jacobson-Karels] 1388 Jacobson, V. and M. Karels, "Congestion Avoidance and 1389 Control", November 1988, 1390 . 1392 [Labovitz] Labovitz, C., "Network traffic insights in the time of 1393 COVID-19: April 9 update", April 2020, 1394 . 1397 [LabovitzDDoS] 1398 Takahashi, D., "Why the game industry is still vulnerable 1399 to DDoS attacks", May 2018, 1400 . 1404 [LL-DASH] DASH-IF, ., "Low-latency Modes for DASH", March 2020, 1405 . 1407 [Micro] Taher, T.M., Misurac, M.J., LoCicero, J.L., and D.R. Ucci, 1408 "Microwave Oven Signal Interference Mitigation For Wi-Fi 1409 Communication Systems", 2008 5th IEEE Consumer 1410 Communications and Networking Conference 5th IEEE, pp. 1411 67-68 , 2008. 1413 [Mishra] Mishra, S. and J. Thibeault, "An update on Streaming Video 1414 Alliance", April 2020, 1415 . 1420 [MMSP20] Durak, K. and . et al, "Evaluating the performance of 1421 Apple's low-latency HLS", IEEE MMSP , September 2020, 1422 . 1424 [MMSys11] Akhshabi, S., Begen, A.C., and C. Dovrolis, "An 1425 experimental evaluation of rate-adaptation algorithms in 1426 adaptive streaming over HTTP", ACM MMSys , February 2011, 1427 . 1429 [MPEG-CMAF] 1430 "ISO/IEC 23000-19:2020 Multimedia application format 1431 (MPEG-A) - Part 19: Common media application format (CMAF) 1432 for segmented media", March 2020, 1433 . 
[MPEG-DASH] "ISO/IEC 23009-1:2019 Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats", December 2019.

[MPEG-DASH-SAND] "ISO/IEC 23009-5:2017 Dynamic adaptive streaming over HTTP (DASH) - Part 5: Server and network assisted DASH (SAND)", February 2017.

[MPEG-TS] "H.222.0 : Information technology - Generic coding of moving pictures and associated audio information: Systems", 29 August 2018.

[MPEGI] Boyce, J.M., et al., "MPEG Immersive Video Coding Standard", Proceedings of the IEEE, n.d.

[OReilly-HPBN] "High Performance Browser Networking (Chapter 2: Building Blocks of TCP)", May 2021.

[PCC] Schwarz, S., et al., "Emerging MPEG Standards for Point Cloud Compression", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, March 2019.

[Port443] "Service Name and Transport Protocol Port Number Registry", April 2021.

[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981, <https://www.rfc-editor.org/info/rfc793>.

[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms", RFC 2001, DOI 10.17487/RFC2001, January 1997, <https://www.rfc-editor.org/info/rfc2001>.

[RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations", RFC 3135, DOI 10.17487/RFC3135, June 2001, <https://www.rfc-editor.org/info/rfc3135>.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, <https://www.rfc-editor.org/info/rfc3550>.

[RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P. Conrad, "Stream Control Transmission Protocol (SCTP) Partial Reliability Extension", RFC 3758, DOI 10.17487/RFC3758, May 2004, <https://www.rfc-editor.org/info/rfc3758>.

[RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals", RFC 4733, DOI 10.17487/RFC4733, December 2006, <https://www.rfc-editor.org/info/rfc4733>.

[RFC5594] Peterson, J. and A. Cooper, "Report from the IETF Workshop on Peer-to-Peer (P2P) Infrastructure, May 28, 2008", RFC 5594, DOI 10.17487/RFC5594, July 2009, <https://www.rfc-editor.org/info/rfc5594>.

[RFC5762] Perkins, C., "RTP and the Datagram Congestion Control Protocol (DCCP)", RFC 5762, DOI 10.17487/RFC5762, April 2010, <https://www.rfc-editor.org/info/rfc5762>.

[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011, <https://www.rfc-editor.org/info/rfc6190>.

[RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 6582, DOI 10.17487/RFC6582, April 2012, <https://www.rfc-editor.org/info/rfc6582>.

[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, DOI 10.17487/RFC6817, December 2012, <https://www.rfc-editor.org/info/rfc6817>.

[RFC6843] Clark, A., Gross, K., and Q. Wu, "RTP Control Protocol (RTCP) Extended Report (XR) Block for Delay Metric Reporting", RFC 6843, DOI 10.17487/RFC6843, January 2013, <https://www.rfc-editor.org/info/rfc6843>.

[RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", RFC 7234, DOI 10.17487/RFC7234, June 2014, <https://www.rfc-editor.org/info/rfc7234>.

[RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 2014, <https://www.rfc-editor.org/info/rfc7258>.
Tschofenig, "Pervasive Monitoring Is an 1536 Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 1537 2014, . 1539 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, 1540 "Encapsulating MPLS in UDP", RFC 7510, 1541 DOI 10.17487/RFC7510, April 2015, 1542 . 1544 [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and 1545 B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms 1546 for Real-Time Transport Protocol (RTP) Sources", RFC 7656, 1547 DOI 10.17487/RFC7656, November 2015, 1548 . 1550 [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating 1551 TCP to Support Rate-Limited Traffic", RFC 7661, 1552 DOI 10.17487/RFC7661, October 2015, 1553 . 1555 [RFC8083] Perkins, C. and V. Singh, "Multimedia Congestion Control: 1556 Circuit Breakers for Unicast RTP Sessions", RFC 8083, 1557 DOI 10.17487/RFC8083, March 2017, 1558 . 1560 [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", 1561 BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, 1562 . 1564 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 1565 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 1566 March 2017, . 1568 [RFC8216] Pantos, R., Ed. and W. May, "HTTP Live Streaming", 1569 RFC 8216, DOI 10.17487/RFC8216, August 2017, 1570 . 1572 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1573 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1574 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1575 . 1577 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 1578 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 1579 . 1581 [RFC8622] Bless, R., "A Lower-Effort Per-Hop Behavior (LE PHB) for 1582 Differentiated Services", RFC 8622, DOI 10.17487/RFC8622, 1583 June 2019, . 1585 [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, 1586 "Double Encryption Procedures for the Secure Real-Time 1587 Transport Protocol (SRTP)", RFC 8723, 1588 DOI 10.17487/RFC8723, April 2020, 1589 . 1591 [RFC8825] Alvestrand, H., "Overview: Real-Time Protocols for 1592 Browser-Based Applications", RFC 8825, 1593 DOI 10.17487/RFC8825, January 2021, 1594 . 1596 [RFC8999] Thomson, M., "Version-Independent Properties of QUIC", 1597 RFC 8999, DOI 10.17487/RFC8999, May 2021, 1598 . 1600 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 1601 Multiplexed and Secure Transport", RFC 9000, 1602 DOI 10.17487/RFC9000, May 2021, 1603 . 1605 [RFC9001] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure 1606 QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021, 1607 . 1609 [RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection 1610 and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, 1611 May 2021, . 1613 [SFRAME] "Secure Media Frames Working Group (Home Page)", n.d., 1614 . 1616 [SRT] Sharabayko, M., "Secure Reliable Transport (SRT) Protocol 1617 Overview", 15 April 2020, 1618 . 1623 [tsvarea-105] 1624 "TSVAREA Minutes - IETF 105", July 2019, 1625 . 1628 Authors' Addresses 1630 Jake Holland 1631 Akamai Technologies, Inc. 1632 150 Broadway 1633 Cambridge, MA 02144, 1634 United States of America 1636 Email: jakeholland.net@gmail.com 1637 Ali Begen 1638 Networked Media 1639 Turkey 1641 Email: ali.begen@networked.media 1643 Spencer Dawkins 1644 Tencent America LLC 1645 United States of America 1647 Email: spencerdawkins.ietf@gmail.com