MOPS                                                          J. Holland
Internet-Draft                                 Akamai Technologies, Inc.
Intended status: Informational                                  A. Begen
Expires: 10 December 2021                                Networked Media
                                                              S. Dawkins
                                                     Tencent America LLC
                                                             8 June 2021

             Operational Considerations for Streaming Media
                  draft-ietf-mops-streaming-opcons-05

Abstract

   This document provides an overview of operational networking issues
   that pertain to quality of experience in streaming of video and
   other high-bitrate media over the Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 December 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notes for Contributors and Reviewers
       1.1.1.  Venues for Contribution and Discussion
       1.1.2.  Template for Contributions
       1.1.3.  History of Public Discussion
   2.  Bandwidth Provisioning
     2.1.  Scaling Requirements for Media Delivery
       2.1.1.  Video Bitrates
       2.1.2.  Virtual Reality Bitrates
     2.2.  Path Requirements
     2.3.  Caching Systems
     2.4.  Predictable Usage Profiles
     2.5.  Unpredictable Usage Profiles
     2.6.  Extremely Unpredictable Usage Profiles
   3.  Latency Considerations
     3.1.  Ultra Low-Latency
     3.2.  Low-Latency Live
     3.3.  Non-Low-Latency Live
     3.4.  On-Demand
   4.  Adaptive Encoding, Adaptive Delivery, and Measurement Collection
     4.1.  Overview
     4.2.  Adaptive Encoding
     4.3.  Adaptive Segmented Delivery
       4.3.1.  Idle Time between Segments
       4.3.2.  Head-of-Line Blocking
     4.4.  Measurement Collection
       4.4.1.  CTA-2066: Streaming Quality of Experience Events,
               Properties and Metrics
       4.4.2.  CTA-5004: Common Media Client Data (CMCD)
     4.5.  Unreliable Transport
   5.  Evolution of Transport Protocols and Transport Protocol
       Behaviors
     5.1.  UDP and Its Behavior
     5.2.  TCP and Its Behavior
     5.3.  The QUIC Protocol and Its Behavior
   6.  Streaming Encrypted Media
     6.1.  General Considerations for Media Encryption
     6.2.  Considerations for "Hop-by-Hop" Media Encryption
     6.3.  Considerations for "End-to-End" Media Encryption
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Introduction

   As the internet has grown, an increasingly large share of the
   traffic delivered to end users has become video.
   Estimates put the total share of internet video traffic at 75% in
   2019, expected to grow to 82% by 2022.  This estimate projects that
   the gross volume of video traffic will more than double during this
   time, based on a compound annual growth rate continuing at 34% (from
   Appendix D of [CVNI]).

   A substantial part of this growth is due to increased use of
   streaming video, although the amount of video traffic in real-time
   communications (for example, online videoconferencing) has also
   grown significantly.  While both streaming video and
   videoconferencing have real-time delivery and latency requirements,
   these requirements vary from one application to another.  For
   example, videoconferencing demands an end-to-end (one-way) latency
   of a few hundred milliseconds, whereas live streaming can tolerate
   latencies of several seconds.

   This document focuses specifically on streaming applications and
   defines streaming as follows:

   *  Streaming is the transmission of continuous media from a server
      to a client and its simultaneous consumption by the client.

   *  Here, continuous media refers to media and associated streams
      such as video, audio, metadata, etc.  In this definition, the
      critical term is "simultaneous": it is not considered streaming
      if one downloads a video file and plays it after the download is
      completed, which would be called download-and-play.

   This has two implications.

   *  First, the server's transmission rate must (loosely or tightly)
      match the client's consumption rate in order to provide
      uninterrupted playback.  That is, the client must not run out of
      data (buffer underrun) or accept more data than it can buffer
      before playback (buffer overrun), as any excess media is simply
      discarded.

   *  Second, the client's consumption rate is limited not only by
      bandwidth availability but also by real-time constraints.  That
      is, the client cannot fetch media that is not yet available from
      a server.

   In many contexts, video traffic can be handled transparently as
   generic application-level traffic.  However, as the volume of video
   traffic continues to grow, it is becoming increasingly important to
   consider the effects of network design decisions on
   application-level performance, with considerations for the impact on
   video delivery.

   This document aims to provide a taxonomy of networking issues as
   they relate to quality of experience in internet video delivery.
   The focus is on capturing characteristics of video delivery that
   have surprised network designers or transport experts without
   specific video expertise, since these highlight key differences
   between common assumptions in existing networking documents and
   observations of video delivery issues in practice.

   Making specific recommendations for mitigating these issues is out
   of scope, though some existing mitigations are mentioned in passing.
   The intent is to provide a point of reference for future solution
   proposals to use in describing how new technologies address or avoid
   these existing observed problems.

1.1.  Notes for Contributors and Reviewers

   Note to RFC Editor: Please remove this section and its subsections
   before publication.

   This section provides references that make it easier to review the
   development and discussion of this draft so far.
1.1.1.  Venues for Contribution and Discussion

   This document is in the Github repository at:

   https://github.com/ietf-wg-mops/draft-ietf-mops-streaming-opcons

   Readers are welcome to open issues and send pull requests for this
   document.

   Substantial discussion of this document should take place on the
   MOPS working group mailing list (mops@ietf.org).

   *  Join: https://www.ietf.org/mailman/listinfo/mops

   *  Search: https://mailarchive.ietf.org/arch/browse/mops/

1.1.2.  Template for Contributions

   Contributions are solicited regarding issues and considerations that
   have an impact on media streaming operations.

   Please note that contributions may be merged and substantially
   edited, and as a reminder, please carefully consider the Note Well
   before contributing: https://datatracker.ietf.org/submit/note-well/

   Contributions can be emailed to mops@ietf.org, submitted as issues
   to the issue tracker of the repository in Section 1.1.1, or emailed
   to the document authors at draft-ietf-mops-streaming-opcons@ietf.org.

   Contributors describing an issue not yet addressed in the draft are
   requested to provide the following information, where applicable:

   *  a suggested title or name for the issue

   *  a long-term pointer to the best reference describing the issue

   *  a short description of the nature of the issue and its impact on
      media quality of service, including:

      -  where in the network this issue has root causes

      -  who can detect this issue when it occurs

   *  an overview of the issue's known prevalence in practice.
      Pointers to write-ups of high-profile incidents are a plus.

   *  a list of known mitigation techniques, with (for each known
      mitigation):

      -  a name for the mitigation technique

      -  a long-term pointer to the best reference describing it

      -  a short description of the technique:

         o  what it does

         o  where in the network it operates

         o  an overview of the tradeoffs involved: how and why it is
            helpful, and what it costs

      -  supplemental information about the technique's deployment
         prevalence and status

1.1.3.  History of Public Discussion

   Presentations:

   *  IETF 105 BOF:
      https://www.youtube.com/watch?v=4G3YBVmn9Eo&t=47m21s

   *  IETF 106 meeting:
      https://www.youtube.com/watch?v=4_k340xT2jM&t=7m23s

   *  MOPS Interim Meeting 2020-04-15:
      https://www.youtube.com/watch?v=QExiajdC0IY&t=10m25s

   *  IETF 108 meeting:
      https://www.youtube.com/watch?v=ZaRsk0y3O9k&t=2m48s

   *  MOPS 2020-10-30 Interim meeting:
      https://www.youtube.com/watch?v=vDZKspv4LXw&t=17m15s

2.  Bandwidth Provisioning

2.1.  Scaling Requirements for Media Delivery

2.1.1.  Video Bitrates

   Video bitrate selection depends on many variables, including the
   resolution (height and width), frame rate, color depth, codec,
   encoding parameters, scene complexity and amount of motion.
   Generally speaking, as the resolution, frame rate, color depth,
   scene complexity and amount of motion increase, the encoding bitrate
   increases.  As newer codecs with better compression tools are used,
   the encoding bitrate decreases.  Similarly, multi-pass encoding
   generally produces better quality output than single-pass encoding
   at the same bitrate, or delivers the same quality at a lower
   bitrate.

   Here are a few common resolutions used for video content, with
   typical bitrate ranges for the two most popular video codecs
   [Encodings].

      +============+================+============+============+
      | Name       | Width x Height | AVC        | HEVC       |
      +============+================+============+============+
      | DVD        | 720 x 480      | 1.0 Mbps   | 0.5 Mbps   |
      +------------+----------------+------------+------------+
      | 720p (1K)  | 1280 x 720     | 3-4.5 Mbps | 2-4 Mbps   |
      +------------+----------------+------------+------------+
      | 1080p (2K) | 1920 x 1080    | 6-8 Mbps   | 4.5-7 Mbps |
      +------------+----------------+------------+------------+
      | 2160p (4K) | 3840 x 2160    | N/A        | 10-20 Mbps |
      +------------+----------------+------------+------------+

                                Table 1

2.1.2.  Virtual Reality Bitrates

   The bitrates given in Section 2.1.1 describe video streams that
   provide the user with a single, fixed point of view, so the user has
   no "degrees of freedom", and the user sees all of the video image
   that is available.

   Even basic virtual reality (360-degree) videos that allow users to
   look around freely (referred to as "three degrees of freedom", or
   3DoF) require substantially larger bitrates when they are captured
   and encoded, because such videos require multiple fields of view of
   the scene.  The typical multiplication factor is 8 to 10.  However,
   thanks to smart delivery methods such as viewport-based or
   tile-based streaming, the whole scene does not need to be sent to
   the user.  Instead, the user needs only the portion corresponding to
   their viewpoint at any given time.

   In more immersive applications, where limited user movement ("three
   degrees of freedom plus", or 3DoF+) or full user movement ("six
   degrees of freedom", or 6DoF) is allowed, the required bitrate grows
   even further.  In this case, immersive content is typically referred
   to as volumetric media.  One way to represent volumetric media is to
   use point clouds, where streaming a single object may easily require
   a bitrate of 30 Mbps or higher.  Refer to [MPEGI] and [PCC] for more
   details.

2.2.  Path Requirements

   The bitrate requirements in Section 2.1 are per end user actively
   consuming a media feed, so in the worst case, the bitrate demands
   can be multiplied by the number of simultaneous users to find the
   bandwidth requirements for a router on the delivery path with that
   number of users downstream.  For example, at a node with 10,000
   downstream users simultaneously consuming video streams,
   approximately 80 Gbps might be necessary in order for all of them to
   receive typical content at 1080p resolution.

   However, when there is some overlap in the feeds being consumed by
   end users, it is sometimes possible to reduce the bandwidth
   provisioning requirements for the network by performing some kind of
   replication within the network.  This can be achieved via object
   caching with delivery of replicated objects over individual
   connections, and/or by packet-level replication using multicast.

   To the extent that replication of popular content can be performed,
   bandwidth requirements at peering or ingest points can be reduced to
   as low as a per-feed requirement instead of a per-user requirement,
   as the sketch below illustrates.
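   The following back-of-the-envelope sketch shows the arithmetic
   behind both the 80 Gbps figure above and the potential savings from
   in-network replication.  All values, including the degree of feed
   overlap, are illustrative assumptions, not measurements.

      # Back-of-the-envelope bandwidth estimate at an aggregation node.
      # All values are illustrative assumptions, not measurements.

      PER_USER_BPS = 8_000_000    # ~8 Mbps, typical 1080p AVC (Table 1)
      USERS = 10_000              # simultaneous downstream viewers
      DISTINCT_FEEDS = 50         # assumed overlap: many users per feed

      per_user = PER_USER_BPS * USERS           # no replication
      per_feed = PER_USER_BPS * DISTINCT_FEEDS  # ideal replication here

      print(f"per-user provisioning: {per_user / 1e9:.1f} Gbps")  # 80.0
      print(f"per-feed provisioning: {per_feed / 1e9:.1f} Gbps")  #  0.4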
2.3.  Caching Systems

   When demand for content is relatively predictable, and especially
   when that content is relatively static, it is often useful to cache
   content close to requesters, and to pre-load caches so that they can
   respond quickly to initial requests (for example, HTTP/1.1 caching
   is described in [RFC7234]).  This is subject to the usual
   considerations for caching: for example, how much data must be
   cached to make a significant difference to the requester, and how
   the benefits of caching and pre-loading balance against the costs of
   tracking "stale" content in caches and refreshing that content.

   It is worth noting that not all high-demand content is "live"
   content.  A popular example is streaming content that can be staged
   close to a significant number of requesters, as can happen when a
   new episode of a popular show is released.  This content may be
   largely stable, and therefore low-cost to maintain in multiple
   places throughout the Internet.  This can reduce demands for high
   end-to-end bandwidth without having to use mechanisms like
   multicast.

   Caching and pre-loading can also reduce exposure to peering point
   congestion, since less traffic crosses the peering points if the
   caches are placed in peer networks, especially when the content can
   be pre-loaded during off-peak hours, and especially if the transfer
   can make use of "Lower-Effort Per-Hop Behavior (LE PHB) for
   Differentiated Services" [RFC8622], "Low Extra Delay Background
   Transport (LEDBAT)" [RFC6817], or similar mechanisms.

   All of this depends, of course, on the ability of a content provider
   to predict usage and provision bandwidth, caching, and other
   mechanisms to meet the needs of users.  In some cases (Section 2.4),
   this is relatively routine, but in other cases, it is more difficult
   (Section 2.5, Section 2.6).

2.4.  Predictable Usage Profiles

   Historical data shows that users consume more video, and at higher
   bitrates, on their connected devices than they did in the past.
   Improvements in codecs that reduce encoding bitrates through better
   compression have not offset the increase in demand for higher
   quality video (higher resolution, higher frame rate, better color
   gamut, better dynamic range, etc.).  In particular, mobile data
   usage has shown a large jump over the years due to increased
   consumption of entertainment as well as conversational video.

   TBD: insert charts showing historical relative data usage patterns
   with error bars by time of day in consumer networks?

   TBD: Cross-ref vs. video quality by time of day in practice for some
   case study?  Not sure if there's a good way to capture a generalized
   insight here, but it seems worth making the point that demand
   projections can be used to help with e.g. power consumption with
   routing architectures that provide for modular scalability.
2.5.  Unpredictable Usage Profiles

   Although TCP/IP has been used with a number of widely used
   applications that have symmetric bandwidth requirements (similar
   bandwidth requirements in each direction between endpoints), many
   widely used Internet applications operate in client-server roles,
   with asymmetric bandwidth requirements.  A common example is an HTTP
   GET operation, where a client sends a relatively small HTTP GET
   request for a resource to an HTTP server, and often receives a
   significantly larger response carrying the requested resource.  When
   HTTP is commonly used to stream movie-length video, the ratio
   between response size and request size can become arbitrarily large.

   For this reason, operators may pay more attention to downstream
   bandwidth utilization when planning and managing capacity.  In
   addition, operators have been able to deploy access networks for end
   users using underlying technologies that are inherently asymmetric,
   favoring downstream bandwidth (e.g., ADSL, cellular technologies,
   most IEEE 802.11 variants), assuming that users will need less
   upstream bandwidth than downstream bandwidth.  This strategy usually
   works, except when it fails because application bandwidth usage
   patterns have changed in ways that were not predicted.

   One example of this type of change was when peer-to-peer file
   sharing applications gained popularity in the early 2000s.  To take
   one well-documented case ([RFC5594]), the BitTorrent application
   created "swarms" of hosts, uploading and downloading files to each
   other, rather than communicating with a server.  BitTorrent favored
   peers who uploaded as much as they downloaded, so that new
   BitTorrent users had an incentive to significantly increase their
   upstream bandwidth utilization.

   The combination of the large volume of "torrents" and the
   peer-to-peer characteristic of swarm transfers meant that end-user
   hosts were suddenly uploading higher volumes of traffic to more
   destinations than was the case before BitTorrent.  This caused at
   least one large ISP to attempt to "throttle" these transfers, to
   mitigate the load that these hosts placed on its network.  These
   efforts were met by increased use of encryption in BitTorrent,
   similar to an arms race, and set off discussions about "Net
   Neutrality" and calls for regulatory action.

   Especially as end users increase their use of video-based social
   networking applications, it will be helpful for access network
   providers to watch for increasing numbers of end users uploading
   significant amounts of content.

2.6.  Extremely Unpredictable Usage Profiles

   The causes of unpredictable usage described in Section 2.5 were more
   or less the result of human choices, but we were reminded during a
   post-IETF 107 meeting that humans are not always in control, and
   forces of nature can cause enormous fluctuations in traffic
   patterns.
   In his talk, Sanjay Mishra [Mishra] reported that after the COVID-19
   pandemic broke out in early 2020,

   *  Comcast's streaming and web video consumption rose by 38%, with
      their reported peak traffic up 32% overall between March 1 and
      March 30,

   *  AT&T reported a 28% jump in core network traffic (single day in
      April, as compared to the pre-stay-at-home daily average
      traffic), with video accounting for nearly half of all mobile
      network traffic, while social networking and web browsing
      remained the highest percentage (almost a quarter each) of
      overall mobility traffic, and

   *  Verizon reported similar trends, with video traffic up 36% over
      an average (pre-COVID-19) day.

   We note that other operators saw similar spikes during this time
   period.  Craig Labovitz [Labovitz] reported

   *  weekday peak traffic increases of 45-50% over pre-lockdown
      levels,

   *  a 30% increase in upstream traffic over pre-pandemic levels, and

   *  a steady increase in the overall volume of DDoS traffic, with
      amounts exceeding pre-pandemic levels by 40%.  (He attributed
      this increase to the significant rise in gaming-related DDoS
      attacks ([LabovitzDDoS]), as gaming usage also increased.)

   Subsequently, the Internet Architecture Board (IAB) held a COVID-19
   Network Impacts Workshop [IABcovid] in November 2020.  Given a
   larger number of reports and more time to reflect, the following
   observations from the draft workshop report are worth considering.

   *  Participants describing different types of networks reported
      different kinds of impacts, but all types of networks saw
      impacts.

   *  Mobile networks saw traffic reductions and residential networks
      saw significant increases.

   *  Reported traffic increases from ISPs and IXPs over just a few
      weeks were as big as the traffic growth over the course of a
      typical year, representing a 15-20% surge in growth to land at a
      new normal that was much higher than anticipated.

   *  At DE-CIX Frankfurt, the world's largest Internet Exchange Point
      in terms of data throughput, the year 2020 saw the largest
      increase in peak traffic within a single year since the IXP was
      founded in 1995.

   *  The usage pattern changed significantly as work-from-home and
      videoconferencing usage peaked during normal work hours, which
      would typically have been off-peak hours, with adults at work and
      children at school.  One might expect that the peak would have
      had more impact on networks if it had happened during the typical
      evening peak hours for video streaming applications.

   *  The increase in daytime bandwidth consumption reflected both
      significant increases in "essential" applications such as
      videoconferencing and VPNs, and entertainment applications as
      people watched videos or played games.

   *  At the IXP level, it was observed that port utilization
      increased.  This phenomenon is mostly explained by a higher
      traffic demand from residential users.

3.  Latency Considerations

   Streaming media latency refers to the "glass-to-glass" time
   duration, which is the delay between the real-life occurrence of an
   event and the streamed media being appropriately displayed on an end
   user's device.  Note that this is different from network latency
   (defined as the time for a packet to cross a network from one end to
   the other) because it includes video encoding/decoding and buffering
   time, and in most cases also ingest to an intermediate service such
   as a CDN or other video distribution service, rather than a direct
   connection to an end user.  The sketch below makes this distinction
   concrete.
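   The following sketch sums one set of assumed, purely illustrative
   pipeline delays; note how little of the total is network transit
   time.

      # Illustrative glass-to-glass latency budget.  Every value here
      # is an assumption for illustration, not a measurement.
      budget_ms = {
          "capture + encode":          700,
          "packaging (per segment)":  2000,
          "ingest to CDN":             150,
          "CDN to player (network)":   100,
          "player buffer":            4000,
          "decode + render":            50,
      }

      total_s = sum(budget_ms.values()) / 1000
      print(f"glass-to-glass: ~{total_s:.1f} s")  # ~7.0 s in this model
      for stage, ms in budget_ms.items():
          print(f"  {stage:<26} {ms:>5} ms")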
   Streaming media can be usefully categorized according to the
   application's latency requirements into a few rough categories:

   *  ultra low-latency (less than 1 second)

   *  low-latency live (less than 10 seconds)

   *  non-low-latency live (10 seconds to a few minutes)

   *  on-demand (hours or more)

3.1.  Ultra Low-Latency

   Ultra low-latency delivery of media is defined here as having a
   glass-to-glass delay target under one second.

   This level of latency is sometimes necessary for real-time
   interactive applications such as video conferencing, operation of
   remote control devices or vehicles, or remotely hosted real-time
   gaming systems.  Some media content providers aim to achieve this
   level of latency for live media events involving sports, but so far
   they have usually been unsuccessful over the internet at scale,
   though it is often possible within a localized environment with a
   controlled network, such as inside a specific venue connected to the
   event.  Applications operating in this domain that encounter
   transient network events such as loss or reordering of some packets
   often experience user-visible artifacts in the media.

   Applications requiring ultra low latency for media delivery are
   usually tightly constrained in their available choices for media
   transport technologies, and sometimes may need to operate in
   controlled environments to reliably achieve their latency and
   quality goals.

   Most applications operating over IP networks and requiring latency
   this low use the Real-time Transport Protocol (RTP) [RFC3550] or
   WebRTC [RFC8825], which uses RTP for the media transport as well as
   several other protocols necessary for safe operation in browsers.

   It is worth noting that many applications for ultra low-latency
   delivery do not need to scale to more than one user at a time, which
   simplifies many delivery considerations relative to other use cases.
   For applications that need to replicate streams to multiple users,
   especially at a scale exceeding tens of users, this level of latency
   has historically been nearly impossible to achieve except with the
   use of multicast or planned provisioning in controlled networks.

   Recommended reading for applications adopting an RTP-based approach
   also includes [RFC7656].  For increasing the robustness of playback
   by implementing adaptive playout methods, refer to [RFC4733] and
   [RFC6843].

   Applications with further-specialized latency requirements are out
   of scope for this document.

3.2.  Low-Latency Live

   Low-latency live delivery of media is defined here as having a
   glass-to-glass delay target under 10 seconds.

   This level of latency is targeted to provide a user experience
   similar to traditional broadcast TV delivery.  A frequently cited
   problem with failing to achieve this level of latency for live
   sporting events is the user experience failure that occurs when
   crowds within earshot of one another react audibly to an important
   play, or when users learn of an event in the match via some other
   channel, for example social media, before it has appeared on the
   screen showing the sporting event.

   Applications requiring low-latency live media delivery are generally
   feasible at scale with some restrictions.  This typically requires
   the use of a premium service dedicated to the delivery of live
   video, and some tradeoffs may be necessary relative to what is
   feasible in a higher-latency service.  The tradeoffs may include
   higher costs, delivering lower-quality video, reduced flexibility
   for adaptive bitrates, or reduced flexibility for available
   resolutions, so that fewer devices can receive an encoding tuned for
   their display.  Low-latency live delivery is also more susceptible
   to user-visible disruptions due to transient network conditions than
   higher-latency services.

   Implementation of a low-latency live video service can be achieved
   with the use of the low-latency extensions of HLS (called LL-HLS)
   [I-D.draft-pantos-hls-rfc8216bis] and DASH (called LL-DASH)
   [LL-DASH].  These extensions use the Common Media Application Format
   (CMAF) standard [MPEG-CMAF], which allows the media to be packaged
   into and transmitted in units smaller than segments, called chunks
   in CMAF language.  This way, the latency can be decoupled from the
   duration of the media segments.  Without CMAF-like packaging, lower
   latencies can only be achieved by using very short segment
   durations.  However, shorter segments mean more frequent intra-coded
   frames, which is detrimental to video encoding quality.  CMAF makes
   it possible to keep using longer segments (improving encoding
   quality) without penalizing latency.

   While an LL-HLS client retrieves each chunk with a separate HTTP GET
   request, an LL-DASH client uses the chunked transfer encoding
   feature of HTTP [CMAF-CTE], which allows the LL-DASH client to fetch
   all the chunks belonging to a segment with a single GET request.  An
   HTTP server can transmit the CMAF chunks to the LL-DASH client as
   they arrive from the encoder/packager.  A detailed comparison of
   LL-HLS and LL-DASH is given in [MMSP20].  A sketch of the
   LL-DASH-style read pattern follows.
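   The following minimal sketch shows this read pattern from the client
   side, using Python's "requests" library.  The URL and the
   feed_to_decoder() hand-off are hypothetical placeholders; a real
   LL-DASH client also handles manifests, timing and bitrate adaptation.

      # Sketch: an LL-DASH-style client reading CMAF chunks of a
      # segment as they arrive over HTTP chunked transfer encoding.
      # The URL and feed_to_decoder() are hypothetical placeholders.
      import requests

      def feed_to_decoder(data: bytes) -> None:
          print(f"received {len(data)} bytes")  # stand-in for a decoder

      url = "https://example.com/live/stream1/segment-1042.m4s"
      with requests.get(url, stream=True, timeout=10) as resp:
          resp.raise_for_status()
          # chunk_size=None yields data as it is received from the
          # server, so decoding can begin before the whole segment
          # has arrived.
          for chunk in resp.iter_content(chunk_size=None):
              feed_to_decoder(chunk)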
3.3.  Non-Low-Latency Live

   Non-low-latency live delivery of media is defined here as a live
   stream that does not have a latency target shorter than 10 seconds.

   This level of latency is the historically common case for segmented
   video delivery using HLS [RFC8216] and DASH [MPEG-DASH], and it is
   often considered adequate for content like news or pre-recorded
   content.  This level of latency is also sometimes reached as a
   fallback state when some part of the delivery system or the
   client-side player lacks the features necessary to support
   low-latency live streaming.

   This level of latency can typically be achieved at scale with
   commodity CDN services for HTTP(S) delivery.  In some cases, the
   increased time window allows production of a wider range of encoding
   options, relative to the requirements of a lower-latency service,
   without increasing the hardware footprint; this can allow for wider
   device interoperability.
3.4.  On-Demand

   On-demand media streaming refers to playback of pre-recorded media
   based on a user's action.  In some cases, on-demand media is
   produced as a by-product of a live media production, using the same
   segments as the live event but freezing the manifest after the live
   event has finished.  In other cases, on-demand media is constructed
   out of pre-recorded assets, with no streaming necessarily involved
   during the production of the on-demand content.

   On-demand media generally is not subject to latency concerns, but
   other timing-related considerations can still be as important to the
   user experience as they are for live events, or even more so.  These
   considerations include the startup time, the stability of the media
   stream's playback quality, and the avoidance of stalls and video
   artifacts during playback under all but the most severe network
   conditions.

   In some applications, optimizations are available to on-demand video
   that are not always available to live events, such as pre-loading
   the first segment so that startup does not have to wait for a
   network download to begin.

4.  Adaptive Encoding, Adaptive Delivery, and Measurement Collection

4.1.  Overview

   Adaptive BitRate (ABR) is an application-level response strategy in
   which the streaming client attempts to detect the available
   bandwidth of the network path by observing the successful
   application-layer download speed, and then chooses a bitrate for
   each of the video, audio, subtitle and metadata streams (among the
   limited number of available options) that fits within that
   bandwidth.  The client typically adjusts as changes occur in the
   network's available bandwidth, or in the capabilities available
   during playback (such as available memory, CPU, display size, etc.).

4.2.  Adaptive Encoding

   Media servers can provide media streams at various bitrates because
   the media has been encoded at various bitrates.  This so-called
   "ladder" of bitrates can be offered to media players as part of the
   manifest that describes the media being requested by the media
   player, so that the media player can select among the available
   bitrate choices.

   The media server may also choose to alter which bitrates are made
   available to players, by adding or removing bitrate options from the
   ladder delivered to the player in subsequent manifests built and
   sent to the player.  In this way, both the player (through its
   selection of a bitrate to request from the manifest) and the server
   (through its construction of the bitrates offered in the manifest)
   are able to affect network utilization.

4.3.  Adaptive Segmented Delivery

   ABR playback is commonly implemented by streaming clients using HLS
   [RFC8216] or DASH [MPEG-DASH] to perform reliable segmented delivery
   of media over HTTP.  Different implementations use different
   strategies [ABRSurvey], often relying on proprietary algorithms
   (called rate adaptation or bitrate selection algorithms) to perform
   available bandwidth estimation/prediction and bitrate selection.

   Many server-player systems will do an initial probe or a very simple
   throughput speed test at the start of a video playback.  This is
   done to get a rough sense of the highest video bitrate in the ABR
   ladder that the network between the server and player will likely be
   able to provide under initial network conditions.  After the initial
   testing, clients tend to rely upon passive network observations and
   will make use of player-side statistics, such as buffer fill rates,
   to monitor and respond to changing network conditions.  A simplified
   sketch of this kind of selection appears below.
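   The following simplified sketch shows a throughput-based ladder-rung
   selection.  The ladder and safety margin are assumptions for
   illustration; as noted above, real players use considerably more
   sophisticated (and often proprietary) algorithms [ABRSurvey].

      # Simplified ABR rung selection: pick the highest ladder bitrate
      # that fits within a safety margin of the estimated throughput.
      # The ladder and the margin are illustrative assumptions.

      LADDER_BPS = [1_000_000, 3_000_000, 4_500_000,
                    6_000_000, 8_000_000]
      SAFETY = 0.8   # leave 20% headroom below the estimate

      def estimate_throughput_bps(segment_bytes: int,
                                  seconds: float) -> float:
          """Application-layer estimate from one segment download."""
          return 8 * segment_bytes / seconds

      def select_bitrate(throughput_bps: float) -> int:
          fitting = [r for r in LADDER_BPS
                     if r <= SAFETY * throughput_bps]
          return max(fitting) if fitting else min(LADDER_BPS)

      est = estimate_throughput_bps(segment_bytes=4_000_000, seconds=5.0)
      print(select_bitrate(est))   # 4500000, given a ~6.4 Mbps estimate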
   The choice of bitrate occurs within the context of optimizing for
   some metric monitored by the client, such as the highest achievable
   video quality or the lowest chance of a rebuffering event (playback
   stall).

   This kind of bandwidth-measurement system can experience trouble in
   several ways that are affected by networking design choices.
   Because adaptive application-level response strategies typically use
   application-level protocols, these mechanisms are affected by
   transport-level protocol behaviors, and the application-level
   feedback loop interacts with a transport-level feedback loop, as
   described in Section 4.3.1 and Section 4.3.2.

4.3.1.  Idle Time between Segments

   When the selected bitrate is substantially below the available
   capacity of the network path, the response to a segment request will
   typically complete in much less absolute time than the duration of
   the requested segment, leaving significant idle time between segment
   downloads.  This can have a few surprising consequences:

   *  TCP slow-start, when restarting after an idle period, requires
      multiple RTTs to re-establish throughput at the network's
      available capacity.  When the active transmission time for
      segments is substantially shorter than the time between segments,
      the idle gap between segments can trigger a restart of TCP
      slow-start, and the estimate of the successful download speed
      derived from the application-visible receive rate on the socket
      can end up much lower than the actual available network capacity.
      This prevents a shift to the most appropriate bitrate.  [RFC7661]
      provides some mitigations for this effect at the TCP transport
      layer, for senders who anticipate a high incidence of this
      problem.

   *  In some mobile networks, a flow's assigned bandwidth and timing
      can be impacted by idle time, because the carrier capacity
      assigned to a link can vary with activity.  Depending on the
      idle-time characteristics, this can result in a lower available
      bitrate than would be achievable with steadier transmission in
      the same network.

   Some receive-side ABR algorithms, such as [ELASTIC], are designed to
   try to avoid this effect.  Another way to mitigate this effect, with
   the help of two simultaneous TCP connections, is explained in
   [MMSys11] for Microsoft Smooth Streaming.  In some cases, the
   system-level TCP slow-start restart can be disabled [OReilly-HPBN].

4.3.2.  Head-of-Line Blocking

   On a TCP connection with SACK support (a common case for segmented
   delivery in practice), the loss of a packet can provide a confusing
   bandwidth signal to the receiving application.  Because of the
   sliding window in TCP, many packets may be accepted by the receiver
   without being available to the application until the missing packet
   arrives.  Upon arrival of the one missing packet after
   retransmission, the receiver suddenly gets access to a lot of data
   at the same time.

   To a receiver measuring bytes received per unit time at the
   application layer, and interpreting that measurement as an estimate
   of the available network bandwidth, this appears as high jitter in
   the goodput measurement.  The toy model below illustrates the
   effect.
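   In the following toy model (with assumed packet sizes and arrival
   rates), the sender transmits at a perfectly steady rate, yet a
   single retransmitted packet makes the application-visible goodput
   samples swing between zero and roughly three times the true rate.

      # Toy model: application-visible goodput around one lost TCP
      # packet.  Data behind the "hole" is buffered by the kernel and
      # released all at once when the retransmission arrives.
      # All values are illustrative assumptions.

      MSS = 1460          # bytes per segment
      SEGS = 100          # segments arriving per 100 ms interval
      INTERVAL_S = 0.1

      held = 0
      for t in range(5):
          if t in (1, 2):          # hole at head of window: nothing
              held += SEGS * MSS   # is delivered while awaiting the
              visible = 0          # retransmission
          elif t == 3:             # retransmit arrives: all buffered
              visible = held + SEGS * MSS   # data is released at once
              held = 0
          else:
              visible = SEGS * MSS
          mbps = visible * 8 / INTERVAL_S / 1e6
          print(f"t={t * 100:4d} ms  apparent goodput {mbps:6.2f} Mbps")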
   It is worth noting that more modern transport protocols, such as
   QUIC, have mitigation of head-of-line blocking as a protocol design
   goal.  See Section 5.3 for more details.

4.4.  Measurement Collection

   In addition to the measurements media players use to guide their
   segment-by-segment adaptive streaming requests, streaming media
   providers may also rely on measurements collected from media players
   to provide analytics that can be used for decisions such as whether
   the adaptive encoding bitrates in use are the best ones to provide
   to media players, or whether current media content caching is
   providing the best experience for viewers.  To that effect, the
   Consumer Technology Association (CTA), which owns the Web
   Application Video Ecosystem (WAVE) project, has published two
   important specifications.

4.4.1.  CTA-2066: Streaming Quality of Experience Events, Properties
        and Metrics

   [CTA-2066] specifies a set of media player events, properties,
   quality of experience (QoE) metrics and associated terminology for
   representing streaming media quality of experience across systems,
   media players and analytics vendors.  While these events,
   properties, metrics and associated terminology are used across a
   number of proprietary analytics and measurement solutions, they were
   used in slightly (or vastly) different ways, which led to
   interoperability issues.  CTA-2066 attempts to address this issue by
   defining a common terminology as well as how each metric should be
   computed for consistent reporting.

4.4.2.  CTA-5004: Common Media Client Data (CMCD)

   Many assume that CDNs have a holistic view into the health and
   performance of the streaming clients.  However, this is not the
   case.  CDNs produce millions of log lines per second across hundreds
   of thousands of clients, and they have no concept of a "session" as
   a client would have, so CDNs are decoupled from the metrics the
   clients generate and report.  A CDN cannot tell which request
   belongs to which playback session, the duration of any media object,
   the bitrate, or whether any of the clients have stalled and are
   rebuffering or are about to stall and will rebuffer.  The
   consequence of this decoupling is that a CDN cannot prioritize
   delivery for when the client needs it most, prefetch content, or
   trigger alerts when the network itself may be underperforming.  One
   approach to coupling the CDN to the playback sessions is for the
   clients to communicate standardized media-relevant information to
   the CDNs while they are fetching data.  [CTA-5004] was developed
   exactly for this purpose, as sketched below.
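   As an illustration of the idea, the sketch below attaches a small
   subset of CMCD keys to a segment request as a query argument (for
   example, "br" for encoded bitrate in kbps, "bl" for buffer length in
   ms, "mtp" for measured throughput in kbps, and "sid" for a session
   ID).  Consult [CTA-5004] for the authoritative key registry and
   encoding rules; this example makes simplifying assumptions, and the
   URL and session ID are placeholders.

      # Sketch: attaching an illustrative subset of CMCD keys to a
      # segment request as a query argument.  See CTA-5004 for the
      # full key registry and the exact encoding rules.
      from urllib.parse import quote

      cmcd = {
          "br": 3200,    # encoded bitrate of requested object, kbps
          "bl": 21300,   # current buffer length, ms
          "mtp": 25400,  # measured throughput, kbps
          "sid": '"6e2fb550-c457-11e9-bb97-0800200c9a66"',  # session ID
      }
      payload = ",".join(f"{k}={v}" for k, v in sorted(cmcd.items()))
      url = "https://example.com/seg-12.m4s?CMCD=" + quote(payload)
      print(url)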
4.5.  Unreliable Transport

   In contrast to segmented delivery, several applications use
   unreliable UDP, or SCTP with its "partial reliability" extension
   [RFC3758], to deliver media encapsulated in RTP [RFC3550] or raw
   MPEG Transport Stream ("MPEG-TS")-formatted video [MPEG-TS], when
   the media is being delivered in situations, such as broadcast and
   live streaming, that better tolerate occasional packet loss without
   retransmission.

   Under congestion and loss, this approach generally experiences more
   video artifacts, but fewer delay or head-of-line blocking effects.
   Often one of the key goals is to reduce latency, to better support
   applications like videoconferencing, or other live-action video with
   interactive components, such as some sporting events.

   The Secure Reliable Transport protocol [SRT] also uses UDP in an
   effort to achieve lower latency for streaming media, although it
   adds reliability at the application layer.

   Congestion avoidance strategies for deployments using unreliable
   transport protocols vary widely in practice, ranging from being
   entirely unresponsive to congestion, to using feedback signaling to
   change encoder settings (as in [RFC5762]), to using fewer
   enhancement layers (as in [RFC6190]), to using proprietary methods
   to detect "quality of experience" issues and turn off video in order
   to allow less bandwidth-intensive media, such as audio, to be
   delivered.

   More details about congestion avoidance strategies used with
   unreliable transport protocols are included in Section 5.1.

5.  Evolution of Transport Protocols and Transport Protocol Behaviors

   *Note to Reviewers*

   This section includes some material on UDP and TCP that may be
   tutorial for some readers.  We can decide how to explain that, if
   the working group feels that this tutorial material is worth
   keeping.  Spencer thought it was worth including, because it
   provides a contrast to the material on QUIC, which is significantly
   less tutorial, unless the reader participated in the QUIC working
   group.

   Because networking resources are shared between users, a good place
   to start our discussion is how contention between users, and
   mechanisms to resolve that contention in ways that are "fair"
   between users, impact streaming media users.  These topics are
   closely tied to transport protocol behaviors.

   As noted in Section 4, Adaptive Bitrate response strategies such as
   HLS [RFC8216] or DASH [MPEG-DASH] attempt to respond to changing
   path characteristics, and underlying transport protocols are also
   attempting to respond to changing path characteristics.

   For most of the history of the Internet, these transport protocols,
   described in Section 5.1 and Section 5.2, have had relatively
   consistent behaviors that have changed slowly, if at all, over time.
   Newly standardized transport protocols like QUIC [RFC9000] can
   behave differently from existing transport protocols, and these
   behaviors may evolve over time more rapidly than those of
   currently used transport protocols.

   For this reason, we have included a description of how the path
   characteristics that streaming media providers may see are likely to
   evolve over time.

5.1.  UDP and Its Behavior

   For most of the history of the Internet, we have trusted UDP-based
   applications to limit their impact on other users.
   One of the strategies used was to use UDP for simple query-response
   application protocols, such as DNS, which is often used to send a
   single-packet request to look up the IP address for a DNS name and
   return a single-packet response containing the IP address.  Although
   it is possible to saturate a path between a DNS client and a DNS
   server with DNS requests, in practice, that was rare enough that DNS
   included few mechanisms to resolve contention between DNS users and
   other users (whether they were also using DNS, or using other
   application protocols).

   In recent times, the usage of UDP-based applications that are not
   simple query-response protocols has grown substantially, and since
   UDP does not provide any feedback mechanism to senders to help limit
   impacts on other users, application-level protocols such as RTP
   [RFC3550] have been responsible for the decisions that TCP-based
   applications have delegated to TCP: what to send, how much to send,
   and when to send it.  So, the way some UDP-based applications
   interact with other users has changed.

   It is also worth pointing out that because UDP has no
   transport-layer feedback mechanisms, UDP-based applications that
   send and receive substantial amounts of information are expected to
   provide their own feedback mechanisms.  This expectation is most
   recently codified in Best Current Practice [RFC8085].

   RTP relies on RTCP Sender and Receiver Reports [RFC3550] as its own
   feedback mechanism, and even includes Circuit Breakers for Unicast
   RTP Sessions [RFC8083] for situations when normal RTP congestion
   control has not been able to react sufficiently to RTP flows sending
   at rates that result in sustained packet loss.

   The notion of "Circuit Breakers" has also been applied to other UDP
   applications in [RFC8084], such as tunneling packets over UDP that
   are potentially not congestion-controlled (for example,
   "Encapsulating MPLS in UDP", as described in [RFC7510]).  If
   streaming media is carried in tunnels encapsulated in UDP, these
   media streams may encounter "tripped circuit breakers", with
   resulting user-visible impacts.

5.2.  TCP and Its Behavior

   For most of the history of the Internet, we have trusted the TCP
   protocol to limit the impact of applications that sent a significant
   number of packets, in either or both directions, on other users.
   Although early versions of TCP were not particularly good at
   limiting this impact [RFC0793], the addition of Slow Start and
   Congestion Avoidance, as described in [RFC2001], was critical in
   allowing TCP-based applications to "use as much bandwidth as
   possible, but to avoid using more bandwidth than was possible".
   Although dozens of RFCs have been written refining TCP decisions
   about what to send, how much to send, and when to send it, since
   1988 [Jacobson-Karels] the signals available to TCP senders have
   remained unchanged: end-to-end acknowledgments for packets that were
   successfully sent and received, and packet timeouts for packets that
   were not.

   The success of the largely TCP-based Internet is evidence that the
   mechanisms TCP used to achieve equilibrium quickly, at a point where
   TCP senders do not interfere with other TCP senders for sustained
   periods of time, have been largely successful.  The Internet
   continued to work even when the specific mechanisms used to reach
   equilibrium changed over time.  Because TCP provides a common tool
   to avoid contention, as some TCP-based applications like FTP were
   largely replaced by other TCP-based applications like HTTP, the
   transport behavior remained consistent.  A simplified model of Slow
   Start and Congestion Avoidance appears below.
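   As a point of reference, the following simplified model shows the
   classic window growth described in [RFC2001]: the congestion window
   doubles every RTT until it reaches the slow-start threshold, then
   grows by one segment per RTT.  The initial values are assumptions,
   and real TCP stacks are considerably more elaborate.

      # Simplified model of classic TCP Slow Start + Congestion
      # Avoidance ([RFC2001]).  Initial values are illustrative.

      cwnd, ssthresh = 1, 64    # congestion window / threshold,
                                # in segments
      for rtt in range(10):
          print(f"RTT {rtt:2d}: cwnd = {cwnd} segments")
          cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1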
   In recent times, the TCP goal of probing for available bandwidth,
   and "backing off" when a network path is saturated, has been
   supplanted by the goal of avoiding growing queues along network
   paths, which prevent TCP senders from reacting quickly when a
   network path is saturated.  Congestion control mechanisms such as
   COPA [COPA18] and BBR [I-D.cardwell-iccrg-bbr-congestion-control]
   make these decisions based on measured path delays, assuming that if
   the measured path delay is increasing, the sender is injecting
   packets onto the network path faster than the receiver can accept
   them, so the sender should adjust its sending rate accordingly.

   Although TCP protocol behavior has changed over time, the common
   practice of implementing TCP as part of an operating system kernel
   has acted to limit how quickly TCP behavior can change.  Even with
   the widespread use of automated operating system update installation
   on many end-user systems, streaming media providers could have a
   reasonable expectation that they could understand TCP transport
   protocol behaviors, and that those behaviors would remain relatively
   stable in the short term.

5.3.  The QUIC Protocol and Its Behavior

   The QUIC protocol, developed from a proprietary protocol into an
   IETF standards-track protocol [RFC9000], turns many of the
   statements made in Section 5.1 and Section 5.2 on their heads.

   Although QUIC provides an alternative to the TCP and UDP transport
   protocols, QUIC is itself encapsulated in UDP.  As noted in
   Section 6.1, the QUIC protocol encrypts almost all of its transport
   parameters, and all of its payload, so any intermediaries that
   network operators may be using to troubleshoot HTTP streaming media
   performance issues, perform analytics, or even intercept exchanges
   in current applications will not work for QUIC-based applications
   without changes to their networks.  Section 6 describes the
   implications of media encryption in more detail.

   While QUIC is designed as a general-purpose transport protocol, and
   can carry different application-layer protocols, the current
   standardized mapping is for HTTP/3 [I-D.ietf-quic-http], which
   describes how QUIC transport features are used for HTTP.  The
   convention is for HTTP/3 to run over UDP port 443 [Port443], but
   this is not a strict requirement.

   When HTTP/3 is encapsulated in QUIC, which is then encapsulated in
   UDP, streaming operators (and network operators) might see UDP
   traffic patterns that are similar to HTTP(S) over TCP.  Since
   earlier versions of HTTP(S) rely on TCP, UDP may be blocked on any
   port numbers that are not commonly used, such as UDP 53 for DNS.
   Even when UDP ports are not blocked and HTTP/3 can flow, streaming
   operators (and network operators) may severely rate-limit this
   traffic because they do not expect to see legitimate high-bandwidth
   traffic, such as streaming media, over the UDP ports that HTTP/3 is
   using.
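   One practical consequence is that HTTP/3-capable origins advertise
   HTTP/3 support over TCP-based HTTP, using the Alt-Svc response
   header, and clients that cannot reach the advertised UDP endpoint
   simply stay on TCP-based HTTP.  The sketch below (with a placeholder
   URL) checks for such an advertisement; the discovery mechanism is
   real, but the code is only an illustration, not a complete client.

      # Sketch: checking whether an origin advertises HTTP/3 via the
      # Alt-Svc response header.  A client that cannot reach the
      # advertised UDP endpoint (blocked or rate-limited) falls back
      # to TCP-based HTTP.  The URL is a placeholder.
      import requests

      resp = requests.head("https://example.com/", timeout=5)
      alt_svc = resp.headers.get("Alt-Svc", "")
      if "h3" in alt_svc:
          print("origin advertises HTTP/3:", alt_svc)
      else:
          print("no HTTP/3 advertisement; staying on TCP-based HTTP")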
   As noted in Section 4.3.2, because TCP provides a reliable, in-order
   delivery service for applications, any packet loss on a TCP
   connection causes "head-of-line blocking", so that no TCP segments
   arriving after a packet is lost will be delivered to the receiving
   application until the lost packet is retransmitted, allowing
   in-order delivery to the application to continue.  As described in
   [RFC9000], QUIC connections can carry multiple streams, and when
   packet losses do occur, only the streams carried in the lost packet
   are delayed.

   A QUIC extension currently being specified
   ([I-D.ietf-quic-datagram]) adds the capability for "unreliable"
   delivery, similar to the service provided by UDP, but these
   datagrams are still subject to the QUIC connection's congestion
   controller, providing some transport-level congestion avoidance
   measures, which UDP does not.

   As noted in Section 5.2, there is increasing interest in transport
   protocol behaviors that respond to delay measurements, instead of
   responding to packet loss.  These behaviors may deliver improved
   user experience, but in some cases have not responded to sustained
   packet loss, which exhausts available buffers along the end-to-end
   path and may affect other users sharing that path.  The QUIC
   protocol provides a set of congestion control hooks that can be used
   for algorithm agility, and [RFC9002] defines a basic algorithm with
   transport behavior that is roughly similar to TCP NewReno [RFC6582].
   However, QUIC senders can and do unilaterally choose to use
   different algorithms, such as loss-based CUBIC [RFC8312],
   delay-based COPA or BBR, or even something completely different.

   We do have experience with deploying new congestion controllers
   without melting the Internet (CUBIC is one example), but the point
   mentioned in Section 5.2 about TCP being implemented in operating
   system kernels is also different with QUIC.  Although QUIC can be
   implemented in operating system kernels, one of the design goals
   when this work was chartered was "QUIC is expected to support rapid,
   distributed development and testing of features", and to meet this
   expectation, many implementers have chosen to implement QUIC in user
   space, outside the operating system kernel, and even to distribute
   QUIC libraries with their own applications.

   The decision to deploy a new version of QUIC is relatively
   uncontrolled, compared to other widely used transport protocols, and
   this can include new transport behaviors that appear without much
   notice except to the QUIC endpoints.  At IETF 105, Christian Huitema
   and Brian Trammell presented a talk on "Congestion Defense in Depth"
   [CDiD], which explored potential concerns about new QUIC congestion
   controllers being broadly deployed without the testing and
   instrumentation that current major content providers routinely
   include.  The sense of the room at IETF 105 was that the current
   major content providers understood what is at stake when they deploy
   new congestion controllers, but this presentation, and the related
   discussion in the TSVAREA minutes from IETF 105 ([tsvarea-105]), are
   still worth a look for new and rapidly growing content providers.
It is worth considering that if TCP-based HTTP traffic and UDP-based HTTP/3 traffic are allowed to enter operator networks on roughly equal terms, questions of fairness and contention will depend heavily on the interactions between the congestion controllers in use for TCP-based HTTP traffic and UDP-based HTTP/3 traffic.

More broadly, [I-D.ietf-quic-manageability] discusses manageability of the QUIC transport protocol, focusing on the implications of QUIC's design and wire image on network operations involving QUIC traffic, and it discusses in some detail what network operators can consider.

6.  Streaming Encrypted Media

"Encrypted Media" has at least three meanings:

*  Media encrypted at the application layer, typically using some sort of Digital Rights Management (DRM) system, and typically remaining encrypted "at rest", when senders and receivers store it,

*  Media encrypted by the sender at the transport layer, and remaining encrypted until it reaches the ultimate media consumer (referred to in this document as "end-to-end media encryption"), and

*  Media encrypted by the sender at the transport layer, and remaining encrypted until it reaches some intermediary that is _not_ the ultimate media consumer, but has credentials allowing decryption of the media content.  This intermediary may examine and even transform the media content in some way before forwarding re-encrypted media content (referred to in this document as "hop-by-hop media encryption").

Both "hop-by-hop" and "end-to-end" encrypted transport may carry media that is, in addition, encrypted at the application layer.

Each of these encryption strategies is intended to achieve a different goal.  For instance, application-level encryption may be used for business purposes, such as avoiding piracy or enforcing geographic restrictions on playback, while transport-layer encryption may be used to prevent media stream manipulation or to protect manifests.

This document does not take a position on whether those goals are "valid" (whatever that might mean).

In this document, we will focus on media encrypted at the transport layer, whether encrypted "hop-by-hop" or "end-to-end".  Because media encrypted at the application layer will only be processed by application-level entities, this encryption does not have transport-layer implications.

Both "end-to-end" and "hop-by-hop" media encryption have specific implications for streaming operators.  These are described in Section 6.2 and Section 6.3.

6.1.  General Considerations for Media Encryption

The use of strong encryption does provide confidentiality for encrypted streaming media, from the sender to either an intermediary or the ultimate media consumer, and this does prevent Deep Packet Inspection (DPI) by any intermediary that does not possess credentials allowing decryption.  However, even encrypted content streams may be vulnerable to traffic analysis.  An intermediary that can observe an encrypted media stream without decrypting it may be able to "fingerprint" encrypted streams of known content, and then match a targeted media stream against those fingerprints.  This protection is weakened if a media provider encrypts the same content repeatedly.  [CODASPY17] is an example of what is possible when identifying HTTPS-protected videos over TCP transport, based either on the length of entire resources being transferred, or on characteristic packet patterns at the beginning of a resource being transferred.
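A minimal sketch of this kind of traffic analysis follows; the titles, segment lengths, and matching tolerance are invented, and real classifiers such as those in [CODASPY17] are considerably more sophisticated.

   # Sketch: identify content from encrypted-resource sizes alone.
   # Encryption hides payloads, but not resource lengths.
   FINGERPRINTS = {
       "title-A": [641_200, 644_976, 639_888, 642_104],
       "title-B": [802_332, 799_104, 805_560, 801_976],
   }

   def match_title(observed, tolerance=2_000):
       # A stream matches if every observed length is within the
       # tolerance of the corresponding stored segment length.
       for title, sizes in FINGERPRINTS.items():
           if len(sizes) == len(observed) and all(
               abs(o - s) <= tolerance
               for o, s in zip(observed, sizes)
           ):
               return title
       return None

   print(match_title([641_300, 645_001, 639_900, 642_050]))  # title-A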
If traffic analysis is successful at identifying encrypted content and associating it with specific users, this breaks privacy as certainly as examining decrypted traffic would.

Because HTTPS has historically layered HTTP on top of TLS, which is in turn layered on top of TCP, intermediaries do have access to unencrypted TCP-level transport information, such as retransmissions, and some carriers have exploited this information in attempts to improve transport-layer performance [RFC3135].  The most recent standardized version of HTTPS, HTTP/3 [I-D.ietf-quic-http], uses the QUIC protocol [RFC9000] as its transport layer.  QUIC relies on the TLS 1.3 initial handshake [RFC8446] only for key exchange [RFC9001] and encrypts almost all transport parameters itself, with the exception of a few invariant header fields.  In the QUIC short header, the only transport-level parameter sent "in the clear" is the Destination Connection ID [RFC8999], and even in the QUIC long header, the only transport-level parameters sent "in the clear" are the Version, Destination Connection ID, and Source Connection ID.  For these reasons, HTTP/3 is significantly more "opaque" than HTTPS with HTTP/1 or HTTP/2.
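To illustrate how little of a QUIC packet an on-path observer can rely on, the following sketch parses only the version-independent fields defined in [RFC8999]; everything after these fields is version-specific, and almost all of it is encrypted.

   # Parse the version-independent QUIC header fields [RFC8999].
   def quic_invariants(datagram: bytes) -> dict:
       if datagram[0] & 0x80:  # long header
           version = int.from_bytes(datagram[1:5], "big")
           dcid_len = datagram[5]
           dcid_end = 6 + dcid_len
           dcid = datagram[6:dcid_end]
           scid_len = datagram[dcid_end]
           scid = datagram[dcid_end + 1:dcid_end + 1 + scid_len]
           return {"form": "long", "version": version,
                   "dcid": dcid.hex(), "scid": scid.hex()}
       # Short header: only the Destination Connection ID follows the
       # first byte, and its length is known only to the endpoints.
       return {"form": "short"}

   # Synthetic long-header example (not a real QUIC packet):
   sample = bytes([0xC0]) + (1).to_bytes(4, "big") \
       + bytes([2, 0xAA, 0xBB, 2, 0xCC, 0xDD])
   print(quic_invariants(sample))
   # {'form': 'long', 'version': 1, 'dcid': 'aabb', 'scid': 'ccdd'}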
1224 * "Double Encryption Procedures for the Secure Real-Time Transport 1225 Protocol (SRTP)" [RFC8723] - this specification provides a 1226 cryptographic transform for the Secure Real-time Transport 1227 Protocol that provides both hop-by-hop and end-to-end security 1228 guarantees. 1230 * Secure Media Frames [SFRAME] - [RFC8723] is closely tied to SRTP, 1231 and this close association impeded widespread deployment, because 1232 it could not be used for the most common media content delivery 1233 mechanisms. A more recent proposal, Secure Media Frames [SFRAME], 1234 also provides both hop-by-hop and end-to-end security guarantees, 1235 but can be used with other transport protocols beyond SRTP. 1237 If a content provider chooses not to involve intermediaries, this 1238 choice should be carefully considered. As an example, if media 1239 manifests are encrypted end-to-end, network providers who had been 1240 able to lower offered quality and reduce on their networks will no 1241 longer be able to do that. Some resources that might inform this 1242 consideration are in [RFC8825] (for WebRTC) and 1243 [I-D.ietf-quic-manageability] (for HTTP/3 and QUIC). 1245 6.3. Considerations for "End-to-End" Media Encryption 1247 "End-to-end" media encryption offers the potential of providing 1248 privacy for streaming media consumers, with the idea being that if an 1249 unauthorized intermediary can't decrypt streaming media, the 1250 intermediary can't use Deep Packet Inspection (DPI) to examine HTTP 1251 request and response headers and identify the media content being 1252 streamed. 1254 "End-to-end" media encryption has become much more widespread in the 1255 years since the IETF issued "Pervasive Monitoring Is an Attack" 1256 [RFC7258] as a Best Current Practice, describing pervasive monitoring 1257 as a much greater threat than previously appreciated. After the 1258 Snowden disclosures, many content providers made the decision to use 1259 HTTPS protection - HTTP over TLS - for most or all content being 1260 delivered as a routine practice, rather than in exceptional cases for 1261 content that was considered "sensitive". 1263 Unfortunately, as noted in [RFC7258], there is no way to prevent 1264 pervasive monitoring by an "attacker", while allowing monitoring by a 1265 more benign entity who "only" wants to use DPI to examine HTTP 1266 requests and responses in order to provide a better user experience. 1267 If a modern encrypted transport protocol is used for end-to-end media 1268 encryption, intermediary streaming operators are unable to examine 1269 transport and application protocol behavior. As described in 1270 Section 6.2, only an intermediary streaming operator who is 1271 explicitly authorized to examine packet payloads, rather than 1272 intercepting packets and examining them without authorization, can 1273 continue these practices. 1275 [RFC7258] said that "The IETF will strive to produce specifications 1276 that mitigate pervasive monitoring attacks", so streaming operators 1277 should expect the IETF's direction toward preventing unauthorized 1278 monitoring of IETF protocols to continue for the forseeable future. 1280 7. IANA Considerations 1282 This document requires no actions from IANA. 1284 8. Security Considerations 1286 This document introduces no new security issues. 1288 9. 
9.  Acknowledgments

Thanks to Mark Nottingham, Glenn Deen, Dave Oran, Aaron Falk, Kyle Rose, Leslie Daigle, Lucas Pardue, Matt Stock, Alexandre Gouaillard, and Mike English for their very helpful reviews and comments.

10.  Informative References

[ABRSurvey]  Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R., and A. Bentaleb, et al., "A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP", IEEE Communications Surveys & Tutorials, 2019.

[CDiD]  Huitema, C. and B. Trammell, "(A call for) Congestion Defense in Depth", July 2019.

[CMAF-CTE]  Law, W., "Ultra-Low-Latency Streaming Using Chunked-Encoded and Chunked Transferred CMAF", October 2018.

[CODASPY17]  Reed, A. and M. Kranch, "Identifying HTTPS-Protected Netflix Videos in Real-Time", ACM CODASPY, March 2017.

[COPA18]  Arun, V. and H. Balakrishnan, "Copa: Practical Delay-Based Congestion Control for the Internet", USENIX NSDI, April 2018.

[CTA-2066]  Consumer Technology Association, "Streaming Quality of Experience Events, Properties and Metrics", March 2020.

[CTA-5004]  Consumer Technology Association, "Common Media Client Data (CMCD)", September 2020.

[CVNI]  "Cisco Visual Networking Index: Forecast and Trends, 2017-2022 White Paper", 27 February 2019.

[ELASTIC]  De Cicco, L., Caldaralo, V., Palmisano, V., and S. Mascolo, "ELASTIC: A client-side controller for dynamic adaptive streaming over HTTP (DASH)", Packet Video Workshop, December 2013.

[Encodings]  Apple, Inc., "HLS Authoring Specification for Apple Devices", June 2020.

[I-D.cardwell-iccrg-bbr-congestion-control]  Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson, "BBR Congestion Control", Work in Progress, Internet-Draft, draft-cardwell-iccrg-bbr-congestion-control-00, 3 July 2017.

[I-D.draft-pantos-hls-rfc8216bis]  Pantos, R., "HTTP Live Streaming 2nd Edition", Work in Progress, Internet-Draft, draft-pantos-hls-rfc8216bis-09, 27 April 2021.

[I-D.ietf-quic-datagram]  Pauly, T., Kinnear, E., and D. Schinazi, "An Unreliable Datagram Extension to QUIC", Work in Progress, Internet-Draft, draft-ietf-quic-datagram-02, 16 February 2021.

[I-D.ietf-quic-http]  Bishop, M., "Hypertext Transfer Protocol Version 3 (HTTP/3)", Work in Progress, Internet-Draft, draft-ietf-quic-http-34, 2 February 2021.

[I-D.ietf-quic-manageability]  Kuehlewind, M. and B. Trammell, "Manageability of the QUIC Transport Protocol", Work in Progress, Internet-Draft, draft-ietf-quic-manageability-11, 21 April 2021.

[IABcovid]  Arkko, J., Farrel, S., Kuhlewind, M., and C. Perkins, "Report from the IAB COVID-19 Network Impacts Workshop 2020", November 2020.

[Jacobson-Karels]  Jacobson, V. and M. Karels, "Congestion Avoidance and Control", November 1988.

[Labovitz]  Labovitz, C., "Network traffic insights in the time of COVID-19: April 9 update", April 2020.

[LabovitzDDoS]  Takahashi, D., "Why the game industry is still vulnerable to DDoS attacks", May 2018.

[LL-DASH]  DASH-IF, "Low-latency Modes for DASH", March 2020.

[Mishra]  Mishra, S. and J. Thibeault, "An update on Streaming Video Alliance", April 2020.

[MMSP20]  Durak, K.,
et al, "Evaluating the performance of 1417 Apple's low-latency HLS", IEEE MMSP , September 2020, 1418 . 1420 [MMSys11] Akhshabi, S., Begen, A.C., and C. Dovrolis, "An 1421 experimental evaluation of rate-adaptation algorithms in 1422 adaptive streaming over HTTP", ACM MMSys , February 2011, 1423 . 1425 [MPEG-CMAF] 1426 "ISO/IEC 23000-19:2020 Multimedia application format 1427 (MPEG-A) - Part 19: Common media application format (CMAF) 1428 for segmented media", March 2020, 1429 . 1431 [MPEG-DASH] 1432 "ISO/IEC 23009-1:2019 Dynamic adaptive streaming over HTTP 1433 (DASH) - Part 1: Media presentation description and 1434 segment formats", December 2019, 1435 . 1437 [MPEG-DASH-SAND] 1438 "ISO/IEC 23009-5:2017 Dynamic adaptive streaming over HTTP 1439 (DASH) - Part 5: Server and network assisted DASH (SAND)", 1440 February 2017, . 1442 [MPEG-TS] "H.222.0 : Information technology - Generic coding of 1443 moving pictures and associated audio information: 1444 Systems", 29 August 2018, 1445 . 1447 [MPEGI] Boyce, J.M. and . et al, "MPEG Immersive Video Coding 1448 Standard", Proceedings of the IEEE , n.d., 1449 . 1451 [OReilly-HPBN] 1452 "High Performance Browser Networking (Chapter 2: Building 1453 Blocks of TCP)", May 2021, 1454 . 1456 [PCC] Schwarz, S. and . et al, "Emerging MPEG Standards for 1457 Point Cloud Compression", IEEE Journal on Emerging and 1458 Selected Topics in Circuits and Systems , March 2019, 1459 . 1461 [Port443] "Service Name and Transport Protocol Port Number 1462 Registry", April 2021, . 1466 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 1467 RFC 793, DOI 10.17487/RFC0793, September 1981, 1468 . 1470 [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast 1471 Retransmit, and Fast Recovery Algorithms", RFC 2001, 1472 DOI 10.17487/RFC2001, January 1997, 1473 . 1475 [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. 1476 Shelby, "Performance Enhancing Proxies Intended to 1477 Mitigate Link-Related Degradations", RFC 3135, 1478 DOI 10.17487/RFC3135, June 2001, 1479 . 1481 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1482 Jacobson, "RTP: A Transport Protocol for Real-Time 1483 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1484 July 2003, . 1486 [RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P. 1487 Conrad, "Stream Control Transmission Protocol (SCTP) 1488 Partial Reliability Extension", RFC 3758, 1489 DOI 10.17487/RFC3758, May 2004, 1490 . 1492 [RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF 1493 Digits, Telephony Tones, and Telephony Signals", RFC 4733, 1494 DOI 10.17487/RFC4733, December 2006, 1495 . 1497 [RFC5594] Peterson, J. and A. Cooper, "Report from the IETF Workshop 1498 on Peer-to-Peer (P2P) Infrastructure, May 28, 2008", 1499 RFC 5594, DOI 10.17487/RFC5594, July 2009, 1500 . 1502 [RFC5762] Perkins, C., "RTP and the Datagram Congestion Control 1503 Protocol (DCCP)", RFC 5762, DOI 10.17487/RFC5762, April 1504 2010, . 1506 [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. 1507 Eleftheriadis, "RTP Payload Format for Scalable Video 1508 Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011, 1509 . 1511 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 1512 NewReno Modification to TCP's Fast Recovery Algorithm", 1513 RFC 6582, DOI 10.17487/RFC6582, April 2012, 1514 . 1516 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 1517 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 1518 DOI 10.17487/RFC6817, December 2012, 1519 . 
[RFC6843]  Clark, A., Gross, K., and Q. Wu, "RTP Control Protocol (RTCP) Extended Report (XR) Block for Delay Metric Reporting", RFC 6843, DOI 10.17487/RFC6843, January 2013.

[RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", RFC 7234, DOI 10.17487/RFC7234, June 2014.

[RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 2014.

[RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", RFC 7510, DOI 10.17487/RFC7510, April 2015.

[RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, November 2015.

[RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating TCP to Support Rate-Limited Traffic", RFC 7661, DOI 10.17487/RFC7661, October 2015.

[RFC8083]  Perkins, C. and V. Singh, "Multimedia Congestion Control: Circuit Breakers for Unicast RTP Sessions", RFC 8083, DOI 10.17487/RFC8083, March 2017.

[RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

[RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.

[RFC8216]  Pantos, R., Ed. and W. May, "HTTP Live Streaming", RFC 8216, DOI 10.17487/RFC8216, August 2017.

[RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018.

[RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

[RFC8622]  Bless, R., "A Lower-Effort Per-Hop Behavior (LE PHB) for Differentiated Services", RFC 8622, DOI 10.17487/RFC8622, June 2019.

[RFC8723]  Jennings, C., Jones, P., Barnes, R., and A.B. Roach, "Double Encryption Procedures for the Secure Real-Time Transport Protocol (SRTP)", RFC 8723, DOI 10.17487/RFC8723, April 2020.

[RFC8825]  Alvestrand, H., "Overview: Real-Time Protocols for Browser-Based Applications", RFC 8825, DOI 10.17487/RFC8825, January 2021.

[RFC8999]  Thomson, M., "Version-Independent Properties of QUIC", RFC 8999, DOI 10.17487/RFC8999, May 2021.

[RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, DOI 10.17487/RFC9000, May 2021.

[RFC9001]  Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021.

[RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, May 2021.

[SFRAME]  "Secure Media Frames Working Group (Home Page)", n.d.

[SRT]  Sharabayko, M., "Secure Reliable Transport (SRT) Protocol Overview", 15 April 2020.

[tsvarea-105]  "TSVAREA Minutes - IETF 105", July 2019.

Authors' Addresses

Jake Holland
Akamai Technologies, Inc.
150 Broadway
Cambridge, MA 02144
United States of America

Email: jakeholland.net@gmail.com

Ali Begen
Networked Media
Turkey

Email: ali.begen@networked.media

Spencer Dawkins
Tencent America LLC
United States of America

Email: spencerdawkins.ietf@gmail.com