Independent Submission                                           L. Dong
Internet-Draft                                              K. Makhijani
Intended status: Informational                                     R. Li
Expires: December 18, 2021                   Futurewei Technologies Inc.
                                                           June 16, 2021

 A Use Case of Packets' Significance Difference with Media Scalability
             draft-dong-usecase-packet-significance-diff-00

Abstract

   This document introduces a use case of packets' significance
   difference embedded with media scalability. With the dominance of
   video traffic on the Internet, selectively dropping packets or parts
   of packets from competing media streams becomes a complementary
   mechanism when dealing with network congestion.

   The document describes the characteristics of media scalability and
   some limitations of existing end-to-end congestion control mechanisms
   based on rate control and adaptation. It explains why the current
   practice of dropping entire packets at the traffic class level using
   in-network active queue management is not the most appropriate way to
   meet end users' Quality of Service expectations. The document
   identifies that a "significance difference" exists among packets, or
   even among parts of the packets within a flow, and brings out a new
   set of requirements for applications and networks to support packet
   significance difference in order to improve the Quality of Experience
   of end users.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 18, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terms and Abbreviations
   3.  Media Scalability and Congestion Control
   4.  Packet Dropping
   5.  Significance Difference Among Packets and Within Packets
   6.  New Requirements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

   A Cisco forecast [CiscoNetworkingIndex] projects that IP video
   traffic will be 82 percent of all consumer Internet traffic globally
   by 2021, up from 73 percent in 2016. Live video is projected to grow
   15-fold from 2016 to 2021 and to account for 13 percent of Internet
   video traffic by 2021. VR (Virtual Reality) and AR (Augmented
   Reality) traffic is projected to increase 20-fold between 2016 and
   2021, at a CAGR (Compound Annual Growth Rate) of 82 percent. With
   the rapid growth of multimedia streaming traffic, it is increasingly
   likely that multiple streaming flows share a bottleneck link, which
   would inevitably cause network congestion. Today's transport
   protocols and Internet protocols are oblivious to multimedia
   streaming applications and to end users' QoE (Quality of Experience)
   expectations. From the perspective of user experience and user
   expectation, the following two observations can be made.

   o  It is very likely that a user would prefer to receive the media
      content at a somewhat degraded quality that is above the
      tolerance threshold rather than getting nothing at all for a few
      seconds.

   o  A user may be particularly interested in a certain group of blocks
      belonging to the objects of interest in the media content (i.e.,
      Region of Interest, RoI). It is necessary to prevent the RoI
      blocks from being lost during transmission.

   This document first discusses the different types of scalability in
   current video codecs, which facilitate the rate control and
   adaptation mechanisms applied to video segments when dealing with
   network congestion during media streaming. It is acknowledged that
   such mechanisms have effectively improved users' QoE. However,
   packets on the wire can still be dropped entirely when bottleneck
   network nodes cannot retain them because of buffer overflow during
   congestion. Thanks to the scalability characteristics designed into
   video codecs, the significance of different packets within a media
   streaming flow, or even of different parts of a single packet, can
   vary in their usefulness for decoding and recovering the media
   content to meet the receiver's expectation. The document highlights
   the requirements for making the network aware of the user's
   preference and the application context to help further improve the
   QoE of media streaming. Accordingly, the network could treat
   packets, or different parts of packets, according to the
   characteristics of the packets and end users' preferences.

2.  Terms and Abbreviations

   The terms and abbreviations used in this document are listed below.

   o  AR: Augmented Reality

   o  CAGR: Compound Annual Growth Rate

   o  DASH: Dynamic Adaptive Streaming over HTTP

   o  GOP: Group of Pictures

   o  HAS: HTTP Adaptive Streaming

   o  HTTP: Hypertext Transfer Protocol

   o  QoE: Quality of Experience

   o  QoS: Quality of Service

   o  RoI: Region of Interest

   o  SNR: Signal-to-Noise Ratio

   o  SVC: Scalable Video Coding

   o  VR: Virtual Reality

   The above terminology is defined in greater detail in the remainder
   of this document.

3.  Media Scalability and Congestion Control

   A visual scene is represented in digital form by sampling the real
   scene spatially on a rectangular grid in the video image plane and
   sampling temporally at regular time intervals as a sequence of still
   frames. Correspondingly, modern media codecs [Conklin2001] [Kim2001]
   incorporate three types of "scalability": temporal scalability,
   spatial scalability, and quality scalability. These adapt the media
   bitstream by adding or removing portions of it in order to match the
   different needs or preferences of end users as well as the network
   conditions.

   Temporal scalability refers to scalability designed to allow the
   frame rate of the video bitstream to be varied using interlayer
   prediction. Spatial scalability represents the spatial resolution
   variations with respect to the original image frame. The lower layer
   provides the basic spatial resolution. The enhancement layer employs
   the spatially interpolated lower layers and constructs the source
   video in its full spatial resolution. Quality scalability is also
   commonly referred to as fidelity or SNR (Signal-to-Noise Ratio)
   scalability. Each spatial layer could have many quality layers. For
   example, SVC (Scalable Video Coding) [SVC] is an H.264 [H.264]
   extension that divides a single video bitstream into multiple
   representations or layers.
   This hierarchical layered structure comprises a base layer and one or
   more enhancement layers. The media may be scaled up by adding
   enhancement layers or scaled down by dropping them. The levels of
   scalability included in the media stream affect the quality of the
   media presented on end users' devices.

   Bursty loss and longer-than-expected delay have a catastrophic effect
   on end users' QoE in media streaming. They are usually caused by
   network congestion. Many congestion control mechanisms have been
   developed in the community over the decades [Saadi2019] [Adams2013],
   but they often target different goals, e.g., improving link
   utilization, reducing loss, or enhancing fairness. For media
   streaming, rate control and media adaptation methods leverage the
   flexibility and variety of media qualities provided by the different
   types of media scalability to reduce the likelihood of network
   congestion.

   Existing rate control and adaptation methods [Bentaleb2019] [Wu2001]
   can be source-side or receiver-side, carried out at servers and end
   devices, respectively.

   o  In source-based schemes [Wu2000], the source regulates the sending
      rate to keep the packet loss ratio below a threshold by using
      feedback from probing experiments, or it determines the sending
      rate through a TCP-friendly model. However, some constraints
      exist: media codecs can usually adjust their output rates only in
      a much more coarse-grained fashion than, for example, TCP. Users'
      QoE would also suffer if encoding rates were switched too
      frequently.

   o  HTTP (Hypertext Transfer Protocol)-based dynamic video adaptation
      methods [Kua2017] could be driven by the source.
      The server collects feedback from the network and the client
      (e.g., dynamic variation of network bandwidth and the receiving
      buffer capacity of the client) and adapts the streamed video
      quality accordingly. On the other hand, adaptation techniques
      have also been proposed at the receiver side, which mainly use
      DASH (Dynamic Adaptive Streaming over HTTP) [MPEG-DASH-SAND]
      [MPEG-DASH] and HAS (HTTP Adaptive Streaming) to stream adapted
      video data.

   o  Receiver-based rate control [McCanne1996] is typically used in
      multicasting scalable media content, which is split into multiple
      layers, with each layer corresponding to one channel in the
      multicast tree. Receivers can regulate their own receiving rates
      by adding or dropping channels. Thus, receiver-based rate control
      has limited use in unicast. All these techniques consider full
      quality while streaming from sender to receivers; hence, they
      consume more resources in the network.

4.  Packet Dropping

   Acknowledging the benefits offered by various congestion control and
   congestion avoidance mechanisms, we would like to point out that
   feedback and rate adaptation might not be prompt enough to cope with
   the dropping of packets on the wire.

   In the current Internet, a packet is treated as the minimal,
   independent, and self-sufficient unit that gets classified,
   forwarded, or dropped completely by a network node, according to the
   local configuration and congestion condition. Although congestion
   discard can be mitigated by a mixture of ingress traffic shaping and
   active queue management mechanisms [Thiruchelvi2008] [Adams2013] to
   avoid overdrawing network resources, this approach is not feasible
   to deploy on a large scale, and it wastes network resources by
   provisioning for the worst possible scenario.

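
   The whole-packet behavior described above can be illustrated with a
   minimal sketch of a tail-drop queue. This is not taken from any
   particular router implementation; the class names, the byte-based
   capacity, and the payload_kind field are assumptions introduced only
   for illustration. The point is that the queue never inspects what a
   packet carries, so an arriving packet is either stored whole or
   discarded whole.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: str        # hypothetical flow label
    size: int           # bytes on the wire
    payload_kind: str   # e.g. "I", "P", "B"; never inspected by the queue


class TailDropQueue:
    """Fixed-capacity FIFO: a packet is enqueued whole or dropped whole."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.queue = deque()
        self.dropped = []

    def enqueue(self, pkt: Packet) -> bool:
        # The decision depends only on buffer occupancy, not on content.
        if self.used_bytes + pkt.size > self.capacity_bytes:
            self.dropped.append(pkt)
            return False
        self.queue.append(pkt)
        self.used_bytes += pkt.size
        return True


# Three equally sized packets arrive at a buffer that holds only two.
q = TailDropQueue(capacity_bytes=3000)
arrivals = [
    Packet("a", 1500, "B"),
    Packet("b", 1500, "P"),
    Packet("a", 1500, "I"),
]
results = [q.enqueue(p) for p in arrivals]
```

   In this sketch the last arrival is the one discarded, even though it
   carries I-frame data, because arrival order and buffer occupancy are
   the only inputs to the decision.
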
   DiffServ [RFC2475] is used to manage resources such as bandwidth and
   queuing buffers on a per-hop basis between different classes of
   traffic. Internet traffic may be separated into different classes
   with differentiated priorities. This allows preferential treatment
   for latency-sensitive or loss-sensitive traffic over more tolerant
   applications, for example those that can afford retransmission.
   However, with video traffic dominating Internet traffic, flows of
   media streaming applications in the same class still compete for
   network resources at bottleneck links during congestion. A
   preference decided by traffic class is therefore not effective in
   eliminating degraded service levels or packet drops caused by such
   flows colliding with each other.

   Routers treat every bit and byte in the packet payload equally, which
   means every bit and byte has the same significance to the routers.
   Each to-be-dropped packet is discarded completely. If the transport
   layer protocol is TCP, the sender retransmits the dropped packet
   after a timeout or after receiving duplicate acknowledgements, until
   the maximum number of retransmissions is reached. Retransmission of
   packets wastes network resources, reduces the overall throughput of
   the connection, and causes longer latency for packet delivery.
   According to subjective opinions of the effects of packet loss on
   media quality reported in [RFC8836], a loss rate of 1% is tolerable
   to users, while a loss rate of 3% is intolerable to most users, who
   found the quality to be annoying (or worse). Therefore, the current
   way of handling network congestion, by discarding packets entirely
   and retransmitting them without regard to application context, is
   not well suited to media streaming.

5.  Significance Difference Among Packets and Within Packets

   With the various types of scalability implemented in media codecs,
   some bits of an encoded media stream are more important than others.
   Bits belonging to the base layer are usually more significant to the
   decoder than bits belonging to enhancement layers. For example,
   I-frames hold complete picture data [Orosz2015] and are frequently
   referenced by subsequent frames. They are inserted by the encoder
   when the scene changes. Losing the first I-frame in the GOP (Group
   of Pictures) could cause the video picture to be missing for a few
   seconds, because the P- and B-frames referencing the I-frame could
   be neither decoded nor displayed. Thus, I-frames are the most
   essential frames in the media stream: they have the greatest effect
   on perceived video quality, and that effect can last through the
   whole GOP. P- and B-frames are inserted at appropriate places to
   reduce the video size or bitrate and are tuned to maintain a certain
   video quality level. P-frame stands for Predicted Frame and allows
   macroblocks to be compressed using temporal prediction in addition
   to spatial prediction. Video scenes with a low level of movement are
   less sensitive to both B-frame and P-frame packet loss, whereas
   video scenes with a high level of movement are more sensitive to
   both. A lost P-frame can impact the remaining part of the GOP. A
   lost B-frame has only local effects in slowly moving content or
   content with a large static background. In a scene with dynamically
   moving content, losing a B-frame has a more dramatic impact, which
   can be as far-reaching as a P-frame loss.

   As another example, macroblocks that are identified as representing
   the objects in the RoI are likely more important than macroblocks of
   non-RoI regions.
   Packets carrying RoI macroblocks in the media stream need a higher
   priority for retention than packets carrying non-RoI macroblocks.

   On the other hand, suppose that end users can reveal their
   preferences to the network, e.g., their degree of tolerance to
   quality degradation of the decoded media content, which might
   manifest visually as reduced resolution or missing objects in
   non-RoI regions. The network could then selectively drop packets in
   a differentiated manner according to such information. This avoids
   retransmission or delay of the packets with higher significance,
   reduces the end-to-end latency experienced by end users, and
   maintains continuous streaming of the media. This is achieved at the
   cost of dropping lower-significance packets.

6.  New Requirements

   We have discussed in the previous sections that, due to the various
   types of scalability implemented in media codecs, a "significance
   difference" exists among packets or even among parts of the packets.
   In other words, packets containing the more important macroblocks
   (e.g., RoI macroblocks or base layer macroblocks) have higher
   significance than other packets for media decoding at the receiver
   side and for improving the QoE of end users. In order for the
   network to be able to treat the packets of media streams in a
   differentiated manner and at a finer granularity than DiffServ, the
   application shall reveal some information to the network to enable
   selective packet dropping or partial packet dropping. Some examples
   are listed below:

   o  The receiving end user's preference on media quality, e.g., the
      tolerable quality degradation with regard to resolution.

   o  Labeling of the packets, or of some parts of the packets, that
      correspond to the receiver's objects of interest as RoI.

   o  Characteristics of the media content contained in the packets,
      e.g., frame type and movement level.

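
   As an illustration only, the kinds of information listed above could
   be modeled as per-packet metadata plus a scoring function that a
   congested node uses to order drops. Nothing below is defined by this
   document: the PacketInfo fields, the score weights, and the
   min_keep_score threshold (standing in for the receiver's tolerance)
   are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class FrameType(IntEnum):
    B = 0
    P = 1
    I = 2


@dataclass
class PacketInfo:
    seq: int                    # arrival order
    size: int                   # bytes
    frame_type: FrameType       # characteristic of the media content
    is_roi: bool = False        # label: carries Region-of-Interest data
    high_motion: bool = False   # characteristic: movement level


def significance(info: PacketInfo) -> int:
    """Hypothetical score; the weights are arbitrary illustrations."""
    score = int(info.frame_type) * 2          # I=4, P=2, B=0
    if info.is_roi:
        score += 3
    if info.high_motion and info.frame_type is not FrameType.I:
        score += 1                            # P/B loss hurts more here
    return score


def select_drops(queue, bytes_to_free, min_keep_score):
    """Choose lowest-significance packets until enough bytes are freed.

    Packets scoring at or above min_keep_score are never chosen.
    """
    freed = 0
    drops = []
    for info in sorted(queue, key=significance):  # stable: ties by arrival
        if freed >= bytes_to_free or significance(info) >= min_keep_score:
            break
        drops.append(info)
        freed += info.size
    return drops


queue = [
    PacketInfo(1, 1200, FrameType.I),
    PacketInfo(2, 1200, FrameType.P, is_roi=True),
    PacketInfo(3, 1200, FrameType.B),
    PacketInfo(4, 1200, FrameType.B, high_motion=True),
    PacketInfo(5, 1200, FrameType.P),
]
drops = select_drops(queue, bytes_to_free=2400, min_keep_score=4)
dropped_seqs = [p.seq for p in drops]
```

   With these illustrative weights, the two B-frame packets are chosen
   first, and the I-frame and RoI packets are protected by the
   threshold even if more space is requested than can be freed.
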
   Correspondingly, the network shall be able to leverage the above
   information revealed by the application, and to selectively drop
   packets or parts of packets from competing media streaming flows in
   order of precedence when network congestion happens. Retransmissions
   could thereby be eliminated as far as possible, and the receiving end
   user would be able to consume as many of the delivered packets as
   possible, in time and with acceptable quality.

7.  IANA Considerations

   This document requires no actions from IANA.

8.  Security Considerations

   This document introduces no new security issues.

9.  Acknowledgements

10.  Informative References

   [Adams2013]
              Adams, R., "Active Queue Management: A Survey", IEEE
              Communications Surveys and Tutorials, vol. 15, no. 3, pp.
              1425-1476, 2013.

   [Bentaleb2019]
              Bentaleb, A., Taani, B., Begen, A. C., Timmerer, C., and
              R. Zimmermann, "A Survey on Bitrate Adaptation Schemes for
              Streaming Media Over HTTP", IEEE Communications Surveys
              and Tutorials, vol. 21, no. 1, pp. 562-585, 2019.

   [CiscoNetworkingIndex]
              Cisco, "Cisco Visual Networking Index: Forecast and
              Methodology, 2016 to 2021", June 2017.

   [Conklin2001]
              Conklin, G. J., Greenbaum, G. S., Lillevold, K. O.,
              Lippman, A. F., and Y. A. Reznik, "Video Coding for
              Streaming Media Delivery on the Internet", IEEE
              Transactions on Circuits and Systems for Video
              Technology, vol. 11, no. 3, pp. 269-281, 2001.

   [H.264]    ITU-T, "H.264: Advanced Video Coding for Generic
              Audiovisual Services", 2019.

   [Kim2001]  Kim, T., "Scalable video Streaming Over Internet", Ph.D.
              Thesis, School of Electrical and Computer Engineering,
              Georgia Institute of Technology, January 2005.

   [Kua2017]  Kua, J., Armitage, G., and P. Branch, "A Survey of Rate
              Adaptation Techniques for Dynamic Adaptive Streaming Over
              HTTP", IEEE Communications Surveys and Tutorials, vol. 19,
              no.
              3, pp. 1842-1866, 2017.

   [McCanne1996]
              McCanne, S., Jacobson, V., and M. Vetterli, "Receiver-
              Driven Layered Multicast", ACM Sigcomm, pp. 117-130,
              1996.

   [MPEG-DASH]
              ISO/IEC, "23009-1:2019, Dynamic Adaptive Streaming over
              HTTP (DASH) - Part 1: Media Presentation Description and
              Segment Formats", 2019.

   [MPEG-DASH-SAND]
              ISO/IEC, "23009-5:2017, Dynamic Adaptive Streaming over
              HTTP (DASH) - Part 5: Server and Network Assisted DASH
              (SAND)", February 2017.

   [Orosz2015]
              Orosz, P., Skopko, T., and P. Varga, "Towards Estimating
              Video QoE Based on Frame Loss Statistics of the Video
              Streams", IFIP/IEEE International Symposium on Integrated
              Network Management (IM), pp. 1282-1285,
              DOI 10.1109/INM.2015.7140482, 2015.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC8836]  Jesup, R. and Z. Sarker, "Congestion Control Requirements
              for Interactive Real-Time Media", RFC 8836, January 2021.

   [Saadi2019]
              Al-Saadi, R., Armitage, G., But, J., and P. Branch, "A
              Survey of Delay-Based and Hybrid TCP Congestion Control
              Algorithms", IEEE Communications Surveys and Tutorials,
              vol. 21, no. 4, pp. 3609-3638, 2019.

   [SVC]      Schwarz, H., Marpe, D., and T. Wiegand, "Overview of the
              Scalable Video Coding Extension of the H.264/AVC
              Standard", IEEE Transactions on Circuits and Systems for
              Video Technology, vol. 17, no. 9, pp. 1103-1120, 2007.

   [Thiruchelvi2008]
              Thiruchelvi, G. and J. Raja, "A Survey On Active Queue
              Management Mechanisms", International Journal of Computer
              Science and Network Security, vol. 8, 2008.

   [Wu2000]   Wu, D., Hou, Y., and Y. Zhang, "Transporting Real-Time
              Video Over the Internet: Challenges and approaches",
              Proceedings of the IEEE, vol. 88, no. 12, pp. 1855-1875,
              2000.

   [Wu2001]   Wu, D., Hou, Y., Zhu, W., Zhang, Y., and J. Peha,
              "Streaming Video Over the Internet: Approaches and
              Directions", IEEE Transactions on Circuits and Systems for
              Video Technology, vol. 11, no. 3, pp. 282-300, 2001.

Authors' Addresses

   Lijun Dong
   Futurewei Technologies Inc.
   U.S.A

   Email: lijun.dong@futurewei.com

   Kiran Makhijani
   Futurewei Technologies Inc.
   U.S.A

   Email: kiran.ietf@gmail.com

   Richard Li
   Futurewei Technologies Inc.
   U.S.A

   Email: richard.li@futurewei.com