ALTO                                                        W. Huang
Internet Draft                                              Y. Zhang
Intended status: Proposed Standard                           Tencent
Expires: January 2021                                        R. Yang
                                                     Yale University
                                                            C. Xiong
                                                              Y. Lei
                                                              Y. Han
                                                             Tencent
                                                               G. Li
                                                                CMRI
                                                       July 13, 2020


                MoWIE for Network Aware Application
          draft-huang-alto-mowie-for-network-aware-app-01

Abstract

With the rapid deployment of 5G networks around the world, cloud-based interactive services such as cloud gaming have gained substantial attention and are regarded as potential killer applications. To ensure users' quality of experience (QoE), a cloud interactive service may require not only high bandwidth (e.g., for high-resolution media transmission) but also low delay (e.g., low latency and low lagging). However, the bandwidth and delay experienced by a mobile and wireless user can be dynamic, as a function of many factors, and unhandled changes can substantially compromise users' QoE. In this document, we investigate network-aware applications (NAA), which realize cloud-based interactive services with improved QoE by efficiently utilizing Mobile and Wireless Information Exposure (MoWIE).
In particular, this document demonstrates, through realistic evaluations, that mobile network information such as the MCS (Modulation and Coding Scheme) can effectively expose the dynamics of the underlying network and can be made available to applications through MoWIE; using such information, applications can then adapt key control knobs such as the media codec scheme, encapsulation and application logical functions to minimize QoE degradation. Based on the evaluations, we discuss how MoWIE can be a systematic extension of the ALTO protocol to expose more lower-layer and finer-grained network dynamics.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at https://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at https://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction of Network-aware Applications
   2. Use Cases of Network-Aware Application (NAA)
      2.1. Cloud Gaming
      2.2. Low Delay Live Show
      2.3. Cloud VR
      2.4. Performance Requirements of these Use Cases
   3. Current (Indirect) Technologies on NAA
      3.1. Video Compression Based on ROI (Region of Interest)
      3.2. AI-based Adaptive Bitrate
   4. Preliminary QoE Improvement Based on MoWIE
      4.1. MoWIE Architecture and Network Information Exposure
      4.2. RAN-Assisted TCP Optimization Based on MoWIE
      4.3. NAA QoE Test Based on MoWIE
      4.4. ROI Detection with Network Information
      4.5. Adaptive Bitrate with Network Capability Exposure
      4.6. Analysis of the Experiments
   5. Standardization Considerations of MoWIE as an Extension to ALTO
   6. Security Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   Authors' Addresses

1. Introduction of Network-aware Applications

With the rapid and wide deployment of 5G networks around the world, more and more applications are moving to the cloud, e.g., cloud office, cloud education and cloud gaming.

Some new applications, e.g., cloud AR/VR/MR, are created and hosted in the remote cloud. Moreover, many traditionally niche interactive applications, e.g., cloud video conferencing, are becoming widely used in daily business with the help of mobile networks and the cloud. In particular, during the coronavirus pandemic in 2020, many people had to stay at home and work or study remotely, and the usage of cloud applications, including cloud-based online courses, cloud-based conferencing and cloud gaming, surged significantly.

To provide acceptable QoE to end users via the mobile network, a cloud application needs to know the mobile network status, e.g., delay, bandwidth and jitter, to dynamically balance the generated media traffic and the rendering/mixing in the cloud. Currently, the application treats the network as a black box, continuously uses client or server measurements to detect the network characteristics, and then adaptively changes its parameters as well as its logical functions. However, when only application information is used, the application cannot guarantee good QoE in some cases. First, information obtained at the application side may be delayed. When a user enters a place with a bad network, such as an elevator or an underground garage, the application will not learn of it immediately. As a result, the buffer of a video application is likely to run out; the screen then freezes and the user's QoE is harmed. Besides, the application has no information about other users in the cell. Thus, it cannot know how many resources it can get and when this will change.
If other users enter the cell and compete for resources, the application may misjudge the available resources and request a high bitrate; the delay then increases and QoE drops. Some information from the network layer, such as physical resource block (PRB) information and utilization rate, can help describe how many resources the user will get and how many users are competing with it. Such information is helpful for predicting network conditions and adapting video streaming. However, applications cannot obtain this kind of information yet.

The mobile network industry has long pursued standard solutions for exposing dynamic network indicators that can be used by applications. In 3GPP, many IP-based QoE-related mechanisms are reused. ECN [RFC3168] has been supported by the 4G radio station (eNB) to provide CE (Congestion Experienced) information to IMS applications to perform Adaptive Bitrate (ABR) [TS26.114]. The application can downgrade the bitrate after receiving the CE indication, but does not know the exact bitrate to select. DSCP [RFC2474] is used to differentiate the QoS class and paging strategy [TS23.501]; normally, the application cannot dynamically change the DSCP to improve the bitrate based on the network status. DASH [MPEG DASH] is an MPEG standard widely used by applications to estimate network throughput from the current throughput and buffer state, and to adaptively select a suitable bitrate for the next segment of the video stream in order to avoid re-buffering. SAND-DASH [TS26.247] defines a mechanism by which the network or server can provide the available throughput to the application, so that the DASH application can select a better bitrate.
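The throughput-driven selection that a DASH client performs, and the benefit of a network-provided throughput value as in SAND-DASH, can be illustrated with a minimal Python sketch. The bitrate ladder, the harmonic-mean estimator, the 0.8 safety factor and the 2-second low-buffer rule are assumptions chosen for illustration and are not taken from [MPEG DASH] or [TS26.247]; `network_hint_kbps` is a hypothetical stand-in for whatever value a SAND element would actually carry.

```python
from statistics import harmonic_mean
from typing import List, Optional

# Hypothetical bitrate ladder in kbit/s; real services define their own.
LADDER_KBPS = [1000, 2500, 5000, 8000, 16000]


def estimate_throughput_kbps(samples_kbps: List[float]) -> float:
    """Client-side estimate: harmonic mean of recent segment throughputs.

    The harmonic mean is dominated by the low samples, so a single fast
    download does not inflate the estimate.
    """
    return harmonic_mean(samples_kbps)


def select_bitrate_kbps(samples_kbps: List[float], buffer_s: float,
                        network_hint_kbps: Optional[float] = None,
                        safety: float = 0.8,
                        low_buffer_s: float = 2.0) -> int:
    """Return the highest ladder rate that fits the throughput budget.

    network_hint_kbps models an available-throughput value supplied by the
    network or server (the SAND idea); when present it replaces the
    client-side estimate.
    """
    if buffer_s < low_buffer_s:
        # Buffer almost empty: fall back to the lowest rate to avoid a stall.
        return LADDER_KBPS[0]
    if network_hint_kbps is not None:
        estimate = network_hint_kbps
    else:
        estimate = estimate_throughput_kbps(samples_kbps)
    budget = safety * estimate
    fitting = [rate for rate in LADDER_KBPS if rate <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]
```

With a throughput history of 20 Mbit/s and 5 Mbit/s, the harmonic mean is 8 Mbit/s and the sketch picks 5000 kbit/s; if the network instead reports 25 Mbit/s available, it picks 16000 kbit/s. The client-only path has no way to learn that headroom, which is the gap network-provided information is meant to close.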
In 5G cellular networks, network capability exposure has been specified, which allows the 5G system to expose QoS Flow establishment with AF-provided QoS requirements, user device location and network status to third-party application servers modeled as an AF (Application Function) [TS23.501]. In this case, the AF can request the 5G system to establish a dedicated QoS Flow to transport an IP flow with AF-provided QoS requirements. The 5G system can also provide QNC (QoS Notification Control) to the AF if the GBR (Guaranteed Bitrate) of an established GBR QoS Flow cannot be fulfilled, and the AF can change the bitrate after receiving the QNC notification. However, the AF still does not know which bitrate to select. Therefore, 5G enhances QNC with a list of AQPs (Alternative QoS Profiles): together with the QNC, the 5G network indicates the subset of AQPs it currently supports, and the AF selects a bitrate from among them, so that the GBR can be fulfilled again after the user's radio condition changes. QoS prediction is realized by a network function inside the 5GC, which collects and analyzes status and parameters from 5G network entities and delivers the analytics results to entities such as the application server. However, both the network capability exposure and the QoS prediction solutions are designed for the 5G access and core network and cannot cover the whole end-to-end network. How to enable applications to be aware of the lower-layer networks in Internet scenarios is an important area for both industrial and academic researchers.

2. Use Cases of Network-Aware Application (NAA)

Three typical NAAs whose QoE can be greatly enhanced with the help of MoWIE are cloud gaming, low delay live show and cloud VR.

2.1. Cloud Gaming

As mentioned above, cloud gaming has become more and more popular recently.
This kind of game requires low-latency and highly reliable transmission of motion tracking data from the user to the gaming server in the cloud, as well as low-latency and high-data-rate transmission of processed visual content from the gaming server to the user devices. Cloud gaming is regarded as a major killer application as well as a major traffic contributor to wireless and cellular networks, including 5G. The major advantages of cloud gaming are quick and easy starting (little or no need to download and install large software packages on the user device) and lower cost and processing load on the user device; it is also regarded as an anti-cheating measure. Thus, this kind of gaming is becoming a competitive replacement for console gaming, using cheaper PCs or laptops. In order to support high-quality cloud gaming services, the application needs to obtain information from the network layer, e.g., the data rate value or range that the lower layers can provide, in order to perform rendering and encoding; the application in the cloud can then adopt different parameters to adjust the size of the visual content produced within a time period.

2.2. Low Delay Live Show

In 2019, over 500 million active users were using online personal live show services in China, and a single celebrity's show could attract 4 million simultaneous online viewers. Low delay live show requires close interaction between the application and the network.

Compared with conventional broadcast services, this service is interactive, which means the audience can be involved and is able to provide feedback to the host. For example, a gaming show broadcasts the game play to the whole audience, and it also requires game interaction between the host and the audience. A delay lower than 100 ms is desired.
If the delay is too large, the user experience degrades undesirably, especially in a large-scale show. To lower the latency and provide size-adjustable show content, the application also requires real-time lower-layer information.

2.3. Cloud VR

The data volume of cloud VR is large and depends on parameter settings such as the DoF (Degree of Freedom), the resolution and the adopted rendering and compression algorithms. Rendering can be performed at the cloud/network side or split between the cloud and the user device. Because cloud VR may require latency as low as 20 ms, the application may need to interact with the network to obtain segmentation or transport block information, and this lower-layer information may depend on the specific layer 2 and layer 3 wireless protocol designs.

2.4. Performance Requirements of these Use Cases

The above services have different bandwidth, latency and lagging requirements, which are characterized as parameter ranges. Ranges are used because these requirements depend on a group of parameter settings, including resolution, frame rate and compression mechanism. We consider 1080p~4K as the resolution range, 60~120 FPS (frames per second) as the frame rate range and H.265 as an example compression algorithm. The end-to-end latency requirement is related not only to the FPS but also to the property of the service, i.e., whether it is a weakly or strongly interactive service [GSMA].

With typical parameter settings, cloud gaming generally needs a bandwidth of 20~60 Mbps, and we consider that lagging becomes significant when the latency exceeds 40~200 ms, depending on the type of game (e.g., 40 ms for First Person Shooter games, 80 ms for Action games, and 200 ms for Puzzle games).
In order to avoid bad user experiences, the lagging rate should ideally be zero (for optimal QoE). For low latency live show, 20~50 Mbps bandwidth may be needed and the end-to-end latency requirement is less than 100 ms. Cloud VR service generally requires 100~500 Mbps bandwidth and 20~50 ms end-to-end latency. Note that these values depend on the parameter settings; they are provided to illustrate the order of magnitude of these parameters for the aforementioned use cases, and these value ranges may be updated according to specific scenarios and requirements.

3. Current (Indirect) Technologies on NAA

Applications have tried to improve QoE by inferring network dynamics from information captured at the application layer, such as bitrate, buffer status and packet loss rate.

For example, adaptive bitrate (ABR) and buffer control methods have been proposed to reduce delay, and application layer forward error correction (AL-FEC) has been proposed to recover from packet loss. This document focuses on two novel approaches that have achieved good performance in practice: video encoding based on ROI, and reinforcement learning based adaptive bitrate.

3.1. Video Compression Based on ROI (Region of Interest)

The foveated mechanism [Saccadic] of the Human Visual System indicates that only a small fovea region, seen at high resolution, captures most visual attention, while other peripheral regions, seen at low resolution, receive little attention. The regions that attract users most are called regions of interest (ROI) [Fahad].

To predict human attention or ROI, saliency detection has been widely studied in recent years [Borji], with many applications in object recognition, object segmentation, action recognition, image captioning, image/video compression, etc.
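Once a saliency detector has produced a per-block saliency score, turning it into an ROI-aware encoding can be sketched as a per-block quantization parameter (QP) map: lower QP (higher quality) inside the ROI and higher QP elsewhere. This is a minimal sketch assuming a saliency map normalized to [0, 1]; the 0.5 threshold and the -4/+4 QP offsets are hypothetical values, and the saliency detector itself is out of scope.

```python
from typing import List

Grid = List[List[float]]


def roi_mask(saliency: Grid, threshold: float = 0.5) -> List[List[bool]]:
    """Mark a block as ROI when its saliency score reaches the threshold."""
    return [[score >= threshold for score in row] for row in saliency]


def roi_qp_map(saliency: Grid, base_qp: int = 32, threshold: float = 0.5,
               roi_offset: int = -4, background_offset: int = 4) -> List[List[int]]:
    """Per-block QP: lower QP (higher quality) in the ROI, higher outside.

    QP is clamped to 0..51, the range H.264/H.265 use for 8-bit video.
    """
    qp_map = []
    for row in roi_mask(saliency, threshold):
        qp_row = []
        for in_roi in row:
            offset = roi_offset if in_roi else background_offset
            qp_row.append(max(0, min(51, base_qp + offset)))
        qp_map.append(qp_row)
    return qp_map


def roi_fraction(saliency: Grid, threshold: float = 0.5) -> float:
    """Share of blocks classified as ROI; drives how much rate is saved."""
    flags = [flag for row in roi_mask(saliency, threshold) for flag in row]
    return sum(flags) / len(flags)
```

For a 2x2 map `[[0.9, 0.2], [0.1, 0.6]]`, half of the blocks are ROI and receive QP 28 while the other half receive QP 36. The smaller the ROI fraction, the more bits the background offset can save; a less accurate detector shrinks or misplaces the ROI, which is the accuracy/delay trade-off examined in Section 4.4.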
Since a video contains regions of interest, the cloud server can allocate a higher rate to the ROI and a lower rate to other regions. As a result, the overall rate of the video is reduced while the viewing experience is not harmed.

This method detects the ROI and re-allocates the coding scheme for interested and non-interested regions in order to save bandwidth without sacrificing the user's QoE. In recent years, the ever-increasing video size has become a big problem for applications. The data rate of a 1080P cloud gaming video can reach 25 Mbps, which places a heavy burden on the network, even a 5G network. These ROI-based video compression methods are mainly applied in high-concurrency networks to relieve the network burden while keeping QoE in an acceptable range.

However, current methods use application information, such as the application rate and application buffer size, as indicators to roughly adjust the algorithm in interactive video services. Such information can hardly reflect the real-time network status precisely, so it is hard to balance QoE and bandwidth saving in real-time scenarios. More direct information would help these ROI methods improve their performance.

3.2. AI-based Adaptive Bitrate

This method aims to reduce lagging while ensuring acceptable picture quality.

Applications such as video live streaming and cloud gaming employ adaptive bitrate (ABR) algorithms to optimize user QoE [MPC] [CS2P].

Despite the abundance of recently proposed schemes, state-of-the-art ABR algorithms suffer from a key limitation: they use fixed control rules based on simplified or inaccurate models of the deployment environment. As a result, existing schemes inevitably fail to achieve optimal performance across a broad set of network conditions and QoE objectives.
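The QoE objectives that ABR schemes optimize are often written in the ABR literature (e.g., [MPC], [Hongzi]) as a weighted combination of video quality, a rebuffering penalty and a smoothness penalty. The sketch below uses a linear form with q(R) = R (bitrate in Mbit/s) and a rebuffering weight mu = 4.3; both choices are illustrative assumptions, not values prescribed by this document. The per-chunk version is the kind of signal a reinforcement-learning ABR agent can use as its reward.

```python
from typing import Sequence


def linear_qoe(bitrates_mbps: Sequence[float], rebuffer_s: Sequence[float],
               mu: float = 4.3) -> float:
    """Session QoE = sum q(R_n) - mu * sum T_n - sum |q(R_{n+1}) - q(R_n)|.

    q(R) = R, T_n is the stall time while fetching chunk n, and the last
    term penalizes bitrate switches between consecutive chunks.
    """
    quality = sum(bitrates_mbps)
    rebuffer_penalty = mu * sum(rebuffer_s)
    smoothness_penalty = sum(abs(nxt - cur) for cur, nxt
                             in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffer_penalty - smoothness_penalty


def chunk_reward(bitrate_mbps: float, prev_bitrate_mbps: float,
                 rebuffer_s: float, mu: float = 4.3) -> float:
    """Per-chunk form of the same metric, usable as an RL reward."""
    return (bitrate_mbps - mu * rebuffer_s
            - abs(bitrate_mbps - prev_bitrate_mbps))
```

For three chunks at 1, 2 and 2 Mbit/s with a 0.5 s stall on the second chunk, the session QoE is 5 - 2.15 - 1 = 1.85, and the per-chunk rewards (taking the first chunk's own bitrate as its predecessor) sum to the same value. A fixed-rule ABR scheme cannot re-weight these terms per network, whereas a learned policy optimizes them directly.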
A reinforcement learning based ABR algorithm named Pensieve was recently proposed [Hongzi]. Unlike traditional ABR algorithms that use fixed heuristics or inaccurate system models, Pensieve generates ABR algorithms using observations of the resulting performance of past decisions across a large number of video streaming experiments. This allows Pensieve to optimize its policy for different network characteristics and QoE metrics directly from experience. Over a broad set of network conditions and QoE metrics, Pensieve has been shown to outperform existing ABR algorithms by 12%~25%.

For this method and the methods built upon it, it has been shown that any information that reflects performance, such as rate, download time, buffer size or network-level information, is useful to reinforcement learning [Hongzi2]. Since these data reflect network dynamics, they have been used to help applications decide how to change the rate and improve users' QoE.

However, all these data are obtained at the client side or the server side. In reality, it is not easy to obtain such data in an effective and efficient way. Without a standardized approach to acquiring these data, it is difficult to make them usable by different applications in large-scale deployments. Meanwhile, these data reflect the real-time network status, which changes rapidly and randomly and is hard to characterize with a theoretical model.

To summarize, current practices can achieve some improvements by indirectly measuring the network status and reacting in the application.

However, the network status data they use is neither rich, direct nor real-time, and it lacks predictability, especially in mobile and wireless network scenarios; this results in long reaction delays or high QoE fluctuations.

4. Preliminary QoE Improvement Based on MoWIE

4.1. MoWIE Architecture and Network Information Exposure

The fundamental idea of MoWIE is to provide on-demand and periodic network information from the network to applications, helping the service provider perform better policy control to improve user experience.

A possible MoWIE architecture includes three core components: the Client Application, the Mobile Network and the Application Server.

Raw data are first collected from the radio network and the core network; these data are then further processed, and the resulting network information is exposed to the Application Server.

These functions are defined as the Network Information Service (NIS), which can be deployed at the MEC (Mobile Edge Computing) platform. The application server can send an NIS request for UE-level or cell-level information and obtain an NIS response carrying the network information from the mobile network. After user data pre-processing, the application server makes the best use of the network information to perform analytics and directly influence application functions, e.g., bitrate and data amount.

Typically, the network information includes two types of information, as below:

Cell level information:

- The number of downlink PRBs (Physical Resource Blocks) occupied during the sampling period;
- The downlink MAC data rate per cell.

UE level information (without privacy information):

- The uplink SINR (Signal to Interference plus Noise Ratio);
- MCS: the index of the MCS (Modulation and Coding Scheme);
- The number of packets in the PDCP buffer;
- The number of downlink PDCP SDU packets;
- The number of PDCP SDU packets lost;
- The downlink MAC data rate per UE.

4.2. RAN-Assisted TCP Optimization Based on MoWIE

RAN information is used to assist the adjustment of the TCP sending window, rather than traditional transport-layer measurement and acknowledgement.
The RAN proactively predicts the available radio bandwidth and the buffer status per UE at RTT-level time granularity (e.g., 100 ms), and then piggybacks this information in TCP ACKs.

We have conducted a trial in a real mobile network. It is observed that the throughput of UEs with good SINR is improved by nearly 100%, and UEs with medium SINR achieve an approximately 50% gain.

4.3. NAA QoE Test Based on MoWIE

Unlike traditional video streaming, cloud gaming has no buffer to accommodate and re-arrange the received data. It must display the stream as soon as it is received, and any late data is of no use to the player. According to our measurements, cloud gaming does not perform well in existing public 4G networks. For a gaming client in Shenzhen connected to a gaming server in Shanghai, the end-to-end delay, including the codec delay, is often greater than 100 ms. Here, the delay is defined as the total delay from the user's operation instruction to the display of the response picture on the user's screen.

Once the network fluctuates, users experience an even longer delay.

The poor user experience is caused not only by the relatively low network throughput, but also by the server's inability to adapt the application logical policies (e.g., codec scheme and data bitrate).

The popularity of 4K and even higher resolutions and increasing FPS for cloud gaming and AR/VR services require both high bandwidth and low latency in wireless and cellular networks. A higher resolution incurs a higher encoding and decoding delay. However, users' tolerance to delay does not increase with the resolution, which means the application needs to adapt to network dynamics more efficiently. The higher the resolution, the larger the range available for rate adaptation.
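The operation-to-display delay defined above can be treated as a budget that the pipeline stages consume. The sketch below is a simplification under stated assumptions: it models only encoding, frame serialization at the available bandwidth, one-way propagation and decoding (no input capture, rendering or queuing), and all stage values are hypothetical. The 80 ms tolerance is the Action game threshold from Section 2.4.

```python
def transmission_delay_ms(frame_kbit: float, bandwidth_mbps: float) -> float:
    """Serialization delay of one frame: kbit / (Mbit/s) = ms."""
    return frame_kbit / bandwidth_mbps


def e2e_delay_ms(frame_kbit: float, bandwidth_mbps: float, encode_ms: float,
                 decode_ms: float, propagation_ms: float) -> float:
    """Operation-to-display delay as a sum of (simplified) pipeline stages."""
    return (encode_ms + transmission_delay_ms(frame_kbit, bandwidth_mbps)
            + propagation_ms + decode_ms)


def remaining_budget_ms(tolerance_ms: float, **stages: float) -> float:
    """Positive: slack the application may spend; negative: it must adapt."""
    return tolerance_ms - e2e_delay_ms(**stages)


if __name__ == "__main__":
    # 200 kbit per frame is roughly a 12 Mbit/s stream at 60 FPS.
    stages = dict(frame_kbit=200, encode_ms=10, decode_ms=5, propagation_ms=30)
    print(remaining_budget_ms(80, bandwidth_mbps=10, **stages))  # 15.0
    print(remaining_budget_ms(80, bandwidth_mbps=4, **stages))   # -15.0
```

The same frame that leaves 15 ms of slack at 10 Mbit/s overshoots the budget by 15 ms when the bandwidth drops to 4 Mbit/s. An application that only sees its own late frames learns this after the fact, whereas timely network information lets it shrink the frame or change its processing before the budget is exceeded.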
In the following subsections, we conduct experiments based on the methods described in Section 3 to improve the QoE of cloud gaming, and compare the performance of network-aware mechanisms with native non-network-aware mechanisms.

4.4. ROI Detection with Network Information

The first experiment is based on ROI detection and investigates the impact of network awareness.

Saliency detection methods have successfully reduced video size and improved users' QoE in video downloading [Saliency].

However, they are not effective when applied to real-time interactive streaming such as cloud gaming.

In general, a more accurate saliency detection algorithm needs more time to obtain its result. However, when users are suffering from poor network performance in cloud gaming, such precise detection adds further delay to the system and thus harms the final QoE.

If the application can learn the network condition well in real time, it can choose the algorithm based on how much delay the system can tolerate. If the network condition is good enough, it can adopt an algorithm with a deeper neural network, and the added delay will not be perceived by end users; thus, it can save a large amount of bandwidth without harming QoE. Conversely, when the network condition is bad, the server can use the fastest method to avoid extra delay.

We conduct experiments to show how network information influences the overall QoE and the bandwidth saving in ROI detection.

The following 4 methods are compared:

1) The original video, without using any ROI method. This acts as a baseline.

2) A quick saliency detection and encoding method, which is inaccurate in some cases. It adds only about 10 ms of delay [Minbarrier].

3) A relatively accurate saliency detection method. In general, a more precise algorithm takes more time to obtain its results.
The complexity of the picture also influences the detection time and accuracy. For our test video, we adopt a method that adds about 40~70 ms of delay [LSTM].

4) The application server in the cloud has current bandwidth information derived from the wireless LAN NIC. In this simulation, all collected bandwidth traces are already known to the server. The server can therefore use the bandwidth traces to compute the transmission delay, select the saliency detection algorithm based on this information, and then encode the video.

Although future bandwidth prediction is not always accurate in real environments, this assumption does not influence the final results much. Since in cloud gaming the server encodes the stream based on ROI information frame by frame rather than chunk by chunk, the bandwidth prediction window does not need to be long. Therefore, even if the server can only obtain a bandwidth or delay prediction for a short time window, it can still use this method with network information.

Test environment:

A 720P game video segment with a rate of 6.8 Mbps is used. This is not a very high bandwidth requirement for cloud gaming; it simply shows how such a stream benefits from MoWIE. Cases with higher bandwidth requirements will benefit more if the bandwidth fluctuates significantly.

The three networks are all wireless networks whose available bandwidth varies frequently:

Network 1: The overall network condition is not very good; the average bandwidth is 7.1 Mbps, but it fluctuates continuously, and the minimum is only 3.9 Mbps.

Network 2: The overall network condition is good, with an average bandwidth of 12 Mbps and a minimum of 6.4 Mbps.
Network 3: The network fluctuates dramatically, with an average bandwidth of 8.4 Mbps and a minimum bandwidth of 3.7 Mbps.

Test content:

Each of the four methods is applied to the original video under each of the three networks. After re-encoding based on the saliency detection, we calculate the new QoE and the saved bandwidth. The results are shown in Figure 4-1, where rows 1 to 4 correspond to methods 1) to 4) above:

The QoE value is the MOS (Mean Opinion Score) as standardized by the ITU-T.

   +---+-----------------+-----------------+-----------------+
   |   |    Network 1    |    Network 2    |    Network 3    |
   +---+---+-------------+---+-------------+---+-------------+
   |   |QoE|  BW Saving  |QoE|  BW Saving  |QoE|  BW Saving  |
   +---+---+-------------+---+-------------+---+-------------+
   | 1 |3.8|     0%      |4.8|     0%      |4.3|     0%      |
   +---+---+-------------+---+-------------+---+-------------+
   | 2 |3.8|     5%      |4.8|     9%      |4.3|     7%      |
   +---+---+-------------+---+-------------+---+-------------+
   | 3 |2.2|    2.1%     |4.6|     38%     |3.1|     34%     |
   +---+---+-------------+---+-------------+---+-------------+
   | 4 |3.6|     9%      |4.7|     33%     |4.3|     25%     |
   +---+---+-------------+---+-------------+---+-------------+

                Figure 4-1: QoE and Bandwidth Saving

Conclusion:

It can be seen that methods 2 and 3, which do not rely directly on network information, have certain limitations.

Although method 2 is simple and fast, it can accurately detect only a small part of the region of interest. Thus, even when the network condition is very good, it saves only a small amount of bandwidth, and its ROI detection is sometimes incorrect; QoE is reduced when the detected region misses the actual ROI.

For method 3, the algorithm is more complex and can correctly detect the user's region of interest, so it can re-allocate the encoding scheme and save a lot of bandwidth. However, the algorithm introduces higher delay.
When the user's network condition is poor, the extra delay makes the QoE even worse. Bandwidth is saved, but the user experience is seriously affected.

Method 4 is based on the application's awareness of the network. If the application knows certain network information, it can balance the complexity of the algorithm (which introduces delay) against its accuracy (which saves bandwidth) according to the actual network conditions. As Figure 4-1 shows, method 4 keeps QoE within 0.2 MOS of method 1 in all three networks while saving 9% to 33% of the bandwidth.

4.5. Adaptive Bitrate with Network Capability Exposure

This experiment evaluates AI-based rate adaptation that uses network information provided by the cellular base station (eNB) in a cellular network.

Tencent has launched real network testing of NAA-enabled cloud gaming in the China Mobile LTE network, with eNB enhancements that support base station information exposure.

To enable the NAA mechanism, cellular network information is collected from eNBs at an adaptive interval based on the change rate of the network status. This information is categorized into two levels: cell level and UE level. Cell-level information is common to all UEs served by an LTE cell, whereas UE-level information is specific to each UE. The 3GPP LTE specifications define how the PDCP (Packet Data Convergence Protocol), RLC (Radio Link Control), MAC (Medium Access Control), and PHY (Physical) protocols operate. This information consists of essential statistics from these protocol layers.
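The two-level structure and the adaptive collection interval described above can be illustrated with the following minimal Python sketch. It models cell-level and UE-level records and derives the next collection interval from the relative change between two consecutive samples. This is not the eNB collection logic used in the test. The class names, the field names (apart from the DLOccupyPRBNum metric and the MCS index mentioned in this draft), and the interval formula with its 100ms/2000ms bounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellLevelInfo:
    # Shared by all UEs served by one LTE cell (hypothetical field names).
    cell_id: str
    dl_occupied_prb_num: int  # cf. the cell-level DLOccupyPRBNum metric
    active_ue_num: int


@dataclass
class UeLevelInfo:
    # Specific to one UE (hypothetical field names).
    ue_id: str
    cell_id: str  # links the UE record to its serving cell's record
    mcs_index: int


def next_collection_interval(prev_value: float, curr_value: float,
                             min_interval_ms: int = 100,
                             max_interval_ms: int = 2000,
                             sensitivity: float = 10.0) -> int:
    """Return the next collection interval in milliseconds.

    The relative change between two consecutive samples stands in for the
    "change rate of network status"; the inverse mapping and the bounds are
    assumptions chosen for illustration.
    """
    if prev_value == 0:
        # No meaningful baseline: collect again as soon as allowed.
        return min_interval_ms
    change_rate = abs(curr_value - prev_value) / abs(prev_value)
    interval = max_interval_ms / (1.0 + sensitivity * change_rate)
    # Fast changes shrink the interval; a stable metric backs off to the max.
    return int(max(min_interval_ms, min(max_interval_ms, interval)))
```

With these placeholder settings, a PRB occupancy that doubles between two samples (50 to 100) yields an interval of 181ms, while an unchanged value yields the 2000ms maximum.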
In the NAA mechanism, the network information comes from the eNB, which has real-time radio link quality statistics and layer 1 and layer 2 operation information. The NAA mechanism can therefore expose rich information to upper layers. For example, it can differentiate between packet loss and congestion, which is very helpful to applications in practice.

To compare the cases with and without NAA, the cloud gaming test environment is set up with 1080p resolution and a bitrate of around 20Mbps.

Test scenarios 1-5 are as follows.

Test scenario 1: Weak network. The radio link quality is low, e.g., in the cell edge area, and the bandwidth is not sufficient to serve cloud gaming.

Test scenario 2: User competition. The number of users is so large that the cellular network bandwidth cannot serve all cloud gaming users.

Test scenarios 3-5: Other scenarios with random user movement traces and user distributions.

Test method: To simplify the comparison, we use only the MCS (Modulation and Coding Scheme) index information derived from the eNB [TS38.214]. This information is provided directly to the application, which then adjusts the bitrate accordingly. The MCS index indicates the modulation (e.g., QPSK, 16QAM, ...) and coding rate used for physical layer transmission, which are related to the actual data rate per UE. The benchmark method uses a constant bitrate without any information to help it predict the network condition. We compare these scenarios and observe the reduction of delay when the eNB data are utilized. For all scenarios, the lagging rate is used as the performance indicator.
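The test method above can be sketched as a simple rule-based controller. The application maps the reported MCS index to an estimated achievable rate and picks the highest encoding bitrate that fits under that estimate with a safety margin. The benchmark keeps a constant bitrate. This hypothetical Python sketch is a simple substitute for the AI-based adaptation used in the experiments, not a description of it. The MCS-to-rate table, the bitrate ladder, and the 0.8 safety factor are placeholders. Real MCS-to-rate relationships also depend on the allocated resources and other parameters (see [TS38.214]).

```python
# Hypothetical calibration of estimated per-UE downlink rate (Mbps) at selected
# MCS indices. These numbers are placeholders, not values from [TS38.214].
HYPOTHETICAL_MCS_TO_MBPS = {0: 2.0, 5: 6.0, 10: 14.0, 15: 24.0, 20: 36.0, 27: 50.0}

# Hypothetical encoder bitrate ladder (Mbps) topping out at the ~20 Mbps
# used in the 1080p test setting.
BITRATE_LADDER_MBPS = [4.0, 8.0, 12.0, 16.0, 20.0]


def estimate_rate_mbps(mcs_index: int) -> float:
    """Use the rate of the closest calibrated MCS index at or below mcs_index."""
    known = [m for m in sorted(HYPOTHETICAL_MCS_TO_MBPS) if m <= mcs_index]
    if not known:
        return HYPOTHETICAL_MCS_TO_MBPS[min(HYPOTHETICAL_MCS_TO_MBPS)]
    return HYPOTHETICAL_MCS_TO_MBPS[known[-1]]


def select_bitrate(mcs_index: int, safety: float = 0.8) -> float:
    """Highest ladder bitrate not exceeding safety * estimated rate."""
    budget = safety * estimate_rate_mbps(mcs_index)
    candidates = [b for b in BITRATE_LADDER_MBPS if b <= budget]
    # If even the lowest rung exceeds the budget, fall back to it anyway.
    return candidates[-1] if candidates else BITRATE_LADDER_MBPS[0]


def benchmark_bitrate(_mcs_index: int) -> float:
    """Constant-bitrate benchmark: ignores network information entirely."""
    return BITRATE_LADDER_MBPS[-1]
```

With these placeholder values, an MCS index of 15 leads to 16 Mbps and an index of 10 to 8 Mbps. The benchmark always sends 20 Mbps.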
In our experiments, lagging is assumed to occur when the transmission delay of a frame is greater than 200ms. The lagging rate is defined as the ratio of the number of frames whose transmission delay exceeds 200ms to the total number of frames.

+---------------+---------------------------+
| Test Scenario | Reduction of Lagging Rate |
+---------------+---------------------------+
|       1       |            46%            |
+---------------+---------------------------+
|       2       |            21%            |
+---------------+---------------------------+
|       3       |            37%            |
+---------------+---------------------------+
|       4       |            56%            |
+---------------+---------------------------+
|       5       |            32%            |
+---------------+---------------------------+
Figure 4-2: Reduction of Lagging Rate

With the MCS information, the application can adjust the bitrate to decrease the lagging rate and thereby significantly improve user QoE. In the weak network scenario, NAA avoids 46% of lagging.

4.6. Analysis of the Experiments

The experiments above demonstrate the performance gain of NAA with MoWIE.

Application information can also help predict network conditions and is already used in adaptive bitrate methods. In many cases, however, it responds more slowly than eNB information at the very beginning of a change. For example, when more users enter the cell, the PRB (Physical Resource Block) information immediately reflects that each user may get less bandwidth. The application information, in contrast, reacts only after a decreasing bitrate trend has appeared. In other words, lower-layer network information reflects such changes more directly.

Without MoWIE, the application cannot obtain lower-layer network information directly and must probe "blindly" to adapt to the dynamics of the lower-layer network. This cannot meet the requirements of cloud interactive applications such as cloud gaming, low-delay live shows, and cloud VR.
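The lagging-rate metric used in Section 4.5 can be computed from per-frame transmission delays as in the following Python sketch. The function names are hypothetical. The draft does not state whether the reductions in Figure 4-2 are relative or absolute. The sketch assumes a relative reduction with respect to the constant-bitrate benchmark.

```python
LAG_THRESHOLD_MS = 200.0  # a frame "lags" if its transmission delay exceeds this


def lagging_rate(frame_delays_ms):
    """Ratio of frames with transmission delay > 200 ms to all frames."""
    if not frame_delays_ms:
        return 0.0
    lagged = sum(1 for d in frame_delays_ms if d > LAG_THRESHOLD_MS)
    return lagged / len(frame_delays_ms)


def lagging_rate_reduction(baseline_rate, naa_rate):
    """Relative reduction of the lagging rate (assumed interpretation)."""
    if baseline_rate == 0:
        return 0.0
    return (baseline_rate - naa_rate) / baseline_rate
```

For instance, delays of 100, 250, 300, and 150ms give a lagging rate of 0.5. A frame at exactly 200ms does not count as lagging, because the definition requires the delay to be greater than 200ms.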
The more real-time network resource status the application can learn, the better it can predict how much network resource it can use within a prediction time window. However, there is a tradeoff between the frequency of network information collection and the resulting load on the network devices, as well as its feasibility for them. In principle, the network resources consumed by such status reporting should be kept lightweight. This can be done, for example, by properly controlling the reporting interval and the number of bits needed to convey the reported information elements. In our experiments, network status information is obtained at an adaptive interval based on the change rate of the network status. This provides good prediction while introducing little load into the network. In fact, not all scenarios need very frequent information collection. If some information changes only within a very small range and does not influence the final decision, it is unnecessary to report it all the time. However, if its value changes beyond a preset threshold, it is reported to the application immediately.

The distribution of the exposed data and its impact on the performance gain of different algorithms need further study. This draft aims to give guidance on what kind of data needs to be exposed during the initial deployment of these mechanisms.

In our current cloud gaming deployment, application information helps reduce the lagging rate by about 50%. The remaining 50% of improvement room can be addressed by network information exposure with MoWIE. In fact, the effects of the two layers of information can be cumulative. However, due to current deployment limitations, we cannot collect application information and eNB information at the same time. Thus, in this version of the draft we compare the performance with and without MoWIE.
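The reporting principle described earlier in this section can be sketched in Python. Updates are suppressed while a value stays within a small range, and the application is informed immediately once the value moves beyond a preset threshold. This is a hypothetical illustration rather than the reporting logic used in the experiments. The class name, the metric name "ue_mcs", the threshold value, and the notify callback are all assumptions.

```python
from typing import Callable, Dict


class ThresholdReporter:
    """Report a metric only when it moves beyond a preset threshold.

    Hypothetical sketch: metric names, thresholds, and the notify callback
    are illustrative and not defined by ALTO or MoWIE.
    """

    def __init__(self, thresholds: Dict[str, float],
                 notify: Callable[[str, float], None]) -> None:
        self.thresholds = thresholds
        self.notify = notify
        self.last_reported: Dict[str, float] = {}

    def observe(self, metric: str, value: float) -> bool:
        """Return True if the observation was reported to the application."""
        last = self.last_reported.get(metric)
        # Always report the first value; afterwards only large changes.
        if last is None or abs(value - last) > self.thresholds[metric]:
            self.last_reported[metric] = value
            self.notify(metric, value)
            return True
        return False
```

The comparison is made against the last reported value rather than the last observed one. This prevents a slow drift from escaping notification indefinitely. For example, with a threshold of 2, MCS values 15, 16, 17, 18 trigger reports only at 15 and 18.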
This draft does not compare the application-information-assisted mode with the network-information-assisted mode; that comparison is ongoing work. Both application and eNB information can reflect network variation. We will therefore compare the performance of three modes: the application-information-assisted mode, the network-information-assisted mode, and the mode that uses information from both layers.

The mechanisms above may also work with IEEE 802.11 standards (e.g., EHT), helping the service provider (SP) better understand the network environment between the AP and STAs. 802.11 devices operate in unlicensed spectrum and are easily influenced by adjacent unlicensed devices. The duty cycle and related CQI information (e.g., MCS and bandwidth) are therefore considered very important network information in this setting.

5. Standardization Considerations of MoWIE as an Extension to ALTO

MoWIE can be a realistic and important extension to ALTO that serves the aforementioned use cases in newer-generation (5G) cellular networks. These are completely open, IP-based networks where routers/UPFs with IP connectivity will be deployed much closer to users. Beyond the cloud-based multimedia applications described above, other latency-sensitive applications such as connected vehicles and automated driving may also benefit.

Extending ALTO with MoWIE, therefore, may allow ALTO to expose lower-layer network information to ensure higher application QoE for a wide spectrum of applications.

One possible approach to standardizing the distribution of the network information used in the evaluations is to piggyback such information in the datapath. One issue with the datapath method is that MoWIE intends to convey more complex and richer information than current methods.
Piggybacking such information in the datapath would take away a lot of datapath resources. On the other hand, the datapath-based method can provide frequently changing network information. It also makes it much easier to synchronize network information and user data on the same time scale. Normally, there is less user data in the uplink direction, so the free "space" within the MTU can be used to piggyback network information to the application. In that case, no second communication channel needs to be created between the application and the network. However, the datapath design may offer only limited privacy management, which is very important in MoWIE. The application cannot trust the network information if there is no message authentication mechanism for the piggybacked network information. Inserting network information into data packets is also challenging for the network, since many transport layer protocols are encrypted and integrity protected. Another method is to create an associated path aligned with the datapath. Like ICMP for IP and RTCP for RTP, this second path can provide additional information associated with the datapath. However, creating such a second path is a big change to widely used transport protocols, and many applications would also need to change. This approach is therefore also challenging.

In 3GPP, control-plane-based network information exposure has been introduced in 4G and 5G systems. Here we mainly discuss an ALTO extension-based design to tackle this problem. Specifically, the MoWIE extension will reuse existing ALTO mechanisms, including the information resource directory, extensible performance metrics and calendaring, and unified properties. It also requires modular, reusable extensions, which we plan to specify in detail in a separate document.
Below is an overview of key considerations; security considerations are in the following section.

- Network information selection and binding consideration: Instead of hardcoding specific network information, a modular design of MoWIE should let an ALTO client select only the relevant information (e.g., the cell-level DLOccupyPRBNum metric and the UE-level MCS) and request it accordingly. The existing ALTO information resource directory is a starting point. However, the design needs to be generic, providing abstraction for ease of use and extensibility. The security mechanisms of the existing ALTO protocol should also be extended to enforce proper authorization.

- Compact network information encoding consideration: One benefit of ALTO is its high-level JSON-based encoding. When the update frequency increases, however, the existing base protocol and existing extensions (in particular the SSE extension) may incur high bandwidth and processing overhead. Hence, the encoding and processing overhead of MoWIE should be considered.

- Stability and reliability consideration: A key benefit of the MoWIE extension is that it enables more flexible, better coordinated control. Any control mechanism, however, should incorporate fundamental mechanisms that address overhead, stability, and reliability.

6. Security Considerations

The collection and distribution of MoWIE information should consider security requirements on information privacy, integrity protection, and authentication on both sides. Since the network status is not directly related to any specific user, there is currently no direct privacy issue. However, the information transmitted to the application can pass through many middleboxes and can be modified by a man-in-the-middle. To protect the network information, end-to-end encryption and integrity protection are needed.
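The Python sketch below is a minimal illustration of integrity protection and message authentication for exposed network information. It addresses the concern raised in Section 5 that the application cannot trust unauthenticated network information. The sketch attaches an HMAC-SHA256 tag, computed with the standard hmac module, to a canonical JSON encoding of a report. It assumes a pre-shared key whose distribution is out of scope, and the report field names are hypothetical. It provides neither confidentiality nor replay protection and is not a substitute for TLS.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real deployment would establish keys through an
# authenticated channel rather than hardcoding them.
DEMO_KEY = b"example-key-not-for-production"


def protect(report: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag computed over a canonical JSON encoding."""
    payload = json.dumps(report, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify(message: dict, key: bytes):
    """Return the decoded report if the tag is valid, otherwise None."""
    expected = hmac.new(key, message["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking tag bytes through timing.
    if not hmac.compare_digest(expected, message["tag"]):
        return None
    return json.loads(message["payload"])
```

A middlebox that alters the payload, or a receiver holding the wrong key, causes verification to fail, so the application can discard untrusted network information.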
Also, the network needs to authenticate applications so that exposed information is provided only to the right applications. These security requirements can be met by TLS and other security mechanisms.

7. References

7.1. Normative References

[RFC3168] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168.

[RFC2474] K. Nichols, S. Blake, F. Baker, D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474.

7.2. Informative References

[Fahad] Fahad Fazal Elahi Guraya, Faouzi Alaya Cheikh, Victor Medina, "A Novel Visual Saliency Model for Surveillance Video Compression", 2011 Seventh International Conference on Signal Image Technology & Internet-Based Systems.

[Hongzi] Hongzi Mao, Ravi Netravali, Mohammad Alizadeh, "Neural Adaptive Video Streaming with Pensieve", SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, August 2017, pp. 197-210.

[Saccadic] E. Matin, "Saccadic suppression: a review and an analysis", Psychological Bulletin 81 (12) (1974) 899-917.

[Borji] A. Borji, L. Itti, "State-of-the-Art in Visual Attention Modeling", IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 185-207.

[MPC] X. Yin, A. Jindal, V. Sekar, B. Sinopoli, "A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP", SIGCOMM, ACM, 2015.

[CS2P] Y. Sun et al., "CS2P: Improving Video Bitrate Selection and Adaptation with Data-Driven Throughput Prediction", SIGCOMM, ACM, 2016.
[Hongzi2] Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy, "Real-world Video Adaptation with Reinforcement Learning", ICML 2019 Workshop RL4RealLife.

[Saliency] Chenlei Guo, Liming Zhang, "A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression", IEEE Transactions on Image Processing, Vol. 19, No. 1, January 2010.

[Minbarrier] Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech, "Minimum Barrier Salient Object Detection at 80 FPS", The IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1404-1412.

[LSTM] Lai Jiang, Mai Xu, Zulin Wang, "Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM", arXiv:1709.06316v3 [cs.CV], 14 Jan 2019.

[TS23.501] 3GPP TS 23.501, "System architecture for the 5G System (5GS)", http://www.3gpp.org/ftp//Specs/archive/23_series/23.501/23501-g40.zip

[TS38.214] 3GPP TS 38.214, "NR; Physical layer procedures for data", http://www.3gpp.org/ftp//Specs/archive/38_series/38.214/38214-g00.zip

[TS26.114] 3GPP TS 26.114, "IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction", http://www.3gpp.org/ftp//Specs/archive/26_series/26.114/26114-g40.zip

[MPEG DASH] ISO/IEC 23009, "Dynamic Adaptive Streaming over HTTP", https://mpeg.chiariglione.org/standards/mpeg-dash

[iiMedia] "2019-2020 China Online Live Streaming Market Research Report", https://www.iimedia.cn/c400/69017.html

[GSMA] "Cloud AR/VR Whitepaper", last updated on April 26, 2019, https://www.gsma.com/futurenetworks/wiki/cloud-ar-vr-whitepaper/#

[5GAA] https://5gaa.org/news/5gaa-releases-white-paper-on-making-5g-proactive-and-predictive-for-the-automotive-industry/

[TS23.287] 3GPP TS 23.287, "Architecture enhancements for 5G System (5GS) to support Vehicle-to-Everything (V2X) services", http://www.3gpp.org/ftp//Specs/archive/23_series/23.287/23287-g20.zip

[TS26.247] 3GPP TS 26.247, "Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)"

Authors' Addresses

Wei Huang
Tencent Building,
No. 10000 Shennan Avenue, Nanshan District
Shenzhen, Guangdong, 518000
China

Email: wienhuang@tencent.com

Yunfei Zhang
Flat 9, No. 10 West Building,
Xi Bei Wang East Road
Beijing, 100090
China

Email: yanniszhang@tencent.com

Y. Richard Yang
Watson 208A,
51 Prospect Street
New Haven, CT 06511
USA

Email: yang.r.yang@yale.edu

Chunshan Xiong
Flat 9, No. 10 West Building,
Xi Bei Wang East Road
Beijing, 100090
China

Email: chunshxiong@tencent.com

Yixue Lei
Flat 9, No. 10 West Building,
Xi Bei Wang East Road
Beijing, 100090
China

Email: yixuelei@tencent.com

Yunbo Han
Tencent Building,
No. 10000 Shennan Avenue, Nanshan District
Shenzhen, Guangdong, 518000
China

Email: yunbohan@tencent.com

Gang Li
China Mobile Research Institute
No.32, Xuanwumenxi Ave, Xicheng District
Beijing 100053,
China

Email: ligangyf@chinamobile.com