MOPS                                                         R. Krishna
Internet-Draft                               InterDigital Europe Limited
Intended status: Informational                                 A. Rahman
Expires: April 28, 2022                 InterDigital Communications, LLC
                                                         October 25, 2021


 Media Operations Use Case for an Augmented Reality Application on Edge
                        Computing Infrastructure
                      draft-ietf-mops-ar-use-case-03

Abstract

   This document presents, for consideration by the Media Operations
   (MOPS) Working Group, a use case describing the transmission over
   the Internet of an application with several characteristics unique
   to Augmented Reality (AR) applications.  One key requirement
   identified is that the current practice of basing Adaptive Bit Rate
   (ABR) algorithms' policies on heuristics and models is inadequate
   for AR applications running on Edge Computing infrastructure.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction
   2.  Conventions used in this document
   3.  Use Case
     3.1.  Processing of Scenes
     3.2.  Generation of Images
   4.  Requirements
   5.  AR Network Traffic and Interaction with TCP
   6.  Informative References
   Authors' Addresses

1.  Introduction

   The MOPS draft [I-D.ietf-mops-streaming-opcons] provides an overview
   of operational networking issues that pertain to Quality of
   Experience (QoE) in the delivery of video and other high-bitrate
   media over the Internet.  However, it does not cover the
   increasingly large number of applications with Augmented Reality
   (AR) characteristics or their requirements on ABR algorithms, so the
   discussion in this draft complements the overview presented in
   [I-D.ietf-mops-streaming-opcons].

   Future AR applications will bring several requirements for the
   Internet and for the mobile devices running them.  AR applications
   require real-time processing of video streams to recognize specific
   objects; the results are then used to overlay information on the
   video displayed to the user.  In addition, some AR applications will
   also require the generation of new video frames to be played to the
   user.  Both the real-time processing of video streams and the
   generation of overlay information are computationally intensive
   tasks that generate heat [DEV_HEAT_1], [DEV_HEAT_2] and drain
   battery power [BATT_DRAIN] on the AR mobile device.  Consequently,
   in order to run future applications with AR characteristics on
   mobile devices, computationally intensive tasks need to be offloaded
   to resources provided by Edge Computing.

   Edge Computing is an emerging paradigm where computing resources and
   storage are made available in close network proximity, at the edge
   of the Internet, to mobile devices and sensors [EDGE_1], [EDGE_2].

   Adaptive Bit Rate (ABR) algorithms currently base their bit-rate
   selection policies on heuristics or on models of the deployment
   environment that do not account for the environment's dynamic nature
   in use cases such as the one presented in this document.
   Consequently, ABR algorithms perform sub-optimally in such
   deployments [ABR_1].
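   As a concrete illustration of what such a heuristic policy looks
   like, the following minimal sketch (in Python; the thresholds and
   the bit-rate ladder are illustrative assumptions, not taken from any
   deployed player) maps buffer occupancy directly to a bit-rate.  A
   fixed mapping of this kind encodes assumptions about the deployment
   environment rather than measuring them, which is why it struggles
   when the environment's statistics shift:

      # Illustrative buffer-occupancy ABR heuristic (hypothetical
      # thresholds).  The policy is static: it cannot adapt when the
      # distribution of network throughput changes underneath it.

      BITRATES_KBPS = [1000, 2500, 5000, 8000]   # assumed ladder

      def buffer_based_bitrate(buffer_seconds: float) -> int:
          """Map current buffer occupancy to a bit-rate in kbps."""
          if buffer_seconds < 5:        # near a stall: be conservative
              return BITRATES_KBPS[0]
          elif buffer_seconds < 10:
              return BITRATES_KBPS[1]
          elif buffer_seconds < 20:
              return BITRATES_KBPS[2]
          else:                         # comfortable buffer: go high
              return BITRATES_KBPS[3]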
2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Use Case

   We now describe a use case that involves an application with the
   characteristics of AR systems.  Consider a group of tourists who are
   being conducted on a tour of the historical site of the Tower of
   London.  As they move around the site and within the historical
   buildings, they can watch and listen to historical scenes in 3D that
   are generated by the AR application and overlaid by their AR
   headsets onto their real-world view.  The headset then continuously
   updates this view as they move around.

   The AR application first processes, in real time, the scene that the
   walking tourist is watching and identifies the objects that will be
   targeted for overlay of high-resolution videos.  It then generates,
   in real time, high-resolution 3D images of historical scenes
   matching the tourist's perspective.  These generated video images
   are then overlaid on the tourist's view of the real world.

   We now discuss this processing of scenes and generation of
   high-resolution images in greater detail.

3.1.  Processing of Scenes

   The task of processing a scene can be broken down into a pipeline of
   three consecutive subtasks: tracking, followed by acquisition of a
   model of the real world, and finally registration [AUGMENTED].

   Tracking: This includes tracking of the three-dimensional
   coordinates and the six-dimensional pose (coordinates and
   orientation) of objects in the real world [AUGMENTED].  The AR
   application that runs on the mobile device needs to track the pose
   of the user's head and eyes and of the objects that are in view.
   This requires tracking natural features that are then used in the
   next stage of the pipeline.

   Acquisition of a model of the real world: The tracked natural
   features are used to develop an annotated point-cloud-based model
   that is then stored in a database.  To ensure that this database can
   be scaled up, techniques such as combining client-side simultaneous
   tracking and mapping with server-side localization are used
   [SLAM_1], [SLAM_2], [SLAM_3], [SLAM_4].

   Registration: The coordinate systems, brightness, and color of
   virtual and real objects need to be aligned in a process called
   registration [REG].  Once the natural features are tracked as
   discussed above, virtual objects are geometrically aligned with
   those features by geometric registration.  This is followed by
   resolving any occlusion that can occur between the virtual and the
   real objects [OCCL_1], [OCCL_2].  The AR application also applies
   photometric registration [PHOTO_REG] by aligning the brightness and
   color between the virtual and real objects.  Additionally,
   algorithms that calculate the global illumination of both the
   virtual and real objects [GLB_ILLUM_1], [GLB_ILLUM_2] are executed.
   Various algorithms to deal with artifacts generated by lens
   distortion [LENS_DIST], blur [BLUR], noise [NOISE], etc., are also
   required.
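   To summarize the dataflow, the three subtasks compose into a
   per-frame pipeline.  The following minimal sketch (in Python; every
   function body is a placeholder, and none of the names come from any
   cited system) shows only the composition of the stages, not an
   implementation of them:

      # Sketch of the per-frame scene-processing pipeline:
      # tracking -> model acquisition -> registration.

      def track(frame):
          """Stage 1: track natural features and 6-D poses."""
          # Placeholder: a real tracker extracts features with
          # coordinates and orientation for the user's head, eyes,
          # and the objects in view.
          return [{"xyz": (0.0, 0.0, 0.0), "pose": (0.0, 0.0, 0.0)}]

      def acquire_model(features, point_cloud):
          """Stage 2: fold tracked features into the annotated
          point-cloud model (client-side mapping; localization
          could be offloaded to a server for scale)."""
          point_cloud.extend(features)
          return point_cloud

      def register(frame, point_cloud):
          """Stage 3: geometric, then photometric, registration."""
          # Placeholder: align virtual objects with the tracked
          # features, resolve occlusion, match brightness and color,
          # and correct lens distortion, blur, and noise.
          return {"frame": frame, "overlays": list(point_cloud)}

      def process_scene(frame, point_cloud):
          features = track(frame)
          point_cloud = acquire_model(features, point_cloud)
          return register(frame, point_cloud)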
3.2.  Generation of Images

   The AR application must generate high-quality video that has the
   properties described in the previous step and overlay the video on
   the AR device's display, a step called situated visualization.  This
   entails dealing with any registration errors that may arise,
   ensuring that there is no visual interference [VIS_INTERFERE], and
   finally maintaining temporal coherence by adapting to the movement
   of the user's eyes and head.

4.  Requirements

   The components of AR applications perform tasks, such as real-time
   generation and processing of high-quality video content, that are
   computationally intensive.  As a result, on AR devices such as AR
   glasses, excessive heat is generated by the chip-sets that are
   involved in the computation [DEV_HEAT_1], [DEV_HEAT_2].
   Additionally, the battery on such devices discharges quickly when
   running such applications [BATT_DRAIN].

   A solution to the heat-dissipation and battery-drainage problem is
   to offload the processing and video-generation tasks to the remote
   cloud.  However, running such tasks on the cloud is not feasible, as
   the end-to-end delays must be within the order of a few
   milliseconds.  Additionally, such applications require high
   bandwidth and low jitter to provide a high QoE to the user.  In
   order to meet such hard timing constraints, computationally
   intensive tasks can instead be offloaded to Edge devices.

   Another requirement for our use case, and for similar applications
   such as 360-degree streaming, is that the display on the AR/VR
   device should synchronize the visual input with the way the user is
   moving their head.  This synchronization is necessary to avoid the
   motion sickness that results from a time lag between when the user
   moves their head and when the appropriate video scene is rendered.
   This time lag is often called "motion-to-photon" delay.  Studies
   have shown [PER_SENSE], [XR], [OCCL_3] that this delay can be at
   most 20ms, and preferably between 7 and 15ms, in order to avoid the
   motion sickness problem.  Of these 20ms, display techniques,
   including the refresh rate of the display and pixel switching, take
   12-13ms [OCCL_3], [CLOUD].  This leaves 7-8ms for the processing of
   motion sensor inputs, graphic rendering, and the round-trip time
   (RTT) between the AR/VR device and the Edge.  The use of predictive
   techniques to mask latencies has been considered as a mitigating
   strategy to reduce motion sickness [PREDICT].  In addition, Edge
   devices that are proximate to the user might be used to offload
   these computationally intensive tasks.  Towards this end, 3GPP
   requires and supports Ultra-Reliable Low-Latency Communication
   (URLLC) of 0.1ms to 1ms between an Edge server and User Equipment
   (UE) [URLLC].
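   The motion-to-photon budget above can be made concrete with a short
   back-of-the-envelope calculation (in Python; the split between
   sensor processing and rendering is an illustrative assumption, while
   the other numbers are the ones cited above):

      # Motion-to-photon budget from the figures cited above.
      MTP_BUDGET_MS = 20.0    # maximum tolerable delay
      DISPLAY_MS    = 12.5    # refresh + pixel switching (12-13ms)

      remaining_ms = MTP_BUDGET_MS - DISPLAY_MS    # ~7-8ms left

      # The remainder must cover sensor processing, rendering, and
      # the RTT to the Edge.  Assumed illustrative split:
      SENSOR_AND_RENDER_MS = 6.0
      edge_rtt_budget_ms = remaining_ms - SENSOR_AND_RENDER_MS

      print(f"left after display:  {remaining_ms:.1f} ms")
      print(f"budget for Edge RTT: {edge_rtt_budget_ms:.1f} ms")
      # A URLLC target of 0.1-1ms fits comfortably in this budget;
      # the RTT to a distant cloud data center generally would not.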
   Note that the Edge device providing the computation and storage is
   itself limited in such resources compared to the Cloud.  So, for
   example, a sudden surge in demand from a large group of tourists can
   overwhelm that device.  This will result in a degraded user
   experience, as their AR devices experience delays in receiving the
   video frames.  In order to deal with this problem, the client AR
   applications will need to use Adaptive Bit Rate (ABR) algorithms
   that choose bit-rate policies tailored in a fine-grained manner to
   the resource demands and play back the videos with appropriate QoE
   metrics as the user moves around with the group of tourists.

   However, the heavy-tailed nature of several operational parameters
   makes prediction-based adaptation by ABR algorithms sub-optimal
   [ABR_2].  This is because, with such distributions, the law of large
   numbers works too slowly and the mean of a sample does not equal the
   mean of the distribution; as a result, the standard deviation and
   variance are unsuitable as metrics for such operational parameters
   [HEAVY_TAIL_1], [HEAVY_TAIL_2].  Other subtle issues with these
   distributions include the "expectation paradox" [HEAVY_TAIL_1],
   where the longer we have waited for an event, the longer we still
   have to wait, and the mismatch between the size and the count of
   events [HEAVY_TAIL_1].  This makes designing an adaptation algorithm
   error-prone and challenging.  Such operational parameters include,
   but are not limited to, buffer occupancy, throughput, client-server
   latency, and variable transmission times.  In addition, Edge devices
   and communication links may fail, and the logical communication
   relationships between the various software components change
   frequently as the user moves around with their AR device [UBICOMP].
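   The slow convergence under heavy tails can be illustrated with a
   small simulation (in Python with numpy; the Pareto shape parameter
   is an arbitrary assumption chosen only to make the effect visible,
   not a measured throughput model):

      # With a heavy-tailed (Pareto, alpha=1.5) model of throughput,
      # the running sample mean is dominated by rare, huge samples
      # and converges only very slowly toward the true mean of
      # alpha/(alpha-1) = 3.0 (the variance is infinite).
      import numpy as np

      rng = np.random.default_rng(seed=1)
      samples = rng.pareto(1.5, size=100_000) + 1.0  # classical Pareto

      for n in (100, 1_000, 10_000, 100_000):
          print(f"n={n:>7}: sample mean = {samples[:n].mean():.3f}")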
   Thus, once the offloaded computationally intensive processing is
   completed on the Edge Computing infrastructure, the video is
   streamed to the user with the help of an ABR algorithm, which needs
   to meet the following requirements [ABR_1] (a sketch illustrating
   how they interact follows the list):

   o  Dynamically changing ABR parameters: The ABR algorithm must be
      able to dynamically change its parameters, given the heavy-tailed
      nature of network throughput.  This may, for example, be
      accomplished by AI/ML processing on the Edge Computing
      infrastructure, on a per-client or a global basis.

   o  Handling conflicting QoE requirements: QoE goals often call for
      high bit-rates and a low frequency of buffer refills.  In
      practice, however, these goals can conflict.  For example,
      increasing the bit-rate might result in the need to refill the
      buffer more frequently, as the buffer capacity might be limited
      on the AR device.  The ABR algorithm must be able to handle this
      situation.

   o  Handling side effects of selecting a specific bit-rate: For
      example, having selected a bit-rate of a particular value, the
      ABR algorithm might avoid changing to a different rate, so as to
      ensure a non-fluctuating bit-rate and the resulting smoothness of
      video quality.  The ABR algorithm must be able to handle this
      situation.
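   One way to see how these three requirements interact is a simple
   utility-based selection rule, sketched below (in Python; the utility
   form and all weights are illustrative assumptions, loosely in the
   spirit of learning-based schemes such as [ABR_1] but not a
   reproduction of any of them):

      # Illustrative utility-based bit-rate choice balancing quality
      # (high bit-rate), rebuffering risk (buffer refills), and
      # smoothness (bit-rate fluctuation).  All weights are
      # hypothetical tuning knobs; under heavy-tailed throughput they
      # would themselves need to be retuned dynamically at the Edge.

      BITRATES_KBPS = [1000, 2500, 5000, 8000]
      W_QUALITY, W_REBUFFER, W_SMOOTH = 1.0, 4.0, 1.0
      SEGMENT_SECONDS = 2.0

      def utility(rate, prev_rate, buffer_s, throughput_kbps):
          download_s = rate * SEGMENT_SECONDS / max(throughput_kbps, 1)
          rebuffer_s = max(0.0, download_s - buffer_s)
          fluctuation = abs(rate - prev_rate) / 1000.0
          return (W_QUALITY * rate / 1000.0
                  - W_REBUFFER * rebuffer_s
                  - W_SMOOTH * fluctuation)

      def select_bitrate(prev_rate, buffer_s, predicted_kbps):
          # The throughput prediction is the weak point under
          # heavy-tailed conditions, as discussed above.
          return max(BITRATES_KBPS,
                     key=lambda r: utility(r, prev_rate, buffer_s,
                                           predicted_kbps))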
5.  AR Network Traffic and Interaction with TCP

   In addition to the requirements for ABR algorithms, there are other
   operational issues that need to be considered for AR use cases such
   as the one described above.  In a study conducted to characterize
   multi-user AR over cellular networks [AR_TRAFFIC], the following
   issues were identified:

   o  The uploading of data from an AR device to a remote server for
      processing dominates the end-to-end latency.

   o  A lack of visual features in the grid environment can cause
      increased latencies, as the AR device uploads additional visual
      data to the remote server for processing.

   o  AR applications tend to send large bursts of data separated by
      significant time gaps.  As a result, the TCP congestion window
      enters slow start before the large bursts arrive, increasing the
      latency perceived by the user.  The study [AR_TRAFFIC] shows that
      segmentation latency at the RLC (Radio Link Control) layer of the
      4G LTE (Long Term Evolution) RAN (Radio Access Network) impacts
      TCP's performance during slow start (see the sketch after this
      list).
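   The cost of re-entering slow start before each burst can be
   estimated with a short sketch (in Python; the MSS, initial window,
   RTT, and burst size are illustrative assumptions):

      # After an idle gap, a TCP sender that restarts slow start must
      # ramp its congestion window up again (roughly doubling per
      # RTT).  Count the RTTs needed to deliver one AR burst.

      MSS_BYTES   = 1460
      INIT_CWND   = 10 * MSS_BYTES    # initial window per RFC 6928
      RTT_MS      = 30.0              # assumed cellular RTT
      BURST_BYTES = 2 * 1024 * 1024   # assumed 2 MB visual-data burst

      cwnd, sent, rtts = INIT_CWND, 0, 0
      while sent < BURST_BYTES:
          sent += cwnd                # one window sent per RTT
          cwnd *= 2                   # exponential slow-start growth
          rtts += 1

      print(f"slow-start RTTs: {rtts}, "
            f"added delay ~{rtts * RTT_MS:.0f} ms")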
6.  Informative References

   [ABR_1]    Mao, H., Netravali, R., and M. Alizadeh, "Neural Adaptive
              Video Streaming with Pensieve", In Proceedings of the
              Conference of the ACM Special Interest Group on Data
              Communication, pp. 197-210, 2017.

   [ABR_2]    Yan, F., Ayers, H., Zhu, C., Fouladi, S., Hong, J.,
              Zhang, K., Levis, P., and K. Winstein, "Learning in situ:
              a randomized experiment in video streaming", In 17th
              USENIX Symposium on Networked Systems Design and
              Implementation (NSDI 20), pp. 495-511, 2020.

   [AR_TRAFFIC]
              Apicharttrisorn, K., Balasubramanian, B., Chen, J.,
              Sivaraj, R., Tsai, Y., Jana, R., Krishnamurthy, S., Tran,
              T., and Y. Zhou, "Characterization of Multi-User
              Augmented Reality over Cellular Networks", In 17th Annual
              IEEE International Conference on Sensing, Communication,
              and Networking (SECON), pp. 1-9, IEEE, 2020.

   [AUGMENTED]
              Schmalstieg, D. and T. Hollerer, "Augmented Reality",
              Addison Wesley, 2016.

   [BATT_DRAIN]
              Seneviratne, S., Hu, Y., Nguyen, T., Lan, G., Khalifa,
              S., Thilakarathna, K., Hassan, M., and A. Seneviratne, "A
              survey of wearable devices and challenges", In IEEE
              Communication Surveys and Tutorials, 19(4), pp.
              2573-2620, 2017.

   [BLUR]     Kan, P. and H. Kaufmann, "Physically-Based Depth of Field
              in Augmented Reality", In Eurographics (Short Papers),
              pp. 89-92, 2012.

   [CLOUD]    Corneo, L., Eder, M., Mohan, N., Zavodovski, A., Bayhan,
              S., Wong, W., Gunningberg, P., Kangasharju, J., and J.
              Ott, "Surrounded by the Clouds: A Comprehensive Cloud
              Reachability Study", In Proceedings of the Web Conference
              2021, pp. 295-304, 2021.

   [DEV_HEAT_1]
              LiKamWa, R., Wang, Z., Carroll, A., Lin, F., and L.
              Zhong, "Draining our Glass: An Energy and Heat
              Characterization of Google Glass", In Proceedings of the
              5th Asia-Pacific Workshop on Systems, pp. 1-7, 2013.

   [DEV_HEAT_2]
              Matsuhashi, K., Kanamoto, T., and A. Kurokawa, "Thermal
              model and countermeasures for future smart glasses", In
              Sensors, 20(5), p. 1446, 2020.

   [EDGE_1]   Satyanarayanan, M., "The Emergence of Edge Computing", In
              Computer, 50(1), pp. 30-39, 2017.

   [EDGE_2]   Satyanarayanan, M., Klas, G., Silva, M., and S.
              Mangiante, "The Seminal Role of Edge-Native
              Applications", In IEEE International Conference on Edge
              Computing (EDGE), pp. 33-40, 2019.

   [GLB_ILLUM_1]
              Kan, P. and H. Kaufmann, "Differential irradiance caching
              for fast high-quality light transport between virtual and
              real worlds", In IEEE International Symposium on Mixed
              and Augmented Reality (ISMAR), pp. 133-141, 2013.

   [GLB_ILLUM_2]
              Franke, T., "Delta voxel cone tracing", In IEEE
              International Symposium on Mixed and Augmented Reality
              (ISMAR), pp. 39-44, 2014.

   [HEAVY_TAIL_1]
              Crovella, M. and B. Krishnamurthy, "Internet Measurement:
              Infrastructure, Traffic and Applications", John Wiley and
              Sons Inc., 2006.

   [HEAVY_TAIL_2]
              Taleb, N., "The Statistical Consequences of Fat Tails",
              STEM Academic Press, 2020.

   [I-D.ietf-mops-streaming-opcons]
              Holland, J., Begen, A., and S. Dawkins, "Operational
              Considerations for Streaming Media", draft-ietf-mops-
              streaming-opcons-07 (work in progress), September 2021.

   [LENS_DIST]
              Fuhrmann, A. and D. Schmalstieg, "Practical calibration
              procedures for augmented reality", In Virtual
              Environments 2000, pp. 3-12, Springer, Vienna, 2000.

   [NOISE]    Fischer, J., Bartz, D., and W. Strasser, "Enhanced visual
              realism by incorporating camera image effects", In
              IEEE/ACM International Symposium on Mixed and Augmented
              Reality, pp. 205-208, 2006.

   [OCCL_1]   Breen, D., Whitaker, R., and M. Tuceryan, "Interactive
              Occlusion and Automatic Object Placement for Augmented
              Reality", In Computer Graphics Forum, vol. 15, no. 3, pp.
              229-238, Edinburgh, UK: Blackwell Science Ltd, 1996.

   [OCCL_2]   Zheng, F., Schmalstieg, D., and G. Welch, "Pixel-wise
              closed-loop registration in video-based augmented
              reality", In IEEE International Symposium on Mixed and
              Augmented Reality (ISMAR), pp. 135-143, 2014.

   [OCCL_3]   Lang, B., "Oculus Shares 5 Key Ingredients for Presence
              in Virtual Reality", https://www.roadtovr.com/oculus-
              shares-5-key-ingredients-for-presence-in-virtual-reality/,
              2014.

   [PER_SENSE]
              Mania, K., Adelstein, B., Ellis, S., and M. Hill,
              "Perceptual sensitivity to head tracking latency in
              virtual environments with varying degrees of scene
              complexity", In Proceedings of the 1st Symposium on
              Applied Perception in Graphics and Visualization, pp.
              39-47, 2004.

   [PHOTO_REG]
              Liu, Y. and X. Granier, "Online tracking of outdoor
              lighting variations for augmented reality with moving
              cameras", In IEEE Transactions on Visualization and
              Computer Graphics, 18(4), pp. 573-580, 2012.

   [PREDICT]  Buker, T., Vincenzi, D., and J. Deaton, "The effect of
              apparent latency on simulator sickness while using a see-
              through helmet-mounted display: Reducing apparent latency
              with predictive compensation", In Human Factors, 54(2),
              pp. 235-249, 2012.

   [REG]      Holloway, R., "Registration error analysis for augmented
              reality", In Presence: Teleoperators and Virtual
              Environments, 6(4), pp. 413-432, 1997.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [SLAM_1]   Ventura, J., Arth, C., Reitmayr, G., and D. Schmalstieg,
              "A minimal solution to the generalized pose-and-scale
              problem", In Proceedings of the IEEE Conference on
              Computer Vision and Pattern Recognition, pp. 422-429,
              2014.

   [SLAM_2]   Sweeny, C., Fragoso, V., Hollerer, T., and M. Turk, "A
              scalable solution to the generalized pose and scale
              problem", In European Conference on Computer Vision, pp.
              16-31, 2014.

   [SLAM_3]   Gauglitz, S., Sweeny, C., Ventura, J., Turk, M., and T.
              Hollerer, "Model estimation and selection towards
              unconstrained real-time tracking and mapping", In IEEE
              Transactions on Visualization and Computer Graphics,
              20(6), pp. 825-838, 2013.

   [SLAM_4]   Pirchheim, C., Schmalstieg, D., and G. Reitmayr,
              "Handling pure camera rotation in keyframe-based SLAM",
              In 2013 IEEE International Symposium on Mixed and
              Augmented Reality (ISMAR), pp. 229-238, 2013.

   [UBICOMP]  Bardram, J. and A. Friday, "Ubiquitous Computing
              Systems", In Ubiquitous Computing Fundamentals, pp.
              37-94, CRC Press, 2009.

   [URLLC]    3GPP, "3GPP TR 23.725: Study on enhancement of Ultra-
              Reliable Low-Latency Communication (URLLC) support in the
              5G Core network (5GC)",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3453, 2019.

   [VIS_INTERFERE]
              Kalkofen, D., Mendez, E., and D. Schmalstieg,
              "Interactive focus and context visualization for
              augmented reality", In 6th IEEE and ACM International
              Symposium on Mixed and Augmented Reality, pp. 191-201,
              2007.

   [XR]       3GPP, "3GPP TR 26.928: Extended Reality (XR) in 5G",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3534, 2020.

Authors' Addresses

   Renan Krishna
   InterDigital Europe Limited
   64, Great Eastern Street
   London EC2A 3QR
   United Kingdom

   Email: renan.krishna@interdigital.com


   Akbar Rahman
   InterDigital Communications, LLC
   1000 Sherbrooke Street West
   Montreal H3A 3G4
   Canada

   Email: Akbar.Rahman@InterDigital.com