MOPS                                                         R. Krishna
Internet-Draft                               InterDigital Europe Limited
Intended status: Informational                                 A. Rahman
Expires: January 29, 2022              InterDigital Communications, LLC
                                                           July 28, 2021

 Media Operations Use Case for an Augmented Reality Application on Edge
                        Computing Infrastructure
                      draft-ietf-mops-ar-use-case-02

Abstract

   This document presents, for consideration by the Media Operations
   (MOPS) Working Group, a use case describing the transmission over
   the Internet of an application that has several characteristics
   unique to Augmented Reality (AR) applications.  One key requirement
   identified is that current Adaptive Bit Rate (ABR) algorithms, whose
   bit-rate selection policies are based on heuristics and models, are
   inadequate for AR applications running on Edge Computing
   infrastructure.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 29, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
   3.  Use Case
     3.1.  Processing of Scenes
     3.2.  Generation of Images
   4.  Requirements
   5.  Informative References
   Authors' Addresses

1.  Introduction

   The MOPS draft [I-D.ietf-mops-streaming-opcons] provides an overview
   of operational networking issues that pertain to Quality of
   Experience (QoE) in the delivery of video and other high-bitrate
   media over the Internet.  However, it does not cover the
   increasingly large number of applications with Augmented Reality
   (AR) characteristics, or their requirements on Adaptive Bit Rate
   (ABR) algorithms.  The discussion in this draft therefore
   complements the overview presented in
   [I-D.ietf-mops-streaming-opcons].

   Future AR applications will bring several requirements for the
   Internet and for the mobile devices running them.  AR applications
   require real-time processing of video streams to recognize specific
   objects.  The result is then used to overlay information on the
   video being displayed to the user.  In addition, some AR
   applications will also require the generation of new video frames to
   be played to the user.  Both the real-time processing of video
   streams and the generation of overlay information are
   computationally intensive tasks that generate heat [DEV_HEAT_1],
   [DEV_HEAT_2] and drain battery power [BATT_DRAIN] on the AR mobile
   device.  Consequently, in order to run future applications with AR
   characteristics on mobile devices, computationally intensive tasks
   need to be offloaded to resources provided by Edge Computing.

   Edge Computing is an emerging paradigm in which computing resources
   and storage are made available in close network proximity, at the
   edge of the Internet, to mobile devices and sensors [EDGE_1],
   [EDGE_2].

   ABR algorithms currently base their policy for bit-rate selection on
   heuristics or models of the deployment environment that do not
   account for the environment's dynamic nature in use cases such as
   the one presented in this document.  Consequently, the ABR
   algorithms perform sub-optimally in such deployments [ABR_1].

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Use Case

   We now describe a use case that involves an application with the
   characteristics of AR systems.  Consider a group of tourists being
   conducted on a tour of the historical site of the Tower of London.
   As they move around the site and within the historical buildings,
   they can watch and listen to historical scenes in 3D that are
   generated by the AR application and then overlaid by their AR
   headsets onto their real-world view.  The headset then continuously
   updates their view as they move around.

   The AR application first processes the scene that the walking
   tourist is watching in real-time and identifies objects that will be
   targeted for overlay of high-resolution videos.  It then generates
   high-resolution 3D images of historical scenes related to the
   perspective of the tourist in real-time.  These generated video
   images are then overlaid on the view of the real world as seen by
   the tourist.

   We now discuss this processing of scenes and generation of
   high-resolution images in greater detail.

3.1.  Processing of Scenes

   The AR application that runs on the mobile device needs to first
   track the pose (coordinates and orientation) of the user's head,
   eyes, and the objects that are in view.  This requires tracking
   natural features and developing an annotated point-cloud-based model
   that is then stored in a database.  To ensure that this database can
   be scaled up, techniques such as combining client-side simultaneous
   tracking and mapping with server-side localization are used
   [SLAM_1], [SLAM_2], [SLAM_3], [SLAM_4].  Once the natural features
   are tracked, virtual objects are geometrically aligned with those
   features.  This is followed by resolving occlusion that can occur
   between virtual and real objects [OCCL_1], [OCCL_2].

   The next step for the AR application is to apply photometric
   registration [PHOTO_REG].  This requires aligning the brightness and
   color between the virtual and real objects.  Additionally,
   algorithms that calculate the global illumination of both the
   virtual and real objects [GLB_ILLUM_1], [GLB_ILLUM_2] are executed.
   Various algorithms are also required to deal with artifacts
   generated by lens distortion [LENS_DIST], blur [BLUR], and noise
   [NOISE].  An illustrative sketch of how these stages compose is
   given below.
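   The following Python sketch illustrates one possible composition of
   the per-frame stages described in this section.  It is a minimal
   sketch only: every function and type name in it (track_pose,
   align_virtual_objects, resolve_occlusion, photometric_registration)
   is a hypothetical stub invented for this illustration, not the API
   of any real AR framework, and a deployed system would replace each
   stub with the algorithms cited above.

   # Illustrative per-frame scene-processing pipeline (Section 3.1).
   # All names are hypothetical stubs, not a real AR framework API.

   from dataclasses import dataclass
   from typing import List, Tuple

   @dataclass
   class Pose:
       position: Tuple[float, float, float]            # coordinates
       orientation: Tuple[float, float, float, float]  # quaternion

   def track_pose(frame: bytes) -> Pose:
       # Stub for client-side natural-feature tracking; a deployed
       # system would pair this with server-side localization against
       # the annotated point-cloud database [SLAM_1]-[SLAM_4].
       return Pose((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))

   def align_virtual_objects(pose: Pose,
                             objs: List[dict]) -> List[dict]:
       # Stub: geometrically align virtual objects with the tracked
       # natural features.
       return [dict(o, anchored_at=pose.position) for o in objs]

   def resolve_occlusion(frame: bytes, objs: List[dict]) -> List[dict]:
       # Stub: hide virtual geometry that real objects occlude
       # [OCCL_1], [OCCL_2].
       return objs

   def photometric_registration(frame: bytes,
                                objs: List[dict]) -> List[dict]:
       # Stub: align brightness/color, apply global illumination, and
       # compensate for lens distortion, blur, and noise [PHOTO_REG],
       # [GLB_ILLUM_1], [GLB_ILLUM_2], [LENS_DIST], [BLUR], [NOISE].
       return objs

   def process_frame(frame: bytes, virtual: List[dict]) -> List[dict]:
       # The stages run in the order described in Section 3.1.
       pose = track_pose(frame)
       objs = align_virtual_objects(pose, virtual)
       objs = resolve_occlusion(frame, objs)
       return photometric_registration(frame, objs)

   if __name__ == "__main__":
       print(process_frame(b"camera-frame", [{"name": "scene-1604"}]))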
3.2.  Generation of Images

   The AR application must generate high-quality video that has the
   properties described in the previous step and overlay the video on
   the AR device's display, a step called situated visualization.  This
   entails dealing with any registration errors that may arise,
   ensuring that there is no visual interference [VIS_INTERFERE], and
   finally maintaining temporal coherence by adapting to the movement
   of the user's eyes and head.

4.  Requirements

   The components of AR applications perform tasks, such as the
   real-time generation and processing of high-quality video content,
   that are computationally intensive.  As a result, on AR devices such
   as AR glasses, excessive heat is generated by the chip-sets involved
   in the computation [DEV_HEAT_1], [DEV_HEAT_2].  Additionally, the
   battery on such devices discharges quickly when running such
   applications [BATT_DRAIN].

   A solution to the heat dissipation and battery drainage problem is
   to offload the processing and video generation tasks to the remote
   cloud.  However, running such tasks on the cloud is not feasible, as
   the end-to-end delays must be on the order of a few milliseconds.
   Additionally, such applications require high bandwidth and low
   jitter to provide a high QoE to the user.  In order to meet such
   hard timing constraints, computationally intensive tasks can instead
   be offloaded to Edge devices.

   Another requirement for our use case, and for similar applications
   such as 360-degree video streaming, is that the display on the AR/VR
   device should synchronize the visual input with the way the user is
   moving their head.  This synchronization is necessary to avoid
   motion sickness that results from a time lag between when the user
   moves their head and when the appropriate video scene is rendered.
   This time lag is often called "motion-to-photon" delay.  Studies
   have shown [PER_SENSE], [XR], [OCCL_3] that this delay can be at
   most 20ms, and preferably in the range of 7-15ms, in order to avoid
   motion sickness.  Out of this 20ms budget, display techniques,
   including the display refresh rate and pixel switching, take 12-13ms
   [OCCL_3], [CLOUD].  This leaves 7-8ms for the processing of motion
   sensor inputs, graphic rendering, and the round-trip time (RTT)
   between the AR/VR device and the Edge.  The use of predictive
   techniques to mask latencies has been considered as a mitigating
   strategy to reduce motion sickness [PREDICT].  In addition, Edge
   devices that are proximate to the user might be used to offload
   these computationally intensive tasks.  Towards this end, the 3GPP
   specifies Ultra-Reliable Low-Latency Communication (URLLC) of 0.1ms
   to 1ms between an Edge server and User Equipment (UE) [URLLC].

   Note that the Edge device providing the computation and storage is
   itself limited in such resources compared to the Cloud.  So, for
   example, a sudden surge in demand from a large group of tourists can
   overwhelm that device.  This will result in a degraded user
   experience, as their AR devices experience delays in receiving the
   video frames.  In order to deal with this problem, the client AR
   applications will need to use ABR algorithms that choose bit-rate
   policies tailored in a fine-grained manner to the resource demands,
   and that play back the video with appropriate QoE metrics as the
   user moves around with the group of tourists.

   However, the heavy-tailed nature of several operational parameters
   makes prediction-based adaptation by ABR algorithms sub-optimal
   [ABR_2].  This is because, with such distributions, the law of large
   numbers converges too slowly and the sample mean does not reliably
   approximate the distribution mean; as a result, standard deviation
   and variance are unsuitable as metrics for such operational
   parameters [HEAVY_TAIL_1], [HEAVY_TAIL_2].  Other subtle issues with
   these distributions include the "expectation paradox"
   [HEAVY_TAIL_1], where the longer we have waited for an event, the
   longer we may still have to wait, and the mismatch between the size
   and the count of events [HEAVY_TAIL_1].  This makes designing an
   algorithm for adaptation error-prone and challenging.  Such
   operational parameters include, but are not limited to, buffer
   occupancy, throughput, client-server latency, and variable
   transmission times.  In addition, Edge devices and communication
   links may fail, and logical communication relationships between
   various software components change frequently as the user moves
   around with their AR device [UBICOMP].
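   To make the slow convergence concrete, the following minimal Python
   sketch compares the running sample mean of a heavy-tailed (Pareto)
   throughput trace against a light-tailed (exponential) trace with the
   same true mean.  The shape parameter, target mean, and sample sizes
   are arbitrary values chosen purely for illustration.

   # Illustrative only: with Pareto shape alpha <= 2 the variance is
   # infinite, and the sample mean settles far more slowly than for a
   # light-tailed distribution with the same mean.

   import random

   random.seed(1)

   ALPHA = 1.5                       # Pareto shape (infinite variance)
   MEAN = 3.0                        # true mean throughput (arbitrary)
   XM = MEAN * (ALPHA - 1) / ALPHA   # Pareto scale giving that mean

   for n in (100, 10_000, 1_000_000):
       pareto = sum(XM * random.paretovariate(ALPHA)
                    for _ in range(n)) / n
       expo = sum(random.expovariate(1.0 / MEAN)
                  for _ in range(n)) / n
       print(f"n={n:>9}  pareto mean={pareto:6.2f}  "
             f"exponential mean={expo:6.2f}  (true mean={MEAN})")

   Runs of this sketch typically show the exponential estimate close to
   the true mean even at small n, while the Pareto estimate still
   wanders at much larger sample sizes.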
   Thus, once the offloaded computationally intensive processing is
   completed on the Edge Computing infrastructure, the video is
   streamed to the user with the help of an ABR algorithm, which needs
   to meet the following requirements [ABR_1] (a schematic selection
   loop reflecting these requirements is sketched after the list):

   o  Dynamically changing ABR parameters: The ABR algorithm must be
      able to dynamically change its parameters, given the heavy-tailed
      nature of network throughput.  This may, for example, be
      accomplished by AI/ML processing on the Edge Computing
      infrastructure on a per-client or global basis.

   o  Handling conflicting QoE requirements: QoE goals often require
      high bit-rates and a low frequency of buffer refills.  In
      practice, however, these goals can conflict.  For example,
      increasing the bit-rate might result in the need to fill up the
      buffer more frequently, as the buffer capacity might be limited
      on the AR device.  The ABR algorithm must be able to handle this
      situation.

   o  Handling side effects of selecting a specific bit-rate: For
      example, selecting a particular bit-rate might result in the ABR
      algorithm not changing to a different rate, so as to ensure a
      non-fluctuating bit-rate and the resultant smoothness of video
      quality.  The ABR algorithm must be able to handle this
      situation.
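   The following Python sketch outlines a client-side bit-rate
   selection loop shaped by these requirements.  It is a schematic
   sketch only, not a real ABR implementation: the bit-rate ladder,
   switching penalty, and buffer threshold are arbitrary illustrative
   constants, and a deployed algorithm could, for example, have such
   parameters re-tuned dynamically by AI/ML processing on the Edge.

   # Schematic ABR selection loop, illustrative only.  It scores each
   # ladder rate by (i) quality, (ii) rebuffer risk given the current
   # buffer, and (iii) a penalty that damps bit-rate fluctuation.

   from statistics import median

   LADDER_KBPS = [1_000, 2_500, 5_000, 8_000, 16_000]
   SEGMENT_SEC = 2.0      # segment playback duration
   SWITCH_PENALTY = 0.3   # smoothness: cost of changing the rate
   MIN_BUFFER_SEC = 4.0   # below this, avoid any buffer-draining rate

   def estimate_throughput(samples_kbps):
       # Median rather than mean: under heavy-tailed throughput the
       # sample mean is dominated by rare spikes [HEAVY_TAIL_1].
       return median(samples_kbps)

   def choose_rate(samples_kbps, buffer_sec, last_kbps):
       est = estimate_throughput(samples_kbps)
       best, best_score = LADDER_KBPS[0], float("-inf")
       for rate in LADDER_KBPS:
           download_sec = SEGMENT_SEC * rate / est
           drain = max(0.0, download_sec - SEGMENT_SEC)  # buffer loss
           if buffer_sec < MIN_BUFFER_SEC and drain > 0:
               continue           # low buffer: refill before quality
           score = (rate / LADDER_KBPS[-1]               # quality
                    - drain                              # rebuffer risk
                    - SWITCH_PENALTY * (rate != last_kbps))
           if score > best_score:
               best, best_score = rate, score
       return best

   # A throughput spike (55,000) pulls the mean to ~17,500 kbit/s and
   # would select an unsustainable rate; the median keeps it at 5,000.
   print(choose_rate([4_000, 6_000, 55_000, 5_000],
                     buffer_sec=6.0, last_kbps=5_000))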
5.  Informative References

   [ABR_1]    Mao, H., Netravali, R., and M. Alizadeh, "Neural Adaptive
              Video Streaming with Pensieve", In Proceedings of the
              Conference of the ACM Special Interest Group on Data
              Communication, pp. 197-210, 2017.

   [ABR_2]    Yan, F., Ayers, H., Zhu, C., Fouladi, S., Hong, J.,
              Zhang, K., Levis, P., and K. Winstein, "Learning in situ:
              a randomized experiment in video streaming", In 17th
              USENIX Symposium on Networked Systems Design and
              Implementation (NSDI 20), pp. 495-511, 2020.

   [BATT_DRAIN]
              Seneviratne, S., Hu, Y., Nguyen, T., Lan, G., Khalifa,
              S., Thilakarathna, K., Hassan, M., and A. Seneviratne, "A
              survey of wearable devices and challenges", In IEEE
              Communication Surveys and Tutorials, 19(4), pp.
              2573-2620, 2017.

   [BLUR]     Kan, P. and H. Kaufmann, "Physically-Based Depth of Field
              in Augmented Reality", In Eurographics (Short Papers),
              pp. 89-92, 2012.

   [CLOUD]    Corneo, L., Eder, M., Mohan, N., Zavodovski, A., Bayhan,
              S., Wong, W., Gunningberg, P., Kangasharju, J., and J.
              Ott, "Surrounded by the Clouds: A Comprehensive Cloud
              Reachability Study", In Proceedings of the Web Conference
              2021, pp. 295-304, 2021.

   [DEV_HEAT_1]
              LiKamWa, R., Wang, Z., Carroll, A., Lin, F., and L.
              Zhong, "Draining our Glass: An Energy and Heat
              Characterization of Google Glass", In Proceedings of the
              5th Asia-Pacific Workshop on Systems, pp. 1-7, 2013.

   [DEV_HEAT_2]
              Matsuhashi, K., Kanamoto, T., and A. Kurokawa, "Thermal
              model and countermeasures for future smart glasses", In
              Sensors, 20(5), p. 1446, 2020.

   [EDGE_1]   Satyanarayanan, M., "The Emergence of Edge Computing", In
              Computer, 50(1), pp. 30-39, 2017.

   [EDGE_2]   Satyanarayanan, M., Klas, G., Silva, M., and S.
              Mangiante, "The Seminal Role of Edge-Native
              Applications", In IEEE International Conference on Edge
              Computing (EDGE), pp. 33-40, 2019.

   [GLB_ILLUM_1]
              Kan, P. and H. Kaufmann, "Differential irradiance caching
              for fast high-quality light transport between virtual and
              real worlds", In IEEE International Symposium on Mixed
              and Augmented Reality (ISMAR), pp. 133-141, 2013.

   [GLB_ILLUM_2]
              Franke, T., "Delta voxel cone tracing", In IEEE
              International Symposium on Mixed and Augmented Reality
              (ISMAR), pp. 39-44, 2014.

   [HEAVY_TAIL_1]
              Crovella, M. and B. Krishnamurthy, "Internet measurement:
              infrastructure, traffic and applications", John Wiley and
              Sons Inc., 2006.

   [HEAVY_TAIL_2]
              Taleb, N., "The Statistical Consequences of Fat Tails",
              STEM Academic Press, 2020.

   [I-D.ietf-mops-streaming-opcons]
              Holland, J., Begen, A., and S. Dawkins, "Operational
              Considerations for Streaming Media", draft-ietf-mops-
              streaming-opcons-06 (work in progress), July 2021.

   [LENS_DIST]
              Fuhrmann, A. and D. Schmalstieg, "Practical calibration
              procedures for augmented reality", In Virtual
              Environments 2000, pp. 3-12, Springer, Vienna, 2000.

   [NOISE]    Fischer, J., Bartz, D., and W. Strasser, "Enhanced visual
              realism by incorporating camera image effects", In
              IEEE/ACM International Symposium on Mixed and Augmented
              Reality, pp. 205-208, 2006.

   [OCCL_1]   Breen, D., Whitaker, R., and M. Tuceryan, "Interactive
              Occlusion and automatic object placement for augmented
              reality", In Computer Graphics Forum, vol. 15, no. 3, pp.
              229-238, Edinburgh, UK: Blackwell Science Ltd, 1996.

   [OCCL_2]   Zheng, F., Schmalstieg, D., and G. Welch, "Pixel-wise
              closed-loop registration in video-based augmented
              reality", In IEEE International Symposium on Mixed and
              Augmented Reality (ISMAR), pp. 135-143, 2014.

   [OCCL_3]   Lang, B., "Oculus Shares 5 Key Ingredients for Presence
              in Virtual Reality", https://www.roadtovr.com/oculus-
              shares-5-key-ingredients-for-presence-in-virtual-reality/,
              2014.

   [PER_SENSE]
              Mania, K., Adelstein, B., Ellis, S., and M. Hill,
              "Perceptual sensitivity to head tracking latency in
              virtual environments with varying degrees of scene
              complexity", In Proceedings of the 1st Symposium on
              Applied Perception in Graphics and Visualization, pp.
              39-47, 2004.

   [PHOTO_REG]
              Liu, Y. and X. Granier, "Online tracking of outdoor
              lighting variations for augmented reality with moving
              cameras", In IEEE Transactions on Visualization and
              Computer Graphics, 18(4), pp. 573-580, 2012.

   [PREDICT]  Buker, T., Vincenzi, D., and J. Deaton, "The effect of
              apparent latency on simulator sickness while using a
              see-through helmet-mounted display: Reducing apparent
              latency with predictive compensation", In Human Factors,
              54(2), pp. 235-249, 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [SLAM_1]   Ventura, J., Arth, C., Reitmayr, G., and D. Schmalstieg,
              "A minimal solution to the generalized pose-and-scale
              problem", In Proceedings of the IEEE Conference on
              Computer Vision and Pattern Recognition, pp. 422-429,
              2014.

   [SLAM_2]   Sweeny, C., Fragoso, V., Hollerer, T., and M. Turk, "A
              scalable solution to the generalized pose and scale
              problem", In European Conference on Computer Vision, pp.
              16-31, 2014.

   [SLAM_3]   Gauglitz, S., Sweeny, C., Ventura, J., Turk, M., and T.
              Hollerer, "Model estimation and selection towards
              unconstrained real-time tracking and mapping", In IEEE
              Transactions on Visualization and Computer Graphics,
              20(6), pp. 825-838, 2013.

   [SLAM_4]   Pirchheim, C., Schmalstieg, D., and G.
              Reitmayr, "Handling pure camera rotation in
              keyframe-based SLAM", In 2013 IEEE International
              Symposium on Mixed and Augmented Reality (ISMAR), pp.
              229-238, 2013.

   [UBICOMP]  Bardram, J. and A. Friday, "Ubiquitous Computing
              Systems", In Ubiquitous Computing Fundamentals, pp.
              37-94, CRC Press, 2009.

   [URLLC]    3GPP, "3GPP TR 23.725: Study on enhancement of Ultra-
              Reliable Low-Latency Communication (URLLC) support in the
              5G Core network (5GC)",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3453, 2019.

   [VIS_INTERFERE]
              Kalkofen, D., Mendez, E., and D. Schmalstieg,
              "Interactive focus and context visualization for
              augmented reality", In 6th IEEE and ACM International
              Symposium on Mixed and Augmented Reality, pp. 191-201,
              2007.

   [XR]       3GPP, "3GPP TR 26.928: Extended Reality (XR) in 5G",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3534, 2020.

Authors' Addresses

   Renan Krishna
   InterDigital Europe Limited
   64, Great Eastern Street
   London EC2A 3QR
   United Kingdom

   Email: renan.krishna@interdigital.com

   Akbar Rahman
   InterDigital Communications, LLC
   1000 Sherbrooke Street West
   Montreal H3A 3G4
   Canada

   Email: rahmansakbar@yahoo.com