MOPS                                                         R. Krishna
Internet-Draft                               InterDigital Europe Limited
Intended status: Informational                                 A. Rahman
Expires: January 29, 2022              InterDigital Communications, LLC
                                                           July 28, 2021

 Media Operations Use Case for an Augmented Reality Application on Edge
                        Computing Infrastructure
                      draft-ietf-mops-ar-use-case-02

Abstract

   This document presents, for consideration by the Media Operations
   (MOPS) Working Group, a use case describing the transmission over
   the Internet of an application that has several characteristics
   unique to Augmented Reality (AR) applications.  One key requirement
   identified is that current Adaptive Bit Rate (ABR) algorithms, whose
   bit-rate selection policies are based on heuristics and models, are
   inadequate for AR applications running on Edge Computing
   infrastructure.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 29, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
   3.  Use Case
     3.1.  Processing of Scenes
     3.2.  Generation of Images
   4.  Requirements
   5.  Informative References
   Authors' Addresses

1.  Introduction

   The MOPS draft [I-D.ietf-mops-streaming-opcons] provides an overview
   of operational networking issues that pertain to Quality of
   Experience (QoE) in the delivery of video and other high-bitrate
   media over the Internet.  However, it does not cover the
   increasingly large number of applications with Augmented Reality
   (AR) characteristics, or their requirements on Adaptive Bit Rate
   (ABR) algorithms.  The discussion in this draft therefore
   complements the overview presented in
   [I-D.ietf-mops-streaming-opcons].

   Future AR applications will bring several requirements for the
   Internet and for the mobile devices running them.  AR applications
   require real-time processing of video streams to recognize specific
   objects.  The result is then used to overlay information on the
   video being displayed to the user.  In addition, some AR
   applications will also require the generation of new video frames to
   be played to the user.  Both the real-time processing of video
   streams and the generation of overlay information are
   computationally intensive tasks that generate heat [DEV_HEAT_1],
   [DEV_HEAT_2] and drain battery power [BATT_DRAIN] on the AR mobile
   device.  Consequently, in order to run future applications with AR
   characteristics on mobile devices, computationally intensive tasks
   need to be offloaded to resources provided by Edge Computing.

   Edge Computing is an emerging paradigm in which computing resources
   and storage are made available in close network proximity, at the
   edge of the Internet, to mobile devices and sensors [EDGE_1],
   [EDGE_2].

   ABR algorithms currently base their policy for bit-rate selection on
   heuristics or models of the deployment environment that do not
   account for the environment's dynamic nature in use cases such as
   the one presented in this document.  Consequently, the ABR
   algorithms perform sub-optimally in such deployments [ABR_1].

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Use Case

   We now describe a use case that involves an application with the
   characteristics of AR systems.  Consider a group of tourists being
   conducted on a tour of the historical site of the Tower of London.
   As they move around the site and within the historical buildings,
   they can watch and listen to historical scenes in 3D that are
   generated by the AR application and then overlaid by their AR
   headsets onto their real-world view.  The headset then continuously
   updates their view as they move around.

   The AR application first processes the scene that the walking
   tourist is watching in real-time and identifies objects that will be
   targeted for overlay of high-resolution videos.  It then generates
   high-resolution 3D images of historical scenes related to the
   perspective of the tourist in real-time.  These generated video
   images are then overlaid on the view of the real world as seen by
   the tourist.

   We now discuss this processing of scenes and generation of
   high-resolution images in greater detail.

3.1.  Processing of Scenes

   The AR application that runs on the mobile device needs to first
   track the pose (coordinates and orientation) of the user's head,
   eyes, and the objects that are in view.  This requires tracking
   natural features and developing an annotated point-cloud-based model
   that is then stored in a database.  To ensure that this database can
   be scaled up, techniques such as combining client-side simultaneous
   tracking and mapping with server-side localization are used
   [SLAM_1], [SLAM_2], [SLAM_3], [SLAM_4].  Once the natural features
   are tracked, virtual objects are geometrically aligned with those
   features.  This is followed by resolving occlusion that can occur
   between virtual and real objects [OCCL_1], [OCCL_2].

   The next step for the AR application is to apply photometric
   registration [PHOTO_REG].  This requires aligning the brightness and
   color between the virtual and real objects.  Additionally,
   algorithms that calculate the global illumination of both the
   virtual and real objects [GLB_ILLUM_1], [GLB_ILLUM_2] are executed.
   Various algorithms are also required to deal with artifacts
   generated by lens distortion [LENS_DIST], blur [BLUR], and noise
   [NOISE].  An illustrative sketch of how these stages compose is
   given below.
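   The following Python sketch illustrates one possible composition of
   the per-frame stages described in this section.  It is a minimal
   sketch only: every function and type name in it (track_pose,
   align_virtual_objects, resolve_occlusion, photometric_registration)
   is a hypothetical stub invented for this illustration, not the API
   of any real AR framework, and a deployed system would replace each
   stub with the algorithms cited above.

   # Illustrative per-frame scene-processing pipeline (Section 3.1).
   # All names are hypothetical stubs, not a real AR framework API.

   from dataclasses import dataclass
   from typing import List, Tuple

   @dataclass
   class Pose:
       position: Tuple[float, float, float]            # coordinates
       orientation: Tuple[float, float, float, float]  # quaternion

   def track_pose(frame: bytes) -> Pose:
       # Stub for client-side natural-feature tracking; a deployed
       # system would pair this with server-side localization against
       # the annotated point-cloud database [SLAM_1]-[SLAM_4].
       return Pose((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))

   def align_virtual_objects(pose: Pose,
                             objs: List[dict]) -> List[dict]:
       # Stub: geometrically align virtual objects with the tracked
       # natural features.
       return [dict(o, anchored_at=pose.position) for o in objs]

   def resolve_occlusion(frame: bytes, objs: List[dict]) -> List[dict]:
       # Stub: hide virtual geometry that real objects occlude
       # [OCCL_1], [OCCL_2].
       return objs

   def photometric_registration(frame: bytes,
                                objs: List[dict]) -> List[dict]:
       # Stub: align brightness/color, apply global illumination, and
       # compensate for lens distortion, blur, and noise [PHOTO_REG],
       # [GLB_ILLUM_1], [GLB_ILLUM_2], [LENS_DIST], [BLUR], [NOISE].
       return objs

   def process_frame(frame: bytes, virtual: List[dict]) -> List[dict]:
       # The stages run in the order described in Section 3.1.
       pose = track_pose(frame)
       objs = align_virtual_objects(pose, virtual)
       objs = resolve_occlusion(frame, objs)
       return photometric_registration(frame, objs)

   if __name__ == "__main__":
       print(process_frame(b"camera-frame", [{"name": "scene-1604"}]))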
3.2.  Generation of Images

   The AR application must generate high-quality video that has the
   properties described in the previous step and overlay the video on
   the AR device's display, a step called situated visualization.  This
   entails dealing with any registration errors that may arise,
   ensuring that there is no visual interference [VIS_INTERFERE], and
   finally maintaining temporal coherence by adapting to the movement
   of the user's eyes and head.

4.  Requirements

   The components of AR applications perform tasks, such as the
   real-time generation and processing of high-quality video content,
   that are computationally intensive.  As a result, on AR devices such
   as AR glasses, excessive heat is generated by the chip-sets involved
   in the computation [DEV_HEAT_1], [DEV_HEAT_2].  Additionally, the
   battery on such devices discharges quickly when running such
   applications [BATT_DRAIN].

   A solution to the heat dissipation and battery drainage problem is
   to offload the processing and video generation tasks to the remote
   cloud.  However, running such tasks on the cloud is not feasible, as
   the end-to-end delays must be on the order of a few milliseconds.
   Additionally, such applications require high bandwidth and low
   jitter to provide a high QoE to the user.  In order to meet such
   hard timing constraints, computationally intensive tasks can instead
   be offloaded to Edge devices.

   Another requirement for our use case, and for similar applications
   such as 360-degree video streaming, is that the display on the AR/VR
   device should synchronize the visual input with the way the user is
   moving their head.  This synchronization is necessary to avoid
   motion sickness that results from a time lag between when the user
   moves their head and when the appropriate video scene is rendered.
   This time lag is often called "motion-to-photon" delay.  Studies
   have shown [PER_SENSE], [XR], [OCCL_3] that this delay can be at
   most 20ms, and preferably in the range of 7-15ms, in order to avoid
   motion sickness.  Out of this 20ms budget, display techniques,
   including the display refresh rate and pixel switching, take 12-13ms
   [OCCL_3], [CLOUD].  This leaves 7-8ms for the processing of motion
   sensor inputs, graphic rendering, and the round-trip time (RTT)
   between the AR/VR device and the Edge.  The use of predictive
   techniques to mask latencies has been considered as a mitigating
   strategy to reduce motion sickness [PREDICT].  In addition, Edge
   devices that are proximate to the user might be used to offload
   these computationally intensive tasks.  Towards this end, the 3GPP
   specifies Ultra-Reliable Low-Latency Communication (URLLC) of 0.1ms
   to 1ms between an Edge server and User Equipment (UE) [URLLC].

   Note that the Edge device providing the computation and storage is
   itself limited in such resources compared to the Cloud.  So, for
   example, a sudden surge in demand from a large group of tourists can
   overwhelm that device.  This will result in a degraded user
   experience, as their AR devices experience delays in receiving the
   video frames.  In order to deal with this problem, the client AR
   applications will need to use ABR algorithms that choose bit-rate
   policies tailored in a fine-grained manner to the resource demands,
   and that play back the video with appropriate QoE metrics as the
   user moves around with the group of tourists.

   However, the heavy-tailed nature of several operational parameters
   makes prediction-based adaptation by ABR algorithms sub-optimal
   [ABR_2].  This is because, with such distributions, the law of large
   numbers converges too slowly and the sample mean does not reliably
   approximate the distribution mean; as a result, standard deviation
   and variance are unsuitable as metrics for such operational
   parameters [HEAVY_TAIL_1], [HEAVY_TAIL_2].  Other subtle issues with
   these distributions include the "expectation paradox"
   [HEAVY_TAIL_1], where the longer we have waited for an event, the
   longer we may still have to wait, and the mismatch between the size
   and the count of events [HEAVY_TAIL_1].  This makes designing an
   algorithm for adaptation error-prone and challenging.  Such
   operational parameters include, but are not limited to, buffer
   occupancy, throughput, client-server latency, and variable
   transmission times.  In addition, Edge devices and communication
   links may fail, and logical communication relationships between
   various software components change frequently as the user moves
   around with their AR device [UBICOMP].
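   To make the slow convergence concrete, the following minimal Python
   sketch compares the running sample mean of a heavy-tailed (Pareto)
   throughput trace against a light-tailed (exponential) trace with the
   same true mean.  The shape parameter, target mean, and sample sizes
   are arbitrary values chosen purely for illustration.

   # Illustrative only: with Pareto shape alpha <= 2 the variance is
   # infinite, and the sample mean settles far more slowly than for a
   # light-tailed distribution with the same mean.

   import random

   random.seed(1)

   ALPHA = 1.5                       # Pareto shape (infinite variance)
   MEAN = 3.0                        # true mean throughput (arbitrary)
   XM = MEAN * (ALPHA - 1) / ALPHA   # Pareto scale giving that mean

   for n in (100, 10_000, 1_000_000):
       pareto = sum(XM * random.paretovariate(ALPHA)
                    for _ in range(n)) / n
       expo = sum(random.expovariate(1.0 / MEAN)
                  for _ in range(n)) / n
       print(f"n={n:>9}  pareto mean={pareto:6.2f}  "
             f"exponential mean={expo:6.2f}  (true mean={MEAN})")

   Runs of this sketch typically show the exponential estimate close to
   the true mean even at small n, while the Pareto estimate still
   wanders at much larger sample sizes.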
   Thus, once the offloaded computationally intensive processing is
   completed on the Edge Computing infrastructure, the video is
   streamed to the user with the help of an ABR algorithm, which needs
   to meet the following requirements [ABR_1] (a schematic selection
   loop reflecting these requirements is sketched after the list):

   o  Dynamically changing ABR parameters: The ABR algorithm must be
      able to dynamically change its parameters, given the heavy-tailed
      nature of network throughput.  This may, for example, be
      accomplished by AI/ML processing on the Edge Computing
      infrastructure on a per-client or global basis.

   o  Handling conflicting QoE requirements: QoE goals often require
      high bit-rates and a low frequency of buffer refills.  In
      practice, however, these goals can conflict.  For example,
      increasing the bit-rate might result in the need to fill up the
      buffer more frequently, as the buffer capacity might be limited
      on the AR device.  The ABR algorithm must be able to handle this
      situation.

   o  Handling side effects of selecting a specific bit-rate: For
      example, selecting a particular bit-rate might result in the ABR
      algorithm not changing to a different rate, so as to ensure a
      non-fluctuating bit-rate and the resultant smoothness of video
      quality.  The ABR algorithm must be able to handle this
      situation.
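   The following Python sketch outlines a client-side bit-rate
   selection loop shaped by these requirements.  It is a schematic
   sketch only, not a real ABR implementation: the bit-rate ladder,
   switching penalty, and buffer threshold are arbitrary illustrative
   constants, and a deployed algorithm could, for example, have such
   parameters re-tuned dynamically by AI/ML processing on the Edge.

   # Schematic ABR selection loop, illustrative only.  It scores each
   # ladder rate by (i) quality, (ii) rebuffer risk given the current
   # buffer, and (iii) a penalty that damps bit-rate fluctuation.

   from statistics import median

   LADDER_KBPS = [1_000, 2_500, 5_000, 8_000, 16_000]
   SEGMENT_SEC = 2.0      # segment playback duration
   SWITCH_PENALTY = 0.3   # smoothness: cost of changing the rate
   MIN_BUFFER_SEC = 4.0   # below this, avoid any buffer-draining rate

   def estimate_throughput(samples_kbps):
       # Median rather than mean: under heavy-tailed throughput the
       # sample mean is dominated by rare spikes [HEAVY_TAIL_1].
       return median(samples_kbps)

   def choose_rate(samples_kbps, buffer_sec, last_kbps):
       est = estimate_throughput(samples_kbps)
       best, best_score = LADDER_KBPS[0], float("-inf")
       for rate in LADDER_KBPS:
           download_sec = SEGMENT_SEC * rate / est
           drain = max(0.0, download_sec - SEGMENT_SEC)  # buffer loss
           if buffer_sec < MIN_BUFFER_SEC and drain > 0:
               continue           # low buffer: refill before quality
           score = (rate / LADDER_KBPS[-1]               # quality
                    - drain                              # rebuffer risk
                    - SWITCH_PENALTY * (rate != last_kbps))
           if score > best_score:
               best, best_score = rate, score
       return best

   # A throughput spike (55,000) pulls the mean to ~17,500 kbit/s and
   # would select an unsustainable rate; the median keeps it at 5,000.
   print(choose_rate([4_000, 6_000, 55_000, 5_000],
                     buffer_sec=6.0, last_kbps=5_000))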
5.  Informative References

   [ABR_1]    Mao, H., Netravali, R., and M. Alizadeh, "Neural Adaptive
              Video Streaming with Pensieve", In Proceedings of the
              Conference of the ACM Special Interest Group on Data
              Communication, pp. 197-210, 2017.

   [ABR_2]    Yan, F., Ayers, H., Zhu, C., Fouladi, S., Hong, J.,
              Zhang, K., Levis, P., and K. Winstein, "Learning in situ:
              a randomized experiment in video streaming", In 17th
              USENIX Symposium on Networked Systems Design and
              Implementation (NSDI 20), pp. 495-511, 2020.

   [BATT_DRAIN]
              Seneviratne, S., Hu, Y., Nguyen, T., Lan, G., Khalifa,
              S., Thilakarathna, K., Hassan, M., and A. Seneviratne, "A
              survey of wearable devices and challenges", In IEEE
              Communication Surveys and Tutorials, 19(4), pp.
              2573-2620, 2017.

   [BLUR]     Kan, P. and H. Kaufmann, "Physically-Based Depth of Field
              in Augmented Reality", In Eurographics (Short Papers),
              pp. 89-92, 2012.

   [CLOUD]    Corneo, L., Eder, M., Mohan, N., Zavodovski, A., Bayhan,
              S., Wong, W., Gunningberg, P., Kangasharju, J., and J.
              Ott, "Surrounded by the Clouds: A Comprehensive Cloud
              Reachability Study", In Proceedings of the Web Conference
              2021, pp. 295-304, 2021.

   [DEV_HEAT_1]
              LiKamWa, R., Wang, Z., Carroll, A., Lin, F., and L.
              Zhong, "Draining our Glass: An Energy and Heat
              Characterization of Google Glass", In Proceedings of the
              5th Asia-Pacific Workshop on Systems, pp. 1-7, 2013.

   [DEV_HEAT_2]
              Matsuhashi, K., Kanamoto, T., and A. Kurokawa, "Thermal
              model and countermeasures for future smart glasses", In
              Sensors, 20(5), p. 1446, 2020.

   [EDGE_1]   Satyanarayanan, M., "The Emergence of Edge Computing", In
              Computer, 50(1), pp. 30-39, 2017.

   [EDGE_2]   Satyanarayanan, M., Klas, G., Silva, M., and S.
              Mangiante, "The Seminal Role of Edge-Native
              Applications", In IEEE International Conference on Edge
              Computing (EDGE), pp. 33-40, 2019.

   [GLB_ILLUM_1]
              Kan, P. and H. Kaufmann, "Differential irradiance caching
              for fast high-quality light transport between virtual and
              real worlds", In IEEE International Symposium on Mixed
              and Augmented Reality (ISMAR), pp. 133-141, 2013.

   [GLB_ILLUM_2]
              Franke, T., "Delta voxel cone tracing", In IEEE
              International Symposium on Mixed and Augmented Reality
              (ISMAR), pp. 39-44, 2014.

   [HEAVY_TAIL_1]
              Crovella, M. and B. Krishnamurthy, "Internet measurement:
              infrastructure, traffic and applications", John Wiley and
              Sons Inc., 2006.

   [HEAVY_TAIL_2]
              Taleb, N., "The Statistical Consequences of Fat Tails",
              STEM Academic Press, 2020.

   [I-D.ietf-mops-streaming-opcons]
              Holland, J., Begen, A., and S. Dawkins, "Operational
              Considerations for Streaming Media", draft-ietf-mops-
              streaming-opcons-06 (work in progress), July 2021.

   [LENS_DIST]
              Fuhrmann, A. and D. Schmalstieg, "Practical calibration
              procedures for augmented reality", In Virtual
              Environments 2000, pp. 3-12, Springer, Vienna, 2000.

   [NOISE]    Fischer, J., Bartz, D., and W. Strasser, "Enhanced visual
              realism by incorporating camera image effects", In
              IEEE/ACM International Symposium on Mixed and Augmented
              Reality, pp. 205-208, 2006.

   [OCCL_1]   Breen, D., Whitaker, R., and M. Tuceryan, "Interactive
              Occlusion and automatic object placement for augmented
              reality", In Computer Graphics Forum, vol. 15, no. 3, pp.
              229-238, Edinburgh, UK: Blackwell Science Ltd, 1996.

   [OCCL_2]   Zheng, F., Schmalstieg, D., and G. Welch, "Pixel-wise
              closed-loop registration in video-based augmented
              reality", In IEEE International Symposium on Mixed and
              Augmented Reality (ISMAR), pp. 135-143, 2014.

   [OCCL_3]   Lang, B., "Oculus Shares 5 Key Ingredients for Presence
              in Virtual Reality", https://www.roadtovr.com/oculus-
              shares-5-key-ingredients-for-presence-in-virtual-reality/,
              2014.

   [PER_SENSE]
              Mania, K., Adelstein, B., Ellis, S., and M. Hill,
              "Perceptual sensitivity to head tracking latency in
              virtual environments with varying degrees of scene
              complexity", In Proceedings of the 1st Symposium on
              Applied Perception in Graphics and Visualization, pp.
              39-47, 2004.

   [PHOTO_REG]
              Liu, Y. and X. Granier, "Online tracking of outdoor
              lighting variations for augmented reality with moving
              cameras", In IEEE Transactions on Visualization and
              Computer Graphics, 18(4), pp. 573-580, 2012.

   [PREDICT]  Buker, T., Vincenzi, D., and J. Deaton, "The effect of
              apparent latency on simulator sickness while using a
              see-through helmet-mounted display: Reducing apparent
              latency with predictive compensation", In Human Factors,
              54(2), pp. 235-249, 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [SLAM_1]   Ventura, J., Arth, C., Reitmayr, G., and D. Schmalstieg,
              "A minimal solution to the generalized pose-and-scale
              problem", In Proceedings of the IEEE Conference on
              Computer Vision and Pattern Recognition, pp. 422-429,
              2014.

   [SLAM_2]   Sweeny, C., Fragoso, V., Hollerer, T., and M. Turk, "A
              scalable solution to the generalized pose and scale
              problem", In European Conference on Computer Vision, pp.
              16-31, 2014.

   [SLAM_3]   Gauglitz, S., Sweeny, C., Ventura, J., Turk, M., and T.
              Hollerer, "Model estimation and selection towards
              unconstrained real-time tracking and mapping", In IEEE
              Transactions on Visualization and Computer Graphics,
              20(6), pp. 825-838, 2013.

   [SLAM_4]   Pirchheim, C., Schmalstieg, D., and G.
              Reitmayr, "Handling pure camera rotation in
              keyframe-based SLAM", In 2013 IEEE International
              Symposium on Mixed and Augmented Reality (ISMAR), pp.
              229-238, 2013.

   [UBICOMP]  Bardram, J. and A. Friday, "Ubiquitous Computing
              Systems", In Ubiquitous Computing Fundamentals, pp.
              37-94, CRC Press, 2009.

   [URLLC]    3GPP, "3GPP TR 23.725: Study on enhancement of Ultra-
              Reliable Low-Latency Communication (URLLC) support in the
              5G Core network (5GC)",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3453, 2019.

   [VIS_INTERFERE]
              Kalkofen, D., Mendez, E., and D. Schmalstieg,
              "Interactive focus and context visualization for
              augmented reality", In 6th IEEE and ACM International
              Symposium on Mixed and Augmented Reality, pp. 191-201,
              2007.

   [XR]       3GPP, "3GPP TR 26.928: Extended Reality (XR) in 5G",
              https://portal.3gpp.org/desktopmodules/Specifications/
              SpecificationDetails.aspx?specificationId=3534, 2020.

Authors' Addresses

   Renan Krishna
   InterDigital Europe Limited
   64, Great Eastern Street
   London EC2A 3QR
   United Kingdom

   Email: renan.krishna@interdigital.com

   Akbar Rahman
   InterDigital Communications, LLC
   1000 Sherbrooke Street West
   Montreal H3A 3G4
   Canada

   Email: rahmansakbar@yahoo.com