Internet Congestion Control Research Group                  L. Han, Ed.
Internet-Draft                                      Huawei Technologies
Intended status: Informational                               S. Appleby
Expires: September 13, 2017                                          BT
                                                               K. Smith
                                                               Vodafone
                                                          March 12, 2017


 Problem Statement: Transport Support for Augmented and Virtual Reality
                              Applications
              draft-han-iccrg-arvr-transport-problem-00

Abstract

   As emerging technologies, Augmented Reality (AR) and Virtual Reality
   (VR) pose many challenges to information display, image processing,
   fast computing and networking.  This document analyzes the
   requirements that AR and VR place on networking, and on transport
   protocols in particular.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Scope
   2.  Terminology
     2.1.  Definitions
   3.  Problem Statement
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Key Factors for Network-Based AR/VR
     A.1.  Latency Requirements
       A.1.1.  Motion to Photon (MTP) Latency
       A.1.2.  Latency Budget
     A.2.  Throughput Requirements
       A.2.1.  Average Throughput
       A.2.2.  Peak Throughput
   Authors' Addresses
1.  Introduction

   Virtual Reality (VR) and Augmented Reality (AR) technologies have
   enormous potential in many different fields, such as entertainment,
   remote diagnosis and remote maintenance.  AR and VR applications aim
   to make users perceive that they are physically present in a non-
   physical or partly non-physical world.  However, slightly
   unrealistic artefacts not only distract from the sense of immersion;
   they can also cause 'VR sickness' [VR-Sickness] by confusing the
   brain whenever information about the virtual environment is good
   enough to be believable but not wholly consistent.

   This document is based on the assumption and prediction that today's
   localized AR/VR will inevitably evolve into cloud-based AR/VR,
   because cloud processing and state will be able to supplement local
   AR/VR devices, helping to reduce their size and power consumption
   and providing far more content resources and flexibility to AR/VR
   applications.

   Sufficient realism requires both very low latency and a very high
   information rate.  In addition, the information rate varies
   significantly and can include large bursts.  This problem statement
   aims to quantify these requirements, which are largely driven by the
   video component of the transmission.  The ambition is to improve
   Internet technology so that AR/VR applications can create the
   impression of remote presence over longer distances.

   The goal is for the Internet to be able to routinely satisfy these
   demanding requirements in 5-10 years.  Then it will become feasible
   to launch many new applications that use AR/VR technology in various
   arrangements as a new platform over the Internet.  A 5-10-year
   horizon is considered appropriate, given that it can take 1-2 years
   to socialize a grand challenge in the IRTF/IETF, then 2-3 years for
   standards documents to be drafted and taken through the RFC process.
   The technology itself will also take a few years to develop and
   deploy.  That is likely to run partly in parallel to
   standardization, so the IETF will need to be ready to intervene
   wherever interoperability is necessary.

1.1.  Scope

   This document is aimed at the transport area research community.
   However, initially, advances at other layers are likely to make the
   greatest inroads into the problem, for example:

   o  Network architecture: keeping the physical distance between the
      AR/VR content cloud and users short enough to limit the latency
      caused by propagation delay in the physical media

   o  Motion sensors: reduction in latency for range of interest (RoI)
      detection

   o  Sending app: better targeted degradation of quality below the
      threshold of human perception, e.g. outside the range of interest

   o  Sending app: better coding and compression algorithms

   o  Access network: multiplexing bursts further down the layers and
      therefore between more users, e.g. traffic-dependent scheduling
      between layer-2 flows rather than layer-3 flows

   o  Core network: ensuring that the capacity of the core network is
      sufficient to support transport of AR/VR traffic across different
      service providers

   o  Receiving app: better decoding and prediction algorithms

   o  Head mounted displays (HMDs): reducing display latency

   The initial aim is to state the problem in terms of raw information
   rates and delays.  This initial draft can then form the basis of
   discussions with experts in other fields, to quantify how much of
   the problem they are likely to be able to remove.  Subsequent drafts
   can then better quantify the size of the remaining transport
   problem.

   This document focuses on unicast-based AR/VR, which covers a wide
   range of applications, such as VR gaming, shopping and surgery.
   Broadcast/multicast-based AR/VR is outside the scope of this
   document; it is likely to need more supporting technology, such as
   multicast, caching and edge computing.  Broadcast/multicast-based
   AR/VR is intended for live or multi-user events, such as sports
   broadcasts or online education.  The idea is to use panoramic
   streaming technologies so that users can dynamically select
   different viewpoints and angles to become immersed in different
   real-time video streams.

   Our intention is not to promote enhancement of the Internet
   specifically for AR/VR applications.  Rather, AR/VR is selected as a
   concrete example that encompasses a fairly wide set of applications.
   It is expected that an Internet that can support AR/VR will be able
   to support other applications requiring both high throughput and low
   latency, such as interactive video.  It should also be able to
   support applications with more demanding latency requirements, but
   perhaps only over shorter distances.  For instance, low latency is
   needed for vehicle-to-everything (V2X) communication, for example
   between vehicles on roads, or between vehicles and remote cloud
   computing.  Tactile communication has very demanding latency needs,
   perhaps as low as 1 ms.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.1.  Definitions

   E2E
         End-to-end

   HMD
         Head-Mounted Display or Device

   AR
         Augmented Reality (AR) is a live direct or indirect view of a
         physical, real-world environment whose elements are augmented
         (or supplemented) by computer-generated sensory input such as
         sound, video, graphics or GPS data.
         It is related to the more general concept of mediated reality,
         in which a view of reality is modified (possibly even
         diminished rather than augmented) by a computer.

   VR
         Virtual Reality (VR) is a computer technology that uses
         software-generated realistic images, sounds and other
         sensations to replicate a real environment or an imaginary
         setting, and simulates a user's physical presence in this
         environment so that the user can interact with this space.

   FOV
         Field of View is the extent of the world that is visible
         without eye movement, measured in degrees of visual angle in
         the vertical and horizontal planes.

   Panorama
         A panorama is any wide-angle view or representation of a
         physical space, whether in painting, drawing, photography,
         film, seismic images or a three-dimensional model.

   360 degree video
         360-degree videos, also known as immersive videos or spherical
         videos, are video recordings in which a view in every
         direction is recorded at the same time, shot using an
         omnidirectional camera or a collection of cameras.  Most
         360-degree video is monoscopic (2D), meaning that it is viewed
         as a single (360x180 equirectangular) image directed to both
         eyes.  Stereoscopic video (3D) is viewed as two distinct
         (360x180 equirectangular) images directed individually to each
         eye.  360-degree videos are typically viewed via personal
         computers, mobile devices such as smartphones, or dedicated
         HMDs.

   MTP and MTP Latency
         Motion-To-Photon.  Motion-to-Photon latency is the time needed
         for a user movement to be fully reflected on a display screen
         [MTP-Latency].

   Unmanaged
         For the purposes of this document, if an unmanaged Internet
         service supports AR/VR applications, it means that basic
         connectivity provides sufficient support without requiring the
         application or user to separately request any additional
         service, even as a once-off request.

3.  Problem Statement

   Network-based AR/VR applications need both low latency and high
   throughput.  We shall see that the ratio of peak to mean bit rate
   makes it challenging to hit both targets.  To satisfy extreme delay
   and throughput requirements as a niche service for a few special
   users would probably be possible, but challenging.  This document
   envisages an even more challenging scenario: supporting AR/VR usage
   as a routine service for the future mass market.  This would either
   need the regular unmanaged Internet service to support both low
   latency and high throughput, or it would need managed Internet
   services to be so simple to activate that they would be universally
   accessible.

   Each element of the above requirements is expanded and quantified
   briefly below.  The figures used are justified in depth in
   Appendix A.

   MTP Latency:  AR/VR developers generally agree that MTP latency
      becomes imperceptible below about 20 ms [Carmack13].  However,
      some research has concluded that MTP latency must be less than
      17 ms for sensitive users [MTP-Latency-NASA].  Experience has
      shown that standards bodies tend to set demanding quality levels,
      while motivated humans often happily adapt to lower quality,
      although they then struggle with more demanding tasks.
      Therefore, to be clear, this 20 ms requirement is designed to
      enable immersive interaction for the same wide range of tasks
      that people are used to undertaking locally.

   Latency Budget:  If the only component of delay were the speed of
      light, a 20 ms round trip would limit the physical distance
      between the communicating parties to 3,000 km in air or 2,000 km
      in glass.  We cannot expand the physical scope of an AR/VR
      application beyond this speed-of-light limit.  However, we can
      ensure that application processing and transport-related delays
      do not significantly reduce this limited scope.  As a rule of
      thumb, they should consume no more than 5-10% (1-2 ms) of this
      20 ms budget, and preferably less.  See Appendix A.1 for the
      derivation of these latency requirements, and the sketch below
      for the underlying arithmetic.
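   As a quick plausibility check, the following Python snippet (a
   minimal sketch; the fiber speed of ~200 km/ms, i.e. a refractive
   index of ~1.5, is taken from [Fiber-Light-Speed]) reproduces the
   distance limits quoted above:

      # Sketch: distance limits implied by a 20 ms round-trip budget.
      C_VACUUM_KM_PER_MS = 300.0   # speed of light in free space
      C_GLASS_KM_PER_MS = 200.0    # ~C_VACUUM / 1.5 in optical fiber

      def max_one_way_km(rtt_budget_ms, speed_km_per_ms):
          # A round trip covers the distance twice.
          return rtt_budget_ms * speed_km_per_ms / 2

      print(max_one_way_km(20, C_VACUUM_KM_PER_MS))  # 3000.0 km (air)
      print(max_one_way_km(20, C_GLASS_KM_PER_MS))   # 2000.0 km (glass)
      # A 5-10% transport allowance leaves only 1-2 ms for all queuing
      # and switching delay on the path.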
   +--------------+-------------+----------+-------------+-------------+
   |              | Entry-level | Advanced | Ultimate 2D | Ultimate 3D |
   +--------------+-------------+----------+-------------+-------------+
   | Video Type   | 4K 2D       | 12K 2D   | 24K 2D      | 24K 3D      |
   |              |             |          |             |             |
   | Mean bit     | 22 Mb/s     | 400 Mb/s | 2.9 Gb/s    | 3.3 Gb/s    |
   | rate         |             |          |             |             |
   | Peak bit     | 130 Mb/s    | 1.9 Gb/s | 29 Gb/s     | 38 Gb/s     |
   | rate         |             |          |             |             |
   | Burst time   | 33 ms       | 17 ms    | 8 ms        | 8 ms        |
   +--------------+-------------+----------+-------------+-------------+

     Table 1: Raw information rate requirements for various levels of
                          AR/VR (YUV 420, H.265)

   Raw information rate:  Table 1 summarizes the mean and peak raw
      information rates for four types of H.265 video.  Not only does
      the raw information rate rise to very demanding levels, even for
      12K 'Advanced' AR/VR, but the ratio of peak to mean also
      increases, from about 6 for 'Entry-Level' AR/VR to nearly 12 for
      'Ultimate 3D' AR/VR.  See Appendix A.2 for more details and the
      derivation of these rate requirements.

   Buffer constraint:  It will be extremely inefficient (and therefore
      costly) to provide sufficient capacity for the bursts.  If the
      latency constraint were not so tight, it would be more efficient
      to provide less capacity than the peak rate and buffer the bursts
      (in the network and/or the hosts).  However, even if capacity
      were only provided for 1/k of the peak bit rate, play-out would
      be delayed by (k-1) times the burst time.  For instance, if a
      1 Gb/s link were provided for 'Advanced' AR/VR, k = 1.9.  Then
      play-out would be delayed by (1.9 - 1) * 17 ms = 15 ms, which
      would consume 75% of our 20 ms delay budget.  Therefore, it seems
      that capacity sufficient for the peak rate will be needed, with
      no buffering.  We then have to rely on application-layer
      innovation to reduce the peak bit rate.

   Simultaneous bursts:  One way to deal with such a high peak-to-mean
      ratio would be to multiplex multiple AR/VR sessions within the
      same capacity.  This problem statement assumes that the bursts
      are not correlated at the application layer.  Then the
      probability that most sessions burst simultaneously would become
      tiny.  This would be useful given the high degree of statistical
      multiplexing in a core network, but it would be less useful in
      access networks, which are where the bottleneck usually is, and
      where the number of AR/VR sessions sharing the same bottleneck
      might often be close to 1.  Of course, if the bursts are
      correlated between different users, there will be no multiplexing
      gain.  (The sketch below illustrates both the play-out penalty
      and these multiplexing probabilities.)
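   The following Python sketch makes both points concrete.  It is
   illustrative only: the play-out formula comes straight from the
   buffer-constraint item above, and the burst model (one I-frame per
   12-frame GOP, bursts independent across sessions) is an assumption
   chosen for illustration, not a measured traffic model:

      # Sketch 1: play-out delay when link capacity is 1/k of peak.
      peak_gbps, link_gbps, burst_ms = 1.9, 1.0, 17  # 'Advanced' AR/VR
      k = peak_gbps / link_gbps
      print((k - 1) * burst_ms)      # ~15 ms of the 20 ms budget gone

      # Sketch 2: chance that bursts from uncorrelated sessions
      # coincide.  Assumes one I-frame per 12-frame GOP (IBBPBBPBBPBB),
      # so each session is bursting for ~1/12 of the time.
      from math import comb

      def p_at_least(k_sessions, n, p):
          # Binomial tail: probability that >= k_sessions of n
          # independent sessions are bursting at the same instant.
          return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                     for i in range(k_sessions, n + 1))

      print(p_at_least(5, 10, 1 / 12))  # ~7e-4: 5+ of 10 sessions
      print(p_at_least(2, 2, 1 / 12))   # ~7e-3: both of only 2 sessions

   With many sessions the overlap probability is indeed tiny, but with
   the one or two sessions typical of an access-network bottleneck
   there is little statistical multiplexing to exploit.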
   Problems with Unmanaged TCP Service:  An unmanaged TCP solution
      would probably use some derivative of TCP congestion control
      [RFC5681] to adapt to the available capacity.  The following
      problems with TCP congestion control would have to be solved:

      Transmission loss and throughput:  TCP algorithms collectively
         induce a low level of loss, and the lower the loss, the faster
         they go.  TCP throughput is used to measure this performance.
         No matter which TCP algorithm is used, TCP throughput is
         always capped by parameters such as the RTT and the packet
         loss ratio.  Importantly, TCP throughput is always lower than
         the physical link capacity.  So, for a single flow to attain
         the bit rates shown in Table 1 requires a loss probability so
         low that it could be physically limited by the bit-error
         probability experienced over optical fiber links.  The
         analysis in [I-D.ietf-tcpm-cubic] tabulates the packet loss
         ratios corresponding to different TCP throughputs.  (A rough
         calculation based on a standard TCP model follows this item.)
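   As a rough illustration, the well-known Mathis et al. approximation
   for steady-state TCP Reno throughput, rate = (MSS/RTT) * 1.22 /
   sqrt(p), can be inverted to estimate the loss probability p needed
   for a target rate.  This is a sketch using that textbook Reno model
   (with an assumed 1460-byte MSS), not the CUBIC analysis of
   [I-D.ietf-tcpm-cubic]:

      # Sketch: loss probability required for a target single-flow
      # rate, using the Mathis approximation for TCP Reno.
      MSS_BITS = 1460 * 8   # assumed maximum segment size, in bits

      def required_loss(rate_bps, rtt_s):
          # rate = (MSS/RTT) * 1.22/sqrt(p)
          # =>  p = (1.22 * MSS / (RTT * rate))^2
          return (1.22 * MSS_BITS / (rtt_s * rate_bps)) ** 2

      print(required_loss(400e6, 0.020))  # ~3e-6, 'Advanced', 20 ms RTT
      print(required_loss(2.9e9, 0.020))  # ~6e-8, 'Ultimate 2D'

   Loss probabilities this low approach the residual bit-error rates of
   the underlying links, which is the point made in the item above.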
      Flow-rate equality:

         Host-Controlled:  TCP ensures rough equality between L4 flow
            rates as a simple way to ensure that no individual flow is
            starved while others are not [RFC5290].  Consider a
            scenario where one user has a dedicated 2 Gb/s access line
            and is running an AR/VR application that needs a minimum of
            400 Mb/s.  If the AR/VR app used TCP, it would fail
            whenever the user (or their family) happened to start more
            than 4 other long-lived TCP flows at once, e.g. FTP flows.
            This simple example shows that flow-rate equality will
            probably need to be relaxed to enable support for AR/VR as
            part of the regular unmanaged Internet service.
            Fortunately, when there is enough capacity for one flow to
            get 400 Mb/s, every flow does not have to get 400 Mb/s to
            ensure that no one starves.  This line of reasoning could
            allow flow-rate equality to be relaxed in transport
            protocols like TCP.

         Network-Enforced:  However, if parts of the network were
            enforcing flow-rate equality, relaxing it would be much
            more difficult.  For instance, deployment of the per-flow
            queuing scheduler in fq_CoDel [I-D.ietf-aqm-fq-codel] would
            introduce this problem.

      Dynamics:  The bursts shown in Table 1 would be problematic for
         TCP.  It is hard for the throughput of one TCP flow to jump an
         order of magnitude for one or two round trips, and even harder
         for other TCP flows to yield over the same time-scale without
         considerable queuing delay and/or loss.

   Problems with Unmanaged UDP Service:  Using UDP as the transport
      cannot solve the problems faced by TCP.  Fundamentally, an IP
      network can only provide a best-effort service, no matter whether
      the transport on top of IP is TCP or UDP.  This is because most
      network devices use some variation of a "fair queuing" algorithm
      to queue IP flows, without awareness of the TCP or UDP protocol
      above.  As long as a fair queuing algorithm is used, a UDP flow
      cannot obtain more bandwidth or lower latency than other flows.
      However, using UDP may reduce the burden of retransmitting lost
      packets, if a lost packet is not critical (e.g. not part of an
      I-frame) or has outlived its useful lifetime.  Depending on
      whether it applies its own congestion control, current UDP usage
      falls into two types:

      UDP with congestion control:  QUIC is a typical UDP-based service
         with congestion control.  The congestion control algorithm
         used in QUIC is similar to TCP CUBIC, which makes QUIC behave
         similarly to TCP CUBIC.  There is thus no fundamental
         difference from an unmanaged TCP service in terms of fairness,
         convergence, bandwidth utilization, etc.

      UDP without congestion control:  If UDP is used as the transport
         without additional congestion control, it will be even less
         able than UDP with congestion control to support AR/VR
         applications with high-throughput and low-latency
         requirements.

   Problems with Managed Service:  As well as the common problems
      outlined above, such as simultaneous bursts, the management and
      policy aspects of a managed QoS solution are problematic:

      Complex provisioning:  Currently, QoS services are not
         straightforward to enable, which makes routine, widespread
         support of AR/VR unlikely.  It has proved particularly hard to
         standardize how managed QoS services are enabled across host-
         network and inter-domain interfaces.

      Universality:  For AR/VR support to become widespread and
         routine, control of QoS provision would need to comply with
         the relevant Net Neutrality [NET_Neutrality_ISOC] legislation
         appropriate to the jurisdictions covering each part of the
         network path.

4.  IANA Considerations

   This document makes no request of IANA.

5.  Security Considerations

   This document introduces no new security issues.

6.  Acknowledgements

   Special thanks to Bob Briscoe, who gave a great deal of advice and
   comments during the study and writing of this draft, and who also
   extensively revised the final draft.

   We would like to thank Kjetil Raaen for comments on early drafts of
   this work.

   We would also like to thank Huawei's research team, led by Lei Han,
   Feng Li and Yue Yin, for providing the prospective analysis, and
   Guoping Li, Boyan Tu, Xuefei Tang and Tao Ma from Huawei for their
   involvement in the discussion of this work.

   Lastly, we want to thank Huawei's Information LAB, whose research
   results provided the basic AR/VR data.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

7.2.  Informative References

   [Carmack13]
              Carmack, J., "Latency Mitigation Strategies", February
              2013.

   [Chroma]   Wikipedia, "Chroma subsampling", 2016.

   [Fiber-Light-Speed]
              Miller, K., "Calculating Optical Fiber Latency", 2012.

   [GOP]      Wikipedia, "Group of pictures", 2016.

   [H264_Primer]
              Adobe, "H.264 Primer", 2016.

   [I-D.ietf-aqm-fq-codel]
              Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The FlowQueue-CoDel Packet Scheduler
              and Active Queue Management Algorithm", draft-ietf-aqm-
              fq-codel-06 (work in progress), March 2016.

   [I-D.ietf-tcpm-cubic]
              Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance
              Networks", draft-ietf-tcpm-cubic-04 (work in progress),
              February 2017.

   [MTP-Latency]
              Kostov, G., "Fostering Player Collaboration Within a
              Multimodal Co-Located Game", Masters Thesis, University
              of Applied Sciences Upper Austria, September 2015.

   [MTP-Latency-NASA]
              Adelstein, B. D., et al., NASA Ames Research Center,
              "Head Tracking Latency in Virtual Environments:
              Psychophysics and a Model", 2003.
   [NET_Neutrality_ISOC]
              Internet Society, "Network Neutrality, An Internet
              Society Public Policy Briefing", 2015.

   [PSNR]     Wikipedia, "Peak signal-to-noise ratio", 2016.

   [Raaen16]  Raaen, K., "Response time in games: requirements and
              improvements", PhD Thesis, University of Oslo, February
              2016.

   [RFC5290]  Floyd, S. and M. Allman, "Comments on the Usefulness of
              Simple Best-Effort Traffic", RFC 5290,
              DOI 10.17487/RFC5290, July 2008.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [VR-Sickness]
              Wikipedia, "Virtual reality sickness", 2016.

   [YUV]      Wikipedia, "YUV", 2016.

Appendix A.  Key Factors for Network-Based AR/VR

A.1.  Latency Requirements

A.1.1.  Motion to Photon (MTP) Latency

   Latency is the most important quality parameter for AR/VR
   applications.  With streaming video, caching technology located
   closer to the user can reduce speed-of-light delays.  In contrast,
   AR/VR user actions are interactive and rarely predictable: at any
   time a user can turn the HMD to any angle or take any other action
   in response to virtual-reality events.

   AR/VR developers generally agree that MTP latency becomes
   imperceptible below about 20 ms [Carmack13].  However, some research
   has concluded that MTP latency must be less than 17 ms for sensitive
   users [MTP-Latency-NASA].  For a summary of the numerous references
   concerning the limits of human perception of delay, see the thesis
   of Raaen [Raaen16].

   Latency greater than 20 ms not only degrades the visual experience,
   but also tends to result in virtual reality sickness [VR-Sickness].
   Also known as cybersickness, this can cause symptoms similar to
   motion sickness or simulator sickness, such as general discomfort,
   headache, nausea, vomiting and disorientation.

   Sensory conflict theory holds that sickness can occur when a user's
   perception of self-motion is based on inconsistent sensory inputs
   from the visual system, the vestibular (balance) system and non-
   vestibular proprioceptors (muscle spindles), particularly when these
   inputs are at odds with the user's expectations from prior
   experience.  Sickness can be minimized by keeping MTP latency below
   the threshold at which humans can detect the lag between visual
   input and self-motion.

   The best localized AR/VR systems have significantly improved the
   speed of sensor detection, display refresh and GPU processing in
   their head-mounted displays (HMDs) to bring MTP latency below 20 ms
   for localized AR/VR.  However, network-based AR/VR research has only
   just started.

A.1.2.  Latency Budget

   Figure 1 illustrates the main components of E2E delay in network-
   based AR/VR.
   +------+            +------+             +------+
   |  T1  |----------->|  T4  |------------>|  T2  |
   +------+            +------+             +------+
                                                |
                                                |
                                                |
   +------+                                     |
   |  T6  |                                     |
   +------+                                     |
      ^                                         |
      |                                         |
      |                                         v
   +------+            +------+             +------+
   |  T5  |<-----------|  T4  |<------------|  T3  |
   +------+            +------+             +------+

   T1: Sensor detection and action capture
   T2: Computation for RoI (range of interest) processing, rendering
       and encoding
   T3: GOP (group of pictures) framing and streaming
   T4: Network transport
   T5: Terminal decoding
   T6: Screen refresh

    Figure 1: The main components of E2E delay in network-based AR/VR

   Table 2 shows approximate current values and projected values for
   each component of E2E delay, based on likely technology advances in
   hardware and software.

   The current network transport latency comprises the physical
   propagation delay and the switching/forwarding delay at each network
   device.

   1.  Physical propagation delay: This is the delay caused by the
   speed limit on signals travelling through physical media.  Taking
   fiber as an example, optical transmission cannot exceed the speed of
   light, i.e. 300 km/ms in free space.  However, light moving through
   a fiber-optic core travels more slowly than light through a vacuum,
   because of the difference between the refractive index of free space
   and that of glass.  In normal optical fiber, the speed of light is
   about 200 km/ms [Fiber-Light-Speed].

   2.  Switching/forwarding delay: This delay is normally much greater
   than the physical propagation delay, and can vary from 200 us to
   200 ms at each hop.

   +---------+--------------------+----------------------+
   | Latency | Current value (ms) | Projected value (ms) |
   +---------+--------------------+----------------------+
   | T1      | 1                  | 1                    |
   | T2      | 11                 | 2                    |
   | T3      | 110 to 1000        | 5                    |
   | T4      | 0.2 to 100         | ?                    |
   | T5      | 5                  | 5                    |
   | T6      | 1                  | 0.01                 |
   |         |                    |                      |
   | MTP     | 130 to 1118        | 13 + ?               |
   +---------+--------------------+----------------------+

   MTP = T1+T2+T3+T4+T5+T6

    Table 2: Current and projected latency of key stages in network-
                              based AR/VR

   We can see that MTP latency is currently much greater than 20 ms.

   If we project that technology development will bring down the
   latency in some areas, for example dramatically reducing the latency
   of GOP framing and streaming to 5 ms by using improved parallel
   hardware processing, and reducing display response (refresh) time to
   0.1 us by using OLED, then the budget for round-trip network
   transport latency will be about 5 to 7 ms.

   This budget will be consumed by propagation delay, switching delay
   and queuing delay.  We can conclude that:

   1.  The physical distance between the user and the AR/VR server is
   limited, and must be less than 1,000 km.  So, the AR/VR server
   should be deployed as close to the user as possible.

   2.  The total delay budget for network devices will be in the low
   single digits of milliseconds.  For example, if the fiber distance
   between the user and the AR/VR server totals 600 km there and back,
   the maximum accumulated round-trip delay allowed for all network
   devices is about 2 to 4 ms, equivalent to 1 to 2 ms of one-way delay
   across all network devices on the path.  (The sketch below
   illustrates this arithmetic.)
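   The following Python sketch (illustrative only; the 200 km/ms fiber
   speed is taken from [Fiber-Light-Speed], the 5-7 ms round-trip
   budget from Table 2, and the 600 km figure is read as the round-trip
   fiber distance so that the arithmetic matches the 2-4 ms conclusion
   above) derives the remaining device budget:

      # Sketch: round-trip network budget left for network devices
      # after subtracting fiber propagation delay.
      FIBER_KM_PER_MS = 200.0   # light speed in fiber

      def device_budget_ms(rtt_budget_ms, round_trip_fiber_km):
          propagation_ms = round_trip_fiber_km / FIBER_KM_PER_MS
          return rtt_budget_ms - propagation_ms

      print(device_budget_ms(5, 600))   # 2.0 ms round trip for devices
      print(device_budget_ms(7, 600))   # 4.0 ms round trip for devices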
A.2.  Throughput Requirements

   The network bandwidth required for AR/VR is the actual TCP
   throughput required by the application, if the AR/VR stream is
   transported over TCP.  It is another critical parameter for the
   quality of an AR/VR application.

   The AR/VR network bandwidth depends on the raw streaming data rate,
   i.e. the bit rate of the video stream.

A.2.1.  Average Throughput

   The average network bandwidth for AR/VR is the average bit rate of
   the AR/VR video.

   For an AR/VR video stream, many parameters can affect the bit rate,
   such as the display resolution, 2D or 3D, normal or panoramic view,
   the codec type used for video processing, the color space and
   sampling algorithm, and the video pattern.

   Normally, the bit rate for 3D is approximately 1.5 times that of 2D,
   and the bit rate for a panoramic view is about 4 times that of a
   normal view.

   The latest codecs for high-resolution video are H.264 and H.265,
   which have very high compression ratios.

   The color space and sampling used in modern video streaming are the
   YUV system [YUV] and chroma subsampling [Chroma].

   YUV encodes a color image or video taking human perception into
   account, allowing reduced bandwidth for the chrominance components.
   This typically enables transmission errors or compression artifacts
   to be masked more efficiently by human perception than with a
   "direct" RGB representation.

   Chroma subsampling is the practice of encoding images with less
   resolution for chroma information than for luma information, taking
   advantage of the human visual system's lower acuity for color
   differences than for luminance.

   There are different sampling systems, depending on the ratio of
   samples for the different color components, such as Y'CrCb 4:1:1,
   Y'CrCb 4:2:0, Y'CrCb 4:2:2, Y'CrCb 4:4:4 and Y'CrCb 4:4:0.  The most
   widely used sampling method is Y'CrCb 4:2:0, often called YUV420
   (note that the similar sampling scheme for analog encoding is called
   Y'UV).

   The video pattern, or motion rank, also affects the stream bit rate:
   the more frequently the video frames change, the less compression
   can be achieved.

   A compressed video stream consists of an ordered succession of
   groups of pictures, or GOPs [GOP].  Three types of pictures (or
   frames) are used in video compression such as H.264: intra-coded
   pictures (I-frames), predictive coded pictures (P-frames) and
   bipredictive coded pictures (B-frames) [GOP].

   An I-frame is in effect a fully specified picture, like a
   conventional static image file.  P-frames and B-frames hold only
   part of the image information, so they need less space than an
   I-frame and thus improve the video compression rate.  A P-frame
   holds only the changes in the image relative to the previous frame;
   P-frames are also known as delta frames.  A B-frame saves even more
   space by using differences between the current frame and both the
   preceding and following frames to specify its content.

   A typical video stream has a sequence of GOPs with a pattern such as
   IBBPBBPBBPBB or IBBBBPBBBBPBBBB.

   The real bit rate also depends on the image quality the user wants
   to view.  The peak signal-to-noise ratio, or PSNR [PSNR], denotes
   the quality of an image: the higher the PSNR, the better the image
   quality, and the higher the bit rate.

   Since humans can only distinguish image quality differences up to a
   point, it would be efficient for the network if we could provide
   images with the minimum PSNR that human perception cannot
   distinguish from images with higher PSNR.  Unfortunately, this is
   still a research topic, and there is no fixed minimum PSNR that
   applies to all people.

   So, there is no exact formula for the bit rate; however, empirical
   formulae allow a rough estimate of the bit rate from the relevant
   parameters.

   Formula (1) is from the H.264 Primer [H264_Primer]:

      Information rate = W * H * FPS * Rank * 0.07          (1)

      where:
      W:    Number of pixels in the horizontal direction
      H:    Number of pixels in the vertical direction
      FPS:  Frames per second
      Rank: Motion rank, which can be:
            1: Low motion: video with minimal movement
            2: Medium motion: video with some degree of movement
            4: High motion: video with a lot of movement, where the
               movement is unpredictable
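   As an illustration, the following Python sketch evaluates formula
   (1) for the 4K resolution and frame rate used later in Table 3
   (3840x1920 at 30 FPS); the high motion rank of 4 is an assumption
   representing a worst-case AR/VR scene:

      # Sketch: rough H.264 bit-rate estimate from formula (1).
      def h264_rate_bps(w, h, fps, rank):
          # Empirical rule of thumb from [H264_Primer].
          return w * h * fps * rank * 0.07

      rate = h264_rate_bps(3840, 1920, 30, rank=4)
      print(rate / 1e6)   # ~62 Mb/s for high-motion 4K at 30 FPS

   Note that formula (2) below, with typical H.265 compression ratios,
   yields the lower 22 Mb/s mean figure used in Tables 1 and 3.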
   The four formulae tagged (2) below are more generic, with more
   parameters, for calculating approximate information rates:

      Average information rate = T * W * H * S * d * FPS / Cv  )
      I-frame information rate = T * W * H * S * d * FPS / Cj  )
      Burst size               = T * W * H * S * d / Cj        )   (2)
      Burst time               = 1 / FPS                       )

      where:
      T:    Type of video: 1 for 2D, 2 for 3D
      W:    Number of pixels in the horizontal direction
      H:    Number of pixels in the vertical direction
      S:    Scale factor, which can be:
            1 for YUV400
            1.5 for YUV420
            2 for YUV422
            3 for YUV444
      d:    Color depth in bits
      FPS:  Frames per second
      Cv:   Average compression ratio for video
      Cj:   Compression ratio for an I-frame
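   A minimal Python sketch of the four formulae tagged (2), applied to
   the 'Advanced VR' column of Table 3 below (12K YUV420 2D, 10-bit
   color, 60 FPS, Cv = 150, Cj = 30), reproduces the tabulated values:

      # Sketch: the four formulae tagged (2) as a small calculator.
      def vr_rates(T, W, H, S, d, FPS, Cv, Cj):
          raw = T * W * H * S * d    # bits in one uncompressed frame
          return {
              "mean_bps":     raw * FPS / Cv,
              "peak_bps":     raw * FPS / Cj,   # I-frame rate
              "burst_bytes":  raw / Cj / 8,     # I-frame size
              "burst_time_s": 1 / FPS,
          }

      adv = vr_rates(T=1, W=11520, H=5760, S=1.5, d=10,
                     FPS=60, Cv=150, Cj=30)
      print(adv)  # ~398 Mb/s mean, ~1.99 Gb/s peak,
                  # ~4.15 MByte burst, ~17 ms burst time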
   Table 3 shows the bit rates calculated with formula (2) above for
   different AR/VR levels.

   Note the following about Table 3:

   1.  There is no industry standard yet for the types of VR.  The
   definitions in the table are simply based on 4K, 12K and 24K videos
   for a 360x180 degree display.  Ultimate VR corresponds roughly to
   the so-called "Retina Display", which is about 60 PPD (pixels per
   degree) or 300 PPI (pixels per inch).  However, there is debate
   about the limit of human vision: J. Blackwell of the Optical Society
   of America determined in 1946 that the resolution of the human eye
   is actually closer to 0.35 arc minutes, more than 3 times that of
   Apple's Retina Display (60 PPD).

   2.  The mean and peak bit rates in the table are calculated for a
   specific video with an acceptable perceptual PSNR and a typical
   compression ratio.  They do not represent all types of video, so the
   compression ratios in the table are not universally applicable.

   3.  Be aware that in real use cases there are many schemes that
   reduce the video bit rate further, in addition to the mandatory
   video compression; for example, transmitting only the video in the
   FOV at the expected resolution and in time, while transmitting the
   video in other areas more slowly, at lower quality and lower
   resolution.  These technologies and their impact on bandwidth are
   outside the scope of this document.

   4.  We assume the whole 360-degree video is transmitted to the user
   site.  The same video could be viewed by the naked eye or through an
   HMD (without much processing power).  Thus, there is no difference
   to the network in bit rate, burst size or burst time; the only
   difference is that an HMD can only show the part of the video
   limited by its view angle.  But if the HMD has its own video
   decoder and powerful processing, and can communicate directly with
   the AR/VR content source, the network only needs to transport the
   data defined by the HMD resolution, which is only a small percentage
   of the whole 360-degree video.  The corresponding mean/peak bit
   rates and burst size can easily be calculated with formula (2).  The
   last row, "Info ratio of HMD/whole video", denotes the ratio of
   information (mean/peak bit rate and burst size) between the HMD view
   and the whole 360-degree video.

   +-----------------+---------------+----------------+----------------+
   |                 | Entry-level VR| Advanced VR    | Ultimate VR    |
   +-----------------+---------------+----------------+----------------+
   | Type            | 4K 2D Video   | 12K 2D Video   | 24K 3D Video   |
   +-----------------+---------------+----------------+----------------+
   | Resolution W*H  | 3840*1920     | 11520*5760     | 23040*11520    |
   | 360 degree video|               |                |                |
   +-----------------+---------------+----------------+----------------+
   | HMD Resolution/ | 960*960/      | 3840*3840/     | 7680*7680/     |
   | view angle      | 90            | 120            | 120            |
   +-----------------+---------------+----------------+----------------+
   | PPD             | 11            | 32             | 64             |
   | (Pix per degree)|               |                |                |
   +-----------------+---------------+----------------+----------------+
   | d (bit)         | 8             | 10             | 12             |
   +-----------------+---------------+----------------+----------------+
   | Cv              | 120           | 150            |200(2D), 350(3D)|
   +-----------------+---------------+----------------+----------------+
   | FPS             | 30            | 60             | 120            |
   +-----------------+---------------+----------------+----------------+
   | Mean bit rate   | 22 Mb/s       | 398 Mb/s       | 2.87 Gb/s (2D) |
   |                 |               |                | 3.28 Gb/s (3D) |
   +-----------------+---------------+----------------+----------------+
   | Cj              | 20            | 30             | 20(2D), 30(3D) |
   +-----------------+---------------+----------------+----------------+
   | Peak bit rate   | 132 Mb/s      | 1.9 Gb/s       | 28.7 Gb/s (2D) |
   |                 |               |                | 38.2 Gb/s (3D) |
   +-----------------+---------------+----------------+----------------+
   | Burst size      | 553 KByte     | 4.15 MByte     | 29.9 MByte (2D)|
   |                 |               |                | 39.8 MByte (3D)|
   +-----------------+---------------+----------------+----------------+
   | Burst time      | 33 ms         | 17 ms          | 8 ms           |
   +-----------------+---------------+----------------+----------------+
   | Info ratio of   | 0.125         | 0.222          | 0.222          |
   | HMD/whole video |               |                |                |
   +-----------------+---------------+----------------+----------------+

      Table 3: Bit rates for different levels of VR (YUV420, H.265)

A.2.2.  Peak Throughput

   The peak bandwidth for AR/VR is the peak bit rate of the AR/VR
   video.  In this document it is defined as the bit rate required to
   transport an I-frame; the burst size is the size of the I-frame, and
   the burst time is the time within which the I-frame must be
   transported end to end, based on the FPS.

   As with the mean bit rate, the calculation of the peak bit rate is
   purely theoretical and does not take any optimization into account.

   There are two scenarios in which a new I-frame is generated and
   transported.  One is when the AR/VR video display changes so
   dramatically that there is no similarity between two consecutive
   images; the other is when the FOV changes.

   When an AR/VR user moves their head, or moves their eyeballs to
   change the range of interest, the FOV changes.  An FOV change may
   lead to the transmission of a new I-frame.

   Since there is no reference frame for the video compression of an
   I-frame, it can only be compressed by intra-frame processing, i.e.
   compression like that of a static image such as JPEG, and the
   compression ratio is much smaller than the inter-frame compression
   ratio.

   Normal-quality JPEG compression is estimated to achieve a ratio of
   about 20 to 30, only a fraction of the compression ratio of normal
   video streaming.

   In addition to the low compression ratio, there is another problem:
   due to the MTP limit, the new I-frame must be rendered, grouped,
   transmitted and displayed within the delay budget for network
   transport.  This makes the peak bit rate and burst size much bigger
   than for normal video streaming such as IPTV.

   The peak bit rate (the bit rate for I-frames), burst size and burst
   time are given by formula (2).  From the formula we can see that the
   ratio of the peak bit rate to the average bit rate is the ratio
   Cv/Cj.  Since Cv can be 100 to 200 for 2D, while Cj is only about 20
   to 30, the peak bit rate is about 10 times the average bit rate.
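   The following Python sketch (illustrative, using the Table 3
   parameters) checks the peak-to-mean ratio Cv/Cj and the HMD/whole-
   video information ratio described in note 4 above:

      # Sketch: peak/mean ratio and HMD/whole-video ratio, Table 3.
      def peak_to_mean(Cv, Cj):
          # Follows directly from formula (2): the raw frame terms
          # cancel, leaving Cv/Cj.
          return Cv / Cj

      def hmd_ratio(hmd_w, hmd_h, full_w, full_h):
          return (hmd_w * hmd_h) / (full_w * full_h)

      print(peak_to_mean(Cv=120, Cj=20))         # 6.0  (Entry-level)
      print(peak_to_mean(Cv=200, Cj=20))         # 10.0 (Ultimate 2D)
      print(hmd_ratio(3840, 3840, 11520, 5760))  # 0.222 (Advanced VR)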
Authors' Addresses

   Lin Han (editor)
   Huawei Technologies
   2330 Central Expressway
   Santa Clara, CA  95050
   USA

   Phone: +1 408 330 4613
   Email: lin.han@huawei.com


   Steve Appleby
   BT
   UK

   Email: steve.appleby@bt.com


   Kevin Smith
   Vodafone
   UK

   Email: Kevin.Smith@vodafone.com