rtgwg                                                            L. Geng
Internet-Draft                                                    P. Liu
Intended status: Informational                              China Mobile
Expires: May 3, 2021                                           P. Willis
                                                                      BT
                                                        October 30, 2020

Dynamic-Anycast in Compute First Networking (CFN-Dyncast) Use Cases and
                           Problem Statement
               draft-geng-rtgwg-cfn-dyncast-ps-usecase-00

Abstract

   Service providers are exploring edge computing to achieve better
   response times, control over data, and energy savings by moving
   computing services towards the edge of the network, in scenarios
   such as 5G MEC (Multi-access Edge Computing), the virtualized
   central office, and others.
   Providing services by sharing computing resources from multiple
   edges is emerging and is becoming more and more useful for
   computationally intensive tasks.  The service nodes attached to
   multiple edges normally have two key features, service equivalency
   and service dynamism.  Ideally, they should serve service demands
   in a computationally balanced way.  However, many approaches
   dispatch service demands statically, e.g., to the geographically
   closest edge, which may cause unbalanced usage of computing
   resources at the edges and further degrade user experience and
   system utilization.  This draft provides an overview of the
   scenarios and the associated problems.

   Networking that takes computing resource metrics into account as
   one of its top parameters is called Compute First Networking (CFN)
   in this document.  The document identifies several key areas that
   require more investigation in architecture and protocol to achieve
   balanced computing and networking resource utilization among edges
   in CFN.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on May 3, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definition of Terms
   3.  Main Use-Cases
     3.1.  Cloud Based Recognition in Augmented Reality (AR)
     3.2.  Connected Car
     3.3.  Cloud Virtual Reality (VR)
   4.  Requirements
   5.  Problem Statement
     5.1.  Anycast based service addressing methodology
     5.2.  Flow affinity
     5.3.  Computing Aware Routing
   6.  Summary
   7.  Security Considerations
   8.  IANA Considerations
   9.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

   Edge computing aims to provide better response times and transfer
   rates than cloud computing by moving the computation towards the
   edge of the network.
   Edge computing can be built on industrial PCs, embedded systems,
   gateways, and other devices deployed close to end users.  There is
   an emerging requirement to deploy multiple edge sites (also called
   edges in this document) at different locations to provide a
   service.  A city may have millions of home gateways, thousands of
   base stations, and hundreds of central offices that can serve as
   candidate edges for hosting service nodes.  Depending on its
   location and capacity, each edge has different computing resources
   available for a service.  At peak hours, the computing resources
   attached to the edge site closest to a client may not be sufficient
   to handle all the incoming service demands, so users may experience
   longer response times or even dropped demands.  Increasing the
   computing resources hosted on each edge site to the potential
   maximum capacity is neither feasible nor economical.

   Some user devices are purely battery-driven.  Offloading
   computation-intensive processing to the edge saves battery power.
   Moreover, the edge may use data sets for the computation that do
   not exist on the user device, because of the size of the data pool
   or for data governance reasons.

   At the same time, with new technologies such as serverless
   computing and container-based virtual functions, a service node on
   an edge can easily be created and terminated on a sub-second scale.
   This makes the computing resources available for a service at an
   edge change dramatically over time.

   DNS-based load balancing usually configures a domain in the Domain
   Name System (DNS) such that client requests to the domain are
   distributed across a group of servers; it usually provides several
   IP addresses for a domain name.  The traditional techniques to
   manage the overall load balancing of client requests include
   choose-the-closest and round-robin.
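   As a rough illustration only (the addresses and names below are
   hypothetical, not taken from this draft), round-robin selection
   over a static list of resolved addresses might look like:

```python
from itertools import cycle

# Hypothetical addresses returned by DNS for one domain
# (RFC 5737 documentation range); purely illustrative.
SERVERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

_rotation = cycle(SERVERS)

def pick_server():
    """Round-robin selection: rotate through the address list,
    ignoring both computing load and network path cost."""
    return next(_rotation)

# Six consecutive demands are spread evenly over the three servers,
# regardless of how loaded each server actually is.
picks = [pick_server() for _ in range(6)]
```

   The rotation is oblivious to server state, which is exactly the
   limitation discussed here.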
   These techniques are relatively static, which may cause unbalanced
   workload distribution in terms of both network load and
   computational load.

   There are dynamic approaches that try to distribute each request to
   the server with the best available resources and minimal load.
   They usually require L4-L7 packet processing, which is not
   efficient for a huge number of short connections.  At the same
   time, such approaches can hardly obtain network status in real
   time.  Therefore, the choice of the service node is almost entirely
   determined by computing status rather than by a comprehensive
   consideration of both computing and network.

   Networking that takes computing resource metrics into account as
   one of its top parameters is called Compute First Networking (CFN)
   in this document.  In CFN, edge sites can interact with each other
   to provide network-based edge computing service dispatching and
   achieve better load balancing.  Both computing load and network
   status are resources visible to the network.

   A single service with multiple instances attached to multiple edge
   computing sites is conceptually similar to anycast in networking
   terms.  Because of the dynamic and anycast aspects of the problem,
   jointly with the CFN deployment, we generally refer to it in this
   document as CFN-Dyncast, short for Compute First Networking Dynamic
   Anycast.  This draft describes the usage scenarios, problem space,
   and key areas of CFN-Dyncast.

2.  Definition of Terms

   CFN:  Compute First Networking.  A networking architecture that
   takes computing resource metrics into account as one of its top
   parameters to achieve flexible load management and performance
   optimization in terms of both network and computing resources.

   CFN-Dyncast:  Compute First Networking Dynamic Anycast.  The
   dynamic and anycast aspects of the architecture in a CFN
   deployment.

3.
Main Use-Cases

   This section presents several typical scenarios that require
   multiple edge sites to interconnect and coordinate at the network
   layer to meet service requirements and ensure user experience.

3.1.  Cloud Based Recognition in Augmented Reality (AR)

   In an AR environment, the end device captures images via cameras
   and sends out computing-intensive service demands.  Normally,
   service nodes at the edge are responsible for tasks with medium
   computational complexity or low-latency requirements, like object
   detection, feature extraction, and template matching, while service
   nodes in the cloud are responsible for the most computationally
   intensive tasks, like object recognition, or latency-insensitive
   tasks, like AI-based model training.  The end device hence only
   handles tasks like target tracking and image display, thereby
   reducing the computing load on the client.

   The computing resource for a specific service at the edge can be
   instantiated on demand.  Once the task is completed, this resource
   can be released.  The lifetime of such a "function as a service"
   can be on a millisecond scale.  Computing resources on the edges
   are therefore distributed and dynamic in nature.  A service demand
   has to be sent to and served by an edge with sufficient computing
   resources and a good network path.

3.2.  Connected Car

   In auxiliary driving scenarios, to help overcome the
   non-line-of-sight problem caused by blind spots or obstacles, the
   edge node can collect comprehensive road and traffic information
   around the vehicle's location and perform data processing, and
   vehicles at high safety risk can then be alerted.  This improves
   driving safety in complicated road conditions, such as at
   intersections.  The video information captured by a surveillance
   camera is transmitted to the nearest edge node for processing.
   Warnings can then be sent to cars that are driving too fast or
   facing other unseen dangers.

   When the local edge node is overloaded, service demands sent to it
   will be queued and the responses for auxiliary driving will be
   delayed, which may lead to traffic accidents.  Hence, in such
   cases, delay-insensitive services such as in-vehicle entertainment
   should be dispatched to other, lightly loaded nodes instead of the
   local edge node, so that the delay-sensitive service is
   preferentially processed locally to ensure service availability and
   user experience.

3.3.  Cloud Virtual Reality (VR)

   Cloud VR introduces the concepts and technology of cloud computing
   and cloud rendering into VR applications.  The edge cloud handles
   encoding/decoding and rendering in this scenario.  The end device
   usually only uploads posture or control information to the edge,
   and the VR content is then rendered in the edge cloud.  The video
   and audio outputs generated by the edge cloud are encoded,
   compressed, and transmitted back to the end device, or further
   transmitted to a central data center via a high-bandwidth network.

   Cloud VR services have high requirements on both network and
   computing.  For example, for an entry-level Cloud VR (panoramic 8K
   2D video) with 110-degree Field of View (FOV) transmission, the
   typical network requirements are 40 Mbps bandwidth, 20 ms RTT, and
   a packet loss rate of 2.4E-5; the typical computing requirements
   are 8K H.265 real-time decoding and 2K H.264 real-time encoding.

   An edge site may use CPUs or GPUs for encoding/decoding.  A GPU
   usually has better performance, but a CPU is simpler and more
   straightforward to use.  Edges differ in the CPU and GPU computing
   resources physically deployed, and the available remaining
   resources determine whether a gaming instance can be started.  The
   CPU, GPU, and memory utilization of an instance has a high impact
   on the processing delay of encoding, decoding, and rendering.
   At the same time, the quality of the network path to the edge site
   is key to the user experience in terms of audio/video quality and
   game command response time.

   Cloud VR services bring challenging requirements on both network
   and computing, so the edge node that serves a service demand has to
   be carefully selected to make sure it has sufficient computing
   resources and a good network path.

4.  Requirements

   This document mainly targets typical edge computing scenarios with
   two key features, service equivalency and service dynamism.

   o  Service equivalency: A service is provided by one or more
      service instances, each providing equivalent service
      functionality to clients.  The existence of several instances
      (possibly across multiple edges) ensures better scalability and
      availability.

   o  Service dynamism: A single instance has very dynamic resources
      over time with which to serve a service demand.  Its dynamism is
      affected by computing resource capacity and load, network path
      quality, etc.  The balancing mechanisms should adapt to the
      service dynamism quickly and seamlessly; failover-style
      switching is not desired.

5.  Problem Statement

   A service demand should be routed in real time to the most suitable
   edge, and further to the service instance, among the multiple edges
   offering service equivalency and dynamism.  Existing mechanisms use
   one or more of the following approaches, and each of them has
   associated issues.

   o  Use the least network cost as the metric to select the edge.
      Issue: computing information is a key consideration in edge
      computing, and it is not included here.

   o  Use the geographical location deduced from the IP prefix and
      pick the closest edge.  Issue: edges are not far apart in edge
      computing scenarios; the location is either hard to deduce from
      the IP address or not the key distinguisher.
   o  Health checks on an infrequent basis (>1s) to reflect the
      service node status, with switching on failover.  Issue: a
      health check is very different from the computing status
      information of a service instance; its granularity is too
      coarse.

   o  The application layer picks a service node randomly or in a
      round-robin way.  Issue: this may share the load across multiple
      service instances in terms of computing capacity, but the
      variance in network cost is barely considered.  Edges can be
      deployed in different cities that are not at equal path cost
      from a client, so network status is also a major concern.

   o  Global resolver and early binding (DNS-based load balancing):
      the client first queries a global resolver or load balancer to
      get the exact server's address, and then steers traffic using
      that address as the binding address.  It is called early binding
      because an explicit binding-address query has to be performed
      before sending user data.  Issue: firstly, it clashes with
      service dynamism; current resolvers do not have the capability
      to redirect to a new instance at the high frequency with which
      each service instance's status changes.  Secondly, edge
      computing flows can be short, completing in one or two round
      trips, so an out-of-band query for a specific server address has
      high overhead, as it takes one more round trip.  As discussed in
      Section 5.4 of [I-D.sarathchandra-coin-appcentres], flexible
      re-routing to appropriate service instances out of a pool of
      available ones faces significant challenges when DNS is used for
      this purpose.

   o  Traditional anycast.  Issue: it only works for single
      request/reply communication; no flow affinity is guaranteed.

   A network-based dynamic anycast (Dyncast) architecture aims to
   address the following points in CFN (CFN-Dyncast).

5.1.
Anycast based service addressing methodology

   A unique service identifier is used by all the service instances of
   a specific service, no matter which edge an instance attaches to.
   An anycast-like addressing and routing methodology among multiple
   edges ensures that a data packet can potentially reach any of the
   edges with a service instance attached.  At the same time, each
   service instance has its own unicast address, used by the attaching
   edge to access the service.  A discovery and mapping methodology
   from the service identifier (an anycast address) to a specific
   unicast address is required, to allow in-band service instance and
   edge selection in real time in the network.

5.2.  Flow affinity

   Traditional anycast is normally used for single-request,
   single-response style communication, as each packet is forwarded
   individually based on the forwarding table at the time.  Packets
   may be sent to different places when the network status changes.
   CFN in edge computing requires multiple-request, multiple-response
   style communication between the client and the service node.
   Therefore, the data plane must maintain flow affinity: all the
   packets of the same flow should go to the same service node.

5.3.  Computing Aware Routing

   Given that the current state of the art in routing is based on
   network cost, computing resource and/or load information is not
   available or distributed at the network layer.  At the same time,
   computing resource metrics are not well defined or understood by
   the network.  They could include CPU/GPU capacity and load, the
   number of sessions currently being served, the expected service
   processing latency, and the weight of each metric.  Hence it is
   hard to make the best choice of edge based on both computing and
   network metrics at the same time.
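   As a sketch only, under assumptions not made by this draft (the
   metric names "net_cost" and "cpu_load", the normalization, and the
   equal weights are all illustrative), a joint computing/network
   selection could combine the two costs per edge:

```python
# Hypothetical per-edge metrics, each normalized to [0, 1].
EDGES = {
    "edge-a": {"net_cost": 0.2, "cpu_load": 0.9},  # close but busy
    "edge-b": {"net_cost": 0.5, "cpu_load": 0.3},  # farther but idle
}

def edge_score(metrics, w_net=0.5, w_comp=0.5):
    """Lower is better: a weighted sum of network path cost and
    computing load, one possible joint metric."""
    return w_net * metrics["net_cost"] + w_comp * metrics["cpu_load"]

def select_edge(edges):
    """Pick the edge with the lowest combined score."""
    return min(edges, key=lambda name: edge_score(edges[name]))

# edge-a has the least network cost, yet edge-b is selected once
# computing load is taken into account.
best = select_edge(EDGES)
```

   How such metrics are defined, weighted, and distributed is exactly
   the open question this section raises.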
   The representation of computing information metrics has to be
   agreed on by the participating edges, and the metrics have to be
   exchanged among them.

   Network cost in the current routing system does not change very
   frequently.  Computing load, however, is highly dynamic: it changes
   rapidly with the number of sessions, CPU/GPU utilization, and
   memory space.  It has to be determined at what interval, or on what
   event, such information needs to be distributed among edges.  More
   frequent distribution gives more accurate synchronization, but also
   more overhead.

   Choosing the least path cost is the most common rule in routing.
   However, this logic does not work well in computing-aware routing.
   Choosing the least computing load may result in oscillation: the
   least-loaded edge can quickly be flooded by a huge number of new
   computing demands and soon become overloaded, and a tidal effect
   may follow.

   Depending on the usage scenario, computing information can be
   carried in BGP, in an IGP, or in an SDN-like centralized way.
   Investigation of those solution spaces is to be elaborated in other
   documents and is out of the scope of this draft.

6.  Summary

   This document presents the CFN-Dyncast problem statement.
   CFN-Dyncast aims at leveraging the resources mobile providers have
   available at the edge of their networks, while also taking into
   account the dynamic nature of service demands and the availability
   of network resources, so as to satisfy service requirements and
   balance the load among service instances.

   This document also illustrates some use cases and problems and
   lists the requirements for CFN-Dyncast.  A CFN-Dyncast architecture
   should address how to distribute computing resource information at
   the network layer and how to assure flow affinity in an
   anycast-based service addressing environment.

7.  Security Considerations

   TBD

8.
IANA Considerations

   No IANA action is required so far.

9.  Informative References

   [I-D.sarathchandra-coin-appcentres]
              Trossen, D., Sarathchandra, C., and M. Boniface,
              "In-Network Computing for App-Centric Micro-Services",
              draft-sarathchandra-coin-appcentres-03 (work in
              progress), October 2020.

Acknowledgements

   The authors would like to thank Yizhou Li, Luigi IANNONE, and Dirk
   Trossen for their valuable suggestions on this document.

Authors' Addresses

   Liang Geng
   China Mobile

   Email: gengliang@chinamobile.com

   Peng Liu
   China Mobile

   Email: liupengyjy@chinamobile.com

   Peter Willis
   BT

   Email: peter.j.willis@bt.com