ALTO WG                                                          S. Yang
Internet-Draft                                                    L. Cui
Intended status: Standards Track                     Shenzhen University
Expires: January 14, 2021                                          M. Xu
                                                     Tsinghua University
                                                                 Y. Yang
                                                             Tongji/Yale
                                                                R. Huang
                   Research Institute of Tsinghua University in Shenzhen
                                                           July 13, 2020

Delivering Functions over Networks: Traffic and Performance Optimization
                     for Edge Computing using ALTO
           draft-yang-alto-deliver-functions-over-networks-01

Abstract

   With the rapid development of the Internet, huge amounts of data are
   being generated.  To satisfy user demands, service providers deploy
   services near the edge networks.  To achieve better performance,
   computing functions and user traffic need to be scheduled properly.
   However, it is challenging to efficiently schedule resources among
   the distributed edge servers due to the lack of underlying
   information, e.g., network topology, traffic distribution, link
   delay/bandwidth, and utilization/capability of computing servers.
   In this document, we employ the ALTO protocol to help deliver
   functions and schedule traffic within the edge computing platform.
   The protocol will provide information on multiple resources for the
   distributed edge computing platform.  The usage of ALTO will improve
   the efficiency of function delivery in edge computing.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 14, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Background
     3.1.  Edge computing
     3.2.  Benefits of ALTO protocol
     3.3.  List of resources and services/functions
   4.  Scenario of delivering function
   5.  Delivering functions over edge computing with ALTO protocol
   6.  Implementation and Deployment
     6.1.  Implementation
     6.2.  Deployment
     6.3.  ALTO Integration
   7.  Management of Functions
   8.  Multi-domain System
   9.  Scheduling Framework
   10. Security Considerations
   11. IANA Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Internet of Things (IoT), artificial intelligence, virtual reality
   and augmented reality (VR/AR) are developing rapidly, holding promise
   for the future.  The new applications are generating huge amounts of
   data that need to be processed efficiently.  The processing
   applications involve various kinds of functions/services according
   to user demands.  For example, 1) surveillance video could be
   analysed by AI functions; 2) high-definition video or VR/AR video
   should be encoded/decoded; 3) content can be stored in edge networks,
   which can also be seen as a function/service.  Function as a service
   (FaaS) is becoming more and more popular among cloud computing
   providers, e.g., Amazon Lambda and IBM OpenWhisk.  It is expected
   that functions/services would be deployed anywhere in networks.

   Some of the functions/services put strong requirements on the quality
   of service provided by underlying networks, e.g., the delay and
   jitter should be as small as possible to guarantee user experience.
   Unlike scheduling with Mesos and Kubernetes, which can schedule
   computing resources efficiently within a computing cluster, deploying
   functions in wide area networks is much more complex.

   Firstly, properly deploying functions over distributed networks takes
   multiple resources into consideration, including network traffic,
   topology, link delay/bandwidth, computing capacity/utilization of
   each computing cluster, etc.  Besides, the resources are usually
   scheduled across multiple domains to satisfy user demands.  Thus,
   this information needs to be collected with unified interfaces and
   protocols, and resource scheduling algorithms SHOULD be optimized to
   improve user experience and network performance, such as load
   balancing.
   In this document, we will deliver functions over the edge
   computing networks to utilize the computing and network resources
   more efficiently.

   We use the ALTO (Application-Layer Traffic Optimization) protocol
   [RFC7285] to optimize network traffic and performance by delivering
   functions over the edge computing network.  ALTO can provide global
   network information for distributed applications, information that
   cannot be retrieved or computed by the applications themselves
   [RFC5693].  Generally, the ALTO protocol will collect and compute
   network information for the distributed edge clusters, including link
   delay, network traffic, and other cost metrics.  Finally, based on
   pre-defined scheduling algorithms, the system will deliver the
   functions to the most appropriate edge clusters according to the
   information provided by the ALTO protocol.

   For brevity, in this document, we will use the terminology
   introduced in [RFC7285] and [I-D.ietf-alto-unified-props-new].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Background

3.1.  Edge computing

   Edge computing was proposed to improve network performance in terms
   of latency, security, bandwidth, etc.  In an edge computing
   infrastructure, servers are deployed at the edge to reduce the
   distance between users and servers.  Users can submit their tasks to
   the edge servers, which will process the tasks and return the
   computational results to the users.  Compared with traditional
   centralized computing, edge computing performs better in terms of
   latency, bandwidth, and network traffic.  Nowadays, edge computing is
   used in different areas, e.g., latency-sensitive applications such as
   IoT, artificial intelligence, 5G, VR/AR, etc.
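
   To make the idea of choosing an edge cluster from network cost
   information concrete, the following Python sketch picks the
   candidate with the lowest cost from a table shaped like the
   "cost-map" member of an ALTO cost map response [RFC7285] (source
   PID, then destination PID, then cost).  The PID names, the cost
   values, and the helper name nearest_cluster are illustrative
   assumptions and are not defined by this document.

```python
# Hypothetical cost table, shaped like the "cost-map" member of an
# ALTO cost map response: source PID -> destination PID -> cost.
# All PID names and cost values below are made up for illustration.
COST_MAP = {
    "pid-camera": {"pid-edge-1": 12.0, "pid-edge-2": 4.5, "pid-cloud": 48.0},
}


def nearest_cluster(cost_map, src_pid, candidates):
    """Return the candidate PID with the lowest cost from src_pid.

    Returns None when the source PID is unknown or no candidate has a
    cost entry.
    """
    costs = cost_map.get(src_pid, {})
    reachable = [pid for pid in candidates if pid in costs]
    if not reachable:
        return None
    return min(reachable, key=lambda pid: costs[pid])


if __name__ == "__main__":
    print(nearest_cluster(COST_MAP, "pid-camera", ["pid-edge-1", "pid-edge-2"]))
```

   A real deployment would obtain the table from an ALTO server rather
   than hard-coding it, and would likely combine several cost metrics.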

   To improve network performance, we will deliver functions over edge
   computing, such that computing functions can be dynamically scheduled
   in a distributed edge computing network.  However, when deploying
   functions to edge servers, multiple resources, including bandwidth,
   computing and link resources, should be allocated to meet the
   requirements in terms of latency and throughput.

3.2.  Benefits of ALTO protocol

   Application-Layer Traffic Optimization (ALTO) [RFC7285] is designed
   to provide network information for distributed applications.  More
   specifically, the ALTO server will offer the necessary network state
   and information to guide the resource scheduling process for
   distributed applications, which cannot retrieve the information by
   themselves.  The ALTO protocol will provide the essential network
   information, including network traffic, cost map, and cost metrics,
   which are all necessary in the resource selection process.  In this
   case, the distributed applications are able to manage the network
   traffic and select a better path with low delay to access the network
   and process the computation tasks.

   Since the edge computing clusters are distributed throughout the
   network, they have different network states, including link delay,
   topology, network traffic, computing capacity/utilization of each
   cluster, etc.  When delivering functions, the scheduling decisions
   SHOULD be adaptive to the network states in order to achieve better
   performance.  Therefore, the ALTO protocol can help manage the
   network information and traffic such that the function can be
   delivered to a proper edge computing cluster.

3.3.  List of resources and services/functions

   Network devices, including routers, servers and clients, are able to
   communicate with each other.
   In a realistic network, on the one
   hand, we have several limited resources, including:

   o  Computing resource: refers to the computing power of CPUs and
      GPUs.  Note that CPUs have different architectures, e.g., ARM and
      x86.  The properties of a CPU include the current load, total
      space and available space, etc.

   o  Link/Path: refers to physical and logical channels between
      network devices.  A link/path has the properties of bandwidth,
      communication latency, etc.

   o  Storage: refers to space to store data.  The properties of
      storage include the amount of space available to save the data.

   o  Radio resource: refers to radio information in wireless
      communication systems, e.g., cellular networks and wireless local
      area networks.  Note that in 5G networks, radio resources can be
      reserved by slicing technology.

   On the other hand, with the development of network technology, we
   have several network services and functions providing efficient
   computation services for network users, including:

   o  Software as a service: provides software services as a platform.
      SaaS vendors deploy software services on their servers, allowing
      users to purchase and use the software services.

   o  AI as a service: provides artificial intelligence services as a
      platform.  Vendors provide different artificial intelligence-based
      services for different tasks, for example, object detection and
      big data analysis.

   o  Encoding/Decoding as a service: provides encoding and decoding
      services for high-definition and VR/AR videos.

   o  Function as a service: provides function services as a platform.
      Functions can be pieces of code or encapsulated Docker images.
      The vendors will expose the function APIs, such that users can
      access the FaaS services easily.  FaaS technology allows network
      resources to be dynamically allocated to computing clusters.
      Users can apply for function-based computation services
      (including object detection, big data analysis, etc.), and avoid
      the complicated environment configuration and resource management
      process.

   o  Content: provides storage services as a platform.  Users can
      store their data in the content service, which allows users to
      save their limited local storage and retrieve the data from
      different terminals.

4.  Scenario of delivering function

   Consider a scenario in the Internet of Things (IoT), where
   surveillance cameras that use object detection computing services
   are connected via the Internet.  When a camera submits a task, the
   object detection function will be delivered to an edge server that
   handles the task and then returns the results to the camera.  The
   system will request and retrieve the network information, including
   link delay and other cost metrics, from ALTO servers and clients
   using the ALTO protocol.  According to the information provided by
   ALTO, the function and task will be delivered to the most appropriate
   edge server, i.e., the one that offers the best performance for the
   cameras.  The infrastructure is shown in Figure 1.

+---------------+                  +-------------------+
|               |                  |                   |
|               |                  |                   |
|  ALTO Server  |<---------------->|    ALTO Client    |
|               |                  |                   |
|               |                  |                   |
+---------------+                  +------^-----+------+
                                          |     |
                                          |     |
                                          |     |
                                       +--+-----v--+
                                       |  Cluster  |
                               +-------+  Client   +------+
                               |       +-----------+      |
                               |                          |
                               |                          |
                               |                          |
                        +------v-------+          +-------v------+
                        |Edge Computing|          |Edge Computing|
                        |              |  ......  |              |
                        |  Cluster 1   |          |  Cluster N   |
                        +--------------+          +--------------+

   Figure 1.  Scenario of delivering function over edge network in IoT

5.  Delivering functions over edge computing with ALTO protocol

   Since many edge clusters and servers are distributed in the
   network, the system MUST handle the huge number of edge devices and
   their corresponding network traffic.  A cluster client is employed to
   manage the connectivity and traffic information of the distributed
   edge clusters.  The ALTO client will communicate with the cluster
   client and provide the necessary network information.  ALTO is used
   to optimize the network traffic and guide the function delivery
   process in edge computing.  It will provide the overall network state
   for the distributed edge clusters and decide the appropriate edge
   cluster on which to deploy the functions.

   More specifically, the ALTO server will collect and compute the
   network cost metrics, including the link delay, availability, network
   traffic, bandwidth, etc.  The information will then be sent to the
   ALTO client.  The ALTO client will select the appropriate edge
   clusters on which to deploy the target function.  Finally, the system
   will connect to the target servers and deploy the function, so that
   users can submit their computation tasks to the selected edge
   clusters.

+---------------+                  +-------------------+
|               |   (1) Network    |                   |
|               |   Information    |                   |
|  ALTO Server  |<---------------->|    ALTO Client    |
|               |                  |                   |
|               |                  |                   |
+---------------+                  +------^-----+------+
                                          |     |
                         (2)Get clusters  |     | (3)Select Cluster List
                                          |     |
                                       +--+-----v--+
                                       |  Cluster  |
                               +-------+  Client   +------+
                               |       +-----------+      |
                               |                          |
                               | (4) Connect to Cluster   |
                               |  and deliver function    |
                        +------v-------+          +-------v------+
                        |Edge Computing|          |Edge Computing|
                        |              |  ......  |              |
                        |  Cluster 1   |          |  Cluster N   |
                        +--------------+          +--------------+

   Figure 2.  Delivering process in edge computing platform with ALTO

   Figure 2 illustrates the infrastructure and function delivery
   process of the edge computing platform.

   1.  The ALTO client requests information, such as the network map
       and cost map of the distributed edge clusters, from the ALTO
       server by using the ALTO protocol.

   2.  The Cluster Client requests an edge cluster list of the
       network.

   3.  The ALTO Client returns the edge cluster list and the
       corresponding resource information about the clusters, computed
       by ALTO servers according to the network state.

   4.  The Cluster Client connects and delivers the function to the
       corresponding edge computing cluster according to the
       information, and the cluster will process the task and return
       the computation results to users.

   Note that the data transfer process uses the ALTO protocol
   described in [RFC7285] to guarantee the efficiency and security of
   the delivery process.  In this case, the edge computing clusters
   are allowed to retrieve the network information, so that the function
   can be delivered to the proper ones to achieve better performance
   in terms of latency, throughput, etc.

6.  Implementation and Deployment

6.1.  Implementation

   We are inspired by the concept of serverless computing, a new
   computing paradigm that provides function-based computing services
   and uses containerization technology to run functions.  The
   container, including the running code, libraries, and data
   dependencies, will be deployed and orchestrated to target edge
   servers and clusters by the container orchestrator Kubernetes (K8S).
   The container orchestration scheme will be computed according to the
   network information provided by ALTO.

   We use IBM OpenWhisk as the FaaS platform in edge clusters, where the
   resources are managed by K8S.  Using containerization technology,
   functions can be flexibly delivered to the target edge server.
   When a user requests function-based edge computing services, the
   request will be redirected to the edge server for better performance.

6.2.  Deployment

   We have implemented a prototype and are deploying it in real
   networks in Zhejiang Province, China.  The initial results show that
   1) the performance of edge computing will be greatly improved with
   the provided underlying network information; 2) the information
   collection and scheduling policies need to be standardized to achieve
   coordination among different domains.

6.3.  ALTO Integration

   T.B.D.

7.  Management of Functions

   To manage the functions more efficiently, we introduce function
   standardization in our system.  More specifically, functions in our
   system can be standardized and expose standard APIs, such that users
   can access and apply for function-based computation services very
   easily.  On top of this, the specific function code and Docker images
   can be updated and replaced according to standards and user demands,
   which benefits function management on the platform.

   Function standardization consists of:

   o  Function repository: The repository stores all the functions that
      users can apply for.

   o  Function registry/discovery: A service MUST be registered at the
      beginning.  After registration, the service information will be
      broadcast to a registry server.  When delivering functions, the
      system accesses the registry server to learn which node is
      registered with the function information, so that it can
      determine the appropriate node to which to deliver the functions.

   o  Function status update: When there are updates, functions in all
      the network nodes MUST be updated accordingly.

   Note that function standardization is beneficial to function
   delivery.
   Because the platform exposes standard APIs, users can easily
   accomplish their tasks by sending requests to the interfaces of the
   system, bypassing the complicated resource deployment and
   configuration process.  Meanwhile, function standardization is good
   for system management.  Each function in the platform is saved and
   registered in specific edge servers, such that users can easily
   locate the target edge servers when applying for functions, and
   system operators can update or replace the target functions easily.

8.  Multi-domain System

   A function delivery platform can be a multi-domain system.  For
   example, there may be multiple service providers offering the
   function-based computation service.  In this case, we should consider
   how to collect and manage the network information from different
   domains, in order to achieve better function delivery performance in
   networks.  Consequently, we SHOULD develop additional designs for our
   platform.

   On the one hand, we introduce a layered design for function
   delivery.  More specifically, we deploy multiple distributed registry
   servers in the lower layer, each of which processes the function
   registry in its domain.  Then we deploy a centralized registry server
   in the upper layer to collect information from and manage the
   distributed registry servers in the lower layer.  A server in the
   lower layer will periodically report the network information of its
   domain to the centralized server in the upper layer.  The centralized
   server will coordinate the domains by sending instructions to the
   distributed servers in the lower layer, which will make adjustments
   according to these instructions.  In this case, the centralized
   registry server is able to manage the distributed function and
   network information easily and efficiently, which is beneficial to
   multi-domain system management.
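
   The two-layer registry described above can be sketched in Python as
   follows.  This is a minimal illustration under stated assumptions,
   not the prototype's implementation: the class names DomainRegistry
   and CentralRegistry, their methods, and the domain, node, and
   function names are all hypothetical, and the instruction channel
   from the upper layer back to the lower layer is omitted.

```python
class DomainRegistry:
    """Lower-layer registry: tracks function registrations in one domain.

    Hypothetical class for illustration only.
    """

    def __init__(self, domain):
        self.domain = domain
        self.functions = {}  # function name -> set of node identifiers

    def register(self, function, node):
        """Record that `node` in this domain offers `function`."""
        self.functions.setdefault(function, set()).add(node)

    def report(self):
        """Build the snapshot that is periodically sent to the upper layer."""
        return {fn: sorted(nodes) for fn, nodes in self.functions.items()}


class CentralRegistry:
    """Upper-layer registry: aggregates the reports of all domains.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self.view = {}  # domain name -> last report received

    def collect(self, registry):
        """Store the latest report of one lower-layer registry."""
        self.view[registry.domain] = registry.report()

    def locate(self, function):
        """Return sorted (domain, node) pairs where `function` is registered."""
        return sorted(
            (domain, node)
            for domain, report in self.view.items()
            for node in report.get(function, [])
        )
```

   In this sketch the upper layer only holds the most recent report per
   domain, so a lookup never has to contact the lower-layer servers.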

   On the other hand, we introduce policy management for multiple
   domains.  Note that different domains MAY have different delivery
   policies; thus we need to provide a policy management tool for
   multiple domains.  When delivering functions in a multi-domain
   system, the tool will provide the overall management policy to
   synchronize and coordinate the distributed local policies in each
   individual domain.  In this case, domains with different policies
   are able to communicate and coordinate with each other with the help
   of the policy management tool.  Therefore, by utilizing the policy
   management tool, we can manage the multiple domains for efficient
   function delivery.

9.  Scheduling Framework

   Recently, with the development of high-capacity computing devices,
   the computing power of networks has improved considerably.  However,
   due to the lack of efficient scheduling strategies, current computing
   platforms cannot achieve high computing throughput, i.e., the
   ability to schedule the distributed computing power over a long
   period.  To improve the scheduling efficiency of the computing power,
   researchers have proposed high-throughput computing scheduling
   frameworks, for example, HTCondor, PBS, CPUsage, etc., which are able
   to schedule the limited distributed computing power to achieve better
   network throughput over a long period.  Inspired by these
   frameworks, we develop a scheduling framework for function delivery
   in order to achieve better network performance.

   The objective of our scheduling framework for function delivery is to
   minimize the computational latency.  The basic idea is that our
   platform will compute the function scheduling schemes according to
   the information collected by the ALTO server, including the network
   congestion, resource utilization, etc.
   The users will access the
   most appropriate edge server, which will provide the function-based
   computation service and return the results to the users.

   More specifically, when a user applies for the function delivery
   service, it will send a request to the interface provided by the ALTO
   server, along with its location and task information.  The ALTO
   server will also collect the resource utilization and network
   information of the decentralized edge servers.  Then, according to
   the collected information, the ALTO server will compute the function
   scheduling scheme to determine the specific edge server to which the
   function is delivered.  The platform will select the edge server
   with the lowest computation latency for the user.  However, if the
   selected edge server is overloaded, the platform will search for
   another edge server that satisfies the load balance demand while
   still achieving acceptable latency.  Finally, the user will
   establish a communication channel with the target edge server,
   which will provide the function-based service and return the results
   to the user.

   By developing the scheduling framework and strategy for function
   delivery, our platform can maintain a stable network condition and
   guarantee load balance over a long period, which benefits the
   reliability of the system.  Users can also enjoy a low-latency and
   high-throughput function delivery service at the same time.

10.  Security Considerations

   T.B.D.

11.  IANA Considerations

   This document includes no requests to IANA.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", March 1997.

   [RFC5693]  Seedorf, J. and E. Burger, "Application-Layer Traffic
              Optimization (ALTO) Problem Statement", RFC 5693,
              DOI 10.17487/RFC5693, October 2009.

   [RFC7285]  Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
              Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
              "Application-Layer Traffic Optimization (ALTO) Protocol",
              RFC 7285, DOI 10.17487/RFC7285, September 2014.

12.2.  Informative References

   [I-D.ietf-alto-unified-props-new]
              Roome, W., Randriamasy, S., Yang, Y., Zhang, J., and K.
              Gao, "Unified Properties for the ALTO Protocol", draft-
              ietf-alto-unified-props-new-09 (work in progress),
              September 2019.

Authors' Addresses

   Shu Yang
   Shenzhen University
   South Campus, Shenzhen University
   Shenzhen  518060
   P.R. China

   Phone: +86-755-2653-4078
   Email: yang.shu@szu.edu.cn

   Laizhong Cui
   Shenzhen University
   South Campus, Shenzhen University
   Shenzhen  518060
   P.R. China

   Phone: +86-755-8695-6280
   Email: cuilz@szu.edu.cn

   Mingwei Xu
   Tsinghua University
   Department of Computer Science, Tsinghua University
   Beijing  100084
   P.R. China

   Phone: +86-10-6278-5822
   Email: xumw@tsinghua.edu.cn

   Y.R. Yang
   Yale University/PCL
   51 Prospect Street
   New Haven, CT  06511
   United States of America

   Email: yry@cs.yale.edu
   URI:   http://www.cs.yale.edu/~yry/

   Rui Huang
   Research Institute of Tsinghua University in Shenzhen
   Nanshan Hi-new Technology and Industry Park
   Shenzhen  518060
   P.R. China

   Email: xw09@tsinghua.org.cn