idnits 2.17.1

draft-yang-alto-deliver-functions-over-networks-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (21 March 2022) is 767 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 5693

  == Outdated reference: A later version (-24) exists of
     draft-ietf-alto-unified-props-new-09

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

ALTO WG                                                          S. Yang
Internet-Draft                                                    L. Cui
Intended status: Standards Track                     Shenzhen University
Expires: 22 September 2022                                         M. Xu
                                                     Tsinghua University
                                                                 Y. Yang
                                                         Yale University
                                                                 W. Xiao
                   Research Institute of Tsinghua University in Shenzhen
                                                           21 March 2022


Delivering Functions over Networks: Traffic and Performance Optimization
                     for Edge Computing using ALTO
           draft-yang-alto-deliver-functions-over-networks-03

Abstract

   With the rapid development of the Internet, massive amounts of data
   are being produced.  Service providers typically need to deploy
   services near the network edge to better satisfy users' demands.  To
   obtain better network quality, computing functions and user traffic
   need to be scheduled properly.  However, it is challenging to
   schedule resources efficiently among distributed edge servers because
   of the lack of network information, such as network topology, traffic
   distribution, link delay/bandwidth, and the utilization/capability of
   computing servers.  In this document, we employ the ALTO protocol to
   help deliver functions and schedule traffic on the edge computing
   platform.  ALTO supplies information about multiple resources to the
   distributed edge computing platform, thereby enhancing the efficiency
   of function delivery.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 22 September 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Background
     3.1.  Edge computing
     3.2.  Features of ALTO protocol
     3.3.  Resources and services/functions
   4.  Scenario of delivering function
   5.  Delivering functions by ALTO over edge computing
   6.  Implementation and Deployment
   7.  Management of Functions
   8.  Multi-domain System
   9.  Scheduling Framework
   10. Security Considerations
   11. IANA Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   In recent years, the Internet has developed rapidly, in part because
   of its promise for industrial upgrading.
   In many scenarios of the industrial Internet, massive amounts of data
   are produced and must be processed efficiently in real time.
   Typically, various functions or services are delivered in these
   scenarios according to users' demands.  These functions or services
   could be (1) AI analysis of surveillance video, (2) encoding/decoding
   of high-definition video, and (3) content stored in the edge network.
   Functions and services are widely deployed in the industrial
   Internet.  For instance, Function as a Service (FaaS) is increasingly
   used in cloud computing services for the industrial Internet.

   Many functions and services demand high quality from the supporting
   networks.  For instance, delay and jitter are expected to be as small
   as possible to provide a good user experience.  Unlike Kubernetes and
   Mesos, which can efficiently schedule computing resources within a
   single computing cluster, deploying functions across wide area
   networks is usually much more complex.

   Many resources, including network traffic, bandwidth, topology, link
   delay, and the computing capacity/utilization of each computing
   cluster, should be taken into consideration when functions are
   deployed over distributed networks.  This network status information
   needs to be collected through unified interfaces and protocols,
   because resources typically need to be scheduled across different
   domains to satisfy users' demands.  In addition, resource scheduling
   algorithms and network performance, such as load balancing, SHOULD
   also be optimized to improve the user experience.  In this document,
   we propose an efficient method to utilize computing and network
   resources by delivering functions over edge computing networks.

   We use ALTO (Application-Layer Traffic Optimization) [RFC7285] to
   optimize network traffic and performance by delivering functions over
   the edge computing network.  ALTO can provide distributed
   applications with global network information that the applications
   cannot retrieve or compute themselves [RFC5693].  Specifically, ALTO
   collects and computes network information for the distributed edge
   clusters, such as network traffic, link delay, and other cost
   metrics.  Subsequently, using pre-defined scheduling algorithms, the
   system delivers functions to the most suitable edge clusters based on
   the information offered by ALTO.

   For brevity, this document uses the terminology introduced in
   [RFC7285] and [I-D.ietf-alto-unified-props-new].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Background

3.1.  Edge computing

   Many applications and network scenarios are very sensitive to network
   quality.  For instance, remote control of machine tools typically
   needs enhanced broadband and low-latency communication in real time.
   However, the volume of data generated by equipment or sensors is
   usually very large (up to hundreds of GB per day).  In addition, the
   data produced by various equipment requires the support of
   heterogeneous computing.  In this case, uploading the data to a
   centralized cloud computing platform is difficult and expensive, and
   brings limited benefit.  Similarly, delivering functions or services
   from the cloud computing platform to local equipment is also hard,
   and real-time performance cannot be guaranteed.
   Therefore, a new computing and networking framework SHOULD be adopted
   to address these problems, which hinder the broader adoption of edge
   computing.

   Edge computing is designed to improve network quality, including
   packet loss, security, bandwidth, and latency.  To decrease the
   distance between servers and users, servers are typically deployed at
   the network edge, close to the users.  In this framework, users'
   tasks can be submitted to edge servers, which handle the tasks close
   to the user and return the computation output promptly.  As a
   consequence, the latency, bandwidth, and network traffic performance
   of edge computing are generally better than those of centralized
   cloud computing.  With these advantages, edge computing has been
   applied in various network applications, such as AI-assisted HD video
   and VR/AR.

   In this document, functions and services are delivered over edge
   computing so that computing functions can be scheduled dynamically in
   a distributed edge computing network to enhance network performance.
   Nevertheless, it should be noted that when functions are deployed to
   edge servers, multiple resources, such as bandwidth, computing, and
   link resources, SHOULD be allocated to satisfy the latency and
   throughput requirements.

3.2.  Features of ALTO protocol

   Application-Layer Traffic Optimization (ALTO) [RFC7285] is designed
   to provide network information for distributed applications.
   Specifically, the resource scheduling process of distributed
   applications is guided by the network state and information provided
   by the ALTO server.  Without ALTO, the applications would be unable
   to retrieve this information.
   With ALTO, much of the network information essential to resource
   selection, such as network traffic, cost maps, and cost metrics, is
   offered through the ALTO protocol.  As a consequence, distributed
   applications can manage their network traffic.  In addition, they can
   make better choices, selecting lower-delay paths to access the
   network and handle computation tasks.

   Because edge computing clusters are distributed over different areas
   of the network, they experience different network states, such as
   topology, network traffic, link delay, and computing capacity/
   utilization.  Generally, to obtain better performance, scheduling
   decisions need to adapt to the network state while functions are
   being delivered.  ALTO, with the advantages described above, can be
   used to manage this network information and traffic.  As a
   consequence, functions and services can be delivered to a suitable
   edge computing cluster.

3.3.  Resources and services/functions

   Generally, the network offers limited resources but a large number of
   network services/functions.  Some common limited resources are listed
   below.

   *  Computing resource: This usually represents the computing power of
      a CPU or GPU.  CPUs have different architectures, including ARM
      and x86.  The available computing power is affected by the working
      status, such as current load, total space, and available space.

   *  Link/Path: A physical or logical communication channel between
      network devices, such as routers, servers, and clients.
      Properties of a link or path include hop count, bandwidth, and
      communication latency.

   *  Storage: The space available to store data.  The properties of
      storage include the amount of space available for saving data.

   *  Radio resource: The radio information in wireless communication
      systems, such as cellular networks and wireless local area
      networks.  In 5G networks, radio resources can be combined with
      slicing technology to provide customized network properties for
      users' differentiated network demands.

   Besides the limited resources above, with the rapid development of
   network technology, there are many network services and functions
   that can supply abundant computation services to users.

   *  Software as a service (SaaS): This supplies software services as a
      platform.  Software services, such as WeChat and Alipay, are
      deployed on the SaaS vendors' servers for users to access and use.

   *  AI as a service: This supplies artificial intelligence services as
      a platform.  Vendors provide various AI-based services, such as
      face recognition, speech recognition, and big data analysis.

   *  Encoding/Decoding as a service: This supplies encoding and
      decoding services for high-definition videos and VR/AR videos.  It
      has been widely applied in smart city and online entertainment
      scenarios.

   *  Function as a service (FaaS): This supplies function services as a
      platform.  Functions can be encapsulated as Docker images or
      pieces of code.  Usually, to let users access FaaS easily and
      conveniently, the vendor exposes the function APIs.  One of the
      main features of FaaS is that it allows network resources to be
      dynamically allocated to computing clusters.  Therefore, users do
      not need to handle complicated environment configuration and
      resource management, and can simply use function-based computation
      services, such as face recognition, speech recognition, and big
      data analysis.

   *  Content: This supplies storage services as a platform.  Using the
      content service, users can store their data remotely and thus save
      their limited local storage.
      Meanwhile, users can retrieve the data from different terminals.

4.  Scenario of delivering function

   Consider an IoT scenario in which unmanned aerial vehicles (UAVs) are
   connected via the network to request face recognition computing
   services.  When a UAV submits a task, the face recognition function
   is delivered to an edge server to process the task.  The recognition
   results are then returned to the UAV.  During this process, network
   information, such as link delay and other cost metrics, is requested
   and retrieved from the ALTO servers by the ALTO clients in the
   network system, using the ALTO protocol.  According to the
   information supplied by ALTO, the function and task are delivered to
   the most suitable edge server, that is, the one that would provide
   the best performance to the UAV.  The schematic diagram of this
   framework is shown in Figure 1 below.

   +---------------+                  +-------------------+
   |               |                  |                   |
   |               |                  |                   |
   |  ALTO Server  |<---------------->|    ALTO Client    |
   |               |                  |                   |
   |               |                  |                   |
   +---------------+                  +------^-----+------+
                                             |     |
                                             |     |
                                             |     |
                                          +--+-----v--+
                                          |  Cluster  |
                                  +-------+  Client   +------+
                                  |       +-----------+      |
                                  |                          |
                                  |                          |
                                  |                          |
                           +------v-------+          +-------v------+
                           |Edge Computing|          |Edge Computing|
                           |              |  ......  |              |
                           |  Cluster 1   |          |  Cluster N   |
                           +--------------+          +--------------+

    Figure 1.  Scenario of delivering function over edge network in IoT

5.  Delivering functions by ALTO over edge computing

   Since many edge clusters and servers are distributed across the
   network, the system MUST handle a huge number of edge devices and
   their corresponding network traffic.  A cluster client is employed to
   manage the connectivity and traffic information of the distributed
   edge clusters.  The ALTO client communicates with the cluster client
   and provides the necessary network information.
   ALTO is used to optimize network traffic and to guide the function
   delivery process in edge computing.  It provides the overall network
   state for the distributed edge clusters and decides the appropriate
   edge cluster on which to deploy the functions.

   More specifically, the ALTO server collects and computes the network
   cost metrics, including link delay, availability, network traffic,
   bandwidth, etc.  The information is then sent to the ALTO client.
   The ALTO client selects the appropriate edge clusters on which to
   deploy the target function.  Finally, the system connects to the
   target servers and deploys the function there, so that users can
   submit their computation tasks to the selected edge clusters.

   +---------------+                  +-------------------+
   |               |   (1) Network    |                   |
   |               |   Information    |                   |
   |  ALTO Server  |<---------------->|    ALTO Client    |
   |               |                  |                   |
   |               |                  |                   |
   +---------------+                  +------^-----+------+
                                             |     |
                             (2)Get clusters |     | (3)Select Cluster List
                                             |     |
                                          +--+-----v--+
                                          |  Cluster  |
                                  +-------+  Client   +------+
                                  |       +-----------+      |
                                  |                          |
                                  |  (4) Connect to Cluster  |
                                  |   and deliver function   |
                           +------v-------+          +-------v------+
                           |Edge Computing|          |Edge Computing|
                           |              |  ......  |              |
                           |  Cluster 1   |          |  Cluster N   |
                           +--------------+          +--------------+

     Figure 2.  Delivering process in edge computing platform with ALTO

   Figure 2 illustrates the infrastructure and function delivery process
   of the edge computing platform.

   1.  The ALTO Client requests information, such as the network map and
       cost map of the distributed edge clusters, from the ALTO server
       using the ALTO protocol.

   2.  The Cluster Client requests an edge cluster list for the network.

   3.  The ALTO Client returns the edge cluster list and the
       corresponding resource information about the clusters, computed
       by the ALTO servers according to the network state.

   4.  The Cluster Client connects to the corresponding edge computing
       cluster and delivers the function to it according to this
       information; the cluster then processes the task and returns the
       computation results to users.

   Note that the data transfer uses the ALTO protocol described in
   [RFC7285] to guarantee the efficiency and security of the delivery
   process.  In this case, the edge computing clusters are allowed to
   retrieve the network information, so that the function can be
   delivered to the proper clusters to achieve better performance in
   terms of latency, throughput, etc.

6.  Implementation and Deployment

   We have implemented a prototype in which the edge cluster resources
   are managed by Kubernetes (K8S) and Docker.  When users request edge
   computing services, the appropriate edge cluster is selected by ALTO
   according to the network map and cost information.

   We have deployed the prototype in a real network, the China Mobile
   network.  The preliminary results of our deployment show that (1) the
   performance of edge computing can be greatly improved by using the
   supplied network information and (2) the collection and scheduling
   policies for this information need to be standardized to achieve
   coordination among different domains.

7.  Management of Functions

   Because function standardization helps manage functions efficiently,
   we introduce this technique below.  Our system can standardize
   functions and expose standard APIs so that users can easily access
   and request function-based computation services.  On top of these
   services, specific function code and Docker images can be updated and
   replaced based on users' demands or standard upgrades, which helps
   the function management of the platform.

   Specifically, function standardization consists of:

   *  Function repository: Create the repository that stores all the
      functions that users can request.

   *  Function registry/discovery: A service MUST be registered before
      it can be requested.  After registration, the service information
      is reported to a registry server.  When delivering functions, the
      system determines which nodes are registered with the function by
      querying the registry server.  As a consequence, the system can
      determine the appropriate node and deliver functions efficiently.

   *  Function status update: When a function is updated, its copies on
      all network nodes MUST be updated accordingly.

   Because function standardization exposes standard APIs, users can
   conveniently process their tasks by sending requests to the system's
   interfaces.  As described above, this mechanism helps users bypass
   the complexity of resource deployment and configuration.  In
   addition, function standardization is useful in managing the system,
   because system operators can update or replace the target functions
   more efficiently.  When users request functions, they can easily
   locate the target edge servers, since each function on the platform
   has already been saved and registered on certain edge servers.

8.  Multi-domain System

   A function delivery platform can be a multi-domain system.  For
   example, there may be multiple service providers offering the
   function-based computation service.  In this case, we should consider
   how to collect and manage the network information from different
   domains in order to achieve better function delivery performance.
   Consequently, we SHOULD develop additional designs for our platform.
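
   As a reference point for the designs that follow, the function
   registry/discovery and status update steps of Section 7 might be
   pictured with the Python sketch below.  It is illustrative only: the
   FunctionRegistry class, its method names, and the node and function
   identifiers are hypothetical and do not come from the prototype, and
   a real registry server would be a networked service rather than an
   in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FunctionRegistry:
    """Hypothetical in-memory stand-in for one domain's registry server."""

    # function name -> {node id -> deployed version} (assumed layout)
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, function: str, node: str, version: str) -> None:
        """Record that `node` hosts `function` at `version`."""
        self.entries.setdefault(function, {})[node] = version

    def discover(self, function: str) -> List[str]:
        """Return the ids of nodes registered with `function`, sorted."""
        return sorted(self.entries.get(function, {}))

    def update(self, function: str, version: str) -> List[str]:
        """Mark every node hosting `function` as running `version`."""
        nodes = self.entries.get(function, {})
        for node in nodes:
            nodes[node] = version
        return sorted(nodes)


# Usage: register a function on two edge nodes, look it up, then update it.
registry = FunctionRegistry()
registry.register("face-recognition", "edge-1", "v1")
registry.register("face-recognition", "edge-2", "v1")
print(registry.discover("face-recognition"))
registry.update("face-recognition", "v2")
print(registry.entries["face-recognition"])
```

   Under these assumptions, one such registry per domain would play the
   role of a lower-layer registry server in the layered design described
   next.
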

   On the one hand, we introduce a layered design for function delivery.
   More specifically, we deploy multiple distributed registry servers in
   the lower layer, each of which handles function registration in its
   own domain.  We then deploy a centralized registry server in the
   upper layer to collect information from, and manage, the distributed
   registry servers in the lower layer.  Each lower-layer server
   periodically reports the network information of its domain to the
   centralized server in the upper layer.  The centralized server
   coordinates the domains by sending instructions to the distributed
   servers in the lower layer, which adjust according to these
   instructions.  In this way, the centralized registry server can
   manage the distributed function and network information easily and
   efficiently, which benefits multi-domain system management.

   On the other hand, we introduce policy management for multiple
   domains.  Because different domains MAY have different delivery
   policies, we need to provide a policy management tool for multiple
   domains.  When delivering functions in a multi-domain system, the
   tool provides an overall management policy to synchronize and
   coordinate the local policies of each individual domain.  In this
   way, domains with different policies are able to communicate and
   coordinate with each other with the help of the policy management
   tool, and we can manage the multiple domains for efficient function
   delivery.

9.  Scheduling Framework

   Recently, with the development of high-capacity computing devices,
   the computing power of networks has improved considerably.
   However, due to the lack of efficient scheduling strategies, current
   computing platforms cannot achieve high computing throughput, i.e.,
   the ability to schedule distributed computing power effectively over
   a long period.  To improve the scheduling efficiency of computing
   power, researchers have proposed high-throughput computing scheduling
   frameworks, for example, HTCondor, PBS, CPUsage, etc., which are able
   to schedule the limited distributed computing power to achieve better
   network throughput over a long period.  Inspired by these frameworks,
   we develop a scheduling framework for function delivery in order to
   achieve better network performance.

   The objective of our scheduling framework for function delivery is to
   minimize computational latency.  The basic idea is that our platform
   computes the function scheduling schemes according to the information
   collected by the ALTO server, including network congestion, resource
   utilization, etc.  Users then access the most appropriate edge
   server, which provides the function-based computation service and
   returns the results to the users.

   More specifically, when a user requests the function delivery
   service, the user sends a request to the interface provided by the
   ALTO server, along with location and task information.  The ALTO
   server also collects the resource utilization and network information
   of the decentralized edge servers.  Then, according to the collected
   information, the ALTO server computes the function scheduling scheme
   to determine the specific edge server that is the function delivery
   destination.  The platform selects the edge server with the lowest
   computation latency for the user.
   However, if the selected edge server is overloaded, the platform
   proceeds to search for another edge server that satisfies the load-
   balancing requirement while still achieving acceptable latency.
   Finally, the user establishes a communication channel with the target
   edge server, which provides the function-based service and returns
   the results to the user.

   By developing the scheduling framework and strategy for function
   delivery, our platform can maintain stable network conditions and
   guarantee load balance over a long period, which benefits system
   reliability.  At the same time, users can enjoy a low-latency, high-
   throughput function delivery service.

10.  Security Considerations

   T.B.D.

11.  IANA Considerations

   This document includes no requests to IANA.

12.  References

12.1.  Normative References

   [RFC5693]  Seedorf, J. and E. Burger, "Application-Layer Traffic
              Optimization (ALTO) Problem Statement", RFC 5693,
              DOI 10.17487/RFC5693, October 2009.

   [RFC7285]  Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
              Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
              "Application-Layer Traffic Optimization (ALTO) Protocol",
              RFC 7285, DOI 10.17487/RFC7285, September 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

12.2.  Informative References

   [I-D.ietf-alto-unified-props-new]
              Roome, W., Randriamasy, S., Yang, Y., Zhang, J., and K.
              Gao, "Unified Properties for the ALTO Protocol", Work in
              Progress, Internet-Draft, draft-ietf-alto-unified-props-
              new-09, 4 September 2019.

Authors' Addresses

   Shu Yang
   Shenzhen University
   South Campus, Shenzhen University
   Shenzhen
   518060
   P.R. China
   Phone: +86-755-2653-4078
   Email: yang.shu@szu.edu.cn

   Laizhong Cui
   Shenzhen University
   South Campus, Shenzhen University
   Shenzhen
   518060
   P.R. China
   Phone: +86-755-8695-6280
   Email: cuilz@szu.edu.cn

   Mingwei Xu
   Tsinghua University
   Department of Computer Science, Tsinghua University
   Beijing
   100084
   P.R. China
   Phone: +86-10-6278-5822
   Email: xumw@tsinghua.edu.cn

   Richard Yang
   Yale University
   51 Prospect St
   New Haven, CT, 06511
   United States of America
   Email: yry@cs.yale.edu

   Wei Xiao
   Research Institute of Tsinghua University in Shenzhen
   Nanshan Hi-new Technology and Industry Park
   Shenzhen
   518060
   P.R. China
   Email: xiaow@tsinghua-sz.org