Network Working Group                                        Y-G. Hong
Internet-Draft                                      Daejeon University
Intended status: Informational                                S-B. Oh
Expires: 7 September 2022                                          KSA
                                                              S-J. Lee
                                                  Korea University/KT
                                                            H-K. Kahng
                                                       Korea University
                                                             March 2022


    Considerations of deploying AI services in a distributed approach
                      draft-hong-nmrg-ai-deploy-00

Abstract

   As the development of AI technology has matured and AI technology
   has begun to be applied in various fields, AI technology is changing
   from running only on very high-performance servers to running on
   small hardware, including microcontrollers, low-performance CPUs,
   and AI chipsets.  In this document, we consider how to configure a
   system, from the viewpoint of the AI inference service, to provide
   AI services in a distributed approach.  We also describe the points
   to be considered in an environment where a client connects to a
   cloud server and an edge device and requests an AI service.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 September 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Procedure to provide AI services
   3.  Network configuration structure to provide AI services
     3.1.  AI inference service on Local machine
     3.2.  AI inference service on Cloud server
     3.3.  AI inference service on Edge device
     3.4.  AI inference service on Cloud server and Edge device
   4.  Considerations when configuring a system to provide AI services
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses

1.  Introduction

   In the Internet of Things (IoT), the amount of data generated by IoT
   devices has exploded along with the number of IoT devices, driven by
   industrial digitization and the development and dissemination of new
   devices.  Various methods are being tried to handle the explosively
   increasing number of IoT devices and the data they generate
   effectively.  One of them is to provide IoT services at a location
   close to the IoT devices and users, moving away from cloud
   computing, in which all data generated by IoT devices is transmitted
   to a cloud server [I-D.irtf-t2trg-iot-edge].

   IoT services have also started to move beyond the traditional
   approach of analyzing the collected IoT data in the cloud and
   delivering the analyzed results back to IoT objects or devices.  In
   other words, AIoT (Artificial Intelligence of Things) technology, a
   combination of IoT technology and artificial intelligence (AI)
   technology, has started to be discussed in international
   standardization organizations such as ITU-T.  AIoT technology, as
   discussed by the ITU-T CG-AIoT group, is defined as a technology
   that combines AI technology and IoT infrastructure to achieve more
   efficient IoT operations, improve human-machine interaction, and
   improve data management and analysis [CG-AIoT].

   The first work undertaken by the IETF to apply IoT technology to the
   Internet was research on lightweight protocol stacks, replacing the
   existing TCP/IP protocol stack, so that various types of IoT
   devices, and not only traditional Internet terminals, could connect
   to the Internet [RFC6574][RFC7452].  These technologies have been
   developed by the 6LoWPAN, 6lo, 6TiSCH, and CoRE working groups, the
   T2TRG research group, and others.  Just as IoT technology was
   mounted on resource-constrained devices and connected to the
   Internet, as the development of AI technology has matured and AI
   technology has begun to be applied in various fields, AI technology
   is also moving away from running only on very high-performance
   servers with GPUs installed.  The technology is being developed to
   run on small hardware, including microcontrollers, low-performance
   CPUs, and AI chipsets.
   This technology development direction is called on-device AI or
   TinyML [tinyML].

   In this document, we consider how to configure a system, from the
   viewpoint of the AI inference service, to provide AI services in the
   IoT environment.  In the IoT environment, the technology of
   collecting sensing data from various sensors and delivering it to
   the cloud has already been studied by many standardization
   organizations, including the IETF, and many standards have been
   developed.  Now, after an AI model has been created to provide AI
   services based on the collected data, how to configure a system
   around this AI model has become the main research goal.  Until now,
   it has been common to develop AI services that collect data and
   perform inference on the servers where training was done, but in
   terms of the spread and popularization of AI services, it is not
   appropriate to use expensive servers to provide AI services.  In
   addition, since the server that collects data and performs training
   mainly exists in the form of a cloud server, there are also many
   problems with having a large number of terminals connect to these
   cloud servers to request AI services.  Therefore, requesting an AI
   service from an edge device located at a close distance, rather than
   from an AI server located in a distant cloud, may have benefits such
   as real-time service support, network traffic reduction, and
   protection of important data [I-D.irtf-t2trg-iot-edge].

   Even if an edge device is used to serve AI services, it is still
   important to connect to an AI server in the cloud for tasks that
   take a lot of time or require a lot of data.  Therefore, offloading
   techniques for properly distributing the workload between the cloud
   server and the edge device are also a field that is being actively
   studied.  In this document, for the network structures proposed
   below, the points to be considered in an environment where a client
   connects to a server and an edge device and requests an AI service
   are derived and described.  That is, the following considerations
   and options could be derived (a simple sketch of how these options
   might be recorded follows the list):

   *  AI inference service execution entity

   *  Hardware specifications of the machine to perform AI inference
      services

   *  Selection of AI models to perform AI inference services

   *  A method of providing AI services from cloud servers or edge
      devices

   *  Communication method to transmit data to request AI inference
      service
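
   As a simple illustration only, the options above could be recorded
   as a small deployment plan structure such as the Python sketch
   below.  The class names, field names, and values are assumptions
   made for this example and are not defined by this document.

      # A minimal, illustrative record of the deployment options listed
      # above.  All names and values here are assumptions, not
      # definitions from this document.
      from dataclasses import dataclass

      @dataclass
      class AIDeploymentPlan:
          execution_entity: str   # "local", "cloud", or "edge"
          hardware: str           # e.g., "GPU server", "low-power CPU"
          model_name: str         # e.g., "ResNet50", "EfficientNetB0"
          serving_method: str     # e.g., "web server", "TF Serving"
          transport: str          # e.g., "REST", "gRPC"

      # Example: a latency-sensitive service served from an edge device.
      plan = AIDeploymentPlan(
          execution_entity="edge",
          hardware="low-power CPU with AI chipset",
          model_name="EfficientNetB0",
          serving_method="TensorFlow Serving",
          transport="gRPC",
      )
      print(plan)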

2.  Procedure to provide AI services

   Since research on AI services has been going on for a long time,
   there may be many ways to provide various types of AI services.
   However, due to the nature of AI technology, a system for providing
   AI services generally consists of the following
   steps [AI_inference_archtecture][Google_cloud_iot].

+-----------+  +-----------+  +-----------+  +-----------+  +-----------+
| Collect & |  | Analysis &|  |   Train   |  | Deploy &  |  | Monitor & |
|   Store   |->| Preprocess|->| AI model  |->| Inference |->| Maintain  |
|   data    |  |   data    |  |           |  | AI model  |  | Accuracy  |
+-----------+  +-----------+  +-----------+  +-----------+  +-----------+
|<--------->|  |<------------------------>|  |<--------->|  |<--------->|
  Sensor, DB            AI Server               Target       AI Server &
                                                machine      Target machine
|<----------->|<-------------------------->|<------------->|<---------->|
   Internet                Local                Internet      Local &
                                                              Internet

                      Figure 1: AI service workflow

   *  Data collection & Store

   *  Data Analysis & Preprocess

   *  AI Model Training

   *  AI Model Deploy & Inference

   *  Monitor & Maintain Accuracy

   In the data collection and storage step, the data required for
   training is prepared by collecting data from sensors and IoT devices
   or by using data stored in a database.  Equipment involved in this
   step includes sensors, IoT devices, the servers that store their
   data, and database servers.  Since the operations performed at this
   step are conducted over the Internet, many of the IoT technologies
   developed by the IETF so far are suitable for this step.

   In the data analysis and preprocessing step, the features of the
   prepared data are analyzed and preprocessing for training is
   performed.  Equipment involved in this step includes a high-
   performance server equipped with a GPU and a database server, and
   this step is mainly performed in the local network.

   In the model training step, a trained model is created by applying
   an algorithm suitable for the characteristics of the data and the
   problem to be solved.  Equipment involved in this step includes a
   high-performance server equipped with a GPU, and this step is mainly
   performed on a local network.

   In the model deployment and inference service provision step, the
   problem to be solved (e.g., a classification or regression problem)
   is solved using AI technology.  Equipment involved in this step may
   include the target machine that provides the AI service, a client, a
   cloud server, and so on, and since various pieces of equipment are
   involved, this step is conducted over the Internet.  This document
   summarizes the factors to be considered at this step.

   In the accuracy monitoring step, if performance deteriorates because
   of new data, a new model is created through re-training, and the AI
   service quality is maintained by using the newly created model.
   This step repeats the model training, model deployment, and
   inference service provision steps described above, because
   re-training and model deployment are performed again.
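
   As an informal illustration of the workflow in Figure 1, the
   following Python sketch walks through the five steps with a small
   Keras model.  It assumes TensorFlow 2.x; the dataset, model
   architecture, file names, and accuracy threshold are arbitrary
   choices made for this example only.

      # Toy walk-through of Figure 1: collect/store, analyze/preprocess,
      # train, deploy/infer, and monitor.  Illustrative only.
      import numpy as np
      import tensorflow as tf

      # 1. Collect & store data (a public dataset stands in for sensor
      #    data collected over the Internet).
      (x_train, y_train), (x_test, y_test) = \
          tf.keras.datasets.mnist.load_data()

      # 2. Analyze & preprocess data (normalize pixel values).
      x_train, x_test = x_train / 255.0, x_test / 255.0

      # 3. Train the AI model.
      model = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(64, activation="relu"),
          tf.keras.layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.fit(x_train, y_train, epochs=1, verbose=0)

      # 4. Deploy the model and run inference (saved in SavedModel
      #    format so that it could later be served remotely).
      model.save("saved_model/demo")
      prediction = model.predict(x_test[:1], verbose=0)
      print("predicted class:", int(np.argmax(prediction)))

      # 5. Monitor & maintain accuracy (re-train when accuracy drops).
      loss, acc = model.evaluate(x_test, y_test, verbose=0)
      if acc < 0.9:
          model.fit(x_train, y_train, epochs=1, verbose=0)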

3.  Network configuration structure to provide AI services

   In general, after the AI model has been trained, the AI model can be
   installed on a local machine so that model deployment and inference
   services provide AI services there.  Alternatively, we can place AI
   models on cloud servers or edge devices and make AI service requests
   remotely.  In addition, for overall service performance, appropriate
   load balancing can direct some AI service requests to the cloud
   server and some to edge devices.

3.1.  AI inference service on Local machine

   The following figure shows a case where a client module requests an
   AI inference service from an AI server module running on the same
   local machine.

   +--------------------------------------------------------------+
   |                                                              |
   |  +----------------+     Request AI       +----------------+  |
   |  | Client module  |  Inference service   | Server module  |  |
   |  | for AI service |--------------------->| for AI service |  |
   |  |                |<---------------------|                |  |
   |  +----------------+     Reply AI         +----------------+  |
   |                      Inference result                        |
   +--------------------------------------------------------------+
                            Local machine

            Figure 2: AI inference service on Local machine

   This method is often used when configuring a system focused on
   training AI models to improve their inference accuracy and
   performance, without particular consideration of AI services or AI
   model deployment and inference.  In this case, since the client
   module that requests the AI inference service and the AI server
   module that directly performs the AI inference service are on the
   same machine, it is not necessary to consider the communication/
   network environment or the service provision method very much.
   Alternatively, this method can be used when we simply want to deploy
   the AI inference service on one machine without changing the AI
   service in the future, such as on an embedded machine or a
   customized machine.

   In this case, the high level of hardware performance needed to train
   the AI model is not required, but hardware performance sufficient to
   run the AI inference service is, so this configuration is possible
   on a machine with a certain amount of hardware capability.

3.2.  AI inference service on Cloud server

   The following figure shows the case where the client module that
   requests the AI service and the AI server module that directly
   performs the AI service run on different machines.

                              +--------------------------------------+
+------------------------+    |   +---------------------------+      |
|  +-----------------+   |    |   |  +-----------------+      |      |
|  | Client module   |<--+----+---+->| Server module   |      |      |
|  | for AI service  |   |    |   |  | for AI service  |      |      |
|  +-----------------+   |    |   |  +-----------------+      |      |
+------------------------+    |   +---------------------------+      |
      Client machine          |          Server machine              |
                              +--------------------------------------+
                                         Cloud(Internet)

            Figure 3: AI inference service on Cloud server

   In this case, the client module requesting the AI inference service
   runs on the client machine, the AI server module that directly
   performs the AI inference service runs on a separate server machine,
   and this server machine is in the cloud network.  The performance of
   the client machine does not need to be high, because the client
   machine simply needs to request the AI inference service and, if
   necessary, deliver only the data required for the AI service
   request.  For the AI server module that directly performs the AI
   inference service, we can set up our own AI server, or we can use
   commercial clouds such as Amazon, Microsoft, and Google.
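
   As an informal sketch of how a client machine might request an AI
   inference from a server module running in the cloud, the example
   below sends an HTTP request to a TensorFlow-Serving-style REST
   endpoint.  The host name, model name, input shape, and timeout are
   assumptions made for illustration only.

      # Minimal client-side sketch: request an AI inference from a
      # remote server over REST.  "ai-server.example.com" and the
      # model name "demo" are placeholder assumptions.
      import json
      import requests

      SERVER_URL = (
          "http://ai-server.example.com:8501/v1/models/demo:predict")

      def request_inference(instance):
          """Send one input instance and return the first prediction."""
          payload = json.dumps({"instances": [instance]})
          response = requests.post(SERVER_URL, data=payload, timeout=5.0)
          response.raise_for_status()
          return response.json()["predictions"][0]

      if __name__ == "__main__":
          # A 28x28 all-zero image as a dummy input instance.
          dummy_image = [[0.0] * 28 for _ in range(28)]
          print(request_inference(dummy_image))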

3.3.  AI inference service on Edge device

   The following figure shows the case where the client module that
   requests the AI service and the AI server module that directly
   performs the AI service are separated, and the AI server module is
   located on an edge device.

                              +--------------------------------------+
+------------------------+    |   +---------------------------+      |
|  +-----------------+   |    |   |  +-----------------+      |      |
|  | Client module   |<--+----+---+->| Server module   |      |      |
|  | for AI service  |   |    |   |  | for AI service  |      |      |
|  +-----------------+   |    |   |  +-----------------+      |      |
+------------------------+    |   +---------------------------+      |
      Client machine          |            Edge device               |
                              +--------------------------------------+
                                          Edge network

             Figure 4: AI inference service on Edge device

   In this case as well, the client module that requests the AI
   inference service runs on the client machine, the AI server module
   that directly performs the AI inference service runs on the edge
   device, and the edge device is in the edge network.  The AI server
   module that directly performs the AI inference service on the edge
   device can be configured directly on the edge device, or a
   commercial edge computing module can be used.

   The difference from the case above, where the AI server module is in
   the cloud, is that the edge device is usually close to the client
   but its performance is lower than that of the server in the cloud.
   There are therefore advantages in data transfer time and inference
   time, but the inference service throughput per unit time is lower.

3.4.  AI inference service on Cloud server and Edge device

   The following figure shows the case where the AI server modules that
   directly perform AI services are distributed across the cloud and
   edge devices.

                              +--------------------------------------+
+------------------------+    |   +---------------------------+      |
|  +-----------------+   |    |   |  +-----------------+      |      |
|  | Client module   |<--+--+-+---+->| Server module   |      |      |
|  | for AI service  |<--+--+ |   |  | for AI service  |      |      |
|  +-----------------+   |  | |   |  +-----------------+      |      |
+------------------------+  | |   +---------------------------+      |
      Client machine        | |            Edge device               |
                            | +--------------------------------------+
                            |              Edge network
                            |
                            | +--------------------------------------+
                            | |   +---------------------------+      |
                            | |   |  +-----------------+      |      |
                            +-+---+->| Server module   |      |      |
                              |   |  | for AI service  |      |      |
                              |   |  +-----------------+      |      |
                              |   +---------------------------+      |
                              |          Server machine              |
                              +--------------------------------------+
                                         Cloud(Internet)

     Figure 5: AI inference service on Cloud server and Edge device

   There is a difference between the AI server module running in the
   cloud and the AI server module running on the edge device in terms
   of AI inference service performance.  Therefore, the client
   requesting the AI inference service may distribute its AI inference
   service requests between the cloud and the edge device appropriately
   in order to perform the desired AI service.  In other words, for an
   AI service where somewhat lower inference accuracy is acceptable but
   a short inference time is required, we can send the AI inference
   request to the edge device.
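
   The following sketch illustrates one way a client could distribute
   inference requests between an edge device and a cloud server,
   sending latency-sensitive requests to the edge device first and
   falling back to the cloud server when the edge device does not
   answer in time.  The endpoint URLs, timeouts, and fallback policy
   are assumptions made for illustration, not a recommended algorithm.

      # Sketch of simple client-side distribution of AI inference
      # requests between an edge device and a cloud server (Figure 5).
      # Endpoints and policy are illustrative assumptions.
      import json
      import requests

      EDGE_URL = "http://edge-device.local:8501/v1/models/demo:predict"
      CLOUD_URL = (
          "http://ai-server.example.com:8501/v1/models/demo:predict")

      def infer(instance, latency_sensitive=True):
          """Try the edge device first for latency-sensitive requests
          and fall back to the cloud server on timeout or error."""
          payload = json.dumps({"instances": [instance]})
          targets = ([EDGE_URL, CLOUD_URL] if latency_sensitive
                     else [CLOUD_URL])
          for url in targets:
              timeout = 0.2 if url == EDGE_URL else 5.0
              try:
                  r = requests.post(url, data=payload, timeout=timeout)
                  r.raise_for_status()
                  return r.json()["predictions"][0]
              except requests.RequestException:
                  continue  # try the next target
          raise RuntimeError("no AI server could satisfy the request")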

4.  Considerations when configuring a system to provide AI services

   As described in the previous chapter, the AI server module that
   directly performs AI inference services by utilizing AI models can
   run on a local machine, a cloud server, or an edge device.  In
   theory, if the AI inference service is performed on a local machine,
   the AI service can be provided without communication delay or packet
   loss, but a certain amount of hardware performance is required to
   perform AI service inference.  So, in a future environment where AI
   services become popular, such as when various AI services are
   activated and widely disseminated, the cost of the machine that
   performs AI services is important, and such cases would not be
   common.  If so, whether the AI inference service will be performed
   on a cloud server or, at lower cost, on an edge device can be a
   determining factor in the system configuration.

   When an AI inference service request is made to a distant cloud
   server, it may take a long time to transmit the data, but it has the
   advantage of being able to serve many AI inference service requests
   in a short time, and the accuracy of AI service inference increases.
   Conversely, when an AI service request is made to a nearby edge
   device, the transmission time is short, but many AI inference
   service requests cannot be handled at once, and the accuracy of AI
   service inference is lower.  Therefore, by analyzing the
   characteristics and requirements of the AI service to be performed,
   it is necessary to determine whether to perform the AI inference
   service on a local machine, a cloud server, or an edge device.

   Depending on the characteristics of the AI service, the
   characteristics of the data used for training, and the problem to be
   solved, the hardware characteristics required of the machine
   performing the AI service vary.  In general, machines on cloud
   servers are viewed as machines with higher performance than edge
   devices.  However, the performance of the AI inference service
   varies depending on how hardware such as the CPU, RAM, GPU, and
   network interface is configured for each cloud server and edge
   device.  If we do not think about cost, it is good to configure a
   system for performing AI services with the machine with the best
   hardware performance, but in reality, we should always consider the
   cost when configuring the system.  So, the required performance of
   the local machine, cloud server, and edge device must be determined
   according to the characteristics and requirements of the AI service
   to be performed.

   Although not directly related to communication/networking, the
   biggest influence on AI inference services is the AI model to be
   used for the AI inference service.  For example, in AI services such
   as image classification, there are various types of AI models such
   as ResNet, EfficientNet, VGG, and Inception.  These AI models differ
   not only in AI inference accuracy but also in AI model file size and
   AI inference time.  AI models with the highest inference accuracy
   typically have very large file sizes and take a long time to perform
   AI inference.  So, when constructing an AI service system, it is not
   always good to choose the AI model with the highest AI inference
   accuracy.  Again, it is important to select an AI model according to
   the characteristics and requirements of the AI service to be
   performed.

   Experimentally, it is recommended to use an AI model with high AI
   inference accuracy on the cloud server, and to use an AI model that
   can provide a fast AI inference service, even though its AI
   inference accuracy is slightly lower, for fast AI inference service
   on the edge device.
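
   As a rough sketch of how such a selection could be made in practice,
   the example below instantiates a few candidate image-classification
   models, measures their parameter counts and single-image inference
   times, and picks a model that stays within a latency budget.  The
   candidate list and the budget are assumptions for illustration, and
   the measured numbers will vary with the hardware used.

      # Sketch: compare candidate AI models by size and inference time
      # and pick one under a latency budget.  Assumes TensorFlow 2.x;
      # results depend on the hardware.
      import time
      import numpy as np
      import tensorflow as tf

      CANDIDATES = {
          "MobileNetV2": tf.keras.applications.MobileNetV2,
          "ResNet50": tf.keras.applications.ResNet50,
          "EfficientNetB0": tf.keras.applications.EfficientNetB0,
      }
      LATENCY_BUDGET_S = 0.1   # illustrative edge-device requirement
      dummy = np.zeros((1, 224, 224, 3), dtype="float32")

      results = {}
      for name, ctor in CANDIDATES.items():
          model = ctor(weights=None)       # skip weight download
          model.predict(dummy, verbose=0)  # warm-up run
          start = time.perf_counter()
          model.predict(dummy, verbose=0)
          latency = time.perf_counter() - start
          results[name] = (model.count_params(), latency)
          print(name, results[name])

      eligible = [n for n, (_, lat) in results.items()
                  if lat <= LATENCY_BUDGET_S]
      # Among eligible models, prefer the largest one here as a crude
      # proxy for accuracy; a real system would use measured accuracy.
      choice = (max(eligible, key=lambda n: results[n][0])
                if eligible else None)
      print("selected model:", choice)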

   It might be a bit of an implementation issue, but we should also
   consider how we deliver AI services on cloud servers or edge
   devices.  With current technology, a traditional web server method
   or a serving method specialized for AI inference (e.g., Google's
   TensorFlow Serving) can be used.  Traditional web frameworks such as
   Flask and Django have the advantage of running on various types of
   machines, but since they are designed to support general web
   services, the service execution time is not fast.  TensorFlow
   Serving uses the features of TensorFlow to make AI inference
   services very fast and efficient.  However, on older CPUs that do
   not support AVX, Google's TensorFlow does not run, so the TensorFlow
   Serving function cannot be used.  Therefore, rather than
   unconditionally using the serving method specialized for AI
   inference, it is necessary to decide on the AI server module method
   that provides AI services in consideration of the hardware
   characteristics of the AI system that can be built.

   The communication method used to transfer data when requesting an AI
   inference service is also an important decision in constructing an
   AI system.  The traditional REST method can be used with various
   machines and services, but its performance is inferior to that of
   Google's gRPC.  There are many advantages to using gRPC for AI
   inference services, because gRPC enables large-volume and more
   efficient data transfer compared to REST.

5.  IANA Considerations

   There are no IANA considerations related to this document.

6.  Security Considerations

   When the AI service is performed on a local machine, there is no
   particular security issue, but when the AI service is provided
   through a cloud server or an edge device, the IP address and port
   number may become known to the outside and can be attacked.
   Therefore, when providing AI services by utilizing machines on the
   network, such as cloud servers and edge devices, it is necessary to
   analyze the characteristics of the modules to be used, identify
   security vulnerabilities, and take countermeasures.

7.  Acknowledgements

   TBA

8.  Informative References

   [RFC6574]  Tschofenig, H. and J. Arkko, "Report from the Smart
              Object Workshop", RFC 6574, DOI 10.17487/RFC6574, April
              2012, <https://www.rfc-editor.org/info/rfc6574>.

   [RFC7452]  Tschofenig, H., Arkko, J., Thaler, D., and D. McPherson,
              "Architectural Considerations in Smart Object
              Networking", RFC 7452, DOI 10.17487/RFC7452, March 2015,
              <https://www.rfc-editor.org/info/rfc7452>.

   [I-D.irtf-t2trg-iot-edge]
              Hong, J., Hong, Y., de Foy, X., Kovatsch, M., Schooler,
              E., and D. Kutscher, "IoT Edge Challenges and Functions",
              Work in Progress, Internet-Draft, draft-irtf-t2trg-iot-
              edge-04, 11 January 2022.

   [CG-AIoT]  "ITU-T CG-AIoT".

   [tinyML]   "tinyML Foundation".

   [AI_inference_archtecture]
              "IBM Systems, AI Infrastructure Reference Architecture".

   [Google_cloud_iot]
              "Bringing intelligence to the edge with Cloud IoT".

Authors' Addresses

   Yong-Geun Hong
   Daejeon University
   62 Daehak-ro, Dong-gu
   Daejeon

   Phone: +82 42 280 4841
   Email: yonggeun.hong@gmail.com


   SeokBeom Oh
   KSA
   Digital Transformation Center, 5 Teheran-ro 69-gil, Gangnamgu
   Seoul

   Phone: +82 2 1670 6009
   Email: isb6655@korea.ac.kr


   SooJeong Lee
   Korea University/KT
   2511 Sejong-ro
   Sejong City

   Email: ngenius@korea.ac.kr


   Hyun-Kook Kahng
   Korea University
   2511 Sejong-ro
   Sejong City

   Email: kahng@korea.ac.kr