RTGWG                                                              S. Gu
INTERNET-DRAFT                                                 G. Zhuang
Intended Status: Informational                       Huawei Technologies
                                                                  H. Yao
                                                                   X. Li
                                                            China Mobile

Expires: June 4, 2020                                   December 2, 2019


         A Report on Compute First Networking (CFN) Field Trial
                   draft-gu-rtgwg-cfn-field-trial-01

Abstract

   Compute First Networking (CFN) routes a service request to an optimal
   edge site to improve overall system load balancing and efficiency. In
   particular, when an edge site is overloaded, other edges with service
   equivalency can dynamically serve the request. This document
   describes a CFN field trial that shows the effect CFN can achieve. It
   introduces the edge-to-edge interaction through which CFN nodes
   exchange the available computing resource information for services
   and the network status, and it illustrates a data plane that supports
   dynamic anycast based on late binding. The field trial shows that CFN
   improves the overall queries per second (qps) served for a service
   hosted on multiple edges and spreads the load in a more balanced way.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1 Terminology
   2. Testbed overview
   3. Procedures
     3.1 Control Plane
     3.2 Data Plane
   4. Preliminary Tests
     4.1 Requests rush to an edge (no system background load)
     4.2 Requests rush to an edge (system background load exists)
     4.3 Mixed requests rush to an edge (no system background load)
     4.4 Impact from update frequency
   5. Summary
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
     9.1 Normative References
     9.2 Informative References
   Authors' Addresses

1. Introduction

   The Compute First Networking (CFN) scenarios and requirements
   document [CFN-req] describes the usage scenarios and requirements for
   dynamically dispatching service requests to multiple edge sites in
   order to overcome computing resource overload in edge computing. The
   CFN framework document [CFN-fmwk] presents the basic system framework
   for dynamically routing a service request to a selected edge in real
   time, based on computing load status and network conditions. This
   approach improves load balancing among multiple edges with service
   equivalency in a distributed manner. This document describes a
   concrete CFN field trial and its performance.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2. Testbed overview

   We deployed a CFN node at each of three edge sites in Hangzhou. The
   sites are approximately 30 kilometers apart. Figure 1 shows the
   topology and configuration used for this CFN testbed.

      +-----+               edge site 1
     +-----+|                                             +---+
    +-----+|+       +----------+                          |   |
   +------+|+ ------|CFN node 1|--------------------------|   |
   |client|+        +----------+                          |   |
   +------+     inter-edge itf:10.11.103.1                |   |
                service ID:SID_S                          |   |
                binding IP BIP1:10.11.102.1               |   |
                                                          |   |
                                                          | n |
      +-----+               edge site 2                   | e |
     +-----+|                                             | t |
    +-----+|+       +----------+                          | w |
   +------+|+ ------|CFN node 2|--------------------------| o |
   |client|+        +----------+                          | r |
   +------+     inter-edge itf:10.12.103.1                | k |
                service ID:SID_S                          |   |
                binding IP BIP2:10.12.102.1               |   |
                                                          |   |
                                                          |   |
                                                          |   |
      +-----+               edge site 3                   |   |
     +-----+|                                             |   |
    +-----+|+       +----------+                          |   |
   +------+|+ ------|CFN node 3|--------------------------|   |
   |client|+        +----------+                          |   |
   +------+     inter-edge itf:10.13.103.1                +---+
                service ID:SID_S
                binding IP BIP3:10.13.102.1

                    Figure 1. CFN testbed overview

   All three edge sites (called edges in this document for simplicity)
   provide a matrix multiplication service S. The CFN nodes use a unique
   service ID, SID_S, to announce the reachability of service S. In our
   test, SID_S is 200.200.200.201 and is treated as an anycast IP
   address. Although the service is reachable in the network through the
   single SID_S, the three edges serve it using three different binding
   IP (BIP) addresses: BIP1 (10.11.102.1), BIP2 (10.12.102.1), and BIP3
   (10.13.102.1), via CFN nodes 1, 2, and 3 respectively. A service node
   hosted on or attached to a CFN node only knows that it uses its BIP
   to serve service S; it has no knowledge of SID_S.

   Each CFN node has an inter-edge interface IP address, which it uses
   to exchange computing load information with the other CFN nodes.
   About 200 simulated clients connect to each CFN node in the test.

3. Procedures

   The procedures are introduced in [CFN-fmwk]. For easy reference, the
   control plane and data plane timeline diagrams are also shown here.
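
   As a rough illustration of the testbed addressing in Section 2, the
   following minimal sketch records the per-node mapping between SID_S
   and each binding IP. The addresses are taken from Figure 1; the
   class, field, and method names are hypothetical and are not part of
   any CFN specification or of the trial software.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Anycast service ID for service S, as used in the trial.
SID_S = "200.200.200.201"


@dataclass
class CfnNode:
    """One CFN node of Figure 1 (class and field names are hypothetical)."""
    name: str
    inter_edge_ip: str
    # service ID -> local binding IP (BIP)
    bindings: Dict[str, str] = field(default_factory=dict)

    def to_binding_ip(self, dst_ip: str) -> str:
        """Map a service ID to the local BIP; other addresses pass through."""
        return self.bindings.get(dst_ip, dst_ip)

    def to_service_id(self, src_ip: str) -> str:
        """Map a local BIP back to its service ID for the response path."""
        for sid, bip in self.bindings.items():
            if bip == src_ip:
                return sid
        return src_ip


TESTBED: List[CfnNode] = [
    CfnNode("CFN node 1", "10.11.103.1", {SID_S: "10.11.102.1"}),
    CfnNode("CFN node 2", "10.12.103.1", {SID_S: "10.12.102.1"}),
    CfnNode("CFN node 3", "10.13.103.1", {SID_S: "10.13.102.1"}),
]

if __name__ == "__main__":
    for node in TESTBED:
        print(node.name, SID_S, "->", node.to_binding_ip(SID_S))
```

   The same SID_S resolves to a different BIP on each node, which is
   why a service node never needs to know SID_S itself.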

3.1 Control Plane

   When a service node is initiated for service S, the edge platform
   manager sends the registration information, namely the service ID
   SID_S and the binding IP (BIP) used to access SID_S, to the CFN node
   to which the service node attaches.

   Each CFN node regularly obtains the computing load information for
   SID_S from the service node attached to it. The computing load
   information can be the CPU consumption for SID_S, the number of
   current connections, the queries per second processed, the total
   capacity, or other performance metrics. In our test, each type of
   metric is given a weight. CFN nodes distribute this information to
   each other using BGP extensions. Figure 2 shows the CFN control plane
   procedures.

    CFN      CFN                    CFN            Edge Platform
   Node 1   Node 2                Node 3              Manager

     |        |                      |                   |
     |        |                      |                   |
     |        |                      |<------------------|
     |        |                      | 1.Service info    |
     |        |                      | registration/     |
     |        |                      | update/withdraw   |
     |        |                      | (SID_S, BIP 3)    |
     |        |                      |                   |
     |        |                      |                   |
     |        |                      |<------------------|
     |        |                      | 2.Computing load  |
     |        |                      | update triggering |
     |        |                      | (SID_S,computing  |
     |        |                      | load information) |
     |        |                      |                   |
     |        |                      |                   |
     |        |<---------------------|                   |
     |        |                      |                   |
     |<------------------------------|                   |
     |        | 3.BGP update for     |                   |
     |        | computing load       |                   |
     |        | (SID_S, CFN node 3,  |                   |
     |        | computing load info) |                   |
     |        |                      |                   |

                     Figure 2. CFN control plane

3.2 Data Plane

   When a client sends a service request for service S, it uses SID_S as
   the destination IP. In the test, SID_S is an anycast address. A
   client can obtain the SID_S for a service in various ways, such as
   DNS or static configuration.

   When the CFN ingress, which is CFN node 1 in Figure 3, receives the
   request, it dynamically selects the most appropriate CFN egress based
   on the computing load information it has received. As Figure 3 shows,
   CFN node 3 is selected as the CFN egress in this case.
   The CFN ingress then tunnels the data packet to the CFN egress.

   When the CFN egress receives the packet, it decapsulates the packet
   and maps the destination address from SID_S to the binding IP BIP3.
   The service node for service S receives the packet and processes it.
   The service response is returned to CFN node 3, which is conceptually
   the gateway of its attached service nodes for CFN services. CFN node
   3 maps BIP3 back to SID_S as the source IP and then tunnels the
   packet to CFN node 1. CFN node 1 decapsulates the packet and sends it
   to the client.

   For subsequent service request packets of the same flow sent to CFN
   node 1, CFN node 1 always uses CFN node 3 as the egress to ensure
   flow affinity.

             CFN node 1              CFN node 3            Service
 client     (CFN ingress)           (CFN egress)         Node for S

   |              |                       |                   |
   |1.service req |                       |                   |
   |------------->|                       |                   |
   |dst=SID_S     |                       |                   |
   |src=client_IP |                       |                   |
   |              |                       |                   |
   |              |                       |                   |
   |      +----------------+              |                   |
   |      |2.Select CFN    |              |                   |
   |      |egress & save it|              |                   |
   |      +----------------+              |                   |
   |              |                       |                   |
   |              |3. forward service req |                   |
   |              |with encapsulation     |                   |
   |              |---------------------->|                   |
   |              |outer: dst=CFN_Node_3  |                   |
   |              |       src=CFN_Node_1  |                   |
   |              |inner: dst=SID_S       |                   |
   |              |       src=client_IP   |                   |
   |              |                       |                   |
   |              |               +----------------+          |
   |              |               |4.decap & map   |          |
   |              |               |SID_S to binding|          |
   |              |               |IP              |          |
   |              |               +----------------+          |
   |              |                       |                   |
   |              |                       |                   |
   |              |                       |5. forward pkt     |
   |              |                       |------------------>|
   |              |                       |dst=BIP3           |
   |              |                       |                   |
   |              |                       |                   |
   |              |                       |                   |
   |              |                       |6. service rsp     |
   |              |                       |<------------------|
   |              |                       |src=BIP3           |
   |              |                       |                   |
   |              |               +----------------+          |
   |              |               |7.map binding IP|          |
   |              |               |back to SID_S & |          |
   |              |               |encap           |          |
   |              |               +----------------+          |
   |              |                       |                   |
   |              |8. forward service rsp |                   |
   |              |with encapsulation     |                   |
   |              |<----------------------|                   |
   |              |outer: dst=CFN_Node_1  |                   |
   |              |       src=CFN_Node_3  |                   |
   |              |inner: dst=client_IP   |                   |
   |              |       src=SID_S       |                   |
   |              |                       |                   |
   |        +----------+                  |                   |
   |        |9.decap   |                  |                   |
   |        +----------+                  |                   |
   |              |                       |                   |
   |10. forward   |                       |                   |
   |<-------------|                       |                   |
   |dst=client_IP |                       |                   |
   |src=SID_S     |                       |                   |
   |              |                       |                   |

      Figure 3. CFN data plane for the first request of a flow

4. Preliminary Tests

4.1 Requests rush to an edge (no system background load)

   In this test, the service nodes attached to all three edges have the
   same capacity and no background computing tasks are running.
   Together, the service nodes can handle about 670 queries per second
   (qps).

   Each client attached to edge 1 generates service requests to it at
   about 40 qps. The number of clients sending requests simultaneously
   varies. When 10 clients send requests (about 400 qps in total), the
   computing power consumed reaches approximately 60% of the overall
   system maximum. All requests are short-processing tasks; based on
   observation, each request takes roughly 4 ms to complete at the
   server side.

   CFN uses the computing load reported by the different edges, together
   with the network status, to spread the service requests. For
   comparison, we also tested a purely random selection among the edges
   to handle each request.

   We tested with 5, 10 and 15 clients attached to one edge, which
   consume medium low, medium high and high computing resources of the
   whole system respectively. Note that the load exceeds the capacity of
   a single edge in every case. With 15 clients, it almost reaches the
   maximum system capacity. Figure 4 shows the system qps and the
   average delay between a client sending a request and receiving the
   response.
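
   The two dispatch policies compared in this test can be sketched
   roughly as follows. This is only an illustration under stated
   assumptions: the cost formula, the weights w_load and w_net, and the
   normalized load and network cost values are hypothetical and are not
   the metrics or weights used in the trial.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeState:
    name: str
    load: float      # reported computing load, 0.0 (idle) to 1.0 (full)
    net_cost: float  # normalized network cost from the ingress, 0.0 to 1.0


def select_cfn(edges: List[EdgeState], w_load: float = 0.7,
               w_net: float = 0.3) -> str:
    """Return the edge with the lowest weighted cost (assumed formula)."""
    best = min(edges, key=lambda e: w_load * e.load + w_net * e.net_cost)
    return best.name


def select_random(edges: List[EdgeState], rng: random.Random) -> str:
    """Baseline: pick any edge with equal probability."""
    return rng.choice(edges).name


if __name__ == "__main__":
    busy_local = [
        EdgeState("edge 1", load=0.9, net_cost=0.0),  # local, overloaded
        EdgeState("edge 2", load=0.2, net_cost=0.5),
        EdgeState("edge 3", load=0.3, net_cost=0.5),
    ]
    print("CFN:", select_cfn(busy_local))
    print("random:", select_random(busy_local, random.Random(0)))
```

   With these assumed weights, an overloaded local edge loses to a
   lightly loaded remote edge, while a local edge with the same load as
   the remote edges wins on network cost.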

   +-------------+--------+----------------+---------+
   |  number of  | system | average delay  |   qps   |
   |   clients   |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     3.954      |  208.5  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     5.316      |  197.7  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     4.700      |  402.3  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     5.595      |  302.1  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     5.506      |  559.3  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     5.718      |  546.0  |
   +-------------+--------+----------------+---------+

        Figure 4. Test results when service requests rush to
          a single edge with no system background load

   CFN achieves better results than application layer service dispatch
   based on random selection. Average delay decreases by 25.62% and
   16.00%, and total qps increases by 5.5% and 33.17%, under medium low
   and medium high computing load respectively. The unbalanced incoming
   traffic is spread across all edges. Unlike random selection, CFN
   dispatches more requests to the local edge because its network cost
   is the lowest. CFN makes its choice by balancing the greater
   computing resources available at the remote sites against the lower
   network cost at the local site, and hence it outperforms random
   selection. With a high number of clients, the maximum system capacity
   is almost reached, so CFN and random selection perform similarly.

4.2 Requests rush to an edge (system background load exists)

   In this test, different edges have different background computing
   tasks to handle. We randomly select an edge and subject it to a
   computing intensive burst that consumes almost 90% of its capacity
   for about 4 seconds. The background load then returns to zero for 2
   seconds.
   This creates a scenario with one busy edge and idle edges. The other
   settings are the same as in Section 4.1.

   Figure 5 shows the system qps and the average delay between a client
   sending a request and receiving the response for this case.

   +-------------+--------+----------------+---------+
   |  number of  | system | average delay  |   qps   |
   |   clients   |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     6.291      |  185.6  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     9.630      |  165.3  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     6.854      |  360.9  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     10.592     |  316.3  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     7.987      |  512.4  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     12.156     |  441.7  |
   +-------------+--------+----------------+---------+

        Figure 5. Test results when service requests rush to
         a single edge while system background load exists

   The results show that CFN decreases average delay by 34.67%, 35.29%
   and 34.30% under medium low, medium high and high computing load
   respectively, and increases total qps by 12.28%, 14.10% and 16.01%
   under the same three load levels.

   The performance gain of CFN in this test case, in particular the
   reduction in delay, is much higher than in Section 4.1. The reason is
   that random service dispatching has a more than 20% chance of sending
   a request to an edge whose service node has a very high background
   computing load, while CFN greatly reduces that possibility.

   In addition, compared with the results in Section 4.1, delay
   increases by 59.10%, 45.83% and 45.06% at the three computing load
   levels with CFN, and by 81.15%, 89.31% and 112.60% with random
   selection.
   This shows that CFN adapts much better to dynamic changes in
   computing load, especially when the system background load is high.

4.3 Mixed requests rush to an edge (no system background load)

   We changed the characteristics of the service requests to reflect the
   coexistence of long-processing and short-processing tasks. A
   short-processing task takes roughly 4 ms to complete and a
   long-processing task takes roughly 400 ms to complete. The ratio of
   long to short tasks is approximately 1:100.

   Figure 6 shows the system qps and the average delay between a client
   sending a request and receiving the response for this case.

   +-------------+--------+----------------+---------+
   |  number of  | system | average delay  |   qps   |
   |   clients   |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     5.205      |  193.5  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     5.398      |  193.5  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     5.201      |  393.4  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     5.985      |  385.0  |
   +-------------+--------+----------------+---------+
   |             |  CFN   |     6.147      |  559.4  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     8.499      |  559.4  |
   +-------------+--------+----------------+---------+

      Figure 6. Test results when mixed service requests rush to
            a single edge with no system background load

   The results show that CFN decreases average delay by 3.58%, 13.10%
   and 27.76% under medium low, medium high and high computing load
   respectively. The qps differs little between CFN and random selection
   at every load level, and it is the same in the medium low and high
   cases.

4.4 Impact from update frequency

   The computing load information is updated and distributed when the
   change in its metrics exceeds some threshold compared with the last
   distributed information.
   In the test, the thresholds were 10% of the maximum number of
   connections allowed and 5% of CPU consumption. The frequency of
   updates affects the system performance, so we tested different update
   intervals to see their impact. The clients keep sending requests so
   that the computing resource consumption on each edge stays at medium
   low, which is about 5 connections. The update interval was set to
   10 s, 5 s, 1 s, 100 ms, 10 ms and 1 ms. Figure 7 shows the average
   delay between a client sending a request and receiving the response
   under the different update intervals, and the improvement in delay
   compared with the 10 second interval.

   The results show that the more frequently updates are distributed,
   the better the performance.

   +-------------+--------------+----+----+----+-----+----+----+
   |# of clients |   Interval   | 10s| 5s | 1s |100ms|10ms| 1ms|
   +-------------+--------------+----+----+----+-----+----+----+
   |      5      |  Delay(us)   |6445|6255|5741|5312 |4883|4058|
   |(medium low) +--------------+----+----+----+-----+----+----+
   |             |Improvement(%)| 0  |3.5 |12.3|21.3 |32.3|58.8|
   +-------------+--------------+----+----+----+-----+----+----+

       Figure 7. Test results under different update intervals

5. Summary

   This draft presents a field trial of a CFN system with three edge
   sites in different locations. CFN enables a network-based,
   fast-reacting system to serve a multi-edge computing service in a
   more balanced way. Computing load information is exchanged regularly
   between CFN nodes. The CFN egress bound to serve a particular service
   is determined in real time and maintained to ensure flow affinity.

   The tests show that the average client request delay decreases by
   roughly 4% to 35% across the test cases and that the system qps
   improves by up to about 33%. CFN is a feasible and efficient way to
   provide multi-edge service balancing in edge computing.

6. Security Considerations

   The security risks mentioned in [CFN-fmwk] apply in the tests. As
   these are preliminary tests, no additional security controls are
   currently implemented. Mechanisms such as authentication of edge
   nodes and fluctuation avoidance should be considered in deployment.

7. IANA Considerations

   No IANA action is required.

8. Acknowledgements

   The authors would like to thank Xunwen Li's team members for their
   help in setting up the testbed in Hangzhou.

9. References

9.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

9.2 Informative References

   [CFN-req]  Geng, L., et al, "Compute First Networking (CFN) Scenarios
              and Requirements", draft-geng-cfn-req-00, November 2019.

   [CFN-fmwk] Li, Y., et al, "Framework of Compute First Networking
              (CFN)", draft-li-cfn-framework-00, November 2019.

Authors' Addresses

   Shuheng Gu
   Huawei Technologies

   EMail: gushuheng@huawei.com

   Guanhua Zhuang
   Huawei Technologies

   EMail: zhuangguanhua@huawei.com

   Huijuan Yao
   China Mobile

   EMail: yaohuijuan@chinamobile.com

   Xunwen Li
   China Mobile

   EMail: lixunwen@zj.chinamobile.com