Network Working Group                                      L. Qiang, Ed.
Internet-Draft                                                   X. Geng
Intended status: Informational                                    B. Liu
Expires: March 6, 2020                                    T. Eckert, Ed.
                                                                  Huawei
                                                                 L. Geng
                                                            China Mobile
                                                                   G. Li
                                                       September 3, 2019

                  Large-Scale Deterministic IP Network
                draft-qiang-detnet-large-scale-detnet-05

Abstract

   This document presents the overall framework and key mechanisms for
   Large-scale Deterministic Network (LDN).
LDN can provide bounded
   latency and delay variation (jitter) without requiring precise time
   synchronization among nodes or per-flow state in transit nodes.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 6, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology & Abbreviations
   2.  Overview
     2.1.  Summary
     2.2.  Background
       2.2.1.  Deterministic End-to-End Latency
       2.2.2.  Hop-by-Hop Delay
       2.2.3.  Cyclic Forwarding
       2.2.4.  Co-Existence with Non-Deterministic Traffic
     2.3.  System Components
   3.  LDN Forwarding Mechanism
     3.1.  Cyclic Queues
     3.2.  Cycle Mapping
   4.  Performance Analysis
     4.1.  Queueing Delay
     4.2.  Jitter
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Normative References
   Authors' Addresses

1.  Introduction

   This document explores DetNet forwarding over large-scale networks.
   In contrast to TSN, which is deployed in LANs, DetNet is expected to
   be deployed in larger-scale networks that have the following
   features:

   o  a large number of network devices

   o  long distances between network devices

   o  a large number of deterministic flows in the network

   These features bring the following challenges to DetNet forwarding:

   o  precise time synchronization among all nodes is difficult to
      achieve

   o  long link propagation delays may introduce larger jitter

   o  per-flow state does not scale

   Motivated by these challenges, this document presents a Large-scale
   Deterministic Network (LDN) mechanism.  As
   [draft-ietf-detnet-problem-statement] indicates, deterministic
   forwarding can only be applied to flows with well-defined traffic
   characteristics.
The traffic characteristics of DetNet flows are
   discussed in [draft-ietf-detnet-architecture]; they can be achieved
   through shaping at the ingress node or by up-front commitment by the
   application.  LDN accordingly assumes that DetNet flows follow such
   well-defined traffic patterns.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Terminology & Abbreviations

   This document uses the terminology defined in
   [draft-ietf-detnet-architecture].

   TSN:   Time-Sensitive Networking

   PQ:    Priority Queuing

   CQF:   Cyclic Queuing and Forwarding

   LDN:   Large-scale Deterministic Network

   DSCP:  Differentiated Services Code Point

   EXP:   Experimental

   TC:    Traffic Class

   T:     the length of a cycle

   H:     the number of hops

2.  Overview

2.1.  Summary

   In LDN, nodes (network devices) are frequency synchronized, and each
   node forwards packets in a slotted fashion based on a cycle
   identifier carried in the packets.  Ingress nodes or senders have a
   function called a gate that shapes/conditions traffic flows.  Except
   for this gate function, LDN has no awareness of individual flows.

2.2.  Background

   This section motivates the design choices taken by the proposed
   solution and gives the necessary background for forwarding plane
   designs based on deterministic delay.

2.2.1.  Deterministic End-to-End Latency

   Bounded delay is delay that has a deterministic upper and lower
   bound.

   The delay of packets that need to be forwarded with deterministic
   delay needs to be deterministic on every hop.
If any hop in the
   network introduces non-deterministic delay, then the network as a
   whole can no longer deliver a deterministic delay service.

2.2.2.  Hop-by-Hop Delay

   Consider the simple example shown in Figure 1, where Node X has 10
   receiving interfaces and one outgoing interface I, all of the same
   speed.  There are 10 deterministic traffic flows, each consuming 5%
   of a link's bandwidth, one from each receiving interface to the
   outgoing interface.

   Node X sends 'only' 50% deterministic traffic to interface I, so
   there is no ongoing congestion, but there is added delay.  If the
   arrival time of packets of these 10 flows into X is uncontrolled,
   then the worst case is for them all to arrive at the same time.  One
   packet has to wait in X until the other 9 packets are sent out on I,
   resulting in a worst-case deterministic delay of 9 packet
   serialization times.  On the next-hop node Y downstream from X, this
   problem can become worse.  Assume Y has 10 upstream nodes like X;
   the worst-case simultaneous burst of packets is now 100 packets, or
   a 99-packet serialization delay as the worst-case upper-bounded
   delay incurred on this hop.

   To avoid this problem of a high upper bound on end-to-end delay,
   traffic needs to be conditioned/interleaved on every hop.  This
   makes it possible to create solutions where the per-hop delay is
   bounded purely by the physics of the forwarding plane across the
   node, and not by the accumulated characteristics of prior-hop
   traffic profiles.

    +--+ +--+           ---                       ---
    |A1| |A0|         -     -                   -     -
    +--+ +--+        -       -                 -       -
    --------------->-         -               -         -
    +--+ +--+       -         -               -         -
    |B1| |B0|       -         - Interface I   -         -
    +--+ +--+       -         -               -         -
    --------------->-  Node X  --------------->  Node Y  ----->
    +--+ +--+       -         -               -         -
    |C1| |C0|       -         -               -         -
    +--+ +--+       -         -               -         -
    --------------->-         -               -         -
        .            -       -                 -       -
        .             -     -                   -     -
        .               ---                       ---

           Figure 1: Micro-burst and micro-burst iteration

2.2.3.  Cyclic Forwarding

   The common approach to solving this problem is a cyclic hop-by-hop
   forwarding mechanism.  Assume packets are forwarded from N1 via N2
   to N3 as shown in Figure 2.  When N1 sends a packet P to interface
   I1 within a cycle X, the forwarding mechanism must guarantee that N2
   will forward P via I2 to N3 within a cycle Y.

   The cycle of a packet can either be deduced by a receiving node from
   the exact time at which it was received, as is done in SDN/TDMA
   systems, and/or it can be indicated in the packet.  The solution in
   this document relies on such markings because they reduce the need
   for synchronous hop-by-hop transmission timing of packets.

   In a packet-marking-based slotted forwarding model, node N1 needs to
   send packets for cycle X before the latest possible time that still
   allows N2 to further forward them within cycle Y to N3.  Because of
   the marking, N1 could even transmit packets for cycle X before all
   packets for the previous cycle (X-1) have been sent, reducing the
   synchronization requirements across nodes.

     P sent in         P sent in          P sent in
     cycle(N1,I1,X)    cycle(N2,I2,Y)     cycle(N3,I3,Z)
    +--------+        +--------+         +--------+
    | Node N1|------->| Node N2|-------->| Node N3|------>
    +--------+I1      +--------+I2       +--------+I3

                     Figure 2: Cyclic Forwarding

2.2.4.  Co-Existence with Non-Deterministic Traffic

   Traffic with deterministic delay requirements can co-exist with
   traffic that only requires non-deterministic delay by using packet
   scheduling in which the delay incurred on deterministic traffic by
   non-deterministic packets is itself deterministic (and low).  If LDN
   is deployed together with such non-deterministic delay traffic, then
   such a scheme must be supported by the forwarding plane.
A simple
   approach for handling the delay incurred on the sending interface of
   a deterministic node due to non-deterministic traffic is to serve
   deterministic traffic via a strict, highest-priority queue and to
   include the worst-case delay of a currently serialized non-
   deterministic packet in the deterministic delay budget of the node.
   Similar considerations apply to the internal processing delays in a
   node.

2.3.  System Components

   Figure 3 shows an overview of the components considered in this
   document and how they interact.

   A network topology of Ingress, Core, and Egress nodes supports a
   method of cyclic forwarding to enable LDN.  This forwarding requires
   no per-flow state on the nodes and tolerates the absence of time
   synchronization.

   Ingress edge nodes may support the (G)ate function to shape traffic
   from sources into the desired traffic characteristics, unless the
   source itself has such a function.  Per-flow state is required on
   the ingress edge node.  LDN is expected to work with some resource
   reservation method, which is not discussed in this document.

    /--\     +--+        +--+      +--+        +--+      /--\
    | (G)+---+GS+--------+ S+------+ S+--------+ S+------+  |
    \--/     +--+        +--+      +--+        +--+      \--/

    Sender   Ingress     Core      Core        Egress    Receiver
             Edge Node   Node      Node        Edge Node

                      Figure 3: Overview of LDN

3.  LDN Forwarding Mechanism

   DetNet aims at providing deterministic service over large-scale
   networks.  In such large-scale networks, it is difficult to achieve
   precise time synchronization among numerous devices.  To reduce
   these requirements, the forwarding mechanism described in this
   document assumes only frequency synchronization rather than time
   synchronization across nodes: nodes maintain the same clock
   frequency 1/T, but do not need to share the same time, as shown in
   Figure 4.
            <-----T----->                      <-----T----->
           |           |           |          |           |           |
   Node A  +-----------+-----------+  Node A  +-----------+-----------+
           T0                                 T0

           |           |           |            |           |           |
   Node B  +-----------+-----------+  Node B    +-----------+-----------+
           T0                                   T0

     (i) time synchronization         (ii) frequency synchronization

   T:  length of a cycle
   T0: timestamp

       Figure 4: Time Synchronization & Frequency Synchronization

   IEEE 802.1 CQF is an efficient forwarding mechanism in TSN that
   guarantees bounded end-to-end latency.  CQF is designed for limited-
   scale networks: time synchronization is required, and the link
   propagation delay must be smaller than one cycle length T.  To
   support large-scale network deployment, the proposed LDN forwarding
   mechanism requires only frequency synchronization and permits link
   propagation delays that exceed T.  Apart from these two points, CQF
   and the asynchronous forwarding of LDN are very similar.

   Figure 5 compares CQF and LDN through an example.  Suppose Node A is
   the upstream node of Node B.  In CQF, packets sent by Node A in
   cycle x will be received by Node B within the same cycle, and then
   sent further to the downstream node by Node B in cycle x+1.

   In LDN, due to long link propagation delay and mere frequency
   synchronization, Node B will receive the packets from Node A in a
   different cycle, denoted y, and then re-send them in cycle y+1.  A
   cycle mapping relationship (e.g., x -> y+1) exists between any pair
   of neighboring nodes.  With this kind of cycle mapping, the
   receiving node can easily determine when received packets should be
   sent out; the only requirement is that the packets carry the cycle
   identifier of the sending node.
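   The per-neighbor cycle mapping just described can be sketched as a
   simple modular lookup.  The following Python fragment is purely
   illustrative and not part of any specification; the function name,
   the offset representation, and the three-identifier modulus are
   assumptions made for this example.

```python
# Illustrative sketch of LDN cycle mapping between two neighboring
# nodes.  A mapping "x -> y+1" means: a packet that carries sending-
# node cycle identifier x must be re-sent by the receiving node in its
# own cycle y+1.  With at least 3 cyclic queues per port, identifiers
# only need to be distinguished modulo 3 (i.e., 2 bits suffice).

NUM_CYCLE_IDS = 3  # minimum number of distinct cycle identifiers (assumption)

def mapped_sending_cycle(rx_cycle_id: int, offset: int) -> int:
    """Return the local cycle in which a received packet must be re-sent.

    rx_cycle_id: cycle identifier carried in the packet (mod 3)
    offset:      locally known mapping offset; for the mapping
                 x -> y+1 it equals (y + 1 - x) mod 3
    """
    return (rx_cycle_id + offset) % NUM_CYCLE_IDS

# Example: with a mapping offset of 2, a packet marked with cycle
# x = 1 is queued for local sending cycle (1 + 2) % 3 = 0.
```

   Because the offset is a property of the neighbor pair rather than of
   any flow, this lookup needs no per-flow state, which is the scaling
   property LDN is after.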
           | cycle x   | cycle x+1 |          | cycle x   | cycle x+1 |
   Node A  +-----------+-----------+  Node A  +-----------+-----------+
            \                                  \
             \packet                            \packet
              \receiving                         \receiving
               \                                  \
           |    V      | cycle x+1 |          |    V      | cycle y+1 |
   Node B  +-----------+-----------+  Node B  +-----------+-----------+
             cycle x    \packets                cycle y    \packets
                         \sending                           \sending
                          \                                  \
                           V                                  V

              (i) CQF                            (ii) LDN

                          Figure 5: CQF & LDN

3.1.  Cyclic Queues

   In CQF, each port needs to maintain 2 (or 3) queues: one receiving
   queue is used to buffer newly received packets, one sending queue is
   used to store the packets that are about to be sent out, and one
   more queue may be needed to avoid output starvation
   [scheduled-queues].

   In LDN, at least 3 cyclic queues (2 receiving queues and 1 sending
   queue) are maintained for each port of a node.  A cyclic queue
   corresponds to a cycle.  As Figure 6 illustrates, the downstream
   Node B may receive packets sent in two different cycles of Node A
   due to the absence of time synchronization.  Following the cycle
   mapping (i.e., x -> y+1), packets that carry cycle identifier x
   should be sent out by Node B in cycle y+1, and packets that carry
   cycle identifier x+1 should be sent out by Node B in cycle y+2.
   Therefore, 2 receiving queues are needed to store the received
   packets: one for the packets that carry cycle identifier x, and
   another for the packets that carry cycle identifier x+1.  Together
   with one sending queue, each port thus needs at least 3 cyclic
   queues in LDN.  In order to absorb more link delay variation (such
   as on a radio interface), more queues may be necessary.

           | cycle x   | cycle x+1 |
   Node A  +-----------+-----------+
            \           \
             \           \packet
              \           \receiving
           |   V           V       |
   Node B  +-----------+-----------+
             cycle y      cycle y+1

     Figure 6: An example illustrating the 2 receiving queues in LDN

3.2.  Cycle Mapping

   The cycle mapping relationship (e.g., x -> y+1) that exists between
   any pair of neighboring nodes can be configured through the control
   plane or learned in the data plane.  As Figure 7 shows, the cycle
   mapping relationship instructs packet forwarding in one of two
   modes: swap mode or stack mode.

   o  In swap mode, a node stores the cycle mapping relationship
      locally.  After receiving a packet carrying a cycle identifier,
      the node checks its cycle mapping table, swaps the cycle
      identifier with a new cycle identifier, and then puts the packet
      into the appropriate queue.  A path with dedicated resources
      needs to be established first; packets are then forwarded along
      the path in swap mode.

   o  In stack mode, a central controller computes the cycle
      identifiers for every node, ensuring that there is no flow
      conflict along the path and that the end-to-end latency
      requirement is satisfied.  The cycle identifiers are
      encapsulated into the packet at the ingress.  No other state
      information needs to be maintained in the intermediate nodes.

    LDN Packet
    +------+---+      +-----------------------+      +------+---+
    |      | x |      |                       |      |      |y+1|
    +------+---+      |    Swap Mode Node     |      +------+---+
    ----------->      |       (x->y+1)        |      ----------->
                      |                       |
                      +-----------------------+

    LDN Packet
    +------+---=====  +-----------------------+      +------=====---+
    |      |y+1= x =  |                       |      |      =y+1= x |
    +------+---=====  |    Stack Mode Node    |      +------=====---+
    ----------->      |                       |      ----------->
                      |                       |
                      +-----------------------+

    =====
    =   =  Current Cycle Identifier
    =====

                         Figure 7: Two Modes

   As Section 3.1 illustrates, there are 3 (or 4) different queues at
   each port.  Therefore, the cycle identifier should be able to
   express 3 (or 4) different values, each value corresponding to a
   queue.  That means a minimum of 2 bits is needed to identify the
   different cycles between a pair of neighboring nodes.
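   As a concrete, non-normative illustration of swap mode combined with
   the three cyclic queues of Section 3.1, the following Python sketch
   swaps the carried identifier on reception and rotates the queues at
   each cycle boundary.  The class and method names, the dictionary
   packet representation, and the fixed offset are all invented for
   this example.

```python
# Non-normative sketch of swap-mode forwarding with 3 cyclic queues
# per port (2 receiving + 1 sending), per Sections 3.1 and 3.2.
from collections import deque

class LdnPort:
    def __init__(self, offset):
        self.queues = [deque(), deque(), deque()]  # one queue per cycle id
        self.current = 0        # local cycle counter, modulo 3
        self.offset = offset    # configured cycle mapping, e.g. x -> y+1

    def receive(self, packet, rx_cycle_id):
        # Swap mode: replace the carried cycle identifier with the
        # local sending cycle derived from the configured mapping,
        # then enqueue into the matching cyclic queue.
        tx_cycle = (rx_cycle_id + self.offset) % 3
        packet["cycle"] = tx_cycle
        self.queues[tx_cycle].append(packet)

    def end_of_cycle(self):
        # At each cycle boundary, drain the queue for the cycle that
        # just elapsed and advance the local cycle counter.
        sent = list(self.queues[self.current])
        self.queues[self.current].clear()
        self.current = (self.current + 1) % 3
        return sent
```

   Because the queue is chosen purely from the carried identifier and
   the per-neighbor offset, the node keeps no per-flow state; the
   queue rotation alone bounds how long any packet can wait.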
A number of existing
   header fields could carry such a cycle identifier.  This document
   does not yet aim to propose one, but gives an (incomplete) list of
   ideas:

   o  DSCP of the IPv4 header

   o  Traffic Class of the IPv6 header

   o  TC of the MPLS header (formerly EXP)

   o  IPv6 Extension Header

   o  UDP Option

   o  SID of SRv6

   o  Reserved field of the SRH

   o  TLV of SRv6

   o  TC of the SR-MPLS header (formerly EXP)

   o  3 (or 4) labels/adjacency SIDs for SR-MPLS

4.  Performance Analysis

4.1.  Queueing Delay

   Figure 8 describes the one-hop packet forwarding delay, which
   mainly consists of the A->B link propagation delay and the queueing
   delay in Node B.

           |cycle x |
   Node A  +-------\+
                    \
                     \
                      \
           |\  cycle y | cycle y+1 |
   Node B  +-V---------+----------\+
             :                     \
             :  Queueing Delay      \
             :..... = 2*T ..........V

                 Figure 8: Single-Hop Queueing Delay

   As Figure 8 shows, cycle x of Node A will be mapped to cycle y+1 of
   Node B as long as the last packet sent from A to B is received
   within cycle y.  If the last packet is re-sent by B at the end of
   cycle y+1, then the largest single-hop queueing delay is 2*T.
   Therefore, the upper bound of the end-to-end queueing delay is
   2*T*H, where H is the number of hops.

   If A did not forward the LDN packet from a prior LDN forwarder but
   is the actual traffic source, then the packet may have been delayed
   by a gate function before it was sent to B.  The delay of this
   function is outside the scope of the LDN delay considerations.  If
   B is not forwarding the LDN packet but is the final receiver, then
   the packet may not need to be queued and released to the receiver in
   the same fashion as it would be queued/released to a downstream LDN
   node.  Hence, if a path has one source followed by N LDN forwarders
   followed by one receiver, it should be considered a path with N-1
   LDN hops for the purpose of latency and jitter calculations.

4.2.  Jitter

   Consider first the simplest scenario of one-hop forwarding.
   Suppose Node A is the upstream node of Node B, and the packet sent
   by Node A in cycle x will be received by Node B in cycle y, as
   Figure 9 shows.

   -  The best situation is that Node A sends the packet at the end of
      cycle x and Node B receives it at the beginning of cycle y; the
      delay is then denoted by w.

   -  The worst situation is that Node A sends the packet at the
      beginning of cycle x and Node B receives it at the end of cycle
      y; the delay is then w + length of cycle x + length of cycle y
      = w + 2*T.

   -  Hence the upper bound of the jitter in this simplest scenario is
      worst case - best case = 2*T.

           |cycle x |                     |cycle x |
   Node A  +-------\+             Node A  +\-------+
           :\                              \       :
           : \                    ----------\      :
           :  \                   :          \     :
           :w |\        |         :w |        \    |
   Node B  :  +V--------+   Node B:  +---------V---+
               cycle y                  cycle y

      (a) best situation          (b) worst situation

         Figure 9: Jitter Analysis for One-Hop Forwarding

   Next, consider two-hop forwarding as Figure 10 shows.

   -  The best situation is that Node A sends the packet at the end of
      cycle x and Node C receives it at the beginning of cycle z; the
      delay is then denoted by w'.

   -  The worst situation is that Node A sends the packet at the
      beginning of cycle x and Node C receives it at the end of cycle
      z; the delay is then w' + length of cycle x + length of cycle z
      = w' + 2*T.

   -  Hence the upper bound of the jitter is again worst case - best
      case = 2*T.

           |cycle x |
   Node A  +-------\+
                    \
           :\ | cycle y  |
   Node B  : \----------+
           :  \
           :   \--------\
           :             \ |
   Node C  :......w'......+V--------+
                             cycle z

                       (a) best situation

           |cycle x |
   Node A  +\-------+
             \      :
              \     :  | cycle y  |
   Node B      \    :  +----------+
                \   :
                 ---:--------------------\
                    :          |          \ |
   Node C           :......w'......+-------V+
                                     cycle z

                       (b) worst situation

         Figure 10: Jitter Analysis for Two-Hop Forwarding

   The same reasoning extends hop by hop.
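   The bounds derived above can be expressed as two small helper
   functions.  This Python fragment is illustrative only; the function
   names and the example numbers are not taken from the draft.

```python
# Illustrative calculators for the LDN worst-case bounds:
# end-to-end queueing delay <= 2*T*H (Section 4.1), and
# end-to-end jitter         <= 2*T, independent of hop count
# (Section 4.2).

def queueing_delay_bound(t: float, h: int) -> float:
    """Upper bound on end-to-end queueing delay for cycle length t
    and h LDN hops.  Note: a path with one source, N LDN forwarders,
    and one receiver counts as h = N - 1 hops (Section 4.1)."""
    return 2 * t * h

def jitter_bound(t: float) -> float:
    """Upper bound on end-to-end jitter; does not grow with hops."""
    return 2 * t

# Example (hypothetical numbers): with T = 10 microseconds and 8 LDN
# hops, the queueing delay bound is 160 microseconds, while the jitter
# bound stays at 20 microseconds no matter how long the path is.
```

   The contrast between the two functions is the point of this section:
   the delay bound grows linearly with hop count, but the jitter bound
   does not.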
For multi-hop forwarding, the end-to-end delay
   will increase as the number of hops increases, while the delay
   variation (jitter) still does not exceed 2*T.

5.  IANA Considerations

   This document makes no request of IANA.

6.  Security Considerations

   Security issues are considered in detail in
   [draft-ietf-detnet-security].  Further discussion is TBD.

7.  Acknowledgements

   TBD.

8.  Normative References

   [draft-ietf-detnet-architecture]
              "DetNet Architecture".

   [draft-ietf-detnet-dp-sol]
              "DetNet Data Plane Encapsulation".

   [draft-ietf-detnet-problem-statement]
              "DetNet Problem Statement".

   [draft-ietf-detnet-security]
              "DetNet Security Considerations".

   [draft-ietf-detnet-use-cases]
              "DetNet Use Cases".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [scheduled-queues]
              "Scheduled queues, UBS, CQF, and Input Gates".

Authors' Addresses

   Li Qiang (editor)
   Huawei
   Beijing
   China

   Email: qiangli3@huawei.com

   Xuesong Geng
   Huawei
   Beijing
   China

   Email: gengxuesong@huawei.com

   Bingyang Liu
   Huawei
   Beijing
   China

   Email: liubingyang@huawei.com

   Toerless Eckert (editor)
   Huawei USA - Futurewei Technologies Inc.
   2330 Central Expy
   Santa Clara  95050
   USA

   Email: tte+ietf@cs.fau.de

   Liang Geng
   China Mobile
   Beijing
   China

   Email: gengliang@chinamobile.com

   Guangpeng Li