Network Working Group                                      L. Qiang, Ed.
Internet-Draft                                                   X. Geng
Intended status: Informational                                    B. Liu
Expires: September 9, 2019                                T. Eckert, Ed.
                                                                  Huawei
                                                                 L. Geng
                                                            China Mobile
                                                           March 8, 2019


                 Large-Scale Deterministic IP Network
                draft-qiang-detnet-large-scale-detnet-04

Abstract

   This document presents the overall framework and key mechanisms of
   Large-scale Deterministic Network (LDN).  LDN can provide bounded
   latency and delay variation (jitter) without requiring precise time
   synchronization among nodes or per-flow state in transit nodes.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology & Abbreviations
   2.  Overview
     2.1.  Summary
     2.2.  Background
       2.2.1.  Deterministic End-to-End Latency
       2.2.2.  Hop-by-Hop Delay
       2.2.3.  Cyclic Forwarding
       2.2.4.  Co-Existence with Non-Deterministic Traffic
     2.3.  System Components
   3.  LDN Forwarding Mechanism
     3.1.  Cyclic Queues
     3.2.  Cycle Mapping
   4.  Performance Analysis
     4.1.  Queueing Delay
     4.2.  Jitter
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Normative References
   Authors' Addresses

1.  Introduction

   This document explores DetNet forwarding over large-scale networks.
   In contrast to TSN, which is deployed in LANs, DetNet is expected
   to be deployed in larger-scale networks that have the following
   features:

   o  a large number of network devices

   o  long distances between network devices

   o  a large number of deterministic flows in the network

   These features bring the following challenges to DetNet forwarding:

   o  precise time synchronization among all nodes is difficult to
      achieve

   o  long link propagation delays may introduce larger jitter

   o  maintaining per-flow state in transit nodes does not scale

   Motivated by these challenges, this document presents a Large-scale
   Deterministic Network (LDN) mechanism.  As
   [draft-ietf-detnet-problem-statement] indicates, deterministic
   forwarding can only be applied to flows with well-defined traffic
   characteristics.  The traffic characteristics of DetNet flows are
   discussed in [draft-ietf-detnet-architecture]; they can be enforced
   through shaping at the ingress node or through an up-front
   commitment by the application.  LDN accordingly assumes that DetNet
   flows follow specific traffic patterns.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Terminology & Abbreviations

   This document uses the terminology defined in
   [draft-ietf-detnet-architecture].

   TSN:   Time-Sensitive Networking

   PQ:    Priority Queuing

   CQF:   Cyclic Queuing and Forwarding

   LDN:   Large-scale Deterministic Network

   DSCP:  Differentiated Services Code Point

   EXP:   Experimental

   TC:    Traffic Class

   T:     the length of a cycle

   H:     the number of hops

2.  Overview
2.1.  Summary

   In LDN, nodes (network devices) have synchronized frequency, and
   each node forwards packets in a slotted fashion based on a cycle
   identifier carried in the packets.  Ingress nodes or senders have a
   function called a gate that shapes/conditions traffic flows.
   Except for this gate function, LDN has no awareness of individual
   flows.

2.2.  Background

   This section motivates the design choices taken by the proposed
   solution and gives the necessary background on forwarding plane
   designs based on deterministic delay.

2.2.1.  Deterministic End-to-End Latency

   Bounded delay is delay that has a deterministic upper and lower
   bound.

   The delay of packets that need to be forwarded with deterministic
   delay needs to be deterministic on every hop.  If any hop in the
   network introduces non-deterministic delay, then the network can no
   longer deliver a deterministic delay service.

2.2.2.  Hop-by-Hop Delay

   Consider the simple example shown in Figure 1, where Node X has 10
   receiving interfaces and one outgoing interface I, all of the same
   speed.  There are 10 deterministic traffic flows, each consuming 5%
   of a link's bandwidth, one from each receiving interface to the
   outgoing interface.

   Node X sends 'only' 50% deterministic traffic to interface I, so
   there is no ongoing congestion, but there is added delay.  If the
   arrival time of packets of these 10 flows into X is uncontrolled,
   then the worst case is for them all to arrive at the same time.
   One packet has to wait in X until the other 9 packets are sent out
   on I, resulting in a worst-case deterministic delay of 9 packet
   serialization times.  On the next-hop node Y downstream of X, this
   problem can become worse.  Assume Y has 10 upstream nodes like X;
   the worst-case simultaneous burst is now 100 packets, and a
   99-packet serialization delay is the worst-case upper bound on the
   delay incurred on this hop.

   To avoid this problem of a high upper bound on the end-to-end
   delay, traffic needs to be conditioned/interleaved on every hop.
   This makes it possible to create solutions where the per-hop delay
   is bounded purely by the physics of the forwarding plane across the
   node, and not by the accumulated characteristics of prior-hop
   traffic profiles.

   +--+  +--+
   |A1|  |A0|
   +--+  +--+        +--------+                  +--------+
   ----------------->|        |                  |        |
   +--+  +--+        |        |   Interface I    |        |
   |B1|  |B0|        | Node X |----------------->| Node Y |----->
   +--+  +--+        |        |                  |        |
   ----------------->|        |                  |        |
   +--+  +--+        +--------+                  +--------+
   |C1|  |C0|             ^
   +--+  +--+             |
   ---------------------->+
       .
       .
       .

           Figure 1: Micro-burst and micro-burst iteration

2.2.3.  Cyclic Forwarding

   The common approach to solving this problem is a cyclic hop-by-hop
   forwarding mechanism.  Assume packets are forwarded from N1 via N2
   to N3 as shown in Figure 2.  When N1 sends a packet P on interface
   I1 in a cycle X, the forwarding mechanism must guarantee that N2
   will forward P via I2 to N3 in a cycle Y.

   The cycle of a packet can either be deduced by the receiving node
   from the exact time at which it was received, as is done in
   SDN/TDMA systems, and/or it can be indicated in the packet.  The
   solution in this document relies on such markings because they
   reduce the need for synchronized hop-by-hop transmission timing of
   packets.
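   To make the marking-based option concrete, the following is a
   minimal sketch in Python of how a receiving node could select the
   outgoing cycle purely from the cycle identifier carried in a packet
   and a per-neighbor mapping, independent of the packet's arrival
   time.  All names and values here are hypothetical illustrations,
   not part of the specification; the mapping itself is the subject of
   Section 3.2.

      # Sketch only: cycle selection from the carried marking.
      NUM_CYCLE_IDS = 4  # identifiers wrap around, e.g., a 2-bit field

      # Assumed per-upstream-neighbor offset, configured by the
      # control plane or learned in the data plane (Section 3.2),
      # such that a packet marked x is forwarded in cycle x + offset.
      cycle_offset = {"N1": 1}

      def outgoing_cycle(upstream, carried_cycle_id):
          # Only the marking and the mapping are used, never the
          # arrival time of the packet.
          return (carried_cycle_id + cycle_offset[upstream]) % NUM_CYCLE_IDS

      # Example: a packet from N1 marked with cycle 2 is queued for
      # local sending cycle 3.
      assert outgoing_cycle("N1", 2) == 3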
   In a packet-marking-based slotted forwarding model, node N1 needs
   to send packets for cycle X before the latest possible time that
   still allows N2 to further forward them in cycle Y to N3.  Because
   of the marking, N1 could even transmit packets for cycle X before
   all packets for the previous cycle (X-1) have been sent, reducing
   the synchronization requirements across nodes.

      P sent in         P sent in          P sent in
      cycle(N1,I1,X)    cycle(N2,I2,Y)     cycle(N3,I3,Z)
      +--------+        +--------+         +--------+
      | Node N1|------->| Node N2|-------->| Node N3|------>
      +--------+I1      +--------+I2       +--------+I3

                     Figure 2: Cyclic Forwarding

2.2.4.  Co-Existence with Non-Deterministic Traffic

   Traffic with deterministic delay requirements can co-exist with
   traffic that only requires non-deterministic delay by using packet
   scheduling in which the delay that non-deterministic packets impose
   on deterministic traffic is itself deterministic (and low).  If LDN
   is deployed together with such non-deterministic delay traffic,
   then such a scheme must be supported by the forwarding plane.  A
   simple approach for bounding the delay incurred on the sending
   interface of a deterministic node due to non-deterministic traffic
   is to serve deterministic traffic via a strict highest-priority
   queue and to include the worst-case delay of one currently
   serialized non-deterministic packet in the deterministic delay
   budget of the node.  Similar considerations apply to the internal
   processing delays in a node.

2.3.  System Components

   Figure 3 shows an overview of the system components considered in
   this document and how they interact.

   A network topology of Ingress, Core, and Egress nodes supports a
   method of cyclic forwarding to enable LDN.  This forwarding
   requires no per-flow state on the nodes and tolerates the absence
   of time synchronization.

   Ingress edge nodes may support the (G)ate function to shape traffic
   from sources into the desired traffic characteristics, unless the
   source itself has such a function.  Per-flow state is required on
   the ingress edge node.  LDN is expected to work with a resource
   reservation method, which is not discussed in this document.

   /--\      +--+        +--+      +--+        +--+      /--\
   |(G)+-----+GS+--------+ S+------+ S+--------+ S+------+  |
   \--/      +--+        +--+      +--+        +--+      \--/

   Sender   Ingress      Core      Core       Egress    Receiver
           Edge Node     Node      Node      Edge Node

                      Figure 3: Overview of LDN

3.  LDN Forwarding Mechanism

   DetNet aims at providing deterministic service over large-scale
   networks.  In such networks, it is difficult to achieve precise
   time synchronization among numerous devices.  To relax this
   requirement, the forwarding mechanism described in this document
   assumes only frequency synchronization, not time synchronization,
   across nodes: nodes maintain the same clock frequency 1/T but need
   not share the same time, as shown in Figure 4.

          <-----T----->                      <-----T----->
          |           |           |          |           |           |
   Node A +-----------+-----------+   Node A +-----------+-----------+
          T0                                 T0

          |           |           |            |           |           |
   Node B +-----------+-----------+   Node B   +-----------+-----------+
          T0                                   T0

     (i) time synchronization       (ii) frequency synchronization

   T:  length of a cycle
   T0: timestamp

      Figure 4: Time Synchronization & Frequency Synchronization
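   To make the distinction concrete, the following minimal Python
   sketch (hypothetical names and values) shows how each node might
   count cycles under frequency synchronization: all nodes agree on
   the cycle length T, but each counts from its own local epoch, so
   the cycle numbers of two neighbors differ by an offset that is
   constant over time.  It is this constant offset that the cycle
   mapping described in Section 3.2 absorbs.

      # Sketch only: local cycle numbering under frequency sync.
      T = 10e-6  # shared cycle length in seconds (example value)

      def local_cycle(t_now, t0_local):
          # Cycle number at time t_now, counted from this node's own
          # epoch t0_local; nodes do not share t0, only T.
          return int((t_now - t0_local) // T)

      # Two nodes with different epochs disagree on the absolute
      # cycle number, but their difference stays constant because
      # both clocks run at the same frequency 1/T.
      offset = local_cycle(1.0, 0.0) - local_cycle(1.0, 3e-6)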
   IEEE 802.1 CQF is an efficient forwarding mechanism in TSN that
   guarantees bounded end-to-end latency, but it is designed for
   limited-scale networks: time synchronization is required, and the
   link propagation delay must be smaller than the cycle length T.
   Targeting large-scale network deployments, the proposed LDN
   forwarding mechanism requires only frequency synchronization and
   permits the link propagation delay to exceed T.  Apart from these
   two points, CQF and the asynchronous forwarding of LDN are very
   similar.

   Figure 5 compares CQF and LDN through an example.  Suppose Node A
   is the upstream node of Node B.  In CQF, packets sent from Node A
   in cycle x are received by Node B in the same cycle and are then
   sent to the downstream node by Node B in cycle x+1.

   In LDN, due to the long link propagation delay and the use of
   frequency synchronization, Node B receives the packets from Node A
   in a different cycle, denoted y, and re-sends them in cycle y+1.
   A cycle mapping relationship (e.g., x -> y+1) exists between any
   pair of neighboring nodes.  With this kind of cycle mapping, the
   receiving node can easily determine when received packets should
   be sent out; the only requirement is that packets carry the cycle
   identifier of the sending node.

          | cycle x   | cycle x+1 |        | cycle x   | cycle x+1 |
   Node A +-----------+-----------+ Node A +-----------+-----------+
           \                                \
            \packet                          \packet
             \receiving                       \receiving
              \                                \
          |    V      | cycle x+1 |        |    V      | cycle y+1 |
   Node B +-----------+-----------+ Node B +-----------+-----------+
            cycle x    \packets              cycle y    \packets
                        \sending                         \sending
                         \                                \
                          \                                \
                           V                                V

               (i) CQF                        (ii) LDN

                         Figure 5: CQF & LDN

3.1.  Cyclic Queues

   In CQF, each port needs to maintain 2 (or 3) queues: one receiving
   queue buffers newly received packets, one sending queue stores the
   packets that are about to be sent out, and one more queue may be
   needed to avoid output starvation [scheduled-queues].

   In LDN, at least 3 cyclic queues (2 receiving queues and 1 sending
   queue) are maintained for each port on a node.  A cyclic queue
   corresponds to a cycle.  As Figure 6 illustrates, the downstream
   Node B may receive packets sent in two different cycles from Node
   A due to the absence of time synchronization.  Following the cycle
   mapping (i.e., x -> y+1), packets that carry cycle identifier x
   are sent out by Node B in cycle y+1, and packets that carry cycle
   identifier x+1 are sent out by Node B in cycle y+2.  Therefore, 2
   receiving queues are needed to store the received packets: one for
   the packets that carry cycle identifier x, the other for the
   packets that carry cycle identifier x+1.  Together with one
   sending queue, each port thus needs at least 3 cyclic queues in
   LDN.  In order to absorb more link delay variation (such as on a
   radio interface), more queues may be necessary.

          | cycle x   | cycle x+1 |
   Node A +-----------+-----------+
           \            \
            \            \ packet
             \            \ receiving
           |  V            V    |
   Node B  +--------------------+-----------+
                  cycle y         cycle y+1

    Figure 6: An example illustrating the 2 receiving queues in LDN
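   The queue rotation described above can be sketched as follows in
   Python (hypothetical names; a sketch under the assumptions above,
   not an implementation).  Each port holds three queues indexed by
   cycle number modulo 3: while one queue is drained as the sending
   queue of the current cycle, the other two receive packets whose
   mapped sending cycle is one or two cycles ahead.

      # Sketch only: 3 cyclic queues per port (2 receiving, 1 sending).
      from collections import deque

      class LdnPort:
          def __init__(self):
              # queues[i] holds packets to be sent in local cycle i mod 3
              self.queues = [deque(), deque(), deque()]
              self.current = 0  # index of the queue being drained

          def enqueue(self, packet, sending_cycle):
              # sending_cycle comes from the cycle mapping
              # (Section 3.2) and is 1 or 2 cycles ahead of the
              # cycle currently being drained.
              self.queues[sending_cycle % 3].append(packet)

          def drain(self):
              # send everything scheduled for the current cycle
              q = self.queues[self.current]
              while q:
                  yield q.popleft()

          def end_of_cycle(self):
              # the emptied queue becomes a receiving queue again
              self.current = (self.current + 1) % 3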
3.2.  Cycle Mapping

   A cycle mapping relationship (e.g., x -> y+1) exists between any
   pair of neighboring nodes; it can be configured through the control
   plane or learned in the data plane.  As Figure 7 shows, the cycle
   mapping relationship drives packet forwarding in one of two modes:
   swap mode or stack mode.

   o  In swap mode, each node stores the cycle mapping relationship
      locally.  On receiving a packet carrying a cycle identifier, the
      node looks up its cycle mapping table, swaps the cycle
      identifier for a new one, and places the packet into the
      appropriate queue.  A path with dedicated resources needs to be
      established first; packets are then forwarded along that path in
      swap mode.

   o  In stack mode, a central controller computes the cycle
      identifier for every node along the path, ensuring both that
      flows do not conflict along the path and that the end-to-end
      latency requirement is satisfied.  The cycle identifiers are
      encapsulated into the packet at the ingress node.  No other
      state needs to be maintained in the intermediate nodes.

      LDN Packet
      +------+---+      +-----------------------+      +------+---+
      |      | x |      |                       |      |      |y+1|
      +------+---+      |    Swap Mode Node     |      +------+---+
      ----------------->|                       |----------------->
                        |       (x->y+1)        |
                        |                       |
                        +-----------------------+

      LDN Packet
      +------+---=====  +-----------------------+  +------=====---+
      |      |y+1= x =  |                       |  |      =y+1= x |
      +------+---=====  |                       |  +------=====---+
      ----------------->|    Stack Mode Node    |----------------->
                        |                       |
                        |                       |
                        +-----------------------+

      =====
      =   =  Current Cycle Identifier
      =====

                          Figure 7: Two Modes

   As Section 3.1 illustrates, there are 3 (or 4) different queues at
   each port.  Therefore, the cycle identifier must be able to express
   3 (or 4) different values, each value corresponding to a queue.
   That means a minimum of 2 bits is needed to identify the different
   cycles between a pair of neighboring nodes.  This document does not
   yet propose a specific encoding, but gives an (incomplete) list of
   candidates:

   o  DSCP of the IPv4 header

   o  Traffic Class of the IPv6 header

   o  TC of the MPLS header (formerly EXP)

   o  IPv6 Extension Header

   o  UDP Option

   o  SID of SRv6

   o  Reserved field of the SRH

   o  TLV of SRv6

   o  TC of the SR-MPLS header (formerly EXP)

   o  3 (or 4) labels/adjacency SIDs for SR-MPLS

4.  Performance Analysis

4.1.  Queueing Delay

   Figure 8 describes the one-hop packet forwarding delay, which
   mainly consists of the A->B link propagation delay and the queuing
   delay in Node B.

          |cycle x |
   Node A +-------\+
                   \
                    \
                     \
          |\  cycle y |cycle y+1|
   Node B +-V---------+--------\+
          :                     \
          :   Queueing Delay    :\
          :......= 2*T..........: V

                 Figure 8: Single-Hop Queueing Delay

   As Figure 8 shows, cycle x of Node A is mapped to cycle y+1 of Node
   B as long as the last packet sent from A to B is received within
   cycle y.  If that packet is sent out by B at the end of cycle y+1,
   then the largest single-hop queueing delay is 2*T.  Therefore, the
   upper bound on the end-to-end queueing delay is 2*T*H, where H is
   the number of hops.

   If A did not forward the LDN packet from a prior LDN forwarder but
   is the actual traffic source, then the packet may have been delayed
   by a gate function before it was sent to B.  The delay of this
   function is outside the scope of the LDN delay considerations.  If
   B is not forwarding the LDN packet but is the final receiver, then
   the packet may not need to be queued and released to the receiver
   in the same fashion as it would be queued/released to a downstream
   LDN node.  Therefore, if a path consists of one source followed by
   N LDN forwarders followed by one receiver, it should be considered
   a path with N-1 LDN hops for the purpose of latency and jitter
   calculations.
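   The bound above is simple enough to compute directly.  The
   following minimal Python sketch (hypothetical names) restates it,
   including the rule that a path of one source, N LDN forwarders, and
   one receiver counts as N-1 LDN hops:

      # Sketch only: worst-case end-to-end queueing delay in LDN.
      def max_queueing_delay(T, num_forwarders):
          # Each LDN hop contributes at most 2*T of queueing delay;
          # with N forwarders between source and receiver there are
          # H = N-1 LDN hops for latency/jitter purposes.
          H = num_forwarders - 1
          return 2 * T * H

      # Example: with T = 10 microseconds and 6 LDN forwarders, the
      # queueing delay bound is 2 * 10us * 5 = 100 microseconds.
      bound = max_queueing_delay(10e-6, 6)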
4.2.  Jitter

   Consider first the simplest scenario, one-hop forwarding.  Suppose
   Node A is the upstream node of Node B, and a packet sent from Node
   A in cycle x is received by Node B in cycle y, as Figure 9 shows.

   -  The best situation is that Node A sends the packet at the end of
      cycle x and Node B receives it at the beginning of cycle y; this
      delay is denoted by w.

   -  The worst situation is that Node A sends the packet at the
      beginning of cycle x and Node B receives it at the end of cycle
      y; then the delay = w + length of cycle x + length of cycle y =
      w + 2*T.

   -  Hence, the upper bound on the jitter in this simplest scenario =
      worst case - best case = 2*T.

          |cycle x |                        |cycle x |
   Node A +-------\+                 Node A +\-------+
          :        \                         \       :
          :         \                         -------------\
          :          \                       :              \
          :w |\       |                      :w |            \ |
   Node B :..+V-------+              Node B  :..+-------------V+
              cycle y                             cycle y

       (a) best situation                (b) worst situation

          Figure 9: Jitter Analysis for One-Hop Forwarding

   Next, consider two-hop forwarding, as Figure 10 shows.

   -  The best situation is that Node A sends the packet at the end of
      cycle x and Node C receives it at the beginning of cycle z; this
      delay is denoted by w'.

   -  The worst situation is that Node A sends the packet at the
      beginning of cycle x and Node C receives it at the end of cycle
      z; then the delay = w' + length of cycle x + length of cycle z =
      w' + 2*T.

   -  Hence, the upper bound on the jitter = worst case - best case =
      2*T.

          |cycle x |
   Node A +-------\+
                   \
          :\| cycle y  |
   Node B : \----------+
          :  \
          :   \--------\
          :             \ |
   Node C ......w'......+V--------+
                          cycle z

                    (a) best situation

          |cycle x |
   Node A +\-------+
            \      :
             \     :   | cycle y  |
   Node B     \    :   +----------+
               \   :
                ---:--------------------\
                   :                     \ |
   Node C          :......w'.....+--------V+
                                   cycle z

                    (b) worst situation

         Figure 10: Jitter Analysis for Two-Hop Forwarding

   And so on: for multi-hop forwarding, the end-to-end delay increases
   as the number of hops increases, while the delay variation (jitter)
   still does not exceed 2*T.

5.  IANA Considerations

   This document makes no request of IANA.

6.  Security Considerations

   Security issues have been carefully considered in
   [draft-ietf-detnet-security].  More discussion is TBD.

7.  Acknowledgements

   TBD.

8.  Normative References

   [draft-ietf-detnet-architecture]
              "DetNet Architecture", Work in Progress,
              draft-ietf-detnet-architecture.

   [draft-ietf-detnet-dp-sol]
              "DetNet Data Plane Encapsulation", Work in Progress,
              draft-ietf-detnet-dp-sol.

   [draft-ietf-detnet-problem-statement]
              "DetNet Problem Statement", Work in Progress,
              draft-ietf-detnet-problem-statement.

   [draft-ietf-detnet-security]
              "DetNet Security Considerations", Work in Progress,
              draft-ietf-detnet-security.

   [draft-ietf-detnet-use-cases]
              "DetNet Use Cases", Work in Progress,
              draft-ietf-detnet-use-cases.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [scheduled-queues]
              "Scheduled queues, UBS, CQF, and Input Gates".

Authors' Addresses

   Li Qiang (editor)
   Huawei
   Beijing
   China

   Email: qiangli3@huawei.com

   Xuesong Geng
   Huawei
   Beijing
   China

   Email: gengxuesong@huawei.com

   Bingyang Liu
   Huawei
   Beijing
   China

   Email: liubingyang@huawei.com

   Toerless Eckert (editor)
   Huawei USA - Futurewei Technologies Inc.
   2330 Central Expy
   Santa Clara, CA  95050
   USA

   Email: tte+ietf@cs.fau.de

   Liang Geng
   China Mobile
   Beijing
   China

   Email: gengliang@chinamobile.com