2 Network Working Group B. Decraene 3 Internet-Draft Orange 4 Intended status: Experimental L. Ginsberg 5 Expires: 12 June 2022 Cisco Systems 6 T. Li 7 Arista Networks 8 G. Solignac 10 M. Karasek 11 Cisco Systems 12 C. Bowers 13 Juniper Networks, Inc. 14 G. Van de Velde 15 Nokia 16 P. Psenak 17 Cisco Systems 18 T. Przygienda 19 Juniper 20 9 December 2021 22 IS-IS Fast Flooding 23 draft-ietf-lsr-isis-fast-flooding-00 25 Abstract 27 Current Link State Protocol Data Unit (PDU) flooding rates are much 28 slower than what modern networks can support.
The use of IS-IS at 29 larger scale requires faster flooding rates to achieve desired 30 convergence goals. This document discusses the need for faster 31 flooding, the issues around faster flooding, and some example 32 approaches to achieve faster flooding. It also defines protocol 33 extensions relevant to faster flooding. 35 Status of This Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at https://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on 12 June 2022. 51 Copyright Notice 53 Copyright (c) 2021 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 58 license-info) in effect on the date of publication of this document. 59 Please review these documents carefully, as they describe your rights 60 and restrictions with respect to this document. Code Components 61 extracted from this document must include Revised BSD License text as 62 described in Section 4.e of the Trust Legal Provisions and are 63 provided without warranty as described in the Revised BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 68 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3 69 3. Historical Behavior . . . . . . . . . . . . . . . . . . . . . 4 70 4. Flooding Parameters TLV . . . . . . . 
. . . . . . . . . . . . 5 71 4.1. LSP Burst Window sub-TLV . . . . . . . . . . . . . . . . 6 72 4.2. LSP Transmission Interval sub-TLV . . . . . . . . . . . . 6 73 4.3. LSPs Per PSNP sub-TLV . . . . . . . . . . . . . . . . . . 6 74 4.4. Flags sub-TLV . . . . . . . . . . . . . . . . . . . . . . 6 75 4.5. Partial SNP Interval sub-TLV . . . . . . . . . . . . . . 7 76 4.6. Operation on a LAN interface . . . . . . . . . . . . . . 7 77 5. Performance improvement on the receiver . . . . . . . . . . . 8 78 5.1. Rate of LSP Acknowledgments . . . . . . . . . . . . . . . 8 79 5.2. Packet Prioritization on Receive . . . . . . . . . . . . 9 80 6. Congestion and Flow Control . . . . . . . . . . . . . . . . . 10 81 6.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 10 82 6.2. Congestion and Flow Control algorithm: Example 1 . . . . 10 83 6.3. Congestion Control algorithm: Example 2 . . . . . . . . . 17 84 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 85 8. Security Considerations . . . . . . . . . . . . . . . . . . . 20 86 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 21 87 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 88 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 89 11.1. Normative References . . . . . . . . . . . . . . . . . . 21 90 11.2. Informative References . . . . . . . . . . . . . . . . . 22 91 Appendix A. Changes / Author Notes . . . . . . . . . . . . . . . 22 92 Appendix B. Issues for Further Discussion . . . . . . . . . . . 22 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 95 1. Introduction 97 Link state IGPs such as Intermediate-System-to-Intermediate-System 98 (IS-IS) depend upon having consistent Link State Databases (LSDB) on 99 all Intermediate Systems (ISs) in the network in order to provide 100 correct forwarding of data packets. When topology changes occur, 101 new/updated Link State PDUs (LSPs) are propagated network-wide. 
The 102 speed of propagation is a key contributor to convergence time. 104 Historically, flooding rates have been conservative - on the order of 105 10s of LSPs/second. This is the result of guidance in the base 106 specification [ISO10589] and early deployments when both CPU speeds 107 and interface speeds were much slower and the scale of an area was 108 much smaller than they are today. 110 As IS-IS is deployed at greater scale both in the number of nodes in 111 an area and in the number of neighbors per node, the impact of the 112 historic flooding rates becomes more significant. Consider the 113 bringup or failure of a node with 1000 neighbors. This will result 114 in a minimum of 1000 LSP updates. At typical LSP flooding rates used 115 today (33 LSPs/second), it would take 30+ seconds simply to send the 116 updated LSPs to a given neighbor. Depending on the diameter of the 117 network, achieving a consistent LSDB on all nodes in the network 118 could easily take a minute or more. 120 Increasing the LSP flooding rate therefore becomes an essential 121 element of supporting greater network scale. 123 Improving the LSP flooding rate is complementary to protocol 124 extensions that reduce LSP flooding traffic by reducing the flooding 125 topology such as Mesh Groups [RFC2973] or Dynamic Flooding 126 [I-D.ietf-lsr-dynamic-flooding]. Reduction of the flooding topology 127 does not alter the number of LSPs required to be exchanged between 128 two nodes, so increasing the overall flooding speed is still 129 beneficial when such extensions are in use. It is also possible that 130 the flooding topology can be reduced in ways that prefer the use of 131 neighbors that support improved flooding performance. 133 2.
Requirements Language 135 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 136 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 137 "OPTIONAL" in this document are to be interpreted as described in BCP 138 14 [RFC2119] [RFC8174] when, and only when, they appear in all 139 capitals, as shown here. 141 3. Historical Behavior 143 The base specification for IS-IS [ISO10589] was first published in 144 1992 and updated in 2002. The update made no changes to the 145 suggested timer values. Convergence targets at the time were on the 146 order of seconds and the specified timer values reflect that. Here 147 are some examples: 149 minimumLSPGenerationInterval - This is the minimum time interval 150 between generation of Link State PDUs. A source Intermediate 151 system shall wait at least this long before re-generating one 152 of its own Link State PDUs. 154 The recommended value is 30 seconds. 156 minimumLSPTransmissionInterval - This is the amount of time an 157 Intermediate system shall wait before further propagating 158 another Link State PDU from the same source system. 160 The recommended value is 5 seconds. 162 partialSNPInterval - This is the amount of time between periodic 163 action for transmission of Partial Sequence Number PDUs. 164 It shall be less than minimumLSPTransmissionInterval. 166 The recommended value is 2 seconds. 168 Most relevant to a discussion of the LSP flooding rate is the 169 recommended interval between the transmission of two different LSPs 170 on a given interface. 172 For broadcast interfaces, [ISO10589] defined: 174 minimumBroadcastLSPTransmissionInterval - the minimum interval 175 between PDU arrivals which can be processed by the slowest 176 Intermediate System on the LAN. 178 The default value was defined as 33 milliseconds. It is permitted to 179 send multiple LSPs "back-to-back" as a burst, but this was limited to 180 10 LSPs in a one-second period.
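As a rough, non-normative illustration of what these historical rates imply at scale, the arithmetic from the Introduction's 1000-neighbor example can be sketched as follows. The function name and the "faster" rate are invented for this sketch, not taken from the specification.

```python
# Back-of-the-envelope estimate of per-hop flooding time, echoing the
# Introduction's example of a node with 1000 neighbors. Non-normative;
# real implementations batch, pace, and acknowledge LSPs.

def flood_time_seconds(num_lsps: int, lsps_per_second: float) -> float:
    """Time to send num_lsps to one neighbor at a steady flooding rate."""
    return num_lsps / lsps_per_second

# Historical rate of ~33 LSPs/second (one LSP every ~33 ms):
print(flood_time_seconds(1000, 33))    # ~30 seconds per hop

# A hypothetical faster rate of 5000 LSPs/second:
print(flood_time_seconds(1000, 5000))  # 0.2 seconds per hop
```

Multiplying the per-hop time by the network diameter gives the rough LSDB synchronization delay discussed in the Introduction.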
182 Although this value was specific to LAN interfaces, implementations 183 have commonly applied it to all interfaces, though that was not 184 the original intent of the base specification. In fact, 185 Section 12.1.2.4.3 states: 187 On point-to-point links the peak rate of arrival is limited only 188 by the speed of the data link and the other traffic flowing on 189 that link. 191 Although modern implementations have not strictly adhered to the 33 192 millisecond interval, it is commonplace for implementations to limit 193 the flooding rate to an order of magnitude similar to the 33 ms 194 value. 196 In the past 20 years, significant work on achieving faster 197 convergence - more specifically sub-second convergence - has resulted 198 in implementations modifying a number of the above timers in order to 199 support faster signaling of topology changes. For example, 200 minimumLSPGenerationInterval has been modified to support millisecond 201 intervals, often with a backoff algorithm applied to prevent LSP 202 generation storms in the event of a series of rapid oscillations. 204 However, the flooding rate has not been fundamentally altered. 206 4. Flooding Parameters TLV 208 This document defines a new Type-Length-Value tuple (TLV) called the 209 "Flooding Parameters TLV" that may be included in IS-IS Hellos 210 (IIHs) or Partial Sequence Number PDUs (PSNPs). It allows IS-IS 211 implementations to advertise flooding-related parameters and 212 capabilities which may be of use to the peer in support of faster 213 flooding. 215 Type: TBD1 217 Length: variable, the size in octets of the Value field 219 Value: One or more sub-TLVs 221 Several sub-TLVs are defined in this document. The support of any 222 sub-TLV is OPTIONAL. 224 For a given IS-IS adjacency, the Flooding Parameters TLV does not 225 need to be advertised in each IIH or PSNP. An IS uses the latest 226 received value for each parameter until a new value is advertised by 227 the peer.
However, as IIHs and PSNPs are not reliably exchanged, and 228 may never be received, parameters SHOULD be sent even if there is no 229 change in value since the last transmission. For a parameter which 230 has never been advertised, an IS SHOULD use its local default value. 231 That value SHOULD be configurable on a per node basis and MAY be 232 configurable on a per interface basis. 234 4.1. LSP Burst Window sub-TLV 236 The LSP Burst Window sub-TLV advertises the maximum number of LSPs 237 that the node can receive with no separation interval between LSPs. 239 Type: 1 241 Length: 4 octets 243 Value: number of LSPs that can be sent back to back 245 4.2. LSP Transmission Interval sub-TLV 247 The LSP Transmission Interval sub-TLV advertises the minimum 248 interval, in microseconds, between LSP arrivals which can be 249 received on this interface, after the maximum number of 250 unacknowledged LSPs has been sent. 252 Type: 2 254 Length: 4 octets 256 Value: minimum interval, in microseconds, between two consecutive 257 LSPs sent after the burst window has been used 259 The LSP Transmission Interval is an advertisement of the receiver's 260 steady-state LSP reception rate. 262 4.3. LSPs Per PSNP sub-TLV 264 The LSPs Per PSNP (LPP) sub-TLV advertises the number of received LSPs 265 that triggers the immediate sending of a PSNP to acknowledge them. 267 Type: 3 269 Length: 2 octets 271 Value: number of LSPs acknowledged per PSNP 273 A node advertising this sub-TLV with a value LPP MUST send a PSNP 274 once LPP LSPs have been received and need to be acknowledged. 276 4.4. Flags sub-TLV 278 The Flags sub-TLV advertises a set of flags. 280 Type: 4 281 Length: Indicates the length in octets (1-8) of the Value field. The 282 length SHOULD be the minimum required to send all bits that are set. 284 Value: List of flags. 286 0 1 2 3 4 5 6 7 ... 287 +-+-+-+-+-+-+-+-+... 288 |O| ... 289 +-+-+-+-+-+-+-+-+...
291 When the O flag is set, LSPs will be acknowledged in the order in 292 which they are received: a PSNP acknowledging N LSPs acknowledges the N 293 oldest LSPs received. The order of entries inside the PSNP is not significant. If 294 the sender keeps track of the order of LSPs sent, this indication 295 allows fast detection of the loss of an LSP. This MUST NOT be used 296 to trigger faster retransmission of LSPs. This MAY be used to trigger 297 a congestion signal. 299 4.5. Partial SNP Interval sub-TLV 301 The Partial SNP Interval sub-TLV advertises the amount of time in 302 milliseconds between periodic action for transmission of Partial 303 Sequence Number PDUs. This time will trigger the sending of a PSNP 304 even if the number of unacknowledged LSPs received on a given 305 interface does not exceed LPP (Section 4.3). The time is measured 306 from the reception of the first unacknowledged LSP. 308 Type: 5 310 Length: 2 octets 312 Value: partialSNPInterval in milliseconds 314 A node advertising this sub-TLV SHOULD send a PSNP at least once per 315 Partial SNP Interval if one or more unacknowledged LSPs have been 316 received on a given interface. 318 4.6. Operation on a LAN interface 320 On a LAN interface, all LSPs are link-level multicasts. Each LSP 321 sent will be received by all ISs on the LAN and each IS will receive 322 LSPs from all transmitters. In this section, we clarify how the 323 flooding parameters should be interpreted in the context of a LAN. 325 An LSP receiver on a LAN will communicate its desired flooding 326 parameters using a single Flooding Parameters TLV, copies of which 327 will be received by all transmitters. The flooding parameters sent 328 by the LSP receiver MUST be understood as instructions from the 329 receiver to each transmitter about the desired maximum transmit 330 characteristics of each transmitter. The receiver is aware that 331 there are multiple transmitters that can send LSPs to the receiver 332 LAN interface.
The receiver might want to take that into account by 333 advertising more conservative values, e.g. a higher LSP Transmission 334 Interval. When the transmitters receive the LSP Transmission 335 Interval value advertised by an LSP receiver, the transmitters should 336 rate limit LSPs according to the advertised flooding parameters. 337 They should not apply any further interpretation to the flooding 338 parameters advertised by the receiver. 340 A given LSP transmitter will receive multiple flooding parameter 341 advertisements from different receivers that may carry different 342 flooding parameter values. A given transmitter SHOULD use the most 343 conservative value on a per parameter basis. For example, if the 344 transmitter receives multiple LSP Burst Window values, it should use 345 the smallest value. 347 5. Performance improvement on the receiver 349 This section defines two behaviors that SHOULD be implemented on the 350 receiver. 352 5.1. Rate of LSP Acknowledgments 354 On point-to-point networks, PSNP PDUs provide acknowledgments for 355 received LSPs. [ISO10589] suggests that some delay be used when 356 sending PSNPs. This provides some optimization as multiple LSPs can 357 be acknowledged in a single PSNP. 359 Faster LSP flooding benefits from a faster feedback loop. This 360 requires a reduction in the delay in sending PSNPs. 362 The receiver SHOULD reduce its partialSNPInterval. The specific 363 lower value is a local choice. It may depend on the available 364 processing power of the node, the number of adjacencies, and the 365 requirement to synchronize the LSDB more quickly. 200 ms seems to be 366 a reasonable value. 368 In addition to the timer-based partialSNPInterval, the receiver 369 SHOULD keep track of the number of unacknowledged LSPs per circuit 370 and level. When this number exceeds a preset threshold of LSPs Per 371 PSNP (LPP), the receiver SHOULD immediately send a PSNP without 372 waiting for the PSNP timer to expire.
In case of a burst of LSPs, 373 this allows for more frequent PSNPs, giving faster feedback to the 374 sender. Outside of the burst case, the usual time-based PSNP 375 approach comes into effect. The LPP SHOULD also be less than or 376 equal to 90 as this is the maximum number of LSPs that can be 377 acknowledged in a PSNP at common MTU sizes, hence waiting longer 378 would not reduce the number of PSNPs sent but would delay the 379 acknowledgements. Based on experimental evidence, 15 unacknowledged 380 LSPs is a good value, assuming that the LSP Burst Window is at least 381 30 and that both the transmitter and receiver have reasonably fast CPUs. 382 More frequent PSNPs give the transmitter more feedback on receiver 383 progress, allowing the transmitter to continue transmitting while not 384 burdening the receiver with undue overhead. 386 By deploying both the time-based and the threshold-based PSNP 387 approaches, the receiver can be adaptive to both LSP bursts and 388 infrequent LSP updates. 390 As PSNPs also consume link bandwidth, packet queue space, and 391 protocol processing time on receipt, the increased sending of PSNPs 392 should be taken into account when considering the rate at which LSPs 393 can be sent on an interface. 395 5.2. Packet Prioritization on Receive 397 There are three classes of PDUs sent by IS-IS: 399 * Hellos 401 * LSPs 403 * Complete Sequence Number PDUs (CSNPs) and PSNPs 405 Implementations today may prioritize the reception of Hellos over 406 LSPs and SNPs in order to prevent a burst of LSP updates from 407 triggering an adjacency timeout which in turn would require 408 additional LSPs to be updated. 410 CSNPs and PSNPs serve to trigger or acknowledge the transmission of 411 specified LSPs. On a point-to-point link, PSNPs acknowledge the 412 receipt of one or more LSPs. For this reason, [ISO10589] specifies a 413 delay (partialSNPInterval) before sending a PSNP so that the number 414 of PSNPs required to be sent is reduced.
On receipt of a PSNP, the 415 set of LSPs acknowledged by that PSNP can be marked so that they do 416 not need to be retransmitted. 418 If a PSNP is dropped on reception, the set of LSPs advertised in the 419 PSNP cannot be marked as acknowledged and this results in needless 420 retransmissions that will further delay transmission of other LSPs 421 that have yet to be transmitted. It may also make it more likely 422 that a receiver becomes overwhelmed by LSP transmissions. 424 It is therefore RECOMMENDED that implementations prioritize the 425 receipt of Hellos and then SNPs over LSPs. Implementations MAY also 426 prioritize IS-IS packets over other less critical protocols. 428 6. Congestion and Flow Control 430 6.1. Overview 432 Ensuring goodput between two entities is a Layer 4 responsibility 433 in the OSI model. A typical example is the TCP protocol 434 defined in RFC 793 [RFC0793], which relies on flow control, 435 congestion control, and reliability mechanisms. 437 Flow control creates a control loop between a transmitter and a 438 receiver so that the transmitter does not overwhelm the receiver. 439 TCP provides a means for the receiver to govern the amount of data 440 sent by the sender through the use of a sliding window. 442 Congestion control creates multiple interacting control loops between 443 multiple transmitters and multiple receivers to prevent the 444 transmitters from overwhelming the overall network. For an IS-IS 445 adjacency, the network between two IS-IS neighbors is relatively 446 limited in scope and consists of a link that is typically over-sized 447 compared to the capability of the IS-IS speakers, but may also 448 include components inside both routers such as a switching fabric, 449 line card CPU, and forwarding plane buffers that may experience 450 congestion.
These resources may be shared across multiple IS-IS 451 adjacencies for the system and it is the responsibility of congestion 452 control to ensure that these are shared reasonably. 454 Reliability provides loss detection and recovery. IS-IS already has 455 mechanisms to ensure the reliable transmission of LSPs. This is not 456 changed by this document. 458 The following two sections provide examples of flow and/or 459 congestion control algorithms that may be implemented by 460 taking advantage of the extensions defined in this document. They 461 are non-normative. An implementation may implement any congestion 462 control algorithm. 464 6.2. Congestion and Flow Control algorithm: Example 1 465 6.2.1. Flow control 467 A flow control mechanism creates a control loop between a single 468 instance of a transmitter and a single receiver. This example uses a 469 mechanism similar to the TCP receive window to allow the receiver to 470 govern the amount of data sent by the sender. This receive window 471 ('rwin') indicates an allowed number of LSPs that the sender may 472 transmit before waiting for an acknowledgment. The size of the 473 receive window, in units of LSPs, is initialized with the value 474 advertised by the receiver in the LSP Burst Window sub-TLV. If no 475 value is advertised, the transmitter should initialize rwin with its 476 own local value. 478 When the transmitter sends a set of LSPs to the receiver, it 479 subtracts the number of LSPs sent from rwin. If the transmitter 480 receives a PSNP, then rwin is incremented for each acknowledged LSP. 481 The transmitter must ensure that the value of rwin never goes 482 negative. 484 6.2.1.1. Operation on a point to point interface 486 By sending the LSP Burst Window sub-TLV, a node advertises to its 487 neighbor its ability to receive that many unacknowledged LSPs from 488 the neighbor, with no separation interval. This is akin to a receive 489 window or sliding window in flow control.
In some implementations, 490 this value should reflect the IS-IS socket buffer size. Special care 491 must be taken to leave space for CSNP and PSNP (SNP) PDUs and IIHs if 492 they share the same input queue. In this case, this document 493 suggests advertising an LSP Burst Window corresponding to half the 494 size of the IS-IS input queue. 496 By advertising an LSP Transmission Interval sub-TLV, a node 497 advertises its ability to receive LSPs separated by at least the 498 advertised value, outside of LSP bursts. 500 The LSP transmitter MUST NOT exceed these parameters. After having 501 sent a full burst of unacknowledged LSPs, it MUST send the following 502 LSPs with an LSP Transmission Interval between LSP arrivals. For CPU 503 scheduling reasons, this rate may be averaged over a small period, 504 e.g., 10 to 30 ms. 506 If either the LSP transmitter or receiver does not adhere to these 507 parameters, for example because of transient conditions, this causes 508 no fatal condition to the operation of IS-IS. In the worst case, an 509 LSP is lost at the receiver and this situation is already remedied by 510 mechanisms in [ISO10589]. After a few seconds, neighbors will 511 exchange PSNPs (for point to point interfaces) or CSNPs (for 512 broadcast interfaces) and recover from the lost LSPs. This worst 513 case should be avoided as those additional seconds impact convergence 514 time as the LSDB is not fully synchronized. Hence it is better to 515 err on the conservative side and to under-run the receiver rather 516 than over-run it. 518 6.2.1.2. Operation on a broadcast LAN interface 520 In order for the LSP Burst Window to be a useful parameter, an LSP 521 transmitter needs to be able to keep track of the number of 522 unacknowledged LSPs it has sent to a given LSP receiver. On a LAN 523 there is no explicit acknowledgment of the receipt of LSPs between a 524 given LSP transmitter and a given LSP receiver.
However, an LSP 525 transmitter on a LAN can infer whether any LSP receiver on the LAN 526 has requested retransmission of LSPs from the DIS by monitoring PSNPs 527 generated on the LAN. If no PSNPs have been generated on the LAN for 528 a suitable period of time, then an LSP transmitter can safely set the 529 number of unacknowledged LSPs to zero. Since this suitable period 530 of time is much longer than the fast acknowledgment of LSPs defined 531 in Section 5.1, the sustainable transmission rate of LSPs will be 532 much slower on a LAN interface than on a point to point interface. 533 The LSP Burst Window is still very useful for the first burst of LSPs 534 sent, especially in the case of a single node failure that requires 535 the flooding of a relatively small number of LSPs. 537 6.2.2. Congestion control 539 Whereas flow control prevents the sender from overwhelming the 540 receiver, congestion control prevents senders from overwhelming the 541 network. For an IS-IS adjacency, the network between two IS-IS 542 neighbors is relatively limited in scope and includes a single link 543 which is typically over-sized compared to the capability of the IS-IS 544 speakers. 546 This section describes one congestion control algorithm largely 547 inspired by the TCP congestion control algorithm RFC 5681 [RFC5681]. 549 The proposed algorithm uses a variable congestion window 'cwin'. It 550 plays a role similar to the receive window described above. The main 551 difference is that cwin is dynamically changed according to various 552 events described below. 554 6.2.2.1. Core algorithm 556 In its simplest form, the congestion control algorithm looks like the 557 following: 559 +---------------+ 560 | | 561 | v 562 | +----------------------+ 563 | | Congestion avoidance | 564 | +----------------------+ 565 | | 566 | | Congestion signal 567 ----------------+ 569 Figure 1 571 The algorithm starts with cwin := LPP + 1.
In the congestion 572 avoidance phase, cwin increases as LSPs are acked: for every acked 573 LSP, cwin += 1 / cwin. Thus, the sending rate increases roughly 574 linearly, growing by about one LSP per RTT. Since the RTT is low in many IS-IS 575 deployments, the sending rate can become high in short periods 576 of time. 578 When updating cwin, it must not become higher than the number of LSPs 579 waiting to be sent, otherwise the sending will not be paced by the 580 receiving of acks. Said differently, tx pressure is needed to 581 maintain and increase cwin. 583 When the congestion signal is triggered, cwin is set back to its 584 initial value and the congestion avoidance phase starts again. 586 6.2.2.2. Congestion signals 588 The congestion signal can take various forms. The more reactive the 589 congestion signals, the fewer LSPs will be lost due to congestion. 590 However, congestion signals that are too aggressive will cause a sender to 591 keep a very low sending rate even without actual congestion on the 592 path. 594 Two practical signals are given hereafter. 596 Timers: when receiving acknowledgements, a sender estimates the 597 acknowledgement time of the receiver. Based on this estimation, it 598 can infer that a packet was lost, and infer congestion on the path. 600 There can be a timer per LSP, but this can become costly for 601 implementations. It is possible to use only a single timer t1 for 602 all LSPs: during t1, sent LSPs are recorded in a list list_1. Once 603 the RTT is over, list_1 is kept and another list list_2 is used to 604 store the next LSPs. LSPs are removed from the lists when acked. At 605 the end of the second t1 period, every LSP in list_1 should have been 606 acked, so list_1 is checked to be empty. list_1 can then be reused 607 for the next RTT. 609 There are multiple strategies to set the timeout value t1. It should 610 be based on measures of the maximum acknowledgement time (MAT) of 611 each PSNP.
The simplest one is to use an exponential moving average 612 of the MATs, as in RFC 6298 [RFC6298]. A more elaborate one is to 613 take a running maximum of the MATs over a period of a few 614 seconds. This value should include a margin of error to avoid false 615 positives (e.g. estimated MAT measure variance) which would have a 616 significant impact on performance. 618 Reordering: a sender can record its sending order and check that 619 acknowledgements arrive in the same order as the LSPs were sent. This makes an 620 additional assumption and should ideally be backed up by a 621 confirmation by the receiver that this assumption holds. The O flag 622 defined in Section 4.4 serves this purpose. 624 6.2.2.3. Refinement 1 626 With the algorithm presented above, if congestion is detected, cwin 627 goes back to its initial value, and does not use the information 628 gathered in previous congestion avoidance phases. 630 It is possible to use a fast recovery phase once congestion is 631 detected, to avoid going through this linear rate of growth from 632 scratch. When congestion is detected, a fast recovery threshold 633 frthresh is set to frthresh := cwin / 2. In this fast recovery phase, 634 for every acked LSP, cwin += 1. Once cwin reaches frthresh, the 635 algorithm goes back to the congestion avoidance phase. 637 +---------------+ 638 | | 639 | v 640 | +----------------------+ 641 | | Congestion avoidance | 642 | +----------------------+ 643 | | 644 | | Congestion signal 645 | | 646 | +----------------------+ 647 | | Fast recovery | 648 | +----------------------+ 649 | | 650 | | frthresh reached 651 ----------------+ 653 Figure 2 655 6.2.2.4. Refinement 2 657 The rates of increase were inspired by TCP RFC 5681 [RFC5681], but 658 it is possible that a different rate of increase for cwin in the 659 congestion avoidance phase actually yields better results due to the 660 low RTT values in most IS-IS deployments. 662 6.2.2.5.
Remarks 664 This algorithm's performance is dependent on the LPP value. Indeed, 665 the smaller LPP is, the more information is available for the 666 congestion control algorithm to perform well. However, it also 667 increases the resources spent on sending PSNPs, so a tradeoff must be 668 made. This document recommends using an LPP of 15 or less. If an 669 LSP Burst Window is advertised, LPP SHOULD be lower and the best 670 performance is achieved when LPP is an integer fraction of the LSP 671 Burst Window. 673 Note that this congestion control algorithm benefits from the 674 extensions proposed in this document. The advertisement of a receive 675 window from the receiver (Section 6.2.1) avoids the use of an 676 arbitrary maximum value by the sender. The faster acknowledgment of 677 LSPs (Section 5.1) allows for a faster control loop and hence a 678 faster increase of the congestion window in the absence of 679 congestion. 681 6.2.3. Determining values to be advertised in the Flooding Parameters 682 TLV 684 The values that a receiver advertises do not need to be perfect. If 685 the values are too low then the transmitter will not use the full 686 bandwidth or available CPU resources. If the values are too high 687 then the receiver may drop some LSPs during the first RTT; this 688 loss will reduce the usable receive window, and the protocol 689 mechanisms will allow the adjacency to recover. Flooding several 690 orders of magnitude slower than both nodes can achieve will hurt 691 performance, as will consistently overloading the receiver. 693 The values advertised need not be dynamic as feedback is provided by 694 the acknowledgment of LSPs in SNP messages. Acknowledgments provide 695 a feedback loop on how fast the LSPs are processed by the receiver. 696 They also signal that the LSPs can be removed from the receive window, 697 explicitly signaling to the sender that more LSPs may be sent.
By 698 advertising relatively static parameters, we expect to produce 699 overall flooding behavior similar to what might be achieved by 700 manually configuring per-interface LSP rate limiting on all 701 interfaces in the network. The advertised values may be based, for 702 example, on offline tests of the overall LSP processing speed for 703 a particular set of hardware and the number of interfaces configured 704 for IS-IS. With such an approach, the values advertised in the 705 Flooding Parameters TLV would only change when additional IS-IS 706 interfaces are configured. 708 The values may be updated dynamically to reflect the relative change 709 in load on the receiver, by raising the values when the receiver 710 load decreases and lowering the values when the receiver load 711 increases. For example, if LSPs are regularly dropped, or if 712 the queue regularly comes close to being filled, then the values may 713 be too high. On the other hand, if the queue is barely used (by IS- 714 IS), then the values may be too low. 716 The values may also be absolute values reflecting the relevant average 717 hardware resources being monitored, typically the amount of 718 buffer space used by incoming LSPs. In this case, care must be taken 719 when choosing the parameters influencing the values, in order to avoid 720 undesirable or unstable feedback loops. It would be undesirable to 721 use a formula that depends, for example, on an active measurement of 722 the instantaneous CPU load to modify the values advertised in the 723 Flooding Parameters TLV. This could introduce feedback into the IGP 724 flooding process that could produce unexpected behavior. 726 6.2.4. Operational considerations 728 As discussed in Section 4.6, the solution is more effective on point- 729 to-point adjacencies. Hence, a broadcast interface (e.g. Ethernet) 730 shared by only two IS-IS neighbors should be configured as point to 731 point in order to achieve more effective flooding.
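As a concrete illustration of the receiver-driven algorithm of Section 6.2.2 and the fast recovery refinement of Section 6.2.2.3, the following Python sketch tracks the congestion window cwin and the fast recovery threshold frthresh. This is not a normative implementation: the class and method names, the initial window of 10 LSPs, and the +1/cwin congestion-avoidance step (an RFC 5681-style approximation of one LSP per window of acknowledgments) are assumptions made for this example only.

```python
class CongestionController:
    """Sketch of the Example 1 congestion window state machine
    (congestion avoidance + fast recovery, per Figure 2)."""

    INITIAL_CWIN = 10  # assumed initial congestion window, in LSPs

    def __init__(self):
        self.cwin = float(self.INITIAL_CWIN)
        self.frthresh = None  # None => congestion avoidance phase

    def on_lsp_acked(self):
        if self.frthresh is not None:
            # Fast recovery: cwin += 1 for every acked LSP.
            self.cwin += 1
            if self.cwin >= self.frthresh:
                # frthresh reached: back to congestion avoidance.
                self.frthresh = None
        else:
            # Congestion avoidance: linear growth of roughly one LSP
            # per window's worth of acknowledgments.
            self.cwin += 1.0 / self.cwin

    def on_congestion_signal(self):
        # frthresh := cwin / 2, then restart from the initial window
        # and enter the fast recovery phase.
        self.frthresh = self.cwin / 2.0
        self.cwin = float(self.INITIAL_CWIN)
```

The transitions mirror Figure 2: a congestion signal moves the controller from congestion avoidance into fast recovery, and reaching frthresh moves it back.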
733 6.3. Congestion Control algorithm: Example 2 735 This section describes a congestion control algorithm based on 736 performance measured by the transmitter, without dependence on 737 signaling from the receiver. 739 6.3.1. Router Architecture Discussion 741 (The following description is an abstraction - implementation details 742 vary.) 744 Existing router architectures may utilize multiple input queues. On 745 a given line card, IS-IS PDUs from multiple interfaces may be placed 746 in a rate-limited input queue. This queue may be dedicated to IS-IS 747 PDUs or may be shared with other routing-related packets. 749 The input queue may then pass IS-IS PDUs to a "punt queue" which is 750 used to pass PDUs from the data plane to the control plane. The punt 751 queue typically also has controls on its size and the rate at which 752 packets will be punted. 754 An input queue in the control plane may then be used to assemble PDUs 755 from multiple linecards, separate the IS-IS PDUs from other types of 756 packets, and place the IS-IS PDUs in an input queue dedicated to the 757 IS-IS protocol. 759 The IS-IS input queue then separates the IS-IS PDUs and directs them 760 to an instance-specific processing queue. The instance-specific 761 processing queue may then further separate the IS-IS PDUs by type 762 (IIHs, SNPs, and LSPs) so that separate processing threads with 763 varying priorities may be employed to process the incoming PDUs. 765 In such an architecture, it may be difficult for IS-IS in the control 766 plane to accurately track the state of the various input queues and 767 determine what value should be advertised as a current receive 768 window. 770 The following section describes a congestion control algorithm based 771 on performance measured by the transmitter, without dependence on 772 signaling from the receiver. 774 6.3.2.
Transmitter Based Flow Control 776 The congestion control algorithm described in this section does not 777 depend upon direct signaling from the receiver. Instead, it adapts 778 the transmission rate based on measurement of the actual rate of 779 acknowledgments received. 781 When flow control is necessary, it can be implemented in a 782 straightforward manner based on knowledge of the current flooding 783 rate and the current acknowledgment rate. Such an algorithm is a 784 local matter, and there is no requirement or intent to standardize an 785 algorithm. There are, however, a number of aspects which can serve as 786 guidelines and which are described below. 788 A maximum target LSP transmission rate (LSPTxMax) SHOULD be 789 configurable. This represents the fastest LSP transmission rate 790 which will be attempted. This value SHOULD be applicable to all 791 interfaces and SHOULD be consistent network wide. 793 When the current rate of LSP transmission (LSPTxRate) exceeds the 794 capabilities of the receiver, the flow control algorithm needs to 795 aggressively reduce the LSPTxRate within a few seconds. Slower 796 responsiveness is likely to result in a large number of 797 retransmissions, which can introduce much larger delays in 798 convergence. 800 NOTE: Even with modest increases in flooding speed (for example, a 801 target LSPTxMax of 300 LSPs/second (10 times the typical rate 802 supported today)), a topology change triggering 2100 new LSPs would 803 only take 7 seconds to complete. 805 Dynamic adjustment of the rate of LSP transmission (LSPTxRate) 806 upwards (i.e., faster) SHOULD be done less aggressively and only be 807 done when the neighbor has demonstrated its ability to sustain the 808 current LSPTxRate. 810 The flow control algorithm MUST NOT assume the receive capabilities 811 of a neighbor are static, i.e., it MUST handle transient conditions 812 which result in a slower or faster receive rate on the part of a 813 neighbor.
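The guidelines above can be sketched as a simple periodic adjustment of LSPTxRate, driven by counts of LSPs transmitted and acknowledgments received over a recent interval. The function name, the 90% lag threshold, and the halving/1.1x adjustment factors are illustrative assumptions only; as stated above, the actual algorithm is a local matter.

```python
def adjust_tx_rate(tx_rate, tx_count, ack_count,
                   tx_max=300.0, tx_min=10.0):
    """Illustrative LSPTxRate adjustment (LSPs/second).

    tx_count / ack_count: LSPs sent and acknowledgments received over
    the most recent measurement interval. tx_max plays the role of the
    configured LSPTxMax; tx_min is an assumed floor.
    """
    if ack_count < 0.9 * tx_count:
        # Acknowledgments are falling behind the transmission rate:
        # back off aggressively (within a few seconds) to limit
        # retransmissions.
        return max(tx_min, tx_rate / 2.0)
    # The neighbor has sustained the current rate: probe upward
    # conservatively, never exceeding LSPTxMax.
    return min(tx_max, tx_rate * 1.1)
```

The asymmetry (halving on congestion, growing by 10% otherwise) reflects the draft's guidance that rate reduction must be aggressive while rate increase should be conservative.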
815 The flow control algorithm needs to consider the expected delay 816 in receiving an acknowledgment. It therefore incorporates the 817 neighbor's partialSNPInterval (Section 4.5) to help determine whether 818 acknowledgments are keeping pace with the rate of LSPs transmitted. 819 In the absence of an advertisement of partialSNPInterval, a locally 820 configured value can be used. 822 7. IANA Considerations 824 IANA is requested to allocate one TLV from the IS-IS TLV codepoint 825 registry. 827 Type Description IIH LSP SNP Purge 828 ---- --------------------------- --- --- --- --- 829 TBD1 Flooding Parameters TLV y n y n 831 Figure 3 833 This document creates the following sub-TLV registry: 835 Name: Sub-TLVs for TLV TBD1 (Flooding Parameters TLV). 837 Registration Procedure(s): Expert Review 839 Expert(s): TBD 841 Reference: TBD 843 +=======+===========================+ 844 | Type | Description | 845 +=======+===========================+ 846 | 0 | Reserved | 847 +-------+---------------------------+ 848 | 1 | LSP Burst Window | 849 +-------+---------------------------+ 850 | 2 | LSP Transmission Interval | 851 +-------+---------------------------+ 852 | 3 | LSPs Per PSNP | 853 +-------+---------------------------+ 854 | 4 | Flags | 855 +-------+---------------------------+ 856 | 5 | Partial SNP Interval | 857 +-------+---------------------------+ 858 | 6-255 | Unassigned | 859 +-------+---------------------------+ 861 Table 1: Initial allocations 863 This document also requests IANA to create a new registry for 864 assigning flag bits advertised in the Flags sub-TLV. 866 Name: Flooding Parameters Flags Bits. 868 Registration Procedure(s): Expert Review 870 Expert(s): TBD 872 +--------+------------------------+ 873 | Bit # | Description | 874 +--------+------------------------+ 875 | 0 | O Flag | 876 +--------+------------------------+ 878 8. Security Considerations 880 Security concerns for IS-IS are addressed in [ISO10589], [RFC5304], 881 and [RFC5310].
These documents describe mechanisms that provide the 882 authentication and integrity of IS-IS PDUs, including SNPs and IIHs. 883 These authentication mechanisms are not altered by this document. 885 With the cryptographic mechanisms described in [RFC5304] and 886 [RFC5310], an attacker wanting to advertise an incorrect Flooding 887 Parameters TLV would first have to defeat these mechanisms. 889 In the absence of cryptographic authentication, since IS-IS does not run 890 over IP but directly over the link layer, it is considered difficult 891 to inject a false SNP/IIH without having access to the link layer. 893 If a false SNP/IIH is sent with a Flooding Parameters TLV set to 894 conservative values, the attacker can reduce the flooding speed 895 between the two adjacent neighbors, which can result in LSDB 896 inconsistencies and transient forwarding loops. However, this is not 897 significantly different from filtering or altering LSPs, which would 898 also be possible with access to the link layer. In addition, if the 899 downstream flooding neighbor has multiple IGP neighbors, which is 900 typically the case for reliability or topological reasons, it would 901 receive LSPs at a regular speed from its other neighbors and hence 902 would maintain LSDB consistency. 904 If a false SNP/IIH is sent with a Flooding Parameters TLV set to 905 aggressive values, the attacker can increase the flooding speed, which 906 can either overload a node or, more likely, cause loss of LSPs. 907 However, this is not significantly different from sending many LSPs, 908 which would also be possible with access to the link layer, even with 909 cryptographic authentication enabled. In addition, IS-IS has 910 procedures to detect the loss of LSPs and recover. 912 This TLV advertisement is not flooded across the network but only 913 sent between adjacent IS-IS neighbors. This limits the 914 consequences of forged messages and also limits the 915 dissemination of such information. 917 9.
Contributors 919 The following people made substantial contributions to the content 920 of this document and should be considered coauthors: 922 Acee Lindem, Cisco Systems, acee@cisco.com 924 Jayesh J, Juniper Networks, jayeshj@juniper.net 926 10. Acknowledgments 928 The authors would like to thank Henk Smit, Sarah Chen, Xuesong Geng, 929 Pierre Francois and Hannes Gredler for their reviews, comments and 930 suggestions. 932 The authors would like to thank David Jacquet, Sarah Chen, and 933 Qiangzhou Gao for the tests performed on commercial implementations 934 and their identification of some limiting factors. 936 11. References 938 11.1. Normative References 940 [ISO10589] International Organization for Standardization, 941 "Intermediate system to Intermediate system intra-domain 942 routeing information exchange protocol for use in 943 conjunction with the protocol for providing the 944 connectionless-mode Network Service (ISO 8473)", ISO/ 945 IEC 10589:2002, Second Edition, November 2002. 947 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 948 Requirement Levels", BCP 14, RFC 2119, 949 DOI 10.17487/RFC2119, March 1997, 950 . 952 [RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic 953 Authentication", RFC 5304, DOI 10.17487/RFC5304, October 954 2008, . 956 [RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R., 957 and M. Fanto, "IS-IS Generic Cryptographic 958 Authentication", RFC 5310, DOI 10.17487/RFC5310, February 959 2009, . 961 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 962 "Computing TCP's Retransmission Timer", RFC 6298, 963 DOI 10.17487/RFC6298, June 2011, 964 . 966 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 967 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 968 May 2017, . 970 11.2. Informative References 972 [I-D.ietf-lsr-dynamic-flooding] 973 Li, T., Przygienda, T., Psenak, P., Ginsberg, L., Chen, 974 H., Cooper, D., Jalil, L., Dontula, S., and G. S.
Mishra, 975 "Dynamic Flooding on Dense Graphs", Work in Progress, 976 Internet-Draft, draft-ietf-lsr-dynamic-flooding-10, 7 977 December 2021, . 980 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 981 RFC 793, DOI 10.17487/RFC0793, September 1981, 982 . 984 [RFC2973] Balay, R., Katz, D., and J. Parker, "IS-IS Mesh Groups", 985 RFC 2973, DOI 10.17487/RFC2973, October 2000, 986 . 988 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 989 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 990 . 992 Appendix A. Changes / Author Notes 994 [RFC Editor: Please remove this section before publication] 996 IND 00: Initial version. 998 WG 00: No change. 1000 Appendix B. Issues for Further Discussion 1002 [RFC Editor: Please remove this section before publication] 1004 This section captures issues which the authors either have not yet 1005 had time to address or on which the authors have not yet reached 1006 consensus. Future revisions of this document may include new/altered 1007 text relevant to these issues. 1009 There are no open issues at this time. 1011 Authors' Addresses 1012 Bruno Decraene 1013 Orange 1015 Email: bruno.decraene@orange.com 1017 Les Ginsberg 1018 Cisco Systems 1019 821 Alder Drive 1020 Milpitas, CA 95035 1021 United States of America 1023 Email: ginsberg@cisco.com 1025 Tony Li 1026 Arista Networks 1027 5453 Great America Parkway 1028 Santa Clara, California 95054 1029 United States of America 1031 Email: tony.li@tony.li 1033 Guillaume Solignac 1035 Email: gsoligna@protonmail.com 1037 Marek Karasek 1038 Cisco Systems 1039 Pujmanove 1753/10a, Prague 4 - Nusle 1040 10 14000 Prague 1041 Czech Republic 1043 Email: mkarasek@cisco.com 1045 Chris Bowers 1046 Juniper Networks, Inc. 1047 1194 N. 
Mathilda Avenue 1048 Sunnyvale, CA 94089 1049 United States of America 1051 Email: cbowers@juniper.net 1052 Gunter Van de Velde 1053 Nokia 1054 Copernicuslaan 50 1055 2018 Antwerp 1056 Belgium 1058 Email: gunter.van_de_velde@nokia.com 1060 Peter Psenak 1061 Cisco Systems 1062 Apollo Business Center Mlynske nivy 43 1063 821 09 Bratislava 1064 Slovakia 1066 Email: ppsenak@cisco.com 1068 Tony Przygienda 1069 Juniper 1070 1137 Innovation Way 1071 Sunnyvale, Ca 1072 United States of America 1074 Email: prz@juniper.net