IS-IS for IP Internets                                       L. Ginsberg
Internet-Draft                                                  P. Wells
Obsoletes: 5306 (if approved)                        Cisco Systems, Inc.
Intended status: Standards Track                            June 3, 2019
Expires: December 5, 2019


                      Restart Signaling for IS-IS
                   draft-ietf-lsr-isis-rfc5306bis-02

Abstract

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a router to
   signal its neighbors that it is preparing to initiate a restart while
   maintaining forwarding plane state.  This allows the neighbors to
   maintain their adjacencies until the router has restarted, but also
   allows the neighbors to bring the adjacencies down in the event of
   other topology changes.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved Link State Protocol Data
   Unit (LSP) database synchronization with its neighbors and a
   mechanism to optimize LSP database synchronization, while minimizing
   transient routing disruption when a router starts.

   This document obsoletes RFC 5306.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 5, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
   2.  Approach
     2.1.  Timers
     2.2.  Restart TLV
       2.2.1.  Use of RR and RA Bits
       2.2.2.  Use of the SA Bit
       2.2.3.  Use of PR and PA Bits
     2.3.  Adjacency (Re)Acquisition
       2.3.1.  Adjacency Reacquisition during Restart
       2.3.2.  Adjacency Acquisition during Start
       2.3.3.  Multiple Levels
     2.4.  Database Synchronization
       2.4.1.  LSP Generation and Flooding and SPF Computation
   3.  State Tables
     3.1.  Running Router
     3.2.  Restarting Router
     3.3.  Starting Router
   4.  IANA Considerations
   5.  Security Considerations
   6.  Manageability Considerations
   7.  Acknowledgements
   8.  Normative References
   Appendix A.  Summary of Changes from RFC 5306
   Authors' Addresses

1.  Overview

   The Intermediate System to Intermediate System (IS-IS) routing
   protocol [RFC1195] [ISO10589] is a link state intra-domain routing
   protocol.  Normally, when an IS-IS router is restarted, temporary
   disruption of routing occurs due to events in both the restarting
   router and the neighbors of the restarting router.

   The router that has been restarted computes its own routes before
   achieving database synchronization with its neighbors.  The results
   of this computation are likely to be non-convergent with the routes
   computed by other routers in the area/domain.

   Neighbors of the restarting router detect the restart event and cycle
   their adjacencies with the restarting router through the down state.
   The cycling of the adjacency state causes the neighbors to regenerate
   their LSPs describing the adjacency concerned.  This in turn causes a
   temporary disruption of routes passing through the restarting router.

   In certain scenarios, the temporary disruption of the routes is
   highly undesirable.  This document describes mechanisms to avoid or
   minimize the disruption due to both of these causes.

   When an adjacency is reinitialized as a result of a neighbor
   restarting, a router does three things:

   1.  It causes its own LSP(s) to be regenerated, thus triggering SPF
       runs throughout the area (or in the case of Level 2, throughout
       the domain).

   2.  It sets SRMflags on its own LSP database on the adjacency
       concerned.

   3.  In the case of a Point-to-Point link, it transmits a complete set
       of Complete Sequence Number PDUs (CSNPs) over the adjacency.

   In the case of a restarting router process, the first of these is
   highly undesirable, but the second is essential in order to ensure
   synchronization of the LSP database.

   The third action above minimizes the number of LSPs that must be
   exchanged and, if made reliable, provides a means of determining when
   the LSP databases of the neighboring routers have been synchronized.
   This is desirable whether or not the router is being restarted (so
   that the overload bit can be cleared in the router's own LSP, for
   example).

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved LSP database synchronization
   with its neighbors and a mechanism to optimize LSP database
   synchronization and minimize transient routing disruption when a
   router starts.

   It is assumed that the three-way handshake [RFC5303] is being used on
   Point-to-Point circuits.

2.  Approach

2.1.  Timers

   Three additional timers, T1, T2, and T3, are required to support the
   behavior of a restarting router defined in this document.

   NOTE: These timers are NOT applicable to a router that is preparing
   to do a planned restart.

   An instance of the timer T1 is maintained per interface, and
   indicates the time after which an unacknowledged (re)start attempt
   will be repeated.  A typical value might be 3 seconds.

   An instance of the timer T2 is maintained for each LSP database
   (LSPDB) present in the system, i.e., for a Level 1/2 system, there
   will be an instance of the timer T2 for Level 1 and an instance for
   Level 2.  This is the maximum time that the system will wait for
   LSPDB synchronization.
   A typical value might be 60 seconds.

   A single instance of the timer T3 is maintained for the entire
   system.  It indicates the time after which the router will declare
   that it has failed to achieve database synchronization (by setting
   the overload bit in its own LSP).  This is initialized to 65535
   seconds, but is set to the minimum of the remaining times of received
   IS-IS Hellos (IIHs) containing a restart TLV with the Restart
   Acknowledgement (RA) bit set and an indication that the neighbor has
   an adjacency in the "UP" state to the restarting router.

   NOTE: The timer T3 is only used by a restarting router.

2.2.  Restart TLV

   A new TLV is defined to be included in IIH PDUs.  The presence of
   this TLV indicates that the sender supports the functionality defined
   in this document and it carries flags that are used to convey
   information during a (re)start.  All IIHs transmitted by a router
   that supports this capability MUST include this TLV.

   Type 211

   Length: Number of octets in the Value field (1 to (3 + ID Length))

   Value

                                         No. of octets
               +-----------------------+
               | Flags                 |     1
               +-----------------------+
               | Remaining Time        |     2
               +-----------------------+
               | Restarting Neighbor ID|     ID Length
               +-----------------------+

      Flags (1 octet)

          0  1  2  3  4  5  6  7
         +--+--+--+--+--+--+--+--+
         |Reserved|PA|PR|SA|RA|RR|
         +--+--+--+--+--+--+--+--+

         RR - Restart Request
         RA - Restart Acknowledgement
         SA - Suppress adjacency advertisement
         PR - Restart is planned
         PA - Planned restart acknowledgement

      (Note: Remaining fields are )

      Remaining Time (2 octets)

         Remaining/recommended holding time (in seconds).

         Required when the RA, PR, or PA bit is set.  Otherwise
         this field SHOULD be omitted when sent and
         MUST be ignored when received.

      Restarting Neighbor System ID (ID Length octets)

         The System ID of the neighbor to which an RA/PA refers.

         Required when the RA or PA bit is set.  Otherwise
         this field SHOULD be omitted when sent and
         MUST be ignored when received.

         Note: Implementations based on earlier drafts of RFC 5306
         may not include this field in the TLV when the RA bit is set.
         In this case, a router that is expecting an RA on a LAN circuit
         SHOULD assume that the acknowledgement is directed at the local
         system.

2.2.1.  Use of RR and RA Bits

   The RR bit is used by a (re)starting router to signal to its
   neighbors that a (re)start is in progress, that an existing adjacency
   SHOULD be maintained even under circumstances when the normal
   operation of the adjacency state machine would require the adjacency
   to be reinitialized, to request a set of CSNPs, and to request
   setting of the SRMflags.

   The RA bit is sent by the neighbor of a (re)starting router to
   acknowledge the receipt of a restart TLV with the RR bit set.

   When the neighbor of a (re)starting router receives an IIH with the
   restart TLV having the RR bit set, if there exists on this interface
   an adjacency in state "UP" with the same System ID, and in the case
   of a LAN circuit, with the same source LAN address, then,
   irrespective of the other contents of the "Intermediate System
   Neighbors" option (LAN circuits) or the "Point-to-Point Three-Way
   Adjacency" option (Point-to-Point circuits):

   a.  the state of the adjacency is not changed.  If this is the first
       IIH with the RR bit set that this system has received associated
       with this adjacency, then the adjacency is marked as being in
       "Restart mode" and the adjacency holding time is refreshed --
       otherwise, the holding time is not refreshed.  The "remaining
       time" transmitted according to (b) below MUST reflect the actual
       time after which the adjacency will now expire.  Receipt of a
       normal IIH with the RR bit reset will clear the "Restart mode"
       state.
       This procedure allows the restarting router to cause the
       neighbor to maintain the adjacency long enough for restart to
       successfully complete, while also preventing repetitive restarts
       from maintaining an adjacency indefinitely.  Whether or not an
       adjacency is marked as being in "Restart mode" has no effect on
       adjacency state transitions.

   b.  immediately (i.e., without waiting for any currently running
       timer interval to expire, but with a small random delay of a few
       tens of milliseconds on LANs to avoid "storms") transmit over the
       corresponding interface an IIH including the restart TLV with the
       RR bit clear and the RA bit set.  In the case of Point-to-Point
       adjacencies, the "Point-to-Point Three-Way Adjacency" option is
       first updated to reflect any new values received from the
       (re)starting router.  (This allows a restarting router to quickly
       acquire the correct information to place in its hellos.)  The
       "Remaining Time" MUST be set to the time (in seconds) remaining
       before the holding timer on this adjacency is due to expire.  If
       the corresponding interface is a LAN interface, then the
       Restarting Neighbor System ID SHOULD be set to the System ID of
       the router from which the IIH with the RR bit set was received.
       This is required to correctly associate the acknowledgement and
       holding time in the case where multiple systems on a LAN restart
       at approximately the same time.  This IIH SHOULD be transmitted
       before any LSPs or SNPs are transmitted as a result of the
       receipt of the original IIH.

   c.  if the corresponding interface is a Point-to-Point interface, or
       if the receiving router has the highest LnRouterPriority (with
       the highest source MAC (Media Access Control) address breaking
       ties) among those routers to which the receiving router has an
       adjacency in state "UP" on this interface whose IIHs contain the
       restart TLV, excluding adjacencies to all routers that are
       considered in "Restart mode" (note the actual DIS is NOT changed
       by this process), initiate the transmission over the
       corresponding interface of a complete set of CSNPs, and set
       SRMflags on the corresponding interface for all LSPs in the local
       LSP database.

   Otherwise (i.e., if there was no adjacency in the "UP" state to the
   System ID in question), process the IIH as normal by reinitializing
   the adjacency and setting the RA bit in the returned IIH.

2.2.2.  Use of the SA Bit

   The SA bit is used by a starting router to request that its neighbor
   suppress advertisement of the adjacency to the starting router in the
   neighbor's LSPs.

   A router that is starting has no maintained forwarding function
   state.  This may or may not be the first time the router has started.
   If this is not the first time the router has started, copies of LSPs
   generated by this router in its previous incarnation may exist in the
   LSP databases of other routers in the network.  These copies are
   likely to appear "newer" than LSPs initially generated by the
   starting router due to the reinitialization of LSP fragment sequence
   numbers by the starting router.  This may cause temporary blackholes
   to occur until the normal operation of the update process causes the
   starting router to regenerate and flood copies of its own LSPs with
   higher sequence numbers.
   The temporary blackholes can be avoided if the starting router's
   neighbors suppress advertising an adjacency to the starting router
   until the starting router has been able to propagate newer versions
   of LSPs generated by previous incarnations.

   When a router receives an IIH with the restart TLV having the SA bit
   set, if there exists on this interface an adjacency in state "UP"
   with the same System ID, and in the case of a LAN circuit, with the
   same source LAN address, then the router MUST suppress advertisement
   of the adjacency to the neighbor in its own LSPs.  Until an IIH with
   the SA bit clear has been received, the neighbor advertisement MUST
   continue to be suppressed.  If the adjacency transitions to the "UP"
   state, the new adjacency MUST NOT be advertised until an IIH with the
   SA bit clear has been received.

   Note that a router that suppresses advertisement of an adjacency MUST
   NOT use this adjacency when performing its SPF calculation.  In
   particular, if an implementation follows the example guidelines
   presented in [ISO10589], Annex C.2.5, Step 0:b) "pre-load TENT with
   the local adjacency database", the suppressed adjacency MUST NOT be
   loaded into TENT.

2.2.3.  Use of PR and PA Bits

   The PR bit is used by a router that is planning to initiate a
   restart to signal to its neighbors that it will be restarting.  The
   router sending an IIH with the PR bit set SHOULD set the "remaining
   time" to a value greater than the expected control plane restart
   time.  The PR bit SHOULD remain set in IIHs until the restart is
   initiated.

   The PA bit is sent by the neighbor of a router planning to restart to
   acknowledge receipt of a restart TLV with the PR bit set.
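
   As a rough, non-normative illustration of the Restart TLV layout in
   Section 2.2 and of the PR rule above, the following Python sketch
   encodes and decodes the TLV.  The flag values are read off the Flags
   diagram (bit 0 being the most significant bit); the names
   RestartTLV and planned_restart_tlv and the margin_s parameter are
   hypothetical, and a 6-octet System ID is assumed in the example.

```python
import struct
from dataclasses import dataclass
from typing import Optional

RESTART_TLV_TYPE = 211

# Flag values read off the Flags diagram (bit 0 = most significant bit).
RR = 0x01  # Restart Request
RA = 0x02  # Restart Acknowledgement
SA = 0x04  # Suppress adjacency advertisement
PR = 0x08  # Restart is planned
PA = 0x10  # Planned restart acknowledgement


@dataclass
class RestartTLV:
    flags: int
    remaining_time: Optional[int] = None  # seconds
    neighbor_id: Optional[bytes] = None   # Restarting Neighbor System ID

    def encode(self) -> bytes:
        value = bytes([self.flags])
        if self.remaining_time is not None:
            value += struct.pack("!H", self.remaining_time)
            if self.neighbor_id is not None:
                value += self.neighbor_id
        elif self.neighbor_id is not None:
            raise ValueError("Restarting Neighbor ID requires Remaining Time")
        return bytes([RESTART_TLV_TYPE, len(value)]) + value

    @classmethod
    def decode(cls, data: bytes) -> "RestartTLV":
        if len(data) < 3 or data[0] != RESTART_TLV_TYPE:
            raise ValueError("not a Restart TLV")
        length = data[1]
        value = data[2:2 + length]
        if length < 1 or len(value) != length:
            raise ValueError("truncated Restart TLV")
        flags = value[0]
        remaining = None
        neighbor = None
        # Remaining Time is ignored unless RA, PR, or PA is set.
        if flags & (RA | PR | PA) and length >= 3:
            remaining = struct.unpack("!H", value[1:3])[0]
        # The neighbor ID is ignored unless RA or PA is set; it may be
        # absent when sent by implementations of early RFC 5306 drafts.
        if flags & (RA | PA) and length > 3:
            neighbor = bytes(value[3:])
        return cls(flags, remaining, neighbor)


def planned_restart_tlv(expected_restart_s: int, margin_s: int = 30) -> RestartTLV:
    """Restart TLV for IIHs sent before a planned restart (PR set).

    Remaining Time is made strictly greater than the expected control
    plane restart time; margin_s is a hypothetical local policy value.
    """
    remaining = expected_restart_s + margin_s
    if margin_s <= 0 or remaining > 0xFFFF:
        raise ValueError("remaining time must exceed restart time and fit in 16 bits")
    return RestartTLV(flags=PR, remaining_time=remaining)
```

   A neighbor acknowledging the PR on a LAN would, under the same
   assumptions, send RestartTLV(PA, hold_left, sender_system_id), where
   hold_left is the time remaining on its holding timer.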

   When the neighbor of a router planning a restart receives an IIH with
   the restart TLV having the PR bit set, if there exists on this
   interface an adjacency in state "UP" with the same System ID, and in
   the case of a LAN circuit, with the same source LAN address, then:

   a.  if this is the first IIH with the PR bit set that this system has
       received associated with this adjacency, then the adjacency is
       marked as being in "Planned Restart mode" and the adjacency
       holding time is refreshed -- otherwise, the holding time is not
       refreshed.  The holding time SHOULD be set to the "remaining
       time" specified in the received IIH with PR set.  The "remaining
       time" transmitted according to (b) below MUST reflect the actual
       time after which the adjacency will now expire.  Receipt of a
       normal IIH with the PR bit reset will clear the "Planned Restart
       mode" state and cause the receiving router to set the adjacency
       hold time to the locally configured value.  This procedure allows
       the router planning a restart to cause the neighbor to maintain
       the adjacency long enough for restart to successfully complete.
       Whether or not an adjacency is marked as being in "Planned
       Restart mode" has no effect on adjacency state transitions.

   b.  immediately (i.e., without waiting for any currently running
       timer interval to expire, but with a small random delay of a few
       tens of milliseconds on LANs to avoid "storms") transmit over the
       corresponding interface an IIH including the restart TLV with the
       PR bit clear and the PA bit set.  The "Remaining Time" MUST be
       set to the time (in seconds) remaining before the holding timer
       on this adjacency is due to expire.  If the corresponding
       interface is a LAN interface, then the Restarting Neighbor System
       ID SHOULD be set to the System ID of the router from which the
       IIH with the PR bit set was received.
       This is required to correctly associate the acknowledgement and
       holding time in the case where multiple systems on a LAN are
       planning a restart at approximately the same time.

   NOTE: Receipt of an IIH with the PA bit set indicates to the router
   planning a restart that the neighbor is aware of the planned restart
   and, in the absence of topology changes as described below, will
   maintain the adjacency for the "remaining time" included in the IIH
   with PA set.

   While a control plane restart is in progress, it is expected that the
   restarting router will be unable to respond to topology changes.  It
   is therefore useful to signal a planned restart (if the forwarding
   plane on the restarting router is maintained) so that the neighbors
   of the restarting router can determine whether it is safe to maintain
   the adjacency if other topology changes occur prior to the completion
   of the restart.  Signaling a planned restart in the absence of
   maintained forwarding plane state is likely to lead to significant
   traffic loss and MUST NOT be done.

   Neighbors of the router that has signaled a planned restart SHOULD
   maintain the adjacency in planned restart state until they receive an
   IIH with the RR bit set or an IIH with both the PR and RR bits clear,
   or until the adjacency holding time expires, whichever occurs first.

   While the adjacency is in planned restart state, the following
   actions MAY be taken:

   a.  If additional topology changes occur, the adjacency that is in
       planned restart state MAY be brought down even though the hold
       time has not yet expired.  Given that the neighbor that has
       signaled a planned restart is not expected to update its
       forwarding plane in response to signaling of the topology changes
       (since it is restarting), traffic that transits that node is at
       risk of being improperly forwarded.
       On a LAN circuit, if the router in planned restart state is the
       DIS at any supported level, the adjacency(ies) SHOULD be brought
       down whenever any LSP update is either generated or received so
       as to trigger a new DIS election.  Failure to do so will
       compromise the reliability of the Update Process on that circuit.
       What other criteria are used to determine what topology changes
       will trigger bringing the adjacency down is a local
       implementation decision.

   b.  If a BFD session to the neighbor that signals a planned restart
       is in the UP state and subsequently goes DOWN, the event MAY be
       ignored since it is possible this is an expected side effect of
       the restart.  Use of the Control Plane Independent state as
       signaled in BFD control packets [RFC5880] SHOULD be considered
       in the decision to ignore a BFD Session DOWN event.

   c.  On a Point-to-Point circuit, transmission of LSPs, CSNPs, and
       PSNPs MAY be suppressed.  It is expected that the PDUs will not
       be received.

   Use of the PR bit provides a means to safely support restart periods
   that are significantly longer than standard holding times.

2.3.  Adjacency (Re)Acquisition

   Adjacency (re)acquisition is the first step in (re)initialization.
   Restarting and starting routers will make use of the RR bit in the
   restart TLV, though each will use it at different stages of the
   (re)start procedure.

2.3.1.  Adjacency Reacquisition during Restart

   The restarting router explicitly notifies its neighbor that the
   adjacency is being reacquired, and hence that it SHOULD NOT
   reinitialize the adjacency.  This is achieved by setting the RR bit
   in the restart TLV.
   When the neighbor of a restarting router receives an IIH with the
   restart TLV having the RR bit set, if there exists on this interface
   an adjacency in state "UP" with the same System ID, and in the case
   of a LAN circuit, with the same source LAN address, then the
   procedures described in Section 2.2.1 are followed.

   A router that does not support the restart capability will ignore the
   restart TLV and reinitialize the adjacency as normal, returning an
   IIH without the restart TLV.

   On restarting, a router initializes the timer T3, starts the timer T2
   for each LSPDB, and for each interface (and in the case of a LAN
   circuit, for each level) starts the timer T1 and transmits an IIH
   containing the restart TLV with the RR bit set.

   On a Point-to-Point circuit, the restarting router SHOULD set the
   "Adjacency Three-Way State" to "Init", because the receipt of the
   acknowledging IIH (with RA set) MUST cause the adjacency to enter the
   "UP" state immediately.

   On a LAN circuit, the LAN-ID assigned to the circuit SHOULD be the
   same as that used prior to the restart.  In particular, for any
   circuits for which the restarting router was previously DIS, the use
   of a different LAN-ID would necessitate the generation of a new set
   of pseudonode LSPs, and corresponding changes in all the LSPs
   referencing them from other routers on the LAN.  By preserving the
   LAN-ID across the restart, this churn can be prevented.  To enable a
   restarting router to learn the LAN-ID used prior to restart, the
   LAN-ID specified in an IIH with RR set MUST be ignored.

   Transmission of "normal" IIHs is inhibited until the conditions
   described below are met (in order to avoid causing an unnecessary
   adjacency initialization).  Upon expiry of the timer T1, it is
   restarted and the IIH is retransmitted as above.
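
   The initialization steps above can be sketched in Python as follows.
   This is a simplified model, not a reference implementation: timers
   are represented only by their starting durations (the typical values
   from Section 2.1), IIH transmission is recorded in a list rather than
   sent, and all class, field, and method names are hypothetical.  A
   Point-to-Point interface gets a single T1 instance and IIH (keyed with
   level None), following Section 2.3.3, and on_restart_ack() applies the
   T3 rule from Section 2.1.  The LAN-ID and three-way state handling are
   not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

RR = 0x01            # Restart Request flag (Section 2.2)
T1_SECONDS = 3       # typical T1 value from Section 2.1
T2_SECONDS = 60      # typical T2 value from Section 2.1
T3_INITIAL = 65535   # initial T3 value from Section 2.1


@dataclass
class Interface:
    name: str
    is_lan: bool
    levels: Tuple[int, ...]  # e.g. (1,), (2,) or (1, 2)


@dataclass
class RestartingRouter:
    interfaces: List[Interface]
    lspdb_levels: Tuple[int, ...]
    t3: int = T3_INITIAL
    t2: Dict[int, int] = field(default_factory=dict)
    # T1 instances keyed by (interface name, level); level is None on P2P.
    t1: Dict[Tuple[str, Optional[int]], int] = field(default_factory=dict)
    sent_iihs: List[Tuple[str, Optional[int], int]] = field(default_factory=list)

    def begin_restart(self) -> None:
        self.t3 = T3_INITIAL
        for level in self.lspdb_levels:          # one T2 per LSPDB
            self.t2[level] = T2_SECONDS
        for ifc in self.interfaces:
            # LAN: one T1 and one IIH per level; P2P: one IIH covers all levels.
            keys = ifc.levels if ifc.is_lan else (None,)
            for level in keys:
                self.t1[(ifc.name, level)] = T1_SECONDS
                self.sent_iihs.append((ifc.name, level, RR))

    def on_restart_ack(self, remaining_time: int, neighbor_adj_up: bool) -> None:
        # Section 2.1: T3 becomes the minimum Remaining Time from IIHs with
        # RA set whose sender reports its adjacency to us as "UP".
        if neighbor_adj_up:
            self.t3 = min(self.t3, remaining_time)
```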

   When a restarting router receives an IIH, a local adjacency is
   established as usual, and if the IIH contains a restart TLV with the
   RA bit set (and on LAN circuits with a Restarting Neighbor System ID
   that matches that of the local system), the receipt of the
   acknowledgement over that interface is noted.  When the RA bit is set
   and the state of the remote adjacency is "UP", then the timer T3 is
   set to the minimum of its current value and the value of the
   "Remaining Time" field in the received IIH.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable.  However, since no CSNP is
   guaranteed to be received over this interface, the timer T1 is
   cancelled immediately without waiting for a complete set of CSNPs.
   Synchronization may therefore be deemed complete even though there
   are some LSPs that are held (only) by this neighbor (see
   Section 2.4).  In this case, we also want to be certain that the
   neighbor will reinitialize the adjacency in order to guarantee that
   the SRMflags have been set on its database, thus ensuring eventual
   LSPDB synchronization.  This is guaranteed to happen except in the
   case where the Adjacency Three-Way State in the received IIH is "UP"
   and the Neighbor Extended Local Circuit ID matches the extended local
   circuit ID assigned by the restarting router.  In this case, the
   restarting router MUST force the adjacency to reinitialize by setting
   the local Adjacency Three-Way State to "DOWN" and sending a normal
   IIH.

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart.  Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries.

   In the case of a Point-to-Point circuit, the "LocalCircuitID" and
   "Extended Local Circuit ID" information contained in the IIH can be
   used immediately to generate an IIH containing the correct three-way
   handshake information.  The presence of "Neighbor Extended Local
   Circuit ID" information that does not match the value currently in
   use by the local system is ignored (since the IIH may have been
   transmitted before the neighbor had received the new value from the
   restarting router), but the adjacency remains in the initializing
   state until the correct information is received.

   In the case of a LAN circuit, the source neighbor information (e.g.,
   SNPAAddress) is recorded and used for adjacency establishment and
   maintenance as normal.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled.

   Once the timer T1 has been cancelled, subsequent IIHs are transmitted
   according to the normal algorithms, but including the restart TLV
   with both RR and RA clear.

   If a LAN contains a mixture of systems, only some of which support
   the new algorithm, database synchronization is still guaranteed, but
   the "old" systems will have reinitialized their adjacencies.

   If an interface is active, but does not have any neighboring router
   reachable over that interface, the timer T1 would never be cancelled,
   and according to Section 2.4.1.1, the SPF would never be run.
   Therefore, timer T1 is cancelled after some predetermined number of
   expirations (which MAY be 1).

2.3.2.  Adjacency Acquisition during Start

   The starting router wants to ensure that in the event that a
   neighboring router has an adjacency to the starting router in the
   "UP" state (from a previous incarnation of the starting router), this
   adjacency is reinitialized.  The starting router also wants
   neighboring routers to suppress advertisement of an adjacency to the
   starting router until LSP database synchronization is achieved.  This
   is achieved by sending IIHs with the RR bit clear and the SA bit set
   in the restart TLV.  The RR bit remains clear and the SA bit remains
   set in subsequent transmissions of IIHs until the adjacency has
   reached the "UP" state and the initial T1 timer interval (see below)
   has expired.

   Receipt of an IIH with the RR bit clear will result in the
   neighboring router utilizing normal operation of the adjacency state
   machine.  This will ensure that any old adjacency on the neighboring
   router will be reinitialized.

   Upon receipt of an IIH with the SA bit set, the behavior described in
   Section 2.2.2 is followed.

   Upon starting, a router starts timer T2 for each LSPDB.

   For each interface (and in the case of a LAN circuit, for each
   level), when an adjacency reaches the "UP" state, the starting router
   starts a timer T1 and transmits an IIH containing the restart TLV
   with the RR bit clear and SA bit set.  Upon expiry of the timer T1,
   it is restarted and the IIH is retransmitted with both RR and SA bits
   set (only the RR bit has changed state from earlier IIHs).

   Upon receipt of an IIH with the RR bit set (regardless of whether or
   not the SA bit is set), the behavior described in Section 2.2.1 is
   followed.
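
   The flags a starting router places in its own IIHs, as described
   above and in the remainder of this section, can be summarized by a
   small Python function.  This is only a sketch: the function name and
   its inputs are hypothetical values that an implementation would derive
   from its own adjacency and timer state, and RA bits sent in reply to a
   neighbor's RR (Section 2.2.1) are not modeled.

```python
RR = 0x01  # Restart Request
SA = 0x04  # Suppress adjacency advertisement


def starting_router_iih_flags(adj_up: bool, t1_expirations: int,
                              t1_cancelled: bool, t2_done: bool) -> int:
    """Restart TLV flags carried in IIHs sent by a starting router."""
    if t2_done:
        return 0            # "normal" IIHs: RR, RA and SA all clear
    if t1_cancelled:
        return SA           # RR and RA clear, SA still set
    if adj_up and t1_expirations > 0:
        return RR | SA      # retransmission after the first T1 expiry
    return SA               # before "UP" and during the first T1 interval
```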

   When an IIH is received by the starting router and the IIH contains a
   restart TLV with the RA bit set (and on LAN circuits with a
   Restarting Neighbor System ID that matches that of the local system),
   the receipt of the acknowledgement over that interface is noted.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable.  Since the neighbor will
   have reinitialized the adjacency, this guarantees that SRMflags have
   been set on its database, thus ensuring eventual LSPDB
   synchronization.  However, since no CSNP is guaranteed to be received
   over this interface, the timer T1 is cancelled immediately without
   waiting for a complete set of CSNPs.  Synchronization may therefore
   be deemed complete even though there are some LSPs that are held
   (only) by this neighbor (see Section 2.4).

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart.  Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries.  The
   usual operation of the update process will ensure that
   synchronization is eventually achieved.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled.  Subsequent
   IIHs sent by the starting router have the RR and RA bits clear and
   the SA bit set in the restart TLV.

   Timer T1 is cancelled after some predetermined number of expirations
   (which MAY be 1).
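
   The T1 cancellation conditions shared by Sections 2.3.1 and 2.3.2 can
   be sketched as one state object per interface (per level on LANs).
   The class name, the max_expirations default of 3, and the ra_for_us
   argument are assumptions for illustration; the caller is assumed to
   compute ra_for_us from the Restarting Neighbor System ID, treating a
   missing ID as directed at the local system (Section 2.2).  The extra
   Point-to-Point step in which a restarting router forces the neighbor
   to reinitialize the adjacency is not modeled.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class T1Tracker:
    is_p2p: bool
    active_levels: Set[int]     # P2P: all active levels; LAN: just this level
    max_expirations: int = 3    # hypothetical "predetermined number"
    ack_received: bool = False
    csnp_levels_done: Set[int] = field(default_factory=set)
    expirations: int = 0
    cancelled: bool = False

    def _maybe_cancel(self) -> None:
        # T1 is cancelled only when BOTH an acknowledgement and a complete
        # set of CSNPs (for every active level) have been received.
        if self.ack_received and self.csnp_levels_done >= self.active_levels:
            self.cancelled = True

    def on_iih(self, has_restart_tlv: bool, ra_for_us: bool) -> None:
        if self.cancelled:
            return
        if has_restart_tlv:
            if ra_for_us:
                self.ack_received = True
                self._maybe_cancel()
        elif self.is_p2p:
            # Neighbor is not restart capable: this counts as the
            # acknowledgement, and no CSNPs are guaranteed, so cancel now.
            self.ack_received = True
            self.cancelled = True
        # On a LAN, an IIH without the restart TLV leaves T1 running.

    def on_complete_csnp_set(self, level: int) -> None:
        if not self.cancelled:
            self.csnp_levels_done.add(level)
            self._maybe_cancel()

    def on_t1_expiry(self) -> bool:
        """Return True if T1 is restarted and the IIH retransmitted."""
        if self.cancelled:
            return False
        self.expirations += 1
        if self.expirations >= self.max_expirations:
            self.cancelled = True   # give up after the configured retries
            return False
        return True
```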
When the T2 timer(s) are cancelled or expire, transmission of "normal" IIHs (with the RR, RA, and SA bits clear) will begin.

2.3.3. Multiple Levels

A router that is operating as both a Level 1 and a Level 2 router on a particular interface MUST perform the above operations for each level.

On a LAN interface, it MUST send and receive both Level 1 and Level 2 IIHs and perform the CSNP synchronizations independently for each level.

On a Point-to-Point interface, only a single IIH (indicating support for both levels) is required, but it MUST perform the CSNP synchronizations independently for each level.

2.4. Database Synchronization

When a router is started or restarted, it can expect to receive a complete set of CSNPs over each interface. The arrival of the CSNP(s) is now guaranteed, since an IIH with the RR bit set will be retransmitted until the CSNP(s) are correctly received.

The CSNPs describe the set of LSPs that are currently held by each neighbor. Synchronization will be complete when all these LSPs have been received.

When (re)starting, a router starts an instance of timer T2 for each LSPDB as described in Section 2.3.1 or Section 2.3.2. In addition to normal processing of the CSNPs, the set of LSPIDs contained in the first complete set of CSNPs received over each interface is recorded, together with their remaining lifetimes. In the case of a LAN interface, a complete set of CSNPs MUST consist of CSNPs received from neighbors that are not restarting. If there are multiple interfaces on the (re)starting router, the recorded set of LSPIDs is the union of those received over each interface. LSPs with a remaining lifetime of zero are NOT so recorded.
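The per-level bookkeeping described in this section can be sketched in Python. It covers recording the LSPIDs from the first complete set of CSNPs on each interface, removing entries as LSPs arrive or as their remaining lifetimes run out, and allowing T2 to be cancelled once the list is empty and T1 is cancelled everywhere. This is an illustrative sketch under stated assumptions, not an implementation. `SyncTracker` and its methods are hypothetical, LSPIDs are plain strings, and time is passed in explicitly as seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

LspId = str  # plain strings stand in for real LSPIDs in this sketch


@dataclass
class SyncTracker:
    """Hypothetical LSPDB-synchronization bookkeeping for one level."""
    pending: Dict[LspId, float] = field(default_factory=dict)  # LSPID -> deadline
    received: Set[LspId] = field(default_factory=set)
    recorded_interfaces: Set[str] = field(default_factory=set)

    def record_csnp_set(self, interface: str,
                        entries: Iterable[Tuple[LspId, int]], now: float) -> None:
        """Record (LSPID, remaining lifetime) pairs from a complete CSNP set."""
        if interface in self.recorded_interfaces:
            return  # only the FIRST complete set per interface is recorded
        self.recorded_interfaces.add(interface)
        for lsp_id, remaining_lifetime in entries:
            if remaining_lifetime == 0:
                continue  # zero remaining lifetime: never recorded
            if lsp_id in self.received:
                continue  # the LSP already arrived before this CSNP
            deadline = now + remaining_lifetime
            # Union over interfaces; keeping the later deadline for a
            # duplicate LSPID is a choice made for this sketch.
            self.pending[lsp_id] = max(deadline, self.pending.get(lsp_id, deadline))

    def on_lsp_received(self, lsp_id: LspId) -> None:
        self.received.add(lsp_id)
        self.pending.pop(lsp_id, None)

    def expire(self, now: float) -> None:
        """Drop entries held for their full remaining lifetime."""
        for lsp_id in [i for i, d in self.pending.items() if d <= now]:
            del self.pending[lsp_id]

    def t2_can_be_cancelled(self, t1_cancelled: Dict[str, bool], now: float) -> bool:
        """t1_cancelled: T1 state of every interface with an adjacency at this level."""
        self.expire(now)
        return not self.pending and all(t1_cancelled.values())
```

Dropping entries whose lifetime has run out is what keeps T2 from waiting on an LSP that will never arrive.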
As LSPs are received (by the normal operation of the update process) over any interface, the corresponding LSPID entry is removed (it is also removed if an LSP arrives before the CSNP containing the reference). When an LSPID has been held in the list for its indicated remaining lifetime, it is removed from the list. When the list of LSPIDs is empty and the timer T1 has been cancelled for all the interfaces that have an adjacency at this level, the timer T2 is cancelled.

At this point, the local database is guaranteed to contain all the LSP(s) (either with the same sequence number or a more recent sequence number) that were present in the neighbors' databases at the time of (re)starting. LSPs that arrived in a neighbor's database after the time of (re)starting may or may not be present, but the normal operation of the update process will guarantee that they will eventually be received. At this point, the local database is deemed to be "synchronized".

Since LSPs mentioned in the CSNP(s) with a zero remaining lifetime are not recorded, and those with a short remaining lifetime are deleted from the list when the lifetime expires, cancellation of the timer T2 will not be prevented by waiting for an LSP that will never arrive.

2.4.1. LSP Generation and Flooding and SPF Computation

The operation of a router starting, as opposed to restarting, is somewhat different. These two cases are dealt with separately below.

2.4.1.1. Restarting

In order to avoid causing unnecessary routing churn in other routers, it is highly desirable that the router's own LSPs generated by the restarting system are the same as those previously present in the network (assuming no other changes have taken place). It is therefore important not to regenerate and flood the LSPs until all the adjacencies have been re-established and any information required for propagation into the local LSPs is fully available. Ideally, the information is loaded into the LSPs in a deterministic way, such that the same information occurs in the same place in the same LSP (and hence the LSPs are identical to their previous versions). If this can be achieved, the new versions may not even cause SPF to be run in other systems. However, provided the same information is included in the set of LSPs (albeit in a different order, and possibly in different LSPs), the result of running the SPF will be the same and will not cause churn to the forwarding tables.

In the case of a restarting router, none of the router's own LSPs are transmitted, nor are the router's own forwarding tables updated, while the timer T3 is running.

Any redistributed inter-level information MUST be regenerated before this router's LSPs are flooded to other nodes. Therefore, the Level-n non-pseudonode LSP(s) MUST NOT be flooded until the other level's T2 timer has expired and its SPF has been run. This ensures that any inter-level information that is to be propagated can be included in the Level-n LSP(s).

During this period, if one of the router's own LSPs (including pseudonode LSPs) that the local router does not currently have in its own database is received, it is NOT purged. Under normal operation, such an LSP would be purged, since the LSP clearly should not be present in the global LSP database. However, in the present circumstances, this would be highly undesirable, because it could cause premature removal of a router's own LSP and hence churn in remote routers. Even if the local system has one or more of the router's own LSPs (which it has generated but not yet transmitted), it is still not valid to compare the received LSP against this set, since, as a result of propagation between Level 1 and Level 2 (or vice versa), a further own LSP may need to be generated when the LSP databases have synchronized.

During this period, a restarting router SHOULD send CSNPs as it normally would. Information about the router's own LSPs MAY be included, but if it is included, it MUST be based on LSPs that have been received, not on versions that have been generated (but not yet transmitted). This restriction is necessary to prevent premature removal of an LSP from the global LSP database.

When the timer T2 expires or is cancelled, indicating that synchronization for that level is complete, the SPF for that level is run in order to derive any information that is required to be propagated to another level, but the forwarding tables are not yet updated.

Once the other level's SPF has run and any inter-level propagation has been resolved, the router's own LSPs can be generated and flooded. Any own LSPs that were previously ignored, but that are not part of the current set of own LSPs (including pseudonodes), MUST then be purged. Note that a Designated Router change may have taken place; consequently, the router SHOULD purge those pseudonode LSPs that it previously owned but that are no longer part of its set of pseudonode LSPs.

When all the T2 timers have expired or been cancelled, the timer T3 is cancelled and the local forwarding tables are updated.

If the timer T3 expires before all the T2 timers have expired or been cancelled, this indicates that the synchronization process is taking longer than the minimum holding time of the neighbors. The router's own LSP(s) for levels that have not yet completed their first SPF computation are then flooded with the overload bit set to indicate that the router's LSPDB is not yet synchronized (and therefore other routers MUST NOT compute routes through this router). Normal operation of the update process resumes, and the local forwarding tables are updated. In order to prevent the neighbors' adjacencies from expiring, IIHs with the normal interface value for the holding time are transmitted over all interfaces with neither RR nor RA set in the restart TLV. This will cause the neighbors to refresh their adjacencies. The router's own LSP(s) will continue to have the overload bit set until timer T2 has expired or been cancelled.

2.4.1.2. Starting

In the case of a starting router, as soon as each adjacency is established, and before any CSNP exchanges, the router's own zeroth LSP is transmitted with the overload bit set. This prevents other routers from computing routes through the router until it has reliably acquired the complete set of LSPs. The overload bit remains set in subsequent transmissions of the zeroth LSP (such as will occur if a previous copy of the router's own zeroth LSP is still present in the network) while any timer T2 is running.

When all the T2 timers have been cancelled, the router's own LSP(s) MAY be regenerated with the overload bit clear (assuming the router is not in fact overloaded, and there is no other reason, such as incomplete BGP convergence, to keep the overload bit set) and flooded as normal.

Other LSPs owned by this router (including pseudonodes) are generated and flooded as normal, irrespective of the timer T2. The SPF is also run as normal, and the Routing Information Base (RIB) and Forwarding Information Base (FIB) are updated as routes become available.
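The overload-bit rules for a router's own LSPs can be condensed into two small Python decision functions. The first covers a restarting router whose T3 is running, cancelled, or expired. The second covers a starting router's zeroth LSP while T2 runs. This is a simplified, hypothetical sketch. The function and enum names are not from this document. The inter-level ordering rule (Level-n LSPs waiting for the other level's SPF) and forwarding-table updates are deliberately not modelled.

```python
from enum import Enum, auto
from typing import Tuple


class T3State(Enum):
    RUNNING = auto()    # own LSPs are held back
    CANCELLED = auto()  # every T2 expired or was cancelled before T3 ran out
    EXPIRED = auto()    # T3 ran out first: sync is slower than neighbor hold times


def restarting_own_lsps(t3: T3State, level_first_spf_done: bool) -> Tuple[bool, bool]:
    """(flood, overload_bit) for a restarting router's own LSPs at one level.

    Simplification: the inter-level ordering (Level-n LSPs wait for the other
    level's SPF) is not modelled here.
    """
    if t3 is T3State.RUNNING:
        # Nothing of our own is flooded yet; the overload value is moot.
        return (False, False)
    if t3 is T3State.EXPIRED:
        # Flood anyway, but keep the overload bit set until this level's
        # T2 has finished and its first SPF has run.
        return (True, not level_first_spf_done)
    return (True, False)


def starting_zeroth_lsp_overload(any_t2_running: bool, other_reason: bool = False) -> bool:
    """Overload bit in a starting router's own zeroth LSP.

    other_reason stands for any local cause to stay overloaded
    (for example, incomplete BGP convergence).
    """
    return any_t2_running or other_reason
```

Returning a `(flood, overload_bit)` pair keeps the two decisions separate. This mirrors the text, where a restarting router may be barred from flooding at all, while a starting router always floods and only varies the overload bit.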
To avoid the possible formation of temporary blackholes, the starting router sets the SA bit in the restart TLV (as described in Section 2.3.2) in all IIHs that it sends.

When all T2 timers have been cancelled, the starting router MUST transmit IIHs with the SA bit clear.

3. State Tables

This section presents state tables that summarize the behaviors described in this document. Other behaviors, in particular adjacency state transitions and LSP database update operation, are NOT included in the state tables except where this document modifies the behaviors described in [ISO10589] and [RFC5303].

The states named in the columns of the tables below are a mixture of states that are specific to a single adjacency (ADJ suppressed, ADJ Seen RA, ADJ Seen CSNP) and states that are indicative of the state of the protocol instance (Running, Restarting, Starting, SPF Wait).

Three state tables are presented, from the point of view of a running router, a restarting router, and a starting router.

3.1. Running Router

Event        | Running              | ADJ suppressed
=============+======================+========================
RX PR        | Set Planned Restart  |
             | state                |
             | Update hold time     |
             | Send PA              |
-------------+----------------------+------------------------
RX PR clr    | Clear Planned        |
and RR clr   | Restart state        |
             | Restore hold time to |
             | local value          |
-------------+----------------------+------------------------
RX PA        | Proceed with planned |
             | restart              |
-------------+----------------------+------------------------
RX RR        | Maintain ADJ state   |
             | Send RA              |
             | Set SRM, send CSNP   |
             | (Note 1)             |
             | Update hold time,    |
             | set Restart Mode     |
             | (Note 2)             |
-------------+----------------------+------------------------
RX RR clr    | Clear Restart Mode   |
-------------+----------------------+------------------------
RX SA        | Suppress IS neighbor |
             | TLV in LSP(s)        |
             | Goto ADJ suppressed  |
-------------+----------------------+------------------------
RX SA clr    |                      | Unsuppress IS neighbor
             |                      | TLV in LSP(s)
             |                      | Goto Running
=============+======================+========================

Note 1: CSNPs are sent by routers in accordance with Section 2.2.1, item c.

Note 2: Only if Restart Mode is clear.

3.2. Restarting Router

Event       | Restarting         | ADJ Seen    | ADJ Seen    | SPF Wait
            |                    | RA          | CSNP        |
============+====================+=============+=============+=============
Restart     | Send PR            |             |             |
planned     |                    |             |             |
------------+--------------------+-------------+-------------+-------------
Planned     | Send PR clr        |             |             |
restart     |                    |             |             |
canceled    |                    |             |             |
------------+--------------------+-------------+-------------+-------------
Router      | Send IIH/RR        |             |             |
restarts    | ADJ Init           |             |             |
            | Start T1, T2, T3   |             |             |
------------+--------------------+-------------+-------------+-------------
RX RR       | Send RA            |             |             |
------------+--------------------+-------------+-------------+-------------
RX RA       | Adjust T3          |             | Cancel T1   |
            | Goto ADJ Seen RA   |             | Adjust T3   |
------------+--------------------+-------------+-------------+-------------
RX CSNP set | Goto ADJ Seen CSNP | Cancel T1   |             |
------------+--------------------+-------------+-------------+-------------
RX IIH w/o  | Cancel T1          |             |             |
Restart TLV | (point-to-point    |             |             |
            | only)              |             |             |
------------+--------------------+-------------+-------------+-------------
T1 expires  | Send IIH/RR        | Send IIH/RR | Send IIH/RR |
            | Restart T1         | Restart T1  | Restart T1  |
------------+--------------------+-------------+-------------+-------------
T1 expires  | Send IIH/normal    | Send IIH/   | Send IIH/   |
nth time    |                    | normal      | normal      |
------------+--------------------+-------------+-------------+-------------
T2 expires  | Trigger SPF        |             |             |
            | Goto SPF Wait      |             |             |
------------+--------------------+-------------+-------------+-------------
T3 expires  | Set overload bit   |             |             |
            | Flood local LSPs   |             |             |
            | Update fwd plane   |             |             |
------------+--------------------+-------------+-------------+-------------
LSP DB Sync | Cancel T2 and T3   |             |             |
            | Trigger SPF        |             |             |
            | Goto SPF Wait      |             |             |
------------+--------------------+-------------+-------------+-------------
All SPF     |                    |             |             | Clear
done        |                    |             |             | overload bit
            |                    |             |             | Update fwd
            |                    |             |             | plane
            |                    |             |             | Flood local
            |                    |             |             | LSPs
            |                    |             |             | Goto Running
============+====================+=============+=============+=============

3.3. Starting Router

Event       | Starting           | ADJ Seen    | ADJ Seen
            |                    | RA          | CSNP
============+====================+=============+=============
Router      | Send IIH/SA        |             |
starts      | Start T1, T2       |             |
------------+--------------------+-------------+-------------
RX RR       | Send RA            |             |
------------+--------------------+-------------+-------------
RX RA       | Goto ADJ Seen RA   |             | Cancel T1
------------+--------------------+-------------+-------------
RX CSNP set | Goto ADJ Seen CSNP | Cancel T1   |
------------+--------------------+-------------+-------------
RX IIH w/o  | Cancel T1          |             |
Restart TLV | (point-to-point    |             |
            | only)              |             |
------------+--------------------+-------------+-------------
ADJ UP      | Start T1           |             |
            | Send local LSPs    |             |
            | with overload bit  |             |
            | set                |             |
------------+--------------------+-------------+-------------
T1 expires  | Send IIH/RR        | Send IIH/RR | Send IIH/RR
            | and SA             | and SA      | and SA
            | Restart T1         | Restart T1  | Restart T1
------------+--------------------+-------------+-------------
T1 expires  | Send IIH/SA        | Send IIH/SA | Send IIH/SA
nth time    |                    |             |
------------+--------------------+-------------+-------------
T2 expires  | Clear overload bit |             |
            | Send IIH normal    |             |
            | Goto Running       |             |
------------+--------------------+-------------+-------------
LSP DB Sync | Cancel T2          |             |
            | Clear overload bit |             |
            | Send IIH normal    |             |
============+====================+=============+=============

4. IANA Considerations

This document defines the following IS-IS TLV, which is listed in the IS-IS TLV codepoint registry:

Type  Description     IIH  LSP  SNP
----  --------------  ---  ---  ---
211   Restart TLV     y    n    n

5. Security Considerations

Any new security issues raised by the procedures in this document depend upon the ability of an attacker to inject a false but apparently valid IIH; the ease or difficulty of doing so has not been altered.

If the RR bit is set in a false IIH, neighbors who receive such an IIH will continue to maintain an existing adjacency in the "UP" state and may (re)send a complete set of CSNPs. While the latter action is wasteful, neither action causes any disruption in correct protocol operation.

If the RA bit is set in a false IIH, a (re)starting router that receives such an IIH may falsely believe that there is a neighbor on the corresponding interface that supports the procedures described in this document. In the absence of receipt of a complete set of CSNPs on that interface, this could delay the completion of (re)start procedures by requiring the timer T1 to time out the locally defined maximum number of retries. This behavior is the same as would occur on a LAN where none of the (re)starting router's neighbors support the procedures in this document and is covered in Sections 2.3.1 and 2.3.2.

If the SA bit is set in a false IIH, this could cause suppression of the advertisement of an IS neighbor, which could either continue for an indefinite period or occur intermittently, with the result being a possible loss of reachability to some destinations in the network and/or increased frequency of LSP flooding and SPF calculation.

The possibility of IS-IS PDU spoofing can be reduced by the use of authentication as described in [RFC1195] and [ISO10589], and especially the use of cryptographic authentication as described in [RFC5304] and [RFC5310].

6. Manageability Considerations

These extensions have been designed, developed, and deployed for many years; standardizing them in this document has no new impact on the management and operation of the IS-IS protocol.

7. Acknowledgements

For RFC 5306, the authors acknowledged contributions made by Jeff Parker, Radia Perlman, Mark Schaefer, Naiming Shen, Nischal Sheth, Russ White, and Rena Yang.

The authors of this updated version acknowledge the contribution of Mike Shand, co-author of RFC 5306.

8. Normative References

[ISO10589] International Organization for Standardization, "Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)", ISO/IEC 10589:2002, Second Edition, November 2002.

[RFC1195] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, DOI 10.17487/RFC1195, December 1990.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC5303] Katz, D., Saluja, R., and D. Eastlake 3rd, "Three-Way Handshake for IS-IS Point-to-Point Adjacencies", RFC 5303, DOI 10.17487/RFC5303, October 2008.

[RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic Authentication", RFC 5304, DOI 10.17487/RFC5304, October 2008.

[RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R., and M. Fanto, "IS-IS Generic Cryptographic Authentication", RFC 5310, DOI 10.17487/RFC5310, February 2009.

[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

Appendix A. Summary of Changes from RFC 5306

This document extends RFC 5306 by introducing support for signaling to the neighbors of a restarting router that a planned restart is about to occur. This allows the neighbors to be aware of the state of the restarting router so that appropriate action may be taken if other topology changes occur while the planned restart is in progress. Since the forwarding plane of the restarting router is maintained based upon the pre-restart state of the network, additional topology changes introduce the possibility that traffic may be lost if paths via the restarting router continue to be used while the restart is in progress.

In support of this new functionality, two new flags have been introduced:

PR - Restart is planned
PA - Planned restart acknowledgement

No changes to the post-restart exchange between the restarting router and its neighbors have been introduced.

Authors' Addresses

Les Ginsberg
Cisco Systems, Inc.
Email: ginsberg@cisco.com

Paul Wells
Cisco Systems, Inc.
Email: pauwells@cisco.com