IS-IS for IP Internets                                       L. Ginsberg
Internet-Draft                                                  P. Wells
Obsoletes: 5306 (if approved)                        Cisco Systems, Inc.
Intended status: Standards Track                           June 28, 2018
Expires: December 30, 2018


                      Restart Signaling for IS-IS
                   draft-ginsberg-isis-rfc5306bis-01

Abstract

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a router to
   signal its neighbors that it is preparing to initiate a restart while
   maintaining forwarding plane state. This allows the neighbors to
   maintain their adjacencies until the router has restarted, but also
   allows the neighbors to bring the adjacencies down in the event of
   other topology changes.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved Link State Protocol Data
   Unit (LSP) database synchronization with its neighbors and a
   mechanism to optimize LSP database synchronization, while minimizing
   transient routing disruption when a router starts.

   This document obsoletes RFC 5306.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 30, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
   2.  Approach
     2.1.  Timers
     2.2.  Restart TLV
       2.2.1.  Use of RR and RA Bits
       2.2.2.  Use of the SA Bit
       2.2.3.  Use of PR and PA Bits
     2.3.  Adjacency (Re)Acquisition
       2.3.1.  Adjacency Reacquisition during Restart
       2.3.2.  Adjacency Acquisition during Start
       2.3.3.  Multiple Levels
     2.4.  Database Synchronization
       2.4.1.  LSP Generation and Flooding and SPF Computation
   3.  State Tables
     3.1.  Running Router
     3.2.  Restarting Router
     3.3.  Starting Router
   4.  IANA Considerations
   5.  Security Considerations
   6.  Manageability Considerations
   7.  Acknowledgements
   8.  Normative References
   Appendix A.  Summary of Changes from RFC 5306
   Authors' Addresses

1.  Overview

   The Intermediate System to Intermediate System (IS-IS) routing
   protocol [RFC1195] [ISO10589] is a link state intra-domain routing
   protocol. Normally, when an IS-IS router is restarted, temporary
   disruption of routing occurs due to events in both the restarting
   router and the neighbors of the restarting router.

   The router that has been restarted computes its own routes before
   achieving database synchronization with its neighbors. The results
   of this computation are likely to be non-convergent with the routes
   computed by other routers in the area/domain.

   Neighbors of the restarting router detect the restart event and cycle
   their adjacencies with the restarting router through the down state.
   The cycling of the adjacency state causes the neighbors to regenerate
   their LSPs describing the adjacency concerned. This in turn causes a
   temporary disruption of routes passing through the restarting router.

   In certain scenarios, the temporary disruption of the routes is
   highly undesirable. This document describes mechanisms to avoid or
   minimize the disruption due to both of these causes.

   When an adjacency is reinitialized as a result of a neighbor
   restarting, a router does three things:

   1.  It causes its own LSP(s) to be regenerated, thus triggering SPF
       runs throughout the area (or in the case of Level 2, throughout
       the domain).

   2.  It sets SRMflags on its own LSP database on the adjacency
       concerned.

   3.  In the case of a Point-to-Point link, it transmits a complete set
       of Complete Sequence Number PDUs (CSNPs) over the adjacency.

   In the case of a restarting router process, the first of these is
   highly undesirable, but the second is essential in order to ensure
   synchronization of the LSP database.

   The third action above minimizes the number of LSPs that must be
   exchanged and, if made reliable, provides a means of determining when
   the LSP databases of the neighboring routers have been synchronized.
   This is desirable whether or not the router is being restarted (so
   that the overload bit can be cleared in the router's own LSP, for
   example).

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved LSP database synchronization
   with its neighbors and a mechanism to optimize LSP database
   synchronization and minimize transient routing disruption when a
   router starts.

   It is assumed that the three-way handshake [RFC5303] is being used on
   Point-to-Point circuits.

2.  Approach

2.1.  Timers

   Three additional timers, T1, T2, and T3, are required to support the
   functionality defined in this document.

   An instance of the timer T1 is maintained per interface, and
   indicates the time after which an unacknowledged (re)start attempt
   will be repeated. A typical value might be 3 seconds.

   An instance of the timer T2 is maintained for each LSP database
   (LSPDB) present in the system, i.e., for a Level 1/2 system, there
   will be an instance of the timer T2 for Level 1 and an instance for
   Level 2. This is the maximum time that the system will wait for
   LSPDB synchronization. A typical value might be 60 seconds.

   A single instance of the timer T3 is maintained for the entire
   system.
   It indicates the time after which the router will declare
   that it has failed to achieve database synchronization (by setting
   the overload bit in its own LSP). This is initialized to 65535
   seconds, but is set to the minimum of the remaining times of received
   IS-IS Hellos (IIHs) containing a restart TLV with the Restart
   Acknowledgement (RA) bit set and an indication that the neighbor has
   an adjacency in the "UP" state to the restarting router.

   NOTE: The timer T3 is only used by a restarting router.

2.2.  Restart TLV

   A new TLV is defined to be included in IIH PDUs. The presence of
   this TLV indicates that the sender supports the functionality defined
   in this document and it carries flags that are used to convey
   information during a (re)start. All IIHs transmitted by a router
   that supports this capability MUST include this TLV.

   Type: 211

   Length: Number of octets in the Value field (1 to (3 + ID Length))

   Value:

                                    No. of octets
      +-----------------------+
      | Flags                 |     1
      +-----------------------+
      | Remaining Time        |     2
      +-----------------------+
      | Restarting Neighbor ID|     ID Length
      +-----------------------+

      Flags (1 octet)

          0  1  2  3  4  5  6  7
         +--+--+--+--+--+--+--+--+
         |Reserved|PA|PR|SA|RA|RR|
         +--+--+--+--+--+--+--+--+

         RR - Restart Request
         RA - Restart Acknowledgement
         SA - Suppress adjacency advertisement
         PR - Restart is planned
         PA - Planned restart acknowledgement

      (Note: Remaining fields are required when the RA bit is set.)

      Remaining Time (2 octets)

         Remaining holding time (in seconds)

      Restarting Neighbor System ID (ID Length octets)

         The System ID of the neighbor to which an RA refers. Note:
         Implementations based on earlier versions of this document may
         not include this field in the TLV when the RA bit is set.
         In this case, a router that is expecting an RA on a LAN circuit
         SHOULD assume that the acknowledgement is directed at the local
         system.

2.2.1.  Use of RR and RA Bits

   The RR bit is used by a (re)starting router to signal to its
   neighbors that a (re)start is in progress, that an existing adjacency
   SHOULD be maintained even under circumstances when the normal
   operation of the adjacency state machine would require the adjacency
   to be reinitialized, to request a set of CSNPs, and to request
   setting of the SRMflags.

   The RA bit is sent by the neighbor of a (re)starting router to
   acknowledge the receipt of a restart TLV with the RR bit set.

   When the neighbor of a (re)starting router receives an IIH with the
   restart TLV having the RR bit set, if there exists on this interface
   an adjacency in state "UP" with the same System ID, and in the case
   of a LAN circuit, with the same source LAN address, then,
   irrespective of the other contents of the "Intermediate System
   Neighbors" option (LAN circuits) or the "Point-to-Point Three-Way
   Adjacency" option (Point-to-Point circuits):

   a.  the state of the adjacency is not changed. If this is the first
       IIH with the RR bit set that this system has received associated
       with this adjacency, then the adjacency is marked as being in
       "Restart mode" and the adjacency holding time is refreshed --
       otherwise, the holding time is not refreshed. The "remaining
       time" transmitted according to (b) below MUST reflect the actual
       time after which the adjacency will now expire. Receipt of a
       normal IIH with the RR bit reset will clear the "Restart mode"
       state. This procedure allows the restarting router to cause the
       neighbor to maintain the adjacency long enough for restart to
       successfully complete, while also preventing repetitive restarts
       from maintaining an adjacency indefinitely.
       Whether or not an adjacency is marked as being in "Restart mode"
       has no effect on adjacency state transitions.

   b.  immediately (i.e., without waiting for any currently running
       timer interval to expire, but with a small random delay of a few
       tens of milliseconds on LANs to avoid "storms") transmit over the
       corresponding interface an IIH including the restart TLV with the
       RR bit clear and the RA bit set, in the case of Point-to-Point
       adjacencies having updated the "Point-to-Point Three-Way
       Adjacency" option to reflect any new values received from the
       (re)starting router. (This allows a restarting router to quickly
       acquire the correct information to place in its hellos.) The
       "Remaining Time" MUST be set to the time (in seconds) remaining
       before the holding timer on this adjacency is due to expire. If
       the corresponding interface is a LAN interface, then the
       Restarting Neighbor System ID SHOULD be set to the System ID of
       the router from which the IIH with the RR bit set was received.
       This is required to correctly associate the acknowledgement and
       holding time in the case where multiple systems on a LAN restart
       at approximately the same time. This IIH SHOULD be transmitted
       before any LSPs or SNPs are transmitted as a result of the
       receipt of the original IIH.

   c.  if the corresponding interface is a Point-to-Point interface, or
       if the receiving router has the highest LnRouterPriority (with
       the highest source MAC (Media Access Control) address breaking
       ties) among those routers to which the receiving router has an
       adjacency in state "UP" on this interface whose IIHs contain the
       restart TLV, excluding adjacencies to all routers which are
       considered in "Restart mode" (note the actual DIS is NOT changed
       by this process), initiate the transmission over the
       corresponding interface of a complete set of CSNPs, and set
       SRMflags on the corresponding interface for all LSPs in the local
       LSP database.

   Otherwise (i.e., if there was no adjacency in the "UP" state to the
   System ID in question), process the IIH as normal by reinitializing
   the adjacency and setting the RA bit in the returned IIH.

2.2.2.  Use of the SA Bit

   The SA bit is used by a starting router to request that its neighbor
   suppress advertisement of the adjacency to the starting router in the
   neighbor's LSPs.

   A router that is starting has no maintained forwarding function
   state. This may or may not be the first time the router has started.
   If this is not the first time the router has started, copies of LSPs
   generated by this router in its previous incarnation may exist in the
   LSP databases of other routers in the network. These copies are
   likely to appear "newer" than LSPs initially generated by the
   starting router due to the reinitialization of LSP fragment sequence
   numbers by the starting router. This may cause temporary blackholes
   to occur until the normal operation of the update process causes the
   starting router to regenerate and flood copies of its own LSPs with
   higher sequence numbers.
   The temporary blackholes can be avoided if
   the starting router's neighbors suppress advertising an adjacency to
   the starting router until the starting router has been able to
   propagate newer versions of LSPs generated by previous incarnations.

   When a router receives an IIH with the restart TLV having the SA bit
   set, if there exists on this interface an adjacency in state "UP"
   with the same System ID, and in the case of a LAN circuit, with the
   same source LAN address, then the router MUST suppress advertisement
   of the adjacency to the neighbor in its own LSPs. Until an IIH with
   the SA bit clear has been received, the neighbor advertisement MUST
   continue to be suppressed. If the adjacency transitions to the "UP"
   state, the new adjacency MUST NOT be advertised until an IIH with the
   SA bit clear has been received.

   Note that a router that suppresses advertisement of an adjacency MUST
   NOT use this adjacency when performing its SPF calculation. In
   particular, if an implementation follows the example guidelines
   presented in [ISO10589], Annex C.2.5, Step 0:b) "pre-load TENT with
   the local adjacency database", the suppressed adjacency MUST NOT be
   loaded into TENT.

2.2.3.  Use of PR and PA Bits

   The PR bit is used by a router which is planning to initiate a
   restart to signal to its neighbors that it will be restarting.

   The PA bit is sent by the neighbor of a router planning to restart to
   acknowledge receipt of a restart TLV with the PR bit set.

   When the neighbor of a router planning a restart receives an IIH with
   the restart TLV having the PR bit set, if there exists on this
   interface an adjacency in state "UP" with the same System ID, and in
   the case of a LAN circuit, with the same source LAN address, then:

   a.  if this is the first IIH with the PR bit set that this system has
       received associated with this adjacency, then the adjacency is
       marked as being in "Planned Restart state" and the adjacency
       holding time is refreshed -- otherwise, the holding time is not
       refreshed. The "remaining time" transmitted according to (b)
       below MUST reflect the actual time after which the adjacency will
       now expire. Receipt of a normal IIH with the PR bit reset will
       clear the "Planned Restart state". This procedure allows the
       router planning a restart to cause the neighbor to maintain the
       adjacency long enough for restart to successfully complete.
       Whether or not an adjacency is marked as being in "Planned
       Restart state" has no effect on adjacency state transitions.

   b.  immediately (i.e., without waiting for any currently running
       timer interval to expire, but with a small random delay of a few
       tens of milliseconds on LANs to avoid "storms") transmit over the
       corresponding interface an IIH including the restart TLV with the
       PR bit clear and the PA bit set. The "Remaining Time" MUST be
       set to the time (in seconds) remaining before the holding timer
       on this adjacency is due to expire. If the corresponding
       interface is a LAN interface, then the Restarting Neighbor System
       ID SHOULD be set to the System ID of the router from which the
       IIH with the PR bit set was received. This is required to
       correctly associate the acknowledgement and holding time in the
       case where multiple systems on a LAN are planning a restart at
       approximately the same time.

   While a control plane restart is in progress it is expected that the
   restarting router will be unable to respond to topology changes.
   It is therefore useful to signal a planned restart (if the forwarding
   plane on the restarting router is maintained) so that the neighbors
   of the restarting router can determine whether it is safe to maintain
   the adjacency if other topology changes occur prior to the completion
   of the restart. Signaling a planned restart in the absence of
   maintained forwarding plane state is likely to lead to significant
   traffic loss and MUST NOT be done.

   A neighbor of the router which has signaled a planned restart SHOULD
   maintain the adjacency in planned restart state until it receives
   an IIH with the RR bit set, receives an IIH with both PR and RR bits
   clear, or the adjacency holding time expires, whichever occurs
   first.

   While the adjacency is in planned restart state the following actions
   MAY be taken:

   a.  If additional topology changes occur, the adjacency which is in
       planned restart state MAY be brought down even though the hold
       time has not yet expired. Given that the neighbor which has
       signaled a planned restart is not expected to update its
       forwarding plane in response to signaling of the topology changes
       (since it is restarting), traffic which transits that node is at
       risk of being improperly forwarded. On a LAN circuit, if the
       router in planned restart state is the DIS at any supported
       level, the adjacency(ies) SHOULD be brought down whenever any LSP
       update is either generated or received so as to trigger a new DIS
       election. Failure to do so will compromise the reliability of
       the Update Process on that circuit. What other criteria are used
       to determine what topology changes will trigger bringing the
       adjacency down is a local implementation decision.

   b.  If a Bidirectional Forwarding Detection (BFD) session to the
       neighbor which signals a planned restart is in the UP state and
       subsequently goes DOWN, the event MAY be ignored since it is
       possible this is an expected side effect of the restart. Use of
       the Control Plane Independent state as signaled in BFD control
       packets [RFC5880] SHOULD be considered in the decision to ignore
       a BFD Session DOWN event.

   c.  On a Point-to-Point circuit, transmission of LSPs, CSNPs, and
       PSNPs MAY be suppressed. It is expected that the PDUs will not
       be received.

2.3.  Adjacency (Re)Acquisition

   Adjacency (re)acquisition is the first step in (re)initialization.
   Restarting and starting routers will make use of the RR bit in the
   restart TLV, though each will use it at different stages of the
   (re)start procedure.

2.3.1.  Adjacency Reacquisition during Restart

   The restarting router explicitly notifies its neighbor that the
   adjacency is being reacquired, and hence that it SHOULD NOT
   reinitialize the adjacency. This is achieved by setting the RR bit
   in the restart TLV. When the neighbor of a restarting router
   receives an IIH with the restart TLV having the RR bit set, if there
   exists on this interface an adjacency in state "UP" with the same
   System ID, and in the case of a LAN circuit, with the same source LAN
   address, then the procedures described in Section 2.2.1 are followed.

   A router that does not support the restart capability will ignore the
   restart TLV and reinitialize the adjacency as normal, returning an
   IIH without the restart TLV.

   On restarting, a router initializes the timer T3, starts the timer T2
   for each LSPDB, and for each interface (and in the case of a LAN
   circuit, for each level) starts the timer T1 and transmits an IIH
   containing the restart TLV with the RR bit set.
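
   The restart TLV carried in the IIH described above can be sketched as
   follows. This is a minimal, non-normative illustration. The flag
   mask values assume that bit 7 (RR) in the Flags diagram of
   Section 2.2 is the least significant bit of the octet, a 6-octet
   System ID is assumed, and the function names are hypothetical.

```python
import struct

# Assumption: bit 7 (RR) in the Section 2.2 diagram is the least
# significant bit of the Flags octet, so the masks are as follows.
RR, RA, SA, PR, PA = 0x01, 0x02, 0x04, 0x08, 0x10
RESTART_TLV_TYPE = 211
ID_LENGTH = 6  # assumed System ID length in octets


def encode_restart_tlv(flags, remaining_time=None, neighbor_id=None):
    """Build a restart TLV; optional fields follow the Flags octet."""
    value = bytes([flags])
    if remaining_time is not None:
        value += struct.pack("!H", remaining_time)  # 2 octets, network order
        if neighbor_id is not None:
            if len(neighbor_id) != ID_LENGTH:
                raise ValueError("unexpected System ID length")
            value += neighbor_id
    elif neighbor_id is not None:
        raise ValueError("Restarting Neighbor ID requires Remaining Time")
    return bytes([RESTART_TLV_TYPE, len(value)]) + value


def decode_restart_tlv(data):
    """Parse a restart TLV into (flags, remaining_time, neighbor_id)."""
    if len(data) < 3 or data[0] != RESTART_TLV_TYPE:
        raise ValueError("not a restart TLV")
    length = data[1]
    value = data[2:2 + length]
    if length < 1 or len(value) != length:
        raise ValueError("malformed restart TLV")
    flags = value[0]
    remaining = struct.unpack("!H", value[1:3])[0] if length >= 3 else None
    # Older implementations may omit the Restarting Neighbor ID.
    neighbor = value[3:3 + ID_LENGTH] if length >= 3 + ID_LENGTH else None
    return flags, remaining, neighbor
```

   With these assumptions, the initial IIH of a restarting router would
   carry encode_restart_tlv(RR), and an acknowledging neighbor on a LAN
   would reply with the RA bit, its remaining holding time, and the
   System ID of the restarting router.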

   On a Point-to-Point circuit, the restarting router SHOULD set the
   "Adjacency Three-Way State" to "Init", because the receipt of the
   acknowledging IIH (with RA set) MUST cause the adjacency to enter the
   "UP" state immediately.

   On a LAN circuit, the LAN-ID assigned to the circuit SHOULD be the
   same as that used prior to the restart. In particular, for any
   circuits for which the restarting router was previously DIS, the use
   of a different LAN-ID would necessitate the generation of a new set
   of pseudonode LSPs, and corresponding changes in all the LSPs
   referencing them from other routers on the LAN. By preserving the
   LAN-ID across the restart, this churn can be prevented. To enable a
   restarting router to learn the LAN-ID used prior to restart, the
   LAN-ID specified in an IIH with RR set MUST be ignored.

   Transmission of "normal" IIHs is inhibited until the conditions
   described below are met (in order to avoid causing an unnecessary
   adjacency initialization). Upon expiry of the timer T1, it is
   restarted and the IIH is retransmitted as above.

   When a restarting router receives an IIH, a local adjacency is
   established as usual, and if the IIH contains a restart TLV with the
   RA bit set (and on LAN circuits with a Restarting Neighbor System ID
   that matches that of the local system), the receipt of the
   acknowledgement over that interface is noted. When the RA bit is set
   and the state of the remote adjacency is "UP", then the timer T3 is
   set to the minimum of its current value and the value of the
   "Remaining Time" field in the received IIH.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable.
   However, since no CSNP is
   guaranteed to be received over this interface, the timer T1 is
   cancelled immediately without waiting for a complete set of CSNPs.
   Synchronization may therefore be deemed complete even though there
   are some LSPs which are held (only) by this neighbor (see
   Section 2.4). In this case, we also want to be certain that the
   neighbor will reinitialize the adjacency in order to guarantee that
   the SRMflags have been set on its database, thus ensuring eventual
   LSPDB synchronization. This is guaranteed to happen except in the
   case where the Adjacency Three-Way State in the received IIH is "UP"
   and the Neighbor Extended Local Circuit ID matches the extended local
   circuit ID assigned by the restarting router. In this case, the
   restarting router MUST force the adjacency to reinitialize by setting
   the local Adjacency Three-Way State to "DOWN" and sending a normal
   IIH.

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart. Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries.

   In the case of a Point-to-Point circuit, the "LocalCircuitID" and
   "Extended Local Circuit ID" information contained in the IIH can be
   used immediately to generate an IIH containing the correct three-way
   handshake information. The presence of "Neighbor Extended Local
   Circuit ID" information that does not match the value currently in
   use by the local system is ignored (since the IIH may have been
   transmitted before the neighbor had received the new value from the
   restarting router), but the adjacency remains in the initializing
   state until the correct information is received.
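
   The handling of a received IIH by a restarting router on a
   Point-to-Point circuit, as described earlier in this section, can be
   sketched as follows. This is a non-normative illustration under
   stated assumptions: the P2PHello record, the action strings, and the
   RA mask value (0x02) are hypothetical, and the Three-Way State
   carried in the IIH is taken as the "state of the remote adjacency".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

T3_INITIAL = 65535  # seconds, initial T3 value from Section 2.1
RA = 0x02           # assumed mask for the RA bit


@dataclass
class P2PHello:
    restart_flags: Optional[int]   # None when no restart TLV is present
    remaining_time: Optional[int]  # seconds, from the restart TLV
    three_way_state: str           # "UP", "INIT", or "DOWN"
    neighbor_ext_circuit_id: Optional[int]


def process_p2p_hello(hello: P2PHello, t3: int,
                      local_ext_circuit_id: int) -> Tuple[int, List[str]]:
    """Return the updated T3 value and the actions to take."""
    actions: List[str] = []
    if hello.restart_flags is None:
        # Neighbor is not restart capable: treat as an acknowledgement
        # and stop waiting for CSNPs on this interface.
        actions += ["note_acknowledgement", "cancel_T1"]
        if (hello.three_way_state == "UP"
                and hello.neighbor_ext_circuit_id == local_ext_circuit_id):
            # The neighbor would not reinitialize by itself; force it.
            actions += ["set_three_way_state_DOWN", "send_normal_IIH"]
    elif hello.restart_flags & RA:
        actions.append("note_acknowledgement")
        if hello.three_way_state == "UP" and hello.remaining_time is not None:
            t3 = min(t3, hello.remaining_time)
    return t3, actions
```

   In this sketch T1 is only cancelled directly in the non-capable
   case. For a restart-capable neighbor, cancellation still waits for
   both the acknowledgement and a complete set of CSNPs.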

   In the case of a LAN circuit, the source neighbor information (e.g.,
   SNPAAddress) is recorded and used for adjacency establishment and
   maintenance as normal.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled.

   Once the timer T1 has been cancelled, subsequent IIHs are transmitted
   according to the normal algorithms, but including the restart TLV
   with both RR and RA clear.

   If a LAN contains a mixture of systems, only some of which support
   the new algorithm, database synchronization is still guaranteed, but
   the "old" systems will have reinitialized their adjacencies.

   If an interface is active, but does not have any neighboring router
   reachable over that interface, the timer T1 would never be cancelled,
   and according to Section 2.4.1.1, the SPF would never be run.
   Therefore, timer T1 is cancelled after some predetermined number of
   expirations (which MAY be 1).

2.3.2.  Adjacency Acquisition during Start

   The starting router wants to ensure that in the event that a
   neighboring router has an adjacency to the starting router in the
   "UP" state (from a previous incarnation of the starting router), this
   adjacency is reinitialized. The starting router also wants
   neighboring routers to suppress advertisement of an adjacency to the
   starting router until LSP database synchronization is achieved. This
   is achieved by sending IIHs with the RR bit clear and the SA bit set
   in the restart TLV. The RR bit remains clear and the SA bit remains
   set in subsequent transmissions of IIHs until the adjacency has
   reached the "UP" state and the initial T1 timer interval (see below)
   has expired.

   Receipt of an IIH with the RR bit clear will result in the
   neighboring router utilizing normal operation of the adjacency state
   machine.
   This will ensure that any old adjacency on the neighboring
   router will be reinitialized.

   Upon receipt of an IIH with the SA bit set, the behavior described in
   Section 2.2.2 is followed.

   Upon starting, a router starts timer T2 for each LSPDB.

   For each interface (and in the case of a LAN circuit, for each
   level), when an adjacency reaches the "UP" state, the starting router
   starts a timer T1 and transmits an IIH containing the restart TLV
   with the RR bit clear and SA bit set. Upon expiry of the timer T1,
   it is restarted and the IIH is retransmitted with both RR and SA bits
   set (only the RR bit has changed state from earlier IIHs).

   Upon receipt of an IIH with the RR bit set (regardless of whether or
   not the SA bit is set), the behavior described in Section 2.2.1 is
   followed.

   When an IIH is received by the starting router and the IIH contains a
   restart TLV with the RA bit set (and on LAN circuits with a
   Restarting Neighbor System ID that matches that of the local system),
   the receipt of the acknowledgement over that interface is noted.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable. Since the neighbor will
   have reinitialized the adjacency, this guarantees that SRMflags have
   been set on its database, thus ensuring eventual LSPDB
   synchronization. However, since no CSNP is guaranteed to be received
   over this interface, the timer T1 is cancelled immediately without
   waiting for a complete set of CSNPs. Synchronization may therefore
   be deemed complete even though there are some LSPs that are held
   (only) by this neighbor (see Section 2.4).
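
   The flag settings used by a starting router over the course of this
   procedure, including the T1 cancellation and T2 rules stated later in
   this section, can be summarized in a short sketch. It is
   non-normative: the mask values, the function name, and the reduction
   of timer state to three parameters are assumptions made for
   illustration only.

```python
# Assumed masks: RR is the least significant bit of the Flags octet.
RR, RA, SA = 0x01, 0x02, 0x04


def starting_router_flags(t1_expirations: int, t1_cancelled: bool,
                          t2_running: bool) -> int:
    """Flags a starting router places in the restart TLV of its IIHs.

    t1_expirations: T1 expiries seen on this interface so far
    t1_cancelled:   True once T1 has been cancelled on this interface
    t2_running:     True while T2 for the relevant LSPDB is running
    """
    if not t2_running:
        return 0            # "normal" IIH: RR, RA, and SA all clear
    if t1_cancelled:
        return SA           # RR and RA clear, SA still set
    if t1_expirations > 0:
        return RR | SA      # retransmission after a T1 expiry
    return SA               # initial IIHs: RR clear, SA set
```

   A single T2 indication is used here for brevity. A Level 1/2 router
   would evaluate this per level, since T2 is maintained per LSPDB.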

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart. Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries. The
   usual operation of the update process will ensure that
   synchronization is eventually achieved.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled. Subsequent
   IIHs sent by the starting router have the RR and RA bits clear and
   the SA bit set in the restart TLV.

   Timer T1 is cancelled after some predetermined number of expirations
   (which MAY be 1).

   When the T2 timer(s) are cancelled or expire, transmission of
   "normal" IIHs (with RR, RA, and SA bits clear) will begin.

2.3.3.  Multiple Levels

   A router that is operating as both a Level 1 and a Level 2 router on
   a particular interface MUST perform the above operations for each
   level.

   On a LAN interface, it MUST send and receive both Level 1 and Level 2
   IIHs and perform the CSNP synchronizations independently for each
   level.

   On a Point-to-Point interface, only a single IIH (indicating support
   for both levels) is required, but it MUST perform the CSNP
   synchronizations independently for each level.

2.4.  Database Synchronization

   When a router is started or restarted, it can expect to receive a
   complete set of CSNPs over each interface. The arrival of the
   CSNP(s) is now guaranteed, since an IIH with the RR bit set will be
   retransmitted until the CSNP(s) are correctly received.

   The CSNPs describe the set of LSPs that are currently held by each
   neighbor.
Synchronization will be complete when all these LSPs have 639 been received. 641 When (re)starting, a router starts an instance of timer T2 for each 642 LSPDB as described in Section 2.3.1 or Section 2.3.2. In addition to 643 normal processing of the CSNPs, the set of LSPIDs contained in the 644 first complete set of CSNPs received over each interface is recorded, 645 together with their remaining lifetime. In the case of a LAN 646 interface, a complete set of CSNPs MUST consist of CSNPs received 647 from neighbors that are not restarting. If there are multiple 648 interfaces on the (re)starting router, the recorded set of LSPIDs is 649 the union of those received over each interface. LSPs with a 650 remaining lifetime of zero are NOT so recorded. 652 As LSPs are received (by the normal operation of the update process) 653 over any interface, the corresponding LSPID entry is removed (it is 654 also removed if an LSP arrives before the CSNP containing the 655 reference). When an LSPID has been held in the list for its 656 indicated remaining lifetime, it is removed from the list. When the 657 list of LSPIDs is empty and the timer T1 has been cancelled for all 658 the interfaces that have an adjacency at this level, the timer T2 is 659 cancelled. 661 At this point, the local database is guaranteed to contain all the 662 LSP(s) (either the same sequence number or a more recent sequence 663 number) that were present in the neighbors' databases at the time of 664 (re)starting. LSPs that arrived in a neighbor's database after the 665 time of (re)starting may or may not be present, but the normal 666 operation of the update process will guarantee that they will 667 eventually be received. At this point, the local database is deemed 668 to be "synchronized".
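The LSPID bookkeeping described above can be sketched, purely for illustration, as a small Python class. SyncTracker and all of its members are hypothetical names; LSPIDs are modeled as strings and time as seconds. Keeping the first recorded lifetime when the same LSPID is reported over several interfaces is an assumption of this sketch, since the text only requires the recorded set to be the union.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class SyncTracker:
    """Hypothetical per-level bookkeeping for timer T2 on a (re)starting router."""
    pending: Dict[str, float] = field(default_factory=dict)  # LSPID -> deadline
    received: Set[str] = field(default_factory=set)          # LSPs already seen
    t1_running: Set[str] = field(default_factory=set)        # interfaces with T1 active
    t2_running: bool = True

    def record_csnp_set(self, entries: Iterable[Tuple[str, int]],
                        now: float) -> None:
        # Record LSPIDs from the first complete CSNP set of one interface.
        for lsp_id, remaining_lifetime in entries:
            if remaining_lifetime == 0:
                continue  # zero-lifetime LSPs are not recorded
            if lsp_id in self.received:
                continue  # the LSP arrived before the CSNP that references it
            # Assumption: the first recorded deadline wins for duplicates.
            self.pending.setdefault(lsp_id, now + remaining_lifetime)

    def lsp_received(self, lsp_id: str) -> None:
        # An LSP arrived via the normal update process on any interface.
        self.received.add(lsp_id)
        self.pending.pop(lsp_id, None)

    def age(self, now: float) -> None:
        # Drop entries held for their indicated remaining lifetime.
        self.pending = {k: d for k, d in self.pending.items() if d > now}

    def t1_cancelled(self, interface: str) -> None:
        self.t1_running.discard(interface)

    def maybe_cancel_t2(self) -> bool:
        # T2 is cancelled once the list is empty and no interface runs T1.
        if self.t2_running and not self.pending and not self.t1_running:
            self.t2_running = False
        return not self.t2_running
```

A caller would invoke maybe_cancel_t2() after each event; a True result corresponds to the point at which the local database is deemed synchronized for that level.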
670 Since LSPs mentioned in the CSNP(s) with a zero remaining lifetime 671 are not recorded, and those with a short remaining lifetime are 672 deleted from the list when the lifetime expires, cancellation of the 673 timer T2 will not be prevented by waiting for an LSP that will never 674 arrive. 676 2.4.1. LSP Generation and Flooding and SPF Computation 678 The operation of a router starting, as opposed to restarting, is 679 somewhat different. These two cases are dealt with separately below. 681 2.4.1.1. Restarting 683 In order to avoid causing unnecessary routing churn in other routers, 684 it is highly desirable that the router's own LSPs generated by the 685 restarting system are the same as those previously present in the 686 network (assuming no other changes have taken place). It is 687 important therefore not to regenerate and flood the LSPs until all 688 the adjacencies have been re-established and any information required 689 for propagation into the local LSPs is fully available. Ideally, the 690 information is loaded into the LSPs in a deterministic way, such that 691 the same information occurs in the same place in the same LSP (and 692 hence the LSPs are identical to their previous versions). If this 693 can be achieved, the new versions may not even cause SPF to be run in 694 other systems. However, provided the same information is included in 695 the set of LSPs (albeit in a different order, and possibly different 696 LSPs), the result of running the SPF will be the same and will not 697 cause churn to the forwarding tables. 699 In the case of a restarting router, none of the router's own LSPs are 700 transmitted, nor are the router's own forwarding tables updated while 701 the timer T3 is running. 703 Redistribution of inter-level information MUST be regenerated before 704 this router's LSP is flooded to other nodes. 
Therefore, the Level-n 705 non-pseudonode LSP(s) MUST NOT be flooded until the other level's T2 706 timer has expired and its SPF has been run. This ensures that any 707 inter-level information that is to be propagated can be included in 708 the Level-n LSP(s). 710 During this period, if one of the router's own (including 711 pseudonodes) LSPs is received, which the local router does not 712 currently have in its own database, it is NOT purged. Under normal 713 operation, such an LSP would be purged, since the LSP clearly should 714 not be present in the global LSP database. However, in the present 715 circumstances, this would be highly undesirable, because it could 716 cause premature removal of a router's own LSP -- and hence churn in 717 remote routers. Even if the local system has one or more of the 718 router's own LSPs (which it has generated, but not yet transmitted), 719 it is still not valid to compare the received LSP against this set, 720 since it may be that as a result of propagation between Level 1 and 721 Level 2 (or vice versa), a further router's own LSP will need to be 722 generated when the LSP databases have synchronized. 724 During this period, a restarting router SHOULD send CSNPs as it 725 normally would. Information about the router's own LSPs MAY be 726 included, but if it is included it MUST be based on LSPs that have 727 been received, not on versions that have been generated (but not yet 728 transmitted). This restriction is necessary to prevent premature 729 removal of an LSP from the global LSP database. 731 When the timer T2 expires or is cancelled indicating that 732 synchronization for that level is complete, the SPF for that level is 733 run in order to derive any information that is required to be 734 propagated to another level, but the forwarding tables are not yet 735 updated. 737 Once the other level's SPF has run and any inter-level propagation 738 has been resolved, the router's own LSPs can be generated and 739 flooded. 
Any own LSPs that were previously ignored, but that are not 740 part of the current set of own LSPs (including pseudonodes), MUST 741 then be purged. Note that it is possible that a Designated Router 742 change may have taken place, and consequently the router SHOULD purge 743 those pseudonode LSPs that it previously owned, but that are now no 744 longer part of its set of pseudonode LSPs. 746 When all the T2 timers have expired or been cancelled, the timer T3 747 is cancelled and the local forwarding tables are updated. 749 If the timer T3 expires before all the T2 timers have expired or been 750 cancelled, this indicates that the synchronization process is taking 751 longer than the minimum holding time of the neighbors. The router's 752 own LSP(s) for levels that have not yet completed their first SPF 753 computation are then flooded with the overload bit set to indicate 754 that the router's LSPDB is not yet synchronized (and therefore other 755 routers MUST NOT compute routes through this router). Normal 756 operation of the update process resumes, and the local forwarding 757 tables are updated. In order to prevent the neighbor's adjacencies 758 from expiring, IIHs with the normal interface value for the holding 759 time are transmitted over all interfaces with neither RR nor RA set 760 in the restart TLV. This will cause the neighbors to refresh their 761 adjacencies. The router's own LSP(s) will continue to have the 762 overload bit set until timer T2 has expired or been cancelled. 764 2.4.1.2. Starting 766 In the case of a starting router, as soon as each adjacency is 767 established, and before any CSNP exchanges, the router's own zeroth 768 LSP is transmitted with the overload bit set. This prevents other 769 routers from computing routes through the router until it has 770 reliably acquired the complete set of LSPs. 
The overload bit remains 771 set in subsequent transmissions of the zeroth LSP (such as will occur 772 if a previous copy of the router's own zeroth LSP is still present in 773 the network) while any timer T2 is running. 775 When all the T2 timers have been cancelled, the router's own LSP(s) 776 MAY be regenerated with the overload bit clear (assuming the router 777 is not in fact overloaded, and there is no other reason, such as 778 incomplete BGP convergence, to keep the overload bit set) and flooded 779 as normal. 781 Other LSPs owned by this router (including pseudonodes) are generated 782 and flooded as normal, irrespective of the timer T2. The SPF is also 783 run as normal and the Routing Information Base (RIB) and Forwarding 784 Information Base (FIB) are updated as routes become available. 786 To avoid the possible formation of temporary blackholes, the starting 787 router sets the SA bit in the restart TLV (as described in 788 Section 2.3.2) in all IIHs that it sends. 790 When all T2 timers have been cancelled, the starting router MUST 791 transmit IIHs with the SA bit clear. 793 3. State Tables 795 This section presents state tables that summarize the behaviors 796 described in this document. Other behaviors, in particular adjacency 797 state transitions and LSP database update operation, are NOT included 798 in the state tables except where this document modifies the behaviors 799 described in [ISO10589] and [RFC5303]. 801 The states named in the columns of the tables below are a mixture of 802 states that are specific to a single adjacency (ADJ suppressed, ADJ 803 Seen RA, ADJ Seen CSNP) and states that are indicative of the state 804 of the protocol instance (Running, Restarting, Starting, SPF Wait). 806 Three state tables are presented from the point of view of a running 807 router, a restarting router, and a starting router. 809 3.1.
Running Router

811   Event       |       Running        | ADJ suppressed
812  ==============================================================
813   RX PR       | Set Planned Restart  |
814               | state.               |
815               | Send PA              |
816  -------------+----------------------+-------------------------
817   RX PR clr   | Clear Planned        |
818   and RR clr  | Restart State        |
819  -------------+----------------------+-------------------------
820   RX RR       | Maintain ADJ State   |
821               | Send RA              |
822               | Set SRM, send CSNP   |
823               |  (Note 1)            |
824               | Update Hold Time,    |
825               |  set Restart Mode    |
826               |  (Note 2)            |
827  -------------+----------------------+-------------------------
828   RX RR clr   | Clr Restart mode     |
829  -------------+----------------------+-------------------------
830   RX SA       | Suppress IS neighbor |
831               |  TLV in LSP(s)       |
832               | Goto ADJ Suppressed  |
833  -------------+----------------------+-------------------------
834   RX SA clr   |                      | Unsuppress IS neighbor
835               |                      |  TLV in LSP(s)
836               |                      | Goto Running
837  ==============================================================

839  Note 1: CSNPs are sent by routers in accordance with Section 2.2.1c

841  Note 2: If Restart Mode clear

843  3.2. Restarting Router

845   Event      | Restarting         | ADJ Seen  | ADJ Seen  | SPF Wait
846              |                    | RA        | CSNP      |
847  ======================================================================
848   Restart    | Send PR            |           |           |
849   planned    |                    |           |           |
850  ------------+--------------------+-----------+-----------+------------
851   Planned    | Send PR clr        |           |           |
852   restart    |                    |           |           |
853   canceled   |                    |           |           |
854  ------------+--------------------+-----------+-----------+------------
855   Router     | Send IIH/RR        |           |           |
856   restarts   | ADJ Init           |           |           |
857              | Start T1,T2,T3     |           |           |
858  ------------+--------------------+-----------+-----------+------------
859   RX RR      | Send RA            |           |           |
860  ------------+--------------------+-----------+-----------+------------
861   RX RA      | Adjust T3          |           | Cancel T1 |
862              | Goto ADJ Seen RA   |           | Adjust T3 |
863  ------------+--------------------+-----------+-----------+------------
864   RX CSNP set| Goto ADJ Seen CSNP | Cancel T1 |           |
865  ------------+--------------------+-----------+-----------+------------
866   RX IIH w/o | Cancel T1 (Point-  |           |           |
867   Restart TLV| to-point only)     |           |           |
868  ------------+--------------------+-----------+-----------+------------
869   T1 expires | Send IIH/RR        |Send IIH/RR|Send IIH/RR|
870              | Restart T1         | Restart T1| Restart T1|
871  ------------+--------------------+-----------+-----------+------------
872   T1 expires | Send IIH/          | Send IIH/ | Send IIH/ |
873   nth time   | normal             | normal    | normal    |
874  ------------+--------------------+-----------+-----------+------------
875   T2 expires | Trigger SPF        |           |           |
876              | Goto SPF Wait      |           |           |
877  ------------+--------------------+-----------+-----------+------------
878   T3 expires | Set overload bit   |           |           |
879              | Flood local LSPs   |           |           |
880              | Update fwd plane   |           |           |
881  ------------+--------------------+-----------+-----------+------------
882   LSP DB Sync| Cancel T2, and T3  |           |           |
883              | Trigger SPF        |           |           |
884              | Goto SPF wait      |           |           |
885  ------------+--------------------+-----------+-----------+------------
886   All SPF    |                    |           |           | Clear
887   done       |                    |           |           | overload bit
888              |                    |           |           | Update fwd
889              |                    |           |           | plane
890              |                    |           |           | Flood local
891              |                    |           |           | LSPs
892              |                    |           |           | Goto Running
893  ======================================================================

895  3.3. Starting Router

896   Event       | Starting          | ADJ Seen RA| ADJ Seen CSNP
897  ==============================================================
898   Router      | Send IIH/SA       |            |
899   starts      | Start T1,T2       |            |
900  -------------+-------------------+------------+---------------
901   RX RR       | Send RA           |            |
902  -------------+-------------------+------------+---------------
903   RX RA       | Goto ADJ Seen RA  |            | Cancel T1
904  -------------+-------------------+------------+---------------
905   RX CSNP Set | Goto ADJ Seen CSNP| Cancel T1  |
906  -------------+-------------------+------------+---------------
907   RX IIH w    | Cancel T1         |            |
908   no Restart  | (Point-to-Point   |            |
909   TLV         | only)             |            |
910  -------------+-------------------+------------+---------------
911   ADJ UP      | Start T1          |            |
912               | Send local LSPs   |            |
913               | with overload bit |            |
914               | set               |            |
915  -------------+-------------------+------------+---------------
916   T1 expires  | Send IIH/RR       |Send IIH/RR | Send IIH/RR
917               | and SA            | and SA     | and SA
918               | Restart T1        |Restart T1  | Restart T1
919  -------------+-------------------+------------+---------------
920   T1 expires  | Send IIH/SA       |Send IIH/SA | Send IIH/SA
921   nth time    |                   |            |
922  -------------+-------------------+------------+---------------
923   T2 expires  | Clear overload bit|            |
924               | Send IIH normal   |            |
925               | Goto Running      |            |
926  -------------+-------------------+------------+---------------
927   LSP DB Sync | Cancel T2         |            |
928               | Clear overload bit|            |
929               | Send IIH normal   |            |
930  ==============================================================

932  4. IANA Considerations

934  This document defines the following IS-IS TLV that is listed in the
935  IS-IS TLV codepoint registry:

937  Type    Description                          IIH   LSP   SNP
938  ----    -----------------------------------  ---   ---   ---
939  211     Restart TLV                           y     n     n

941  5.
Security Considerations 943 Any new security issues raised by the procedures in this document 944 depend upon the ability of an attacker to inject a false but 945 apparently valid IIH, the ease/difficulty of which has not been 946 altered. 948 If the RR bit is set in a false IIH, neighbors who receive such an 949 IIH will continue to maintain an existing adjacency in the "UP" state 950 and may (re)send a complete set of CSNPs. While the latter action is 951 wasteful, neither action causes any disruption in correct protocol 952 operation. 954 If the RA bit is set in a false IIH, a (re)starting router that 955 receives such an IIH may falsely believe that there is a neighbor on 956 the corresponding interface that supports the procedures described in 957 this document. In the absence of receipt of a complete set of CSNPs 958 on that interface, this could delay the completion of (re)start 959 procedures by requiring the timer T1 to time out the locally defined 960 maximum number of retries. This behavior is the same as would occur 961 on a LAN where none of the (re)starting router's neighbors support 962 the procedures in this document and is covered in Sections 2.3.1 and 963 2.3.2. 965 If an SA bit is set in a false IIH, this could cause suppression of 966 the advertisement of an IS neighbor, which could either continue for 967 an indefinite period or occur intermittently with the result being a 968 possible loss of reachability to some destinations in the network 969 and/or increased frequency of LSP flooding and SPF calculation. 971 The possibility of IS-IS PDU spoofing can be reduced by the use of 972 authentication as described in [RFC1195] and [ISO10589], and 973 especially the use of cryptographic authentication as described in 974 [RFC5304] and [RFC5310]. 976 6. 
Manageability Considerations 978 These extensions, which have been designed, developed, and deployed for 979 many years, do not have any new impact on management and operation of 980 the IS-IS protocol via this standardization process. 982 7. Acknowledgements 984 For RFC 5306, the authors acknowledged contributions made by Jeff 985 Parker, Radia Perlman, Mark Schaefer, Naiming Shen, Nischal Sheth, 986 Russ White, and Rena Yang. 988 The authors of this updated version acknowledge the contribution of 989 Mike Shand, co-author of RFC 5306. 991 8. Normative References 993 [ISO10589] 994 International Organization for Standardization, 995 "Intermediate system to Intermediate system intra-domain 996 routeing information exchange protocol for use in 997 conjunction with the protocol for providing the 998 connectionless-mode Network Service (ISO 8473)", ISO/ 999 IEC 10589:2002, Second Edition, Nov 2002. 1001 [RFC1195] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and 1002 dual environments", RFC 1195, DOI 10.17487/RFC1195, 1003 December 1990. 1005 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1006 Requirement Levels", BCP 14, RFC 2119, 1007 DOI 10.17487/RFC2119, March 1997. 1010 [RFC5303] Katz, D., Saluja, R., and D. Eastlake 3rd, "Three-Way 1011 Handshake for IS-IS Point-to-Point Adjacencies", RFC 5303, 1012 DOI 10.17487/RFC5303, October 2008. 1015 [RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic 1016 Authentication", RFC 5304, DOI 10.17487/RFC5304, October 1017 2008. 1019 [RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R., 1020 and M. Fanto, "IS-IS Generic Cryptographic 1021 Authentication", RFC 5310, DOI 10.17487/RFC5310, February 1022 2009. 1024 [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 1025 (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.
1028 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1029 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1030 May 2017. 1032 Appendix A. Summary of Changes from RFC 5306 1034 This document extends RFC 5306 by introducing support for signalling 1035 the neighbors of a restarting router that a planned restart is about 1036 to occur. This allows the neighbors to be aware of the state of the 1037 restarting router so that appropriate action may be taken if other 1038 topology changes occur while the planned restart is in progress. 1039 Since the forwarding plane of the restarting router is maintained 1040 based upon the pre-restart state of the network, additional topology 1041 changes introduce the possibility that traffic may be lost if paths 1042 via the restarting router continue to be used while the restart is in 1043 progress. 1045 In support of this new functionality, two new flags have been 1046 introduced: 1048 PR - Restart is planned 1049 PA - Planned restart acknowledgement 1051 No changes to the post-restart exchange between the restarting router 1052 and its neighbors have been introduced. 1054 Authors' Addresses 1056 Les Ginsberg 1057 Cisco Systems, Inc. 1059 Email: ginsberg@cisco.com 1061 Paul Wells 1062 Cisco Systems, Inc. 1064 Email: pauwells@cisco.com