IS-IS for IP Internets                                       L. Ginsberg
Internet-Draft                                                  P. Wells
Obsoletes: 5306 (if approved)                        Cisco Systems, Inc.
Intended status: Standards Track                       December 13, 2018
Expires: June 16, 2019

                      Restart Signaling for IS-IS
                   draft-ietf-lsr-isis-rfc5306bis-01

Abstract

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a router to
   signal to its neighbors that it is preparing to initiate a restart
   while maintaining forwarding plane state. This allows the neighbors
   to maintain their adjacencies until the router has restarted, but
   also allows the neighbors to bring the adjacencies down in the event
   of other topology changes.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved Link State Protocol Data
   Unit (LSP) database synchronization with its neighbors and a
   mechanism to optimize LSP database synchronization, while minimizing
   transient routing disruption when a router starts.

   This document obsoletes RFC 5306.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 16, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Overview
   2. Approach
      2.1. Timers
      2.2. Restart TLV
         2.2.1. Use of RR and RA Bits
         2.2.2. Use of the SA Bit
         2.2.3. Use of PR and PA Bits
      2.3. Adjacency (Re)Acquisition
         2.3.1. Adjacency Reacquisition during Restart
         2.3.2. Adjacency Acquisition during Start
         2.3.3. Multiple Levels
      2.4. Database Synchronization
         2.4.1. LSP Generation and Flooding and SPF Computation
   3. State Tables
      3.1. Running Router
      3.2. Restarting Router
      3.3. Starting Router
   4. IANA Considerations
   5. Security Considerations
   6. Manageability Considerations
   7. Acknowledgements
   8. Normative References
   Appendix A. Summary of Changes from RFC 5306
   Authors' Addresses

1. Overview

   The Intermediate System to Intermediate System (IS-IS) routing
   protocol [RFC1195] [ISO10589] is a link state intra-domain routing
   protocol. Normally, when an IS-IS router is restarted, temporary
   disruption of routing occurs due to events in both the restarting
   router and the neighbors of the restarting router.

   The router that has been restarted computes its own routes before
   achieving database synchronization with its neighbors. The results
   of this computation are likely to be non-convergent with the routes
   computed by other routers in the area/domain.

   Neighbors of the restarting router detect the restart event and cycle
   their adjacencies with the restarting router through the down state.
   The cycling of the adjacency state causes the neighbors to regenerate
   their LSPs describing the adjacency concerned. This in turn causes a
   temporary disruption of routes passing through the restarting router.

   In certain scenarios, the temporary disruption of the routes is
   highly undesirable. This document describes mechanisms to avoid or
   minimize the disruption due to both of these causes.

   When an adjacency is reinitialized as a result of a neighbor
   restarting, a router does three things:

   1. It causes its own LSP(s) to be regenerated, thus triggering SPF
      runs throughout the area (or in the case of Level 2, throughout
      the domain).

   2. It sets SRMflags on its own LSP database on the adjacency
      concerned.

   3. In the case of a Point-to-Point link, it transmits a complete set
      of Complete Sequence Number PDUs (CSNPs) over the adjacency.

   In the case of a restarting router process, the first of these is
   highly undesirable, but the second is essential in order to ensure
   synchronization of the LSP database.

   The third action above minimizes the number of LSPs that must be
   exchanged and, if made reliable, provides a means of determining when
   the LSP databases of the neighboring routers have been synchronized.
   This is desirable whether or not the router is being restarted (so
   that the overload bit can be cleared in the router's own LSP, for
   example).

   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting, allowing them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

   This document additionally describes a mechanism for a restarting
   router to determine when it has achieved LSP database synchronization
   with its neighbors and a mechanism to optimize LSP database
   synchronization and minimize transient routing disruption when a
   router starts.

   It is assumed that the three-way handshake [RFC5303] is being used on
   Point-to-Point circuits.

2. Approach

2.1. Timers

   Three additional timers, T1, T2, and T3, are required to support the
   functionality defined in this document.

   An instance of the timer T1 is maintained per interface, and
   indicates the time after which an unacknowledged (re)start attempt
   will be repeated. A typical value might be 3 seconds.

   An instance of the timer T2 is maintained for each LSP database
   (LSPDB) present in the system, i.e., for a Level 1/2 system, there
   will be an instance of the timer T2 for Level 1 and an instance for
   Level 2. This is the maximum time that the system will wait for
   LSPDB synchronization. A typical value might be 60 seconds.

   A single instance of the timer T3 is maintained for the entire
   system.
   It indicates the time after which the router will declare
   that it has failed to achieve database synchronization (by setting
   the overload bit in its own LSP). This is initialized to 65535
   seconds, but is set to the minimum of the remaining times of received
   IS-IS Hellos (IIHs) containing a restart TLV with the Restart
   Acknowledgement (RA) set and an indication that the neighbor has an
   adjacency in the "UP" state to the restarting router.

   NOTE: The timer T3 is only used by a restarting router.

2.2. Restart TLV

   A new TLV is defined to be included in IIH PDUs. The presence of
   this TLV indicates that the sender supports the functionality defined
   in this document and it carries flags that are used to convey
   information during a (re)start. All IIHs transmitted by a router
   that supports this capability MUST include this TLV.

   Type 211

   Length: Number of octets in the Value field (1 to (3 + ID Length))

   Value

                                    No. of octets
      +-----------------------+
      | Flags                 |     1
      +-----------------------+
      | Remaining Time        |     2
      +-----------------------+
      | Restarting Neighbor ID|     ID Length
      +-----------------------+

   Flags (1 octet)

        0  1  2  3  4  5  6  7
      +--+--+--+--+--+--+--+--+
      |Reserved|PA|PR|SA|RA|RR|
      +--+--+--+--+--+--+--+--+

      RR - Restart Request
      RA - Restart Acknowledgement
      SA - Suppress adjacency advertisement
      PR - Restart is planned
      PA - Planned restart acknowledgement

   (Note: Remaining fields are required when the RA bit is set.)

   Remaining Time (2 octets)

      Remaining holding time (in seconds)

   Restarting Neighbor System ID (ID Length octets)

      The System ID of the neighbor to which an RA refers. Note:
      Implementations based on earlier versions of this document may not
      include this field in the TLV when the RA is set.
      In this case, a
      router that is expecting an RA on a LAN circuit SHOULD assume that
      the acknowledgement is directed at the local system.

2.2.1. Use of RR and RA Bits

   The RR bit is used by a (re)starting router to signal to its
   neighbors that a (re)start is in progress, that an existing adjacency
   SHOULD be maintained even under circumstances when the normal
   operation of the adjacency state machine would require the adjacency
   to be reinitialized, to request a set of CSNPs, and to request
   setting of the SRMflags.

   The RA bit is sent by the neighbor of a (re)starting router to
   acknowledge the receipt of a restart TLV with the RR bit set.

   When the neighbor of a (re)starting router receives an IIH with the
   restart TLV having the RR bit set, if there exists on this interface
   an adjacency in state "UP" with the same System ID, and in the case
   of a LAN circuit, with the same source LAN address, then,
   irrespective of the other contents of the "Intermediate System
   Neighbors" option (LAN circuits) or the "Point-to-Point Three-Way
   Adjacency" option (Point-to-Point circuits):

   a. the state of the adjacency is not changed. If this is the first
      IIH with the RR bit set that this system has received associated
      with this adjacency, then the adjacency is marked as being in
      "Restart mode" and the adjacency holding time is refreshed --
      otherwise, the holding time is not refreshed. The "remaining
      time" transmitted according to (b) below MUST reflect the actual
      time after which the adjacency will now expire. Receipt of a
      normal IIH with the RR bit reset will clear the "Restart mode"
      state. This procedure allows the restarting router to cause the
      neighbor to maintain the adjacency long enough for restart to
      successfully complete, while also preventing repetitive restarts
      from maintaining an adjacency indefinitely.
      Whether or not an
      adjacency is marked as being in "Restart mode" has no effect on
      adjacency state transitions.

   b. immediately (i.e., without waiting for any currently running
      timer interval to expire, but with a small random delay of a few
      tens of milliseconds on LANs to avoid "storms") transmit over the
      corresponding interface an IIH including the restart TLV with the
      RR bit clear and the RA bit set, in the case of Point-to-Point
      adjacencies having updated the "Point-to-Point Three-Way
      Adjacency" option to reflect any new values received from the
      (re)starting router. (This allows a restarting router to quickly
      acquire the correct information to place in its hellos.) The
      "Remaining Time" MUST be set to the time (in seconds) remaining
      before the holding timer on this adjacency is due to expire. If
      the corresponding interface is a LAN interface, then the
      Restarting Neighbor System ID SHOULD be set to the System ID of
      the router from which the IIH with the RR bit set was received.
      This is required to correctly associate the acknowledgement and
      holding time in the case where multiple systems on a LAN restart
      at approximately the same time. This IIH SHOULD be transmitted
      before any LSPs or SNPs are transmitted as a result of the
      receipt of the original IIH.

   c. if the corresponding interface is a Point-to-Point interface, or
      if the receiving router has the highest LnRouterPriority (with
      the highest source MAC (Media Access Control) address breaking
      ties) among those routers to which the receiving router has an
      adjacency in state "UP" on this interface whose IIHs contain the
      restart TLV, excluding adjacencies to all routers which are
      considered in "Restart mode" (note the actual DIS is NOT changed
      by this process), initiate the transmission over the
      corresponding interface of a complete set of CSNPs, and set
      SRMflags on the corresponding interface for all LSPs in the local
      LSP database.

   Otherwise (i.e., if there was no adjacency in the "UP" state to the
   System ID in question), process the IIH as normal by reinitializing
   the adjacency and setting the RA bit in the returned IIH.

2.2.2. Use of the SA Bit

   The SA bit is used by a starting router to request that its neighbor
   suppress advertisement of the adjacency to the starting router in the
   neighbor's LSPs.

   A router that is starting has no maintained forwarding function
   state. This may or may not be the first time the router has started.
   If this is not the first time the router has started, copies of LSPs
   generated by this router in its previous incarnation may exist in the
   LSP databases of other routers in the network. These copies are
   likely to appear "newer" than LSPs initially generated by the
   starting router due to the reinitialization of LSP fragment sequence
   numbers by the starting router. This may cause temporary blackholes
   to occur until the normal operation of the update process causes the
   starting router to regenerate and flood copies of its own LSPs with
   higher sequence numbers.
   The temporary blackholes can be avoided if
   the starting router's neighbors suppress advertising an adjacency to
   the starting router until the starting router has been able to
   propagate newer versions of LSPs generated by previous incarnations.

   When a router receives an IIH with the restart TLV having the SA bit
   set, if there exists on this interface an adjacency in state "UP"
   with the same System ID, and in the case of a LAN circuit, with the
   same source LAN address, then the router MUST suppress advertisement
   of the adjacency to the neighbor in its own LSPs. Until an IIH with
   the SA bit clear has been received, the neighbor advertisement MUST
   continue to be suppressed. If the adjacency transitions to the "UP"
   state, the new adjacency MUST NOT be advertised until an IIH with the
   SA bit clear has been received.

   Note that a router that suppresses advertisement of an adjacency MUST
   NOT use this adjacency when performing its SPF calculation. In
   particular, if an implementation follows the example guidelines
   presented in [ISO10589], Annex C.2.5, Step 0:b) "pre-load TENT with
   the local adjacency database", the suppressed adjacency MUST NOT be
   loaded into TENT.

2.2.3. Use of PR and PA Bits

   The PR bit is used by a router which is planning to initiate a
   restart to signal to its neighbors that it will be restarting.

   The PA bit is sent by the neighbor of a router planning to restart to
   acknowledge receipt of a restart TLV with the PR bit set.

   When the neighbor of a router planning a restart receives an IIH with
   the restart TLV having the PR bit set, if there exists on this
   interface an adjacency in state "UP" with the same System ID, and in
   the case of a LAN circuit, with the same source LAN address, then:

   a. if this is the first IIH with the PR bit set that this system has
      received associated with this adjacency, then the adjacency is
      marked as being in "Planned Restart state" and the adjacency
      holding time is refreshed -- otherwise, the holding time is not
      refreshed. The "remaining time" transmitted according to (b)
      below MUST reflect the actual time after which the adjacency will
      now expire. Receipt of a normal IIH with the PR bit reset will
      clear the "Planned Restart state". This procedure allows
      the router planning a restart to cause the neighbor to maintain
      the adjacency long enough for restart to successfully complete.
      Whether or not an adjacency is marked as being in "Planned
      Restart state" has no effect on adjacency state transitions.

   b. immediately (i.e., without waiting for any currently running
      timer interval to expire, but with a small random delay of a few
      tens of milliseconds on LANs to avoid "storms") transmit over the
      corresponding interface an IIH including the restart TLV with the
      PR bit clear and the PA bit set. The "Remaining Time" MUST be
      set to the time (in seconds) remaining before the holding timer
      on this adjacency is due to expire. If the corresponding interface
      is a LAN interface, then the Restarting Neighbor System ID SHOULD
      be set to the System ID of the router from which the IIH with the
      PR bit set was received. This is required to correctly associate
      the acknowledgement and holding time in the case where multiple
      systems on a LAN are planning a restart at approximately the same
      time.

   While a control plane restart is in progress it is expected that the
   restarting router will be unable to respond to topology changes.
   It
   is therefore useful to signal a planned restart (if the forwarding
   plane on the restarting router is maintained) so that the neighbors
   of the restarting router can determine whether it is safe to maintain
   the adjacency if other topology changes occur prior to the completion
   of the restart. Signalling a planned restart in the absence of
   maintained forwarding plane state is likely to lead to significant
   traffic loss and MUST NOT be done.

   Neighbors of the router which has signaled planned restart SHOULD
   maintain the adjacency in a planned restart state until they receive
   an IIH with the RR bit set, receive an IIH with both PR and RR bits
   clear, or the adjacency holding time expires, whichever occurs
   first.

   While the adjacency is in planned restart state the following actions
   MAY be taken:

   a. If additional topology changes occur, the adjacency which is in
      planned restart state MAY be brought down even though the hold
      time has not yet expired. Given that the neighbor which has
      signaled a planned restart is not expected to update its
      forwarding plane in response to signaling of the topology changes
      (since it is restarting) traffic which transits that node is at
      risk of being improperly forwarded. On a LAN circuit, if the
      router in planned restart state is the DIS at any supported
      level, the adjacency(ies) SHOULD be brought down whenever any LSP
      update is either generated or received so as to trigger a new DIS
      election. Failure to do so will compromise the reliability of
      the Update Process on that circuit. What other criteria are used
      to determine what topology changes will trigger bringing the
      adjacency down is a local implementation decision.

   b. If a BFD session to the neighbor which signals a planned restart
      is in the UP state and subsequently goes DOWN, the event MAY be
      ignored since it is possible this is an expected side effect of
      the restart. Use of the Control Plane Independent state as
      signalled in BFD control packets [RFC5880] SHOULD be considered
      in the decision to ignore a BFD Session DOWN event.

   c. On a Point-to-Point circuit, transmission of LSPs, CSNPs, and
      PSNPs MAY be suppressed. It is expected that the PDUs will not
      be received.

   Use of the PR bit provides a means to safely support restart periods
   which are significantly longer than standard holdtimes.

2.3. Adjacency (Re)Acquisition

   Adjacency (re)acquisition is the first step in (re)initialization.
   Restarting and starting routers will make use of the RR bit in the
   restart TLV, though each will use it at different stages of the
   (re)start procedure.

2.3.1. Adjacency Reacquisition during Restart

   The restarting router explicitly notifies its neighbor that the
   adjacency is being reacquired, and hence that it SHOULD NOT
   reinitialize the adjacency. This is achieved by setting the RR bit
   in the restart TLV. When the neighbor of a restarting router
   receives an IIH with the restart TLV having the RR bit set, if there
   exists on this interface an adjacency in state "UP" with the same
   System ID, and in the case of a LAN circuit, with the same source LAN
   address, then the procedures described in Section 2.2.1 are followed.

   A router that does not support the restart capability will ignore the
   restart TLV and reinitialize the adjacency as normal, returning an
   IIH without the restart TLV.

   On restarting, a router initializes the timer T3, starts the timer T2
   for each LSPDB, and for each interface (and in the case of a LAN
   circuit, for each level) starts the timer T1 and transmits an IIH
   containing the restart TLV with the RR bit set.
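
   As an illustration of the restart TLV carried in these IIHs, the
   following Python sketch encodes the TLV laid out in Section 2.2. It
   is not part of the specification. The flag masks assume that bit 7
   in the Flags diagram is the least significant bit of the octet (so
   RR is 0x01), and the names encode_restart_tlv, RESTART_TLV_TYPE, and
   the mask constants are hypothetical.

```python
import struct
from typing import Optional

RESTART_TLV_TYPE = 211  # TLV type from Section 2.2

# Assumed masks: bit 7 of the Flags diagram taken as the least
# significant bit of the octet.
RR, RA, SA, PR, PA = 0x01, 0x02, 0x04, 0x08, 0x10


def encode_restart_tlv(flags: int,
                       remaining_time: Optional[int] = None,
                       neighbor_id: Optional[bytes] = None) -> bytes:
    """Encode Type, Length, and Value of a restart TLV.

    remaining_time and neighbor_id are only appended when supplied;
    per Section 2.2 they are required when the RA bit is set.
    """
    value = bytes([flags & 0xFF])
    if remaining_time is not None:
        value += struct.pack("!H", remaining_time)  # 2 octets, network order
        if neighbor_id is not None:
            value += neighbor_id                    # ID Length octets
    return bytes([RESTART_TLV_TYPE, len(value)]) + value


# A restarting router's initial IIH carries only the Flags octet with RR set.
restart_request = encode_restart_tlv(RR)

# A neighbor's acknowledgement on a LAN carries RA, the remaining holding
# time, and the System ID of the restarting router (6 octets assumed here).
ack = encode_restart_tlv(RA, remaining_time=27,
                         neighbor_id=bytes.fromhex("000000000001"))
```

   The Length octet is derived from the fields actually present, which
   keeps the encoder within the 1 to (3 + ID Length) range given in
   Section 2.2.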

   On a Point-to-Point circuit, the restarting router SHOULD set the
   "Adjacency Three-Way State" to "Init", because the receipt of the
   acknowledging IIH (with RA set) MUST cause the adjacency to enter the
   "UP" state immediately.

   On a LAN circuit, the LAN-ID assigned to the circuit SHOULD be the
   same as that used prior to the restart. In particular, for any
   circuits for which the restarting router was previously DIS, the use
   of a different LAN-ID would necessitate the generation of a new set
   of pseudonode LSPs, and corresponding changes in all the LSPs
   referencing them from other routers on the LAN. By preserving the
   LAN-ID across the restart, this churn can be prevented. To enable a
   restarting router to learn the LAN-ID used prior to restart, the
   LAN-ID specified in an IIH with RR set MUST be ignored.

   Transmission of "normal" IIHs is inhibited until the conditions
   described below are met (in order to avoid causing an unnecessary
   adjacency initialization). Upon expiry of the timer T1, it is
   restarted and the IIH is retransmitted as above.

   When a restarting router receives an IIH, a local adjacency is
   established as usual, and if the IIH contains a restart TLV with the
   RA bit set (and on LAN circuits with a Restarting Neighbor System ID
   that matches that of the local system), the receipt of the
   acknowledgement over that interface is noted. When the RA bit is set
   and the state of the remote adjacency is "UP", then the timer T3 is
   set to the minimum of its current value and the value of the
   "Remaining Time" field in the received IIH.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable.
   However, since no CSNP is
   guaranteed to be received over this interface, the timer T1 is
   cancelled immediately without waiting for a complete set of CSNPs.
   Synchronization may therefore be deemed complete even though there
   are some LSPs which are held (only) by this neighbor (see
   Section 2.4). In this case, we also want to be certain that the
   neighbor will reinitialize the adjacency in order to guarantee that
   the SRMflags have been set on its database, thus ensuring eventual
   LSPDB synchronization. This is guaranteed to happen except in the
   case where the Adjacency Three-Way State in the received IIH is "UP"
   and the Neighbor Extended Local Circuit ID matches the extended local
   circuit ID assigned by the restarting router. In this case, the
   restarting router MUST force the adjacency to reinitialize by setting
   the local Adjacency Three-Way State to "DOWN" and sending a normal
   IIH.

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart. Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries.

   In the case of a Point-to-Point circuit, the "LocalCircuitID" and
   "Extended Local Circuit ID" information contained in the IIH can be
   used immediately to generate an IIH containing the correct three-way
   handshake information. The presence of "Neighbor Extended Local
   Circuit ID" information that does not match the value currently in
   use by the local system is ignored (since the IIH may have been
   transmitted before the neighbor had received the new value from the
   restarting router), but the adjacency remains in the initializing
   state until the correct information is received.
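
   The Point-to-Point receive rules given earlier in this section can
   be summarized in a short Python sketch. It is illustrative only and
   rests on stated assumptions: the "state of the remote adjacency" is
   taken from the Adjacency Three-Way State carried in the received
   IIH, and the names ReceivedIIH and process_p2p_iih are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

T3_INITIAL = 65535  # seconds, initial T3 value from Section 2.1


@dataclass
class ReceivedIIH:
    has_restart_tlv: bool
    ra: bool = False
    remaining_time: Optional[int] = None
    three_way_state: str = "DOWN"            # state advertised by the neighbor
    neighbor_ext_circuit_id: Optional[int] = None


def process_p2p_iih(iih: ReceivedIIH, t3: int,
                    local_ext_circuit_id: int) -> Tuple[int, bool, bool, bool]:
    """Return (new_t3, ack_noted, cancel_t1_now, force_reinit)."""
    if not iih.has_restart_tlv:
        # Neighbor is not restart capable: treat as an acknowledgement and
        # cancel T1 at once. Force reinitialization only when the neighbor
        # would otherwise keep the adjacency UP.
        force = (iih.three_way_state == "UP"
                 and iih.neighbor_ext_circuit_id == local_ext_circuit_id)
        return t3, True, True, force
    if iih.ra:
        # Acknowledgement noted. T1 keeps running until a complete set of
        # CSNPs has also been received.
        if iih.three_way_state == "UP" and iih.remaining_time is not None:
            t3 = min(t3, iih.remaining_time)
        return t3, True, False, False
    return t3, False, False, False


acked = process_p2p_iih(
    ReceivedIIH(True, ra=True, remaining_time=27, three_way_state="UP"),
    T3_INITIAL, local_ext_circuit_id=1)
legacy = process_p2p_iih(
    ReceivedIIH(False, three_way_state="UP", neighbor_ext_circuit_id=1),
    T3_INITIAL, local_ext_circuit_id=1)
```

   Returning the decisions as a tuple keeps the sketch free of timer
   and transmission machinery, which a real implementation would drive
   from these results.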

   In the case of a LAN circuit, the source neighbor information (e.g.,
   SNPAAddress) is recorded and used for adjacency establishment and
   maintenance as normal.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled.

   Once the timer T1 has been cancelled, subsequent IIHs are transmitted
   according to the normal algorithms, but including the restart TLV
   with both RR and RA clear.

   If a LAN contains a mixture of systems, only some of which support
   the new algorithm, database synchronization is still guaranteed, but
   the "old" systems will have reinitialized their adjacencies.

   If an interface is active, but does not have any neighboring router
   reachable over that interface, the timer T1 would never be cancelled,
   and according to Section 2.4.1.1, the SPF would never be run.
   Therefore, timer T1 is cancelled after some predetermined number of
   expirations (which MAY be 1).

2.3.2. Adjacency Acquisition during Start

   The starting router wants to ensure that in the event that a
   neighboring router has an adjacency to the starting router in the
   "UP" state (from a previous incarnation of the starting router), this
   adjacency is reinitialized. The starting router also wants
   neighboring routers to suppress advertisement of an adjacency to the
   starting router until LSP database synchronization is achieved. This
   is achieved by sending IIHs with the RR bit clear and the SA bit set
   in the restart TLV. The RR bit remains clear and the SA bit remains
   set in subsequent transmissions of IIHs until the adjacency has
   reached the "UP" state and the initial T1 timer interval (see below)
   has expired.

   Receipt of an IIH with the RR bit clear will result in the
   neighboring router utilizing normal operation of the adjacency state
   machine.
   This will ensure that any old adjacency on the neighboring
   router will be reinitialized.

   Upon receipt of an IIH with the SA bit set, the behavior described in
   Section 2.2.2 is followed.

   Upon starting, a router starts timer T2 for each LSPDB.

   For each interface (and in the case of a LAN circuit, for each
   level), when an adjacency reaches the "UP" state, the starting router
   starts a timer T1 and transmits an IIH containing the restart TLV
   with the RR bit clear and SA bit set. Upon expiry of the timer T1,
   it is restarted and the IIH is retransmitted with both RR and SA bits
   set (only the RR bit has changed state from earlier IIHs).

   Upon receipt of an IIH with the RR bit set (regardless of whether or
   not the SA bit is set), the behavior described in Section 2.2.1 is
   followed.

   When an IIH is received by the starting router and the IIH contains a
   restart TLV with the RA bit set (and on LAN circuits with a
   Restarting Neighbor System ID that matches that of the local system),
   the receipt of the acknowledgement over that interface is noted.

   On a Point-to-Point link, receipt of an IIH not containing the
   restart TLV is also treated as an acknowledgement, since it indicates
   that the neighbor is not restart capable. Since the neighbor will
   have reinitialized the adjacency, this guarantees that SRMflags have
   been set on its database, thus ensuring eventual LSPDB
   synchronization. However, since no CSNP is guaranteed to be received
   over this interface, the timer T1 is cancelled immediately without
   waiting for a complete set of CSNPs. Synchronization may therefore
   be deemed complete even though there are some LSPs that are held
   (only) by this neighbor (see Section 2.4).
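
   The flag settings a starting router uses while T1 is still running,
   as described earlier in this section, can be sketched as follows.
   The sketch covers only the phase before T1 is cancelled. The masks
   assume RR = 0x01 and SA = 0x04 (bit 7 of the Flags diagram taken as
   least significant), and the function name is hypothetical.

```python
RR, RA, SA = 0x01, 0x02, 0x04  # assumed masks, see lead-in


def starting_router_flags(adjacency_up: bool, initial_t1_expired: bool) -> int:
    """Flags a starting router places in the restart TLV while T1 runs.

    SA stays set throughout. RR is added only once the adjacency is UP
    and the initial T1 interval has expired, which asks the neighbor for
    an acknowledgement and a set of CSNPs.
    """
    flags = SA
    if adjacency_up and initial_t1_expired:
        flags |= RR
    return flags


before_up = starting_router_flags(False, False)       # SA only
first_interval = starting_router_flags(True, False)   # SA only
after_expiry = starting_router_flags(True, True)      # RR and SA
```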

   In the case of a LAN interface, receipt of an IIH not containing the
   restart TLV is unremarkable since synchronization can still occur so
   long as at least one of the non-restarting neighboring routers on the
   LAN supports restart. Therefore, T1 continues to run in this case.
   If none of the neighbors on the LAN are restart capable, T1 will
   eventually expire after the locally defined number of retries. The
   usual operation of the update process will ensure that
   synchronization is eventually achieved.

   When BOTH a complete set of CSNPs (for each active level, in the case
   of a Point-to-Point circuit) and an acknowledgement have been
   received over the interface, the timer T1 is cancelled. Subsequent
   IIHs sent by the starting router have the RR and RA bits clear and
   the SA bit set in the restart TLV.

   Timer T1 is cancelled after some predetermined number of expirations
   (which MAY be 1).

   When the T2 timer(s) are cancelled or expire, transmission of
   "normal" IIHs (with RR, RA, and SA bits clear) will begin.

2.3.3. Multiple Levels

   A router that is operating as both a Level 1 and a Level 2 router on
   a particular interface MUST perform the above operations for each
   level.

   On a LAN interface, it MUST send and receive both Level 1 and Level 2
   IIHs and perform the CSNP synchronizations independently for each
   level.

   On a Point-to-Point interface, only a single IIH (indicating support
   for both levels) is required, but it MUST perform the CSNP
   synchronizations independently for each level.

2.4. Database Synchronization

   When a router is started or restarted, it can expect to receive a
   complete set of CSNPs over each interface. The arrival of the
   CSNP(s) is now guaranteed, since an IIH with the RR bit set will be
   retransmitted until the CSNP(s) are correctly received.

   The CSNPs describe the set of LSPs that are currently held by each
   neighbor.
Synchronization will be complete when all these LSPs have 642 been received. 644 When (re)starting, a router starts an instance of timer T2 for each 645 LSPDB as described in Section 3.3.1 or Section 3.3.2. In addition to 646 normal processing of the CSNPs, the set of LSPIDs contained in the 647 first complete set of CSNPs received over each interface is recorded, 648 together with their remaining lifetime. In the case of a LAN 649 interface, a complete set of CSNPs MUST consist of CSNPs received 650 from neighbors that are not restarting. If there are multiple 651 interfaces on the (re)starting router, the recorded set of LSPIDs is 652 the union of those received over each interface. LSPs with a 653 remaining lifetime of zero are NOT so recorded. 655 As LSPs are received (by the normal operation of the update process) 656 over any interface, the corresponding LSPID entry is removed (it is 657 also removed if an LSP arrives before the CSNP containing the 658 reference). When an LSPID has been held in the list for its 659 indicated remaining lifetime, it is removed from the list. When the 660 list of LSPIDs is empty and the timer T1 has been cancelled for all 661 the interfaces that have an adjacency at this level, the timer T2 is 662 cancelled. 664 At this point, the local database is guaranteed to contain all the 665 LSP(s) (either the same sequence number or a more recent sequence 666 number) that were present in the neighbors' databases at the time of 667 (re)starting. LSPs that arrived in a neighbor's database after the 668 time of (re)starting may or may not be present, but the normal 669 operation of the update process will guarantee that they will 670 eventually be received. At this point, the local database is deemed 671 to be "synchronized". 
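   The LSPID bookkeeping described above can be sketched as follows.
   This is a non-normative illustration for a single LSPDB (one
   level). The class SyncTracker, its method names, the interface
   name, and the LSPID strings are hypothetical. When the same LSPID
   is reported over several interfaces, the sketch keeps the first
   recorded lifetime, which is an assumption; this document does not
   specify which lifetime to keep.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class SyncTracker:
    """Outstanding-LSPID bookkeeping for one LSPDB while T2 runs."""

    pending: Dict[str, float] = field(default_factory=dict)  # LSPID -> deadline
    received: Set[str] = field(default_factory=set)          # LSPs already seen
    t1_running: Set[str] = field(default_factory=set)        # interfaces with T1

    def record_csnp_set(self, entries: Iterable[Tuple[str, int]],
                        now: float) -> None:
        """Record the first complete CSNP set seen on an interface."""
        for lsp_id, remaining_lifetime in entries:
            # Zero-lifetime entries are not recorded; neither are LSPs
            # that arrived before the CSNP that references them.
            if remaining_lifetime == 0 or lsp_id in self.received:
                continue
            # Union across interfaces; first recorded deadline is kept.
            self.pending.setdefault(lsp_id, now + remaining_lifetime)

    def lsp_received(self, lsp_id: str) -> None:
        """An LSP arrived over any interface via the update process."""
        self.received.add(lsp_id)
        self.pending.pop(lsp_id, None)

    def t2_can_be_cancelled(self, now: float) -> bool:
        """True when the list is empty and no interface still runs T1."""
        expired = [i for i, deadline in self.pending.items() if deadline <= now]
        for lsp_id in expired:
            del self.pending[lsp_id]  # held for its full remaining lifetime
        return not self.pending and not self.t1_running


tracker = SyncTracker(t1_running={"if0"})
tracker.lsp_received("R2.00-00")  # arrives before the CSNP mentions it
tracker.record_csnp_set(
    [("R2.00-00", 1200), ("R3.00-00", 0), ("R4.00-00", 30), ("R5.00-00", 900)],
    now=0.0,
)
tracker.lsp_received("R5.00-00")
tracker.t1_running.discard("if0")  # T1 cancelled on the only interface
```

   In this usage, only "R4.00-00" remains outstanding, so T2 cannot be
   cancelled until that LSP arrives or its recorded lifetime of 30
   seconds has elapsed.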
   Since LSPs mentioned in the CSNP(s) with a zero remaining lifetime
   are not recorded, and those with a short remaining lifetime are
   deleted from the list when the lifetime expires, cancellation of the
   timer T2 will not be prevented by waiting for an LSP that will never
   arrive.

2.4.1. LSP Generation and Flooding and SPF Computation

   The operation of a router starting, as opposed to restarting, is
   somewhat different. These two cases are dealt with separately below.

2.4.1.1. Restarting

   In order to avoid causing unnecessary routing churn in other routers,
   it is highly desirable that the restarting router's own LSPs are the
   same as those previously present in the network (assuming no other
   changes have taken place). It is important therefore not to
   regenerate and flood the LSPs until all the adjacencies have been
   re-established and any information required for propagation into the
   local LSPs is fully available. Ideally, the information is loaded
   into the LSPs in a deterministic way, such that the same information
   occurs in the same place in the same LSP (and hence the LSPs are
   identical to their previous versions). If this can be achieved, the
   new versions may not even cause SPF to be run in other systems.
   However, provided the same information is included in the set of
   LSPs (albeit in a different order, and possibly different LSPs), the
   result of running the SPF will be the same and will not cause churn
   to the forwarding tables.

   In the case of a restarting router, none of the router's own LSPs are
   transmitted, nor are the router's own forwarding tables updated while
   the timer T3 is running.

   Redistribution of inter-level information MUST be regenerated before
   this router's LSP is flooded to other nodes. Therefore, the Level-n
   non-pseudonode LSP(s) MUST NOT be flooded until the other level's T2
   timer has expired and its SPF has been run. This ensures that any
   inter-level information that is to be propagated can be included in
   the Level-n LSP(s).

   During this period, if one of the router's own (including
   pseudonodes) LSPs is received, which the local router does not
   currently have in its own database, it is NOT purged. Under normal
   operation, such an LSP would be purged, since the LSP clearly should
   not be present in the global LSP database. However, in the present
   circumstances, this would be highly undesirable, because it could
   cause premature removal of a router's own LSP -- and hence churn in
   remote routers. Even if the local system has one or more of the
   router's own LSPs (which it has generated, but not yet transmitted),
   it is still not valid to compare the received LSP against this set,
   since it may be that as a result of propagation between Level 1 and
   Level 2 (or vice versa), a further router's own LSP will need to be
   generated when the LSP databases have synchronized.

   During this period, a restarting router SHOULD send CSNPs as it
   normally would. Information about the router's own LSPs MAY be
   included, but if it is included it MUST be based on LSPs that have
   been received, not on versions that have been generated (but not yet
   transmitted). This restriction is necessary to prevent premature
   removal of an LSP from the global LSP database.

   When the timer T2 expires or is cancelled, indicating that
   synchronization for that level is complete, the SPF for that level is
   run in order to derive any information that is required to be
   propagated to another level, but the forwarding tables are not yet
   updated.

   Once the other level's SPF has run and any inter-level propagation
   has been resolved, the router's own LSPs can be generated and
   flooded. Any own LSPs that were previously ignored, but that are not
   part of the current set of own LSPs (including pseudonodes), MUST
   then be purged. Note that it is possible that a Designated Router
   change may have taken place, and consequently the router SHOULD purge
   those pseudonode LSPs that it previously owned, but that are now no
   longer part of its set of pseudonode LSPs.

   When all the T2 timers have expired or been cancelled, the timer T3
   is cancelled and the local forwarding tables are updated.

   If the timer T3 expires before all the T2 timers have expired or been
   cancelled, this indicates that the synchronization process is taking
   longer than the minimum holding time of the neighbors. The router's
   own LSP(s) for levels that have not yet completed their first SPF
   computation are then flooded with the overload bit set to indicate
   that the router's LSPDB is not yet synchronized (and therefore other
   routers MUST NOT compute routes through this router). Normal
   operation of the update process resumes, and the local forwarding
   tables are updated. In order to prevent the neighbors' adjacencies
   from expiring, IIHs with the normal interface value for the holding
   time are transmitted over all interfaces with neither RR nor RA set
   in the restart TLV. This will cause the neighbors to refresh their
   adjacencies. The router's own LSP(s) will continue to have the
   overload bit set until timer T2 has expired or been cancelled.

2.4.1.2. Starting

   In the case of a starting router, as soon as each adjacency is
   established, and before any CSNP exchanges, the router's own zeroth
   LSP is transmitted with the overload bit set. This prevents other
   routers from computing routes through the router until it has
   reliably acquired the complete set of LSPs.
   The overload bit remains set in subsequent transmissions of the
   zeroth LSP (such as will occur if a previous copy of the router's own
   zeroth LSP is still present in the network) while any timer T2 is
   running.

   When all the T2 timers have been cancelled, the router's own LSP(s)
   MAY be regenerated with the overload bit clear (assuming the router
   is not in fact overloaded, and there is no other reason, such as
   incomplete BGP convergence, to keep the overload bit set) and flooded
   as normal.

   Other LSPs owned by this router (including pseudonodes) are generated
   and flooded as normal, irrespective of the timer T2. The SPF is also
   run as normal, and the Routing Information Base (RIB) and Forwarding
   Information Base (FIB) are updated as routes become available.

   To avoid the possible formation of temporary blackholes, the starting
   router sets the SA bit in the restart TLV (as described in
   Section 2.3.2) in all IIHs that it sends.

   When all T2 timers have been cancelled, the starting router MUST
   transmit IIHs with the SA bit clear.

3. State Tables

   This section presents state tables that summarize the behaviors
   described in this document. Other behaviors, in particular adjacency
   state transitions and LSP database update operation, are NOT included
   in the state tables except where this document modifies the behaviors
   described in [ISO10589] and [RFC5303].

   The states named in the columns of the tables below are a mixture of
   states that are specific to a single adjacency (ADJ Suppressed, ADJ
   Seen RA, ADJ Seen CSNP) and states that are indicative of the state
   of the protocol instance (Running, Restarting, Starting, SPF Wait).

   Three state tables are presented, from the point of view of a running
   router, a restarting router, and a starting router, respectively.

3.1. Running Router

   Event        |       Running        | ADJ Suppressed
   ==============================================================
   RX PR        | Set Planned Restart  |
                |  State               |
                | Send PA              |
   -------------+----------------------+-------------------------
   RX PR clr    | Clear Planned        |
    and RR clr  |  Restart State       |
   -------------+----------------------+-------------------------
   RX RR        | Maintain ADJ State   |
                | Send RA              |
                | Set SRM, send CSNP   |
                |  (Note 1)            |
                | Update Hold Time,    |
                |  set Restart Mode    |
                |  (Note 2)            |
   -------------+----------------------+-------------------------
   RX RR clr    | Clr Restart Mode     |
   -------------+----------------------+-------------------------
   RX SA        | Suppress IS neighbor |
                |  TLV in LSP(s)       |
                | Goto ADJ Suppressed  |
   -------------+----------------------+-------------------------
   RX SA clr    |                      | Unsuppress IS neighbor
                |                      |  TLV in LSP(s)
                |                      | Goto Running
   ==============================================================

   Note 1: CSNPs are sent by routers in accordance with Section 2.2.1c.

   Note 2: If Restart Mode is clear.

3.2. Restarting Router

   Event       |     Restarting     | ADJ Seen  | ADJ Seen  |  SPF Wait
               |                    |    RA     |   CSNP    |
   ======================================================================
   Restart     | Send PR            |           |           |
    planned    |                    |           |           |
   ------------+--------------------+-----------+-----------+------------
   Planned     | Send PR clr        |           |           |
    restart    |                    |           |           |
    canceled   |                    |           |           |
   ------------+--------------------+-----------+-----------+------------
   Router      | Send IIH/RR        |           |           |
    restarts   | ADJ Init           |           |           |
               | Start T1,T2,T3     |           |           |
   ------------+--------------------+-----------+-----------+------------
   RX RR       | Send RA            |           |           |
   ------------+--------------------+-----------+-----------+------------
   RX RA       | Adjust T3          |           | Cancel T1 |
               | Goto ADJ Seen RA   |           | Adjust T3 |
   ------------+--------------------+-----------+-----------+------------
   RX CSNP set | Goto ADJ Seen CSNP | Cancel T1 |           |
   ------------+--------------------+-----------+-----------+------------
   RX IIH w/o  | Cancel T1 (Point-  |           |           |
    Restart TLV|  to-Point only)    |           |           |
   ------------+--------------------+-----------+-----------+------------
   T1 expires  | Send IIH/RR        |Send IIH/RR|Send IIH/RR|
               | Restart T1         | Restart T1| Restart T1|
   ------------+--------------------+-----------+-----------+------------
   T1 expires  | Send IIH/          | Send IIH/ | Send IIH/ |
    nth time   |  normal            |  normal   |  normal   |
   ------------+--------------------+-----------+-----------+------------
   T2 expires  | Trigger SPF        |           |           |
               | Goto SPF Wait      |           |           |
   ------------+--------------------+-----------+-----------+------------
   T3 expires  | Set overload bit   |           |           |
               | Flood local LSPs   |           |           |
               | Update fwd plane   |           |           |
   ------------+--------------------+-----------+-----------+------------
   LSP DB Sync | Cancel T2 and T3   |           |           |
               | Trigger SPF        |           |           |
               | Goto SPF Wait      |           |           |
   ------------+--------------------+-----------+-----------+------------
   All SPF     |                    |           |           | Clear
    done       |                    |           |           |  overload bit
               |                    |           |           | Update fwd
               |                    |           |           |  plane
               |                    |           |           | Flood local
               |                    |           |           |  LSPs
               |                    |           |           | Goto Running
   ======================================================================

3.3. Starting Router

   Event        |     Starting      | ADJ Seen RA| ADJ Seen CSNP
   ==============================================================
   Router       | Send IIH/SA       |            |
    starts      | Start T1,T2       |            |
   -------------+-------------------+------------+---------------
   RX RR        | Send RA           |            |
   -------------+-------------------+------------+---------------
   RX RA        | Goto ADJ Seen RA  |            | Cancel T1
   -------------+-------------------+------------+---------------
   RX CSNP set  | Goto ADJ Seen CSNP| Cancel T1  |
   -------------+-------------------+------------+---------------
   RX IIH w/o   | Cancel T1 (Point- |            |
    Restart TLV |  to-Point only)   |            |
   -------------+-------------------+------------+---------------
   ADJ UP       | Start T1          |            |
                | Send local LSPs   |            |
                |  with overload bit|            |
                |  set              |            |
   -------------+-------------------+------------+---------------
   T1 expires   | Send IIH/RR       | Send IIH/RR| Send IIH/RR
                |  and SA           |  and SA    |  and SA
                | Restart T1        | Restart T1 | Restart T1
   -------------+-------------------+------------+---------------
   T1 expires   | Send IIH/SA       | Send IIH/SA| Send IIH/SA
    nth time    |                   |            |
   -------------+-------------------+------------+---------------
   T2 expires   | Clear overload bit|            |
                | Send IIH normal   |            |
                | Goto Running      |            |
   -------------+-------------------+------------+---------------
   LSP DB Sync  | Cancel T2         |            |
                | Clear overload bit|            |
                | Send IIH normal   |            |
   ==============================================================

4. IANA Considerations

   This document defines the following IS-IS TLV that is listed in the
   IS-IS TLV codepoint registry:

   Type  Description                          IIH  LSP  SNP
   ----  -----------------------------------  ---  ---  ---
   211   Restart TLV                          y    n    n

5. Security Considerations

   Any new security issues raised by the procedures in this document
   depend upon the ability of an attacker to inject a false but
   apparently valid IIH, the ease/difficulty of which has not been
   altered.

   If the RR bit is set in a false IIH, neighbors that receive such an
   IIH will continue to maintain an existing adjacency in the "UP" state
   and may (re)send a complete set of CSNPs. While the latter action is
   wasteful, neither action causes any disruption in correct protocol
   operation.

   If the RA bit is set in a false IIH, a (re)starting router that
   receives such an IIH may falsely believe that there is a neighbor on
   the corresponding interface that supports the procedures described in
   this document. In the absence of receipt of a complete set of CSNPs
   on that interface, this could delay the completion of (re)start
   procedures by requiring the timer T1 to time out the locally defined
   maximum number of retries. This behavior is the same as would occur
   on a LAN where none of the (re)starting router's neighbors support
   the procedures in this document and is covered in Sections 2.3.1 and
   2.3.2.

   If an SA bit is set in a false IIH, this could cause suppression of
   the advertisement of an IS neighbor, which could either continue for
   an indefinite period or occur intermittently, with the result being a
   possible loss of reachability to some destinations in the network
   and/or increased frequency of LSP flooding and SPF calculation.

   The possibility of IS-IS PDU spoofing can be reduced by the use of
   authentication as described in [RFC1195] and [ISO10589], and
   especially the use of cryptographic authentication as described in
   [RFC5304] and [RFC5310].

6. Manageability Considerations

   These extensions, which have been designed, developed, and deployed
   for many years, do not have any new impact on the management and
   operation of the IS-IS protocol via this standardization process.

7. Acknowledgements

   For RFC 5306, the authors acknowledged contributions made by Jeff
   Parker, Radia Perlman, Mark Schaefer, Naiming Shen, Nischal Sheth,
   Russ White, and Rena Yang.

   The authors of this updated version acknowledge the contribution of
   Mike Shand, co-author of RFC 5306.

8. Normative References

   [ISO10589]
              International Organization for Standardization,
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473)",
              ISO/IEC 10589:2002, Second Edition, Nov 2002.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5303]  Katz, D., Saluja, R., and D. Eastlake 3rd, "Three-Way
              Handshake for IS-IS Point-to-Point Adjacencies", RFC 5303,
              DOI 10.17487/RFC5303, October 2008.

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304, October
              2008.

   [RFC5310]  Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
              and M. Fanto, "IS-IS Generic Cryptographic
              Authentication", RFC 5310, DOI 10.17487/RFC5310, February
              2009.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

Appendix A. Summary of Changes from RFC 5306

   This document extends RFC 5306 by introducing support for signalling
   the neighbors of a restarting router that a planned restart is about
   to occur. This allows the neighbors to be aware of the state of the
   restarting router so that appropriate action may be taken if other
   topology changes occur while the planned restart is in progress.
   Since the forwarding plane of the restarting router is maintained
   based upon the pre-restart state of the network, additional topology
   changes introduce the possibility that traffic may be lost if paths
   via the restarting router continue to be used while the restart is in
   progress.

   In support of this new functionality, two new flags have been
   introduced:

      PR - Restart is planned
      PA - Planned restart acknowledgement

   No changes to the post-restart exchange between the restarting router
   and its neighbors have been introduced.

Authors' Addresses

   Les Ginsberg
   Cisco Systems, Inc.

   Email: ginsberg@cisco.com

   Paul Wells
   Cisco Systems, Inc.

   Email: pauwells@cisco.com