Network Working Group                                          M. Shand
Internet Draft                                            Cisco Systems
Expiration Date: January 2002
                                                              July 2001

                      Restart signaling for ISIS
                   draft-shand-isis-restart-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

The IS-IS routing protocol (RFC 1195 [2], ISO/IEC 10589 [3]) is a link state intra-domain routing protocol. Normally, when an IS-IS router is re-started, the neighboring routers detect the restart event and cycle their adjacencies with the restarting router through the down state. This is necessary in order to invoke the protocol mechanisms that ensure correct re-synchronization of the LSP database. However, the cycling of the adjacency state causes the neighbors to regenerate their LSPs describing the adjacency concerned. This in turn causes temporary disruption of routes passing through the restarting router.

In certain scenarios such temporary disruption of the routes is highly undesirable.

This draft describes a mechanism for a restarting router to signal to its neighbors that it is restarting, allowing them to re-establish their adjacencies without cycling through the down state, while still correctly initiating database synchronization.

When such a router is restarted, it is highly desirable that it does not re-compute its own routes until it has achieved database synchronization with its neighbors. Re-computing its routes before synchronization is achieved will result in its own routes being temporarily incorrect.

This draft additionally describes a mechanism for a restarting router to determine when it has achieved synchronization with its neighbors.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [4].

3. Overview

There are two related problems with the existing specification of IS-IS with regard to re-synchronization of LSP databases when a router is re-started.

Firstly, when a routing process restarts, and an adjacency to a neighboring router is re-initialized, the neighboring routing process does three things:

1. It re-initializes the adjacency and causes its own LSP(s) to be regenerated, thus triggering SPF runs throughout the area (or in the case of Level 2, throughout the domain).

2. It sets SRMflags on its own LSP database on the adjacency concerned.

3. In the case of a Point-to-Point link it transmits a (set of) CSNP(s) over the adjacency.

In the case of a restarting router process, the first of these is highly undesirable, but the second is essential in order to ensure re-synchronization of the LSP database.

Secondly, whether or not the router is being re-started, it is desirable to be able to determine when the LSP databases of the neighboring routers have been synchronized (so that the overload bit can be cleared in the router's own LSP, for example). This document describes modifications to achieve this.

It is assumed that the three-way handshake [5] is being used on Point-to-Point circuits.

4. Approach

4.1 Timers

A router that is restart capable maintains three additional timers, T1, T2 and T3.

An instance of T1 is maintained per interface, and indicates the time after which an unacknowledged restart attempt will be repeated. A typical value might be 3 seconds.

An instance of T2 is maintained for each LSP database present in the system, i.e. for a Level 1/2 system, there will be an instance of T2 for Level 1 and one for Level 2. This is the maximum time that the system will wait for LSPDB synchronization. A typical value might be 60 seconds.

A single instance of T3 is maintained for the entire system. It indicates the time after which the router will declare that it has failed to achieve database synchronization (by setting the overload bit in its own LSP).
T3 is initialized to 65535 seconds, but is set to the minimum of the remaining times (less one second) carried in received IIHs that contain a restart TLV with RA set. This ensures that T3 will expire one second before the first adjacency is due to expire.

4.2 Adjacency re-acquisition

Adjacency re-acquisition is the first step in re-initialization. The restarting router explicitly notifies its neighbor that the adjacency is being re-acquired, and hence that it should not re-initialize the adjacency. This is achieved by the inclusion of a new "re-start" option (TLV) in the IIH PDU. The presence of this TLV indicates that the sender supports the new restart capability and it carries flags that are used to convey information during a restart. All IIHs transmitted by a router that supports this capability MUST include this TLV.

Type   [TBD]
Length 3
Value  (3 octets)
   Flags (1 octet)
      Bit 1 - Restart Request (RR)
      Bit 2 - Restart Acknowledgment (RA)
      Bits 3-8 - Reserved
   Remaining Time (2 octets)
      Remaining holding time (in seconds)
      (note: only significant when RA bit is set)

On receipt of an IIH with the "re-start" TLV having the RR bit set, if there exists on this interface an adjacency in state "Up" with the same System ID, and in the case of a LAN circuit, with the same source LAN address, then, irrespective of the other contents of the "Intermediate System Neighbors" option (LAN circuits), or the "Point-to-Point Adjacency State" option (Point-to-Point circuits):

a) DO NOT refresh the timer on the adjacency, but leave the adjacency in state "Up",

b) immediately (i.e. without waiting for any currently running timer interval to expire, but with a small random delay of a few tens of milliseconds on LANs to avoid "storms"), transmit over the corresponding interface an IIH including the "re-start" TLV with the RA bit set, having updated the "Point-to-Point Adjacency State" option to reflect any new values received from the re-starting router. (This allows the restarting router to quickly acquire the correct information to place in its hellos.) The "Remaining Time" MUST be set to the time remaining (in seconds) before the holding timer on this adjacency is due to expire,

c) if the corresponding interface is a Point-to-Point interface, or if the receiving router has the highest LnRouterPriority (with highest source MAC address breaking ties) among those routers whose IIHs contain the restart TLV, excluding the transmitting router (note that the actual DR is NOT changed by this process), initiate the transmission over the corresponding interface of a complete set of CSNPs, and set SRMflags on the corresponding interface for all LSPs in the local LSP database.

Otherwise (i.e. if there was no adjacency to the system ID in question), process the IIH as normal by re-initializing the adjacency, and setting the RA bit and a "Remaining Time" of zero in the returned IIH.

A router that does not support the re-start capability will ignore the "re-start" TLV and re-initialize the adjacency as normal, returning an IIH without the "re-start" TLV.

On starting, a router initializes the timer T3, starts timer T2 for each LSPDB and for each interface starts a timer T1 and transmits an IIH containing the "re-start" TLV with the RR bit set.

1. On a LAN circuit the IIH contains an empty "Intermediate System Neighbors" TLV.

2. On a Point-to-Point circuit the IIH contains a "Point-to-Point Adjacency State" option with state "Down", and with empty "Neighbor System ID" and "Neighbor Extended Local Circuit ID" options. The values of the "LocalCircuitID" and the "Extended Local CircuitID" may, but need not be, the same as those used previously for this circuit.

Transmission of "normal" IIHs is inhibited until the conditions described below are met (in order to avoid causing an unnecessary adjacency re-initialization). On expiry of the timer T1, it is restarted and the IIH is re-transmitted as above.

On receipt of an IIH by the restarting router, a local adjacency is established as usual, and if the IIH contains a "re-start" TLV with the RA bit set, the receipt of the acknowledgement over that interface is noted (see section 4.2.1). T3 is set to the minimum of its current value and one less than the value of the "Remaining Time" field in the received IIH. If this results in T3 taking a value less than or equal to zero, then the actions described for expiration of T3 are taken.

Receipt of an IIH not containing the "re-start" option is also treated as an acknowledgement, since it indicates that the neighbor is not re-start capable. In this case the neighbor will have re-initialized the adjacency as normal, which in the case of a Point-to-Point link will guarantee that SRMflags have been set on its database, thus ensuring eventual LSPDB synchronization. In the case of a LAN interface, the usual operation of the update process will also ensure that synchronization is eventually achieved. However, since no CSNP is guaranteed to be received over this interface, T1 is cancelled immediately without waiting for a CSNP. Synchronization may therefore be deemed complete even though there are some LSPs which are held (only) by this neighbor (see section 4.3).
The timer T3 is considered to have expired, as if the IIH had contained a restart option with a "Remaining Time" value of zero.

In the case of a Point-to-Point circuit, the "LocalCircuitID" and "Extended Local Circuit ID" information contained in the IIH can be used immediately to generate an IIH containing the correct 3-way handshake information. The presence of "Neighbor System ID" or "Neighbor Extended Local Circuit ID" information which does not match the values currently in use by the local system is ignored (since the IIH may have been transmitted before the neighbor had received the new values from the re-starting router), but the adjacency remains in the initializing state until the correct information is received.

In the case of a LAN circuit the information in the Intermediate System Neighbors option is recorded and used for the generation of subsequent IIHs as normal.

When BOTH a complete set of CSNP(s) and an acknowledgement have been received over the interface, the timer T1 is cancelled.

Once T3 has expired or been cancelled, subsequent IIHs are transmitted according to the normal algorithms, but including the "re-start" TLV with both RR and RA clear.

If a LAN contains a mixture of systems, only some of which support the new algorithm, database synchronization is still guaranteed, but the "old" systems will have re-initialized their adjacencies.

If an interface is active, but does not have any neighboring router reachable over that interface, the timer T1 would never be cancelled, and according to section 4.3.1.2 the SPF would never be run. Therefore timer T1 is cancelled after some pre-determined number of expirations. (By this time any existing adjacency on a remote system would probably have expired anyway.)
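
The T1 and T3 handling described above can be illustrated with a minimal Python sketch of the restarting router's per-interface bookkeeping. It is not part of the protocol specification. The class name, method names and the constant MAX_T1_EXPIRIES are hypothetical, and the value 3 for the expiry limit is an assumption, since the text only says "some pre-determined number".

```python
from dataclasses import dataclass

T3_INITIAL = 65535    # seconds, initial value of T3 per section 4.1
MAX_T1_EXPIRIES = 3   # assumed value; the draft does not specify the number


@dataclass
class RestartingInterface:
    """Restart bookkeeping for one interface of the restarting router."""
    seen_ack: bool = False     # RA (or an IIH without the restart TLV) received
    seen_csnp: bool = False    # complete set of CSNPs received
    t1_running: bool = True    # T1 is started when the router starts
    t1_expiries: int = 0

    def _maybe_cancel_t1(self) -> None:
        # T1 is cancelled only when BOTH the acknowledgement and a
        # complete set of CSNPs have been received.
        if self.seen_ack and self.seen_csnp:
            self.t1_running = False

    def on_iih(self, has_restart_tlv: bool, ra: bool = False) -> None:
        if not has_restart_tlv:
            # Neighbor is not restart capable: treated as an acknowledgement,
            # and T1 is cancelled at once without waiting for a CSNP.
            self.seen_ack = True
            self.t1_running = False
        elif ra:
            self.seen_ack = True
            self._maybe_cancel_t1()

    def on_complete_csnp_set(self) -> None:
        self.seen_csnp = True
        self._maybe_cancel_t1()

    def on_t1_expiry(self) -> bool:
        """Return True if the RR IIH should be re-sent and T1 restarted."""
        if not self.t1_running:
            return False
        self.t1_expiries += 1
        if self.t1_expiries >= MAX_T1_EXPIRIES:
            self.t1_running = False   # give up: no neighbor is answering
            return False
        return True


def update_t3(current_t3: int, remaining_time: int) -> int:
    """New T3 value after an IIH with RA set; a result <= 0 means T3 expired."""
    return min(current_t3, remaining_time - 1)
```

A caller would treat a result of zero or less from update_t3 as expiry of T3. An IIH without the restart TLV can be passed to it as a "Remaining Time" of zero, matching the rule stated above.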

A router which supports re-start SHOULD ensure that the holding time of any IIHs it transmits is greater than the expected time to complete a re-start.

4.2.1 State Table

The above operations can be summarized by the following state table.

Event        | Running    | Restarting | Seen RA    | Seen CSNP
==================================================================
RX RR        | Set SRM    | Set SRM    | Set SRM    | Set SRM
             | Send RA    | Send RA    | Send RA    | Send RA
             | Send CSNP  | Send CSNP  | Send CSNP  | Send CSNP
-------------+------------+------------+------------+-------------
RX RA        |            | Goto Seen  |            | Cancel T1
             |            | RA         |            | Goto Running
-------------+------------+------------+------------+-------------
RX CSNP      |            | Goto Seen  | Cancel T1  |
             |            | CSNP       | Goto       |
             |            |            | Running    |
-------------+------------+------------+------------+-------------
RX IIH       |            | Cancel T1  | Cancel T1  | Cancel T1
with no      |            | Goto       | Goto       | Goto
Restart TLV  |            | Running    | Running    | Running
-------------+------------+------------+------------+-------------
T1           |            | Send RR    | Send RR    | Send RR
Expires      |            | Send CSNP  | Send CSNP  | Send CSNP
             |            | Start T1   | Start T1   | Start T1
-------------+------------+------------+------------+-------------
T1           |            | Cancel T1  | Cancel T1  | Cancel T1
Expires      |            | Goto       | Goto       | Goto
n times      |            | Running    | Running    | Running
-------------+------------+------------+------------+-------------
Router       | Set SRM    | Set SRM    | Set SRM    | Set SRM
Restarted    | Send RR    | Send RR    | Send RR    | Send RR
             | Send CSNP  | Send CSNP  | Send CSNP  | Send CSNP
             | Start T1   | Start T1   | Start T1   | Start T1
             | Goto       | Goto       | Goto       | Goto
             | Restarting | Restarting | Restarting | Restarting
==================================================================

4.3 Database synchronization

When a router is started or re-started it can expect to receive a (set of) CSNP(s) over each interface.
The arrival of the CSNP(s) is now guaranteed, since the "re-start" IIH with the RR bit set will be retransmitted until the CSNP(s) are correctly received.

The CSNPs describe the set of LSPs that are currently held by each neighbor. Synchronization will be complete when all these LSPs have been received.

On starting, a router starts the timer T3 and an instance of timer T2 for each LSPDB. In addition to normal processing of the CSNPs, the set of LSPIDs contained in the first complete set of CSNP(s) received over each interface is recorded, together with their remaining lifetime. If there are multiple interfaces on the restarting router, the recorded set of LSPIDs is the union of those received over each interface. LSPs with a remaining lifetime of zero are NOT so recorded.

As LSPs are received (by the normal operation of the update process) over any interface, the corresponding LSPID entry is removed (it is also removed if the LSP had arrived before the CSNP containing the reference). When an LSPID has been held in the list for its indicated remaining lifetime, it is removed from the list. When the list of LSPIDs becomes empty, the timer T2 is cancelled.

At this point the local database is guaranteed to contain all the LSP(s) (either the same sequence number, or a more recent sequence number) which were present in the neighbors' databases at the time of re-starting. LSPs that arrived in a neighbor's database after the time of re-starting may, or may not, be present, but the normal operation of the update process will guarantee that they will eventually be received. At this point the local database is deemed to be "synchronized".
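
The LSPID list described above could be kept roughly as in the following Python sketch, with one tracker per LSP database (that is, per T2 instance). It is an illustration only. The class and method names are hypothetical, and time is passed in explicitly as seconds. Two behaviours are assumptions of the sketch and are not stated in the text: keeping the later expiry when an LSPID is reported over two interfaces, and requiring at least one recorded CSNP set before reporting synchronization.

```python
class SyncTracker:
    """LSPIDs still awaited for one LSP database (one T2 instance)."""

    def __init__(self) -> None:
        self.pending = {}              # LSPID -> time at which its lifetime runs out
        self.received = set()          # LSPIDs of LSPs already received
        self.interfaces_seen = set()   # interfaces whose first CSNP set is recorded

    def record_csnp_set(self, interface, entries, now):
        """Record the first complete CSNP set received over an interface.

        entries is an iterable of (lspid, remaining_lifetime_seconds).
        """
        if interface in self.interfaces_seen:
            return                     # only the first complete set is recorded
        self.interfaces_seen.add(interface)
        for lspid, lifetime in entries:
            if lifetime == 0:
                continue               # zero-lifetime LSPs are NOT recorded
            if lspid in self.received:
                continue               # the LSP arrived before the CSNP
            expiry = now + lifetime
            # Union across interfaces; keeping the later expiry is an assumption.
            self.pending[lspid] = max(self.pending.get(lspid, 0), expiry)

    def lsp_received(self, lspid):
        """Called by the update process for every LSP received."""
        self.received.add(lspid)
        self.pending.pop(lspid, None)

    def expire(self, now):
        """Drop entries held for their full indicated remaining lifetime."""
        for lspid in [l for l, t in self.pending.items() if t <= now]:
            del self.pending[lspid]

    def synchronized(self):
        """True when the list is empty, i.e. when T2 may be cancelled."""
        return bool(self.interfaces_seen) and not self.pending
```

In use, a router would call expire() periodically and cancel the corresponding T2 as soon as synchronized() returns True.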

Since LSPs mentioned in the CSNP(s) with a zero remaining lifetime are not recorded, and those with a short remaining lifetime are deleted from the list when the lifetime expires, cancellation of the timer T2 will not be prevented by waiting for an LSP that will never arrive.

4.3.1 LSP generation and flooding and SPF computation

The operation of a router starting, as opposed to re-starting, is somewhat different. These two cases are dealt with separately below.

4.3.1.1. Starting for the first time

In the case of a starting router, as soon as each adjacency is established, and before any CSNP exchanges, the router's own zeroth LSP is transmitted with the overload bit set. This prevents other routers from computing routes through the router until it has reliably acquired the complete set of LSPs. The overload bit remains set in subsequent transmissions of the zeroth LSP (such as will occur if a previous copy of the router's LSP is still present in the network) while any timer T2 is running.

When all the T2 timers have been cancelled, the own LSP(s) are regenerated with the overload bit clear (assuming the router isn't in fact overloaded), and flooded as normal.

Other 'own' LSPs (including pseudonodes) are generated and flooded as normal, irrespective of the timer T2. The SPF is also run as normal and the RIB and FIB updated as routes become available.

4.3.1.2. Re-starting

In order to avoid causing unnecessary routing churn in other routers, it is highly desirable that the own LSPs generated by the restarting system are the same as those previously present in the network (assuming no other changes have taken place). It is important therefore not to regenerate and flood the LSPs until all the adjacencies have been re-established and any information required for propagation into the local LSPs is fully available.
Ideally, the information should be loaded into the LSPs in a deterministic way, such that the same information occurs in the same place in the same LSP (and hence the LSPs are identical to their previous versions). If this can be achieved, the new versions will not even cause SPF to be run in other systems. However, provided the same information is included in the set of LSPs (albeit in a different order, and possibly different LSPs), the result of running the SPF will be the same and will not cause churn to the forwarding tables.

In the case of a re-starting router, none of the router's own non-pseudonode LSPs are transmitted, nor is the SPF run to update the forwarding tables while the timer T3 is running.

Redistribution of inter-level information must be regenerated before this router's LSP is flooded to other nodes. Therefore the level-n non-pseudonode LSP(s) should not be flooded until the other level's T2 timer has expired and its SPF has been run. This ensures that any inter-level information that should be propagated can be included in the level-n LSP(s).

During this period, if one of the router's own (including pseudonodes) LSPs is received, which the local router does not currently have in its own database, it is NOT purged. Under normal operation, such an LSP would be purged, since the LSP clearly should not be present in the global LSP database. However, in the present circumstances, this would be highly undesirable, because it could cause premature removal of an own LSP, and hence churn in remote routers.
Even if the local system has one or more own LSPs (which it has generated, but not yet transmitted) it is still not valid to compare the received LSP against this set, since it may be that as a result of propagation between level 1 and level 2 (or vice versa) a further own LSP will need to be generated when the LSP databases have synchronized.

When the timer T2 expires, or is cancelled, the SPF is run to update the RIB and FIB.

Once the other level's SPF has run and any inter-level propagation has been resolved, the 'own' LSPs can be generated and flooded. Any 'own' LSPs which were previously ignored, but which are not part of the current set of 'own' LSPs (including pseudonodes) should then be purged. Note that it is possible that a Designated Router change may have taken place, and consequently the router should purge those pseudonode LSPs which it previously owned, but which are now no longer part of its set of pseudonode LSPs.

If the timer T3 expires before all the T2 timers have been cancelled or expired, this indicates that the synchronization process is taking longer than the minimum holding time of the neighbors. The router's own LSP(s) for all levels are then flooded with the overload bit set to indicate that the router's LSPDB is not yet synchronized (and other routers should therefore not compute routes through this router). In order to prevent the neighbors' adjacencies from expiring, IIHs are transmitted over all interfaces with neither RR nor RA set in the restart TLV. This will cause the neighbors to refresh their adjacencies. The own LSP(s) will continue to have the overload bit set until timer T2 has been cancelled, as in the case of starting for the first time described in section 4.3.1.1.

5. Security Considerations

This memo does not create any new security issues for the IS-IS protocol.
Security considerations for the base IS-IS protocol are covered in [2] and [3].

6. References

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] Callon, R., "Use of OSI IS-IS for Routing in TCP/IP and Dual Environments", RFC 1195, December 1990.

[3] ISO, "Intermediate system to Intermediate system routeing information exchange protocol for use in conjunction with the Protocol for providing the Connectionless-mode Network Service (ISO 8473)", ISO/IEC 10589:1992.

[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[5] Katz, D., "Three-Way Handshake for IS-IS Point-to-Point Adjacencies", draft-ietf-isis-3way-03.txt, July 2000.

7. Acknowledgments

The author would like to acknowledge contributions made by Radia Perlman, Mark Schaefer, Russ White and Rena Yang.

8. Author's Address

Mike Shand
Cisco Systems
4, The Square,
Stockley Park,
UXBRIDGE,
Middlesex
UB11 1BN, UK

Phone: +44 20 8756 8690
Email: mshand@cisco.com