Network Working Group                                             J. Moy
Internet Draft                                   Sycamore Networks, Inc.
Expiration Date: July 2001                                 February 2001
File name: draft-ietf-ospf-hitless-restart-00.txt

                          Hitless OSPF Restart
                 draft-ietf-ospf-hitless-restart-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This memo documents an enhancement to the OSPF routing protocol,
   whereby an OSPF router can stay on the forwarding path even as its
   OSPF software is restarted. This is called "hitless restart" or
   "non-stop forwarding".
   A restarting router may not be capable of
   adjusting its forwarding in a timely manner when the network
   topology changes. To avoid the routing loops that could result,
   the procedure in this memo automatically terminates when such
   a topology change is detected. The restart procedure is also
   backward-compatible, reverting to standard OSPF processing when one
   or more of the restarting router's neighbors do not support the
   enhancements in this memo. Proper network operation during a hitless
   restart depends on assumptions about the operating environment of
   the restarting router; these assumptions are also documented.

Table of Contents

   1     Overview
   2     Operation of restarting router
   2.1   Entering hitless restart
   2.2   Exiting hitless restart
   3     Operation of helper neighbor
   3.1   Entering helper mode
   3.2   Exiting helper mode
   4     Backward compatibility
   5     Notes
         References
   A     Grace-LSA format
         Security Considerations
         Authors' Addresses

1. Overview

   Today many Internet routers implement a separation of control and
   forwarding functions. Certain processors are dedicated to control
   and management tasks such as OSPF routing, while other processors
   perform the data forwarding tasks. This separation creates the
   possibility of maintaining a router's data forwarding capability
   while the router's control software is restarted/reloaded. We call
   such a possibility "hitless restart" or "non-stop forwarding".

   The problem that the OSPF protocol presents to hitless restart is
   that, under normal operation, OSPF intentionally routes around a
   restarting router while it rebuilds its link-state database. OSPF
   avoids the restarting router to minimize the possibility of routing
   loops and/or black holes caused by lack of database synchronization.
   Avoidance is accomplished by having the router's neighbors reissue
   their LSAs, omitting links to the restarting router.

   However, if (a) the network topology remains stable and (b) the
   restarting router is able to keep its forwarding table(s) across the
   restart, it would be safe to keep the restarting router on the
   forwarding path. This memo documents an enhancement to OSPF that
   makes such hitless restart possible, and one that automatically
   reverts to standard OSPF for safety when network topology
   changes are detected.

   In a nutshell, the OSPF enhancements for hitless restart are as
   follows. The router attempting a hitless restart originates
   link-local Opaque-LSAs, herein called Grace-LSAs, announcing the
   intention to perform a hitless restart, and asking for a "grace
   period". During the grace period its neighbors continue to announce
   the restarting router in their LSAs as if it were fully adjacent
   (i.e., OSPF neighbor state Full), but only if the network topology
   remains static (i.e., the contents of the LSAs in the link-state
   database having LS types 1-5,7 remain unchanged; simple refreshes
   are allowed).

   There are two roles being played by OSPF routers during hitless
   restart. First there is the router that is being restarted. The
   operation of this router during hitless restart, including how the
   router enters and leaves hitless restart, is the subject of Section
   2. Then there are the router's neighbors, which must cooperate in
   order for the restart to be hitless.
   During hitless restart we say
   that the neighbors are executing in "helper mode". Section 3 covers
   the responsibilities of a router executing in helper mode, including
   entering and leaving helper mode.

2. Operation of restarting router

   After the router restarts/reloads, it must change its OSPF
   processing somewhat until it re-establishes full adjacencies with
   all its previously fully-adjacent neighbors. This time period,
   between the restart/reload and the reestablishment of adjacencies,
   is called "hitless restart". During hitless restart:

   (1) The restarting router does not originate LSAs with LS types
       1-5,7. Instead, the restarting router wants the other routers
       in the OSPF domain to calculate routes using the LSAs that it
       had originated prior to its restart, in order to maintain
       forwarding through the restart.

   (2) The restarting router doesn't run its OSPF routing
       calculations, instead using the forwarding table(s) that it
       had built prior to the restart.

   Otherwise, the restarting router operates the same as any other OSPF
   router. It discovers neighbors using OSPF's Hello protocol, elects
   Designated and Backup Designated Routers, performs the Database
   Exchange procedure to initially synchronize link-state databases
   with its neighbors, and maintains this synchronization through
   flooding.

   The processes of entering hitless restart, and of exiting hitless
   restart (either successfully or not) are covered in the following
   sections.

2.1. Entering hitless restart

   The router (call it Router X) is informed of the desire for its
   hitless restart when an appropriate command is issued by the
   network operator. The network operator may also specify the
   length of the grace period, or the necessary grace period may be
   calculated by the router's OSPF software.

   In preparation for the hitless restart, Router X must perform
   the following actions before its software is restarted/reloaded.
   Note that common OSPF shutdown procedures are *not* performed,
   since we want the other OSPF routers to act as if Router X
   remains in continuous service. For example, Router X does not
   flush its locally originated LSAs, since we want them to remain
   in other routers' link-state databases throughout the restart
   period.

   (1) Router X must ensure that its forwarding table(s) is/are
       up-to-date and will remain in place across the restart.

   (2) Router X must resign any Designated Router (DR) or Backup
       Designated Router duties that it currently has. It does
       this by sending Hellos with Designated Router Priority
       set to 0. Resigning DR duties ensures that flooding works
       unimpeded across restarts, and that the DR/Backup will
       not change *after* the Grace-LSA is generated, which
       would be interpreted as a topology change and would
       terminate the hitless restart procedure prematurely.

   (3) The router must note in non-volatile storage the
       cryptographic sequence numbers being used for each
       interface. Otherwise it will take up to
       RouterDeadInterval seconds after the restart before it
       can start to reestablish its adjacencies, which would
       force the grace period to be lengthened severely.

   Router X then originates the grace-LSAs. These are link-local
   Opaque-LSAs (see Appendix A). Their LS Age field is set to 0,
   and the requested grace period (in seconds) is inserted into the
   body of the grace-LSA. A grace-LSA is originated for each of the
   router's OSPF interfaces. However, a grace-LSA need not be
   originated for an interface if either a) the interface has no
   fully adjacent neighbors or b) the interface is of type
   point-to-point and a grace-LSA has already been sent to the
   attached neighbor on another interface.
   If Router X wants to ensure that
   its neighbors receive the grace-LSAs, it should retransmit the
   grace-LSAs until they are acknowledged (i.e., perform standard
   OSPF reliable flooding of the grace-LSAs). If one or more fully
   adjacent neighbors do not receive grace-LSAs, they will more
   than likely cause premature termination of the hitless restart
   procedure (see Section 4).

   After the grace-LSAs have been sent, the router should store the
   fact that it is performing hitless restart, along with the length
   of the requested grace period, in non-volatile storage. The OSPF
   software should then be restarted/reloaded, and when the
   reloaded software starts executing, the hitless restart
   modifications in Section 2 above are followed.

2.2. Exiting hitless restart

   On exiting "hitless restart", the reloaded router reverts
   to completely normal OSPF operation, reoriginating LSAs based on
   the router's current state and recalculating its forwarding
   table(s) based on the current contents of the link-state
   database. The router exits hitless restart when any of the
   following occurs:

   (1) Router X has reestablished all its adjacencies. Router X
       can determine this by building (but not installing or
       flooding) its router-LSA, based on the current router
       state, and comparing it to the router-LSA that it had
       last originated before the restart (called the
       "pre-restart router-LSA"). If the contents are the same,
       all adjacencies have been reestablished.

   (2) Router X receives an LSA that is inconsistent with its
       pre-restart router-LSA. For example, X receives a
       router-LSA originated by router Y that does not contain a
       link to X, even though X's pre-restart router-LSA did
       contain a link to Y. This indicates that either a) Y does
       not support hitless restart, b) Y never received the
       grace-LSA or c) Y has terminated its helper mode for some
       reason (Section 3.2).

   (3) The grace period expires.

   (4) Router X gets a valid hitless restart request (grace-LSA)
       from another router. A router cannot simultaneously
       attempt hitless restart and help a neighboring router
       undergo hitless restart, because a helper neighbor
       must be monitoring the network state for changes
       throughout the entire restart period.

   When it exits hitless restart, the reloaded router should flush
   any grace-LSAs that it had originated.

3. Operation of helper neighbor

   As a "helper neighbor" for a router X undergoing hitless restart,
   router Y has two duties. It monitors the network for topology
   changes and, as long as there are none, continues to advertise
   its LSAs as if X had remained in continuous OSPF operation. This
   means that Y's LSAs continue to list all adjacencies to X that were
   full (OSPF neighbor state Full) when the grace-LSA was first
   received, regardless of their current synchronization state. This
   logic affects the contents of both router-LSAs and network-LSAs, and
   also depends on the type of interface associated with the (possibly
   former) adjacency (see Sections 12.4.1.1 through 12.4.1.5 and
   Section 12.4.2 of [Ref1]).

3.1. Entering helper mode

   When a router Y receives a grace-LSA from router X, it enters
   helper mode for X as long as all the following checks pass:

   (1) There have been no changes in content to the link-state
       database (LS types 1-5,7) since the beginning of the
       grace period specified by the grace-LSA. The grace period
       began N seconds ago, where N is the current LS age of the
       grace-LSA.

   (2) The grace period has not yet expired. This means that the
       LS age of the grace-LSA is less than the grace period
       specified in the body of the grace-LSA (Appendix A).

   (3) Local policy allows Y to act as the helper for X.
       Examples of configured policies might be a) never act as
       helper, b) never allow the grace period to exceed a Time
       T, or c) never act as a helper for certain specific
       routers (specified by OSPF Router ID).

   Note that Router Y only needs to receive a single grace-LSA from
   X, even if X and Y attach to multiple common segments. The data
   in the first valid grace-LSA received is used to indicate the
   beginning and the end of the grace period; all subsequent
   grace-LSAs received from X are ignored. This first grace-LSA is
   referred to below as simply "the grace-LSA from X".

   A single router is allowed to simultaneously serve as a helper
   for multiple restarting neighbors.

3.2. Exiting helper mode

   Router Y ceases to perform the helper function for its neighbor
   Router X when one of the following events occurs.

   (1) The grace-LSA originated by X is flushed. This is the
       successful termination of hitless restart.

   (2) The grace period expires.

   (3) Router Y receives an LSA with LS type 1-5 or 7 whose
       contents have changed. This includes LSAs with no
       previous link-state database instance and the flushing of
       LSAs from the database, but excludes simple LSA
       refreshes. A change in LSA contents indicates a network
       topology change, which forces termination of a hitless
       restart.

   When router Y exits helper mode for X, Y reoriginates its LSAs
   based on the current state of its Router X adjacencies.

4. Backward compatibility

   Backward-compatibility with unmodified OSPF routers is an automatic
   consequence of the functionality documented above. If one or more
   neighbors of a router requesting hitless restart are unmodified, or
   if they do not receive the grace-LSA, the hitless restart is
   prematurely aborted.
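
   As a non-normative illustration, the helper-side checks that
   produce this behavior (Sections 3.1 and 3.2) might be sketched in
   Python as follows. The Lsa class, its fields, and both function
   names are hypothetical and not part of this memo; a real
   link-state database carries considerably more state than this.

```python
from dataclasses import dataclass

# LS types whose content changes count as topology changes (Section 3.2).
TOPOLOGY_LS_TYPES = {1, 2, 3, 4, 5, 7}


@dataclass(frozen=True)
class Lsa:
    """Hypothetical, simplified LSA: identity plus an opaque body."""
    ls_type: int
    ls_id: str
    adv_router: str
    body: bytes
    flushed: bool = False  # True when the LSA is being flushed

    @property
    def key(self):
        return (self.ls_type, self.ls_id, self.adv_router)


def may_enter_helper_mode(grace_lsa_age, grace_period,
                          db_changed_since_start, policy_allows):
    """Section 3.1 checks (1)-(3); the caller supplies every input."""
    return (not db_changed_since_start
            and grace_lsa_age < grace_period
            and policy_allows)


def is_topology_change(database, received):
    """Section 3.2, event (3): does `received` change database contents?

    `database` maps Lsa.key to the stored Lsa. A refresh with an
    identical body is not a change.
    """
    if received.ls_type not in TOPOLOGY_LS_TYPES:
        return False
    current = database.get(received.key)
    if received.flushed:
        return current is not None   # flushing a stored LSA is a change
    if current is None:
        return True                  # no previous database instance
    return current.body != received.body


# Y's view: neighbor W originally advertised links to X and Z.
w_lsa = Lsa(1, "10.0.0.3", "10.0.0.3", b"links: X, Z")
db = {w_lsa.key: w_lsa}
refresh = Lsa(1, "10.0.0.3", "10.0.0.3", b"links: X, Z")
reissued = Lsa(1, "10.0.0.3", "10.0.0.3", b"links: Z")  # link to X omitted
```

   In this sketch, an unmodified neighbor W that reissues its
   router-LSA without the link to X causes is_topology_change() to
   return True, so helper Y would leave helper mode; a plain refresh
   does not.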

   The unmodified routers will start routing around the restarted
   router X as it performs initial database synchronization, by
   reissuing their LSAs with links to X omitted. These LSAs will be
   interpreted by helper neighbors as a topology change, and by X as an
   LSA inconsistency, in either case aborting hitless restart and
   resuming normal OSPF operation.

5. Notes

   Note the following details concerning the hitless OSPF restart
   mechanism described in this memo.

   o   DoNotAge is never set in a grace-LSA, even if the grace-LSA is
       flooded over a demand circuit. This is because the grace-LSA's
       LS age field is used to calculate the extent of the grace period
       (see Appendix A).

   o   Grace-LSAs have link-local scope because a) they only need to be
       seen by the router's direct neighbors and b) restricting them to
       link-local scope makes it easy to detect the illegal
       configuration of two restarting routers being asked to help each
       other (Section 2.2).

   o   It may be noted that the hitless restart mechanisms in this memo
       can also be used for unplanned outages. For example, after a
       crash of its control software, the router may come up and send
       grace-LSAs in an attempt to remain on the forwarding path while
       it regains its control state. This may not be a good idea, as it
       seems unlikely that such a router could guarantee the sanity of
       its forwarding table(s). However, if the router does attempt a
       hitless restart from an unplanned outage, it should at the least
       (a) allow the network operator to turn this feature off and (b)
       attempt to determine when its forwarding tables were last
       updated, setting the beginning of the grace period accordingly
       (this means originating the grace-LSA with LS age equal to the
       number of seconds since the forwarding tables were last
       updated).

References

   [Ref1]  Moy, J., "OSPF Version 2", RFC 2328, April 1998.

   [Ref2]  Coltun, R., "The OSPF Opaque LSA Option", RFC 2370, July
           1998.

   [Ref3]  Murphy, S., M. Badger and B. Wellington, "OSPF with Digital
           Signatures", RFC 2154, June 1997.

A. Grace-LSA format

   The grace-LSA is a link-local scoped Opaque-LSA [Ref2] having Opaque
   Type of TBD1 and Opaque ID equal to TBD2. The grace-LSA is
   originated by a router that wishes to execute a hitless restart of
   its OSPF software. The grace-LSA requests that the router's
   neighbors aid it in its hitless restart by continuing to advertise
   the router as fully adjacent during a specified grace period.

   It is assumed that the grace-LSA has LS age field set to 0 when the
   LSA is first originated; the current value of LS age then indicates
   how long ago the restarting router made its request. The body of the
   LSA contains the length of the grace period in seconds.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            LS age             |    Options    |       9       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Opaque Type  |                   Opaque ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Advertising Router                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      LS sequence number                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          LS checksum          |            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Grace Period                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Grace Period
       The number of seconds that the router's neighbors should
       continue to advertise the router as fully adjacent, regardless
       of the state of database synchronization between the router
       and its neighbors.
       Since this time period began when the grace-LSA's
       LS age was equal to 0, the grace period terminates when either
       a) the LS age of the grace-LSA exceeds the value of Grace Period
       or b) the grace-LSA is flushed. See Section 3.2 for other
       conditions which terminate the grace period.

Security Considerations

   One of the ways to attack a link-state protocol such as OSPF is to
   inject false LSAs into, or corrupt existing LSAs in, the link-state
   database. Injecting a false grace-LSA would allow an attacker to
   spoof a router that, in reality, has been withdrawn from service.
   The standard way to prevent such corruption of the link-state
   database is to secure OSPF protocol exchanges using the Cryptographic
   authentication specified in [Ref1]. An even stronger way of securing
   link-state database contents has been proposed in [Ref3].

Authors' Addresses

   J. Moy
   Sycamore Networks, Inc.
   150 Apollo Drive
   Chelmsford, MA 01824
   Phone: (978) 367-2505
   Fax:   (978) 256-4203
   email: jmoy@sycamorenet.com
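
Illustration (non-normative)

   The grace-LSA layout in Appendix A might be encoded as in the
   following Python sketch. Several things here are assumptions rather
   than part of this memo: the function names, the zero placeholders
   used for the unassigned TBD1 and TBD2 values, the default Options
   value of 0, and the decision to leave the LS checksum as 0 instead
   of computing it.

```python
import ipaddress
import struct

LS_TYPE_LINK_LOCAL_OPAQUE = 9  # LS type shown in the Appendix A diagram
OPAQUE_TYPE_TBD1 = 0           # placeholder only: the memo says TBD1
OPAQUE_ID_TBD2 = 0             # placeholder only: the memo says TBD2
GRACE_LSA_LENGTH = 24          # 20-byte LSA header + 4-byte Grace Period

# Field widths follow the diagram: 16, 8, 8, 8+24, 32, 32, 16, 16, 32 bits.
_GRACE_LSA_FORMAT = "!HBBIIIHHI"


def pack_grace_lsa(adv_router, ls_seq, grace_period, ls_age=0, options=0):
    """Pack the Appendix A fields in network byte order.

    adv_router is a dotted-quad Router ID. The LS checksum is left as
    0; computing it is outside the scope of this sketch.
    """
    opaque_word = (OPAQUE_TYPE_TBD1 << 24) | (OPAQUE_ID_TBD2 & 0xFFFFFF)
    return struct.pack(
        _GRACE_LSA_FORMAT,
        ls_age,
        options,
        LS_TYPE_LINK_LOCAL_OPAQUE,
        opaque_word,
        int(ipaddress.IPv4Address(adv_router)),
        ls_seq,
        0,                      # LS checksum (not computed here)
        GRACE_LSA_LENGTH,
        grace_period,
    )


def unpack_grace_period(lsa_bytes):
    """Return the Grace Period (seconds) from a packed grace-LSA."""
    return struct.unpack("!I", lsa_bytes[20:24])[0]


# Request a 120-second grace period; LS age starts at 0.
lsa = pack_grace_lsa("10.0.0.1", 0x80000001, 120)
```

   A neighbor receiving these bytes would read the Grace Period from
   the final four octets and compare it against the LSA's current LS
   age, as described under "Grace Period" in Appendix A.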