idnits 2.17.1 draft-ietf-ospf-refresh-guide-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 10 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 11 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 2000) is 8680 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) No issues found here. Summary: 5 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Alex Zinin 3 Internet Draft Cisco Systems 4 Expiration Date: January 2001 July 2000 5 File name: draft-ietf-ospf-refresh-guide-01.txt 7 Guidelines for Efficient LSA Refreshment in OSPF 8 draft-ietf-ospf-refresh-guide-01.txt 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026. 15 Internet Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its Areas, and its Working Groups. Note that other 17 groups may also distribute working documents as Internet Drafts. 19 Internet Drafts are draft documents valid for a maximum of six 20 months. Internet Drafts may be updated, replaced, or obsoleted by 21 other documents at any time. It is not appropriate to use Internet 22 Drafts as reference material or to cite them other than as a "working 23 draft" or "work in progress". 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 Abstract 33 OSPF, an IGP widely deployed in IP networks today, requires each LSA 34 to be refreshed by the originating router every 30 minutes. This 35 method increases the protocol's robustness and solves potential 36 problems that may be caused by software bugs, as well as some 37 properties of the protocol itself. Though OSPF expects each LSA to be 38 refreshed independently, ABRs and ASBRs tend to originate Type 3/4/5 39 LSAs within a short period of time, thus causing periodical network 40 resource exhaustion by LSA refreshments. This document discusses the 41 techniques that can be used to remedy this problem and provide smooth 42 protocol operation. It must be noted that discussed methods can be 43 applied to other link-state routing protocols, such as IS-IS. 45 1 Introduction 47 The requirement for each LSA to be refreshed every 30 minutes is 48 really important for OSPF [Ref1]. Not only because OSPF allows LSAs 49 to live no longer than one hour, but also because LSA refreshment 50 provides an automatic method for problem correction. 52 Possible problems solved by periodical LSA refreshments include those 53 caused by the software bugs and those caused by the features of OSPF 54 itself. This document does not describe the first type of problems, 55 the reader could easily think them out. Examples include lost LSAs, 56 not installed routes, etc. The description of a protocol feature that 57 can potentially cause problems follows. 59 OSPF uses the Fletcher checksum (see Section 12.1.7 of [Ref1]) to 60 insure that the information transferred and stored in LSAs is not 61 corrupted. But this is not the only function of the LS checksum. It 62 is also used during the LSDB synchronization and LSA flooding 63 processes to understand whether the newly received LSA and an LSA 64 with the same Link State ID and Advertising Router already installed 65 in the LSDB are the same when other parameters (LS Sequence Number 66 and LS Age) do not allow to make the right decision (see Section 13.1 67 of [Ref1]). Equal values of the LS checksum indicate that LSAs 68 contain identical information, so the received LSA is not processed 69 any further. In the overwhelming majority of cases this really 70 indicates that LSAs are the same and that another copy of already 71 installed LSA is received due to the redundancy of the flooding 72 protocol. However, in some situations, for example when the 73 originating router was reloaded, it is possible that the new LSA 74 (with the same values of the LS ID and Advertising Router fields) 75 will contain different information, but will still have the same 76 checksum as the old one. This is because of the nature of the 77 checksumming process---a small number of bits is used to characterize 78 the contents of a big informational block. The probability of having 79 the same checksum for two different data blocks depends on the mean 80 size of the data blocks, the number of bits used to calculate and 81 store the checksum, and (the last, but definitely not least) the 82 algorithm used to calculate it. Since the probability of this event 83 for OSPF Link State checksum is sufficiently low, the protocol itself 84 relies on the periodical LSA refreshments to solve potential LSDB 85 inconsistency. So, the periodical LSA refreshments insure that the 86 protocol converges within a finite period of time. 88 The most straightforward approach to refresh LSAs could be to have a 89 single timer firing every 30 minutes and causing reorigination of all 90 self-originated LSAs. This method proved itself as a non-scalable 91 solution causing periodical CPU and link bandwidth exhaustion in the 92 networks. Having a separate timer for each LSA does not mean solving 93 all problems. 95 First of all, a router may have a lot of self-originated LSAs in its 96 LSDB, so it may very often be not efficient to keep one timer per 97 each LSA in terms of memory and CPU resources. Second, ABRs tend to 98 originate Type-3 and 4 LSAs within a short period of time. This is 99 caused by the fact that in some implementations Dijkstra is not 100 scheduled every time a change in the LSDB is observed, but only after 101 the LSDB has been quiet for some period of time, preventing excessive 102 CPU load. This may very likely lead to the situation when all intra- 103 area routes are announced in summary-LSAs at once. Even though these 104 originated LSAs have separate timers, these timers fire approximately 105 simultaneously. The same effect can be seen when external routing 106 information is injected into an OSPF domain by an ASBR. If the 107 external routes are available at the moment the ASBR is instructed to 108 advertise them, LSAs will be originated almost at once, causing 109 periodical refreshments to be grouped into big chunks and leading to 110 periodical congestion. Similar logic can be applied to the NSSA 111 ASBRs, NSSA translating ABRs, as well as OSPF routers originating 112 Opaque-LSAs. 114 The graph show in Figure 1 depicts the number of self-originated LSAs 115 as a function of time. This graph is greatly simplified assuming that 116 the originating router is an ASBR and injects static routing 117 information with stable topology. 119 SelfLSAs() 120 ^ 121 | 122 | | 123 | | 124 |- - - - - - - - - - - - - - - - ________..._________| 125 | ^ | 126 | | # of ASE-LSAs | 127 | | | | 128 | V | 129 |- - - - _________...___________| | 130 | | 131 | | | | 132 | | 133 | | | | 134 | ___| 135 | / | | 136 |__| | 137 +------- ---------...-----------|--------...---------|----> t 138 | | 139 | LSRefreshTime | LSRefreshTime | 140 | |<--------------------->|<------------------>| 142 t1 t2 t3 t4 144 Figure 1. Graph of SelfLSAs(t) --- number of 145 self-originated LSAs 147 At moment t1, the router originates its router-LSA. Then, at t2, the 148 router is instructed to inject external routing information into the 149 OSPF domain. At t3 (t3 - t2 == LSRefreshTime) the router reoriginates 150 the AS-external-LSAs and floods them to its neighbors (reorigination 151 of the router-LSAs is not illustrated). It can be seen that the 152 router introduces periodical traffic bursts into the network, also 153 causing excessive CPU utilization in other OSPF routers. 155 The customers of such a network can suffer from this behavior, 156 experiencing periodical network congestion and increased response 157 time. 159 The following sections of this document discuss the methods of 160 implementing OSPF LSA refreshment feature so that it does not disrupt 161 network operation. 163 2 Consideration of possible approaches 165 2.1 Dispersing LSA origination 167 One of the most obvious techniques to avoid LSA refresh grouping 168 could be dispersion of LSA originations. If LSAs were originated with 169 some interval, their refreshes wouldn't group. Though this method 170 seems to be a very easy solution, it has several disadvantages. The 171 most important one is that the time of routing information 172 propagation is increased, i.e., it may take a lot of time (protocol- 173 wise) until the last LSA in the origination queue gets served. 174 However, introducing a minor dispersion in the initial origination 175 process is encouraged. 177 2.2 Increasing LSA refresh period 179 Another "easy" way to solve the problem is to increase the refresh 180 interval up to some value near the MaxAge value. This would, on the 181 one hand, decrease the frequency of the updates and, on the other 182 hand, would let the protocol function properly without changes in the 183 aging mechanism. This solution is not ideal, because it does not 184 eliminate periodical LSA refresh bursts, though making them less 185 frequent. This method also increases the time of protocol self- 186 correction in case of problems described in Section 1. 188 2.3 Avoiding periodical refreshes 190 Another proposal is to avoid periodic LSA refreshes at all by making 191 all or some subset of LSAs be DoNotAge as in [Ref2]. This approach 192 does solve the refresh burst problem (there are no refreshments at 193 all), but it also decreases the protocol robustness by removing one 194 of the automatic survival features, thus making the protocol less 195 bullet-proof and attractive. 197 2.4 Further increasing the refresh period using DNA LSAs 199 The DNA method mentioned above could be used to increase the LSA 200 refresh interval beyond the MaxAge value. This is possible because 201 DNA LSAs are not aged by the routers, so the originating router can 202 refresh its LSAs at whatever rate it chooses. This seems to be a 203 potentially advantageous possibility. However, neither DNA LSAs, nor 204 the refresh interval increase per se can be considered as a 205 sufficient method to prevent the periodical LSA burst problem. More 206 than that, as it was already mentioned, increasing the refresh 207 interval may lead to potentially longer nonoperational periods. 209 A wide range of methods can be used to implement LSA refreshments 210 efficiently. The following section describes an approach, called 211 "LSA Refreshment Dispersion", implemented in Zebra routing software, 212 as an example of such a technique. 214 3 LSA Refreshment Dispersion 216 Optimization of the LSA refreshment process described here is based 217 on the observation that if the first refreshment interval is 218 randomized rather than set exactly to LSRefreshTime, subsequent LSA 219 refreshes will also be dispersed. If LSAs are refreshed using 220 individual timers, randomization of the first interval for self- 221 originated LSAs prevents LSA refreshment events from grouping and 222 causing network congestion. Individual LSA refreshments are 223 distributed within the LSRefreshTime period evenly. 225 The described method results in distribution of LSA origination 226 events shown in Figure 2. 228 SelfLSAs() 229 ^ 230 | 231 | __ 232 | __| 233 | __| | 234 | __| 235 | __| | 236 | __| 237 | __| | 238 | __| 239 | _| | 240 | | 241 | | | 242 | | 243 | | | 244 | ___| 245 | / | 246 |__| | 247 +------- ---------...-----------|-----> t 248 | | 249 | LSRefreshTime | 250 | |<--------------------->| 252 t1 t2 t3 254 Figure 1. Graph of SelfLSAs(t) when Refreshment Dispersion is used 256 The effect of the LSA refreshment dispersion is that OSPF does not 257 put high load on the networks, but instead uses network resources 258 evenly, allowing routers to adapt to its traffic and network 259 administrators to predict necessary resources in more effective 260 manner. 262 In order to use this technique, the implementation should schedule 263 the LSA refreshment as shown in pseudocode in Figure 3. 265 if ((LSA_SEQ (lsa) == OSPF_INITIAL_SEQUENCE_NUMBER) && 266 (LSA_AGE (lsa) == 0)) 268 delay = OSPF_LS_REFRESH_SHIFT + 269 (random () % OSPF_LS_REFRESH_TIME); 271 else { 273 delay = OSPF_LS_REFRESH_TIME - LSA_AGE(lsa); 275 if (delay < 0) delay = 0; 277 /* Add jitter to avoid syncing */ 279 delay = delay + (random () % OSPF_LS_REFRESH_JITTER) +1; 281 } 283 schedule_lsa_refresh (lsa, delay); 285 Figure 3. Pseudocode for refreshment delay calculation 287 The constants used in the algorithm are defined as follows: 289 OSPF_INITIAL_SEQUENCE_NUMBER 290 is the InitialSequenceNumber constant as defined in [Ref1] 292 OSPF_LS_REFRESH_TIME 293 is the LSRefreshTime constant as defined in [Ref1] 295 OSPF_LS_REFRESH_SHIFT 296 is the length of the silent period, i.e., the minimum time 297 that must elapse after LSA origination before the first LSA 298 refresh. 300 OSPF_LS_REFRESH_JITTER 301 is the limit for the pseudo-random number added to the LSA 302 refresh interval to avoid synchronization, i.e., to introduce 303 additional minor dispersion. 305 As it was mentioned before, maintaining a separate timer for each 306 LSA can be costly. A simple way of solving this problem is to group 307 LSAs that must be refreshed approximately at the same time and 308 maintain a single timer for them. This may be implemented by having 309 an LSA accumulation group and a periodic timer. The LSA group is 310 used to list all LSAs originated (or reoriginated) between two 311 consecutive timer expirations (within OSPF_REFRESH_GROUP_TIME 312 seconds). The group is flushed either on the timer expiration or 313 when the size of the group reaches a predefined limit 314 (OSPF_REFRESH_GROUP_LIMIT). When a group is flushed, a refresh event 315 is scheduled as shown in the pseudocode in Figure 3. The event 316 handling routine is supposed to submit the LSAs in the group for 317 reorigination. Note that this is really necessary to limit the 318 maximum number of LSAs in the LSA group and flush the group when 319 this limit is reached in order to achieve acceptable results in the 320 refresh event dispersion. 322 Even when refresh events are dispersed using the method described 323 above, there can still happen some situations where a big number of 324 LSAs must be refreshed within a short period of time. This may lead 325 to high CPU load in the originating router and protocol traffic 326 bursts in the network. It is possible to avoid these problems if 327 the OSPF implementation maintains an LSA refresh queue, which is 328 served not faster than some predefined number of LSAs per second 329 (OSPF_REFRESH_QUEUE_RATE). Note that the value of the 330 OSPF_REFRESH_QUEUE_RATE should not be very low, because this may 331 lead to the reorigination queue being never served completely and 332 some LSAs remaining unrefreshed for a long period. This may also 333 affect the initial dispersion in the refresh events introduced by 334 randomizing the first refresh interval. The recommended value for 335 the OSPF_REFRESH_QUEUE_RATE constant is several times the 336 OSPF_REFRESH_GROUP_LIMIT per second. 338 The described algorithm provides desired results in situations when 339 a router receives self-originated LSAs during adjacency 340 establishment after a reload and decides to accept them as correct 341 (and consequently does not originate the new instances). In this 342 case after the received LSAs are identified as self-originated and 343 approved for future use, they are registered in the refreshment 344 module as if they were originated. Such LSAs are also supposed to 345 be bundled into refreshment groups, but in addition to the periodic 346 timer and maximum group size the implementation should care about 347 the ages of the LSAs being grouped. Generally, if the absolute 348 difference between the age of an LSA being added to the group and 349 the first LSA in the group is larger than a certain acceptable value 350 (OSPF_REFRESH_GROUP_AGE_DIF), the group should be unconditionally 351 flushed and the new LSA must start the new group. Note that it is 352 perfectly possible for the received self-originated LSAs to be not 353 grouped by their age at all when they are transmitted in the 354 database description and link-state update packets. In the worst 355 case each refreshment group will contain a single LSA. However, even 356 in this worst case, LSAs will be grouped by the age again after the 357 first refreshment cycle, so the grouping algorithm presented above 358 is self-optimizing. 360 Below goes the functional specification of the refreshment 361 algorithm. 363 1. Each originated (reoriginated) or received self-originated LSA 364 is passed for registration to the LSA refreshment module 366 2. The LSA refreshment module bundles the LSAs submitted for 367 registration into refresh groups. The LSAs in one group share 368 the same timer. A group is flushed (see item 3) whenever one of 369 the following three events happens: 371 a) the periodical timer expires (every 372 OSPF_REFRESH_GROUP_TIME seconds) 374 b) after an LSA has been added to the group, the size of the 375 group reaches OSPF_REFRESH_GROUP_LIMIT 377 c) the age of the LSA to be added to the group differs from 378 the age of the first LSA in the group more than 379 OSPF_REFRESH_GROUP_AGE_DIF 381 3. when a group is flushed, perform the following action: 383 a) calculate the delay for the refreshment event as 384 presented in Figure 3 386 b) schedule the refreshment event with the delay calculated 387 in the previous step, passing the LSA refresh group as 388 the parameter to the event handling routine 390 c) create a new LSA refresh group 392 4. When a refresh event is processed, put the LSAs in the refresh 393 group associated with the event to the reorigination queue. 395 5 The reorigination queue is served not faster than 396 OSPF_REFRESH_QUEUE_RATE LSAs per second 398 The following table gives recommended values for the constants used 399 in the algorithm description: 401 Constant Recommended Value 402 _________________________________________________ 404 OSPF_LS_REFRESH_SHIFT 60 405 OSPF_LS_REFRESH_JITTER 10 406 OSPF_REFRESH_GROUP_TIME 1 407 OSPF_REFRESH_GROUP_LIMIT 10 408 OSPF_REFRESH_GROUP_AGE_DIF 3 409 OSPF_REFRESH_QUEUE_RATE 70 411 Table 1. Recommended values for constants used in algorithm 413 Note that the constants shown above can be implemented as configur- 414 able variables. However, the default values should be taken from the 415 above table. 417 It is also worth noting that presented algorithm can be used along 418 with the extended LSA refresh period described in subsection 2.4 of 419 this document. However, this would require all routers in the net- 420 work to support DoNotAge LSAs, which may not be the case for some 421 already deployed networks. LSA Refreshment Dispersion per se, in 422 contrast, will possibly require software upgrade only on ASBRS and 423 other routers originating a lot of LSAs. 425 For more information on the described algorithm, please refer to the 426 source code of Zebra routing software (governed by GNU GPL), avail- 427 able at http://www.zebra.org. 429 4 Security Considerations 431 The algorithm specified in this document does not raise any security 432 issues that are not already covered in [Ref1]. 434 5 References 436 [Ref1] J. Moy. OSPF version 2. Technical Report RFC 2328, 437 Internet Engineering Task Force, 1998. 438 ftp://ftp.isi.edu/in-notes/rfc2328.txt. 440 [Ref2] J. Moy. Extending OSPF to Support Demand Circuits, 441 Standards Track RFC 1793, Internet Engineering Task Force, 1998. 442 ftp://ftp.isi.edu/in-notes/rfc1793.txt. 444 6 Author's Address 446 Alex D. Zinin 447 Cisco Systems 448 150 West Tasman Dr. 449 San Jose,CA 450 95134 451 E-mail: azinin@cisco.com