idnits 2.17.1 draft-ietf-ospf-isis-flood-opt-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 11 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 12 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 503: '... choice MAY be different on each occ...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 2001) is 8433 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. 'Ref2' == Outdated reference: A later version (-03) exists of draft-awduche-mpls-te-optical-02 -- Possible downref: Normative reference to a draft: ref. 'Ref3' == Outdated reference: A later version (-02) exists of draft-ietf-mpls-lmp-01 -- Possible downref: Normative reference to a draft: ref. 'Ref4' Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Alex Zinin 3 Internet Draft Mike Shand 4 Expiration Date: September 2001 Cisco Systems 5 File name: draft-ietf-ospf-isis-flood-opt-01.txt March 2001 7 Flooding optimizations 8 in link-state routing protocols 10 draft-ietf-ospf-isis-flood-opt-01.txt 12 Status of this Memo 14 This document is an Internet-Draft and is in full conformance with 15 all provisions of Section 10 of RFC2026. 17 Internet Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its Areas, and its Working Groups. Note that other 19 groups may also distribute working documents as Internet Drafts. 21 Internet Drafts are draft documents valid for a maximum of six 22 months. Internet Drafts may be updated, replaced, or obsoleted by 23 other documents at any time. It is not appropriate to use Internet 24 Drafts as reference material or to cite them other than as a "working 25 draft" or "work in progress". 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Abstract 35 The flooding algorithm is one of the most important parts of any link 36 state routing protocol. It ensures that all routers within a link 37 state domain converge on the same topological information within a 38 finite period of time. To ensure reliability, typical implementations 39 of the flooding algorithm send new information via all interfaces 40 other than the one the new piece of information was received on. This 41 redundancy is necessary to guarantee that flooding is performed 42 reliably, but implies considerable overhead of utilized bandwidth and 43 CPU time if neighboring routers are connected with more than one 44 link. This document describes a method that reduces this overhead. 46 1 Introduction 48 In order to guarantee convergence of a link state routing protocol, 49 it is vital to ensure that link state PDUs (LSAs in the case of OSPF 50 [Ref1] or LSPs in the case of ISIS [Ref2]) that are originated after 51 the initial LSDB synchronization between neighbors is completed are 52 delivered to all routers within the flooding scope limits (an area or 53 the whole AS depending on the protocol and the type of the link state 54 PDU). 56 The method used by link state protocols to achieve this implies that 57 a) PDUs are transmitted reliably between any pair of routers, and b) 58 whenever a new PDU is received, it is sent across all interfaces 59 other than the one it was received on (except for the case when the 60 router is the DR in OSPF, where the LSA is sent back over the same 61 interfaces, see [Ref1]). 63 To fulfil the first requirement, link state routing protocols keep 64 retransmitting new PDUs to the neighbors that have not acknowledged 65 reception (the only exception is flooding performed on broadcast 66 links in ISIS, see [Ref2]). As an example, in OSPF, a link state 67 retransmission list is maintained for every neighbor data structure 68 on every interface. When an LSA is sent through an interface, it is 69 put on the retransmission list of every neighbor associated with this 70 interface and is removed from it only after the neighbor has 71 acknowledged reception of the LSA. Similarly, ISIS implementations 72 use SRMflag and SSNflag that are interface-specific, as well as 73 periodical CSNP announcements on broadcast links to ensure 74 reliability of flooding. 76 2 Motivation 78 If multiple links connect two routers, flooding of new information 79 will cause considerable overhead of link bandwidth and CPU time spent 80 by the protocol. Let's consider an example shown in Figure 1. 82 | | 83 | _____ 1 ___ | 84 -| +--+/ . . \+--+ |- 85 |---|R1| : : |R2|--| 86 -| +--+\_____ N ___/+--+ |- 87 | | 88 | | 89 Figure 1. Sample topology 91 When R1 receives a new PDU from its LAN segment, it installs it in 92 its LSDB and submits for flooding through all of its interfaces. 93 Since flooding presumes sending the new PDU over all interfaces 94 except for the one it was received on routers end up doing the 95 following. 97 1) R1 sends not one, but N copies of the new PDU to R2. 99 2) Only the first copy of the PDU is actually installed 100 in R2's LSDB, but link bandwidth and CPU cycles are 101 spend to transmit and process all N copies. 103 3) Furthermore, when R2 receives the first copy of the LSA 104 and installs it, it floods back to R1 N-1 copies of it, 105 again spending extra bandwidth and CPU time. 107 4) If R1 receives an acknowledgment from R2 on some links, 108 but not from others, it will keep retransmitting unacknowledged 109 LSAs though they are already in R2's LSDB. 111 The solution described in this document provides a technique to 112 minimize the overhead that link state routing protocols cause in the 113 described situation and use link bandwidth more efficiently. 115 While the described optimization is generic for OSPF and ISIS, it 116 becomes very important in the contents of MPLambdaS [Ref3] where OXCs 117 may be connected with a large number of links. The problem is 118 partially addresses by LMP [Ref4] where a single control channel is 119 used for a bundle of optical links or lambdas. However, OXCs may 120 still have a big amount of control channels between each other, and 121 some type of OXCs may not be running LMP at all. The flooding 122 optimizations described in this document ensure scalability of the 123 IGP flooding algorithm in the presence of multiple links between 124 neighboring routers and increasing amount of traffic engineering 125 information flooded in optical and MPLS networks. 127 3 Proposed solution 129 3.1 Introduction 131 The main idea of the technique described in this document is to move 132 the flooding algorithm from the per-interface to per-neighbor basis 133 in a backward-compatible manner. 135 The technique is generic for all protocols utilizing reliable 136 flooding and is based on the observation that the ultimate goal of 137 the flooding algorithm is not to send link state PDUs over all 138 interfaces, but to deliver them to all routers in the network. 140 To implement this optimization, it is necessary to maintain a list of 141 neighbors within an area. Whenever a new neighbor is discovered on an 142 interface belonging to the area, the corresponding interface neighbor 143 data structure is linked to the corresponding element in the list of 144 neighbors. Based on the information in the list of neighbors, as well 145 as on the type of interfaces they use, interfaces within the area are 146 marked either flooding-active or flooding-passive. The process of 147 election of flooding-active interfaces takes into consideration the 148 costs of interfaces, giving preference to faster interfaces. 149 Multiaccess interfaces need special treatment, since they may be 150 (usually are) associated with more than one neighbor. However, if 151 such an interface connects only two routers, it still may be marked 152 as flooding-passive. Whenever the number of entries in the list or 153 state of the adjacency in the list changes, the interface election 154 algorithm is rerun. 156 Note that since the flooding paradigm is changed from the per- 157 interface to per-neighbor basis, PDU retransmission is not performed 158 for a specific neighbor on a specific interface, but is instead done 159 for a specific neighbor in general, and it is enough to receive a 160 single acknowledgment on any interface for sending router to stop 161 retransmitting. 163 The asynchronous flooding algorithm is changed to first consider the 164 area neighbor list and then use available physical interfaces to 165 reliably deliver link state PDUs to the neighbors. Note that if more 166 than one interface to a particular neighbor is marked as flooding- 167 active, the flooding algorithm may perform equal- or unequal-cost 168 load sharing, flooding different PDUs through different links. 170 Flooding is also changed not to send PDUs to the sending neighbor via 171 other links. 173 The initial process of LSDB synchronization is also changed to take 174 advantage of multiple links. If a new adjacency is coming up and the 175 router can be sure that its LSDB is already synchronized with the 176 remote router over other links, the router can speed up the adjacency 177 establishment process by sending a empty (or limited-size) database 178 description to the remote neighbor. Note that this speeds up the 179 announcement of links that come up, since OSPF announces an adjacency 180 only when it reaches Full state, i.e., when routers have synchronized 181 their LSDBs. Skipping the LSDB synchronization part in ISIS does not 182 speed link announcement (since ISIS announces adjacency as soon as 183 two-way connectivity has been ensured), but it reduces the amount of 184 time CPU spends on processing of CSNPs and LSPs. 186 To illustrate the benefits of the described method, consider the 187 situation where R1 in Figure 1 has 100 PDUs to flood to R2, and N 188 equals 3. Without the described optimization, we would have: 190 o 300 copies of PDUs going from R1 to R2 192 o 300 LSDB lookups performed by R2 194 o 300 acknowledgements coming back from R2 196 o 300 lookups on the retransmit list or LSDB by R1 to remove 197 acknowledged PDUs 199 o 200 copies of PDUs coming back from R2 to R1 201 o 200 LSDB lookups performed by R1 203 o 200 ackowledgments going from R1 to R2 205 o 200 lookups on the retransmit list or LSDB by R2 to remove 206 acknowledged PDUs 208 If described technique is implemented, we would have: 210 o 100 copies of PDUs going from R1 to R2 possibly over dif- 211 ferent interfaces for faster transmission 213 o 100 LSDB lookups performed by R2 215 o 100 acknowledgements coming back from R2 217 o 100 lookups on the retransmit list or LSDB by R1 to remove 218 acknowledged PDUs 220 o no copies of PDUs coming back from R2 to R1 222 o no LSDB lookups performed by R1 224 o no acknowledgments going from R1 to R2 226 o no lookups on the retransmit list or LSDB by R2 to remove 227 acknowledged PDUs 229 3.2 Changes to OSPF 231 3.2.1 Data structures 233 Some basic modifications to OSPF data structures are necessary to 234 implement the described solution. 236 First of all, a new field is introduced to the area data structure, 237 called NeighborList. NeighborList is a list of entries, each contain- 238 ing the following fields. Note that the NeighborList entry is created 239 when the first neighbor data structure for the neighbor with a par- 240 ticular router ID is created within an area. 242 o NeighborID---the router ID of the neighbor connected with 243 the calculating router with one or more interfaces. 245 o P2pIntList---list of interfaces that have only one fully 246 established adjacency and it is established with the neigh- 247 bor identified by the NeighborID (point-to-point and virtual 248 links, as well as broadcast and NBMA interfaces connecting 249 only two routers). 251 o P2mpIntList---list of interfaces that have more than one 252 fully established adjacency and one of them is established 253 with the neighbor identified by the NeighborID (apparently 254 NBMA and broadcast networks). 256 o Retransmission list---list of LSAs that must be delivered to 257 the remote router using available physical interfaces. 259 Note that when a neighbor is reachable over multiple interfaces, 260 there will be more than one entry in the above lists of interfaces. 262 A new field is introduced to the interface-specific neighbor data 263 structure---NeighborEntry. When an instance of the interface-specific 264 neighbor data structure is created, its NeighborEntry is set to 265 reference the corresponding entry in the area neighbor list. Note 266 that whenever a neighbor data structure is created for an interface, 267 the P2pIntList and P2mpIntList of area neighbor data structures 268 corresponding to the neighbors reachable through the same interface 269 are modified. If only one neighbor data structure is available for 270 the interface, the interface is put on the P2pIntList for that neigh- 271 bor. Otherwise, if more than one neighbor is known (regardless of 272 the state of the neighbors), the interface is placed on the 273 P2mpIntLists of all neighbors reachable through that interface. Note 274 that point-to-point interfaces may temporarily have more than one 275 neighbor linked to the interface data structure when the router-ID of 276 the neighbor is changing. The algorithm handles such transition 277 states by temporarily putting the interface on the P2mpIntList. 279 Also, a new field is introduced to the interface data structure, 280 called FloodingActive. If the value of this field is TRUE, the inter- 281 face is used for flooding. Otherwise the interface is flooding- pas- 282 sive and no LSAs are sent over it when asynchronous flooding is per- 283 formed. Note that whenever an interface is put on a P2mpIntList of 284 any area neighbor data structure, its FloodingActive field is always 285 set to TRUE. Actually, FloodingActive field is consulted only if the 286 interface is in a P2pIntList. 288 Another field introduced to the interface data structure is LSASent, 289 that is used by the flooding procedure (see Section 3.2.3 for more 290 details). 292 Whenever there is a change in the contents of the P2pIntList or 293 P2mpIntList of an area neighbor data structure, the router performs 294 election of flooding-active interfaces among the interfaces listed in 295 the P2pIntList field. 297 Below follows the algorithm describing the election process. Note 298 that this algorithm produces the minimal set of active interfaces. 299 Implementations may use different algorithms, but these algorithms 300 must not produce a smaller set of interfaces. 302 For every entry in the area neighbor list, do the following. 304 1. If the P2mpIntList is not empty, go through all interfaces 305 in the P2pIntList and mark them flooding-passive by setting 306 the FloodingActive interface field to FALSE. (We always 307 prefer sending LSAs to multiple neighbors simultaneously). 309 2. Otherwise, among the interfaces in the P2pIntList, set 310 FloodingActive field to TRUE for those interfaces that have 311 the best interface cost. Set it to FALSE for all other 312 interfaces in the list. 314 3.2.2 Initial LSDB synchronization 316 Implementations may decide to maintain a single link state request 317 list per neighbor in an area. This may be used to split the Loading 318 process among several links when more than one adjacency is coming up 319 simultaneously. Note that in this case, whenever the link state 320 request list for a particular neighbor becomes empty, a LoadingDone 321 event should be generated for all adjacencies with this neighbor that 322 are currently in the Loading state. 324 3.2.3 Asynchronous Flooding 326 Asynchronous flooding algorithm is changed as follows. Note that 327 changes described below do not affect flooding back to a multiaccess 328 interface if the router is the DR. The changes are only in the part 329 where LSA is sent over other interfaces. 331 If the flooding scope is domain-wide, perform the following for 332 all areas. If the flooding scope is area-wide, do the following 333 steps only for the area the interface on which the LSA was 334 received belongs to. 336 Consider every neighbor element in the area neighbor list as fol- 337 lows. 339 1) If the value of the NeighborID field is equal to the router 340 ID of neighbor that sent the LSA to the router, consider the 341 next neighbor element (there is no need to send the LSA back 342 to the sending router, except for the case when the receiv- 343 ing router is the DR and the LSA is flooded back to a 344 mutliaccess interface). 346 2) If P2mpIntList is empty, go to step 3. Otherwise do the fol- 347 lowing steps 349 a) Put the LSA on the neighbor's retransmission list 351 b) Go through every interface on the P2mpIntList and do 352 the following: 354 o Compare the LSA being flooded and the one identi- 355 fied by the LSASent field of the interface data 356 structure. If the LSAs are the same, the LSA has 357 already been sent on this interfaces and next 358 interface in P2mpIntList must be considered. 360 o Send the LSA in a link state update packets set- 361 ting the destination address according to the 362 rules in Section 13 of [Ref1] 364 o Set LSASent field of the interface data structure 365 to the LSA that has just been sent. 367 c) Consider the next neighbor element in the area 368 neighbor list. (That is, skip flooding over inter- 369 faces in the P2pIntList.) 371 3) If P2pIntList is empty, consider the next neighbor element. 372 Otherwise: 374 a) Put the LSA on the neighbor's retransmission list 376 b) Go through every interface on the P2pIntList and do the 377 following: 379 o If the interface FloodingActive flag is clear, 380 skip this interface and consider the next inter- 381 face in the list. 383 o Send the LSA in a link state update packets set- 384 ting the destination address according to the 385 rules in Section 13 of [Ref1] 387 c) Consider the next neighbor element in the area neighbor 388 list. 390 Reception of OSPF acknowledgements is modified as follows. 392 Whenever a link state acknowledgement is received from a neighbor, 393 the corresponding entry in the area neighbor list is located and 394 corresponding LSA is removed from the retransmission list. 396 3.2.4 Retransmitting LSAs 398 The OSPF implementation should also be modified to perform 399 retransmission of LSAs on a per-neighbor basis. 401 Normally, the interfaces for LSA retransmission should be selected 402 according to the rules used for asynchronous LSA flooding. However, 403 implementations may consider retransmitting LSAs over a bigger set of 404 interfaces leading to the neighbor if the minimal interface set is 405 suspected to be not sufficient (because of link load, or packet 406 drops) to complete LSDB synchronization within a reasonable period of 407 time. 409 3.2.5 Opaque LSA Support 411 Described optimizations can be applied to all LSAs, including Opaque 412 LSAs that have area and domain wide flooding scope (type-10 and 413 type-11). Note however, that transmission of type-9 LSAs (that have 414 link-local flooding scope) should remain intact. 416 3.2.6 Compatibility 418 The optimization described above is designed to be backward- 419 compatible. No software modification is necessary for the neighbor- 420 ing routers. However, if both routers support the described modifica- 421 tions, the advantages will be greater. 423 3.3 Changes to ISIS 425 The changes for IS-IS are similar to those for OSPF. 427 Each non-broadcast circuit has associated with it the system ID of 428 the neighbor that is adjacent over that circuit. At each of level 1 429 and level 2, the set of one or more circuits with an adjacency at 430 that level and a common neighbor is identified as a group. SRMflags 431 are associated with groups rather than circuit. SSNflags remain 432 associated with circuits. 434 ISO/IEC 10589 describes the setting or clearing of SRMflag or SSNflag 435 on a non-broadcast circuit for the following reasons. 437 1. SSNflag is cleared after the transmission of a PSNP over cir- 438 cuit C. 440 2. A packet has been received on circuit C and a flag on circuit 441 C set or cleared as a result. 443 3. A packet has been received on circuit C and the flags on all 444 other circuits set or cleared as a result. 446 4. The flags on all circuits are set or cleared. 448 These actions are modified as described below. In these descriptions, 449 the term Sxxflag refers to either SSNflag or SRMflag. 451 1. SSNflag is cleared after the transmission of a PSNP over cir- 452 cuit C. 454 Clear the flag on circuit C. 456 2. A packet has been received on circuit C and an Sxxflag on cir- 457 cuit C is to be set or cleared as a result. Circuit C is a 458 member of group G. 460 a. If an SSNflag is to be cleared, clear ALL SSNflags for 461 circuits in group G. 463 b. If an SRMflag is to be cleared, clear the SRMflag for 464 group G. 466 c. If an SSNflag is to be set, set the SSNflag for circuit C 467 only. 469 d. If an SRMflag is to be set, set the SRMflag for group G. 471 3. A packet has been received on circuit C and the Sxxflags on 472 all other circuits are to be set or cleared as a result. Cir- 473 cuit C is a member of group G. 475 a. If an SSNflag is to be cleared, clear ALL SSNflags for 476 all circuits belonging to groups other than G. 478 b. If an SRMflag is to be cleared, clear ALL SRMflags for 479 groups other than G. 481 c. If an SRMflag is to be set, set ALL SRMflags for groups 482 other than G. 484 4. The flags on all circuits are set or cleared 486 a. If an SSNflag is to be cleared, clear ALL SSNflags for 487 all circuits. 489 b. If an SRMflag is to be cleared, clear SRMflags for all 490 groups. 492 c. If an SRMflag is to be set, set SRMflags for all groups. 494 5 Transmitting an LSP as a result of SRMflags being set on group 495 G. 497 Choose ONE circuit from group G, and transmit the LSP 498 over that circuit. 500 Where a circuit is required to be chosen from within a group, the 501 choice made is implementation dependant and may be based on any cri- 502 teria, such as bandwidth or management control. The result of the 503 choice MAY be different on each occasion. Implementations may also 504 decide to choose no point-to-point links if a neighboring system is 505 available via a broadcast circuit, since LSPs need to be flooded 506 throught it anyway. It is also possible to treat broadcast circuits 507 with only two routers attached as point-to-point circuits (inlcuding 508 Hello PDUs and LSDB synchronization pocess), note however that the 509 routers should be explicitly configured to do so. 511 4 Security issues 513 This document does not introduce any new security issues to ISIS or 514 OSPF. 516 5 Acknowledgements 518 The authors would like to acknowledge Tony Przygienda, Yakov Rekhter, 519 and John Moy for their comments, and Jeff Learman for reviewing this 520 document. 522 6 References 524 [Ref1] J. Moy. OSPF version 2. Technical Report RFC 2328, Internet 525 Engineering Task Force, 1998. ftp://ftp.isi.edu/in- 526 notes/rfc2328.txt. 528 [Ref2] ISO, "Intermediate system to Intermediate system routeing 529 information exchange protocol for use in conjunction with the 530 Protocol for providing the Connectionless-mode Network Service 531 (ISO 8473)," ISO/IEC 10589:1992. 533 [Ref3] D. Awduche, Y. Rekhter, J. Drake, R. Coltun, 534 "Multi-Protocol Lambda Switching: Combining MPLS Traffic 535 Engineering Control With Optical Crossconnects", draft- 536 awduche-mpls-te-optical-02.txt. Work in progress. 538 [Ref4] J. Lang, et al. "Link Management Protocol (LMP)", 539 draft-ietf-mpls-lmp-01.txt. Work in progress. 541 7 Authors' addresses 543 Alex Zinin Mike Shand 544 Cisco Systems Cisco Systems 545 150 West Tasman Dr. 4, The Square 546 San Jose,CA Stockley Park 547 95134 UXBRIDGE 548 E-mail: azinin@cisco.com Middlesex 549 UB11 1BN, UK 550 E-mail: mshand@cisco.com