idnits 2.17.1 draft-zhang-qos-ospf-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-19) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == Mismatching filename: the document gives the document name as 'draft-zhang-qos-qospf-00', but the file name used is 'draft-zhang-qos-ospf-00' ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 50 instances of too long lines in the document, the longest one being 11 characters in excess of 72. ** The abstract seems to contain references ([2], [4], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 30 looks like a reference -- Missing reference section? '2' on line 31 looks like a reference -- Missing reference section? '4' on line 32 looks like a reference -- Missing reference section? '5' on line 217 looks like a reference Summary: 11 errors (**), 0 flaws (~~), 4 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force Zhang, Sanchez, Salkewicz, Crawley 2 Internet-Draft Bay Networks 3 draft-zhang-qos-qospf-00.txt June, 1996 5 Quality of Service Extensions to OSPF 6 or 7 Quality Of Service Path First Routing 8 (QOSPF) 10 Status Of This Memo 12 This document is an Internet-Draft. Internet-Drafts are working 13 documents of the Internet Engineering Task Force (IETF), its areas, 14 and its working groups. Note that other groups may also distribute 15 working documents as Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time. It is inappropriate to use Internet- Drafts as reference 20 material or to cite them other than as "work in progress". 22 To learn the current status of any Internet-Draft, please check the 23 "1id- abstracts.txt" listing contained in the Internet- Drafts Shadow 24 Directo- ries on ds.internic.net (US East Coast), nic.nordu.net 25 (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific 26 Rim). 28 Abstract 30 This document describes a series of extensions for OSPF[1] and 31 MOSPF[2] that can be used to provide Quality of Service (QoS) routing 32 in conjunction with a resource reservation protocol such as RSVP[4] 33 or other mechanisms that can notify routing of the QoS needs of a 34 data flow. Advertisements indicating the resources available and the 35 resources used are advertised to the OSPF routing domain and paths 36 are computed based on topology in- formation, link resource 37 information, and the resource requirements of a particular data flow. 39 1.0 Introduction 41 QoS signalling protocols such as RSVP allow the instantiation of 42 network state to provide a specific service level to a data flow. 43 RSVP is specifically not a routing protocol but it does have 44 interfaces to routing in order to determine the forwarding of its own 45 state messages. 47 Existing routing protocols are usually concerned only with topology 48 information and not network resources such as bandwidth, thus they 49 all have their limitations in providing integrated services. The 50 following figure is a simple illustration: 52 [Figure not in Text Version at this time.] 54 FIGURE 1. Example Topology 56 Suppose host H1 is sending data to host H3 at rate R. The routing 57 protocol in use gives the shortest path as defined by the metrics, 58 H1-->R1-->R3-->H3. However, even if R1 does not have adequate 59 resources on its interface to R3 to handle the flow at the rate R, 60 the route H1-->R2-->R3-->H3 that does have adequate resources 61 available, is not used because the routing protocol always uses the 62 shortest path. 64 One solution is to let the routing protocols consider network 65 resource information as well as topology information when they 66 calculate routes. With the OSPF protocol, complete topology 67 information is used to calculate routes; in QOSPF, network resource 68 information is added and used to calculate "QoS routes" that can 69 provide the resources needed for the flow even though the route may 70 not be strictly the shortest path. 72 2.0 Protocol Overview 74 2.1 Network Resource Information 76 In QOSPF, routers advertise network resource information as well as 77 topology information. A route for a data flow is calculated based on 78 topology, network resource information, and QoS requirements (e.g. 79 the TSpec of the RSVP PATH message) for the flow. 81 The network resource information includes available link resources on 82 a router as well as existing link resource reservations on the 83 router. The resource information is advertised in Link Resource 84 Advertisements (RES-LSAs) and Resource Reservation Advertisements 85 (RRAs). Another type of advertisement, Deterministic Area Border 86 Router Advertisements (DABRA), are needed for inter-area multicast 87 QOSPF. 89 There are a lot of ways to represent network resource information. In 90 this document, we use Token Bucket parameters, as in the Controlled- 91 Load Service model[5]. It is expected that resource advertisements 92 that are related to other service models could be added over time. 94 The number of RRAs can easily get huge as the number of reserved 95 flows and network size grows, presenting a scaling issue. A solution 96 to this problem is addressed by Explicit Routing, discussed in 97 Section 6.0. 99 2.2 Route Pinning 101 Topology and network resource information not only make it possible 102 to calculate a shortest route that satisfies the required QoS for a 103 flow, but also makes Route Pinning very easy to achieve. Route 104 pinning means that an existing route with a reservation will not be 105 replaced by a better route unless the existing one is no longer 106 usable because of a topology change directly related to the existing 107 route. 109 2.3 Data-driven (Source, Destination) Route Computation 111 MOSPF uses data-driven (source, destination) routing. In other words, 112 a route is computed when the first packet for a (source, group) pair 113 is received. This is in contrast to unicast OSPF which pre-computes 114 routes based on destination only. 116 In QOSPF, routing for QoS flows is based on (source, destination), 117 and routing computations are triggered by external events regardless 118 of whether the flow is unicast or multicast. The initial trigger for 119 QoS routing computation comes from a resource reservation protocol 120 such as an RSVP PATH message. 122 There are two reasons for (source, destination) routing in unicast QOSPF: 124 - Resource reservations and RRAs are generally based on (source, destina- 125 tion); 127 - When (source, destination) routing is used, flows with the same destina- 128 tion but different sources can follow different paths when necessary. 130 Note the (source, destination) routing used in unicast QOSPF does not 131 mean that the distribution tree must be rooted at the source. It only 132 means that the routing table lookup is based upon (source, 133 destination) rather than just the destination. 135 3.0 Resource Advertisements 137 Available and reserved network resources are advertised via Link 138 Resource Advertisements (RES-LSAs) and Resource Reserved 139 Advertisements (RRAs), respectively. 141 3.1 Link Resource Advertisement (RES-LSA) 143 A RES-LSA is very similar to a Router-LSA. The purpose of the RES-LSA 144 is to advertise the link resources available for each router in the 145 network. When calculating QoS routes, RES-LSAs are used instead of 146 Router-LSAs. 148 Each QOSPF router originates a RES-LSA for each area, listing the 149 largest amount of available resources for reservation on each of the 150 router's interfaces in the area, along with the link's delay metric. 151 This metric is roughly analogous to the standard OSPF cost metric but 152 is independent of the standard TOS metric to better characterize the 153 static delay properties of a link. 155 A new instance of RES-LSA is originated whenever a new Router-LSA 156 instance is originated for the area, or whenever the available 157 bandwidth resource or delay changes for a link in the area. An 158 algorithm may be used so that a new RES-LSA is originated only when 159 the available bandwidth resource changes significantly. 161 Like Router-LSAs, RES-LSAs are flooded throughout a single area. 163 The format of RES-LSAs is shown in Figure 2 165 [Figure not in Text Version at this time.] 167 FIGURE 2. Resource LSA 169 The RES-LSA header is the same as all other LSA headers. 171 The "rtype" field is the same as that in a Router LSA. 173 The "number of links" is the count of links included in the LSA. For 174 each link the link type, link ID and link data are the same as for 175 the Router LSA. 177 The available link resource is represented by token bucket 178 parameters, in IEEE single precision floating point format, as in the 179 Controlled-Load Service model[5]. 181 The link delay is a static delay metric for the link, in units of 182 milliseconds. 184 3.2 Resource Reservation Advertisement (RRA) 186 A Resource Reservation Advertisement describes a router's 187 reservations for a particular flow (source, destination) on its 188 interfaces within an area. The purpose of the RRA is to indicate the 189 resources used by a flow such that other routers are aware of the 190 resources and path used by the flow when calculate or recalculate the 191 tree for the flow. A new RRA is originated whenever one or more of 192 the reservations change. 194 Like RES-LSAs, RRAs are flooded throughout a single area. 196 The format of RRAs is shown in Figure 3. 198 [Figure not in Text Version at this time.] 200 FIGURE 3. Resource Reservation Advertisement with its Opaque LSA header 202 RRAs are encapsulated in Opaque LSAs with type = 11. The Opaque ID is 203 chosen by the advertising router and the flooding scope is "area- 204 local". 206 In RRAs, the Source is the IP address of the source of the data flow. 208 The number of links value is the count of links included in the RRA. 209 For each link, the link type, link ID and link data are identical to 210 the values used in the Router LSA. 212 The pin-flag is used for partial route-pinning discussed in Section 213 5.2. 215 The reserved bandwidth resource is represented by token bucket 216 parameters, in IEEE single precision floating point format, as in the 217 Controlled-Load Service model[5]. 219 The src_prefix_mask and dest_prefix_mask correspond to the network 220 mask length of the source and destination respectively. 222 The reservation information comes from a resource reservation 223 protocol, such as RSVP or some other mechanism for reserving 224 resources on the node. Whenever a reservation is made or canceled, 225 QOSPF will originate a new instance of the RRA for the flow. RSVP SE 226 style reservations can cause multiple RRAs to be originated depending 227 on the number of PATH state that is matched, and a RSVP WF style 228 reservation will cause a RRA with a wildcard source (0) to be 229 originated. 231 4.0 QOSPF Route Calculation 233 Input to the QOSPF Dijkstra calculation includes the source and 234 destination address and the QoS requirements for the flow, which are 235 currently the token bucket parameters from the RSVP PATH message but 236 could also come from other triggers. 238 The QOSPF Dijkstra calculation for an area is performed by processing 239 the area's RES-LSAs, Network-LSAs, RRAs, and Group-Membership-LSAs. 240 The latter is only used for the multicast case. 242 The key difference between the QOSPF Dijkstra and the normal 243 OSPF/MOSPF Dijkstra is that a router's RES-LSA rather than Router-LSA 244 is used to discover its neighbors, and links will be ignored if they 245 do not have sufficient resources (either available or already 246 reserved) for the flow. 248 To calculate the best or lowest-delay path, the static delay metric 249 is used in the same way OSPF uses the TOS zero cost metric of the 250 Router LSA. 252 4.1 Multicast QOSPF 254 4.1.1 Intra-area Multicast QOSPF 256 Like in normal MOSPF, the intra-area QoS SPF tree is forward-linked. 258 This means that the best path is chosen based on the delay metrics 259 from the source to the target. 261 4.1.2 Inter-Area Multicast QOSPF 263 In MOSPF, for a (source, group) pair, a tree has to be calculated for 264 each area and then the trees are combined into a global tree. When 265 calculating a tree for an area, if the source is in another area, the 266 root of the tree is set to all the ABRs that support MOSPF and have 267 valid Summary LSAs containing the source. 269 As shown in Figure 4, suppose the source is in area 0.0.0.0. When R5 270 and R6 calculate their trees for area 0.0.0.1, they will root the 271 trees at R2, R3, and R4. 273 [Figure not in Text Version at this time.] 275 FIGURE 4. A Tree 277 In QOSPF, links without adequate resources for a data flow are not 278 considered. So, in Figure 1, suppose the link R1->R3 does not have 279 enough bandwidth, then R3 will not be on the multicast tree for area 280 0.0.0.0 so it will not get the packets. Now when R5 and R6 calculate 281 trees for area 0.0.0.1, they should root the trees only at R2 and R4. 283 For this reason, after R2 and R4 finishes calculation for area 284 0.0.0.0, they should notify routers in area 0.0.0.1 how to root the 285 tree via Deterministic ABR-Advertisements (DABRA). 287 [Figure not in Text Version at this time.] 289 FIGURE 5. DABRA with its Opaque LSA header 291 Each ABR on the QoS tree for the "source area" of a flow originates a 292 DABRA, listing all the ABRs on the tree, and floods it throughout all 293 "downstream areas". If the source of the flow is in one of the 294 router's directly attached areas, then the area is the "source area" 295 and all other areas are "downstream" areas; otherwise (the source is 296 in an area not directly attached to the router), the backbone area is 297 the "source area" and all non- backbone areas are "downstream areas". 299 4.1.3 Inter-AS Multicast QOSPF 301 Similar to the inter-area case, there should be a notification about 302 how to root the tree. The details are not explored in this document. 304 4.1.4 Detailed Multicast QOSPF Dijkstra Calculation 306 The following is a modification to section 12.2 in the Multicast 307 Extensions to OSPF, RFC 1584. 309 1. Initialize the algorithm's data structures. 311 Same as RFC 1584. 313 2. Initialize the candidate list. 315 If the source is in another OSPF area, the candidate list is initialized 316 with the set of area border routers that are both in the DABRA (assuming 317 that a DABRA has been received at this point) and in the calculating 318 area. Otherwise, candidate list is initialized as discussed in RFC 1584. 320 3. If the candidate list is empty, the algorithm terminates. 322 Same as RFC 1584. 324 4. Move the closest candidate vertex to the shortest-path tree. 326 Same as RFC 1584. 328 5. Examine Vertex V's neighbors for possible inclusion in the candidate list. 330 If the vertex V just moved from the candidate list to the SPF tree is a 331 router vertex then lookup the router's Resource-LSA (the Resource-LSA is 332 used rather than the Router LSA to determine the router's neighbors). If 333 it is not found then go to step 4. Also lookup the router's Resource 334 Reservation Advertisement for the flow. This may or may not be found. 336 Each link (link L) in the RES-LSA describes a connection to a neighboring 337 vertex (Vertex W). A link is eligible if either an existing reservation for 338 the flow or the remaining bandwidth is enough to meet the required 339 bandwidth. 341 For the eligible links perform the following steps on vertex W. 343 a. If W is already on the SPF tree, or if W's LSA does not contain a link 344 back to vertex V (if vertex W is a router vertex use vertex W's Router 345 LSA to make this determination as it is irrelevant whether or not there 346 is reserved bandwidth in the reverse direction), or if W's LSA has LS age 347 of MaxAge, or if W is not multicast capable (indicated by the MC- bit in 348 W's originators Router LSA or RES-LSA's options field), or if W does not 349 support QOSPF (indicated by the Q-bit [See Section 7.2] in W's 350 originators Router LSA or RES-LSA's options field), do not use this link. 352 b. If vertex V is a router vertex, the delay to associate with link L is 353 the delay metric from vertex V's RES-LSA. If both V and W are routers 354 there may be multiple links and the link with the least delay is used 355 (providing that the link meets the bandwidth requirements.) The delay 356 metric is always in the forward direction. If vertex V is a network 357 the delay is zero. 359 c. The delay from the source to W is the sum of the delay from the source 360 to V and the delay of the link from V to W. Let this sum be delay C. If 361 vertex W is not yet on the candidate list then install W on the 362 candidate list and modify it's parameters as described in step 5d 363 below. Otherwise W is already on the candidate list. In this case if: 365 1) C is less than W's current delay. In this case processing is the 366 same as in RFC1584. 368 2) C is equal to W's current delay. In this case processing is the 369 same as in RFC 1584. 371 3) C is greater than W's current delay. Examine the next link in V's 372 LSA. 374 d. This section is the same as RFC 1584 except that the delay is used 375 rather than the OSPF cost metric. 377 6. go to step 3. 379 After the tree for area A is built, the calculating router determines 380 if area A is used to determine the upstream node in the same was as 381 described by RFC 1548, and if the router is an ABR and area A is the 382 "source area" for the flow, a DABRA is originated and flooded 383 throughout "downstream areas". 385 4.2 Unicast QOSPF 387 In terms of adding to and moving from the candidate list, unicast 388 QOSPF Dijkstra is very similar to multicast QOSPF so the Dijkstra 389 details are not discussed here. 391 4.2.1 Unicast QOSPF Dijkstra is needed in only one area 393 If the calculating router has multiple areas, then the best effort 394 route to the destination has to be found first to identify the area 395 that needs to run the Dijkstra: 397 1. If the route is an intra-area route, then the area that the route belongs to 398 needs to run the Dijkstra to find a QoS route to the destination network. 400 2. If the route is an inter-area route, then backbone area needs to run the 401 Dijkstra to find a QoS route to one of the ABRs that advertises the best 402 effort route. 404 3. Suppose the route is an external route. If the ASBR used by the external 405 route is within one of the router's directly attached areas, then that area 406 needs to run the Dijkstra to find out a QoS route to the ASBR; otherwise, 407 backbone area needs to run the Dijkstra to find out a QoS route to one of the 408 ABRs that advertise the ASBR. 410 Unlike best-effort Dijkstra, a complete tree for the area is not 411 needed. Once the shortest path to the destination network or the ABR 412 or the ASBR is found, the Dijkstra terminates. 414 4.2.2 Inter-area and Inter-AS Unicast QOSPF 416 In the case that the destination is not in a directly attached area, 417 things are more complicated because OSPF areas hide detailed topology 418 and network resource information. Using the topology in Figure 1 419 again; when R1 calculate a QoS route for (H, H2), it finds a QoS 420 route to ABR R2 that has a shortest best-effort route the 421 destination, but R2 can not find a QoS route to the destination. R3 422 has a QoS route to the destination but the QoS route from R1 to R3 423 was not calculated. 425 One way to solve the problem is let R2 send a "summary" to area 426 0.0.0.0 indicating that it does not have a QoS route for the 427 particular flow, so R1 will try to find a QoS route to R3. A router 428 should send the summary to each area that it sends the Type 3 Summary 429 LSAs for the destination network. 431 However this may not be good idea because there may be a large number 432 of such summaries. 434 4.3 QOSPF Dijkstra Recalculation 436 Recalculation occurs upon one or more of the following situations: 438 - New instances of conventional OSPF/MOSPF LSAs, namely Network-LSAs, 439 Summary-LSAs, AS External LSAs and Group-Membership-LSAs in multicast 440 case. 442 - New instances of DABRAs in multicast case. 444 - New instances of RES-LSAs. 446 5.0 QOSPF Route Pinning 448 Route Pinning means that once reservations on a route from a source 449 to a destination have been made, the route will not be replaced with 450 a better route, unless the first route is no longer usable. 451 Therefore, a pinned path may not continue to be the shortest path or 452 the one with the most available resources. Control over route pinning 453 can be from a number of sources, such as configuration, flags in RSVP 454 RESV messages or other administrative controls. 456 Because Resource Reservation Advertisements describe existing 457 reservations, the route pinning algorithm can be accomplished with a 458 simple modification to the QOSPF Dijkstra algorithm: 460 When the Dijkstra is run for a flow, if the links from the RRAs for 461 the flow are used in preference to links from the RES-LSAs the 462 original path is automatically preserved when possible. This will 463 occur even if a new and better path is available. 465 5.1 Route Pinning Dijkstra Modification 467 Their are two changes that are made to the QOSPF Dijkstra algorithm 468 to implement route pinning. 470 5.1.1 Adding vertices to the candidate list 472 When adding a vertex to the candidate list, if its parent has a 473 reservation for the flow on the link that leads to the vertex, the 474 vertex is marked as "reserved"; or, if its parent is a network vertex 475 marked as "reserved", it is also marked as "reserved". 477 If a neighbor W, of a vertex V that is just moved to the SPF tree, is 478 already on the candidate list and it would not be updated in the 479 normal OSPF/ MOSPF Dijkstra, it still is updated if it is not marked 480 as "reserved" and there is a reservation for the flow on the link 481 from V to W. 483 5.1.2 Moving a vertex from the candidate list to the SPF tree 485 A vertex marked as "reserved" is chosen with the smallest delay, even 486 if there is an un-reserved vertex with a smaller delay. Vertices that 487 are un-reserved are only moved to the SPF tree when there are no more 488 "reserved" vertices on the candidate list. 490 5.2 Partial Route Pinning 492 The previous section assumes that all QoS paths are to be pinned. 493 However, sometimes it is desirable that only part of a QoS 494 distribution tree is pinned because it is possible to have some 495 receivers that desire pinning and some that do not. This can also be 496 easily achieved if RSVP or some dynamic mechanism can signal the 497 desire for route pinning. 499 Suppose a router/host sends a RESV message to its previous hop router 500 A, and it indicates in the RESV message that it wants the path to be 501 pinned. Router A makes the reservation and notifies QOSPF that the 502 path should be pinned. When A originates an RRA for the flow, it sets 503 a pin-flag in the reservation for the link. When the route is 504 recalculated, instead of preferring all links with reservations, only 505 those links with "pinned" reservation are preferred, hence only part 506 of the route is pinned. 508 6.0 Explicit Routing OSPF (EROSPF) 509 QOSPF needs both available resource information and existing resource 510 reservation information in addition to the normal topology and 511 membership information. When the size of a routing domain or the 512 number of QoS data flows increases, there is a scaling problem 513 because it takes a lot of bandwidth, memory and CPU power to flood, 514 store and process the resource reservation information even though 515 many of the routers may not be interested in the information. 517 To alleviate this scaling problem, Explicit Routing (ER) can be used: 518 for a flow (source, destination) only the source router(s) (see 519 Section 6.1.1 and Section 6.2) calculate a route, and then the 520 forwarding information is distributed to the downstream routers along 521 the path. 523 Because other routers do not need to perform the Dijkstra 524 calculation, they are saved from this possible CPU-intensive 525 computation. In the QOSPF case, the resource reservation information 526 only needs to be kept on the source routers, thus saving bandwidth, 527 memory, and CPU cycles. EROSPF is also applicable to standard MOSPF 528 to reduce the computational needs of the transit routers. 530 6.1 Multicast Explicit Routing 532 The following discussion is in terms of a single area. In the multi- 533 area case, each area maintains a forwarding table, and a global 534 forwarding table comes from the merge of all the areas' forwarding 535 tables. 537 6.1.1 Source Router Determination 539 The source router for a flow in an area is determined by one of the 540 following conditions: 542 - the source of a flow is on a directly connected network within the area. 544 - the router is an ABR and the source is not in the area. 546 In other words, explicit routes are only calculated by the source 547 router and the border routers that the flow travels through. It is 548 very possible to have multiple source routers for a (source, 549 destination) pair. In this case, each source router will calculate 550 the tree separately, and then distribute forwarding information 551 (i.e., its subtree) to the downstream routers on its subtree. 553 6.1.2 Explicit Routing Advertisements (ERAs) 555 The forwarding information for a (source, destination) pair is 556 contained in an Explicit Routing Advertisement (ERA), which is passed 557 in an Opaque LSA along the subtree described by the ERA. The passing 558 scope is determined by information contained in the ERA. 560 There are two kinds of ERAs. One is an Installation-ERA, used to 561 distribute forwarding information and the other is a Flushing-ERA, 562 used to flush obsolete forwarding information. 564 6.1.2.1 Format of Installation-ERA 566 The Format of Installation-ERAs is shown in Figure 6: 568 [Figure not in Text Version at this time.] 570 FIGURE 6. Format of Installation-ERA 572 ERAs are carried in Opaque LSAs with the Opaque type 10. The Opaque 573 ID is chosen by its originator. The flooding scope is no-flooding, 574 meaning that the receiver should not flood it out. However, when the 575 receiver parses the ERA, it will build new ERA(s) off the received 576 one and send out the new ones with the same Opaque LSA header and ERA 577 header (see Section 6.1.6). 579 The no-flooding scope is not supported in the current Opaque LSA 580 specification, but we are seeking to include it in future 581 specification. 583 The source and destination masks are represented as prefix lengths. 585 Each ERA describes routers on a route tree. For each router, its 586 incoming interface and a list of outgoing interfaces are listed. The 587 interface type is the same as in OSPF Router LSAs. The interface is 588 represented as one of the following: 590 - for a numbered interface, it is the ip address of the upstream (for 591 incoming interface) or downstream (for outgoing interface) neighbor. 593 - for an unnumbered point-to-point interface, it is the interface index. 595 The offset fields (adjust offset and child offset) are used to encode 596 the subtree into the ERA body, as explained in Section 6.1.3 and 597 Section 6.1.6. 599 6.1.2.2 Flushing-ERA 601 A Flushing-ERA is used to flush a previously advertised 602 Installation-ERA when the route changes (see Section 6.1.8). The 603 flushing-ERA uses the MaxAge instance of the previously advertised 604 ERA with an empty ERA body. 606 6.1.3 Creating Installation-ERAs 608 After a source router finishes a route calculation, it builds an ERA 609 to encode the subtree that has the router itself as the root. The 610 subtree is traversed in "preorder". In the example in Figure 7 611 (numbers are interface addresses or indices), the source router A 612 will build an ERA listing routers in the order of A,B,D,E,C. 614 [Figure not in text version at this time.] 616 FIGURE 7. An example 618 The "adjust offset" is set to 0 by the source router. Except for the 619 first router placed into the ERA, when a router is added to the ERA, 620 the "child offset" of the parent's outgoing interface leading to the 621 router is set to the offset of the router in the ERA body. Note that 622 all offsets are relative to the ERA body. After building the whole 623 ERA, the source router builds one ERA for each subtree under itself 624 and unicasts the ERA to the root of the subtree, which is the first 625 router listed in the ERA. For example, router A will build an ERA for 626 the subtree rooted at B and unicast it to B, and build an ERA for the 627 subtree rooted at C and unicasts it to C. This building process is 628 pretty simple and is described in Section 6.1.6. However, the source 629 router only stores the ERA for the whole tree and not the newly built 630 ERAs. The ERA for the subtree rooted at A is shown in Figure 8. 632 [Figure not in Text Version at this time.] 634 FIGURE 8. The ERA for the subtree in Figure 7 636 6.1.4 Using Multiple ERAs for Long Routes 638 The structure and processing of the ERA allows the router computing 639 the route to encode as much of the route as can fit in a packet. The 640 source router can send an ERA to a downstream router that is not an 641 immediate neighbor providing the subtree that continues from the 642 downstream router. It is not likely that this facility would be used 643 often in many networks. 645 6.1.5 Transmitting, acknowledging, and storing of ERAs: 647 A source router stores in its database ERAs (together with their 648 Opaque LSA header) for trees with itself as the root. An ERA built 649 for an immediate downstream neighbor is unicast to the incoming 650 interface of the first router in the ERA (the first router in the ERA 651 is always the receiver), encapsulated in an Opaque LSA. 653 A router also stores in its database ERAs received from its parents, 654 but not those ERAs built for its downstream neighbors. 656 The acknowledgment and retransmission mechanism is the same as that 657 used for conventional LSAs. Since the transmission and 658 acknowledgment of OSPF LSAs are between adjacent neighbors while 659 sometimes ERAs and DABRAs need to be sent to non-adjacent routers, a 660 special pair of update/ack packets are needed for ERAs for DABRAs. 661 See Section 7.3 663 6.1.6 Processing of Installation-ERAs: 665 The first listed router in a received ERA is always the receiver 666 itself. 668 Upon ERA receipt, the forwarding entry for a (source, destination) 669 pair is installed (or updated) and associated with the ERA. 671 If there is a previous instance of the Installation-ERA, to each 672 immediate downstream neighbor listed in the previous instance of the 673 ERA but not in the new ERA, send a Flushing-ERA with the same header 674 as that of the previous instance. 676 For each immediate downstream neighbor listed in the received ERA, a 677 new ERA is constructed from the received ERA and sent to the incoming 678 interface of the first listed router in the newly constructed ERA. 679 The Opaque LSA header and the ERA header remain the same, however. 680 The new ERA's "adjust offset" is set to the "child offset" associated 681 with the outgoing interface in the received ERA that leads to the 682 neighbor. The child offsets are not changed in the new ERA. The 683 subtree for the neighbor is then copied into the new ERA. The subtree 684 is in the following range of the RECEIVED ERA BODY: 686 [child offset - old adjust offset, next child offset - old adjust offset] 688 If there is no "next child", then the remaining portion of the ERA 689 body is copied. Notice that the encoding work done by the source, and 690 the offset fields make the downstream routers' job a matter of 691 copying and shifting. 693 In the example in Figure 7, A will build two ERAs from the ERA for 694 itself, one for B and the other for C. The two ERAs are illustrated 695 in Figure 9. 697 [Figure not in Text Version at this time.] 699 FIGURE 9. ERAs built by A for B and C 701 6.1.7 Processing of Flushing-ERAs 703 Upon receipt of a Flushing-ERA, the corresponding Installation-ERA is 704 found and a MaxAge Flushing-ERA is constructed and sent out with same 705 header as the existing ERA for each immediate downstream neighbor in 706 the Installation- ERA. If a forwarding entry exists for the 707 corresponding Installation-ERA, the forwarding entry's incoming 708 interface is set to NULL (so that no packets for the (source, group) 709 will be accepted on the interface) if there are no other 710 Installation-ERAs for the (s, g). If other Installation-ERAs exist, a 711 new forwarding entry is constructed for the (source, group) pair. If 712 there is no forwarding entry for the (source, group), forwarding 713 entry with a NULL incoming interface is installed to prevent 714 forwarding of any received packet for the (source, group) pair. 716 6.1.8 Route Change 718 For all routers, if the upstream neighbor or interface of the first 719 router in an ERA goes down, a MaxAge Flushing-ERA is immediately sent 720 to each immediate downstream neighbor to flush the ERA. This does not 721 need to wait until the source finishes recalculation. 723 When there is a topology change, the source routers recalculate the 724 tree, and send updated ERAs along their subtrees. New ERAs are 725 carried in the Opaque LSAs with the same Opaque ID as in the old 726 ones, but with a larger sequence number. 728 For all routers, if a previous downstream neighbor is no longer 729 listed in a newer ERA, a Flushing-ERA with the same header of the 730 previous instance of the new ERA is sent to the neighbor to flush its 731 corresponding Installation-ERA. 733 6.2 Unicast Explicit Routing 735 While Multicast ER makes sense even if QOSPF is not used, Unicast 736 Explicit Routing is needed only for QoS routing. 738 A router is a source router for a unicast flow (source, destination) 739 when one of the following conditions exists: 741 - The source is on one of the router's directly connected networks in the 742 area that needs the Dijkstra, or 744 - The source is not in the area, and the router is an ABR. 746 The multicast ERA is also used for unicast, but in the unicast case, 747 the "MOSPF IL Type", "MOSPF Init Case", and incoming interface are 748 not used, and the number of outgoing interface is always 1. 750 6.3 Changes of behavior of QOSPF if Explicit Routing is used 752 Explicit Routing is introduced to address QOSPF's scaling problem, 753 but QOSPF does not logically depend on Explicit Routing. The 754 discussions in Section 3.0 and Section 4.0 have been assuming that no 755 ER is used. 757 When ER is used, the following behaviors of QOSPF are changed: 759 6.3.1 Flooding scope of RRAs 761 RRAs are no longer flooded throughout an area. Instead, they are sent 762 to the source router(s) using opaque "no-flooding" scope. The source 763 routers then can use opaque "link-local" scope to flood the RRAs to 764 other routers on the source network. 766 6.3.2 Flooding scope of DABRAs 768 DABRAs are no longer flooded throughout "downstream areas" of the 769 "source area". Instead, a DABRA is sent to all the ABRs on the route 770 in the "source area". 772 6.4 Quick Scaling Performance Analysis 774 The Scaling problems with QOSPF are primarily caused by RRAs, so 775 let's do a scaling analysis in terms of number of RRAs flooded per 776 second, based on the following area configuration: 778 Number of routers in the area: R 780 Average number of routers on a multicast tree: M = abs(sqr_root(R)) 782 Average number of flows that sources from a router: F 784 Period of time during which to set up all the flows: T = 10 seconds 786 For each flow, each router has to originate a RRA, so there will be 787 (R * M * F) RRAs originated. 789 If explicit routing (ER) is not used, each router will get all the 790 RRAs, so the R routers will receive (R * F * M - F) RRAs (a router 791 does not need to receive its own RRAs), i.e, (R * F * M - F)/10 RRAs 792 have to be transmitted per second. 794 If ER is used, only the source routers will receive the RRAs. 795 Assuming those RRAs are sent to the source router following the 796 reversed multicast path, then at most (1 + 2 +,,, + (M - 2) + (M - 797 1)) transmissions are needed for each flow, or F * (1 + 2 +... + (M 798 -2) + (M - 1))/10 RRAs have to be transmitted per second. 800 Changing the value of R, we have the following result: 802 Number of routers (R): 9 16 25 36 49 64 804 Number of RRAs per sec w/ ER: 0.3F 0.6F 1.0F 1.5F 2.1F 2.8F 806 Number of RRAs per sec w/o ER: 2.6F 6.3F 9.9F 21.5F 34.2F 51.1F 808 It is clear that QOSPF does not scale without ER but it scales well with ER. 810 7.0 Changes to OSPF to accommodate QOSPF/ER 812 Because of the new functionality and new types of LSAs, the following 813 changes are needed to accommodate QOSPF or ER. 815 7.1 Database Exchange 817 In OSPF, when a router exchanges its database with a neighbor, it 818 only sends the neighbor those types of LSAs that the neighbor 819 understands. This is done by checking the Options field in the 820 neighbors' Database Description packets. 822 A new bit must be added to the Options field: the Q-bit. If set, 823 means the router supports QOSPF and understands RES-LSAs. The Options 824 field is almost full now. Fortunately there are two unused bytes 825 ahead of the Option field and they can be used for more options. 827 7.2 "Rtype" field in Router/RES LSAs 829 A new bit, P-bit, in the "Rtype" field of Router LSAs and RES-LSAs 830 must be set if the router is configured to do route pinning for 831 QMOSPF, so that all routers in the same area can agree on using or 832 disabling route pinning and to compute identical multicast trees. 833 Additionally, a Q-bit, indicating that a router supports QOSPF is 834 needed. 836 7.3 New Types of OSPF packets 838 OSPF requires that any LSAs be exchanged between neighbors that are 839 supposed to become adjacent and a Link State Update/Ack packet would 840 simply be discarded if it is from a neighbor with a state less than 841 ExStart. 843 However, when ER is used, the RRAs and ERAs may be sent to non- 844 adjacent routers. The solution is to invent a new pair of update/ack 845 packets that do not require adjacency to transmit/acknowledge RRAs 846 and ERAs when ER is used. The same acknowledgment/retransmission 847 scheme as those between adjacent neighbors can be used to ensure 848 reliable transmission of RRAs and ERAs. 850 8.0 Security Considerations 852 Given that QOSPF could be triggered by RSVP, it is expected that the 853 security mechanisms for RSVP will provide authorization and access 854 control for QOSPF routing calculations. Additionally, the OSPF 855 security mechanisms for authenticating neighbors and data received 856 are very important for explicit routing since ER packets can change 857 forwarding state in a very direct manner. Especially, since an ERA 858 can be sent to a router on a different network, ERA packets' 859 authentication should be per area instead of per interface. 861 9.0 Acknowledgments 863 The authors gratefully acknowledge the following people/organizations 864 for making this protocol come together: 866 - Tim Trapp of Thompson International for the initial problem, 867 constraints, as well as constructive discussions. 869 - E-Systems, Inc. Particularly, Hai Nguyen, Gerry Rosen, and Thomas Grill 870 for their patience and perserverence during some of the difficult 871 design and development phases. 873 - Richard Over, Brad Noblet, and the Custom Services Software Engineering 874 team at Bay Networks for sheltering and nurturing the design and 875 development team. 877 - The IP group at Bay Networks for lots of support and code modification 878 to multicast routing and RSVP to support QOSPF. 880 - John Krawczyk, Ross Callon, Mohd Bashar, Mike Davis, and Dennis Baker 881 for useful design comments. 883 10.0 Notice Regarding Intellectual Property Rights 885 Bay Networks may seek patent or other intellectual property 886 protection for some or all of the technologies disclosed in this 887 document. If any standards arising from this disclosure are or become 888 protected by one or more patents assigned to Bay Networks, Bay 889 Networks intends to disclose those patents and license them on 890 reasonable and non-discriminatory terms. Future revisions of this 891 draft may contain additional information regarding specific 892 intellectual property protection sought or received. 894 11.0 References 896 1. J. Moy, OSPF Version 2, Request for Comments (RFC) 1583 898 2. J. Moy, Multicast Extensions to OSPF, Request for Comments(RFC) 1584, 899 March 1994. 901 3. R. Coltun, The OSPF Opaque LSA Option, Internet Draft, 902 draft-coltun-ospf-opaque-01.txt 904 4. R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin. Resource 905 ReSerVation Protocol (RSVP) - Version 1 Functional Specification, 906 Internet Draft, draft-ietf-rsvp-spec-12.txt, May 1996. 908 5. J. Wroclawski, Specification of the Controlled-Load Network Element 909 Service, Internet Draft, draft-ietf-intserv-ctrl-load-svc-01.txt, 910 November, 1995. 912 12.0 Authors' Address 914 Zhaohui (Jeffrey) Zhang 915 Bay Networks, Inc. 916 2 Federal Street 917 Billerica, MA 01821 918 +1 508-670-8888 919 zzhang@baynetworks.com 921 Cheryl Sanchez 922 Bay Networks, Inc. 923 3 Federal Street 924 Billerica, MA 01821 925 +1 508-670-8888 926 csanchez@baynetworks.com 928 Bill Salkewicz 929 Bay Networks, Inc. 930 3 Federal Street 931 Billerica, MA 01821 932 +1 508-670-8888 933 bsalkewi@baynetworks.com 935 Eric S. Crawley 936 Bay Networks, Inc. 937 3 Federal Street 938 Billerica, MA 01821 939 +1 508-670-8888 940 esc@baynetworks.com