idnits 2.17.1 draft-zhang-qos-ospf-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 26) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 36 instances of too long lines in the document, the longest one being 6 characters in excess of 72. ** There is 1 instance of lines with control characters in the document. ** The abstract seems to contain references ([2], [4], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There is 4 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 31 looks like a reference -- Missing reference section? '2' on line 31 looks like a reference -- Missing reference section? '4' on line 33 looks like a reference -- Missing reference section? '5' on line 297 looks like a reference Summary: 12 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Zhang, Sanchez, Salkewicz, Crawley 3 Internet-Draft Bay Networks, Avici Systems 4 Redback Networks, Gigapacket Networks 5 draft-zhang-qos-ospf-01.txt September, 1997 7 Quality of Service Extensions to OSPF 8 or 9 Quality Of Service Path First Routing 10 (QOSPF) 12 Status Of This Memo 14 This document is an Internet-Draft. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, and 16 its working groups. Note that other groups may also distribute working 17 documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet- Drafts as reference material 22 or to cite them other than as "work in progress". 24 To learn the current status of any Internet-Draft, please check the 25 "1id-abstracts.txt" listing contained in the Internet- Drafts Shadow 26 Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), 27 ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim). 29 Abstract 31 This document describes a series of extensions for OSPF[1] and MOSPF[2] 32 that can be used to provide Quality of Service (QoS) routing in 33 conjunction with a resource reservation protocol such as RSVP[4] or 34 other mechanisms that can notify routing of the QoS needs of a data 35 flow. Advertisements indicating the resources available and the 36 resources used are advertised to the OSPF routing domain and paths are 37 computed based on topology information, link resource information, and 38 the resource requirements of a particular data flow. 40 1.0 Introduction 42 QoS signalling protocols such as RSVP allow the instantiation of network 43 state to provide a specific service level to a data flow. RSVP is 44 specifically not a routing protocol but it does have interfaces to 45 routing in order to determine the forwarding of its own state messages. 46 Existing routing protocols are usually concerned only with topology 47 information and not network resources such as bandwidth, thus they all 48 have their limitations in providing integrated services. The following 49 figure is a simple illustration: 51 +---+ +---+ 52 |H 1| |H 2| 53 +-+-+ +-+-+ 54 | | 55 +-----+--------+----+ 56 | N1 | 57 +-----+--------+----+ 58 | | 59 +-+-+ +-+-+ 60 |R 1| |R 2| 61 +-+-+ +-+-+ 62 \ / 63 metric 1\ /metric 2 64 \ / 65 +---+ 66 |R 3| 67 +-+-+ 68 | 69 +----------+--------+ 70 | N2 | 71 +-----+--------+----+ 72 | | 73 +-+-+ +-+-+ 74 |H 3| |H 4| 75 +---+ +---+ 77 FIGURE 1. Example Topology 79 Suppose host H1 is sending data to host H3 at rate R. The routing 80 protocol in use gives the shortest path as defined by the metrics, 81 H1-->R1-->R3-->H3. However, even if R1 does not have adequate resources 82 on its interface to R3 to handle the flow at the rate R, the route 83 H1-->R2-->R3-->H3 that does have adequate resources available, is not 84 used because the routing protocol always uses the shortest path. 86 One solution is to let the routing protocols consider network resource 87 information as well as topology information when they calculate 88 routes. With the OSPF protocol, complete topology information is used to 89 calculate routes; in QOSPF, network resource information is added and 90 used to calculate "QoS routes" that can provide the resources needed for 91 the flow even though the route may not be strictly the shortest path. 93 2.0 Protocol Overview 95 2.1 Network Resource Information 97 In QOSPF, routers advertise network resource information as well as 98 topology information. A route for a data flow is calculated based on 99 topology, network resource information, and QoS requirements (e.g. the 100 TSpec of the RSVP PATH message) for the flow. 102 The network resource information includes available link resources on a 103 router as well as existing link resource reservations on the router. The 104 resource information is advertised in Link Resource Advertisements 105 (RES-LSAs) and Resource Reservation Advertisements (RRAs). Another type 106 of advertisement, Deterministic Area Border Router Advertisements 107 (DABRA), are needed for inter-area multicast QOSPF. 109 There are a lot of ways to represent network resource information. In 110 this document, we use Token Bucket parameters, as in the Controlled-Load 111 Service model[5]. It is expected that resource advertisements that are 112 related to other service models could be added over time. 114 The number of RRAs can easily get huge as the number of reserved flows 115 and network size grow, presenting a scaling issue. A solution to this 116 problem is addressed by Explicit Routing, discussed in Section 6.0. 118 2.2 Route Pinning 120 Topology and network resource information not only make it possible to 121 calculate a shortest route that satisfies the required QoS for a flow, 122 but also makes Route Pinning very easy to achieve. Route pinning means 123 that an existing route with a reservation will not be replaced by a 124 better route unless the existing one is no longer usable because of a 125 topology change directly related to the existing route. 127 2.3 Data-driven (Source, Destination) Route Computation 129 MOSPF uses data-driven (source, destination) routing. In other words, a 130 route is computed when the first packet for a (source, group) pair is 131 received. This is in contrast to unicast OSPF that pre-computes routes 132 based on destination only. 134 In QOSPF, routing for QoS flows is based on (source, destination), and 135 routing computations are triggered by external events regardless of 136 whether the flow is unicast or multicast. The initial trigger for QoS 137 routing computation comes from a resource reservation protocol such as 138 an RSVP PATH message. 140 There are two reasons for (source, destination) routing in unicast QOSPF: 142 A. Resource reservations and RRAs are generally based on (src, dst); 144 B. When (source, destination) routing is used, flows with the same 145 destination but different sources can follow different paths when 146 necessary. 148 Note the (source, destination) routing used in unicast QOSPF does not 149 mean that the distribution tree must be rooted at the source. It only 150 means that the routing table lookup is based upon (source, destination) 151 rather than just the destination. 153 3.0 Resource Advertisements 155 Available and reserved network resources are advertised via Link 156 Resource Advertisements (RES-LSAs) and Resource Reservation Advertisements 157 (RRAs), respectively. 159 3.1 Link Resource Advertisement (RES-LSA) 161 A RES-LSA is very similar to a Router-LSA. The purpose of the RES-LSA is 162 to advertise the link resources available for each router in the 163 network. When calculating QoS routes, RES-LSAs are used instead of 164 Router-LSAs. 166 Each QOSPF router originates a RES-LSA for each area, listing the 167 largest amount of available resources for reservation on each of the 168 router's interfaces in the area, along with the link's delay 169 metric. This metric is roughly analogous to the standard OSPF cost 170 metric but is independent of the standard TOS metric to better 171 characterize the static delay properties of a link. 173 A new instance of RES-LSA is originated whenever a new Router-LSA 174 instance is originated for the area, or whenever the available bandwidth 175 resource or delay changes (significantly) for a link in the area. 177 An algorithm may be used so that a new RES-LSA is originated only when 178 the available bandwidth resource changes significantly. For example, a 179 router may choose to originate a new RES-LSA only when the change of 180 available bandwidth on a link exceeds a certain amount or certain 181 percentage of total/remaining bandwidth on the link. However, this can 182 cause routers to have incorrect resource information of the router and 183 the calculated routes may lead to reservation failures. Therefore, if a 184 reservation attempt fails on a router, it should immediately advertise 185 its correct resource information. 187 Like Router-LSAs, RES-LSAs are flooded throughout a single area. 189 The format of RES-LSAs is shown in Figure 2. 191 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 192 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 193 | LS age | Options | 16 | 194 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 195 | Link State ID | 196 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 197 | Advertising Router | 198 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 199 | LS sequence number | 200 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 201 | LS checksum | length | 202 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 203 | 0 |V|E|B| 0 | # links | 204 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 205 | Link ID | 206 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 207 | Link Data | 208 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 209 | Link Type | 0 | TOS 0 metric | 210 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 211 | Link Delay | 212 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 213 | Available BW Resource: Token Bucket Depth | 214 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 215 | Available BW Resource: Token Bucket Rate | 216 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 218 FIGURE 2. Resource LSA 220 The RES-LSA header is the same as all other LSA headers. 222 The V-bit, E-bit, B-bit, #Links, Link type, Link ID and Link data are 223 the same as in a Router LSA. 225 The available link resource is represented by token bucket parameters, 226 in IEEE single precision floating point format, as in the 227 Controlled-Load Service model[5]. 229 The link delay is a static delay metric for the link, in units of 230 milliseconds. 232 The RES-LSA could be combined with regular Router-LSA because the delay 233 and resource information could be encoded as special TOS metrics in 234 Router-LSAs. However this would cause Router-LSAs to be updated much 235 more frequently and may have some impact to some current OSPF 236 implementations. Therefore, we choose to use a separate advertisement. 238 3.2 Resource Reservation Advertisement (RRA) 240 A Resource Reservation Advertisement describes a router's reservations 241 for a particular flow (source, destination) on its interfaces within an 242 area. The purpose of the RRA is to indicate the resources used by a flow 243 such that other routers are aware of the resources used by the flow when 244 they calculate or recalculate the tree for the flow. A new RRA is 245 originated whenever one or more of the router's reservations change in 246 the area. 248 Like RES-LSAs, RRAs are flooded throughout a single area. 250 The format of RRAs is shown in Figure 3. 252 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 253 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 254 | LS age | Options | 15 | | 255 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 256 |opaque type: 11| Opaque ID | A 257 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 258 ................................................................. | 259 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 260 | Destination | 261 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 262 | Source | 263 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 264 |dst_prefix_len |src_prefix_len | 0 | #Links | 265 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 266 | Link ID | | 267 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 268 | Link Data | | 269 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 270 | Link Type | 0 |p| 0 | B 271 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 272 | Reserved BW Resource: Token Bucket Depth | | 273 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 274 | Reserved BW Resource: Token Bucket Rate | | 275 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 277 A. Opaque LSA header B. Repeated for each link 279 FIGURE 3. Resource Reservation Advertisement with its Opaque LSA header 280 RRAs are encapsulated in Opaque LSAs with type = 11. The Opaque ID is 281 chosen by the advertising router and the flooding scope is "area-local". 283 The Destination and Source are the IP address of the destination and 284 source of the data flow, respectively, and the dst_prefix_length and 285 src_prefix_length correspond to the length of the network mask of the 286 destination and source respectively. Usually they are just 0xffffffff. 288 The "#Links" is the number of links included in the RRA. For each link, 289 the Link type, Link ID and Link data are identical to the values used in 290 the Router LSA. 292 The P-bit in the 8-bit options field following the "Link Type" is a 293 pin-flag used for route-pinning discussed in Section 5. 295 The reserved bandwidth resource is represented by token bucket 296 parameters, in IEEE single precision floating point format, as in the 297 Controlled-Load Service model[5]. 299 The reservation information comes from a resource reservation protocol, 300 such as RSVP or some other mechanism for reserving resources on the 301 node. Whenever a reservation is made or canceled, QOSPF will originate a 302 new instance of the RRA for the flow. RSVP SE style reservations can 303 cause multiple RRAs to be originated depending on the number of PATH 304 state that is matched, and a RSVP WF style reservation will cause a RRA 305 with a wildcard source (0) to be originated. 307 4.0 QOSPF Route Calculation 309 Input to the QOSPF Dijkstra calculation includes the source and 310 destination address and the QoS requirements for the flow, which are 311 currently the token bucket parameters from the RSVP PATH message but 312 could also come from other triggers. 314 The QOSPF Dijkstra calculation for an area is performed by processing 315 the area's RES-LSAs, Network-LSAs, RRAs, and Group-Membership-LSAs. The 316 latter is only used for the multicast case. 318 The key difference between the QOSPF Dijkstra and the normal OSPF/MOSPF 319 Dijkstra is that a router's RES-LSA rather than Router-LSA is used to 320 discover its neighbors, and links will be ignored if they do not have 321 sufficient resources (resource available plus already reserved) for the 322 flow. 324 To calculate the best or lowest-delay path, the delay metric in RES-LSAs 325 is used in the same way OSPF uses the TOS zero cost metric of Router-LSAs. 327 4.1 Multicast QOSPF 329 4.1.1 Intra-area Multicast QOSPF 331 Like in normal MOSPF, the intra-area QoS SPF tree is 332 forward-linked. This means that the best path is chosen based on the 333 delay metrics from the source to the target. 335 4.1.2 Inter-Area Multicast QOSPF 337 In MOSPF, for a (source, group) pair, a tree has to be calculated for 338 each area and then the trees are combined into a global tree. When 339 calculating a tree for an area, if the source is in another area, the 340 root of the tree is set to all the ABRs that support MOSPF and have 341 valid Summary LSAs containing the source. 343 As shown in Figure 4, suppose the source is in area 0.0.0.0. When R5 344 and R6 calculate their trees for area 0.0.0.1, they will root the trees 345 at R2, R3, and R4. 347 +---+ +---+ 348 |R 1+---+ H | source 349 +-+-+ +---+ 350 /|\ 351 / | \ 352 / | \ 353 / | \ 354 / | \ 355 / | \ 356 / | \ 357 / | \ 358 +-+-+ +-+-+ +-+-+ area 0.0.0.0 359 .....|R.2|....|R.3|....|R.4|................... 360 +-+-+ +-+-+ +-+-+ area 0.0.0.1 361 \ / \ / 362 \ / \ / 363 \ / \ / 364 \ / \ / 365 +-+-+ +-+-+ 366 |R 5| |R 6| 367 +-+-+ +---+ 368 | 369 +-+-+ 370 |H 2| 371 +---+ 373 FIGURE 4. A Tree 375 In QOSPF, links without adequate resources for a data flow are not 376 considered. So, in Figure 4, suppose the link R1->R3 does not have 377 enough bandwidth, then R3 will not be on the multicast tree for area 378 0.0.0.0 so it will not get the packets. Now when R5 and R6 calculate 379 trees for area 0.0.0.1, they should root the trees only at R2 and R4. 381 For this reason, after R2 and R4 finishes calculation for area 0.0.0.0, 382 they should notify routers in area 0.0.0.1 how to root the tree via 383 Deterministic ABR-Advertisements (DABRA). 385 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 386 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 387 | LS age | Options | 15 | | 388 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 389 |opaque type: 12| Opaque ID | | 390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 391 | Advertising Router | | 392 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ A 393 | LS sequence number | | 394 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 395 | LS checksum | length | | 396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 397 | flooding scope | reserved | | 398 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 399 | Destination | 400 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 401 | Source | 402 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 403 |src_prefix_len |dst_prefix_len | 0 | #ABRs | 404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 405 | Flow spec: Token Bucket Depth | 406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 407 | Flow spec: Token Bucket Rate | 408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 409 | Router ID | | 410 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ B 411 | Delay | | 412 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 414 A. Opaque LSA header 415 B. Repeated for each ABR on the tree 417 FIGURE 5. DABRA with its Opaque LSA header 419 Each ABR on the QoS tree for the "source area" of a flow originates a 420 DABRA, listing all the ABRs on the tree, and floods it throughout all 421 "downstream areas". If the source of the flow is in one of the router's 422 directly attached areas, then the area is the "source area" and all 423 other areas are "downstream" areas; otherwise (the source is in an area 424 not directly attached to the router), the backbone area is the "source 425 area" and all non-backbone areas are "downstream areas". 427 4.1.3 Inter-AS Multicast QOSPF 429 Similar to the inter-area case, there should be a notification about how 430 to root the tree. The details are not explored in this document. 432 4.1.4 Detailed Multicast QOSPF Dijkstra Calculation 434 The following procedure is a modification to section 12.2 in the 435 Multicast Extensions to OSPF, RFC 1584. It tries to build a multicast 436 distribution tree that satisfies the bandwidth resource requirement 437 first, then probably a partial best effort tree to cover the rest of 438 routers and networks. 440 Two new states are added to each vertex: the delay from the source to 441 the vertex, and the resource flag indicating if there is enough 442 bandwidth resource from the source to the vertex. 444 1) Initialize the algorithm's data structures as in RFC 1584. Set the 445 initial delay to infinity and resource flag to FALSE. 447 2) Initialize the candidate list as in RFC 1584, with the following 448 differences: 450 A. In intra-area case, when a Network vertex is put into the 451 candidate list, set the resource flag to TRUE and set the delay to 452 0. 454 B. In intra-area case, when a Router vertex is put into the candidate 455 list, If its RES-LSA exists and is valid, set the resource flag to 456 TRUE, and set the delay to 0. 458 C. In inter-area cases, if the DABRA(s) for this flow exist and 459 is/are valid, and the RES-LSA for an area border router that is 460 both in one of the DABRAs and in the calculating area exists and 461 is valid, set the resource flag of the vertex for the border 462 router to TRUE, and set the delay to the delay value from the 463 DABRA. 465 3) If the candidate list is empty, the algorithm terminates. 467 Same as RFC 1584. 469 4) Move the closest candidate vertex to the shortest-path tree. 471 If there are vertices with TRUE resource flags, the one with least 472 delay is chosen. The same tie-breaker as in RFC 1584 applies. 474 Otherwise, the one with least regular OSPF cost is chosen, and the 475 same tie-breaker as in RFC 1584 applies. 477 5) Examine Vertex V's neighbors for possible inclusion in the candidate 478 list. If V is a router vertex with a TRUE resource flag, consider 479 the links in its RES-LSA. Otherwise, consider the links in its 480 Router-LSA or Network-LSA. 482 Each link (say L) describes a connection to a neighboring vertex (say 483 W) or a stub network. Skip links connecting to stub networks. 485 If W is already on the SPF tree, or if W's LSA does not contain a 486 link back to vertex V (if vertex W is a router vertex use vertex W's 487 Router LSA to make this determination as it is irrelevant whether or 488 not there is reservable bandwidth in the reverse direction), or if W's 489 LSA has LS age of MaxAge, or if W is not multicast capable (indicated 490 by the MC-bit in W's Router LSA or RES-LSA's options field), skip the 491 link. 493 For each remaining link, perform the following: 495 a. Calculate the cost between the source and vertex W (forward or 496 backward), which is the sum of the cost between the source and V 497 and the cost between V and W. Let it be C. Same as in RFC 1584. 499 If all the following conditions are met: 501 o V has a TRUE resource flag 502 o if V is a router vertex, the resource on the link satisfies 503 the requirement (the sum of available resource and existing 504 reservation for the flow is equal to or greater than the 505 requirement) 506 o if W is a router vertex, the RES-LSA for W exists and is 507 valid 509 the delay from the source to W is also calculated as the sum of 510 the delay from the source to V and the delay of the link from V to 511 W. Let this sum be delay D. 513 The delay of link L is 0 if V is a network vertex, otherwise it's 514 the delay metric from vertex V's RES-LSA. It is always in the 515 forward direction. 517 b. If vertex W is not yet on the candidate list then install W on the 518 candidate list and modify its parameters as described in RFC 519 1584. If the delay D is calculated in step A, record it in W's 520 delay state and set W's resource flag to TRUE (step 5d). 522 c. Otherwise W is already on the candidate list and there are four 523 possibilities: 525 o W has a TRUE resource flag and D is NOT calculated in step 5a - W 526 is already reachable via a path that has enough resource and 527 this new path does not have enough resource - go to next link. 529 o W has a FALSE resource flag and D is calculated - the old path 530 does not have enough resource but the new one has enough so it 531 should be used - modify W as in RFC 1584, set W's resource flag 532 to TRUE and record the delay D (step 5d). 534 o W has a FALSE resource flag and D is not calculated - we are now 535 building a best effort (partial) tree - process as in RFC 1584 - 536 go to next link if the new path has higher cost, or modify W's 537 parameters (step 5d) if the new path should be used because of 538 either lower cost or a tie-breaker. 540 o W has a TRUE resource flag and D is calculated - process as in 541 RFC 1584 but use delay instead of regular OSPF cost - go to next 542 link if the new path has higher delay, or modify W's parameters 543 (step 5d) if the new path should be used because of either lower 544 delay or a tie-breaker. 546 d. Same as in RFC 1584, plus recording the delay value D and setting 547 the resource flag to TURE when necessary. 549 6) go to step 3. 551 After the tree for area A is built, the calculating router determines if 552 area A is used to determine the upstream node in the same way as 553 described by RFC 1584. If the router is an ABR and area A is the "source 554 area" for the flow, a DABRA is also originated to advertise all area 555 border routers that are on the tree and have a TRUE resource flag. It is 556 flooded to all "downstream areas". 558 4.2 Unicast QOSPF 560 In terms of adding to and moving from the candidate list, unicast QOSPF 561 Dijkstra is very similar to multicast QOSPF so the Dijkstra details are 562 not discussed here. 564 4.2.1 Unicast QOSPF Dijkstra is needed in only one area 566 If the calculating router has multiple areas, then the best effort route 567 to the destination has to be found first to identify the area that needs 568 to run the Dijkstra: 570 1. If the route is an intra-area route, then the area that the route 571 belongs to needs to run the Dijkstra to find a QoS route to the 572 destination network. 574 2. If the route is an inter-area route, then backbone area needs to run 575 the Dijkstra to find a QoS route to one of the ABRs that advertises 576 the best effort route. 578 3. Suppose the route is an external route. If the ASBR used by the 579 external route is within one of the router's directly attached areas, 580 then that area needs to run the Dijkstra to find out a QoS route to 581 the ASBR; otherwise, backbone area needs to run the Dijkstra to find 582 out a QoS route to one of the ABRs that advertise the ASBR. 584 Unlike best-effort Dijkstra, a complete tree for the area is not 585 needed. Once the shortest path to the destination network or the ABR or 586 the ASBR is found, the Dijkstra terminates. 588 4.2.2 Inter-area and Inter-AS Unicast QOSPF 590 In the case that the destination is not in a directly attached area, 591 things are more complicated because OSPF areas hide detailed topology 592 and network resource information. Using the topology in Figure 4 593 again; when R1 calculate a QoS route for (H, H2), it finds a QoS route 594 to ABR R2 that has a shortest best-effort route the destination, but R2 595 can not find a QoS route to the destination. R3 has a QoS route to the 596 destination but the QoS route from R1 to R3 was not calculated. 598 One way to solve the problem is let R2 send a "summary" to area 0.0.0.0 599 indicating that it does not have a QoS route for the particular flow, so 600 R1 will try to find a QoS route to R3. A router should send the summary 601 to each area that it sends the Type 3 Summary LSAs for the destination 602 network. However this may not be good idea because there would be a large 603 number of such summaries. 605 4.3 QOSPF Dijkstra Recalculation 607 Recalculation occurs upon one or more of the following situations: 609 o New instances of conventional OSPF/MOSPF LSAs, namely Network-LSAs, 610 Summary-LSAs, AS External LSAs and Group-Membership-LSAs in multicast 611 case - some or all QOSPF routes need to be recalculated (see MOSPF 612 protocol spec for details in multicast case). 614 o New instances of RRAs, and DABRAs in multicast case. Only the QOSPF 615 routes related to the RRAs and DABRAs need to be recalculated. 617 o New instances of RES-LSAs - All QOSPF routes need to be recalculated. 619 5.0 QOSPF Route Pinning 621 Route Pinning means that once reservations on a route from a source to a 622 destination have been made, the route will not be replaced with a better 623 route, unless the original one is no longer usable. Therefore, a pinned 624 path may not continue to be the shortest path. Control over route 625 pinning can be from a number of sources, such as configuration, flags 626 from a signaling protocol or other administrative controls. 628 Because Resource Reservation Advertisements describe existing 629 reservations, the route pinning algorithm can be accomplished with a 630 simple modification to the QOSPF Dijkstra algorithm: 632 When the Dijkstra is run for a flow, if the links with existing 633 reservations for the flow are preferred the original path is 634 automatically preserved when possible. This will occur even if a new and 635 better path is available. 637 Sometimes it is desirable that only part of a QoS distribution tree is 638 pinned because it is possible to have some receivers that desire pinning and 639 some that do not. This can also be easily achieved if RSVP or some dynamic 640 mechanism can signal the desire for route pinning. 642 Suppose a router/host sends a RESV message to its previous hop router A, and 643 it indicates in the RESV message that it wants the path to be pinned. Router A 644 makes the reservation and notifies QOSPF that the path should be pinned. 645 When A originates an RRA for the flow, it sets the P-bit (pin-flag) in the 646 reservation for the link. When the route is recalculated, instead of 647 preferring all links with reservations, only those links with "pinned" 648 reservation are preferred, hence only part of the route is pinned. 650 Before the support from a signal protocol is available, a router simply sets 651 the p-bit in its RRAs to indicate that route pinning should be used if it is 652 configured so. 654 5.1 Route Pinning Dijkstra Modification 656 Their are two changes that are made to the QOSPF Dijkstra algorithm to 657 implement route pinning. 659 5.1.1 Adding vertices to the candidate list 661 When adding a vertex to the candidate list, if its parent has a 662 reservation for the flow on the link that leads to the vertex, and the 663 reservation has the P-bit set in the RRA, the vertex is marked as 664 "reserved"; or, if its parent is a network vertex marked as "reserved", 665 it is also marked as "reserved". 667 If a neighbor W, of a vertex V that is just moved to the SPF tree, is 668 already on the candidate list but not marked as "reserved", and it would 669 not be updated in the normal Q/MOSPF Dijkstra, it still is updated if 670 there is a reservation with the P-bit set for the flow on the link from 671 V to W. 673 In summary, vertices are moved from the candidate list to the SPF tree in 674 the order of three preference groups: vertices with the "reserved" marks; 675 vertices with the TRUE resource flags; and finally the rest best-effort 676 ones. 678 5.1.2 Moving a vertex from the candidate list to the SPF tree 680 Of those vertices with TRUE resource flags, a vertex marked as 681 "reserved" is chosen with the smallest delay, even if there is an 682 un-reserved vertex with a smaller delay. Vertices that are un-reserved 683 are only moved to the SPF tree when there are no more "reserved" 684 vertices on the candidate list. 686 6.0 Explicit Routing OSPF (EROSPF) 688 QOSPF needs both available resource information and existing resource 689 reservation information in addition to the normal topology and 690 membership information. When the size of a routing domain or the number 691 of QoS data flows increases, there is a scaling problem because it takes 692 a lot of bandwidth, memory and CPU power to flood, store and process the 693 resource reservation information even though many of the routers may not 694 be interested in the information. 696 To alleviate this scaling problem, Explicit Routing (ER) can be used: 697 for a flow (source, destination) only the source router(s) (see 698 Section 6.1.1 and Section 6.2) calculate a route, and then the 699 forwarding information is distributed to the downstream routers along 700 the path. 702 Because other routers do not need to perform the Dijkstra calculation, 703 they are saved from this possible CPU-intensive computation. In the 704 QOSPF case, the resource reservation information only needs to be kept 705 on the source routers, thus saving bandwidth, memory, and CPU 706 cycles. EROSPF is also applicable to standard MOSPF to reduce the 707 computation needs of the transit routers. 709 6.1 Multicast Explicit Routing 711 The following discussion is in terms of a single area. In the multi-area 712 case, each area maintains a forwarding table, and a global forwarding 713 table comes from the merge of all the areas' forwarding tables. 715 6.1.1 Source Router Determination 717 The source router for a flow in an area is determined by one of the 718 following conditions: 720 o the source of a flow is on a directly connected network within the 721 area. 722 o the router is an ABR and the source is not in the area. 724 In other words, explicit routes are only calculated by the source router 725 and the border routers that the flow travels through. It is very 726 possible to have multiple source routers for a (source, destination) 727 pair. In this case, each source router will calculate the tree 728 separately, and then distribute forwarding information (i.e., its 729 subtree) to the downstream routers on its subtree. 731 6.1.2 Explicit Routing Advertisements (ERAs) 733 The forwarding information for a (source, destination) pair is contained 734 in an Explicit Routing Advertisement (ERA), which is passed in an Opaque 735 LSA along the subtree described by the ERA. The passing scope is 736 determined by information contained in the ERA. 738 There are two kinds of ERAs. One is an Installation-ERA, used to 739 distribute forwarding information and the other is a Flushing-ERA, used 740 to flush obsolete forwarding information. 742 6.1.2.1 Format of Installation-ERA 744 The Format of Installation-ERAs is shown in Figure 6: 746 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 747 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 748 | LS age | Options | 15 | | 749 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 750 |opaque type: 10| Opaque ID | A 751 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 752 ................................................................. | 753 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 754 | Destination | * 755 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ERA 756 | Source | hdr 757 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * 758 |src_prefix_len |dst_prefix_len | Adjust Offset | * 759 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -* 760 | in_intf_type | mospf_il_type |mospf_init_case| #outgoing intf| | 761 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 762 | incoming intf address or index | | 763 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ C** 764 | out_intf_type | 0 | chiled offset | | * 765 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | D 766 | outgoing intf address or index | | * 767 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+** 769 A. Opaque LSA header B. ERA header 770 C. Repeated for each router on the (sub)tree 771 D. Repeated for each outgoing interface 773 FIGURE 6. Format of Installation-ERA 775 ERAs are carried in Opaque LSAs with the Opaque type 10. The Opaque ID 776 is chosen by its originator. The flooding scope is no-flooding, meaning 777 that the receiver should not flood it out. However, when the receiver 778 parses the ERA, it will build new ERA(s) off the received one and send 779 out new ones with the same Opaque LSA header and ERA header (see 780 Section 6.1.6). 782 The source and destination masks are represented as prefix lengths. 784 Each ERA describes routers on a route tree. For each router, its 785 incoming interface and a list of outgoing interfaces are listed. The 786 interface type is the same as in OSPF Router LSAs. The interface is 787 represented as one of the following: 789 o for a numbered interface, it is the ip address of the upstream (for 790 incoming interface) or downstream (for outgoing interface) neighbor. 791 o for an unnumbered point-to-point interface, it is the interface 792 index. 794 The offset fields (adjust offset and child offset) are used to encode 795 the subtree into the ERA body, as explained in Section 6.1.3 and 796 Section 6.1.6. 798 6.1.2.2 Flushing-ERA 800 A Flushing-ERA is used to flush a previously advertised Installation-ERA 801 when the route changes (see Section 6.1.8). The flushing-ERA uses the 802 MaxAge instance of the previously advertised ERA with an empty ERA body. 804 6.1.3 Creating Installation-ERAs 806 After a source router finishes a route calculation, it builds an ERA to 807 encode the subtree that has the router itself as the root. The subtree 808 is traversed in "preorder". In the example in Figure 7 (numbers are 809 interface addresses or indices), the source router A will build an ERA 810 listing routers in the order of A,B,D,E,C. 812 ==============+===================== source net 813 | 814 | 0 815 +-+-+ 816 | A | 817 +-+-+ 818 1/ \2 819 / \ 820 +-+-+ +-+-+ 821 | B | | C | 822 +-+-+ +-+-+ 823 / \ 824 3/ \4 825 +-+-+ +-+-+ 826 | D | | E | 827 +-+-+ +-+-+ 829 FIGURE 7. An example 831 The "adjust offset" is set to 0 by the source router. Except for the 832 first router placed into the ERA, when a router is added to the ERA, the 833 "child offset" of the parent's outgoing interface leading to the router 834 is set to the offset of the router in the ERA body. Note that all 835 offsets are relative to the ERA body. After building the whole ERA, the 836 source router builds one ERA for each subtree under itself and unicasts 837 the ERA to the root of the subtree, which is the first router listed in 838 the ERA. For example, router A will build an ERA for the subtree rooted 839 at B and unicast it to B, and build an ERA for the subtree rooted at C 840 and unicasts it to C. This building process is pretty simple and is 841 described in Section 6.1.6. However, the source router only stores 842 the ERA for the whole tree and not newly built ERAs. The ERA for the 843 subtree rooted at A is shown in Figure 8. 845 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 846 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 847 | LS age | Options | 15 | | 848 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 849 |opaque type: 10| Opaque ID | | 850 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 851 ................................................................. 852 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 853 | Destination | * 854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ERA 855 | Source | hdr 856 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * 857 |src_prefix_len |dst_prefix_len | Adjust Offset: 0 | * 858 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -* 859 0 | in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 2| | 860 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 861 4 | incoming intf address or index: 0 | | 862 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 863 8 | out_intf_type | 0 | chiled offset: 24 | A 864 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 865 12| outgoing intf address or index: 1 | | 866 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 867 16| out_intf_type | 0 | chiled offset: 64 | | 868 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 869 20| outgoing intf address or index: 2 | | 870 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 871 24| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 2| | 872 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 873 28| incoming intf address or index: 1 | | 874 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 875 32| out_intf_type | 0 | chiled offset: 48 | | 876 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ B 877 36| outgoing intf address or index: 3 | | 878 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 879 40| out_intf_type | 0 | chiled offset: 56 | | 880 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 881 44| outgoing intf address or index: 4 | | 882 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 883 48| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 884 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D 885 52| incoming intf address or index: 3 | | 886 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 887 56| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 888 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ E 889 60| incoming intf address or index: 4 | | 890 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 891 64| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 892 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ C 893 68| incoming intf address or index: 2 | | 894 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 895 FIGURE 8. The ERA for the subtree in Figure 7 897 6.1.4 Using Multiple ERAs for Long Routes 899 The structure and processing of the ERA allows the router computing the 900 route to encode as much of the route as can fit in a packet. The source 901 router can send an ERA to a downstream router that is not an immediate 902 neighbor providing the subtree that continues from the downstream 903 router. It is not likely that this facility would be used often in many 904 networks. 906 6.1.5 Transmitting, acknowledging, and storing of ERAs: 908 A source router stores in its database ERAs (together with their Opaque 909 LSA header) for trees with itself as the root. An ERA built for an 910 immediate downstream neighbor is unicast to the incoming interface of 911 the first router in the ERA (the first router in the ERA is always the 912 receiver), encapsulated in an Opaque LSA. 914 A router also stores in its database ERAs received from its parents, but 915 not those ERAs built for its downstream neighbors. 917 The acknowledgment and retransmission mechanism is the same as that used 918 for conventional LSAs. Since the transmission and acknowledgment of OSPF 919 LSAs are between adjacent neighbors while sometimes ERAs and DABRAs need 920 to be sent to non-adjacent routers, a special pair of update/ack packets 921 are needed for ERAs for DABRAs. See Section 7.3 923 6.1.6 Processing of Installation-ERAs: 925 The first listed router in a received ERA is always the receiver itself. 927 Upon ERA receipt, the forwarding entry for a (source, destination) pair 928 is installed (or updated) and associated with the ERA. 930 If there is a previous instance of the Installation-ERA, to each 931 immediate downstream neighbor listed in the previous instance of the ERA 932 but not in the new ERA, send a Flushing-ERA with the same header as that 933 of the previous instance. 935 For each immediate downstream neighbor listed in the received ERA, a new 936 ERA is constructed from the received ERA and sent to the incoming 937 interface of the first listed router in the newly constructed ERA. The 938 Opaque LSA header and the ERA header remain the same, however. The new 939 ERA's "adjust offset" is set to the "child offset" associated with the 940 outgoing interface in the received ERA that leads to the neighbor. The 941 child offsets are not changed in the new ERA. The subtree for the 942 neighbor is then copied into the new ERA. The subtree is in the 943 following range of the RECEIVED ERA BODY: 945 [child offset - old adjust offset, next child offset - old adjust offset] 946 If there is no "next child", then the remaining portion of the ERA body 947 is copied. Notice that the encoding work done by the source, and the 948 offset fields make the downstream routers' job a matter of copying and 949 shifting. 951 In the example in Figure 7, A will build two ERAs from the ERA for 952 itself, one for B and the other for C. The two ERAs are illustrated in 953 Figure 9. 955 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 956 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 957 | LS age | Options | 15 | | 958 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 959 |opaque type: 10| Opaque ID | | 960 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 961 | Advertising Router | | 962 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 963 | LS sequence number | | 964 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 965 | LS checksum | length | | 966 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 967 | flooding scope | reserved | | 968 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 969 | Destination | * 970 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * 971 | Source | * 972 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * 973 |src_prefix_len |dst_prefix_len | Adjust Offset: 24 | * 974 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -* 975 0 | in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 2| | 976 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 977 4 | incoming intf address or index: 1 | | 978 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 979 8 | out_intf_type | 0 | chiled offset: 48 | | 980 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ B 981 12| outgoing intf address or index: 3 | | 982 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 983 16| out_intf_type | 0 | chiled offset: 56 | | 984 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 985 20| outgoing intf address or index: 4 | | 986 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 987 24| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 988 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D 989 28| incoming intf address or index: 3 | | 990 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 991 32| in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 992 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ E 993 36| incoming intf address or index: 4 | | 994 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 995 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 996 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 997 | LS age | Options | 15 | | 998 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 999 |opaque type: 10| Opaque ID | | 1000 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1001 | Advertising Router | | 1002 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1003 | LS sequence number | | 1004 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1005 | LS checksum | length | | 1006 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1007 | flooding scope | reserved | | 1008 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 1009 | Destination | * 1010 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * 1011 | Source | ERA 1012 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ hdr 1013 |src_prefix_len |dst_prefix_len | Adjust Offset: 64 | * 1014 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -* 1015 0 | in_intf_type | mospf_il_type |mospf_init_case|#outgoing if: 0| | 1016 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ C 1017 4 | incoming intf address or index: 2 | | 1018 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ 1020 FIGURE 9. ERAs built by A for B and C 1022 6.1.7 Processing of Flushing-ERAs 1024 Upon receipt of a Flushing-ERA, the corresponding Installation-ERA is 1025 found and a MaxAge Flushing-ERA is constructed and sent out with same 1026 header as the existing ERA for each immediate downstream neighbor in the 1027 Installation-ERA. If a forwarding entry exists for the corresponding 1028 Installation-ERA, the forwarding entry's incoming interface is set to 1029 NULL (so that no packets for the (source, group) will be accepted on the 1030 interface) if there are no other Installation-ERAs for the (s, g). If 1031 other Installation-ERAs exist, a new forwarding entry is constructed for 1032 the (source, group) pair. If there is no forwarding entry for the 1033 (source, group), forwarding entry with a NULL incoming interface is 1034 installed to prevent forwarding of any received packet for the (source, 1035 group) pair. 1037 6.1.8 Route Change 1039 For all routers, if the upstream neighbor or interface of the first 1040 router in an ERA goes down, a MaxAge Flushing-ERA is immediately sent to 1041 each immediate downstream neighbor to flush the ERA. This does not need 1042 to wait until the source finishes recalculation. 1044 When there is a topology change, the source routers recalculate the 1045 tree, and send updated ERAs along their subtrees. New ERAs are carried 1046 in the Opaque LSAs with the same Opaque ID as in the old ones, but with 1047 a larger sequence number. 1049 For all routers, if a previous downstream neighbor is no longer listed 1050 in a newer ERA, a Flushing-ERA with the same header of the previous 1051 instance of the new ERA is sent to the neighbor to flush its 1052 corresponding Installation-ERA. 1054 6.2 Unicast Explicit Routing 1056 While Multicast ER makes sense even if QOSPF is not used, Unicast 1057 Explicit Routing is needed only for QoS routing. 1059 A router is a source router for a unicast flow (source, destination) 1060 when one of the following conditions exists: 1062 o The source is on one of the router's directly connected networks in 1063 the area that needs the Dijkstra, or, 1064 o The source is not in the area, and the router is an ABR. 1066 The multicast ERA is also used for unicast, but in the unicast case, the 1067 "MOSPF IL Type", "MOSPF Init Case", and incoming interface are not used, 1068 and the number of outgoing interface is always 1. 1070 6.3 Changes of behavior of QOSPF if Explicit Routing is used 1072 Explicit Routing is introduced to address QOSPF's scaling problem, but 1073 QOSPF does not logically depend on Explicit Routing. The discussions in 1074 Section 3.0 and Section 4.0 have been assuming that no ER is used. 1075 When ER is used, the following behaviors of QOSPF are changed: 1077 6.3.1 Flooding scope of RRAs 1079 RRAs are no longer flooded throughout an area. Instead, a RRA is sent to 1080 the source router that advertised the explicit route (branch) to the 1081 originator of the RRA. If the source router is on the source network in 1082 the same area, it then uses "link-local" scope to flood the RRAs to 1083 other routers on the source network. 1085 6.3.2 Flooding scope of DABRAs 1087 DABRAs are no longer flooded throughout "downstream areas" of the 1088 "source area". Instead, a DABRA is sent to all the ABRs on the route in 1089 the "source area". 1091 6.4 Quick Scaling Performance Analysis 1093 The Scaling problems with QOSPF are primarily caused by RRAs, so let's 1094 do a scaling analysis in terms of number of RRAs flooded per second, 1095 based on the following area configuration: 1097 Number of routers in the area: R 1098 Average number of routers on a multicast tree: M = abs(sqr_root(R)) 1099 Average number of flows that sources from a router: F 1100 Period of time during which to set up all the flows: T = 10 seconds 1102 For each flow, each router has to originate a RRA, so there will be (R * 1103 M * F) RRAs originated. 1105 If explicit routing (ER) is not used, each router will get all the RRAs, 1106 so the R routers will receive (R * F * M - F) RRAs (a router does not 1107 need to receive its own RRAs), i.e, (R * F * M - F)/10 RRAs have to be 1108 transmitted per second. 1110 If ER is used, only the source routers will receive the RRAs. Assuming 1111 those RRAs are sent to the source router following the reversed 1112 multicast path, then at most (1 + 2 +,,, + (M - 2) + (M - 1)) 1113 transmissions are needed for each flow, or F * (1 + 2 +... + (M -2) + (M 1114 - 1))/10 RRAs have to be transmitted per second. 1116 Changing the value of R, we have the following result: 1118 Number of routers (R): 9 16 25 36 49 64 1119 Number of RRAs per sec w/ ER: 0.3F 0.6F 1.0F 1.5F 2.1F 2.8F 1120 Number of RRAs per sec w/o ER: 2.6F 6.3F 9.9F 21.5F 34.2F 51.1F 1122 It is clear that QOSPF does not scale without ER but it scales well with ER. 1124 7.0 Changes to OSPF to accommodate QOSPF/ER 1126 Because of the new functionality and new types of LSAs, the following 1127 changes are needed to accommodate QOSPF or ER. 1129 7.1 The Options field 1131 The Options field in OSPF Hello, Database Description packet and all LSAs 1132 indicates what optional capabilities a router supports. 1134 A new bit must be added to the Options field: the Q-bit. If set, it means the 1135 router supports QOSPF and understands RES-LSAs. The Q-bit matters only in 1136 Database Description packets and Router LSAs. 1138 +---+---+---+---+---+---+---+---+ 1139 | * | Q |DC |EA |N/P|MC | E | * | 1140 +---+---+---+---+---+---+---+-+-+ 1142 When a router exchanges its database with a neighbor, it only sends the 1143 neighbor those types of LSAs that the neighbor understands. If the neighbor 1144 does not set the Q-bit in its Database Description packets, the router 1145 should not include RES-LSAs in its Database Description packets and LS 1146 Update packets. 1148 QOSPF Dijkstra should not be used if there is at least one router that 1149 does not support QOSPF comes up in an area. This is indicated by the 1150 existence of a valid Router-LSA with the Q-bit cleared in the Options field. 1152 However, note that if all multicast-capable routers supports QOSPF, then 1153 the QOSPF Dijkstra for multicast can still be used. 1155 7.3 New Types of OSPF packets 1157 OSPF requires that any LSAs be exchanged between neighbors that are 1158 supposed to become adjacent and a Link State Update/Ack packet would 1159 simply be discarded if it is from a neighbor with a state less than 1160 ExStart. However, when ER is used, the RRAs and ERAs may be sent to 1161 non-adjacent routers. The solution is to invent a new pair of update/ack 1162 packets that do not require adjacency to transmit/acknowledge RRAs and 1163 ERAs when ER is used. The same acknowledgment/retransmission scheme as 1164 those between adjacent neighbors can be used to ensure reliable 1165 transmission of RRAs and ERAs. 1167 8.0 Security Considerations 1169 Given that QOSPF could be triggered by RSVP, it is expected that the 1170 security mechanisms for RSVP will provide authorization and access 1171 control for QOSPF routing calculations. Additionally, the OSPF security 1172 mechanisms for authenticating neighbors and data received are very 1173 important for explicit routing since ER packets can change forwarding 1174 state in a very direct manner. Especially, since an ERA can be sent to a 1175 router on a different network, ERA packets' authentication should be per 1176 area instead of per interface. 1178 9.0 Acknowledgments 1180 The authors gratefully acknowledge the following people/organizations 1181 for making this protocol come together: 1183 o Tim Trapp of Thompson International for the initial problem, 1184 constraints, as well as constructive discussions. 1186 o E-Systems, Inc. Particularly, Hai Nguyen, Gerry Rosen, and Thomas 1187 Grill for their patience and perserverence during some of the 1188 difficult design and development phases. 1190 o John Krawczyk, Ross Callon, Mohd Bashar, Mike Davis, Ambrose Kwong, 1191 Billy Ng and Dennis Baker for useful design comments. 1193 o The IP group and Multimedia group at Bay Networks for lots of coding 1194 and debugging support. 1196 10.0 Notice Regarding Intellectual Property Rights 1198 Bay Networks may seek patent or other intellectual property protection 1199 for some or all of the technologies disclosed in this document. If any 1200 standards arising from this disclosure are or become protected by one or 1201 more patents assigned to Bay Networks, Bay Networks intends to disclose 1202 those patents and license them on reasonable and non-discriminatory 1203 terms. Future revisions of this draft may contain additional information 1204 regarding specific intellectual property protection sought or received. 1206 11.0 References 1208 1. J. Moy, OSPF Version 2, Request for Comments (RFC) 1583 1210 2. J. Moy, Multicast Extensions to OSPF, Request for Comments(RFC) 1584, 1211 March 1994. 1213 3. R. Coltun, The OSPF Opaque LSA Option, Internet Draft, 1214 draft-coltun-ospf-opaque-01.txt 1216 4. R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin. Resource 1217 ReSerVation Protocol (RSVP) - Version 1 Functional Specification, 1218 Internet Draft, draft-ietf-rsvp-spec-12.txt, May 1996. 1220 5. J. Wroclawski, Specification of the Controlled-Load Network Element 1221 Service, Internet Draft, draft-ietf-intserv-ctrl-load-svc-01.txt, 1222 November, 1995. 1224 12.0 Authors' Addresses 1226 Zhaohui (Jeffrey) Zhang 1227 Bay Networks, Inc. 1228 2 Federal Street 1229 Billerica, MA 01821 1230 +1 508-670-8888 1231 zzhang@baynetworks.com 1233 Cheryl Sanchez 1234 Avici Systems, Inc. 1235 12 Elizabeth Drive. 1236 Chelmsford, MA 01824 1237 +1 508-250-3344 1238 csanchez@Avici.com 1240 Bill Salkewicz, bills@redbacknetworks.com 1242 Eric S. Crawley 1243 Gigapacket Networks, Inc. 1244 25 Porter Road 1245 Littleton, MA 01460 1246 508-486-0665 1247 esc@gigapacket.com