idnits 2.17.1 draft-enyedi-rtgwg-mrt-frr-algorithm-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([I-D.atlas-rtgwg-mrt-frr-architecture]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 25 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 24, 2011) is 4568 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'R' is mentioned on line 1529, but not defined == Missing Reference: 'F' is mentioned on line 551, but not defined == Missing Reference: 'C' is mentioned on line 1529, but not defined == Missing Reference: 'G' is mentioned on line 1529, but not defined == Missing Reference: 'E' is mentioned on line 231, but not defined == Missing Reference: 'D' is mentioned on line 551, but not defined == Missing Reference: 'J' is mentioned on line 118, but not defined == Missing Reference: 'A' is mentioned on line 231, but not defined == Missing Reference: 'B' is mentioned on line 124, but not defined == Missing Reference: 'H' is mentioned on line 1188, but not defined == Missing Reference: 'K' is mentioned on line 551, but not defined == Missing Reference: 'N' is mentioned on line 551, but not defined == Missing Reference: 'X' is mentioned on line 1188, but not defined == Missing Reference: 'Y' is mentioned on line 1188, but not defined == Missing Reference: 'R-small' is mentioned on line 1147, but not defined == Missing Reference: 'R-great' is mentioned on line 1164, but not defined == Missing Reference: 'I' is mentioned on line 1529, but not defined == Unused Reference: 'I-D.ietf-rtgwg-lfa-applicability' is defined on line 1643, but no explicit reference was found in the text == Unused Reference: 'LFARevisited' is defined on line 1660, but no explicit reference was found in the text == Unused Reference: 'LightweightNotVia' is defined on line 1666, but no explicit reference was found in the text == Outdated reference: A later version (-01) exists of draft-atlas-rtgwg-mrt-frr-architecture-00 == Outdated reference: A later version (-11) exists of draft-ietf-rtgwg-ipfrr-notvia-addresses-07 == Outdated reference: A later version (-06) exists of draft-ietf-rtgwg-lfa-applicability-03 == Outdated reference: A later version (-01) exists of draft-shand-remote-lfa-00 -- Obsolete informational reference (is this intentional?): RFC 3137 (Obsoleted by RFC 6987) Summary: 2 errors (**), 0 flaws (~~), 26 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Routing Area Working Group A. Atlas 3 Internet-Draft Juniper Networks 4 Intended status: Informational G. Enyedi 5 Expires: April 26, 2012 A. Csaszar 6 Ericsson 7 October 24, 2011 9 Algorithms for computing Maximally Redundant Trees for IP/LDP Fast- 10 Reroute 11 draft-enyedi-rtgwg-mrt-frr-algorithm-00 13 Abstract 15 A complete solution for IP and LDP Fast-Reroute using Maximally 16 Redundant Trees is presented in 17 [I-D.atlas-rtgwg-mrt-frr-architecture]. This document describes an 18 algorithm that can be used to compute the necessary Maximally 19 Redundant Trees and the associated next-hops. 21 Status of this Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at http://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on April 26, 2012. 38 Copyright Notice 40 Copyright (c) 2011 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 1. Introduction 55 MRT Fast-Reroute requires that packets can be forwarded not only on 56 the shortest-path tree, but also on two Maximally Redundant Trees 57 (MRTs), referred to as the Blue MRT and the Red MRT. A router which 58 experiences a local failure must also have pre-determined which 59 alternate to use. This document describes how to compute these three 60 things and the algorithm design decisions and rationale. The 61 algorithms are based on those presented in [MRTLinear] and expanded 62 in [EnyediThesis]. 64 Just as packets routed on a hop-by-hop basis require that each router 65 compute a shortest-path tree which is consistent, it is necessary for 66 each router to compute the Blue MRT and Red MRT in a consistent 67 fashion. This is the motivation for the detail in this document. 69 As now, a router's FIB will contain primary next-hops for the current 70 shortest-path tree for forwarding traffic. In addition, a router's 71 FIB will contain primary next-hops for the Blue MRT for forwarding 72 received traffic on the Blue MRT and primary next-hops for the Red 73 MRT for forwarding received traffic on the Red MRT. 75 What alternate next-hops a point-of-local-repair (PLR) selects need 76 not be consistent - but loops must be prevented. To reduce 77 congestion, it is possible for multiple alternate next-hops to be 78 selected; in the context of MRT alternates, each of those alternate 79 next-hops would be equal-cost paths. 81 This document provides an algorithm for selecting an appropriate MRT 82 alternate for consideration. Other alternates, e.g. LFAs that are 83 downstream paths, may be prefered when available and that decision- 84 making is not captured in this document. 86 [E]---[D]---| [E]<--[D]<--| [E]-->[D] 87 | | | | ^ | | 88 | | | V | | V 89 [R] [F] [C] [R] [F] [C] [R] [F] [C] 90 | | | ^ ^ | | 91 | | | | | V | 92 [A]---[B]---| [A]-->[B] [A]---[B]<--| 94 (a) (b) (c) 95 a 2-connected graph Blue MRT towards R Red MRT towards R 97 Figure 1 99 Algorithms for computing MRTs can handle arbitrary network topologies 100 where the whole network graph is not 2-connected, as in Figure 2, as 101 well as the easier case where the network graph is 2-connected 102 (Figure 1). Each MRT is a spanning tree. The pair of MRTs provide 103 two paths from every node X to the root of the MRTs. Those paths 104 share the minimum number of nodes and the minimum number of links. 105 Each such shared node is a cut-vertex. Any shared links are cut- 106 links. 108 [E]---[D]---| |---[J] 109 | | | | | 110 | | | | | 111 [R] [F] [C]---[G] | 112 | | | | | 113 | | | | | 114 [A]---[B]---| |---[H] 116 (a) a graph that isn't 2-connected 118 [E]<--[D]<--| |---[J] [E]-->[D] [J] 119 | ^ | | ^ | | 120 V | | V | V | 121 [R] [F] [C]<--[G] | [R] [F] [C]<--[G] | 122 ^ | ^ | | ^ | 123 | | | V | | V 124 [A]-->[B] [H] [A]---[B]<--| |---[H] 126 (b) Blue MRT towards R (c) Red MRT towards R 128 Figure 2 130 2. Terminology and Definitions 132 Redundant Trees (RT): A pair of trees where the path from any node 133 X to the root R along the first tree is node-disjoint with the 134 path from the same node X to the root along the second tree. 135 These can be computed in 2-connected graphs. 137 Maximally Redundant Trees (MRT): A pair of trees where the path 138 from any node X to the root R along the first tree and the path 139 from the same node X to the root along the second tree share the 140 minimum number of nodes and the minimum number of links. Each 141 such shared node is a cut-vertex. Any shared links are cut-links. 142 Any RT is an MRT but many MRTs are not RTs. 144 network graph: A graph that reflects the network topology where all 145 links connect exactly two nodes and broadcast links have been 146 transformed into the standard pseudo-node representation. 148 cut-vertex: A vertex whose removal partitions the network. 150 cut-link: A link whose removal partitions the network. A cut-link 151 by definition must be connected between two cut-vertices. If 152 there are multiple parallel links, then they are referred to as 153 cut-links in this document if removing the set of parallel links 154 would partition the network. 156 2-connected: A graph that has no cut-vertices. This is a graph 157 that requires two nodes to be removed before the network is 158 partitioned. 160 spanning tree: A tree containing links that connects all nodes in 161 the network graph. 163 back-edge: In the context of a spanning tree computed via a depth- 164 first search, a back-edge is a link that connects a descendant of 165 a node x with an ancestor of x. 167 DAG: Directed Acyclic Graph - a graph where all links are directed 168 and there are no cycles in it. 170 ADAG: Almost Directed Acyclic Graph - a graph that, if all links 171 incoming to the root were removed, would be a DAG. 173 2-connected cluster: A maximal set of nodes that are 2-connected. 174 In a network graph with at least one cut-vertex, there will be 175 multiple 2-connected clusters. 177 block: Either a 2-connected cluster, a cut-edge, or an isolated 178 vertex. 180 GADAG: Generalized ADAG - a graph that is the combination of the 181 ADAGs of all blocks. 183 DFS: Depth-First Search 185 DFS ancestor: A node n is a DFS ancestor of x if n is on the DFS- 186 tree path from the DFS root to x. 188 DFS descendant: A node n is a DFS descendant of x if x is on the 189 DFS-tree path from the DFS root to n. 191 ear: A path along not-yet-included-in-the-GADAG nodes that starts 192 at a node that is already-included-in-the-GADAG and that ends at a 193 node that is already-included-in-the-GADAG. The starting and 194 ending nodes may be the same node if it is a cut-vertex. 196 X >> Y or Y << X: Indicates the relationship between X and Y in a 197 partial order, such as found in a GADAG. X >> Y means that X is 198 higher in the partial order than Y. Y << X means that Y is lower 199 in the partial order than X. 201 X > Y or Y < X: Indicates the relationship between X and Y in the 202 total order, such as found via a topological sort. X > Y means 203 that X is higher in the total order than Y. Y < X means that Y is 204 lower in the total order than X. 206 3. Algorithm Key Concepts 208 There are five key concepts that are critical for understanding the 209 algorithms for computing MRTs. The first is the idea of partially 210 ordering the nodes in a network graph with regard to each other and 211 to the GADAG root. The second is the idea of finding an ear of nodes 212 and adding them in the correct direction. The third is the idea of a 213 Low-Point value and how it can be used to identify cut-vertices and 214 to find a second path towards the root. The fourth is the idea that 215 a non-2-connected graph is made up of blocks, where a block is a 216 2-connected cluster, a cut-edge or an isolated node. The fifth is 217 the idea of a local-root for each node; this is used to compute ADAGs 218 in each block. 220 3.1. Partial Ordering for Disjoint Paths 222 Given any two nodes X and Y in a graph, a particular total order 223 means that either X < Y or X > Y in that total order. An example 224 would be a graph where the nodes are ranked based upon their IP 225 loopback addresses. In a partial order, there may be some nodes for 226 which it can't be determined whether X << Y or X >> Y. A partial 227 order can be captured in a directed graph, as shown in Figure 3. In 228 a graphical representation, a link directed from X to Y indicates 229 that X is a neighbor of Y in the network graph and X << Y. 231 [A]<---[R] [E] R << A << B << C << D << E 232 | ^ R << A << B << F << G << H << D << E 233 | | 234 V | Unspecified Relationships: 235 [B]--->[C]--->[D] C and F 236 | ^ C and G 237 | | C and H 238 V | 239 [F]--->[G]--->[H] 241 Figure 3: Directed Graph showing a Partial Order 243 To compute MRTs, it is very useful to have the root of the MRTs be at 244 the very bottom and the very top of the partial ordering. This means 245 that from any node X, one can pick nodes higher in the order until 246 the root is reached. Similarly, from any node X, one can pick nodes 247 lower in the order until the root is reached. For instance, in 248 Figure 4, from G the higher nodes picked can be traced by following 249 the directed links and are H, D, E and R. Similarly, from G the lower 250 nodes picked can be traced by reversing the directed links and are F, 251 B, A, and R. A graph that represents this modified partial order is 252 no longer a DAG; it is termed an Almost DAG (ADAG) because if the 253 links directed to the root were removed, it would be a DAG. 255 [A]<---[R]<---[E] R << A << B << C << R 256 | ^ ^ R << A << B << C << D << E << R 257 | | | R << A << B << F << G << H << D << E << R 258 V | | 259 [B]--->[C]--->[D] Unspecified Relationships: 260 | ^ C and F 261 | | C and G 262 V | C and H 263 [F]--->[G]--->[H] 265 Figure 4: ADAG showing a Partial Order with R lowest and highest 267 Most importantly, if a node Y >> X, then Y can only appear on the 268 increasing path from X to the root and never on the decreasing path. 269 Similarly, if a node Z << X, then Z can only appear on the decreasing 270 path from X to the root and never on the inceasing path. 272 Additionally, when following the increasing paths, it is possible to 273 pick multiple higher nodes and still have the certainty that those 274 paths will be disjoint from the decreasing paths. E.g. in the 275 previous example node B has multiple possibilities to forward packets 276 along an increasing path: it can either forward packets to C or F. 278 3.2. Finding an Ear and the Correct Direction 280 For simplicity, the basic idea of creating a GADAG by adding ears is 281 described assuming that the network graph is a single 2-connected 282 cluster so that an ADAG is sufficient. Generalizing to multiple 283 blocks is done by considering the block-roots instead of the GADAG 284 root - and the actual algorithms given in Section 4.3 and 285 Section 4.4. 287 In order to understand the basic idea of finding an ADAG, first 288 suppose that we have already a partial ADAG, which doesn't contain 289 all the nodes in the block yet, and we want to extend it to cover all 290 the nodes. Suppose that we find a path from a node X to Y such that 291 X and Y are already contained by our partial ADAG, but all the 292 remaining nodes along the path are not added to the ADAG yet. We 293 refer to such a path as an ear. 295 Recall that our ADAG is closely related to a partial order, more 296 precisely, if we remove root R, the remaining DAG describes a partial 297 order of the nodes. If we suppose that neither X nor Y is the root, 298 we may be able to compare them. If one of them is definitely lesser 299 with respect to our partial order (say X<B---| A-->B---| 310 (a) (b) (c) 312 (a) A 2-connected graph 313 (b) Partial ADAG (C is not included) 314 (c) Resulting ADAG after adding path (or ear) B-C-D 316 Figure 5 318 In this partial ADAG, node C is not yet included. However, we can 319 find path B-C-D, where both endpoints are contained by this partial 320 ADAG (we say those nodes are *ready* in the sequel), and the 321 remaining node (node C) is not contained yet. If we remove R, the 322 remaining DAG defines a partial order, and with respect to this 323 partial order we can say that B<C and C->D are added). If 325 B were strictly greater than D, we would add the same path in reverse 326 direction. 328 If in the partial order where an ear's two ends are X and Y, X << Y, 329 then there must already be a directed path from X to Y already in the 330 ADAG. The ear must be added in a direction such that it doesn't 331 create a cycle; therefore the ear must go from X to Y. 333 In the case, when X and Y are not ordered with each other, we can 334 select either direction for the ear. We have no restriction since 335 neither of the directions can result in a cycle. In the corner case 336 when one of the endpoints of an ear, say X, is the root (recall that 337 the two endpoints must be different), we could use both directions 338 again for the ear because the root can be considered both as smaller 339 and as greater than Y. However, we strictly pick that direction in 340 which the root is greater than Y. The logic for this decision is 341 explained in Section 4.6 343 A partial ADAG is started by finding a cycle from the root R back to 344 itself. This can be done by selecting a non-ready neighbor N of R 345 and then finding a path from N to R that doesn't use any links 346 between R and N. The direction of the cycle can be assigned either 347 way since it is starting the ordering. 349 Once a partial ADAG is already present, we can always add ears to it: 351 just select a non-ready neighbor N of a ready node Q, such that Q is 352 not the root, find a path from N to the root in the graph with Q 353 removed. This path is an ear where the first node of the ear is Q, 354 the next is N, then the path until the first ready node the path 355 reached (that second ready node is the other endpoint of the path). 356 Since the graph is 2-connected, there must be a path from N to R 357 without Q. 359 It is always possible to select a non-ready neighbor N of a ready 360 node Q so that Q is not the root R. Because the network is 361 2-connected, N must be connected to two different nodes and only one 362 can be R. Because the initial cycle has already been added to the 363 ADAG, there are ready nodes that are not R. Since the graph is 364 2-connected, while there are non-ready nodes, there must be a non- 365 ready neighbor N of a ready node that is not R. 367 Generic_Find_Ears_ADAG(root) 368 Create an empty ADAG. Add root to the ADAG. 369 Mark root as IN_GADAG. 370 Select the shortest cycle containing root. 371 Add the shortest cycle to the ADAG. 372 Mark cycle's nodes as IN_GADAG. 373 Add cycle's non-root nodes to process_list. 374 while there exists connected nodes in graph that are not IN_GADAG 375 Select a new ear. Let its endpoints be X and Y. 376 if (X is root) or (Y<>X or (c)X, Y not ordered 379 Add the ear towards Y to the ADAG 381 Figure 6: Generic Algorithm to find ears and their direction in 382 2-connected graph 384 Algorithm Figure 6 merely requires that a cycle or ear be selected 385 without specifying how. Regardless of the way of selecting the path, 386 we will get an ADAG. The method used for finding and selecting the 387 ears is important; shorter ears result in shorter paths along the 388 MRTs. There are two options being considered. The Low-Point 389 Inheritance option is described in Section 4.3. The SPF-based option 390 is described in Section 4.4. 392 As an example, consider Figure 5 again. First, we select the 393 shortest cycle containing R, which can be R-A-B-F-D-E (uniform link 394 costs were assumed), so we get to the situation depicted in Figure 5 395 (b). Finally, we find a node next to a ready node; that must be node 396 C and assume we reached it from ready node B. We search a path from C 397 to R without B in the original graph. The first ready node along 398 this is node D, so the open ear is B-C-D. Since B<C and C->D to the ADAG. Since all the nodes are ready, we stop at 400 this point. 402 3.3. Low-Point Values and Their Uses 404 A basic way of computing a spanning tree on a network graph is to run 405 a depth-first-search, such as given in Figure 7. This tree has the 406 important property that if there is a link (x, n), then either n is a 407 DFS ancestor of x or n is a DFS descendant of x. In other words, 408 either n is on the path from the root to x or x is on the path from 409 the root to n. 411 global_variable: dfs_number 413 DFS_Visit(node x, node parent) 414 D(x) = dfs_number 415 dfs_number += 1 416 x.dfs_parent = parent 417 for each link (x, w) 418 if D(w) is not set 419 DFS_Visit(w, x) 421 Run_DFS(node root) 422 dfs_number = 0 423 DFS_Visit(root, NONE) 425 Figure 7: Basic Depth-First Search algorithm 427 Given a node x, one can compute the minimal DFS number of the 428 neighbours of x, i.e. min( D(w) if (x,w) is a link). This gives the 429 highest attachment point neighbouring x. What is interesting, 430 though, is what is the highest attachment point from x and x's 431 descendants. This is what is determined by computing the Low-Point 432 value, as given in Algorithm Figure 9 and illustrated on a graph in 433 Figure 8. 435 [E]---| [J]-------[I] [P]---[O] 436 | | | | | | 437 | | | | | | 438 [R] [D]---[C]--[F] [H]---[K] [N] 439 | | | | | | 440 | | | | | | 441 [A]--------[B] [G]---| [L]---[M] 443 (a) a non-2-connected graph 445 [E]----| [J]---------[I] [P]------[O] 446 (5, ) | (10, ) (9, ) (16, ) (15, ) 447 | | | | | | 448 | | | | | | 449 [R] [D]---[C]---[F] [H]----[K] [N] 450 (0, ) (4, ) (3, ) (6, ) (8, ) (11, ) (14, ) 451 | | | | | | 452 | | | | | | 453 [A]---------[B] [G]----| [L]------[M] 454 (1, ) (2, ) (7, ) (12, ) (13, ) 456 (b) with DFS values assigned (D(x), L(x)) 458 [E]----| [J]---------[I] [P]------[O] 459 (5,0) | (10,3) (9,3) (16,11) (15,11) 460 | | | | | | 461 | | | | | | 462 [R] [D]---[C]---[F] [H]----[K] [N] 463 (0, ) (4,0) (3,0) (6,3) (8,3) (11,11) (14,11) 464 | | | | | | 465 | | | | | | 466 [A]---------[B] [G]----| [L]------[M] 467 (1,0) (2,0) (7,3) (12,11) (13,11) 469 (c) with low-point values assigned (D(x), L(x)) 471 Figure 8 473 global_variable: dfs_number 475 Lowpoint_Visit(node x, node parent, interface p_to_x) 476 D(x) = dfs_number 477 L(x) = D(x) 478 dfs_number += 1 479 x.dfs_parent = parent 480 x.dfs_parent_intf = p_to_x 481 x.lowpoint_parent = NONE 482 for each interface intf of x: 483 if D(intf.remote_node) is not set 484 Lowpoint_Visit(intf.remote_node, x, intf) 485 if L(intf.remote_node) < L(x) 486 L(x) = L(intf.remote_node) 487 x.lowpoint_parent = intf.remote_node 488 x.lowpoint_parent_intf = intf 489 else if intf.remote_node is not parent 490 if D(intf.remote_node) < L(x) 491 L(x) = D(intf.remote) 492 x.lowpoint_parent = intf.remote_node 493 x.lowpoint_parent_intf = intf 495 Run_Lowpoint(node root) 496 dfs_number = 0 497 Lowpoint_Visit(root, NONE, NONE) 499 Figure 9: Computing Low-Point value 501 From the low-point value and lowpoint parent, there are two very 502 useful things which motivate our computation. 504 First, if there is a child c of x such that L(c) >= D(x), then there 505 are no paths in the network graph that go from c or its descendants 506 to an ancestor of x - and therefore x is a cut-vertex. This is 507 useful because it allows identification of the cut-vertices and thus 508 the blocks. As seen in Figure 8, even if L(x) < D(x), there may be a 509 block that contains both the root and a DFS-child of a node while 510 other DFS-children might be in different blocks. In this example, 511 C's child D is in the same block as R while F is not. 513 Second, by repeatedly following the path given by lowpoint_parent, 514 there is a path from x back to an ancestor of x that does not use the 515 link [x, x.dfs_parent] in either direction. The full path need not 516 be taken, but this gives a way of finding an initial cycle and then 517 ears. 519 3.4. Blocks in a Graph 521 A key idea for the MRT algorithm is that any non-2-connected graph is 522 made up by blocks (e.g. 2-connected clusters, cut-links, and/or 523 isolated nodes). To compute GADAGs and thus MRTs, computation is 524 done in each block to compute ADAGs or Redundant Trees and then those 525 ADAGs or Redundant Trees are combined into a GADAG or MRT. 527 [E]---| [J]-------[I] [P]---[O] 528 | | | | | | 529 | | | | | | 530 [R] [D]---[C]--[F] [H]---[K] [N] 531 | | | | | | 532 | | | | | | 533 [A]--------[B] [G]---| [L]---[M] 535 (a) A graph with four blocks that are: 536 3 2-connected clusters and a cut-link 538 [E]<--| [J]<------[I] [P]<--[O] 539 | | | ^ | ^ 540 V | V | V | 541 [R] [D]<--[C] [F] [H]<---[K] [N] 542 ^ | ^ ^ 543 | V | | 544 [A]------->[B] [G]---| [L]-->[M] 546 (b) Blue MRT 548 [E]---| [J]-------->[I] [P]-->[O] 549 | | | 550 V V V 551 [R] [D]-->[C]<---[F] [H]<---[K] [N] 552 ^ | ^ | ^ | 553 | V | | | V 554 [A]<-------[B] [G]<--| [L]<--[M] 556 (c) Red MRT 558 Figure 10 560 Consider the example depicted in Figure 10 (a). In this figure, a 561 special graph is presented, showing us all the ways 2-connected 562 clusters can be connected. It has four blocks: block 1 contains R, 563 A, B, C, D, E, block 2 contains C, F, G, H, I, J, block 3 contains K, 564 L, M, N, O, P, and block 4 is a cut-edge containing H and K. As can 565 be observed, the first two blocks have one common node (node C) and 566 blocks 2 and 3 do not have any common node, but they are connected 567 through a cut-edge that is block 4. No two blocks can have more than 568 one common node, since two blocks with at least 2 common nodes would 569 qualify as a single 2-connected cluster. 571 Moreover, observe that if we want to get from one block to another, 572 we must use a cut-vertex (the cut-vertices in this graph are C, H, 573 K), regardless of the path selected, so we can say that all the paths 574 from block 3 along the MRTs rooted at R will cross K first. This 575 observation means that if we want to find a pair of MRTs rooted at R, 576 then we need to build up a pair of RTs in block 3 with K as a root. 577 Similarly, we need to find another one in block 2 with C as a root, 578 and finally, we need the last one in block 1 with R as a root. When 579 all the trees are selected, we can simply combine them; when a block 580 is a cut-edge (as in block 4), that cut-edge is added in the same 581 direction to both of the trees. The resulting trees are depicted in 582 Figure 10 (b) and (c). 584 Similarly, to create a GADAG it is sufficient to compute ADAGs in 585 each block and connect them. 587 It is necessary, therefore, to identify the cut-vertices, the blocks 588 and identify the appropriate local-root to use for each block. 590 3.5. Determining Local-Root 592 Each node in a network graph has a local-root, which is the cut- 593 vertex (or root) in the same block that is closest to the root. The 594 local-root is used to determine whether two nodes share a common 595 block. 597 Compute_Localroot(node x, node localroot) 598 x.localroot = localroot 599 for each DFS child c 600 if L(c) < D(x) //x is not a cut-vertex 601 Compute_Localroot(c, x.localroot) 602 else 603 mark x as cut-vertex 604 Compute_Localroot(c, x) 606 Compute_Localroot(root, root) 608 Figure 11: A method for computing local-roots 610 There are two different ways of computing the local-root for each 611 node. The stand-alone method is given in Figure 11 and better 612 illustrates the concept. It is used in the second option for 613 computing a GADAG using SPFs. The other method is used in the first 614 option for computing a GADAG using Low-Point inheritance and the 615 essence of it is given in Figure 12. 617 Get the current node, s. 618 Compute an ear from s to a child c 619 and then via lowpoint inheritance, e.g. 620 ( n = c 621 while n is not ready: 622 n = n.lowpoint_parent 623 e = n 624 ) 625 to a ready node e. 626 if s is e 627 s is a cut-vertex 628 x.localroot = s 629 else 630 for each node x in the ear that is not s or e 631 x.localroot = s.localroot 633 Figure 12: Ear-based method for computing local-roots 635 Once the local-roots are known, two nodes X and Y are in a common 636 block if and only if one of the following three conditions apply. 638 o Y's local-root is X's local-root : They are in the same block and 639 neither is the cut-vertex closest to the root. 641 o Y's local-root is X: X is the cut-vertex closest to the root for 642 Y's block 644 o Y is X's local-root: Y is the cut-vertex closest to the root for 645 X's block 647 4. Algorithm Sections 649 This algorithm computes one GADAG that is then used by a router to 650 determine its blue MRT and red MRT next-hops to all destinations. 651 Finally, based upon that information, alternates are selected for 652 each next-hop to each destination. The different parts of this 653 algorithm are described below. These work on a network graph after, 654 for instance, its interfaces are ordered as per Figure 13. 656 1. Select the root to use for the GADAG. [See Section 4.1.] 657 2. Initialize all interfaces to UNDIRECTED. [See Section 4.2.] 659 3. Compute the DFS value,e.g. D(x), and lowpoint value, L(x). [See 660 Figure 9.] 662 4. Construct the GADAG. [See Section 4.3 for Option 1 using 663 Lowpoint Inheritance and Section 4.4 for Option 2 using SPFs.] 665 5. Assign directions to all interfaces that are still UNDIRECTED. 666 [See Section 4.5.] 668 6. From the computing router x, compute the next-hops for the blue 669 MRT and red MRT. [See Section 4.6.] 671 7. Identify alternates for each next-hop to each destination by 672 determining which one of the blue MRT and the red MRT the 673 computing router x should select. [See Section 4.7.] 675 To ensure consistency in computation, it is necessary that all 676 routers order interfaces identically. This is necessary for the DFS, 677 where the selection order of the interfaces to explore results in 678 different trees, and for computing the GADAG, where the selection 679 order of the interfaces to use to form ears can result in different 680 GADAGs. The recommended ordering between two interfaces from the 681 same router x is given in Figure 13. 683 Interface_Compare(interface a, interface b) 684 if a.metric < b.metric 685 return A_LESS_THAN_B 686 if b.metric < a.metric 687 return B_LESS_THAN_A 688 if a.neighbor.loopback_addr < b.neighbor.loopback_addr 689 return A_LESS_THAN_B 690 if b.neighbor.loopback_addr < a.neighbor.loopback_addr 691 return B_LESS_THAN_A 692 // Same metric to same node, so the order doesn't matter anymore. 693 // To have a unique, consistent total order, 694 // tie-break based on ifindex. 695 if a.ifindex < b.ifindex 696 return A_LESS_THAN_B 697 return B_LESS_THAN_A 699 Figure 13: Rules for ranking multiple interfaces. Order is from low 700 to high. 702 4.1. Root Selection 704 The precise mechanism by which routers advertise a priority for the 705 GADAG root is not described in this document. Nor is the algorithm 706 for selecting routers based upon priority described in this document. 708 A network may be partitioned or there may be islands of routers that 709 support MRT fast-reroute. Therefore, the root selected for use in a 710 GADAG must be consistent only across each connected island of MRT 711 fast-reroute support. Before beginning computation, the network 712 graph is reduced to contain only the set of routers that support a 713 compatible MRT fast-reroute. 715 The selection of a GADAG root is done among only those routers in the 716 same MRT fast-reroute island as the computing router x. 717 Additionally, only routers that are not marked as unusable or 718 overloaded (e.g. ISIS overload or [RFC3137]) are eligible for 719 selection as root. 721 4.2. Initialization 723 Before running the algorithm, there is the standard type of 724 initialization to be done, such as clearing any computed DFS-values, 725 lowpoint-values, DFS-parents, lowpoint-parents, any MRT-computed 726 next-hops, and flags associated with algorithm. 728 It is assumed that a regular SPF computation has been run so that the 729 primary next-hops from the computing router to each destination are 730 known. This is required for determining alternates at the last step. 732 Initially, all interfaces must be initialized to UNDIRECTED. Whether 733 they are OUTGOING, INCOMING or both is determined when the GADAG is 734 constructed and augmented. 736 It is possible that some links and nodes will be marked as unusable, 737 whether because of configuration, overload, or due to a transient 738 cause such as [RFC3137]. In the algorithm description, it is assumed 739 that such links and nodes will not be explored or used and no more 740 disussion is given of this restriction. 742 4.3. Option 1: Computing GADAG using lowpoint inheritance 744 The basic idea of this is to find ears from a node x that is already 745 in the GADAG (known as IN_GADAG). There are two methods to find 746 ears; both are required. The first is by going to a not IN_GADAG 747 DFS-child and then following the chain of low-point parents until an 748 IN_GADAG node is found. The second is by going to a not IN_GADAG 749 neighbor and then following the chain of DFS parents until an 750 IN_GADAG node is found. As an ear is found, the associated 751 interfaces are marked based on the direction taken. The nodes in the 752 ear are marked as IN_GADAG. In the algorithm, first the ears via 753 DFS-children are found and then the ears via DFS-neighbors are found. 755 By adding both types of ears when an IN_GADAG node is processed, all 756 ears that connect to that node are found. The order in which the 757 IN_GADAG nodes is processed is, of course, key to the algorithm. The 758 order is a stack of ears so the most recent ear is found at the top 759 of the stack. Of course, the stack stores nodes and not ears, so an 760 ordered list of nodes, from the first node in the ear to the last 761 node in the ear, is created as the ear is explored and then that list 762 is pushed onto the stack. 764 Each ear represents a partial order (see Figure 4) and processing the 765 nodes in order along each ear ensures that all ears connecting to a 766 node are found before a node higher in the partial order has its ears 767 explored. This means that the direction of the links in the ear is 768 always from the node x being processed towards the other end of the 769 ear. Additionally, by using a stack of ears, this means that any 770 unprocessed nodes in previous ears can only be ordered higher than 771 nodes in the ears below it on the stack. 773 In this algorithm that depends upon Low-Point inheritance, it is 774 necessary that every node have a low-point parent that is not itself. 775 If a node is a cut-vertex, that will not yet be the case. Therefore, 776 any nodes without a low-point parent will have their low-point parent 777 set to their DFS parent and their low-point value set to the DFS- 778 value of their parent. This assignment also properly allows an ear 779 to a cut-vertex to start and end at the same node. 781 Finally, the algorithm simultaneously computes each node's local- 782 root, as described in Figure 12. The local-root can be inherited 783 from the node x being processed to the nodes in the ear unless the 784 child of x is a cut-vertex in which case the rest of the nodes in the 785 ear are in a different block than x and have the child of x as their 786 local-root. 788 Construct_GADAG_via_Lowpoint(topology, root) 789 root.IN_GADAG = true 790 Initialize Stack to empty 791 push root onto Stack 792 while (Stack is not empty) 793 x = pop(Stack) 794 foreach interface intf of x 795 if ((intf.remote_node.IN_GADAG == false) and 796 (intf.remote_node.dfs_parent is x)) 797 Construct_Ear(x, Stack, intf, CHILD) 798 foreach interface intf of x 799 if ((intf.remote_node.IN_GADAG == false) and 800 (intf.remote_node.dfs_parent is not x)) 801 Construct_Ear(x, Stack, intf, NEIGHBOR) 803 Construct_Ear(x, Stack, intf, type) 804 ear_list = empty 805 cur_node = intf.remote_node 806 cur_intf = intf 808 while cur_node.IN_GADAG is false 809 cur_intf.UNDIRECTED = false 810 cur_intf.OUTGOING = true 811 cur_intf.remote_intf.UNDIRECTED = false 812 cur_intf.remote_intf.INCOMING = true 813 cur_node.IN_GADAG = true 814 add_to_list_end(ear_list, cur_node) 816 if type is CHILD 817 cur_intf = cur_node.lowpoint_parent_intf 818 else type must be NEIGHBOR 819 cur_intf = cur_node.dfs_parent_intf 820 cur_node = cur_intf.remote_node 822 if (type is CHILD) and (cur_node is x) 823 localroot = x 824 else 825 localroot = x.localroot 826 while ear_list is not empty 827 y = remove_end_item_from_list(ear_list) 828 y.localroot = localroot 829 push(Stack, y) 831 Figure 14: Low-point Inheritance GADAG algorithm 833 4.4. Option 2: Computing GADAG using SPFs 835 The basic idea in this option is to use slightly-modified SPF 836 computations to find ADAGs in each block. In each block, an SPF 837 computation is first done to find a cycle from the local root and 838 then SPF computations find ears until there are no more interfaces to 839 be explored. The used result from the SPF computation is the path of 840 interfaces indicated by following the previous hops from the 841 mininized IN_GADAG node back to the SPF root. 843 To do this, first all cut-vertices must be identified and local-roots 844 assigned as specified in Figure 11 846 The slight modifications to the SPF are as follows. The root of the 847 block is referred to as the block-root; it is either the GADAG root 848 or a cut-vertex. 850 a. The SPF is rooted at a neighbor x of an IN_GADAG node y. All 851 links between y and x are marked as TEMP_UNUSABLE. They should 852 not be used during the SPF computation. 854 b. If y is not the block-root, then it is marked TEMP_UNUSABLE. It 855 should not be used during the SPF computation. This prevents 856 ears from starting and ending at the same node and avoids cycles; 857 the exception is because cycles to/from the block-root are 858 acceptable and expected. 860 c. Do not explore links to nodes whose local-root is not the block- 861 root. This keeps the SPF confined to the particular block. 863 d. Terminate when the first IN_GADAG node z is minimized. 865 e. Respect the existing directions (e.g. INCOMING, OUTGOING, 866 UNDIRECTED) already specified for each interface. 868 Mod_SPF(spf_root, block_root) 869 Initialize spf_heap to empty 870 Initialize nodes' spf_metric to infinity 871 spf_root.spf_metric = 0 872 insert(spf_heap, spf_root) 873 found_in_gadag = false 874 while (spf_heap is not empty) and (found_in_gadag is false) 875 min_node = remove_lowest(spf_heap) 876 if min_node.IN_GADAG is true 877 found_in_gadag = true 878 else 879 foreach interface intf of min_node 880 if ((intf.OUTGOING or intf.UNDIRECTED) and 881 (intf.remote_node.localroot is block_root) and 882 (intf.remote_node is not TEMP_UNUSABLE)) 883 path_metric = min_node.spf_metric + intf.metric 884 if path_metric < intf.remote_node.spf_metric 885 intf.remote_node.spf_metric = path_metric 886 intf.remote_node.spf_prev_intf = intf 887 insert_or_update(spf_heap, intf.remote_node) 888 return min_node 890 SPF_for_Ear(spf_root, block_root, ear_list, cut_vertex_list) 891 end_ear = Mod_SPF(spf_root, block_root) 892 y = end_ear.spf_prev_hop 893 while y.local_node is not spf_root 894 add_to_list_start(cut_vertex_list, y) 895 if y.local_node is a cut-vertex 896 add_to_list_end(cut_vertex_list, y.local_node) 897 y = y.local_node.spf_prev_intf 899 Figure 15: Modified SPF for GADAG computation 901 In Figure 15, while the path is determined, any non-end node in the 902 path that is a cut-vertex is added to the list of cut-vertices. This 903 ensures that there is a path from the GADAG root to that cut-vertex 904 before adding it to the list of nodes. All such cut-vertices will be 905 treated as the root of a block and the ADAG in that block will be 906 computed. 908 Assume that an ear is found by going from y to x and then running an 909 SPF that terminates by minimizing z (e.g. y<->x...q<->z). Now it is 910 necessary to determine the direction of the ear; if y << z, then the 911 path should be y->x...q->z but if y >> z, then the path should be 912 y<-x...q<-z. In Section 4.3, the same problem was handled by finding 913 all ears that started at a node before looking at ears starting at 914 nodes higher in the partial order. In this algorithm, using that 915 approach could mean that new ears aren't added in order of their 916 total cost since all ears connected to a node would need to be found 917 before additional nodes could be found. 919 The alternative is to track the order relationship of each node with 920 respect to every other node. This can be accomplished by maintaining 921 two sets of nodes at each node. The first set, Higher_Nodes, 922 contains all nodes that are known to be ordered above the node. The 923 second set, Lower_Nodes, contains all nodes that are known to be 924 ordered below the node. This is the approach used in this algorithm. 926 Set_Ear_Direction(ear_list, end_a, end_b, block_root) 927 // Default of A_TO_B for the following cases: 928 // (a) end_a and end_b are the same (root) 929 // or (b) end_a is in end_b's Lower Nodes 930 // or (c) end_a and end_b were unordered with respect to each 931 // other 932 direction = A_TO_B 933 if (end_a is block_root) and (end_a is not end_b) 934 direction = B_TO_A 935 else if end_a is in end_b.Higher_Nodes 936 direction = B_TO_A 937 if direction is B_TO_A 938 foreach interface i in ear_list 939 i.UNDIRECTED = false 940 i.INCOMING = true 941 i.remote_intf.UNDIRECTED = false 942 i.remote_intf.OUTGOING = true 943 else 944 foreach interface i in ear_list 945 i.UNDIRECTED = false 946 i.OUTGOING = true 947 i.remote_intf.UNDIRECTED = false 948 i.remote_intf.INCOMING = true 949 if end_a is end_b 950 return 951 // Next, update all nodes' Lower_Nodes and Higher_Nodes 952 if (end_a is in end_b.Higher_Nodes) 953 foreach node x where x.localroot is block_root 954 if end_a is in x.Lower_Nodes 955 foreach interface i in ear_list 956 add i.remote_node to x.Lower_Nodes 957 if end_b is in x.Higher_Nodes 958 foreach interface i in ear_list 959 add i.local_node to x.Higher_Nodes 960 else 961 foreach node x where x.localroot is block_root 962 if end_b is in x.Lower_Nodes 963 foreach interface i in ear_list 964 add i.local_node to x.Lower_Nodes 965 if end_a is in x.Higher_Nodes 966 foreach interface i in ear_list 967 add i.remote_node to x.Higher_Nodes 969 Figure 16: Algorithm to assign links of an ear direction 971 A goal of the algorithm is to find the shortest cycles and ears. An 972 ear is started by going to a neighbor x of an IN_GADAG node y. The 973 path from x to an IN_GADAG node is minimal, since it is computed via 974 SPF. Since a shortest path is made of shortest paths, to find the 975 shortest ears requires reaching from the set of IN_GADAG nodes to the 976 closest node that isn't IN_GADAG. Therefore, an ordered tree is 977 maintained of interfaces that could be explored from the IN_GADAG 978 nodes. The interfaces are ordered by their characteristics of 979 metric, local loopback address, remote loopback address, and ifindex, 980 as in the algorithm previously described in Figure 13. 982 Finally, cut-edges are a special case because there is no point in 983 doing an SPF on a block of 2 nodes. The algorithm identifies cut- 984 edges simply as links where both ends of the link are cut-vertices. 985 Cut-edges can simply be added to the GADAG with both OUTGOING and 986 INCOMING specified on their interfaces. 988 Construct_GADAG_via_SPF(topology, root) 989 Compute_Localroot(root, root) 990 if root has multiple DFS-children 991 mark root as a cut-vertex 992 Initialize cut_vertex_list to empty 993 Initialize ordered_intfs_tree to empty 994 add_to_list_end(cut_vertex_list, root) 995 while cut_vertex_list is not empty 996 v = remove_start_item_from_list(cut_vertex_list) 997 foreach interface intf of v 998 if intf.remote_node is a cut-vertex 999 // Special case for cut-edges 1000 intf.UNDIRECTED = false 1001 intf.remote_intf.UNDIRECTED = false 1002 intf.OUTGOING = true 1003 intf.INCOMING = true 1004 intf.remote_intf.OUTGOING = true 1005 intf.remote_intf.INCOMING = true 1006 else if intf.remote_node.localroot is v 1007 insert(ordered_intfs_tree, intf) 1008 v.IN_GADAG = true 1009 while ordered_intfs_trees is not empty 1010 cand_intf = remove_lowest(ordered_intfs_tree) 1011 if cand_intf.remote_node.IN_GADAG is false 1012 Mark all interfaces between cand_intf.remote_node 1013 and cand_intf.local_node as TEMP_UNUSABLE 1014 if cand_intf.local_node is not v 1015 Mark cand_intf.local_node as TEMP_UNUSABLE 1016 Initialize ear_list to empty 1017 ear_end = SPF_for_Ear(cand_intf.remote_node, v, ear_list, 1018 cut_vertex_list) 1019 add_to_list_start(ear_list, cand_intf) 1020 Set_Ear_Direction(ear_list, cand_intf.remote, ear_end, v) 1021 Clear TEMP_UNUSABLE from all interfaces between 1022 cand_intf.remote_node and cand_intf.local_node 1023 Clear TEMP_UNUSABLE from cand_intf.local_node 1025 Figure 17: SPF-based GADAG algorithm 1027 4.5. Augmenting the GADAG by directing all links 1029 The GADAG, whether constructed via Low-Point Inheritance or with 1030 SPFs, at this point could be used to find MRTs but the topology does 1031 not include all links in the network graph. That has two impacts. 1032 First, there might be shorter paths that respect the GADAG partial 1033 ordering and so the alternate paths would not be as short as 1034 possible. Second, there may be additional paths between a router x 1035 and the root that are not included in the GADAG. Including those 1036 provides potentially more bandwidth to traffic flowing on the 1037 alternates and may reduce congestion compared to just using the GADAG 1038 as currently constructed. 1040 The goal is thus to assign direction to every remaining link marked 1041 as UNDIRECTED to improve the paths and number of paths found when the 1042 MRTs are computed. 1044 To do this, we need to establish a total order that respects the 1045 partial order described by the GADAG. This can be done using Kahn's 1046 topological sort[Kahn_1962_topo_sort] which essentially assigns a 1047 number to a node x only after all nodes before it (e.g. with a link 1048 incoming to x) have had their numbers assigned. The only issue with 1049 the topological sort is that it works on DAGs and not ADAGs or 1050 GADAGs. 1052 To convert a GADAG to a DAG, it is necessary to remove all links that 1053 point to a root of block from within that block. That provides the 1054 necessary conversion to a DAG and then a topological sort can be 1055 done. Finally, all UNDIRECTED links are assigned a direction based 1056 upon the partial ordering. Any UNDIRECTED links that connect to a 1057 root of a block from within that block are assigned a direction 1058 INCOMING to that root. The exact details of this whole process are 1059 captured in Figure 18 1061 Set_Block_Root_Incoming_Links(topo, root, mark_or_clear) 1062 foreach node x in topo 1063 if node x is a cut-vertex or root 1064 foreach interface i of x 1065 if (i.remote_node.localroot is x) 1066 if i.UNDIRECTED 1067 i.INCOMING = true 1068 i.remote_intf.OUTGOING = true 1069 i.UNDIRECTED = false 1070 i.remote_intf.UNDIRECTED = false 1071 if i.INCOMING 1072 if mark_or_clear is mark 1073 i.TEMP_UNUSABLE = true 1074 i.remote_intf.TEMP_UNUSABLE = true 1076 Run_Topological_Sort(topo, root) 1077 foreach node x 1078 set x.unvisited to the count of x's incoming interfaces 1079 that aren't marked TEMP_UNUSABLE 1080 Initialize working_list to empty 1081 Initialize topo_order_list to empty 1082 add_to_list_end(working_list, root) 1083 while working_list is not empty 1084 y = remove_start_item_from_list(working_list) 1085 add_to_list_end(topo_order_list, y) 1086 foreach interface i of y 1087 if (i.OUTGOING) and (not i.TEMP_UNUSABLE) 1088 i.remote_node.unvisited -= 1 1089 if i.remote_node.unvisited is 0 1090 add_to_list_end(working_list, i.remote_node) 1091 next_topo_order = 1 1092 while topo_order_list is not empty 1093 y = remove_start_item_from_list(topo_order_list) 1094 y.topo_order = next_topo_order 1095 next_topo_order += 1 1097 Add_Undirected_Links(topo, root) 1098 Set_Block_Root_Incoming_Links(topo, root, MARK) 1099 Run_Topological_Sort(topo, root) 1100 Set_Block_Root_Incoming_Links(topo, root, CLEAR) 1101 foreach node x in topo 1102 foreach interface i of x 1103 if i.UNDIRECTED 1104 if x.topo_order < i.remote_node.topo_order 1105 i.OUTGOING = true 1106 i.UNDIRECTED = false 1107 i.remote_intf.INCOMING = true 1108 i.remote_intf.UNDIRECTED = false 1109 else 1110 i.INCOMING = true 1111 i.UNDIRECTED = false 1112 i.remote_intf.OUTGOING = true 1113 i.remote_intf.UNDIRECTED = false 1115 Add_Undirected_Links(topo, root) 1117 Figure 18: Assigning direction to UNDIRECTED links 1119 4.6. Compute MRT next-hops 1121 As was discussed in Section 3.1, once a ADAG is found, it is 1122 straightforward to find the next-hops from any node X to the ADAG 1123 root. However, in this algorithm, we want to reuse the common GADAG 1124 and find not only one pair of redundant trees with it, but a pair 1125 rooted at each node. This is ideal, since it is faster and it 1126 results packet forwarding easier to trace and/or debug. The method 1127 for doing that is based on two basic ideas. First, if two nodes X 1128 and Y are ordered with respect to each other in the partial order, 1129 then the same SPF and reverse-SPF can be used to find the increasing 1130 and decreasing paths. Second, if two nodes X and Y aren't ordered 1131 with respect to each other in the partial order, then intermediary 1132 nodes can be used to create the paths by increasing/decreasing to the 1133 intermediary and then decreasing/increasing to reach Y. 1135 As usual, the two basic ideas will be discussed assuming the network 1136 is two-connected. The generalization to multiple blocks is discussed 1137 in Section 4.6.4. The full algorithm is given in Section 4.6.5. 1139 4.6.1. MRT next-hops to all nodes partially ordered with respect to the 1140 computing node 1142 To find two node-disjoint paths from the computing router X to any 1143 node Y, depends upon whether Y >> X or Y << X. As shown in Figure 19, 1144 if Y >> X, then there is an increasing path that goes from X to Y 1145 without crossing R; this contains nodes in the interval [X,Y]. There 1146 is also a decreasing path that decreases towards R and then decreases 1147 from R to Y; this contains nodes in the interval [X,R-small] or 1148 [R-great,Y]. The two paths cannot have common nodes other than X and 1149 Y. 1151 [Y]<---(Cloud 2)<--- [X] 1152 | ^ 1153 | | 1154 V | 1155 (Cloud 3)--->[R]--->(Cloud 1) 1157 Blue MRT path: X->Cloud 2->Y 1158 Red MRT path: X->Cloud 1->R->Cloud 3->Y 1160 Figure 19: Y >> X 1162 Similar logic applies if Y << X, as shown in Figure 20. In this 1163 case, the increasing path from X increases to R and then increases 1164 from R to Y to use nodes in the intervals [X,R-great] and [R-small, 1165 Y]. The decreasing path from X reaches Y without crossing R and uses 1166 nodes in the interval [Y,X]. 1168 [X]<---(Cloud 2)<--- [Y] 1169 | ^ 1170 | | 1171 V | 1172 (Cloud 3)--->[R]--->(Cloud 1) 1174 Blue MRT path: X->Cloud 3->R->Cloud 1->Y 1175 Red MRT path: X->Cloud 2->Y 1176 Figure 20: Y << X 1178 4.6.2. MRT next-hops to all nodes not partially ordered with respect to 1179 the computing node 1181 When X and Y are not ordered, the first path should increase until we 1182 get to a node G, where G >> Y. At G, we need to decrease to Y. The 1183 other path should be just the opposite: we must decrease until we get 1184 to a node H, where H << Y, and then increase. Since R is smaller and 1185 greater than Y, such G and H must exist. It is also easy to see that 1186 these two paths must be node disjoint: the first path contains nodes 1187 in interval [X,G] and [Y,G], while the second path contains nodes in 1188 interval [H,X] and [H,Y]. This is illustrated in Figure 21. It is 1189 necessary to decrease and then increase for the Blue MRT and increase 1190 and then decrease for the Red MRT; if one simply increased for one 1191 and decreased for the other, then both paths would go through the 1192 root R. 1194 (Cloud 6)<---[Y]<---(Cloud 5)<------------| 1195 | | 1196 | | 1197 V | 1198 [G]--->(Cloud 4)--->[R]--->(Cloud 1)--->[H] 1199 ^ | 1200 | | 1201 | | 1202 (Cloud 3)<---[X]<---(Cloud 2)<-----------| 1204 Blue MRT path: decrease to H and increase to Y 1205 X->Cloud 2->H->Cloud 5->Y 1206 Red MRT path: increase to G and decrease to Y 1207 X->Cloud 3->G->Cloud 6->Y 1209 Figure 21: X and Y unordered 1211 This gives disjoint paths as long as G and H are not the same node. 1212 Since G >> Y and H << Y, if G and H could be the same node, that 1213 would have to be the root R. This is not possible because there is 1214 only one out-going interface from the root R which is created when 1215 the initial cycle is found. Recall from Figure 6 that whenever an 1216 ear was found to have an end that was the root R, the ear was 1217 directed towards R so that the associated interface on R is incoming 1218 and not outgoing. Therefore, there must be exactly one node M which 1219 is the smallest one after R, so the Blue MRT path will never reach R; 1220 it will turn at M and increase to Y. 1222 4.6.3. Computing Redundant Tree next-hops in a 2-connected Graph 1224 The basic ideas for computing RT next-hops in a 2-connected graph 1225 were given in Section 4.6.1 and Section 4.6.2. Given these two 1226 ideas, how can we find the trees? 1228 If some node X only wants to find the next-hops (which is usually the 1229 case for IP networks), it is enough to find which nodes are greater 1230 and less than X, and which are not ordered; this can be done by 1231 running an SPF and a reverse-SPF rooted at X and not exploring any 1232 links from the ADAG root. ( Other traversal algorithms could safely 1233 be used instead where one traversal takes the links in their given 1234 directions and the other reverses the links' directions.) 1236 An SPF rooted at X and not exploring links from the root will find 1237 the increasing next-hops to all Y >> X. Those increasing next-hops 1238 are X's next-hops on the Blue MRT to reach Y. A reverse-SPF rooted at 1239 X and not exploring links from the root will find the decreasing 1240 next-hops to all Z << X. Those decreasing next-hops are X's next-hops 1241 on the Red MRT to reach Z. Since the root R is both greater than and 1242 less than X, after this SPF and reverse-SPF, X's next-hops on the 1243 Blue MRT and on the Red MRT to reach R are known. For every node Y 1244 >> X, X's next-hops on the Red MRT to reach Y are set to those on the 1245 Red MRT to reach R. For every node Z << X, X's next-hops on the Blue 1246 MRT to reach Z are set to those on the Blue MRT to reach R. 1248 For those nodes, which were not reached, we have the next-hops as 1249 well. The increasing Blue MRT next-hop for a node, which is not 1250 ordered, is the next-hop along the decreasing Red MRT towards R and 1251 the decreasing Red MRT next-hop is the next-hop along the increasing 1252 Blue MRT towards R. Naturally, since R is ordered with respect to all 1253 the nodes, there will always be an increasing and a decreasing path 1254 towards it. This algorithm does not provide the specific path taken 1255 but only the appropriate next-hops to use. The identity of G and H 1256 is not determined. 1258 The final case to considered is when the root R computes its own 1259 next-hops. Since the root R is << all other nodes, running an SPF 1260 rooted at R will reach all other nodes; the Blue MRT next-hops are 1261 those found with this SPF. Similarly, since the root R is >> all 1262 other nodes, running a reverse-SPF rooted at R will reach all other 1263 nodes; the Red MRT next-hops are those found with this reverse-SPF. 1265 E---D---| E<--D<--| 1266 | | | | ^ | 1267 | | | V | | 1268 R F C R F C 1269 | | | | ^ ^ 1270 | | | V | | 1271 A---B---| A-->B---| 1273 (a) (b) 1274 A 2-connected graph A spanning ADAG rooted at R 1276 Figure 22 1278 As an example consider the situation depicted in Figure 22. There 1279 node C runs an SPF and a reverse-SPF The SPF reaches D, E and R and 1280 the reverse SPF reaches B, A and R. So we immediately get that e.g. 1281 towards E the increasing next-hop is D (it was reached though D), and 1282 the decreasing next-hop is B (since R was reached though B). Since 1283 both D and B, A and R will compute the next hops similarly, the 1284 packets will reach E. 1286 We have the next-hops towards F as well: since F is not ordered with 1287 respect to C, the increasing next-hop is the decreasing one towards R 1288 (which is B) and the decreasing next-hop is the increasing one 1289 towards R (which is D). Since B is ordered with F, it will find a 1290 real increasing next-hop, so packet forwarded to B will get to F on 1291 path C-B-F. Similarly, D will have a real decreasing next-hop, and 1292 packet will use path C-D-F. 1294 4.6.4. Generalizing for graph that isn't 2-connected 1296 If a graph isn't 2-connected, then the basic approach given in 1297 Section 4.6.3 needs some extensions to determine the appropriate MRT 1298 next-hops to use for destinations outside the computing router X's 1299 blocks. In order to find a pair of maximally redundant trees in that 1300 graph we need to find a pair of RTs in each of the blocks (the root 1301 of these trees will be discussed later), and combine them. 1303 When computing the MRT next-hops from a router X, there are three 1304 basic differences: 1306 1. Only nodes in a common block with X should be explored in the SPF 1307 and reverse-SPF. 1309 2. Instead of using the GADAG root, X's local-root should be used. 1310 This has the following implications: 1312 A. The links from X's local-root should not be explored. 1314 B. If a node is explored in the increasing SPF so Y >> X, then 1315 X's Red MRT next-hops to reach Y uses X's Red MRT next-hops 1316 to reach X's local-root and if Z <<, then X's Blue MRT next- 1317 hops to reach Z uses X's Blue MRT next-hops to reach X's 1318 local-root. 1320 C. If a node W in a common block with X was not reached in the 1321 SPF or reverse-SPF, then W is unordered with respect to X. 1322 X's Blue MRT next-hops to W are X's decreasing aka Red MRT 1323 next-hops to X's local-root. X's Red MRT next-hops to W are 1324 X's increasing aka Blue MRT next-hops to X's local-root. 1326 3. For nodes in different blocks, the next-hops must be inherited 1327 via the relevant cut-vertex. 1329 These are all captured in the detailed algorithm given in 1330 Section 4.6.5. 1332 4.6.5. Complete Algorithm to Compute MRT Next-Hops 1334 The complete algorithm to compute MRT Next-Hops for a particular 1335 router X is given in Figure 23. In addition to computing the Blue 1336 MRT next-hops and Red MRT next-hops used by X to reach each node Y, 1337 the algorithm also stores an "order_proxy", which is the proper cut- 1338 vertex to reach Y if it is outside the block, and which is used later 1339 in deciding whether the Blue MRT or the Red MRT can provide an 1340 acceptable alternate for a particular primary next-hop. 1342 In_Common_Block(x, y) 1343 if ((x.localroot is y.localroot) or (x is y.localroot) or 1344 (y is x.localroot)) 1345 return true 1346 return false 1348 Store_Results(y, direction, spf_root) 1349 if direction is FORWARD 1350 y.higher = true 1351 y.blue_next_hops = y.next_hops 1352 if direction is REVERSE 1353 y.lower = true 1354 y.red_next_hops = y.next_hops 1356 SPF_No_Traverse_Root(spf_root, block_root, direction) 1357 Initialize spf_heap to empty 1358 Initialize nodes' spf_metric to infinity and next_hops to empty 1359 spf_root.spf_metric = 0 1360 insert(spf_heap, spf_root) 1361 while (spf_heap is not empty) 1362 min_node = remove_lowest(spf_heap) 1363 Store_Results(min_node, direction, spf_root) 1364 if min_node is not block_root 1365 foreach interface intf of min_node 1366 if (((direction is FORWARD) and intf.OUTGOING) or 1367 ((direction is REVERSE) and intf.INCOMING) and 1368 In_Common_Block(spf_root, intf.remote_node)) 1369 if direction is FORWARD 1370 path_metric = min_node.spf_metric + intf.metric 1371 else 1372 path_metric = min_node.spf_metric + 1373 intf.remote_intf.metric 1374 if path_metric < intf.remote_node.spf_metric 1375 intf.remote_node.spf_metric = path_metric 1376 if min_node is spf_root 1377 intf.remote_node.next_hops = make_list(intf) 1378 else 1379 intf.remote_node.next_hops = min_node.next_hops 1380 insert_or_update(spf_heap, intf.remote_node) 1381 else if path_metric is intf.remote_node.spf_metric 1382 if min_node is spf_root 1383 add_to_list(intf.remote_node.next_hops, intf) 1384 else 1385 add_list_to_list(intf.remote_node.next_hops, 1386 min_node.next_hops) 1388 SetEdge(y) 1389 if y.blue_next_hops is empty and y.red_next_hops is empty 1390 SetEdge(y.localroot) 1391 y.blue_next_hops = y.localroot.blue_next_hops 1392 y.red_next_hops = y.localroot.red_next_hops 1393 y.order_proxy = y.localroot.order_proxy 1395 Compute_MRT_NextHops(x, root) 1396 foreach node y 1397 y.higher = y.lower = false 1398 clear y.red_next_hops and y.blue_next_hops 1399 y.order_proxy = y 1400 SPF_No_Traverse_Root(x, x.localroot, FORWARD) 1401 SPF_No_Traverse_Root(x, x.localroot, REVERSE) 1403 // red and blue next-hops are stored to x.localroot as different 1404 // paths are found via the SPF and reverse-SPF. 1405 // Similarly any nodes whose local-root is x will have their 1406 // red_next_hops and blue_next_hops already set. 1408 // Handle nodes in the same block that aren't the local-root 1409 foreach node y 1410 if (y is not x) and (y.localroot is x.localroot) 1411 if y.higher 1412 y.red_next_hops = x.localroot.red_next_hops 1413 else if y.lower 1414 y.blue_next_hops = x.localroot.blue_next_hops 1415 else 1416 y.blue_next_hops = x.localroot.red_next_hops 1417 y.red_next_hops = x.localroot.blue_next_hops 1419 // Inherit next-hops and order_proxies to other components 1420 if x is not root 1421 root.blue_next_hops = x.localroot.blue_next_hops 1422 root.red_next_hops = x.localroot.red_next_hops 1423 root.order_proxy = x.localroot 1424 foreach node y 1425 if (y is not root) and (y is not x) 1426 SetEdge(y) 1428 Copute_RT_NextHops(x, root) 1430 Figure 23 1432 4.7. Identify MRT alternates 1434 At this point, a computing router S knows its Blue MRT next-hops and 1435 Red MRT next-hops for each destination. The primary next-hops along 1436 the SPT are also known. It remains to determine for each primary 1437 next-hop to a destination D, which of the MRTs avoids the primary 1438 next-hop node F. This computation depends upon data set in 1439 Compute_MRT_NextHops such as each node y's y.blue_next_hops, 1440 y.red_next_hops, y.order_proxy, y.higher, y.lower and topo_orders. 1441 Recall that any router knows only which are the nodes greater and 1442 lesser than itself, but it cannot decide the relation between any two 1443 given nodes easily; that is why we need topological ordering. 1445 For each primary next-hop node F to each destination D, S can call 1446 Select_Alternates(S, D, F) to determine whether to use the Blue MRT 1447 next-hops as the alternate next-hop(s) for that primary next-hop or 1448 to use the Red MRT next-hops. The algorithm is given in Figure 24 1449 and discussed afterwards. 1451 Select_Alternates(S, D, F) 1452 if D.order_proxy is not D 1453 D_lower = D.order_proxy.lower 1454 D_higher = D.order_proxy.higher 1455 D_topo_order = D.order_proxy.topo_order 1456 else 1457 D_lower = D.lower 1458 D_higher = D.higher 1459 D_topo_order = D.topo_order 1461 if D_higher 1462 if F.higher 1463 if F.topo_order < D_topo_order 1464 return USE_RED 1465 else 1466 return USE_BLUE 1467 else if F.lower 1468 return USE_BLUE 1469 else 1470 // F and S are neighbors so either F << S or F >> S 1471 else if D_lower 1472 if F.higher 1473 return USE_RED 1474 else if F.lower 1475 if F.topo_order < D_topo_order 1476 return USE_RED 1477 else 1478 return USE_BLUE 1479 else 1480 // F and S are neighbors so either F << S or F >> S 1481 else // D and S not ordered 1482 if F.lower 1483 return USE_BLUE 1484 else if F.upper 1485 return USE_RED 1486 else 1487 // F and S are neighbors so either F << S or F >> S 1489 Figure 24 1491 If either D>>S>>F or D<>D (ii) F<D.topo_order, either case (i) or 1502 case (iii) holds true, which means that selecting the Blue next-hop 1503 is safe. Similarly, if F.topo_order>S, we should use the Blue next-hop. 1515 As an example consider the ADAG depicted in Figure 25 and first 1516 suppose that G is the source, D is the destination and H is the 1517 failed next-hop. Since D>>G, we need to compare H.topo_order and 1518 D.topo_order. Since D.topo_order>H.topo_order D must be not smaller 1519 than H, so we should select the decreasing path towards the root. 1520 If, however, the destination were instead J, we must find that 1521 H.topo_order>J.topo_order, so we must choose the increasing Blue 1522 next-hop to J, which is I. In the case, when instead the destination 1523 is C, we find that we need first decrease to avoid using H, so the 1524 Blue, first decreasing then increasing, path is selected. 1526 [E]<-[D]<-[H]<-[J] 1527 | ^ ^ | 1528 V | | | 1529 [R] [C] [G]->[I] 1530 | ^ ^ ^ 1531 V | | | 1532 [A]->[B]->[F]---| 1534 (a) 1535 a 2-connected graph 1537 Figure 25 1539 5. Algorithm Alternatives and Evaluation 1541 This description of the algorithm assumes a particular approach that 1542 is believed to be a reasonable compromise between complexity and 1543 computation. There are two options given for constructing the GADAG 1544 as both are reasonable and promising. 1546 SPF-based GADAG Compute the common GADAG using Option 2 of SPF-based 1547 inheritance. This considers metrics when constructing the GADAG, 1548 which is important for path length and operational control. It 1549 has higher computational complexity than the Low-Point Inheritance 1550 GADAG. 1552 Low-Point Inheritance GADAG Compute the common GADAG using Option 1 1553 of Low-Point Inheritance. This ignores metrics when constructing 1554 the GADAG, but its computational complexity is O(links) which is 1555 attractive. It is possible that augmenting the GADAG by assigning 1556 directions to all links in the network graph and adding them to 1557 the GADAG will make the difference between this and the SPF-based 1558 GADAG minimal. 1560 In addition, it is possible to calculate Destination-Rooted GADAG, 1561 where for each destination, a GADAG rooted at that destination is 1562 computed. The GADAG can be computed using either Low-Point 1563 Inheritance or SPF-based. Then a router would need to compute the 1564 blue MRT and red MRT next-hops to that destination. Building GADAGs 1565 per destination is computationally more expensive, but may give 1566 somewhat shorter alternate paths. It may be useful for live-live 1567 multicast along MRTs. 1569 5.1. Algorithm Evaluation 1571 When evaluating different algorithms and methods for IP Fast Reroute 1572 [RFC5714], there are three critical points to consider. 1574 o Coverage: For every Point of Local Repair (PLR) and local failure, 1575 is there an alternate to reach every destination? Those 1576 destinations include not only routers in the IGP area, but also 1577 prefixes outside the IGP area. 1579 o Alternate Length: What is the length of the alternate path offered 1580 compared to the optimal alternate route in the network? This is 1581 computed as the total length of the alternate path divided by the 1582 length of an optimal alternate path. The optimal alternate path 1583 is computed by removing the failed node and running an SPF to find 1584 the shortest path from the PLR to the destination. 1586 o Alternate Bandwidth: What percentage of the traffic sent to the 1587 failed point can be sent on the alternates? This is computed as 1588 the sum of the bandwidths along the alternate paths divided by the 1589 bandwidth of the primary paths that go through the failure point. 1591 Simulation and modeling to evalute the MRT algorithms is underway. 1592 The algorithms being compared are: 1594 o SPF-based GADAG 1596 o Low-Point Inheritance GADAG 1598 o Destination-Rooted SPF-based GADAG 1600 o Destination-Rooted Low-Point Inheritance GADAG 1602 o Not-Via to Next-Next Hop[I-D.ietf-rtgwg-ipfrr-notvia-addresses] 1604 o Loop-Free Alternates[RFC5286] 1606 o Remote LFAs[I-D.shand-remote-lfa] 1608 6. IANA Considerations 1610 This doument includes no request to IANA. 1612 7. Security Considerations 1614 This architecture is not currently believed to introduce new security 1615 concerns. 1617 8. References 1619 8.1. Normative References 1621 [I-D.atlas-rtgwg-mrt-frr-architecture] 1622 Atlas, A., Konstantynowicz, M., Envedi, G., Csaszar, A., 1623 White, R., and M. Shand, "An Architecture for IP/LDP Fast- 1624 Reroute Using Maximally Redundant Trees", 1625 draft-atlas-rtgwg-mrt-frr-architecture-00 (work in 1626 progress), July 2011. 1628 8.2. Informative References 1630 [EnyediThesis] 1631 Enyedi, G., "Novel Algorithms for IP Fast Reroute", 1632 Department of Telecommunications and Media Informatics, 1633 Budapest University of Technology and Economics Ph.D. 1634 Thesis, February 2011, 1635 . 1637 [I-D.ietf-rtgwg-ipfrr-notvia-addresses] 1638 Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute 1639 Using Not-via Addresses", 1640 draft-ietf-rtgwg-ipfrr-notvia-addresses-07 (work in 1641 progress), April 2011. 1643 [I-D.ietf-rtgwg-lfa-applicability] 1644 Filsfils, C., Francois, P., Shand, M., Decraene, B., 1645 Uttaro, J., Leymann, N., and M. Horneffer, "LFA 1646 applicability in SP networks", 1647 draft-ietf-rtgwg-lfa-applicability-03 (work in progress), 1648 August 2011. 1650 [I-D.shand-remote-lfa] 1651 Bryant, S., Filsfils, C., Shand, M., and N. So, "Remote 1652 LFA FRR", draft-shand-remote-lfa-00 (work in progress), 1653 October 2011. 1655 [Kahn_1962_topo_sort] 1656 Kahn, A., "Topological sorting of large networks", 1657 Communications of the ACM, Volume 5, Issue 11 , Nov 1962, 1658 . 1660 [LFARevisited] 1661 Retvari, G., Tapolcai, J., Enyedi, G., and A. Csaszar, "IP 1662 Fast ReRoute: Loop Free Alternates Revisited", Proceedings 1663 of IEEE INFOCOM , 2011, . 1666 [LightweightNotVia] 1667 Enyedi, G., Retvari, G., Szilagyi, P., and A. Csaszar, "IP 1668 Fast ReRoute: Lightweight Not-Via without Additional 1669 Addresses", Proceedings of IEEE INFOCOM , 2009, 1670 . 1672 [MRTLinear] 1673 Enyedi, G., Retvari, G., and A. Csaszar, "On Finding 1674 Maximally Redundant Trees in Strictly Linear Time", IEEE 1675 Symposium on Computers and Comunications (ISCC) , 2009, 1676 . 1679 [RFC3137] Retana, A., Nguyen, L., White, R., Zinin, A., and D. 1680 McPherson, "OSPF Stub Router Advertisement", RFC 3137, 1681 June 2001. 1683 [RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast 1684 Reroute: Loop-Free Alternates", RFC 5286, September 2008. 1686 [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", 1687 RFC 5714, January 2010. 1689 Authors' Addresses 1691 Alia Atlas 1692 Juniper Networks 1693 10 Technology Park Drive 1694 Westford, MA 01886 1695 USA 1697 Email: akatlas@juniper.net 1699 Gabor Sandor Enyedi 1700 Ericsson 1701 Konyves Kalman krt 11 1702 Budapest 1097 1703 Hungary 1705 Email: Gabor.Sandor.Enyedi@ericsson.com 1707 Andras Csaszar 1708 Ericsson 1709 Konyves Kalman krt 11 1710 Budapest 1097 1711 Hungary 1713 Email: Andras.Csaszar@ericsson.com