RIFT                                                            A. Atlas
Internet-Draft                                                Individual
Intended status: Standards Track                                Z. Zhang
Expires: April 25, 2019                                 Juniper Networks
                                                        October 22, 2018


            Policy Guided Prefixes with Routing In Fat Trees
                        draft-atlas-rift-pgp-00

Abstract

   In a fat tree, it can sometimes be desirable to guide traffic to
   particular destinations or to keep specific flows on certain paths.
   In RIFT, this traffic steering/engineering is done by using
   policy-guided prefixes (PGPs) with their associated communities.
   Routes based on policy-guided prefixes are preferred over regular
   routes.  Any node can originate a policy-guided prefix and advertise
   it in both the northbound and southbound directions, and the
   computation in both directions is distance-vector based.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Specification
     2.1.  Ingress Filtering
     2.2.  Applying Policy
     2.3.  Store Policy-Guided Prefix for Route Computation and
           Regeneration
     2.4.  Re-origination
     2.5.  Reachability Computation with PGP Consideration
   3.  Security Considerations
   4.  Acknowledgements
   5.  Normative References
   Authors' Addresses

1.  Introduction

   In a fat tree, it can sometimes be desirable to guide traffic to
   particular destinations or to keep specific flows on certain paths.
   In RIFT, this is done by using policy-guided prefixes with their
   associated communities.  Each community is an abstract value whose
   meaning is determined by configuration.  It is assumed that the
   fabric is under a single administrative control, so that the meaning
   and intent of the communities are understood by all the nodes in the
   fabric.  Any node can originate a policy-guided prefix.
   Since RIFT uses distance-vector concepts in the southbound direction,
   it is straightforward to add a policy-guided prefix to an S-TIE.  For
   easier troubleshooting, the approach taken in RIFT is that a node's
   southbound policy-guided prefixes are sent in its S-TIE and the
   receiver does inbound filtering based on the associated communities.
   (An egress policy is conceivable, but it could lead to different
   S-TIEs per adjacency, which is not considered in the RIFT protocol
   procedures.)  A southbound policy-guided prefix can only use links in
   the south direction.  If a PGP S-TIE is received on an East-West or
   northbound link, it must be discarded by ingress filtering.

   Conceptually, a southbound policy-guided prefix guides traffic from
   the leaves up to at most the north-most level.  It is also necessary
   to have northbound policy-guided prefixes to guide traffic from the
   north-most level down to the appropriate leaves.  Therefore, RIFT
   includes northbound policy-guided prefixes in its PGP N-TIE, and the
   receiver does inbound filtering based on the associated communities.
   A northbound policy-guided prefix can only use links in the northern
   direction.  If a PGP N-TIE is received on an East-West or southbound
   link, it must be discarded by ingress filtering.

   By separating southbound and northbound policy-guided prefixes and
   requiring that the cost associated with a PGP increase strictly
   monotonically at each hop, the path cannot loop.  Because the costs
   are strictly increasing, it is not possible to have a loop between a
   northbound PGP and a southbound PGP.
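   The directional rules above, together with the adjacency and
   same-level checks given in Section 2.1, amount to a simple acceptance
   test on each received PGP TIE.  The Python below is an illustrative
   sketch only: "TieDirection" and "accept_pgp_tie" are hypothetical
   names, it assumes RIFT level numbering (leaves at level 0, levels
   increasing northward), and it reads "received on a northbound link"
   as a TIE travelling north, i.e., arriving from a neighbor at a lower
   level.

```python
from enum import Enum


class TieDirection(Enum):
    """Flooding direction of a PGP TIE (hypothetical names)."""
    SOUTH = "S"  # PGP S-TIE: southbound policy-guided prefixes
    NORTH = "N"  # PGP N-TIE: northbound policy-guided prefixes


def accept_pgp_tie(direction: TieDirection,
                   originator_level: int,
                   receiver_level: int,
                   originator_is_neighbor: bool) -> bool:
    """Return True if receiver X keeps the PGPs in a TIE originated by Y.

    Assumes leaves are level 0 and levels grow northward.
    """
    # Section 2.1: TIEs originated by a non-adjacent node are filtered.
    if not originator_is_neighbor:
        return False
    # Section 2.1: same-level originators are filtered, so East-West
    # links are never used by PGPs.
    if originator_level == receiver_level:
        return False
    if direction is TieDirection.SOUTH:
        # A southbound PGP only travels south: it must come from above.
        return originator_level > receiver_level
    # A northbound PGP only travels north: it must come from below.
    return originator_level < receiver_level


# Usage: a leaf (level 0) receiving a PGP S-TIE from its level-1 parent.
print(accept_pgp_tie(TieDirection.SOUTH, originator_level=1,
                     receiver_level=0, originator_is_neighbor=True))  # True
```

   Keeping the check as a pure function of levels and adjacency mirrors
   the document's choice of ingress rather than egress filtering: the
   receiver decides, so every neighbor can be sent the same TIE.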
   If East-West links were allowed, looping could occur, and problems
   such as counting to infinity would have to be solved.  If complete
   generality of paths is required (for example, including East-West
   links and using north and south links in arbitrary sequence), a
   path-vector protocol or a similar solution must be considered.

   Besides traffic engineering, PGPs can also be used to ensure that
   nodes remain administratively reachable for debugging purposes after
   certain failures.  For example, suppose a node that is not at the top
   of the fabric loses all its northbound adjacencies.  If it detects
   that other nodes at its level are advertising northbound adjacencies,
   it MAY inject its loopback address into a southbound PGP TIE and
   thereby become reachable "from the south".  Further, a solution may
   be implemented where, based on, e.g., a "well-known" community, such
   a southbound PGP is reflected at level 0 and advertised again as a
   northbound PGP to allow for "reachability from the north" at the cost
   of additional flooding.

2.  Specification

   PGPs are advertised in PGPrefixTIEs included in PGP N/S-TIEs.  S-PGPs
   are propagated in the south direction only, and N-PGPs strictly
   follow the north direction.  The THRIFT schema in the base RIFT
   specification [I-D.ietf-rift-rift] needs to be updated.  For example:

   o  TIEElement needs to add "7: optional PGPrefixElement
      pog_prefixes;".

   o  "struct PGPrefixElement" needs to be defined.  Should
      PrefixAttributes be used for PGPrefixElement (i.e., do all fields
      defined in PrefixAttributes apply to PGPrefixElement)?

   o  "struct Community" needs to be referenced in PGPrefixElement.

   Future revisions of this document and the base RIFT specification
   will coordinate the THRIFT schema.

2.1.  Ingress Filtering

   The set of policy-guided prefixes received in a TIE is subject to
   ingress filtering and is then re-originated to be sent out in the
   receiver's appropriate TIE.  Both the ingress filtering and the
   re-origination use the communities associated with the policy-guided
   prefixes to determine the correct behavior.  The cost on
   re-advertisement MUST increase in a strictly monotonic fashion.

   When a node X receives a PGP S-TIE or a PGP N-TIE that is originated
   by a node Y that does not have an adjacency with X, all PGPs in such
   a TIE MUST be filtered.  Similarly, if node Y is at the same level as
   node X, then X MUST filter out the PGPs in such S- and N-TIEs to
   prevent loops.

   Next, policy can be applied to determine which policy-guided prefixes
   to accept.  Since ingress filtering is chosen rather than egress
   filtering and per-neighbor PGPs, policy that applies to links is
   applied at the receiver.  Because the RIFT adjacency is between nodes
   and there may be parallel links between the two nodes, the
   policy-guided prefix is considered to start with a next-hop set that
   contains all links to the originating node Y.

   A policy-guided prefix has, or is assigned, the following attributes:

   cost:  This is initialized to the cost received.

   community_list:  This is initialized to the list of communities
      received.

   next_hop_set:  This is initialized to the set of links to the
      originating node Y.

2.2.  Applying Policy

   The specific action to apply based upon a community is deployment
   specific.  Here are some examples of things that can be done with
   communities.  A community is a 64-bit number; in these examples it is
   written either as a single field M or as two fields (S, T), where S
   is bits 0-31 of M and T is bits 32-63 of M.  For simplicity, the
   policy-guided prefix is referred to as P, the processing node as X,
   and the originator as Y.
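   As a minimal sketch of the community encoding and the initial
   attributes described above, the Python below splits a 64-bit
   community M into (S, T) and builds the starting state of a PGP P.  It
   assumes the usual IETF bit numbering, in which bit 0 is the most
   significant bit, so S is the upper 32 bits and T the lower 32 bits;
   the "PolicyGuidedPrefix" class, the helper functions, and the link
   names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

MASK32 = 0xFFFFFFFF


def split_community(m: int) -> Tuple[int, int]:
    """Split a 64-bit community M into (S, T).

    Assumes bit 0 is the most significant bit, so bits 0-31 (S) are the
    high half and bits 32-63 (T) are the low half.
    """
    if not 0 <= m < 1 << 64:
        raise ValueError("community must fit in 64 bits")
    return (m >> 32) & MASK32, m & MASK32


def join_community(s: int, t: int) -> int:
    """Inverse of split_community: rebuild M from the two 32-bit fields."""
    return ((s & MASK32) << 32) | (t & MASK32)


@dataclass
class PolicyGuidedPrefix:
    """State node X keeps for P after ingress filtering (hypothetical)."""
    prefix: str
    cost: int                      # initialized to the cost received
    community_list: List[int]      # initialized to the communities received
    next_hop_set: Set[str] = field(default_factory=set)


def initial_pgp(prefix: str, received_cost: int,
                received_communities: List[int],
                links_to_originator: Set[str]) -> PolicyGuidedPrefix:
    # The next-hop set starts as all parallel links to originator Y.
    return PolicyGuidedPrefix(prefix, received_cost,
                              list(received_communities),
                              set(links_to_originator))


s, t = split_community(0x0000000A00000064)
print(s, t)  # 10 100
```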
   Prune Next-Hops: Community Required:  For each next-hop in
      P.next_hop_set, if the next-hop does not have the community, prune
      that next-hop from P.next_hop_set.

   Prune Next-Hops: Avoid Community:  For each next-hop in
      P.next_hop_set, if the next-hop has the community, prune that
      next-hop from P.next_hop_set.

   Drop if Community:  If node X has community M, discard P.

   Drop if not Community:  If node X does not have community M, discard
      P.

   Prune to ifIndex T:  For each next-hop in P.next_hop_set, if the
      next-hop's ifIndex is not the value T specified in the community
      (S, T), then prune that next-hop from P.next_hop_set.

   Add Cost T:  For each appearance of community S in P.community_list,
      if node X has community S, then add T to P.cost.

   Accumulate Min-BW T:  Let bw be the sum of the bandwidths of the
      next-hops in P.next_hop_set.  If that sum is less than T, then
      replace (S, T) with (S, bw).

   Add Community T if Node matches S:  If node X has community S, then
      add community T to P.community_list.

2.3.  Store Policy-Guided Prefix for Route Computation and Regeneration

   Once a policy-guided prefix has completed ingress filtering and
   policy, it is almost ready to store and use.  It is still necessary
   to adjust the cost of the prefix to account for the link from the
   computing node X to the originating neighbor node Y.

   There are three different policies that can be used:

   Minimum Equal-Cost:  Find the lowest link cost C among the next-hops
      in P.next_hop_set and prune P.next_hop_set to the next-hops with
      cost C.  Add C to P.cost.

   Minimum Unequal-Cost:  Find the lowest link cost C among the
      next-hops in P.next_hop_set, without pruning P.next_hop_set.  Add
      C to P.cost.

   Maximum Unequal-Cost:  Find the highest link cost C among the
      next-hops in P.next_hop_set, without pruning P.next_hop_set.  Add
      C to P.cost.

   The default policy is Minimum Unequal-Cost, but well-known
   communities can be defined to select the other behaviors.
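   The following Python sketch shows two of the next-hop pruning actions
   from Section 2.2 followed by the three cost-adjustment policies
   above.  It is illustrative only: communities are plain integers, link
   identifiers are hypothetical strings, and the per-link community and
   cost tables stand in for whatever configuration a real node uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class CostPolicy(Enum):
    MIN_EQUAL = "minimum-equal-cost"
    MIN_UNEQUAL = "minimum-unequal-cost"  # the default in Section 2.3
    MAX_UNEQUAL = "maximum-unequal-cost"


@dataclass
class Pgp:
    cost: int
    next_hop_set: Set[str] = field(default_factory=set)


def prune_required(p: Pgp, community: int,
                   link_communities: Dict[str, Set[int]]) -> None:
    """Prune Next-Hops: Community Required (keep links that carry it)."""
    p.next_hop_set = {nh for nh in p.next_hop_set
                      if community in link_communities.get(nh, set())}


def prune_avoid(p: Pgp, community: int,
                link_communities: Dict[str, Set[int]]) -> None:
    """Prune Next-Hops: Avoid Community (drop links that carry it)."""
    p.next_hop_set = {nh for nh in p.next_hop_set
                      if community not in link_communities.get(nh, set())}


def adjust_cost(p: Pgp, link_cost: Dict[str, int],
                policy: CostPolicy = CostPolicy.MIN_UNEQUAL) -> Optional[Pgp]:
    """Add the X-to-Y link cost C; return None if no next-hop is left."""
    if not p.next_hop_set:
        return None  # policy pruned every link, so P cannot be used
    costs = {nh: link_cost[nh] for nh in p.next_hop_set}
    if policy is CostPolicy.MAX_UNEQUAL:
        c = max(costs.values())
        hops = set(costs)
    else:
        c = min(costs.values())
        if policy is CostPolicy.MIN_EQUAL:
            hops = {nh for nh, v in costs.items() if v == c}
        else:
            hops = set(costs)
    # Section 2.3 also requires the stored cost to be at least 1 greater
    # than the received cost; clamping the increment covers zero-cost links.
    return Pgp(p.cost + max(c, 1), hops)


p = Pgp(cost=10, next_hop_set={"a", "b", "c"})
prune_avoid(p, 7, {"c": {7}})  # link "c" carries community 7
print(adjust_cost(p, {"a": 2, "b": 3, "c": 1}, CostPolicy.MIN_EQUAL))
# Pgp(cost=12, next_hop_set={'a'})
```

   Returning a new object from adjust_cost leaves the received PGP
   untouched, which fits the re-evaluation described below: when the
   best PGP is lost, the retained TIEs can be processed again.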
   Regardless of the policy used, a node MUST store a PGP cost that is
   at least 1 greater than the PGP cost received.  This enforces the
   strictly monotonically increasing condition that avoids loops.

   Two databases of PGPs are stored: one for PGPs from N-TIEs and one
   for PGPs from S-TIEs.  When a PGP is inserted into the appropriate
   database, the usual tie-breaking on cost is performed.  Observe that
   the node retains all PGP TIEs due to normal flooding behavior; hence,
   loss of the best prefix leads to re-evaluation of the TIEs present
   and re-advertisement of a new best PGP.

2.4.  Re-origination

   A node must re-originate policy-guided prefixes and retransmit them.
   The node has its database of southbound policy-guided prefixes to
   send in its S-TIE and its database of northbound policy-guided
   prefixes to send in its N-TIE.

   Of course, a leaf does not need to re-originate southbound
   policy-guided prefixes.

2.5.  Reachability Computation with PGP Consideration

   During reachability computation, PGPs are considered after prefixes
   are attached as specified in Section 5.2.6 ("Attaching Prefixes") of
   the RIFT base specification [I-D.ietf-rift-rift].

   Each policy-guided prefix P has its cost and next_hop_set already
   stored in the associated database, as specified in Section 2.3; the
   cost stored for the PGP has already been updated to account for the
   cost of the link to the advertising neighbor.  By definition, a
   policy-guided prefix is preferred to a regular prefix.
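   The route-insertion procedure of Figure 1 can be sketched in Python
   as follows.  This is an illustrative sketch, not taken from any RIFT
   implementation: the "Route" record, the "RouteType" values, and the
   "add_pgp_routes" function are hypothetical, and each PGP is assumed
   to arrive as a (prefix, cost, next_hop_set) tuple whose cost already
   includes the link to the advertising neighbor, as described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, Set, Tuple


class RouteType(Enum):
    REGULAR = "regular"
    POLICY_GUIDED = "policy-guided"


@dataclass
class Route:
    type: RouteType
    cost: int
    next_hop_set: Set[str] = field(default_factory=set)


def add_pgp_routes(route_database: Dict[str, Route],
                   pgps: Iterable[Tuple[str, int, Set[str]]]) -> None:
    """Apply the Figure 1 rules to each (prefix, cost, next_hop_set)."""
    for prefix, cost, next_hops in pgps:
        current = route_database.get(prefix)
        if current is None:
            route_database[prefix] = Route(RouteType.POLICY_GUIDED, cost,
                                           set(next_hops))
        elif (current.type is not RouteType.POLICY_GUIDED
              or current.cost > cost):
            # A PGP replaces any regular route, whatever the costs, and
            # replaces a more expensive PGP route.
            route_database[prefix] = Route(RouteType.POLICY_GUIDED, cost,
                                           set(next_hops))
        elif current.cost == cost:
            # Equal-cost PGP routes: merge the next-hop sets.
            current.next_hop_set |= set(next_hops)
        # Otherwise the installed PGP route is cheaper; ignore this one.


db = {"10.1.0.0/16": Route(RouteType.REGULAR, 2, {"x"})}
add_pgp_routes(db, [("10.1.0.0/16", 20, {"a"}),
                    ("10.1.0.0/16", 20, {"b"})])
# db["10.1.0.0/16"] is now policy-guided, cost 20, via {"a", "b"}
```

   Using if/elif/else rather than two independent tests means a prefix
   that was just added is not immediately compared against itself.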
     for each policy-guided prefix P:
         if P not in route_database:
             add (P, type=PolicyGuided, P.cost, P.next_hop_set)
         else if (route_database[P].type is not PolicyGuided) or
                 (route_database[P].cost > P.cost):
             update route_database[P] with
                 (P, PolicyGuided, P.cost, P.next_hop_set)
         else if route_database[P].cost == P.cost:
             update route_database[P] with
                 (P, PolicyGuided, P.cost,
                  merge(P.next_hop_set, route_database[P].next_hop_set))
         else:
             // Not the preferred route, so ignore
         end if
     end for

            Figure 1: Adding Routes from Policy-Guided Prefixes

   Notice that a policy-guided prefix is always preferred to a regular
   prefix, even if the policy-guided prefix has a larger cost.

   PGPs may overlap with prefixes introduced by automatic
   de-aggregation.  This topic is under further discussion.  A break in
   connectivity that makes a PGP infeasible is mirrored in adjacency
   tear-down and the corresponding removal of such PGPs.  Nevertheless,
   the underlying link-state flooding will likely react significantly
   faster than hop-by-hop redistribution, so the preference for PGPs may
   cause intermittent black-holes.

3.  Security Considerations

   To be provided.

4.  Acknowledgements

5.  Normative References

   [I-D.ietf-rift-rift]
              Team, T., "RIFT: Routing in Fat Trees", draft-ietf-rift-
              rift-03 (work in progress), October 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Alia Atlas
   Individual

   EMail: akatlas@gmail.com


   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net