RTGWG Working Group                                        Shankar Raman
Internet-Draft                                Balaji Venkat Venkataswami
Intended Status: Experimental RFC                           Gaurav Raina
Expires: November 5, 2013                                     IIT Madras
                                                             May 4, 2013

  Building power shortest inter-Area TE LSPs using pre-computed paths
              draft-mjsraman-rtgwg-intra-as-psp-te-leak-03

Abstract

   In this paper, we propose a framework to reduce the aggregate power
   consumption of an Autonomous System (AS) using a collaborative
   approach between areas within an AS. We identify the low-power paths
   within non-backbone areas and then use Traffic Engineering (TE)
   techniques to route the packets along the stitched paths from
   non-backbone areas / backbone area to other non-backbone areas. Such
   low-power paths can be identified by using the
   power-to-available-bandwidth (PWR) ratio as an additional constraint
   in the Constrained Shortest Path First (CSPF) algorithm. For routing
   the data traffic through these low-power paths, the Inter-Area
   Traffic Engineered Label Switched Path (TE-LSP) that spans multiple
   areas can be used. Extensions to the Interior Gateway Protocols like
   OSPF and IS-IS that support TE extensions can be used to disseminate
   information about low-power paths in the respective areas (backbone
   or non-backbone) that minimize the PWR ratio metric on the links
   within the areas and between the areas thereby creating a
   collaborative approach to reduce the power consumption.

   The feasibility of our approaches is illustrated by applying our
   algorithm to an AS with a backbone area and several non-backbone
   areas. The techniques proposed in this paper for the Inter-Area power
   reduced paths require a few modifications to the existing features of
   the IGPs supporting TE extensions.
   The proposed techniques can be extended to other levels of the
   Internet hierarchy, such as Inter-AS paths, through suitable
   modifications as in [11].

   When link state routing protocols like OSPF or IS-IS are used to
   discover the TE topology, traffic engineered paths can be set up
   only when the head end and the tail end of the label switched path
   are in the same area. Existing solutions overcome this limitation by
   using an offline Path Computation Element (PCE) that attaches to
   multiple areas and knows the topology of all areas. This document
   proposes an alternative approach that does not require any
   centralized PCE and uses selective leaking of low-power TE path
   information from one area into other areas.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1  Introduction
     1.1  Terminology
     1.1  Low-power routers and switches
     1.2  Power reduction using routing and traffic engineering
   2. Methodology of the proposal
     2.1  ABR Operation
       2.1.1  Methodology
       2.1.2  ERRATA
       2.1.3  Power Bias
       2.1.4  Advertising Available POWER
       2.1.5  ECMP links
       2.1.6  Dampening the side effects of constant change
       2.1.7  Calculating power shortest paths in an Area
       2.1.8  Power profiles of Routers and Switches
         2.1.8.1  Concave and Convex power curves
         2.1.8.2  Need to advertise both available power and
                  consumed power
       2.1.9  Power to Available Bandwidth ratio in a TLV
     2.2  TE Path Head-end Operation
     2.2  Suppression of Frequent updates owing to fluctuation in
          power and bandwidth
     2.3  Advantages
   3  Security Considerations
   4  IANA Considerations
   5  References
     5.1  Normative References
     5.2  Informative References
   Authors' Addresses

1  Introduction

   Estimates of power consumption for the Internet predict a 300%
   increase, as access speeds increase from 10 Mbps to 100 Mbps [3],
   [8]. Access speeds are likely to increase as new video, voice and
   gaming devices get added to the Internet. Various approaches have
   been proposed to reduce the power consumption of the Internet, such
   as designing low-power routers and switches, and optimizing the
   network topology using traffic engineering methods [2].

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.1  Low-power routers and switches

   Low-power router and switch design aims at reducing the power
   consumed by hardware architectural components such as transmission
   links, lookup tables and memory. In [4] it is shown that a router's
   link power consumption can vary by 20 Watts between idle and traffic
   scenarios. Hence the authors suggest having more line cards and
   running them to capacity: operating the router at full throughput
   leads to less power per bit, and hence larger packet lengths consume
   less power. The two important components in routers that have
   received attention for high power consumption are buffers and TCAMs.
   Buffers are built using dynamic RAM (DRAM) or static RAM (SRAM).
   SRAMs are limited in size and consume more power, but have low
   access times. Guido [1] states that a 40 Gb/s line card would
   require more than 300 SRAM chips and consume 2.5 kW.
   DRAM access times prevent them from being used on high speed line
   cards. Sometimes the buffering of packets in DRAM is done at the
   back end, while SRAM is used at the front end for fast data access.
   But these schemes cannot scale with increasing line speeds. Some
   variants of TCAMs have been proposed for increasing line speeds and
   for reduced power consumption [7].

1.2  Power reduction using routing and traffic engineering

   At the Internet level, creating a topology that allows route
   adaptation, capacity scaling and power-aware service rate tuning
   will reduce power consumption. In [8] the author proposes a
   technique to traffic engineer the data packets in such a way that
   the link capacity between routers is optimized. Links which are not
   utilized are moved to the idle state. Power consumption can also be
   reduced by trading off performance related measures like latency.
   For example, the power saving when switching from 1 Gbps to 100 Mbps
   is approximately 4 W, and from 100 Mbps to 10 Mbps around 0.1 W.
   Hence instead of operating at 1 Gbps the link speed could be reduced
   to a lower bandwidth under certain conditions for reduced power
   consumption.

   Multi layer traffic engineering based methods make use of parameters
   such as resource usage, bandwidth, throughput and QoS measures for
   power reduction. In [6] an approach for reducing Intra-AS power
   consumption in optical networks that uses Dijkstra's shortest path
   algorithm is proposed. The method takes an existing network topology
   as input and constructs an auxiliary graph from it. Power
   optimization is done on the auxiliary graph and traffic is routed
   through the low-power links. However, the algorithm expects the
   topology to be available for getting the auxiliary graph.
   This topology can be obtained in the Intra-AS scenario by using a
   centralized PCE (Path Computation Element), as in a hierarchical PCE
   approach. Here a PCE is assigned to each area, and each such PCE
   calculates the path from a head-end router to a tail-end router,
   both falling within the same area. When TE paths have to be stitched
   across several areas, the hierarchical PCE, which may be one level
   up from the respective area PCEs, is contacted for the stitching.

   We propose a collaborative approach in which the respective areas
   calculate low-power paths that result in power reduction within an
   AS. This document proposes an alternative approach that does not
   require any centralized PCE and uses selective leaking of low-power
   TE path information from one area into other areas. The core of most
   ISP ASes uses the Multi-Protocol Label Switching (MPLS) technology.
   MPLS label switched paths that traverse multiple areas carry traffic
   from a head-end to a tail-end that can be situated in different
   areas within the AS. The AS uses the Interior Gateway Protocol (IGP)
   for exchanging routing related information. The topology of one area
   is not revealed to the other in OSPF-TE and IS-IS-TE.

   The CSPF algorithm as proposed here is run on a specific area with
   the power-to-available-bandwidth (PWR) ratio as a constraint, to
   determine "k" (where k is a suitable number) low-power paths from
   the head-end to the tail-end within the same area. The low-power
   paths that minimize the PWR ratio can be exchanged among the
   collaborating areas using the IGP-TE TLVs that we propose in this
   document.
   Explicit routing using RSVP-TE (for signalling) can then be achieved
   between head-end and tail-end routers in different areas over these
   low-power paths, using an Inter-Area Traffic Engineered Label
   Switched Path (TE-LSP) that spans multiple areas.

2. Methodology of the proposal

   There are three known solutions to inter-area TE:

   (a) hop expansion at area boundaries, where the head end can only
   choose the path to the area boundary rather than right to the tail
   end,

   (b) a centralized PCE that is attached to all areas and is aware of
   the entire topology, and

   (c) path stitching by designating ABRs acting as BGP route
   reflectors.

   It is of course possible to build low-power paths through the above
   techniques, but they suffer from limitations such as not knowing a
   priori whether a path exists. This document proposes a technique
   where low-power paths are pre-computed in the various areas and are
   leaked into other areas, so that these paths can be provisioned much
   more quickly than is otherwise possible.

   Let {N} be the set of nodes in a network running a link state
   routing protocol, and let {N'} be the set of nodes that are known to
   be the endpoints of the traffic engineered paths. The topology
   {N, E} is divided into hierarchical areas, with the backbone area as
   the second level connecting all first-level non-backbone areas. We
   assume the network runs either OSPF-TE or IS-IS-TE for establishing
   TE paths. The nodes in {N'} can be situated in any non-backbone area
   or in the backbone area. Nodes in {N'} may become aware of being
   potential endpoints through offline configuration.

   Once the nodes in {N'} become aware of being TE endpoints, they
   advertise themselves in a special TLV in the TE link state
   information. We term this the "TE Endpoint TLV".
   In OSPF, they would advertise a newly defined TLV in the TE LSA, and
   in IS-IS they would advertise a newly defined TLV in the TE LSP.
   Apart from the nodes in {N'}, the area border routers (ABRs)
   advertise another newly defined TLV that we term the "Area Border
   TLV".

2.1  ABR Operation

   Apart from the standard OSPF/IS-IS ABR functions, each ABR should
   discover the TE endpoints in every area attached to it. For a given
   ABR, let the set discovered be {Ai, Nj}. The ABR should compute
   k-power-shortest-paths to every element in {Ai, Nj} based on the
   constraints applicable to the network. The constraint applied here
   is the minimization of the PWR ratio, which is defined in Section
   2.1.1.

   For a given router that is an ABR for an area (straddling the
   backbone and a non-backbone area), a set of k-shortest paths that
   can potentially be used as a link towards a TE endpoint is
   identified.

2.1.1  Methodology

   Each router or switch has linecards, and each linecard has a set of
   ports or sometimes just one port of high capacity. This applies to
   routers and switches that are either single chassis or multi-chassis
   systems. A single chassis system has one chassis with slots for the
   Route Processor Cards (typically up to two of them) and one or more
   slots for linecards, each having its own characteristics such as the
   number of ports (port density) and the type of those ports (SONET,
   Ethernet, ATM, etc.), usually depending on the link layer technology
   they support. Links are connections between ports on these linecards
   and ports on linecards of other single chassis or multi-chassis
   systems. A multi-chassis system is one that has multiple such
   chassis interconnected to form a single logical view of the system.
   Both single and multi-chassis systems have linecards and respective
   ports on these linecards.
   Multi-chassis systems typically have a switch fabric chassis which
   connects each of these chassis to each other or to the chassis of
   other multi-chassis or single chassis systems.

   Consider the following topology as one that falls within an area...

   [Figure: Routers A, B and C (top row) and Routers D, E and F (bottom
   row), each with linecards LC1 and LC2 carrying ports P1 to P5,
   interconnected by links L1 to L14.]

   The table of links between the various routers (which are assumed to
   be single chassis systems) is as follows...
   +-------+----------+----------+------------+----------+-----------+
   | Links | Routers  | LC <> LC | Port Conn. | Capacity | Available |
   |       |          |          |            |          | Bandwidth |
   +-------+----------+----------+------------+----------+-----------+
   | L1    | A <> D   | LC1<>LC1 | P5<>P4     | 10G      | 7.5       |
   | L2    | A <> D   | LC2<>LC2 | P5<>P1     | 10G      | 6.0       |
   | L3    | A <> D   | LC2<>LC1 | P2<>P1     | 10G      | 4.0       |
   | L4    | A <> B   | LC2<>LC1 | P4<>P4     | 10G      | 3.0       |
   | L5    | B <> C   | LC1<>LC1 | P5<>P4     | 10G      | 3.5       |
   | L6    | B <> E   | LC1<>LC1 | P6<>P2     | 10G      | 1.0       |
   | L7    | D <> E   | LC2<>LC1 | P3<>P3     | 10G      | 6.0       |
   | L8    | D <> E   | LC1<>LC1 | P5<>P4     | 10G      | 1.5       |
   | L9    | E <> F   | LC1<>LC2 | P5<>P5     | 100G     | 20.0      |
   | L10   | E <> F   | LC2<>LC1 | P4<>P4     | 10G      | 2.5       |
   | L11   | B <> C   | LC2<>LC1 | P1<>P1     | 10G      | 3.0       |
   | L12   | E <> C   | LC2<>LC2 | P1<>P5     | 10G      | 2.0       |
   | L13   | C <> F   | LC2<>LC1 | P1<>P2     | 10G      | 1.0       |
   | L14   | F <> OA  | LC2<>    | P1<>       |          |           |
   +-------+----------+----------+------------+----------+-----------+

   In the above topology all links between the routers are assumed to
   be point-to-point (P2P). For now we deal with P2P links alone and do
   not venture into Broadcast Multi-access or Non-Broadcast Multi-access
   links. It suffices to show how the scheme works for P2P links; other
   types of networks can then be treated specifically, using the same
   method of calculating the power topology of the network in the
   figure.

   Each linecard consumes a certain amount of power, and how the power
   consumed relates to the Available Bandwidth on any of the links to
   which the linecard connects is vendor dependent. The routers in the
   said topology may come from one vendor or from multiple vendors.
   It is assumed that the proposed algorithm has the power consumed by
   a linecard available as a readable value in W, kW or whichever
   measurable unit is provided by the vendor.

   It is possible that some of the linecards are more capable than the
   others. Consider that Router A is a more capable router with more
   powerful linecards with higher port density. This is not shown in
   the figure, but assume so. LC1 and LC2 on Router A could be
   consuming more power than the linecards on other routers. The main
   reason could be that LC1 and LC2 have higher port density or higher
   speed ports than the linecards of the other routers. In order to
   calculate the power consumed on a link by a linecard, it is
   important that we normalize the power as power consumed per port.
   Here the ports are normalized to the lowest common denominator. If
   all links in the topology have 10G port capacity then the power
   calculated should be in terms of power consumed per 10G port.

   Assuming we have done this normalization, we go on to calculate the
   POWER metric for each of the ports involved in a link, which is
   derived as follows...

   POWER metric for a given port on a LC =

        Power consumed per XG (normalized bandwidth) port
        -------------------------------------------------
                Available Bandwidth on that port

   Consider link L1. The ports concerned are both 10G and are P5 on
   Router A and P4 on Router D. To calculate the POWER metric for a
   link, which we call PWRLINK, we calculate the POWER metric for each
   side of the link and average the two.

   PWRLINK for L1 =

     POWER for P5 on LC1 on Router A + POWER for P4 on LC1 on Router D
     -----------------------------------------------------------------
                                     2

   The above can also be weighted if there is a multi-capacity port on
   one side of the link and not on the other.
   A multi-capacity link is one which provides multiple bandwidth
   capabilities (for example 1G/10G/100G) but auto-negotiates with the
   other end to provide a service below its highest capacity.

   Once calculated, the PWRLINK metrics are flooded in the already
   defined OSPF-TE LSA as an adapted TE metric, typically as a link
   characteristic.

   It is important to note that the denominator of the POWER metric is
   the Available Bandwidth on that port (see the ERRATA section). The
   Available Bandwidth is measured in terms of intervals and not as
   discrete quantities. This avoids flooding PWRLINK metrics into the
   OSPF area in LSAs very frequently, since bandwidth may change
   constantly. The same applies to the POWER metric as well.

   Once the LSAs have been flooded, the routers run CSPF on the graph
   of the topology with PWRLINKs assigned to the links and calculate
   the PWRLINK based paths which consume the least power. The shortest
   power paths based on this topology can be used for forwarding high
   bandwidth streams and for using power optimally within the area.

   The Available Bandwidth column of the table shows the available
   bandwidth of the link in the corresponding row. This figure is used
   as the denominator in the POWER metric computation for the ports of
   that link.

2.1.2  ERRATA

   ERRATA: Previously the experiments were carried out with Available
   Utilization, since only 10G and 100G ports were considered. This
   baselines the metric to 10G ports and proportions thereof. But in
   reality the actual Available Bandwidth needs to be considered for
   real world experiments. Hence this draft has been changed so that
   the Available Bandwidth is taken as the denominator of the formula.

   In our previous experiments the 100G link, if it showed a
   utilization of 0.2, would end up with a high POWER metric and hence
   would be totally avoided.
   In reality this link may have been a more power optimal link if it
   had a first power profile (please refer to Section 2.1.8 on power
   profiles). Dividing the Power consumed or the Available Power by the
   Available Bandwidth gives a better picture of the power cost per Gb
   and normalizes the metric amongst links of varying bandwidth.

   An earlier version of this document (rev-00) contained a different
   algorithm to compute the k-shortest-power-paths. The experimental
   results gathered showed that the said algorithm was prone to errors
   with respect to the direction of traffic and was unnecessarily
   complex for the solution. Hence it has been set aside in favour of
   the simpler and better one described in this revision.

2.1.3  Power Bias

   Consider Routers A and D in the figure, and assume there is a bias
   on the link L1 such that Router A computes a POWER metric of 10 and
   Router D computes a POWER metric of 2 on the ports P5 and P4
   respectively. The PWRLINK for link L1 would then be 6. Thus even if
   only one side is excessively power guzzling, the PWRLINK moves up
   and the link is less preferred in the CSPF algorithm and in path
   computation based on the power topology.

   If there is no bias and both sides of the link are optimal in their
   power usage, then the metric stays low even if more streams are sent
   on it. This is the main objective set out for router and switch
   manufacturers in the single chassis and multi-chassis world: they
   are incentivised to manufacture linecards that are not power hungry
   even when the number of packets flowing through them is high, so
   that the Available Bandwidth is also reasonably on the higher side
   compared to other routers.

   Vendors whose equipment does not draw high power for even minimal
   traffic would win out in the end over manufacturers whose equipment
   does.
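
   The averaging described in this section can be illustrated with a
   minimal Python sketch. The function names and the per-port power
   figures (75 W and 15 W per normalized 10G port) are assumptions
   chosen only so that the two sides of L1 reproduce the POWER metrics
   10 and 2 used above; they are not measurements. The available
   bandwidth of 7.5 is the value listed for L1 in the table of links.

```python
def power_metric(power_per_port_w, available_bandwidth):
    """POWER metric of one port: normalized per-port power divided by
    the available bandwidth on that port."""
    if available_bandwidth <= 0:
        raise ValueError("available bandwidth must be positive")
    return power_per_port_w / available_bandwidth


def pwrlink(metric_near, metric_far):
    """PWRLINK of a link: unweighted average of the two port metrics."""
    return (metric_near + metric_far) / 2.0


# Link L1: Router A port P5 <> Router D port P4, available bandwidth 7.5.
# The wattages below are hypothetical, picked to match the text.
metric_a = power_metric(75.0, 7.5)  # Router A side of L1
metric_d = power_metric(15.0, 7.5)  # Router D side of L1
l1 = pwrlink(metric_a, metric_d)
print(metric_a, metric_d, l1)
```

   Because PWRLINK is a plain average, a single power-hungry port is
   enough to raise the metric of the whole link.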

2.1.4  Advertising Available POWER

   Please see section 2.1.8 for more information on why Available POWER
   plays a crucial role in determining the choice of routers based on
   the Power metric.

2.1.5  ECMP links

   It is possible that multiple links have the same PWRLINK metric
   after a computation cycle. In such a case load-balancing techniques
   can be used to keep the ECMP links in a steady state with respect to
   each other. Depending on the Available Bandwidth thereafter, the
   ECMP links may no longer be equal cost and may become Unequal Cost
   Paths (UCMP).

2.1.6  Dampening the side effects of constant change

   It is recommended that an implementation of the proposal be
   adaptive, that it compute as infrequently as possible without
   sacrificing its ability to follow changes in the network, and that
   it avoid frequent oscillations. The actual methods to adopt for this
   computation are outside the scope of this document.

2.1.7  Calculating power shortest paths in an Area

   Assume the following topology, where A, B, C, etc. are routers and
   the labelled edges with weights are the links. These weights are the
   current values of the PWRLINK attribute that has been flooded in the
   LSAs through the area concerned. Assume B is the ABR for Area 1 and
   the routers A and C are the Area 0 core routers. The rest of the
   routers are assumed to be in Area 1. Once the power topology of
   Area 1 has been calculated as shown below, with the PWRLINK
   attributes assigned to the links, CSPF can be run from the ABR to
   any of the other routers, say H, E or X. The CSPF algorithm takes
   the PWRLINK attributes along with other attributes as constraints to
   construct a power shortest path from, say, router B to other routers
   in Area 1.
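
   The computation described above can be sketched in Python as
   follows. This is a minimal illustration under stated assumptions,
   not the draft's algorithm: plain Dijkstra over the PWRLINK weights
   stands in for CSPF (pruning on other constraints is omitted), the
   function name is hypothetical, and the edge list is an
   interpretation of the example topology figure of this section.

```python
import heapq

# Directed edges with PWRLINK weights, as read from the example
# topology of this section (an interpretation of the figure).
GRAPH = {
    "A": {"B": 0.05},
    "C": {"B": 0.5},
    "B": {"D": 0.1, "E": 0.5, "H": 0.5},
    "D": {"G": 0.03},
    "G": {"H": 0.2, "E": 0.5},
    "H": {"X": 0.1},
    "E": {"X": 0.3},
    "X": {},
}


def power_shortest_path(graph, src, dst):
    """Return (path, cost) minimizing the sum of PWRLINK weights."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[dst]


path, cost = power_shortest_path(GRAPH, "B", "X")
print(path, round(cost, 2))
```

   With these weights the path from B to X runs through D, G and H. A
   fuller CSPF would first prune links that fail other constraints,
   such as a minimum available bandwidth, before running the same
   search.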

                      0.5
          (C)  +----------------+
        0.5|  /                 |
           | /                  |
     0.05  V/ 0.1   0.03   0.2  V
   (A)--->(B)--->(D)--->(G)--->(H)
           |             |      |
           |          0.5|      | 0.1
           |             V      V
           +----------->(E)--->(X)
                0.5        0.3

   Once the path has been computed, RSVP-TE can be used to construct
   the power shortest path, with the TE-LSP being instantiated and the
   labels appropriately placed in the routers on the power shortest
   path. In this topology, assume one wants to construct a path from B
   to X. The dotted path in the figure below shows the path
   constructed; it is used by a set of flows, or streams of packets
   belonging to multiple flows, as seen fit by router B. If the PWRLINK
   metrics change in due course, another power shortest path is
   constructed, which may traverse the same path (if its sum of
   PWRLINKs does not exceed the sum for any other path) or some other
   path. Specifically, this method uses traffic-engineering signalling
   protocols to place the streams from one point to another (where both
   points are routers).

                      0.5
          (C)  +----------------+
        0.5|  /                 |
           | /                  |
     0.05  V/ 0.1   0.03   0.2  V
   (A)--->(B)...>(D)...>(G)...>(H)
           |             |      .
           |          0.5|      . 0.1
           |             V      V
           +----------->(E)--->(X)
                0.5        0.3

2.1.8  Power profiles of Routers and Switches

   Experiments and several sources show that routers have different
   power profiles. The power profile of a router is the curve of power
   consumption against available bandwidth. The prominent profiles that
   have to be taken into consideration are described below.

   The first profile that we will consider is the flattening curve. The
   curve of power consumed against available bandwidth is steep
   initially and then tapers off to a plateau.
The point at 550 which it begins to give a delta-C (delta in Power Consumed) to delta- 551 B (Available Bandwidth exhausted) is the inflection point that tapers 552 off to a plateau. Here the delta-C/delta-B begins to slow down or 553 decrease rapidly. The more the traffic that is added onto the device 554 the lesser it draws power. 556 ^ 557 | 558 P | . 559 o | . 560 w | . 561 e | . 562 r | . 563 | . 564 c | . 565 o | . 566 n | . 567 s | . 568 u.| . 569 ------------------------------------> 570 | Available Bandwidth exhausted 572 The second profile that we will consider is the exponential curve. 573 The power consumed to available bandwidth curve takes the shape of an 574 ever increasing steep curve as shown below. Here the delta-C/delta-B 575 begins to increase as more traffic is thrown onto it as the Available 576 bandwidth exhausted increases. This power curve beyond a point is 577 intolerable with respect to power guzzling. 579 ^ 580 | 581 P | . 582 o | . 583 w | . 584 e | . 585 r | . 586 | . 587 c | . 588 o | . 589 n | . 590 s | . 591 u.| . 592 ------------------------------------> 593 | Available Bandwidth exhausted 595 The third profile that we will consider is a linear curve. In other 596 words just a straight line. Here delta-C/delta-B is a constant. 598 ^ 599 | 600 P | . 601 o | . 602 w | . 603 e | . 604 r | . 605 | . 606 c | . 607 o | . 608 n | . 609 s | . 610 u.| . 611 ------------------------------------> 612 | Available Bandwidth exhausted 614 2.1.8.1 Concave and Convex power curves 616 Given that there are 3 kinds of major profiles in the router power 617 consumption, what line would we like to pick. This is an important 618 point when choosing the metric to pick the low power paths. 620 (a) If the confrontation is between 2 first profile routers the lower 621 of the 2 would be considered as shown below. The lower curve offers 622 better power savings for each GB of bandwidth transported. 624 ^ 625 | 626 P | . 627 o | . 628 w | . . 629 e | . . 
630 r | . . 631 | . . 632 c | . . 633 o | . . 634 n | . . 635 s | . . 636 u.| . 637 ------------------------------------> 638 | Available Bandwidth exhausted 640 (b) If the confrontation is between 2 second profile routers the 641 upper curve offers more power savings per GB of bandwidth. 643 ^ 644 | 645 P | . . 646 o | . . 647 w | . . 648 e | . . 649 r | . . 650 | . . 651 c | . . 652 o | . . 653 n | . . 654 s | . 655 u.| . 656 ------------------------------------> 657 | Available Bandwidth exhausted 659 (c) When the confrontation is between a first profile curve and a 660 second profile curve, it would be optimal to pick (as shown below) 661 the lower of the curves because it gives us lesser power consumed for 662 every GB of traffic routed / switched. Here the exponential curve is 663 the one that offers lesser amount of power consumed per GB of traffic 664 is chosen. But when it gets to a point that the two curves intersect 665 it would be more optimal to pick the tapering curve. Thus at the 666 meeting point of the 2 curves the exponential curve becomes more 667 costly and the tapering one gives us more GB for the power buck. Thus 668 this switchover from one curve to the other (in other words from the 669 exponential curve to the tapering one) does the trick in terms of 670 finding an optimal solution. 672 ^ . 673 | . 674 P | . . 675 o | (*) 676 w | . . 677 e | . . 678 r | . . 679 | . . 680 c | . . 681 o | . . 682 n | . . 683 s | . . 684 u.| .. 685 ------------------------------------> 686 | Available Bandwidth exhausted 687 (*) Metric switchover point from Consumed Power to Available 688 Power. 690 2.1.8.2 Need to advertise both available power and consumed power 692 Thus the above sections have shown that both the available power and 693 the consumed power MUST be advertised so that case (c) can be 694 deciphered and the switchover of the curves be done and the 695 appropriate router be chosen for the rest of the bandwidth to be 696 switched over to. 
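
   Case (c) of Section 2.1.8.1 can be illustrated numerically. The
   sketch below is only an illustration under stated assumptions: the
   two curve shapes, their parameters (p_max, p0, k) and the function
   names are hypothetical and are not defined by this document. Load is
   expressed as the fraction of available bandwidth exhausted. The code
   picks the lower curve at a given load and locates the switchover
   point (*) by bisection.

```python
import math


def flattening_power(load, p_max=100.0, k=5.0):
    """Hypothetical first-profile curve: steep at first, then a plateau."""
    return p_max * (1.0 - math.exp(-k * load))


def exponential_power(load, p0=1.0, k=5.0):
    """Hypothetical second-profile curve: grows ever more steeply."""
    return p0 * (math.exp(k * load) - 1.0)


def preferred_profile(load):
    """Name the curve that consumes less power at this load (case (c))."""
    if exponential_power(load) <= flattening_power(load):
        return "exponential"
    return "flattening"


def switchover_load(lo=0.01, hi=1.0, tol=1e-9):
    """Bisect for the load at which the two assumed curves intersect."""
    def diff(x):
        return exponential_power(x) - flattening_power(x)

    # The exponential curve must be lower at `lo` and higher at `hi`.
    if diff(lo) > 0 or diff(hi) < 0:
        raise ValueError("curves do not cross in the given interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(mid) < 0:
            lo = mid   # exponential curve still lower: crossing is right
        else:
            hi = mid   # flattening curve now lower: crossing is left
    return (lo + hi) / 2.0
```

   With these assumed parameters the curves cross where exp(5 * load)
   equals 100, so the exponential-profile router is preferred below
   that load and the flattening-profile router above it. Real curves
   would come from calibration of the devices concerned.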

   Thus there will exist a Consumed-Power to Available-Bandwidth ratio
   and an Available-Power to Available-Bandwidth ratio. Both ratios are
   computed and the lower value is chosen. The Available Power can be
   determined from a calibration process such as the one carried out by
   independent test organizations; an example of such a calibration is
   referred to in [12].

   The formula for calculating the Available-Power to Available-
   Bandwidth ratio, also called the Available POWER metric, is given
   below.

      Available POWER metric        Available Power consumed per XG
      (normalized bandwidth)        port
      for a given Port on a LC  =  ----------------------------------
                                    Available Bandwidth on that port

2.1.9 Power to Available Bandwidth ratio in a TLV

   As per [RFC3630], the Link TLV can be used to carry this power to
   available bandwidth ratio in an additional sub-TLV of the Link TLV.
   The sub-type number 11 is recommended to be defined for this
   purpose.

   [RFC3630] states in section 2.2.1, and we QUOTE ...

2.1.10 Link TLV

   The Link TLV describes a single link. It is constructed of a set of
   sub-TLVs. There are no ordering requirements for the sub-TLVs.

   Only one Link TLV shall be carried in each LSA, allowing for fine
   granularity changes in topology.

   The Link TLV is type 2, and the length is variable.

   The following sub-TLVs of the Link TLV are defined:

      1 - Link type (1 octet)
      2 - Link ID (4 octets)
      3 - Local interface IP address (4 octets)
      4 - Remote interface IP address (4 octets)
      5 - Traffic engineering metric (4 octets)
      6 - Maximum bandwidth (4 octets)
      7 - Maximum reservable bandwidth (4 octets)
      8 - Unreserved bandwidth (32 octets)
      9 - Administrative group (4 octets)
      10 - Power-to-Multicast-replication-capacity (4 octets)
      11 - Consumed-Power-to-Available-Bandwidth (4 octets)
      12 - Available-Power-to-Available-Bandwidth (4 octets)

   This memo defines sub-Types 1 through 9. See the IANA Considerations
   section of [RFC3630] for allocation of new sub-Types.

   The Link Type and Link ID sub-TLVs are mandatory, i.e., must appear
   exactly once. All other sub-TLVs defined here may occur at most
   once. These restrictions need not apply to future sub-TLVs.
   Unrecognized sub-TLVs are ignored.

   Various values below use the (32 bit) IEEE Floating Point format.
   For quick reference, this format is as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|    Exponent   |                  Fraction                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   S is the sign, Exponent is the exponent base 2 in "excess 127"
   notation, and Fraction is the mantissa - 1, with an implied binary
   point in front of it. Thus, the above represents the value:

      (-1)**(S) * 2**(Exponent-127) * (1 + Fraction)

   It is proposed that the Power-to-Available-Bandwidth ratio be
   carried as a 32 bit IEEE floating point field for the purpose of
   this document.

   Assume the following topology in a non-backbone area after
   calculating the PWR ratio in a given stage of the algorithm.

                         0.5
            (C)   +----------------+
          0.5|  /                  |
             | /                   |
       0.05  V/ 0.1    0.03   0.2  V
      (A)--->(B)--->(D)--->(G)--->(H)
       |             |             |
       |          0.5|             | 0.1
       |             V             V
       +----------->(E)--->(X)
             0.5       0.3

   Here (B) is an Area Border Router and has two ingress links into it,
   from (C) and (A), which are in the backbone area. Connectivity
   within the backbone area is not shown here. Assume (C) and (A) are
   connected in some way with other routers in the backbone area.
   Routers (D), (G), (E), (H) and (X) are routers in the non-backbone
   area. Routers (H), (E) and (X) are potential TE endpoints. The PWR
   metrics shown here on the edges within the area represent metrics
   for a specific TE endpoint. The metrics on edges (C)->(B) and
   (A)->(B) are for any traffic ingressing through (B) into the non-
   backbone area heading towards any TE endpoint (H), (E) or (X).

   The number of constraints is likely to be small, and the most widely
   used constraints are the TE metric, link groups and bandwidth. No
   restriction is assumed on the use of other constraints, so here we
   add the PWR metric of a link as an additional constraint. Once the
   ABR computes k-power-shortest-paths to every {Ai, Nj} it has
   topology information about, it advertises the k-power-shortest-paths
   as a reachability vector in a newly defined "TE Reachability Vector
   TLV".

   Consider the example network shown below. TEh is the head-end and
   TEt is the tail-end of a TE path; ABR1 and ABR2 are area border
   routers.

      TEh2---R2                                     R4-----TEt2
               \                                   /
                \                                 /
      TEh1---R1----ABR1-----Rb1-----Rb2-----ABR2----R3----TEt1
           Area 1          Backbone            Area2

   In this example, ABR1's TE Reachability vector TLVs for area 1 and
   area 0 are given below.

      { ABR1, [>, >]}
      { ABR1, ABR2, [>, >]}

   Here the vector TLVs are arranged in order of increasing PWR metric
   associated with each path. That is, the PWR metrics of all the links
   in a path are summed, and the vector TLVs are ordered by increasing
   PWR metric sum, so the lowest-cost power path is listed first. If
   the least-cost power path is to be chosen, the path in the first TLV
   is chosen.

   Similarly, ABR2's TE Reachability vector TLVs for area 2 and area 0
   are given below.

      { ABR2, [>, >]}
      { ABR2, ABR1, [>, >]}

   The first thing to note is that head-ends are also considered as TE
   endpoints. Essentially this means that any head-end or tail-end of
   an inter-area TE-LSP can be considered as a tail-end or head-end
   respectively.

   Note that the reachability vector advertised by ABR1 also contains
   the reachability vector of ABR2. For example, if ABR2 is brought up
   first, then it is likely that ABR1 would only have the following as
   its TE Reachability vector TLV for area 0 before ABR2 computes paths
   to the TE endpoints in area 2.

      { ABR1, ABR2 }

   Note that the path info carried in the TLV would only contain the
   aggregate of the link attributes, namely cost, bandwidth etc. and,
   most importantly, the PWR metric, but not the complete path of
   intermediate nodes. For example, the path info may be a set of
   <2, admin-group-1|admin-group-2, 1Gbps> (where the 1Gbps could be
   the minimum bandwidth available along the path). The above example
   topology has only one path from the ABRs to the TE endpoints. The
   number of path info entries "k" may have a default value or can be
   configured by the operator on all nodes.

2.2 TE Path Head-end Operation

   When a TE application requests that a TE path be set up to an
   endpoint that is not present in the same area, the head-end scans
   the TE Reachability vector TLVs advertised by the ABRs and selects
   the path using the path info contained in the vector TLVs.
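
   The head-end scan described above can be sketched as follows. This
   is a minimal illustration, not a defined encoding or procedure: the
   class names, the field names (pwr_sum, cost, admin_groups,
   min_bw_gbps), the bandwidth check and all numeric values are
   assumptions made for the example. It relies only on the property
   stated earlier that each ABR lists its paths in order of increasing
   PWR metric sum.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class PathInfo:
    """Aggregate attributes of one advertised path (hypothetical layout)."""
    pwr_sum: float               # sum of PWR metrics of the links on the path
    cost: int                    # aggregate TE cost
    admin_groups: FrozenSet[str]
    min_bw_gbps: float           # minimum bandwidth available along the path


@dataclass
class ReachabilityVector:
    """One TE Reachability vector entry as seen by the head-end."""
    abr: str
    endpoint: str
    paths: List[PathInfo]        # ordered by increasing pwr_sum


def select_path(vectors: List[ReachabilityVector], endpoint: str,
                required_bw_gbps: float) -> Optional[Tuple[str, PathInfo]]:
    """Return (ABR, path info) with the lowest PWR sum that fits the demand."""
    best: Optional[Tuple[str, PathInfo]] = None
    for vec in vectors:
        if vec.endpoint != endpoint:
            continue
        for path in vec.paths:
            if path.min_bw_gbps < required_bw_gbps:
                continue         # not enough bandwidth on this path
            if best is None or path.pwr_sum < best[1].pwr_sum:
                best = (vec.abr, path)
            break                # sorted list: first feasible path is best
    return best


# Illustrative advertisements for one tail-end; the values are made up.
EXAMPLE_VECTORS = [
    ReachabilityVector("ABR1", "TEt1", [
        PathInfo(0.43, 2, frozenset({"admin-group-1"}), 1.0),
        PathInfo(0.80, 3, frozenset(), 10.0),
    ]),
    ReachabilityVector("ABR5", "TEt1", [
        PathInfo(0.60, 2, frozenset(), 10.0),
    ]),
]
```

   For a 1 Gbps request towards TEt1 this sketch returns the first path
   advertised by ABR1; for a 5 Gbps request that path is skipped for
   lack of bandwidth and the path through ABR5 is returned instead,
   since its PWR sum is lower than that of ABR1's second path.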

   Here is an example with multiple paths in area 1, the backbone and
   area 2, called Figure 2.0.

      TEh3----R5---ABR3----Rb3-----Rb4------ABR4----R6--TEt4
                \   /        ___/       \       ___/
                 \ /        /            \/
      TEh2---R2---ABR5------Rb5--------ABR6---R4-----TEt2
                / \         \____       /\___
               /   \             \     /     \
      TEh1---R1----ABR1-----Rb1-----Rb2-----ABR2----R3----TEt1
           Area 1          Backbone            Area2

   In the topology of Figure 2.0, considering the tail-ends shown in
   the diagram, note that TEt4 is reachable via ABR4, ABR6 and ABR2.
   ABR6 would advertise multiple TE reachability TLVs for area 2 for
   each tail-end, since multiple paths exist to most of the tail-ends
   in area 2 once a packet reaches any of the ABRs in area 2.

   Here again the least-cost power shortest path is listed first, and
   so on.

      { ABR6, [>, >, >, etc.. }

   For area 0 the TE reachability TLVs would be

      { ABR6, ABR1, [>, >...]}
      { ABR6, ABR5, [>, >...]}
      { ABR6, ABR3, [>, >...]}

   For the sake of brevity we do not enumerate all possible path
   information, as it would be quite extensive.

   LSPs may already have been set up and be in use for transit traffic
   on the backbone or in other non-backbone areas. It is feasible to
   advertise such already set up LSPs in the path info; no additional
   TLV is required for that purpose. This may be useful if such
   transport LSPs exist in the backbone area and there is a willingness
   to give these LSPs higher preference for carrying transit LSPs over
   the backbone.

   Advertisements of existing LSPs to other areas (backbone or non-
   backbone) can be selectively suppressed if those LSPs are set up
   along a path that is already utilized to a great degree. If an LSP
   is underutilized with respect to the PWR metric, a more favourable
   metric could be advertised to other areas.

   For example, backbone area transport LSPs will be advertised as
   transit LSPs, which provide connectivity to LSP sections lying in
   non-backbone areas; they would be updated more frequently since they
   facilitate inter-area TE.

   Once a path in the TLV has been used for reserving bandwidth for
   traffic over that path, it is withdrawn from the advertisements so
   that it becomes unusable. Another path may then be computed over the
   same links, possibly with a different PWR metric sum, since the
   traffic placed on that path could have changed the PWR metrics on
   the edges along it.

2.2 Suppression of Frequent updates owing to fluctuation in power and
    bandwidth

   Using the raw values of the power consumed and the bandwidth
   available will result in frequent oscillations of the metric, and
   hence in frequent re-computation of the power shortest paths. To
   suppress such frequent updates, the PWR metric can be treated as
   falling within intervals bounded by thresholds. An update is sent
   out through the IGP-TE mechanism only when the PWR metric leaves its
   current interval and enters another; as long as the metric stays
   within its current interval, no update is sent. Suitable thresholds
   can be arrived at through calibration tests.

   Routers may increase their power consumption in step levels as they
   are loaded with additional high-bandwidth multicast or unicast
   streams. Calibrating these levels may be useful for implementing
   this scheme. Such calibrated thresholds could be used for
   advertising the PWRLINK ratios in the OSPF LSA advertisements, which
   would reduce the frequency of updates from a line card about its
   PWRLINK ratio. While power consumption stays within a given
   interval, these ratios need not be re-advertised even if further
   unicast and/or multicast streams are added. The incentive is to
   recognize a line card whose power consumption does not change
   drastically even when high-bandwidth streams are added onto it for
   forwarding, and thus to give it credit for its power-optimal
   functioning. If a router tends to consume the highest level of power
   even when carrying small amounts of unicast and multicast traffic on
   its line card, it will automatically have a poor ratio compared to a
   router that uses power efficiently for the Available Bandwidth being
   observed. The best case would be a low-power line card, or a router
   filled with such line cards, that does not leave its power interval
   no matter how much capacity is used on it. This is an idealized
   scenario, but it is one that router manufacturers should aim for.

2.3 Advantages

   1) The TE Reachability vector TLV contains the aggregate of all link
   attributes along with the TE constraints, so the head-end of the TE
   path can explicitly select the ABR that connects to the destination
   area even though it does not know the complete topology of the
   backbone area.

   2) As the TE reachability vector contains only the aggregate
   attributes of the k-power-shortest-paths, the flooding overhead to
   support the mechanism is limited.

   3) A centralized path computation element is not required for
   supporting inter-area power-shortest-path TE. The additional
   overhead of computing k-power-shortest-paths on the ABR can be
   handled by offloading the computation to an additional processor on
   multi-core platforms.

3 Security Considerations

   None.

4 IANA Considerations

   New TLV types for OSPF and IS-IS for the new TLVs that have been
   introduced need to be assigned.

5 References

5.1 Normative References

5.2 Informative References

   [1]  G. Appenzeller, Sizing router buffers, Doctoral Thesis,
        Department of Electrical Engineering, Stanford University,
        2005.

   [2]  A. P. Bianzino, C. Chaudet, D. Rossi and J. L. Rougier, A
        survey of green networking research, IEEE Communications and
        Surveys Tutorials, preprint.

   [3]  J. Baliga, K. Hinton and R. S. Tucker, Energy consumption of
        the internet, Proc. of joint international conference on
        optical internet, June 2007, pp. 1-3.

   [4]  J. Chabarek, J. Sommers, P. Barford, C. Estan, D. Tsiang and S.
        Wright, Power awareness in network design and routing, Proc. of
        the IEEE INFOCOM 2008, April 2008, pp. 457-465.

   [5]  B. Venkat et al., Constructing disjoint and partially disjoint
        InterAS TE-LSPs, USPTO Patent 7751318, Cisco Systems, 2010.

   [6]  M. Xia et al., Greening the optical backbone network: A traffic
        engineering approach, IEEE ICC Proceedings, May 2010, pp. 1-5.

   [7]  W. Lu and S. Sahni, Low-power TCAMs for very large forwarding
        tables, IEEE/ACM Transactions on Computer Networks, June 2010,
        vol. 18, no. 3, pp. 948-959.

   [8]  B. Zhang, Routing Area Open Meeting, Proceedings of the IETF
        81, Quebec, Canada, July 2011.

   [9]  M. J. S. Raman, V. Balaji Venkat, G. Raina, Reducing Power
        consumption using the Border Gateway Protocol, IARIA
        conferences ENERGY 2012.

   [10] A. Cianfrani et al., An OSPF enhancement for energy saving in
        IP Networks, IEEE INFOCOM 2011 Workshop on Green Communications
        and Networking.

   [11] Shankar Raman et al., draft-mjsraman-rtgwg-inter-as-psp-01.txt,
        Work in Progress, February 2012.

Authors' Addresses

   Shankar Raman
   Department of Computer Science and Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: mjsraman@cse.iitm.ac.in

   Balaji Venkat Venkataswami
   Department of Electrical Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: balajivenkat299@gmail.com

   Prof. Gaurav Raina
   Department of Electrical Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: gaurav@ee.iitm.ac.in