idnits 2.17.1

draft-mjsraman-rtgwg-pim-power-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 9 instances of lines with control characters in the document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 27, 2012) is 4384 days in the past.  Is this
     intentional?

  Checking references for intended status: None
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC2119' is mentioned on line 146, but not defined

  == Unused Reference: 'KEYWORDS' is defined on line 391, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1776' is defined on line 394, but no explicit
     reference was found in the text

  == Unused Reference: 'TRUTHS' is defined on line 397, but no explicit
     reference was found in the text

  == Unused Reference: '1' is defined on line 402, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 406, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 410, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 414, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 419, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 423, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 427, but no explicit reference
     was found in the text

  == Unused Reference: 'EVILBIT' is defined on line 430, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5513' is defined on line 433, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5514' is defined on line 436, but no explicit
     reference was found in the text

     Summary: 1 error (**), 0 flaws (~~), 15 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    RTGWG Working Group                                        Shankar Raman
3    Internet-Draft                                Balaji Venkat Venkataswami
4    Intended Status: Experimental RFC                           Gaurav Raina
5                                                                 Vasan Srini
6    Expires: September 2012                                     I.I.T Madras
7                                                              March 27, 2012

9                     Building power optimal Multicast Trees
10                      draft-mjsraman-rtgwg-pim-power-02

12   Abstract

14      Power consumption in multicast replication operations is an area of
15      concern, and choosing suitable replication points that decrease
16      overall power consumption is therefore important. Multicast replication
17      capacity is an attribute of every linecard of major routers and
18      multi-layer switches that support multicast in the core of an
19      Internet Service Provider (ISP) or an enterprise network.

21      Currently, multicast replication points on Point-to-Multipoint
22      Multicast Distribution Trees consume power while delivering multiple
23      output streams of data from a given input stream. The multicast
24      distribution trees are constructed without any regard for the
25      placement of the replication points or the resulting power
26      consumption at these points.

28      This results in overloading certain routers while under-utilizing
29      others. An optimal usage of these replication resources could reduce
30      power consumption on these routers, bringing it closer to
31      optimal. In this document, we propose a mechanism by which Multicast
32      Distribution Trees are constructed for carrying multicast traffic
33      across multiple routers within a given network. We propose that these
34      Multicast Distribution Trees be built by using the information
35      pertaining to the power to replication capacity ratio available from
36      fine-grained components such as multicast-capable linecards of routers
37      and multi-layer switches deployed within a network.

39   Status of this Memo

41      This Internet-Draft is submitted to IETF in full conformance with the
42      provisions of BCP 78 and BCP 79.

44      Internet-Drafts are working documents of the Internet Engineering
45      Task Force (IETF), its areas, and its working groups.  Note that
46      other groups may also distribute working documents as
47      Internet-Drafts.

49      Internet-Drafts are draft documents valid for a maximum of six months
50      and may be updated, replaced, or obsoleted by other documents at any
51      time.  It is inappropriate to use Internet-Drafts as reference
52      material or to cite them other than as "work in progress."

54      The list of current Internet-Drafts can be accessed at
55      http://www.ietf.org/1id-abstracts.html

57      The list of Internet-Draft Shadow Directories can be accessed at
58      http://www.ietf.org/shadow.html

60   Copyright and License Notice

62      Copyright (c) 2012 IETF Trust and the persons identified as the
63      document authors. All rights reserved.

65      This document is subject to BCP 78 and the IETF Trust's Legal
66      Provisions Relating to IETF Documents
67      (http://trustee.ietf.org/license-info) in effect on the date of
68      publication of this document. Please review these documents
69      carefully, as they describe your rights and restrictions with respect
70      to this document. Code Components extracted from this document must
71      include Simplified BSD License text as described in Section 4.e of
72      the Trust Legal Provisions and are provided without warranty as
73      described in the Simplified BSD License.

75   Table of Contents

77      1  Introduction
78        1.1  Terminology
79      2  Methodology of the proposal
80        2.1  Discussion of this scheme
81        2.2  Pseudo code for the proposed changes
82        2.3  Port Choice on same Linecard
83      3  Conclusion
84      3  Security Considerations
85      4  IANA Considerations
86      5  References
87        5.1  Normative References
88        5.2  Informative References
89      Authors' Addresses

91   1  Introduction

93      Multicast traffic across multiple areas within a given network, such
94      as an ISP or a campus environment network, may be carried using
95      Multicast Distribution Trees. The traffic may be carried from an
96      ingress router to several egress routers, for example in a campus
97      environment network. The network under consideration may comprise
98      multiple areas involving a backbone area and several non-backbone
99      areas connected to each other through the backbone. If several such
100     multicast streams are to be carried in the network, it would be most
101     useful to have the Multicast Distribution Trees constructed such
102     that they have optimal power to available replication capacity ratios
103     on the routers' linecards that they traverse from source to
104     destinations. The intent is to provide a solution whereby several
105     such Distribution Trees can be laid out in such a way that the set of
106     routers that replicate multicast traffic traversed by the trees is
107     optimal in its utilization of the power provided to it, given
108     that there is sufficient replication capacity available. This, we
109     believe, would lead to an equilibrium of power to available
110     replication capacity ratios amongst all routers in the topology, which
111     in turn would optimize and reduce the overall ratios for the network.

113     Each router and its respective linecards deployed in the network have
114     an advertised capability for replication. Most multi-layer switches
115     and routers from vendors advertise in their respective data sheets a
116     certain capability for replication for each type of linecard
117     deployable on the box. Replication consumes power and delivers
118     multiple streams of data from a given input stream. At present,
119     Point-to-Multipoint (P2MP) trees are constructed without taking
120     into account the power to available replication capacity ratios of
121     such routers, thus overloading certain routers while underutilizing
122     the others. An optimal usage of these resources could reduce power
123     consumption on these routers and multi-layer switches. This equilibrium
124     could be arrived at by letting each downstream Protocol Independent
125     Multicast (PIM) router choose the most power optimal path to its PIM
126     upstream neighbor (selected through current mechanisms) in the PIM-based
127     Multicast Distribution Tree, which may be a shared tree or a Shortest
128     Path Tree as the case may be. The metric used to select the path to the
129     upstream PIM neighbor could be the power to available replication
130     capacity ratio of each of the said router's linecards that are part of
131     the Equal-Cost Multipath (ECMP) set of paths to the upstream neighbor,
132     if such ECMP paths exist. The metric comparison is done for all ECMP
133     paths and the linecards involved therein, depending on the current
134     utilization of their replication capacity and their power consumption.

136     This document is organized as follows. In Section 2, we describe the
137     scheme that we propose. In Section 2.1, we discuss some examples of
138     the scheme at work, and in Section 3 we conclude with further areas
139     of study that may be useful to undertake.

141  1.1  Terminology

143     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
144     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
145     document are to be interpreted as described in RFC 2119 [RFC2119].

147  2  Methodology of the proposal

149     The key metric under consideration is the power consumed divided by
150     the available replication capacity on each of the linecards of a router
151     in the network whose constituent ports form part of an ECMP set of
152     paths to a PIM upstream neighbor. The ports on the different
153     linecards that form the ECMP set of links are eligible to be used as
154     a linecard:port atop which multicast traffic on that tree can be
155     carried. When choosing the path from an ECMP set of paths to a PIM
156     upstream neighbor, the downstream PIM neighbor calculates the
157     power to multicast replication capacity ratio for each of the
158     linecards that are eligible to be chosen as the linecard:port combination
159     to be used in that section of the distribution tree. The lowest ratio
160     decides which linecard is chosen, and if there exist multiple ports
161     within that linecard that connect to the PIM upstream neighbor,
162     the usual algorithm is used to select one of those ports. The key
163     proposal that this document recommends is the use of the power to
164     multicast replication capacity ratio to choose from among the
165     different linecards. The choice of port is left to the standard
166     method.
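
The selection rule described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the proposal: the Linecard type, its field names, and the sample power and capacity values are hypothetical, and treating a linecard with no remaining capacity as having an infinite ratio is an assumption made here to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linecard:
    """Hypothetical per-linecard readings; field names are illustrative."""
    name: str
    power_watts: float          # power currently consumed by the linecard
    available_capacity: float   # unused multicast replication capacity


def pwr(lc: Linecard) -> float:
    """PWR = power consumed / available replication capacity."""
    if lc.available_capacity <= 0:
        # Assumption: a linecard with no capacity left is never preferred.
        return float("inf")
    return lc.power_watts / lc.available_capacity


def choose_linecard(ecmp_linecards: list[Linecard]) -> Linecard:
    """Return the ECMP candidate linecard with the lowest PWR ratio."""
    return min(ecmp_linecards, key=pwr)


# Made-up readings for two linecards in the same ECMP set.
candidates = [
    Linecard("LC1", power_watts=300.0, available_capacity=10.0),  # PWR = 30
    Linecard("LC2", power_watts=240.0, available_capacity=6.0),   # PWR = 40
]
print(choose_linecard(candidates).name)
```

With these made-up readings LC1 is selected even though it draws more power in absolute terms, because it has more replication capacity left per watt consumed.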

168     Assume that the following router topology in the vicinity of the
169     sender(s) is computed.

171               +----------------+
172              /                 V
173             /          +----> (R2) ------> (R3)<--(RcvrB)
174            /          /        (cost 15)\         |
175           /          /                   \ +------+
176          /          /            (cost 6) \V
177     (source/s)--->(R1)------> (R5) ------> (R4)<-(RcvrA)
178                    \            ^ (cost 10)  /\         /
179                     \   (cost 4)| ----------+  \       /
180                      \          | /             \     /
181                       (R6)----> (R7) --------> (R8)<-+

183     Figure 1: Topology within a given network with an upstream ECMP link
184     from R4 to R7
185     In the above diagram, the source(s) are
186     connected using multihomed connections to the same ISP through
187     routers R1 and R2. Similarly, there are two receiver sites, RcvrA and
188     RcvrB, each multihomed to two routers: RcvrB to R3 and R4, and
189     RcvrA to R4 and R8. Note also that R4 is
190     connected to R7 through multiple paths. Assuming that both these
191     paths are of equal cost, this gives rise to a situation where ECMP
192     paths exist from the PIM downstream router R4 to the PIM upstream
193     router R7.

195     Consider that RcvrA sends an IGMP join to R4. R4 now needs to send a
196     PIM join towards the upstream router R7. Assume this is a shared tree
197     with R7 as the Rendezvous Point (RP). There are 2 equal cost paths to R7
198     from R4, each with cost 10 ((R7->R5->R4 = 4 + 6 = 10) and (R7->R4 =
199     10)). Assume that these paths, from R4 through R5 to R7 and from
200     R4 to R7 directly, are on different linecards in the chassis R4.
201     Normally one of them would be chosen, and the power to multicast
202     replication capacity ratio would not be a consideration in that decision.
203     What this document proposes is that R4 consider the metric PWR, which
204     is a ratio formed by dividing the power consumed on each of the
205     linecards by their respective available multicast replication capacity.

207     One of the two paths has to be chosen. In the metric
208     comparison, the linecard that has the lower PWR metric wins and is
209     selected to send the PIM join to R7 (the PIM upstream
210     neighbor and in this case the RP as well).

212               +----------------+
213              /                 V
214             /          +----> (R2) ------> (R3)<--(RcvrB)
215            /          /                 \         |
216           /          /                   \ +------+
217          /          /                     \V
218     (source/s)...>(R1)------> (R5) ------> (R4)<-(RcvrA)
219                    .            ^            .\         /
220                     .           | ...........  \       /
221                      .          | .             \     /
222                       (R6)....> (R7) ........> (R8)<-+

224     Legend: dotted lines represent the path computed.

226     Figure 2: Instantiating an optimal power consuming distribution tree

228     In our example, as shown in Figure 2, the direct link between R4 and
229     R7 wins out as the link to be used in the distribution tree.

231     The one exception that SHOULD be considered in this decision is that
232     if the Outgoing Interface (OIF) list consists of ports on linecard X on
233     which R4's downstream PIM neighbors have sent their respective PIM
234     joins, and if the ECMP set of paths to the router R7 consists of
235     linecards X and Y, it would be preferable to choose linecard X without
236     taking into consideration the PWR metric. This is because,
237     if the majority of the OIF list's port members lie on linecard
238     X and the ingress port were also placed on linecard X, the
239     replication would be more efficient, as traffic would not have to
240     traverse, say, the switch fabric to get to the majority of the OIF list.
241     Other localization conditions could also be considered as exceptions to
242     the PWR metric based rule.

244     This document assumes that the power used by each linecard and the
245     multicast replication utilization and advertised capacity are
246     available as data readable from the hardware on the router chassis
247     under consideration. Please note that unicast traffic already being
248     carried on the linecard may also contribute to the power being
249     consumed at the router's linecards under consideration.
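
The OIF localization exception described above could be combined with the PWR comparison roughly as follows. This Python sketch rests on assumptions not stated in the draft: "majority" is interpreted as a strict majority of OIF ports, and the function names, the (linecard, port) tuple layout, and the dictionary of precomputed PWR ratios are all hypothetical.

```python
from collections import Counter


def localized_linecard(oif_ports, ecmp_linecards):
    """Return the linecard holding a strict majority of the OIF ports,
    provided it is also an ECMP candidate; otherwise return None.

    oif_ports: list of (linecard, port) tuples from the OIF list.
    ecmp_linecards: set of linecards with a port in the ECMP set.
    """
    if not oif_ports:
        return None
    counts = Counter(lc for lc, _port in oif_ports)
    linecard, hits = counts.most_common(1)[0]
    # Assumption: "majority" means more than half of the OIF ports.
    if hits * 2 > len(oif_ports) and linecard in ecmp_linecards:
        return linecard
    return None


def select_upstream_linecard(oif_ports, pwr_by_linecard):
    """pwr_by_linecard maps each ECMP candidate linecard to its PWR ratio."""
    local = localized_linecard(oif_ports, set(pwr_by_linecard))
    if local is not None:
        return local                     # localization overrides PWR
    return min(pwr_by_linecard, key=pwr_by_linecard.get)


oif = [("X", 1), ("X", 2), ("Y", 1)]
ratios = {"X": 40.0, "Y": 30.0}          # made-up PWR values
print(select_upstream_linecard(oif, ratios))
```

In this example linecard X is chosen despite its higher PWR ratio, because two of the three OIF ports sit on X. When no linecard satisfies the localization condition, the function falls back to the lowest PWR ratio.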

251     If ECMP paths do not exist, there is no choice to make; hence the
252     default selection of the link to be used to send a PIM join to the
253     upstream neighbor is followed.

255     As a result of this decision to include the PWR metric, the paths in
256     the tree where ECMP links occur have the least power to available
257     replication capacity ratios at the time of computation.

259     Assume the following path is computed as per the least power to
260     available replication capacity ratios. Paths are computed through R6,
261     R7, R8, R4, and say the multicast stream occupies 4GB of traffic
262     along the tree so constructed, so that the available capacity of these
263     routers reduces to 6GB, assuming all of them have a base capacity of
264     10GB. Subsequent paths constructed would have to take into account
265     the newly computed power to current replication capacity ratio in the
266     topology for multicast streams / trees yet to come. The linecard
267     connecting R4 to R7 directly will have its capacity reduced by 4GB,
268     leaving 6GB as its available capacity.

270     Assume another 6GB worth of traffic is loaded onto this topology in
271     the form of one or more multicast streams. The new path
272     computed for these new streams would possibly NOT utilize the same
273     path as computed before, since the power utilization and the available
274     replication capacity would have changed to create a higher PWR
275     ratio. If the old streams reduce the replication capacity to an
276     extent such that routers through which they pass can no longer be
277     used, since these routers' power to available replication capacity ratio
278     has become poor when compared to other paths, then a different path may
279     be computed from the ingress router to the egress router in such a way
280     as to avoid those routers which have such poor ratios. This again
281     applies only in ECMP sections of the distribution tree.
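
The effect of the first stream on the next choice can be worked through numerically. The draft gives capacities (10GB base, 4GB consumed) but no power figures, so the wattages below are purely hypothetical, as are the linecard labels; power is also held constant as load is added, which is a simplifying assumption.

```python
BASE_CAPACITY_GB = 10.0   # base replication capacity from the example above

# Hypothetical power draw in watts; the draft gives no power figures.
power = {"direct_R4_R7": 180.0, "via_R5": 200.0}
available = {"direct_R4_R7": BASE_CAPACITY_GB, "via_R5": BASE_CAPACITY_GB}


def pwr_ratios():
    """PWR per linecard = power consumed / available replication capacity."""
    return {lc: power[lc] / available[lc] for lc in power}


def best_linecard():
    ratios = pwr_ratios()
    return min(ratios, key=ratios.get)


first = best_linecard()        # ratios: 18.0 (direct) vs 20.0 (via R5)
available[first] -= 4.0        # the first stream takes 4GB on that linecard
second = best_linecard()       # ratios: 30.0 (direct) vs 20.0 (via R5)
print(first, second)
```

Under these assumed numbers the direct R4-R7 linecard wins the first comparison, and the linecard towards R5 wins the second, because the direct linecard's available capacity has dropped from 10GB to 6GB.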

283               +----------------+
284              /                 V
285             /          +----> (R2)-------> (R3)<--(RcvrB)
286            /          /                 \         |
287           /          /                   \ +------+
288          /          /                      V
289     (source/s)...>(R1)------> (R5) ......> (R4)<-(RcvrA)
290                    .            .            /\         /
291                     .           . ----------+  \       /
292                      .          . /             \     /
293                       (R6)....> (R7) ........> (R8)<-+

295     Legend: dotted lines represent the path computed.

297     Figure 3: Instantiating a subsequent optimal power consuming
298     distribution tree

300     Here R4 would now have to choose the path to R7 (which is also the
301     RP) through R5, since the PWR metric on the direct R4 to R7 link would
302     have increased as a result of carrying the old stream.

304     Dynamism in multicast trees is another important point to consider, as
305     PIM prunes and other PIM joins may happen with respect to the
306     replication point under consideration. Suitable modifications to the
307     algorithm may be proposed to take into consideration such dynamic
308     conditions without causing major interruption to the multicast flows.

310  2.1  Discussion of this scheme

312     This scheme applies to PIM-SM and PIM-SSM. Applicability to PIM-Bidir
313     is also possible but is not discussed in detail in this document.

315     Routers may have step levels in which they increase power consumption
316     as they are loaded with additional high-bandwidth
317     multicast streams. Calibrating these levels may be useful for
318     implementing this scheme. It is possible that such calibrated
319     thresholds can be used for calculating the power to available
320     replication capacity ratios in multicast environments. This would
321     be useful for bringing down the frequency with which a linecard's
322     ratios are calculated. When power consumption stays within a
323     certain given interval, these ratios need not be recalculated even if
324     further multicast streams are added to it. The incentive is to
325     recognize a linecard that does not drastically change power
326     consumption even if large bandwidth streams are added onto it for
327     replication, and thus give it credit for its power optimal
328     functioning. If a linecard on a router tends to consume the highest
329     level of power even when carrying and replicating a small number of
330     multicast streams, it would automatically have a
331     poor ratio when compared to a linecard that efficiently uses power
332     relative to the replication capacity being used. The best case
333     would be a low power consuming linecard, or a router filled with such
334     linecards, that does not leave its power interval no matter how much
335     replication capacity is used on it. That is an
336     idealized scenario, but it is one towards which router manufacturers
337     should work.

339  2.2  Pseudo code for the proposed changes

341     If (there exist ECMP paths to a PIM upstream NBR)
342        AND (No localized conditions exist)
343     then
344        Calculate PWR ratio for each LC;
345        PWR per LC = power consumed by LC /
346                        AvailableMCastReplicCap;
347        Choose the Lowest PWR;
348        Select that LC for the link to send PIM Join;
349     Endif

351  2.3  Port Choice on same Linecard

353     In case the set of ECMP links to the upstream PIM NBR contains
354     ports from the same linecard and a tie-breaking mechanism is
355     required amongst these ports, the following changes are recommended.

357     If (there exist ports on the same linecard which
358         constitute ECMP paths to a PIM upstream NBR)
359        AND (No localized conditions exist)
360     then
361        Choose the Lowest Utilized port;
362        Select that port in LC for the link to send PIM Join;
363     Endif

365  3  Conclusion

367     Here we propose a scheme that takes into account the power to
368     available replication capacity ratios as weights for the edges which
369     are the ECMP set of paths to a PIM upstream neighbor, and computes a
370     low power cost path for multicast replication. This is an area of
371     future study which would be most conducive to bringing about
372     optimal power usage and thus incentivising vendors to manufacture low
373     power consuming equipment. Compelled to bring about radical change in
374     the thinking relating to power consumption, vendors manufacturing
375     networking equipment will drive down power consumption, since the
376     scheme proposed chooses or gives priority to linecards that consume
377     less power.

379  3  Security Considerations

381     None.

383  4  IANA Considerations

385     None.

387  5  References

389  5.1  Normative References

391     [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
392                Requirement Levels", BCP 14, RFC 2119, March 1997.

394     [RFC1776]  Crocker, S., "The Address is the Message", RFC 1776, April
395                1 1995.

397     [TRUTHS]   Callon, R., "The Twelve Networking Truths", RFC 1925,
398                April 1 1996.

400  5.2  Informative References

402     [1]        G. Appenzeller, Sizing router buffers, Doctoral
403                Thesis, Department of Electrical Engineering, Stanford
404                University, 2005.

406     [2]        A. P. Bianzino, C. Chaudet, D. Rossi and J. L.
407                Rougier, A survey of green networking research, IEEE
408                Communications and Surveys Tutorials, preprint.

410     [3]        J. Baliga, K. Hinton and R. S. Tucker, Energy
411                consumption of the internet, Proc. of joint international
412                conference on optical internet, June 2007, pp. 1993.

414     [4]        J. Chabarek, J. Sommers, P. Barford, C. Estan, D.
415                Tsiang and S. Wright, Power awareness in network design
416                and routing, Proc. of the IEEE INFOCOM 2008, April 2008,
417                pp. 457-465.

419     [5]        M. Xia et al., Greening the optical backbone network:
420                A traffic engineering approach, IEEE ICC Proceedings, May
421                2010, pp. 1995.

423     [6]        W. Lu and S. Sahni, Low-power TCAMs for very large
424                forwarding tables, IEEE/ACM Transactions on Computer
425                Networks, June 2010, vol. 18, no. 3, pp. 948-959.

427     [7]        B. Zhang, Routing Area Open Meeting, Proceedings of
428                the IETF 81, Quebec, Canada, July 2011.

430     [EVILBIT]  Bellovin, S., "The Security Flag in the IPv4 Header",
431                RFC 3514, April 1 2003.

433     [RFC5513]  Farrel, A., "IANA Considerations for Three Letter
434                Acronyms", RFC 5513, April 1 2009.

436     [RFC5514]  Vyncke, E., "IPv6 over Social Networks", RFC 5514, April 1
437                2009.

439  Authors' Addresses

441     Shankar Raman
442     Department of Computer Science and Engineering,
443     I.I.T Madras,
444     Chennai - 600036,
445     TamilNadu,
446     India.

448     EMail: mjsraman@cse.iitm.ac.in

450     Balaji Venkat Venkataswami
451     Department of Electrical Engineering,
452     I.I.T Madras,
453     Chennai - 600036,
454     TamilNadu,
455     India.

457     EMail: balajivenkat299@gmail.com

459     Prof. Gaurav Raina
460     Department of Electrical Engineering,
461     I.I.T Madras,
462     Chennai - 600036,
463     TamilNadu,
464     India.

466     EMail: gaurav@ee.iitm.ac.in

468     Vasan Srini
469     Department of Electrical Engineering,
470     I.I.T Madras,
471     Chennai - 600036,
472     TamilNadu,
473     India.

475     EMail: vasan.vs@gmail.com