idnits 2.17.1

draft-mjsraman-pce-power-replic-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 150 has weird spacing: '... stream in su...'

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (February 18, 2013) is 4077 days in the past.  Is this
     intentional?

  Checking references for intended status: None
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: 'RFC2119' on line 139

  -- Looks like a reference, but probably isn't: 'RFC3630' on line 402

  -- Looks like a reference, but probably isn't: 'RFC 3630' on line 376

  == Unused Reference: '1' is defined on line 460, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 464, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 468, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 472, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 477, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 481, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 485, but no explicit reference
     was found in the text

     Summary: 0 errors (**), 0 flaws (~~), 10 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

PCE Working Group                                          Shankar Raman
Internet-Draft                                Balaji Venkat Venkataswami
Intended Status: Experimental RFC                           Gaurav Raina
Expires: August 2013                                          IIT Madras
                                                       February 18, 2013


          Constructing power optimal P2MP TE-LSPs within an AS
                   draft-mjsraman-pce-power-replic-02

Abstract

   Power consumption in multicast replication operations is an area of
   concern and choosing suitable replication points that can decrease
   power consumption overall assumes importance.  Multicast replication
   capacity is an attribute of every line card of major routers and
   multi-layer switches that support multicast in the core of an
   Internet Service Provider (ISP) or an enterprise network.

   Currently multicast replication points on Point-to-Multipoint Traffic
   Engineering Label-Switched-Paths (P2MP TE-LSPs) consume power while
   delivering multiple output streams of data from a given input stream.
   The multicast distribution trees are constructed without any regard
   for a proper placement of the replication points and consequent
   optimal power consumption at these points.

   This results in overloading certain routers while under-utilizing
   others.  An optimal usage of these replication resources could
   substantially reduce power consumption on these routers.  In this
   paper, we propose a mechanism by which P2MP TE-LSPs are constructed
   for carrying multicast traffic across multiple areas within a given
   AS.  We propose that these LSPs be built by using the advertisements
   of the power-replication capacity ratio advertised by fine grained
   components such as multicast capable line-cards of routers and
   multi-layer switches deployed within an AS.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1 Introduction
     1.1 Terminology
   2. Methodology of the proposal
     2.1 Discussion of this scheme
     2.2 Power to available multicast replication capacity ratio in
         a TLV
   3 Conclusion
   3 Security Considerations
   4 IANA Considerations
   5 References
     5.1 Normative References
     5.2 Informative References
   Authors' Addresses

1 Introduction

   Multicast traffic across multiple areas within a given AS may be
   carried using P2MP TE-LSPs.  The traffic may be carried from an
   ingress Provider Edge (PE) router to several egress PEs, for example
   in a multicast Virtual Private Network (MVPN).  The autonomous system
   (AS) may comprise multiple areas, with a backbone area and several
   non-backbone areas connected to each other through the backbone.  If
   several such multicast streams are to be carried in the AS, it would
   be most useful to construct the P2MP TE-LSPs such that they have
   optimal power to available replication capacity ratios on the
   routers' linecards that they traverse from source to destinations.
   The intent is to provide a solution whereby several such P2MP TE-LSPs
   can be laid out in such a way that the routers that replicate the
   multicast traffic carried by the P2MP TE-LSPs make optimal use of the
   power provided to them, given that sufficient replication capacity is
   available.  We believe this would lead to an equilibrium of power to
   available replication capacity ratios amongst all routers in the
   topology, which in turn would reduce the overall ratios for the AS.

   Each router and its respective linecards deployed in the AS have an
   advertised capability for replication.  Most multi-layer switches and
   routers from vendors advertise in their respective data sheets a
   certain capability for replication for each type of linecard
   deployable on the box.  Replication consumes power and delivers
   multiple streams of data from a given input stream.  At present, P2MP
   (Point-to-Multipoint) Label Switched Paths are constructed without
   taking into account the power to available replication capacity
   ratios of such routers, thus overloading certain routers while
   underutilizing others.  An optimal usage of these resources could
   reduce power consumption on these routers and multi-layer switches.
   This equilibrium could be arrived at by having each router advertise
   a Traffic Engineering Database Link State Advertisement (TED-LSA)
   that carries the power to available replication capacity ratio of
   each of the router's line cards, depending on the current utilization
   of its replication capacity and power consumption.

   This paper is organized as follows.  In Section 2, we describe the
   scheme that we propose.  In Section 2.1, we discuss some examples of
   the scheme at work, and in Section 3 we conclude with future areas of
   study that may be useful to undertake.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Methodology of the proposal

   The key metric under consideration is the power consumed DIVIDED BY
   the available replication capacity on each of the linecards of a
   router in the AS that is eligible to be used as a node over which
   multicast traffic can be carried.  Once an advertisement of this
   metric has been sent in the regular flooding process of a link-state
   routing protocol such as OSPF-TE or ISIS-TE, the head-end router of a
   P2MP TE-LSP can compute the TE-LSP through the AS from the ingress PE
   to all egress PEs of that multicast stream in such a way that the
   power to available replication capacity ratios at the replication
   points are minimal on that path.  The Constrained Shortest Path First
   (CSPF) algorithm could be modified to compute the path with the least
   power to available replication capacity ratio and thus shift the
   network toward equilibrium.  This path would be supplied to the
   RSVP-TE component of the head-end, which would set up the path with
   appropriate labels.  Once RSVP-TE establishes the path and traffic is
   carried across it, the reduced replication capacity of the routers on
   the P2MP TE-LSP path would be re-advertised, and subsequent path
   computations would use the updated values from the instant the
   replication capacity changed on these routers.
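
   The computation described above can be illustrated with a minimal
   sketch.  It is not the modified CSPF itself: it assumes a plain
   Dijkstra run over a directed graph whose link weights stand for the
   advertised power to available replication capacity ratios, and it
   builds the P2MP tree as the union of the least-ratio paths from the
   ingress to each egress.  The graph, the ratio values, the router
   names, and the function names least_ratio_paths and p2mp_tree are
   all hypothetical.

```python
import heapq

def least_ratio_paths(graph, source):
    """Dijkstra over link weights that stand for advertised ratios."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, ratio in graph.get(u, {}).items():
            nd = d + ratio
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def p2mp_tree(graph, ingress, egresses):
    """Union of least-ratio paths from the ingress to every egress."""
    _, prev = least_ratio_paths(graph, ingress)
    edges = set()
    for egress in egresses:
        node = egress
        while node != ingress:
            parent = prev[node]  # KeyError if the egress is unreachable
            edges.add((parent, node))
            node = parent
    return edges

# Hypothetical per-link ratios (power / available replication capacity).
graph = {
    "R1": {"R2": 0.9, "R5": 0.7, "R6": 0.2},
    "R2": {"R3": 0.8},
    "R3": {"R4": 0.8},
    "R5": {"R4": 0.8},
    "R6": {"R7": 0.2},
    "R7": {"R8": 0.2, "R4": 0.3},
}
tree = p2mp_tree(graph, "R1", ["R4", "R8"])
```

   With these made-up weights the tree branches at R7.  Summing ratios
   along a path is only one possible cost model; a real implementation
   would also have to honour the usual CSPF constraints such as
   bandwidth.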

   Assume that the following router topology in the vicinity of the
   sender / senders is computed.

             +----------------+
            /                 V
           /         +----> (R2) ------> (R3)<--(RcvrB)
          /         /                   \         |
         /         /                     \ +------+
        /         /                       \V
   (source/s)--->(R1)------> (R5) ------> (R4)<-(RcvrA)
                 \                        /\          /
                  \            ----------+  \        /
                   \          /              \      /
                    (R6)----> (R7) --------> (R8)<-+

   Figure 1: Topology within a given AS with coloring for
             Power-replication ratios

   In the above diagram the source/sources are multihomed to the same
   ISP through Routers R1 and R2.  Similarly, there are two Receiver
   sites, RcvrA and RcvrB, each multihomed to TWO Routers: RcvrB to R3
   and R4, and RcvrA to R4 and R8.

             +----------------+
            /                 V
           /         +----> (R2) ------> (R3)<--(RcvrB)
          /         /                   \         |
         /         /                     \ +------+
        /         /                       \V
   (source/s)...>(R1)------> (R5) ------> (R4)<-(RcvrA)
                 .                        .\          /
                  .            ...........  \        /
                   .          .              \      /
                    (R6)....> (R7) ........> (R8)<-+

   Legend : dotted lines represent path computed.

   Figure 2: Instantiating an optimal power consuming distribution tree

   Given that the path calculation engine at the head-end R1 is given
   this topology and, along with other TED-LSA packets, the current
   power to available replication capacity ratios advertised through the
   IGP-TE extensions, the paths with the least power to available
   replication capacity ratios are computed and set up from the head-end
   PE to the tail-end PEs where the receivers are connected.  It is to
   be noted that the ratios computed for power to available replication
   capacity on the topology are examined and the replication points are
   set up on those routers that have the least power to available
   replication capacity ratio.  Where a branching point is not required,
   the non-branching hop is likewise placed on the router with the next
   best (least cost) power ratio.

   Assume the following path is computed as per the least power to
   available replication capacity ratios.  Paths are computed through
   R6, R7, R8, and R4, and say the multicast stream occupies 4GB of
   traffic along the tree so constructed, so that the available capacity
   of these routers reduces to 6GB, assuming all of them have a base
   capacity of 10GB.  Subsequent path computations would have to take
   into account the newly computed power to current replication capacity
   ratios in the topology when constructing new P2MP TE-LSPs for
   multicast streams yet to come.

   Assume another 6GB worth of traffic is loaded onto this topology as
   one or more multicast streams; the new path computed for these
   streams could possibly use the same path as computed before.  If,
   however, the old streams have reduced the replication capacity to
   such an extent that the routers through which they pass can no longer
   be used, because these routers' power to available replication
   capacity ratios have become poor compared to other paths, then a
   different path may be computed from the ingress PE to the egress PEs
   so as to avoid the routers with such poor ratios.

   For example, assume R6, R7, R8 and R4 have exhausted their capacity,
   or guzzle more power as a result of carrying the 4GB stream that was
   originally placed atop them.  Then a different path would be chosen
   as follows.  The path followed, as shown in Figure 3, is R2, R3 and
   R4.  Given that R4 is the only choice since it has connectivity to
   both Receivers, in this case the branch point is placed atop R3, with
   one branch to reach RcvrB and the other to reach RcvrA through R4.
   Policy decisions could guide the placement in case of a tie.  Here
   the only choice has been to drive the end replication to RcvrA
   through R4 and to RcvrB through R3, owing to topology constraints.

   It is to be noted that the power consumed by the linecard is divided
   by the available replication capacity to arrive at a ratio, and that
   ratio is assigned as a weight to all of the links ingressing on that
   linecard.  Alternatively, one might take a weighted average, dividing
   a weighted coefficient sum by the weighted sum of the ingress links
   on a linecard, and use the result as the metric for calculation.

             ..................
            .                 V
           .         +----> (R2)........> (R3)<--(RcvrB)
          .         /                    .         |
         .         /                      . +------+
        .         /                        .V
   (source/s)--->(R1)-------> (R5) -------> (R4)<-(RcvrA)
                 \                          /\       /
                  \              ----------+  \     /
                   \            /              \   /
                    (R6)-----> (R7) -------> (R8)<+

   Legend : dotted lines represent path computed.

   Figure 3: Instantiating a subsequent optimal power consuming
             distribution tree

2.1 Discussion of this scheme

   It is to be noted that our scheme applies to centralized schemes of
   path calculation.  What is being calculated is a tree of nodes that
   form a P2MP tree, where each node can be conceptualized as a router
   (read also multi-layer switch) and each edge as the link connecting
   one or more ports on a line card to another linecard on a downstream
   router, carrying multicast traffic from a source located at the
   head-end ingress router to several receiver nodes connected to egress
   routers.  We will call this calculated tree a P2MP tree.  The tree is
   calculated by the PCE in the head-end / ingress router through which
   sources connect.  The PCE calculates the intra-AS P2MP path (the
   literal P2MP TE-LSP within the AS).
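
   As a small illustration of the weight assignment described in
   Section 2, the following sketch divides a linecard's power draw by
   its available replication capacity and assigns the result to every
   link that ingresses on that linecard.  The Linecard type, its field
   names, the units, the constant power figure, and the sample numbers
   are assumptions made for the example; this document does not define
   them.

```python
from dataclasses import dataclass, field

@dataclass
class Linecard:
    name: str
    power: float                   # power drawn by the card (unit assumed)
    capacity_total: float          # replication capacity (unit assumed)
    capacity_used: float = 0.0
    ingress_links: list = field(default_factory=list)

    def ratio(self):
        """Power consumed divided by available replication capacity."""
        available = self.capacity_total - self.capacity_used
        if available <= 0:
            return float("inf")    # exhausted card: never preferred
        return self.power / available

def link_weights(linecards):
    """Every ingress link on a card gets that card's ratio as its weight."""
    return {link: card.ratio()
            for card in linecards
            for link in card.ingress_links}

# Hypothetical card with a base capacity of 10 and a constant power of 300.
card = Linecard("R7/slot1", power=300.0, capacity_total=10.0,
                ingress_links=["R6->R7"])
before = link_weights([card])      # 300 / 10 = 30.0
card.capacity_used = 4.0           # a stream of size 4 is placed on the card
after = link_weights([card])       # 300 / 6 = 50.0
```

   Holding power constant while capacity is consumed is a simplification
   chosen to keep the arithmetic visible; in practice the power figure
   would be re-read from the card before the ratio is re-advertised.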

   The calculated power to available replication capacity ratio is
   assigned to each of the ingress links on a linecard of a router en
   route to the egress links through which the multicast stream is
   replicated on the same router.  Thus all ingress links to a router
   through a linecard are assigned the same metric, namely the power
   ratio so calculated.  The egress links would in turn connect to a
   unicast tunnel or to another branch point in the tunnel towards the
   receivers, which are represented by the egress routers.  The egress
   routers would in turn be replication points or direct connections to
   the actual receivers.  This method could be applied to multicast
   traffic transported through MVPNs.  The method of egress routers'
   discovery is left to existing mechanisms.  The primary inputs to the
   proposed scheme are an ingress router and its respective egress
   routers.  The other input to the construction of the P2MP tree is the
   router-level topology with the metrics for the power to available
   replication capacity ratio.

   It is to be noted that this CSPF calculation can be hastened in terms
   of time complexity by dividing the weights into equivalence classes.
   First we divide the nodes into graph colored nodes, with the least
   ratio nodes marked as green as shown in the figure, and given that
   there exists a path that is all green from source to egress PEs, one
   such path is chosen.  If after coloring the nodes the all-green graph
   is disconnected, we incrementally add the next best colored nodes to
   the graph to see if we get a connected path from source to egresses.
   These steps are repeated until we find a connected path.  This will
   hasten the algorithm to a conclusion rather than using a brute force
   method, which may take an inordinate amount of time.  R4 being used
   in the 6GB case is an example of this.  Because of topology
   restrictions the R4 node had to be chosen in spite of the fact that
   it is not green after carrying the 4GB stream.

   Routers may have step levels in which they increase power consumption
   as they are loaded with additional high-bandwidth multicast streams.
   Calibrating these levels may be useful for implementing this scheme.
   Such calibrated thresholds could be used for advertising the power to
   available replication capacity ratios in the IGP-TE advertisements.
   This would help bring down the frequency of updates or advertisements
   from a line-card about its ratios.  When power consumption stays
   within a given interval, these ratios need not be readvertised even
   if further multicast streams are added to the line-card.  The
   incentive is to recognize a linecard that does not drastically change
   power consumption even when large bandwidth streams are added onto it
   for replication, and thus give it credit for its power optimal
   functioning.  If a router tends to consume the highest level of power
   even when carrying and replicating a small number of multicast
   streams on its line card, it would automatically have a poor ratio
   when compared to a router that uses power efficiently for the
   replication capacity being used.  The best case would be a low power
   consuming line-card, or a router filled with such line cards, that
   does not leave its power interval no matter how much replication
   capacity is used on it.  That is an ideal condition, but it is one
   that router manufacturers should work towards.

   It is possible that several multicast streams may be aggregated onto
   a single P2MP-TE-LSP representing the given multicast tree that
   encompasses the union of all the egress PEs of the several multicast
   streams.  The Ingress PE router is however common for all the
   multicast streams so covered.  Aggregation of several multicast
   streams from a given Ingress PE to several egress PEs is commonly
   done to reduce the amount of state in the core of the network.  By
   aggregating these streams onto a single P2MP tree, it is possible to
   amortize the cost of replication amongst a particular set of ingress
   linecards / ports on those line cards while taking into account the
   current power consumption and replication capacity available at the
   time of computing the P2MP TE-LSP.

   The dynamic nature of the multicast tree, with egress PEs joining and
   leaving it based on whether there are multicast listeners in the VPN
   site attached to the egress PE / PEs, makes it important to position
   the replication points in such a way that there is maximum leverage
   on optimization of the overall ratios computed for the AS.  When
   aggregating multiple multicast streams over a single P2MP TE-LSP it
   is important to keep this in mind.

   So the key point is to aggregate multiple streams with a set
   theoretical approach in mind, so that there is maximum overlap of
   egress PEs for these streams, and to position these streams atop a
   P2MP TE-LSP in such a way that the ratios are optimal for that set of
   streams (with the overall AS power consumption in mind).

2.2 Power to available multicast replication capacity ratio in a TLV

   As per [RFC3630] the Link TLV can be used to carry this power to
   available multicast replication capacity ratio by means of an
   additional sub-TLV of the Link TLV.  It is recommended that sub-type
   number 10 be defined for this purpose.

   [RFC3630] states in section 2.2.1 and we QUOTE ...

   2.2.1 Link TLV

   The Link TLV describes a single link.  It is constructed of a set of
   sub-TLVs.  There are no ordering requirements for the sub-TLVs.

   Only one Link TLV shall be carried in each LSA, allowing for fine
   granularity changes in topology.

   The Link TLV is type 2, and the length is variable.

   The following sub-TLVs of the Link TLV are defined:

      1 - Link type (1 octet)
      2 - Link ID (4 octets)
      3 - Local interface IP address (4 octets)
      4 - Remote interface IP address (4 octets)
      5 - Traffic engineering metric (4 octets)
      6 - Maximum bandwidth (4 octets)
      7 - Maximum reservable bandwidth (4 octets)
      8 - Unreserved bandwidth (32 octets)
      9 - Administrative group (4 octets)
     10 - Power-to-Multicast-replication-capacity (4 octets)

   This memo defines sub-Types 1 through 9.  See the IANA Considerations
   section in [RFC3630] for allocation of new sub-Types.

   The Link Type and Link ID sub-TLVs are mandatory, i.e., must appear
   exactly once.  All other sub-TLVs defined here may occur at most
   once.  These restrictions need not apply to future sub-TLVs.
   Unrecognized sub-TLVs are ignored.

   Various values below use the (32 bit) IEEE Floating Point format.  For
   quick reference, this format is as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|    Exponent   |                  Fraction                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   S is the sign, Exponent is the exponent base 2 in "excess 127"
   notation, and Fraction is the mantissa - 1, with an implied binary
   point in front of it.  Thus, the above represents the value:

      (-1)**(S) * 2**(Exponent-127) * (1 + Fraction)

   It is proposed that the Power-to-multicast-replication-capacity ratio
   be carried as a 32 bit IEEE Floating Point format field for the
   purpose of this document.
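
   The proposed field can be sketched as follows.  The example assumes
   the usual OSPF-TE sub-TLV layout of a 16-bit Type and a 16-bit Length
   followed by the value, uses the sub-type 10 recommended above (not an
   assigned code point), and packs the ratio as a big-endian IEEE single
   precision number.  The function names and the sample ratio are
   hypothetical.

```python
import struct

SUBTLV_POWER_REPLICATION = 10   # sub-type recommended in this section

def encode_ratio_subtlv(ratio):
    """Type (2 octets), Length (2 octets), Value (32-bit IEEE float)."""
    return struct.pack("!HHf", SUBTLV_POWER_REPLICATION, 4, ratio)

def decode_ratio_subtlv(data):
    """Return the ratio carried in an 8-octet sub-TLV, or raise ValueError."""
    subtype, length, ratio = struct.unpack("!HHf", data[:8])
    if subtype != SUBTLV_POWER_REPLICATION or length != 4:
        raise ValueError("not a power-to-replication-capacity sub-TLV")
    return ratio

wire = encode_ratio_subtlv(50.0)
```

   A 4-octet value needs no padding to reach 32-bit alignment.  Note
   that single precision limits the ratio to about seven significant
   decimal digits.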

3 Conclusion

   Here we propose a scheme that takes into account the power to
   available replication capacity ratios as weights for the edges and
   computes a low cost power path for multicast replication.  This
   scheme could be extended to inter-AS multicast streams, where the
   multicast stream is to be carried over multiple ASes.  This is an
   area of future study which would be conducive to bringing about
   optimal power usage and thus to incentivising vendors to manufacture
   low power consuming equipment.  Compelled to rethink power
   consumption, vendors manufacturing networking equipment will drive
   down power consumption, since the proposed scheme gives priority to
   linecards that consume less power.

3 Security Considerations

   The security considerations for this proposal are the same as for any
   NEW opaque LSA introduced in an IGP such as OSPF or IS-IS.

4 IANA Considerations

   IANA would need to assign a NEW opaque LSA type to carry power and
   multicast replication capacity such that this information can be
   carried in the TE-LSAs within an AS.

5 References

5.1 Normative References

5.2 Informative References

   [1]  G. Appenzeller, Sizing router buffers, Doctoral Thesis,
        Department of Electrical Engineering, Stanford University,
        2005.

   [2]  A. P. Bianzino, C. Chaudet, D. Rossi and J. L. Rougier, A
        survey of green networking research, IEEE Communications
        Surveys and Tutorials, preprint.

   [3]  J. Baliga, K. Hinton and R. S. Tucker, Energy consumption of
        the internet, Proc. of joint international conference on
        optical internet, June 2007, pp. 1993.

   [4]  J. Chabarek, J. Sommers, P. Barford, C. Estan, D. Tsiang and
        S. Wright, Power awareness in network design and routing,
        Proc. of the IEEE INFOCOM 2008, April 2008, pp. 457-465.

   [5]  M. Xia et al., Greening the optical backbone network: A
        traffic engineering approach, IEEE ICC Proceedings, May 2010,
        pp. 1995.

   [6]  W. Lu and S. Sahni, Low-power TCAMs for very large forwarding
        tables, IEEE/ACM Transactions on Networking, June 2010,
        vol. 18, no. 3, pp. 948-959.

   [7]  B. Zhang, Routing Area Open Meeting, Proceedings of the IETF
        81, Quebec, Canada, July 2011.

Authors' Addresses

   Shankar Raman
   Department of Computer Science and Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: mjsraman@cse.iitm.ac.in

   Balaji Venkat Venkataswami
   Department of Electrical Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: balajivenkat299@gmail.com

   Prof. Gaurav Raina
   Department of Electrical Engineering
   IIT Madras
   Chennai - 600036
   TamilNadu
   India

   EMail: gaurav@ee.iitm.ac.in