PCE W. Lu Internet-Draft S. Kini Intended status: Standards Track S. Narayanan Expires: October 30, 2011 Ericsson April 28, 2011 Relayed CSPF for Multi-Area Multi-AS PCE draft-lu-relayed-cspf-01 Abstract For LSPs that span across multiple areas or multiple autonomous systems (AS), the path computation element (PCE) in each area or each AS can cooperate and conclude the optimal results if so exist. An upstream PCE, though incapable to carry out the path computation for the tailend outside of its domain, can provide the history of the computation to its downstream PCEs which will assume the computation job till it is accomplished or relay the baton to the next runner. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 30, 2011. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must Lu, et al. Expires October 30, 2011 [Page 1] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 1.2. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. CSPF Seed . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Multiple Seeds . . . . . . . . . . . . . . . . . . . . . . 4 2.2. Heap Equivalence . . . . . . . . . . . . . . . . . . . . . 4 2.2.1. SPT Equivalence . . . . . . . . . . . . . . . . . . . 5 2.2.2. Seed Deposit Timing . . . . . . . . . . . . . . . . . 6 2.2.3. Seed Set Reduction . . . . . . . . . . . . . . . . . . 6 3. Multi-Area Path Computation . . . . . . . . . . . . . . . . . 6 3.1. Inter-area PCE . . . . . . . . . . . . . . . . . . . . . . 6 3.2. Initial Seed Set . . . . . . . . . . . . . . . . . . . . . 7 3.3. PCE Relay . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4. Relay Content . . . . . . . . . . . . . . . . . . . . . . 8 3.5. PCE Elect . . . . . . . . . . . . . . . . . . . . . . . . 9 4. Multi-Home Tail-End . . . . . . . . . . . . . . . . . . . . . 9 4.1. Relay Timer . . . . . . . . . . . . . . . . . . . . . . . 10 5. Multi-AS Path Computation . . . . . . . . . . . . . . . . . . 11 5.1. Information Hiding . . . . . . . . . . . . . . . . . . . . 11 5.2. Transit Link . . . . . . . . . . . . . . . . . . . . . . . 12 5.3. AS Number . . . . . . . . . . . . . . . . . . . . . . . . 12 6. Other Cases . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.1. Backups . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.2. SRLG . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.3. Loose ERO . . . . . . . . . . . . . . . . . . . . . . . . 13 6.3.1. Pre-computed EROs . . . . . . . . . . . . . . . . . . 13 6.3.2. Re-Query . . . . . . . . . . . . . . . . . . . . . . . 13 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14 10.2. Informative References . . . . . . . . . . . . . . . . . . 14 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 14 Lu, et al. Expires October 30, 2011 [Page 2] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 1. Introduction The demand for multi-area multi-AS path computation ability has evolved from the academic discussion to carrier network's feature request. A few solutions have been proposed and there are three major variations in this field. A global TE database is ideal and the simplest for serving multi-area or multi-as path request. This idea is apparently prohibitive for two reasons. 1. Such database may be too big and negate the purpose of having multiple areas or ASes; 2. This violates the information hiding and confidentiality requirement and is unacceptable by ISPs. A crankback method is more practical as it is an exhaustive search based mechanism and will find an LSP if it exists. The drawback nevertheless is obvious: 1. It does not scale, for one that it often requires and wastes more than one tryout to find a qualified LSP, and for two it is RSVP signaling based which is by its nature poor in scaling; 2. The extra signaling messages add burdens on the existing network; 3. The path, if found, is not guaranteed optimal; 4. It requires lots of manual configurations to specify border routers and hence is labor intensive; 5. It requires substantial RSVP changes, in both protocol and operation. BRPC [RFC5441] is another idea in coping with the aforementioned problems. It assumes that the destination is known in a particular domain and area. This assumption is not always true. Besides the destination may be multi-homed, meaning reachable through different areas and domains. The RFC method cannot handle this. Further more, the RFC's procedure mandates a PCEP [RFC5440] extension which understands the Virtual Shortest Path Tree (VSPT). This further complicates the method. Further more, VSPT approach only address one destination at a time. This document describes a relayed CSPF based PCE scheme which offers Lu, et al. Expires October 30, 2011 [Page 3] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 a solution with optimality, scalability, and simplicity. 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1.2. Acronyms CSPF - Constrained Shortest Path First PCE - Path Computation Element PCReq - Path Computation Request PCRep - Path Computation Reply AS - Autonomous System LSP - Label Switched Path RSVP - Resource ReserVation Protocol PCEP - PCE to PCE Communication Protocol SPT - Shortest Path Tree VSPT - Virtual Shortest Path Tree 2. CSPF Seed Call the node initially deposited into the heap seed. CSPF, or more generically SPF, is a seed based algorithm. The entire SPT is built upon this seed. 2.1. Multiple Seeds It is not necessary, however, that the heap can have only one seed. In fact, most of the time during the SPF expansion, the heap contains many nodes which can be perceived as seeds for further expansion. 2.2. Heap Equivalence An SPF heap possesses following properties: Lu, et al. Expires October 30, 2011 [Page 4] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 1. A heap with one initial seed is equivalent to that with multiple intermediate seeds in any SPF stages for the destinations that have not yet been reached. 2. The deposit time of seeds is insensitive to destinations that have not yet been reached, provided that the seeds carry correct attributes values such as cost and nexthop. 3. The multiple seeds in property 1 can further be reduced to those that constitute a set of nodes besides which the destinations are not viable. 2.2.1. SPT Equivalence The property 1 is not difficult to comprehend. During normal SPF cycles, the heap will change and the path tree will grow. At any SPF cycle, the path tree records reachability to certain destinations. This is important to generic SPF applications such as IGP routing protocols. With IGP, either OSPF [RFC2328] or IS-IS [RFC1195][ISO.10589.1992], all destinations must be included, whether they come out of the heap early or late. None of those can be neglected. Nevertheless in CSPF, since only the targeted destinations matter, non-relevant records can be thrown away. The early path tree records are insignificant and can be disregarded. Therefore for the selected destinations, the expanded heap with multiple seeds is equivalent with the heap at the initial stage. Figure 1 gives a simple illustration of this concept. With normal SPF computation, the Headend node "H" is used as an initial seed to the SPF heap. The final shortest path tree (SPT) will provide reachability to nodes "A", "B" and the Tailend node "T". If one uses nodes "A" and "B" as deposit seeds, he will conclude a SPT with the same reachability for "T". The difference between the two SPTs is the reachabilities from "H" to nodes "A" and "B". A / \ H T \ / B Figure 1: SPT Equivalence Lu, et al. Expires October 30, 2011 [Page 5] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 2.2.2. Seed Deposit Timing The second property indicates that the seed deposit time does not change its SPT contribution for destinations that have not yet been reached. Figure 2 shows this concept in detail. ---A--- / \ H--B---C--T Figure 2: Seed Deposit Timing For destination "T", using seed set "A" and "B", or "A" and "C" will produce identical results. 2.2.3. Seed Set Reduction The third property allows us to remove the seeds that will not contribute to the path to the destinations. The statement is the key point that this proposal is based upon. As shown in Figure 3, for the reachability to "T", the initial seed "H" can be replaced with "A and B", per property 1. The two seeds can further be replaced with "A, C and D", per property 2. Now since "D" will not contribute to the path to "T", the seed set "A, C, D" can further be reduced to "A and C", which is the property 3. ---A--- / \ H--B---C--T \ --D--E Figure 3: Seed Set Reduction These properties are the base of multi-area multi-AS path computation method described in this document. 3. Multi-Area Path Computation 3.1. Inter-area PCE Per IGP nature, the Tailend in a different area are not visible to the computing PCE if the PCE only knows the TE database of the Headend area. There exist options for multi-area TE database or even Lu, et al. Expires October 30, 2011 [Page 6] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 a global multi-AS TE-database. But that will have scalability and confidentiality consequences. And this option is beyond of the scope of this document. Nevertheless, using the method of this document, the cross-area path can be achieved with no need of global TE database, nor much manual intervention. 3.2. Initial Seed Set Area border routers, or BN (Border Nodes) for short, lie in the necessary paths to the destination in next area, or areas beyond. They are natural choices of the initial seed set. 3.3. PCE Relay The path computation proceeds as a relayed CSPF job, one per area. Take Figure 4 for example, assuming routers "A" and "C" are BNs. The column line divides the topology into two areas, area "west" on the left, and "east" on the right. west : east ----A-- / : \ H--B--C---T : Figure 4: PCE Relay Following steps describe the procedure: a. PCE in area "west", or PCE-west for short, computes paths to "A" and "C"; b. PCE-west sends PCReq to PCE of "east" for relayed CSPF. The path information to "A" and "C" are encoded in the same PCReq message. The path information contains path attributes such as cost, bandwidth, admin-group, hop-count, etc. c. PCE-east uses the path information to BNs "A" and "C" and deposits them to its SPF heap. It then starts its SPF, and concludes the path computation to "T". The path to "T" if found, will be the segment in area "east". d. PCE-east then sends PCRep back to PCE-west with its path segment information. e. PCE-west maps the received path segment to one of BNs. It then stitches the segment to the one in area "west" and forms a Lu, et al. Expires October 30, 2011 [Page 7] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 complete path. If the destination is not in east, but further in an area eastward, PCE-east will relay the PCReq to a PCE further downstream, which will either find the target or continue to relay the job. The stitching will proceed likewise, but in the reversed order. 3.4. Relay Content Besides standard PCReq format, an extension to PCEP protocol is needed to allow encoding of seed information. Following table describes the minimum data fields: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Len | Node-ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Node-ID (Cont) | Sub-Type | Sub-Len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seg-ID | Cost | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Cost (Cont) | Hops | Sub-Type | ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Sub-Type #1 | | | // // | Sub-Type #M | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Relay Content TLV: Type - 1 byte, value TBD (RELAYED-SEED) Len - 1 byte, actual length in value Node-ID - 4 bytes, BN's node ID This TLV MAY have zero or more occurrences Sub-TLV: Lu, et al. Expires October 30, 2011 [Page 8] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 Sub-Type - 1 byte, value TBD (SEGMENT) Sub-Len - 1 byte, value 6 Seg-ID - Segment number, 1 byte, a BN can have multiple segment paths to it. This is to accommodate additive constraints; Cost - 4 bytes, starting cost Hops - 1 byte, starting hop count This TLV MAY have one or more occurrences There is no need to carry admin-group or bandwidth information. These can be learned from the standard path request information fields. 3.5. PCE Elect An area PCE has to know all BNs which connect to another area. This can be achieved with the help from the IGP protocols and their TE extensions. The method itself however is beyond the scope of this document. Once the list of BNs is known, the area PCE can send PCReq to PCEs in other areas. For a particular downstream area, only one PCE is necessary. Among eligible PCEs, an election mechanism may be necessary to avoid the waste and contention of computation resources. The elect PCE can be any router or a dedicated PCE. It does not have to be the transit router. Any tie-breaker algorithm can be used in such election. One simple method is to choose the one with the highest (or lowest) router ID. The election is a local decision of the requesting PCE. It can be proprietary and does not require standardization. 4. Multi-Home Tail-End A destination may be viable through multiple areas. This happens typically in dual-homed or multi-homed destination cases. To maximize the path availability and optimality, the PCReq SHOULD be sent to each neighboring area and to each area's PCE elect. As shown in Figure 6, the Headend in area A may reach out through its borders with area B and area C. Each PCReq message it sends is Lu, et al. Expires October 30, 2011 [Page 9] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 different per recipient PCE's service area, either B or C. For PCE of area B, the message encodes the seeds of BNs between area A and B. Likewise the PCReq for PCE of area C lists seeds of BNs between A and C. Both PCEs may find paths to the tailend, and send PCRep back to the Headend in A. To accommodate the race condition to two possible paths, the requester needs to implement a timer to allow reasonable wait time to collect all possible PCRep, details in the next section. The sending of multiple PCReq is not limited to the headend. It can happen to any intermediate PCEs as far as they have multiple exit borders to other areas. ----- ----- ------ | || B || | | ||-----|| | | A | | D | | ||-----|| | | || C || | ----- ----- ------ Figure 6: Multi-Homed Tailend 4.1. Relay Timer A PCE should always send PCRep back to its requester whether it finds the path successfully or not. However the PCRep may take time and may be lost at all. For this reason it is advisable for the requester to instantiate a relay timer so that it will not wait indefinitely if the PCRep never comes. The timer also facilitates the requester to collect and compare all returning PCRep. Figure 7 is a sample pseudo code of the timer logic. Lu, et al. Expires October 30, 2011 [Page 10] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 Sending PCReq: For (PCE_Elect of each border area) { send_PCReq(PCE_Elect, path_Req_Info, seeds_of_BNs); add_to_PCReq_Pending_List(PCE_Elect); if (relay_timer==NULL) { create_relay_timer(); } } Receiving PCRep: PCE_Elect = lookup(PCRep's source address); path = retrieve_from(PCRep, PCE_Elect); best_path = better_path(best_path, path); remove_from_PCReq_Pending_List(PCE_Elect); if (Pending_List==NULL) { Cancel_timer(relay_timer); if (self==Headend) { terminate; } else { send_PCRep(best_path, upstream_PCE); } } Relay Timer Expires: cleanup_Pending_List(); if (Pending_List==NULL) { Cancel_timer(relay_timer); if (self==Headend) { terminate; } else { send_PCRep(best_path, upstream_PCE); } } Figure 7: Relay Timer Pseudo Code 5. Multi-AS Path Computation For path computation across different autonomous systems, the methods for multi-area in section 3 mostly apply as well except a few special handlings described below. 5.1. Information Hiding When a PCE sends a PCRep to the requester which is in a different AS, it does not send explicit hop-by-hop EROs. Instead it sends a loose ERO with path level characters such as cost metric. This ensures the Lu, et al. Expires October 30, 2011 [Page 11] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 information hiding from different ASes while the End-to-End path can still be established. 5.2. Transit Link Unlike the multi-area setup where an area border router sits over both areas, two AS border routers need a transit link to connect them together. ----- Transit ----- |AS1|---------|AS2| ----- Link ----- Figure 8: Transit Link As shown in Figure 8, the transit link needs to be considered for an ASBR when it is to send PCReq to its peer ASBR. The link's characters such as metric, hop count, bandwidth, MUST be taken into account of the seed value. The PCReq relay logic remains the same. 5.3. AS Number The BN election concept does not apply in cross AS BNs. There is no need to carry the AS number into the TE database. The PCReq and the corresponding seeds SHOULD be sent to each viable AS peer node. 6. Other Cases 6.1. Backups Backup LSPs, or bypass LSPs, or pass re-optimization, or any path computation that requires the knowledge of an existing LSP MUST have the information of that LSP passed along with the PCReq and seeds. In case of multi-AS path computation, the upstream LSR MUST hide the path segment detail from the downstream LSR. 6.2. SRLG The SRLG handling should be no different than that of single AS single area path computation. As long as the SRLG information is available, through for example GMPLS, each relayed PCE should compute correct PCE segment, and the end-to-end path should meet SRLG requirement. Lu, et al. Expires October 30, 2011 [Page 12] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 6.3. Loose ERO In multi-AS case, since the explicit EROs are not passed back, the AS border router has to recover the path segment that it promised upstream LSR when the LSP request reaches it. Following two approaches can be used for this purpose. 6.3.1. Pre-computed EROs The LSR remembers and makes a record of the path segment. When RSVP request comes into the ASBR LSR, it already has resolved EROs stored locally. This is quick but requires RSVP implementation change. 6.3.2. Re-Query The LSR does not need to remember the explicit path segment. It is kept stateless. When requested, the ASBR LSR will query its own PCE, because it does not remember the previous query and has forgotten its response. The result nevertheless should be the same, provided the topology has no changes since. If the answer is different or no longer available, the network has dramatically changed. The whole path computation process needs redo. And this has to be done anyway regardless of the method. 7. Acknowledgements The authors would like to thank Jeff Tantsura and others for their useful review and suggestions. 8. IANA Considerations This document defines the following additional TLVs to the PCEP protocol: +------+------------------+---------------+ | Type | Name | Source | +------+------------------+---------------+ | TBD | RELAYED-SEED | This document | | TBD | SEGMENT(Sub-TLV) | This document | +------+------------------+---------------+ Lu, et al. Expires October 30, 2011 [Page 13] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 9. Security Considerations There are no specific security considerations within the scope of this document. 10. References 10.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 10.2. Informative References [ISO.10589.1992] International Organization for Standardization, "Intermediate system to intermediate system intra-domain- routing routine information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)", ISO Standard 10589, 1992. [RFC1195] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, December 1990. [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. [RFC5440] Vasseur, JP. and JL. Le Roux, "Path Computation Element (PCE) Communication Protocol (PCEP)", RFC 5440, March 2009. [RFC5441] Vasseur, JP., Zhang, R., Bitar, N., and JL. Le Roux, "A Backward-Recursive PCE-Based Computation (BRPC) Procedure to Compute Shortest Constrained Inter-Domain Traffic Engineering Label Switched Paths", RFC 5441, April 2009. Lu, et al. Expires October 30, 2011 [Page 14] Internet-Draft Relayed CSPF for Multi-Area Multi-AS PCE April 2011 Authors' Addresses Wenhu Lu Ericsson 300 Holger Way San Jose, California 95134 USA Phone: 408 750-5436 Email: wenhu.lu@ericsson.com Sriganesh Kini Ericsson 300 Holger Way San Jose, California 95134 USA Phone: 408 750-5210 Email: sriganesh.kini@ericsson.com Srikanth Narayanan Ericsson 300 Holger Way San Jose, California 95134 USA Phone: 408 750-8567 Email: srikanth.narayanan@ericsson.com Lu, et al. Expires October 30, 2011 [Page 15]