Matthew R. Meyer, Global Crossing
Denver Maddux, Nitrous.net
Jean-Philippe Vasseur, Cisco Systems, Inc.
Curtis Villamizar, Avici Systems
Amir Birjandi, MCI

IETF Internet Draft
Expires: April, 2005
October, 2004

MPLS Traffic Engineering Soft preemption

Status of this Memo

By submitting this Internet-Draft, I certify that any applicable patent or IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Meyer, Maddux, Vasseur, Villamizar and Birjandi 1 Draft-ietf-mpls-soft-preemption-03.txt October,2004

Abstract

This document details MPLS Traffic Engineering Soft Preemption, a suite of protocol modifications extending the concept of preemption with the goal of reducing or eliminating traffic disruption of preempted Traffic Engineered Label Switched Paths. MPLS RSVP-TE was initially defined to support only immediate Label Switched Path displacement upon preemption. The use of a preemption pending flag makes the re-routing of preempted Label Switched Paths more graceful.
For the brief period soft preemption is activated, reservations (though not necessarily traffic levels) are in effect under-provisioned until the Label Switched Path can be re-routed. For this reason, the feature is primarily but not exclusively interesting in MPLS enabled IP networks with Differentiated Services and Traffic Engineering capabilities.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [REQ-LEVELS].

1. Terminology

This document follows the nomenclature of the MPLS Architecture RFC 3031 [MPLS-ARCH].

1.1. Acronyms and Abbreviations

   CSPF    Constraint-based Shortest Path First
   DS      Differentiated Services
   LER     Label Edge Router
   LSR     Label Switching Router
   LSP     Label Switched Path
   MPLS    MultiProtocol Label Switching
   PPend   Preemption Pending
   RSVP    Resource ReSerVation Protocol
   TE      Traffic Engineering

1.2. Nomenclature

Make Before Break - Technique used to non-intrusively alter the path of an LSP. The ingress LER first signals the new path, sharing the bandwidth with the primary LSP (to avoid double booking), then switches forwarding over to the new path. Finally the old path state is torn down.

Numerically Lower Preemption Priority - RSVP-TE LSPs have setup and hold preemption priorities of zero (best) through seven (worst). An LSP with a numerically lower setup priority is capable of preempting an LSP with a numerically higher hold priority in an MPLS-TE environment.

Preemption Pending flag - This flag is set on an IPv4 or IPv6 RSVP Resv RRO sub-object to signal to the TE LSP ingress LER that the TE LSP is about to be preempted and must be re-signaled (in a non-disruptive fashion, with make before break) along another path.
If present in the Path RRO, it is used to alert downstream LSRs that the LSP was soft preempted upstream.

Point of Preemption - The midpoint or ingress LSR which, due to RSVP provisioning levels, is forced either to hard preempt or to under-provision and signal soft preemption.

Hard Preemption - The (typically default) preemption process in which higher numeric priority LSPs are intrusively displaced at the point of preemption by lower numeric priority LSPs. In hard preemption the LSP is torn down before reestablishment.

Soft Preemption - The preemption process in which the point of preemption allows a brief under-provisioning period while the ingress router is alerted to the requirement for reroute. In soft preemption the LSP is reestablished before being torn down.

Soft Preemption Desired Flag - This flag is set in the SESSION_ATTRIBUTE Flags in the Path message for the LSP to indicate to LSRs along the path that, should the LSP need to be preempted, soft preemption should be used if supported.

2. Motivations

MPLS RSVP-TE [RSVP-TE] was initially defined to support only one method of TE LSP preemption, which immediately tears down TE LSPs, disregarding the preempted in-transit traffic. This simple but abrupt process nearly guarantees that preempted traffic will be discarded, if only briefly, until the RSVP Path Error message reaches and is processed by the ingress LER and a new forwarding path can be established. In cases of actual resource contention this might be helpful; however, preemption may be triggered by mere reservation contention, and reservations may not reflect up-to-the-moment forwarding plane contention.
The result is that when conditions that promote preemption exist and hard preemption is the default behavior, inferior priority preempted traffic may be needlessly discarded when sufficient bandwidth exists for both the preempted LSP and the preempting LSP.

Hard preemption may be a requirement to protect numerically lower preemption priority traffic in a non Diff-Serv enabled architecture, but in a Diff-Serv enabled architecture, one need not rely exclusively upon preemption to enforce a preference for the most valued traffic since the marking and queuing disciplines should already be aligned for those purposes. Moreover, even in non Diff-Serv aware networks, depending on the TE LSP sizing rules (imagine all LSPs are sized at double their observed traffic level), reservation contention may not accurately reflect the potential for forwarding plane congestion.

3. Introduction

In an MPLS RSVP-TE [RSVP-TE] enabled IP network, hard preemption is the default behavior. Hard preemption provides no mechanism to allow preempted TE LSPs to be handled in a make-before-break fashion: the hard preemption scheme instead uses a very intrusive method that can cause traffic disruption for a potentially large number of TE LSPs. Without an alternative, network operators either accept this limitation, or remove functionality by using only one preemption priority or using invalid bandwidth reservation values. Understandably desirable features like ingress LER automated TE reservation adjustments are less palatable when preemption is intrusive and high network stability levels are a concern.

This document defines the use of additional signaling and maintenance mechanisms to alert the ingress LER of the preemption that is pending and allow for temporary under-provisioning while the preempted tunnel is re-routed in a non-disruptive fashion (make-before-break) by the ingress LER.
During the period that the tunnel is being re-routed, link capacity is under-provisioned on the midpoint where preemption was initiated and potentially on one or more links upstream along the path where other soft preemptions may have occurred. Optionally the downstream path to the egress LER may be signaled as well, to deal more efficiently with any near-simultaneous soft preemptions that may have been triggered downstream of the initial preemption.

4. RSVP Extensions

4.1. SESSION_ATTRIBUTE Flags

To explicitly signal the desire for a TE LSP to benefit from the soft preemption mechanism (and so not to be hard preempted), the following flag of the SESSION_ATTRIBUTE object (for both C-Type 1 and C-Type 7) is defined:

Soft preemption desired: 0x40

4.2. RRO IPv4/IPv6 Sub-Object Flags

To report that a soft preemption is pending for an LSP, a flag is defined in the IPv4/IPv6 sub-object carried in the RRO object defined in RFC 3209 [RSVP-TE]. This flag is called the preemption pending (PPend) flag. A compliant LSR MUST support the RRO object, as defined in RFC 3209 [RSVP-TE].

RRO IPv4 and IPv6 sub-object address

These two sub-objects currently have the following flags defined in RFC 3209 [RSVP-TE] and [FAST-REROUTE]:

Local protection available: 0x01

Indicates that the link downstream of this node is protected via a local repair mechanism, which can be either one-to-one or facility backup.

Local protection in use: 0x02

Indicates that a local repair mechanism is in use to maintain this tunnel (usually in the face of an outage of the link it was previously routed over, or an outage of the neighboring node).

Bandwidth protection: 0x04

The PLR will set this when the protected LSP has a backup path which is guaranteed to provide the desired bandwidth specified in the FAST_REROUTE object or the bandwidth of the protected LSP, if no FAST_REROUTE object was included.
The PLR may set this whenever the desired bandwidth is guaranteed; the PLR MUST set this flag when the desired bandwidth is guaranteed and the "bandwidth protection desired" flag was set in the SESSION_ATTRIBUTE object. If the requested bandwidth is not guaranteed, the PLR MUST NOT set this flag.

Node protection: 0x08

The PLR will set this when the protected LSP has a backup path which provides protection against a failure of the next LSR along the protected LSP. The PLR may set this whenever node protection is provided by the protected LSP's backup path; the PLR MUST set this flag when node protection is provided and the "node protection desired" flag was set in the SESSION_ATTRIBUTE object. If node protection is not provided, the PLR MUST NOT set this flag. Thus, if a PLR could only set up a link-protection backup path, the "Local protection available" bit will be set but the "Node protection" bit will be cleared.

Soft preemption makes use of the Preemption pending flag defined here:

Preemption pending: 0x10

The preempting node sets this flag if a pending preemption is in progress for the TE LSP. This indicates to the ingress LER of this LSP that it SHOULD be re-routed.

4.3. Use of the RRO IPv4/IPv6 Sub-Object in Path message

An LSR MAY use the Preemption pending flag in the IPv4/IPv6 RRO sub-object carried in the RRO of a Path message to simultaneously alert downstream LSRs that the LSP was soft preempted upstream. This information could be used by the downstream LSR to bias future soft preemption candidates toward LSPs already soft preempted elsewhere in their path.

5. Mode of Operation

5.1. Example set up

   R0--1G--R1---155----R2          LSP1:          LSP2:
            | \         |
            |   \      155         R0-->R1        R1<--R2
            |     \     |                \        |
           155     1G   R3                V       V
            |       \   |                 R5      R4
            |        \ 155
            |          \|
            R4------1G--R5

                             Fig 1.
In the network depicted above in figure 1, consider the following conditions:

- Reservable BW on R0-R1, R1-R5 and R4-R5 is 1 Gb/sec.
- Reservable BW on R1-R2, R1-R4, R2-R3 and R3-R5 is 155 Mb/sec.
- Bandwidths and costs are identical in both directions.
- Each circuit has an IGP metric of 10 and the IGP metric is used by CSPF.
- Two TE tunnels are defined:
   - LSP1: 155 Mb, setup/hold priority 0, tunnel path R0-R1-R5.
   - LSP2: 155 Mb, setup/hold priority 7, tunnel path R2-R1-R4.
  Both TE LSPs are signaled with the soft preemption desired bit of their SESSION_ATTRIBUTE Path object set.
- Circuit R1-R5 fails.
- Soft Preemption is functional.

5.2. Basic Operation

When the circuit R1-R5 fails, R1 detects the failure and sends an updated IGP LSA/LSP and a Path Error message to all the ingress LERs having a TE LSP traversing the failed link (R0 in the example above). Either form of notification may arrive at the ingress LERs first. Upon receiving the link failure notification, ingress LER R0 triggers a TE LSP re-route of LSP1 and re-signals LSP1 along the shortest available path satisfying the TE LSP constraints: the R0-R1-R4-R5 path. The Resv messages for LSP1 travel in the upstream direction (from the destination to the ingress LSR, R5 to R0 in this example). LSP2 is soft preempted at R1 as it has a numerically higher priority value and both bandwidth reservations cannot be satisfied on the R1-R4 link.

Instead of sending a Path Tear for LSP2 upon preemption as with hard preemption (which would result in an immediate traffic disruption for LSP2), R1's local BW accounting for LSP2 is zeroed and a preemption pending flagged Resv RRO for LSP2 is issued upstream toward the ingress LER, R2. Optionally, R1 MAY simultaneously send a soft preemption flagged Path RRO notifying downstream LSRs of LSP2's soft preemption.
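
The behavior of R1 on the R1-R4 link in this example can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not an implementation of RSVP-TE: the Lsp and Link classes, their field names, and the "worst hold priority first" victim ordering are hypothetical. Only the two flag values (0x40 and 0x10) and the example bandwidths and priorities come from this document.

```python
from dataclasses import dataclass, field

# Flag values as defined in sections 4.1 and 4.2 of this document.
SOFT_PREEMPTION_DESIRED = 0x40  # SESSION_ATTRIBUTE flag
PREEMPTION_PENDING = 0x10       # RRO IPv4/IPv6 sub-object flag


@dataclass
class Lsp:  # hypothetical model of local per-LSP state
    name: str
    bandwidth: int          # reserved bandwidth in Mb/sec
    setup_prio: int         # 0 (best) through 7 (worst)
    hold_prio: int          # 0 (best) through 7 (worst)
    session_flags: int = 0
    rro_flags: int = 0      # flags this node would report in the Resv RRO
    accounted: int = 0      # bandwidth counted against the link locally


@dataclass
class Link:  # hypothetical model of one TE link at the point of preemption
    reservable: int
    lsps: list = field(default_factory=list)

    def used(self):
        return sum(x.accounted for x in self.lsps)

    def admit(self, new):
        """Admit `new`, preempting LSPs with a numerically higher hold priority.

        Returns the list of preempted LSPs, or None if admission fails.
        """
        needed = self.used() + new.bandwidth - self.reservable
        # Assumed local policy: pick the worst hold priority first.
        candidates = sorted(
            (x for x in self.lsps
             if x.hold_prio > new.setup_prio and x.accounted > 0),
            key=lambda x: -x.hold_prio)
        victims = []
        for lsp in candidates:
            if needed <= 0:
                break
            victims.append(lsp)
            needed -= lsp.accounted
        if needed > 0:
            return None
        for lsp in victims:
            lsp.accounted = 0  # local BW accounting is zeroed
            if lsp.session_flags & SOFT_PREEMPTION_DESIRED:
                # Soft preemption: state is kept, PPend is reported upstream.
                lsp.rro_flags |= PREEMPTION_PENDING
            else:
                # Hard preemption: the LSP state is removed.
                self.lsps.remove(lsp)
        new.accounted = new.bandwidth
        self.lsps.append(new)
        return victims


# The example of section 5.1: LSP2 already holds R1-R4, LSP1 is re-signaled.
r1_r4 = Link(reservable=155)
lsp2 = Lsp("LSP2", 155, 7, 7, SOFT_PREEMPTION_DESIRED, accounted=155)
r1_r4.lsps.append(lsp2)
lsp1 = Lsp("LSP1", 155, 0, 0, SOFT_PREEMPTION_DESIRED)
preempted = r1_r4.admit(lsp1)
```

In this sketch LSP2 stays in the link's LSP list with zero accounted bandwidth, which models the temporary under-provisioning: both LSPs keep forwarding while only LSP1 counts against the 155 Mb/sec of reservable bandwidth.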
If more than one soft preempted LSP has the same ingress LER (egress LER), these soft preemption Resv (Path) messages MAY be bundled together (see [REFRESH-REDUCTION]).

The preempting node MUST immediately send a Resv message with the preemption pending RRO flag set for each soft preempted TE LSP. The node MAY use the occurrence of soft preemption to trigger an immediate IGP update or influence the scheduling of an IGP update.

Should a refresh event for LSP2 arrive before LSP2 is re-routed, soft preempting nodes such as R1 MUST continue to refresh the LSP.

Resv messages with the RRO preemption pending flag set SHOULD be sent in reliable mode (see [REFRESH-REDUCTION]).

Upon reception of the Resv with the preemption pending flag set, the ingress LER (of LSP2 in this case) MAY update the working copy of the TE-DB before running CSPF for the new LSP. In the case that Diff-Serv [DIFF-MPLS] and TE [RSVP-TE] are deployed (as opposed to Diff-Serv-aware TE [DS-TE]), receiving preemption pending may imply to an ingress LER that the available bandwidth for the affected priority level and numerically greater priority levels has been exhausted for the indicated node interface. An ingress LER MAY choose to reduce or zero available BW for the implied priority range until more accurate information is available (i.e., a new IGP TE update is received).

In the case that reservation availability is restored at the point of preemption (R1), the point of preemption MAY issue a Resv message with the preemption pending flag unset to signal restoration to the ingress LER. This implies that an ingress LER might have delayed or been unsuccessful in re-signaling.

To guard against a situation where bandwidth under-provisioning will last forever, a local timer (soft preemption expiration timer) MUST be started on the preemption node upon soft preemption. If this timer expires, the soft preempted TE LSP SHOULD be hard preempted.
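
As an illustration of the soft preemption expiration timer, the following Python sketch tracks one deadline per soft preempted LSP. It is a minimal model under assumptions: the SoftPreemptionTimer class, its method names, and the 30 second timeout are hypothetical (this document does not specify a timer value), and the choice not to extend the deadline when start() is called again for the same LSP is a design assumption rather than a requirement stated here.

```python
import time


class SoftPreemptionTimer:  # hypothetical helper, not defined by this document
    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.deadlines = {}  # LSP identifier -> absolute expiry time

    def start(self, lsp_id):
        # Started when the LSP is soft preempted. A repeated start for
        # the same LSP keeps the original deadline (assumed behavior).
        self.deadlines.setdefault(lsp_id, self.clock() + self.timeout_s)

    def cancel(self, lsp_id):
        # Called when the old path is torn down or the reservation
        # availability is restored at the point of preemption.
        self.deadlines.pop(lsp_id, None)

    def expired(self):
        # Returns the LSPs whose timer has run out; the caller would
        # then hard preempt them.
        now = self.clock()
        due = [lsp for lsp, t in self.deadlines.items() if t <= now]
        for lsp in due:
            del self.deadlines[lsp]
        return due


# Usage with a fake clock so the example is deterministic.
now = [0.0]
timer = SoftPreemptionTimer(timeout_s=30.0, clock=lambda: now[0])
timer.start("LSP2")
now[0] = 10.0
timer.start("LSP2")            # does not extend the deadline
still_pending = timer.expired()
now[0] = 30.0
to_hard_preempt = timer.expired()
```

Passing the clock as a parameter keeps the sketch testable without sleeping; a real node would tie expiry to its own timer facility.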
After the ingress LER has successfully established a new LSP, the old path MUST be torn down.

As a result of soft preemption, no traffic will be needlessly black-holed due to mere reservation contention. If loss is to occur, it will be due only to an actual traffic congestion scenario and according to the operator's Diff-Serv (if Diff-Serv is deployed) and queuing scheme.

5.3. Selection of the preempted TE LSP at a preempting mid-point

When a numerically lower priority TE LSP is signaled that requires the preemption of a set of numerically higher priority LSPs, the node where preemption is to occur has to decide on the set of TE LSPs that are candidates for preemption. This decision is a local decision and various algorithms can be used, depending on the objective. See [PREEMPT-EXP].

As already mentioned, soft preemption causes a temporary link under-provisioning condition while the soft preempted TE LSPs are re-routed by their respective ingress LERs. In order to reduce this under-provisioning exposure, a soft-preempting LSR MAY first check whether there exists soft preempt-able TE LSP bandwidth flagged PPend by another node but still available for soft preemption locally. If sufficient overlap bandwidth exists the LSR MAY attempt to soft preempt the same LSP. This would help reduce the temporarily elevated under-provisioning ratio on the links where soft preemption occurs. Optionally, a midpoint LSR upstream or downstream from a soft preempting node MAY choose to cache the LSP's soft preempted state. In the event a local preemption is needed, the relevant priority level LSPs from the cache are soft preempted first, followed by the normal soft and hard preemption selection process for the given priority.

5.4. Interoperability

Backward compatibility should be assured as long as the implementation followed the recommendations set forth in RFC 3209 [RSVP-TE]:
"When processing an RRO, unrecognized subobjects SHOULD be ignored and passed on". An LSR without soft preemption capabilities but that followed the aforementioned recommendation will simply ignore the RRO Preemption Pending flag and treat the Resv message as a regular Resv refresh message. As a consequence, the soft preempted TE LSP will not be re- routed with make before break by the ingress LER. As mentioned prior, to guard against a situation where bandwidth under- provisioning will last forever, a local timer (soft preemption expiration timer) MUST be started on the preemption node, upon soft preemption. When this timer expires, the soft preempted TE LSP SHOULD be hard preempted. This timer MAY be configurable. Optionally, an implementation MAY choose to soft preempt TE LSP for which the Soft preemption desired bit has not been set. This might have the effect of buying time during extremely short term preemptions. The current hard Meyer, Maddux, Vasseur, Villamizar and Birjandi 8 Draft-ietf-mpls-soft-preemption-03.txt October,2004 preemption scheme can be emulated with a soft preemption expiration timer set to zero. Soft Preemption as defined in this document is designed for use in MPLS RSVP-TE enabled IP Networks and may not functionally translate to some GMPLS technologies. As with backward compatibility, if a device does not recognize a flag, it should pass the subobject transparently. 6. Management Both the point of preemption and the ingress LER SHOULD provide some form of accounting internally and to the network operator interface with regard to which TE LSPs and how much capacity is under-provisioned due to soft preemption. 
Displays of under-provisioning are recommended for the following midpoint, ingress and egress views:

- Sum of current bandwidth per preemption priority per local interface
- Sum of current total bandwidth per local interface
- Sum of current total bandwidth for the local router (ingress, egress, midpoint)
- List of current LSPs and bandwidth in PPend status
- List of current sum bandwidth and session count in PPend status per observed ERO hop (ingress, egress views only)
- Cumulative PPend events per observed ERO hop

7. IANA Considerations

IANA [RFC-IANA] will not need to create a new registry. This document requires the assignment of flags related to RFC 3209 [RSVP-TE] sections 4.1.1.1, 4.1.1.2, 4.7.1 and 4.7.2.

IANA will assign RRO IPv4/IPv6 sub-object flags defined in RFC 3209 [RSVP-TE] sections 4.1.1.1 and 4.1.1.2 as detailed in section 4.2 of this document.

IANA will assign session attribute flags for both C-Type 1 and C-Type 7 (defined in RFC 3209 [RSVP-TE] sections 4.7.1 and 4.7.2) as detailed in section 4.1 of this document.

8. Security Considerations

This document does not introduce new security issues. The security considerations pertaining to the original RSVP protocol [RSVP] remain relevant.

9. Acknowledgment

The authors would like to thank Carol Iturralde, Dave Cooper, Loa Andersson, Arthi Ayyangar, and Ina Minei for their valuable comments.

10. IPR Disclosure Acknowledgement

By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

11. References

11.1. Normative References

[MPLS-ARCH] Rosen, Viswanathan, Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RSVP-TE] Awduche et al, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[FAST-REROUTE] Pan, P.
et al., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", Internet Draft, draft-ietf-mpls-rsvp-lsp-fastreroute-06.txt, November 2004.

[ISIS-TE] Smit, Li, "IS-IS extensions for Traffic Engineering", draft-ietf-isis-traffic-04.txt, December 2002.

[OSPF-TE] Katz, Kompella, Yeung, "Traffic Engineering (TE) Extensions to OSPF Version 2", RFC 3630, September 2003.

11.2. Informative references

[REQ-LEVELS] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119.

[RSVP] R. Braden, Ed., et al, "Resource ReSerVation protocol (RSVP) -- version 1 functional specification", RFC 2205, September 1997.

[TE-REQ] Awduche et al, "Requirements for Traffic Engineering over MPLS", RFC 2702, September 1999.

[DS-TE] Le Faucheur et al, "Requirements for support of Diff-Serv-aware MPLS Traffic Engineering", RFC 3564, July 2003.

[DS-TE-PROT] Le Faucheur et al, "Protocol extensions for support of Diff-Serv-aware MPLS Traffic Engineering", draft-ietf-tewg-diff-te-proto-06.txt, January 2004.

[REFRESH-REDUCTION] Berger et al, "RSVP Refresh Overhead Reduction Extensions", RFC 2961, April 2001.

[PREEMPT-EXP] De Oliviera, JP. Vasseur, L. Chen and C. Scoglio, "LSP Preemption Policies for MPLS Traffic Engineering", draft-deoliviera-diff-te-preemption-02.txt, October 2003.

[DIFF-MPLS] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, P., Krishnan, R., Cheval, P. and J. Heinanen, "Multi-Protocol Label Switching (MPLS) Support of Differentiated Services", RFC 3270, May 2002.

[RFC-IANA] T. Narten and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 2434.

12. Authors' Addresses

Matthew R. Meyer
Global Crossing
3133 Indian Valley Tr.
Howell, MI 48855
USA
email: mrm@gblx.net, mrm@packetshovel.net

Denver Maddux
Nitrous.net
1020 SW 35th St
Corvallis, OR 97333
USA
email: denver@nitrous.net

Jean Philippe Vasseur
Cisco Systems, Inc.
300 Beaver Brook Road
Boxborough, MA 01719
USA
Email: jpv@cisco.com

Curtis Villamizar
Avici Systems Inc.
USA
Email: curtis@avici.com

Amir Birjandi
MCI
22001 Louden County pky
Ashburn, VA 20147
USA

13. Full Copyright Statement

"Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights."

"This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."