IS-IS for IP Internets                                   S. Previdi, Ed.
Internet-Draft                                          C. Filsfils, Ed.
Intended status: Standards Track                             A. Bashandy
Expires: September 21, 2013                          Cisco Systems, Inc.
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                             B. Decraene
                                                            S. Litkowski
                                                                  Orange
                                                            I. Milojevic
                                                          Telekom Srbija
                                                               R. Shakir
                                                         British Telecom
                                                                 S. Ytti
                                                                  TDC Oy
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                             J. Tantsura
                                                                Ericsson
                                                          March 20, 2013


              Segment Routing with IS-IS Routing Protocol
             draft-previdi-filsfils-isis-segment-routing-02

Abstract

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGP SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP).  It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol.  These segments act as
   topological sub-paths that can be combined together to form the
   desired path.

   There are two forms of segments: node and adjacency.  A node segment
   represents a path to a node.  An adjacency segment represents a
   specific adjacency to a node.  A node segment is typically a
   multi-hop path while an adjacency segment is a one-hop path.  SR's
   control plane can be applied to IPv6 and MPLS dataplanes.

   In the MPLS dataplane, a node segment to node N is instantiated as an
   LSP along the shortest path (SPT) to the node.  An adjacency segment
   is instantiated as a cross-connect entry pointing to a specific
   egress datalink.

Previdi, et al.        Expires September 22, 2013               [Page 1]

Internet-Draft         Segment Routing with IS-IS             March 2013

   This document describes the Segment Routing functions, a set of use
   cases it addresses and the necessary changes that are required in the
   IS-IS protocol.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 22, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Applicability
     2.1.  Simplicity
     2.2.  Capacity Planning and Traffic Engineering (TE)
       2.2.1.  Disjointness in dual-plane networks
       2.2.2.  QoS-based Routing Policies
       2.2.3.  Deterministic non-ECMP Path
     2.3.  Fast Reroute
     2.4.  Segment Routing in Software Defined Networks (SR-SDN)
   3.  Segment Routing Identifiers
     3.1.  Node Segment Identifier (Node-SID)
       3.1.1.  Node-SID SubTLV
     3.2.  Adjacency Segment Identifier (Adj-SID)
       3.2.1.  Adj-SID and Interface Address
       3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV
       3.2.3.  Adjacency Segment Identifiers in LANs
   4.  Segment Routing Capabilities
   5.  Elements of Procedure
     5.1.  Unicity
     5.2.  IS-IS Multi-Level
     5.3.  Data-Plane Encodings
       5.3.1.  Segment Routing RIB (SR-RIB)
       5.3.2.  Multiprotocol Label Switching (MPLS)
       5.3.3.  IP Version 6
   6.  IANA Considerations
   7.  Manageability Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGP SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP).  It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol.
   These segments act as topological sub-paths that can be combined
   together to form the desired path.

   There are two forms of segments: node and adjacency.  A node segment
   represents the shortest path to a node and is typically a multi-hop
   shortest path.  An adjacency segment represents a specific adjacency
   to a node.

   SR's control plane can be applied to IPv6 and MPLS dataplanes.

   In the MPLS dataplane, a node segment to node N is instantiated as an
   LSP along the shortest path (SPT) to the node.  An adjacency segment
   is instantiated as a cross-connect entry pointing to a specific
   egress datalink.

   Node segments are at the heart of the SR technology.  Node segments
   must be globally unique within the network domain.

                            A----B----C----D

                                Figure 1

   In Figure 1, all the nodes must be configured with the same Segment
   Routing Identifiers Block (called the SRB Node Registry), e.g.
   64-5000, and every node segment must be uniquely allocated from that
   SRB Node Registry (e.g. A, B, C and D are configured respectively
   with node segments 64, 65, 66 and 67).

   In the MPLS dataplane instantiation, this means that all the nodes
   need to be able to reserve and allocate to the SR control plane the
   same MPLS label range (e.g. 64-5000).

2.  Applicability

   Segment Routing is applicable to the following use cases: simplicity,
   TE, FRR and SDN.

2.1.  Simplicity

   The vast majority of IP traffic travels on the shortest path to its
   destination.  SR delivers a very efficient control-plane technique to
   instantiate shortest-path-based node segments into the forwarding
   dataplane.

   In the example described in Figure 1, considering the MPLS forwarding
   plane, when node D advertises node segment 67 for its loopback D/32,
   nodes A and B introduce the following MPLS dataplane entries:

      A: IP2MPLS:   FEC D/32 => push 67, nhop B
      A: MPLS2MPLS: 67 => swap 67, nhop B
      B: IP2MPLS:   FEC D/32 => push 67, nhop C
      B: MPLS2MPLS: 67 => swap 67, nhop C

   If D advertises node segment 67 with the P flag reset:

      C: IP2MPLS:   FEC D/32 => push explicit-null, nhop D
      C: MPLS2MPLS: 67 => pop, nhop D

   If D advertises node segment 67 with the P flag set:

      C: IP2MPLS:   FEC D/32 => push 67, nhop D
      C: MPLS2MPLS: 67 => swap 67, nhop D

   LDP is no longer required to instantiate shortest-path LSPs to a
   remote node.  The reduction in the number of protocols to operate
   helps reduce the overall operational complexity of the network.  For
   example, the complex IGP/LDP synchronization described in [RFC5443]
   and [RFC6138] no longer needs to be considered.

   Removing LDP also drastically improves the scaling and reliability of
   the network.  For example, when a core node C has 40 TE tunnels to 40
   remote core routers and 260 adjacent aggregation routers, and LDP
   LSPs need to be signaled to 5000 FECs, then node C maintains an LDP
   label database of (260+40)*5000 = 1,500,000 label bindings.  In fact,
   several networks today are exposed to much more difficult LDP scaling
   constraints.

   In comparison, in the same use case, the SR control plane only
   maintains 5000 node segments.  This is 300 times fewer entries.

2.2.  Capacity Planning and Traffic Engineering (TE)

   Capacity Planning deals with anticipating the placement of the
   traffic matrix onto the network topology, for a set of expected
   traffic and topology variations.  The heart of the process consists
   in simulating the placement of the traffic along ECMP-aware shortest
   paths and accounting for the resulting bandwidth usage.
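
   The accounting step described above can be illustrated with a short
   sketch.  It is not part of the protocol; it assumes that traffic is
   split evenly over equal-cost next hops at every node, and it uses the
   links and IGP metrics of the Figure 2 example in this section
   (default metric 2, metric 1 on BC, BG, CD, CE, DF and EF).  The
   function names are hypothetical.

```python
import heapq
from collections import defaultdict

# Links and IGP metrics as stated for the Figure 2 example.
LINKS = {
    ("A", "B"): 2, ("B", "C"): 1, ("B", "G"): 1,
    ("C", "D"): 1, ("C", "E"): 1, ("D", "F"): 1,
    ("E", "F"): 1, ("G", "F"): 2, ("F", "Z"): 2,
}


def build_graph(links):
    """Return an undirected adjacency map {node: {neighbor: metric}}."""
    graph = defaultdict(dict)
    for (a, b), metric in links.items():
        graph[a][b] = metric
        graph[b][a] = metric
    return graph


def distances_to(graph, dst):
    """Dijkstra from dst; with symmetric metrics this is the cost to dst."""
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, metric in graph[u].items():
            if d + metric < dist.get(v, float("inf")):
                dist[v] = d + metric
                heapq.heappush(heap, (d + metric, v))
    return dist


def ecmp_link_loads(graph, src, dst, demand):
    """Split the demand evenly over equal-cost next hops at every node."""
    dist = distances_to(graph, dst)
    inflow = defaultdict(float)
    inflow[src] = demand
    loads = {}
    # Visit nodes from farthest to nearest so each inflow is final when split.
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == dst or inflow[u] == 0:
            continue
        next_hops = [v for v, m in graph[u].items() if dist[v] + m == dist[u]]
        share = inflow[u] / len(next_hops)
        for v in next_hops:
            loads[u + v] = loads.get(u + v, 0.0) + share
            inflow[v] += share
    return loads


loads = ecmp_link_loads(build_graph(LINKS), "A", "Z", 1600)
for link in sorted(loads):
    print(link, loads[link])
```

   For a 1600 Mbps A-to-Z demand, this sketch yields the same per-link
   figures as the Figure 2 discussion: 1600 on AB and FZ, 800 on BC, BG
   and GF, and 400 on CD, DF, CE and EF.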

   The bandwidth accounting of a demand along its shortest path is a
   basic capability of any planning tool or PCE server.

   For example, in the network topology described in Figure 2 and
   assuming a default IGP metric of 2 and IGP metrics
   BC=BG=CD=CE=DF=EF=1, a 1600 Mbps A-to-Z flow is accounted as
   consuming 1600 Mbps on links AB and FZ, 800 Mbps on links BC, BG and
   GF, and 400 Mbps on links CD, DF, CE and EF.

                           C-----D
                          / \     \
                     A---B   +--E--F--Z
                          \        /
                           G------+

                      Figure 2: Example Topology 1

   ECMP is extremely frequent in SP, Enterprise and DC architectures and
   it is not rare to see as many as 128 different ECMP paths between a
   source and a destination within a single network domain.

   This is illustrated in Figure 3, which consists of a subset of a
   network where 6 ECMP paths are already observed from A to M.

                                 C
                                / \
                               B-D-L--
                              / \ /   \
                             A   E     \
                              \         M
                               \   G   /
                                \ / \ /
                                 F-H-K
                                  \ /
                                   I

                     Figure 3: ECMP Topology Example

   Segment Routing offers simple support for such ECMP-based
   shortest-path placement: a node segment.  A single node segment
   covers all the ECMP paths along the shortest path.  This is much
   simpler than the RSVP-TE model, where one TE tunnel is required for
   each enumerated ECMP path.

   When the capacity planning process detects that a traffic or topology
   variation would lead to congestion, traffic engineering or a capacity
   increase is triggered.

   The most basic traffic engineering option consists of finding the
   smallest set of demands that need to be routed off their shortest
   path to eliminate the congestion, then computing an explicit path for
   each of them and instantiating these traffic-engineered policies in
   the network.

   Segment Routing offers simple support for explicit path policies.  In
   the diagram described in Figure 3, it is assumed that the requirement
   is that the AM flow should not consume any resource on the LM and the
   FG links.

   The first option would consist of using the following encapsulation
   at A: A sends any traffic to M towards the nhop F with a two-label
   stack where the top label is the adjacency segment FI and the next
   label is the node segment to M.

   Alternatively, a three-label stack with adjacency segments FI, IK and
   KM could have been used.

   The first option seems preferable, as IP capacity planners
   classically optimize traffic along ECMP-aware shortest paths.  The
   more node segments can be used, the better.  However, both options
   are available and one can favor adjacency segments.

   In the same way, if the requirement in the diagram described in
   Figure 3 is that the AM flow should not consume any resource along
   the LM link but may use any resource on the bottom of the topology,
   then A could send the AM flow to its nhop F with a single label: the
   label representing the node segment to M.

   We believe that Segment Routing offers an excellent solution for
   Capacity Planning because:

      A node segment represents an ECMP-aware shortest path.

      Adjacency segments allow any explicit path to be expressed.

      The combination of node and adjacency segments allows any path to
      be expressed without having to enumerate all the ECMP path
      options.

      The capacity planning process ensures that the majority of the
      traffic rides on node segments (ECMP-based shortest paths) while a
      minority of the traffic is routed off its shortest path.

      The network does not hold any signaling state for the
      traffic-engineered flows.

   In comparison, a classic RSVP-TE full-mesh traffic engineering
   solution involves a full mesh of tunnels from any edge to any other
   edge of the network.  For any specific edge-to-edge pair, tens of
   RSVP-TE tunnels may need to be enumerated to load-share the traffic
   along all the possible shortest paths.
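
   The explicit-path options discussed above for Figure 3 can be checked
   with a small sketch that expands a segment list into the set of links
   it may use.  The link list is read off Figure 3; unit metrics on all
   links are an assumption of this sketch (the text only states that six
   ECMP paths exist from A to M), and the function names are
   hypothetical.  The segment list is written as seen from A, so the
   first hop AF appears as an adjacency even though A realizes it simply
   by choosing nhop F.

```python
import heapq

# Links read off Figure 3; unit metrics are an assumption of this sketch.
FIGURE3_LINKS = ["AB", "BC", "BD", "BE", "CL", "DL", "EL", "LM",
                 "AF", "FG", "FH", "FI", "GK", "HK", "IK", "KM"]


def build_graph(links, metric=1):
    """Return an undirected adjacency map {node: {neighbor: metric}}."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def distances_to(graph, dst):
    """Dijkstra from dst; with symmetric metrics this is the cost to dst."""
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, m in graph[u].items():
            if d + m < dist.get(v, float("inf")):
                dist[v] = d + m
                heapq.heappush(heap, (d + m, v))
    return dist


def node_segment_links(graph, src, dst):
    """Directed links on any ECMP shortest path from src to dst."""
    dist = distances_to(graph, dst)
    used, todo, seen = set(), [src], {src}
    while todo:
        u = todo.pop()
        for v, m in graph[u].items():
            if dist[v] + m == dist[u]:
                used.add(u + v)
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
    return used


def expand(graph, start, segments):
    """Links used by a segment list of ("node", N) and ("adj", "XY") items."""
    here, used = start, set()
    for kind, value in segments:
        if kind == "node":
            used |= node_segment_links(graph, here, value)
            here = value
        else:
            if value[0] != here or value[1] not in graph[here]:
                raise ValueError("adjacency %s is not local to %s" % (value, here))
            used.add(value)
            here = value[1]
    return used


graph = build_graph(FIGURE3_LINKS)
option1 = expand(graph, "A", [("adj", "AF"), ("adj", "FI"), ("node", "M")])
option2 = expand(graph, "A", [("adj", "AF"), ("adj", "FI"),
                              ("adj", "IK"), ("adj", "KM")])
bottom = expand(graph, "A", [("adj", "AF"), ("node", "M")])
print(sorted(option1), sorted(bottom))
```

   Under the unit-metric assumption, both encodings of the first
   requirement resolve to the links AF, FI, IK and KM, so neither LM nor
   FG is used.  The single node segment sent via nhop F spreads over the
   whole bottom half of the topology and still avoids LM.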

   Analytically, assuming a single tunnel from an edge to an edge (an
   optimistic assumption), an RSVP-TE full-mesh traffic engineering
   solution scales as E^2 where E is the number of edge nodes.  The
   number of LSPs signaled and maintained by the network (in control
   plane and in dataplane) scales quadratically with the number of edge
   nodes.

   In contrast, the Segment Routing solution maintains E node segments.
   The number of control-plane and dataplane states scales linearly with
   the number of edge nodes.

   A network of 1000 edges is very frequent nowadays.  In such a case,
   the capacity planning solution based on Segment Routing scales 1000
   times better than the RSVP-TE full-mesh solution.

   We have applied this comparative study to a use case using a real
   topology and a real demand matrix.  The data set consisted of a full
   mesh of 12000 tunnels where originally only 65% of the traffic was
   riding on its shortest path.  Two well-known defects are illustrated
   in this data set:

      the lack of ECMP support in RSVP-TE and hence the increase of the
      number of tunnels to enumerate all the ECMP options,

      the inefficiency of distributed optimization, as too much traffic
      is riding off its shortest path.

   Using centralized optimization, we could optimize the IGP metrics so
   as to place 98% of the traffic on ECMP-aware shortest paths (one
   single node segment) while only 2% of the traffic required explicit
   traffic engineering tunnels away from the shortest path.  Only 250
   demands required explicit paths.

   In this example, we increased the efficiency of the network by
   roughly 50%: 98% of the traffic rides on its shortest path instead of
   65%.  Furthermore, we reduced the operational complexity of the
   network by 60 times (200 explicit routing policies instead of 12000).

   The next two sections provide other examples illustrating the
   simplicity and efficiency benefits of the SR-based traffic
   engineering solution.

2.2.1.  Disjointness in dual-plane networks

   Many networks are built according to the dual-plane design:

      Each access region k is connected to the core by two C routers
      (C(1,k) and C(2,k)).

      C(1,k) is part of plane-1 of the dual-plane core.

      C(2,k) is part of plane-2 of the dual-plane core.

      C(1,k) has a link to C(2,l) iff k = l.

      {C(1,k) has a link to C(1,l)} iff {C(2,k) has a link to C(2,l)}.

   Many networks need to deliver disjointness-based services (bank,
   government...): an access node A connected to core nodes C(1,A) and
   C(2,A) needs to provide two disjoint services towards an access node
   Z connected to core nodes C(1,Z) and C(2,Z).

   Classic IP routing cannot fulfill this requirement as A would
   load-balance between the dual planes across ECMP paths.

   RSVP-TE traffic engineering would allow two disjoint paths to be
   signaled, one across the first plane and one across the second plane,
   with the following two drawbacks:

      Many ECMP paths exist within each plane (from C(i,A) to C(i,Z))
      and hence many RSVP-TE tunnels might be required to efficiently
      distribute the load within each plane.

      Many such services might need to be supported.

   Assuming 10000 such services across the network and assuming an
   average of 4 ECMP paths within each plane, a straight application of
   RSVP-TE would require 10000 * 2 * 4 tunnels, hence 80000 tunnels.
   Even if load-sharing of traffic along ECMP paths in each plane is
   dropped, the solution would still need 20000 tunnels.

   Segment Routing (SR) offers a simpler solution.

   Any node of the first plane can be configured with an anycast
   loopback, say 1.1.1.1/32, to which node segment 111 is attached.

   Any node of the second plane can be configured with an anycast
   loopback, say 2.2.2.2/32, to which node segment 222 is attached.

   Let us also assume that access node Z is advertising node segment
   500.

   A flow from A to Z via the first plane is simply represented by the
   segment list {111, 500}.  In the MPLS dataplane case, A pushes a
   two-label stack for Z-destined packets: the top label is 111 and the
   second label is 500.  Node segment 111 gets the traffic on the
   ECMP-aware shortest path to the first plane and then node segment 500
   gets the traffic on the ECMP-aware shortest path to Z.

   Similarly, a flow from A to Z via the second plane is simply
   represented by the segment list {222, 500}.

   This simple solution would only add two node segments to the network
   instead of the 80000 LSPs signaled by the RSVP-TE solution.  This is
   40000 times fewer.

2.2.2.  QoS-based Routing Policies

   Frequently, different classes of service need different path
   characteristics.

   For example, an international network with presence in Tokyo and
   Brussels may have lots of cheap network capacity from Tokyo to Europe
   via USA and some scarce, expensive capacity via Russia.

              ...USA...Brussels...Russia...Tokyo...USA...

                Figure 4: International Topology Example

   In such a case, the IGP metrics would be tuned to have a shortest
   path from Tokyo to Brussels via USA.

   This would provide efficient capacity planning usage while fulfilling
   the requirements of most of the data traffic.  However, it may not
   suit the latency requirements of the voice traffic between the two
   cities.

   Segment Routing (SR) offers a simple solution to the problem.

   The core routers in Russia are configured with an extra anycast
   loopback 3.3.3.3/32 to which node segment 333 is attached.

   If we assume that Brussels is configured with node segment 600, Tokyo
   can send all its data traffic to Brussels with one single segment:
   600.  Segment 600 gets the traffic from Tokyo to Brussels via USA and
   exploits any ECMP path along this shortest path.

   Tokyo can send all its voice traffic to Brussels with a list of two
   segments: {333, 600}.
   Segment 333 gets the traffic to Russia and exploits any ECMP path
   along the shortest path.  Segment 600 gets the traffic from Russia to
   Brussels via the ECMP-aware shortest path.

   One single metric per link is sufficient, as it is clearly possible
   to set the IGP metrics such that the shortest path from Tokyo to
   Russia is direct and not via USA, and the shortest path from Russia
   to Brussels is not back via Tokyo and USA but straight to Brussels.

2.2.3.  Deterministic non-ECMP Path

   The previous sections have illustrated the ease of capacity planning
   traffic with ECMP awareness and shortest paths.  The key benefits in
   terms of a drastic reduction of the number of routing policies
   signaled by the network control plane and maintained by the data
   plane have been explained, and scaling gains of several orders of
   magnitude have been illustrated.

   In this section, we highlight SR's ability to support a completely
   different model: the deterministic expression of a path avoiding any
   ECMP behavior.  This is realized by expressing the end-to-end path as
   a list of adjacency segments.

   For example, in Figure 3, one can force the AM traffic onto the
   explicit path AFGKM by using the segment list {AF, FG, GK, KM}.

   Once again, SR offers simplicity and scaling benefits compared to the
   alternative RSVP-TE solution: no state is signaled through the
   network.  In Figure 3, with SR, nodes F, G, K and M do not maintain
   any SR state for the A-to-M policy.  With RSVP-TE, each node along
   the RSVP-TE tunnel must maintain one LSP state per tunnel.

   Here is a technique to decrease the number of adjacency segments
   needed to describe non-ECMP paths.

   In the topology example illustrated in Figure 5, node C can be
   configured with an SR explicit policy to node G via the path CDEFG.

                            A-B-C-D-E-F-G-H

                      Figure 5: Topology Example 3

   Node C can advertise a (forwarding) adjacency to node G and attach an
   SR subTLV to identify the related adjacency segment (e.g. 72).  The
   ERO SubTLV is further attached to identify that this adjacency is not
   describing a real datalink between C and G but instead an SR non-ECMP
   sub-path along the route {CD, DE, EF, FG}.

   Node A can then express its desired non-ECMP path as {AB, BC, 72, GH}
   instead of {AB, BC, CD, DE, EF, FG, GH}.

   Future versions of the document will describe other techniques to
   decrease the number of adjacency segments in non-ECMP source-routed
   paths.

2.3.  Fast Reroute

   This section assumes familiarity with the Remote-LFA concepts
   described in [I-D.ietf-rtgwg-remote-lfa].

   Lemma 1: In networks with symmetric IGP metrics (the metric of a link
   AB is the same as the metric of the reverse link BA), we can prove
   that either the P and the Q sets intersect or there is at least one P
   node that is adjacent to a Q node.

   Consider an arbitrary protected link S-E.  In LFA FRR, if a path to
   the destination from a neighbor N of S does not cause a packet to
   loop back over the link S-E (i.e. N is a loop-free alternate), then S
   can send the packet to N and the packet will be delivered to the
   destination using the pre-failure forwarding information.

   If there is no such LFA neighbor, then S may be able to create a
   virtual LFA by using a tunnel to carry the packet to a point in the
   network which is not a direct neighbor of S and from which the packet
   will be delivered to the destination without looping back to S.
   Remote LFA (RLFA, [I-D.ietf-rtgwg-remote-lfa]) calls such a tunnel a
   repair tunnel.  The tail-end of this tunnel is called a "remote LFA"
   or a "PQ node".  We refer to RLFA for the definitions of the P and Q
   sets.

   If there is no such RLFA PQ node, we propose to use a Directed LFA
   (DLFA) repair tunnel to a Q node that is adjacent to the P space.
   The DLFA repair tunnel only requires two segments: a node segment to
   a P node which is adjacent to the Q node, and an adjacency segment
   from the P node to its adjacent Q node.

   It follows from Lemma 1 that, thanks to the DLFA extension, we have a
   guaranteed LFA-based FRR technique for any network with symmetric IGP
   metrics.

   The solution is simple: it does not require any extra computation on
   top of the one required for RLFA.  The repair tunnel can be encoded
   efficiently with only two segments.

   The solution preserves the capacity planning properties of LFA FRR.

2.4.  Segment Routing in Software Defined Networks (SR-SDN)

   Some of the SDN requirements are:

      Guarantees of tight SLAs (FRR and bandwidth admission control).

      Efficient use of the network resources.

      Very high scaling to support application-based transactions.

   Segment Routing (SR) is a compelling architecture to support SDN for
   the following reasons:

      SR supports a simple but efficient capacity planning process based
      on centralized optimization.

      SR optimizes network resources by providing very simple support
      for ECMP-based shortest-path flows.

      SR provides for much better scaling than alternative solutions:
      scaling gains of several orders of magnitude have been illustrated
      in the Simplicity and Capacity Planning sections.

      SR provides for guaranteed FRR for any topology with symmetric IGP
      metrics.

      SR provides for ultimate virtualization as the network does not
      contain any application state.  The state is in the packet.  It is
      encoded as a list of segments.

      SR provides for very frequent transaction-based applications as
      the network does not hold any state for the SR-encoded flows.

3.  Segment Routing Identifiers

   Segment Routing defines two types of Segment Identifiers: Node-SID
   and Adj-SID.

3.1.  Node Segment Identifier (Node-SID)

   A Node-SID is associated with a prefix advertised by a node (e.g. in
   a TLV-135).

   The Node-SID SubTLV MAY be present in one of the following TLVs:

      TLV-135 (IPv4) defined in [RFC5305].

      TLV-235 (MT-IPv4) [RFC5120].

      TLV-236 (IPv6) [RFC5308].

      TLV-237 (MT-IPv6) [RFC5120].

   Multiple Node-SID SubTLVs MAY be attached to a prefix.

   A node receiving a Node-SID SubTLV containing more than one Node-SID
   MAY consider only one encoded Node-SID, in which case the first
   encoded Node-SID MUST be considered and any additional Node-SID
   ignored.

   The value of the Node-SID is a 32-bit number.

3.1.1.  Node-SID SubTLV

   The Node-SID SubTLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Segment Identifier (SID)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: 6 octets

      Flags: 2-octet field of the following flags:

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |P|E|L|                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         P-Flag: PHP flag.  If set, then the penultimate hop MUST NOT
         pop the Node-SID before delivering the packet to the node that
         advertised the Node-SID.

         E-Flag: External flag.  If set, then the prefix is not local to
         the IS-IS protocol.  It is redistributed from another protocol.

         L-Flag: Level flag.  If set, then the prefix has been
         propagated to the router in this level from another level
         (i.e. from level-1 into level-2 or from level-2 into level-1).

         Other bits: MUST be zero when sent and ignored when received.

      Segment Identifier (SID): 32 bits of Segment Identifier

3.2.  Adjacency Segment Identifier (Adj-SID)

   An Adjacency Segment Identifier (Adj-SID) represents a router
   adjacency.  The value of the Adj-SID is local to the router and it is
   encoded as a 32-bit value using a new SubTLV.

   According to IS-IS, each adjacency is advertised using one of the
   IS-IS Neighbor TLVs below:

      TLV-22 [RFC5305]

      TLV-222 [RFC5120]

      TLV-23 [RFC5311]

      TLV-223 [RFC5311]

      TLV-141 [RFC5316]

   Currently, [RFC5316] defines TLV-141 for the purpose of inter-AS
   connectivity.  In the Segment Routing context, we relax the
   constraint and we allow TLV-141 to be used for advertising any link
   that is external to the IS-IS domain, whether or not it connects to
   another AS.

   The newly defined Adj-SID SubTLV carries the Adj-SID value for each
   of the advertised adjacencies and MAY be present in any of the
   Neighbor TLVs described above.

   Multiple Adj-SID SubTLVs MAY be attached to the Neighbor TLVs (e.g.
   TLV-22).  An example where more than one is useful is the case of
   parallel adjacencies between two neighbors, as in the figure below:

                               _____
                              /     \
                  ----A------B------C----
                              \_____/

                     Figure 6: Parallel Adjacencies

   Routers B and C have 3 parallel adjacencies.  Router B advertises
   three distinct Neighbor TLVs (e.g. TLV-22), one for each parallel
   adjacency.  Each of these advertisements will have its own Adj-SID
   SubTLV with a unique value (inside the Adj-SID space of the router).

   When router A inspects its IS-IS Link State Database (LSDB), it can
   figure out which link to use on a source-routed path going through
   the B-C links.  It has knowledge of each individual parallel
   adjacency and can handle load sharing across them on its own (i.e.
   decide in advance which packet should use which link).

   However, router A may prefer not to select a specific parallel
   interface and leave the load-sharing decision to router B so that
   load sharing is handled locally (i.e. where the parallel interfaces
   reside).

   In order to achieve that, router B inserts an additional Adj-SID
   value on each of the parallel adjacencies it advertises.  The value
   of this second Adj-SID is common to all parallel adjacencies.  Again,
   when router A inspects its IS-IS LSDB, it finds that the parallel
   adjacencies advertised by router B have a second Adj-SID with a value
   that is common across all parallel adjacencies.  Using that value
   will bring packets into router B, and the load-sharing decision is
   owned by router B itself.

   When the same Adj-SID value is used on parallel adjacencies, we call
   the Adj-SID a "Bundle-Adj-SID".

3.2.1.  Adj-SID and Interface Address

   When advertising one or more Adj-SID SubTLVs, the router MUST also
   advertise the Interface Address and Neighbor Address SubTLVs (IPv4 or
   IPv6).  Both MUST be present.  The encoding is defined in [RFC5305]
   for IPv4 and in [RFC6119] for IPv6.

3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV

   The following format is defined for the Adj-SID SubTLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Adj-SID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: variable.

      Flags: 2-octet field of the following flags:

          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |B|F|                           |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         B-Flag: Bundle flag.  If set, then the Adj-SID refers to a
         bundle (i.e. a set of parallel adjacencies).

         F-Flag: FA flag.  If set, then the Adj-SID refers to a
         Forwarding Adjacency.

         Other bits: MUST be zero when sent and ignored when received.

      Adj-SID: 32 bits of Adjacency Segment Identifier

   Forwarding Adjacencies are defined in [RFC4206].  If the F-flag is
   set, then the explicit path taken by the Forwarding Adjacency MUST be
   encoded using the following subTLV in the Adj-SID SubTLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Segment Identifier (SID) #1                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Segment Identifier (SID) #...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA.

      Flags: none are currently used.

      Length: variable, 2 + multiple of 4 octets.

      Segment Identifier (SID): The SID value of each hop in the
      explicit path of the Forwarding Adjacency.

3.2.3.  Adjacency Segment Identifiers in LANs

   In LAN subnetworks, the Designated Intermediate System (DIS) is
   elected and originates the Pseudonode-LSP (PN-LSP) including all
   neighbors of the DIS.

   Still, when Segment Routing is used, each router in the LAN MUST
   advertise the Adj-SID of each of its neighbors.  Since, on LANs,
   there are no neighbor advertisements in non-PN-LSPs (other than the
   adjacency to the DIS), each router advertises the set of Adj-SIDs
   (one for each of its neighbors) inside the Intermediate System to
   Intermediate System Hello (IIH) packets as soon as the adjacency to
   that neighbor reaches the UP state.
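
   The Adj-SID SubTLV layout of Section 3.2.2 can be illustrated with a
   minimal encoding sketch.  Several points are assumptions rather than
   statements of this document: the two type codes are placeholders
   (both are listed as TBA), bit 0 of the Flags field is taken to be the
   most significant bit, and the Length of the Adj-SID SubTLV is taken
   to cover the Flags, the Adj-SID and the nested explicit-path subTLV.
   The SID values in the example are hypothetical.

```python
import struct

# Hypothetical type codes: the draft lists both types as TBA.
ADJ_SID_SUBTLV_TYPE = 250
EXPLICIT_PATH_SUBTLV_TYPE = 251

# Assumption: bit 0 of the Flags field is the most significant bit.
B_FLAG = 0x8000  # Bundle flag
F_FLAG = 0x4000  # Forwarding Adjacency flag


def encode_explicit_path(sids):
    """Nested subTLV: Type, Length, 2 octets of unused flags, then the SIDs."""
    value = struct.pack("!H", 0) + b"".join(struct.pack("!I", s) for s in sids)
    return struct.pack("!BB", EXPLICIT_PATH_SUBTLV_TYPE, len(value)) + value


def encode_adj_sid(adj_sid, bundle=False, fa_path=None):
    """Encode an Adj-SID SubTLV; fa_path sets the F flag and adds the path."""
    flags = (B_FLAG if bundle else 0) | (F_FLAG if fa_path is not None else 0)
    value = struct.pack("!HI", flags, adj_sid)
    if fa_path is not None:
        value += encode_explicit_path(fa_path)
    return struct.pack("!BB", ADJ_SID_SUBTLV_TYPE, len(value)) + value


def decode_adj_sid(data):
    """Decode the bytes produced by encode_adj_sid."""
    sub_type, length = struct.unpack_from("!BB", data, 0)
    if sub_type != ADJ_SID_SUBTLV_TYPE or length != len(data) - 2:
        raise ValueError("not a well-formed Adj-SID SubTLV")
    flags, adj_sid = struct.unpack_from("!HI", data, 2)
    path = None
    if flags & F_FLAG:
        _, path_len = struct.unpack_from("!BB", data, 8)
        count = (path_len - 2) // 4  # Length is 2 + a multiple of 4 octets
        path = list(struct.unpack_from("!%dI" % count, data, 12))
    return {"adj_sid": adj_sid, "bundle": bool(flags & B_FLAG), "fa_path": path}


plain = encode_adj_sid(72)
fa = encode_adj_sid(72, fa_path=[1001, 1002, 1003, 1004])
print(plain.hex(), decode_adj_sid(fa))
```

   The second call mirrors the forwarding-adjacency example of Figure 5,
   where adjacency segment 72 stands for a four-hop explicit path; the
   nested subTLV then carries one SID per hop.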
   We define a new IIH TLV, the IIH-Adj-SID TLV, with the following
   format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |       Flags       | System-ID |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Adj-SID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: 6 octets

      Flags: 10 bits of flags.  None are used at this stage.  MUST be
      zero when sent and ignored when received.

      System-ID: 6 octets of system ID and pseudonode number of the
      neighbor.

      Adj-SID: 32 bits of IIH Adjacency Segment Identifier

   Therefore, each router in the LAN advertises in its IIH packet the
   list of UP adjacencies in the form of tuples: <System-ID, Adj-SID>.

   The DIS, like any other router in the LAN, receives IIHs from all
   routers on the LAN and stores the set of tuples <System-ID,
   Adj-SID>.

   The DIS includes the Adj-SID information received in the IIHs when
   advertising IS-Neighbors in its PN-LSPs.  The result is that the
   PN-LSP contains the neighbors of the DIS and, for each of them, the
   list of their Adj-SIDs to their respective neighbors in the LAN.
   This could require multiple IS-Neighbor TLVs for the same neighbor
   if there are more than 25 ISs on a LAN.

   Each router within the level-1 area or level-2 subdomain, when
   receiving the PN-LSP, will extract each neighbor and its
   corresponding Adj-SID table in order to determine which Adj-SID has
   to be used between any two neighbors in the LAN.

4.  Segment Routing Capabilities

   Segment Routing requires each router to advertise its capabilities
   to the rest of the routing domain.  TLV-242 (defined in [RFC4971])
   describes router capabilities.  For the purposes of Segment Routing
   we define an additional subTLV: the SR-Cap SubTLV.

   The SR-Cap SubTLV MUST be present in the Router Capability TLV
   (TLV-242), MUST appear only once, and has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     SR Capabilities Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA.

      Length: 2 octets.

      SR Capabilities Flags: 2-octet field of the following flags:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|F|S|                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         M-Flag: MPLS flag.  If set, then the advertising router is
         capable of MPLS label based forwarding.

         F-Flag: IPv4 flag.  If set, then the advertising router is
         capable of IPv4 based forwarding.

         S-Flag: IPv6 flag.  If set, then the advertising router is
         capable of IPv6 based forwarding.

         Other bits: MUST be zero when sent and ignored when received.

   The Router Capability TLV defined in [RFC4971] specifies the S and D
   bits.  The SR-Capability SubTLV MUST be propagated throughout the
   entire routing domain and therefore the S bit in the Router
   Capability TLV MUST be set.  The D bit of the Router Capability TLV
   must be set accordingly; i.e., it MUST be set when the Router
   Capability TLV is leaked from level-2 to level-1.

5.  Elements of Procedure

   This section describes aspects of Segment Routing procedures.

5.1.  Unicity

   The benefits of the Segment Routing solution build on a small set of
   rules.

   The first 64 values of the 32-bit segment space are reserved and
   cannot be used by the SR Control-Plane for either node or adjacency
   segments.

   All the nodes in the IS-IS domain must be configured with the node
   SRB range.  The range is a local policy and is not advertised by
   IS-IS.

   A node segment must be allocated from the node SRB range.

   A given Node-SID must be allocated to a unique IP prefix.
   If the IP prefix is of anycast type and is advertised by two nodes N
   and M, then N and M attach the same (anycast) Node-SID to the same
   anycast IP address.

   If a node N learns a remote Adj-SID S advertised with a value that
   falls within its locally configured node SRB range, N SHOULD issue
   an error log warning for misconfiguration.

   If a node N learns a remote Node-SID S with a value that falls
   outside its locally configured node SRB range, N SHOULD NOT insert
   any RIB entry for segment S.  Node N SHOULD issue an error log
   warning for misconfiguration.

   If a node N learns about two different IP addresses advertised with
   the same Node-SID, N MUST insert a RIB entry only for the node
   segment related to the highest IP address.  N SHOULD issue an error
   log warning for misconfiguration.

5.2.  IS-IS Multi-Level

   In the IS-IS protocol, adjacency advertisements (e.g., TLV-22) are
   not propagated across level/area boundaries; hence the adjacency
   segment (Adj-SID) is not propagated across levels either.

   If a prefix is propagated across levels, then its Node-SID SubTLVs
   are also propagated.  The Node-SID S flag is set accordingly,
   independently of the settings of the U/D bit defined in [RFC5305].

5.3.  Data-Plane Encodings

   The SR control-plane supports different forwarding planes.

   The first section describes the SR source routing concept and its
   RIB representation.  The next sections map the SR-RIB entries into
   the MPLS and IPv6 forwarding planes.

5.3.1.  Segment Routing RIB (SR-RIB)

   SR leverages source routing and introduces the following
   terminology:

      A packet is prepended with an SR header which contains a list of
      segments.

      A list of segments is ordered and has a pointer identifying the
      active segment.

      The active segment is the segment identified by the pointer.

      Forwarding is based on the active segment.
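   The terminology above can be modeled with a small data structure.
   The following Python sketch is illustrative only: the class name
   SRHeader, its field names, the example segment values, and the
   choice to return None once the pointer moves past the end of the
   list are assumptions made for the example, not definitions taken
   from this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SRHeader:
    """Ordered list of segments plus a pointer to the active segment."""
    segments: List[int]       # segment identifiers, in path order
    pointer: int = 0          # index of the active segment

    @property
    def active_segment(self) -> Optional[int]:
        """Segment identified by the pointer.

        Returns None when the pointer is outside the list (assumed
        convention for an exhausted segment list).
        """
        if 0 <= self.pointer < len(self.segments):
            return self.segments[self.pointer]
        return None


# Example values are made up; they only avoid the reserved first 64 values.
header = SRHeader(segments=[72, 9001, 65])
print(header.active_segment)   # the first segment in the list is active
header.pointer += 1            # moving the pointer changes the active segment
print(header.active_segment)
```

   Keeping the pointer separate from the list means the full path
   stays visible in the header while forwarding acts only on the
   active segment.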
   The following forwarding operations are defined for SR:

      CONTINUE: the active segment remains active after the forwarding
      operation and the pointer is left unchanged.

      NEXT: the active segment is completed after the forwarding
      operation and the pointer is advanced to the next segment in the
      ordered list.

      INSERT: a list of segments is inserted in the segment list.  The
      INSERT operation can be coupled with the CONTINUE or NEXT
      operation.

   Other operations will be introduced in future versions of the
   document.

   Two types of SR-RIB entries are defined:

      TRANSIT: the ingress packet comes with an active segment.  A
      transit SR-RIB entry is represented as:

         Ingress active segment.

         Operation on the active segment.

         Egress interface.

      INGRESS: the ingress packet comes without an active segment
      (plain IP).

5.3.1.1.  SR-RIB entry for local segments

   A node MUST install a transit SR-RIB entry for any local adjacency
   segment (Adj-SID) of value V attached to datalink L with:

      Ingress active segment: V

      Ingress operation: NEXT

      Egress interface: L

   A node MUST install a transit SR-RIB entry for any local adjacency
   segment (Adj-SID) of value W attached to IS-IS link bundle B with:

      Ingress active segment: W

      Ingress operation: NEXT

      Egress interface: hash among the datalinks within bundle B

   A node MUST install a transit SR-RIB entry for any local node
   segment (Node-SID) of value N with:

      Ingress active segment: N

      Ingress operation: NEXT (if not the last segment, then process
      the next segment; otherwise, look up in the IP table)

5.3.1.2.  Transit SR-RIB entry for remote segments

   A node MUST install a transit SR-RIB entry for any remote node
   segment (Node-SID) of value R attached to IP prefix P with:

      Ingress active segment: R
      Ingress operation: CONTINUE (However, if the P flag is reset and
      P is advertised by the next-hop, then the operation is NEXT
      instead of CONTINUE).

      Egress interface: interface to next-hop along the shortest-path
      to P.

   A transit SR-RIB entry is never installed for a remote adjacency
   segment.

5.3.1.3.  Ingress SR-RIB entry for remote segments

   Ingress SR-RIB entries enable traffic injection in the SR forwarding
   plane.

   An ingress SR-RIB entry is generally represented as:

      Classification: what traffic

      Encapsulation: what list of segments to insert

   In this section, we define its simplest instantiation: the automated
   ingress SR-RIB entry insertion towards remote node segments
   (Node-SID).

   A node SHOULD install an ingress SR-RIB entry for any remote node
   segment (Node-SID) of value V attached to IP prefix P with:

      FEC: prefix P

      Ingress operation: insert nodal segment V.

      Egress interface: interface to next-hop along the shortest-path
      to P.

5.3.1.4.  Policy-based Ingress SR-RIB entry

   Text will be added in a future revision.

5.3.2.  Multiprotocol Label Switching (MPLS)

   The mapping of SR-RIB entries into the MPLS forwarding plane is
   straightforward.  The following elements MUST be considered:

      A list of segments is represented as a stack of labels.

      The active segment is the top label.

      The CONTINUE operation is implemented as a swap where the
      outgoing label value is set to the incoming label value.

      The NEXT operation is implemented as an MPLS pop operation.

      The INSERT operation is implemented as an MPLS push of a label
      stack.

      The rightmost 20 bits of the Node-SID or Adj-SID value MUST be
      used as the label value.  This implies that SID values are to be
      allocated within the 20-bit space of MPLS labels.

5.3.3.  IP Version 6

   Text will be added in a future revision.

6.  IANA Considerations

   TBD

7.  Manageability Considerations

   TBD

8.  Security Considerations

   TBD

9.  Acknowledgements

   We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Pierre
   Francois, Thomas Telkamp and Les Ginsberg for their contribution to
   the content of this document.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4206]  Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
              Hierarchy with Generalized Multi-Protocol Label Switching
              (GMPLS) Traffic Engineering (TE)", RFC 4206, October
              2005.

   [RFC4971]  Vasseur, JP., Shen, N., and R. Aggarwal, "Intermediate
              System to Intermediate System (IS-IS) Extensions for
              Advertising Router Information", RFC 4971, July 2007.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120, February 2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, October
              2008.

   [RFC5311]  McPherson, D., Ginsberg, L., Previdi, S., and M. Shand,
              "Simplified Extension of Link State PDU (LSP) Space for
              IS-IS", RFC 5311, February 2009.

   [RFC5316]  Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in
              Support of Inter-Autonomous System (AS) MPLS and GMPLS
              Traffic Engineering", RFC 5316, December 2008.

   [RFC6119]  Harrison, J., Berger, J., and M. Bartlett, "IPv6 Traffic
              Engineering in IS-IS", RFC 6119, February 2011.

10.2.  Informative References

   [I-D.ietf-rtgwg-remote-lfa]
              Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S.
              Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-01
              (work in progress), December 2012.

   [RFC5443]  Jork, M., Atlas, A., and L. Fang, "LDP IGP
              Synchronization", RFC 5443, March 2009.

   [RFC6138]  Kini, S. and W.
              Lu, "LDP IGP Synchronization for Broadcast Networks",
              RFC 6138, February 2011.

Authors' Addresses

   Stefano Previdi (editor)
   Cisco Systems, Inc.
   Via Del Serafico, 200
   Rome  00142
   Italy

   Email: sprevidi@cisco.com


   Clarence Filsfils (editor)
   Cisco Systems, Inc.
   Brussels
   BE

   Email: cfilsfil@cisco.com


   Ahmed Bashandy
   Cisco Systems, Inc.
   170, West Tasman Drive
   San Jose, CA  95134
   US

   Email: bashandy@cisco.com


   Martin Horneffer
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster  48153
   DE

   Email: Martin.Horneffer@telekom.de


   Bruno Decraene
   Orange
   FR

   Email: bruno.decraene@orange.com


   Stephane Litkowski
   Orange
   FR

   Email: stephane.litkowski@orange.com


   Igor Milojevic
   Telekom Srbija
   Takovska 2
   Belgrade
   RS

   Email: igormilojevic@telekom.rs


   Rob Shakir
   British Telecom
   London
   UK

   Email: rob.shakir@bt.com


   Saku Ytti
   TDC Oy
   Mechelininkatu 1a
   TDC  00094
   FI

   Email: saku@ytti.fi


   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerp  2018
   BE

   Email: wim.henderickx@alcatel-lucent.com


   Jeff Tantsura
   Ericsson
   300 Holger Way
   San Jose, CA  95134
   US

   Email: Jeff.Tantsura@ericsson.com