idnits 2.17.1 draft-raszuk-teas-ip-te-np-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 2, 2019) is 1666 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'RFC2119' is defined on line 897, but no explicit reference was found in the text == Unused Reference: 'RFC2784' is defined on line 906, but no explicit reference was found in the text == Unused Reference: 'RFC8126' is defined on line 928, but no explicit reference was found in the text == Unused Reference: 'RFC8174' is defined on line 933, but no explicit reference was found in the text == Outdated reference: A later version (-03) exists of draft-herbert-ipv4-eh-01 == Outdated reference: A later version (-26) exists of draft-ietf-6man-segment-routing-header-23 == Outdated reference: A later version (-26) exists of draft-ietf-idr-segment-routing-te-policy-07 == Outdated reference: A later version (-13) exists of draft-ietf-rtgwg-segment-routing-ti-lfa-01 == Outdated reference: A later version (-28) exists of draft-ietf-spring-srv6-network-programming-03 == Outdated reference: A later version (-12) exists of draft-xu-intarea-ip-in-udp-07 Summary: 0 
errors (**), 0 flaws (~~), 11 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TEAS Working Group R. Raszuk, Ed. 3 Internet-Draft Bloomberg LP 4 Intended status: Informational October 2, 2019 5 Expires: April 4, 2020 7 IP Traffic Engineering Architecture with Network Programming 8 draft-raszuk-teas-ip-te-np-00 10 Abstract 12 This document describes a control plane based IP Traffic Engineering 13 Architecture where path information is kept in the control plane by 14 selected nodes instead of being inserted into each packet on ingress 15 to an administrative domain. The described proposal is also fully 16 compatible with the concept of network programming. 18 It is positioned as a complementary technique to native SRv6 and can 19 be used when there are concerns about increased packet size due to 20 the depth of the SID stack, concerns about exceeding the MTU, or 21 stricter simplicity requirements of the kind typically seen in a number of enterprise 22 networks. The proposed solution is applicable to both IPv4 and IPv6 23 based networks. 25 As an added value, detection of end to end path liveness 26 as well as dynamic path selection based on real time path quality are 27 integrated into the design from day one. 29 Status of This Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at https://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. 
It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on April 4, 2020. 46 Copyright Notice 48 Copyright (c) 2019 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (https://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 61 Table of Contents 63 1. Background 64 2. Terminology 65 3. Introduction 66 4. Functional Description 67 5. Control plane 68 6. Data plane 69 7. Network Programming 70 8. Active Path Probing 71 8.1. TI-LFA Local Protection 72 9. Solution advantages 73 10. OAM 74 11. Deployment considerations 75 12. Security considerations 76 13. IANA Considerations 77 14. Acknowledgements 78 15. References 79 15.1. Normative References 80 15.2. Informative References 81 Author's Address 
83 1. Background 85 The ability to steer data over selected topological points, often 86 different from default IGP or BGP paths, provides substantial 87 advantages to consumers of such data. The construction of controlled 88 transit paths is usually driven by requirements to: offload 89 excessively used default routing paths, construct disjoint paths 90 for live-live dual streaming, or create intra- or inter-domain data 91 distribution overlays using dynamic real time SLA criteria, often 92 used along with application-specific mapping schemes. 94 In addition to purely topological reasons there are often also 95 requirements for special data flow processing to happen in selected 96 network elements which by default would not be in the data path of 97 the subject flows. Examples of this could be: firewall traffic 98 screening, service function chaining, caching, deep packet 99 inspection, etc. 101 While there are some solutions available to allow traffic engineering 102 in domains fully operated by a single administrative entity, there seems 103 to be a lack of proposals which could be used to control 104 interconnections of sites over third party networks or the Internet. As 105 part of that category one could also list public cloud tenancies 106 where the ability to steer in/out traffic over paths other than default 107 Internet routing could provide much better SLA characteristics or 108 address some of the not purely technical requirements. 110 Another category of global networking which can significantly benefit 111 from a standards based IP TE solution is a unified model of path 112 engineering for Software Defined Wide Area Networks (SDWANs). 
One of 113 the basic operational principles in selected SDWANs is point to point 114 underlay selection based on the applied SLA characteristics. Adding 115 the ability to traffic engineer such underlay flows makes it possible to bypass 116 underperforming underlay default paths or congestion points 117 occurring even a few autonomous systems away. 119 2. Terminology 121 The following abbreviations are used within this document: 123 o TE - Traffic Engineering 125 o AF - Address Family 127 o IPv4 - Internet Protocol version 4 129 o IPv6 - Internet Protocol version 6 131 o IGP - Interior Gateway Protocol 133 o EH - Extension Header 135 o RIR - Regional Internet Registry 137 o PCE - Path Computation Element 139 o UDP - User Datagram Protocol 141 o BGP - Border Gateway Protocol 142 o SRH - Segment Routing Header 144 o OWAMP - One-way Active Measurement Protocol 146 o DOH - Destination Option Header 148 o PE - Provider Edge 150 o SE - Segment Endpoint 152 o SID - Segment Identifier (PREFIX+FUNCTION+4bits) 154 o NMS - Network Management System 156 o CoS - Class of Service 160 o PCEP - Path Computation Element Communication Protocol 162 o SR-MPLS - Segment Routing with MPLS data plane 164 o SRv6 - Segment Routing with IPv6 data plane 166 o RTT - Round Trip Time 168 o MTU - Maximum Transmission Unit 170 o MOS - Mean Opinion Score 172 o OAM - Operations, Administration, and Maintenance 174 o MPLS - Multiprotocol Label Switching 176 o GID - Group Identifier 178 3. Introduction 180 The architecture described in this specification defines a new 181 forwarding paradigm in which traffic engineered paths can be created 182 either centrally or in a distributed way. With the assistance of 183 local provisioning tools or the control plane, such ordered sets of paths 184 are distributed to those network elements which will participate in 185 data forwarding. 
In addition to basic packet forwarding the 186 architecture also provides a mechanism to execute arbitrary 187 instructions at network nodes selected by the operator, which can include: 188 routers, switches, firewalls, service processors, hosts, etc. 190 The authors have taken a clean slate approach to look at the possible 191 options to engineer traffic within given administrative domain 192 boundaries. The solution is applicable to both traditional 193 "underlay" networks and administrative domains constructed 194 with "overlays". It is also 100% transparent to operating network 195 elements which do not participate in the traffic engineering 196 solution, while preserving packet entropy and meeting fast connectivity 197 restoration needs. 199 The proposed solution is constructed using either building blocks or 200 ideas borrowed from the following technologies: 202 o Segment Routing Architecture [RFC8402] 204 o Destination/Source Routing [I-D.ietf-rtgwg-dst-src-routing] 206 o Generic Packet Tunneling in IPv6 Specification [RFC2473] 208 o IP Encapsulation within IP [RFC2003] 210 o Encapsulating IP in UDP [I-D.xu-intarea-ip-in-udp] 212 o Advertising Segment Routing Policies in BGP 213 [I-D.ietf-idr-segment-routing-te-policy] 215 o BGP Vector Routing [I-D.patel-raszuk-bgp-vector-routing] 217 o A Path Computation Element (PCE) Based Architecture [RFC4655] 219 o PCEP Extensions for Segment Routing [I-D.ietf-pce-segment-routing] 221 o Topology Independent Fast Reroute using Segment Routing 222 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 224 o A One-way Active Measurement Protocol (OWAMP) [RFC4656] 226 It is also fully compatible with the following specifications, embedding 227 the network programming concept as defined in the documents below while 228 at the same time providing a new alternate encoding model: 230 o Internet Protocol, Version 6 (IPv6) Specification [RFC8200] 232 o IPv6 Segment Routing Header (SRH) 233 [I-D.ietf-6man-segment-routing-header] 235 o IPv4 Extension 
Headers and Flow Label [I-D.herbert-ipv4-eh] 236 o IPv4 Extension Headers and UDP Encapsulated Extension Headers 237 [I-D.herbert-ipv4-udpencap-eh] 239 For intradomain Traffic Engineering the introduced overhead 240 is of fixed size: regardless of the number of segment endpoints or 241 links which need to be traversed as part of the engineered path, it is 242 constant and equal to 28 octets for IPv4 and 40 octets for IPv6. If 243 additional segment end or path end instructions are to be carried in 244 additional headers, the extension header size will need to be added. 245 Instructions can, however, also be embedded into the SID destination or 246 reside above the encapsulation header. In those cases, the total 247 length of the overhead remains fixed as stated above. 249 Interdomain Traffic Engineering, depending on the deployment model, 250 could result in an additional fixed 12 octets of overhead. Overlay 251 deployment models are discussed in more detail in the Data 252 Plane section below. 254 While the described architecture is applicable to both IPv4 and IPv6 255 networks, the proposal could be split into separate documents, each 256 focusing on specifics corresponding only to a single address family, 257 if the community expresses such a preference. However, due to the 258 number of common AF agnostic characteristics it is advised to keep it 259 within a single document. 261 Since the support of EH in IPv4 is planned to be introduced with a 262 rather limited scope, the end segment or end path instructions could 263 end up using other extension header types (for example: Destination 264 Options) in IPv4 packets or could be encoded into the destination 265 address itself. It has to be noted that IPv4 packets could be 266 encapsulated in IPv6 when carried across a given domain. The 267 document describes how the concept of network programming can be 268 applied without use of extension headers. 
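The fixed overhead figures quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the proposal. The function and constant names are hypothetical. The split of the 28 octet IPv4 figure into 20 + 8 octets assumes an outer IPv4 header plus a UDP header, and the SRv6 comparison assumes encapsulation with an outer IPv6 header plus an SRH made of an 8 octet fixed part and one 16 octet entry per SID.

```python
# Illustrative arithmetic only; names are hypothetical, not from the draft.
IPV4_HEADER = 20        # outer IPv4 header, octets
UDP_HEADER = 8          # UDP header used with the IPv4 encapsulation, octets
IPV6_HEADER = 40        # outer IPv6 header, octets
INTERDOMAIN_EXTRA = 12  # additional fixed overhead quoted for some interdomain models


def ip_te_overhead(family: str, interdomain: bool = False, eh_octets: int = 0) -> int:
    """Per-packet overhead in octets; note it does not depend on path length."""
    if family == "ipv4":
        base = IPV4_HEADER + UDP_HEADER
    elif family == "ipv6":
        base = IPV6_HEADER
    else:
        raise ValueError("family must be 'ipv4' or 'ipv6'")
    if interdomain:
        base += INTERDOMAIN_EXTRA
    # Optional extension headers carrying instructions add their own size.
    return base + eh_octets


def srv6_encap_overhead(n_sids: int) -> int:
    """Assumed comparison point: outer IPv6 header + SRH (8 + 16 per SID)."""
    return IPV6_HEADER + 8 + 16 * n_sids


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(n, ip_te_overhead("ipv6"), srv6_encap_overhead(n))
```

Under these assumptions the IP TE+NP figure stays at 40 octets for IPv6 however many segment endpoints the path crosses, while the assumed SRv6 encapsulation grows by 16 octets per SID.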
270 The proposal does not enforce any new dependencies on IP address 271 block allocations and is in full alignment with the current IETF and 272 RIR address structure and allocation policies. 274 The core of the defined functionality does not require any new 275 protocol extensions. The solution attempts to maximize reuse of 276 extensions already defined. If more optimal protocol solutions 277 applicable to any of the defined functional blocks surface, additional 278 work will take place in the corresponding area/wg. 280 The described architecture does not belong to the segment routing family, even 281 if some terminology used to describe the proposal has been borrowed 282 from it. The major difference is that by design it uses the control plane or 283 management plane to install per path state in the transit nodes 284 participating in the engineering of data paths instead of encoding 285 the set of TE midpoints into each packet on ingress. 287 While the scaling aspects of any solution are a very important factor, they 288 need to be put in perspective to the operational requirements as 289 well as the characteristics of the designs. It also needs to be noted 290 that even basic IP routing is based on state in the network elements, 291 and the scale of Internet routing is usually orders of magnitude higher 292 than the state of most traffic engineering needs. While looking at 293 scaling factors of the complete solution, variable size per packet 294 overhead needs to be weighed against the cost of additional per path 295 fixed size state in the control and data planes. 297 The IP TE+NP design, while allowing an operator to centrally compute 298 and distribute strict end to end paths, can in a number of deployments 299 be used in a fully distributed mode. Traffic steering decisions can 300 autonomously take place in any TE midpoint, which is particularly 301 useful with all SLA or performance based routing deployments. 
303 If any comparison is to be made between the SR and IP TE+NP 304 architectures then, putting aside other fundamental differences, it would be 305 to a segment routing deployment in which paths are constructed only from Binding 306 SIDs (divided into static and variable parts) that are encoded 307 at each segment endpoint only in the least significant bits of the source and 308 destination addresses of the outer IP header. 310 4. Functional Description 312 For the purpose of this document the following term definitions will 313 be used in capital letter notation: 315 o CLASSIFIER_ID: Identifier of a set of rules used for mapping flows 316 to TE paths. Length - 4 octets. 318 o PATH_GID_PFX: routable node prefix + locally significant PATH_GID 319 value. Length - 4 or 16 octets. 321 o SID: routable node prefix + opt. function + opt. parameters + 4 322 bits (Lookup Type). Length - 4 or 16 octets. 324 o PATH_LIST: ordered list of SIDs. Length N x 4 or N x 16 octets. 325 N min = 1. 327 v---------- IP TE+NP DOMAIN -----------v 329 +---------------SE1--------------+ 330 | | 331 | | 332 SRC_NET-----PE1----P1----SE2----P2----P3----SE3----PE2----DST_NET 333 | || | 334 | || | 335 +------- SE4 ------++----SE5----+ 337 Basic Network Topology 339 Figure 1 341 Consider two basic requirements to be applied to two classes of 342 transit traffic, T1 and T2: 344 o T1: PATH_A1: PE1--SE1--PE2 346 o T2: PATH_A2: PE1--SE4--SE5--PE2 348 TE midpoints can be placed in any arbitrary network location as long 349 as IPv4 or IPv6 reachability to such a location exists. They can be 350 part of someone's IGP domain or can be placed anywhere in the 351 Internet. In the above figure P nodes can represent non TE aware 352 routers in someone's IGP or they can be taken as third party ISPs. 354 For clarity, the example assumes a deployment within a single 355 administrative domain. The IGP metric of all interfaces is set to 10, 356 except for interfaces attached to the SE1, SE4 and SE5 nodes, which have a 357 metric of 100. 
359 The shortest default path, in the example above, between PEs is: 360 PE1--P1--SE2--P2--P3--SE3--PE2 362 In order to accomplish the stated requirements (for traffic classes 363 T1 and T2 defined above) the following ordered path lists are created 364 in the control plane and either locally configured on both ingress 365 and segment endpoints or distributed by any of the control plane 366 protocols discussed in subsequent sections:
368 CLASSIFIER_ID: T1       CLASSIFIER_ID: T2
369 PATH_GID: A1            PATH_GID: A2
370 PATH_LIST: SE1, PE2     PATH_LIST: SE4, SE5, PE2
372 There are a few core elements of the design, as listed below: 374 o Each PATH_GID_PFX contains a unique routable IP prefix from one of 375 the loopbacks of the corresponding ingress PE followed by the PATH_GID 376 value (PATH GROUP-ID). For example, if the loopback's prefix is a 377 /64 IPv6 prefix there can be 2^64 unique paths originated at a 378 given PE. If the loopback address is a /16 IPv4 prefix (for 379 example taken from [RFC1918] space) there can be 2^16 paths 380 initiating at a given IPv4 PE. The choice of mapping scheme is 381 local to the ingress PE and is assigned by the operator. Note 382 that in most cases, to describe reachability to the 383 PATH_GID_PFX, only a single IGP loopback prefix may need to be 384 advertised from any ingress PE. It is also highly recommended 385 that such loopback prefixes configured on all ingress nodes 386 (ingress PEs) be sourced from the same address block such that 387 they can be described by a single aggregate prefix. 389 o Each PATH_LIST consists of a number of SID elements. Each SID is 390 a unique routable IP address from one of the loopbacks of the 391 corresponding Segment Endpoint (SE) node. For example, if the 392 loopback's prefix is a /64 IPv6 prefix there can be 2^(64-4) 393 unique SIDs terminating on a given node. 
If the loopback address 394 is a /16 IPv4 prefix (for example taken from [RFC1918] space) there 395 can be 2^(16-4) SIDs present on a given IPv4 node. As defined, a 396 SID may represent not only a node's topological location in the 397 network (via IP prefix reachability), but it may also, optionally, 398 contain embedded functions with their parameters. In order to 399 even further help the forwarding layer within a given domain, the 400 last four bits can be consistently chosen to describe the lookup 401 type required to correctly switch a given packet. 403 o Upon ingress to the domain, and after classification, packets are 404 encapsulated into an additional outer IP header with the following 405 elements corresponding to the non-default forwarding requirements:
407 Classified as T1 flows:         Classified as T2 flows:
408 -----------------------         -----------------------
409 Source address: PATH_A1_PFX     Source address: PATH_A2_PFX
410 Destination address: SID_SE1    Destination address: SID_SE4
412 In the case of IPv6 the encapsulation for the basic TE only 413 requirement will consist of applying a fixed 40 octet IPv6 header 414 containing the source and destination addresses as described above, a 415 copy of the original flow label, the copied and decremented hop limit 416 count and, depending on the local policy, the CoS setting (copy of 417 original or setting local value). In the IPv4 case the 418 20 octet IP header will contain the TTL copied and decremented from the 419 original packet and the CoS (copy of original or setting of local value), plus an 8 420 octet UDP header which improves the entropy of flows bundled to 421 travel within the provided TE path, so that any ECMP along the path list 422 can still be utilized. 
424 o Encapsulated packets are natively forwarded via the network (by 425 and through P nodes) until they arrive at the destination Segment 426 Endpoint, where the destination address gets swapped to the new 427 destination address from the PATH_LIST kept in the local control 428 and data plane. The lookup which returns the new destination of the 429 packet is a source-destination based lookup using both 430 PATH_GID_PFX (with PATH_GID being encoded in the least significant 431 bits of the source address of the packet) and SID (encoded in the 432 destination address of the packet). This maintains the very 433 good scaling property of the solution without SID state or SID 434 number explosion. All function descriptions which are encoded in 435 the SIDs can be reused across any segment endpoint, if required, 436 as they have only local significance. 438 o When packets arrive at the destination PE (last Segment Node) a 439 similar lookup is performed which returns NULL as the next segment, 440 which in turn results in the decapsulation of the packet and a 441 regular destination based lookup of the destination address 442 present in the inner IP header. As noted, a local optimization 443 allows the local lookup type to be encoded in the last 4 bits of any SID, 444 hence allowing the first lookup to be skipped if such optimization is 445 enabled by the operator. 447 o The described lookup table is instantiated and maintained by 448 either the control plane or by the local configuration of sets of 449 path lists. For any given segment end node, only local SIDs 450 (those where the most significant prefix bits match locally configured 451 prefixes) are populated to the data plane along with the PATH_GIDs they 452 are attached to. That setup is all that is required to provide 453 basic IP TE service. More elaboration on other SID values is 454 provided in the embedded network programming section below. 456 5. 
Control plane 458 The proposed solution is based on classic IP reachability and does 459 not require any new control plane extension. In its basic form, and 460 in order to set up a few TE paths across the sample network in 461 Figure 1, all that is required is to apply two path lists on ingress and 462 egress nodes as well as on three segment endpoints. 464 However, depending on the required TE scale, on the network size, as 465 well as on the TE path complexity, real production deployments will 466 likely utilize automation in order to provision such configurations. 467 A local NMS can be used successfully to provision all participating 468 segment nodes with the proper set of path lists. A separate document 469 describing YANG models for the solution will be 470 provided. 472 Another alternative to propagate sets of path lists can be enabled by 473 using segment routing extensions for PCEP as described in 474 [I-D.ietf-pce-segment-routing]. For the basic TE use cases the path 475 lists used are identical to SID lists for SR-MPLS or SRv6 476 technologies. The logic used by a PCE to compute such paths within a 477 given domain can be directly leveraged by this architecture. The 478 defined SR-ERO sub-object can be directly used to propagate path 479 lists not only to ingress and egress nodes, but also to all segment 480 end points participating in a given path list transit. 482 The methods described above offer a manual or automated way to 483 distribute path lists from central locations using directed TCP 484 sessions to all participating network elements. However, in order to 485 even further reduce the complexity and increase the rate of path list 486 propagation across any domain, a point to multipoint solution could be 487 utilized. Here too, as in the former cases, existing extensions are 488 available - specifically the extension to BGP to advertise 489 Segment Routing Policies as described in 490 [I-D.ietf-idr-segment-routing-te-policy]. 
Detailed encoding examples 491 will be provided in subsequent versions of this document. 493 BGP constructs used for SR Policies propagation to ingress nodes can 494 be used as is in order to propagate analogous path lists to all 495 participating nodes in the network. A new SAFI has been defined 496 (codepoint 73) to separate such propagation from any other address 497 family as well as to uniquely define the NLRI format. For the 498 purpose of disseminating path lists, the 4 octet Policy Color field of the NLRI will 499 carry the CLASSIFIER_ID and the 4 or 16 octet Endpoint field will carry the 500 PATH_GID value. If the PATH_GID is shorter than 4 or 16 octets the most 501 significant bits of the Endpoint field will be set to zero. The ordered list 502 of SIDs will be propagated using Segment List Sub-TLVs (Type 3 for 503 IPv4 and Type 9 for IPv6). Optionally, other Sub-TLVs can also be 504 included with propagation of path lists - for example: Preference 505 Sub-TLV, Priority Sub-TLV, Name Sub-TLV, etc. 507 As intra-domain BGP usually employs route reflection it is likely 508 that participating nodes may receive many more path lists than 509 required to be kept or installed into the data plane. There are two 510 optional solutions to reduce the amount of unnecessary control plane 511 information kept on any participating node which, when 512 applied on ingress, will result in inbound filtering of path lists: use 513 of route target extended communities, or filtering based on the 514 intersection of locally configured IP prefixes with either the prefix 515 part of the Endpoint NLRI or the prefix part of any SID carried in Segment 516 List Sub-TLVs. Even if all path lists received were accepted by 517 BGP for operational and troubleshooting needs, only those which are 518 locally significant will be installed into the data plane. 520 6. 
Data plane 522 There are three IP TE+NP deployment scenarios which may require 523 different data plane encoding specific to the type of connectivity 524 available for ingress, egress and TE transit nodes. The following 525 three categories are covered by this specification: 527 Cat I - deployment within a service provider or enterprise where all 528 participating nodes are interconnected via links operated by the 529 same organization using an addressing scheme under the control of that 530 organization 532 Cat II - deployment where participating sites are interconnected 533 over third party operated networks, where nodes participating in IP TE 534 could allocate a sufficient address block to be used as source 535 address and still be able to encode the entire PATH_GID space of the 536 size chosen by the operator in the least significant bits of the 537 addresses of such nodes 539 Cat III - deployment where participating nodes are interconnected 540 over third party operated infrastructure where all that has been 541 granted to such nodes are either host routes or prefixes with not 542 enough bits left to encode the PATH_GID 544 The building blocks below constitute the required minimum data plane 545 functionality for this architecture: 547 Source+Destination Routing [I-D.ietf-rtgwg-dst-src-routing] 549 Choice of encapsulation: 551 IPv4 in IPv4+UDP [I-D.xu-intarea-ip-in-udp] 553 IPv6 or IPv4 in IPv6 [RFC2473] 555 The selection of normal destination only lookup or source+destination 556 lookup is triggered by lookup of the destination address. Network 557 elements which do not participate in the IP TE+NP service will 558 perform a destination only lookup and forward the packets. Network 559 elements which do participate in the new architecture will perform a 560 destination address check and, if that address matches the local 561 prefix assigned to the IP TE+NP service, a source+destination lookup will 562 take place; otherwise a standard destination only lookup will be 563 performed. 
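The lookup selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the draft's forwarding implementation: the addresses, the 16 bit PATH_GID width, the table layout, the function name forward() and the "no-entry" result are all hypothetical choices made for the example.

```python
import ipaddress

# Hypothetical configuration of one participating segment endpoint.
LOCAL_TE_PREFIX = ipaddress.ip_network("2001:db8:0:5e1::/64")  # prefix assigned to IP TE+NP
PATH_GID_BITS = 16  # assumed number of low-order source address bits holding PATH_GID

# (PATH_GID, local SID) -> next SID from the PATH_LIST; None marks the last segment.
SRC_DST_TABLE = {
    (0xA1, ipaddress.ip_address("2001:db8:0:5e1::10")): ipaddress.ip_address("2001:db8:0:2::10"),
    (0xA1, ipaddress.ip_address("2001:db8:0:5e1::20")): None,
}


def forward(src: str, dst: str):
    """Return (action, new_destination) for the outer header of one packet."""
    src_addr = ipaddress.ip_address(src)
    dst_addr = ipaddress.ip_address(dst)
    # Step 1: destination address check against the local IP TE+NP prefix.
    if dst_addr not in LOCAL_TE_PREFIX:
        return ("dst-only", dst_addr)  # standard destination only lookup
    # Step 2: source+destination lookup; PATH_GID sits in the low source bits.
    path_gid = int(src_addr) & ((1 << PATH_GID_BITS) - 1)
    key = (path_gid, dst_addr)
    if key not in SRC_DST_TABLE:
        return ("no-entry", None)  # handling of a miss is an assumption of this sketch
    next_sid = SRC_DST_TABLE[key]
    if next_sid is None:
        return ("decapsulate", None)  # last segment: look up the inner header instead
    return ("swap-dst", next_sid)  # rewrite the outer destination to the next SID
```

A node that does not participate in the service never reaches step 2, which is why non TE aware routers can forward the encapsulated packets unchanged.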
565 For deployments falling into Cat III as classified above, the available 566 address space does not allow the PATH_GID to be encoded as part of the 567 source address. Therefore in such scenarios it is recommended to use 568 additional GRE encapsulation where the PATH_GID would be encoded in the 4 569 octet key field. 571 The GRE header encoding proposed above, applicable only to Cat III 572 deployments, should in addition to the rules already defined also follow 573 the GRE encoding described in the following specifications: 575 IPv4 in IPv4+UDP+GRE [RFC8086] 577 IPv4 or IPv6 in IPv6+GRE [RFC7676] 579 In Cat III deployments, when a source+destination lookup is performed 580 the PATH_GID from the GRE key field should be used instead of the packet's source 581 address. For the case of IPv6 packet encapsulation, 12 octets of 582 zeros should be locally prepended to the key to perform the 583 source+destination lookup. 585 7. Network Programming 587 Control Plane Assisted Traffic Engineering is fully compatible with 588 functions as described in [I-D.ietf-spring-srv6-network-programming] 589 with one major difference. Instead of always inserting SIDs in the 590 form of an SRH on ingress and into each packet, there are a few 591 alternative ways proposed by this specification. One of them assumes 592 that information about selected functions is added to the packet by 593 the penultimate node of a given segment end node hop. SIDs defined 594 in this document consist of a routable prefix part and a locally 595 significant function/instruction part with optional parameters and 596 lookup type. They can be 32 bits long in the case of IPv4 or 128 bits long 597 in the case of IPv6, with the length of the routable part being a 598 local choice of the operator. 600 A PATH_GID+SID lookup can return a simple pointer to the next segment 601 node or can also result in any other local packet processing chain. 
602 While the routable part of the SID has domain-wide significance, the 603 function part has only local meaning to the node on which it has 604 been instantiated. 606 It needs to be observed that some network functions can, for 607 practical purposes, only be instantiated on the ingress to the domain 608 and as such can be attached to the packet during initial 609 encapsulation by use of a Segment Routing Header (SRH) or Destination 610 Options Header (DOH). Examples of such functions include L3VPN, 611 EVPN or L2VPN demux labels which are to be used when packets 612 arrive at the other side of the domain with or without TE. 614 To further simplify the processing of packets via the segment end 615 nodes and relax the requirement for each transit node to inspect the 616 Extension Header (EH) (when added by the ingress node), the document 617 recommends that each operator in the domain reserve the last 4 618 bits of the SID to explicitly indicate the required lookup type (aka 619 switching vector) to be performed on the outer packet header:
621 +---------------+--------------------------------+
622 | Decimal value | Lookup Type                    |
623 +---------------+--------------------------------+
624 | 0             | SRC-DST lookup only            |
625 | 1             | EH inspection + SRC-DST lookup |
626 | 2             | Decapsulation + Global lookup  |
627 | 3             | EH inspection + Decapsulation  |
628 | 4             | reserved                       |
629 | ..            | ..                             |
630 | 15            | reserved                       |
631 +---------------+--------------------------------+
633 Table 1: Recommended allocation of domain wide IPv6 SID_PFX actions 635 As this specification is only of informational category, the proposed 636 recommendation is non binding and can be locally replaced 637 by any different scheme as chosen by the operator and made possible 638 by implementations. For example the 4 bits may be placed at any 639 other offset after the SID's routable prefix part. 
       The proposed SID
640    Lookup Types do not replace or interfere in any way with the SRH SRv6
641    Endpoint Behaviors as defined in
642    [I-D.ietf-spring-srv6-network-programming].

644    As defined today, [RFC8200] mandates inspecting and processing all
645    extension headers in an IPv6 packet when the packet's destination
646    matches any locally configured IPv6 address.  Therefore, if present,
647    the SRH will need to be inspected and processed at each segment end
648    even if the control plane knows ahead of time that it contains no
649    instructions to be executed at a given network element.  The authors
650    nevertheless encourage use of the recommended SID structure, either
651    for troubleshooting or for the future, when the IPv6 specification
652    relaxes the EH handling rules to accommodate such new deployment
653    models.

655    As an alternative that avoids unnecessary processing of extension
656    headers by nodes which are not required to do so, an implementation
657    can treat a SID with the last four bits set to zero as a non-local
658    destination address.  In such a scenario the source+destination
659    lookup will, instead of triggering local extension header processing,
660    invoke the destination IPv6 NAT function as defined in [RFC6296].  The
661    NAT rules, pre-programmed using information contained in the
662    PATH_LIST, will effectively result in a destination address swap.
663    Such NAT translation is unidirectional in character and can remain
664    fully stateless.

666    The described solution also applies directly to the case of IPv4 in
667    IPv6 encapsulation.

669    In the case of IPv4 in IPv4+UDP encapsulation the basic behaviour of
670    embedding functions in SIDs does not change.  However, at the time
671    of this writing the proposed IPv4 header extensions
672    [I-D.herbert-ipv4-eh] and [I-D.herbert-ipv4-udpencap-eh] may only
673    allow a limited number of extension headers to be used (Hop-by-Hop
674    Options and Destination Options).
       As such, the recommended allocation
675    table in the case of IPv4 requires a slight adjustment:

677           +---------------+---------------------------------+
678           | Decimal value | Lookup Type                     |
679           +---------------+---------------------------------+
680           |       0       | SRC-DST lookup only             |
681           |       1       | DOH inspection + SRC-DST lookup |
682           |       2       | Decapsulation + Global lookup   |
683           |       3       | DOH inspection + Decapsulation  |
684           |       4       | reserved                        |
685           |       ..      | ..                              |
686           |       15      | reserved                        |
687           +---------------+---------------------------------+

689     Table 2: Recommended allocation of domain-wide IPv4 SID_PFX actions

691    The specific syntax of the Destination Options Header encoding when
692    used with IPv4 encapsulation will be defined in subsequent versions
693    of this document.

695    Existing services (e.g., MPLS VPNs [RFC4364]) are fully compatible
696    as-is, without any modifications, with transport over the described
697    IP TE architecture.  An existing MPLS label can be used as the service
698    demux, with MPLS transport fully replaced by IP TE transport.  In such
699    a scenario there is no longer a need to rename the service demux value
700    into some new nomenclature to artificially fit it into SID space.
701    Substituting IP TE transport for MPLS transport is essentially
702    treated as basic IP-in-IP encapsulation and is seamless to upper
703    layer applications.  This in no way prevents the invention of new
704    native services that use only the new network programming paradigm.

706 8.  Active Path Probing

708    One of the critical network metrics for many applications running
709    on the network is not only the ability to reach the destination in a
710    relatively congestion-free fashion, but also the quality of the path
711    traversed towards that destination.  The latter is, unfortunately,
712    very seldom used as a selection criterion in a number of TE
713    implementations.
       Here the authors recommend that, from day one, the
714    operator have the option to define the minimum path quality metrics,
715    as relative or absolute values, that a path must meet before it is
716    considered for actual data plane use.  Comparison with the end to end
717    metrics of the non-TE path or other TE paths should also be available.

719    Today's network technologies focus on local protection as a reaction
720    to adjacent link or node failures.  At the same time, there is a
721    significant concern that they do not detect malfunctions of a
722    network element's internal data plane itself, which, as proven in a
723    number of production deployments, do occur.

725    Moreover, it also needs to be observed that most if not all
726    commonly used routing protocols focus on assuring loop-free
727    destination reachability via the shortest or best path measured with
728    static metrics, without any consideration given to the actual quality
729    of the end to end path towards a given destination.

731    Traffic engineering makes it possible to enable real time SLA
732    evaluation of various TE paths.  Results of such measurements can be
733    used to automatically map traffic to such TE transport.  The
734    architecture described by this specification integrates such
735    functionality, provided an operator chooses to enable it.

737    It needs to be noted that packets used for diagnostics must traverse
738    the exact same data plane and should be encapsulated in a header
739    identical to that of the user packets.  Such measurements detect not
740    only path parameters but also end to end path availability.

742    While (N times path RTT - N times local detection interval) slower
743    than local protection, for the vast majority of applications such an
744    end to end path liveness detection rate is both sufficient and
745    much simpler to implement and operate.  It is also more
746    attractive due to the increased spectrum of failure types which can
747    be detected.
       The complexity that no longer needs to be employed (example:
748    node protection repair of adjacent segment nodes) is also an
749    important consideration.

751    The choice of path probing protocol is left as the local operator's
752    decision.  However, it needs to be observed that such a protocol suite
753    should allow fast liveness detection as well as end to end path
754    quality measurements reported to the path headend (typically a network
755    ingress node) as RTT, jitter, delay and MOS parameters, as well as max
756    MTU and sweep MTU path validation.

758    It is also completely valid to use more than one protocol - each at
759    a different frequency setting.  As an example, one could use BFD
760    multihop [RFC5883] with hardware offload to detect end to end path
761    liveness while at the same time applying OWAMP [RFC4656] to collect
762    further unidirectional path quality metrics.  A recommendation for a
763    single integrated path liveness and quality reporting protocol will
764    be described in a separate IETF specification.

766 8.1.  TI-LFA Local Protection

768    As stated in the TI-LFA specification for networks supporting segment
769    routing [I-D.ietf-rtgwg-segment-routing-ti-lfa], protection of SR
770    policy midpoints involves adjustments to the segment list carried in
771    the packets as well as proper selection of the repair path in order to
772    assure that protected packets can successfully reach the next SR
773    policy segment node.

775    Based on the control plane distribution of the complete PATH_LIST,
776    similar protection is possible in the described architecture.  Only
777    the destination address needs to be swapped, with no additional
778    requirement to adjust any other field in the packet header.  The
779    current destination can be replaced by the subsequent node's address
780    on the PATH_LIST upon detection of a neighboring node failure.  That
781    operation, however, requires maintaining per path state at PLRs,
782    which, while certainly possible, may not be the operator's preference.
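
   The destination swap described above can be sketched as follows.
   This is a minimal illustration under stated assumptions, not a
   definitive implementation: the function name repair_destination,
   the representation of per path state as a mapping from PATH_GID to
   an ordered list of segment node addresses, and the sample addresses
   from documentation ranges are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical per path state kept at a PLR:
# PATH_GID (outer source address) -> ordered segment node addresses.
PathState = Dict[str, List[str]]


def repair_destination(state: PathState, path_gid: str,
                       failed_dst: str) -> Optional[str]:
    """Return the address of the node that follows failed_dst on the
    PATH_LIST identified by path_gid, or None if no repair is possible.
    """
    path_list = state.get(path_gid)
    if path_list is None or failed_dst not in path_list:
        return None  # PLR holds no state for this path or node
    position = path_list.index(failed_dst)
    if position + 1 >= len(path_list):
        return None  # failed node is the last entry; nothing to skip to
    return path_list[position + 1]


# Hypothetical PATH_LIST with three segment end nodes.
state = {"192.0.2.1": ["198.51.100.1", "198.51.100.2", "198.51.100.3"]}

# The neighbor 198.51.100.2 failed: swap the outer destination address
# to the subsequent node on the PATH_LIST.
new_dst = repair_destination(state, "192.0.2.1", "198.51.100.2")
print(new_dst)
```

   Only the outer destination address changes in this sketch; the
   PATH_GID carried in the source address is left untouched, so nodes
   downstream of the repair still match the same path.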
784    Enabling local protection in segment engineered IP networks is
785    clearly possible; however, it requires additional processing and
786    control plane information to be distributed to all nodes in the
787    domain.  Protection PATH_LISTs can be computed either centrally or by
788    any node in the domain (including PLRs).  The authors recommend that
789    this remain a local operator decision and encourage the use of an end
790    to end path protection scheme as the first preference.

792 9.  Solution advantages

794    The following key advantages characterize the described
795    architecture:

797    o  Native TE support for IPv4 and IPv6

799    o  Very efficient use of available address space - no requirement for
800       any new address allocations

802    o  IGP impact - single prefix injection from ingress nodes, of a
803       length chosen by the operator

805    o  Ability to aggregate injected prefixes at area or domain boundary
806       with no impact on functionality

808    o  No extensions to ISIS or OSPF routing protocols required

810    o  Reuse of commonly available components (SRC-DST routing and IPinIP
811       encapsulation)

813    o  Integrated end to end path validation for reachability and quality

815    o  For basic TE and PATH_LIST SID integrated network programming
816       functions, a fixed overhead of 28 octets for IPv4 and 40 octets
817       for IPv6
819    o  Full compatibility with SRH from SRv6 Network Programming concept

821    o  No per user data flow state in any network element of the network
822       except ingress (mapping only)

824    o  No packet header size growth with the growing number of TE segment
825       endpoints policies

827    o  Support in all available hardware - no need for any new operations
828       on the packet headers

830    o  TI-LFA support when end to end path protection is not
831       sufficient

833    o  Full native support of network services: L2VPNs, L3VPNs, EVPNs etc.
834       with a single SID in SRH or native service level encapsulation

836    o  Support of ingress, egress or transit nodes with only a
837       single host address available on each such system

839 10.  OAM

841    As a result of the use of IP encapsulation, both traceroute and ping
842    are natively supported within a given domain's boundaries.  ICMP or
843    UDP OAM probes will be encapsulated in the exact same IPv4 or IPv6
844    header as user data packets; therefore all replies will be sent to
845    the domain ingress node.

847    Neither modifications to additional extension headers nor even their
848    presence is required for correct OAM operation.

850    If an OAM packet is originated externally to the domain, the ingress
851    node will need to act as an OAM proxy, relaying the responses to the
852    original source.

854 11.  Deployment considerations

856    The solution is defined to be fully customizable by the operator.
857    The path engineering as well as the choice of numbering will likely
858    differ from domain to domain.

860    All packets subject to this specification carry an immutable
861    PATH_GID in their source address.  Together with locally assigned
862    SIDs, no further extensions are necessary to identify specific path
863    flows at any point in the domain.  The same tuple of PATH_GID + SIDs
864    can also be used to identify any path statistics (netflow records) at
865    any point in the domain.

867 12.
Security considerations

869    The described architecture reuses standard components defined in
870    other IETF WGs.  It does not define any new protocol or data plane
871    extensions.  All security related work applicable to each component
872    used is also recommended to be applied to the IP TE+NP architecture.

874 13.  IANA Considerations

876    No IANA allocations are required by this specification.

878 14.  Acknowledgements

880    The authors would like to thank Tony Li, Stefano Previdi, Dirk
881    Steinberg, Francois Clad, Joel Halpern and Linda Dunbar for their
882    valuable review and comments.

884 15.  References

886 15.1.  Normative References

888    [RFC1918]  Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
889               and E. Lear, "Address Allocation for Private Internets",
890               BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996.

893    [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
894               DOI 10.17487/RFC2003, October 1996.

897    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
898               Requirement Levels", BCP 14, RFC 2119,
899               DOI 10.17487/RFC2119, March 1997.

902    [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
903               IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
904               December 1998.

906    [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
907               Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
908               DOI 10.17487/RFC2784, March 2000.

911    [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
912               Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
913               2006.

915    [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
916               Translation", RFC 6296, DOI 10.17487/RFC6296, June 2011.

919    [RFC7676]  Pignataro, C., Bonica, R., and S. Krishnan, "IPv6 Support
920               for Generic Routing Encapsulation (GRE)", RFC 7676,
921               DOI 10.17487/RFC7676, October 2015.

924    [RFC8086]  Yong, L., Ed., Crabbe, E., Xu, X., and T.
                  Herbert,
925               "GRE-in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
926               March 2017.

928    [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
929               Writing an IANA Considerations Section in RFCs", BCP 26,
930               RFC 8126, DOI 10.17487/RFC8126, June 2017.

933    [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
934               2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
935               May 2017.

937    [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
938               (IPv6) Specification", STD 86, RFC 8200,
939               DOI 10.17487/RFC8200, July 2017.

942 15.2.  Informative References

944    [I-D.herbert-ipv4-eh]
945               Herbert, T., "IPv4 Extension Headers and Flow Label",
946               draft-herbert-ipv4-eh-01 (work in progress), May 2019.

948    [I-D.herbert-ipv4-udpencap-eh]
949               Herbert, T., "IPv4 Extension Headers and UDP Encapsulated
950               Extension Headers", draft-herbert-ipv4-udpencap-eh-01
951               (work in progress), March 2019.

953    [I-D.ietf-6man-segment-routing-header]
954               Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
955               Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header
956               (SRH)", draft-ietf-6man-segment-routing-header-23 (work in
957               progress), September 2019.

959    [I-D.ietf-idr-segment-routing-te-policy]
960               Previdi, S., Filsfils, C., Mattes, P., Rosen, E., Jain,
961               D., and S. Lin, "Advertising Segment Routing Policies in
962               BGP", draft-ietf-idr-segment-routing-te-policy-07 (work in
963               progress), July 2019.

965    [I-D.ietf-pce-segment-routing]
966               Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
967               and J. Hardwick, "PCEP Extensions for Segment Routing",
968               draft-ietf-pce-segment-routing-16 (work in progress),
969               March 2019.

971    [I-D.ietf-rtgwg-dst-src-routing]
972               Lamparter, D. and A. Smirnov, "Destination/Source
973               Routing", draft-ietf-rtgwg-dst-src-routing-07 (work in
974               progress), March 2019.
976    [I-D.ietf-rtgwg-segment-routing-ti-lfa]
977               Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B.,
978               Francois, P., Voyer, D., Clad, F., and P. Camarillo,
979               "Topology Independent Fast Reroute using Segment Routing",
980               draft-ietf-rtgwg-segment-routing-ti-lfa-01 (work in
981               progress), March 2019.

983    [I-D.ietf-spring-srv6-network-programming]
984               Filsfils, C., Camarillo, P., Leddy, J., Voyer, D.,
985               Matsushima, S., and Z. Li, "SRv6 Network Programming",
986               draft-ietf-spring-srv6-network-programming-03 (work in
987               progress), September 2019.

989    [I-D.patel-raszuk-bgp-vector-routing]
990               Raszuk, R., Patel, K., Pithawala, B., Sajassi, A.,
991               Osborne, E., Jalil, L., and J. Uttaro, "BGP vector
992               routing.", draft-patel-raszuk-bgp-vector-routing-07 (work
993               in progress), May 2016.

995    [I-D.xu-intarea-ip-in-udp]
996               Xu, X., Assarpour, H., Ma, S., Bernier, D., Dukes, D.,
997               Lee, Y., and F. Yongbing, "Encapsulating IP in UDP",
998               draft-xu-intarea-ip-in-udp-07 (work in progress),
999               May 2018.

1001    [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
1002               Element (PCE)-Based Architecture", RFC 4655,
1003               DOI 10.17487/RFC4655, August 2006.

1006    [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
1007               Zekauskas, "A One-way Active Measurement Protocol
1008               (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

1011    [RFC5883]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
1012               (BFD) for Multihop Paths", RFC 5883, DOI 10.17487/RFC5883,
1013               June 2010.

1015    [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
1016               Decraene, B., Litkowski, S., and R. Shakir, "Segment
1017               Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
1018               July 2018.

1020 Author's Address

1022    Robert Raszuk (editor)
1023    Bloomberg LP
1024    731 Lexington Ave
1025    New York City, NY 10022
1026    USA

1028    Email: robert@raszuk.net