idnits 2.17.1 

draft-raszuk-teas-ip-te-np-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 2, 2019) is 1666 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC2119' is defined on line 897, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2784' is defined on line 906, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC8126' is defined on line 928, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC8174' is defined on line 933, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-03) exists of
     draft-herbert-ipv4-eh-01

  == Outdated reference: A later version (-26) exists of
     draft-ietf-6man-segment-routing-header-23

  == Outdated reference: A later version (-26) exists of
     draft-ietf-idr-segment-routing-te-policy-07

  == Outdated reference: A later version (-13) exists of
     draft-ietf-rtgwg-segment-routing-ti-lfa-01

  == Outdated reference: A later version (-28) exists of
     draft-ietf-spring-srv6-network-programming-03

  == Outdated reference: A later version (-12) exists of
     draft-xu-intarea-ip-in-udp-07


     Summary: 0 errors (**), 0 flaws (~~), 11 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	TEAS Working Group                                        R. Raszuk, Ed.
3	Internet-Draft                                              Bloomberg LP
4	Intended status: Informational                           October 2, 2019
5	Expires: April 4, 2020

7	      IP Traffic Engineering Architecture with Network Programming
8	                     draft-raszuk-teas-ip-te-np-00

10	Abstract

12	   This document describes a control plane based IP Traffic Engineering
13	   Architecture where path information is kept in the control plane by
14	   selected nodes instead of being inserted into each packet on ingress
15	   of an administrative domain.  The described proposal is also fully
16	   compatible with the concept of network programming.

18	   It is positioned as a complementary technique to native SRv6 and can
19	   be used when there are concerns about increased packet size due to
20	   the depth of the SID stack, about exceeding the MTU, or when
21	   stricter simplicity requirements apply, as is typical of a number of
22	   enterprise networks.  The proposed solution is applicable to both
23	   IPv4 and IPv6 based networks.

25	   In addition, detection of end-to-end path liveness and dynamic path
26	   selection based on real-time path quality are integrated into the
27	   design from day one.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at https://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on April 4, 2020.

46	Copyright Notice

48	   Copyright (c) 2019 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (https://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Background
64	   2.  Terminology
65	   3.  Introduction
66	   4.  Functional Description
67	   5.  Control plane
68	   6.  Data plane
69	   7.  Network Programming
70	   8.  Active Path Probing
71	     8.1.  TI-LFA Local Protection
72	   9.  Solution advantages
73	   10. OAM
74	   11. Deployment considerations
75	   12. Security considerations
76	   13. IANA Considerations
77	   14. Acknowledgements
78	   15. References
79	     15.1.  Normative References
80	     15.2.  Informative References
81	   Author's Address

83	1.  Background

85	   The ability to steer data over selected topological points, often
86	   different from the default IGP or BGP paths, provides substantial
87	   advantages to consumers of such data.  The construction of controlled
88	   transit paths is usually driven by requirements to offload heavily
89	   used default routing paths, to construct disjoint paths for live-live
90	   dual streaming, or to create intra-domain or inter-domain data
91	   distribution overlays using dynamic real-time SLA criteria, often
92	   combined with a per-application mapping scheme.

94	   In addition to purely topological reasons there are often also
95	   requirements for special data flow processing to happen in selected
96	   network elements which by default would not be in the data path of
97	   the subject flows.  Examples of this include firewall traffic
98	   screening, service function chaining, caching and deep packet
99	   inspection.

101	   While some solutions allow traffic engineering in domains fully
102	   operated by a single administrative entity, there seems to be a lack
103	   of proposals which could be used to control interconnections of sites
104	   over third party networks or the Internet.  That category also
105	   includes public cloud tenancies, where the ability to steer inbound
106	   and outbound traffic over paths other than default Internet routing
107	   could provide much better SLA characteristics or address some
108	   requirements that are not purely technical.

110	   Another category of global networking which can significantly benefit
111	   from a standards based IP TE solution is a unified model of path
112	   engineering for Software Defined Wide Area Networks (SDWANs).  One of
113	   the basic operational principles in selected SDWANs is point-to-point
114	   underlay selection based on the applied SLA characteristics.  Adding
115	   the ability to traffic engineer such underlay flows makes it possible
116	   to bypass underperforming default underlay paths or congestion points
117	   occurring even a few autonomous systems away.

119	2.  Terminology

121	   The following abbreviations are used within this document:

123	   o  TE - Traffic Engineering

125	   o  AF - Address Family

127	   o  IPv4 - Internet Protocol version 4

129	   o  IPv6 - Internet Protocol version 6

131	   o  IGP - Interior Gateway Protocol

133	   o  EH - Extension Header

135	   o  RIR - Regional Internet Registry

137	   o  PCE - Path Computation Element

139	   o  UDP - User Datagram Protocol

141	   o  BGP - Border Gateway Protocol

142	   o  SRH - Segment Routing Header

144	   o  OWAMP - A One-way Active Measurement Protocol

146	   o  DOH - Destination Options Header

148	   o  PE - Provider Edge

150	   o  SE - Segment Endpoint

152	   o  SID - Segment Identifier (PREFIX+FUNCTION+4 bits)

154	   o  NMS - Network Management System

156	   o  CoS - Class of Service

160	   o  PCEP - Path Computation Element Communication Protocol

162	   o  SR-MPLS - Segment Routing with MPLS data plane

164	   o  SRv6 - Segment Routing with IPv6 data plane

166	   o  RTT - Round Trip Time

168	   o  MTU - Maximum Transmission Unit

170	   o  MOS - Mean Opinion Score

172	   o  OAM - Operations, Administration, and Maintenance

174	   o  MPLS - Multiprotocol Label Switching

176	   o  GID - Group Identifier

178	3.  Introduction

180	   The architecture described in this specification defines a new
181	   forwarding paradigm which allows traffic engineered paths to be
182	   created either centrally or in a distributed way.  With the help of
183	   local provisioning tools or the control plane, such ordered sets of
184	   paths are distributed to those network elements which will
185	   participate in data forwarding.  In addition to basic packet
186	   forwarding, the architecture provides a mechanism to execute
187	   arbitrary instructions at operator-selected network nodes, which can
188	   include routers, switches, firewalls, service processors and hosts.

190	   The authors have taken a clean slate approach to the possible options
191	   for engineering traffic within given administrative domain
192	   boundaries.  The solution is applicable both to traditional
193	   "underlay" networks and to administrative domains constructed with
194	   "overlays".  It is also fully transparent to network elements which
195	   do not participate in the traffic engineering solution, while
196	   preserving packet entropy and meeting fast connectivity restoration
197	   needs.

199	   The proposed solution is constructed using either building blocks or
200	   ideas borrowed from the following technologies:

202	   o  Segment Routing Architecture [RFC8402]

204	   o  Destination/Source Routing [I-D.ietf-rtgwg-dst-src-routing]

206	   o  Generic Packet Tunneling in IPv6 Specification [RFC2473]

208	   o  IP Encapsulation within IP [RFC2003]

210	   o  Encapsulating IP in UDP [I-D.xu-intarea-ip-in-udp]

212	   o  Advertising Segment Routing Policies in BGP
213	      [I-D.ietf-idr-segment-routing-te-policy]

215	   o  BGP Vector Routing [I-D.patel-raszuk-bgp-vector-routing]

217	   o  A Path Computation Element (PCE) Based Architecture [RFC4655]

219	   o  PCEP Extensions for Segment Routing [I-D.ietf-pce-segment-routing]

221	   o  Topology Independent Fast Reroute using Segment Routing
222	      [I-D.ietf-rtgwg-segment-routing-ti-lfa]

224	   o  A One-way Active Measurement Protocol (OWAMP) [RFC4656]

226	   It is also fully compatible with the following specifications, which
227	   define the network programming concept that it can embed, while at
228	   the same time providing a new alternate encoding model:

230	   o  Internet Protocol, Version 6 (IPv6) Specification [RFC8200]

232	   o  IPv6 Segment Routing Header (SRH)
233	      [I-D.ietf-6man-segment-routing-header]

235	   o  IPv4 Extension Headers and Flow Label [I-D.herbert-ipv4-eh]

236	   o  IPv4 Extension Headers and UDP Encapsulated Extension Headers
237	      [I-D.herbert-ipv4-udpencap-eh]

239	   For intradomain Traffic Engineering the introduced overhead is of
240	   fixed size: regardless of the number of segment endpoints or links
241	   traversed as part of the engineered path, it is constant and equal to
242	   28 octets for IPv4 and 40 octets for IPv6.  If additional segment end
243	   or path end instructions are carried in additional headers, the size
244	   of those extension headers needs to be added.  Instructions can,
245	   however, also be embedded into the destination SID or reside above
246	   the encapsulation header.  In those cases the total length of the
247	   overhead remains fixed as stated above.

249	   Interdomain Traffic Engineering, depending on the deployment model,
250	   could add a further fixed 12 octets of overhead.  Overlay deployment
251	   models are discussed in more detail in the Data Plane section
252	   below.
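
   The fixed overhead figures quoted above can be checked with simple
   arithmetic.  The sketch below is illustrative only: the function
   names are hypothetical, and the SRv6 comparison point assumes an
   outer 40-octet IPv6 header plus an SRH made of an 8-octet fixed part
   and one 16-octet entry per segment.

```python
IPV4_HEADER = 20        # octets, IPv4 header without options
UDP_HEADER = 8          # octets
IPV6_HEADER = 40        # octets, fixed IPv6 header
INTERDOMAIN_EXTRA = 12  # octets, the additional interdomain figure quoted in the text


def ip_te_np_overhead(ipv6: bool, interdomain: bool = False) -> int:
    """Per-packet encapsulation overhead; it does not depend on path length."""
    base = IPV6_HEADER if ipv6 else IPV4_HEADER + UDP_HEADER
    return base + (INTERDOMAIN_EXTRA if interdomain else 0)


def srv6_encap_overhead(num_segments: int) -> int:
    """Assumed comparison point: outer IPv6 header + SRH (8 + 16 * N octets)."""
    return IPV6_HEADER + 8 + 16 * num_segments


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, ip_te_np_overhead(ipv6=True), srv6_encap_overhead(n))
```

   The first function returns the same value for any path length, while
   the assumed SRH-based figure grows by 16 octets per segment.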

254	   While the described architecture is applicable to both IPv4 and IPv6
255	   networks the proposal could be split into separate documents each
256	   focusing on specifics corresponding only to a single address family
257	   if the community expresses such preference.  However, due to the
258	   number of common AF agnostic characteristics it is advised to keep it
259	   within a single document.

261	   Since the support of EH in IPv4 is planned to be introduced with a
262	   rather limited scope, the end segment or end path instructions could
263	   end up using other extension header types (for example: Destination
264	   Options) in IPv4 packets or could be encoded into the destination
265	   address itself.  It has to be noted that IPv4 packets could be
266	   encapsulated in IPv6 when carried across a given domain.  The
267	   document describes how the concept of network programming can be
268	   applied without use of extension headers.

270	   The proposal does not enforce any new dependencies on IP address
271	   block allocations and is in full alignment to the current IETF and
272	   RIRs address structure and allocation policies.

274	   The core of the defined functionality does not require any new
275	   protocol extensions.  The solution attempts to maximize reuse of
276	   extensions already defined.  If more optimal protocol solutions
277	   applicable to any of the defined functional blocks surface,
278	   additional work will take place in the corresponding area or WG.

280	   The described architecture does not belong to the segment routing
281	   family, even if some terminology used to describe it has been
282	   borrowed from there.  The major difference is that by design it uses
283	   the control or management plane to install per-path state in the
284	   transit nodes participating in the engineering of data paths, instead
285	   of encoding the set of TE midpoints into each packet on ingress.

287	   While the scaling aspects of any solution are very important, they
288	   need to be put in perspective against the operational requirements
289	   and the characteristics of the designs.  It also needs to be noted
290	   that even basic IP routing is based on state in the network elements,
291	   and the scale of Internet routing is usually orders of magnitude
292	   higher than the state needed for most traffic engineering.  When
293	   looking at the scaling of the complete solution, variable-size
294	   per-packet overhead needs to be weighed against the cost of
295	   additional fixed-size per-path state in the control and data plane.

297	   While the IP TE+NP design allows an operator to centrally compute and
298	   distribute strict end-to-end paths, in a number of deployments it can
299	   also be used in a fully distributed mode.  Traffic steering decisions
300	   can take place autonomously at any TE midpoint, which is particularly
301	   useful in SLA or performance based routing deployments.

303	   If a comparison is to be made between the SR and IP TE+NP
304	   architectures, then, putting aside other fundamental differences, it
305	   would be the assumption of constructing segment routing paths only
306	   from Binding SIDs (divided into static and variable parts) and
307	   encoding them at each segment endpoint only in the least significant
308	   bits of the source and destination address of the outer IP header.

310	4.  Functional Description

312	   For the purpose of this document the following term definitions will
313	   be used in capital letter notation:

315	   o  CLASSIFIER_ID: Identifier to set of rules used for mapping flows
316	      to TE paths.  Length - 4 octets.

318	   o  PATH_GID_PFX: routable node prefix + locally significant PATH_GID
319	      value.  Length - 4 or 16 octets.

321	   o  SID: routable node prefix + opt. function + opt. parameters + 4
322	      bits (Lookup Type) - Length - 4 or 16 octets.

324	   o  PATH_LIST: ordered list of SIDs.  Length N x 4 or N x 16 octets.
325	      N min = 1.

327	                  v---------- IP TE+NP DOMAIN -----------v

329	                        +---------------SE1--------------+
330	                        |                                |
331	                        |                                |
332	     SRC_NET-----PE1----P1----SE2----P2----P3----SE3----PE2----DST_NET
333	                  |                  ||           |
334	                  |                  ||           |
335	                  +------- SE4 ------++----SE5----+

337	                          Basic Network Topology

339	                                 Figure 1

341	   Consider two basic requirements to be applied to two classes of
342	   transit traffic, T1 and T2:

344	   o  T1: PATH_A1: PE1--SE1--PE2

346	   o  T2: PATH_A2: PE1--SE4--SE5--PE2

348	   TE midpoints can be placed in any arbitrary network location as long
349	   as IPv4 or IPv6 reachability to such location exists.  They can be
350	   part of someone's IGP domain or can be placed anywhere in the
351	   Internet.  In the above figure P nodes can represent non TE aware
352	   routers in someone's IGP or they can be taken as third party ISPs.

354	   For clarity, let's assume the example discusses a deployment within a
355	   single administrative domain.  The IGP metric of all interfaces is
356	   set to 10, except for interfaces attached to the SE1, SE4 and SE5
357	   nodes, which have a metric of 100.

359	   The shortest default path, in the example above, between PEs is:
360	   PE1--P1--SE2--P2--P3--SE3--PE2
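
   The default path can be reproduced with a plain shortest-path
   computation.  The sketch below is illustrative only; the link list
   is one reading of the ASCII drawing in Figure 1 (SE1 attached to P1
   and PE2, SE4 to PE1 and P2, SE5 to P2 and SE3) and should be treated
   as an assumption, as should all function names.

```python
import heapq

# Adjacency read off Figure 1 (an interpretation of the drawing).
LINKS = [
    ("PE1", "P1"), ("P1", "SE2"), ("SE2", "P2"), ("P2", "P3"),
    ("P3", "SE3"), ("SE3", "PE2"),
    ("P1", "SE1"), ("SE1", "PE2"),
    ("PE1", "SE4"), ("SE4", "P2"),
    ("P2", "SE5"), ("SE5", "SE3"),
]
EXPENSIVE = {"SE1", "SE4", "SE5"}  # interfaces on these nodes use metric 100


def build_graph(links):
    """Symmetric metrics: 100 if the link touches an expensive node, else 10."""
    graph = {}
    for a, b in links:
        cost = 100 if EXPENSIVE & {a, b} else 10
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (total_cost, [nodes]) or None if unreachable."""
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nbr, metric in graph[node].items():
            if nbr not in done:
                heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))
    return None


GRAPH = build_graph(LINKS)

if __name__ == "__main__":
    print(shortest_path(GRAPH, "PE1", "PE2"))
```

   Under these assumptions every path through SE1, SE4 or SE5 is more
   expensive than the default one, which is why explicit path lists are
   needed to steer T1 and T2.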

362	   In order to accomplish the stated requirements (for traffic classes
363	   T1 and T2 defined above) the following ordered path lists are created
364	   in the control plane and either locally configured on both ingress
365	   and segment endpoints or distributed by any of the control plane
366	   protocols discussed in subsequent sections:

368	       CLASSIFIER_ID: T1                  CLASSIFIER_ID: T2
369	       PATH_GID:   A1                     PATH_GID:   A2
370	       PATH_LIST:  SE1, PE2               PATH_LIST:  SE4, SE5, PE2
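
   The two path lists above can be held in a small record type.  This
   is a minimal sketch, not a normative data model: the class and
   method names are hypothetical, and node names stand in for the SID
   and CLASSIFIER_ID values that a real deployment would use.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PathList:
    classifier_id: str     # a label here; 4 octets when encoded
    path_gid: str          # locally significant at the ingress PE
    sids: Tuple[str, ...]  # ordered list of SIDs, at least one

    def __post_init__(self):
        if not self.sids:
            raise ValueError("PATH_LIST needs at least one SID (N min = 1)")

    def holders(self, ingress: str) -> Tuple[str, ...]:
        """Nodes that need this path list: the ingress and each segment endpoint."""
        return (ingress,) + self.sids


T1 = PathList("T1", "A1", ("SE1", "PE2"))
T2 = PathList("T2", "A2", ("SE4", "SE5", "PE2"))

if __name__ == "__main__":
    print(sorted(set(T1.holders("PE1")) | set(T2.holders("PE1"))))
```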

372	   There are a few core elements of the design, as listed below:

374	   o  Each PATH_GID_PFX contains a unique routable IP prefix from one of
375	      the loopbacks of the corresponding ingress PE followed by the
376	      PATH_GID value (PATH GROUP-ID).  For example, if the loopback's
377	      prefix is a /64 IPv6 prefix there can be 2^64 unique paths
378	      originating at a given PE.  If the loopback prefix is a /16 IPv4
379	      prefix (for example from [RFC1918] space) there can be 2^16
380	      paths originating at a given IPv4 PE.  The choice of mapping scheme
381	      is local to the ingress PE and is assigned by the operator.  Note
382	      that in most cases only a single IGP loopback prefix needs to be
383	      advertised from any ingress PE to describe reachability to the
384	      PATH_GID_PFX.  It is also highly recommended that the loopback
385	      prefixes configured on all ingress nodes (ingress PEs) be sourced
386	      from the same address block, so that they can be described by a
387	      single aggregate prefix.

389	   o  Each PATH_LIST consists of a number of SID elements.  Each SID is
390	      a unique routable IP address from one of the loopbacks of the
391	      corresponding Segment Endpoint (SE) node.  For example, if the
392	      loopback's prefix is a /64 IPv6 prefix there can be 2^(64-4)
393	      unique SIDs terminating on a given node.  If the loopback prefix
394	      is a /16 IPv4 prefix (for example from [RFC1918] space) there
395	      can be 2^(16-4) SIDs present on a given IPv4 node.  As defined, a
396	      SID may represent not only a node's topological location in the
397	      network (via IP prefix reachability), but it may also, optionally,
398	      contain embedded functions with their parameters.  To further
399	      help the forwarding layer within a given domain, the last four
400	      bits can be consistently chosen to describe the lookup type
401	      required to correctly switch a given packet.

403	   o  Upon ingress to the domain, and after classification, packets are
404	      encapsulated into an additional outer IP header with the following
405	      elements corresponding to the non-default forwarding requirements:

407	      Classified as T1 flows:             Classified as T2 flows:
408	      -----------------------             -----------------------
409	     Source address: PATH_A1_PFX         Source address: PATH_A2_PFX
410	     Destination address: SID_SE1        Destination address: SID_SE4

412	   In the case of IPv6, the encapsulation for the basic TE-only
413	   requirement consists of a fixed 40-octet IPv6 header containing the
414	   source and destination addresses described above, a copy of the
415	   original flow label, the copied and decremented hop limit and,
416	   depending on local policy, the CoS setting (a copy of the original
417	   or a locally set value).  In the case of IPv4, the 20-octet IP header
418	   contains the TTL copied and decremented from the original packet and
419	   the CoS (a copy of the original or a locally set value), and is
420	   followed by an 8-octet UDP header, which improves the entropy of
421	   flows bundled onto a given TE path so that they can still utilize
422	   any ECMP along the path list.

424	   o  Encapsulated packets are natively forwarded via the network (by
425	      and through P nodes) until they arrive at the destination Segment
426	      Endpoint, where the destination address gets swapped to the new
427	      destination address from the PATH_LIST kept in the local control
428	      and data plane.  The lookup which returns the new destination of
429	      the packet is a source-destination based lookup using both
430	      PATH_GID_PFX (with PATH_GID being encoded in the least significant
431	      bits of the source address of the packet) and SID (encoded in the
432	      destination address of the packet).  This keeps the solution
433	      scalable, without an explosion of SID state or SID numbers.  All
434	      function descriptions which are encoded in the SIDs can be reused
435	      across any segment endpoint, if required, as they have only local
436	      significance.

438	   o  When packets arrive at the destination PE (last Segment Node) a
439	      similar lookup is performed which returns NULL as the next
440	      segment, which in turn results in decapsulation of the packet and
441	      a regular destination based lookup of the destination address
442	      present in the inner IP header.  As noted, a local optimization
443	      allows the local lookup type to be encoded in the last 4 bits of
444	      any SID, so that the first lookup can be skipped if such
445	      optimization is enabled by the operator.

447	   o  The described lookup table is instantiated and maintained by
448	      either the control plane or by the local configuration of sets of
449	      path lists.  For any given segment end node, only local SIDs
450	      (those where most significant prefix bits match locally configured
451	      prefixes) are populated to data plane along with PATH_GIDs they
452	      are attached to.  That setup is all that is required to provide
453	      basic IP TE service.  Other SID values are described further in
454	      the Network Programming section below.
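
   The forwarding behaviour described in the bullets above can be
   sketched as a chain of per-node table lookups.  This is an
   illustration under simplifying assumptions, not the author's
   implementation: node names stand in for SID addresses, PATH_GID is
   passed as a plain label instead of being extracted from the source
   address, and all function names are hypothetical.

```python
# PATH_GID -> ordered list of SIDs, taken from the T1/T2 example.
PATH_LISTS = {"A1": ["SE1", "PE2"], "A2": ["SE4", "SE5", "PE2"]}


def build_tables(path_lists):
    """Per-node tables: (PATH_GID, local SID) -> next SID, or None at the path end.

    Each node only receives entries for its own (local) SIDs.
    """
    tables = {}
    for gid, sids in path_lists.items():
        for i, sid in enumerate(sids):
            nxt = sids[i + 1] if i + 1 < len(sids) else None
            tables.setdefault(sid, {})[(gid, sid)] = nxt
    return tables


def forward(tables, path_gid, first_sid):
    """Follow the outer destination address from one segment endpoint to the next."""
    visited, dst = [], first_sid
    while dst is not None:
        visited.append(dst)
        dst = tables[dst][(path_gid, dst)]  # source+destination lookup
    return visited  # the last entry is where decapsulation happens


TABLES = build_tables(PATH_LISTS)

if __name__ == "__main__":
    print(forward(TABLES, "A1", "SE1"))
    print(forward(TABLES, "A2", "SE4"))
```

   Note that PE2 holds one entry per PATH_GID for the same local SID,
   which is what allows SIDs to be reused across paths without extra
   SID state.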

456	5.  Control plane

458	   The proposed solution is based on classic IP reachability and does
459	   not require any new control plane extension.  In its basic form, in
460	   order to set up a few TE paths across the sample network in
461	   Figure 1, all that is required is to apply two path lists on the
462	   ingress and egress nodes as well as on three segment endpoints.

464	   However, depending on the required TE scale, on the network size, as
465	   well as on the TE path complexity, real production deployments will
466	   likely utilize automation in order to provision such configurations.
467	   A local NMS can be used successfully to provision all
468	   participating segment nodes with the proper set of path lists.  A
469	   separate document describing YANG models for the solution will be
470	   provided.

472	   An alternative way to propagate the set of path lists is to use the
473	   segment routing extensions for PCEP as described in
474	   [I-D.ietf-pce-segment-routing].  For the basic TE use cases the path
475	   lists used are identical to SID lists for SR-MPLS or SRv6
476	   technologies.  The logic used by a PCE to compute such paths within
477	   a given domain can be directly leveraged by this architecture.  The
478	   defined SR-ERO sub-object can be directly used to propagate path
479	   lists not only to ingress and egress nodes, but also to all segment
480	   endpoints participating in the transit of a given path list.

482	   The methods described above offer a manual or automated way to
483	   distribute path lists from central locations using directed TCP
484	   sessions to all participating network elements.  However, in order to
485	   further reduce the complexity and increase the rate of path list
486	   propagation across any domain, a point-to-multipoint solution could
487	   be utilized.  Here too, as in the former cases, existing extensions
488	   are available, specifically the BGP extension for advertising
489	   Segment Routing Policies as described in
490	   [I-D.ietf-idr-segment-routing-te-policy].  Detailed encoding examples
491	   will be provided in subsequent versions of this document.

493	   BGP constructs used for SR Policy propagation to ingress nodes can
494	   be used as is in order to propagate analogous path lists to all
495	   participating nodes in the network.  A new SAFI has been defined
496	   (codepoint 73) to separate such propagation from any other address
497	   family as well as to uniquely define the NLRI format.  For the
498	   purpose of disseminating path lists, the 4-octet NLRI Policy Color
499	   will carry CLASSIFIER_ID and the 4 or 16 octet Endpoint field will
500	   carry the PATH_GID value.  If PATH_GID is shorter than 4 or 16 octets
501	   the most significant bits of the Endpoint field will be set to zero.
502	   The ordered list of SIDs will be propagated using Segment List
503	   Sub-TLVs (Type 3 for IPv4 and Type 9 for IPv6).  Optionally, other
504	   Sub-TLVs can also be included with the propagation of path lists, for
505	   example the Preference, Priority and Name Sub-TLVs.
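
   The field mapping described above, including the zero fill of the
   most significant bits, can be sketched as follows.  This is only an
   in-memory illustration with hypothetical function names; it does not
   build a complete NLRI and says nothing about the other fields of the
   SR Policy encoding.

```python
def endpoint_field(path_gid: int, ipv6: bool) -> bytes:
    """Right-align PATH_GID in a 4- or 16-octet field; upper bits become zero.

    int.to_bytes raises OverflowError if the value does not fit.
    """
    width = 16 if ipv6 else 4
    return path_gid.to_bytes(width, "big")


def policy_fields(classifier_id: int, path_gid: int, ipv6: bool) -> dict:
    """Hypothetical view of the two NLRI fields named in the text."""
    return {
        "color": classifier_id.to_bytes(4, "big"),   # Policy Color <- CLASSIFIER_ID
        "endpoint": endpoint_field(path_gid, ipv6),  # Endpoint <- PATH_GID
    }


if __name__ == "__main__":
    print(policy_fields(classifier_id=1, path_gid=0xA1, ipv6=False))
```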

507	   As intra-domain BGP usually employs route reflection, it is likely
508	   that participating nodes will receive many more path lists than
509	   they need to keep or install into the data plane.  There are two
510	   optional solutions to reduce the amount of unnecessary control plane
511	   information kept by any participating node, both of which result in
512	   inbound filtering of path lists when applied on ingress: use of
513	   route target extended communities, or filtering based on the
514	   intersection of locally configured IP prefixes with either the prefix
515	   part of the Endpoint NLRI or the prefix part of any SID carried in
516	   Segment List Sub-TLVs.  Even if all received path lists were accepted
517	   by BGP for operational and troubleshooting needs, only those which
518	   are locally significant will be installed into the data plane.
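
   The second filtering option, based on prefix intersection, can be
   sketched with the Python standard library.  The function name and
   the addresses (taken from the IPv6 documentation range) are
   hypothetical, and the sketch assumes the Endpoint and SIDs are
   available as full addresses.

```python
import ipaddress


def locally_significant(local_prefixes, endpoint, sids):
    """True if the Endpoint or any SID falls inside a locally configured prefix."""
    nets = [ipaddress.ip_network(p) for p in local_prefixes]
    addrs = [ipaddress.ip_address(a) for a in [endpoint, *sids]]
    return any(a.version == n.version and a in n for a in addrs for n in nets)


if __name__ == "__main__":
    sids = ["2001:db8:4::10", "2001:db8:5::10"]
    print(locally_significant(["2001:db8:5::/64"], "2001:db8:1::a2", sids))
    print(locally_significant(["2001:db8:9::/64"], "2001:db8:1::a2", sids))
```

   A node would install into the data plane only those path lists for
   which this check succeeds, while BGP may still retain the rest.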

520	6.  Data plane

522	   There are three IP TE+NP deployment scenarios which may require
523	   different data plane encoding specific to the type of connectivity
524	   available for ingress, egress and TE transit nodes.  The following
525	   three categories are covered by this specification:

527	      Cat I - deployment within a service provider or enterprise where
528	      all participating nodes are interconnected via links operated by
529	      the same organization using an addressing scheme under the control
530	      of that organization

532	      Cat II - deployment where participating sites are interconnected
533	      over third party operated networks, and where the nodes
534	      participating in IP TE can allocate an address block large enough
535	      to be used for source addresses while still encoding the entire
536	      PATH_GID space, of the size chosen by the operator, in the least
537	      significant bits of the addresses of such nodes

539	      Cat III - deployment where participating nodes are interconnected
540	      over third party operated infrastructure, and where all that has
541	      been granted to such nodes are either host routes or prefixes with
542	      not enough bits left to encode PATH_GID
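
   The difference between Cat II and Cat III comes down to whether the
   bits left after the routable prefix can hold the PATH_GID space.  A
   minimal sketch of that check, and of placing PATH_GID in the least
   significant bits of the source address, is given below.  Function
   names and example prefixes are hypothetical.

```python
import ipaddress


def path_gid_fits(prefix: str, path_gid_bits: int) -> bool:
    """True if the bits left after the prefix can hold the whole PATH_GID space."""
    net = ipaddress.ip_network(prefix)
    return net.max_prefixlen - net.prefixlen >= path_gid_bits


def source_address(prefix: str, path_gid: int) -> str:
    """Place PATH_GID in the least significant bits of the node's prefix."""
    net = ipaddress.ip_network(prefix)
    free_bits = net.max_prefixlen - net.prefixlen
    if not 0 <= path_gid < (1 << free_bits):
        raise ValueError("PATH_GID does not fit in the bits after the prefix")
    return str(net.network_address + path_gid)


if __name__ == "__main__":
    print(path_gid_fits("2001:db8:1::/64", 32), source_address("2001:db8:1::/64", 0xA1))
    print(path_gid_fits("192.0.2.1/32", 16))
```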

544	   The below building blocks constitute the required minimum data plane
545	   functionality for this architecture:

547	      Source+Destination Routing [I-D.ietf-rtgwg-dst-src-routing]

549	      Choice of encapsulation:

551	      IPv4 in IPv4+UDP [I-D.xu-intarea-ip-in-udp]

553	      IPv6 or IPv4 in IPv6 [RFC2473]

555	   The selection of normal destination-only lookup or source+destination
556	   lookup is triggered by lookup of the destination address.  Network
557	   elements which do not participate in the IP TE+NP service will
558	   perform destination-only lookup and forward the packets.  Network
559	   elements which do participate in the new architecture will check the
560	   destination address: if that address matches the local prefix
561	   assigned to the IP TE+NP service, source+destination lookup will
562	   take place; otherwise standard destination-only lookup will be
563	   performed.
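
   The lookup selection rule above can be sketched as a small function.
   It is illustrative only: the name, the return strings and the
   example prefixes are hypothetical, and a node that does not
   participate is modelled simply as having no IP TE+NP prefixes
   configured.

```python
import ipaddress


def select_lookup(dst: str, te_prefixes) -> str:
    """Pick the lookup type from the packet's destination address."""
    addr = ipaddress.ip_address(dst)
    for prefix in te_prefixes:
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            return "source+destination"
    return "destination-only"


if __name__ == "__main__":
    local = ["2001:db8:5::/64"]
    print(select_lookup("2001:db8:5::10", local))  # local IP TE+NP prefix
    print(select_lookup("2001:db8:7::1", local))   # ordinary transit traffic
    print(select_lookup("2001:db8:5::10", []))     # non-participating node
```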

565	   For deployments falling into Cat III as classified above, the
566	   available address space does not allow the PATH_GID to be encoded as
567	   part of the source address.  Therefore in such scenarios it is
568	   recommended to use additional GRE encapsulation, where PATH_GID is
569	   encoded in the 4-octet key field.

571	   The GRE header encoding proposed above, applicable only to Cat III
572	   deployments, should in addition to the rules already defined also
573	   follow the GRE encoding described in the following specifications:

575	      IPv4 in IPv4+UDP+GRE [RFC8086]

577	      IPv4 or IPv6 in IPv6+GRE [RFC7676]

579	   In Cat III deployments, when source+destination lookup is performed,
580	   the PATH_GID from the GRE key field should be used instead of the
581	   packet's source address.  For the case of IPv6 packet encapsulation,
582	   12 octets of zeros should be locally prepended to the key to perform
583	   source+destination lookup.
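
   The Cat III key handling above can be sketched as follows.  The
   function name and the byte-string representation of the lookup key
   are assumptions made for illustration only:

```python
import struct


def lookup_source_key(gre_key: int, outer_is_ipv6: bool) -> bytes:
    """Build the source-side lookup key from the 4 octet GRE key field."""
    key = struct.pack("!I", gre_key)  # 4 octets, network byte order
    if outer_is_ipv6:
        # 12 octets of zeros are prepended locally, giving 16 octets.
        return bytes(12) + key
    return key
```

   For IPv6 encapsulation the result has the length of an IPv6
   address, so it can stand in for the packet's source address in the
   source+destination lookup.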

585	7.  Network Programming

587	   Control Plane Assisted Traffic Engineering is fully compatible with
588	   functions as described in [I-D.ietf-spring-srv6-network-programming]
589	   with one major difference.  Instead of always inserting SIDs in the
590	   form of an SRH on ingress and into each packet, there are a few
591	   alternative ways proposed by this specification.  One of them assumes
592	   that information about selected functions is added to the packet by
593	   the penultimate node of a given segment end node hop.  SIDs defined
594	   in this document consist of a routable prefix part and a locally
595	   significant function/instruction part with optional parameters and
596	   lookup type.  They can be 32 bits long in the case of IPv4 or 128
597	   bits long in the case of IPv6, with the length of the routable part
598	   being a local choice of the operator.

600	   PATH_GID+SID lookup can return a simple pointer to the next segment
601	   node or can also result in any other local packet processing chain.
602	   While the routable part of the SID has domain-wide significance the
603	   function part has only local meaning to a given node on which it has
604	   been instantiated.

606	   It needs to be observed that some network functions can, for
607	   practical purposes, only be instantiated at the ingress to the domain
608	   and as such can be attached to the packet during initial
609	   encapsulation by use of Segment Routing Header (SRH) or Destination
610	   Options Header (DOH).  The examples of such functions include L3VPN
611	   or EVPN or L2VPN demux labels which are to be used when packets
612	   arrive at the other side of the domain with or without TE.

614	   To further simplify the processing of packets via the segment end
615	   nodes and relax the requirement for each transit node to inspect the
616	   Extension Header (EH) (when added by the ingress node), this document
617	   recommends that each operator in the domain reserve the last 4 bits
618	   of the SID to explicitly indicate the required lookup type (aka
619	   switching vector) to occur on the outer packet header:

621	            +---------------+--------------------------------+
622	            | Decimal value |          Lookup Type           |
623	            +---------------+--------------------------------+
624	            |       0       |      SRC-DST lookup only       |
625	            |       1       | EH inspection + SRC-DST lookup |
626	            |       2       | Decapsulation + Global lookup  |
627	            |       3       | EH inspection + Decapsulation  |
628	            |       4       |            reserved            |
629	            |       ..      |               ..               |
630	            |       15      |            reserved            |
631	            +---------------+--------------------------------+

633	    Table 1: Recommended allocation of domain wide IPv6 SID_PFX actions
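
   A minimal sketch of decoding the lookup type from a SID, assuming
   the recommended placement in the last 4 bits.  The function and
   dictionary names are hypothetical; an operator choosing a different
   offset would need a different mask:

```python
import ipaddress

# Values 0..3 as listed in Table 1; 4..15 are reserved.
LOOKUP_TYPES = {
    0: "SRC-DST lookup only",
    1: "EH inspection + SRC-DST lookup",
    2: "Decapsulation + Global lookup",
    3: "EH inspection + Decapsulation",
}


def lookup_type(sid: str) -> str:
    """Decode the lookup type carried in the last 4 bits of a SID."""
    value = int(ipaddress.ip_address(sid)) & 0xF
    return LOOKUP_TYPES.get(value, "reserved")
```

   For instance, the example SID 2001:db8::a1 ends in the nibble 0x1
   and therefore maps to "EH inspection + SRC-DST lookup".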

635	   As this specification is only of informational category, the
636	   proposed recommendation is non-binding and can be locally replaced
637	   by any different schema as chosen by the operator and made possible
638	   by implementations.  For example, the 4 bits may be placed at any
639	   other offset after the SID's routable prefix part.  The proposed SID
640	   Lookup Types do not replace or interfere in any way with SRH SRv6
641	   Endpoint Behaviors as defined in
642	   [I-D.ietf-spring-srv6-network-programming].

644	   As defined today, [RFC8200] mandates inspecting and processing all
645	   extension headers in the IPv6 packet when the packet's destination
646	   matches any of the locally configured IPv6 addresses.  Therefore, if
647	   present, the SRH will need to be inspected and processed at each
648	   segment end even if it is known ahead of time by the control plane
649	   that it does not contain any instructions to be executed at a given
650	   network element.  The authors nevertheless encourage use of the
651	   recommended SID structure, either for troubleshooting or for a future
652	   in which the IPv6 specification relaxes the EH handling rules to
653	   accommodate such new deployment models.

655	   As an alternative solution to avoid unnecessary processing of
656	   extension headers by nodes which are not required to do so, an
657	   implementation can treat a SID with the last four bits set to zero
658	   as a non-local destination address.  In such a scenario,
659	   source+destination lookup will, instead of triggering local extension
660	   header processing, invoke the destination IPv6 NAT function as
661	   defined in [RFC6296].  The NAT rules, which will be pre-programmed
662	   using information contained in the PATH_LIST, will effectively result
663	   in a destination address swap.  Such NAT translation is to be
664	   unidirectional and can remain fully stateless.
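
   The stateless destination swap can be sketched as a plain table
   lookup.  The rule table layout, the addresses and the function name
   below are assumptions for illustration; how rules are derived from
   the PATH_LIST is not specified here:

```python
import ipaddress

# Hypothetical pre-programmed rules (assumption):
# (source address carrying PATH_GID, current destination SID) -> next SID.
NAT_RULES = {
    ("2001:db8:1::10", "2001:db8:a::10"): "2001:db8:b::10",
}


def swap_destination(src: str, dst: str) -> str:
    """Unidirectional rewrite of the destination; no per-flow state."""
    if int(ipaddress.ip_address(dst)) & 0xF != 0:
        # Only SIDs whose last four bits are zero are treated this way.
        return dst
    # Unknown (src, dst) pairs are left unchanged in this sketch.
    return NAT_RULES.get((src, dst), dst)
```

   Because the result depends only on the packet's own addresses and
   the pre-programmed table, no translation state is created per flow.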

666	   The described solution also applies directly to the case of IPv4 in
667	   IPv6 encapsulation.

669	   In the case of IPv4 in IPv4+UDP encapsulation the basic behaviour of
670	   embedding functions in SIDs does not change.  However, at the time
671	   of this writing the proposed IPv4 header extensions
672	   [I-D.herbert-ipv4-eh] and [I-D.herbert-ipv4-udpencap-eh] may only
673	   allow a limited number of extension headers to be used (Hop-by-Hop
674	   Options and Destination Options).  As such, the recommended
675	   allocation table in the case of IPv4 requires slight adjustment:

677	            +---------------+---------------------------------+
678	            | Decimal value |           Lookup Type           |
679	            +---------------+---------------------------------+
680	            |       0       |       SRC-DST lookup only       |
681	            |       1       | DOH inspection + SRC-DST lookup |
682	            |       2       |  Decapsulation + Global lookup  |
683	            |       3       |  DOH inspection + Decapsulation |
684	            |       4       |             reserved            |
685	            |       ..      |                ..               |
686	            |       15      |             reserved            |
687	            +---------------+---------------------------------+

689	    Table 2: Recommended allocation of domain wide IPv4 SID_PFX actions

691	   The specific syntax of Destination Option Header encoding when used
692	   with IPv4 encapsulation will be defined in subsequent versions of
693	   this document.

695	   Existing services (e.g. MPLS-VPNs [RFC4364]) can be transported over
696	   the described IP TE architecture as-is, without any modifications.
697	   An existing MPLS label can be used as service demux with full
698	   replacement of MPLS transport by IP TE transport.  In such a
699	   scenario there is no longer a need to rename the service demux value
700	   into some new nomenclature to artificially force it to fit into SID
701	   space.  Substituting IP TE transport for MPLS transport is
702	   essentially treated as basic IP-in-IP encapsulation and is seamless
703	   to the upper layer applications.  This does not prevent invention of
704	   new native services that use only the network programming paradigm.

706	8.  Active Path Probing

708	   One of the critical network metrics for a lot of applications running
709	   on the network is not only the ability to reach the destination in a
710	   relatively congestion free fashion, but also the quality of the path
711	   which is traversed towards a destination.  The latter is,
712	   unfortunately, very seldom used as a selection criterion in TE
713	   implementations.  The authors recommend that, from day one, the
714	   operator be able to define minimum path quality metrics, as either
715	   relative or absolute values, that a path must meet before it is
716	   considered for data plane use.  Comparison with the end to end
717	   metrics of non TE or other TE paths should also be available.

719	   Today's network technologies focus on local protection as a reaction
720	   to adjacent link or node failures.  At the same time, there is a
721	   significant concern that they lack detection of malfunctions of the
722	   network elements' internal data plane itself which, as proven in a
723	   number of production deployments, do occur.

725	   Moreover, it also needs to be observed that most if not all commonly
726	   used routing protocols focus on assuring loop free destination
727	   reachability via shortest or best path measured with static metrics
728	   without any consideration given to the actual quality of the end to
729	   end path towards a given destination.

731	   Traffic engineering enables real time SLA evaluation of various TE
732	   paths.  Results of such measurements can be used to automatically
733	   map traffic to such TE transport.  The architecture described by
734	   this specification integrates such functionality provided an
735	   operator chooses to enable it.

737	   It needs to be noted that packets used for diagnostics must traverse
738	   the exact same data plane and should be encapsulated in the identical
739	   header as the user packets.  Such measurements not only detect path
740	   parameters but also end to end path availability.

742	   While (N times path RTT - N times local detection interval) slower
743	   than local protection, for the vast majority of applications such end
744	   to end path liveness detection rate is both sufficient and much
745	   simpler to implement and operate.  It is also more attractive due to
746	   the increased spectrum of types of failures which can be detected.
747	   The complexity that no longer needs to be employed (example: node
748	   protection repair of adjacent segment nodes) is also an important
749	   consideration.
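
   As a worked illustration of the expression above, the following
   sketch computes the additional detection delay of end to end
   probing relative to local protection.  The function name, the units
   and the sample values are assumptions and are not taken from any
   measurement:

```python
def extra_detection_delay_ms(n: int, path_rtt_ms: float,
                             local_interval_ms: float) -> float:
    """N times path RTT minus N times local detection interval (ms)."""
    return n * path_rtt_ms - n * local_interval_ms
```

   With N = 3, a path RTT of 20 ms and a local detection interval of
   5 ms, the end to end scheme would be 3 * 20 - 3 * 5 = 45 ms slower.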

751	   The choice of path probing protocol is left as the local operator's
752	   decision.  However, it needs to be observed that such protocol suite
753	   should allow fast liveness detection as well as end to end path
754	   quality measurements reported to path headend (typically a network
755	   ingress node) as RTT, Jitter, Delay, MOS parameters as well as max
756	   MTU and sweep MTU path validation.

758	   It is also completely valid to use more than one protocol, each at a
759	   different frequency setting.  As an example, one could use BFD
760	   multihop [RFC5883] with hardware offload to detect end to end path
761	   liveness while at the same time applying OWAMP [RFC4656] to collect
762	   more unidirectional path quality metrics.  A recommendation for a
763	   single integrated path liveness and quality reporting protocol will
764	   also be described in a separate IETF specification.

766	8.1.  TI-LFA Local Protection

768	   As stated in the TI-LFA specification for networks supporting segment
769	   routing [I-D.ietf-rtgwg-segment-routing-ti-lfa], protection of SR
770	   policy midpoints involves adjustments to segment list carried in the
771	   packets as well as proper selection of repair path in order to assure
772	   that protected packets can successfully reach the next SR policy
773	   segment node.

775	   Based on the control plane distribution of the complete PATH_LIST,
776	   similar protection is possible in the described architecture.  Only
777	   the destination address needs to be swapped, with no requirement to
778	   adjust any other fields in the packet header.  The current
779	   destination can be replaced by the subsequent node's destination
780	   address on the PATH_LIST upon detection of a neighboring node
781	   failure.  That operation, however, requires per path state at PLRs
782	   which, while certainly possible, may not be an operator's preference.
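
   A minimal sketch of the destination swap at a PLR, assuming the
   PATH_LIST is available as an ordered list of segment node addresses
   and that failure detection is provided elsewhere.  All names and
   addresses are hypothetical:

```python
from typing import List, Optional, Set


def protected_destination(path_list: List[str], current_dst: str,
                          failed: Set[str]) -> Optional[str]:
    """Return the destination to use after a neighboring node failure."""
    if current_dst not in failed or current_dst not in path_list:
        return current_dst  # nothing to repair
    idx = path_list.index(current_dst)
    if idx + 1 < len(path_list):
        # Swap to the subsequent node's address on the PATH_LIST.
        return path_list[idx + 1]
    return None  # the failed node was the last one on the list
```

   The path_list argument is exactly the per path state that a PLR
   would have to hold for this operation.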

784	   Enabling local protection in segment engineered IP networks is
785	   clearly possible; however, it needs additional processing and control
786	   plane information to be distributed and present on all nodes in the
787	   domain.  Protection PATH_LISTs can be either computed centrally or by
788	   any node in the domain (including PLRs).  The authors recommend that
789	   this remain a local operator decision and at the same time encourage
790	   use of an end to end path protection scheme as first preference.

792	9.  Solution advantages

794	   The following key advantages can be used to characterize the
795	   described architecture:

797	   o  Native TE support for IPv4 and IPv6

799	   o  Very efficient use of available address space - no requirement for
800	      any new address allocations

802	   o  IGP impact - single prefix injection from ingress nodes of length
803	      chosen by operator

805	   o  Ability to aggregate injected prefixes at area or domain boundary
806	      with no impact to functionality

808	   o  No extensions to ISIS or OSPF routing protocols required

810	   o  Reuse of commonly available components (SRC-DST routing and IPinIP
811	      encapsulation)

813	   o  Integrated end to end path validation for reachability and quality

815	   o  Fixed overhead of 28 octets for IPv4 and 40 octets for IPv6 for
816	      basic TE and PATH_LIST SID integrated network programming
817	      functions

819	   o  Full compatibility with SRH from SRv6 Network Programming concept

821	   o  No per user data flow state in any network element of the network
822	      except ingress (mapping only)

824	   o  No packet header size growth with the growing number of TE segment
825	      endpoints policies

827	   o  Support in all available hardware - no need for any new operations
828	      on the packet headers

830	   o  TI-LFA support when end to end path protection is not
831	      sufficient

833	   o  Full native support of network services: L2VPNs, L3VPNs, EVPNs etc
834	      with single SID in SRH or native service level encapsulation

836	   o  Support of ingress, egress or transit nodes with only a single
837	      host address available on each such system
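
   The fixed overhead figures in the list above appear consistent with
   standard header sizes: an IPv4 header without options plus a UDP
   header, or a single IPv6 fixed header.  The sketch below reproduces
   that arithmetic; the function name is hypothetical, and the GRE
   option for Cat III deployments (4 octet base header plus 4 octet
   key) is an added assumption, not a figure stated in this document:

```python
IPV4_HEADER = 20   # octets, IPv4 header without options
UDP_HEADER = 8     # octets
IPV6_HEADER = 40   # octets, IPv6 fixed header
GRE_WITH_KEY = 8   # octets, GRE base header (4) + key field (4)


def encap_overhead(outer: str, gre_key: bool = False) -> int:
    """Return the outer encapsulation overhead in octets."""
    if outer == "ipv4":
        total = IPV4_HEADER + UDP_HEADER   # IPv4 in IPv4+UDP
    elif outer == "ipv6":
        total = IPV6_HEADER                # IPv6 or IPv4 in IPv6
    else:
        raise ValueError("outer must be 'ipv4' or 'ipv6'")
    if gre_key:
        total += GRE_WITH_KEY              # Cat III only (assumption)
    return total
```

   Under these assumptions encap_overhead("ipv4") gives 28 and
   encap_overhead("ipv6") gives 40, independent of the number of
   segment end nodes on the path.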

839	10.  OAM

841	   As a result of the use of IP encapsulation, both traceroute and ping
842	   are natively supported within a given domain's boundaries.  ICMP or
843	   UDP OAM probes will be encapsulated in the exact same IPv4 or IPv6
844	   header as user data packets; therefore all replies will be sent to
845	   the domain ingress node.

847	   Neither modifications to additional extension headers nor even their
848	   presence is required for correct OAM operations.

850	   If an OAM packet is originated externally to the domain, the ingress
851	   node will need to act as an OAM proxy, relaying the responses to the
852	   original source.

854	11.  Deployment considerations

856	   The solution is defined to be fully customizable by the operator.
857	   The path engineering as well as choice of numbering will likely
858	   differ domain to domain.

860	   All packets subject to this specification carry an immutable
861	   PATH_GID in their source address.  Together with locally assigned
862	   SIDs, no further extensions are necessary to identify specific path
863	   flows at any point in the domain.  The same PATH_GID + SID tuple can
864	   also be used to identify any path statistics (netflow records) at any
865	   point in the domain.
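
   As an illustration of per path accounting keyed on that tuple, the
   sketch below counts octets per (PATH_GID, SID) pair.  It assumes
   the PATH_GID is read from the source address and the SID from the
   destination address of the outer header; the names and record
   layout are hypothetical and do not describe any netflow format:

```python
from collections import Counter
from typing import Tuple

# Octet counters keyed by (source carrying PATH_GID, destination SID).
path_octets: Counter = Counter()


def account(src: str, dst: str, length: int) -> Tuple[str, str]:
    """Add one packet's length to the counter of its path tuple."""
    key = (src, dst)
    path_octets[key] += length
    return key
```

   No field beyond the outer source and destination addresses is
   consulted, which matches the statement that no further extensions
   are needed to identify path flows.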

867	12.  Security considerations

869	   The described architecture reuses standard components defined in
870	   other IETF WGs.  It does not define any new protocol or data plane
871	   extensions.  All security related work applicable to each used
872	   component is also recommended to be applied to IP TE+NP architecture.

874	13.  IANA Considerations

876	   No IANA allocations are required by this specification.

878	14.  Acknowledgements

880	   Authors would like to thank Tony Li, Stefano Previdi, Dirk Steinberg,
881	   Francois Clad, Joel Halpern and Linda Dunbar for their valuable
882	   review and comments.

884	15.  References

886	15.1.  Normative References

888	   [RFC1918]  Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
889	              and E. Lear, "Address Allocation for Private Internets",
890	              BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996,
891	              <https://www.rfc-editor.org/info/rfc1918>.

893	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
894	              DOI 10.17487/RFC2003, October 1996,
895	              <https://www.rfc-editor.org/info/rfc2003>.

897	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
898	              Requirement Levels", BCP 14, RFC 2119,
899	              DOI 10.17487/RFC2119, March 1997,
900	              <https://www.rfc-editor.org/info/rfc2119>.

902	   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
903	              IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
904	              December 1998, <https://www.rfc-editor.org/info/rfc2473>.

906	   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
907	              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
908	              DOI 10.17487/RFC2784, March 2000,
909	              <https://www.rfc-editor.org/info/rfc2784>.

911	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
912	              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
913	              2006, <https://www.rfc-editor.org/info/rfc4364>.

915	   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
916	              Translation", RFC 6296, DOI 10.17487/RFC6296, June 2011,
917	              <https://www.rfc-editor.org/info/rfc6296>.

919	   [RFC7676]  Pignataro, C., Bonica, R., and S. Krishnan, "IPv6 Support
920	              for Generic Routing Encapsulation (GRE)", RFC 7676,
921	              DOI 10.17487/RFC7676, October 2015,
922	              <https://www.rfc-editor.org/info/rfc7676>.

924	   [RFC8086]  Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-
925	              in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
926	              March 2017, <https://www.rfc-editor.org/info/rfc8086>.

928	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
929	              Writing an IANA Considerations Section in RFCs", BCP 26,
930	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
931	              <https://www.rfc-editor.org/info/rfc8126>.

933	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
934	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
935	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

937	   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
938	              (IPv6) Specification", STD 86, RFC 8200,
939	              DOI 10.17487/RFC8200, July 2017,
940	              <https://www.rfc-editor.org/info/rfc8200>.

942	15.2.  Informative References

944	   [I-D.herbert-ipv4-eh]
945	              Herbert, T., "IPv4 Extension Headers and Flow Label",
946	              draft-herbert-ipv4-eh-01 (work in progress), May 2019.

948	   [I-D.herbert-ipv4-udpencap-eh]
949	              Herbert, T., "IPv4 Extension Headers and UDP Encapsulated
950	              Extension Headers", draft-herbert-ipv4-udpencap-eh-01
951	              (work in progress), March 2019.

953	   [I-D.ietf-6man-segment-routing-header]
954	              Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
955	              Matsushima, S., and D. Voyer, "IPv6 Segment
956	              Routing Header (SRH)", draft-ietf-6man-segment-routing-
957	              header-23 (work in progress), September 2019.

959	   [I-D.ietf-idr-segment-routing-te-policy]
960	              Previdi, S., Filsfils, C., Mattes, P., Rosen, E., Jain,
961	              D., and S. Lin, "Advertising Segment Routing Policies in
962	              BGP", draft-ietf-idr-segment-routing-te-policy-07 (work in
963	              progress), July 2019.

965	   [I-D.ietf-pce-segment-routing]
966	              Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
967	              and J. Hardwick, "PCEP Extensions for Segment Routing",
968	              draft-ietf-pce-segment-routing-16 (work in progress),
969	              March 2019.

971	   [I-D.ietf-rtgwg-dst-src-routing]
972	              Lamparter, D. and A. Smirnov, "Destination/Source
973	              Routing", draft-ietf-rtgwg-dst-src-routing-07 (work in
974	              progress), March 2019.

976	   [I-D.ietf-rtgwg-segment-routing-ti-lfa]
977	              Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B.,
978	              Francois, P., Voyer, D., Clad, F., and P.
979	              Camarillo, "Topology Independent Fast Reroute using
980	              Segment Routing", draft-ietf-rtgwg-segment-routing-ti-
981	              lfa-01 (work in progress), March 2019.

983	   [I-D.ietf-spring-srv6-network-programming]
984	              Filsfils, C., Camarillo, P., Leddy, J.,
985	              Voyer, D., Matsushima, S., and Z. Li, "SRv6
986	              Network Programming", draft-ietf-spring-srv6-network-
987	              programming-03 (work in progress), September 2019.

989	   [I-D.patel-raszuk-bgp-vector-routing]
990	              Raszuk, R., Patel, K., Pithawala, B., Sajassi, A.,
991	              Osborne, E., Jalil, L., and J. Uttaro, "BGP vector
992	              routing.", draft-patel-raszuk-bgp-vector-routing-07 (work
993	              in progress), May 2016.

995	   [I-D.xu-intarea-ip-in-udp]
996	              Xu, X., Assarpour, H., Ma, S., Bernier, D.,
997	              Dukes, D., Lee, Y., and F. Yongbing, "Encapsulating IP in
998	              UDP", draft-xu-intarea-ip-in-udp-07 (work in progress),
999	              May 2018.

1001	   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
1002	              Element (PCE)-Based Architecture", RFC 4655,
1003	              DOI 10.17487/RFC4655, August 2006,
1004	              <https://www.rfc-editor.org/info/rfc4655>.

1006	   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
1007	              Zekauskas, "A One-way Active Measurement Protocol
1008	              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006,
1009	              <https://www.rfc-editor.org/info/rfc4656>.

1011	   [RFC5883]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
1012	              (BFD) for Multihop Paths", RFC 5883, DOI 10.17487/RFC5883,
1013	              June 2010, <https://www.rfc-editor.org/info/rfc5883>.

1015	   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
1016	              Decraene, B., Litkowski, S., and R. Shakir, "Segment
1017	              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
1018	              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

1020	Author's Address

1022	   Robert Raszuk (editor)
1023	   Bloomberg LP
1024	   731 Lexington Ave
1025	   New York City, NY  10022
1026	   USA

1028	   Email: robert@raszuk.net