idnits 2.17.1 

draft-ietf-rtgwg-dt-encap-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 298: '...italized keyword MUST is used as defin...'
     RFC 2119 keyword, line 376: '...lancing procedure MUST choose the same...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 21, 2016) is 2950 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'I-D.xu-bier-encapsulation' is defined on line 1888,
     but no explicit reference was found in the text

  == Outdated reference: A later version (-19) exists of
     draft-ietf-tsvwg-rfc5405bis-10

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

  ** Obsolete normative reference: RFC 6830 (Obsoleted by RFC 9300, RFC 9301)

  == Outdated reference: A later version (-08) exists of
     draft-ietf-nvo3-arch-04

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tsvwg-circuit-breaker-13

  == Outdated reference: A later version (-19) exists of
     draft-ietf-tsvwg-gre-in-udp-encap-11

  == Outdated reference: A later version (-12) exists of
     draft-saldana-tsvwg-simplemux-04

  == Outdated reference: A later version (-06) exists of
     draft-xu-bier-encapsulation-03


     Summary: 5 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	RTGWG                                                   E. Nordmark (ed)
3	Internet-Draft                                           Arista Networks
4	Intended status: Informational                                   A. Tian
5	Expires: September 22, 2016                                Ericsson Inc.
6	                                                                J. Gross
7	                                                                  VMware
8	                                                               J. Hudson
9	                                    Brocade Communications Systems, Inc.
10	                                                              L. Kreeger
11	                                                     Cisco Systems, Inc.
12	                                                                 P. Garg
13	                                                               Microsoft
14	                                                               P. Thaler
15	                                                    Broadcom Corporation
16	                                                              T. Herbert
17	                                                                Facebook
18	                                                          March 21, 2016

20	                      Encapsulation Considerations
21	                      draft-ietf-rtgwg-dt-encap-01

23	Abstract

25	   The IETF Routing Area director has chartered a design team to look at
26	   common issues for the different data plane encapsulations being
27	   discussed in the NVO3 and SFC working groups and also in the BIER
28	   BoF, and also to look at the relationship between such encapsulations
29	   in the case that they might be used at the same time.  The purpose of
30	   this design team is to discover, discuss and document considerations
31	   across the different encapsulations in the different WGs/BoFs so that
32	   we can reduce the number of wheels that need to be reinvented in the
33	   future.

35	Status of This Memo

37	   This Internet-Draft is submitted in full conformance with the
38	   provisions of BCP 78 and BCP 79.

40	   Internet-Drafts are working documents of the Internet Engineering
41	   Task Force (IETF).  Note that other groups may also distribute
42	   working documents as Internet-Drafts.  The list of current Internet-
43	   Drafts is at http://datatracker.ietf.org/drafts/current/.

45	   Internet-Drafts are draft documents valid for a maximum of six months
46	   and may be updated, replaced, or obsoleted by other documents at any
47	   time.  It is inappropriate to use Internet-Drafts as reference
48	   material or to cite them other than as "work in progress."
49	   This Internet-Draft will expire on September 22, 2016.

51	Copyright Notice

53	   Copyright (c) 2016 IETF Trust and the persons identified as the
54	   document authors.  All rights reserved.

56	   This document is subject to BCP 78 and the IETF Trust's Legal
57	   Provisions Relating to IETF Documents
58	   (http://trustee.ietf.org/license-info) in effect on the date of
59	   publication of this document.  Please review these documents
60	   carefully, as they describe your rights and restrictions with respect
61	   to this document.  Code Components extracted from this document must
62	   include Simplified BSD License text as described in Section 4.e of
63	   the Trust Legal Provisions and are provided without warranty as
64	   described in the Simplified BSD License.

66	Table of Contents

68	   1.  Design Team Charter
69	   2.  Overview
70	   3.  Common Issues
71	   4.  Scope
72	   5.  Assumptions
73	   6.  Terminology
74	   7.  Entropy
75	   8.  Next-protocol indication
76	   9.  MTU and Fragmentation
77	   10. OAM
78	   11. Security Considerations
79	     11.1.  Encapsulation-specific considerations
80	     11.2.  Virtual network isolation
81	     11.3.  Packet level security
82	     11.4.  In summary:
83	   12. QoS
84	   13. Congestion Considerations
85	   14. Header Protection
86	   15. Extensibility Considerations
87	   16. Layering Considerations
88	   17. Service model
89	   18. Hardware Friendly
90	     18.1.  Considerations for NIC offload
91	   19. Middlebox Considerations
92	   20. Related Work
93	   21. Acknowledgements
94	   22. Open Issues
95	   23. Change Log
96	   24. References
97	     24.1.  Normative References
98	     24.2.  Informative References
99	   Authors' Addresses

101	1.  Design Team Charter

103	   There have been multiple efforts over the years that have resulted in
104	   new or modified data plane behaviors involving encapsulations.  That
105	   includes IETF efforts like MPLS, LISP, and TRILL but also industry
106	   efforts like VXLAN and NVGRE.  These collectively can be seen as a
107	   source of insight into the properties that data planes need to meet.
108	   The IETF is currently working on potentially new encapsulations in
109	   NVO3 and SFC and considering working on BIER.  In addition there is
110	   work on tunneling in the INT area.

112	   This is a short term design team chartered to collect and construct
113	   useful advice to parties working on new or modified data plane
114	   behaviors that include additional encapsulations.  The goal is for
115	   the group to document useful advice gathered from interacting with
116	   ongoing efforts.  An Internet Draft will be produced for IETF92 to
117	   capture that advice, which will be discussed in RTGWG.

119	   Data plane encapsulations face a set of common issues such as:

121	   o  How to provide entropy for ECMP
122	   o  Issues around packet size and fragmentation/reassembly
123	   o  OAM - what support is needed in an encapsulation format?
124	   o  Security and privacy.
125	   o  QoS
126	   o  Congestion Considerations
127	   o  IPv6 header protection (zero UDP checksum over IPv6 issue)
128	   o  Extensibility - e.g., for evolving OAM, security, and/or
129	      congestion control
130	   o  Layering of multiple encapsulations e.g., SFC over NVO3 over BIER

132	   The design team will provide advice on those issues.  The intention
133	   is that even where we have different encapsulations for different
134	   purposes carrying different information, each such encapsulation
135	   doesn't have to reinvent the wheel for the above common issues.

137	   The design team will look across the routing area in particular at
138	   SFC, NVO3 and BIER.  It will not be involved in comparing or
139	   analyzing any particular encapsulation formats proposed in those WGs
140	   and BoFs but instead focus on common advice.

142	2.  Overview

144	   The references provide background information on NVO3, SFC, and BIER.
145	   In particular, NVO3 is introduced in [RFC7364], [RFC7365], and
146	   [I-D.ietf-nvo3-arch].  SFC is introduced in
147	   [I-D.ietf-sfc-architecture] and [I-D.ietf-sfc-problem-statement].
148	   Finally, the information on BIER is in
149	   [I-D.shepherd-bier-problem-statement],
150	   [I-D.wijnands-bier-architecture], and
151	   [I-D.wijnands-mpls-bier-encapsulation].  We assume the reader has
152	   some basic familiarity with those proposed encapsulations.  The
153	   Related Work section points at some prior work that relates to the
154	   encapsulation considerations in this document.

156	   Encapsulation protocols typically have some unique information that
157	   they need to carry.  In some cases that information might be modified
158	   along the path and in other cases it is constant.  In-flight
159	   modification has an impact on what it means to provide security for
160	   the encapsulation headers.

162	   o  NVO3 carries a VNI Identifier edge to edge which is not modified.
163	      There have been OAM discussions in the WG and it isn't clear
164	      whether some of the OAM information might be modified in flight.
165	   o  SFC carries Service Function Path identification and service meta-
166	      data.  The meta-data might be modified as the packets follow the
167	      service path.  SFC talks of some loop avoidance mechanism which is
168	      likely to result in modifications for each hop in the service
169	      chain even if the meta-data is unmodified.
170	   o  BIER carries a bitmap of egress ports to which a packet should be
171	      delivered, and as the packet is forwarded down different paths
172	      different bits are cleared in that bitmap.

174	   Even if information isn't modified in flight there might be devices
175	   that wish to inspect that information.  For instance, one can
176	   envision future NVO3 security devices which filter based on the
177	   virtual network identifier.

179	   The need for extensibility is different across the protocols:

181	   o  NVO3 might need some extensions for OAM and security.
182	   o  SFC consists of Service Function Path identification plus carrying
183	      service meta-data along a path, and different services might need
184	      different types and amount of meta-data.
185	   o  BIER might need a variable number of bits in its bitmaps, or other
186	      future schemes to scale up to larger networks.

188	   The extensibility needs and constraints might be different when
189	   considering hardware vs. software implementations of the
190	   encapsulation headers.  NIC hardware might have different constraints
191	   than switch hardware.

193	   As the IETF designs these encapsulations the different WGs solve the
194	   issues for their own encapsulation.  But there are likely to be
195	   future cases when the different encapsulations are combined in the
196	   same header.  For instance, NVO3 might be a "transport" used to carry
197	   SFC between the different hops in the service chain.

199	   Most of the issues discussed in this document are not new.  The IETF
200	   and industry have specified and deployed many different encapsulation
201	   or tunneling protocols over time, ranging from simple IP-in-IP and
202	   GRE encapsulation, IPsec, pseudo-wires, session-based approaches like
203	   L2TP, and the use of MPLS control and data planes.  IEEE 802 has also
204	   defined layered encapsulation for Provider Backbone Bridges (PBB) and
205	   IEEE 802.1Qbp (ECMP).  This document tries to leverage what we
206	   collectively have learned from that experience and summarize what
207	   would be relevant for new encapsulations like NVO3, SFC, and BIER.

209	3.  Common Issues

211	   [This section is mostly a repeat of the charter but with a few
212	   modifications and additions.]

214	   Any new encapsulation protocol would need to address a large set of
215	   issues that are not central to the new information that this protocol
216	   intends to carry.  The common issues explored in this document are:

218	   o  How to provide entropy for Equal Cost MultiPath (ECMP) routing
219	   o  Issues around packet size and fragmentation/reassembly
220	   o  Next header indication - each encapsulation might be able to carry
221	      different payloads
222	   o  OAM - what support is needed in an encapsulation format?
223	   o  Security and privacy
224	   o  QoS
225	   o  Congestion Considerations
226	   o  Header protection
227	   o  Extensibility - e.g., for evolving OAM, security, and/or
228	      congestion control
229	   o  Layering of multiple encapsulations e.g., SFC over NVO3 over BIER
230	   o  Importance of being friendly to hardware and software
231	      implementations

233	   The degree to which these common issues apply to a particular
234	   encapsulation can differ based on the intended purpose of the
235	   encapsulation.  But it is useful to understand all of them before
236	   determining which ones apply.

238	4.  Scope

240	   It is important to keep in mind what we are trying to cover and not
241	   cover in this document and effort.  This is

243	   o  A look across the three new encapsulations, while taking lots of
244	      previous work into account
245	   o  Focus on the class of encapsulations that would run over IP/UDP.
246	      That was done to avoid being distracted by the data-plane and
247	      control-plane interaction, which is more significant for protocols
248	      that are designed to run over "transports" that maintain session
249	      or path state.
250	   o  We later expanded the scope somewhat to consider how the
251	      encapsulations would play with MPLS "transport", which is
252	      important because SFC and BIER seem to target being independent of
253	      the underlying "transport"

255	   However, this document and effort is NOT intended to:

257	   o  Design some new encapsulation header to rule them all
258	   o  Design yet another new NVO3 encapsulation header
259	   o  Try to select the best encapsulation header
260	   o  Evaluate any existing and proposed encapsulations

262	   While the origin and focus of this document is the routing area and
263	   in particular NVO3, SFC, and BIER, the considerations apply to other
264	   encapsulations that are being defined in the IETF and elsewhere.
265	   There seems to be an increase in the number of encapsulations being
266	   defined to run over UDP, where there might already exist an
267	   encapsulation over IP or Ethernet.  Feedback on how these
268	   considerations apply in those contexts is welcome.

270	5.  Assumptions

272	   The design center for the new encapsulations is a well-managed
273	   network.  That network can be a datacenter network (plus datacenter
274	   interconnect) or a service provider network.  Based on the existing
275	   and proposed encapsulations in those environments it is reasonable to
276	   make these assumptions:

278	   o  The MTU is carefully managed and configured.  Hence an
279	      encapsulation protocol can make the packets bigger without
280	      resulting in a requirement for fragmentation and reassembly
281	      between ingress and egress.  (However, it might be useful to
282	      detect MTU misconfigurations.)
283	   o  In general an encapsulation needs some approach for congestion
284	      management.  But the assumptions are different than for arbitrary
285	      Internet paths in that the underlay might be well-provisioned and
286	      better policed at the edge, and due to multi-tenancy, the
287	      congestion control in the endpoints might be even less trusted
288	      than on the Internet at large.

290	   The goal is to implement these encapsulations in hardware and
291	   software; hence we can't assume that the needs of either
292	   implementation approach can trump the needs of the other.  In
293	   particular, around extensibility the needs and constraints might be
294	   quite different.

296	6.  Terminology

298	   The capitalized keyword MUST is used as defined in
299	   http://en.wikipedia.org/wiki/Julmust

301	   TBD: Refer to existing documents for at least NVO3 and SFC
302	   terminology.  We use at least the VNI ID in this document.

304	7.  Entropy

306	   In many cases the encapsulation format needs to enable ECMP in
307	   unmodified routers.  Those routers might use different fields in TCP/
308	   UDP packets to do ECMP without a risk of reordering a flow.  Note
309	   that the same entropy might also be used at layer 2 e.g. for Link
310	   Aggregation (LAG).

312	   The common way to do ECMP-enabled encapsulation over IP today is to
313	   add a UDP header and to use UDP with the UDP source port carrying
314	   entropy from the inner/original packet headers as in LISP [RFC6830].
315	   The total entropy consists of 14 bits in the UDP source port (using
316	   the ephemeral port range) plus the outer IP addresses which seems to
317	   be sufficient for entropy; using outer IPv6 headers would give the
318	   option for more entropy should it be needed in the future.

320	   In some environments it might be fine to use all 16 bits of the port
321	   range.  However, middleboxes might make assumptions about the system
322	   ports or user ports.  But they should not make any assumptions about
323	   the ports in the Dynamic and/or Private Port range, which have the
324	   two MSBs set to 11b.
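
	   As an illustration only, the following minimal Python sketch derives
	   a UDP source port from the inner flow identifiers and keeps it in
	   the Dynamic and/or Private Port range by forcing the two MSBs to
	   11b, which leaves 14 bits of entropy.  The function name, the
	   choice of SHA-256, and the serialization of the inner 5-tuple are
	   assumptions made for this example; they are not taken from any of
	   the encapsulation proposals.

```python
import hashlib
import struct


def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        src_port: int, dst_port: int) -> int:
    """Map an inner flow 5-tuple to a UDP source port in 49152-65535."""
    # Any stable serialization of the inner flow identifiers works here.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # SHA-256 is an illustrative choice; hardware would use a cheaper hash.
    (h,) = struct.unpack("!H", hashlib.sha256(key).digest()[:2])
    # Keep 14 bits of entropy and force the two MSBs to 11b.
    return 0xC000 | (h & 0x3FFF)


# All packets of the same inner flow get the same outer source port.
port = entropy_source_port("192.0.2.1", "198.51.100.7", 6, 40000, 443)
```

	   Because the port depends only on the inner flow identifiers, packets
	   of one flow hash to one underlay path and are not reordered by ECMP.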

326	   The UDP source port might change over the lifetime of an encapsulated
327	   flow, for instance for DoS mitigation or re-balancing load across
328	   ECMP.  Such changes need to consider reordering if there are packets
329	   in flight for the flow.

331	   There is some interaction between entropy and OAM and extensibility
332	   mechanisms.  It is desirable to be able to send OAM packets to follow
333	   the same path as network packets.  Hence OAM packets should use the
334	   same entropy mechanism as data packets.  While routers might use
335	   information in addition to the entropy field and outer IP header, they
336	   cannot use arbitrary parts of the encapsulation header since that
337	   might result in OAM frames taking a different path.  Likewise if
338	   routers look past the encapsulation header they need to be aware of
339	   the extensibility mechanism(s) in the encapsulation format to be able
340	   to find the inner headers in the presence of extensions; OAM frames
341	   might use some extensions e.g. for timestamps.

343	   Architecturally the entropy and the next header field are really part
344	   of the enclosing delivery header.  UDP with entropy goes hand-in-hand
345	   with the outer IP header.  Thus the UDP entropy is present for the
346	   underlay IP routers the same way that an MPLS entropy label is
347	   present for LSRs.  The entropy above is all about providing entropy
348	   for the outer delivery of the encapsulated packets.

350	   It has been suggested that when IPv6 is used it would not be
351	   necessary to add a UDP header for entropy, since the IPv6 flow label
352	   can be used for entropy.  (This assumes that there is an IP protocol
353	   number for the encapsulation in addition to a UDP destination port
354	   number since UDP would be used with IPv4 underlay.  And any use of
355	   UDP checksums would need to be replaced by an encaps-specific
356	   checksum or secure hash.)  While such an approach would save 8 bytes
357	   of headers when the underlay is IPv6, it does assume that the
358	   underlay routers use the flow label for ECMP, and it also would make
359	   the IPv6 approach different than the IPv4 approach.  Currently the
360	   leaning is towards recommending using the UDP encapsulation for both
361	   IPv4 and IPv6 underlay.  The IPv6 flow label can be used for
362	   additional entropy if need be.  There is more detailed discussion for
363	   using the IPv6 flow label for tunnels in [RFC6438].
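
	   A minimal sketch of using the IPv6 flow label for additional
	   entropy is shown below.  It folds a flow hash into the 20-bit flow
	   label and packs it into the first 32-bit word of an outer IPv6
	   header (version, traffic class, flow label).  The helper names and
	   the use of CRC-32 as the hash are assumptions for illustration.

```python
import struct
import zlib


def flow_label_from_flow(flow_key: bytes) -> int:
    """Fold a hash of the inner flow identifiers into 20 bits."""
    # CRC-32 is an illustrative hash; any stable per-flow hash would do.
    return zlib.crc32(flow_key) & 0xFFFFF


def ipv6_first_word(traffic_class: int, flow_label: int) -> bytes:
    """Pack version (6), traffic class (8 bits) and flow label (20 bits)."""
    word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (flow_label & 0xFFFFF)
    return struct.pack("!I", word)


label = flow_label_from_flow(b"192.0.2.1|198.51.100.7|6|40000|443")
first_word = ipv6_first_word(0, label)
```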

365	   Note that in the proposed BIER encapsulation
366	   [I-D.wijnands-mpls-bier-encapsulation], there is an 8-bit field
367	   which specifies an entropy value that can be used for load balancing
368	   purposes.  This entropy is for the BIER forwarding decisions, which
369	   is independent of any outer delivery ECMP between BIER routers.  Thus
370	   it is not part of the delivery ECMP discussed in this section.

372	      [Note: For any given bit in BIER (that identifies an exit from the
373	      BIER domain) there might be multiple immediate next hops.  The
374	      BIER entropy field is used to select that next hop as part of BIER
375	      processing.  The BIER forwarding process may do equal cost load
376	      balancing, but the load balancing procedure MUST choose the same
377	      path for any two packets that have the same entropy value.]
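
	   The constraint in the note above can be sketched as follows: given
	   the set of equal-cost next hops for a bit, the selection depends
	   only on the entropy value, so two packets with the same entropy
	   value always take the same next hop.  The function name and the
	   modulo-based selection are assumptions for illustration; the BIER
	   documents do not mandate this particular mapping.

```python
from typing import Sequence


def select_next_hop(next_hops: Sequence[str], entropy: int) -> str:
    """Pick one of several equal-cost next hops from an 8-bit entropy value."""
    if not next_hops:
        raise ValueError("no next hop available for this bit")
    # Only the 8-bit entropy value influences the choice, so the result is
    # the same for any two packets that carry the same entropy value.
    return next_hops[(entropy & 0xFF) % len(next_hops)]


hops = ["bfr-a", "bfr-b", "bfr-c"]
chosen = select_next_hop(hops, 7)
```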

379	   In summary:

381	   o  The entropy is associated with the transport, that is an outer IP
382	      header or MPLS.
383	   o  In the case of IP transport use 14 or 16 bits of UDP source port,
384	      plus the outer IPv6 flow label, for entropy.

386	8.  Next-protocol indication

388	   Next-protocol indications appear in three different contexts for
389	   encapsulations.

391	   Firstly, the transport delivery mechanism for the encapsulations we
392	   discuss in this document needs some way to indicate which
393	   encapsulation header (or other payload) comes next in the packet.
394	   Some encapsulations might be identified by a UDP port; others might
395	   be identified by an Ethernet type or IP protocol number.  Which
396	   approach is used is a function of the preceding header the same way
397	   as IPv4 is identified by both an Ethernet type and an IP protocol
398	   number (for IP-in-IP).  In some cases the header type is implicit in
399	   some session (L2TP) or path (MPLS) setup.  But this is largely beyond
400	   the control of the encapsulation protocol.  For instance, if there is
401	   a requirement to carry the encapsulation after an Ethernet header,
402	   then an Ethernet type is needed.  If required to be carried after an
403	   IP/UDP header, then a UDP port number is needed.  For UDP port
404	   numbers there are considerations for port number conservation
405	   described in [I-D.ietf-tsvwg-port-use].

407	   It is worth mentioning that in the MPLS case of no implicit protocol
408	   type many forwarding devices peek at the first nibble of the payload
409	   to determine whether to apply IPv4 or IPv6 L3/L4 hashes for load
410	   balancing [RFC7325].  That behavior places some constraints on other
411	   payloads carried over MPLS and some protocols define an initial
412	   control word in the payload with a value of zero in its first nibble
413	   [RFC4385] to avoid confusion with IPv4 and IPv6 payload headers.
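
	   The first-nibble heuristic described above can be sketched as
	   below.  The function name and the returned labels are assumptions
	   for illustration; real forwarding devices implement this check in
	   hardware and differ in detail.

```python
def classify_mpls_payload(payload: bytes) -> str:
    """Guess the payload type below the MPLS label stack from its first nibble."""
    if not payload:
        return "empty"
    nibble = payload[0] >> 4
    if nibble == 4:
        return "ipv4"          # looks like an IPv4 header (version 4)
    if nibble == 6:
        return "ipv6"          # looks like an IPv6 header (version 6)
    if nibble == 0:
        return "control-word"  # zero first nibble, as in [RFC4385]
    return "unknown"
```

	   A payload whose first nibble happens to be 4 or 6 without being IP
	   would be misclassified, which is the reason for the control word.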

415	   Secondly, the encapsulation needs to indicate the type of its
416	   payload, which is in scope for the design of the encapsulation.  We
417	   have existing protocols which use Ethernet types (such as GRE).  Here
418	   each encapsulation header can potentially make its own choice
419	   between:

421	   o  Use the Ethernet type space - makes it easy to carry existing L2
422	      and L3 protocols including IPv4, IPv6, and Ethernet.
423	      Disadvantages are that it is a 16-bit number and we probably need
424	      far fewer than 100 values, and the number space is controlled by
425	      the IEEE 802 RAC with its own allocation policies.
426	   o  Use the IP protocol number space - makes it easy to carry e.g.,
427	      ESP in addition to IP and Ethernet but brings in all existing
428	      protocol numbers many of which would never be used directly on top
429	      of the encapsulation protocol.  These are IANA-managed eight-bit
430	      values, and an assignment is presumably more difficult to get
431	      than a transport port assignment.
432	   o  Define their own next-protocol number space, which can use fewer
433	      bits than an Ethernet type and give more flexibility, but at the
434	      cost of administering that numbering space (presumably by the
435	      IANA).

437	   Thirdly, if the IETF ends up defining multiple encapsulations at
438	   about the same time, and there is some chance that multiple such
439	   encapsulations can be combined in the same packet, there is a
440	   question whether it makes sense to use a common approach and
441	   numbering space for the encapsulation across the different protocols.
442	   A common approach might not be beneficial as long as there is only
443	   one way to indicate e.g., SFC inside NVO3.

445	   Many Internet protocols use fixed values (typically managed by the
446	   IANA function) for their next-protocol field.  That facilitates
447	   interpretation of packets by middleboxes and e.g., for debugging
448	   purposes, but might make the protocol evolution inflexible.  Our
449	   collective experience with MPLS shows an alternative where the label
450	   can be viewed as an index to a table containing processing
451	   instructions and the table content can be managed in different ways.
452	   Encapsulations might want to consider the tradeoffs between such more
453	   flexible versus more fixed approaches.

455	   In summary:

457	   o  Would it be useful for the IETF to come up with a common scheme for
458	      encapsulation protocols?  If not, each encapsulation can define its
459	      own scheme.

461	9.  MTU and Fragmentation

463	   A common approach today is to assume that the underlay has
464	   sufficient MTU to carry the encapsulated packets without any
465	   fragmentation and reassembly at the tunnel endpoints.  That is
466	   sufficient when the operator of the ingress and egress has full
467	   control of the paths between those endpoints.  And it makes for
468	   simpler (hardware) implementations if fragmentation and reassembly
469	   can be avoided.

471	   However, even under that assumption it would be beneficial to be able
472	   to detect when there is some misconfiguration causing packets to be
473	   dropped due to MTU issues.  One way to do this is to have the
474	   encapsulator set the don't-fragment (DF) flag in the outer IPv4
475	   header and log any received ICMP "packet too big" (PTB)
476	   errors.  Note that no flag needs to be set in an outer IPv6 header
477	   [RFC2460].
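
	   A minimal sketch of the two pieces involved is shown below: setting
	   DF in the 16-bit flags and fragment offset field of an outer IPv4
	   header, and recognizing the ICMP messages an encapsulator would
	   log.  For IPv4 that is Destination Unreachable (type 3) with code 4
	   (fragmentation needed and DF set); for IPv6 it is ICMPv6 Packet Too
	   Big (type 2).  The function names are assumptions for illustration.

```python
import struct

IPV4_DF = 0x4000  # don't-fragment bit in the flags/fragment-offset field


def ipv4_flags_field(dont_fragment: bool, fragment_offset: int = 0) -> bytes:
    """Encode the 16-bit flags and fragment offset field of an IPv4 header."""
    value = (IPV4_DF if dont_fragment else 0) | (fragment_offset & 0x1FFF)
    return struct.pack("!H", value)


def is_packet_too_big(icmp_type: int, icmp_code: int, ipv6: bool) -> bool:
    """Return True for the ICMP errors that signal an MTU problem."""
    if ipv6:
        return icmp_type == 2                  # ICMPv6 Packet Too Big
    return icmp_type == 3 and icmp_code == 4   # fragmentation needed, DF set
```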

479	   Encapsulations could also define an optional tunnel fragmentation and
480	   reassembly mechanism which would be useful in the case when the
481	   operator doesn't have full control of the path, or when the protocol
482	   gets deployed outside of its original intended context.  Such a
483	   mechanism would be required if the underlay might have a path MTU
484	   which makes it impossible to carry at least 1518 bytes (if offering
485	   Ethernet service), or at least 1280 (if offering IPv6 service).  The
486	   use of such a protocol mechanism could be triggered by receiving a
487	   PTB.  But such a mechanism might not be implemented by all
488	   encapsulators and decapsulators.  [Aerolink is one example of such a
489	   protocol.]
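
	   The arithmetic behind that requirement can be sketched as follows.
	   The outer IPv4 (20 bytes without options), IPv6 (40 bytes) and UDP
	   (8 bytes) header sizes are fixed; the 8-byte encapsulation header
	   used in the example calls is a hypothetical value, and the function
	   names are assumptions for illustration.

```python
OUTER_IPV4_LEN = 20  # IPv4 header without options
OUTER_IPV6_LEN = 40  # fixed IPv6 header
UDP_LEN = 8


def required_underlay_mtu(service_size: int, encap_header_len: int,
                          outer_ipv6: bool = False) -> int:
    """Underlay MTU needed to carry service_size bytes without fragmentation."""
    outer_ip = OUTER_IPV6_LEN if outer_ipv6 else OUTER_IPV4_LEN
    return service_size + outer_ip + UDP_LEN + encap_header_len


def needs_fragmentation(path_mtu: int, service_size: int,
                        encap_header_len: int, outer_ipv6: bool = False) -> bool:
    """True if the path cannot carry the minimum service size unfragmented."""
    return path_mtu < required_underlay_mtu(service_size, encap_header_len,
                                            outer_ipv6)


# Ethernet service (1518 bytes) over IPv4/UDP with a hypothetical 8-byte header.
ethernet_over_ipv4 = required_underlay_mtu(1518, 8)
```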

491	   Depending on the payload carried by the encapsulation there are some
492	   additional possibilities:

494	   o  If the payload is IPv4/6 then the underlay path MTU could be used to
495	      report end-to-end path MTU.
496	   o  If the payload service is Ethernet/L2, then there is no such per
497	      destination reporting mechanism.  However, there is an LLDP TLV for
498	      reporting max frame size; it might be useful to report the minimum
499	      to end stations, but unmodified end stations would do nothing with
500	      that TLV since they assume that the MTU is at least 1518.

502	   In summary:

504	   o  In some deployments an encapsulation can assume a well-managed
505	      MTU and hence has no need for fragmentation and reassembly
506	      related to the encapsulation.
507	   o  Even so, it makes sense for the ingress to track any ICMP "packet
508	      too big" messages addressed to it, to be able to log any MTU
509	      misconfigurations.
510	   o  Should an encapsulation protocol be deployed outside of the
511	      original context it might very well need support for fragmentation
512	      and reassembly.

514	10.  OAM

516	   The OAM area is seeing active development in the IETF with
517	   discussions (at least) in NVO3 and SFC working groups, plus the new
518	   LIME WG looking at architecture and YANG models.

520	   The design team has taken a narrow view of OAM to explore the
521	   potential OAM implications on the encapsulation format.

523	   In terms of what we have heard from the various working groups there
524	   seem to be needs to:

526	   o  Be able to send out-of-band OAM messages - that potentially should
527	      follow the same path through the network as some flow of data
528	      packets.

530	      *  Such OAM messages should not accidentally be decapsulated and
531	         forwarded to the end stations.
532	   o  Be able to add OAM information to data packets that are
533	      encapsulated.  Discussions have been around:

535	      *  Using a bit in the OAM to synchronize sampling of counters
536	         between the encapsulator and decapsulator.
537	      *  Optional timestamps, sequence numbers, etc for more detailed
538	         measurements between encapsulator and decapsulator.
539	   o  Usable for both proactive monitoring (akin to BFD) and reactive
540	      checks (akin to traceroute to pin-point a failure)

542	   To ensure that the OAM messages can follow the same path the OAM
543	   messages need to get the same ECMP (and LAG hashing) results as a
544	   given data flow.  An encapsulator can choose one of:

546	   o  Limit ECMP hashing to not look past the UDP header i.e. the
547	      entropy needs to be in the source/destination IP and UDP ports
548	   o  Make OAM packets look the same as data packets i.e. the initial
549	      part of the OAM payload has the inner Ethernet, IP, TCP/UDP
550	      headers as a payload.  (This approach was taken in TRILL out of
551	      necessity since there is no UDP header.)  Any OAM bit in the
552	      encapsulation header must in any case be excluded from the
553	      entropy.
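
   The first option above can be sketched as follows.  This is a minimal
   illustration, assuming a UDP-based encapsulation; the OuterHeaders
   type, the oam_bit field, and the use of CRC32 as the hash are all
   hypothetical stand-ins rather than the behavior of any particular
   forwarding device.

```python
import ipaddress
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class OuterHeaders:
    src_ip: str
    dst_ip: str
    src_port: int           # carries the flow entropy
    dst_port: int           # placeholder destination port
    oam_bit: bool = False   # hypothetical bit in the encap header


def ecmp_path(pkt, n_paths):
    """Pick a path index from the outer IP addresses and UDP ports only.

    The OAM bit is deliberately not hashed.  CRC32 stands in for
    whatever hash function a real forwarding device would use.
    """
    key = (ipaddress.ip_address(pkt.src_ip).packed
           + ipaddress.ip_address(pkt.dst_ip).packed
           + struct.pack("!HH", pkt.src_port, pkt.dst_port))
    return zlib.crc32(key) % n_paths


data_pkt = OuterHeaders("192.0.2.1", "192.0.2.2", 49152, 4789)
oam_probe = OuterHeaders("192.0.2.1", "192.0.2.2", 49152, 4789,
                         oam_bit=True)
# Same outer addresses and ports, so the probe follows the data flow.
assert ecmp_path(data_pkt, 8) == ecmp_path(oam_probe, 8)
```

   If a device hashed on fields past the UDP header, the probe and the
   data flow could diverge, which is the constraint the text describes.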

555	   There are several ways to prevent OAM packets from accidentally
556	   being forwarded to the end station:

558	   o  A bit in the frame (as in TRILL) indicating OAM
559	   o  A next-protocol indication with a designated value for "none" or
560	      "oam".

562	   This assumes that the bit or next protocol, respectively, would not
563	   affect entropy/ECMP in the underlay.  However, the next-protocol
564	   field might be used to provide differentiated treatment of packets
565	   based on their payload; for instance a TCP vs. IPsec ESP payload
566	   might be handled differently.  Based on that observation it might be
567	   undesirable to overload the next protocol with the OAM drop behavior,
568	   resulting in a preference for a bit to indicate that the packet
569	   should not be forwarded to the end station after decapsulation.

571	   There have been suggestions that one (or more) marker bits in the
572	   encapsulation header would be useful in order to delineate
573	   measurement epochs on the encapsulator and decapsulator, and to use
574	   those epochs to compare counters to determine packet loss.
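
   A minimal sketch of the marker-bit idea, assuming a single marker bit
   that the encapsulator flips at each epoch boundary.  The class and
   function names are hypothetical, and how the two endpoints exchange
   their counters is left out.

```python
from collections import Counter


class EpochCounters:
    """Per-marker-value packet counters kept at one tunnel endpoint."""

    def __init__(self):
        self.counts = Counter()

    def record(self, marker_bit):
        self.counts[marker_bit] += 1


def loss_in_epoch(encap, decap, marker_bit):
    """Packets sent minus packets received for a finished epoch."""
    return encap.counts[marker_bit] - decap.counts[marker_bit]


encap, decap = EpochCounters(), EpochCounters()
# Epoch with marker 0: ten packets sent, packets 3 and 7 are lost.
for seq in range(10):
    encap.record(0)
    if seq not in (3, 7):
        decap.record(0)
# The encapsulator flips the marker to 1; the epoch with marker 0 is
# now closed and, once in-flight packets have drained, can be compared.
encap.record(1)
decap.record(1)
print(loss_in_epoch(encap, decap, 0))   # 2
```

   Because both sides count against the same marker value, the counters
   can be sampled without tight clock synchronization.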

576	   A result of the above is that OAM is likely to evolve and needs some
577	   degree of extensibility from the encapsulation format; a bit or two
578	   plus the ability to define additional larger extensions.

580	   An open question is how to handle error messages or other reports
581	   relating to OAM.  One can think of such reporting as being associated
582	   with the encapsulation the same way ICMP is associated with IP.
583	   Would it make sense for the IETF to develop a common Encapsulation
584	   Error Reporting Protocol as part of OAM, which can be used for
585	   different encapsulations?  And if so, what are the technical
586	   challenges?  For instance, how can it avoid being filtered, as ICMP
587	   often is?

589	   A potential additional consideration for OAM is the possible future
590	   existence of gateways that "stitch" together different dataplane
591	   encapsulations and might want to carry OAM end-to-end across the
592	   different encapsulations.

594	   In summary:

596	   o  It makes sense to reserve a bit for "drop after decapsulation" for
597	      out-of-band OAM.
598	   o  An encapsulation needs sufficient extensibility for OAM (such as
599	      bits, timestamps, sequence numbers).  That might be motivated by
600	      in-band OAM but it would make sense to leverage the same
601	      extensions for out-of-band OAM.
602	   o  OAM places some constraints on use of entropy in forwarding
603	      devices.
604	   o  Should the IETF look into error reporting that is independent of
605	      the specific encapsulation?

607	11.  Security Considerations

609	   Different encapsulation use cases will have different requirements
610	   around security.  For instance, when encapsulation is used to build
611	   overlay networks for network virtualization, isolation between
612	   virtual networks may be paramount.  BIER support of multicast may
613	   entail different security requirements than encapsulation for
614	   unicast.

616	   In real deployments, the security of the underlying network may be
617	   considered when determining the level of security needed in the
618	   encapsulation layer.  However, for the purposes of this discussion,
619	   we assume that network security is out of scope and that the
620	   underlying network does not itself provide adequate or at least
621	   uniform security mechanisms for encapsulation.

623	   There are at least three considerations for security:

625	   o  Anti-spoofing/virtual network isolation
626	   o  Interaction with packet level security such as IPsec or DTLS
627	   o  Privacy (e.g., VNI ID confidentiality for NVO3)

629	   This section uses a VNI ID in NVO3 as an example.  An SFC or BIER
630	   encapsulation is likely to have fields with similar security and
631	   privacy requirements.

633	11.1.  Encapsulation-specific considerations

635	   Some of these considerations appear for a new encapsulation, and
636	   others are more specific to network virtualization in datacenters.

638	   o  New attack vectors:

640	      *  DDoS on specific queues/paths by attempting to reproduce the
641	         5-tuple hash for targeted connections.
642	      *  Entropy in outer 5-tuple may be too little or predictable.
643	      *  Leakage of identifying information in the encapsulation header
644	         for an encrypted payload.
645	      *  Vulnerabilities of using global values in fields like VNI ID.
646	   o  Trusted versus untrusted tenants in network virtualization:

648	      *  The criticality of virtual network isolation depends on whether
649	         tenants are trusted or untrusted.  In the most extreme cases,
650	         tenants might not only be untrusted but may be considered
651	         hostile.
652	      *  For a trusted set of users (e.g. a private cloud) it may be
653	         sufficient to have just a virtual network identifier to provide
654	         isolation.  Packets inadvertently crossing virtual networks
655	         should be dropped similar to a TCP packet with a corrupted port
656	         being received on the wrong connection.
657	      *  In the presence of untrusted users (e.g. a public cloud) the
658	         virtual network identifier must be adequately protected against
659	         corruption and verified for integrity.  This case may warrant
660	         keyed integrity.
661	   o  Different forms of isolation:

663	      *  Isolation could mean blocking all traffic between tenants
664	         (except as allowed by some firewall).
665	      *  It could also mean performance isolation, i.e., one tenant
666	         cannot overload the network in a way that affects others.

668	      *  Physical isolation of traffic for different tenants in the
669	         network may be required, as well as restrictions that tenants
670	         may have on where their packets may be routed.
671	   o  New attack vectors from untrusted tenants:

673	      *  Third-party VMs with untrusted tenants allow internally borne
674	         attacks within data centers
675	      *  Hostile VMs inside the system may exist (e.g. public cloud)
676	      *  Internally launched DDoS
677	      *  Passive snooping for mis-delivered packets
678	      *  Damage mitigation and detection in the event that a VM is able
679	         to circumvent isolation mechanisms
680	   o  Tenant-provider relationship:

682	      *  Tenant might not trust provider, hypervisors, network
683	      *  Provider will likely need to provide an SLA or at least a
684	         statement on security
685	      *  Tenant may implement their own additional layers of security
686	      *  Regulation and certification considerations
687	   o  Trend towards tighter security:

689	      *  Tenants' data in network increases in volume and value, attacks
690	         become more sophisticated
691	      *  Large DCs already encrypt everything on disk
692	      *  DCs likely to encrypt inter-DC traffic at this point, use TLS
693	         to Internet.
694	      *  Encryption within DC is becoming more commonplace, becomes
695	         ubiquitous when cost is low enough.
696	      *  Cost/performance considerations.  Cost of support for strong
697	         security has made strong network security in DCs prohibitive.
698	      *  Are there lessons from MACsec?

700	11.2.  Virtual network isolation

702	   The first requirement is isolation between virtual networks.  Packets
703	   sent in one virtual network should never be illegitimately received
704	   by a node in another virtual network.  Isolation should be protected
705	   in the presence of malicious attacks or inadvertent packet
706	   corruption.

708	   The second requirement is sender authentication.  Sender identity is
709	   authenticated to prevent spoofing.  Even if an attacker has access to
710	   the packets in the network, they cannot send packets into a virtual
711	   network.  There are two possibilities:

713	   o  Pairwise sender authentication.  Any two communicating hosts
714	      negotiate a shared key.

716	   o  Group authentication.  A group of hosts share a key (this may be
717	      more appropriate for multicast of encapsulation).

719	   Possible security solutions:

721	   o  Security cookie: This is similar to the L2TP cookie mechanism
722	      [RFC3931].  A plain text cookie is shared between encapsulator and
723	      decapsulator.  A receiver validates a packet by checking that the
724	      cookie is correct for the sender's virtual network and address.
725	      The validation function is F(cookie, VNI ID, source address).  If
726	      the cookie matches, accept the packet, else drop it.  Since the
727	      cookie is plain text, this method does not protect against
728	      eavesdropping.  Cookies are set and may be rotated out of band.
729	   o  Secure hash: This is a stronger mechanism than simple cookies that
730	      borrows from IPsec and PPP authentication methods.  In this model
731	      the security field contains a secure hash of some fields in the
732	      packet using a shared key.  The hash function may be something
733	      like H(key, VNI ID, address, salt).  The salt ensures the hash is
734	      not the same for every packet and, if it includes a sequence
735	      number, may also protect against replay attacks.
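
   The two mechanisms above can be sketched as follows.  This is an
   illustration under stated assumptions, not a proposed format: the
   COOKIES table, the key, and all function names are hypothetical,
   HMAC-SHA256 is merely one possible choice for H, and the salt is
   assumed to be a 64-bit sequence number.

```python
import hashlib
import hmac
import ipaddress
import struct

# Hypothetical state provisioned out of band on the decapsulator.
COOKIES = {(5001, "192.0.2.1"): bytes.fromhex("0123456789abcdef")}
SHARED_KEY = b"example-shared-key"


def cookie_ok(cookie, vni, src_addr):
    """F(cookie, VNI ID, source address): accept only on exact match."""
    expected = COOKIES.get((vni, src_addr))
    return expected is not None and hmac.compare_digest(cookie, expected)


def secure_hash(key, vni, src_addr, seq):
    """H(key, VNI ID, address, salt), with the sequence number as salt."""
    msg = (struct.pack("!I", vni)
           + ipaddress.ip_address(src_addr).packed
           + struct.pack("!Q", seq))
    return hmac.new(key, msg, hashlib.sha256).digest()


def hash_ok(key, vni, src_addr, seq, received_tag, last_seq):
    """Verify the tag and reject replays of old sequence numbers."""
    if seq <= last_seq:
        return False
    expected = secure_hash(key, vni, src_addr, seq)
    return hmac.compare_digest(received_tag, expected)


tag = secure_hash(SHARED_KEY, 5001, "192.0.2.1", 42)
assert hash_ok(SHARED_KEY, 5001, "192.0.2.1", 42, tag, last_seq=41)
# A packet whose VNI ID was altered in flight no longer verifies.
assert not hash_ok(SHARED_KEY, 5002, "192.0.2.1", 42, tag, last_seq=41)
```

   Unlike the cookie, the keyed hash never exposes the shared secret on
   the wire, so an eavesdropper cannot simply copy it into new packets.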

737	   In any use of a shared key, periodic re-keying should be allowed.
738	   This could include use of techniques like generation numbers, key
739	   windows, etc.  See [I-D.farrelll-mpls-opportunistic-encrypt] for an
740	   example application.

742	   We might see firewalls that are aware of the encapsulation and can
743	   provide some defense in depth combined with the above example anti-
744	   spoofing approaches.  An example would be an NVO3-aware firewall
745	   being able to check the VNI ID.

747	   Separately and in addition to such filtering, there might be a desire
748	   to completely block an encapsulation protocol at certain places in
749	   the network, e.g., at the edge of a datacenter.  Using a fixed
750	   standard UDP destination port number for each encapsulation protocol
751	   would facilitate such blocking.

753	11.3.  Packet level security

755	   An encapsulated packet may itself be encapsulated in IPsec (e.g.
756	   ESP).  This should be straightforward and in fact is what would
757	   happen today in security gateways.  In this case, there is no special
758	   consideration for the fact that the packet is encapsulated.  However,
759	   since the encapsulation headers are covered (e.g., as part of the
760	   encrypted data), the network loses visibility of the encapsulation.

762	   The more interesting case is when security is applied to the
763	   encapsulation payload.  This will keep the encapsulation headers in
764	   the outer header visible to the network (for instance in NVO3 we may
765	   want to firewall based on VNI ID even if the payload is encrypted).
766	   One possibility is to apply DTLS to the encapsulation payload.  In
767	   this model the protocol stack may be something like
768	   IP|UDP|Encap|DTLS|encrypted_payload.  The encapsulation and security
769	   should be done together at an encapsulator and resolved at the
770	   decapsulator.  Since the encapsulation header is outside of the
771	   security coverage, this may itself require security (as described
772	   above).

774	   In both of the above the security associations (SAs) may be between
775	   physical hosts, so for instance in NVO3 we can have packets of
776	   different virtual networks using the same SA.  This should not be an
777	   issue since it is the VNI ID that ensures isolation (which needs to
778	   be secured also).

780	11.4.  In summary:

782	   o  Encapsulations need extensibility mechanisms to be able to add
783	      security features like cookies and secure hashes protecting the
784	      encapsulation header.
785	   o  NVO3 probably has specific higher requirements relating to
786	      isolation for network virtualization, which is in scope for the
787	      NVO3 WG.
788	   o  Our collective IETF experience is that successful protocols get
789	      deployed outside of the original intended context, hence the
790	      initial assumptions about the threat model might become invalid.
791	      That needs to be considered in the standardization of new
792	      encapsulations.

794	12.  QoS

796	   In the Internet architecture we support QoS using the Differentiated
797	   Services Code Points (DSCP) in the formerly named Type-of-Service
798	   field in the IPv4 header, and in the Traffic-Class field in the IPv6
799	   header.  The ToS and TC fields also contain the two ECN bits, which
800	   are discussed in Section 13.

802	   We have existing specifications for how to process those bits.  See
803	   [RFC2983] for diffserv handling, which specifies how the received
804	   DSCP value is used to set the DSCP value in an outer IP header when
805	   encapsulating.  (There are also existing specifications for how DSCP
806	   can be mapped to Layer 2 priorities.)

808	   Those specifications apply whether or not there are intervening
809	   headers (e.g., for NVO3 or SFC) between the inner and outer IP
810	   headers.  Thus the encapsulation considerations in this area are
811	   mainly about applying the framework in [RFC2983].
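
   As an illustration, the sketch below gives a simplified reading of
   the uniform and pipe conceptual models described in [RFC2983].  The
   function names and the policy_dscp parameter are hypothetical, and
   real deployments may apply additional traffic conditioning at tunnel
   ingress and egress.

```python
def encap_outer_dscp(inner_dscp, model, policy_dscp=0):
    """Choose the outer DSCP at the encapsulator."""
    if model == "uniform":
        return inner_dscp        # copy inner DSCP to the outer header
    if model == "pipe":
        return policy_dscp       # outer DSCP set by tunnel policy
    raise ValueError("unknown model: " + model)


def decap_inner_dscp(inner_dscp, outer_dscp, model):
    """Choose the DSCP of the forwarded packet at the decapsulator."""
    if model == "uniform":
        return outer_dscp        # changes made in the underlay persist
    if model == "pipe":
        return inner_dscp        # inner DSCP is carried through intact
    raise ValueError("unknown model: " + model)


EF, AF11 = 46, 10
# Uniform: the underlay remarked EF to AF11, and the remark is kept.
assert decap_inner_dscp(EF, AF11, "uniform") == AF11
# Pipe: the tenant's EF marking survives whatever the underlay did.
assert decap_inner_dscp(EF, AF11, "pipe") == EF
```

   Neither model depends on what sits between the outer and inner IP
   headers, which is why intervening encapsulation headers do not change
   the handling.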

813	   Note that the DSCP and ECN bits are not the only part of an inner
814	   packet that might potentially affect the outer packet.  For example,
815	   [RFC2473] specifies handling of inner IPv6 hop-by-hop options that
816	   effectively result in copying some options to the outer header.  It
817	   is simpler to not have future encapsulations depend on such copying
818	   behavior.

820	   There are some other considerations specific to doing OAM for
821	   encapsulations.  If OAM messages are used to measure latency, it
822	   would make sense to treat them the same as data payloads.  Thus they
823	   need to have the same outer DSCP value as the data packets which they
824	   wish to measure.

826	   Due to OAM there are constraints on middleboxes in general.  If
827	   middleboxes inspect the packet past the outer IP+UDP and
828	   encapsulation header and look for inner IP and TCP/UDP headers, that
829	   might violate the assumption that OAM packets will be handled the
830	   same as regular data packets.  That issue is broader than just QoS;
831	   it also applies to firewall filters, etc.

833	   In summary:

835	   o  Leverage the existing approach in [RFC2983] for DSCP handling.

837	13.  Congestion Considerations

839	   Additional encapsulation headers do not introduce anything new for
840	   Explicit Congestion Notification.  The situation is the same as for
841	   IP-in-IP and IPsec tunnels, for which [RFC6040] specifies how the ECN
842	   bits in the inner and outer headers are handled when encapsulating
843	   and decapsulating packets.  Thus new encapsulations can more or less
844	   include that by reference.
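
   For reference, the ECN behavior specified in [RFC6040] can be
   sketched as follows.  The function names are hypothetical, and the
   logging of unexpected inner/outer combinations that [RFC6040] also
   discusses is omitted.

```python
# ECN field codepoints (two bits of the ToS/Traffic Class octet).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_outer_ecn(inner_ecn, compatibility_mode=False):
    """Normal mode copies the inner ECN field to the outer header;
    compatibility mode sets the outer field to Not-ECT."""
    return NOT_ECT if compatibility_mode else inner_ecn


def decap_inner_ecn(inner_ecn, outer_ecn):
    """Return the ECN field of the forwarded packet, or None to drop."""
    if inner_ecn == NOT_ECT:
        # A non-ECN-capable inner packet cannot carry a CE mark.
        return None if outer_ecn == CE else NOT_ECT
    if outer_ecn == CE or inner_ecn == CE:
        return CE
    if outer_ecn == ECT1:
        return ECT1
    return inner_ecn


# Congestion marked in the underlay is propagated to the inner packet.
assert decap_inner_ecn(ECT0, CE) == CE
# The same mark on a Not-ECT inner packet results in a drop.
assert decap_inner_ecn(NOT_ECT, CE) is None
```

   The logic only looks at the inner and outer IP headers, so it applies
   unchanged regardless of the encapsulation header in between.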

846	   There are additional considerations around carrying non-congestion
847	   controlled traffic.  These details have been worked out in
848	   [I-D.ietf-mpls-in-udp].  As specified in [RFC5405]: "IP-based traffic
849	   is generally assumed to be congestion-controlled, i.e., it is assumed
850	   that the transport protocols generating IP-based traffic at the
851	   sender already employ mechanisms that are sufficient to address
852	   congestion on the path.  Consequently, a tunnel carrying IP-based
853	   traffic should already interact appropriately with other traffic
854	   sharing the path, and specific congestion control mechanisms for the
855	   tunnel are not necessary".  Those considerations are being captured
856	   in [I-D.ietf-tsvwg-rfc5405bis].

858	   For this reason, where an encapsulation method is used to carry IP
859	   traffic that is known to be congestion controlled, the UDP tunnel
860	   does not create an additional need for congestion control.  Internet
861	   IP traffic is generally assumed to be congestion-controlled.
862	   Similarly, Layer 3 VPNs generally carry IP traffic that is also
863	   assumed to be congestion controlled.

865	   However, some of the encapsulations (at least NVO3) will be able to
866	   carry arbitrary Layer 2 packets to provide an L2 service, in which
867	   case one cannot assume that the traffic is congestion controlled.

869	   One could handle this by adding some congestion control support to
870	   the encapsulation header (one instance of which would end up looking
871	   like DCCP).  However, if the underlay is well-provisioned and managed
872	   as opposed to being an arbitrary Internet path, it might be
873	   sufficient to have a slower reaction to congestion induced by that
874	   traffic.  There is work underway on a notion of "circuit breakers"
875	   for this purpose; see [I-D.ietf-tsvwg-circuit-breaker].
876	   Encapsulations which carry arbitrary Layer 2 packets should consider
877	   that ongoing work.

879	   If the underlay is provisioned in such a way that it can guarantee
880	   sufficient capacity for non-congestion controlled Layer 2 traffic,
881	   then such circuit breakers might not be needed.
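
   To make the circuit-breaker notion more concrete, here is a very
   simplified sketch.  It assumes that ingress and egress packet counts
   are compared once per measurement interval; the class name, the loss
   threshold, and the number of intervals are hypothetical and are not
   taken from [I-D.ietf-tsvwg-circuit-breaker].

```python
class CircuitBreaker:
    """Trip after several consecutive intervals of excessive loss."""

    def __init__(self, loss_threshold=0.10, trip_after=3):
        self.loss_threshold = loss_threshold
        self.trip_after = trip_after
        self.bad_intervals = 0
        self.tripped = False

    def report_interval(self, sent, received):
        """Feed ingress/egress packet counts for one interval."""
        loss = (sent - received) / sent if sent else 0.0
        if loss > self.loss_threshold:
            self.bad_intervals += 1
        else:
            self.bad_intervals = 0
        if self.bad_intervals >= self.trip_after:
            self.tripped = True   # stays tripped until reset
        return self.tripped


breaker = CircuitBreaker()
for sent, received in [(1000, 990), (1000, 800), (1000, 800)]:
    assert not breaker.report_interval(sent, received)
# A third consecutive lossy interval trips the breaker.
assert breaker.report_interval(1000, 800)
```

   The reaction is deliberately slow: a single lossy interval does not
   trip the breaker, only persistent loss does.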

883	   Two other considerations appear in the context of these
884	   encapsulations as applied to overlay networks:

886	   o  Protect against malicious end stations
887	   o  Ensure fairness and/or measure resource usage across multiple
888	      tenants

890	   Those issues are really orthogonal to the encapsulation, in that they
891	   are present even when no new encapsulation header is in use.
892	   However, the application of the new encapsulations is likely to be
893	   in environments where those issues are becoming more important.
894	   Hence it makes sense to consider them.

896	   One could make the encapsulation header extensible so that it can
897	   carry sufficient information to be able to measure resource usage,
898	   delays, and congestion.  The suggestions in the OAM section about a
899	   single bit for counter synchronization, and optional timestamps
900	   and/or sequence numbers, could be part of such an approach.  There
901	   might also be additional congestion-control extensions to be carried
902	   in the encapsulation.  Overall this results in a consideration to
903	   support sufficient extensibility in the encapsulation to handle
904	   potential future developments in this space.

906	   Coarse measurements are likely to suffice, at least for circuit-
907	   breaker-like purposes; see [I-D.wei-tsvwg-tunnel-congestion-feedback]
908	   and [I-D.briscoe-conex-data-centre] for examples of active work in
909	   this area via use of ECN.  [RFC6040] Appendix C is also relevant.
910	   The outer ECN bits seem sufficient (at least when everything uses
911	   ECN) to do these coarse measurements.  More study is needed for the
912	   case when there are also drops; ingress and egress might need to
913	   exchange counters to handle drops.

915	   Circuit breakers are not sufficient for a network carrying traffic
916	   with different congestion control behaviors when the goal is to
917	   provide a predictable service to different tenants.  The fallback
918	   would be to rate limit the different traffic.

920	   In summary:

922	   o  Leverage the existing approach in [RFC6040] for ECN handling.
923	   o  If the encapsulation can carry non-IP, hence non-congestion
924	      controlled traffic, then leverage the approach in
925	      [I-D.ietf-mpls-in-udp].
926	   o  "Watch this space" for circuit breakers.

928	14.  Header Protection

930	   Many UDP based encapsulations such as VXLAN [RFC7348] either
931	   discourage or explicitly disallow the use of UDP checksums.  The
932	   reason is that the UDP checksum covers the entire payload of the
933	   packet and switching ASICs are typically optimized to look at only a
934	   small set of headers as the packet passes through the switch.  In
935	   these cases, computing a checksum over the packet is very expensive.
936	   (Software endpoints and the NICs used with them generally do not have
937	   the same issue as they need to look at the entire packet anyway.)

939	   The lack of a header checksum creates the possibility that bit errors
940	   can be introduced into any information carried by the new headers.
941	   Specifically, in the case of IPv6, the assumption is that a transport
942	   layer checksum - UDP in this case - will protect the IP addresses
943	   through the inclusion of a pseudo-header in the calculation.  This is
944	   different from IPv4, on which many of these encapsulation protocols
945	   are initially deployed and which has its own header checksum.  In
946	   addition to IP addresses, the encapsulation header often contains its
947	   own information which is used for addressing packets or other high
948	   value network functions.  Without a checksum, this information is
949	   potentially vulnerable - an issue regardless of whether the packet is
950	   carried over IPv4 or IPv6.

952	   Several protocols cite [RFC6935] and [RFC6936] as an exemption to the
953	   IPv6 checksum requirements.  However, these are intended to be
954	   tailored to a fairly narrow set of circumstances - primarily relying
955	   on sparseness of the address space to detect invalid values and well
956	   managed networks - and are not a one size fits all solution.  In
957	   these cases, an analysis should be performed of the intended
958	   environment, including the probability of errors being introduced and
959	   the use of ECC memory in routing equipment.

961	   Conceptually, the ideal solution to this problem is a checksum that
962	   covers only the newly added headers of interest.  There is little
963	   value in the portion of the UDP checksum that covers the encapsulated
964	   packet because that would generally be protected by other checksums
965	   and this is the expensive portion to compute.  In fact, this solution
966	   already exists in the form of UDP-Lite and UDP based encapsulations
967	   could be easily ported to run on top of it.  Unfortunately, the main
968	   value in using UDP as part of the encapsulation header is that it is
969	   recognized by already deployed equipment for the purposes of ECMP,
970	   RSS, and middlebox operations.  As UDP-Lite uses a different protocol
971	   number than UDP and it is not widely implemented in middleboxes, this
972	   value is lost.  A possible solution is to incorporate the same
973	   partial-checksum concept as UDP-Lite or other header checksum
974	   protection into the encapsulation header and continue using UDP as
975	   the outer protocol.  One potential challenge with this approach is
976	   that the use of NAT or another form of translation on the outer
977	   header will result in an invalid checksum, as the translator will not
978	   know to update the encapsulation header.

980	   The method chosen to protect headers is often related to the security
981	   needs of the encapsulation mechanism.  On one hand, the impact of a
982	   poorly protected header is not limited to only data corruption but
983	   can also introduce a security vulnerability in the form of
984	   misdirected packets to an unauthorized recipient.  Conversely, high
985	   security protocols that already include a secure hash over the
986	   valuable portion of the header (such as by encrypting the entire IP
987	   packet using IPsec, or some secure hash of the encap header) do not
988	   require additional checksum protection as the hash provides stronger
989	   assurance than a simple checksum.

991	   If the sender has included a checksum, then the receiver should
992	   verify that checksum or, if incapable, drop the packet.  The
993	   assumption is that configuration and/or control-plane capability
994	   exchanges can be used when different receivers have different
995	   checksum validation capabilities.

997	   In summary:

999	   o  Encapsulations need extensibility to be able to add checksum/CRC
1000	      for the encapsulation header itself.
1001	   o  When the encapsulation has a checksum/CRC, include the IPv6
1002	      pseudo-header in it.
1003	   o  The checksum/CRC can potentially be avoided when cryptographic
1004	      protection is applied to the encapsulation.
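
   A minimal sketch of a header-only checksum that includes the IPv6
   pseudo-header.  It assumes the familiar 16-bit one's complement
   Internet checksum; the function names, the position of the checksum
   field, the use of the encapsulation header length as the covered
   length, and the next-header value 17 (UDP) are illustrative
   assumptions rather than part of any defined encapsulation.

```python
import ipaddress
import struct


def internet_checksum(data):
    """16-bit one's complement sum, as used by IPv4, UDP and UDP-Lite."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv6_pseudo_header(src, dst, covered_len, next_header=17):
    """Addresses, 32-bit length, three zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", covered_len, next_header))


def protect_header(src, dst, encap_header, cksum_offset):
    """Fill the 16-bit checksum field at cksum_offset (must be even)."""
    hdr = bytearray(encap_header)
    hdr[cksum_offset:cksum_offset + 2] = b"\x00\x00"
    pseudo = ipv6_pseudo_header(src, dst, len(hdr))
    struct.pack_into("!H", hdr, cksum_offset,
                     internet_checksum(pseudo + bytes(hdr)))
    return bytes(hdr)


def header_ok(src, dst, encap_header):
    """A valid header sums to zero together with the pseudo-header."""
    pseudo = ipv6_pseudo_header(src, dst, len(encap_header))
    return internet_checksum(pseudo + encap_header) == 0


# Hypothetical 8-byte header with the checksum in bytes 2-3.
hdr = protect_header("2001:db8::1", "2001:db8::2",
                     bytes.fromhex("0800000000138900"), 2)
assert header_ok("2001:db8::1", "2001:db8::2", hdr)
# A corrupted destination address is detected without touching the
# encapsulated payload.
assert not header_ok("2001:db8::1", "2001:db8::3", hdr)
```

   Only the pseudo-header and the encapsulation header are read, so the
   cost does not grow with the size of the encapsulated packet.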

1006	15.  Extensibility Considerations

1008	   Protocol extensibility is the concept that a networking protocol may
1009	   be extended to include new use cases or functionality that were not
1010	   part of the original protocol specification.  Extensibility may be
1011	   used to add security, control, management, or performance features to
1012	   a protocol.  A solution may allow private extensions for
1013	   customization or experimentation.

1015	   Extending a protocol often implies that a protocol header must carry
1016	   new information.  There are two usual methods to accomplish this:

1018	   1.  Define or redefine the meaning of existing fields in a protocol
1019	       header.
1020	   2.  Add new (optional) fields to the protocol header.

1022	   It is also possible to create a new protocol version, but this is
1023	   more associated with defining a protocol than extending it (IPv6
1024	   being a successor to IPv4 is an example of protocol versioning).

1026	   In some cases it might be more appropriate to define a new inner
1027	   protocol which can carry the new functionality instead of extending
1028	   the outer protocol.  An example where this works well is the IP/
1029	   transport split, where the earlier architecture had a single NCP
1030	   [RFC0033] protocol which carried both the hop-by-hop semantics which
1031	   are now in IP, and the end-to-end semantics which are now in TCP.
1032	   Such a split is effective when different nodes need to act upon the
1033	   different information.  Applying this for general protocol
1034	   extensibility through nesting is not well understood, and does result
1035	   in longer header chains.  Furthermore, our experience with IPv6
1036	   extension headers [RFC2460] in middleboxes indicates that the header
1037	   chaining approach does not help with middlebox traversal.

1039	   Many protocol definitions include some number of reserved fields or
1040	   bits which can be used for future extension.  VXLAN is an example of
1041	   a protocol that includes reserved bits which are subsequently being
1042	   allocated for new purposes.  Another technique employed is to re-
1043	   purpose existing header fields with new meanings.  A classic example
1044	   of this is the definition of the DSCP, which redefines the ToS
1045	   field originally specified in IPv4.  When a field is redefined, some
1046	   mechanism may be needed to ensure that all interested parties agree
1047	   on the meaning of the field.  The techniques of defining meaning for
1048	   reserved bits or redefining existing fields have the advantage that a
1049	   protocol header can be kept a fixed length.  The disadvantage is that
1050	   the extensibility is limited.  For instance, the number of reserved
1051	   bits in a fixed header is limited.  For standard protocols the
1052	   decision to commit to a definition for a field can be wrenching since
1053	   it is difficult to retract later.  Also, it is difficult to predict a
1054	   priori how many reserved fields or bits to put into a protocol header
1055	   to satisfy the extensions created over the lifetime of the protocol.

1057	   Extending a protocol header with new fields can be done in several
1058	   ways.

1060	   o  TLVs are a very popular method used in such protocols as IP and
1061	      TCP.  Depending on the type field size and structure, TLVs can
1062	      offer a virtually unlimited range of extensions.  A disadvantage
1063	      of TLVs is that processing them can be verbose and complicated:
1064	      several validations must often be done for each TLV, and there is
1065	      no deterministic ordering for a list of TLVs.  TCP serves as an
1066	      example of a protocol where TLVs have been successfully used (i.e.
1067	      required for protocol operation).  IP is an example of a protocol
1068	      that allows TLVs, but they are rarely used in practice (router
1069	      fast paths usually assume no IP options).  Note that TCP TLVs are
1070	      implemented in software as well as (NIC) hardware handling various
1071	      forms of TCP offload.  Additional discussion about hardware
1072	      implications for extensibility is captured in Section 18.
1073	   o  Extension headers are closely related to TLVs.  These also carry
1074	      type/value information, but instead of being a list of TLVs within
1075	      a single protocol header, each one is in its own protocol header.
1076	      IPv6 extension headers and SFC NSH are examples of this technique.
1077	      Similar to TLVs these offer a wide range of extensibility, but
1078	      have similarly complex processing.  Another difference with TLVs
1079	      is that each extension header is idempotent.  This is beneficial
1080	      in cases where a protocol implements a push/pop model for header
1081	      elements like service chaining, but makes it more difficult to
1082	      group correlated information within one protocol header.
1083	   o  A particular form of extension headers are the tags used by IEEE
1084	      802 protocols.  Those are similar to e.g., IPv6 extension headers
1085	      but with the key difference that each tag is a fixed length header
1086	      where the length is implicit in the tag value.  Thus as long as a
1087	      receiver can be programmed with a tag value to length map, it can
1088	      skip those new tags.
1089	   o  Flag-fields are a non-TLV-like method of extending a protocol
1090	      header.  The basic idea is that the header contains a set of
1091	      flags, where each set flag corresponds to an optional field that
1092	      is present in the header.  GRE is an example of a protocol that
1093	      employs this mechanism.  The fields are present in the header in
1094	      the order of the flags, and the length of each field is fixed.
1095	      Flag-fields are simpler to process than TLVs: fewer validations
1096	      are needed and the order of the optional fields is deterministic.
1097	      A disadvantage is that the range of possible extensions with flag-
1098	      fields is smaller than with TLVs.
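
   The flag-fields approach in the last bullet can be sketched as
   follows.  The header layout is hypothetical (a 16-bit flags word
   followed by fixed-length optional fields); it is loosely inspired by
   GRE but is not the GRE wire format.

```python
import struct

# Hypothetical flag assignments.  Optional fields appear in the header
# in the same order as their flags, each with a fixed length.
FLAG_FIELDS = [
    (0x8000, "checksum", "!H2x"),   # 16-bit checksum + 2 reserved bytes
    (0x4000, "key", "!I"),
    (0x2000, "sequence", "!I"),
]
KNOWN_FLAGS = 0xE000


def parse_flag_fields(header):
    """Return (fields, header_length) for a flag-fields style header."""
    (flags,) = struct.unpack_from("!H", header, 0)
    if flags & ~KNOWN_FLAGS & 0xFFFF:
        # An unknown flag implies a field of unknown length, so the
        # rest of the header cannot be parsed reliably.
        raise ValueError("unknown flag set: 0x%04x" % flags)
    offset = 2
    fields = {}
    for bit, name, fmt in FLAG_FIELDS:
        if flags & bit:
            (fields[name],) = struct.unpack_from(fmt, header, offset)
            offset += struct.calcsize(fmt)
    return fields, offset


# Key and sequence present, checksum absent.
hdr = struct.pack("!HII", 0x6000, 0xABCD, 7)
print(parse_flag_fields(hdr))   # ({'key': 43981, 'sequence': 7}, 10)
```

   The parser is a single pass with one test per flag, which illustrates
   why this style is simpler to process than a list of TLVs.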

1100	   The requirements for receiving unknown or unimplemented extensible
1101	   elements in an encapsulation protocol (flags, TLVs, optional fields)
1102	   need to be specified.  There are two parties to consider:
1103	   middleboxes and terminal endpoints of encapsulation (decapsulators).

1105	   A protocol may allow or expect nodes in a path to modify fields in an
1106	   encapsulation (an example use of this is BIER).  In this case, the
1107	   middleboxes should follow the same requirements as nodes terminating
1108	   the encapsulation.  In the case that middleboxes do not modify the
1109	   encapsulation, we can assume that they may still inspect any fields
1110	   of the encapsulation.  Missing or unknown fields should be accepted
1111	   per protocol specification, however it is permissible for a site to
1112	   implement a local policy otherwise (e.g. a firewall may drop packets
1113	   with unknown options).

1115	   For handling unknown options at terminal nodes, there are two
1116	   possibilities: drop packet or accept while ignoring the unknown
1117	   options.  Many Internet protocols specify that reserved flags must be
1118	   set to zero on transmission and ignored on reception.  L2TP is an
1119	   example of a data protocol that has such flags.  GRE is a notable
1120	   exception to this rule: reserved flag bits 1-5 cannot be ignored
1121	   [RFC2890].  For TCP and IPv4, implementations must ignore optional
1122	   TLVs with unknown type; however in IPv6 if a packet contains an
1123	   unknown extension header (unrecognized next header type) the packet
1124	   must be dropped with an ICMP error message returned.  The IPv6
1125	   options themselves (encoded inside the destination options or hop-
1126	   by-hop options extension header) have more flexibility.  Bits in
1127	   the option code are used to instruct the receiver whether to
1128	   ignore, silently drop, or drop and send an error if the option is
1129	   unknown.  Some protocols define a "mandatory bit" that can be set
1130	   with TLVs to indicate that an option must not be ignored.
1131	   Conceptually, optional data elements can only be ignored if they are
1132	   idempotent and do not alter how the rest of the packet is parsed or
1133	   processed.
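
   The IPv6 option behavior described above can be sketched as
   follows.  This is an illustration only, assuming the Option Type
   encoding of [RFC2460], where the two highest-order bits select the
   action for an unrecognized option and the third-highest bit says
   whether the option data may change en route.  The enum and function
   names are hypothetical.

```python
from enum import Enum


class UnknownOptionAction(Enum):
    SKIP = 0                  # 00: skip the option, keep processing
    DISCARD = 1               # 01: silently discard the packet
    DISCARD_ICMP = 2          # 10: discard, send ICMP Parameter Problem
    DISCARD_ICMP_UNICAST = 3  # 11: as 10, unless destination is multicast


def unknown_option_action(option_type: int) -> UnknownOptionAction:
    """Action a receiver takes when it does not recognize the option."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("option type must fit in one octet")
    return UnknownOptionAction(option_type >> 6)


def may_change_en_route(option_type: int) -> bool:
    """True if the option data may be modified in flight."""
    return bool(option_type & 0x20)
```

   The second function is the kind of information a secure hash over
   the header would need in order to exclude mutable fields.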

1135	   Depending on what type of protocol evolution one can predict, it
1136	   might make sense to have a way for a sender to express that the
1137	   packet should be dropped by a terminal node which does not understand
1138	   the new information.  In other cases it would make sense to have the
1139	   receiver silently ignore the new info.  The former can be expressed
1140	   by having a version field in the encapsulation, or a notion of
1141	   "mandatory bit" as discussed above.

1143	   A security mechanism which uses some form of secure hash over the
1144	   encapsulation header would need to be able to know which extensions
1145	   can be changed in flight.

1147	   In summary:

1149	   o  Encapsulations need the ability to be extended to handle e.g., the
1150	      OAM or security aspects discussed in this document.
1151	   o  Practical experience seems to tell us that extensibility
1152	      mechanisms which are not in use on day one might result in
1153	      immediate ossification by lack of implementation support.  In some
1154	      cases that has occurred in routers and in other cases in
1155	      middleboxes.  Hence devising ways where the extensibility
1156	      mechanisms are in use seems important.

1158	16.  Layering Considerations

1160	   One can envision that SFC might use NVO3 as a delivery/transport
1161	   mechanism.  With more imagination that in turn might be delivered
1162	   using BIER.  Thus it is useful to think about what things look like
1163	   when we have BIER+NVO3+SFC+payload.  Also, if NVO3 is widely deployed
1164	   there might be cases of NVO3 nesting where a customer uses NVO3 to
1165	   provide network virtualization e.g., across departments.  That
1166	   customer uses a service provider which happens to use NVO3 to provide
1167	   transport for their customers.  Thus NVO3 in NVO3 might happen.

1169	   A key question we set out to answer is what the packets might look
1170	   like in such a case, and in particular whether we would end up with
1171	   multiple UDP headers for entropy.

1173	   Based on the discussion in the Entropy section, the entropy is
1174	   associated with the outer delivery IP header.  Thus if there are
1175	   multiple IP headers there would be a UDP header for each one of the
1176	   IP headers.  But SFC does not require its own IP header.  So a case
1177	   of NVO3+SFC would be IP+UDP+NVO3+SFC.  A nested NVO3 encapsulation
1178	   would have independent IP+UDP headers.
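
   The resulting header stacks can be sketched with a small helper.
   This is a simplification under the stated assumption that the
   caller knows, per layer, whether that layer needs its own delivery
   IP header.  The function name and the boolean flag are hypothetical,
   and intermediate headers such as an inner Ethernet header are
   omitted.

```python
def header_stack(layers):
    """Return header names, outermost first.

    layers is a list of (name, needs_delivery_ip) pairs from outermost
    to innermost.  Every layer with its own delivery IP header also
    gets a UDP header in front of it to carry entropy.
    """
    stack = []
    for name, needs_delivery_ip in layers:
        if needs_delivery_ip:
            stack += ["IP", "UDP"]
        stack.append(name)
    return stack + ["payload"]
```

   For example, header_stack([("NVO3", True), ("SFC", False)]) gives
   IP, UDP, NVO3, SFC, payload, while two nested NVO3 layers give two
   independent IP and UDP pairs.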

1180	   The layering also has some implications for middleboxes.

1182	   o  A device on the path between the ingress and egress is allowed to
1183	      transparently inspect all layers of the protocol stack and drop or
1184	      forward, but not transparently modify anything but the layer in
1185	      which they operate.  What this means is that an IP router is
1186	      allowed to modify the outer IP TTL and ECN bits, but not the
1187	      encapsulation header or inner headers and payload.  And a BIER
1188	      router is allowed to modify the BIER header.
1189	   o  Alternatively such a device can become visible at a higher layer.
1190	      E.g., a middlebox could first decapsulate, perform some function,
1191	      then encapsulate, which means it will
1192	      generate a new encapsulation header.

1194	   The design team asked itself some additional questions:

1196	   o  Would it make sense to have a common encapsulation base header
1197	      (for OAM, security?, etc) and then followed by the specific
1198	      information for NVO3, SFC, BIER?  Given that there are separate
1199	      proposals and the set of information needing to be carried
1200	      differs, and the extensibility needs might be different, it would
1201	      be difficult and not that useful to have a common base header.
1202	   o  With a base header in place, one could view the different
1203	      functions (NVO3, SFC, and BIER) as different extensions to that
1204	      base header resulting in encodings which are more space optimal by
1205	      not repeating the same base header.  The base header would only be
1206	      repeated when there is an additional IP (and hence UDP) header.
1207	      That could mean a single length field (to skip to get to the
1208	      payload after all the encapsulation headers).  That might be
1209	      technically feasible, but it would create a lot of dependencies
1210	      between different WGs making it harder to make progress.  That
1211	      cost has to be weighed against the potential packet size savings.

1213	17.  Service model

1215	   The IP service is lossy and subject to reordering.  In order to avoid
1216	   a performance impact on transports like TCP the handling of packets
1217	   is designed to avoid reordering packets that are in the same
1218	   transport flow (which is typically identified by the 5-tuple).  But
1219	   across such flows the receiver can see different ordering for a given
1220	   sender.  That is the case for a unicast vs. a multicast flow from the
1221	   same sender.

1223	   There is a general tussle between the desire for high capacity
1224	   utilization across a multipath network and the impact on packet
1225	   ordering within the same flow (which results in lower transport
1226	   protocol performance).  That isn't affected by the introduction of an
1227	   encapsulation.  However, the encapsulation comes with some entropy,
1228	   and there might be cases where folks want to change that in response
1229	   to overload or failures.  For instance, one might want to change UDP
1230	   source port to try a different ECMP route.  Such changes can result in
1231	   packet reordering within a flow, hence would need to be done
1232	   infrequently and with care e.g., by identifying packet trains.
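
   One way to picture this is sketched below.  It is a sketch and not
   a recommended algorithm: CRC32 is used as a stand-in hash, the port
   range 49152-65535 is an assumed choice, and the idle-gap threshold
   min_gap is a hypothetical parameter whose safe value would depend
   on the delay difference between paths.  The selector only moves to
   a new source port when a change was requested and the flow has been
   idle long enough that a new packet train is starting.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # assumed source port range


def entropy_source_port(inner_flow: tuple, epoch: int = 0) -> int:
    """Map an inner flow (plus a re-path epoch) to a UDP source port."""
    digest = zlib.crc32(repr((inner_flow, epoch)).encode())
    return PORT_MIN + digest % (PORT_MAX - PORT_MIN + 1)


class EntropySelector:
    """Per-flow state that changes the entropy only between trains."""

    def __init__(self, min_gap: float):
        self.min_gap = min_gap   # idle time that separates two trains
        self.epoch = 0
        self.last_seen = None
        self.pending = False

    def request_repath(self) -> None:
        """Ask for a different ECMP path at the next safe moment."""
        self.pending = True

    def port_for(self, inner_flow: tuple, now: float) -> int:
        idle = self.last_seen is None or now - self.last_seen >= self.min_gap
        if self.pending and idle:
            self.epoch += 1
            self.pending = False
        self.last_seen = now
        return entropy_source_port(inner_flow, self.epoch)
```

   Packets inside a train keep the same port, so they stay on one path
   and are not reordered by the change.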

1234	   There might be some applications/services which are not able to
1235	   handle reordering across flows.  The IETF has defined pseudo-wires
1236	   [RFC3985] which provides the ability to ensure ordering (implemented
1237	   using sequence numbers and/or timestamps).

1239	   Architecturally such services would make sense, but as a separate layer
1240	   on top of an encapsulation protocol.  They could be deployed between
1241	   ingress and egress of a tunnel which uses some encaps.  Potentially
1242	   the tunnel control points at the ingress and egress could become a
1243	   platform for fixing suboptimal behavior elsewhere in the network.

1245	   That would clearly be undesirable in the general case.  However,
1246	   handling encapsulation of non-IP traffic hence non-congestion-
1247	   controlled traffic is likely to be required, which implies some
1248	   fairness and/or QoS policing on the ingress and egress devices.

1250	   But the tunnels could potentially do more like increase reliability
1251	   (retransmissions, FEC) or load spreading using e.g., MP-TCP between
1252	   ingress and egress.

1254	18.  Hardware Friendly

1256	   Hosts, switches and routers often leverage capabilities in the
1257	   hardware to accelerate packet encapsulation, decapsulation and
1258	   forwarding.

1260	   Some design considerations in encapsulation that leverage these
1261	   hardware capabilities may result in more efficient packet
1262	   processing and higher overall protocol throughput.

1264	   While "hardware friendliness" can be viewed as an unnecessary
1265	   consideration for a design, part of the motivation for considering
1266	   this is ease of deployment; being able to leverage existing NIC and
1267	   switch chips for at least a useful subset of the functionality that
1268	   the new encapsulation provides.  The other part is the ease of
1269	   implementing new NICs and switch/router chips that support the
1270	   encapsulation at ever increasing line rates.

1272	   [disclaimer] There are many different types of hardware in any given
1273	   network, each may be better at some tasks while worse at others.  We
1274	   would still recommend protocol designers to examine the specific
1275	   hardware that is likely to be used in their networks and make
1276	   decisions on a case by case basis.

1278	   Some considerations are:

1280	   o  Keep the encap header small.  Switches and routers usually only
1281	      read the first small number of bytes into the fast memory for
1282	      quick processing and easy manipulation.  The bulk of the packets
1283	      are usually stored in slow memory.  A big encap header may not fit
1284	      and an additional read from the slow memory will hurt the overall
1285	      performance and throughput.
1286	   o  Put important information at the beginning of the encapsulation
1287	      header.  The reasoning is similar to that in the previous
1288	      point.  If important information is located at the beginning of
1289	      the encapsulation header, the packet may be processed with a
1290	      smaller number of bytes read into the fast memory, improving
1291	      performance.

1293	   o  Avoid full packet checksums in the encapsulation if possible.
1294	      Encapsulations should instead consider adding their own checksum
1295	      which covers the encapsulation header and any IPv6 pseudo-header.
1296	      The motivation is that most switch/router hardware makes
1297	      switching/forwarding decisions by reading and examining only the
1298	      first certain number of bytes in the packet.  Most of the body of
1299	      the packet does not normally need to be processed.  If the concern
1300	      is preventing packets from being misdelivered due to memory
1301	      errors, consider performing only header checksums.  Note that NIC
1302	      chips can typically already do full packet checksums for TCP/UDP,
1303	      while adding a header checksum might require adding some hardware
1304	      support.
1305	   o  Place important information at a fixed offset in the encapsulation
1306	      header.  Packet processing hardware may be capable of parallel
1307	      processing.  If important information can be found at a fixed
1308	      offset, different parts of the encapsulation header may be
1309	      processed by different hardware units in parallel (for example
1310	      multiple table lookups may be launched in parallel).  It is easier
1311	      for hardware to handle optional information when the information,
1312	      if present, can be found in ideally one place, but in general, in
1313	      as few places as possible.  That facilitates parallel processing.
1314	      TLV encoding with unconstrained order typically does not have that
1315	      property.
1316	   o  Limit the number of header combinations.  In many cases the
1317	      hardware can explore different combinations of headers in
1318	      parallel, however there is some added cost for this.

1320	18.1.  Considerations for NIC offload

1322	   This section provides guidelines for supporting common
1323	   offloads for encapsulation in Network Interface Cards (NICs).
1324	   Offload mechanisms are techniques that are implemented separately
1325	   from the normal protocol implementation of a host networking stack
1326	   and are intended to optimize or speed up protocol processing.
1327	   Hardware offload is performed within a NIC device on behalf of a
1328	   host.

1330	   There are three basic offload techniques of interest:

1332	   o  Receive multi-queue
1333	   o  Checksum offload
1334	   o  Segmentation offload

1336	18.1.1.  Receive multi-queue

1338	   Contemporary NICs support multiple receive descriptor queues (multi-
1339	   queue).  Multi-queue enables load balancing of network processing for
1340	   a NIC across multiple CPUs.  On packet reception, a NIC must select
1341	   the appropriate queue for host processing.  Receive Side Scaling
1342	   (RSS) is a common method which uses the flow hash for a packet to
1343	   index an indirection table where each entry stores a queue number.

1345	   UDP encapsulation, where the source port is used for entropy, should
1346	   be compatible with multi-queue NICs that support five-tuple hash
1347	   calculation for UDP/IP packets as input to RSS.  The source port
1348	   ensures classification of the encapsulated flow even in the case that
1349	   the outer source and destination addresses are the same for all flows
1350	   (e.g. all flows are going over a single tunnel).
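
   The queue selection described above can be sketched as follows.
   This is illustrative only: real NICs commonly compute a keyed
   Toeplitz hash, for which CRC32 is substituted here, and the table
   size of 128 entries and the four queues are assumed values.

```python
import zlib

NUM_QUEUES = 4  # assumed number of receive queues
# Indirection table: each entry stores a queue number.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def flow_hash(src_ip, dst_ip, proto, src_port, dst_port) -> int:
    """Stand-in five-tuple hash (not the Toeplitz hash NICs often use)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)


def select_queue(five_tuple) -> int:
    """Use the flow hash to index the indirection table."""
    index = flow_hash(*five_tuple) % len(INDIRECTION_TABLE)
    return INDIRECTION_TABLE[index]
```

   With a single tunnel the addresses, protocol and destination port
   are constant, so the UDP source port is the only input that lets
   different encapsulated flows land on different queues.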

1352	18.1.2.  Checksum offload

1354	   Many NICs provide capabilities to calculate standard ones complement
1355	   payload checksum for packets in transmit or receive.  When using
1356	   encapsulation over UDP there are at least two checksums that may be
1357	   of interest: the encapsulated packet's transport checksum, and the
1358	   UDP checksum in the outer header.

1360	18.1.2.1.  Transmit checksum offload

1362	   NICs may provide a protocol agnostic method to offload transmit
1363	   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
1364	   UDP encapsulation.  In this method the host provides checksum related
1365	   parameters in a transmit descriptor for a packet.  These parameters
1366	   include the starting offset of data to checksum, the length of data
1367	   to checksum, and the offset in the packet where the computed checksum
1368	   is to be written.  The host initializes the checksum field to the
1369	   pseudo header checksum.  In the case of an encapsulated packet, the
1370	   checksum for an encapsulated transport layer packet, a TCP packet for
1371	   instance, can be offloaded by setting the appropriate checksum
1372	   parameters.
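
   The division of labor between host and device can be sketched as
   below.  This emulates the device in software for illustration.  The
   function names and the descriptor parameters (start, length,
   write_offset) are hypothetical and simply mirror the three
   parameters named in the text.  They are not a real driver API.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in to get a 16-bit ones' complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes, initial: int = 0) -> int:
    """Ones' complement sum (not inverted), zero-padded to 16 bits."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    return fold(initial + sum(w for (w,) in struct.iter_unpack("!H", data)))


def host_prepare(segment: bytearray, field_offset: int, pseudo: int) -> None:
    """Host side: seed the checksum field with the pseudo header sum."""
    struct.pack_into("!H", segment, field_offset, pseudo)


def nic_fill_checksum(packet: bytearray, start: int, length: int,
                      write_offset: int) -> None:
    """Device side: sum packet[start:start+length], seed included, and
    write the complemented result at write_offset."""
    total = ones_sum(packet[start:start + length])
    struct.pack_into("!H", packet, write_offset, ~total & 0xFFFF)
```

   The device never parses the outer or encapsulation headers.  For an
   inner TCP segment the host only has to point start at the inner TCP
   header and write_offset at its checksum field.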

1374	   NICs typically can offload only one transmit checksum per packet, so
1375	   simultaneously offloading both an inner transport packet's checksum
1376	   and the outer UDP checksum is likely not possible.  In this case
1377	   setting UDP checksum to zero (per above discussion) and offloading
1378	   the inner transport packet checksum might be acceptable.

1380	   There is a proposal in [I-D.herbert-remotecsumoffload] to leverage
1381	   NIC checksum offload when an encapsulator is co-resident with a host.

1383	18.1.2.2.  Receive checksum offload

1385	   Protocol encapsulation is compatible with NICs that perform a
1386	   protocol agnostic receive checksum (CHECKSUM_COMPLETE in Linux
1387	   parlance).  In this technique, a NIC computes a ones complement
1388	   checksum over all (or some predefined portion) of a packet.  The
1389	   computed value is provided to the host stack in the packet's receive
1390	   descriptor.  The host driver can use this checksum to "patch up" and
1391	   validate any inner packet transport checksum, as well as the outer
1392	   UDP checksum if it is non-zero.

1394	   Many legacy NICs don't provide checksum-complete but instead provide
1395	   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
1396	   in Linux).  Usually, such validation is only done for simple TCP/IP
1397	   or UDP/IP packets.  If a NIC indicates that a UDP checksum is valid,
1398	   the checksum-complete value for the UDP packet is the "not" of the
1399	   pseudo header checksum.  In this way, checksum-unnecessary can be
1400	   converted to checksum-complete.  So if the NIC provides checksum-
1401	   unnecessary for the outer UDP header in an encapsulation, checksum
1402	   conversion can be done so that the checksum-complete value is derived
1403	   and can be used by the stack to validate any checksums in the
1404	   encapsulated packet.
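
   The conversion can be sketched as below, assuming IPv4 pseudo
   headers and an even number of bytes between the start of the outer
   UDP header and the inner transport header.  All function names are
   hypothetical.  The key step is that a valid outer UDP checksum
   implies that the ones' complement sum over the UDP datagram equals
   the "not" of the outer pseudo header sum.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in to get a 16-bit ones' complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes, initial: int = 0) -> int:
    """Ones' complement sum (not inverted), zero-padded to 16 bits."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    return fold(initial + sum(w for (w,) in struct.iter_unpack("!H", data)))


def pseudo_sum(src: bytes, dst: bytes, proto: int, length: int) -> int:
    """Sum of the IPv4 pseudo header for TCP or UDP."""
    return ones_sum(src + dst + struct.pack("!BBH", 0, proto, length))


def complete_from_unnecessary(outer_pseudo: int) -> int:
    """Checksum-complete value over the UDP datagram, given that the
    NIC reported the outer UDP checksum as valid."""
    return ~outer_pseudo & 0xFFFF


def inner_checksum_ok(complete: int, skipped: bytes,
                      inner_pseudo: int) -> bool:
    """Validate an inner transport checksum from a complete value.

    skipped holds the bytes from the start of the outer UDP header up
    to the inner transport header; its length must be even.
    """
    # Ones' complement subtraction: add the complement of the sum.
    remaining = fold(complete + (~ones_sum(skipped) & 0xFFFF))
    return fold(remaining + inner_pseudo) == 0xFFFF
```

   The host thus validates the inner checksum by summing only the
   skipped headers, without touching the payload again.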

1406	18.1.3.  Segmentation offload

1408	   Segmentation offload refers to techniques that attempt to reduce CPU
1409	   utilization on hosts by having the transport layers of the stack
1410	   operate on large packets.  In transmit segmentation offload, a
1411	   transport layer creates large packets greater than MTU size (Maximum
1412	   Transmission Unit).  Only at a much lower point in the stack, or
1413	   possibly the NIC, are these large packets broken up into MTU
1414	   sized packets for transmission on the wire.  Similarly, in receive
1415	   segmentation offload, small packets are coalesced into large, greater
1416	   than MTU size packets at a point low in the stack receive path or
1417	   possibly in a device.  The effect of segmentation offload is that the
1418	   number of packets that need to be processed in various layers of the
1419	   stack is reduced, and hence CPU utilization is reduced.

1421	18.1.3.1.  Transmit Segmentation Offload

1423	   Transmit Segmentation Offload (TSO) is a NIC feature where a host
1424	   provides a large (larger than MTU size) TCP packet to the NIC, which
1425	   in turn splits the packet into separate segments and transmits each
1426	   one.  This is useful to reduce CPU load on the host.

1428	   The process of TSO can be generalized as:

1430	   o  Split the TCP payload into segments which allow packets with size
1431	      less than or equal to MTU.
1432	   o  For each created segment:

1434	      1.  Replicate the TCP header and all preceding headers of the
1435	          original packet.

1437	      2.  Set payload length fields in any headers to reflect the length
1438	          of the segment.
1439	      3.  Set TCP sequence number to correctly reflect the offset of the
1440	          TCP data in the stream.
1441	      4.  Recompute and set any checksums that either cover the payload
1442	          of the packet or cover header which was changed by setting a
1443	          payload length.

1445	   Following this general process, TSO can be extended to support TCP
1446	   encapsulation in UDP.  For each segment the Ethernet, outer IP, UDP
1447	   header, encapsulation header, inner IP header if tunneling, and TCP
1448	   headers are replicated.  Any packet length header fields need to be
1449	   set properly (including the length in the outer UDP header), and
1450	   checksums need to be set correctly (including the outer UDP checksum
1451	   if being used).

1453	   To facilitate TSO with encapsulation it is recommended that optional
1454	   fields should not contain values that must be updated on a per
1455	   segment basis; for example an encapsulation header should not
1456	   include checksums, lengths, or sequence numbers that refer to the
1457	   payload.  If the encapsulation header does not contain such fields
1458	   then the TSO engine only needs to copy the bits in the encapsulation
1459	   header when creating each segment and does not need to parse the
1460	   encapsulation header.
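
   The segmentation steps, with the encapsulation header treated as
   opaque bytes, can be sketched as follows.  This is a simplified
   model and not a device implementation.  The function name and the
   dictionary layout are hypothetical, and checksum recomputation
   (step 4) is deliberately left out.

```python
def tso_segments(encap_header: bytes, tcp_seq: int, payload: bytes,
                 mss: int) -> list:
    """Split one large TCP payload into per-segment descriptions."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        segments.append({
            # Step 1: headers are replicated; the encapsulation header
            # is copied verbatim and never parsed.
            "encap_header": encap_header,
            # Step 2: length fields reflect this segment only.
            "payload_len": len(chunk),
            # Step 3: sequence number reflects the offset in the stream.
            "tcp_seq": (tcp_seq + offset) % 2**32,
            "payload": chunk,
        })
    return segments
```

   Because the encapsulation header carries no per-segment field, the
   same bytes can be reused for every segment.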

1462	18.1.3.2.  Large Receive Offload

1464	   Large Receive Offload (LRO) is a NIC feature where packets of a TCP
1465	   connection are reassembled, or coalesced, in the NIC and delivered to
1466	   the host as one large packet.  This feature can reduce CPU
1467	   utilization in the host.

1469	   LRO requires significant protocol awareness to be implemented
1470	   correctly and is difficult to generalize.  Packets in the same flow
1471	   need to be unambiguously identified.  In the presence of tunnels or
1472	   network virtualization, this may require more than a five-tuple match
1473	   (for instance packets for flows in two different virtual networks may
1474	   have identical five-tuples).  Additionally, a NIC needs to perform
1475	   validation over packets that are being coalesced, and needs to
1476	   fabricate a single meaningful header from all the coalesced packets.

1478	   The conservative approach to supporting LRO for encapsulation would
1479	   be to assign packets to the same flow only if they have identical
1480	   five-tuple and were encapsulated the same way.  That is the outer IP
1481	   addresses, the outer UDP ports, encapsulated protocol, encapsulation
1482	   headers, and inner five tuple are all identical.
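
   The conservative matching rule can be sketched as a flow key, as
   below.  The type and function names are hypothetical.  A real LRO
   engine would also have to check TCP sequence continuity and
   validate each packet before coalescing, which is not modeled here.

```python
from collections import defaultdict
from typing import NamedTuple


class LroKey(NamedTuple):
    outer_src: str
    outer_dst: str
    outer_sport: int
    outer_dport: int
    encap_proto: int
    encap_header: bytes      # compared byte for byte
    inner_five_tuple: tuple


def group_for_lro(packets) -> dict:
    """Coalesce payloads of packets whose whole LroKey is identical.

    packets is an iterable of (LroKey, payload) in arrival order.
    """
    flows = defaultdict(list)
    for key, payload in packets:
        flows[key].append(payload)
    return {key: b"".join(chunks) for key, chunks in flows.items()}
```

   Two virtual networks that reuse the same inner five-tuple differ in
   the encapsulation header, so their packets are kept apart.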

1484	18.1.3.3.  In summary:

1486	   In summary, for NIC offload:

1488	   o  The considerations for using full UDP checksums are different for
1489	      NIC offload than for implementations in forwarding devices like
1490	      routers and switches.
1491	   o  Be judicious about encapsulations that change fields on a per-
1492	      packet basis, since such behavior might make it hard to use TSO.

1494	19.  Middlebox Considerations

1496	   This document has touched upon middleboxes in different sections.
1497	   The reason is that as encapsulations get widely deployed one would
1498	   expect that different forms of middleboxes might become aware of the
1499	   encapsulation protocol just as middleboxes have been made aware of
1500	   other protocols where there are business and deployment
1501	   opportunities.  Such middleboxes are likely to do more than just drop
1502	   packets based on the UDP port number used by an encapsulation
1503	   protocol.

1505	   We note that various forms of encapsulation gateways that stitch one
1506	   encapsulation protocol together with another form of protocol could
1507	   have similar effects.

1509	   An example of a middlebox that could see some use would be an
1510	   NVO3-aware firewall that would filter on the VNI IDs to provide some
1511	   defense in depth inside or across NVO3 datacenters.
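
   As a sketch of such a filter, the following assumes the VXLAN
   header layout of [RFC7348] (8 bytes, an I flag of 0x08 in the first
   byte, and a 24-bit VNI in bytes 4-6).  The function names and the
   policy of dropping packets with a truncated header or without a
   valid VNI are assumptions made for this example.

```python
VXLAN_FLAG_I = 0x08  # VNI-valid flag
VXLAN_HEADER_LEN = 8


def vxlan_vni(header: bytes):
    """Return the 24-bit VNI, or None if the I flag is not set."""
    if len(header) < VXLAN_HEADER_LEN:
        raise ValueError("truncated VXLAN header")
    if not header[0] & VXLAN_FLAG_I:
        return None
    return int.from_bytes(header[4:7], "big")


def permit(header: bytes, allowed_vnis) -> bool:
    """Local policy: forward only packets whose VNI is allowed."""
    try:
        vni = vxlan_vni(header)
    except ValueError:
        return False
    return vni is not None and vni in allowed_vnis
```

   Such a filter looks only at the encapsulation header, so it treats
   OAM and data packets with the same VNI alike.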

1513	   A question for the IETF is whether we should document what to do or
1514	   what not to do in such middleboxes.  This document touches on areas
1515	   of OAM and ECMP as it relates to middleboxes and it might make sense
1516	   to document how encapsulation-aware middleboxes should avoid
1517	   unintended consequences in those (and perhaps other) areas.

1519	   In summary:

1521	   o  We are likely to see middleboxes that at least parse the headers
1522	      for successful new encapsulations.
1523	   o  Should the IETF document considerations for what not to do in such
1524	      middleboxes?

1526	20.  Related Work

1528	   The IETF and industry have defined encapsulations for a long time,
1529	   with examples like GRE [RFC2890], VXLAN [RFC7348], and NVGRE
1530	   [I-D.sridharan-virtualization-nvgre] being able to carry arbitrary
1531	   Ethernet payloads, and various forms of IP-in-IP and IPsec
1532	   encapsulations that can carry IP packets.  As part of NVO3 there have
1533	   been additional proposals like Geneve [I-D.gross-geneve] and GUE
1534	   [I-D.herbert-gue] which look at more extensibility.  NSH
1535	   [I-D.quinn-sfc-nsh] is an example of an encapsulation that tries to
1536	   provide extensibility mechanisms which target both hardware and
1537	   software implementations.

1539	   There is also a large body of work around MPLS encapsulations
1540	   [RFC3032].  The MPLS-in-UDP work [I-D.ietf-mpls-in-udp] and GRE over
1541	   UDP [I-D.ietf-tsvwg-gre-in-udp-encap] have worked on some of the
1542	   common issues around checksum and congestion control.  MPLS also
1543	   introduced an entropy label [RFC6790].  There is also a proposal for
1544	   MPLS encryption [I-D.farrelll-mpls-opportunistic-encrypt].

1546	   The idea to use a UDP encapsulation with a UDP source port for
1547	   entropy for the underlay routers' ECMP dates back to LISP [RFC6830].

1549	   The pseudo-wire work [RFC3985] is interesting in the notion of
1550	   layering additional services/characteristics such as ordered delivery
1551	   or timely delivery on top of an encapsulation.  That layering
1552	   approach might be useful for the new encapsulations as well; one
1553	   example is the control word [RFC4385].  There is also material on
1554	   congestion control for pseudo-wires in [I-D.ietf-pwe3-congcons].

1556	   Both MPLS and L2TP [RFC3931] rely on some control or signaling to
1557	   establish state (for the path/labels in the case of MPLS, and for the
1558	   session in the case of L2TP).  The NVO3, SFC, and BIER encapsulations
1559	   will also have some separation between the data plane and control
1560	   plane, but the type of separation appears to be different.

1562	   IEEE 802.1 has defined encapsulations for L2 over L2, in the form of
1563	   Provider Backbone Bridging (PBB) [IEEE802.1Q-2014] and Equal Cost
1564	   Multipath (ECMP) [IEEE802.1Q-2014].  The latter includes something
1565	   very similar to the way the UDP source port is used as entropy: "The
1566	   flow hash, carried in an F-TAG, serves to distinguish frames
1567	   belonging to different flows and can be used in the forwarding
1568	   process to distribute frames over equal cost paths".

1570	   TRILL, which is also a L2 over L2 encapsulation, took a different
1571	   approach to entropy but preserved the ability for OAM frames
1572	   [RFC7174] to use the same entropy hence ECMP path as data frames.  In
1573	   [I-D.ietf-trill-oam-fm] there are 96 bytes of headers for entropy in
1574	   the OAM frames, followed by the actual OAM content.  This ensures
1575	   that any headers which fit in those 96 bytes, except the OAM bit in
1576	   the TRILL header, can be used for ECMP hashing.

1578	   As encapsulations evolve there might be a desire to fit multiple
1579	   inner packets into one outer packet.  The work in
1580	   [I-D.saldana-tsvwg-simplemux] might be interesting for that purpose.

1582	21.  Acknowledgements

1584	   The authors acknowledge the comments from Alia Atlas, Fred Baker,
1585	   David Black, Bob Briscoe, Stewart Bryant, Mike Cox, Andy Malis, Radia
1586	   Perlman, Carlos Pignataro, Jamal Hadi Salim, Michael Smith, and Lucy
1587	   Yong.

1589	22.  Open Issues

1591	   o  Middleboxes:

1593	      *  Due to OAM there are constraints on middleboxes in general.  If
1594	         middleboxes inspect the packet past the outer IP+UDP and
1595	         encapsulation header and look for inner IP and TCP/UDP headers,
1596	         that might violate the assumption that OAM packets will be
1597	         handled the same as regular data packets.  That issue is
1598	         broader than just QoS - applies to firewall filters etc.
1599	      *  Firewalls looking at inner payload?  How does that work for OAM
1600	         frames?  Even if it only drops ... TRILL approach might be an
1601	         option?  Would that encourage more middleboxes making the
1602	         network more fragile?
1603	      *  Editorially perhaps we should pull the above two into a
1604	         separate section about middlebox considerations?
1605	   o  Next-protocol indication - should it be common across different
1606	      encapsulation headers?  We will have different ways to indicate
1607	      the presence of the first encapsulation header in a packet (could
1608	      be a UDP destination port, an Ethernet type, etc depending on the
1609	      outer delivery header).  But for the next protocol past an
1610	      encapsulation header one could envision creating or adopting a
1611	      common scheme.  Such a scheme would also need to be able to
1612	      identify following headers like Ethernet, IPv4/IPv6, ESP, etc.
1613	   o  Common OAM error reporting protocol?
1614	   o  There is discussion about timestamps, sequence numbers, etc in
1615	      three different parts of the document.  OAM, Congestion
1616	      Considerations, and Service Model, where the latter argues that a
1617	      pseudo-wire service should really be layered on top of the
1618	      encapsulation using its own header.  Those recommendations seem to
1619	      be at odds with each other.  Do we envision sequence numbers,
1620	      timestamps, etc as potential extensions for OAM and CC?  If so,
1621	      those extensions could be used to provide a service which doesn't
1622	      reorder packets.

1624	23.  Change Log

1626	   The changes from draft-rtg-dt-encap-01 based on feedback at the
1627	   Dallas IETF meeting:

1629	   o  Setting the context that not all common issues might apply to all
1630	      encapsulations, but that they should all be understood before
1631	      being dismissed.
1632	   o  Clarified that IPv6 flow label is useful for entropy in
1633	      combination with a UDP source port.
1634	   o  Editorially added a "summary" set of bullets to most sections.
1635	   o  Editorial clarifications in the next protocol section to more
1636	      clearly state the three areas.
1637	   o  Folded the two next protocol sections into one.
1638	   o  Mention the MPLS first nibble issue in the next protocol section.
1639	   o  Mention that viewing the next protocol as an index to a table with
1640	      processing instructions can provide additional flexibility in the
1641	      protocol evolution.
1642	   o  For the OAM "don't forward to end stations" added that defining a
1643	      bit seems better than using a special next-protocol value.
1644	   o  Added mention of DTLS in addition to IPsec for security.
1645	   o  Added some mention of IPv6 hop-by-hop options or other headers
1646	      that potentially can be copied from inner to outer header.
1647	   o  Added text on architectural considerations when it might make
1648	      sense to define an additional header/protocol as opposed to using
1649	      the extensibility mechanism in the existing encapsulation
1650	      protocol.
1651	   o  Clarified the "unconstrained TLVs" in the hardware friendly
1652	      section.
1653	   o  Clarified the text around checksum verification and full vs.
1654	      header checksums.
1655	   o  Added wording that the considerations might apply for encaps
1656	      outside of the routing area.
1657	   o  Added references to draft-ietf-pwe3-congcons, draft-ietf-tsvwg-
1658	      rfc5405bis, RFC2473, and RFC7325
1659	   o  Removed reference to RFC3948.
1660	   o  Updated the acknowledgements section.
1661	   o  Added this change log section.

1663	24.  References

1665	24.1.  Normative References

1667	   [I-D.ietf-tsvwg-rfc5405bis]
1668	              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
1669	              Guidelines", draft-ietf-tsvwg-rfc5405bis-10 (work in
1670	              progress), March 2016.

1672	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1673	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
1674	              December 1998, <http://www.rfc-editor.org/info/rfc2460>.

1676	   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
1677	              IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
1678	              December 1998, <http://www.rfc-editor.org/info/rfc2473>.

1680	   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE",
1681	              RFC 2890, DOI 10.17487/RFC2890, September 2000,
1682	              <http://www.rfc-editor.org/info/rfc2890>.

1684	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1685	              RFC 2983, DOI 10.17487/RFC2983, October 2000,
1686	              <http://www.rfc-editor.org/info/rfc2983>.

1688	   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
1689	              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
1690	              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001,
1691	              <http://www.rfc-editor.org/info/rfc3032>.

1693	   [RFC3931]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed.,
1694	              "Layer Two Tunneling Protocol - Version 3 (L2TPv3)",
1695	              RFC 3931, DOI 10.17487/RFC3931, March 2005,
1696	              <http://www.rfc-editor.org/info/rfc3931>.

1698	   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
1699	              Edge-to-Edge (PWE3) Architecture", RFC 3985,
1700	              DOI 10.17487/RFC3985, March 2005,
1701	              <http://www.rfc-editor.org/info/rfc3985>.

1703	   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
1704	              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
1705	              Use over an MPLS PSN", RFC 4385, DOI 10.17487/RFC4385,
1706	              February 2006, <http://www.rfc-editor.org/info/rfc4385>.

1708	   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
1709	              for Application Designers", BCP 145, RFC 5405,
1710	              DOI 10.17487/RFC5405, November 2008,
1711	              <http://www.rfc-editor.org/info/rfc5405>.

1713	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
1714	              Notification", RFC 6040, DOI 10.17487/RFC6040, November
1715	              2010, <http://www.rfc-editor.org/info/rfc6040>.

1717	   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
1718	              for Equal Cost Multipath Routing and Link Aggregation in
1719	              Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011,
1720	              <http://www.rfc-editor.org/info/rfc6438>.

1722	   [RFC6790]  Kompella, K., Drake, J., Amante, S., Henderickx, W., and
1723	              L. Yong, "The Use of Entropy Labels in MPLS Forwarding",
1724	              RFC 6790, DOI 10.17487/RFC6790, November 2012,
1725	              <http://www.rfc-editor.org/info/rfc6790>.

1727	   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
1728	              Locator/ID Separation Protocol (LISP)", RFC 6830,
1729	              DOI 10.17487/RFC6830, January 2013,
1730	              <http://www.rfc-editor.org/info/rfc6830>.

1732	   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1733	              UDP Checksums for Tunneled Packets", RFC 6935,
1734	              DOI 10.17487/RFC6935, April 2013,
1735	              <http://www.rfc-editor.org/info/rfc6935>.

1737	   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
1738	              for the Use of IPv6 UDP Datagrams with Zero Checksums",
1739	              RFC 6936, DOI 10.17487/RFC6936, April 2013,
1740	              <http://www.rfc-editor.org/info/rfc6936>.

1742	   [RFC7174]  Salam, S., Senevirathne, T., Aldrin, S., and D. Eastlake
1743	              3rd, "Transparent Interconnection of Lots of Links (TRILL)
1744	              Operations, Administration, and Maintenance (OAM)
1745	              Framework", RFC 7174, DOI 10.17487/RFC7174, May 2014,
1746	              <http://www.rfc-editor.org/info/rfc7174>.

1748	   [RFC7325]  Villamizar, C., Ed., Kompella, K., Amante, S., Malis, A.,
1749	              and C. Pignataro, "MPLS Forwarding Compliance and
1750	              Performance Requirements", RFC 7325, DOI 10.17487/RFC7325,
1751	              August 2014, <http://www.rfc-editor.org/info/rfc7325>.

1753	   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1754	              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1755	              eXtensible Local Area Network (VXLAN): A Framework for
1756	              Overlaying Virtualized Layer 2 Networks over Layer 3
1757	              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
1758	              <http://www.rfc-editor.org/info/rfc7348>.

1760	   [RFC7364]  Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
1761	              Kreeger, L., and M. Napierala, "Problem Statement:
1762	              Overlays for Network Virtualization", RFC 7364,
1763	              DOI 10.17487/RFC7364, October 2014,
1764	              <http://www.rfc-editor.org/info/rfc7364>.

1766	   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
1767	              Rekhter, "Framework for Data Center (DC) Network
1768	              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
1769	              2014, <http://www.rfc-editor.org/info/rfc7365>.

1771	24.2.  Informative References

1773	   [I-D.briscoe-conex-data-centre]
1774	              Briscoe, B. and M. Sridharan, "Network Performance
1775	              Isolation in Data Centres using Congestion Policing",
1776	              draft-briscoe-conex-data-centre-02 (work in progress),
1777	              February 2014.

1779	   [I-D.farrelll-mpls-opportunistic-encrypt]
1780	              Farrel, A. and S. Farrell, "Opportunistic Security in MPLS
1781	              Networks", draft-farrelll-mpls-opportunistic-encrypt-05
1782	              (work in progress), June 2015.

1784	   [I-D.gross-geneve]
1785	              Gross, J., Sridhar, T., Garg, P., Wright, C., Ganga, I.,
1786	              Agarwal, P., Duda, K., Dutt, D., and J. Hudson, "Geneve:
1787	              Generic Network Virtualization Encapsulation", draft-
1788	              gross-geneve-02 (work in progress), October 2014.

1790	   [I-D.herbert-gue]
1791	              Herbert, T., Yong, L., and O. Zia, "Generic UDP
1792	              Encapsulation", draft-herbert-gue-03 (work in progress),
1793	              March 2015.

1795	   [I-D.herbert-remotecsumoffload]
1796	              Herbert, T., "Remote checksum offload for encapsulation",
1797	              draft-herbert-remotecsumoffload-02 (work in progress),
1798	              March 2016.

1800	   [I-D.ietf-mpls-in-udp]
1801	              Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
1802	              "Encapsulating MPLS in UDP", draft-ietf-mpls-in-udp-11
1803	              (work in progress), January 2015.

1805	   [I-D.ietf-nvo3-arch]
1806	              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
1807	              Narten, "An Architecture for Overlay Networks (NVO3)",
1808	              draft-ietf-nvo3-arch-04 (work in progress), October 2015.

1810	   [I-D.ietf-pwe3-congcons]
1811	              Stein, Y., Black, D., and B. Briscoe, "Pseudowire
1812	              Congestion Considerations", draft-ietf-pwe3-congcons-02
1813	              (work in progress), July 2014.

1815	   [I-D.ietf-sfc-architecture]
1816	              Halpern, J. and C. Pignataro, "Service Function Chaining
1817	              (SFC) Architecture", draft-ietf-sfc-architecture-11 (work
1818	              in progress), July 2015.

1820	   [I-D.ietf-sfc-problem-statement]
1821	              Quinn, P. and T. Nadeau, "Service Function Chaining
1822	              Problem Statement", draft-ietf-sfc-problem-statement-13
1823	              (work in progress), February 2015.

1825	   [I-D.ietf-trill-oam-fm]
1826	              Senevirathne, T., Finn, N., Salam, S., Kumar, D.,
1827	              Eastlake, D., Aldrin, S., and L. Yizhou, "TRILL Fault
1828	              Management", draft-ietf-trill-oam-fm-11 (work in
1829	              progress), October 2014.

1831	   [I-D.ietf-tsvwg-circuit-breaker]
1832	              Fairhurst, G., "Network Transport Circuit Breakers",
1833	              draft-ietf-tsvwg-circuit-breaker-13 (work in progress),
1834	              February 2016.

1836	   [I-D.ietf-tsvwg-gre-in-udp-encap]
1837	              Yong, L., Crabbe, E., Xu, X., and T. Herbert, "GRE-in-UDP
1838	              Encapsulation", draft-ietf-tsvwg-gre-in-udp-encap-11 (work
1839	              in progress), March 2016.

1841	   [I-D.ietf-tsvwg-port-use]
1842	              Touch, J., "Recommendations on Using Assigned Transport
1843	              Port Numbers", draft-ietf-tsvwg-port-use-11 (work in
1844	              progress), April 2015.

1846	   [I-D.quinn-sfc-nsh]
1847	              Quinn, P., Guichard, J., Kumar, S., Smith, M.,
1848	              Henderickx, W., Nadeau, T., Agarwal, P., Manur, R.,
1849	              Chauhan, A., Halpern, J., Majee, S., Elzur, U., Melman,
1850	              D., Garg, P., McConnell, B., Wright, C., and K. Glavin,
1851	              "Network Service Header", draft-quinn-sfc-nsh-07 (work in
1852	              progress), February 2015.

1854	   [I-D.saldana-tsvwg-simplemux]
1855	              Saldana, J., "Simplemux. A generic multiplexing protocol",
1856	              draft-saldana-tsvwg-simplemux-04 (work in progress),
1857	              January 2016.

1859	   [I-D.shepherd-bier-problem-statement]
1860	              Shepherd, G., Dolganow, A., and A. Gulko, "Bit Indexed
1861	              Explicit
1862	              Replication (BIER) Problem Statement", draft-shepherd-
1863	              bier-problem-statement-02 (work in progress), February
1864	              2015.

1866	   [I-D.sridharan-virtualization-nvgre]
1867	              Garg, P. and Y. Wang, "NVGRE: Network Virtualization using
1868	              Generic Routing Encapsulation", draft-sridharan-
1869	              virtualization-nvgre-08 (work in progress), April 2015.

1871	   [I-D.wei-tsvwg-tunnel-congestion-feedback]
1872	              Wei, X., Zhu, L., Deng, L., and B. Briscoe, "Tunnel
1873	              Congestion Feedback", draft-wei-tsvwg-tunnel-congestion-
1874	              feedback-04 (work in progress), June 2015.

1876	   [I-D.wijnands-bier-architecture]
1877	              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
1878	              S. Aldrin, "Multicast using Bit Index Explicit
1879	              Replication", draft-wijnands-bier-architecture-05 (work in
1880	              progress), March 2015.

1882	   [I-D.wijnands-mpls-bier-encapsulation]
1883	              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
1884	              S. Aldrin, "Encapsulation for Bit Index Explicit
1885	              Replication in MPLS Networks", draft-wijnands-mpls-bier-
1886	              encapsulation-02 (work in progress), December 2014.

1888	   [I-D.xu-bier-encapsulation]
1889	              Xu, X., Somasundaram, S., Jacquenet, C., and R. Raszuk,
1890	              "BIER Encapsulation", draft-xu-bier-encapsulation-03 (work
1891	              in progress), October 2015.

1893	   [IEEE802.1Q-2014]
1894	              IEEE, "IEEE Standard for Local and metropolitan area
1895	              networks--Bridges and Bridged Networks", IEEE Std 802.1Q-
1896	              2014, 2014,
1897	              <http://www.ieee802.org/1/pages/802.1Q-2014.html>.

1899	              (Access Controlled link within page)

1901	   [RFC0033]  Crocker, S., "New Host-Host Protocol", RFC 33,
1902	              DOI 10.17487/RFC0033, February 1970,
1903	              <http://www.rfc-editor.org/info/rfc33>.

1905	Authors' Addresses

1907	   Erik Nordmark
1908	   Arista Networks
1909	   5453 Great America Parkway
1910	   Santa Clara, CA 95054
1911	   USA

1913	   Email: nordmark@arista.com

1915	   Albert Tian
1916	   Ericsson Inc.
1917	   300 Holger Way
1918	   San Jose, California  95134
1919	   USA

1921	   Email: albert.tian@ericsson.com

1923	   Jesse Gross
1924	   VMware
1925	   3401 Hillview Ave.
1926	   Palo Alto, CA  94304
1927	   USA

1929	   Email: jgross@vmware.com

1931	   Jon Hudson
1932	   Brocade Communications Systems, Inc.
1933	   130 Holger Way
1934	   San Jose, CA  95134
1935	   USA

1937	   Email: jon.hudson@gmail.com

1939	   Lawrence Kreeger
1940	   Cisco Systems, Inc.
1941	   170 W. Tasman Avenue
1942	   San Jose, CA 95134
1943	   USA

1945	   Email: kreeger@cisco.com

1946	   Pankaj Garg
1947	   Microsoft
1948	   1 Microsoft Way
1949	   Redmond, WA  98052
1950	   USA

1952	   Email: pankajg@microsoft.com

1954	   Patricia Thaler
1955	   Broadcom Corporation
1956	   3151 Zanker Road
1957	   San Jose, CA 95134
1958	   USA

1960	   Email: pthaler@broadcom.com

1962	   Tom Herbert
1963	   Facebook
1964	   1 Hacker Way
1965	   Menlo Park, CA 94025
1966	   USA

1968	   Email: tom@herbertland.com