idnits 2.17.1 

draft-rtg-dt-encap-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 292: '...italized keyword MUST is used as defin...'
     RFC 2119 keyword, line 365: '...lancing procedure MUST choose the same...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 21, 2015) is 3256 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'I-D.xu-bier-encapsulation' is defined on line 1809,
     but no explicit reference was found in the text

  == Outdated reference: A later version (-19) exists of
     draft-ietf-tsvwg-rfc5405bis-02

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

  ** Obsolete normative reference: RFC 6830 (Obsoleted by RFC 9300, RFC 9301)

  == Outdated reference: A later version (-05) exists of
     draft-farrelll-mpls-opportunistic-encrypt-04

  == Outdated reference: A later version (-02) exists of
     draft-herbert-remotecsumoffload-01

  == Outdated reference: A later version (-08) exists of
     draft-ietf-nvo3-arch-03

  == Outdated reference: A later version (-11) exists of
     draft-ietf-sfc-architecture-08

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tsvwg-circuit-breaker-01

  == Outdated reference: A later version (-19) exists of
     draft-ietf-tsvwg-gre-in-udp-encap-06

  == Outdated reference: A later version (-12) exists of
     draft-saldana-tsvwg-simplemux-02

  == Outdated reference: A later version (-04) exists of
     draft-wei-tsvwg-tunnel-congestion-feedback-03

  == Outdated reference: A later version (-06) exists of
     draft-xu-bier-encapsulation-02


     Summary: 5 errors (**), 0 flaws (~~), 12 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	RTGWG                                                   E. Nordmark (ed)
3	Internet-Draft                                           Arista Networks
4	Intended status: Informational                                   A. Tian
5	Expires: November 22, 2015                                 Ericsson Inc.
6	                                                                J. Gross
7	                                                                  VMware
8	                                                               J. Hudson
9	                                         Brocade Communications Systems,
10	                                                                    Inc.
11	                                                              L. Kreeger
12	                                                     Cisco Systems, Inc.
13	                                                                 P. Garg
14	                                                               Microsoft
15	                                                               P. Thaler
16	                                                    Broadcom Corporation
17	                                                              T. Herbert
18	                                                                  Google
19	                                                            May 21, 2015

21	                      Encapsulation Considerations
22	                         draft-rtg-dt-encap-02

24	Abstract

26	   The IETF Routing Area director has chartered a design team to look at
27	   common issues for the different data plane encapsulations being
28	   discussed in the NVO3 and SFC working groups and also in the BIER
29	   BoF, and also to look at the relationship between such encapsulations
30	   in the case that they might be used at the same time.  The purpose of
31	   this design team is to discover, discuss and document considerations
32	   across the different encapsulations in the different WGs/BoFs so that
33	   we can reduce the number of wheels that need to be reinvented in the
34	   future.

36	Status of this Memo

38	   This Internet-Draft is submitted in full conformance with the
39	   provisions of BCP 78 and BCP 79.

41	   Internet-Drafts are working documents of the Internet Engineering
42	   Task Force (IETF).  Note that other groups may also distribute
43	   working documents as Internet-Drafts.  The list of current Internet-
44	   Drafts is at http://datatracker.ietf.org/drafts/current/.

46	   Internet-Drafts are draft documents valid for a maximum of six months
47	   and may be updated, replaced, or obsoleted by other documents at any
48	   time.  It is inappropriate to use Internet-Drafts as reference
49	   material or to cite them other than as "work in progress."

51	   This Internet-Draft will expire on November 22, 2015.

53	Copyright Notice

55	   Copyright (c) 2015 IETF Trust and the persons identified as the
56	   document authors.  All rights reserved.

58	   This document is subject to BCP 78 and the IETF Trust's Legal
59	   Provisions Relating to IETF Documents
60	   (http://trustee.ietf.org/license-info) in effect on the date of
61	   publication of this document.  Please review these documents
62	   carefully, as they describe your rights and restrictions with respect
63	   to this document.  Code Components extracted from this document must
64	   include Simplified BSD License text as described in Section 4.e of
65	   the Trust Legal Provisions and are provided without warranty as
66	   described in the Simplified BSD License.

68	Table of Contents

70	   1.  Design Team Charter
71	   2.  Overview
72	   3.  Common Issues
73	   4.  Scope
74	   5.  Assumptions
75	   6.  Terminology
76	   7.  Entropy
77	   8.  Next-protocol indication
78	   9.  MTU and Fragmentation
79	   10. OAM
80	   11. Security Considerations
81	     11.1.  Encapsulation-specific considerations
82	     11.2.  Virtual network isolation
83	     11.3.  Packet level security
84	     11.4.  In summary:
85	   12. QoS
86	   13. Congestion Considerations
87	   14. Header Protection
88	   15. Extensibility Considerations
89	   16. Layering Considerations
90	   17. Service model
91	   18. Hardware Friendly
92	     18.1.  Considerations for NIC offload
93	   19. Middlebox Considerations
94	   20. Related Work
95	   21. Acknowledgements
96	   22. Open Issues
97	   23. Change Log
98	   24. References
99	     24.1.  Normative References
100	     24.2.  Informative References
101	   Authors' Addresses

103	1.  Design Team Charter

105	   There have been multiple efforts over the years that have resulted in
106	   new or modified data plane behaviors involving encapsulations.  That
107	   includes IETF efforts like MPLS, LISP, and TRILL but also industry
108	   efforts like VXLAN and NVGRE.  These collectively can be seen as a
109	   source of insight into the properties that data planes need to meet.
110	   The IETF is currently working on potentially new encapsulations in
111	   NVO3 and SFC and considering working on BIER.  In addition there is
112	   work on tunneling in the INT area.

114	   This is a short term design team chartered to collect and construct
115	   useful advice to parties working on new or modified data plane
116	   behaviors that include additional encapsulations.  The goal is for
117	   the group to document useful advice gathered from interacting with
118	   ongoing efforts.  An Internet Draft will be produced for IETF92 to
119	   capture that advice, which will be discussed in RTGWG.

121	   Data plane encapsulations face a set of common issues such as:
122	   o  How to provide entropy for ECMP
123	   o  Issues around packet size and fragmentation/reassembly
124	   o  OAM - what support is needed in an encapsulation format?
125	   o  Security and privacy.
126	   o  QoS
127	   o  Congestion Considerations
128	   o  IPv6 header protection (zero UDP checksum over IPv6 issue)
129	   o  Extensibility - e.g., for evolving OAM, security, and/or
130	      congestion control
131	   o  Layering of multiple encapsulations e.g., SFC over NVO3 over BIER
132	   The design team will provide advice on those issues.  The intention
133	   is that even where we have different encapsulations for different
134	   purposes carrying different information, each such encapsulation
135	   doesn't have to reinvent the wheel for the above common issues.

137	   The design team will look across the routing area in particular at
138	   SFC, NVO3 and BIER.  It will not be involved in comparing or
139	   analyzing any particular encapsulation formats proposed in those WGs
140	   and BoFs but instead focus on common advice.

142	2.  Overview

144	   The references provide background information on NVO3, SFC, and BIER.
145	   In particular, NVO3 is introduced in [RFC7364], [RFC7365], and
146	   [I-D.ietf-nvo3-arch].  SFC is introduced in
147	   [I-D.ietf-sfc-architecture] and [I-D.ietf-sfc-problem-statement].
148	   Finally, the information on BIER is in
149	   [I-D.shepherd-bier-problem-statement],
151	   [I-D.wijnands-bier-architecture], and
152	   [I-D.wijnands-mpls-bier-encapsulation].  We assume the reader has
153	   some basic familiarity with those proposed encapsulations.  The
154	   Related Work section points at some prior work that relates to the
155	   encapsulation considerations in this document.

157	   Encapsulation protocols typically have some unique information that
158	   they need to carry.  In some cases that information might be modified
159	   along the path and in other cases it is constant.  Any in-flight
160	   modification has an impact on what it means to provide security for
161	   the encapsulation headers.
162	   o  NVO3 carries a VNI edge to edge, which is not modified.
163	      There have been OAM discussions in the WG and it isn't clear
164	      whether some of the OAM information might be modified in flight.
165	   o  SFC carries service meta-data which might be modified or
166	      unmodified as the packets follow the service path.  SFC talks of
167	      some loop avoidance mechanism which is likely to result in
168	      modifications for each hop in the service chain even if the
169	      meta-data is unmodified.
170	   o  BIER carries a bitmap of egress ports to which a packet should be
171	      delivered, and as the packet is forwarded down different paths
172	      different bits are cleared in that bitmap.

174	   Even if information isn't modified in flight there might be devices
175	   that wish to inspect that information.  For instance, one can
176	   envision future NVO3 security devices which filter based on the
177	   virtual network identifier.

179	   The need for extensibility is different across the protocols
180	   o  NVO3 might need some extensions for OAM and security.
181	   o  SFC is all about carrying service meta-data along a path, and
182	      different services might need different types and amount of meta-
183	      data.
184	   o  BIER might need a variable number of bits in its bitmaps, or
185	      other future schemes to scale up to larger networks.
186	   The extensibility needs and constraints might be different when
187	   considering hardware vs. software implementations of the
188	   encapsulation headers.  NIC hardware might have different constraints
189	   than switch hardware.

191	   As the IETF designs these encapsulations the different WGs solve the
192	   issues for their own encapsulation.  But there are likely to be
193	   future cases when the different encapsulations are combined in the
194	   same header.  For instance, NVO3 might be a "transport" used to carry
195	   SFC between the different hops in the service chain.

197	   Most of the issues discussed in this document are not new.  The IETF
198	   and industry have specified and deployed many different encapsulation
199	   or tunneling protocols over time, ranging from simple IP-in-IP and
200	   GRE encapsulation, IPsec, pseudo-wires, session-based approaches like
201	   L2TP, and the use of MPLS control and data planes.  IEEE 802 has also
202	   defined layered encapsulation for Provider Backbone Bridges (PBB) and
203	   IEEE 802.1Qbp (ECMP).  This document tries to leverage what we
204	   collectively have learned from that experience and summarize what
205	   would be relevant for new encapsulations like NVO3, SFC, and BIER.

207	3.  Common Issues

209	   [This section is mostly a repeat of the charter but with a few
210	   modifications and additions.]

212	   Any new encapsulation protocol would need to address a large set of
213	   issues that are not central to the new information that this protocol
214	   intends to carry.  The common issues explored in this document are:
215	   o  How to provide entropy for Equal Cost MultiPath (ECMP) routing
216	   o  Issues around packet size and fragmentation/reassembly
217	   o  Next header indication - each encapsulation might be able to carry
218	      different payloads
219	   o  OAM - what support is needed in an encapsulation format?
220	   o  Security and privacy
221	   o  QoS
222	   o  Congestion Considerations
223	   o  Header protection
224	   o  Extensibility - e.g., for evolving OAM, security, and/or
225	      congestion control
226	   o  Layering of multiple encapsulations e.g., SFC over NVO3 over BIER
227	   o  Importance of being friendly to hardware and software
228	      implementations

230	   The degree to which these common issues apply to a particular
231	   encapsulation can differ based on the intended purpose of the
232	   encapsulation.  But it is useful to understand all of them before
233	   determining which ones apply.

235	4.  Scope

237	   It is important to keep in mind what we are trying to cover and not
238	   cover in this document and effort.  This is
239	   o  A look across the three new encapsulations, while taking lots of
240	      previous work into account
241	   o  Focus on the class of encapsulations that would run over IP/UDP.
242	      That was done to avoid being distracted by the data-plane and
243	      control-plane interaction, which is more significant for protocols
244	      that are designed to run over "transports" that maintain session
245	      or path state.
246	   o  We later expanded the scope somewhat to consider how the
247	      encapsulations would play with MPLS "transport", which is
248	      important because SFC and BIER seem to target being independent of
249	      the underlying "transport"

251	   However, this document and effort is NOT intended to:
252	   o  Design some new encapsulation header to rule them all
253	   o  Design yet another new NVO3 encapsulation header
254	   o  Try to select the best encapsulation header
255	   o  Evaluate any existing and proposed encapsulations

257	   While the origin and focus of this document is the routing area and
258	   in particular NVO3, SFC, and BIER, the considerations apply to other
259	   encapsulations that are being defined in the IETF and elsewhere.
260	   There seems to be an increase in the number of encapsulations being
261	   defined to run over UDP, where there might already exist an
262	   encapsulation over IP or Ethernet.  Feedback on how these
263	   considerations apply in those contexts is welcome.

265	5.  Assumptions

267	   The design center for the new encapsulations is a well-managed
268	   network.  That network can be a datacenter network (plus datacenter
269	   interconnect) or a service provider network.  Based on the existing
270	   and proposed encapsulations in those environments it is reasonable to
271	   make these assumptions:
272	   o  The MTU is carefully managed and configured.  Hence an
273	      encapsulation protocol can make the packets bigger without
274	      resulting in a requirement for fragmentation and reassembly
275	      between ingress and egress.  (However, it might be useful to
276	      detect MTU misconfigurations.)
277	   o  In general an encapsulation needs some approach for congestion
278	      management.  But the assumptions are different than for arbitrary
279	      Internet paths in that the underlay might be well-provisioned and
280	      better policed at the edge, and due to multi-tenancy, the
281	      congestion control in the endpoints might be even less trusted
282	      than on the Internet at large.

284	   The goal is to implement these encapsulations in hardware and
285	   software hence we can't assume that the needs of either
286	   implementation approach can trump the needs of the other.  In
287	   particular, around extensibility the needs and constraints might be
288	   quite different.

290	6.  Terminology

292	   The capitalized keyword MUST is used as defined in
293	   http://en.wikipedia.org/wiki/Julmust

295	   TBD: Refer to existing documents for at least NVO3 and SFC
296	   terminology.  We use at least the VNI ID in this document.

298	7.  Entropy

300	   In many cases the encapsulation format needs to enable ECMP in
301	   unmodified routers.  Those routers might use different fields in TCP/
302	   UDP packets to do ECMP without a risk of reordering a flow.

304	   The common way to do ECMP-enabled encapsulation over IP today is to
305	   add a UDP header and to use UDP with the UDP source port carrying
306	   entropy from the inner/original packet headers as in LISP [RFC6830].
307	   The total entropy consists of 14 bits in the UDP source port (using
308	   the ephemeral port range) plus the outer IP addresses which seems to
309	   be sufficient for entropy; using outer IPv6 headers would give the
310	   option for more entropy should it be needed in the future.

312	   In some environments it might be fine to use all 16 bits of the port
313	   range.  However, middleboxes might make assumptions about the system
314	   ports or user ports.  But they should not make any assumptions about
315	   the ports in the Dynamic and/or Private Port range, which have the
316	   two MSBs set to 11b.
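
   As a minimal sketch of the source-port scheme described above, the
   following Python fragment hashes the inner flow identifiers down to
   14 bits and places them in the Dynamic and/or Private Port range by
   setting the two MSBs.  The function name, the 5-tuple input, and the
   use of CRC-32 as the hash are illustrative assumptions, not something
   this document specifies; an encapsulator is free to derive its
   entropy differently.

```python
import zlib

DYNAMIC_PORT_BASE = 0xC000  # 49152: two most significant bits set to 11b
ENTROPY_MASK = 0x3FFF       # the low 14 bits carry the entropy


def entropy_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map an inner flow (hypothetical 5-tuple) to an outer UDP source port.

    All packets of one inner flow get the same port, so ECMP in the
    underlay does not reorder the flow.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # CRC-32 is an arbitrary stand-in for whatever hash an encapsulator uses.
    return DYNAMIC_PORT_BASE | (zlib.crc32(key) & ENTROPY_MASK)
```

   An environment that is willing to use all 16 bits would drop the base
   and the mask; the sketch keeps to the 14-bit variant because it makes
   no assumptions about middlebox handling of system or user ports.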

318	   The UDP source port might change over the lifetime of an encapsulated
319	   flow, for instance for DoS mitigation or re-balancing load across
320	   ECMP.

322	   There is some interaction between entropy and the OAM and
323	   extensibility mechanisms.  It is desirable for OAM packets to follow
324	   the same path as data packets.  Hence OAM packets should use the
325	   same entropy mechanism as data packets.  While routers might use
326	   information in addition to the entropy field and outer IP header,
327	   they cannot use arbitrary parts of the encapsulation header since
328	   that might result in OAM frames taking a different path.  Likewise if
329	   routers look past the encapsulation header they need to be aware of
330	   the extensibility mechanism(s) in the encapsulation format to be able
331	   to find the inner headers in the presence of extensions; OAM frames
332	   might use some extensions e.g. for timestamps.

334	   Architecturally the entropy and the next header field are really part
335	   of the enclosing delivery header.  UDP with entropy goes hand-in-hand
336	   with the outer IP header.  Thus the UDP entropy is present for the
337	   underlay IP routers the same way that an MPLS entropy label is
338	   present for LSRs.  The entropy above is all about providing entropy
339	   for the outer delivery of the encapsulated packets.

341	   It has been suggested that when IPv6 is used it would not be
342	   necessary to add a UDP header for entropy, since the IPv6 flow label
343	   can be used for entropy.  (This assumes that there is an IP protocol
344	   number for the encapsulation in addition to a UDP destination port
345	   number since UDP would be used with IPv4 underlay.  And any use of
346	   UDP checksums would need to be replaced by an encaps-specific
347	   checksum or secure hash.)  While such an approach would save 8 bytes
348	   of headers when the underlay is IPv6, it does assume that the
349	   underlay routers use the flow label for ECMP, and it also would make
350	   the IPv6 approach different than the IPv4 approach.  Currently the
351	   leaning is towards recommending using the UDP encapsulation for both
352	   IPv4 and IPv6 underlay.  The IPv6 flow label can be used for
353	   additional entropy if need be.

355	   Note that in the proposed BIER encapsulation
356	   [I-D.wijnands-mpls-bier-encapsulation], there is an 8-bit field
357	   which specifies an entropy value that can be used for load balancing
358	   purposes.  This entropy is for the BIER forwarding decisions, which
359	   is independent of any outer delivery ECMP between BIER routers.  Thus
360	   it is not part of the delivery ECMP discussed in this section.
361	      [Note: For any given bit in BIER (that identifies an exit from the
362	      BIER domain) there might be multiple immediate next hops.  The
363	      BIER entropy field is used to select that next hop as part of BIER
364	      processing.  The BIER forwarding process may do equal cost load
365	      balancing, but the load balancing procedure MUST choose the same
366	      path for any two packets that have the same entropy value.]
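
   The constraint in the note above can be illustrated with a small
   Python sketch.  It assumes that the candidate next hops for a given
   BIER bit are held in a list whose order is stable, and it uses a
   simple modulo as the selection rule; both are assumptions made for
   illustration and are not taken from the BIER drafts.

```python
def select_next_hop(next_hops, entropy):
    """Pick one of several equal-cost next hops for a BIER bit.

    The choice depends only on `entropy` (and the stable list order),
    so two packets with the same 8-bit entropy value always take the
    same next hop.
    """
    if not next_hops:
        raise ValueError("no next hop available")
    if not 0 <= entropy <= 0xFF:
        raise ValueError("entropy is an 8-bit field")
    return next_hops[entropy % len(next_hops)]
```

   Any deterministic function of the entropy value would satisfy the
   same requirement; modulo is used here only because it is short.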

368	   In summary:
369	   o  The entropy is associated with the transport, that is an outer IP
370	      header or MPLS.
371	   o  In the case of IP transport use >=14 bits of UDP source port, plus
372	      outer IPv6 flowid for entropy.

374	8.  Next-protocol indication

376	   Next-protocol indications appear in three different contexts for
377	   encapsulations.

379	   Firstly, the transport delivery mechanism for the encapsulations we
380	   discuss in this document needs some way to indicate which
381	   encapsulation header (or other payload) comes next in the packet.
382	   Some encapsulations might be identified by a UDP port; others might
383	   be identified by an Ethernet type or IP protocol number.  Which
384	   approach is used is a function of the preceding header the same way
385	   as IPv4 is identified by both an Ethernet type and an IP protocol
386	   number (for IP-in-IP).  In some cases the header type is implicit in
387	   some session (L2TP) or path (MPLS) setup.  But this is largely beyond
388	   the control of the encapsulation protocol.  For instance, if there is
389	   a requirement to carry the encapsulation after an Ethernet header,
390	   then an Ethernet type is needed.  If required to be carried after an
391	   IP/UDP header, then a UDP port number is needed.  For UDP port
392	   numbers there are considerations for port number conservation
393	   described in [I-D.ietf-tsvwg-port-use].

395	   It is worth mentioning that in the MPLS case of no implicit protocol
396	   type many forwarding devices peek at the first nibble of the payload
397	   to determine whether to apply IPv4 or IPv6 L3/L4 hashes for load
398	   balancing [RFC7325].  That behavior places some constraints on other
399	   payloads carried over MPLS and some protocols define an initial
400	   control word in the payload with a value of zero in its first nibble
401	   [RFC4385] to avoid confusion with IPv4 and IPv6 payload headers.
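
   The first-nibble heuristic described above can be sketched in Python
   as follows.  The function name and the returned labels are
   hypothetical; the sketch only shows why a payload that starts with a
   zero nibble is not mistaken for IPv4 or IPv6.

```python
def classify_mpls_payload(payload: bytes) -> str:
    """Guess the MPLS payload type from its first nibble.

    Mirrors the heuristic some forwarding devices apply when no
    protocol type is signalled: 4 is treated as IPv4, 6 as IPv6, and 0
    as a control word in the style of RFC 4385.
    """
    if not payload:
        return "empty"
    first_nibble = payload[0] >> 4
    if first_nibble == 4:
        return "ipv4"
    if first_nibble == 6:
        return "ipv6"
    if first_nibble == 0:
        return "control-word"
    return "unknown"
```

   A new payload type whose first nibble happens to be 4 or 6 would be
   hashed as if it were IP by such devices, which is the constraint the
   paragraph above refers to.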

403	   Secondly, the encapsulation needs to indicate the type of its
404	   payload, which is in scope for the design of the encapsulation.  We
405	   have existing protocols which use Ethernet types (such as GRE).  Here
406	   each encapsulation header can potentially make its own choice
407	   between:
408	   o  Reuse Ethernet types - makes it easy to carry existing L2 and L3
409	      protocols including IPv4, IPv6, and Ethernet.  Disadvantages are
410	      that it is a 16 bit number and we probably need far less than 100
411	      values, and the number space is controlled by the IEEE 802 RAC
412	      with its own allocation policies.
413	   o  Reuse IP protocol numbers - makes it easy to carry e.g., ESP in
414	      addition to IP and Ethernet but brings in all existing protocol
415	      numbers many of which would never be used directly on top of the
416	      encapsulation protocol.  IANA managed eight bit values, presumably
417	      more difficult to get an assigned number than to get a transport
418	      port assignment.
419	   o  Define their own next-protocol number space, which can use fewer
420	      bits than an Ethernet type and give more flexibility, but at the
421	      cost of administering that numbering space (presumably by the
422	      IANA).

424	   Thirdly, if the IETF ends up defining multiple encapsulations at
425	   about the same time, and there is some chance that multiple such
426	   encapsulations can be combined in the same packet, there is a
427	   question whether it makes sense to use a common approach and
428	   numbering space for the encapsulation across the different protocols.
429	   A common approach might not be beneficial as long as there is only
430	   one way to indicate e.g., SFC inside NVO3.

432	   Many Internet protocols use fixed values (typically managed by the
433	   IANA function) for their next-protocol field.  That facilitates
434	   interpretation of packets by middleboxes and e.g., for debugging
435	   purposes, but might make the protocol evolution inflexible.  Our
436	   collective experience with MPLS shows an alternative where the label
437	   can be viewed as an index to a table containing processing
438	   instructions and the table content can be managed in different ways.
439	   Encapsulations might want to consider the tradeoffs between such more
440	   flexible versus more fixed approaches.

442	   In summary:
443	   o  Would it be useful for the IETF to come up with a common scheme for
444	      encapsulation protocols?  If not each encapsulation can define its
445	      own scheme.

447	9.  MTU and Fragmentation

449	   A common approach today is to assume that the underlay has
450	   sufficient MTU to carry the encapsulated packets without any
451	   fragmentation and reassembly at the tunnel endpoints.  That is
452	   sufficient when the operator of the ingress and egress have full
453	   control of the paths between those endpoints.  And it makes for
454	   simpler (hardware) implementations if fragmentation and reassembly
455	   can be avoided.

457	   However, even under that assumption it would be beneficial to be able
458	   to detect when there is some misconfiguration causing packets to be
459	   dropped due to MTU issues.  One way to do this is to have the
460	   encapsulator set the don't-fragment (DF) flag in the outer IPv4
461	   header and receive and log any received ICMP "packet too big" (PTB)
462	   errors.  Note that no flag needs to be set in an outer IPv6 header
463	   [RFC2460].

465	   Encapsulations could also define an optional tunnel fragmentation and
466	   reassembly mechanism which would be useful in the case when the
467	   operator doesn't have full control of the path, or when the protocol
468	   gets deployed outside of its original intended context.  Such a
469	   mechanism would be required if the underlay might have a path MTU
470	   which makes it impossible to carry at least 1518 bytes (if offering
471	   Ethernet service), or at least 1280 (if offering IPv6 service).  The
472	   use of such a protocol mechanism could be triggered by receiving a
473	   PTB.  But such a mechanism might not be implemented by all
474	   encapsulators and decapsulators.  [Aerolink is one example of such a
475	   protocol.]
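
   As a rough illustration of the sizing rule above, the following
   Python sketch checks whether an underlay path MTU can carry the
   minimum payload for the offered service once the outer headers are
   added.  The 1518 and 1280 byte figures come from the text; the outer
   header sizes are the usual IPv4 (without options), IPv6, and UDP
   header lengths; the function name and the encap_header_len parameter
   are hypothetical and stand for whatever encapsulation is in use.

```python
IPV4_HEADER = 20  # bytes, outer IPv4 header without options
IPV6_HEADER = 40  # bytes, outer IPv6 fixed header
UDP_HEADER = 8    # bytes

# Minimum payload size that must be carried, per offered service.
MIN_SERVICE_SIZE = {"ethernet": 1518, "ipv6": 1280}


def needs_tunnel_fragmentation(path_mtu, encap_header_len, service,
                               outer_ipv6=False):
    """Return True if the path MTU cannot carry the minimum payload.

    `encap_header_len` is the size of the (hypothetical) encapsulation
    header that sits between the outer UDP header and the payload.
    """
    outer = (IPV6_HEADER if outer_ipv6 else IPV4_HEADER) + UDP_HEADER
    required = MIN_SERVICE_SIZE[service] + outer + encap_header_len
    return path_mtu < required
```

   For example, with an 8 byte encapsulation header over IPv4/UDP, an
   Ethernet service needs 1518 + 28 + 8 = 1554 bytes, so a 1500 byte
   underlay path MTU would call for tunnel fragmentation while a 1600
   byte one would not.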

477	   Depending on the payload carried by the encapsulation there are some
478	   additional possibilities:

480	   o  If payload is IPv4/6 then the underlay path MTU could be used to
481	      report end-to-end path MTU.
482	   o  If the payload service is Ethernet/L2, then there is no such per
483	      destination reporting mechanism.  However, there is a LLDP TLV for
484	      reporting max frame size; might be useful to report minimum to end
485	      stations, but unmodified end stations would do nothing with that
486	      TLV since they assume that the MTU is at least 1518.

488	   In summary:
489	   o  In some deployments an encapsulation can assume well-managed MTU
490	      hence no need for fragmentation and reassembly related to the
491	      encapsulation.
492	   o  Even so, it makes sense for ingress to track any ICMP packet too
493	      big addressed to ingress to be able to log any MTU
494	      misconfigurations.
495	   o  Should an encapsulation protocol be deployed outside of the
496	      original context it might very well need support for fragmentation
497	      and reassembly.

499	10.  OAM

501	   The OAM area is seeing active development in the IETF with
502	   discussions (at least) in NVO3 and SFC working groups, plus the new
503	   LIME WG looking at architecture and YANG models.

505	   The design team has taken a narrow view of OAM to explore the
506	   potential OAM implications on the encapsulation format.

508	   In terms of what we have heard from the various working groups there
509	   seem to be needs to:
510	   o  Be able to send out-of-band OAM messages - that potentially should
511	      follow the same path through the network as some flow of data
512	      packets.
513	      *  Such OAM messages should not accidentally be decapsulated and
514	         forwarded to the end stations.
515	   o  Be able to add OAM information to data packets that are
516	      encapsulated.  Discussions have been around:
517	      *  Using a bit in the OAM to synchronize sampling of counters
518	         between the encapsulator and decapsulator.
519	      *  Optional timestamps, sequence numbers, etc. for more detailed
520	         measurements between encapsulator and decapsulator.
521	   o  Usable for both proactive monitoring (akin to BFD) and reactive
522	      checks (akin to traceroute to pin-point a failure)

524	   To ensure that the OAM messages can follow the same path the OAM
525	   messages need to get the same ECMP (and LAG hashing) results as a
526	   given data flow.  An encapsulator can choose between one of:

528	   o  Limit ECMP hashing to not look past the UDP header i.e. the
529	      entropy needs to be in the source/destination IP and UDP ports
530	   o  Make OAM packets look the same as data packets i.e. the initial
531	      part of the OAM payload has the inner Ethernet, IP, TCP/UDP
532	      headers as a payload.  (This approach was taken in TRILL out of
533	      necessity since there is no UDP header.)  Any OAM bit in the
534	      encapsulation header must in any case be excluded from the
535	      entropy.
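
   The first option can be illustrated with a minimal Python sketch in
   which path selection looks only at the outer IP addresses and UDP
   ports.  The helper name, the use of CRC32 as the hash, and the
   example addresses and ports are assumptions made for illustration;
   real forwarding devices use their own hash functions.

```python
import ipaddress
import zlib


def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, num_paths, proto=17):
    """Pick a path index from the outer 5-tuple only (hypothetical helper)."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + bytes([proto])
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    # CRC32 stands in for whatever hash a real forwarding device uses.
    return zlib.crc32(key) % num_paths


# A data packet and an OAM packet that share the same outer headers.
data_path = ecmp_next_hop("192.0.2.1", "198.51.100.7", 49152, 4789, 8)
oam_path = ecmp_next_hop("192.0.2.1", "198.51.100.7", 49152, 4789, 8)
```

   Because nothing past the UDP header enters the hash, an OAM packet
   that reuses the outer 5-tuple of a data flow is assigned the same
   path index as that flow.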

537	   There can be several ways to prevent OAM packets from accidentally
538	   being forwarded to the end station using:
539	   o  A bit in the frame (as in TRILL) indicating OAM
540	   o  A next-protocol indication with a designated value for "none" or
541	      "oam".
542	   This assumes that the bit or next protocol, respectively, would not
543	   affect entropy/ECMP in the underlay.  However, the next-protocol
544	   field might be used to provide differentiated treatment of packets
545	   based on their payload; for instance a TCP vs. IPsec ESP payload
546	   might be handled differently.  Based on that observation it might be
547	   undesirable to overload the next protocol with the OAM drop behavior,
548	   resulting in a preference for a bit to indicate that the packet
549	   should not be forwarded to the end station after decapsulation.

551	   There have been suggestions that one (or more) marker bits in the
552	   encaps header would be useful in order to delineate measurement
553	   epochs on the encapsulator and decapsulator and use that to compare
554	   counters to determine packet loss.
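
   A minimal Python sketch of the marker-bit idea follows.  It assumes a
   single marker bit whose value identifies the current epoch; the class
   and function names are hypothetical, and a real design would also
   need to handle reordering around the epoch boundary.

```python
from collections import Counter


class EpochCounter:
    """Counts packets per marker-bit value (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, marker_bit):
        self.counts[marker_bit] += 1


def epoch_loss(sent, received, marker_bit):
    """Packets lost in the epoch identified by marker_bit."""
    return sent.counts[marker_bit] - received.counts[marker_bit]


encap, decap = EpochCounter(), EpochCounter()
# Epoch with marker 0: five packets sent, one dropped in the underlay.
for delivered in (True, True, False, True, True):
    encap.record(0)
    if delivered:
        decap.record(0)
# The encapsulator flips the marker to 1, which closes epoch 0.
encap.record(1)
decap.record(1)
lost_in_epoch_0 = epoch_loss(encap, decap, 0)
```

   Once the marker has flipped, both ends can compare their counters
   for the closed epoch without stopping traffic.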

556	   A result of the above is that OAM is likely to evolve and needs some
557	   degree of extensibility from the encapsulation format; a bit or two
558	   plus the ability to define additional larger extensions.

560	   An open question is how to handle error messages or other reports
561	   relating to OAM.  One can think of such reporting as being associated
562	   with the encapsulation the same way ICMP is associated with IP.
563	   Would it make sense for the IETF to develop a common Encapsulation
564	   Error Reporting Protocol as part of OAM, which can be used for
565	   different encapsulations?  And if so, what are the technical
566	   challenges?  For instance, how to avoid it being filtered as ICMP
567	   often is?

569	   A potential additional consideration for OAM is the possible future
570	   existence of gateways that "stitch" together different dataplane
571	   encapsulations and might want to carry OAM end-to-end across the
572	   different encapsulations.

574	   In summary:

576	   o  It makes sense to reserve a bit for "drop after decapsulation" for
577	      OAM out-of-band.
578	   o  An encapsulation needs sufficient extensibility for OAM (such as
579	      bits, timestamps, sequence numbers).  That might be motivated by
580	      in-band OAM but it would make sense to leverage the same
581	      extensions for out-of-band OAM.
582	   o  OAM places some constraints on use of entropy in forwarding
583	      devices.
584	   o  Should IETF look into error reporting that is independent of the
585	      specific encapsulation?

587	11.  Security Considerations

589	   Different encapsulation use cases will have different requirements
590	   around security.  For instance, when encapsulation is used to build
591	   overlay networks for network virtualization, isolation between
592	   virtual networks may be paramount.  BIER support of multicast may
593	   entail different security requirements than encapsulation for
594	   unicast.

596	   In real deployment, the security of the underlying network may be
597	   considered for determining the level of security needed in the
598	   encapsulation layer.  However for the purposes of this discussion, we
599	   assume that network security is out of scope and that the underlying
600	   network does not itself provide adequate or at least uniform security
601	   mechanisms for encapsulation.

603	   There are at least three considerations for security:
604	   o  Anti-spoofing/virtual network isolation
605	   o  Interaction with packet level security such as IPsec or DTLS
606	   o  Privacy (e.g., VNI ID confidentiality for NVO3)

608	   This section uses a VNI ID in NVO3 as an example.  An SFC or BIER
609	   encapsulation is likely to have fields with similar security and
610	   privacy requirements.

612	11.1.  Encapsulation-specific considerations

614	   Some of these considerations appear for a new encapsulation, and
615	   others are more specific to network virtualization in datacenters.
616	   o  New attack vectors:
617	      *  DDOS on specific queues/paths by attempting to reproduce the
618	         5-tuple hash for targeted connections.
619	      *  Entropy in outer 5-tuple may be too little or predictable.
620	      *  Leakage of identifying information in the encapsulation header
621	         for an encrypted payload.

623	      *  Vulnerabilities of using global values in fields like VNI ID.
624	   o  Trusted versus untrusted tenants in network virtualization:
625	      *  The criticality of virtual network isolation depends on whether
626	         tenants are trusted or untrusted.  In the most extreme cases,
627	         tenants might not only be untrusted but may be considered
628	         hostile.
629	      *  For a trusted set of users (e.g. a private cloud) it may be
630	         sufficient to have just a virtual network identifier to provide
631	         isolation.  Packets inadvertently crossing virtual networks
632	         should be dropped similar to a TCP packet with a corrupted port
633	         being received on the wrong connection.
634	      *  In the presence of untrusted users (e.g. a public cloud) the
635	         virtual network identifier must be adequately protected against
636	         corruption and verified for integrity.  This case may warrant
637	         keyed integrity.
638	   o  Different forms of isolation:
639	      *  Isolation could be blocking all traffic between tenants (or
640	         except as allowed by some firewall)
641	      *  Could also be about performance isolation, i.e., one tenant
642	         cannot overload the network in a way that affects other tenants
643	      *  Physical isolation of traffic for different tenants in network
644	         may be required, as well as required restrictions that tenants
645	         may have on where their packets may be routed.
646	   o  New attack vectors from untrusted tenants:
647	      *  Third party VMs with untrusted tenants allow internally borne
648	         attacks within data centers
649	      *  Hostile VMs inside the system may exist (e.g. public cloud)
650	      *  Internally launched DDOS
651	      *  Passive snooping for mis-delivered packets
652	      *  Mitigate damage and detection in event that a VM is able to
653	         circumvent isolation mechanisms
654	   o  Tenant-provider relationship:
655	      *  Tenant might not trust provider, hypervisors, network
656	      *  Provider likely will need to provide an SLA or at least a
657	         statement on security
658	      *  Tenant may implement their own additional layers of security
659	      *  Regulation and certification considerations
660	   o  Trend towards tighter security:
661	      *  Tenants' data in network increases in volume and value, attacks
662	         become more sophisticated
663	      *  Large DCs already encrypt everything on disk
664	      *  DCs likely to encrypt inter-DC traffic at this point, use TLS
665	         to Internet.
666	      *  Encryption within DC is becoming more commonplace, becomes
667	         ubiquitous when cost is low enough.
668	      *  Cost/performance considerations.  Cost of support for strong
669	         security has made strong network security in DCs prohibitive.

671	      *  Are there lessons from MacSec?

673	11.2.  Virtual network isolation

675	   The first requirement is isolation between virtual networks.  Packets
676	   sent in one virtual network should never be illegitimately received
677	   by a node in another virtual network.  Isolation should be protected
678	   in the presence of malicious attacks or inadvertent packet
679	   corruption.

681	   The second requirement is sender authentication.  Sender identity is
682	   authenticated to prevent spoofing.  Even if an attacker has
683	   access to the packets in the network, they cannot send packets into a
684	   virtual network.  This may have two possibilities:
685	   o  Pairwise sender authentication.  Any two communicating hosts
686	      negotiate a shared key.
687	   o  Group authentication.  A group of hosts share a key (this may be
688	      more appropriate for multicast of encapsulation).

690	   Possible security solutions:
691	   o  Security cookie: This is similar to the L2TP cookie mechanism
692	      [RFC3931].  A plain text cookie is shared between the
693	      encapsulator and decapsulator.  A receiver validates a packet by
694	      evaluating if the cookie is correct for the virtual network and
695	      address of a sender.  Validation function is F(cookie, VNI ID,
696	      source addr).  If cookie matches, accept packet, else drop.  Since
697	      the cookie is plain text this method does not protect against
698	      eavesdropping.  Cookies are set and may be rotated out of band.
699	   o  Secure hash: This is a stronger mechanism than simple cookies that
700	      borrows from IPsec and PPP authentication methods.  In this model
701	      security field contains a secure hash of some fields in the packet
702	      using a shared key.  Hash function may be something like H(key,
703	      VNI ID, addrs, salt).  The salt ensures the hash is not the same
704	      for every packet, and if it includes a sequence number may also
705	      protect against replay attacks.
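
   The secure hash option can be sketched in Python as follows.  The
   choice of HMAC-SHA-256, the 24-bit VNI encoding, the field order, and
   the use of a sequence number as the salt are all assumptions made for
   illustration; the text above only calls for a keyed hash over some
   header fields.

```python
import hashlib
import hmac


def secure_hash(key, vni, src_addr, dst_addr, salt):
    """Keyed hash over selected header fields (hypothetical layout)."""
    msg = vni.to_bytes(3, "big") + src_addr + dst_addr + salt
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key, vni, src_addr, dst_addr, salt, received_tag):
    """Recompute the hash at the decapsulator and compare."""
    expected = secure_hash(key, vni, src_addr, dst_addr, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, received_tag)


key = b"example-shared-key"
src, dst = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])
salt = (1).to_bytes(8, "big")  # here the salt is a sequence number
tag = secure_hash(key, 5000, src, dst, salt)

ok = verify(key, 5000, src, dst, salt, tag)
wrong_vni = verify(key, 5001, src, dst, salt, tag)
```

   A packet whose VNI ID has been altered or corrupted fails
   verification, so it can be dropped instead of being delivered into
   the wrong virtual network.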

707	   In any use of a shared key, periodic re-keying should be allowed.
708	   This could include use of techniques like generation numbers, key
709	   windows, etc.  See [I-D.farrelll-mpls-opportunistic-encrypt] for an
710	   example application.

712	   We might see firewalls that are aware of the encapsulation and can
713	   provide some defense in depth combined with the above example anti-
714	   spoofing approaches.  An example would be an NVO3-aware firewall
715	   being able to check the VNI ID.

717	   Separately and in addition to such filtering, there might be a desire
718	   to completely block an encapsulation protocol at certain places in
719	   the network, e.g., at the edge of a datacenter.  Using a fixed
720	   standard UDP destination port number for each encapsulation protocol
721	   would facilitate such blocking.

723	11.3.  Packet level security

725	   An encapsulated packet may itself be encapsulated in IPsec (e.g.
726	   ESP).  This should be straightforward and in fact is what would
727	   happen today in security gateways.  In this case, there is no special
728	   consideration for the fact that packet is encapsulated, however since
729	   the encapsulation layer headers are included (part of encrypted data
730	   for instance) we lose visibility in the network of the encapsulation.

732	   The more interesting case is when security is applied to the
733	   encapsulation payload.  This will keep the encapsulation headers in
734	   the outer header visible to the network (for instance in nvo3 we may
735	   want to firewall based on VNI ID even if the payload is encrypted).
736	   One possibility is to apply DTLS to the encapsulation payload.  In
737	   this model the protocol stack may be something like IP|UDP|Encap|
738	   DTLS|encrypted_payload.  The encapsulation and security should be
739	   done together at an encapsulator and resolved at the decapsulator.
740	   Since the encapsulation header is outside of the security coverage,
741	   this may itself require security (like described above).

743	   In both of the above the security associations (SAs) may be between
744	   physical hosts, so for instance in nvo3 we can have packets of
745	   different virtual networks using the same SA-- this should not be an
746	   issue since it is the VNI ID that ensures isolation (which needs to
747	   be secured also).

749	11.4.  In summary:

751	   o  Encapsulations need extensibility mechanisms to be able to add
752	      security features like cookies and secure hashes protecting the
753	      encapsulation header.
754	   o  NVO3 probably has specific higher requirements relating to
755	      isolation for network virtualization, which is in scope for the
756	      NVO3 WG.
757	   o  Our collective IETF experience is that successful protocols get
758	      deployed outside of the original intended context, hence the
759	      initial assumptions about the threat model might become invalid.
760	      That needs to be considered in the standardization of new
761	      encapsulations.

763	12.  QoS

765	   In the Internet architecture we support QoS using the Differentiated
766	   Services Code Points (DSCP) in the formerly named Type-of-Service
767	   field in the IPv4 header, and in the Traffic-Class field in the IPv6
768	   header.  The ToS and TC fields also contain the two ECN bits.

770	   We have existing specifications how to process those bits.  See
771	   [RFC2983] for diffserv handling, which specifies how the received
772	   DSCP value is used to set the DSCP value in an outer IP header when
773	   encapsulating.  (There are also existing specifications how DSCP can
774	   be mapped to layer2 priorities.)

776	   Those specifications apply whether or not there are some intervening
777	   headers (e.g., for NVO3 or SFC) between the inner and outer IP
778	   headers.  Thus the encapsulation considerations in this area are
779	   mainly about applying the framework in [RFC2983].

781	   Note that the DSCP and ECN bits are not the only part of an inner
782	   packet that might potentially affect the outer packet.  For example,
783	   [RFC2473] specifies handling of inner IPv6 hop-by-hop options that
784	   effectively result in copying some options to the outer header.  It
785	   is simpler to not have future encapsulations depend on such copying
786	   behavior.

788	   There are some other considerations specific to doing OAM for
789	   encapsulations.  If OAM messages are used to measure latency, it
790	   would make sense to treat them the same as data payloads.  Thus they
791	   need to have the same outer DSCP value as the data packets which they
792	   wish to measure.

794	   Due to OAM there are constraints on middleboxes in general.  If
795	   middleboxes inspect the packet past the outer IP+UDP and
796	   encapsulation header and look for inner IP and TCP/UDP headers, that
797	   might violate the assumption that OAM packets will be handled the
798	   same as regular data packets.  That issue is broader than just QoS -
799	   applies to firewall filters etc.

801	   In summary:
802	   o  Leverage the existing approach in [RFC2983] for DSCP handling.

804	13.  Congestion Considerations

806	   Additional encapsulation headers do not introduce anything new for
807	   Explicit Congestion Notification.  It is just like IP-in-IP and IPsec
808	   tunnels, for which [RFC6040] specifies how the ECN bits
809	   in the inner and outer header are handled when encapsulating and
810	   decapsulating packets.  Thus new encapsulations can more or less
811	   include that by reference.
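
   As a rough guide to what including [RFC6040] by reference entails,
   the Python sketch below follows the normal-mode encapsulation rule
   and the decapsulation table of that RFC.  It omits the logging and
   alarm recommendations the RFC attaches to unusual combinations, and
   the function names are hypothetical; the RFC remains the
   authoritative description.

```python
# ECN codepoints as carried in the two low-order bits of ToS/TC.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_ecn(inner):
    """Normal-mode encapsulation: copy the inner ECN field to the outer."""
    return inner


def decap_ecn(inner, outer):
    """Return the outgoing ECN field, or None if the packet is dropped."""
    if inner == NOT_ECT:
        # A Not-ECT inner packet cannot carry a mark; drop it on outer CE.
        return None if outer == CE else NOT_ECT
    if outer == CE:
        # Congestion experienced in the underlay is propagated inwards.
        return CE
    if outer == ECT1 and inner == ECT0:
        return ECT1
    return inner
```

   For example, decap_ecn(ECT0, CE) yields CE, so a congestion mark set
   in the underlay reaches the inner transport.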

813	   There are additional considerations around carrying non-congestion
814	   controlled traffic.  These details have been worked out in
815	   [I-D.ietf-mpls-in-udp].  As specified in [RFC5405]: "IP-based traffic
816	   is generally assumed to be congestion-controlled, i.e., it is assumed
817	   that the transport protocols generating IP-based traffic at the
818	   sender already employ mechanisms that are sufficient to address
819	   congestion on the path.  Consequently, a tunnel carrying IP-based
820	   traffic should already interact appropriately with other traffic
821	   sharing the path, and specific congestion control mechanisms for the
822	   tunnel are not necessary".  Those considerations are being captured
823	   in [I-D.ietf-tsvwg-rfc5405bis].

825	   For this reason, where an encapsulation method is used to carry IP
826	   traffic that is known to be congestion controlled, the UDP tunnel
827	   does not create an additional need for congestion control.  Internet
828	   IP traffic is generally assumed to be congestion-controlled.
829	   Similarly, in general Layer 3 VPNs are carrying IP traffic that is
830	   similarly assumed to be congestion controlled.

832	   However, some of the encapsulations (at least NVO3) will be able to
833	   carry arbitrary Layer 2 packets to provide an L2 service, in which
834	   case one cannot assume that the traffic is congestion controlled.

836	   One could handle this by adding some congestion control support to
837	   the encapsulation header (one instance of which would end up looking
838	   like DCCP).  However, if the underlay is well-provisioned and managed
839	   as opposed to being an arbitrary Internet path, it might be sufficient
840	   to have a slower reaction to congestion induced by that traffic.
841	   There is work underway on a notion of "circuit breakers" for this
842	   purpose.  See [I-D.ietf-tsvwg-circuit-breaker].  Encapsulations
843	   which carry arbitrary Layer 2 packets should consider that ongoing
844	   work.

846	   If the underlay is provisioned in such a way that it can guarantee
847	   sufficient capacity for non-congestion controlled Layer 2 traffic,
848	   then such circuit breakers might not be needed.

850	   Two other considerations appear in the context of these
851	   encapsulations as applied to overlay networks:
852	   o  Protect against malicious end stations
853	   o  Ensure fairness and/or measure resource usage across multiple
854	      tenants
855	   Those issues are really orthogonal to the encapsulation, in that they
856	   are present even when no new encapsulation header is in use.
857	   However, the application of the new encapsulations is likely to be
858	   in environments where those issues are becoming more important.
859	   Hence it makes sense to consider them.

861	   One could make the encapsulation header be extensible so that it can
862	   carry sufficient information to be able to measure resource usage,
863	   delays, and congestion.  The suggestions in the OAM section about a
864	   single bit for counter synchronization, and optional timestamps
865	   and/or sequence numbers, could be part of such an approach.  There
866	   might also be additional congestion-control extensions to be carried
867	   in the encapsulation.  Overall this results in a consideration to
868	   have sufficient extensibility in the encapsulation to be able to
869	   handle potential future developments in this space.

871	   Coarse measurements are likely to suffice, at least for circuit-
872	   breaker-like purposes, see [I-D.wei-tsvwg-tunnel-congestion-feedback]
873	   and [I-D.briscoe-conex-data-centre] for examples on active work in
874	   this area via use of ECN.  [RFC6040] Appendix C is also relevant.
875	   The outer ECN bits seem sufficient (at least when everything uses
876	   ECN) to do these coarse measurements.  More study is needed for the
877	   case when there are also drops; it might be necessary to exchange
878	   counters between ingress and egress to handle drops.

880	   Circuit breakers are not sufficient to make a network with different
881	   congestion control when the goal is to provide a predictable service
882	   to different tenants.  The fallback would be to rate limit different
883	   traffic.

885	   In summary:
886	   o  Leverage the existing approach in [RFC6040] for ECN handling.
887	   o  If the encapsulation can carry non-IP, hence non-congestion
888	      controlled traffic, then leverage the approach in
889	      [I-D.ietf-mpls-in-udp].
890	   o  "Watch this space" for circuit breakers.

892	14.  Header Protection

894	   Many UDP based encapsulations such as VXLAN [RFC7348] either
895	   discourage or explicitly disallow the use of UDP checksums.  The
896	   reason is that the UDP checksum covers the entire payload of the
897	   packet and switching ASICs are typically optimized to look at only a
898	   small set of headers as the packet passes through the switch.  In
899	   these cases, computing a checksum over the packet is very expensive.
900	   (Software endpoints and the NICs used with them generally do not have
901	   the same issue as they need to look at the entire packet anyways.)

903	   The lack of a header checksum creates the possibility that bit errors
904	   can be introduced into any information carried by the new headers.
905	   Specifically, in the case of IPv6, the assumption is that a transport
906	   layer checksum - UDP in this case - will protect the IP addresses
907	   through the inclusion of a pseudoheader in the calculation.  This is
908	   different from IPv4 on which many of these encapsulation protocols
909	   are initially deployed which contains its own header checksum.  In
910	   addition to IP addresses, the encapsulation header often contains its
911	   own information which is used for addressing packets or other high
912	   value network functions.  Without a checksum, this information is
913	   potentially vulnerable - an issue regardless of whether the packet is
914	   carried over IPv4 or IPv6.

916	   Several protocols cite [RFC6935] and [RFC6936] as an exemption to the
917	   IPv6 checksum requirements.  However, these are intended to be
918	   tailored to a fairly narrow set of circumstances - primarily relying
919	   on sparseness of the address space to detect invalid values and well
920	   managed networks - and are not a one size fits all solution.  In
921	   these cases, an analysis should be performed of the intended
922	   environment, including the probability of errors being introduced and
923	   the use of ECC memory in routing equipment.

925	   Conceptually, the ideal solution to this problem is a checksum that
926	   covers only the newly added headers of interest.  There is little
927	   value in the portion of the UDP checksum that covers the encapsulated
928	   packet because that would generally be protected by other checksums
929	   and this is the expensive portion to compute.  In fact, this solution
930	   already exists in the form of UDP-Lite and UDP based encapsulations
931	   could be easily ported to run on top of it.  Unfortunately, the main
932	   value in using UDP as part of the encapsulation header is that it is
933	   recognized by already deployed equipment for the purposes of ECMP,
934	   RSS, and middlebox operations.  As UDP-Lite uses a different protocol
935	   number than UDP and it is not widely implemented in middleboxes, this
936	   value is lost.  A possible solution is to incorporate the same
937	   partial-checksum concept as UDP-Lite or other header checksum
938	   protection into the encapsulation header and continue using UDP as
939	   the outer protocol.  One potential challenge with this approach is
940	   the use of NAT or other form of translation on the outer header will
941	   result in an invalid checksum as the translator will not know to
942	   update the encapsulation header.
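
   A minimal Python sketch of a checksum that covers only the
   encapsulation header is shown below.  It uses the 16-bit one's
   complement Internet checksum; the 8-byte header layout with a
   trailing checksum field is hypothetical, and the pseudo-header is
   simplified to the IPv6 source and destination addresses only.

```python
import ipaddress


def internet_checksum(data):
    """16-bit one's complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encap_header_checksum(src, dst, encap_header):
    """Checksum over the IPv6 addresses plus the encapsulation header."""
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
    )
    # The encapsulated payload is deliberately left out of the sum.
    return internet_checksum(pseudo + encap_header)


SRC, DST = "2001:db8::1", "2001:db8::2"
# Hypothetical header: six bytes of fields, then a 2-byte checksum field.
header = bytearray(b"\x08\x00\x00\x00\x00\x13\x00\x00")
header[6:8] = encap_header_checksum(SRC, DST, bytes(header)).to_bytes(2, "big")

# A receiver recomputes over the header including the checksum field.
residual = encap_header_checksum(SRC, DST, bytes(header))

corrupted = bytearray(header)
corrupted[0] ^= 0x01  # single bit error in the encapsulation header
residual_corrupted = encap_header_checksum(SRC, DST, bytes(corrupted))
```

   The residual is zero for an intact header and non-zero after the bit
   error, while the cost of the computation is independent of the
   payload length.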

944	   The method chosen to protect headers is often related to the security
945	   needs of the encapsulation mechanism.  On one hand, the impact of a
946	   poorly protected header is not limited to only data corruption but
947	   can also introduce a security vulnerability in the form of
948	   misdirected packets to an unauthorized recipient.  Conversely, high
949	   security protocols that already include a secure hash over the
950	   valuable portion of the header (such as by encrypting the entire IP
951	   packet using IPsec, or some secure hash of the encap header) do not
952	   require additional checksum protection as the hash provides stronger
953	   assurance than a simple checksum.

955	   If the sender has included a checksum, then the receiver should
956	   verify that checksum or, if incapable, drop the packet.  The
957	   assumption is that configuration and/or control-plane capability
958	   exchanges can be used when different receivers have different
959	   checksum validation capabilities.

961	   In summary:
962	   o  Encapsulations need extensibility to be able to add checksum/CRC
963	      for the encapsulation header itself.
964	   o  When the encapsulation has a checksum/CRC, include the IPv6
965	      pseudo-header in it.
966	   o  The checksum/CRC can potentially be avoided when cryptographic
967	      protection is applied to the encapsulation.

969	15.  Extensibility Considerations

971	   Protocol extensibility is the concept that a networking protocol may
972	   be extended to include new use cases or functionality that were not
973	   part of the original protocol specification.  Extensibility may be
974	   used to add security, control, management, or performance features to
975	   a protocol.  A solution may allow private extensions for
976	   customization or experimentation.

978	   Extending a protocol often implies that a protocol header must carry
979	   new information.  There are two usual methods to accomplish this:
980	   1.  Define or redefine the meaning of existing fields in a protocol
981	       header.
982	   2.  Add new (optional) fields to the protocol header.
983	   It is also possible to create a new protocol version, but this is
984	   more associated with defining a protocol than extending it (IPv6
985	   being a successor to IPv4 is an example of protocol versioning).

987	   In some cases it might be more appropriate to define a new inner
988	   protocol which can carry the new functionality instead of extending
989	   the outer protocol.  An example where this works well is the IP/
990	   transport split, where the earlier architecture had a single NCP
991	   protocol which carried both the hop-by-hop semantics which are now in
992	   IP, and the end-to-end semantics which are now in TCP.  Such a split
993	   is effective when different nodes need to act upon the different
994	   information.  Applying this for general protocol extensibility
995	   through nesting is not well understood, and does result in longer
996	   header chains.  Furthermore, our experience with IPv6 extension
997	   headers [RFC2460] in middleboxes indicates that the approach does not
998	   help with middlebox traversal.

1000	   Many protocol definitions include some number of reserved fields or
1001	   bits which can be used for future extension.  VXLAN is an example of
1002	   a protocol that includes reserved bits which are subsequently being
1003	   allocated for new purposes.  Another technique employed is to
1004	   repurpose existing header fields with new meanings.  A classic
1005	   example of this is the definition of DSCP code point which redefines
1006	   the ToS field originally specified in IPv4.  When a field is
1007	   redefined, some mechanism may be needed to ensure that all interested
1008	   parties agree on the meaning of the field.  The techniques of
1009	   defining meaning for reserved bits or redefining existing fields have
1010	   the advantage that a protocol header can be kept a fixed length.  The
1011	   disadvantage is that the extensibility is limited.  For instance, the
1012	   number of reserved bits in a fixed protocol header is limited.  For
1013	   standard protocols the decision to commit to a definition for a field
1014	   can be wrenching since it is difficult to retract later.  Also, it is
1015	   difficult to predict a priori how many reserved fields or bits to put
1016	   into a protocol header to satisfy the extensions created over the
1017	   lifetime of the protocol.

1019	   Extending a protocol header with new fields can be done in several
1020	   ways.
1021	   o  TLVs are a very popular method used in such protocols as IP and
1022	      TCP.  Depending on the type field size and structure, TLVs can
1023	      offer a virtually unlimited range of extensions.  A disadvantage
1024	      of TLVs is that processing them can be verbose and complicated:
1025	      several validations must often be done for each TLV, and there is
1026	      no deterministic ordering for a list of TLVs.  TCP serves as an
1027	      example of a protocol where TLVs have been successfully used (i.e.
1028	      required for protocol operation).  IP is an example of a protocol
1029	      that allows TLVs, but they are rarely used in practice (router fast
1030	      paths usually assume no IP options).  Note that TCP TLVs are
1031	      implemented in software as well as (NIC) hardware handling various
1032	      forms of TCP offload.
1033	   o  Extension headers are closely related to TLVs.  These also carry
1034	      type/value information, but instead of being a list of TLVs within
1035	      a single protocol header, each one is in its own protocol header.
1036	      IPv6 extension headers and SFC NSH are examples of this technique.
1037	      Similar to TLVs these offer a wide range of extensibility, but
1038	      have similarly complex processing.  Another difference with TLVs
1039	      is that each extension header is idempotent.  This is beneficial
1040	      in cases where a protocol implements a push/pop model for header
1041	      elements like service chaining, but makes it more difficult to
1042	      group correlated information within one protocol header.
1043	   o  A particular form of extension headers are the tags used by IEEE
1044	      802 protocols.  Those are similar to e.g., IPv6 extension headers
1045	      but with the key difference that each tag is a fixed length header
1046	      where the length is implicit in the tag value.  Thus as long as a
1047	      receiver can be programmed with a tag value to length map, it can
1048	      skip those new tags.

1050	   o  Flag-fields are a non-TLV like method of extending a protocol
1051	      header.  The basic idea is that the header contains a set of
1052	      flags, where each set flag corresponds to an optional field that
1053	      is present in the header.  GRE is an example of a protocol that
1054	      employs this mechanism.  The fields are present in the header in
1055	      the order of the flags, and the length of each field is fixed.
1056	      Flag-fields are simpler to process compared to TLVs, having fewer
1057	      validations and the order of the optional fields is deterministic.
1058	      A disadvantage is that the range of possible extensions with
1059	      flag-fields is smaller than with TLVs.
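
   The flag-fields method can be sketched in Python as follows.  The
   header layout is hypothetical (a one-byte flags field followed by
   fixed-length optional fields in flag order) and is not the GRE
   format; the flag values and field names are assumptions.

```python
# Hypothetical flag bits and the fixed length of the field each enables,
# listed in the order the fields appear on the wire.
OPTIONAL_FIELDS = [
    (0x80, "checksum", 2),
    (0x40, "key", 4),
    (0x20, "sequence", 4),
]


def parse_flag_fields(header):
    """Parse a flags byte and the optional fields that it announces."""
    flags = header[0]
    offset = 1
    fields = {}
    for bit, name, length in OPTIONAL_FIELDS:
        if flags & bit:
            if offset + length > len(header):
                raise ValueError("truncated header")
            value = header[offset:offset + length]
            fields[name] = int.from_bytes(value, "big")
            offset += length
    return fields, offset


packet = (
    bytes([0x40 | 0x20])
    + (0x1234).to_bytes(4, "big")
    + (7).to_bytes(4, "big")
    + b"payload"
)
fields, header_len = parse_flag_fields(packet)
```

   The parser needs one length check per flag and the field order is
   fixed, which is the simplification relative to TLV processing.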

1061	   The requirements for receiving unknown or unimplemented extensible
1062	   elements in an encapsulation protocol (flags, TLVs, optional fields)
1063	   need to be specified.  There are two parties to consider, middle
1064	   boxes and terminal endpoints of encapsulation (at the decapsulator).

1066	   A protocol may allow or expect nodes in a path to modify fields in an
1067	   encapsulation (BIER is an example of this).  In this case, the
1068	   middleboxes should follow the same requirements as nodes terminating
1069	   the encapsulation.  In the case that middleboxes do not modify the
1070	   encapsulation, we can assume that they may still inspect any fields
1071	   of the encapsulation.  Missing or unknown fields should be accepted
1072	   per protocol specification; however, it is permissible for a site to
1073	   implement a local policy otherwise (e.g. a firewall may drop packets
1074	   with unknown options).

1076	   For handling unknown options at terminal nodes, there are two
1077	   possibilities: drop packet or accept while ignoring the unknown
1078	   options.  Many Internet protocols specify that reserved flags must be
1079	   set to zero on transmission and ignored on reception.  L2TP is an
1080	   example of a data protocol that has such flags.  GRE is a notable
1081	   exception to this rule: reserved flag bits 1-5 cannot be ignored
1082	   [RFC2890].  For TCP and IPv4, implementations must ignore optional
1083	   TLVs with unknown type; however, in IPv6 if a packet contains an
1084	   unknown extension header (unrecognized next header type) the packet
1085	   must be dropped with an ICMP error message returned.  The IPv6
1086	   options themselves (encoded inside the destination options or
1087	   hop-by-hop options extension header) have more flexibility.  There,
1088	   bits in the option type are used to instruct the receiver whether
1089	   to ignore, silently drop, or drop and send an error if the option is
1090	   unknown.  Some protocols define a "mandatory bit" that can be set
1091	   with TLVs to indicate that an option must not be ignored.
1092	   Conceptually, optional data elements can only be ignored if they are
1093	   idempotent and do not alter how the rest of the packet is parsed or
1094	   processed.
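
   The IPv6 option behavior described above can be sketched as follows.
   The two highest-order bits of the option type select the action for
   an unrecognized option, as specified in [RFC2460].  The function
   name and the returned strings are assumptions made for this example.

```python
def unknown_option_action(option_type: int, dest_is_multicast: bool) -> str:
    """Return what a receiver does with an unrecognized IPv6 option."""
    action = (option_type >> 6) & 0x3  # two highest-order bits
    if action == 0b00:
        return "skip"        # ignore the option, keep processing
    if action == 0b01:
        return "drop"        # silently discard the packet
    if action == 0b10:
        return "drop+icmp"   # discard and always send an ICMP error
    # 0b11: discard; send an ICMP error only for non-multicast packets
    return "drop" if dest_is_multicast else "drop+icmp"


for option_type in (0x05, 0x45, 0x85, 0xC5):
    print(hex(option_type), unknown_option_action(option_type, False))
```

   A sender thus chooses, per option, between "ignore if unknown" and
   "must be understood" semantics by picking the option type value.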

1096	   Depending on what type of protocol evolution one can predict, it
1097	   might make sense to have a way for a sender to express that the
1098	   packet should be dropped by a terminal node which does not understand
1099	   the new information.  In other cases it would make sense to have the
1100	   receiver silently ignore the new info.  The former can be expressed
1101	   by having a version field in the encapsulation, or a notion of
1102	   "mandatory bit" as discussed above.

1104	   A security mechanism which uses some form of secure hash over the
1105	   encapsulation header would need to be able to know which extensions
1106	   can be changed in flight.

1108	   In summary:
1109	   o  Encapsulations need the ability to be extended to handle e.g., the
1110	      OAM or security aspects discussed in this document.
1111	   o  Practical experience seems to tell us that extensibility
1112	      mechanisms which are not in use on day one might result in
1113	      immediate ossification by lack of implementation support.  In some
1114	      cases that has occurred in routers and in other cases in
1115	      middleboxes.  Hence devising ways where the extensibility
1116	      mechanisms are in use seems important.

1118	16.  Layering Considerations

1120	   One can envision that SFC might use NVO3 as a delivery/transport
1121	   mechanism.  With more imagination that in turn might be delivered
1122	   using BIER.  Thus it is useful to think about what things look like
1123	   when we have BIER+NVO3+SFC+payload.  Also, if NVO3 is widely deployed
1124	   there might be cases of NVO3 nesting where a customer uses NVO3 to
1125	   provide network virtualization e.g., across departments.  That
1126	   customer uses a service provider which happens to use NVO3 to provide
1127	   transport for their customers.  Thus NVO3 in NVO3 might happen.

1129	   A key question we set out to answer is what the packets might look
1130	   like in such a case, and in particular whether we would end up with
1131	   multiple UDP headers for entropy.

1133	   Based on the discussion in the Entropy section, the entropy is
1134	   associated with the outer delivery IP header.  Thus if there are
1135	   multiple IP headers there would be a UDP header for each one of the
1136	   IP headers.  But SFC does not require its own IP header.  So a case
1137	   of NVO3+SFC would be IP+UDP+NVO3+SFC.  A nested NVO3 encapsulation
1138	   would have independent IP+UDP headers.

1140	   The layering also has some implications for middleboxes.
1141	   o  A device on the path between the ingress and egress is allowed to
1142	      transparently inspect all layers of the protocol stack and drop or
1143	      forward, but not transparently modify anything but the layer in
1144	      which they operate.  What this means is that an IP router is
1145	      allowed to modify the outer IP TTL and ECN bits, but not the
1146	      encapsulation header or inner headers and payload.  And a BIER
1147	      router is allowed to modify the BIER header.
1148	   o  Alternatively such a device can become visible at a higher layer.
1149	      E.g., a middlebox could become a decapsulate + function +
1150	      encapsulate device, which means it will generate a new
1151	      encapsulation header.

1153	   The design team asked itself some additional questions:
1154	   o  Would it make sense to have a common encapsulation base header
1155	      (for OAM, security?, etc) and then followed by the specific
1156	      information for NVO3, SFC, BIER?  Given that there are separate
1157	      proposals and the set of information needing to be carried
1158	      differs, and the extensibility needs might be different, it would
1159	      be difficult and not that useful to have a common base header.
1160	   o  With a base header in place, one could view the different
1161	      functions (NVO3, SFC, and BIER) as different extensions to that
1162	      base header resulting in encodings which are more space optimal by
1163	      not repeating the same base header.  The base header would only be
1164	      repeated when there is an additional IP (and hence UDP) header.
1165	      That could mean a single length field (to skip to get to the
1166	      payload after all the encapsulation headers).  That might be
1167	      technically feasible, but it would create a lot of dependencies
1168	      between different WGs, making it harder to make progress.  That
1169	      cost must be weighed against the potential savings in packet size.

1171	17.  Service model

1173	   The IP service is lossy and subject to reordering.  In order to avoid
1174	   a performance impact on transports like TCP the handling of packets
1175	   is designed to avoid reordering packets that are in the same
1176	   transport flow (which is typically identified by the 5-tuple).  But
1177	   across such flows the receiver can see different ordering for a given
1178	   sender.  That is the case for a unicast vs. a multicast flow from the
1179	   same sender.

1181	   There is a general tussle between the desire for high capacity
1182	   utilization across a multipath network and the impact on packet
1183	   ordering within the same flow (which results in lower transport
1184	   protocol performance).  That isn't affected by the introduction of an
1185	   encapsulation.  However, the encapsulation comes with some entropy,
1186	   and there might be cases where folks want to change that in response
1187	   to overload or failures.  For instance, one might want to change the
1188	   UDP source port to try a different ECMP route.  Such changes can
1189	   result in packet reordering within a flow, hence would need to be
1190	   done infrequently and with care, e.g., by identifying packet trains.
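
   A minimal sketch of this idea follows.  It derives the outer UDP
   source port from the inner flow so that every packet of a flow gets
   the same port, and it uses a hypothetical "salt" parameter to model
   a deliberate change of entropy.  CRC32 is only a stand-in for
   whatever hash an implementation uses, and the choice of the dynamic
   port range (49152-65535) is an assumption of this example.

```python
import zlib

PORT_BASE = 49152   # first port of the dynamic port range
PORT_COUNT = 16384  # 49152 through 65535


def entropy_source_port(inner_flow: tuple, salt: int = 0) -> int:
    """Map an inner flow to an outer UDP source port.

    The same (inner_flow, salt) pair always yields the same port, so
    packets of one flow stay on one ECMP path until the salt changes.
    """
    key = repr((salt, inner_flow)).encode()
    return PORT_BASE + zlib.crc32(key) % PORT_COUNT


# Inner 5-tuple: source, destination, protocol, source port, dest port.
flow = ("10.1.1.1", "10.1.1.2", 6, 33000, 443)
port_before = entropy_source_port(flow, salt=0)
port_after = entropy_source_port(flow, salt=1)  # usually a different port
```

   Changing the salt in the middle of a packet train can reorder
   packets of the flow, which is why the text above suggests doing so
   only between trains.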

1192	   There might be some applications/services which are not able to
1193	   handle reordering across flows.  The IETF has defined pseudo-wires
1194	   [RFC3985] which provide the ability to ensure ordering (implemented
1195	   using sequence numbers and/or timestamps).

1197	   Architecturally such services would make sense, but as a separate
1198	   layer on top of an encapsulation protocol.  They could be deployed
1199	   between ingress and egress of a tunnel which uses some encapsulation.
1200	   Potentially the tunnel control points at the ingress and egress could
1201	   become a platform for fixing suboptimal behavior elsewhere in the
1202	   network.  That would clearly be undesirable in the general case.
1203	   However, handling encapsulation of non-IP traffic, hence
1204	   non-congestion-controlled traffic, is likely to be required, which
1205	   implies some fairness and/or QoS policing at the ingress and egress.

1207	   But the tunnels could potentially do more, such as increasing
1208	   reliability (retransmissions, FEC) or load spreading using, e.g.,
1209	   MP-TCP between ingress and egress.

1211	18.  Hardware Friendly

1213	   Hosts, switches and routers often leverage capabilities in the
1214	   hardware to accelerate packet encapsulation, decapsulation and
1215	   forwarding.

1217	   Some design considerations in encapsulation that leverage these
1218	   hardware capabilities may result in more efficient packet
1219	   processing and higher overall protocol throughput.

1221	   While "hardware friendliness" can be viewed as an unnecessary
1222	   consideration for a design, part of the motivation for considering
1223	   this is ease of deployment; being able to leverage existing NIC and
1224	   switch chips for at least a useful subset of the functionality that
1225	   the new encapsulation provides.  The other part is the ease of
1226	   implementing new NICs and switch/router chips that support the
1227	   encapsulation at ever increasing line rates.

1229	   [disclaimer] There are many different types of hardware in any given
1230	   network, each may be better at some tasks while worse at others.  We
1231	   would still recommend that protocol designers examine the specific
1232	   hardware that is likely to be used in their networks and make
1233	   decisions on a case by case basis.

1235	   Some considerations are:
1236	   o  Keep the encap header small.  Switches and routers usually only
1237	      read the first small number of bytes into the fast memory for
1238	      quick processing and easy manipulation.  The bulk of each packet
1239	      is usually stored in slow memory.  A big encap header may not fit
1240	      and an additional read from the slow memory will hurt the overall
1241	      performance and throughput.
1242	   o  Put important information at the beginning of the encapsulation
1243	      header.  The reasoning is similar to that of the previous point.
1244	      If important information is located at the beginning of the
1245	      encapsulation header, the packet may be processed with a smaller
1246	      number of bytes read into the fast memory, which improves
1247	      performance.
1248	   o  Avoid full packet checksums in the encapsulation if possible.
1249	      Encapsulations should instead consider adding their own checksum
1250	      which covers the encapsulation header and any IPv6 pseudo-header.
1251	      The motivation is that most switch/router hardware makes
1252	      switching/forwarding decisions by reading and examining only a
1253	      certain number of bytes at the start of the packet.  Most of the
1254	      body of the packet does not normally need to be processed.  If the
1255	      concern is packets being misdelivered due to memory errors,
1256	      consider performing only header checksums.  Note that NIC
1257	      chips can typically already do full packet checksums for TCP/UDP,
1258	      while adding a header checksum might require adding some hardware
1259	      support.
1260	   o  Place important information at a fixed offset in the encapsulation
1261	      header.  Packet processing hardware may be capable of parallel
1262	      processing.  If important information can be found at a fixed
1263	      offset, different parts of the encapsulation header may be
1264	      processed by different hardware units in parallel (for example
1265	      multiple table lookups may be launched in parallel).  It is easier
1266	      for hardware to handle optional information when the information,
1267	      if present, can be found in ideally one place, but in general, in
1268	      as few places as possible.  That facilitates parallel processing.
1269	      TLV encoding with unconstrained order typically does not have that
1270	      property.
1271	   o  Limit the number of header combinations.  In many cases the
1272	      hardware can explore different combinations of headers in
1273	      parallel, however there is some added cost for this.

1275	18.1.  Considerations for NIC offload

1277	   This section provides guidelines for supporting common offloads for
1278	   encapsulation in Network Interface Cards (NICs).
1279	   Offload mechanisms are techniques that are implemented separately
1280	   from the normal protocol implementation of a host networking stack
1281	   and are intended to optimize or speed up protocol processing.
1282	   Hardware offload is performed within a NIC device on behalf of a
1283	   host.

1285	   There are three basic offload techniques of interest:

1287	   o  Receive multi-queue
1288	   o  Checksum offload
1289	   o  Segmentation offload

1291	18.1.1.  Receive multi-queue

1293	   Contemporary NICs support multiple receive descriptor queues (multi-
1294	   queue).  Multi-queue enables load balancing of network processing for
1295	   a NIC across multiple CPUs.  On packet reception, a NIC must select
1296	   the appropriate queue for host processing.  Receive Side Scaling
1297	   (RSS) is a common method which uses the flow hash for a packet to
1298	   index an indirection table where each entry stores a queue number.

1300	   UDP encapsulation, where the source port is used for entropy, should
1301	   be compatible with multi-queue NICs that support five-tuple hash
1302	   calculation for UDP/IP packets as input to RSS.  The source port
1303	   ensures classification of the encapsulated flow even in the case that
1304	   the outer source and destination addresses are the same for all flows
1305	   (e.g. all flows are going over a single tunnel).
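
   The queue selection described above can be sketched as follows.
   CRC32 is used here only as a stand-in for the flow hash that a NIC
   computes; the table size, the number of queues, the addresses, and
   the destination port are assumptions made for this example.

```python
import zlib


def rss_queue(five_tuple: tuple, indirection_table: list) -> int:
    """Pick a receive queue: flow hash -> indirection table -> queue."""
    flow_hash = zlib.crc32(repr(five_tuple).encode())
    return indirection_table[flow_hash % len(indirection_table)]


# 128-entry indirection table spreading flows over 4 queues.
table = [entry % 4 for entry in range(128)]

# All flows share the outer addresses and destination port (a single
# tunnel); only the UDP source port, which carries the entropy, differs.
queues = set()
for source_port in range(49152, 49152 + 64):
    outer = ("192.0.2.1", "192.0.2.2", 17, source_port, 4789)
    queues.add(rss_queue(outer, table))
```

   With a five-tuple hash, the varying source port is the only input
   that can spread the flows of a single tunnel across queues; a NIC
   that hashes only on addresses would place them all on one queue.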

1307	18.1.2.  Checksum offload

1309	   Many NICs provide capabilities to calculate the standard ones
1310	   complement payload checksum for packets in transmit or receive.
1311	   When using encapsulation over UDP there are at least two checksums
1312	   that may be of interest: the encapsulated packet's transport
1313	   checksum, and the UDP checksum in the outer header.

1315	18.1.2.1.  Transmit checksum offload

1317	   NICs may provide a protocol agnostic method to offload transmit
1318	   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
1319	   UDP encapsulation.  In this method the host provides checksum related
1320	   parameters in a transmit descriptor for a packet.  These parameters
1321	   include the starting offset of data to checksum, the length of data
1322	   to checksum, and the offset in the packet where the computed checksum
1323	   is to be written.  The host initializes the checksum field to the
1324	   pseudo header checksum.  In the case of an encapsulated packet, the
1325	   checksum for an encapsulated transport layer packet, a TCP packet
1326	   for instance, can be offloaded by setting the appropriate checksum
1327	   parameters.

1329	   NICs typically can offload only one transmit checksum per packet, so
1330	   simultaneously offloading both an inner transport packet's checksum
1331	   and the outer UDP checksum is likely not possible.  In this case
1332	   setting the UDP checksum to zero (per above discussion) and
1333	   offloading the inner transport packet checksum might be acceptable.
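
   The protocol agnostic transmit method can be illustrated with a
   small emulation.  The host seeds the checksum field with the pseudo
   header sum, and the device only needs three numbers: where to start,
   how much to sum, and where to write the result.  All function names
   are assumptions made for this example, and the packet is reduced to
   a bare UDP datagram over IPv4 to keep the sketch short.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def host_prepare_udp(src_ip, dst_ip, src_port, dst_port, payload):
    """Host side: checksum field holds the pseudo header sum (the seed)."""
    udp_len = 8 + len(payload)
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)
    seed = ones_complement_sum(pseudo)
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, seed)
    return bytearray(header + payload), pseudo


def nic_offload_checksum(packet, start, length, write_offset):
    """Device side: sum packet[start:start+length], store the inverse."""
    total = ones_complement_sum(bytes(packet[start:start + length]))
    struct.pack_into("!H", packet, write_offset, ~total & 0xFFFF)


datagram, pseudo = host_prepare_udp(
    bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), 50000, 4789, b"encapsulated")
nic_offload_checksum(datagram, start=0, length=len(datagram), write_offset=6)

# A receiver summing the pseudo header and the datagram gets all ones.
valid = ones_complement_sum(pseudo + bytes(datagram)) == 0xFFFF
```

   The device never parses the headers, so the same three parameters
   can instead point at an inner TCP header behind any encapsulation.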

1335	   There is a proposal in [I-D.herbert-remotecsumoffload] to leverage
1336	   NIC checksum offload when an encapsulator is co-resident with a host.

1338	18.1.2.2.  Receive checksum offload

1340	   Protocol encapsulation is compatible with NICs that perform a
1341	   protocol agnostic receive checksum (CHECKSUM_COMPLETE in Linux
1342	   parlance).  In this technique, a NIC computes a ones complement
1343	   checksum over all (or some predefined portion) of a packet.  The
1344	   computed value is provided to the host stack in the packet's receive
1345	   descriptor.  The host driver can use this checksum to "patch up" and
1346	   validate any inner packet transport checksum, as well as the outer
1347	   UDP checksum if it is non-zero.

1349	   Many legacy NICs don't provide checksum-complete but instead provide
1350	   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
1351	   in Linux).  Usually, such validation is only done for simple TCP/IP
1352	   or UDP/IP packets.  If a NIC indicates that a UDP checksum is valid,
1353	   the checksum-complete value for the UDP packet is the "not" of the
1354	   pseudo header checksum.  In this way, checksum-unnecessary can be
1355	   converted to checksum-complete.  So if the NIC provides checksum-
1356	   unnecessary for the outer UDP header in an encapsulation, checksum
1357	   conversion can be done so that the checksum-complete value is derived
1358	   and can be used by the stack to validate any checksums in the
1359	   encapsulated packet.
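
   The conversion rests on a simple identity: for a valid UDP datagram,
   the pseudo header sum plus the sum over the datagram is all ones, so
   the sum over the datagram is the complement of the pseudo header
   sum.  The sketch below checks that identity; the function names and
   the addresses are assumptions made for this example.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def build_valid_udp(pseudo_header, src_port, dst_port, payload):
    """Build a UDP datagram with a correct checksum (demo helper)."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum(pseudo_header + header + payload) & 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def unnecessary_to_complete(pseudo_header: bytes) -> int:
    """Checksum-complete value of a datagram the NIC reported as valid."""
    return ~ones_complement_sum(pseudo_header) & 0xFFFF


payload = b"inner packet bytes"
pseudo = (bytes([192, 0, 2, 1, 192, 0, 2, 2])
          + struct.pack("!BBH", 0, 17, 8 + len(payload)))
datagram = build_valid_udp(pseudo, 50000, 4789, payload)

derived = unnecessary_to_complete(pseudo)
measured = ones_complement_sum(datagram)  # what checksum-complete reports
```

   Here "derived" and "measured" are equal, so the stack can obtain the
   checksum-complete value without touching the packet data and then
   use it to validate checksums of the encapsulated packet.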

1361	18.1.3.  Segmentation offload

1363	   Segmentation offload refers to techniques that attempt to reduce CPU
1364	   utilization on hosts by having the transport layers of the stack
1365	   operate on large packets.  In transmit segmentation offload, a
1366	   transport layer creates large packets greater than MTU size (Maximum
1367	   Transmission Unit).  It is only much lower in the stack, or possibly
1368	   in the NIC, that these large packets are broken up into MTU-sized
1369	   packets for transmission on the wire.  Similarly, in receive
1370	   segmentation offload, small packets are coalesced into large, greater
1371	   than MTU size packets at a point low in the stack receive path or
1372	   possibly in a device.  The effect of segmentation offload is that the
1373	   number of packets that need to be processed in various layers of the
1374	   stack is reduced, and hence CPU utilization is reduced.

1376	18.1.3.1.  Transmit Segmentation Offload

1378	   Transmit Segmentation Offload (TSO) is a NIC feature where a host
1379	   provides a large (larger than MTU size) TCP packet to the NIC, which
1380	   in turn splits the packet into separate segments and transmits each
1381	   one.  This is useful to reduce CPU load on the host.

1383	   The process of TSO can be generalized as:
1384	   o  Split the TCP payload into segments which allow packets with size
1385	      less than or equal to MTU.
1386	   o  For each created segment:
1387	      1.  Replicate the TCP header and all preceding headers of the
1388	          original packet.
1389	      2.  Set payload length fields in any headers to reflect the length
1390	          of the segment.
1391	      3.  Set TCP sequence number to correctly reflect the offset of the
1392	          TCP data in the stream.
1393	      4.  Recompute and set any checksums that either cover the payload
1394	          of the packet or cover header which was changed by setting a
1395	          payload length.

1397	   Following this general process, TSO can be extended to support TCP
1398	   encapsulation in UDP.  For each segment the Ethernet, outer IP, UDP,
1399	   encapsulation, inner IP (if tunneling), and TCP
1400	   headers are replicated.  Any packet length header fields need to be
1401	   set properly (including the length in the outer UDP header), and
1402	   checksums need to be set correctly (including the outer UDP checksum
1403	   if being used).

1405	   To facilitate TSO with encapsulation it is recommended that optional
1406	   fields should not contain values that must be updated on a per
1407	   segment basis; for example, an encapsulation header should not
1408	   include checksums, lengths, or sequence numbers that refer to the
1409	   payload.  If the encapsulation header does not contain such fields
1410	   then the TSO engine only needs to copy the bits in the encapsulation
1411	   header when creating each segment and does not need to parse the
1412	   encapsulation header.
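
   The following sketch models TSO for TCP encapsulated in UDP.  It
   tracks only the fields that change per segment and treats all other
   headers as opaque bytes.  The class, the field names, and the fixed
   20-byte TCP header are assumptions of this example; IP length fields
   and all checksums (steps 2 and 4 above) are left out for brevity.

```python
from dataclasses import dataclass, replace

UDP_HEADER_LEN = 8
TCP_HEADER_LEN = 20  # assumes a TCP header without options


@dataclass(frozen=True)
class Packet:
    outer_headers: bytes  # Ethernet + outer IP, opaque here
    udp_length: int       # length field of the outer UDP header
    encap_header: bytes   # copied bit for bit into every segment
    inner_headers: bytes  # inner headers before TCP, opaque here
    tcp_seq: int
    payload: bytes


def tso_segment(big: Packet, mss: int) -> list:
    """Split one large packet into segments carrying at most mss bytes."""
    fixed = (UDP_HEADER_LEN + len(big.encap_header)
             + len(big.inner_headers) + TCP_HEADER_LEN)
    segments = []
    for offset in range(0, len(big.payload), mss):
        chunk = big.payload[offset:offset + mss]
        segments.append(replace(
            big,                                    # replicate all headers
            udp_length=fixed + len(chunk),          # per-segment length
            tcp_seq=(big.tcp_seq + offset) % 2**32,  # offset in the stream
            payload=chunk,
        ))
    return segments


big = Packet(b"eth+ip", 0, bytes(8), bytes(20), 1000, b"x" * 3000)
segments = tso_segment(big, mss=1400)
```

   Note that the encapsulation header is never parsed: because it holds
   no per-segment value, the segmentation code only copies it.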

1414	18.1.3.2.  Large Receive Offload

1416	   Large Receive Offload (LRO) is a NIC feature where packets of a TCP
1417	   connection are reassembled, or coalesced, in the NIC and delivered to
1418	   the host as one large packet.  This feature can reduce CPU
1419	   utilization in the host.

1421	   LRO requires significant protocol awareness to be implemented
1422	   correctly and is difficult to generalize.  Packets in the same flow
1423	   need to be unambiguously identified.  In the presence of tunnels or
1424	   network virtualization, this may require more than a five-tuple match
1425	   (for instance packets for flows in two different virtual networks may
1426	   have identical five-tuples).  Additionally, a NIC needs to perform
1427	   validation over packets that are being coalesced, and needs to
1428	   fabricate a single meaningful header from all the coalesced packets.

1430	   The conservative approach to supporting LRO for encapsulation would
1431	   be to assign packets to the same flow only if they have identical
1432	   five-tuple and were encapsulated the same way.  That is the outer IP
1433	   addresses, the outer UDP ports, encapsulated protocol, encapsulation
1434	   headers, and inner five tuple are all identical.
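
   The conservative matching rule can be sketched as a flow key that
   covers every listed field.  The type and function names are
   assumptions of this example, and the sketch simply concatenates
   payloads in arrival order; a real implementation would also have to
   validate the packets and check TCP sequence continuity.

```python
from collections import defaultdict
from typing import NamedTuple


class EncapPacket(NamedTuple):
    outer_src: str
    outer_dst: str
    outer_sport: int
    outer_dport: int
    encap_protocol: str
    encap_header: bytes      # e.g., carries the virtual network identifier
    inner_five_tuple: tuple
    payload: bytes


def lro_key(pkt: EncapPacket) -> tuple:
    """Flow key: outer addresses and ports, encapsulation, inner 5-tuple."""
    return (pkt.outer_src, pkt.outer_dst, pkt.outer_sport, pkt.outer_dport,
            pkt.encap_protocol, pkt.encap_header, pkt.inner_five_tuple)


def coalesce(packets) -> dict:
    """Merge payloads of packets that share the complete flow key."""
    flows = defaultdict(bytes)
    for pkt in packets:
        flows[lro_key(pkt)] += pkt.payload
    return dict(flows)


inner = ("10.0.0.1", "10.0.0.2", 6, 40000, 80)
net_a = bytes([0, 0, 0, 1])  # same inner 5-tuple, virtual network A
net_b = bytes([0, 0, 0, 2])  # same inner 5-tuple, virtual network B
packets = [
    EncapPacket("192.0.2.1", "192.0.2.2", 50000, 4789, "x", net_a, inner, b"ab"),
    EncapPacket("192.0.2.1", "192.0.2.2", 50000, 4789, "x", net_a, inner, b"cd"),
    EncapPacket("192.0.2.1", "192.0.2.2", 50000, 4789, "x", net_b, inner, b"zz"),
]
flows = coalesce(packets)
```

   The third packet has the same inner five-tuple as the first two but
   a different encapsulation header, so it is kept in a separate flow.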

1436	18.1.3.3.  In summary:

1438	   In summary, for NIC offload:
1439	   o  The considerations for using full UDP checksums are different for
1440	      NIC offload than for implementations in forwarding devices like
1441	      routers and switches.
1442	   o  Be judicious about encapsulations that change fields on a per-
1443	      packet basis, since such behavior might make it hard to use TSO.

1445	19.  Middlebox Considerations

1447	   This document has touched upon middleboxes in different sections.
1448	   The reason is that as encapsulations get widely deployed, one would
1449	   expect that different forms of middleboxes might become aware of the
1450	   encapsulation protocol just as middleboxes have been made aware of
1451	   other protocols where there are business and deployment
1452	   opportunities.  Such middleboxes are likely to do more than just drop
1453	   packets based on the UDP port number used by an encapsulation
1454	   protocol.

1456	   We note that various forms of encapsulation gateways that stitch one
1457	   encapsulation protocol together with another form of protocol could
1458	   have similar effects.

1460	   An example of a middlebox that could see some use would be an NVO3-
1461	   aware firewall that would filter on the VNI IDs to provide some
1462	   defense in depth inside or across NVO3 datacenters.

1464	   A question for the IETF is whether we should document what to do or
1465	   what not to do in such middleboxes.  This document touches on areas
1466	   of OAM and ECMP as they relate to middleboxes and it might make sense
1467	   to document how encapsulation-aware middleboxes should avoid
1468	   unintended consequences in those (and perhaps other) areas.

1470	   In summary:
1471	   o  We are likely to see middleboxes that at least parse the headers
1472	      for successful new encapsulations.
1473	   o  Should the IETF document considerations for what not to do in such
1474	      middleboxes?

1476	20.  Related Work

1478	   The IETF and industry have defined encapsulations for a long time,
1479	   with examples like GRE [RFC2890], VXLAN [RFC7348], and NVGRE
1480	   [I-D.sridharan-virtualization-nvgre] being able to carry arbitrary
1481	   Ethernet payloads, and various forms of IP-in-IP and IPsec
1482	   encapsulations that can carry IP packets.  As part of NVO3 there have
1483	   been additional proposals like Geneve [I-D.gross-geneve] and GUE
1484	   [I-D.herbert-gue] which look at more extensibility.  NSH
1485	   [I-D.quinn-sfc-nsh] is an example of an encapsulation that tries to
1486	   provide extensibility mechanisms which target both hardware and
1487	   software implementations.

1489	   There is also a large body of work around MPLS encapsulations
1490	   [RFC3032].  The MPLS-in-UDP work [I-D.ietf-mpls-in-udp] and GRE over
1491	   UDP [I-D.ietf-tsvwg-gre-in-udp-encap] have worked on some of the
1492	   common issues around checksum and congestion control.  MPLS also
1493	   introduced an entropy label [RFC6790].  There is also a proposal for
1494	   MPLS encryption [I-D.farrelll-mpls-opportunistic-encrypt].

1496	   The idea to use a UDP encapsulation with a UDP source port for
1497	   entropy for the underlay routers' ECMP dates back to LISP [RFC6830].

1499	   The pseudo-wire work [RFC3985] is interesting in the notion of
1500	   layering additional services/characteristics such as ordered delivery
1501	   or timely delivery on top of an encapsulation.  That layering
1502	   approach might be useful for the new encapsulations as well; one
1503	   example is the control word [RFC4385].  There is also material on
1504	   congestion control for pseudo-wires in [I-D.ietf-pwe3-congcons].

1506	   Both MPLS and L2TP [RFC3931] rely on some control or signaling to
1507	   establish state (for the path/labels in the case of MPLS, and for the
1508	   session in the case of L2TP).  The NVO3, SFC, and BIER encapsulations
1509	   will also have some separation between the data plane and control
1510	   plane, but the type of separation appears to be different.

1512	   IEEE 802.1 has defined encapsulations for L2 over L2, in the form of
1513	   Provider Backbone Bridging (PBB) [IEEE802.1Q-2014] and Equal Cost
1514	   Multipath (ECMP) [IEEE802.1Q-2014].  The latter includes something
1515	   very similar to the way the UDP source port is used as entropy: "The
1516	   flow hash, carried in an F-TAG, serves to distinguish frames
1517	   belonging to different flows and can be used in the forwarding
1518	   process to distribute frames over equal cost paths".

1520	   TRILL, which is also a L2 over L2 encapsulation, took a different
1521	   approach to entropy but preserved the ability for OAM frames
1522	   [RFC7174] to use the same entropy hence ECMP path as data frames.  In
1523	   [I-D.ietf-trill-oam-fm] there are 96 bytes of headers for entropy in
1524	   the OAM frames, followed by the actual OAM content.  This ensures
1525	   that any headers that fit in those 96 bytes, except the OAM bit in
1526	   the TRILL header, can be used for ECMP hashing.

1528	   As encapsulations evolve there might be a desire to fit multiple
1529	   inner packets into one outer packet.  The work in
1530	   [I-D.saldana-tsvwg-simplemux] might be interesting for that purpose.

1532	21.  Acknowledgements

1534	   The authors acknowledge the comments from Alia Atlas, Fred Baker,
1535	   David Black, Bob Briscoe, Stewart Bryant, Mike Cox, Andy Malis, Radia
1536	   Perlman, Michael Smith, and Lucy Yong.

1538	22.  Open Issues

1540	   o  Middleboxes:
1541	      *  Due to OAM there are constraints on middleboxes in general.  If
1542	         middleboxes inspect the packet past the outer IP+UDP and
1543	         encapsulation header and look for inner IP and TCP/UDP headers,
1544	         that might violate the assumption that OAM packets will be
1545	         handled the same as regular data packets.  That issue is
1546	         broader than just QoS; it applies to firewall filters etc.
1547	      *  Firewalls looking at inner payload?  How does that work for OAM
1548	         frames?  Even if it only drops ...  TRILL approach might be an
1549	         option?  Would that encourage more middleboxes making the
1550	         network more fragile?
1551	      *  Editorially perhaps we should pull the above two into a
1552	         separate section about middlebox considerations?
1553	   o  Next-protocol indication - should it be common across different
1554	      encapsulation headers?  We will have different ways to indicate
1555	      the presence of the first encapsulation header in a packet (could
1556	      be a UDP destination port, an Ethernet type, etc depending on the
1557	      outer delivery header).  But for the next protocol past an
1558	      encapsulation header one could envision creating or adopting a
1559	      common scheme.  Such a scheme would also need to be able to
1560	      identify following headers like Ethernet, IPv4/IPv6, ESP, etc.
1561	   o  Common OAM error reporting protocol?
1562	   o  There is discussion about timestamps, sequence numbers, etc in
1563	      three different parts of the document.  OAM, Congestion
1564	      Considerations, and Service Model, where the latter argues that a
1565	      pseudo-wire service should really be layered on top of the
1566	      encapsulation using its own header.  Those recommendations seem to
1567	      be at odds with each other.  Do we envision sequence numbers,
1568	      timestamps, etc as potential extensions for OAM and CC?  If so,
1569	      those extensions could be used to provide a service which doesn't
1570	      reorder packets.

1572	23.  Change Log

1574	   The changes from draft-rtg-dt-encap-01 based on feedback at the
1575	   Dallas IETF meeting:
1576	   o  Setting the context that not all common issues might apply to all
1577	      encapsulations, but that they should all be understood before
1578	      being dismissed.
1579	   o  Clarified that IPv6 flow label is useful for entropy in
1580	      combination with a UDP source port.
1581	   o  Editorially added a "summary" set of bullets to most sections.
1582	   o  Editorial clarifications in the next protocol section to more
1583	      clearly state the three areas.
1584	   o  Folded the two next protocol sections into one.
1585	   o  Mention the MPLS first nibble issue in the next protocol section.
1586	   o  Mention that viewing the next protocol as an index to a table with
1587	      processing instructions can provide additional flexibility in the
1588	      protocol evolution.
1589	   o  For the OAM "don't forward to end stations" added that defining a
1590	      bit seems better than using a special next-protocol value.
1591	   o  Added mention of DTLS in addition to IPsec for security.
1592	   o  Added some mention of IPv6 hop-by-hop options or other headers
1593	      that potentially can be copied from inner to outer header.
1594	   o  Added text on architectural considerations when it might make
1595	      sense to define an additional header/protocol as opposed to using
1596	      the extensibility mechanism in the existing encapsulation
1597	      protocol.
1598	   o  Clarified the "unconstrained TLVs" in the hardware friendly
1599	      section.
1600	   o  Clarified the text around checksum verification and full vs.
1601	      header checksums.
1602	   o  Added wording that the considerations might apply for encaps
1603	      outside of the routing area.
1604	   o  Added references to draft-ietf-pwe3-congcons,
1605	      draft-ietf-tsvwg-rfc5405bis, RFC2473, and RFC7325
1606	   o  Removed reference to RFC3948.
1607	   o  Updated the acknowledgements section.
1608	   o  Added this change log section.

1610	24.  References

1612	24.1.  Normative References

1614	   [I-D.ietf-tsvwg-rfc5405bis]
1615	              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
1616	              Guidelines", draft-ietf-tsvwg-rfc5405bis-02 (work in
1617	              progress), April 2015.

1619	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1620	              (IPv6) Specification", RFC 2460, December 1998.

1622	   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
1623	              IPv6 Specification", RFC 2473, December 1998.

1625	   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE",
1626	              RFC 2890, September 2000.

1628	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1629	              RFC 2983, October 2000.

1631	   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
1632	              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
1633	              Encoding", RFC 3032, January 2001.

1635	   [RFC3931]  Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling
1636	              Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

1638	   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
1639	              Edge (PWE3) Architecture", RFC 3985, March 2005.

1641	   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
1642	              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
1643	              Use over an MPLS PSN", RFC 4385, February 2006.

1645	   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
1646	              for Application Designers", BCP 145, RFC 5405,
1647	              November 2008.

1649	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
1650	              Notification", RFC 6040, November 2010.

1652	   [RFC6790]  Kompella, K., Drake, J., Amante, S., Henderickx, W., and
1653	              L. Yong, "The Use of Entropy Labels in MPLS Forwarding",
1654	              RFC 6790, November 2012.

1656	   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
1657	              Locator/ID Separation Protocol (LISP)", RFC 6830,
1658	              January 2013.

1660	   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1661	              UDP Checksums for Tunneled Packets", RFC 6935, April 2013.

1663	   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
1664	              for the Use of IPv6 UDP Datagrams with Zero Checksums",
1665	              RFC 6936, April 2013.

1667	   [RFC7174]  Salam, S., Senevirathne, T., Aldrin, S., and D. Eastlake,
1668	              "Transparent Interconnection of Lots of Links (TRILL)
1669	              Operations, Administration, and Maintenance (OAM)
1670	              Framework", RFC 7174, May 2014.

1672	   [RFC7325]  Villamizar, C., Kompella, K., Amante, S., Malis, A., and
1673	              C. Pignataro, "MPLS Forwarding Compliance and Performance
1674	              Requirements", RFC 7325, August 2014.

1676	   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1677	              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1678	              eXtensible Local Area Network (VXLAN): A Framework for
1679	              Overlaying Virtualized Layer 2 Networks over Layer 3
1680	              Networks", RFC 7348, August 2014.

1682	   [RFC7364]  Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
1683	              and M. Napierala, "Problem Statement: Overlays for Network
1684	              Virtualization", RFC 7364, October 2014.

1686	   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
1687	              Rekhter, "Framework for Data Center (DC) Network
1688	              Virtualization", RFC 7365, October 2014.

1690	24.2.  Informative References

1692	   [I-D.briscoe-conex-data-centre]
1693	              Briscoe, B. and M. Sridharan, "Network Performance
1694	              Isolation in Data Centres using Congestion Policing",
1695	              draft-briscoe-conex-data-centre-02 (work in progress),
1696	              February 2014.

1698	   [I-D.farrelll-mpls-opportunistic-encrypt]
1699	              Farrel, A. and S. Farrell, "Opportunistic Security in MPLS
1700	              Networks", draft-farrelll-mpls-opportunistic-encrypt-04
1701	              (work in progress), January 2015.

1703	   [I-D.gross-geneve]
1704	              Gross, J., Sridhar, T., Garg, P., Wright, C., Ganga, I.,
1705	              Agarwal, P., Duda, K., Dutt, D., and J. Hudson, "Geneve:
1706	              Generic Network Virtualization Encapsulation",
1707	              draft-gross-geneve-02 (work in progress), October 2014.

1709	   [I-D.herbert-gue]
1710	              Herbert, T., Yong, L., and O. Zia, "Generic UDP
1711	              Encapsulation", draft-herbert-gue-03 (work in progress),
1712	              March 2015.

1714	   [I-D.herbert-remotecsumoffload]
1715	              Herbert, T., "Remote checksum offload for encapsulation",
1716	              draft-herbert-remotecsumoffload-01 (work in progress),
1717	              November 2014.

1719	   [I-D.ietf-mpls-in-udp]
1720	              Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
1721	              "Encapsulating MPLS in UDP", draft-ietf-mpls-in-udp-11
1722	              (work in progress), January 2015.

1724	   [I-D.ietf-nvo3-arch]
1725	              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
1726	              Narten, "An Architecture for Overlay Networks (NVO3)",
1727	              draft-ietf-nvo3-arch-03 (work in progress), March 2015.

1729	   [I-D.ietf-pwe3-congcons]
1730	              Stein, Y., Black, D., and B. Briscoe, "Pseudowire
1731	              Congestion Considerations", draft-ietf-pwe3-congcons-02
1732	              (work in progress), July 2014.

1734	   [I-D.ietf-sfc-architecture]
1735	              Halpern, J. and C. Pignataro, "Service Function Chaining
1736	              (SFC) Architecture", draft-ietf-sfc-architecture-08 (work
1737	              in progress), May 2015.

1739	   [I-D.ietf-sfc-problem-statement]
1740	              Quinn, P. and T. Nadeau, "Service Function Chaining
1741	              Problem Statement", draft-ietf-sfc-problem-statement-13
1742	              (work in progress), February 2015.

1744	   [I-D.ietf-trill-oam-fm]
1745	              Senevirathne, T., Finn, N., Salam, S., Kumar, D.,
1746	              Eastlake, D., Aldrin, S., and L. Yizhou, "TRILL Fault
1747	              Management", draft-ietf-trill-oam-fm-11 (work in
1748	              progress), October 2014.

1750	   [I-D.ietf-tsvwg-circuit-breaker]
1751	              Fairhurst, G., "Network Transport Circuit Breakers",
1752	              draft-ietf-tsvwg-circuit-breaker-01 (work in progress),
1753	              March 2015.

1755	   [I-D.ietf-tsvwg-gre-in-udp-encap]
1756	              Crabbe, E., Yong, L., Xu, X., and T. Herbert, "GRE-in-UDP
1757	              Encapsulation", draft-ietf-tsvwg-gre-in-udp-encap-06 (work
1758	              in progress), March 2015.

1760	   [I-D.ietf-tsvwg-port-use]
1761	              Touch, J., "Recommendations on Using Assigned Transport
1762	              Port Numbers", draft-ietf-tsvwg-port-use-11 (work in
1763	              progress), April 2015.

1765	   [I-D.quinn-sfc-nsh]
1766	              Quinn, P., Guichard, J., Kumar, S., Smith, M.,
1767	              Henderickx, W., Nadeau, T., Agarwal, P., Manur, R.,
1768	              Chauhan, A., Halpern, J., Majee, S., Elzur, U., Melman,
1769	              D., Garg, P., McConnell, B., Wright, C., and K. Glavin,
1770	              "Network Service Header", draft-quinn-sfc-nsh-07 (work in
1771	              progress), February 2015.

1773	   [I-D.saldana-tsvwg-simplemux]
1774	              Saldana, J., "Simplemux. A generic multiplexing protocol",
1775	              draft-saldana-tsvwg-simplemux-02 (work in progress),
1776	              January 2015.

1778	   [I-D.shepherd-bier-problem-statement]
1779	              Shepherd, G., Dolganow, A., and
1780	              A. Gulko, "Bit Indexed Explicit
1781	              Replication (BIER) Problem Statement",
1782	              draft-shepherd-bier-problem-statement-02 (work in
1783	              progress), February 2015.

1785	   [I-D.sridharan-virtualization-nvgre]
1786	              Garg, P. and Y. Wang, "NVGRE: Network Virtualization using
1787	              Generic Routing Encapsulation",
1788	              draft-sridharan-virtualization-nvgre-08 (work in
1789	              progress), April 2015.

1791	   [I-D.wei-tsvwg-tunnel-congestion-feedback]
1792	              Wei, X., Zhu, L., and L. Deng, "Tunnel Congestion
1793	              Feedback", draft-wei-tsvwg-tunnel-congestion-feedback-03
1794	              (work in progress), October 2014.

1796	   [I-D.wijnands-bier-architecture]
1797	              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
1798	              S. Aldrin, "Multicast using Bit Index Explicit
1799	              Replication", draft-wijnands-bier-architecture-05 (work in
1800	              progress), March 2015.

1802	   [I-D.wijnands-mpls-bier-encapsulation]
1803	              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
1804	              S. Aldrin, "Encapsulation for Bit Index Explicit
1805	              Replication in MPLS Networks",
1806	              draft-wijnands-mpls-bier-encapsulation-02 (work in
1807	              progress), December 2014.

1809	   [I-D.xu-bier-encapsulation]
1810	              Xu, X., Somasundaram, S., Jacquenet, C., and R. Raszuk,
1811	              "BIER Encapsulation", draft-xu-bier-encapsulation-02 (work
1812	              in progress), February 2015.

1814	   [IEEE802.1Q-2014]
1815	              IEEE, "IEEE Standard for Local and metropolitan area
1816	              networks--Bridges and Bridged Networks", IEEE Std
1817	              802.1Q-2014, 2014,
1818	              <http://www.ieee802.org/1/pages/802.1Q-2014.html>.

1820	              (Access Controlled link within page)

1822	Authors' Addresses

1824	   Erik Nordmark
1825	   Arista Networks
1826	   5453 Great America Parkway
1827	   Santa Clara, CA 95054
1828	   USA

1830	   Email: nordmark@arista.com

1832	   Albert Tian
1833	   Ericsson Inc.
1834	   300 Holger Way
1835	   San Jose, California  95134
1836	   USA

1838	   Email: albert.tian@ericsson.com

1840	   Jesse Gross
1841	   VMware
1842	   3401 Hillview Ave.
1843	   Palo Alto, CA  94304
1844	   USA

1846	   Email: jgross@vmware.com

1847	   Jon Hudson
1848	   Brocade Communications Systems, Inc.
1849	   130 Holger Way
1850	   San Jose, CA  95134
1851	   USA

1853	   Email: jon.hudson@gmail.com

1855	   Lawrence Kreeger
1856	   Cisco Systems, Inc.
1857	   170 W. Tasman Avenue
1858	   San Jose, CA 95134
1859	   USA

1861	   Email: kreeger@cisco.com

1863	   Pankaj Garg
1864	   Microsoft
1865	   1 Microsoft Way
1866	   Redmond, WA  98052
1867	   USA

1869	   Email: pankajg@microsoft.com

1871	   Patricia Thaler
1872	   Broadcom Corporation
1873	   3151 Zanker Road
1874	   San Jose, CA 95134
1875	   USA

1877	   Email: pthaler@broadcom.com

1879	   Tom Herbert
1880	   Google
1881	   1600 Amphitheatre Parkway
1882	   Mountain View, CA
1883	   USA

1885	   Email: therbert@google.com