RTGWG                                   E. Nordmark (ed)
Internet-Draft                          Arista Networks
Intended status: Informational          A. Tian
Expires: September 22, 2016             Ericsson Inc.
                                        J. Gross
                                        VMware
                                        J. Hudson
                                        Brocade Communications Systems, Inc.
                                        L. Kreeger
                                        Cisco Systems, Inc.
                                        P. Garg
                                        Microsoft
                                        P. Thaler
                                        Broadcom Corporation
                                        T. Herbert
                                        Facebook
                                        March 21, 2016

                      Encapsulation Considerations
                      draft-ietf-rtgwg-dt-encap-01

Abstract

The IETF Routing Area director has chartered a design team to look at common issues for the different data plane encapsulations being discussed in the NVO3 and SFC working groups and in the BIER BoF, and to look at the relationship between such encapsulations in the case that they might be used at the same time.
The purpose of this design team is to discover, discuss, and document considerations across the different encapsulations in the different WGs/BoFs so that we can reduce the number of wheels that need to be reinvented in the future.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 22, 2016.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Design Team Charter
2. Overview
3. Common Issues
4. Scope
5. Assumptions
6. Terminology
7. Entropy
8. Next-protocol indication
9. MTU and Fragmentation
10. OAM
11. Security Considerations
   11.1. Encapsulation-specific considerations
   11.2. Virtual network isolation
   11.3. Packet level security
   11.4. In summary:
12. QoS
13. Congestion Considerations
14. Header Protection
15. Extensibility Considerations
16. Layering Considerations
17. Service model
18. Hardware Friendly
   18.1. Considerations for NIC offload
19. Middlebox Considerations
20. Related Work
21. Acknowledgements
22. Open Issues
23. Change Log
24. References
   24.1. Normative References
   24.2. Informative References
Authors' Addresses

1. Design Team Charter

There have been multiple efforts over the years that have resulted in new or modified data plane behaviors involving encapsulations. These include IETF efforts such as MPLS, LISP, and TRILL, but also industry efforts such as VXLAN and NVGRE. Collectively, these can be seen as a source of insight into the properties that data planes need to meet. The IETF is currently working on potentially new encapsulations in NVO3 and SFC and is considering working on BIER. In addition, there is work on tunneling in the INT area.

This is a short-term design team chartered to collect and construct useful advice for parties working on new or modified data plane behaviors that include additional encapsulations. The goal is for the group to document useful advice gathered from interacting with ongoing efforts. An Internet-Draft will be produced for IETF 92 to capture that advice, which will be discussed in RTGWG.

Data plane encapsulations face a set of common issues such as:

o  How to provide entropy for ECMP
o  Issues around packet size and fragmentation/reassembly
o  OAM - what support is needed in an encapsulation format?
o  Security and privacy
o  QoS
o  Congestion considerations
o  IPv6 header protection (zero UDP checksum over IPv6 issue)
o  Extensibility - e.g., for evolving OAM, security, and/or congestion control
o  Layering of multiple encapsulations, e.g., SFC over NVO3 over BIER

The design team will provide advice on those issues. The intention is that even where we have different encapsulations for different purposes carrying different information, each such encapsulation doesn't have to reinvent the wheel for the above common issues.

The design team will look across the routing area, in particular at SFC, NVO3, and BIER.
It will not be involved in comparing or analyzing any particular encapsulation formats proposed in those WGs and BoFs, but will instead focus on common advice.

2. Overview

The references provide background information on NVO3, SFC, and BIER. In particular, NVO3 is introduced in [RFC7364], [RFC7365], and [I-D.ietf-nvo3-arch]. SFC is introduced in [I-D.ietf-sfc-architecture] and [I-D.ietf-sfc-problem-statement]. Finally, information on BIER is in [I-D.shepherd-bier-problem-statement], [I-D.wijnands-bier-architecture], and [I-D.wijnands-mpls-bier-encapsulation]. We assume the reader has some basic familiarity with those proposed encapsulations. The Related Work section points to some prior work that relates to the encapsulation considerations in this document.

Encapsulation protocols typically have some unique information that they need to carry. In some cases that information might be modified along the path, and in other cases it is constant. In-flight modifications affect what it means to provide security for the encapsulation headers.

o  NVO3 carries a VNI ID edge to edge, which is not modified. There have been OAM discussions in the WG, and it isn't clear whether some of the OAM information might be modified in flight.
o  SFC carries Service Function Path identification and service meta-data. The meta-data might be modified as the packets follow the service path. SFC discusses a loop-avoidance mechanism that is likely to result in modifications at each hop in the service chain even if the meta-data is unmodified.
o  BIER carries a bitmap of egress ports to which a packet should be delivered, and as the packet is forwarded down different paths, different bits are cleared in that bitmap.

Even if information isn't modified in flight, there might be devices that wish to inspect that information.
For instance, one can envision future NVO3 security devices that filter based on the virtual network identifier.

The need for extensibility differs across the protocols:

o  NVO3 might need some extensions for OAM and security.
o  SFC consists of Service Function Path identification plus carrying service meta-data along a path, and different services might need different types and amounts of meta-data.
o  BIER might need a variable number of bits in its bitmaps, or other future schemes to scale up to larger networks.

The extensibility needs and constraints might be different when considering hardware vs. software implementations of the encapsulation headers. NIC hardware might have different constraints than switch hardware.

As the IETF designs these encapsulations, the different WGs solve the issues for their own encapsulation. But there are likely to be future cases when the different encapsulations are combined in the same packet. For instance, NVO3 might be a "transport" used to carry SFC between the different hops in the service chain.

Most of the issues discussed in this document are not new. The IETF and industry have specified and deployed many different encapsulation or tunneling protocols over time, ranging from simple IP-in-IP and GRE encapsulation, IPsec, pseudowires, and session-based approaches like L2TP, to the use of MPLS control and data planes. IEEE 802 has also defined layered encapsulation for Provider Backbone Bridges (PBB) and IEEE 802.1Qbp (ECMP). This document tries to leverage what we have collectively learned from that experience and summarize what would be relevant for new encapsulations like NVO3, SFC, and BIER.

3. Common Issues

[This section is mostly a repeat of the charter but with a few modifications and additions.]
Any new encapsulation protocol would need to address a large set of issues that are not central to the new information that this protocol intends to carry. The common issues explored in this document are:

o  How to provide entropy for Equal Cost MultiPath (ECMP) routing
o  Issues around packet size and fragmentation/reassembly
o  Next-header indication - each encapsulation might be able to carry different payloads
o  OAM - what support is needed in an encapsulation format?
o  Security and privacy
o  QoS
o  Congestion considerations
o  Header protection
o  Extensibility - e.g., for evolving OAM, security, and/or congestion control
o  Layering of multiple encapsulations, e.g., SFC over NVO3 over BIER
o  Importance of being friendly to hardware and software implementations

The degree to which these common issues apply to a particular encapsulation can differ based on the intended purpose of the encapsulation. But it is useful to understand all of them before determining which ones apply.

4. Scope

It is important to keep in mind what we are trying to cover and not cover in this document and effort. This is:

o  A look across the three new encapsulations, while taking lots of previous work into account
o  A focus on the class of encapsulations that would run over IP/UDP. That was done to avoid being distracted by the data-plane and control-plane interaction, which is more significant for protocols that are designed to run over "transports" that maintain session or path state.
o  We later expanded the scope somewhat to consider how the encapsulations would play with MPLS "transport", which is important because SFC and BIER seem to target being independent of the underlying "transport".

However, this document and effort is NOT intended to:

o  Design some new encapsulation header to rule them all
o  Design yet another new NVO3 encapsulation header
o  Try to select the best encapsulation header
o  Evaluate any existing and proposed encapsulations

While the origin and focus of this document is the routing area, and in particular NVO3, SFC, and BIER, the considerations apply to other encapsulations that are being defined in the IETF and elsewhere. There seems to be an increase in the number of encapsulations being defined to run over UDP where an encapsulation over IP or Ethernet might already exist. Feedback on how these considerations apply in those contexts is welcome.

5. Assumptions

The design center for the new encapsulations is a well-managed network. That network can be a datacenter network (plus datacenter interconnect) or a service provider network. Based on the existing and proposed encapsulations in those environments, it is reasonable to make these assumptions:

o  The MTU is carefully managed and configured. Hence an encapsulation protocol can make the packets bigger without resulting in a requirement for fragmentation and reassembly between ingress and egress. (However, it might be useful to be able to detect MTU misconfigurations.)
o  In general, an encapsulation needs some approach for congestion management. But the assumptions are different than for arbitrary Internet paths, in that the underlay might be well-provisioned and better policed at the edge, and, due to multi-tenancy, the congestion control in the endpoints might be even less trusted than on the Internet at large.
The goal is to implement these encapsulations in both hardware and software; hence we can't assume that the needs of either implementation approach can trump the needs of the other. In particular, the needs and constraints around extensibility might be quite different.

6. Terminology

The capitalized keyword MUST is used as defined in http://en.wikipedia.org/wiki/Julmust

TBD: Refer to existing documents for at least NVO3 and SFC terminology. We use at least the VNI ID in this document.

7. Entropy

In many cases the encapsulation format needs to enable ECMP in unmodified routers. Those routers might use different fields in TCP/UDP packets to do ECMP without a risk of reordering a flow. Note that the same entropy might also be used at layer 2, e.g., for Link Aggregation (LAG).

The common way to do ECMP-enabled encapsulation over IP today is to add a UDP header and use the UDP source port to carry entropy from the inner/original packet headers, as in LISP [RFC6830]. The total entropy consists of 14 bits in the UDP source port (using the ephemeral port range) plus the outer IP addresses, which seems to be sufficient; using outer IPv6 headers would give the option for more entropy should it be needed in the future.

In some environments it might be fine to use all 16 bits of the port range. However, middleboxes might make assumptions about the System Ports or User Ports. But they should not make any assumptions about the ports in the Dynamic and/or Private Port range, which have the two MSBs set to 11b.

The UDP source port might change over the lifetime of an encapsulated flow, for instance for DoS mitigation or re-balancing load across ECMP. Such changes need to consider reordering if there are packets in flight for the flow.

There is some interaction between entropy and the OAM and extensibility mechanisms.
It is desirable for OAM packets to follow the same path as data packets. Hence OAM packets should use the same entropy mechanism as data packets. While routers might use information in addition to the entropy field and outer IP header, they cannot use arbitrary parts of the encapsulation header, since that might result in OAM frames taking a different path. Likewise, if routers look past the encapsulation header, they need to be aware of the extensibility mechanism(s) in the encapsulation format to be able to find the inner headers in the presence of extensions; OAM frames might use some extensions, e.g., for timestamps.

Architecturally, the entropy and the next-header field are really part of the enclosing delivery header. UDP with entropy goes hand in hand with the outer IP header. Thus the UDP entropy is present for the underlay IP routers the same way that an MPLS entropy label is present for LSRs. The entropy above is all about providing entropy for the outer delivery of the encapsulated packets.

It has been suggested that when IPv6 is used it would not be necessary to add a UDP header for entropy, since the IPv6 flow label can be used for entropy. (This assumes that there is an IP protocol number for the encapsulation in addition to a UDP destination port number, since UDP would still be used with an IPv4 underlay. And any use of UDP checksums would need to be replaced by an encapsulation-specific checksum or secure hash.) While such an approach would save 8 bytes of headers when the underlay is IPv6, it does assume that the underlay routers use the flow label for ECMP, and it would also make the IPv6 approach different from the IPv4 approach. Currently the leaning is towards recommending the UDP encapsulation for both IPv4 and IPv6 underlays. The IPv6 flow label can be used for additional entropy if need be.
There is a more detailed discussion of using the IPv6 flow label for tunnels in [RFC6438].

Note that in the proposed BIER encapsulation [I-D.wijnands-mpls-bier-encapsulation], there is an 8-bit field that specifies an entropy value that can be used for load-balancing purposes. This entropy is for the BIER forwarding decisions, which are independent of any outer delivery ECMP between BIER routers. Thus it is not part of the delivery ECMP discussed in this section.

[Note: For any given bit in BIER (that identifies an exit from the BIER domain) there might be multiple immediate next hops. The BIER entropy field is used to select that next hop as part of BIER processing. The BIER forwarding process may do equal-cost load balancing, but the load-balancing procedure MUST choose the same path for any two packets that have the same entropy value.]

In summary:

o  The entropy is associated with the transport, that is, an outer IP header or MPLS.
o  In the case of IP transport, use 14 or 16 bits of the UDP source port, plus the outer IPv6 flow label, for entropy.

8. Next-protocol indication

Next-protocol indications appear in three different contexts for encapsulations.

Firstly, the transport delivery mechanism for the encapsulations we discuss in this document needs some way to indicate which encapsulation header (or other payload) comes next in the packet. Some encapsulations might be identified by a UDP port; others might be identified by an Ethernet type or IP protocol number. Which approach is used is a function of the preceding header, the same way that IPv4 is identified by both an Ethernet type and an IP protocol number (for IP-in-IP). In some cases the header type is implicit in some session (L2TP) or path (MPLS) setup. But this is largely beyond the control of the encapsulation protocol.
For instance, if there is a requirement to carry the encapsulation after an Ethernet header, then an Ethernet type is needed. If it is required to be carried after an IP/UDP header, then a UDP port number is needed. For UDP port numbers there are considerations for port number conservation described in [I-D.ietf-tsvwg-port-use].

It is worth mentioning that in the MPLS case, where the label stack carries no protocol type field, many forwarding devices peek at the first nibble of the payload to determine whether to apply IPv4 or IPv6 L3/L4 hashes for load balancing [RFC7325]. That behavior places some constraints on other payloads carried over MPLS, and some protocols define an initial control word in the payload with a value of zero in its first nibble [RFC4385] to avoid confusion with IPv4 and IPv6 payload headers.

Secondly, the encapsulation needs to indicate the type of its payload, which is in scope for the design of the encapsulation. We have existing protocols that use Ethernet types (such as GRE). Here each encapsulation header can potentially make its own choice between:

o  Use the Ethernet type space - makes it easy to carry existing L2 and L3 protocols, including IPv4, IPv6, and Ethernet. Disadvantages are that it is a 16-bit number while we probably need far fewer than 100 values, and the number space is controlled by the IEEE 802 RAC with its own allocation policies.
o  Use the IP protocol number space - makes it easy to carry, e.g., ESP in addition to IP and Ethernet, but brings in all existing protocol numbers, many of which would never be used directly on top of the encapsulation protocol. These are IANA-managed eight-bit values, and it is presumably more difficult to get an assigned number than to get a transport port assignment.
o  Define its own next-protocol number space, which can use fewer bits than an Ethernet type and give more flexibility, but at the cost of administering that number space (presumably by the IANA).

Thirdly, if the IETF ends up defining multiple encapsulations at about the same time, and there is some chance that multiple such encapsulations can be combined in the same packet, there is a question of whether it makes sense to use a common approach and numbering space for the encapsulation across the different protocols. A common approach might not be beneficial as long as there is only one way to indicate, e.g., SFC inside NVO3.

Many Internet protocols use fixed values (typically managed by the IANA function) for their next-protocol field. That facilitates interpretation of packets by middleboxes and for purposes such as debugging, but might make protocol evolution inflexible. Our collective experience with MPLS shows an alternative, where the label can be viewed as an index into a table containing processing instructions, and the table content can be managed in different ways. Encapsulations might want to consider the tradeoffs between such more flexible and more fixed approaches.

In summary:

o  Would it be useful for the IETF to come up with a common scheme for encapsulation protocols? If not, each encapsulation can define its own scheme.

9. MTU and Fragmentation

A common approach today is to assume that the underlay has sufficient MTU to carry the encapsulated packets without any fragmentation and reassembly at the tunnel endpoints. That is sufficient when the operator of the ingress and egress has full control of the paths between those endpoints. It also makes for simpler (hardware) implementations if fragmentation and reassembly can be avoided.
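As a rough illustration of what "sufficient MTU" means when configuring an underlay, the sketch below computes the smallest underlay IP MTU that carries a full-size inner packet without fragmentation. It is a minimal sketch under stated assumptions: the 8-byte encapsulation header size is hypothetical and does not describe any particular NVO3, SFC, or BIER format; the outer headers carry no IPv4 options or IPv6 extension headers; and for Ethernet service the inner frame is assumed to be carried untagged and without its FCS.

```python
# Header sizes in bytes. ENCAP_HEADER is a hypothetical value chosen for
# illustration; substitute the real size (including any options or
# extensions) of the encapsulation being configured.
OUTER_IPV4 = 20      # IPv4 header without options
OUTER_IPV6 = 40      # fixed IPv6 header, no extension headers
UDP_HEADER = 8
ENCAP_HEADER = 8     # hypothetical encapsulation header size
INNER_ETHERNET = 14  # untagged inner Ethernet header; inner FCS assumed not carried


def required_underlay_mtu(inner_ip_mtu: int, outer_ipv6: bool = False,
                          ethernet_service: bool = True,
                          encap_len: int = ENCAP_HEADER) -> int:
    """Smallest underlay IP MTU that carries a full-size inner packet unfragmented."""
    overhead = (OUTER_IPV6 if outer_ipv6 else OUTER_IPV4) + UDP_HEADER + encap_len
    if ethernet_service:
        overhead += INNER_ETHERNET
    return inner_ip_mtu + overhead


def fits_without_fragmentation(underlay_mtu: int, inner_ip_mtu: int, **kwargs) -> bool:
    """True if the configured underlay MTU avoids fragmentation/reassembly."""
    return underlay_mtu >= required_underlay_mtu(inner_ip_mtu, **kwargs)


if __name__ == "__main__":
    print(required_underlay_mtu(1500))                   # 1550
    print(required_underlay_mtu(1500, outer_ipv6=True))  # 1570
    print(fits_without_fragmentation(1500, 1500))        # False
    print(fits_without_fragmentation(9000, 1500))        # True
```

Under these assumptions a 1500-byte inner MTU needs a 1550-byte underlay MTU over IPv4 and 1570 bytes over IPv6; any options or extensions carried in the encapsulation header add to that overhead and have to be budgeted for as well.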
However, even under that assumption it would be beneficial to be able to detect when some misconfiguration causes packets to be dropped due to MTU issues. One way to do this is for the encapsulator to set the Don't Fragment (DF) flag in the outer IPv4 header and to log any received ICMP "packet too big" (PTB) errors. Note that no flag needs to be set in an outer IPv6 header [RFC2460].

Encapsulations could also define an optional tunnel fragmentation and reassembly mechanism, which would be useful when the operator doesn't have full control of the path, or when the protocol gets deployed outside of its original intended context. Such a mechanism would be required if the underlay might have a path MTU that makes it impossible to carry at least 1518 bytes (if offering Ethernet service), or at least 1280 bytes (if offering IPv6 service). The use of such a protocol mechanism could be triggered by receiving a PTB. But such a mechanism might not be implemented by all encapsulators and decapsulators. [Aerolink is one example of such a protocol.]

Depending on the payload carried by the encapsulation, there are some additional possibilities:

o  If the payload is IPv4/IPv6, then the underlay path MTU could be used to report the end-to-end path MTU.
o  If the payload service is Ethernet/L2, then there is no such per-destination reporting mechanism. However, there is an LLDP TLV for reporting the maximum frame size; it might be useful to report the minimum to end stations, but unmodified end stations would do nothing with that TLV since they assume that the MTU is at least 1518.

In summary:

o  In some deployments an encapsulation can assume a well-managed MTU, hence there is no need for fragmentation and reassembly related to the encapsulation.
o  Even so, it makes sense for the ingress to track any ICMP packet too big messages addressed to it, in order to log any MTU misconfigurations.
o  Should an encapsulation protocol be deployed outside of its original context, it might very well need support for fragmentation and reassembly.

10. OAM

The OAM area is seeing active development in the IETF, with discussions (at least) in the NVO3 and SFC working groups, plus the new LIME WG looking at architecture and YANG models.

The design team has taken a narrow view of OAM to explore the potential OAM implications for the encapsulation format.

In terms of what we have heard from the various working groups, there seem to be needs to:

o  Be able to send out-of-band OAM messages that potentially should follow the same path through the network as some flow of data packets.
   *  Such OAM messages should not accidentally be decapsulated and forwarded to the end stations.
o  Be able to add OAM information to data packets that are encapsulated. Discussions have been around:
   *  Using a bit in the encapsulation header to synchronize sampling of counters between the encapsulator and decapsulator.
   *  Optional timestamps, sequence numbers, etc., for more detailed measurements between encapsulator and decapsulator.
o  Be usable for both proactive monitoring (akin to BFD) and reactive checks (akin to traceroute to pinpoint a failure).

To ensure that the OAM messages can follow the same path, the OAM messages need to get the same ECMP (and LAG hashing) results as a given data flow. An encapsulator can choose one of the following:

o  Limit ECMP hashing to not look past the UDP header, i.e., the entropy needs to be in the source/destination IP addresses and UDP ports.
o  Make OAM packets look the same as data packets, i.e., the initial part of the OAM payload has the inner Ethernet, IP, TCP/UDP headers as a payload.
   (This approach was taken in TRILL out of necessity, since there is no UDP header.) Any OAM bit in the encapsulation header must in any case be excluded from the entropy.

There are several ways to prevent OAM packets from accidentally being forwarded to the end station:

o  A bit in the frame (as in TRILL) indicating OAM
o  A next-protocol indication with a designated value for "none" or "oam"

This assumes that the bit or next protocol, respectively, would not affect entropy/ECMP in the underlay. However, the next-protocol field might be used to provide differentiated treatment of packets based on their payload; for instance, a TCP vs. an IPsec ESP payload might be handled differently. Based on that observation, it might be undesirable to overload the next protocol with the OAM drop behavior, resulting in a preference for having a bit to indicate whether the packet should be forwarded to the end station after decapsulation.

There have been suggestions that one (or more) marker bits in the encapsulation header would be useful in order to delineate measurement epochs on the encapsulator and decapsulator, and to use that to compare counters to determine packet loss.

A result of the above is that OAM is likely to evolve and needs some degree of extensibility from the encapsulation format: a bit or two plus the ability to define additional larger extensions.

An open question is how to handle error messages or other reports relating to OAM. One can think of such reporting as being associated with the encapsulation the same way ICMP is associated with IP. Would it make sense for the IETF to develop a common Encapsulation Error Reporting Protocol as part of OAM, which can be used for different encapsulations? And if so, what are the technical challenges? For instance, how can it avoid being filtered, as ICMP often is?
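The interplay between entropy and OAM described in Sections 7 and 10 can be sketched as follows. This is a hypothetical illustration, not part of any encapsulation discussed here: the header fields (`vni`, `oam`), the function names, and the use of CRC-32 as the hash are all assumptions. The encapsulator derives 14 bits of entropy from the inner flow and places them in the Dynamic/Private port range (two MSBs set to 11b). The OAM bit is not a hash input, so an OAM packet for a flow gets the same outer UDP source port, and hence the same ECMP/LAG path, as that flow's data packets. The decapsulator consumes packets with the OAM bit set instead of forwarding them to the end station.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

DYNAMIC_PORT_BASE = 0xC000  # 49152: two MSBs set to 11b (Dynamic/Private Ports)
ENTROPY_MASK = 0x3FFF       # low 14 bits carry the entropy


@dataclass(frozen=True)
class InnerFlow:
    """Inner (original) 5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class EncapHeader:
    """Hypothetical encapsulation header with only the fields used here."""
    vni: int            # virtual network identifier (illustrative)
    oam: bool = False   # hypothetical "drop after decapsulation" bit


def udp_source_port(flow: InnerFlow) -> int:
    """Outer UDP source port derived only from the inner flow.

    The OAM bit (and the rest of the encapsulation header) is deliberately
    not an input, so OAM and data packets of one flow hash identically.
    """
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.protocol}|{flow.src_port}|{flow.dst_port}"
    return DYNAMIC_PORT_BASE | (zlib.crc32(key.encode()) & ENTROPY_MASK)


def decapsulate(hdr: EncapHeader, payload: bytes) -> Optional[bytes]:
    """Return the payload to forward to the end station, or None for OAM."""
    if hdr.oam:
        # Consumed by the decapsulator (e.g., to update OAM counters);
        # never delivered to the end station.
        return None
    return payload


if __name__ == "__main__":
    flow = InnerFlow("10.0.0.1", "10.0.0.2", 6, 40000, 443)
    port = udp_source_port(flow)
    print(49152 <= port <= 65535)                                   # True
    print(decapsulate(EncapHeader(vni=5001, oam=True), b"probe"))   # None
    print(decapsulate(EncapHeader(vni=5001), b"data"))              # b'data'
```

A real encapsulator might also fold a per-node seed into the hash; what matters for the property above is that underlay routers do not hash on anything past the UDP header, such as the OAM bit.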
A potential additional consideration for OAM is the possible future existence of gateways that "stitch" together different data plane encapsulations and might want to carry OAM end-to-end across the different encapsulations.

In summary:

o  It makes sense to reserve a bit for "drop after decapsulation" for out-of-band OAM.
o  An encapsulation needs sufficient extensibility for OAM (such as bits, timestamps, sequence numbers). That might be motivated by in-band OAM, but it would make sense to leverage the same extensions for out-of-band OAM.
o  OAM places some constraints on the use of entropy in forwarding devices.
o  Should the IETF look into error reporting that is independent of the specific encapsulation?

11. Security Considerations

Different encapsulation use cases will have different requirements around security. For instance, when encapsulation is used to build overlay networks for network virtualization, isolation between virtual networks may be paramount. BIER support of multicast may entail different security requirements than encapsulation for unicast.

In real deployments, the security of the underlying network may be considered in determining the level of security needed in the encapsulation layer. However, for the purposes of this discussion, we assume that network security is out of scope and that the underlying network does not itself provide adequate, or at least uniform, security mechanisms for encapsulation.

There are at least three considerations for security:

o  Anti-spoofing/virtual network isolation
o  Interaction with packet-level security such as IPsec or DTLS
o  Privacy (e.g., VNI ID confidentiality for NVO3)

This section uses a VNI ID in NVO3 as an example. An SFC or BIER encapsulation is likely to have fields with similar security and privacy requirements.

11.1. Encapsulation-specific considerations

Some of these considerations appear for a new encapsulation, and others are more specific to network virtualization in datacenters.

o  New attack vectors:
   *  DDoS on specific queues/paths by attempting to reproduce the 5-tuple hash for targeted connections.
   *  Entropy in the outer 5-tuple may be too little or too predictable.
   *  Leakage of identifying information in the encapsulation header for an encrypted payload.
   *  Vulnerabilities of using global values in fields like the VNI ID.
o  Trusted versus untrusted tenants in network virtualization:
   *  The criticality of virtual network isolation depends on whether tenants are trusted or untrusted. In the most extreme cases, tenants might not only be untrusted but may be considered hostile.
   *  For a trusted set of users (e.g., a private cloud) it may be sufficient to have just a virtual network identifier to provide isolation. Packets inadvertently crossing virtual networks should be dropped, similar to a TCP packet with a corrupted port being received on the wrong connection.
   *  In the presence of untrusted users (e.g., a public cloud) the virtual network identifier must be adequately protected against corruption and verified for integrity. This case may warrant keyed integrity.
o  Different forms of isolation:
   *  Isolation could mean blocking all traffic between tenants (or all traffic except as allowed by some firewall).
   *  It could also mean performance isolation, i.e., preventing one tenant from overloading the network in a way that affects other tenants.
   *  Physical isolation of traffic for different tenants in the network may be required, as well as restrictions that tenants may have on where their packets may be routed.
o New attack vectors from untrusted tenants:
   * Third-party VMs with untrusted tenants allow internally borne attacks within data centers.
   * Hostile VMs inside the system may exist (e.g., public cloud).
   * Internally launched DDoS.
   * Passive snooping for misdelivered packets.
   * Mitigating damage, and detection, in the event that a VM is able to circumvent isolation mechanisms.
o Tenant-provider relationship:
   * The tenant might not trust the provider, hypervisors, or network.
   * The provider will likely need to provide an SLA or at least a statement on security.
   * The tenant may implement its own additional layers of security.
   * Regulation and certification considerations.
o Trend towards tighter security:
   * Tenants' data in the network increases in volume and value, and attacks become more sophisticated.
   * Large DCs already encrypt everything on disk.
   * DCs are likely to encrypt inter-DC traffic at this point, and to use TLS to the Internet.
   * Encryption within the DC is becoming more commonplace, and will become ubiquitous when its cost is low enough.
   * Cost/performance considerations. The cost of supporting strong security has made strong network security in DCs prohibitive.
   * Are there lessons from MACsec?

11.2. Virtual network isolation

The first requirement is isolation between virtual networks. Packets sent in one virtual network should never be illegitimately received by a node in another virtual network. Isolation should be protected in the presence of malicious attacks or inadvertent packet corruption.

The second requirement is sender authentication. Sender identity is authenticated to prevent spoofing: even if an attacker has access to the packets in the network, they cannot send packets into a virtual network. There are two possibilities:

o Pairwise sender authentication. Any two communicating hosts negotiate a shared key.
o Group authentication. A group of hosts share a key (this may be more appropriate for multicast encapsulation).

Possible security solutions:

o Security cookie: This is similar to the L2TP cookie mechanism [RFC3931]. A plain text cookie is shared between encapsulator and decapsulator. A receiver validates a packet by evaluating whether the cookie is correct for the virtual network and address of the sender. The validation function is F(cookie, VNI ID, source address). If the cookie matches, the packet is accepted; otherwise it is dropped. Since the cookie is plain text, this method does not protect against eavesdropping. Cookies are set, and may be rotated, out of band.
o Secure hash: This is a stronger mechanism than simple cookies that borrows from IPsec and PPP authentication methods. In this model the security field contains a secure hash, computed with a shared key, over some fields in the packet. The hash function may be something like H(key, VNI ID, address, salt). The salt ensures the hash is not the same for every packet, and if it includes a sequence number it may also protect against replay attacks.

In any use of a shared key, periodic re-keying should be allowed. This could include the use of techniques like generation numbers, key windows, etc. See [I-D.farrelll-mpls-opportunistic-encrypt] for an example application.

We might see firewalls that are aware of the encapsulation and can provide some defense in depth combined with the above example anti-spoofing approaches. An example would be an NVO3-aware firewall being able to check the VNI ID.

Separately and in addition to such filtering, there might be a desire to completely block an encapsulation protocol at certain places in the network, e.g., at the edge of a datacenter. Using a fixed standard UDP destination port number for each encapsulation protocol would facilitate such blocking.

11.3. Packet level security

An encapsulated packet may itself be encapsulated in IPsec (e.g., ESP). This should be straightforward and is in fact what would happen today in security gateways. In this case, there is no special consideration for the fact that the packet is encapsulated; however, since the encapsulation layer headers are included (as part of the encrypted data, for instance), the network loses visibility of the encapsulation.

The more interesting case is when security is applied to the encapsulation payload. This keeps the encapsulation headers in the outer header visible to the network (for instance, in NVO3 we may want to firewall based on the VNI ID even if the payload is encrypted). One possibility is to apply DTLS to the encapsulation payload. In this model the protocol stack may be something like IP|UDP|Encap|DTLS|encrypted_payload. The encapsulation and security should be applied together at an encapsulator and resolved at the decapsulator. Since the encapsulation header is outside of the security coverage, it may itself require security (as described above).

In both of the above cases, the security associations (SAs) may be between physical hosts, so for instance in NVO3 we can have packets of different virtual networks using the same SA. This should not be an issue, since it is the VNI ID that ensures isolation (and the VNI ID needs to be secured as well).

11.4. In summary:

o Encapsulations need extensibility mechanisms to be able to add security features like cookies and secure hashes protecting the encapsulation header.
o NVO3 probably has specific, higher requirements relating to isolation for network virtualization, which are in scope for the NVO3 WG.
o Our collective IETF experience is that successful protocols get deployed outside of the originally intended context, hence the initial assumptions about the threat model might become invalid. That needs to be considered in the standardization of new encapsulations.

12. QoS

In the Internet architecture we support QoS using the Differentiated Services Code Point (DSCP) in the formerly named Type-of-Service field in the IPv4 header, and in the Traffic Class field in the IPv6 header. The ToS and TC fields also contain the two ECN bits, which are discussed in Section 13.

There are existing specifications for how to process those bits. See [RFC2983] for diffserv handling, which specifies how the received DSCP value is used to set the DSCP value in an outer IP header when encapsulating. (There are also existing specifications for how DSCP can be mapped to layer 2 priorities.)

Those specifications apply whether or not there are intervening headers (e.g., for NVO3 or SFC) between the inner and outer IP headers. Thus the encapsulation considerations in this area are mainly about applying the framework in [RFC2983].

Note that the DSCP and ECN bits are not the only part of an inner packet that might potentially affect the outer packet. For example, [RFC2473] specifies handling of inner IPv6 hop-by-hop options that effectively results in copying some options to the outer header. It is simpler not to have future encapsulations depend on such copying behavior.

There are some other considerations specific to doing OAM for encapsulations. If OAM messages are used to measure latency, it would make sense to treat them the same as data payloads. Thus they need to have the same outer DSCP value as the data packets which they are intended to measure.

Due to OAM, there are constraints on middleboxes in general. If middleboxes inspect the packet past the outer IP+UDP and encapsulation header and look for inner IP and TCP/UDP headers, that might violate the assumption that OAM packets will be handled the same as regular data packets.
That issue is broader than just QoS; it also applies to firewall filters, etc.

In summary:

o Leverage the existing approach in [RFC2983] for DSCP handling.

13. Congestion Considerations

Additional encapsulation headers do not introduce anything new for Explicit Congestion Notification. They are just like IP-in-IP and IPsec tunnels, for which [RFC6040] specifies how the ECN bits in the inner and outer headers are handled when encapsulating and decapsulating packets. Thus new encapsulations can more or less include that by reference.

There are additional considerations around carrying non-congestion-controlled traffic. These details have been worked out in [I-D.ietf-mpls-in-udp]. As specified in [RFC5405]: "IP-based traffic is generally assumed to be congestion-controlled, i.e., it is assumed that the transport protocols generating IP-based traffic at the sender already employ mechanisms that are sufficient to address congestion on the path. Consequently, a tunnel carrying IP-based traffic should already interact appropriately with other traffic sharing the path, and specific congestion control mechanisms for the tunnel are not necessary". Those considerations are being captured in [I-D.ietf-tsvwg-rfc5405bis].

For this reason, where an encapsulation method is used to carry IP traffic that is known to be congestion controlled, the UDP tunnels do not create an additional need for congestion control. Internet IP traffic is generally assumed to be congestion-controlled. Similarly, Layer 3 VPNs in general carry IP traffic that is assumed to be congestion controlled.

However, some of the encapsulations (at least NVO3) will be able to carry arbitrary Layer 2 packets to provide an L2 service, in which case one cannot assume that the traffic is congestion controlled.
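The [RFC6040] ECN handling that new encapsulations can include by reference can be sketched as a pair of functions. This is an illustrative transcription of the default ingress and egress behaviour in that RFC, not code from it; the function names and the use of None to mean "drop" are our own choices, and the RFC's advice to log combinations it considers currently unused is omitted.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """ECN field codepoints (two bits in the IPv4 ToS / IPv6 Traffic Class)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def encapsulate(inner: ECN, compatibility: bool = False) -> ECN:
    """Outer ECN at tunnel ingress: copied in normal mode, Not-ECT in compatibility mode."""
    return ECN.NOT_ECT if compatibility else inner


def decapsulate(inner: ECN, outer: ECN) -> Optional[ECN]:
    """ECN to forward at tunnel egress; None means the packet must be dropped."""
    if inner == ECN.NOT_ECT:
        # The inner transport cannot understand CE, so congestion signalled on
        # the outer header can only be conveyed by dropping the packet.
        return None if outer == ECN.CE else ECN.NOT_ECT
    if inner == ECN.CE or outer == ECN.CE:
        return ECN.CE
    if outer == ECN.ECT1:
        return ECN.ECT1
    # Outer Not-ECT or ECT(0): the inner value is forwarded unchanged.
    return inner
```

For example, decapsulate(ECN.ECT0, ECN.CE) yields ECN.CE, so congestion experienced in the underlay reaches the tenant's transport, while the same outer CE on a Not-ECT inner packet turns into a drop.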
One could handle this by adding some congestion control support to the encapsulation header (one instance of which would end up looking like DCCP). However, if the underlay is well-provisioned and managed, as opposed to being an arbitrary Internet path, it might be sufficient to have a slower reaction to congestion induced by that traffic. There is work underway on a notion of "circuit breakers" for this purpose; see [I-D.ietf-tsvwg-circuit-breaker]. Encapsulations which carry arbitrary Layer 2 packets should consider that ongoing work.

If the underlay is provisioned in such a way that it can guarantee sufficient capacity for non-congestion-controlled Layer 2 traffic, then such circuit breakers might not be needed.

Two other considerations appear in the context of these encapsulations as applied to overlay networks:

o Protect against malicious end stations
o Ensure fairness and/or measure resource usage across multiple tenants

Those issues are really orthogonal to the encapsulation, in that they are present even when no new encapsulation header is in use. However, the new encapsulations are likely to be applied in environments where those issues are becoming more important. Hence it makes sense to consider them.

One could make the encapsulation header extensible so that it can carry sufficient information to measure resource usage, delays, and congestion. The suggestions in the OAM section about a single bit for counter synchronization, and optional timestamps and/or sequence numbers, could be part of such an approach. There might also be additional congestion-control extensions to be carried in the encapsulation. Overall this results in a consideration to support sufficient extensibility in the encapsulation to handle potential future developments in this space.
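A circuit breaker of the slow-reacting kind described above might look roughly like the sketch below. It is not the algorithm from [I-D.ietf-tsvwg-circuit-breaker]; the class name, the 10% loss threshold and the three-interval trip rule are hypothetical, and it assumes the egress reports per-interval receive counters back to the ingress.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TunnelCircuitBreaker:
    """Illustrative ingress circuit breaker; all thresholds are hypothetical."""

    loss_threshold: float = 0.10   # loss fraction that marks an interval as bad
    trip_after: int = 3            # consecutive bad intervals before tripping
    bad_intervals: int = 0
    tripped: bool = False

    def report_interval(self, sent: int, received: int) -> bool:
        """Feed counters for one measurement interval.

        `sent` is counted at the ingress, `received` is reported back by the
        egress. Returns True while the tunnel may keep forwarding.
        """
        if self.tripped:
            return False  # stays tripped until reset out of band (not shown)
        loss = (sent - received) / sent if sent else 0.0
        self.bad_intervals = self.bad_intervals + 1 if loss > self.loss_threshold else 0
        if self.bad_intervals >= self.trip_after:
            self.tripped = True
        return not self.tripped


cb = TunnelCircuitBreaker()
samples: List[Tuple[int, int]] = [(1000, 990), (1000, 700), (1000, 600), (1000, 650), (1000, 1000)]
verdicts = [cb.report_interval(s, r) for s, r in samples]
print(verdicts)  # [True, True, True, False, False]
```

Requiring several consecutive bad intervals is what makes the reaction deliberately slow: short bursts of loss are tolerated, and only persistent congestion stops the non-congestion-controlled traffic.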
Coarse measurements are likely to suffice, at least for circuit-breaker-like purposes; see [I-D.wei-tsvwg-tunnel-congestion-feedback] and [I-D.briscoe-conex-data-centre] for examples of active work in this area using ECN. [RFC6040] Appendix C is also relevant. The outer ECN bits seem sufficient (at least when everything uses ECN) to do these coarse measurements. The case where there are also drops needs more study; it might be necessary to exchange counters between ingress and egress to account for drops.

When the goal is to provide a predictable service to different tenants whose traffic uses different congestion control, circuit breakers are not sufficient. The fallback would be to rate limit the different traffic.

In summary:

o Leverage the existing approach in [RFC6040] for ECN handling.
o If the encapsulation can carry non-IP, hence non-congestion-controlled, traffic, then leverage the approach in [I-D.ietf-mpls-in-udp].
o "Watch this space" for circuit breakers.

14. Header Protection

Many UDP-based encapsulations such as VXLAN [RFC7348] either discourage or explicitly disallow the use of UDP checksums. The reason is that the UDP checksum covers the entire payload of the packet, and switching ASICs are typically optimized to look at only a small set of headers as the packet passes through the switch. In these cases, computing a checksum over the packet is very expensive. (Software endpoints and the NICs used with them generally do not have the same issue, as they need to look at the entire packet anyway.)

The lack of a header checksum creates the possibility that bit errors can be introduced into any information carried by the new headers. Specifically, in the case of IPv6, the assumption is that a transport layer checksum (UDP in this case) will protect the IP addresses through the inclusion of a pseudo-header in the calculation.
This is different from IPv4, which has its own header checksum and is the protocol over which many of these encapsulations are initially deployed. In addition to IP addresses, the encapsulation header often contains its own information which is used for addressing packets or other high-value network functions. Without a checksum, this information is potentially vulnerable, an issue regardless of whether the packet is carried over IPv4 or IPv6.

Several protocols cite [RFC6935] and [RFC6936] as an exemption to the IPv6 checksum requirements. However, these are intended to be tailored to a fairly narrow set of circumstances, primarily relying on sparseness of the address space to detect invalid values and on well-managed networks, and are not a one-size-fits-all solution. In these cases, an analysis should be performed of the intended environment, including the probability of errors being introduced and the use of ECC memory in routing equipment.

Conceptually, the ideal solution to this problem is a checksum that covers only the newly added headers of interest. There is little value in the portion of the UDP checksum that covers the encapsulated packet, because that would generally be protected by other checksums, and it is the expensive portion to compute. In fact, this solution already exists in the form of UDP-Lite, and UDP-based encapsulations could easily be ported to run on top of it. Unfortunately, the main value in using UDP as part of the encapsulation header is that it is recognized by already deployed equipment for the purposes of ECMP, RSS, and middlebox operations. As UDP-Lite uses a different protocol number than UDP and is not widely implemented in middleboxes, this value is lost.
A possible solution is to incorporate the same partial-checksum concept as UDP-Lite, or other header checksum protection, into the encapsulation header and continue using UDP as the outer protocol. One potential challenge with this approach is that the use of NAT or another form of translation on the outer header will result in an invalid checksum, as the translator will not know to update the encapsulation header.

The method chosen to protect headers is often related to the security needs of the encapsulation mechanism. On one hand, the impact of a poorly protected header is not limited to data corruption; it can also introduce a security vulnerability in the form of packets misdirected to an unauthorized recipient. Conversely, high-security protocols that already include a secure hash over the valuable portion of the header (such as by encrypting the entire IP packet using IPsec, or some secure hash of the encapsulation header) do not require additional checksum protection, as the hash provides stronger assurance than a simple checksum.

If the sender has included a checksum, then the receiver should verify that checksum or, if incapable, drop the packet. The assumption is that configuration and/or control-plane capability exchanges can be used when different receivers have different checksum validation capabilities.

In summary:

o Encapsulations need extensibility to be able to add a checksum/CRC for the encapsulation header itself.
o When the encapsulation has a checksum/CRC, include the IPv6 pseudo-header in it.
o The checksum/CRC can potentially be avoided when cryptographic protection is applied to the encapsulation.

15. Extensibility Considerations

Protocol extensibility is the concept that a networking protocol may be extended to include new use cases or functionality that were not part of the original protocol specification.
Extensibility may be used to add security, control, management, or performance features to a protocol. A solution may allow private extensions for customization or experimentation.

Extending a protocol often implies that a protocol header must carry new information. There are two usual methods to accomplish this:

1. Define or redefine the meaning of existing fields in a protocol header.
2. Add new (optional) fields to the protocol header.

It is also possible to create a new protocol version, but this is more associated with defining a protocol than extending it (IPv6 being a successor to IPv4 is an example of protocol versioning).

In some cases it might be more appropriate to define a new inner protocol which can carry the new functionality instead of extending the outer protocol. An example where this works well is the IP/transport split, where the earlier architecture had a single NCP [RFC0033] protocol which carried both the hop-by-hop semantics which are now in IP and the end-to-end semantics which are now in TCP. Such a split is effective when different nodes need to act upon the different information. Applying this for general protocol extensibility through nesting is not well understood, and does result in longer header chains. Furthermore, our experience with IPv6 extension headers [RFC2460] in middleboxes indicates that the header chaining approach does not help with middlebox traversal.

Many protocol definitions include some number of reserved fields or bits which can be used for future extension. VXLAN is an example of a protocol that includes reserved bits which are subsequently being allocated for new purposes. Another technique employed is to re-purpose existing header fields with new meanings.
A classic example of this is the definition of the DSCP code point, which redefines the ToS field originally specified in IPv4. When a field is redefined, some mechanism may be needed to ensure that all interested parties agree on the meaning of the field. The techniques of defining meaning for reserved bits or redefining existing fields have the advantage that a protocol header can be kept at a fixed length. The disadvantage is that the extensibility is limited; for instance, the number of reserved bits in a fixed protocol header is limited. For standard protocols the decision to commit to a definition for a field can be wrenching, since it is difficult to retract later. Also, it is difficult to predict a priori how many reserved fields or bits to put into a protocol header to satisfy the extensions created over the lifetime of the protocol.

Extending a protocol header with new fields can be done in several ways.

o TLVs are a very popular method used in protocols such as IP and TCP. Depending on the type field size and structure, TLVs can offer a virtually unlimited range of extensions. A disadvantage of TLVs is that processing them can be verbose and quite complicated: several validations must often be done for each TLV, and there is no deterministic ordering for a list of TLVs. TCP serves as an example of a protocol where TLVs have been successfully used (i.e., required for protocol operation). IP is an example of a protocol that allows TLVs but where they are rarely used in practice (router fast paths usually assume no IP options). Note that TCP TLVs are implemented in software as well as in (NIC) hardware handling various forms of TCP offload. Additional discussion about hardware implications for extensibility is captured in Section 18.
o Extension headers are closely related to TLVs. These also carry type/value information, but instead of being a list of TLVs within a single protocol header, each one is in its own protocol header. IPv6 extension headers and SFC NSH are examples of this technique. Similar to TLVs, these offer a wide range of extensibility, but have similarly complex processing. Another difference from TLVs is that each extension header is idempotent. This is beneficial in cases where a protocol implements a push/pop model for header elements, like service chaining, but makes it more difficult to group correlated information within one protocol header.
o A particular form of extension headers are the tags used by IEEE 802 protocols. Those are similar to, e.g., IPv6 extension headers, but with the key difference that each tag is a fixed-length header where the length is implicit in the tag value. Thus, as long as a receiver can be programmed with a tag-value-to-length map, it can skip those new tags.
o Flag-fields are a non-TLV-like method of extending a protocol header. The basic idea is that the header contains a set of flags, where each set flag corresponds to an optional field that is present in the header. GRE is an example of a protocol that employs this mechanism. The fields are present in the header in the order of the flags, and the length of each field is fixed. Flag-fields are simpler to process compared to TLVs, needing fewer validations, and the order of the optional fields is deterministic. A disadvantage is that the range of possible extensions with flag-fields is smaller than with TLVs.

The requirements for receiving unknown or unimplemented extensible elements in an encapsulation protocol (flags, TLVs, optional fields) need to be specified. There are two parties to consider: middleboxes and terminal endpoints of the encapsulation (at the decapsulator).
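The flag-field approach can be illustrated with the GRE layout from [RFC2890]: the C, K and S flags announce an optional Checksum (plus Reserved1), Key and Sequence Number field, each of fixed length and always in that order, so every offset follows directly from the flags. The sketch below is not a complete GRE implementation; in particular, rejecting any other flag bit is a simplification chosen here, not a statement of GRE's receive rules.

```python
import struct
from typing import Dict, Optional

C_BIT, K_BIT, S_BIT = 0x8000, 0x2000, 0x1000   # checksum, key, sequence present
KNOWN_FLAGS = C_BIT | K_BIT | S_BIT
VERSION_MASK = 0x0007


def parse_gre(buf: bytes) -> Dict[str, Optional[int]]:
    """Parse the fixed part and the flag-selected optional fields of a GRE header."""
    flags, proto = struct.unpack_from("!HH", buf, 0)
    if flags & VERSION_MASK:
        raise ValueError("unsupported GRE version")
    if flags & ~(KNOWN_FLAGS | VERSION_MASK):
        raise ValueError("unknown flag set; this sketch drops rather than ignores")
    out: Dict[str, Optional[int]] = {"protocol": proto, "checksum": None,
                                     "key": None, "seq": None}
    offset = 4
    # No per-field type or length is needed: presence and order come from the flags.
    if flags & C_BIT:
        out["checksum"], _reserved1 = struct.unpack_from("!HH", buf, offset)
        offset += 4
    if flags & K_BIT:
        (out["key"],) = struct.unpack_from("!I", buf, offset)
        offset += 4
    if flags & S_BIT:
        (out["seq"],) = struct.unpack_from("!I", buf, offset)
        offset += 4
    out["payload_offset"] = offset
    return out


# Key and sequence number present, carrying Transparent Ethernet Bridging (0x6558).
hdr = struct.pack("!HHII", K_BIT | S_BIT, 0x6558, 42, 7)
parsed = parse_gre(hdr)
```

Compared with a TLV walk, there is no loop and no per-option validation; the price is that only as many extensions can exist as there are flag bits.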
A protocol may allow or expect nodes in a path to modify fields in an encapsulation (an example use of this is BIER). In this case, the middleboxes should follow the same requirements as nodes terminating the encapsulation. In the case that middleboxes do not modify the encapsulation, we can assume that they may still inspect any fields of the encapsulation. Missing or unknown fields should be accepted per the protocol specification; however, it is permissible for a site to implement a local policy otherwise (e.g., a firewall may drop packets with unknown options).

For handling unknown options at terminal nodes, there are two possibilities: drop the packet, or accept it while ignoring the unknown options. Many Internet protocols specify that reserved flags must be set to zero on transmission and ignored on reception. L2TP is an example data protocol that has such flags. GRE is a notable exception to this rule: reserved flag bits 1-5 cannot be ignored [RFC2890]. For TCP and IPv4, implementations must ignore optional TLVs with an unknown type; however, in IPv6, if a packet contains an unknown extension header (unrecognized next header type), the packet must be dropped and an ICMP error message returned. The IPv6 options themselves (encoded inside the destination options or hop-by-hop options extension header) have more flexibility: bits in the option type instruct the receiver whether to ignore the option, silently drop the packet, or drop it and send an error if the option is unknown. Some protocols define a "mandatory bit" that can be set in TLVs to indicate that an option must not be ignored. Conceptually, optional data elements can only be ignored if they are idempotent and do not alter how the rest of the packet is parsed or processed.
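The IPv6 option convention described above can be borrowed for a TLV-based encapsulation. The sketch below assumes a hypothetical option format (one-byte type, one-byte length, no padding options) and a hypothetical set of implemented types; only the meaning of the two high-order type bits is taken from IPv6.

```python
from typing import List, Tuple

# Option types this (hypothetical) decapsulator implements.
KNOWN_TYPES = {0x01, 0x02}


def process_options(buf: bytes, dst_is_multicast: bool = False) -> Tuple[str, List[Tuple[int, bytes]]]:
    """Walk 1-byte-type / 1-byte-length options and return (verdict, known options).

    Unknown types are handled by their two high-order bits, borrowing the IPv6
    option convention: 00 skip, 01 drop, 10 drop and report an error,
    11 drop and report an error unless the destination is multicast.
    """
    accepted: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            return "drop", accepted            # truncated option header
        opt_type, opt_len = buf[i], buf[i + 1]
        value = buf[i + 2:i + 2 + opt_len]
        if len(value) != opt_len:
            return "drop", accepted            # length runs past the buffer
        i += 2 + opt_len
        if opt_type in KNOWN_TYPES:
            accepted.append((opt_type, value))
            continue
        action = opt_type >> 6
        if action == 0b00:
            continue                           # safe to ignore: keep parsing
        if action == 0b01:
            return "drop", accepted
        if action == 0b10 or not dst_is_multicast:
            return "drop+error", accepted
        return "drop", accepted                # 0b11 toward a multicast destination
    return "accept", accepted
```

Because the sender chooses the action bits when it allocates a new option type, it decides in advance whether an old receiver may safely ignore the option, which is the same effect a "mandatory bit" provides.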
Depending on what type of protocol evolution one can predict, it might make sense to have a way for a sender to express that the packet should be dropped by a terminal node which does not understand the new information. In other cases it would make sense to have the receiver silently ignore the new information. The former can be expressed by having a version field in the encapsulation, or a notion of a "mandatory bit" as discussed above.

A security mechanism which uses some form of secure hash over the encapsulation header would need to know which extensions can be changed in flight.

In summary:

o Encapsulations need the ability to be extended to handle, e.g., the OAM or security aspects discussed in this document.
o Practical experience seems to tell us that extensibility mechanisms which are not in use on day one might result in immediate ossification through lack of implementation support. In some cases that has occurred in routers and in other cases in middleboxes. Hence it seems important to devise ways for the extensibility mechanisms to be in use from the start.

16. Layering Considerations

One can envision that SFC might use NVO3 as a delivery/transport mechanism. With more imagination, that in turn might be delivered using BIER. Thus it is useful to think about what things look like when we have BIER+NVO3+SFC+payload. Also, if NVO3 is widely deployed there might be cases of NVO3 nesting, where a customer uses NVO3 to provide network virtualization, e.g., across departments, and that customer uses a service provider which happens to use NVO3 to provide transport for its customers. Thus NVO3 in NVO3 might happen.

A key question we set out to answer is what the packets might look like in such a case, and in particular whether we would end up with multiple UDP headers for entropy.
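To make the entropy question concrete, the sketch below shows how each encapsulation layer that adds its own IP+UDP header might derive its outer UDP source port from a hash of the 5-tuple it wraps. The function name is hypothetical, CRC-32 is only a stand-in for whatever hash real ECMP/RSS hardware uses, and the port range follows the VXLAN recommendation in [RFC7348]. In the nested NVO3 case the provider hashes the customer's outer header, which already carries the customer's entropy port, so flow diversity propagates outward without any shared header.

```python
import ipaddress
import zlib

# Dynamic/private port range that RFC 7348 recommends for the VXLAN source port.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 49152, 65535


def entropy_port(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Map the 5-tuple of the header being wrapped to an outer UDP source port.

    CRC-32 is only a stand-in; real devices use whatever hash their ECMP/RSS
    hardware supports. The same flow always maps to the same port.
    """
    key = (ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
           + bytes([proto]) + sport.to_bytes(2, "big") + dport.to_bytes(2, "big"))
    return EPHEMERAL_LOW + zlib.crc32(key) % (EPHEMERAL_HIGH - EPHEMERAL_LOW + 1)


# Customer-level NVO3: entropy from the tenant's TCP flow.
tenant_flow = ("10.0.0.1", "10.0.0.2", 6, 12345, 80)
customer_port = entropy_port(*tenant_flow)

# Provider-level NVO3 wraps the customer's IP+UDP; 4789 (the VXLAN port) is
# used only as an illustrative destination port.
customer_outer = ("192.0.2.1", "192.0.2.2", 17, customer_port, 4789)
provider_port = entropy_port(*customer_outer)
```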
Based on the discussion in the Entropy section, the entropy is associated with the outer delivery IP header. Thus if there are multiple IP headers, there would be a UDP header for each one of the IP headers. But SFC does not require its own IP header, so a case of NVO3+SFC would be IP+UDP+NVO3+SFC. A nested NVO3 encapsulation would have independent IP+UDP headers.

The layering also has some implications for middleboxes.

o A device on the path between the ingress and egress is allowed to transparently inspect all layers of the protocol stack and drop or forward, but not to transparently modify anything but the layer in which it operates. What this means is that an IP router is allowed to modify the outer IP TTL and ECN bits, but not the encapsulation header or inner headers and payload. And a BIER router is allowed to modify the BIER header.
o Alternatively, such a device can become visible at a higher layer. E.g., a middlebox could first decapsulate, perform some function, then encapsulate; which means it will generate a new encapsulation header.

The design team asked itself some additional questions:

o Would it make sense to have a common encapsulation base header (for OAM, security?, etc.) followed by the specific information for NVO3, SFC, BIER? Given that there are separate proposals, that the set of information needing to be carried differs, and that the extensibility needs might be different, it would be difficult and not that useful to have a common base header.
o With a base header in place, one could view the different functions (NVO3, SFC, and BIER) as different extensions to that base header, resulting in encodings which are more space-optimal by not repeating the same base header. The base header would only be repeated when there is an additional IP (and hence UDP) header. That could mean a single length field (to skip to the payload after all the encapsulation headers). That might be technically feasible, but it would create a lot of dependencies between different WGs, making it harder to make progress. This has to be weighed against the potential savings in packet size.

17. Service model

The IP service is lossy and subject to reordering. In order to avoid a performance impact on transports like TCP, the handling of packets is designed to avoid reordering packets that are in the same transport flow (which is typically identified by the 5-tuple). But across such flows the receiver can see different ordering for a given sender. That is the case for a unicast vs. a multicast flow from the same sender.

There is a general tussle between the desire for high capacity utilization across a multipath network and the impact on packet ordering within the same flow (which results in lower transport protocol performance). That isn't affected by the introduction of an encapsulation. However, the encapsulation comes with some entropy, and there might be cases where operators want to change that in response to overload or failures. For instance, one might want to change the UDP source port to try a different ECMP route. Such changes can result in packet reordering within a flow, hence would need to be done infrequently and with care, e.g., by identifying packet trains.

There might be some applications/services which are not able to handle reordering across flows. The IETF has defined pseudowires [RFC3985], which provide the ability to ensure ordering (implemented using sequence numbers and/or timestamps).

Architecturally, such services would make sense, but as a separate layer on top of an encapsulation protocol. They could be deployed between the ingress and egress of a tunnel which uses some encapsulation.
   Potentially, the tunnel control points at the ingress and egress could become a platform for fixing suboptimal behavior elsewhere in the network.

   That would clearly be undesirable in the general case. However, encapsulating non-IP traffic, and hence traffic that is not congestion controlled, is likely to be required, which implies some fairness and/or QoS policing on the ingress and egress devices.

   The tunnels could also potentially do more, such as increase reliability (retransmissions, FEC) or spread load using, e.g., MP-TCP between the ingress and egress.

18.  Hardware Friendly

   Hosts, switches, and routers often leverage hardware capabilities to accelerate packet encapsulation, decapsulation, and forwarding.

   Encapsulation designs that take these hardware capabilities into account may result in more efficient packet processing and higher overall protocol throughput.

   While "hardware friendliness" can be viewed as an unnecessary consideration for a design, part of the motivation for considering it is ease of deployment: being able to leverage existing NIC and switch chips for at least a useful subset of the functionality that the new encapsulation provides. The other part is the ease of implementing new NICs and switch/router chips that support the encapsulation at ever-increasing line rates.

   [disclaimer] There are many different types of hardware in any given network, each possibly better at some tasks and worse at others. We would still recommend that protocol designers examine the specific hardware that is likely to be used in their networks and make decisions on a case-by-case basis.

   Some considerations are:

   o  Keep the encapsulation header small. Switches and routers usually read only a small number of leading bytes of a packet into fast memory for quick processing and easy manipulation.
      The bulk of each packet is usually stored in slow memory. A large encapsulation header may not fit, and additional reads from slow memory will hurt overall performance and throughput.
   o  Put important information at the beginning of the encapsulation header, for reasons similar to those in the previous point. If important information is located at the beginning of the encapsulation header, the packet may be processed with fewer bytes read into fast memory, improving performance.
   o  Avoid full packet checksums in the encapsulation if possible. Encapsulations should instead consider adding their own checksum, which covers the encapsulation header and any IPv6 pseudo-header. The motivation is that most switch/router hardware makes switching/forwarding decisions by reading and examining only a certain number of leading bytes of the packet; most of the packet body normally does not need to be processed. If the concern is preventing packets from being misdelivered due to memory errors, consider performing only header checksums. Note that NIC chips can typically already do full packet checksums for TCP/UDP, while adding a header checksum might require adding some hardware support.
   o  Place important information at fixed offsets in the encapsulation header. Packet processing hardware may be capable of parallel processing. If important information can be found at fixed offsets, different parts of the encapsulation header may be processed by different hardware units in parallel (for example, multiple table lookups may be launched in parallel). It is easier for hardware to handle optional information when that information, if present, can be found in one place, or at least in as few places as possible. That facilitates parallel processing.
      TLV encoding with unconstrained order typically does not have that property.
   o  Limit the number of header combinations. In many cases the hardware can explore different combinations of headers in parallel; however, doing so has some added cost.

18.1.  Considerations for NIC offload

   This section provides guidelines for supporting common offloads for encapsulation in Network Interface Cards (NICs). Offload mechanisms are techniques that are implemented separately from the normal protocol implementation of a host networking stack and are intended to optimize or speed up protocol processing. Hardware offload is performed within a NIC device on behalf of a host.

   There are three basic offload techniques of interest:

   o  Receive multi-queue
   o  Checksum offload
   o  Segmentation offload

18.1.1.  Receive multi-queue

   Contemporary NICs support multiple receive descriptor queues (multi-queue). Multi-queue enables load balancing of a NIC's network processing across multiple CPUs. On packet reception, a NIC must select the appropriate queue for host processing. Receive Side Scaling (RSS) is a common method that uses the flow hash of a packet to index an indirection table in which each entry stores a queue number.

   UDP encapsulation, where the source port is used for entropy, should be compatible with multi-queue NICs that support five-tuple hash calculation for UDP/IP packets as input to RSS. The source port ensures classification of the encapsulated flow even when the outer source and destination addresses are the same for all flows (e.g., all flows are going over a single tunnel).

18.1.2.  Checksum offload

   Many NICs provide the capability to calculate the standard ones' complement payload checksum for packets on transmit or receive.
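For reference, the ones' complement checksum these NICs compute is the Internet checksum described in RFC 1071. The minimal Python sketch below (function names are illustrative, not any NIC or OS API) shows the 16-bit ones' complement sum with end-around carry, the complemented value that is written into a header, and the check a receiver performs; the optional `initial` argument is an assumption that lets a caller fold in, say, a pseudo-header sum.

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """16-bit ones' complement sum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits (end-around carry).
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Value placed in a checksum field: the complement of the sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def checksum_ok(data_with_checksum: bytes) -> bool:
    """A region that includes a correct checksum field sums to 0xFFFF."""
    return ones_complement_sum(data_with_checksum) == 0xFFFF
```

With the 8-byte example from RFC 1071 (00 01 f2 03 f4 f5 f6 f7), the sum is 0xddf2 and the checksum is 0x220d. Because the sum is order-independent, it can be computed over any byte range, which is what makes the protocol-agnostic offloads below possible.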
   When using encapsulation over UDP, there are at least two checksums that may be of interest: the encapsulated packet's transport checksum and the UDP checksum in the outer header.

18.1.2.1.  Transmit checksum offload

   NICs may provide a protocol-agnostic method to offload the transmit checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with UDP encapsulation. In this method, the host provides checksum-related parameters in a transmit descriptor for a packet. These parameters include the starting offset of the data to checksum, the length of the data to checksum, and the offset in the packet where the computed checksum is to be written. The host initializes the checksum field to the pseudo-header checksum. In the case of an encapsulated packet, the checksum for the encapsulated transport-layer packet (a TCP packet, for instance) can be offloaded by setting the appropriate checksum parameters.

   NICs can typically offload only one transmit checksum per packet, so simultaneously offloading both an inner transport packet's checksum and the outer UDP checksum is likely not possible. In this case, setting the outer UDP checksum to zero (per the discussion above) and offloading the inner transport packet's checksum might be acceptable.

   There is a proposal in [I-D.herbert-remotecsumoffload] to leverage NIC checksum offload when an encapsulator is co-resident with a host.

18.1.2.2.  Receive checksum offload

   Protocol encapsulation is compatible with NICs that perform a protocol-agnostic receive checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a NIC computes a ones' complement checksum over all (or some predefined portion) of a packet. The computed value is provided to the host stack in the packet's receive descriptor.
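The minimal model below, written under stated assumptions rather than taken from any driver, shows why a single whole-packet value is enough: because ones' complement addition is associative and commutative, the host can subtract the sum of the bytes in front of an inner header to obtain the sum over the inner packet alone. It assumes the NIC's value is the plain 16-bit ones' complement sum over the bytes it hands to the host (the exact coverage is device- and OS-specific), that the inner region starts at an even offset, and it leaves out the pseudo-header that a real TCP/UDP check would add; all names are hypothetical.

```python
def csum16(data: bytes) -> int:
    """16-bit ones' complement sum (RFC 1071 style) with carries folded."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_sub(total: int, part: int) -> int:
    """Ones' complement subtraction: remove part's contribution from total."""
    result = total + (~part & 0xFFFF)
    while result > 0xFFFF:
        result = (result & 0xFFFF) + (result >> 16)
    return result


def inner_sum(complete: int, packet: bytes, inner_offset: int) -> int:
    """Sum over packet[inner_offset:], derived from the whole-packet value."""
    if inner_offset % 2:
        raise ValueError("sketch assumes an even (16-bit aligned) inner offset")
    return csum_sub(complete, csum16(packet[:inner_offset]))
```

If the derived inner sum (plus the pseudo-header contribution, in a real stack) equals 0xFFFF, the inner checksum is valid without the host summing the inner payload again; only the outer header bytes are re-summed.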
   The host driver can use this checksum to "patch up" and validate any inner packet transport checksum, as well as the outer UDP checksum if it is non-zero.

   Many legacy NICs don't provide checksum-complete but instead provide an indication that a checksum has been verified (CHECKSUM_UNNECESSARY in Linux). Usually, such validation is only done for simple TCP/IP or UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the checksum-complete value for the UDP packet is the "not" of the pseudo-header checksum. In this way, checksum-unnecessary can be converted to checksum-complete. So if the NIC provides checksum-unnecessary for the outer UDP header in an encapsulation, checksum conversion can be done so that the checksum-complete value is derived and can be used by the stack to validate any checksums in the encapsulated packet.

18.1.3.  Segmentation offload

   Segmentation offload refers to techniques that attempt to reduce CPU utilization on hosts by having the transport layers of the stack operate on large packets. In transmit segmentation offload, a transport layer creates packets larger than the MTU (Maximum Transmission Unit). Only at a much lower point in the stack, or possibly in the NIC, are these large packets broken up into MTU-sized packets for transmission on the wire. Similarly, in receive segmentation offload, small packets are coalesced into large, greater-than-MTU-size packets at a low point in the stack receive path or possibly in a device. The effect of segmentation offload is that the number of packets that need to be processed in various layers of the stack is reduced, and hence CPU utilization is reduced.

18.1.3.1.  Transmit Segmentation Offload

   Transmit Segmentation Offload (TSO) is a NIC feature where a host provides a large (larger than MTU size) TCP packet to the NIC, which in turn splits the packet into separate segments and transmits each one. This is useful to reduce CPU load on the host.

   The process of TSO can be generalized as:

   o  Split the TCP payload into segments such that each resulting packet is no larger than the MTU.
   o  For each created segment:

       1.  Replicate the TCP header and all preceding headers of the original packet.
       2.  Set payload length fields in any headers to reflect the length of the segment.
       3.  Set the TCP sequence number to correctly reflect the offset of the TCP data in the stream.
       4.  Recompute and set any checksums that either cover the payload of the packet or cover a header that was changed by setting a payload length.

   Following this general process, TSO can be extended to support TCP encapsulated in UDP. For each segment, the Ethernet, outer IP, UDP, encapsulation, inner IP (if tunneling), and TCP headers are replicated. Any packet length header fields need to be set properly (including the length in the outer UDP header), and checksums need to be set correctly (including the outer UDP checksum if it is being used).

   To facilitate TSO with encapsulation, it is recommended that optional fields not contain values that must be updated on a per-segment basis; for example, an encapsulation header should not include checksums, lengths, or sequence numbers that refer to the payload. If the encapsulation header does not contain such fields, then the TSO engine only needs to copy the bits in the encapsulation header when creating each segment and does not need to parse the encapsulation header.

18.1.3.2.  Large Receive Offload

   Large Receive Offload (LRO) is a NIC feature where packets of a TCP connection are reassembled, or coalesced, in the NIC and delivered to the host as one large packet. This feature can reduce CPU utilization in the host.

   LRO requires significant protocol awareness to be implemented correctly and is difficult to generalize. Packets in the same flow need to be unambiguously identified. In the presence of tunnels or network virtualization, this may require more than a five-tuple match (for instance, packets for flows in two different virtual networks may have identical five-tuples). Additionally, a NIC needs to perform validation over packets that are being coalesced, and needs to fabricate a single meaningful header from all the coalesced packets.

   The conservative approach to supporting LRO for encapsulation would be to assign packets to the same flow only if they have identical five-tuples and were encapsulated the same way. That is, the outer IP addresses, the outer UDP ports, the encapsulated protocol, the encapsulation headers, and the inner five-tuple are all identical.

18.1.3.3.  In summary

   In summary, for NIC offload:

   o  The considerations for using full UDP checksums are different for NIC offload than for implementations in forwarding devices like routers and switches.
   o  Be judicious about encapsulations that change fields on a per-packet basis, since such behavior might make it hard to use TSO.

19.  Middlebox Considerations

   This document has touched upon middleboxes in different sections. The reason is that, as encapsulations get widely deployed, one would expect that different forms of middleboxes might become aware of the encapsulation protocol, just as middleboxes have been made aware of other protocols where there are business and deployment opportunities.
   Such middleboxes are likely to do more than just drop packets based on the UDP port number used by an encapsulation protocol.

   We note that various forms of encapsulation gateways that stitch one encapsulation protocol together with another protocol could have similar effects.

   An example of a middlebox that could see some use is an NVO3-aware firewall that filters on the VNI IDs to provide some defense in depth inside or across NVO3 datacenters.

   A question for the IETF is whether we should document what to do, or what not to do, in such middleboxes. This document touches on areas of OAM and ECMP as they relate to middleboxes, and it might make sense to document how encapsulation-aware middleboxes should avoid unintended consequences in those (and perhaps other) areas.

   In summary:

   o  We are likely to see middleboxes that at least parse the headers of successful new encapsulations.
   o  Should the IETF document considerations for what not to do in such middleboxes?

20.  Related Work

   The IETF and industry have defined encapsulations for a long time, with examples like GRE [RFC2890], VXLAN [RFC7348], and NVGRE [I-D.sridharan-virtualization-nvgre] being able to carry arbitrary Ethernet payloads, and various forms of IP-in-IP and IPsec encapsulations that can carry IP packets. As part of NVO3 there have been additional proposals like Geneve [I-D.gross-geneve] and GUE [I-D.herbert-gue] that look at more extensibility. NSH [I-D.quinn-sfc-nsh] is an example of an encapsulation that tries to provide extensibility mechanisms targeting both hardware and software implementations.

   There is also a large body of work around MPLS encapsulations [RFC3032].
   The MPLS-in-UDP work [I-D.ietf-mpls-in-udp] and the GRE-in-UDP work [I-D.ietf-tsvwg-gre-in-udp-encap] have addressed some of the common issues around checksums and congestion control. MPLS also introduced an entropy label [RFC6790]. There is also a proposal for MPLS encryption [I-D.farrelll-mpls-opportunistic-encrypt].

   The idea of using a UDP encapsulation with the UDP source port providing entropy for the underlay routers' ECMP dates back to LISP [RFC6830].

   The pseudo-wire work [RFC3985] is interesting in its notion of layering additional services/characteristics, such as ordered delivery or timely delivery, on top of an encapsulation. That layering approach might be useful for the new encapsulations as well; an example is the control word [RFC4385]. There is also material on congestion control for pseudo-wires in [I-D.ietf-pwe3-congcons].

   Both MPLS and L2TP [RFC3931] rely on some control or signaling to establish state (for the path/labels in the case of MPLS, and for the session in the case of L2TP). The NVO3, SFC, and BIER encapsulations will also have some separation between the data plane and the control plane, but the type of separation appears to be different.

   IEEE 802.1 has defined encapsulations for L2 over L2, in the form of Provider Backbone Bridging (PBB) [IEEE802.1Q-2014] and Equal Cost Multipath (ECMP) [IEEE802.1Q-2014]. The latter includes something very similar to the way the UDP source port is used as entropy: "The flow hash, carried in an F-TAG, serves to distinguish frames belonging to different flows and can be used in the forwarding process to distribute frames over equal cost paths".

   TRILL, which is also an L2-over-L2 encapsulation, took a different approach to entropy but preserved the ability of OAM frames [RFC7174] to use the same entropy, and hence the same ECMP path, as data frames.
   In [I-D.ietf-trill-oam-fm], there are 96 bytes of headers for entropy in the OAM frames, followed by the actual OAM content. This ensures that any header fields that fit in those 96 bytes, except the OAM bit in the TRILL header, can be used for ECMP hashing.

   As encapsulations evolve, there might be a desire to fit multiple inner packets into one outer packet. The work in [I-D.saldana-tsvwg-simplemux] might be interesting for that purpose.

21.  Acknowledgements

   The authors acknowledge the comments from Alia Atlas, Fred Baker, David Black, Bob Briscoe, Stewart Bryant, Mike Cox, Andy Malis, Radia Perlman, Carlos Pignataro, Jamal Hadi Salim, Michael Smith, and Lucy Yong.

22.  Open Issues

   o  Middleboxes:

      *  Due to OAM, there are constraints on middleboxes in general. If middleboxes inspect the packet past the outer IP+UDP and encapsulation header and look for inner IP and TCP/UDP headers, that might violate the assumption that OAM packets will be handled the same as regular data packets. That issue is broader than just QoS; it applies to firewall filters, etc.
      *  Firewalls looking at the inner payload? How does that work for OAM frames? Even if it only drops ... The TRILL approach might be an option? Would that encourage more middleboxes, making the network more fragile?
      *  Editorially, perhaps we should pull the above two into a separate section about middlebox considerations?
   o  Next-protocol indication: should it be common across different encapsulation headers? We will have different ways to indicate the presence of the first encapsulation header in a packet (it could be a UDP destination port, an Ethernet type, etc., depending on the outer delivery header). But for the next protocol past an encapsulation header, one could envision creating or adopting a common scheme.
      Such a scheme would also need to be able to identify following headers like Ethernet, IPv4/IPv6, ESP, etc.
   o  Common OAM error reporting protocol?
   o  There is discussion about timestamps, sequence numbers, etc. in three different parts of the document: OAM, Congestion Considerations, and Service Model, where the last argues that a pseudo-wire service should really be layered on top of the encapsulation using its own header. Those recommendations seem to be at odds with each other. Do we envision sequence numbers, timestamps, etc. as potential extensions for OAM and CC? If so, those extensions could be used to provide a service that doesn't reorder packets.

23.  Change Log

   The changes from draft-rtg-dt-encap-01 based on feedback at the Dallas IETF meeting:

   o  Setting the context that not all common issues might apply to all encapsulations, but that they should all be understood before being dismissed.
   o  Clarified that the IPv6 flow label is useful for entropy in combination with a UDP source port.
   o  Editorially added a "summary" set of bullets to most sections.
   o  Editorial clarifications in the next protocol section to more clearly state the three areas.
   o  Folded the two next protocol sections into one.
   o  Mention the MPLS first nibble issue in the next protocol section.
   o  Mention that viewing the next protocol as an index to a table with processing instructions can provide additional flexibility in the protocol evolution.
   o  For the OAM "don't forward to end stations" added that defining a bit seems better than using a special next-protocol value.
   o  Added mention of DTLS in addition to IPsec for security.
   o  Added some mention of IPv6 hop-by-hop options or other headers that potentially can be copied from the inner to the outer header.
   o  Added text on architectural considerations when it might make sense to define an additional header/protocol as opposed to using the extensibility mechanism in the existing encapsulation protocol.
   o  Clarified the "unconstrained TLVs" in the hardware friendly section.
   o  Clarified the text around checksum verification and full vs. header checksums.
   o  Added wording that the considerations might apply for encaps outside of the routing area.
   o  Added references to draft-ietf-pwe3-congcons, draft-ietf-tsvwg-rfc5405bis, RFC2473, and RFC7325
   o  Removed reference to RFC3948.
   o  Updated the acknowledgements section.
   o  Added this change log section.

24.  References

24.1.  Normative References

   [I-D.ietf-tsvwg-rfc5405bis]
              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", draft-ietf-tsvwg-rfc5405bis-10 (work in progress), March 2016.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, December 1998.

   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, December 1998.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, DOI 10.17487/RFC2890, September 2000.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels", RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC3931]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, DOI 10.17487/RFC3931, March 2005.

   [RFC3985]  Bryant, S., Ed. and P.
              Pate, Ed., "Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture", RFC 3985, DOI 10.17487/RFC3985, March 2005.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson, "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN", RFC 4385, DOI 10.17487/RFC4385, February 2006.

   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines for Application Designers", BCP 145, RFC 5405, DOI 10.17487/RFC5405, November 2008.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, DOI 10.17487/RFC6040, November 2010.

   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label for Equal Cost Multipath Routing and Link Aggregation in Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011.

   [RFC6790]  Kompella, K., Drake, J., Amante, S., Henderickx, W., and L. Yong, "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, DOI 10.17487/RFC6790, November 2012.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 10.17487/RFC6830, January 2013.

   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and UDP Checksums for Tunneled Packets", RFC 6935, DOI 10.17487/RFC6935, April 2013.

   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement for the Use of IPv6 UDP Datagrams with Zero Checksums", RFC 6936, DOI 10.17487/RFC6936, April 2013.

   [RFC7174]  Salam, S., Senevirathne, T., Aldrin, S., and D. Eastlake 3rd, "Transparent Interconnection of Lots of Links (TRILL) Operations, Administration, and Maintenance (OAM) Framework", RFC 7174, DOI 10.17487/RFC7174, May 2014.

   [RFC7325]  Villamizar, C., Ed., Kompella, K., Amante, S., Malis, A., and C.
              Pignataro, "MPLS Forwarding Compliance and Performance Requirements", RFC 7325, DOI 10.17487/RFC7325, August 2014.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7364]  Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, DOI 10.17487/RFC7365, October 2014.

24.2.  Informative References

   [I-D.briscoe-conex-data-centre]
              Briscoe, B. and M. Sridharan, "Network Performance Isolation in Data Centres using Congestion Policing", draft-briscoe-conex-data-centre-02 (work in progress), February 2014.

   [I-D.farrelll-mpls-opportunistic-encrypt]
              Farrel, A. and S. Farrell, "Opportunistic Security in MPLS Networks", draft-farrelll-mpls-opportunistic-encrypt-05 (work in progress), June 2015.

   [I-D.gross-geneve]
              Gross, J., Sridhar, T., Garg, P., Wright, C., Ganga, I., Agarwal, P., Duda, K., Dutt, D., and J. Hudson, "Geneve: Generic Network Virtualization Encapsulation", draft-gross-geneve-02 (work in progress), October 2014.

   [I-D.herbert-gue]
              Herbert, T., Yong, L., and O. Zia, "Generic UDP Encapsulation", draft-herbert-gue-03 (work in progress), March 2015.

   [I-D.herbert-remotecsumoffload]
              Herbert, T., "Remote checksum offload for encapsulation", draft-herbert-remotecsumoffload-02 (work in progress), March 2016.
   [I-D.ietf-mpls-in-udp]
              Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", draft-ietf-mpls-in-udp-11 (work in progress), January 2015.

   [I-D.ietf-nvo3-arch]
              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. Narten, "An Architecture for Overlay Networks (NVO3)", draft-ietf-nvo3-arch-04 (work in progress), October 2015.

   [I-D.ietf-pwe3-congcons]
              Stein, Y., Black, D., and B. Briscoe, "Pseudowire Congestion Considerations", draft-ietf-pwe3-congcons-02 (work in progress), July 2014.

   [I-D.ietf-sfc-architecture]
              Halpern, J. and C. Pignataro, "Service Function Chaining (SFC) Architecture", draft-ietf-sfc-architecture-11 (work in progress), July 2015.

   [I-D.ietf-sfc-problem-statement]
              Quinn, P. and T. Nadeau, "Service Function Chaining Problem Statement", draft-ietf-sfc-problem-statement-13 (work in progress), February 2015.

   [I-D.ietf-trill-oam-fm]
              Senevirathne, T., Finn, N., Salam, S., Kumar, D., Eastlake, D., Aldrin, S., and L. Yizhou, "TRILL Fault Management", draft-ietf-trill-oam-fm-11 (work in progress), October 2014.

   [I-D.ietf-tsvwg-circuit-breaker]
              Fairhurst, G., "Network Transport Circuit Breakers", draft-ietf-tsvwg-circuit-breaker-13 (work in progress), February 2016.

   [I-D.ietf-tsvwg-gre-in-udp-encap]
              Yong, L., Crabbe, E., Xu, X., and T. Herbert, "GRE-in-UDP Encapsulation", draft-ietf-tsvwg-gre-in-udp-encap-11 (work in progress), March 2016.

   [I-D.ietf-tsvwg-port-use]
              Touch, J., "Recommendations on Using Assigned Transport Port Numbers", draft-ietf-tsvwg-port-use-11 (work in progress), April 2015.

   [I-D.quinn-sfc-nsh]
              Quinn, P., Guichard, J., Surendra, S., Smith, M., Henderickx, W., Nadeau, T., Agarwal, P., Manur, R., Chauhan, A., Halpern, J., Majee, S., Elzur, U., Melman, D., Garg, P., McConnell, B., Wright, C., and K.
              Kevin, "Network Service Header", draft-quinn-sfc-nsh-07 (work in progress), February 2015.

   [I-D.saldana-tsvwg-simplemux]
              Saldana, J., "Simplemux. A generic multiplexing protocol", draft-saldana-tsvwg-simplemux-04 (work in progress), January 2016.

   [I-D.shepherd-bier-problem-statement]
              Shepherd, G., Dolganow, A., and a. arkadiy.gulko@thomsonreuters.com, "Bit Indexed Explicit Replication (BIER) Problem Statement", draft-shepherd-bier-problem-statement-02 (work in progress), February 2015.

   [I-D.sridharan-virtualization-nvgre]
              Garg, P. and Y. Wang, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-08 (work in progress), April 2015.

   [I-D.wei-tsvwg-tunnel-congestion-feedback]
              Wei, X., Zhu, L., Deng, L., and B. Briscoe, "Tunnel Congestion Feedback", draft-wei-tsvwg-tunnel-congestion-feedback-04 (work in progress), June 2015.

   [I-D.wijnands-bier-architecture]
              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and S. Aldrin, "Multicast using Bit Index Explicit Replication", draft-wijnands-bier-architecture-05 (work in progress), March 2015.

   [I-D.wijnands-mpls-bier-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and S. Aldrin, "Encapsulation for Bit Index Explicit Replication in MPLS Networks", draft-wijnands-mpls-bier-encapsulation-02 (work in progress), December 2014.

   [I-D.xu-bier-encapsulation]
              Xu, X., Somasundaram, S., Jacquenet, C., and R. Raszuk, "BIER Encapsulation", draft-xu-bier-encapsulation-03 (work in progress), October 2015.

   [IEEE802.1Q-2014]
              IEEE, "IEEE Standard for Local and metropolitan area networks--Bridges and Bridged Networks", IEEE Std 802.1Q-2014, 2014.
              (Access Controlled link within page)

   [RFC0033]  Crocker, S., "New Host-Host Protocol", RFC 33, DOI 10.17487/RFC0033, February 1970.

Authors' Addresses

   Erik Nordmark
   Arista Networks
   5453 Great America Parkway
   Santa Clara, CA 95054
   USA

   Email: nordmark@arista.com

   Albert Tian
   Ericsson Inc.
   300 Holger Way
   San Jose, California 95134
   USA

   Email: albert.tian@ericsson.com

   Jesse Gross
   VMware
   3401 Hillview Ave.
   Palo Alto, CA 94304
   USA

   Email: jgross@vmware.com

   Jon Hudson
   Brocade Communications Systems, Inc.
   130 Holger Way
   San Jose, CA 95134
   USA

   Email: jon.hudson@gmail.com

   Lawrence Kreeger
   Cisco Systems, Inc.
   170 W. Tasman Avenue
   San Jose, CA 95134
   USA

   Email: kreeger@cisco.com

   Pankaj Garg
   Microsoft
   1 Microsoft Way
   Redmond, WA 98052
   USA

   Email: pankajg@microsoft.com

   Patricia Thaler
   Broadcom Corporation
   3151 Zanker Road
   San Jose, CA 95134
   USA

   Email: pthaler@broadcom.com

   Tom Herbert
   Facebook
   1 Hacker Way
   Menlo Park, CA 94052
   USA

   Email: tom@herbertland.com