idnits 2.17.1 draft-detienne-dmvpn-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC2332]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. -- The draft header indicates that this document updates RFC2332, but the abstract doesn't seem to directly say this. It does mention RFC2332 though, so this could be OK. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC2332, updated by this document, for RFC5378 checks: 1998-04-01) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 20, 2013) is 3780 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'S1' is mentioned on line 294, but not defined == Missing Reference: 'S2' is mentioned on line 294, but not defined == Missing Reference: 'S3' is mentioned on line 125, but not defined == Missing Reference: 'S4' is mentioned on line 125, but not defined == Missing Reference: 'H1' is mentioned on line 725, but not defined == Unused Reference: 'RFC5226' is defined on line 1417, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 4026 ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126) ** Obsolete normative reference: RFC 5996 (Obsoleted by RFC 7296) ** Downref: Normative reference to an Informational RFC: RFC 7018 -- Obsolete informational reference (is this intentional?): RFC 4601 (Obsoleted by RFC 7761) Summary: 5 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IPSECME Working Group F. Detienne 3 Internet-Draft M. Kumar 4 Updates: 2332 (if approved) M. 
Sullenberger 5 Intended status: Standards Track Cisco 6 Expires: June 23, 2014 December 20, 2013 8 Flexible Dynamic Mesh VPN 9 draft-detienne-dmvpn-01 11 Abstract 13 The purpose of a Dynamic Mesh VPN (DMVPN) is to allow IPsec/IKE 14 Security Gateways administrators to configure the devices in a 15 partial mesh (often a simple star topology called Hub-Spokes) and let 16 the Security Gateways establish direct protected tunnels called 17 Shortcut Tunnels. These Shortcut Tunnels are dynamically created 18 when traffic flows and are protected by IPsec. 20 To achieve this goal, this document extends NHRP ([RFC2332]) into a 21 routing policy feed and integrates GRE tunneling with IKEv2 and IPsec 22 to provide the necessary cryptographic security. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on June 23, 2014. 41 Copyright Notice 43 Copyright (c) 2013 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 3. Tunnel Types . . . . . . . . . . . . . . . . . . . . . . . . 5 61 4. Solution Overview . . . . . . . . . . . . . . . . . . . . . . 6 62 4.1. Initial Connectivity . . . . . . . . . . . . . . . . . . 6 63 4.2. Initial Routing Table Status . . . . . . . . . . . . . . 8 64 4.3. Indirection Notification . . . . . . . . . . . . . . . . 9 65 4.4. Node Discovery via Resolution Request . . . . . . . . . . 10 66 4.5. Resolution Request Forwarding . . . . . . . . . . . . . . 11 67 4.6. Egress node NHRP cache and Tunnel Creation . . . . . . . 12 68 4.7. Resolution Reply format and processing . . . . . . . . . 13 69 4.8. From Hub and Spoke to Dynamic Mesh . . . . . . . . . . . 14 70 4.9. Remote Access Clients . . . . . . . . . . . . . . . . . . 15 71 4.10. Node mutual authentication . . . . . . . . . . . . . . . 16 72 5. NHRP Extension Format . . . . . . . . . . . . . . . . . . . . 17 73 5.1. NHRP Traffic Indication . . . . . . . . . . . . . . . . . 17 74 6. Security Considerations . . . . . . . . . . . . . . . . . . . 19 75 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 76 8. Compliance against ADVPN requirements . . . . . . . . . . . . 19 77 9. Design Considerations . . . . . . . . . . . . . . . . . . . 
. 26 78 9.1. Routing Policy and RFC4301 Security . . . . . . . . . . . 26 79 9.2. Using Configuration Attributes . . . . . . . . . . . . . 28 80 9.3. NAT Support . . . . . . . . . . . . . . . . . . . . . . . 30 81 10. Acknowldegements . . . . . . . . . . . . . . . . . . . . . . 30 82 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 83 11.1. Normative References . . . . . . . . . . . . . . . . . . 30 84 11.2. Informative References . . . . . . . . . . . . . . . . . 31 85 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 32 87 1. Introduction 89 This document describes a Dynamic Mesh VPN (DMVPN), in which an 90 initial partial mesh expands to create direct connections called 91 Shortcut Tunnels between endpoints that need to exchange data but are 92 not directly connected in the initial mesh. 94 The approach used in the design of this specification gives DMVPN the 95 following advantages: 97 o Can run with routing protocol or with IKEv2 policies (CP_Exchange) 98 making the specification suitable for complex gateways and remote 99 access clients alike. 100 o Can handle virtually infinite number of prefixes (including mobile 101 prefixes) thanks to routing protocol. 102 o The tunnel approach allows load balancing over multiple Transport 103 networks and multicast to work natively. 104 o Routing policy can apply more complex peer selection than 5-tuple 105 traffic selector. 106 o The layered approach allows evolution of other specifications used 107 over DMVPN without having to rewrite or modify DMVPN. 108 o Non-IP protocols such as ISIS, MPLS, plain ethernet... are 109 natively supported (e.g. for Data Center Interconnection). 111 In a generic manner, DMVPN topologies initialize as Hub-Spoke 112 networks where Spoke Security Gateway nodes S* connect to Hub 113 Security Gateway nodes H* over a public transport network (such as 114 the Internet) considered insufficiently secure so as to mandate the 115 use of IPsec and IKE. For scalability and redundancy reasons, there 116 may be multiple hubs; the Hubs would then be connected together 117 through the DMVPN. The diagram Figure 1 depicts this situation. 119 DC1 DC2 120 | | 121 [H1]-----[H2] 122 | | | | 123 +-+ | | +-+ 124 | | | | 125 [S1] [S2] [S3] [S4] 126 | | | 127 D1 D2 D3 129 Figure 1: Hub and Spoke, multiple hubs, multiple spokes 131 Initially, the Security Gateway nodes (S*) are configured to build 132 tunnels secured with IPsec to the Security Gateway node (H*) in a hub 133 and spoke style network (any partial mesh will do, but Hub-Spoke is 134 common and easily understood). This initial network is then used 135 when traffic starts flowing between the protected networks D*. DMVPN 136 uses NHRP as a signaling mechanism over the S*-H* and H*-H* tunnels 137 to trigger the spokes (S*) to discover each other and build dynamic, 138 direct Shortcut Tunnels. The Shortcut Tunnels allow those spokes to 139 communicate directly with each other without forwarding traffic 140 through the hub, essentially creating a dynamic mesh. 142 The spokes can be either routers or firewalls playing the role of 143 Security Gateways or hosts such as computers, mobile phones,etc. 145 protecting their own traffic. Nodes S1, S2 and S3 above are routers 146 while S4 is a host implementation. 148 This document describes how NHRP is modified and augmented to allow 149 the rapid creation of dynamic IPsec tunnels between two devices. 150 Throughout this document, we will call these devices participating in 151 the DMVPN "nodes". 
In the context of this document, the nodes protect a topologically dispersed Private, Overlay Network address space.  The nodes allow the devices in the Overlay Network to communicate securely with each other via GRE tunnels secured by IPsec using dynamic tunnels established between the nodes over the (presumably insecure) Transport network.  I.e. the protected tunnel packets are forwarded over this Transport network.

The NBMA Next Hop Resolution Protocol (NHRP) as described in [RFC2332] allows an ingress node to determine the internetworking layer address and NBMA address of an egress node.  The servers in such an NBMA network provide the functionality of address resolution based on a cache which contains protocol layer address to NBMA subnetwork layer address resolution information.  This can be used to create a virtual network where dynamic virtual circuits can be created on an as-needed basis.  In this document, we will depart from the underlying notion of a centralized NHS.

All data traffic, NHRP frames and other control traffic needed by this DMVPN MUST be protected by IPsec.  In order to efficiently support Layer 2 based protocols, all packets and frames MUST be encapsulated in GRE ([RFC2784]) first; the resulting GRE packet then MUST be protected by IPsec.  IPsec transport mode MUST be supported while IPsec tunnel mode MAY be used.  The usage of a GRE encapsulation protected by IPsec is described in [RFC4301].  Implementations SHOULD strongly link GRE and IPsec SA's through some form of connection latching as described in [RFC5660].

2.  Terminology

The NHRP semantics are used throughout this document; however, some additional terminology is used to better fit the context.

o  Protected Network, Private Network: a network hosted by one of the nodes.  The protected network IP addresses are those that are resolved by NHRP into an NBMA address.
o  Overlay Network: the entire network composed of the Protected Networks and the IP addresses installed on the Tunnel interfaces instantiating the DMVPN.
o  Transport Network, Public Network: the network transporting the GRE/IPsec packets.
o  Nodes: the devices connected by the DMVPN that implement NHRP, GRE/IPsec and IKE.
o  Ingress Node: The NHRP node that takes data packets from off of the DMVPN and injects them into the DMVPN on either a multi-hop tunnel path (initially) or a single-hop shortcut tunnel.  Also the node that will send an NHRP Resolution Request and receive an NHRP Resolution Reply to build a shortcut tunnel.
o  Egress Node: The NHRP node that extracts data packets from the DMVPN and forwards them off of the DMVPN.  Also the node that answers an NHRP Resolution Request and sends an NHRP Resolution Reply.
o  Intermediate Node: An NHRP node that is in the middle of a multi-hop tunnel path between an Ingress and Egress Node.  For the particular data traffic in question the Intermediate node will receive packets from the DMVPN and resend (hair-pin) them back onto the DMVPN.

Note that a particular node in the DMVPN may at the same time be an Ingress, Egress and Intermediate node, depending on the data traffic flow being looked at.

In general, DMVPN nodes make extensive use of the Local Address Groups (LAG) and Logically Independent Subnets (LIS) models as described in [RFC2332].
A compliant implementation MUST support the 219 LAG model and SHOULD support the LIS model. 221 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 222 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 223 "OPTIONAL" in this document are to be interpreted as described in 224 [RFC2119]. 226 3. Tunnel Types 228 The tunnels described in this document are of type GRE/IPsec. GRE/ 229 IPsec allows a single pair of IPsec SA's to be negotiated between the 230 DMVPN nodes. From an IPsec aggregation standpoint, this means less 231 negotiation, cleaner use of expensive resources and less 232 reprogramming of the data plane by the IKE control plane as 233 additional networks are discovered between any two peers. 235 In the remainder of this document, GRE and GRE/IPsec will be used 236 interchangeably depending on the focused layer but always imply "GRE 237 protected by IPsec" 239 Taking advantage of the GRE encapsulation, and while NHRP could be 240 forwarded over IP, the RFC recommended Layer 2 NHRP frames have been 241 retained in order to simplify the security policies (packet filters 242 do not have to be augmented to allow NHRP through, no risk of 243 mistakenly propagating frames where they should not, etc.). 244 Compliant implementations MUST support L2 NHRP frames. 246 DMVPN can be implemented in a number of ways and this document places 247 no restriction on the actual implementation. This section covers 248 what the authors believe are the important implementation 249 recommendations to construct a scalable implementation. 251 The authors recommend using a logical interface construct to 252 represent the GRE tunnels. These interfaces are called Tunnel 253 Interfaces or simply Interfaces from here onward. 255 In the remainder of this document, we will assume the implementation 256 uses point-to-point Tunnel Interfaces; routes to prefixes in the 257 Overlay network are in the Routing Table (aka Routing Information 258 Base). These routes forward traffic toward the tunnel interfaces. 260 Point-to-Multipoint GRE interfaces (aka multipoint interfaces for 261 short) can also be used. In that case there is by construction only 262 one tunnel source NBMA address and the interface has multiple tunnel 263 endpoints. In this case NHRP registration request and reply 264 messages, [RFC2332], are used to pass the tunnel address to tunnel 265 NBMA address mapping from the NHC (S*) to the NHS (H*). The NHRP 266 registration request and reply MAY be restricted to a single direct 267 tunnel hop between the NHC (S*) and NHS (H*). 269 For didactic reasons, and an easier understanding of the LAG support, 270 we will use the point-to-point construct to highlight the protocol 271 behavior in the remainder of this document. An implementation can 272 use different models (point-to-point, multipoint, bump in the 273 stack,...) but MUST comply to the external (protocol level) behavior 274 described in this document. 276 4. Solution Overview 278 4.1. Initial Connectivity 280 We assume the following scenario where nodes (S1, S2, H1, H2) 281 depicted in Figure 2 supporting GRE, IPsec/IKE and NHRP establish 282 connections instantiated by GRE tunnels. Those GRE tunnels SHOULD be 283 protected by IPsec/IKE. These tunnels will be used to secure all the 284 data traffic as well as the NHRP control frames. In general, routing 285 protocols (and possibly other control protocols such as NHRP or IKE) 286 will also be protected by IPsec or IKE. 
                  DC1
                   |
                  [H1]
                 |    |      ]
               +-+    +-+    ]  GRE/IPsec tunnels over Transport network
               |        |    ]
             [S1]      [S2]
               |        |
               D1       D2

           Figure 2: Hub and Spoke Initial Connectivity

It is assumed that S1, H1 and S2 are connected via a shared Transport network (typically a Public, NBMA network) and there is connectivity between the nodes over that transport network.

The nodes possess multiple interfaces, each of which has a dedicated IP address:

o  a public interface IntPub connected to the transport network; IP address: Pub{node}
o  one or several tunnel interfaces Tunnel0,1,.. (GRE/IPsec) connecting to peers; IP address: Tun{i}{node}
o  a private interface IntPriv facing the private network of the node; IP address: Priv{node}

e.g. node S1 owns the following addresses: PubS1, TunS1 and PrivS1.

The networks D1, D2, DC1 and also the tunnel addresses Tun{i} are presumed to be private in the sense that their address space is kept independent from the transport network address space.  Together, they form the Overlay network.  For the transport network, the address family is either IPv4 or IPv6.  In the context of this document, for the overlay network, the address family is IPv4 and/or IPv6.

Initially, nodes S1 and S2 create a connection to node H1.  Optionally, S1 and S2 MAY register to H1 via NHRP.  Typically the GRE tunnels between S* and H1 will be protected by IPsec.  A compliant implementation MUST support IPsec protected GRE tunnels and SHOULD support unprotected GRE tunnels.

At the end of this section, a dynamic tunnel will be set up between S1 and S2 and traffic will flow directly between S1 and S2 without going through H1.

4.2.  Initial Routing Table Status

In the context of this document, the authors make no assumption about how the routing tables are initially populated but one can assume that routing protocols exchange information between H1 and S1 and between H1 and S2.

In this diagram, we assume each node has routes (summarized or specific) for networks D1, D2, DC1 which are IP networks.  We assume the summary prefix SUM to encompass all the private networks depicted on this diagram.  We assume the communication between those networks needs to be protected and therefore, the routes point to tunnels.  I.e. S1 knows a route summarizing all the Overlay subnets and this route points to the GRE/IPsec tunnel leading to H1.  Note that the summary prefix is a network design choice and it can be replaced by a manifold of summary prefixes or individual non-summarized routes.

Example 1: Node S1 has the following routing table:

o  TunH1 => Tunnel0
o  SUM => TunH1 on Tunnel0
o  0.0.0.0/0 => IntPub
o  D1 => IntPriv

Example 2: Node H1 has the following routing table:

o  TunS1 => Tunnel1
o  TunS2 => Tunnel2
o  D1 => TunS1 on Tunnel1
o  D2 => TunS2 on Tunnel2
o  0.0.0.0/0 => IntPub
o  DC1 => IntPriv

The exact format of the routing table is implementation dependent but the node discovery principle MUST be enforced and the implementation MUST be compatible with an implementation using the routing tables outlined above.  (An illustrative lookup over these example tables is sketched at the end of this subsection.)

This document does not specify how the routes are installed but it can be assumed that the routes (1) and (2) in the tables above are exchanged between S* and H* nodes after the S*-H* connections have been duly authenticated.
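The following sketch is illustrative only and is not part of the protocol specification.  It shows the kind of longest-prefix-match lookup assumed over routing tables such as those of Examples 1 and 2; the concrete prefixes (10.0.0.0/8 for SUM, 10.1.0.0/16 for D1, 10.0.1.1 for TunH1, 10.2.0.7 for a host Dd inside D2) are hypothetical placeholders chosen for the example, not values defined by this document.

   # Minimal longest-prefix-match sketch over an S1-like routing table.
   import ipaddress

   # Node S1's initial RIB: prefix -> (next_hop, interface)
   S1_RIB = {
       "10.0.1.1/32": ("connected", "Tunnel0"),   # TunH1 => Tunnel0
       "10.0.0.0/8":  ("10.0.1.1",  "Tunnel0"),   # SUM   => TunH1 on Tunnel0
       "0.0.0.0/0":   ("connected", "IntPub"),
       "10.1.0.0/16": ("connected", "IntPriv"),   # D1    => IntPriv
   }

   def lookup(rib, destination):
       """Return (prefix, next_hop, interface) of the longest matching route."""
       dst = ipaddress.ip_address(destination)
       best = None
       for prefix, (next_hop, interface) in rib.items():
           net = ipaddress.ip_network(prefix)
           if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
               best = (net, next_hop, interface)
       return best

   # A packet toward a host Dd in D2 matches only the summary SUM and is
   # therefore forwarded to the hub H1 over the protected tunnel Tunnel0.
   print(lookup(S1_RIB, "10.2.0.7"))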
In a DMVPN solution, it is typical that the routes are exchanged by a route exchange protocol (e.g. BGP or IKE as shown in Section 9.2) or are installed statically (usually a mix of both).  It is important that routing updates be filtered in order to prevent a node from advertising improper routes to another node.  This filtering is out of the scope of this document as most routing protocol implementations are already capable of such filtering.  In order to meet these criteria, an implementation SHOULD offer identity-based policies to filter those routes on a per peer basis.

When a device Ds on network D1 needs to connect to a device Dd on network D2:

o  a data packet ip(Ds, Dd) is sent and reaches S1 on IntPriv
o  the data packet is routed by S1 via Tunnel0 toward H1; S1 encapsulates, protects and forwards this packet out IntPub via the transport network to H1
o  H1 receives the protected packet on IntPub; H1 decrypts and decapsulates this packet; the resulting data packet looks to the IP stack on H1 as if it arrived on interface Tunnel1
o  the data packet is routed by H1 via Tunnel2 toward S2; H1 encapsulates, protects and forwards this out IntPub via the transport network to S2
o  S2 receives the protected packet on IntPub; S2 decrypts and decapsulates this packet; the resulting data packet looks to the IP stack as if it arrived on interface Tunnel0
o  S2 routes the data packet out of its IntPriv interface to the destination Dd

4.3.  Indirection Notification

Consider the packet flow described in the previous section.  When H1 (Intermediate Node) receives a packet from the ingress node S1 and forwards it to the next node S2, it technically re-injects the packet back into the DMVPN.

At this point H1 SHOULD send an Indirection Notification message to S1.  The Indirection Notification is a dedicated NHRP message indicating to the ingress node that it sent an IP packet that had to be forwarded via the intermediate node to another node.  The Indirection Notification MUST contain the first 64 bytes of the clear text IP packet that was forwarded to the next node.  The exact format of this message is detailed in Section 5.1.

The Indirection Notification MUST be sent back to the ingress node through the same GRE/IPsec tunnel upon which the hair-pinned IP packet was received and MUST be rate limited.

This message is a hint that a direct tunnel SHOULD be built between the end-nodes, bypassing intermediate nodes.  This tunnel is called a "Shortcut Tunnel".

Compliant implementations MUST be able to send and accept the Indirection Notification; however, implementations MUST continue to accept traffic over the spoke-hub-spoke path during spoke-spoke path establishment (Shortcut Tunnel).

When a node receives such a notification, it MUST perform the following (an illustrative sketch is given at the end of this section):

o  parse and accept the message
o  extract the source address of the original protected IP packet from the 64 bytes available
o  perform a route lookup on this source address

   *  If the routing to this source address is also via the DMVPN network upon which it received the Indirection Notification then this node is an intermediate node on the tunnel path from the ingress node (injection point) to the egress node (extraction point).  In this case this intermediate node MUST silently drop the Indirection Notification that it received.  Note that if the node is an intermediate node, it is likely that it has generated and sent an Indirection Notification about this same protected IP packet to its tunnel neighbor on the tunnel path back towards the ingress node (injection point).  This is correct behavior.

o  if the previous step succeeded (i.e. the node is not an intermediate node), extract the destination IP address (Dd) of the original protected IP packet from the 64 bytes available.

The ingress node MAY also extract additional information from those 64 bytes such as the protocol type, port numbers etc.

In steady state, Indirection Notifications MUST be accepted and processed as above from any trusted peer with which the node has a direct connection.
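The following sketch is illustrative only; it mirrors the receiver-side processing just described.  The parsing relies on the fixed IPv4 header layout, while the route-lookup function, the route shape of (prefix, next hop, interface) and the set of DMVPN interfaces are simplifying assumptions passed in by the caller rather than anything mandated by this document.

   # Sketch: decide, from the 64-byte snippet carried in an Indirection
   # Notification, whether this node is an intermediate node (drop) or the
   # ingress node (candidate for a Resolution Request on Dd).
   import ipaddress
   import struct

   def addresses_from_snippet(snippet):
       """Return (src, dst) of the original IPv4 packet carried in the
       first 64 bytes of the Indirection Notification."""
       src, dst = struct.unpack_from("!4s4s", snippet, 12)  # IPv4 src/dst offsets
       return ipaddress.ip_address(src), ipaddress.ip_address(dst)

   def handle_indirection_notification(snippet, rib, lookup, dmvpn_interfaces):
       src, dst = addresses_from_snippet(snippet)
       route = lookup(rib, str(src))
       if route is not None and route[2] in dmvpn_interfaces:
           # The source is reached back through the DMVPN: this node is an
           # intermediate node for that flow and silently drops the message.
           return None
       # Otherwise this node is the ingress node; Dd is a candidate for an
       # NHRP Resolution Request, subject to local policy and rate limiting.
       return dst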
4.4.  Node Discovery via Resolution Request

After processing the information in the Indirection Notification, the ingress node's local policy SHOULD determine whether a shortcut tunnel needs to be established.  Assuming the local policy requests a shortcut tunnel, the ingress node MUST emit a Resolution Request for the destination IP address Dd.

More specifically, the NHRP Resolution Request emitted by S1 to resolve Dd will contain the following fields:

o  Fixed Header

   *  ar$op.version = 1
   *  ar$op.type = 1

o  Common Header (Mandatory Header)

   *  Source NBMA Address = PubS1
   *  Source Protocol Address = TunS1
   *  Destination Protocol Address = Dd

The resolution request is routed by S1 to H1 over the GRE/IPsec tunnel.  If an intermediate node has a valid (authoritative) NHRP mapping in its cache, it MAY respond.  An intermediate node SHOULD NOT answer Resolution Requests in any other case.

Note that a Resolution Request can be voluntarily emitted by a Security Gateway and is not strictly limited to a response to the Indirection Notification message.  Such cases and policies are out of the scope of this document.

The sending of Resolution Requests by an ingress node MUST be rate limited.

4.5.  Resolution Request Forwarding

The Resolution Request can be sent by S1 to an explicit or implicit next-hop server.  In the explicit scenario, the NHS is defined in the node configuration.  In the implicit case, the node can infer the NHS to use.  Similarly, an intermediate node that cannot answer a Resolution Request SHOULD forward the Resolution Request to an implicit or explicit NHS in the same manner unless local policy forbids resolution forwarding between Spokes.  There can be an undetermined number of intermediate nodes.

A DMVPN compliant implementation MUST be able to infer the NHS from its routing table in the following way (a sketch is given after this list):

o  the address Dd to be resolved is looked up in the routing table (other parameters can be considered by the ingress node but these will not be available to intermediate nodes)
o  the best route for Dd is selected (longest prefix match)

   *  if several routes match (same prefix length) only the routes pointing to a DMVPN Tunnel interface are kept.  This SHOULD NOT occur in practice.

o  if the best route found points to a DMVPN Tunnel interface, the next-hop address MUST be used as NHS
o  if the best route found does not point to a DMVPN Tunnel interface, the forwarding of the packet stops and the matching prefix P and prefix length (Plen) is kept temporarily.  Very often, P/Plen == D2/D2len (this is the case in the diagram used in this document) but this may not always be true depending on the structure of the networks protected by S2.  The associated prefix length (Plen) is also preserved.
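The sketch below is a non-normative illustration of the NHS inference rule above: look up Dd, keep only DMVPN tunnel routes on a tie, then either return the next hop (the implicit NHS) or stop and report the matching prefix P/Plen.  The route-table layout and interface naming are simplifying assumptions.

   # Sketch of the implicit-NHS inference from a routing table of
   # (prefix, next_hop, interface) entries.
   import ipaddress

   def infer_nhs(rib, dd, dmvpn_interfaces):
       dst = ipaddress.ip_address(dd)
       matches = [(ipaddress.ip_network(p), nh, itf) for (p, nh, itf) in rib
                  if dst in ipaddress.ip_network(p)]
       if not matches:
           return ("error", None)              # no route at all: resolution fails
       best_len = max(net.prefixlen for net, _, _ in matches)
       best = [m for m in matches if m[0].prefixlen == best_len]
       if len(best) > 1:                       # tie: keep DMVPN tunnel routes only
           best = [m for m in best if m[2] in dmvpn_interfaces] or best
       net, next_hop, interface = best[0]
       if interface in dmvpn_interfaces:
           return ("forward", next_hop)        # the next hop is the implicit NHS
       return ("egress", net)                  # forwarding stops; keep P/Plen

On S1 in Example 1, a request for an address inside D2 would match the summary SUM and be forwarded toward TunH1; on a node whose best route for Dd points out a non-DMVPN interface (typically S2, with a route for D2 via IntPriv), the forwarding stops and that node becomes the egress node.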
If the Resolution Request forwarding stops at the ingress node (at emission), the Resolution Request process MUST be stopped with an error for address Dd.  If the lookup succeeds, the next-hop's NBMA address is used as destination address of the GRE encapsulation.  Before forwarding, each intermediate node MUST add a Forward Transit Extension record to the NHRP Resolution Request.

Intermediate nodes SHOULD NOT cache any information while forwarding Resolution Requests.  In case an intermediate node implementation does cache information, it MUST NOT assume that other intermediate nodes will also cache that information.

Thanks to the forwarding model described in this document and due to the absence of intermediate caching, Server Cache Synchronization is not needed and is even recommended against.  Therefore, a DMVPN compliant implementation MUST NOT rely on such a synchronization, which would have adverse effects on the scalability of the entire system.

If the TTL of the request drops to zero or the current node finds itself on a Forward Transit Extension record, then the NHRP Resolution Request MUST be dropped and an NHRP error message sent to the source.

When the Resolution Request eventually reaches a node where the route(s) to the destination would take it out through a non-DMVPN interface, the Resolution Request process MUST be stopped and this node becomes the egress node.  The egress node is typically (by virtue of network design) the topologically closest node to the resolved address Dd.

The egress node must then prepare itself for replying with a Resolution Reply.

4.6.  Egress node NHRP cache and Tunnel Creation

When a node declares itself an egress node while attempting to forward a Resolution Request, it MUST evaluate the need for establishing a shortcut tunnel according to a user policy.  Note that an implementation is not mandated to support a user policy, but in that case the implicit policy MUST request the shortcut establishment.  If policies are supported, one of the possible policies MUST be shortcut establishment.

If a shortcut is required, the egress node MUST perform the following operations (a sketch is given after this list):

o  the source NBMA address (PubS1) is extracted from the NHRP Resolution Request
o  if a GRE/IPsec tunnel already exists between PubS2 and PubS1, this tunnel is selected (assuming interface TunnelX)
o  otherwise, a new GRE shortcut tunnel is created between PubS2 and PubS1 (assuming interface TunnelX); the GRE tunnel SHOULD be protected by IPsec and the SA's immediately negotiated by IKE
o  an NHRP cache entry is created for TunS1 => PubS1.  The entry SHOULD NOT remain in the cache for more than the specified Hold Time (from the NHRP Resolution Request).  This NHRP cache entry may be 'refreshed' for another hold time period prior to expiry by receipt of another matching NHRP Resolution Request or by sending an NHRP Resolution Request and receiving an NHRP Resolution Reply.
o  a route is inserted into the RIB: TunS1/32 => PubS1 on TunnelX (assuming IPv4)
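The following sketch is purely illustrative of the egress-node operations listed above; the class layout, the placeholder interface name and the absence of real IKE/IPsec negotiation are simplifications, not recommendations.

   # Sketch: egress-node handling of a Resolution Request that requires a
   # shortcut (tunnel reuse or creation, NHRP cache entry, host route).
   import time

   class EgressNode:
       def __init__(self):
           self.tunnels = {}      # remote NBMA address -> tunnel interface name
           self.nhrp_cache = {}   # tunnel (protocol) address -> (NBMA, expiry)
           self.rib = []          # (prefix, next_hop, interface)

       def build_shortcut(self, src_nbma, src_proto, hold_time):
           """src_nbma/src_proto are PubS1/TunS1 from the Resolution Request."""
           # Reuse an existing GRE/IPsec tunnel to that peer if there is one,
           # otherwise create it (and, ideally, trigger IKE immediately).
           tunnel = self.tunnels.get(src_nbma)
           if tunnel is None:
               tunnel = "TunnelX"                  # placeholder interface name
               self.tunnels[src_nbma] = tunnel     # IKE/IPsec set-up goes here
           # NHRP cache entry TunS1 => PubS1, bounded by the advertised Hold Time.
           self.nhrp_cache[src_proto] = (src_nbma, time.time() + hold_time)
           # Host route toward the peer's tunnel address via the shortcut tunnel.
           self.rib.append((src_proto + "/32", src_nbma, tunnel))
           return tunnel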
Regardless of how the shortcut tunnel is created, a node SHOULD NOT try to establish more than one tunnel with a remote node.  If there are other tunnels not managed by DMVPN, the tunnel selectors (source, destination, tunnel key) MUST NOT interfere with the DMVPN shortcut tunnels.

If a tunnel has to be created and SA's established, a node SHOULD wait for the tunnel to be in place before proceeding with further operations.  Regardless of how those operations are timed in the implementation, a node SHOULD avoid dropping data packets during the cache and SA installation.  The order of operations SHOULD ensure continuous forwarding.

4.7.  Resolution Reply format and processing

After the operations described in the previous section are completed, a Resolution Reply MUST be emitted by the egress node.  Instead of strictly answering with just the host address being looked up, the Reply will contain the entire prefix (P/Plen) that was found during the stopped Resolution Request forwarding phase.

The Resolution Reply main fields MUST be populated as follows:

o  Fixed Header

   *  ar$op.version = 1
   *  ar$op.type = 2

o  Common Header (Mandatory Header)

   *  Source NBMA Address = PubS1
   *  Source Protocol Address = TunS1
   *  Destination Protocol Address = Dd

o  CIE-1

   *  Prefix-len = Plen
   *  Client NBMA Address = PubS2
   *  Client Protocol Address = TunS2

The Destination Protocol address remains the address being resolved (Dd) while the CIE actually contains the remainder of the response (Plen via NBMA PubS2, Protocol TunS2).  The Resolution Reply MUST be forwarded to the ingress node S1 either through the shortcut tunnel or via the Hub.

If the address family of the resolved address Dd is IPv6, the Resolution Reply SHOULD be augmented with a second CIE containing the egress node's link local address.

If a node decides to block the resolution process, it MAY simply drop the Resolution Request or avoid sending a Resolution Reply.  A node MAY also send a NACK Resolution Reply.

When the Resolution Reply is received by the ingress node, a new tunnel TunnelY MUST be created pointing to PubS2 if one does not already exist (which depends on whether the Resolution Reply was routed via the Hub(s) or directly on the shortcut tunnel).  The ingress node MUST process the reply in the following way (an illustrative sketch is given at the end of this section):

o  Validate that this Resolution Reply corresponds to a Request emitted by S1.  If not, issue an error and stop processing the Reply.
o  An NHRP Cache entry is created for TunS2 => PubS2
o  Two routes are added to the routing table:

   *  TunS2 => TunnelY
   *  P/Plen => TunS2 on TunnelY

Though implementations may be entirely different, a compliant implementation MUST exhibit a functional behavior strictly equivalent to the one described above.  I.e. IP packets MUST eventually be forwarded according to the behavior described above.

DMVPN compliant implementations MUST support providing and receiving aggregated address resolution information.
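As a companion to the egress-node sketch, the fragment below illustrates one possible shape of the ingress-node processing of a Resolution Reply; the names and structures are assumptions made for the example only.

   # Sketch: ingress-node processing of a Resolution Reply (match against a
   # pending request, NHRP cache entry, shortcut tunnel and two routes).
   class IngressNode:
       def __init__(self):
           self.pending = set()   # destination addresses being resolved
           self.tunnels = {}      # remote NBMA address -> tunnel interface
           self.nhrp_cache = {}   # tunnel (protocol) address -> NBMA address
           self.rib = []          # (prefix, next_hop, interface)

       def process_resolution_reply(self, dd, prefix, client_nbma, client_proto):
           """dd: resolved address; prefix: P/Plen; client_*: PubS2/TunS2 (CIE-1)."""
           if dd not in self.pending:
               raise ValueError("Reply does not match a Request emitted here")
           self.pending.discard(dd)
           # Create TunnelY toward PubS2 unless the shortcut already exists.
           tunnel = self.tunnels.setdefault(client_nbma, "TunnelY")
           # NHRP cache entry TunS2 => PubS2.
           self.nhrp_cache[client_proto] = client_nbma
           # Route to the peer's tunnel address, then the resolved prefix via it.
           self.rib.append((client_proto + "/32", "connected", tunnel))
           self.rib.append((prefix, client_proto, tunnel))
           return tunnel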
4.8.  From Hub and Spoke to Dynamic Mesh

At the end of the resolution process, the overlay topology will be as follows:

                  DC1
                   |
                  [H1]
                 |   |       ]
               +-+   +-+     ]  GRE/IPsec tunnels over Transport network
               |       |     ]
             [S1]===[S2]
               |       |
               D1      D2

             Shortcut tunnel established

Where the tunnel depicted with = is a GRE/IPsec shortcut tunnel created by NHRP.  The Routing Table on S1 will now look as follows:

o  TunH1 => Tunnel0
o  SUM => TunH1 on Tunnel0
o  0.0.0.0/0 => IntPub
o  D1 => IntPriv
o  TunS2 => TunnelY
o  P/Plen => TunS2 on TunnelY

It is easy to see that traffic from D1 to D2 will follow the shortcut path under the assumption that P == D2 or D2 is a subnet included in P.

The tunnels between S* and H* are actually tunnels created automatically to bootstrap the DMVPN.  In practice the initial topology will be a static star (aka Hub and Spoke) topology between S* and H* that will evolve into a dynamic mesh between the nodes S*.

From the spokes (S*) standpoint, the bootstrap tunnels can be established with a node H1 statically defined or discovered by DNS.  The problem of finding the initial hubs in a DMVPN is not different from finding regular hubs in a traditional Hub and Spoke network.

For scalability reasons, it is expected that the NHRP Indirection/Resolution is the only way by which routes are exchanged between S* nodes.  While this does not fall in the context of this document, it is worth mentioning that actual implementations SHOULD NOT establish a routing protocol adjacency directly over the shortcut tunnels.

4.9.  Remote Access Clients

The specification in this document allows a node to not protect any private network.  I.e. in a degenerate case, it MUST be possible for a node S1 to not have a D1 network attached to it.  Instead, S1 only owns a PubS1 and a TunS1 address.  This would typically be the case of a remote access client (PC, mobile device,...) that only has a tunnel address and an NBMA address.

                  DC1
                   |
                  [H1]
                 |   |       ]
               +-+   +-+     ]  GRE/IPsec tunnels over Transport network
               |       |     ]
             [S1]===[S2]
                       |
                       D2

             Remote Access Client

On the diagram above, S1 is actually a simple PC or mobile node that is not protecting any network other than its own tunnel address.

These nodes may fully participate in a DMVPN network, including building spoke-spoke tunnels, as long as they support GRE, NHRP, IPsec/IKE, have a way to separate tunneled traffic (virtual interfaces), and are able to update a local routing table to associate networks with different next hops out either their IntTun (data traffic going over the tunnel) or their IntPub (tunnel packets themselves and/or non-tunneled data traffic).  They may not need to run a routing protocol since they can rely on the Configuration Payload Exchange described in Section 9.2.

4.10.  Node mutual authentication

Nodes authenticate each other using the IKE protocol while they attempt to establish a tunnel.  Because the system is by nature extremely distributed, it is recommended to use X.509 certificates for authentication.  The Internet Public Key Infrastructure is described in [RFC5280].

The structured names and various fields in the certificate can be useful for filtering undesired connectivity in large administrative domains or when two domains are being partially merged.
It is indeed 760 easy for a system administrator to define filters to prevent 761 connectivity between nodes that are not supposed to communicate 762 directly (e.g. filtering based on the O or OU fields). 764 Though nodes may be blocked from building a direct tunnel by the 765 above means they may or may not be allowed to communicate via a 766 spoke-hub-spoke path. Allowing or blocking communication via the 767 spoke-hub-spoke path is outside the scope of this document. 769 5. NHRP Extension Format 771 As described in [RFC2332], an NHRP packet consists of a fixed part, a 772 mandatory part and an extensions part. The Fixed Part is common to 773 all NHRP packet types. The Mandatory Part MUST be present, but 774 varies depending on packet type. The Extensions Part also varies 775 depending on packet type, and need not be present. This section 776 describes the packet format of the new messages introduced as well as 777 extensions to the existing packet types. 779 5.1. NHRP Traffic Indication 781 The fixed part of an NHRP Traffic Indication packet picks itself 782 directly from the standard NHRP fixed part and all fields pick up the 783 same meaning as in [RFC2332] unless otherwise explicitly stated. 785 0 1 2 3 786 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 787 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 788 | ar$afn | ar$pro.type | 789 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 790 | ar$pro.snap | 791 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 792 | ar$pro.snap | ar$hopcnt | ar$pktsz | 793 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 794 | ar$chksum | ar$extoff | 795 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 796 | ar$op.version | ar$op.type | ar$shtl | ar$sstl | 797 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 799 Figure 3: Traffic Indication Fixed Header 801 o ar$op.type With ar$op.version = 1, this is an NHRP packet. 802 Further, [RFC2332] uses the numbers 1-7 for standard NHRP 803 messages. When ar$op.type = 8, this indicates a traffic 804 indication packet. 806 The mandatory part of the NHRP Traffic Indication packet is slightly 807 different from the NHRP Resolution/Registration/Purge Request/Reply 808 packets and bears a much closer resemblance with the mandatory part 809 of NHRP Error Indication packet. 
The mandatory part of an NHRP Traffic Indication has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Src Proto Len | Dst Proto Len |            unused             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Traffic Code         |            unused             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Source NBMA Address (variable length)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source NBMA Subaddress (variable length)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Protocol Address (variable length)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Destination Protocol Address (variable length)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Contents of Data Packet in traffic (variable length)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Traffic Indication Mandatory Part

o  Src Proto Len: This field holds the length in octets of the Source Protocol Address.
o  Dst Proto Len: This field holds the length in octets of the Destination Protocol Address.
o  Traffic Code: A code indicating the type of traffic indication message, chosen from the following list:

   *  0: NHRP Traffic Redirect/Indirection message.  This indirection is an indication, to the receiver, of the possible existence of a 'better' path in the NBMA network.

o  Source NBMA Address: The Source NBMA address field is the address of the station which generated the traffic indication.
o  Source NBMA SubAddress: The Source NBMA subaddress field is the address of the station which generated the traffic indication.  If the field's length as specified in ar$sstl is 0 then no storage is allocated for this address at all.
o  Source Protocol Address: This is the protocol address of the station which issued the Traffic Indication packet.
o  Destination Protocol Address: This is the destination IP address from the packet which triggered the sending of this Traffic Indication message.

Note that unlike NHRP Resolution/Registration/Purge messages, the Traffic Indication message does not have a request/reply pair, nor does it contain any CIE, though it may contain extension records.

6.  Security Considerations

The use of NHRP and its protocol extensions described in this document does not open a direct security hole.  The peers are duly authenticated with each other by IKE and the traffic is protected by IPsec.  The only risk may come from inside the network itself; this is not different from static meshes.

Implementers must be diligent in offering all the control and data plane filtering options that an administrator would need to secure the communication inside the overlay network.

7.  IANA Considerations

The following values are used experimentally:

o  The ar$op.type value of 8 representing Traffic Indication
o  Traffic Code value of 0 indicating a Traffic Indirection message.

Full standardization would require official IANA numbers to be assigned.

8.  Compliance against ADVPN requirements

This section compares the adequacy of DMVPN to the requirement list stated in [RFC7018].

8.1.
Requirement 1: minimize configuration change 886 There are three requirements from requirement 1 from [RFC7018] which 887 reads: 889 "For any network topology (star, full mesh, and dynamic full 890 mesh), when a new gateway or endpoint is added, removed, or 891 changed, configuration changes are minimized as follows. Adding 892 or removing a spoke in the topology MUST NOT require configuration 893 changes to hubs other than where the spoke was connected and 894 SHOULD NOT require configuration changes to the hub to which the 895 spoke was connected. The changes also MUST NOT require 896 configuration changes in other spokes. 898 Specifically, when evaluating potential proposals, we will compare 899 them by looking at how many endpoints or gateways must be 900 reconfigured when a new gateway or endpoint is added, removed, or 901 changed and how substantial this reconfiguration is, in addition 902 to the amount of static configuration required." 904 The three requirements are (1a) all hub change, (1b) connected hub 905 change, and (1c) other spokes. 907 (1a), (1b), and (1c) are met by the ability for tunnels transport 908 addresses to be dynamically discovered by NHRP and the tunnels 909 dynamically created and configured by IKE when the authentication 910 succeeds. 912 8.2. Requirement 2: IPsec without config change, even with peer address 913 change 915 There is one requirement from Requirement 2 of [RFC7018] which reads: 917 "ADVPN Peers MUST allow IPsec tunnels to be set up with other 918 members of the ADVPN without any configuration changes, even when 919 peer addresses get updated every time the device comes up. This 920 implies that Security Policy Database (SPD) entries or other 921 configuration based on a peer IP address will need to be 922 automatically updated, avoided, or handled in some manner to avoid 923 a need to manually update policy whenever an address changes." 925 Each proposal meets this requirement as described below: 927 This requirement is met, and uses the Summary route from Hub 928 (Section 4.2), then method of Indirection (Section 4.3) then a 929 Resolution Request (Section 4.4) and finally a Resolution reply 930 (Section 4.7) to identify the overlay address to transport address 931 mapping in a dynamic manner. 933 8.3. Requirement 3: Tunnel, Routing, and no Additional Configuration 935 There are two requirements from requirement 3 of [RFC7018] which 936 reads: 938 "In many cases, additional tunneling protocols (e.g., GRE) or 939 routing protocols (e.g., OSPF) are run over the IPsec tunnels. 940 Gateways MUST allow for the operation of tunneling and routing 941 protocols operating over spoke-to-spoke IPsec tunnels with minimal 942 or no configuration impact. The ADVPN solution SHOULD NOT 943 increase the amount of information required to configure protocols 944 running over IPsec tunnels." 946 The two requirements are: (3a) minimal or no configuration impact 947 incurred by the tunneling protocols between spokes, (3b) minimal 948 configuration impact incurred by routing protocols operating over the 949 spoke-to-spoke tunnels. 951 Requirement (3a) is met as dynamic tunnels are dynamically created at 952 the same time as the IKE SA is authenticated. 954 Requirement (3b) is met as routing protocols do not operate over 955 spoke-to-spoke tunnels; only NHRP is responsible for exchanging 956 prefixes between spokes and NHRP is entirely dynamic. 958 8.4. 
Requirement 4: Spoke-to-Spoke Optimization

There are two requirements from requirement 4 of [RFC7018] which reads:

   "In the full-mesh and dynamic full-mesh topologies, spokes MUST allow for direct communication with other spoke gateways and endpoints.  In the star topology mode, direct communication between spokes MUST be disallowed."

The two requirements are: (4a) in full-mesh and dynamic full-mesh topologies, allow direct spoke-to-spoke communication and (4b) in star topology, disallow direct spoke-to-spoke communication.

Requirement (4a) is met by the Resolution Request/Reply mechanism described from Section 4.4 to Section 4.7.

Requirement (4b) is met by disabling the NHRP protocol handler on a tunnel pointing to a remote peer.  As NHRP is disabled, NHRP messages to and from that peer will be dropped and the peer will be unable to forge a new dynamic endpoint with any other spoke.  It is sufficient to disable NHRP to that spoke at the hub level to impede the Resolution mechanism that causes the spoke-spoke optimization.  Requirement (4b) can be applied globally (for all spokes) or individually (for selected spokes); the activation or deactivation of NHRP on a given peer-to-peer tunnel can be driven by static configuration or on a per-identity basis.  Additionally, peers can filter NHRP Resolution Requests or Replies if partial meshing is allowed to specific prefixes only.  Additional identity and certificate filters can be imposed to further restrict which devices can connect to others.  For instance, Certificate Subject Name fields such as Organization or Organization Unit are frequently used to that effect.

8.5.  Requirement 5: Credentials and Compromise

There are three requirements from requirement 5 of [RFC7018] which reads:

   "ADVPN Peers MUST NOT have a way to get the long-term authentication credentials for any other ADVPN Peers.  The compromise of an endpoint MUST NOT affect the security of communications between other ADVPN Peers.  The compromise of a gateway SHOULD NOT affect the security of the communications between ADVPN Peers not associated with that gateway."

The three requirements are: (5a) no way to get the long-term authentication credentials from any other ADVPN peers, (5b) compromise of an endpoint does not affect security of communications with other peers, and (5c) compromise of a gateway does not affect the security of communications between ADVPN peers not associated with that gateway.

Requirement (5a) is met by Section 4.10 which recommends PKI.

Requirement (5b) is met with mutual authentication.  If an endpoint is compromised, its corresponding certificate will be revoked and it will be impossible for this endpoint to create any new connection to any new peer.

Requirement (5c) is met by the same mechanism as (5b).

8.6.  Requirement 6: Handoff and Roaming

There are two requirements from requirement 6 of [RFC7018] which reads:

   "Gateways SHOULD allow for seamless handoff of sessions in cases where endpoints are roaming, even if they cross policy boundaries.  This would mean the data traffic is minimally affected even as the handoff happens.  External factors like firewalls and NAT boxes that will be part of the overall solution when ADVPN is deployed will not be considered part of this solution.
   Such endpoint roaming may affect not only the endpoint-to-endpoint SA but also the relationship between the endpoints and gateways (such as when an endpoint roams to a new network that is handled by a different gateway)."

The two requirements are (6a) gateways allow for seamless handoff of sessions when clients roam, and (6b) even if they cross policy boundaries.

Requirement (6a) is met by the fact that tunnels can be established dynamically but will not be available for traffic until the IPsec SA is fully available.  This is ensured by the fact that NHRP does not install prefixes into the routing policy until the SA's are fully negotiated, as described in Section 4.6.

Requirement (6b) is met because DMVPN is agnostic to policy boundaries or domains.

8.7.  Requirement 7: Easy handoff and Migration

There are two requirements from requirement 7 of [RFC7018] which reads:

   "Gateways SHOULD allow for easy handoff of a session to another gateway, to optimize latency, bandwidth, load balancing, availability, or other factors, based on policy.  This ability to migrate traffic from one gateway to another applies regardless of whether the gateways in question are hubs or spokes.  It even applies in the case where a gateway (hub or spoke) moves in the network, as may happen with a vehicle-based network."

The two requirements are: (7a) easy handoff of a session to another gateway to optimize latency, bandwidth, or other factors based on policy, and (7b) ability to migrate from one gateway to another.

Requirement (7a) can be achieved by using IKEv2 Redirect ([RFC5685]) to redirect a peer entirely to another gateway.  Specific Indirection Notifications can be used to redirect specific networks or peers.

Requirement (7b) is met because IKEv2 Redirect, Resolution Request and Indirection Notification can be sent on a voluntary basis by any device (hub or spoke), which means that a source node or an egress node can be of any type (hub or spoke).  In practice, this is an unusual mode of operation (seldom desirable) but it is legitimate.

8.8.  Requirement 8: NAT

There are three requirements from requirement 8 of [RFC7018] which reads:

   "Gateways and endpoints MUST have the capability to participate in an ADVPN even when they are located behind NAT boxes.  However, in some cases they may be deployed in such a way that they will not be fully reachable behind a NAT box.  It is especially difficult to handle cases where the hub is behind a NAT box.  When the two endpoints are both behind separate NATs, communication between these spokes SHOULD be supported using workarounds such as port forwarding by the NAT or detecting when two spokes are behind uncooperative NATs, and using a hub in that case."

The three requirements are: (8a) gateways and endpoints MUST have the capability to participate in an ADVPN even when they are located behind NAT boxes, (8b) communication SHOULD be supported when the two endpoints are both behind separate NAT boxes, and (8c) shortcuts should continue to work seamlessly when NAT prevents direct spoke-spoke connectivity.

Requirements (8a) and (8b) are met by the use of NAT Traversal to detect NAT devices within the network.
If a hub is deployed behind a NAT, the spokes need to point their tunnel destination towards the public address of the Hub, as described in Section 9.3.

Requirement (8c) is met since NHRP does not install prefixes into the routing policy until the SA's are fully negotiated, as described in Section 4.6.

8.9.  Requirement 9: Changes reported

There is one requirement from requirement 9 of [RFC7018] which reads:

   "Changes such as establishing a new IPsec SA SHOULD be reportable and manageable.  However, creating a MIB or other management technique is not within scope for this effort."

Requirement (9) is met by taking advantage of the various MIB's defined in existing documents such as [RFC2677], [RFC4292], etc.  There is no standard IPsec MIB but various vendors have developed a proprietary MIB (typically based on draft-ietf-ipsec-flowmon-mib and draft-ietf-ipsec-mib) that implementations of this specification can use.  Traps can be triggered as tunnel interfaces come up and down dynamically as defined in [RFC2863] section 3.1.9.  Additional logging messages can be triggered at various levels of the implementation.

8.10.  Requirement 10: Federation between organisations

There is one requirement from requirement 10 of [RFC7018] which reads:

   "To support allied and federated environments, endpoints and gateways from different organizations SHOULD be able to connect to each other."

Requirement (10) is met by the use of PKI ([RFC5280]), as described in Section 4.10.  NHRP can resolve networks across multiple domains as long as those domains are somehow initially connected to the topology.

8.11.  Requirement 11: Configuration of star, full-mesh, or partial full-mesh topologies

There is one requirement from requirement 11 of [RFC7018] which reads:

   "The administrator of the ADVPN SHOULD allow for the configuration of a star, full-mesh, or partial full-mesh topology, based on which tunnels are allowed to be set up."

Requirement (11) is met by the same principle as Requirement (4b).

8.12.  Requirement 12: Scale for Multicast

There is one requirement from requirement 12 of [RFC7018] which reads:

   "The ADVPN solution SHOULD be able to scale for multicast traffic."

Requirement (12) is met by the use of a full tunneling interface as described in Section 1.  All multicast control protocols such as PIM ([RFC4601]) or IGMP ([RFC4604]) or even MLD ([RFC3810]) will work seamlessly on the overlay medium (GRE/IPsec tunnels).

8.13.  Requirement 13: Monitoring and Reporting

There is one requirement from requirement 13 of [RFC7018] which reads:

   "The ADVPN solution SHOULD allow for easy monitoring, logging, and reporting of the dynamic changes to help with troubleshooting such environments."

Requirement (13) is met by the use of multiple existing technologies (IPsec, IKE, NHRP, GRE, interfaces) which all generate their own monitoring, logging, and reporting.

8.14.  Requirement 14: L3 VPNs

There is one requirement from requirement 14 of [RFC7018] which reads:

   "There is also the case where L3VPNs operate over IPsec tunnels, for example, Provider-Edge-based VPNs.  An ADVPN MUST support L3VPNs as applications protected by the IPsec tunnels."
8.15.  Requirement 15: QoS

   There is one requirement in requirement 15 of [RFC7018], which
   reads:

   "The ADVPN solution SHOULD allow the enforcement of per-peer QoS
   in both the star and full-mesh topologies."

   Requirement (15) is met by applying a QoS policy on the point-to-
   point (GRE/IPsec) tunnels, allowing the policy to apply only to
   traffic that is destined to a specific remote peer.

8.16.  Requirement 16: Hub Redundancy

   There is one requirement in requirement 16 of [RFC7018], which
   reads:

   "The ADVPN solution SHOULD take care of not letting the hub be a
   single point of failure."

   Requirement (16) is met by the ability to use multiple hubs and an
   overlay routing protocol, as described in Sections 1 and 4.2.
   This method provides routing-based resiliency.  Additionally, a
   spoke can define multiple addresses or DNS names to be used as
   backup hubs.

9.  Design Considerations

   This section contains a number of points that do not augment the
   specification explained so far but instead clarify its use.

9.1.  Routing Policy and RFC4301 Security

   The notion of routing policy is used extensively throughout this
   document.  This routing policy is a mechanism used to look up
   which peer or node a packet should be sent to.  The exact
   representation of a Routing Policy is left to the implementer.  It
   may be, but is not limited to, a single routing table, a manifold
   of routing tables, a policy route, or any other mechanism that can
   take a forwarding decision.

   A key conceptual difference between a Routing Policy and a plain
   SADB or routing table is that packets can be routed to a peer
   based on rules that may be more complex than the usual destination
   prefix of a RIB or the 3- or 5-tuple (source/destination IP,
   source/destination port, protocol) of the SADB.

   Most systems can take forwarding decisions that are more elaborate
   than that.  This includes policy-based routing, application-based
   forwarding, multi-topology routing, and similar mechanisms that
   evaluate packets before they optionally undergo the basic routing
   table or SADB lookup, as sketched below.
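   The following sketch (Python, informative only; the class and
   field names are hypothetical and not mandated by this
   specification) models such a Routing Policy: an ordered list of
   arbitrary policy rules is evaluated first, and a plain
   longest-prefix-match table is used as the fallback.

      import ipaddress

      class RoutingPolicy:
          def __init__(self):
              self.rules = []   # ordered list of (predicate, next_hop)
              self.rib = {}     # ip_network -> next_hop

          def add_rule(self, predicate, next_hop):
              # A rule may inspect anything in the packet (ports,
              # application, ingress interface, ...), not only the
              # destination prefix.
              self.rules.append((predicate, next_hop))

          def add_route(self, prefix, next_hop):
              self.rib[ipaddress.ip_network(prefix)] = next_hop

          def lookup(self, pkt):
              for predicate, next_hop in self.rules:
                  if predicate(pkt):
                      return next_hop
              dst, best = ipaddress.ip_address(pkt["dst"]), None
              for prefix, next_hop in self.rib.items():
                  if dst in prefix and (best is None or
                                        prefix.prefixlen >
                                        best[0].prefixlen):
                      best = (prefix, next_hop)
              return best[1] if best else None

      # Example: steer HTTPS traffic to a dedicated peer, everything
      # else by destination prefix.
      policy = RoutingPolicy()
      policy.add_route("10.0.0.0/8", "Hub")
      policy.add_rule(lambda pkt: pkt.get("dport") == 443, "Spoke2")
      policy.lookup({"dst": "10.1.2.3", "dport": 443})  # -> "Spoke2"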
   A notable example of a Routing Policy is a manifold of Routing
   Tables in the context of VPN Instances (see [RFC4026]); these
   dedicated tables are called VRFs.  In this example, a dedicated
   VRF that we will call VRF Red is associated with the overlay
   network and exclusively routes protected packets.  In effect, the
   private interfaces and the tunnel interfaces are considered Red
   Interfaces and exclusively make use of VRF Red as a routing table.
   Packets entering the system on a Red interface undergo a VRF Red
   lookup and can only leave the device on a Red interface (which
   tunnels are part of).

   Another routing table, called VRF Black, is associated with the
   transport network (or NBMA network) and exclusively routes traffic
   to and from the Unprotected Network.  This means the physical
   interfaces facing the transport network are Black interfaces, and
   traffic entering those interfaces is driven by the VRF Black
   routing table.  GRE/IPsec packets arriving on the Unprotected
   interface are an example of such traffic.

   As noted earlier in this document, GRE tunnels request IPsec
   protection through a crypto socket, as explained in [RFC5660].  A
   corresponding SPD entry and SADB will be created by that socket.

   Plain GRE packets will be discarded as they are not duly protected
   and no SPD entry covers that traffic flow ([RFC4301], Section 5).

   IPsec packets are accepted by the IPsec stack: their SPI is looked
   up, and they are validated (integrity check, anti-replay) and
   decrypted.  The clear-text packet undergoes the SADB check and
   MUST be a GRE packet.  If it is not a GRE packet with the adequate
   source/destination, the packet is discarded.  In light of
   [RFC5660], the packet is given to its application without further
   intermediate lookup; in this case, the application is the
   corresponding GRE Tunnel Interface.

   The protected/overlay packet is now in the clear, ready to be
   processed by the GRE Input Features.  In particular, security
   features can be applied to the clear-text overlay packet (access
   filters, Unicast Reverse Path Forwarding, Layer 7 inspection via a
   firewall or Intrusion Prevention System, ...).  Those policies can
   be applied on the fly at IKE negotiation time, when the remote
   peer identity is known.  The clear-text packet, should it survive
   the security policies, is forwarded to another Red Interface
   according to the VRF Red table.

   In the egress direction, clear-text packets enter on a Red
   Interface and are forwarded to a Tunnel interface according to VRF
   Red.  The packet undergoes output features on the output interface
   (this may include filters, firewalling, etc.) and is encapsulated
   into GRE.  The GRE encapsulation function passes the packet to
   IPsec for protection through the crypto socket.  The packet is now
   an ESP or AH packet and can be routed out the public interface
   according to the VRF Black table.

   For compliance with [RFC4301], explicit leaks may be configured
   between VRFs to allow specific traffic to bypass IPsec encryption
   or other security policies if necessary, but by default the Red
   and Black VRFs are strictly compartmented.

   Various operating systems such as Linux do support VRFs but also
   have other methods of implementing a routing policy (e.g.,
   iproute2) that they can use to their advantage to achieve
   beyond-routing or beyond-SADB policy enforcement.
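   To summarize the ingress processing described in this section, the
   following sketch may help (Python, informative only; pkt, sadb,
   and vrf_red are duck-typed stand-ins for an implementation's real
   data structures, and all names are hypothetical).

      def black_ingress(pkt, sadb, vrf_red, input_features):
          # Packet received on a Black (transport/NBMA) interface.
          if pkt.protocol != "ESP":
              return None               # plain GRE: no SPD entry, discard
          sa = sadb.lookup(pkt.spi)
          if sa is None or not sa.validate(pkt):
              return None               # integrity or anti-replay failure
          inner = sa.decrypt(pkt)
          if inner.protocol != "GRE" or \
             (inner.src, inner.dst) != sa.gre_endpoints:
              return None               # fails the SADB check
          overlay = inner.payload       # GRE decapsulation on the Tunnel
                                        # Interface (the "application")
          for feature in input_features:
              if not feature(overlay):  # uRPF, ACL, firewall, IPS, ...
                  return None
          return vrf_red.lookup(overlay)  # may only leave on a Red interface

   The egress direction mirrors this sequence through the crypto
   socket, as described above.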
9.2.  Using Configuration Attributes

   As outlined earlier in this document, this specification lets any
   administratively authorized control protocol set up the routing
   policy of the base topology.  This section explains how IKEv2 can
   perform that task.

   IKEv2 natively features Configuration Attributes exchanged in
   Configuration Payloads ([RFC5996], Section 3.15).  These payloads
   can be used to exchange prefixes between peers.  The exchange
   looks as follows:

      Spoke1                               Hub
      ------                               ---
      HDR, SK {IDi, [CERT,]
        [CERTREQ,] [IDr,] AUTH,
        CP(CFG_REQUEST), SAi2,
        TSi, TSr}               -->
                                <--  HDR, SK {IDr, [CERT,] AUTH,
                                       CP(CFG_REPLY), SAr2,
                                       TSi, TSr}
      HDR, SK {CP(CFG_SET)}     -->
                                <--  HDR, SK {CP(CFG_ACK)}

                          Config Exchange

   In accordance with the previous notation, the config payloads and
   attributes needed to set up the routing table depicted in
   Section 4.2 look as follows:

      CP(CFG_REQUEST) =
        INTERNAL_ADDRESS()

      CP(CFG_REPLY) =
        INTERNAL_ADDRESS(TunS1)
        INTERNAL_NETMASK(255.255.255.255)
        INTERNAL_SUBNET(SUM/SUM_MASK)
        ... (other INTERNAL_SUBNETs if necessary)

      CP(CFG_SET) =
        INTERNAL_SUBNET(D1/D1_MASK)
        ... (other INTERNAL_SUBNETs if necessary)

      CP(CFG_ACK)

                          Config Attributes

   The information exchange can be achieved by both sides requesting
   and responding solely using CFG_REQUEST and CFG_ACK, but it has
   been expanded here to showcase conformance with the IKEv2
   protocol.

   Due to limited packet size and issues caused by fragmentation, the
   number of prefixes exchanged in a CP exchange is expected to be
   limited in practice.  This mechanism is not meant to transfer a
   large number of prefixes.  Should the prefix count be high, the
   authors strongly recommend the use of a routing protocol instead.

   A peer receiving INTERNAL_SUBNET attributes from another peer MUST
   be free to ignore or otherwise interpret those INTERNAL_SUBNET
   attributes in accordance with a security policy.  This is
   necessary for conformance with the [RFC4301] PAD and is a
   recommended practice.  Interpretation of an INTERNAL_SUBNET
   includes plain rejection (ignoring it), modification of the
   received subnet, logging a warning message, and/or termination of
   the connection.
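   The following sketch (Python, informative only) shows how the
   CFG_REPLY and CFG_SET attribute lists above could be encoded,
   using the attribute format and type values of [RFC5996],
   Section 3.15.1.  The addresses are illustrative stand-ins for
   TunS1, SUM, and D1, and the helper names are hypothetical.

      import socket
      import struct

      # Configuration Attribute Types from [RFC5996], Section 3.15.1.
      INTERNAL_IP4_ADDRESS = 1
      INTERNAL_IP4_NETMASK = 2
      INTERNAL_IP4_SUBNET = 13

      def cfg_attribute(attr_type, value=b""):
          # 1 reserved bit (zero) + 15-bit Attribute Type, a 2-byte
          # Length, then the attribute value.
          return struct.pack("!HH", attr_type & 0x7FFF, len(value)) + value

      def ip4(dotted):
          return socket.inet_aton(dotted)

      # CFG_REPLY sent by the hub: tunnel address (TunS1), host
      # netmask, and the summary prefix (SUM) it serves.
      cfg_reply = (cfg_attribute(INTERNAL_IP4_ADDRESS, ip4("198.51.100.1")) +
                   cfg_attribute(INTERNAL_IP4_NETMASK, ip4("255.255.255.255")) +
                   cfg_attribute(INTERNAL_IP4_SUBNET,
                                 ip4("10.0.0.0") + ip4("255.0.0.0")))

      # CFG_SET sent by the spoke: the protected prefix D1 it injects.
      cfg_set = cfg_attribute(INTERNAL_IP4_SUBNET,
                              ip4("192.0.2.0") + ip4("255.255.255.0"))

   A receiver parses the attributes the same way and, as stated
   above, remains free to ignore or reinterpret any INTERNAL_SUBNET
   according to its security policy.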
Seo, "Security Architecture for the 1415 Internet Protocol", RFC 4301, December 2005. 1417 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1418 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1419 May 2008. 1421 [RFC5660] Williams, N., "IPsec Channels: Connection Latching", RFC 1422 5660, October 2009. 1424 [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for 1425 the Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 1426 5685, November 2009. 1428 [RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, 1429 "Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 1430 5996, September 2010. 1432 [RFC7018] Manral, V. and S. Hanna, "Auto-Discovery VPN Problem 1433 Statement and Requirements", RFC 7018, September 2013. 1435 11.2. Informative References 1437 [RFC2677] Greene, M., Cucchiara, J., and J. Luciani, "Definitions of 1438 Managed Objects for the NBMA Next Hop Resolution Protocol 1439 (NHRP)", RFC 2677, August 1999. 1441 [RFC2863] McCloghrie, K. and F. Kastenholz, "The Interfaces Group 1442 MIB", RFC 2863, June 2000. 1444 [RFC4292] Haberman, B., "IP Forwarding Table MIB", RFC 4292, April 1445 2006. 1447 [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, 1448 "Protocol Independent Multicast - Sparse Mode (PIM-SM): 1449 Protocol Specification (Revised)", RFC 4601, August 2006. 1451 [RFC4604] Holbrook, H., Cain, B., and B. Haberman, "Using Internet 1452 Group Management Protocol Version 3 (IGMPv3) and Multicast 1453 Listener Discovery Protocol Version 2 (MLDv2) for Source- 1454 Specific Multicast", RFC 4604, August 2006. 1456 [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., 1457 Housley, R., and W. Polk, "Internet X.509 Public Key 1458 Infrastructure Certificate and Certificate Revocation List 1459 (CRL) Profile", RFC 5280, May 2008. 1461 Authors' Addresses 1463 Frederic Detienne 1464 Cisco 1465 De Kleetlaan 7 1466 Diegem 1831 1467 Belgium 1469 Email: fd@cisco.com 1471 Manish Kumar 1472 Cisco 1473 Mail Stop BGL14/G/ 1474 SEZ Unit, Cessna Business Park 1475 Varthur Hobli, Sarjapur Marathalli Outer Ring Road 1476 Bangalore, Karnataka 560 103 1477 India 1479 Email: manishkr@cisco.com 1481 Mike Sullenberger 1482 Cisco 1483 Mail Stop SJCK/3/1 1484 225 W. Tasman Drive 1485 San Jose, California 95134 1486 United States 1488 Email: mls@cisco.com