idnits 2.17.1

draft-detienne-dmvpn-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------
     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------
     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------
     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------
  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year
  -- The document date (July 29, 2013) is 3924 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------
     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)
  == Missing Reference: 'S1' is mentioned on line 267, but not defined
  == Missing Reference: 'S2' is mentioned on line 267, but not defined
  == Missing Reference: 'S3' is mentioned on line 100, but not defined
  == Missing Reference: 'S4' is mentioned on line 100, but not defined
  == Missing Reference: 'H1' is mentioned on line 696, but not defined
  == Unused Reference: 'RFC5226' is defined on line 1016, but no explicit
     reference was found in the text
  == Unused Reference: 'RFC5996' is defined on line 1027, but no explicit
     reference was found in the text
  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)
  ** Obsolete normative reference: RFC 5996 (Obsoleted by RFC 7296)
  -- Unexpected draft version: The latest known version of
     draft-ietf-ipsecme-p2p-vpn-problem is -02, but you're referring to -07.
Summary: 2 errors (**), 0 flaws (~~), 8 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   IPSECME Working Group                                       F. Detienne
3   Internet-Draft                                                 M. Kumar
4   Intended status: Standards Track                        M. Sullenberger
5   Expires: January 30, 2014                                         Cisco
6                                                             July 29, 2013

8                          Flexible Dynamic Mesh VPN
9                           draft-detienne-dmvpn-00

11  Abstract

13     The purpose of a Dynamic Mesh VPN (DMVPN) is to allow IPsec/IKE Security Gateway administrators to configure the devices in a partial mesh (often a simple star topology called Hub-Spoke) and let the Security Gateways establish direct protected tunnels called Shortcut Tunnels. These Shortcut Tunnels are dynamically created when traffic flows and are protected by IPsec.

20  Status of This Memo

22     This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

25     Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

30     Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

35     This Internet-Draft will expire on January 30, 2014.

37  Copyright Notice

39     Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

42     This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

52  Table of Contents

54     1.  Introduction . . . 2
55     2.  Terminology . . . 4
56     3.  Tunnel Types . . . 5
57     4.  Solution Overview . . . 6
58       4.1.  Initial Connectivity . . . 6
59       4.2.  Initial Routing Table Status . . . 7
60       4.3.  Indirection Notification . . . 8
61       4.4.  Node Discovery via Resolution Request . . . 10
62       4.5.  Resolution Request Forwarding . . . 10
63       4.6.  Egress node NHRP cache and Tunnel Creation . . . 12
64       4.7.  Resolution Reply format and processing . . . 13
65       4.8.  From Hub and Spoke to Dynamic Mesh . . . 14
66       4.9.  Remote Access Clients . . . 15
67       4.10. Node mutual authentication . . . 16
68     5.  Packet Formats . . . 16
69       5.1.  NHRP Traffic Indication . . . 16
70     6.  Security Considerations . . . 18
71     7.  IANA Considerations . . . 18
72     8.  Match against ADVPN requirements . . . 18
73     9.  Acknowledgements . . . 21
74     10. References . . . 22
75       10.1.  Normative References . . . 22
76       10.2.  Informative References . . . 22
77     Authors' Addresses . . . 23

79  1. Introduction

81     This document describes a Dynamic Mesh VPN (DMVPN), in which an initial partial mesh expands to create direct connections called Shortcut Tunnels between endpoints that need to exchange data but are not directly connected in the initial mesh.

86     In a generic manner, DMVPN topologies initialize as Hub-Spoke networks where Spoke Security Gateway nodes S* connect to Hub Security Gateway nodes H* over a public transport network (such as the Internet) considered insufficiently secure, so as to mandate the use of IPsec and IKE. For scalability and redundancy reasons, there may be multiple hubs; the Hubs would then be connected together through the DMVPN. Figure 1 depicts this situation.

 94         DC1        DC2
 95          |          |
 96         [H1]-----[H2]
 97        |  |       |  |
 98     +-+   |       |   +-+
 99     |     |       |     |
100   [S1]  [S2]    [S3]  [S4]
101     |     |       |
102    D1    D2      D3

104     Figure 1: Hub and Spoke, multiple hubs, multiple spokes

106     Initially, the Security Gateway nodes (S*) are configured to build tunnels secured with IPsec to the Security Gateway node (H*) in a hub and spoke style network (any partial mesh will do, but Hub-Spoke is common and easily understood). This initial network is then used when traffic starts flowing between the protected networks D*. DMVPN uses NHRP as a signaling mechanism over the S*-H* and H*-H* tunnels to trigger the spokes (S*) to discover each other and build dynamic, direct Shortcut Tunnels. The Shortcut Tunnels allow those spokes to communicate directly with each other without forwarding traffic through the hub, essentially creating a dynamic mesh.

117     The spokes can be either routers or firewalls playing the role of Security Gateways, or hosts such as computers, mobile phones, etc. protecting their own traffic. Nodes S1, S2 and S3 above are routers while S4 is a host implementation.
122     This document describes how NHRP is modified and augmented to allow the rapid creation of dynamic IPsec tunnels between two devices. Throughout this document, we will call these devices participating in the DMVPN "nodes".

127     In the context of this document, the nodes protect a topologically dispersed Private, Overlay Network address space. The nodes allow the devices in the Overlay Network to communicate securely with each other via GRE tunnels secured by IPsec using dynamic tunnels established between the nodes over the (presumably insecure) Transport network. I.e. the protected tunnel packets are forwarded over this Transport network.

135     The NBMA Next Hop Resolution Protocol (NHRP) as described in [RFC2332] allows an ingress node to determine the internetworking layer address and NBMA address of an egress node. The servers in such an NBMA network provide the functionality of address resolution based on a cache which contains protocol layer address to NBMA subnetwork layer address resolution information. This can be used to create a virtual network where dynamic virtual circuits can be created on an as-needed basis. In this document, we will depart from the underlying notion of a centralized NHS.

145     All data traffic, NHRP frames and other control traffic needed by this DMVPN MUST be protected by IPsec. In order to efficiently support Layer 2 based protocols, all packets and frames MUST be encapsulated in GRE ([RFC2784]) first; the resulting GRE packet then MUST be protected by IPsec. IPsec transport mode MUST be supported while IPsec tunnel mode MAY be used. The usage of a GRE encapsulation protected by IPsec is described in [RFC4301]. Implementations SHOULD strongly link GRE and IPsec SA's through some form of connection latching as described in [RFC5660].

155  2.
Terminology

157     The NHRP semantics are used throughout this document; however, some additional terminology is used to better fit the context.

160     o Protected Network, Private Network: a network hosted by one of the nodes. The protected network IP addresses are those that are resolved by NHRP into an NBMA address.
163     o Overlay Network: the entire network composed of the Protected Networks and the IP addresses installed on the Tunnel interfaces instantiating the DMVPN.
166     o Transport Network, Public Network: the network transporting the GRE/IPsec packets.
168     o Nodes: the devices connected by the DMVPN that implement NHRP, GRE/IPsec and IKE.
170     o Ingress Node: the NHRP node that takes data packets from off of the DMVPN and injects them into the DMVPN on either a multi-hop tunnel path (initially) or a single-hop shortcut tunnel. Also the node that will send an NHRP Resolution Request and receive an NHRP Resolution Reply to build a short-cut tunnel.
175     o Egress Node: the NHRP node that extracts data packets from the DMVPN and forwards them off of the DMVPN. Also the node that answers an NHRP Resolution Request and sends an NHRP Resolution Reply.
179     o Intermediate Node: an NHRP node that is in the middle of a multi-hop tunnel path between an Ingress and Egress Node. For the particular data traffic in question the Intermediate node will receive packets from the DMVPN and resend (hair-pin) them back onto the DMVPN.

185     Note that a particular node in the DMVPN may at the same time be an Ingress, Egress and Intermediate node depending on the data traffic flow being looked at.

189     In general, DMVPN nodes make extensive use of the Local Address Groups (LAG) and Logically Independent Subnets (LIS) models as described in [RFC2332]. A compliant implementation MUST support the LAG model and SHOULD support the LIS model.
194     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

199  3. Tunnel Types

201     The tunnels described in this document are of type GRE/IPsec. GRE/IPsec allows a single pair of IPsec SA's to be negotiated between the DMVPN nodes. From an IPsec aggregation standpoint, this means less negotiation, cleaner use of expensive resources and less reprogramming of the data plane by the IKE control plane as additional networks are discovered between any two peers.

208     In the remainder of this document, GRE and GRE/IPsec will be used interchangeably depending on the layer in focus, but always imply "GRE protected by IPsec".

212     Taking advantage of the GRE encapsulation, and while NHRP could be forwarded over IP, the RFC-recommended Layer 2 NHRP frames have been retained in order to simplify the security policies (packet filters do not have to be augmented to allow NHRP through, no risk of mistakenly propagating frames where they should not, etc.). Compliant implementations MUST support L2 NHRP frames.

219     DMVPN can be implemented in a number of ways and this document places no restriction on the actual implementation. This section covers what the authors believe are the important implementation recommendations to construct a scalable implementation.

224     The authors recommend using a logical interface construct to represent the GRE tunnels. These interfaces are called Tunnel Interfaces or simply Interfaces from here onward.

228     In the remainder of this document, we will assume the implementation uses point-to-point Tunnel Interfaces; routes to prefixes in the Overlay network are in the Routing Table (aka Routing Information Base). These routes forward traffic toward the tunnel interfaces.
233     Point-to-Multipoint GRE interfaces (aka multipoint interfaces for short) can also be used. In that case there is by construction only one tunnel source NBMA address and the interface has multiple tunnel endpoints. In this case NHRP registration request and reply messages, [RFC2332], are used to pass the tunnel address to tunnel NBMA address mapping from the NHC (S*) to the NHS (H*). The NHRP registration request and reply MAY be restricted to a single direct tunnel hop between the NHC (S*) and NHS (H*).

242     For didactic reasons, and an easier understanding of the LAG support, we will use the point-to-point construct to highlight the protocol behavior in the remainder of this document. An implementation can use different models (point-to-point, multipoint, bump in the stack, ...) but MUST comply with the external (protocol level) behavior described in this document.

249  4. Solution Overview

251  4.1. Initial Connectivity

253     We assume the following scenario where nodes (S1, S2, H1, H2) depicted in Figure 2 supporting GRE, IPsec/IKE and NHRP establish connections instantiated by GRE tunnels. Those GRE tunnels SHOULD be protected by IPsec/IKE. These tunnels will be used to secure all the data traffic as well as the NHRP control frames. In general, routing protocols (and possibly other control protocols) will also run through these tunnels, and therefore also be protected.

261         DC1
262          |
263         [H1]
264        |    |    ]
265      +-+    +-+  ] GRE/IPsec tunnels over Transport network
266      |        |  ]
267    [S1]     [S2]
268      |        |
269     D1       D2

271     Figure 2: Hub and Spoke Initial Connectivity

273     It is assumed that S1, H1 and S2 are connected via a shared Transport network (typically a Public, NBMA network) and there is connectivity between the nodes over that transport network.
277     The nodes possess multiple interfaces, each of which has a dedicated IP address:

280     o a public interface IntPub connected to the transport network; IP address: Pub{node}
282     o one or several tunnel interfaces Tunnel0,1,... (GRE/IPsec) connecting to peers; IP address: Tun{i}{node}
284     o a private interface IntPriv facing the private network of the node; IP address: Priv{node}

287     E.g. node S1 owns the following addresses: PubS1, TunS1 and PrivS1. The networks D1, D2, DC1 and also the tunnel addresses Tun{i} can be, and are presumed to be, private in the sense that their address space is kept independent from the transport network address space. Together, they form the Overlay network. For the transport network, the address family is either IPv4 or IPv6. In the context of this document, for the overlay network, the address family is IPv4 and/or IPv6.

296     Initially, nodes S1 and S2 create a connection to node H1. Optionally, S1 and S2 MAY register to H1 via NHRP. Typically the GRE tunnels between S* and H1 will be protected by IPsec. A compliant implementation MUST support IPsec protected GRE tunnels and SHOULD support unprotected GRE tunnels.

302     At the end of this section, a dynamic tunnel will be set up between S1 and S2 and traffic will flow directly between S1 and S2 without going through H1.

306  4.2. Initial Routing Table Status

308     In the context of this document, the authors make no assumption about how the routing tables are initially populated but one can assume that routing protocols exchange information between H1 and S1 and between H1 and S2.

313     In this diagram, we assume each node has routes (summarized or specific) for networks D1, D2, DC1 which are IP networks. We assume the summary prefix SUM to encompass all the private networks depicted on this diagram.
We assume the communication between those networks needs to be protected and therefore, the routes point to tunnels. I.e. S1 knows a route summarizing all the Overlay subnets and this route points to the GRE/IPsec tunnel leading to H1. Note that the summary prefix is a network design choice; it can be replaced by a set of summary prefixes or by individual non-summarized routes.

323     Example 1: Node S1 has the following routing table:

325     o TunH1 => Tunnel0
326     o SUM => TunH1 on Tunnel0
327     o 0.0.0.0/0 => IntPub
328     o D1 => IntPriv

330     Example 2: Node H1 has the following routing table:

332     o TunS1 => Tunnel1
333     o TunS2 => Tunnel2
334     o D1 => TunS1 on Tunnel1
335     o D2 => TunS2 on Tunnel2
336     o 0.0.0.0/0 => IntPub
337     o DC1 => IntPriv

339     The exact format of the routing table is implementation dependent but the node discovery principle MUST be enforced and the implementation MUST be compatible with an implementation using the routing tables outlined above.

344     This document does not specify how the routes are installed but it can be assumed that the routes (1) and (2) in the tables above are exchanged between S* and H* nodes after the S*-H* connections have been duly authenticated. In a DMVPN solution, it is typical that the routes are exchanged by a route exchange protocol (e.g. BGP) or are installed statically (usually a mix of both). It is important that routing updates be filtered in order to prevent a node from advertising improper routes to another node. This filtering is out of the scope of this document as most routing protocol implementations are already capable of such filtering. In order to meet these criteria, an implementation SHOULD offer identity-based policies to filter those routes on a per-peer basis.
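As a non-normative illustration, the longest-prefix-match behavior assumed by these routing tables can be sketched in Python. The concrete prefixes (10.0.0.0/8 standing in for SUM, 10.1.0.0/16 for D1) and the tuple layout of the table are hypothetical choices made for this example only:

```python
import ipaddress

def best_route(table, dest):
    """Longest-prefix match over (prefix, next_hop, interface) entries."""
    addr = ipaddress.ip_address(dest)
    matches = [e for e in table if addr in ipaddress.ip_network(e[0])]
    if not matches:
        return None
    return max(matches, key=lambda e: ipaddress.ip_network(e[0]).prefixlen)

# Example 1 (node S1) with illustrative prefixes: SUM = 10.0.0.0/8,
# D1 = 10.1.0.0/16; next_hop None means directly attached.
S1_TABLE = [
    ("10.0.0.0/8",  "TunH1", "Tunnel0"),   # SUM => TunH1 on Tunnel0
    ("0.0.0.0/0",   None,    "IntPub"),    # default route to the Transport network
    ("10.1.0.0/16", None,    "IntPriv"),   # D1, directly attached
]
```

With these entries, a packet toward a host in D1 exits IntPriv, overlay destinations outside D1 are routed into Tunnel0 toward H1, and everything else follows the default route out IntPub.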
357     When a device Ds on network D1 needs to connect to a device Dd on network D2:

360     o a data packet ip(Ds, Dd) is sent and reaches S1 on IntPriv
361     o the data packet is routed by S1 via Tunnel0 toward H1; S1 encapsulates, protects and forwards this packet out IntPub via the transport network to H1
364     o H1 receives the protected packet on IntPub; H1 decrypts and decapsulates this packet; the resulting data packet looks to the IP stack on H1 as if it arrived on interface Tunnel1
367     o the data packet is routed by H1 via Tunnel2 toward S2; H1 encapsulates, protects and forwards this out IntPub via the transport network to S2
370     o S2 receives the protected packet on IntPub; S2 decrypts and decapsulates this packet; the resulting data packet looks to the IP stack as if it arrived on interface Tunnel0
373     o S2 routes the data packet out of its IntPriv interface to the destination Dd

376  4.3. Indirection Notification

378     Consider the packet flow seen in the previous section. When H1 (Intermediate Node) receives a packet from the ingress node S1 and forwards it to the next node S2, it technically re-injects the packet back into the DMVPN.

383     At this point H1 SHOULD send an Indirection Notification message to S1. The Indirection Notification is a dedicated NHRP message indicating to the ingress node that it sent an IP packet that had to be forwarded via the intermediate node to another node. The Indirection Notification MUST contain the first 64 bytes of the clear text IP packet that was forwarded to the next node. The exact format of this message is detailed in the section [PACKET_FORMAT].

391     The Indirection Notification MUST be sent back to the ingress node through the same GRE/IPsec tunnel upon which the hair-pinned IP packet was received and MUST be rate limited.

395     This message is a hint that a direct tunnel SHOULD be built between the end-nodes, bypassing intermediate nodes.
This tunnel is called a "Shortcut Tunnel".

399     Compliant implementations MUST be able to send and accept the Indirection Notification; however, implementations MUST continue to accept traffic over the spoke-hub-spoke path during spoke-spoke path establishment (Shortcut Tunnel).

404     When a node receives such a notification, it MUST perform the following:

407     o parse and accept the message
408     o extract the source address of the original protected IP packet from the 64 bytes available
410     o perform a route lookup on this source address

412       * If the route to this source address is also via the DMVPN network upon which it received the Indirection Notification, then this node is an intermediate node on the tunnel path from the ingress node (injection point) to the egress node (extraction point). In this case this intermediate node MUST silently drop the Indirection Notification that it received. Note that if the node is an intermediate node, it is likely that it has generated and sent an Indirection Notification about this same protected IP packet to its tunnel neighbor on the tunnel path back towards the ingress node (injection point). This is correct behavior.

423     o if the previous step succeeded (i.e. the notification was not dropped), extract the destination IP address (Dd) of the original protected IP packet from the 64 bytes available.

427     The ingress node MAY also extract additional information from those 64 bytes such as the protocol type, port numbers, etc.

430     In steady state, Indirection Notifications MUST be accepted and processed as above from any trusted peer with which the node has a direct connection.

434  4.4. Node Discovery via Resolution Request

436     After processing the information in the Indirection Notification, the ingress node's local policy SHOULD determine whether a shortcut tunnel needs to be established.
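The notification-processing checks described above can be sketched as follows; the route-table shape (prefix, is_dmvpn_tunnel) and the sample addresses are assumptions of this example, not something this specification mandates:

```python
import ipaddress

def handle_indirection(routes, src_ip, dst_ip):
    """Process an Indirection Notification: look up the route back to the
    original packet's source; if that route also points into the DMVPN,
    this node is an intermediate node and must silently drop the message.
    Returns the destination to resolve, or None when dropping."""
    src = ipaddress.ip_address(src_ip)
    best = None
    for prefix, is_dmvpn_tunnel in routes:
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, is_dmvpn_tunnel)
    if best is not None and best[1]:
        return None          # intermediate node: silent drop
    return dst_ip            # ingress node: candidate for a Resolution Request
```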
Assuming the local policy requests a shortcut tunnel, the ingress node MUST emit a Resolution Request for the destination IP address Dd.

442     More specifically, the NHRP Resolution Request emitted by S1 to resolve Dd will contain the following fields:

445     o Fixed Header

447       * ar$op.version = 1
448       * ar$op.type = 1

449     o Common Header (Mandatory Header)

451       * Source NBMA Address = PubS1
452       * Source Protocol Address = TunS1
453       * Destination Protocol Address = Dd

455     The Resolution Request is routed by S1 to H1 over the GRE/IPsec tunnel. If an intermediate node has a valid (authoritative) NHRP mapping in its cache, it MAY respond. An intermediate node SHOULD NOT answer Resolution Requests in any other case.

460     Note that a Resolution Request can be voluntarily emitted by a Security Gateway and is not strictly limited to a response to the Indirection Notification message. Such cases and policies are out of the scope of this document.

465     The sending of Resolution Requests by an ingress node MUST be rate limited.

468  4.5. Resolution Request Forwarding

470     The Resolution Request can be sent by S1 to an explicit or implicit next-hop server. In the explicit scenario, the NHS is defined in the node configuration. In the implicit case, the node can infer the NHS to use. Similarly, an intermediate node that cannot answer a Resolution Request SHOULD forward the Resolution Request to an implicit or explicit NHS in the same manner unless local policy forbids resolution forwarding between Spokes. There can be an undetermined number of intermediate nodes.
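A minimal sketch of assembling those Resolution Request fields, using a plain dictionary; the representation and the sample addresses are illustrative only, the on-the-wire encoding being the RFC 2332 binary format:

```python
def build_resolution_request(pub_src, tun_src, dd):
    """Assemble the Resolution Request fields listed above (illustrative)."""
    return {
        "fixed":  {"ar$op.version": 1, "ar$op.type": 1},
        "common": {
            "src_nbma_addr":  pub_src,   # e.g. PubS1
            "src_proto_addr": tun_src,   # e.g. TunS1
            "dst_proto_addr": dd,        # the address being resolved
        },
    }
```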
479     A DMVPN compliant implementation MUST be able to infer the NHS from its routing table in the following way:

482     o the address Dd to be resolved is looked up in the routing table (other parameters can be considered by the ingress node but these will not be available to intermediate nodes)
485     o the best route for Dd is selected (longest prefix match)

487       * if several routes match (same prefix length) only the routes pointing to a DMVPN Tunnel interface are kept. This SHOULD NOT occur in practice.

490     o if the best route found points to a DMVPN Tunnel interface, the next-hop address MUST be used as NHS
492     o if the best route found does not point to a DMVPN Tunnel interface, the forwarding of the packet stops and the matching prefix P together with its prefix length (Plen) is kept temporarily. Very often, P/Plen == D2/D2len (this is the case in the diagram used in this document) but this may not always be true depending on the structure of the networks protected by S2.

500     If the Resolution Request forwarding stops at the ingress node (at emission), the Resolution Request process MUST be stopped with an error for address Dd. If the lookup succeeds, the next-hop's NBMA address is used as destination address of the GRE encapsulation. Before forwarding, each intermediate node MUST add a Forward Transit Extension record to the NHRP Resolution Request.

507     Intermediate nodes SHOULD NOT cache any information while forwarding Resolution Requests. In the case an intermediate node implementation caches information, it MUST NOT assume that other intermediate nodes will also cache that information.

512     Thanks to the forwarding model described in this document and due to the absence of intermediate caching, Server Cache Synchronization is not needed and is even recommended against.
Therefore, a DMVPN compliant implementation MUST NOT rely on such a synchronization, which would have adverse effects on the scalability of the entire system.

519     If the TTL of the request drops to zero or the current node finds itself on a Forward Transit Extension record, then the NHRP Resolution Request MUST be dropped and an NHRP error message sent to the source.

523     When the Resolution Request eventually reaches a node where the route(s) to the destination would take it out through a non-DMVPN interface, the Resolution Request process MUST be stopped and this node becomes the egress node. The egress node is typically (by virtue of network design) the topologically closest node to the resolved address Dd.

530     The egress node must then prepare itself for replying with a Resolution Reply.

533  4.6. Egress node NHRP cache and Tunnel Creation

535     When a node declares itself an egress node while attempting to forward a Resolution Request, it MUST evaluate the need for establishing a shortcut tunnel according to a user policy. Note that an implementation is not required to support a user policy, but then the implicit policy MUST request the shortcut establishment. If policies are supported, one of the possible policies MUST be shortcut establishment.

543     If a shortcut is required, the egress node MUST perform the following operations:

546     o the source NBMA address (PubS1) is extracted from the NHRP Resolution Request
548     o if a GRE/IPsec tunnel already exists between PubS2 and PubS1, this tunnel is selected (assuming interface TunnelX)
550     o otherwise, a new GRE shortcut tunnel is created between PubS2 and PubS1 (assuming interface TunnelX); the GRE tunnel SHOULD be protected by IPsec and the SA's immediately negotiated by IKE
553     o an NHRP cache entry is created for TunS1 => PubS1.
The entry SHOULD NOT remain in the cache for more than the specified Hold Time (from the NHRP Resolution Request). This NHRP cache entry may be 'refreshed' for another hold time period prior to expiry by receipt of another matching NHRP Resolution Request or by sending an NHRP Resolution Request and receiving an NHRP Resolution Reply.
559     o a route is inserted into the RIB: TunS1/32 => PubS1 on TunnelX (assuming IPv4)

562     Regardless of how the shortcut tunnel is created, a node SHOULD NOT try to establish more than one tunnel with a remote node. If there are other tunnels not managed by DMVPN, the tunnel selectors (source, destination, tunnel key) MUST NOT interfere with the DMVPN shortcut tunnels.

568     If a tunnel has to be created and SA's established, a node SHOULD wait for the tunnel to be in place before proceeding with further operations. Regardless of how those operations are timed in the implementation, a node SHOULD avoid dropping data packets during the cache and SA installation. The order of operations SHOULD ensure continuous forwarding.

575  4.7. Resolution Reply format and processing

577     After the operations described in the previous section are completed, a Resolution Reply MUST be emitted by the egress node. Instead of strictly answering with just the host address being looked up, the Reply will contain the entire prefix (P/Plen) that was found during the stopped Resolution Request forwarding phase.
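The NHS inference and egress determination of Section 4.5, which produce the P/Plen just mentioned, can be sketched as follows; the (prefix, next_hop, is_dmvpn_tunnel) table shape is an assumption of this example:

```python
import ipaddress

def infer_next_hop(routes, dd):
    """Return ("forward", nhs) when the best route points into the DMVPN
    (its next hop is used as implicit NHS), ("egress", prefix) when the
    node becomes the egress node and keeps P/Plen for the Reply, or
    ("error", None) when no route matches at all."""
    dst = ipaddress.ip_address(dd)
    matches = [r for r in routes if dst in ipaddress.ip_network(r[0])]
    if not matches:
        return ("error", None)
    longest = max(ipaddress.ip_network(r[0]).prefixlen for r in matches)
    best = [r for r in matches if ipaddress.ip_network(r[0]).prefixlen == longest]
    if len(best) > 1:  # tie-break: keep only routes pointing to a DMVPN tunnel
        best = [r for r in best if r[2]] or best
    prefix, next_hop, is_dmvpn_tunnel = best[0]
    if is_dmvpn_tunnel:
        return ("forward", next_hop)
    return ("egress", prefix)
```

On a hub, the lookup for Dd typically yields a DMVPN route toward the spoke protecting it; on that spoke, the best route exits a non-DMVPN interface and the forwarding stops there.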
583     The Resolution Reply main fields MUST be populated as follows:

585     o Fixed Header

587       * ar$op.version = 1
588       * ar$op.type = 2

589     o Common Header (Mandatory Header)

591       * Source NBMA Address = PubS1
592       * Source Protocol Address = TunS1
593       * Destination Protocol Address = Dd

594     o CIE-1

596       * Prefix-len = Plen
597       * Client NBMA Address = PubS2
598       * Client Protocol Address = TunS2

600     The Destination Protocol address remains the address being resolved (Dd) while the CIE actually contains the remainder of the response (Plen via NBMA PubS2, Protocol TunS2). The Resolution Reply MUST be forwarded to the ingress node S1 either through the shortcut tunnel or via the Hub.

606     If the address family of the resolved address Dd is IPv6, the Resolution Reply SHOULD be augmented with a second CIE containing the egress node's link-local address.

610     If a node decides to block the resolution process, it MAY simply drop the Resolution Request or avoid sending a Resolution Reply. A node MAY also send a NACK Resolution Reply.

614     When the Resolution Reply is received by the ingress node, a new tunnel TunnelY MUST be created pointing to PubS2 if one does not already exist (which depends on whether the Resolution Reply was routed via the Hub(s) or directly on the shortcut tunnel). The ingress node MUST process the reply in the following way:

620     o Validate that this Resolution Reply corresponds to a Request emitted by S1. If not, issue an error and stop processing the Reply.
624     o An NHRP Cache entry is created for TunS2 => PubS2
625     o Two routes are added to the routing table:

627       * TunS2 => TunnelY
628       * P/Plen => TunS2 on TunnelY

630     Though implementations may be entirely different, a compliant implementation MUST exhibit a functional behavior strictly equivalent to the one described above. I.e. IP packets MUST eventually be forwarded as described above.
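The ingress-node reply processing above can be sketched as follows; the dictionary layout of the Reply, the pending-request set and the "TunnelY" interface name are assumptions of this example:

```python
def process_resolution_reply(reply, pending, nhrp_cache, rib, shortcut_if="TunnelY"):
    """Validate the Reply against an outstanding Request, install the NHRP
    cache entry (TunS2 => PubS2), and add the two routes via the shortcut."""
    dd = reply["dst_proto_addr"]
    if dd not in pending:
        raise ValueError("Resolution Reply does not match any emitted Request")
    cie = reply["cie"]
    # NHRP cache entry: TunS2 => PubS2
    nhrp_cache[cie["client_proto_addr"]] = cie["client_nbma_addr"]
    # Host route to the peer's tunnel address: TunS2 => TunnelY
    rib.append((cie["client_proto_addr"], None, shortcut_if))
    # Resolved prefix: P/Plen => TunS2 on TunnelY
    rib.append((cie["prefix"], cie["client_proto_addr"], shortcut_if))
```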
635     DMVPN compliant implementations MUST support providing and receiving aggregated address resolution information.

638  4.8. From Hub and Spoke to Dynamic Mesh

640     At the end of the resolution process, the overlay topology will be as follows:

643         DC1
644          |
645         [H1]
646        |    |    ]
647      +-+    +-+  ] GRE/IPsec tunnels over Transport network
648      |        |  ]
649    [S1]====[S2]
650      |        |
651     D1       D2

653     Shortcut tunnel established

655     Where the tunnel depicted with = is a GRE/IPsec shortcut tunnel created by NHRP. The Routing Table on S1 will now look as follows:

658     o TunH1 => Tunnel0
659     o SUM => TunH1 on Tunnel0
660     o 0.0.0.0/0 => IntPub
661     o D1 => IntPriv
662     o TunS2 => TunnelY
663     o P/Plen => TunS2 on TunnelY

665     It is easy to see that traffic from D1 to D2 will follow the shortcut path under the assumption that P == D2 or D2 is a subnet included in P.

669     The tunnels between S* and H* are actually tunnels created automatically to bootstrap the DMVPN. In practice the initial topology will be a static star (aka Hub and Spoke) topology between S* and H* that will evolve into a dynamic mesh between the nodes S*.

674     From the spokes' (S*) standpoint, the bootstrap tunnels can be established with a node H1 statically defined or discovered by DNS. The problem of finding the initial hubs in a DMVPN is not different from finding regular hubs in a traditional Hub and Spoke network.

679     For scalability reasons, it is expected that the NHRP Indirection/Resolution is the only way by which routes are exchanged between S* nodes. While this does not fall in the context of this document, it is worth mentioning that actual implementations SHOULD NOT establish a routing protocol adjacency directly over the shortcut tunnels.

685  4.9. Remote Access Clients

687     The specification in this document allows a node to not protect any private network. I.e.
in a degenerate case, it MUST be possible for 689 a node S1 to not have a D1 network attached to it. Instead, S1 only 690 owns a PubS1 and a TunS1 address. This would typically be the case for 691 a remote access client (PC, mobile device,...) that only has a tunnel 692 address and an NBMA address. 694 DC1 695 | 696 [H1] 697 | | ] 698 +-+ +-+ ] GRE/IPsec tunnels over Transport network 699 | | ] 700 [S1]===[S2] 701 | 702 D2 704 Remote Access Client 706 In the diagram above, S1 is actually a simple PC or mobile node that 707 is not protecting any network other than its own tunnel 708 address. 710 These nodes may fully participate in a DMVPN network, including 711 building spoke-spoke tunnels, as long as they support GRE, NHRP, and IPsec/ 712 IKE, have a way to separate tunneled traffic (virtual 713 interfaces), and are able to update a local routing table to associate 714 networks with different next hops out either their IntTun (data 715 traffic going over the tunnel) or IntPub (tunnel packets themselves 716 and/or non-tunneled data traffic). They may not need to run a 717 routing protocol. 719 4.10. Node mutual authentication 721 Nodes authenticate each other using the IKE protocol while they 722 attempt to establish a tunnel. Because the system is by nature 723 extremely distributed, it is recommended to use X.509 certificates 724 for authentication. The Internet Public Key Infrastructure is described 725 in [RFC5280]. 727 The structured names and various fields in the certificate can be 728 useful for filtering undesired connectivity in large administrative 729 domains or when two domains are being partially merged. It is indeed 730 easy for a system administrator to define filters to prevent 731 connectivity between nodes that are not supposed to communicate 732 directly (e.g. filtering based on the O or OU fields).
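A filter of the kind described above can be sketched in a few lines. This is an illustrative policy only, assuming a simple comma-separated subject DN string; the function names, the DN format, and the example policy (require a matching O, block certain OUs) are hypothetical, not mandated by this document:

```python
# Illustrative subject-based peer filtering; names and policy are examples.
def parse_dn(dn):
    """Parse a simple comma-separated DN string into attribute lists."""
    attrs = {}
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        attrs.setdefault(key, []).append(value)
    return attrs

def peer_allowed(peer_dn, required_o, blocked_ous):
    """Allow a direct tunnel only for peers in our organization (O)
    and outside explicitly blocked organizational units (OU)."""
    attrs = parse_dn(peer_dn)
    if required_o not in attrs.get("O", []):
        return False
    return not any(ou in blocked_ous for ou in attrs.get("OU", []))

# Same organization, permitted OU: direct tunnel allowed.
peer_allowed("CN=s2.example.com,OU=branch,O=Example Corp",
             "Example Corp", {"lab"})
# Blocked OU: direct tunnel refused.
peer_allowed("CN=s9.example.com,OU=lab,O=Example Corp",
             "Example Corp", {"lab"})
```

In a real deployment such checks would be applied against the authenticated IKE peer certificate, not a bare string, but the decision logic is the same.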
734 Though nodes may be blocked from building a direct tunnel by the 735 above means, they may or may not be allowed to communicate via a 736 spoke-hub-spoke path. Allowing or blocking communication via the 737 spoke-hub-spoke path is outside the scope of this document. 739 5. Packet Formats 741 As described in [RFC2332], an NHRP packet consists of a fixed part, a 742 mandatory part and an extensions part. The Fixed Part is common to 743 all NHRP packet types. The Mandatory Part MUST be present, but 744 varies depending on packet type. The Extensions Part also varies 745 depending on packet type, and need not be present. This section 746 describes the packet format of the new messages introduced as well as 747 extensions to the existing packet types. 749 5.1. NHRP Traffic Indication 751 The fixed part of an NHRP Traffic Indication packet is taken 752 directly from the standard NHRP fixed part, and all fields have the 753 same meaning as in [RFC2332] unless otherwise explicitly stated. 755 0 1 2 3 756 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 758 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 759 | ar$afn | ar$pro.type | 760 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 761 | ar$pro.snap | 762 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 763 | ar$pro.snap | ar$hopcnt | ar$pktsz | 764 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 765 | ar$chksum | ar$extoff | 766 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 767 | ar$op.version | ar$op.type | ar$shtl | ar$sstl | 768 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 770 Figure 3: Traffic Indication Fixed Header 772 o ar$op.type: With ar$op.version = 1, this is an NHRP packet. 773 Further, [RFC2332] uses the values 1-7 for standard NHRP 774 messages. ar$op.type = 8 indicates a Traffic 775 Indication packet.
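As a concrete illustration, the 20-octet fixed header in Figure 3 can be serialized with the standard struct module. The sample field values below (AFN 1 for IPv4, EtherType 0x0800, hop count 255) are illustrative assumptions, and ar$chksum is left at zero for brevity; a real sender computes it over the entire packet:

```python
import struct

def pack_fixed_header(pktsz, extoff, shtl, sstl):
    """Pack the NHRP fixed header of Figure 3 (network byte order, 20 octets)."""
    return struct.pack(
        ">HH5sBHHHBBBB",
        1,            # ar$afn: AFN 1 = IPv4
        0x0800,       # ar$pro.type: EtherType for IPv4
        b"\x00" * 5,  # ar$pro.snap: unused when ar$pro.type is not SNAP
        255,          # ar$hopcnt
        pktsz,        # ar$pktsz: total packet length in octets
        0,            # ar$chksum: placeholder, computed over the whole packet
        extoff,       # ar$extoff: offset of the extensions part (0 = none)
        1,            # ar$op.version = 1: NHRP
        8,            # ar$op.type = 8: Traffic Indication
        shtl,         # ar$shtl: source NBMA address type/length
        sstl,         # ar$sstl: source NBMA subaddress type/length
    )

hdr = pack_fixed_header(pktsz=52, extoff=0, shtl=4, sstl=0)
assert len(hdr) == 20
```

The ">" prefix forces big-endian layout with no padding, so the 16-bit and 8-bit fields line up exactly with the diagram.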
777 The mandatory part of the NHRP Traffic Indication packet is slightly 778 different from the NHRP Resolution/Registration/Purge Request/Reply 779 packets and bears a much closer resemblance to the mandatory part 780 of the NHRP Error Indication packet. The mandatory part of an NHRP 781 Traffic Indication has the following format: 783 0 1 2 3 784 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 785 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 786 | Src Proto Len | Dst Proto Len | unused | 787 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 788 | Traffic Code | unused | 789 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 790 | Source NBMA Address (variable length) | 791 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 792 | Source NBMA Subaddress (variable length) | 793 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 794 | Source Protocol Address (variable length) | 795 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 796 | Destination Protocol Address (variable length) | 797 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 798 | Contents of Data Packet in traffic (variable length) | 799 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 801 Figure 4: Traffic Indication Mandatory Part 803 o Src Proto Len: This field holds the length in octets of the Source 804 Protocol Address. 806 o Dst Proto Len: This field holds the length in octets of the 807 Destination Protocol Address. 808 o Traffic Code: A code indicating the type of traffic indication 809 message, chosen from the following list: 811 * 0: NHRP Traffic Redirect/Indirection message. This indirection 812 is an indication, to the receiver, of the possible existence of 813 a 'better' path in the NBMA network. 814 o Source NBMA Address: The Source NBMA address field is the address 815 of the station which generated the traffic indication.
816 o Source NBMA SubAddress: The Source NBMA subaddress field is the 817 subaddress of the station which generated the traffic indication. If the 818 field's length as specified in ar$sstl is 0, then no storage is 819 allocated for this address at all. 820 o Source Protocol Address: This is the protocol address of the 821 station which issued the Traffic Indication packet. 822 o Destination Protocol Address: This is the destination IP address 823 from the packet which triggered the sending of this Traffic 824 Indication message. 826 Note that unlike the NHRP Resolution/Registration/Purge messages, the Traffic 827 Indication message does not have a request/reply pair, nor does it 828 contain any CIEs, though it may contain extension records. 830 6. Security Considerations 832 The use of NHRP and the protocol extensions described in this 833 document does not open a direct security hole. The peers are duly 834 authenticated to each other by IKE and the traffic is protected by 835 IPsec. The only risk may come from inside the network itself; this 836 is no different from static meshes. 838 Implementers must be diligent in offering all the control and data 839 plane filtering options that an administrator would need to secure 840 the communication inside the overlay network. 842 7. IANA Considerations 844 The following values are used experimentally: 846 o The ar$op.type value of 8, representing Traffic Indication 847 o The Traffic Code value of 0, indicating a Traffic Indirection message. 849 Full standardization would require official IANA numbers to be 850 assigned. 852 8. Match against ADVPN requirements 853 This section compares the adequacy of DMVPN to the requirement list 854 stated in [ADVPNreq]. 856 8.1. Requirement 1 858 A new spoke in a DMVPN does not require changes on a hub to which it 859 is connected other than authentication and authorization state, which 860 are dynamically handled.
No state is required on other hubs because 861 addresses are passed between hubs using NHRP and IKEv2. This 862 requirement is one of the basic features of DMVPN. 864 8.2. Requirement 2 866 NHRP is used to distribute dynamic peer NBMA and Overlay addresses. 867 These addresses will be redistributed or rediscovered upon any 868 address change. This requirement is one of the basic features of 869 DMVPN. Practical implementations and deployments already exist that 870 take advantage of this mechanism. 872 8.3. Requirement 3 874 DMVPN requires minimal configuration in order to configure protocols 875 running over IPsec tunnels. The tunnels are latched to their crypto 876 socket according to [RFC5660]. The routing protocols or other 877 features do not even need to be aware of the IPsec layer, nor does 878 IPsec need to be aware of the actual traffic it carries. Practical 879 implementations and deployments already exist. 881 8.4. Requirement 4 883 Spokes can talk directly to each other if and only if the Hub and 884 Spoke policies allow it. Sections 4.6 and 4.5 885 explicitly mention places where such policies should be applied. 886 Practical implementations and deployments already exist that exhibit 887 this form of restriction. 889 8.5. Requirement 5 891 Each DMVPN peer has unique authentication credentials and uses them 892 for each peer connection. The credentials do not need to be shared 893 or pre-shared unless the administrator allows it, which is outside the 894 scope of this document. To this effect, DMVPN makes extensive use of 895 certificates as a strong authentication mechanism. Cross-domain 896 authentication is made possible by PKI should the security gateways 897 belong to different PKI domains. Practical implementations and 898 deployments already exist that take advantage of this mechanism. 900 8.6. Requirement 6 902 DMVPN Gateways are free to roam.
The only requirement is that Spokes 903 update their peers with their new NBMA IP address should it change. 904 Implementations MAY choose to update their peers via MOBIKE but MUST 905 support re-registration and re-discovery. Roaming across hubs 906 requires that the new hub learn the prefixes behind the branch, which 907 is what DMVPN does by construction. To support roaming Hubs 908 that change their NBMA IP address, the Hubs' DNS records MUST be updated (the 909 mechanism is not covered in this document) and Spokes MUST be able to 910 resolve a Hub NBMA address via DNS. Practical implementations and 911 deployments already exist. 913 8.7. Requirement 7 915 Handoffs are possible and can be initiated by a Hub or a Spoke. At 916 any point in time, a Spoke may create multiple simultaneous 917 connections to several Hubs and change its routing policies to send 918 or receive traffic via any of the active tunnels. If a Hub wishes to 919 offload a connection to another Hub, it can do so by using an 920 IKE REDIRECT as explained in [RFC5685]. Those handoffs are optional 921 and left at the discretion of the implementer. Partial practical 922 implementations and deployments already exist, and more are developed 923 on an ad-hoc basis without breaking protocol-level compatibility. 925 8.8. Requirement 8 927 DMVPN supports gateways behind NAT boxes through the IKEv2 NAT 928 Traversal Exchange. Practical implementations and deployments already 929 exist. 931 8.9. Requirement 9 933 Changes of SA are reportable and manageable. This document does not 934 define a MIB, nor does it impose message formats or protocols (Syslog, 935 Traps,...). All tables, such as the NHRP cache, IPsec SAs and the routing 936 table, are manageable via MIBs. The creation of IKE sessions triggers 937 messages, and NHRP can be instrumented to log and report any necessary event. 938 Practical implementations and deployments already take advantage of 939 those facilities. 941 8.10.
Requirement 10 943 With an appropriate PKI authorization structure, DMVPN can support 944 allied and federated environments. Practical implementations and 945 deployments already exist. 947 8.11. Requirement 11 949 DMVPN supports star, full-mesh, and partial-mesh topologies. The 950 protocol stack exposed here can be applied to all known scenarios. 951 Implementers are free to cover and support the appropriate use cases. 952 Practical deployments of all those topologies already exist. 954 8.12. Requirement 12 956 DMVPN can distribute multicast traffic by taking advantage of 957 protocols such as PIM, IGMP and MSDP. Practical implementations and 958 deployments already exist. 960 8.13. Requirement 13 962 DMVPN allows monitoring and logging. All topology changes, 963 connections and disconnections are logged and can be monitored. The 964 DMVPN solution explained in this document does not preclude any form 965 of logging or monitoring, and additional monitoring points can be 966 added without impacting interoperability. Practical deployments 967 already exist that take advantage of those facilities. 969 8.14. Requirement 14 971 L3VPNs are supported over IPsec/GRE tunnels. The main advantage of a 972 GRE tunnel protected by IPsec is that L2 frames do not need any 973 additional IP encapsulation, which means that L2 frames can be 974 natively transported over DMVPN. Practical L3VPN implementations and 975 deployments already exist. 977 8.15. Requirement 15 979 DMVPN supports per-peer QoS between a Spoke and a Hub or between Spokes. 980 The QoS implementation is outside the scope of this document. 981 Practical implementations and deployments already exist. 983 8.16. Requirement 16 985 DMVPN allows multiple resiliency mechanisms, and no device, Spoke or 986 Hub, is a single point of failure by protocol design. Multiple 987 encrypted tunnels can be established between Spokes and Hubs, or Hubs 988 can be configured as redundant entities allowing failover.
Practical deployments 989 of this kind already exist. 991 9. Acknowledgements 993 The authors would like to thank Brian Weis, Mark Comeadow and Mark 994 Jackson from Cisco for their help in publishing and reviewing this 995 document. We would also like to acknowledge the historical DMVPN 996 team, in particular Jan Vilhuber and Pratima Sethi. 998 10. References 1000 10.1. Normative References 1002 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1003 Requirement Levels", BCP 14, RFC 2119, March 1997. 1005 [RFC2332] Luciani, J., Katz, D., Piscitello, D., Cole, B., and N. 1006 Doraswamy, "NBMA Next Hop Resolution Protocol (NHRP)", RFC 1007 2332, April 1998. 1009 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 1010 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 1011 March 2000. 1013 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1014 Internet Protocol", RFC 4301, December 2005. 1016 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1017 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1018 May 2008. 1020 [RFC5660] Williams, N., "IPsec Channels: Connection Latching", RFC 1021 5660, October 2009. 1023 [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for 1024 the Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 1025 5685, November 2009. 1027 [RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, 1028 "Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 1029 5996, September 2010. 1031 10.2. Informative References 1033 [ADVPNreq] 1034 Hanna, S., "Auto Discovery VPN Problem Statement and 1035 Requirements", draft-ietf-ipsecme-p2p-vpn-problem-07 (work 1036 in progress), June 2013. 1038 [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., 1039 Housley, R., and W. Polk, "Internet X.509 Public Key 1040 Infrastructure Certificate and Certificate Revocation List 1041 (CRL) Profile", RFC 5280, May 2008.
1043 Authors' Addresses 1045 Frederic Detienne 1046 Cisco 1047 De Kleetlaan 7 1048 Diegem 1831 1049 Belgium 1051 Email: fd@cisco.com 1053 Manish Kumar 1054 Cisco 1055 Mail Stop BGL14/G/ 1056 SEZ Unit, Cessna Business Park 1057 Varthur Hobli, Sarjapur Marathalli Outer Ring Road 1058 Bangalore, Karnataka 560 103 1059 India 1061 Email: manishkr@cisco.com 1063 Mike Sullenberger 1064 Cisco 1065 Mail Stop SJCK/3/1 1066 225 W. Tasman Drive 1067 San Jose, California 95134 1068 United States 1070 Email: mls@cisco.com