idnits 2.17.1

draft-hao-nvo3-anycast-gw-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  -- The document has an IETF Trust Provisions (28 Dec 2009) Section 6.c(ii)
     Publication Limitation clause.  If this document is intended for
     submission to the IESG for publication, this constitutes an error.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 9 instances of too long lines in the document, the longest one
     being 22 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet has text resembling
     RFC 2119 boilerplate text.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 30, 2014) is 3560 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: 'NVO3FRWK' on line 111

  -- Looks like a reference, but probably isn't: 'NVO3PS' on line 112

  -- Looks like a reference, but probably isn't: 'RFC2338' on line 265

  -- Looks like a reference, but probably isn't: 'RFC3768' on line 265

  -- Looks like a reference, but probably isn't: 'RFC5798' on line 266

  -- Looks like a reference, but probably isn't: 'NVO3ARCH' on line 232

  -- Looks like a reference, but probably isn't: 'RFC2119' on line 176

  -- Looks like a reference, but probably isn't: 'RFC6325' on line 176

  == Unused Reference: '1' is defined on line 319, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 323, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 327, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 330, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 333, but no explicit reference
     was found in the text

  == Outdated reference: A later version (-08) exists of
     draft-ietf-nvo3-arch-01

  == Outdated reference: A later version (-09) exists of
     draft-ietf-nvo3-framework-03

  -- Obsolete informational reference (is this intentional?): RFC 2338 (ref.
     '3') (Obsoleted by RFC 3768)

  -- Obsolete informational reference (is this intentional?): RFC 3768 (ref.
     '4') (Obsoleted by RFC 5798)


     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1        NVO3                                                       Weiguo Hao
2                                                                    Lucy Yong
3                                                                    Yizhou Li
4        Internet Draft                                                 Huawei
5                                                                    Feng Wang
6                                                                          H3C
7                                                                      W. Shao
8                                                                      Tencent
9                                                                      Vic Liu
10                                                                China Mobile

12       Intended status: Informational                          June 30, 2014
13       Expires: December 2014

15                             NVO3 Anycast Layer 3 Gateway
16                           draft-hao-nvo3-anycast-gw-00.txt

18    Status of this Memo

20       This Internet-Draft is submitted in full conformance with the
21       provisions of BCP 78 and BCP 79.

23       This Internet-Draft is submitted in full conformance with the
24       provisions of BCP 78 and BCP 79. This document may not be modified,
25       and derivative works of it may not be created, and it may not be
26       published except as an Internet-Draft.

28       This Internet-Draft is submitted in full conformance with the
29       provisions of BCP 78 and BCP 79. This document may not be modified,
30       and derivative works of it may not be created, except to publish it
31       as an RFC and to translate it into languages other than English.

33       This document may contain material from IETF Documents or IETF
34       Contributions published or made publicly available before November
35       10, 2008. The person(s) controlling the copyright in some of this
36       material may not have granted the IETF Trust the right to allow
37       modifications of such material outside the IETF Standards Process.
38       Without obtaining an adequate license from the person(s) controlling
39       the copyright in such materials, this document may not be modified
40       outside the IETF Standards Process, and derivative works of it may
41       not be created outside the IETF Standards Process, except to format
42       it for publication as an RFC or to translate it into languages other
43       than English.

45       Internet-Drafts are working documents of the Internet Engineering
46       Task Force (IETF), its areas, and its working groups. Note that
47       other groups may also distribute working documents as Internet-
48       Drafts.
50       Internet-Drafts are draft documents valid for a maximum of six
51       months and may be updated, replaced, or obsoleted by other documents
52       at any time. It is inappropriate to use Internet-Drafts as
53       reference material or to cite them other than as "work in progress."

55       The list of current Internet-Drafts can be accessed at
56       http://www.ietf.org/ietf/1id-abstracts.txt

58       The list of Internet-Draft Shadow Directories can be accessed at
59       http://www.ietf.org/shadow.html

61       This Internet-Draft will expire on December 30, 2014.

63    Copyright Notice

65       Copyright (c) 2013 IETF Trust and the persons identified as the
66       document authors. All rights reserved.

68       This document is subject to BCP 78 and the IETF Trust's Legal
69       Provisions Relating to IETF Documents
70       (http://trustee.ietf.org/license-info) in effect on the date of
71       publication of this document. Please review these documents
72       carefully, as they describe your rights and restrictions with
73       respect to this document. Code Components extracted from this
74       document must include Simplified BSD License text as described in
75       Section 4.e of the Trust Legal Provisions and are provided without
76       warranty as described in the Simplified BSD License.

85    Abstract

87       This draft describes a centralized anycast layer 3 gateway solution
88       for NVO3 networks interworking with external networks. Compared to
89       the traditional VRRP based active-standby layer 3 gateway solution,
90       this solution can achieve better load balancing and scalability.

92    Table of Contents

94       1. Introduction
95       2. Conventions used in this document
96       3. Anycast Layer 3 Gateway
97       4. ARP Handling
98       5. Resilience on Gateway Node Failure
99       6. Anycast GW and VRRP GW Comparison
100         6.1. VRRP based layer 3 gateway solution
101         6.2. Comparison
102      7. Security Considerations
103      8. IANA Considerations
104         8.1. Normative References
105         8.2. Informative References
106      9. Acknowledgments

108   1. Introduction

110      NVO3 overlay networks provide network connectivity to a set of
111      Tenant Systems (TSs) [NVO3FRWK]. A data center (DC) may support many
112      tenant networks [NVO3PS]. Tenant Systems often need to communicate
113      with external networks. An external network may be another overlay
114      network in the DC, a VPN in the WAN, or the Internet. In this case,
115      a gateway (GW) is required where inter-network policies are placed
116      and enforced.

118      Figure 1 illustrates a popular DC infrastructure where two DC GWs
119      are used at the DC border. All tenant system traffic going in/out of
120      the DC and between overlay VNs passes through the DC GWs. For a large
121      DC supporting many tenant networks, such GWs can become a scalability
122      bottleneck. Although VRRP [RFC2338] [RFC3768] [RFC5798] may
123      be used at the GWs to provide link/node redundancy, it does not
124      resolve the scalability issue. Distributed GWs may be implemented on
125      NVEs [NVO3ARCH], which may reduce the traffic passing through these
126      GWs; however, all the traffic going in/out of the DC still has to go
127      through these GWs.

129                                 ,---------.
130                               ,'           `.
131                              (  IP/MPLS WAN  )
132                               `.           ,'
133                             *    -+------+'    *
134                            *                    *
135                           *                      *
136                       ---------              ---------
137                       |  GW1  |              |  GW2  |
138                       |       | ************ |       |
139                       ---------              ---------
140                           *                      *
141                           *                      *
142                           *                      *
143                           *                      *
144                           *                      *
145      ---------          ---------          ---------          ---------
146      |  TOR1 | ******** |  TOR2 | ******** |  TOR3 | ******** |  TOR4 |
147      |       |          |       |          |       |          |       |
148      ---------          ---------          ---------          ---------
149          |                  |                  |                  |
150          |                  |                  |                  |
151      ---------          ---------          ---------          ---------
152      |  NVE1 |          |  NVE2 |          |  NVE3 |          |  NVE4 |
153      |       |          |       |          |       |          |       |
154      ---------          ---------          ---------          ---------
155        |   |              |   |              |   |              |   |
156      ____ ____          ____ ____          ____ ____          ____ ____
157      |T | |T |          |T | |T |          |T | |T |          |T | |T |
158      |S1| |S2|          |S3| |S4|          |S5| |S6|          |S7| |S8|
159      ---- ----          ---- ----          ---- ----          ---- ----
160            Figure 1: Centralized layer 3 gateway in NVO3 network

162      This draft proposes an anycast layer 3 gateway solution for DC GWs
163      that addresses the scalability concern. To differentiate it from
164      distributed GWs in NVO3, the DC GW is referred to as a centralized
165      GW. A centralized GW is a gateway network device that has embedded
166      NVE capability, i.e., the ability to maintain the inner/outer
167      mapping and terminate overlay tunnels.

169   2. Conventions used in this document

171      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
172      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
173      in this document are to be interpreted as described in RFC 2119
174      [RFC2119]. The acronyms and terminology in [RFC6325] are used herein
175      with the following additions:

177      Network Virtualization Edge (NVE) - An NVE is the network entity
178      that sits at the edge of an underlay network and implements network
179      virtualization functions.

181      Tenant System - A physical or virtual system that can play the role
182      of a host, or a forwarding element such as a router, switch,
183      firewall, etc. It belongs to a single tenant and connects to one or
184      more VNs of that tenant.
186      VN - A VN is a logical abstraction of a physical network that
187      provides L2 or L3 network services to a set of Tenant Systems.

189   3. Anycast Layer 3 Gateway

191      An anycast layer 3 gateway consists of multiple GW network devices
192      that support GW functions between overlay VNs and external networks
193      and share the same gateway IP and MAC address for each overlay VN.
194      These addresses are called the gateway anycast IP address and the
195      gateway anycast MAC address. To ensure that NVO3 traffic from
196      ingress NVEs is load balanced across these gateways, the gateways
197      also share the same outer IP address, which this document calls the
198      device underlying anycast IP address.

201      The gateway anycast IP address is used as the default gateway IP
202      address for all TSs in the corresponding VN. As different VNs are
203      allowed to have overlapping MAC address space, different anycast
204      gateway IP addresses can map to the same anycast MAC. That is to
205      say, each VN should have a unique anycast gateway IP address, but
206      the gateway MAC addresses of different VNs may be the same anycast
207      MAC. For simplicity, it is recommended to configure a single anycast
208      MAC on each gateway device as the gateway MAC address of all VNs.

210      When sending traffic toward a VN gateway on a GW device, ingress NVEs
211      use the device underlying anycast IP address as the outer destination
212      IP address of NVO3 packets. The VN may be an L2 VN or an L3 VN.

214      Each GW network device announces the device underlying anycast IP
215      address in the underlying IGP. If these gateways have the same
216      routing cost to an ingress NVE, underlay equal-cost multi-path
217      (ECMP) forwarding will distribute the NVO3 traffic from the ingress
218      NVE among the GW devices.

220      When sending traffic toward a tenant system in a VN, a VN GW on a GW
221      device obtains the mapping of the tenant system to its attached NVE
222      from a table lookup. If the VN is an L3 VN, the GW device
223      encapsulates the packet with the NVE IP address as the outer
224      destination IP address and its device underlying anycast IP address
225      as the outer source IP address. If the VN is an L2 VN, the GW device
226      inserts an inner MAC header with its anycast MAC address as the
227      source MAC address and the tenant MAC address as the destination MAC
228      address; it then encapsulates the packet with the NVE IP address as
229      the outer destination IP address and its device underlying anycast
230      IP address as the outer source IP address.

232      The NVA [NVO3ARCH] maintains TS/NVE mappings per VN and pushes the
233      mappings to the NVEs and GW network devices. To support the anycast
234      L3 GW, the NVA holds the mapping of VN GW anycast IP to device
235      underlying anycast IP for an L3 VN, or the mapping of VN GW anycast
236      MAC to device underlying anycast IP for an L2 VN.

238   4. ARP Handling

240      To avoid ARP request flooding in each VN, NVEs can make use of the
241      mapping information from a Network Virtualization Authority (NVA) to
242      respond to ARP requests.

244      For an L3 VN, upon receiving an ARP request for a VN GW anycast IP
245      address, the local NVE intercepts it and uses its own MAC address in
246      the reply.

248      For an L2 VN, upon receiving an ARP request for a VN GW anycast IP
249      address, the local NVE snoops it and uses the VN GW anycast MAC
250      address in the reply. Note that NVEs may locally maintain the
251      mapping of VN GW anycast IP to MAC address, or obtain it from the NVA.

253   5. Resilience on Gateway Node Failure

255      The anycast L3 gateway solution is resilient to a GW network device
256      failure. If a GW network device fails, the IGP updates the link
257      status and the host routes, so that NVO3 encapsulated traffic
258      destined to the device underlying anycast IP address reaches only
259      the remaining GW network devices.

261   6. Anycast GW and VRRP GW Comparison

263   6.1. VRRP based layer 3 gateway solution

265      The Virtual Router Redundancy Protocol (VRRP) [RFC2338] [RFC3768]
266      [RFC5798] is designed to eliminate the single point of gateway
267      failure. VRRP is an election protocol that dynamically assigns
268      responsibility for a virtual router to one of the VRRP routers on a
269      layer 2 VN. Any of the virtual router's IP addresses on a LAN can
270      then be used as the default first hop router by end-hosts. The layer
271      3 gateway acting as VRRP master is responsible for forwarding
272      packets destined to the virtual router. If the VRRP master fails, a
273      VRRP backup takes over.

275      The VRRP based solution has the following issues:

277      1. Inefficient network bandwidth usage. Only the VRRP master gateway
278         forwards the traffic. The VRRP backup is idle most of the time.

280      2. VRRP session number per VN. A VRRP session among the physical
281         layer 3 gateways must be established per layer 2 VN. A large
282         number of layer 2 VNs causes heavy CPU workload on each layer 3
            gateway.

284   6.2. Comparison

286   +-------------------------+---------------------+--------------------------------+
288   | Dimension               | VRRP                | Anycast gateway solution       |
290   +-------------------------+---------------------+--------------------------------+
292   | Network bandwidth usage | Low                 | High                           |
294   +-------------------------+---------------------+--------------------------------+
296   | Keep alive workload     | VRRP session per VN | None                           |
298   +-------------------------+---------------------+--------------------------------+
300   | Network resilience      | VRRP switchover     | Underlying network convergence |
302   +-------------------------+---------------------+--------------------------------+

304   7. Security Considerations

306      NA

308   8. IANA Considerations

310      NA

312   8.1. Normative References

314      [1] [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
315          Requirement Levels", BCP 14, RFC 2119, March 1997.

317   8.2. Informative References

319      [1] [NVO3ARCH] Black, D., Narten, T., et al., "An Architecture for
320          Overlay Networks (NVO3)", draft-ietf-nvo3-arch-01, work in
321          progress.

323      [2] [NVO3FRWK] Lasserre, M., Morin, T., et al., "Framework for DC
324          Network Virtualization", draft-ietf-nvo3-framework-03, work in
325          progress.

327      [3] [RFC2338] Knight, S., et al., "Virtual Router Redundancy
328          Protocol", RFC 2338, April 1998.

330      [4] [RFC3768] Hinden, R., Ed., "Virtual Router Redundancy Protocol
331          (VRRP)", RFC 3768, April 2004.

333      [5] [RFC5798] Nadas, S., Ed., "Virtual Router Redundancy Protocol
334          (VRRP) Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

336   9. Acknowledgments

338      The authors wish to acknowledge the important contributions of Zhang
339      Chengsong.

341   Authors' Addresses

343      Weiguo Hao
344      Huawei Technologies
345      101 Software Avenue,
346      Nanjing 210012
347      China
348      Phone: +86-25-56623144
349      Email: haoweiguo@huawei.com

351      Lucy Yong
352      Phone: +1-918-808-1918
353      Email: lucy.yong@huawei.com

355      Yizhou Li
356      Huawei Technologies
357      101 Software Avenue,
358      Nanjing 210012
359      China
360      Phone: +86-25-56625375
361      Email: liyizhou@huawei.com

363      Feng Wang
364      H3C Technologies
365      Email: imfeng@h3c.com

367      Wade Shao
368      Tencent
369      Email: wadeshao@tencent.com

371      Vic Liu
372      China Mobile
373      32 Xuanwumen West Ave, Beijing, China
374      Email: liuzhiheng@chinamobile.com
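
Illustrative sketch (not part of the draft text)

   The ECMP load sharing described in Section 3 and the failover
   behavior described in Section 5 can be illustrated with a short
   Python sketch. It is a minimal model under stated assumptions, not a
   description of any router implementation: the hash function, the use
   of an outer UDP source port as per-flow entropy, port 4789, and all
   names (Flow, pick_gateway, GW1, GW2, the addresses) are hypothetical
   and are not defined by this document.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Outer header fields of an NVO3 packet (hypothetical field set)."""
    src_ip: str    # underlay address of the ingress NVE
    dst_ip: str    # device underlying anycast IP shared by all GWs
    src_port: int  # assumed per-flow entropy, e.g. an outer UDP source port
    dst_port: int  # assumed encapsulation port


def pick_gateway(flow, gateways):
    """Map a flow to one of the equal-cost GWs that still announce the anycast IP."""
    if not gateways:
        raise ValueError("no gateway announces the anycast address")
    candidates = sorted(gateways)  # stable order keeps the choice deterministic
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.src_port}|{flow.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return candidates[int.from_bytes(digest[:4], "big") % len(candidates)]


ANYCAST_IP = "192.0.2.1"  # documentation address standing in for the anycast IP
flows = [Flow("198.51.100.10", ANYCAST_IP, 49152 + i, 4789) for i in range(1000)]

# Both gateways are reachable at equal cost: flows are spread over GW1 and GW2.
share = {}
for f in flows:
    gw = pick_gateway(f, {"GW1", "GW2"})
    share[gw] = share.get(gw, 0) + 1
print(share)

# GW1 fails and its route is withdrawn from the IGP: only GW2 remains a next hop.
after_failure = {pick_gateway(f, {"GW2"}) for f in flows}
print(after_failure)
```

   In this model the ingress NVE keeps sending to the same anycast
   destination before and after the failure; only the set of next hops
   known to the underlay changes, which is the property Section 5
   relies on.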