Network Working Group                                             L. Jin
Internet-Draft                                                       ZTE
Intended status: Informational                             B. Khasnabish
Expires: November 12, 2012                                       ZTE USA
                                                            May 11, 2012


         Architecture of PSN Independent Overlay Network (PION)
                 draft-kj-nvo3-pion-architecture-00.txt

Abstract

   This draft introduces the PSN independent overlay network (PION)
   architecture for intra- and inter-datacenter (DC) connections. The
   motivations, protocol layers, applications, and other aspects of
   PION are also discussed. PION provides a virtualized network that is
   independent of the underlying PSN in order to maximize the reuse of
   IETF protocol definitions and implementations. The inter- and
   intra-DC connection provided by PION could be from endpoint to
   endpoint, endpoint to network, or network to network. The packet
   transport capabilities provided by the overlay network are
   determined by the capabilities of the underlying PSN.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 12, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  List of Acronyms
   3.  PION Definition
   4.  PION Motivation
   5.  PION Protocol model
     5.1.  Protocol Layers
     5.2.  Encapsulation Layer
     5.3.  Tenant Network Identifier Layer
     5.4.  PSN Layer Encapsulation
     5.5.  Associate tenant and PSN Layer
   6.  Network Architecture
   7.  Applicability of PION
     7.1.  PION over IP PSN
     7.2.  PION over MPLS PSN
   8.  Control Plane Consideration
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Introduction

   This draft introduces the architecture of the PSN independent
   overlay network (PION) for intra- and inter-datacenter connections,
   together with its motivation, protocol layers, and applications. The
   PSN in this draft refers to an IP or MPLS network. PION provides a
   virtualized network that is independent of the underlying PSN, so as
   to maximize the reuse of IETF protocol definitions and
   implementations. That means the overlay network can work over any
   underlying PSN layer and reuse the capabilities of that layer.

   The inter- and intra-DC connection provided by PION could be from
   endpoint to endpoint, endpoint to network, or network to network.
   The packet transport capabilities provided by the overlay network
   are determined by the capabilities of the underlying PSN. Making the
   overlay network independent of the underlying PSN allows it to
   benefit from different kinds of underlying PSN capabilities.

2.  List of Acronyms

   PSN: Packet Switched Network

   PION: PSN Independent Overlay Network

   PION Header: a header that includes an encapsulation layer and a
   tenant ID layer

   Tenant Packet: a customer packet encapsulated with a PION header

   BW: Bandwidth

   ECMP: Equal Cost Multi-Path

   MPLS: Multi-Protocol Label Switching

   NVE: Network Virtualization Edge

   QoS: Quality of Service

   SLA: Service Level Agreement

3.  PION Definition

   The PSN independent overlay network (PION) provides an overlay
   network for datacenters, offering intra- and inter-datacenter
   connections between end-points over various kinds of underlying PSN.
   PION could provide service bandwidth and QoS assurance, multicast,
   traffic engineering, security, and other capabilities by relying on
   the capabilities of the different kinds of underlying PSN. PION is
   designed to isolate traffic and addresses among different tenants,
   and to be scalable enough to accommodate millions of end-points.

   PION provides the following functionalities:

   1.  Be PSN independent, to maximize the reuse of existing IETF-
       defined PSN technologies.

   2.  Provide traffic and address isolation for each tenant.

   3.  Provide good scalability, to accommodate two million VMs running
       on more than one hundred thousand physical servers.

   4.  Provide differentiated service for different tenants, including
       bandwidth and QoS.

4.  PION Motivation

   As a core requirement, emerging datacenters need to support both
   multi-tenancy and high scalability. The packet transport
   capabilities provided by the overlay network are determined by the
   capabilities of the underlying PSN, so making the overlay network
   independent of the underlying PSN allows it to benefit from
   different kinds of underlying PSN capabilities.

   The IP PSN could provide the maximum connection availability for the
   overlay network, and it also provides stateless IP connections,
   which ease the operation of the PSN tunnel. Approaches such as VXLAN
   [I-D.mahalingam-dutt-dcops-vxlan], NVGRE
   [I-D.sridharan-virtualization-nvgre], and STT [I-D.davie-stt] allow
   a Layer 2 overlay network to be set up over UDP, IP, or even a
   TCP-like encapsulation.

   The MPLS PSN is now widely deployed in wide area networks, and has
   proved capable of providing connections with bandwidth guarantees,
   differentiated QoS assurance, and high resiliency. The MPLS PSN has
   some capabilities that the IP PSN does not. One typical example is
   given below:

   1. Bandwidth must be guaranteed per tenant, rather than provisioned
   as a pool shared among tenants. The IP connection resources among
   NVEs serve all tenants, and cannot be used to set up connections
   that are dedicated to individual tenants. Some "Gold" class tenants
   may require a bandwidth guarantee for their service. If tenants of
   other categories (e.g., Silver, Bronze, etc.)
   are mixed/shared with Gold-category tenants, and the traffic flows
   of all categories are carried over the same connection, the desired
   bandwidth of the Gold-class tenants may not be guaranteed. When the
   overlay network crosses a WAN, the bandwidth guarantee problem is
   exacerbated by the limited bandwidth of the WAN. The MPLS PSN has
   the capability to provide tenant-aware traffic transport. For
   example, when the connection provided by the overlay network crosses
   an IP/MPLS-enabled WAN, the traffic of a specified tenant can be
   traffic engineered by the IP/MPLS network, which improves the
   transport quality of the tenant service.

   The purpose of the PSN independent overlay network (PION) is to
   reuse the various kinds of existing IETF-defined PSN technologies,
   while keeping the tenant packet encapsulation uniform over different
   types of PSN connections/tunnels. The PSN here mainly refers to IP
   and MPLS; Layer 2 PSN technologies are excluded.

5.  PION Protocol model

5.1.  Protocol Layers

   The PION protocol layering model is shown below:

   +-------------------------------------------+
   |             Customer Payload              |
   |                    ~~~                    |
   /===========================================\
   H         Tenant Network Identifier         H
   H-------------------------------------------H <--Tenant Header
   H               Encapsulation               H
   \===========================================/
   |                 PSN Layer                 |
   +-------------------------------------------+
   |                 Data-Link                 |
   +-------------------------------------------+
   |              Physical Layer               |
   +-------------------------------------------+

                     Figure 1

   The customer payload in a datacenter would typically be an Ethernet
   payload, but other types of payload, e.g., an IP payload, are not
   precluded.

   The encapsulation layer provides packet transport with some
   capabilities that other layers cannot provide. For more detail, see
   Section 5.2.

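   As an illustration of the layering in Figure 1, the following
   minimal Python sketch stacks a tenant header (encapsulation layer
   plus tenant network identifier) between a PSN header and the
   customer payload. This document does not define an on-the-wire
   format: the one-byte payload type, the four-byte identifier, the
   payload type code points, and all function names are assumptions
   made only for this example.

```python
import struct

# Hypothetical code points for the customer payload type.
PAYLOAD_ETHERNET = 1
PAYLOAD_IP = 2

# Assumed tenant header layout: a 1-byte payload type (encapsulation
# layer) followed by a 4-byte tenant network identifier, both in
# network byte order. Neither width is defined by this document.
TENANT_HEADER = struct.Struct("!BI")


def encapsulate(psn_header: bytes, payload_type: int, tni: int,
                customer_payload: bytes) -> bytes:
    """Stack the layers of Figure 1 from the PSN layer upwards."""
    tenant_header = TENANT_HEADER.pack(payload_type, tni)
    return psn_header + tenant_header + customer_payload


def decapsulate(packet: bytes, psn_header_len: int):
    """Return (payload_type, tni, customer_payload) of a packet."""
    payload_type, tni = TENANT_HEADER.unpack_from(packet, psn_header_len)
    payload_start = psn_header_len + TENANT_HEADER.size
    return payload_type, tni, packet[payload_start:]


# The PSN header is an opaque placeholder here (IP or MPLS in practice).
packet = encapsulate(b"PSN-HDR", PAYLOAD_ETHERNET, 5000, b"customer frame")
print(decapsulate(packet, len(b"PSN-HDR")))
# prints (1, 5000, b'customer frame')
```

   Treating the PSN header as opaque bytes mirrors the intent that the
   tenant header stays the same whichever PSN carries the packet.
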
   The Tenant Network Identifier (TNI) layer provides customer traffic
   and address isolation among different tenants. This identifier may
   be unique per NVE, but MUST be unique per connection between two
   NVEs.

   The PSN layer provides physical network transport for the
   virtualized network in the datacenter, and reuses IETF-defined
   protocols to the maximum extent. The ECMP transport capability of
   the PSN layer should be able to hash the traffic per flow per
   tenant.

   The data-link and physical layers are out of the scope of this
   document.

5.2.  Encapsulation Layer

   There are several functions/services that the encapsulation layer
   could provide. This draft lists the following functions/services:

   1.  Customer payload indication, to indicate the type of customer
       payload.

   2.  Packet sequencing and fragmentation capability.

   3.  Flow entropy value, to add flow-based entropy by tagging all the
       packets of a flow with the same entropy label.

   The customer payload is Ethernet in many cases, but an IP payload is
   not precluded. An indication value in the encapsulation layer can be
   provided to indicate the customer payload type. Some applications
   using UDP transport require packets to be transmitted in sequence
   and need information about packet loss. Some applications require
   the transport of larger packets to improve efficiency; packet
   fragmentation is then required, and is preferably performed in
   hardware. The encapsulation layer has the capability to provide
   packet fragmentation information. Some PSN connections used by PION,
   e.g., GRE, do not provide ECMP capability. The encapsulation layer
   provides this capability by adding a flow entropy value, and all the
   packets of one flow are required to be tagged with the same entropy
   value.

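   The flow entropy function described above can be illustrated with a
   minimal Python sketch. It is not part of the PION definition: the
   16-bit width of the value, the use of the customer packet's 5-tuple
   as the flow key, and the names FlowKey and flow_entropy are
   assumptions made only for illustration. The one property taken from
   the text is that all packets of one flow receive the same entropy
   value.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow key: the customer packet's 5-tuple."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def flow_entropy(flow: FlowKey) -> int:
    """Return a 16-bit entropy value (assumed width) for a flow.

    The value depends only on the flow key, so every packet of the
    same flow is tagged with the same entropy value.
    """
    key = "|".join(str(field) for field in flow).encode("ascii")
    digest = hashlib.sha256(key).digest()
    # Use the first two bytes of the digest as the 16-bit value.
    return int.from_bytes(digest[:2], "big")


flow = FlowKey("10.0.0.1", "10.0.0.2", 6, 49152, 80)
first_packet = flow_entropy(flow)
second_packet = flow_entropy(flow)
print(first_packet == second_packet)  # prints True
```

   A deterministic hash is used so that the same flow maps to the same
   value on every NVE; any other stable function of the flow key would
   serve equally well.
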
   When the PSN layer uses UDP encapsulation, the entropy value can be
   carried in the UDP source port, and the flow entropy value in the
   encapsulation layer can then be omitted.

   When the PSN layer uses a GRE tunnel, the flow entropy value in the
   encapsulation layer should be added if per-flow ECMP is required.

   When the PSN layer uses TCP-like [I-D.davie-stt] encapsulation,
   sequencing and fragmentation can be provided by the IP layer, and
   the sequencing and fragmentation capability in the tenant header can
   then be omitted. The entropy value can be carried in the TCP source
   port, and the flow entropy value in the encapsulation layer can then
   be omitted.

   When the PSN layer uses an MPLS tunnel, sequencing and fragmentation
   in the tenant header are applied if required. The entropy value can
   be carried in the MPLS flow label, and the flow entropy value in the
   encapsulation layer can then be omitted.

5.3.  Tenant Network Identifier Layer

   The tenant network identifier (TNI) could be an integer that
   indicates the tenant to which each customer packet belongs. One
   example is an explicit integer, like a VLAN ID. In the datacenter
   case, an explicit tenant ID simplifies interoperation in an inter-
   datacenter connection environment. Regardless of how the control
   plane allocates the TNI (static configuration or dynamic
   allocation), overlay networks with different control planes can
   always interoperate as long as they use the same TNI. This is
   particularly useful when interconnecting two datacenters with
   different control planes: the operator only needs to ensure that
   both sides use the same TNI (or apply TNI translation).

5.4.  PSN Layer Encapsulation

   The PSN layer required for PION could be any kind of PSN connection
   that is capable of transmitting tenant packets. In general, two
   kinds of PSN connection could be provided: IP and MPLS.

5.5.  Associate tenant and PSN Layer

   It is the NVE's responsibility to associate the tenant with the PSN
   connection to a peer NVE, which could be done by configuration or in
   another implementation-specific way. Different types of PSN
   connection could be used between different NVEs within one tenant.
   The NVE should have the capability to set up the specified PSN
   connection if required. For example, if only IP connections are
   required between or among NVEs, the NVEs need IP connection setup
   capability. If one NVE requires a bandwidth-guaranteed connection to
   a peer NVE located in another datacenter across a WAN, the NVE
   should set up a hierarchical MPLS LSP as specified in Section 7.2,
   and specify the bandwidth required.

6.  Network Architecture

   One important application of the overlay network is to provide
   intra- and inter-datacenter connections between end-point and end-
   point, or between end-point and network, as shown in the figure
   below.

   /-----DC1-----\   /-------WAN-------\   /-----DC2-----\
   +---+         |   |                 |   |         +---+
   |NVE|--\      |   |                 |   |      /--|NVE|
   | 1 |   \                                     /   | 3 |
   +---+    \   +-------+           +-------+   /    +---+
             \--| Edge  |           | Edge  |--/
             /--|Router1|-----------|Router2|--\
            /   +-------+           +-------+   \
   +---+   /                                     \   +---+
   |NVE|--/                                       \--|NVE|
   | 2 |         |   |                 |   |         | 4 |
   +---+         |   |                 |   |         +---+
   \-----DC1-----/   \-------WAN-------/   \-----DC2-----/

                            Figure 2

   There are two datacenters, DC1 and DC2, managed by the same
   administrators, and the two datacenters are connected by a WAN,
   which would be an IP/MPLS network. The overlay network for intra-
   datacenter connection connects end-points within a datacenter (e.g.,
   between NVE1 and NVE2, or NVE3 and NVE4), and should be scalable
   enough that it is not restricted by the topology of the underlying
   datacenter network.

   The overlay network for inter-datacenter connection connects end-
   points in two different datacenters (e.g., between NVE1 and NVE3 if
   NVE1 and NVE3 are the gateways).
   In this case, the overlay network crosses a wide area network,
   where network resources are usually limited. The overlay network
   should have the capability to provide a QoS/BW guarantee per tenant
   customer, even when crossing a WAN where large network providers
   have already deployed MPLS technology widely. In addition, the MPLS
   network has the capability to provide traffic engineering, QoS/BW
   guarantees, and higher reliability. The overlay network should be
   able to benefit from the underlying network in order to provide high
   quality service.

7.  Applicability of PION

7.1.  PION over IP PSN

   Most datacenters have IP transport capability, and the end-points
   can reasonably be assumed to be IP reachable in most cases. The PSN
   layer would be an IP layer with UDP encapsulation, GRE
   encapsulation, or TCP-like [I-D.davie-stt] encapsulation. If
   security is required, IPsec could be used as the PSN tunnel.

   An IP connection is indexed by destination IP address, and all
   tenants share the same IP connection when sending to the same
   destination. To provide a PSN tunnel per destination per tenant, see
   Section 7.2.

7.2.  PION over MPLS PSN

   An MPLS LSP could be set up per destination per tenant, and provides
   optimized traffic transmission for the overlay network, which
   improves the service quality, especially when interconnecting
   different datacenters. A hierarchical MPLS LSP could also provide
   flexible connections across different domains.

   PION over an MPLS PSN does not require all the nodes in the network
   to be MPLS enabled. A hierarchical MPLS LSP over IP would be used to
   adapt to the IP environment inside a datacenter. Most deployments
   would only require the NVE and the edge router in the datacenter to
   be MPLS capable. One typical deployment use case for an MPLS tunnel
   per destination per tenant would be PION deployment across a WAN.
   See the figure below.

    /----DC1----\     /----WAN--------\      /----DC2---\
   /             \   /                 \    /            \
   +---+        +-------+           +-------+        +---+
   |NVE|        | Edge  |           | Edge  |        |NVE|
   | 1 |--------|Router1|-----------|Router2|--------| 2 |
   +---+        +-------+           +-------+        +---+
   ||               |                   |               ||
   ||<=============>|<=================>|<=============>||
   |     T1(IP)           T2(MPLS)           T3(IP)      |
   |<--------------------------------------------------->|
                   End to End MPLS Tunnel

                            Figure 3

   The two interconnected datacenters are connected across a WAN that
   is MPLS enabled. The MPLS tunnel per destination address per tenant
   ID provided for a "Gold" class tenant customer could be served by
   dedicated network resources.

   Assume that only the WAN, where congestion typically occurs, is
   required to provide a bandwidth guarantee. When setting up a
   connection from NVE1 to NVE2, where the two NVEs are gateways for
   the specified tenant, there would be a hierarchical LSP from NVE1 to
   NVE2. The underlying Tunnel1 (T1 in the figure) between NVE1 and
   Edge Router1, and the underlying Tunnel3 (T3 in the figure) between
   Edge Router2 and NVE2, could be IP connections within the
   datacenters (e.g., GRE encapsulated). The underlying Tunnel2 (T2 in
   the figure) between Edge Router1 and Edge Router2 could be an
   MPLS-TE tunnel, which would provide a QoS/BW guarantee. The
   allocated MPLS label is an inner label that associates the different
   underlying tunnels in the different domains. The inner MPLS label is
   only switched at the points where underlying tunnels are stitched
   together, e.g., Edge Router1 and Edge Router2.

   In the above case, only the NVEs and the edge routers are required
   to be MPLS capable, and the MPLS network in the WAN could be
   optimized to provide high quality service to the tenant customer.
   The edge router in the above case is designed to be tenant-aware, so
   that it can optimize the tenant traffic using standard IETF
   mechanisms.

8.  Control Plane Consideration

   There are three kinds of control plane functions for PION:

   1. The first is between the end-point and the NVE, and is used to
   signal the behavior requirements of the end-point to the NVE. One
   option for implementing this first control plane is to reuse VDP,
   defined by IEEE.

   2. The second is among NVEs, and synchronizes the mapping between
   end-points and PION connections among the NVEs. Please refer to the
   requirements in [I-D.kreeger-nvo3-overlay-cp] for more detail.

   One option for implementing the second control plane within one
   datacenter is to use a centralized server; a standard interface
   between the centralized server and the NVE should then be defined.
   The centralized server would collect all the mapping information
   from each NVE through this standard interface. When an NVE receives
   a packet for which it has no forwarding entry, it would request the
   correct forwarding entry from the centralized server and install it
   with an appropriate lifetime.

   It is also possible to employ two or more centralized servers in one
   datacenter. The different centralized servers should be able to
   synchronize the mapping information, and a standard interface
   between the centralized servers should be defined.

   When interconnecting two datacenters, a standard interface between
   the two corresponding centralized servers should also be defined;
   this interface would be the same as the one used within a
   datacenter. All the PION mapping information would be exchanged
   between the two centralized servers through this standard interface.

   3. The third control plane function for PION is to set up the PSN
   connection. An IP connection within an IP PSN is set up by normal
   routing protocols, and the IETF-defined control plane could be
   reused. The control plane for an MPLS connection per destination per
   tenant remains to be defined; one possible way is to reuse MP-BGP or
   XMPP.

9.  Acknowledgments

   The authors would like to thank Igor Gashinsky, David McDysan,
   Patricia Thaler, Thomas Morin, and Vishwas Manral for their review
   and contributions.

10.  Informative References

   [I-D.davie-stt]
              Davie, B. and J. Gross, "A Stateless Transport Tunneling
              Protocol for Network Virtualization (STT)",
              draft-davie-stt-01 (work in progress), March 2012.

   [I-D.kreeger-nvo3-overlay-cp]
              Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T.
              Narten, "Network Virtualization Overlay Control Protocol
              Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in
              progress), January 2012.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright,
              C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks over
              Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01
              (work in progress), February 2012.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin,
              G., Pearson, M., Thaler, P., Tumuluri, C., and Y. Wang,
              "NVGRE: Network Virtualization using Generic Routing
              Encapsulation", draft-sridharan-virtualization-nvgre-00
              (work in progress), September 2011.

   [RFC6513]  Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
              VPNs", RFC 6513, February 2012.

Authors' Addresses

   Lizhong Jin
   ZTE
   889, Bibo Road
   Shanghai, 201203, China

   Email: lizhong.jin@zte.com.cn, lizho.jin@gmail.com


   Bhumip Khasnabish
   ZTE USA, Inc.
   55 Madison Avenue, Suite 160
   Morristown, NJ 07960 USA

   Email: bhumip.khasnabish@zteusa.com, vumip1@gmail.com