INTERNET-DRAFT                                               Tom Herbert
Intended Status: Informational                                  Facebook
Expires: April 30, 2017                                 October 27, 2016

                 Identifier-locator addressing for IPv6
                       draft-herbert-nvo3-ila-03

Abstract

   This specification describes identifier-locator addressing (ILA) for
   IPv6. Identifier-locator addressing differentiates between location
   and identity of a network node. Part of an address expresses the
   immutable identity of the node, and another part indicates the
   location of the node which can be dynamic. Identifier-locator
   addressing can be used to efficiently implement overlay networks for
   network virtualization as well as solutions for use cases in
   mobility.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1 Introduction
     1.1 Terminology
   2 Architectural overview
     2.1 Addressing
     2.2 Network topology
     2.3 Translations and mappings
     2.4 ILA routing
   3 Address formats
     3.1 ILA address format
     3.2 Locators
     3.3 Identifiers
       3.3.1 Checksum neutral-mapping format
       3.3.2 Identifier types
         3.3.2.1 Interface identifiers
         3.3.2.2 Locally unique identifiers
         3.3.2.3 Virtual networking identifiers for IPv4
         3.3.2.4 Virtual networking identifiers for IPv6 unicast
         3.3.2.5 Virtual networking identifiers for IPv6 multicast
     3.4 Standard identifier representation addresses
       3.4.1 SIR for locally unique identifiers
       3.4.2 SIR for virtual addresses
       3.4.3 SIR domains
   4 Operation
     4.1 Identifier to locator mapping
     4.2 Address translations
       4.2.1 SIR to ILA address translation
       4.2.2 ILA to SIR address translation
     4.3 Virtual networking operation
       4.3.1 Crossing virtual networks
       4.3.2 IPv4/IPv6 protocol translation
     4.4 Transport layer checksums
       4.4.1 Checksum-neutral mapping
       4.4.2 Sending an unmodified checksum
     4.5 Address selection
     4.6 Duplicate identifier detection
     4.7 ICMP error handling
       4.7.1 Handling ICMP errors by ILA capable hosts
       4.7.2 Handling ICMP errors by non-ILA capable hosts
     4.8 Multicast
   5 Motivation for ILA
     5.1 Use cases
       5.1.1 Multi-tenant virtualization
       5.1.2 Datacenter virtualization
       5.1.3 Device mobility
     5.2 Alternative methods
       5.2.1 ILNP
       5.2.2 Flow label as virtual network identifier
       5.2.3 Extension headers
       5.2.4 Encapsulation techniques
   6 IANA Considerations
   7 References
     7.1 Normative References
     7.2 Informative References
   8 Acknowledgments
   Appendix A: Communication scenarios
     A.1 Terminology for scenario descriptions
     A.2 Identifier objects
     A.3 Reference network for scenarios
     A.4 Scenario 1: Object to task
     A.5 Scenario 2: Object to Internet
     A.6 Scenario 3: Internet to object
     A.7 Scenario 4: Tenant system to service
     A.8 Scenario 5: Object to tenant system
     A.9 Scenario 6: Tenant system to Internet
     A.10 Scenario 7: Internet to tenant system
     A.11 Scenario 8: IPv4 tenant system to object
     A.12 Tenant to tenant system in the same virtual network
       A.12.1 Scenario 9: TS to TS in the same VN using IPV6
       A.12.2 Scenario 10: TS to TS in same VN using IPv4
     A.13 Tenant system to tenant system in different virtual networks
       A.13.1 Scenario 11: TS to TS in different VNs using IPV6
       A.13.2 Scenario 12: TS to TS in different VNs using IPv4
       A.13.3 Scenario 13: IPv4 TS to IPv6 TS in different VNs
   Appendix B: unique identifier generation
     B.1 Globally unique identifiers method
     B.2 Universally Unique Identifiers method
   Appendix C: Datacenter task virtualization
     C.1 Address per task
     C.2 Job scheduling
     C.3 Task migration
       C.3.1 Address migration
       C.3.2 Connection migration

1 Introduction

   This specification describes the address formats, protocol operation,
   and communication scenarios of identifier-locator addressing (ILA).
   In identifier-locator addressing, an IPv6 address is split into a
   locator and an identifier component. The locator indicates the
   topological location in the network for a node, and the identifier
   indicates the node's identity, which refers to the logical or virtual
   node in communications. Locators are routable within a network, but
   identifiers typically are not. An application addresses a peer
   destination by identifier. Identifiers are mapped to locators for
   transit in the network. The on-the-wire address is composed of a
   locator and an identifier: the locator is sufficient to route the
   packet to a physical host, and the identifier allows the receiving
   host to translate and forward the packet to the addressed
   application.

   With identifier-locator addressing, network virtualization and
   addressing for mobility can be implemented in an IPv6 network without
   any additional encapsulation headers. Packets sent with
   identifier-locator addresses look like plain unencapsulated packets
   (e.g. TCP/IP packets). This method is transparent to the network, so
   protocol-specific mechanisms in network hardware work seamlessly.
   These mechanisms include hash calculation for ECMP, NIC large segment
   offload, checksum offload, etc.

   Many of the concepts for ILA are adapted from the Identifier-Locator
   Network Protocol (ILNP) ([RFC6740], [RFC6741]), which defines a
   protocol and operations model for identifier-locator addressing in
   IPv6.
   Section 5 provides a motivation for ILA and a comparison of ILA with
   alternative methods that achieve similar functionality.

1.1 Terminology

   ILA         Identifier-locator addressing.

   ILA router  A network node that performs ILA translation and
               forwarding of translated packets.

   ILA host    An end host that is capable of performing ILA
               translations on transmit or receive.

   ILA node    A network node capable of performing ILA translations.
               This can be an ILA router or ILA host.

   Locator     A network prefix that routes to a physical host.
               Locators provide the topological location of an
               addressed node. In ILA, locators are sixty-four bit
               prefixes.

   Identifier  A number that identifies an addressable node in the
               network independent of its location. ILA identifiers
               are sixty-four bit values.

   ILA address
               An IPv6 address composed of a locator (upper sixty-four
               bits) and an identifier (low order sixty-four bits).

   SIR         Standard identifier representation.

   SIR prefix  A sixty-four bit network prefix used to identify a SIR
               address.

   SIR address
               An IPv6 address composed of a SIR prefix (upper
               sixty-four bits) and an identifier (lower sixty-four
               bits). SIR addresses are visible to applications and
               provide a means to address nodes independent of their
               location.

   SIR domain  A unique identifier namespace defined by a SIR prefix.
               Each SIR prefix defines a SIR domain.

   ILA translation
               The process of translating the upper sixty-four bits of
               an IPv6 address. Translations may be from a SIR prefix
               to a locator or a locator to a SIR prefix.

   Virtual address
               An IPv6 or IPv4 address that resides in the address
               space of a virtual network. Such addresses may be
               translated to SIR addresses as an external
               representation of the address outside of the virtual
               network, or they may be translated to ILA addresses for
               transit over an underlay network.
   Topological address
               An address that refers to a non-virtual node in a
               network topology. Such addresses refer to physical hosts
               in a network.

2 Architectural overview

   Identifier-locator addressing provides a data plane method to
   implement network virtualization without encapsulation and its
   related overheads. The service ILA provides is effectively layer 3
   over layer 3 network virtualization (IPv4 or IPv6 over IPv6).

2.1 Addressing

   ILA performs translations on IPv6 addresses. There are two types of
   addresses introduced for ILA: ILA addresses and SIR addresses.

   ILA addresses are IPv6 addresses that are composed of a locator
   (upper sixty-four bits) and an identifier (low order sixty-four
   bits). The identifier serves as the logical address of a node, and
   the locator indicates the location of the node on the network.

   A SIR (standard identifier representation) address is an IPv6
   address that contains an identifier and an application-visible SIR
   prefix. SIR addresses are visible to the application and can be used
   as connection endpoints. When a packet is sent to a SIR address, an
   ILA router or host overwrites the SIR prefix with a locator
   corresponding to the identifier. When a peer ILA node receives the
   packet, the locator is overwritten with the original SIR prefix
   before delivery to the application. In this manner, applications
   only see SIR addresses; they do not have visibility into ILA
   addresses.

   ILA translations can transform addresses from one type to another. In
   network virtualization, virtual addresses can be translated into ILA
   and SIR addresses, and conversely ILA and SIR addresses can be
   translated to virtual addresses.

2.2 Network topology

   ILA nodes are nodes in the network that perform ILA translations. An
   ILA router is a node that performs ILA address translation and packet
   forwarding to implement overlay network functionality.
   ILA routers perform translations on packets sent by end nodes for
   transport across an underlay network. Packets received by ILA routers
   on the underlay network have their addresses reverse translated for
   reception at an end node. An ILA host is an end node that implements
   ILA functionality for transmitting or receiving packets.

   ILA nodes are responsible for transit of packets over an underlay
   network. On ingress to an ILA node (host or router), the virtual or
   SIR address of a destination is translated to an ILA address. At the
   peer ILA node, the reverse translation is performed before handing
   packets to an application.

   The figure below provides an example topology using ILA, showing the
   ILA translations performed in one direction, from Host A to Host B.
   Host A sends a packet with a destination SIR address (step (1)). An
   ILA router in the path translates the SIR address to an ILA address
   whose locator is that of Host B, that is, the location of the node
   indicated by the identifier in the SIR address. The packet is
   forwarded over the network and delivered to a peer ILA node (step
   (2)). The peer ILA node, in this case another ILA router, translates
   the destination address back to a SIR address and forwards the
   packet to the final destination (step (3)).

   +--------+                                                +--------+
   | Host A +-+                                         +--->| Host B |
   |        | |               (2) ILA                  (')   |        |
   +--------+ |            ...addressed....            ( )   +--------+
              V  +---+--+  .    packet    .  +---+--+  (_)
   (1) SIR    |  | ILA  |----->-------->---->| ILA  |   |    (3) SIR
   addressed  +->|router|  .              .  |router|->-+    addressed
   packet        +---+--+  .     IPv6     .  +---+--+        packet
                    /      .   Network    .
                   /       .              .   +--+-++--------+
   +--------+     /        .              .   |ILA ||  Host  |
   |  Host  +----+         .              .- -|host||        |
   |        |              .              .   +--+-++--------+
   +--------+              ................

2.3 Translations and mappings

   Address translation is the mechanism employed by ILA.
   Logical or virtual addresses are translated to topological IPv6
   addresses for transport to the proper destination. Translation occurs
   in the upper sixty-four bits of an address; the low order sixty-four
   bits contain an identifier that is immutable and is not used to route
   a packet.

   Each ILA node maintains a mapping table. This table maps identifiers
   to locators. The mappings are dynamic, since nodes with identifiers
   can be created, destroyed, or migrated between physical hosts.
   Mappings are propagated amongst ILA routers or hosts in a network
   using mapping propagation protocols (mapping propagation protocols
   will be described in other specifications).

   Identifiers are not statically bound to a host on the network, and in
   fact their binding (or location) may change. This is the basis for
   network virtualization and address migration. An identifier is mapped
   to a locator at any given time, and a set of identifier to locator
   mappings is propagated throughout a network to allow communications.
   The mappings are kept synchronized so that if an identifier migrates
   to a new physical host, its identifier to locator mapping is updated.

2.4 ILA routing

   ILA is intended to be sufficiently lightweight so that all the hosts
   in a network could potentially send and receive ILA addressed
   packets. In order to scale this model and allow for hosts that do not
   participate in ILA, a routing topology may be applied. A simple
   routing topology is illustrated below.

                             +---------+--+
     (1) Default SIR route   |ILA router  | (2) Translated dest.
                +->->->->->->|            |->->->->+
                |            +------------+        |
                |                                  V
   +--------++-----+                            +-----++--------+
   |        ||     |                            |     ||        |
   |  Host  || ILA |                            | ILA ||  Host  |
   |        ||host |->->->->->->->->->->->->->->| host||        |
   +--------++-----+      (5) Direct route      +-----++--------+
               .  .
               .  .  (3) Resolve
   (4) Resolve .  .      Request              +--------------+
       Reply   .  ...........................>|              |
               .                              | ILA resolver |
               ...............................|              |
                                              +--------------+

   An ILA router can be addressed by an "anycast" SIR prefix so that it
   receives packets sent on the network with SIR addresses. When an ILA
   router receives a SIR addressed packet (step (1) in the diagram), it
   performs the ILA translation and sends the ILA addressed packet to
   the destination ILA node (step (2)).

   If a sending host is ILA capable, the triangular routing can be
   eliminated by performing an ILA resolution protocol. This entails the
   host sending an ILA resolve request that specifies the SIR address to
   resolve (step (3) in the figure). An ILA resolver can respond to a
   resolve request with the identifier to locator mapping (step (4)).
   Subsequently, the ILA host can perform ILA translation and send
   directly to the destination specified in the locator (step (5) in the
   figure). The ILA resolution protocol will be specified in a companion
   document.

   In this model, an ILA host maintains a cache of identifier mappings
   for identifiers that it is currently communicating with. ILA routers
   are expected to maintain a complete list of identifier to locator
   mappings within the SIR domains that they service.

3 Address formats

3.1 ILA address format

   An ILA address is composed of a locator and an identifier where each
   occupies sixty-four bits (similar to the encoding in ILNP [RFC6741]).

   |            64 bits             |            64 bits            |
   +--------------------------------+-------------------------------+
   |            Locator             |          Identifier           |
   +--------------------------------+-------------------------------+

3.2 Locators

   Locators are routable network address prefixes that create
   topological addresses for physical hosts within the network. They may
   be assigned from a global address block [RFC3587], or be based on
   unique local IPv6 unicast addresses as described in [RFC4193].
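   The split described in Section 3.1 and the locator prefixes of
   Section 3.2 reduce to integer operations on a 128-bit address. The
   following Python sketch, using only the standard library ipaddress
   module, shows one way to express them. The helper names are
   hypothetical, and the translation shown simply overwrites the upper
   sixty-four bits without any checksum-neutral adjustment.

```python
import ipaddress

LOW64 = (1 << 64) - 1  # mask for a 64-bit half of an IPv6 address


def split_ila(addr: str) -> tuple[int, int]:
    """Return (upper 64 bits, identifier) of an ILA or SIR address."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> 64, value & LOW64


def join_ila(upper: int, identifier: int) -> ipaddress.IPv6Address:
    """Compose an address from a locator or SIR prefix and an identifier."""
    return ipaddress.IPv6Address(((upper & LOW64) << 64) | (identifier & LOW64))


def replace_upper64(addr: str, new_upper: int) -> ipaddress.IPv6Address:
    """Overwrite the upper 64 bits (e.g. SIR prefix -> locator)."""
    _, identifier = split_ila(addr)
    return join_ila(new_upper, identifier)


def locator_kind(locator: int) -> str:
    """Classify a 64-bit locator by its leading bits."""
    if locator >> 61 == 0b001:  # global unicast, 2000::/3
        return "global"
    if locator >> 57 == 0x7E:  # unique local, fc00::/7
        return "unique-local"
    return "other"


# Hypothetical SIR address: SIR prefix 2001:db8:1:2, type 1 identifier.
sir = "2001:db8:1:2:2000::1"
ila = replace_upper64(sir, 0xFD00_0000_0000_0001)
print(ila, locator_kind(split_ila(str(ila))[0]))
```

   In an actual ILA node, the new upper bits would come from the
   identifier to locator mapping table of Section 2.3 rather than being
   passed in as an argument.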
   The format of an ILA address with a global unicast locator is:

   |<---------------- Locator ---------------->|
   |3 bits|    N bits     | M bits  | 61-N-M   |        64 bits        |
   +------+---------------+---------+----------+-----------------------+
   | 001  | Global prefix | Subnet  |   Host   |      Identifier       |
   +------+---------------+---------+----------+-----------------------+

   The format of an ILA address with a unique local IPv6 unicast locator
   is:

   |<------------- Locator ------------->|
   | 7 bits |1|   40 bits    |  16 bits  |        64 bits        |
   +--------+-+--------------+-----------+-----------------------+
   |  FC00  |L|  Global ID   |   Host    |      Identifier       |
   +--------+-+--------------+-----------+-----------------------+

3.3 Identifiers

   The format of an ILA identifier is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type|C|                     Identifier                        |
   +-+-+-+-+                                                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fields are:

   o  Type: Type of the identifier (see section 3.3.2).

   o  C: The C-bit. This indicates that checksum-neutral mapping has
      been applied (see section 3.3.1).

   o  Identifier: Identifier value.

3.3.1 Checksum neutral-mapping format

   If the C-bit is set, the low order sixteen bits of an identifier
   contain the adjustment for checksum-neutral mapping (see section
   4.4.1 for a description of checksum-neutral mapping).
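   The Python sketch below shows how the fields of Section 3.3 can be
   extracted and how an adjustment of this kind could be computed. It
   assumes the usual checksum-neutral technique (the same idea as NPTv6
   in RFC 6296): the low order sixteen bits are chosen so that the
   16-bit ones' complement sum of the whole address, including the
   C-bit that the translation sets, is unchanged, which leaves a
   transport checksum computed over a pseudo header containing the
   address valid. Section 4.4.1 remains the authoritative description.
   The function names are hypothetical, and neither the reverse (ILA to
   SIR) direction nor the 0x0000 versus 0xFFFF ambiguity of ones'
   complement zero is handled.

```python
import ipaddress

LOW64 = (1 << 64) - 1
TYPE_SHIFT = 61  # Type is the top 3 bits of the identifier
C_BIT = 1 << 60  # C-bit directly follows the Type field


def id_fields(identifier: int) -> tuple[int, bool, int]:
    """Split a 64-bit identifier into (Type, C-bit, remaining 60 bits)."""
    return identifier >> TYPE_SHIFT, bool(identifier & C_BIT), identifier & (C_BIT - 1)


def ones_add(a: int, b: int) -> int:
    """Add two 16-bit values in ones' complement (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(value: int) -> int:
    """Ones' complement sum of the eight 16-bit words of an IPv6 address."""
    total = 0
    for shift in range(112, -1, -16):
        total = ones_add(total, (value >> shift) & 0xFFFF)
    return total


def sir_to_ila_neutral(sir: str, locator: int) -> ipaddress.IPv6Address:
    """SIR -> ILA translation with a checksum-neutral adjustment (sketch)."""
    old = int(ipaddress.IPv6Address(sir))
    identifier = old & LOW64
    if identifier & C_BIT:
        raise ValueError("C-bit must be zero in a SIR address")
    # Write the locator, set the C-bit, and clear the low 16 bits for now.
    upper = ((locator & LOW64) << 64) | ((identifier | C_BIT) & ~0xFFFF)
    # adjustment = sum(old address) - sum(new address so far), where
    # ones' complement subtraction is addition of the 16-bit complement.
    adjust = ones_add(ones_sum(old), ~ones_sum(upper) & 0xFFFF)
    return ipaddress.IPv6Address(upper | adjust)
```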
   The format of an identifier with checksum-neutral mapping is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type|1|                     Identifier                        |
   +-+-+-+-+                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |  Checksum-neutral adjustment  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.2 Identifier types

   Identifier types allow standard encodings for common uses of
   identifiers. Defined identifier types are:

      0: interface identifier

      1: locally unique identifier

      2: virtual networking identifier for IPv4 address

      3: virtual networking identifier for IPv6 unicast address

      4: virtual networking identifier for IPv6 multicast address

      5-7: Reserved

3.3.2.1 Interface identifiers

   The interface identifier type indicates a plain local scope interface
   identifier. When this type is used, the address is a normal IPv6
   address without identifier-locator semantics. The purpose of this
   type is to allow normal IPv6 addresses to be defined within the same
   networking prefix as ILA addresses. Type bits and C-bit MUST be zero.

   The format of an ILA interface identifier address is:

   |          64 bits           |3 bits|1|          60 bits          |
   +----------------------------+------+-+---------------------------+
   |           Prefix           | 0x0  |0|            IID            |
   +----------------------------+------+-+---------------------------+

3.3.2.2 Locally unique identifiers

   Locally unique identifiers (LUI) can be created for various
   addressable objects within a network. These identifiers are in a flat
   sixty-bit space and must be unique within a SIR domain (unique within
   a site, for instance). To simplify administration, hierarchical
   allocation of locally unique identifiers may be performed.
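   The identifier type assignments of Section 3.3.2, the interface
   identifier rule of Section 3.3.2.1, and the sixty-bit space of
   locally unique identifiers lend themselves to a simple validity
   check. The Python sketch below is illustrative only: the enum and
   function names are hypothetical, and it checks nothing beyond the
   constraints stated in those sections.

```python
from enum import IntEnum

C_BIT = 1 << 60
VALUE_MASK = C_BIT - 1  # the 60 bits below the Type field and C-bit


class IdType(IntEnum):
    """Identifier types of Section 3.3.2; values 5-7 are reserved."""
    INTERFACE = 0
    LOCALLY_UNIQUE = 1
    VNET_IPV4 = 2
    VNET_IPV6_UNICAST = 3
    VNET_IPV6_MULTICAST = 4


def check_identifier(identifier: int) -> IdType:
    """Return the identifier type, or raise ValueError if it is invalid."""
    if not 0 <= identifier < 1 << 64:
        raise ValueError("identifier must be a 64-bit value")
    raw_type = identifier >> 61
    if raw_type > IdType.VNET_IPV6_MULTICAST:
        raise ValueError(f"reserved identifier type {raw_type}")
    id_type = IdType(raw_type)
    # Section 3.3.2.1: for interface identifiers the C-bit MUST be zero.
    if id_type is IdType.INTERFACE and identifier & C_BIT:
        raise ValueError("C-bit set on an interface identifier")
    return id_type


def make_lui(value: int) -> int:
    """Build a type 1 (locally unique) identifier with the C-bit clear."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("locally unique identifiers are sixty-bit values")
    return (IdType.LOCALLY_UNIQUE << 61) | value
```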
   The format of an ILA address with locally unique identifiers is:

   |          64 bits           |3 bits|1|          60 bits          |
   +----------------------------+------+-+---------------------------+
   |          Locator           | 0x1  |C|   Locally unique ident.   |
   +----------------------------+------+-+---------------------------+

   The figure below illustrates the translation from a SIR address to an
   ILA address, as would be performed when a node sends to a SIR
   address. Note that the low order 16 bits of the identifier may be
   modified as the checksum-neutral adjustment. The reverse translation
   of an ILA address to a SIR address is symmetric.

   +----------------------------+------+-+---------------------------+
   |         SIR prefix         | 0x1  |0|        Identifier         |
   +----------------------------+------+-+---------------------------+
                 |                      |              |
       SIR prefix to locator     C-bit if needed       |
                 V                      V              V
   +----------------------------+------+-+---------------------------+
   |          Locator           | 0x1  |C|        Identifier         |
   +----------------------------+------+-+---------------------------+

3.3.2.3 Virtual networking identifiers for IPv4

   This type defines a format for encoding an IPv4 virtual address and
   virtual network identifier within an identifier. The format of an ILA
   address for IPv4 virtual networking is:

   |          64 bits           |3 bits|1|  28 bits  |    32 bits     |
   +----------------------------+------+-+-----------+----------------+
   |          Locator           | 0x2  |C|   VNID    |     VADDR      |
   +----------------------------+------+-+-----------+----------------+

   VNID is a virtual network identifier and VADDR is a virtual address
   within the virtual network indicated by the VNID. The VADDR can be an
   IPv4 unicast or multicast address, and may often be in a private
   address space (e.g. [RFC1918]) used in the virtual network.

   Translating a virtual IPv4 address into an ILA or SIR address and the
   reverse translation are straightforward.
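   As an illustration of that translation, the Python sketch below packs
   a 28-bit VNID and an IPv4 address into a type 2 identifier under a
   sixty-four bit prefix (a SIR prefix or a locator) and recovers them
   again. The function names are hypothetical, and decoding assumes the
   C-bit is clear, since with checksum-neutral mapping the low order
   sixteen bits of VADDR carry the adjustment instead.

```python
import ipaddress

TYPE_VNET_IPV4 = 2  # identifier type from Section 3.3.2
C_BIT = 1 << 60
VNID_MASK = (1 << 28) - 1


def encode_v4(prefix: int, vnid: int, vaddr: str) -> ipaddress.IPv6Address:
    """Build a SIR/ILA address from a 64-bit prefix, VNID and IPv4 address."""
    if not 0 <= vnid <= VNID_MASK:
        raise ValueError("VNID must fit in 28 bits")
    identifier = ((TYPE_VNET_IPV4 << 61) | (vnid << 32)
                  | int(ipaddress.IPv4Address(vaddr)))
    return ipaddress.IPv6Address((prefix << 64) | identifier)


def decode_v4(addr: str) -> tuple[int, ipaddress.IPv4Address]:
    """Recover (VNID, IPv4 address) from a type 2 SIR/ILA address."""
    value = int(ipaddress.IPv6Address(addr))
    if ((value >> 61) & 0x7) != TYPE_VNET_IPV4:
        raise ValueError("not a virtual networking identifier for IPv4")
    if value & C_BIT:
        raise ValueError("undo the checksum-neutral mapping first")
    return (value >> 32) & VNID_MASK, ipaddress.IPv4Address(value & 0xFFFFFFFF)


addr = encode_v4(0x20010DB800000001, 0x1234, "10.1.2.3")
print(addr, decode_v4(str(addr)))
```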
   Note that the low order 16 bits of the IPv6 address may be modified
   as the checksum-neutral adjustment, and that this translation implies
   protocol translation when sending IPv4 packets over an ILA IPv6
   network.

                                                     +----------------+
                                                     |  IPv4 address  |
                                                     +----------------+
                                                             ^
                                                             |
                                                             V
   +----------------------------+------+-+-----------+----------------+
   |   Locator or SIR prefix    | 0x2  |C|   VNID    |  IPv4 address  |
   +----------------------------+------+-+-----------+----------------+

3.3.2.4 Virtual networking identifiers for IPv6 unicast

   In this format, a virtual network identifier and virtual IPv6 unicast
   address are encoded within an identifier. To facilitate encoding of
   virtual addresses, there is a unique mapping between a VNID and a
   ninety-six bit prefix of the virtual address. The format of an IPv6
   unicast encoding with VNID in an ILA address is:

   |          64 bits           |3 bits|1|  28 bits  |    32 bits     |
   +----------------------------+------+-+-----------+----------------+
   |          Locator           | 0x3  |C|   VNID    |    VADDR6L     |
   +----------------------------+------+-+-----------+----------------+

   VADDR6L contains the low order 32 bits of the IPv6 virtual address.
   The upper 96 bits of the virtual address are inferred from the VNID
   to prefix mapping. Note that for ILA translations, the low order
   sixteen bits of the VADDR6L may be modified for the checksum-neutral
   adjustment.

   The figure below illustrates encoding a tenant IPv6 virtual unicast
   address into an ILA or SIR address.
   +-------------------------------------------------+----------------+
   |                  Tenant prefix                  |    VADDR6L     |
   +-------------------------------------------------+----------------+
                            |                                |
                            +--prefix to VNID--+             |
                                               |             |
                                               v             v
   +----------------------------+------+-+-----------+----------------+
   |   Locator or SIR prefix    | 0x3  |C|   VNID    |    VADDR6L     |
   +----------------------------+------+-+-----------+----------------+

   This encoding is reversible: given an ILA address, the virtual
   address visible to the tenant can be deduced:

   +----------------------------+------+-+-----------+----------------+
   |   Locator or SIR prefix    | 0x3  |C|   VNID    |    VADDR6L     |
   +----------------------------+------+-+-----------+----------------+
                                               |             |
                            +--VNID to prefix--+             |
                            |                                |
                            v                                v
   +-------------------------------------------------+----------------+
   |                  Tenant prefix                  |    VADDR6L     |
   +-------------------------------------------------+----------------+

3.3.2.5 Virtual networking identifiers for IPv6 multicast

   In this format, a virtual network identifier and virtual IPv6
   multicast address are encoded within an identifier. The format of an
   IPv6 multicast address with VNID encoding in an ILA address is:

   |          64 bits           |3 bits|1| 28 bits |4 bits| 28 bits |
   +----------------------------+------+-+---------+------+---------+
   |          Locator           | 0x4  |C|  VNID   |Scope | MADDR6L |
   +----------------------------+------+-+---------+------+---------+

   This format encodes an IPv6 multicast address in an identifier. The
   Scope field indicates the multicast address scope as defined in
   [RFC7346]. MADDR6L is the low order 28 bits of the multicast address.
   The full multicast address is thus:

      ff0<Scope>::<MADDR6L>

   This format can therefore encode multicast addresses of the form:

      ff0X::0 to ff0X::0fff:ffff

   The figure below illustrates encoding a tenant IPv6 virtual multicast
   address in an ILA or SIR address.
   Note that the low order sixteen bits of MADDR6L may be modified to be
   the checksum-neutral adjustment.

   | 12 bits |4 bits|               84 bits               | 28 bits |
   +---------+------+-------------------------------------+---------+
   |  0xff0  |Scope |                 0's                 | MADDR6L |
   +---------+------+-------------------------------------+---------+
                 |                                             |
                 +-------------------------------------+       |
                                                       |       |
                                                       v       v
   +----------------------------+------+-+---------+------+---------+
   |   Locator or SIR prefix    | 0x4  |C|  VNID   |Scope | MADDR6L |
   +----------------------------+------+-+---------+------+---------+

   This translation is reversible:

   +----------------------------+------+-+---------+------+---------+
   |   Locator or SIR prefix    | 0x4  |C|  VNID   |Scope | MADDR6L |
   +----------------------------+------+-+---------+------+---------+
                                                       |       |
                 +-------------------------------------+       |
                 |                                             |
                 V                                             V
   +---------+------+-------------------------------------+---------+
   |  0xff0  |Scope |                 0's                 | MADDR6L |
   +---------+------+-------------------------------------+---------+

3.4 Standard identifier representation addresses

   An identifier identifies objects or nodes in a network. For instance,
   an identifier may refer to a specific host, virtual machine, or
   tenant system. When a host initiates a connection or sends a packet,
   it uses the identifier to indicate the peer endpoint of the
   communication. The endpoints of an established connection context
   are also referenced by identifiers. It is only when the packet is
   actually being sent over a network that the locator for the
   identifier needs to be resolved.

   In order to maintain compatibility with existing networking stacks
   and applications, identifiers are encoded in IPv6 addresses using a
   standard identifier representation (SIR) address.
   A SIR address is a combination of a prefix, which occupies what
   would be the locator portion of an ILA address, and the identifier
   in its usual location. The format of a SIR address is:

   |            64 bits             |3 bits|1|       60 bits        |
   +--------------------------------+------+-+----------------------+
   |           SIR prefix           | Type |0|      Identifier      |
   +----------------------------------------------------------------+

   The C-bit (checksum-neutral mapping) MUST be zero for a SIR address.
   Type may be any identifier type except zero (interface identifiers).

   A SIR prefix may be site-local or globally routable. A globally
   routable SIR prefix facilitates connectivity between hosts on the
   Internet and ILA nodes. A gateway between a site's network and the
   Internet can translate between the SIR prefix and the locator for an
   identifier. A network may have multiple SIR prefixes, where each
   prefix defines a unique identifier space.

   A locator MUST be associated with only one SIR prefix. This ensures
   that if a translation from a SIR address to an ILA address is
   performed when sending a packet, the reverse translation at the
   receiver yields the same SIR address that was seen at the
   transmitter. This also ensures that a reverse checksum-neutral
   mapping can be performed at a receiver to restore the addresses that
   were included in a pseudo header for setting a transport checksum.

   A standard identifier representation address can be used as the
   externally visible address for a node. It can be used throughout the
   network, returned in DNS AAAA records [RFC3363], used in logging,
   etc. An application can use a SIR address without knowledge that it
   encodes an identifier.

3.4.1 SIR for locally unique identifiers

   The SIR address for a locally unique identifier has the format:

   |            64 bits             |3 bits|1|       60 bits        |
   +--------------------------------+------+-+----------------------+
   |           SIR prefix           | 0x1  |0|Locally unique ident. |
   +----------------------------------------------------------------+

3.4.2 SIR for virtual addresses

   A virtual address can be encoded using the standard identifier
   representation. For example, the SIR address for an IPv6 virtual
   address may be:

   |            64 bits             |3 bits|1|  28 bits  | 32 bits  |
   +--------------------------------+------+-+-----------+----------+
   |           SIR prefix           | 0x3  |0|   VNID    | VADDR6L  |
   +----------------------------------------------------------------+

   Note that this allows three representations of the same address in
   the network: as a virtual address, a SIR address, and an ILA address.

3.4.3 SIR domains

   Each SIR prefix defines a SIR domain. A SIR domain is a unique name
   space for identifiers within a domain. The full identity of a node is
   thus determined by an identifier and a SIR domain (SIR prefix).
   Locators MUST map to only one SIR domain in order to ensure that
   translation from a locator to a SIR prefix is unambiguous.

4 Operation

   This section describes the operation of identifier-locator
   addressing.

4.1 Identifier to locator mapping

   An application initiates a communication or flow using a SIR address
   or virtual address for a destination. In order to send a packet on
   the network, the destination address is translated by an ILA router
   or an ILA host in the path. An ILA node maintains a list of mappings
   from identifier to locator to perform this translation.

   The mechanisms of propagating and maintaining identifier to locator
   mappings are outside the scope of this document.

4.2 Address translations

   With ILA, address translation is performed to convert SIR addresses
   to ILA addresses, and ILA addresses to SIR addresses.
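
   The two translations can be sketched in Python as a minimal
   illustration that ignores checksum-neutral mapping (section 4.4.1).
   The sketch assumes the identifier to locator and locator to SIR
   prefix mappings are already available as local tables; the names
   ID_TO_LOCATOR, LOCATOR_TO_SIR, and the helper functions are
   hypothetical, and the prefix and locator are taken from the
   2001:db8::/32 documentation range.

```python
import ipaddress

LOW64 = (1 << 64) - 1

# Hypothetical local tables; how they are populated is out of scope (4.1).
ID_TO_LOCATOR: dict[int, int] = {}   # 64-bit identifier -> 64-bit locator
LOCATOR_TO_SIR: dict[int, int] = {}  # locator -> its single SIR prefix


def make_sir_address(sir_prefix: int, id_type: int,
                     ident60: int) -> ipaddress.IPv6Address:
    """SIR prefix | Type (3 bits) | C-bit (0) | 60-bit identifier."""
    if not 1 <= id_type <= 7:
        raise ValueError("type zero (interface identifier) is not allowed")
    identifier = (id_type << 61) | (ident60 & ((1 << 60) - 1))  # C-bit stays 0
    return ipaddress.IPv6Address((sir_prefix << 64) | identifier)


def sir_to_ila(addr: ipaddress.IPv6Address) -> ipaddress.IPv6Address:
    """Overwrite the SIR prefix with the locator mapped to the identifier."""
    identifier = int(addr) & LOW64
    locator = ID_TO_LOCATOR[identifier]  # raises KeyError if no mapping
    return ipaddress.IPv6Address((locator << 64) | identifier)


def ila_to_sir(addr: ipaddress.IPv6Address) -> ipaddress.IPv6Address:
    """Overwrite the locator with the SIR prefix it is associated with."""
    value = int(addr)
    sir_prefix = LOCATOR_TO_SIR[value >> 64]  # raises KeyError if unknown
    return ipaddress.IPv6Address((sir_prefix << 64) | (value & LOW64))


# Example: a locally unique identifier (type 0x1) behind one locator.
sir_prefix = int(ipaddress.IPv6Address("2001:db8:1::")) >> 64
locator = int(ipaddress.IPv6Address("2001:db8:ff:1::")) >> 64
sir = make_sir_address(sir_prefix, 0x1, 0x1234)
ID_TO_LOCATOR[int(sir) & LOW64] = locator
LOCATOR_TO_SIR[locator] = sir_prefix

ila = sir_to_ila(sir)
print(sir, "->", ila)  # 2001:db8:1:0:2000::1234 -> 2001:db8:ff:1:2000::1234
assert ila_to_sir(ila) == sir
```

   Keeping a separate locator to SIR prefix table reflects the rule in
   section 3.4 that a locator is associated with only one SIR prefix,
   which is what makes the reverse translation unambiguous.
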
   Translation is usually done on a destination address as a form of
   source routing; however, translation of source virtual addresses to
   SIR addresses can also be done to support some network
   virtualization scenarios (see appendix A.7 for an example).

4.2.1 SIR to ILA address translation

   When translating a SIR address to an ILA address, the SIR prefix in
   the address is overwritten with a locator, and checksum-neutral
   mapping may be performed. Since this operation is potentially done
   for every packet, the process should be very efficient (particularly
   the lookup and checksum processing operations).

   The typical steps to transmit a packet using ILA are:

   1) The host stack creates a packet with the source address set to a
      local address (possibly a SIR address) for the local identity,
      and the destination address set to the SIR address or virtual
      address for the peer. The peer address may have been discovered
      through DNS or other means.

   2) An ILA router or host translates the packet to use the locator.
      If the original destination address is a SIR address, then the
      SIR prefix is overwritten with the locator. If the original
      packet is a virtually addressed tenant packet, then the virtual
      address is translated per section 3.3.2. The locator is
      discovered by a lookup in the identifier to locator mappings.

   3) The ILA node performs checksum-neutral mapping if configured to
      do so (section 4.4.1).

   4) The packet is forwarded on the wire. The network routes the
      packet to the host indicated by the locator.

4.2.2 ILA to SIR address translation

   When a destination node (ILA router or end host) receives an ILA
   addressed packet, the ILA address MUST be translated back to a SIR
   address (or tenant address) before upper layer processing.

   The steps of receive processing are:

   1) The packet is received. The destination locator is verified to
      match a locator assigned to the host.

   2) A lookup is performed on the destination identifier to determine
      whether it addresses a local identifier. If a match is found,
      either the locator is overwritten with the SIR prefix (for the
      locally unique identifier type) or the address is translated
      back to a tenant virtual address as shown in appendix A.7.

   3) Reverse checksum-neutral mapping is performed if the C-bit is
      set (section 4.4.1).

   4) Any optional policy checks are performed; for instance, that the
      source may send a packet to the destination address, or that the
      packet is not illegitimately crossing virtual networks.

   5) The packet is forwarded to application processing.

4.3 Virtual networking operation

   When using ILA with virtual networking identifiers, address
   translation is performed to convert tenant virtual network and
   virtual addresses to ILA addresses, and ILA addresses back to a
   virtual network and tenant's virtual addresses. Translation may occur
   on the source address, the destination address, or both (see the
   virtual networking scenarios in Appendix A). Address translation is
   performed similarly to the SIR translation cases described above.

4.3.1 Crossing virtual networks

   With explicit configuration, virtual network hosts may communicate
   directly with virtual hosts in another virtual network by using SIR
   addresses for virtualization in both the source and destination
   addresses. This might be done to allow services in one virtual
   network to be accessed from another (by prior agreement between
   tenants). See appendix A.13 for an example of ILA addressing for such
   a scenario.

4.3.2 IPv4/IPv6 protocol translation

   An IPv4 tenant may send a packet that is converted to an IPv6 packet
   with ILA addresses. Similarly, an IPv6 packet with ILA addresses may
   be converted to an IPv4 packet to be received by an IPv4-only tenant.
   These are IPv4/IPv6 stateless protocol translations as described in
   [RFC6144] and [RFC6145].
   See appendix A.12 for a description of these scenarios.

4.4 Transport layer checksums

   Packets undergoing ILA translation may encapsulate transport layer
   checksums (e.g. TCP or UDP) that include a pseudo header that is
   affected by the translation.

   ILA provides two alternatives to deal with this:

   o  Perform a checksum-neutral mapping to ensure that an
      encapsulated transport layer checksum is kept correct on the
      wire.

   o  Send the checksum as-is, that is, send the checksum value based
      on the pseudo header before translation.

   Some intermediate devices that are not the actual endpoint of a
   transport protocol may attempt to validate transport layer checksums.
   In particular, many Network Interface Cards (NICs) have offload
   capabilities to validate transport layer checksums (including any
   pseudo header) and return the result of validation to the host.
   Typically, these devices will not drop packets with bad checksums;
   they just pass a result to the host. Checksum offload is a
   performance benefit, so if packets have incorrect checksums on the
   wire this benefit is lost. Given this incentive, applying a
   checksum-neutral mapping is the recommended alternative. If it is
   known that the addresses of a packet are not included in a transport
   checksum, for instance when a GRE packet is being encapsulated, then
   a source may choose not to perform checksum-neutral mapping.

4.4.1 Checksum-neutral mapping

   When a change is made to one of the IP header fields in the IPv6
   pseudo-header checksum (such as one of the IP addresses), the
   checksum field in the transport layer header may become invalid.
   Fortunately, an incremental change in the area covered by the
   Internet standard checksum [RFC1071] will result in a well-defined
   change to the checksum value [RFC1624].
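
   The arithmetic can be checked with a short Python sketch. It applies
   the transmit and receive steps of the ILA checksum-neutral mapping
   given in this section to a single address, and verifies that the
   one's complement sum of the address words (the address's
   contribution to the pseudo-header checksum) does not change. The
   function names and the example SIR prefix and locator (from the
   2001:db8::/32 documentation range) are illustrative assumptions; a
   real implementation would operate on packet buffers and would cache
   the adjustment per mapping.

```python
import ipaddress

C_BIT = 0x1000  # C-bit as seen in the 16-bit word holding Type and C (word 4)


def csum_add(a: int, b: int) -> int:
    """One's complement add-with-carry of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def to_words(addr: ipaddress.IPv6Address) -> list[int]:
    """Eight 16-bit words, most significant first."""
    v = int(addr)
    return [(v >> (112 - 16 * i)) & 0xFFFF for i in range(8)]


def from_words(words: list[int]) -> ipaddress.IPv6Address:
    v = 0
    for w in words:
        v = (v << 16) | w
    return ipaddress.IPv6Address(v)


def addr_sum(addr: ipaddress.IPv6Address) -> int:
    """The address's contribution to the pseudo-header checksum."""
    s = 0
    for w in to_words(addr):
        s = csum_add(s, w)
    return s


def prefix_diff(frm: list[int], to: list[int]) -> int:
    """One's complement difference frm - to of two 64-bit prefixes, folded."""
    d = 0
    for f, t in zip(frm, to):
        d = csum_add(d, f)
        d = csum_add(d, ~t & 0xFFFF)
    return d


def sir_to_ila_cn(sir_addr, locator):
    w = to_words(sir_addr)
    loc = to_words(locator)[:4]
    adj = prefix_diff(w[:4], loc)         # step 1: SIR prefix - locator
    adj = csum_add(adj, ~C_BIT & 0xFFFF)  # step 2: add 0xefff for setting C
    w[7] = csum_add(w[7], adj)            # step 3: fold into low 16 bits
    w[4] |= C_BIT                         # step 4: set the C-bit
    return from_words(loc + w[4:])


def ila_to_sir_cn(ila_addr, sir_prefix):
    w = to_words(ila_addr)
    sir = to_words(sir_prefix)[:4]
    if w[4] & C_BIT:
        adj = prefix_diff(w[:4], sir)     # step 1: locator - SIR prefix
        adj = csum_add(adj, C_BIT)        # step 2: add 0x1000 for clearing C
        w[7] = csum_add(w[7], adj)        # step 3
        w[4] &= ~C_BIT & 0xFFFF           # step 4: clear the C-bit
    return from_words(sir + w[4:])


sir_addr = ipaddress.IPv6Address("2001:db8:1:0:2000:0:1234:5678")
locator = ipaddress.IPv6Address("2001:db8:ff:1::")
ila = sir_to_ila_cn(sir_addr, locator)
print(ila)                                  # 2001:db8:ff:1:3000:0:1234:4579
assert addr_sum(ila) == addr_sum(sir_addr)  # pseudo-header sum unchanged
assert ila_to_sir_cn(ila, ipaddress.IPv6Address("2001:db8:1::")) == sir_addr
```

   As with any one's complement arithmetic, in this sketch a low order
   identifier word of 0x0000 comes back as 0xffff (the other
   representation of zero) after the reverse mapping; the example
   avoids that value.
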
   So, a checksum change caused by modifying part of the area covered
   by the checksum can be corrected by making a complementary change to
   a different 16-bit field covered by the same checksum.

   ILA can perform a checksum-neutral mapping when a SIR prefix or
   virtual address is translated to a locator in an IPv6 address, and
   performs the reverse mapping when translating a locator back to a SIR
   prefix or virtual address. The low order sixteen bits of the
   identifier contain the checksum adjustment value for ILA.

   On transmission, the translation process is:

   1) Compute the one's complement difference between the SIR prefix
      and the locator. Fold this value to 16 bits (add-with-carry the
      four 16-bit words of the difference).

   2) Add-with-carry the bit-wise not of 0x1000 (i.e. 0xefff) to the
      value from #1. This compensates the checksum for setting the
      C-bit.

   3) Add-with-carry the value from #2 to the low order sixteen bits
      of the identifier.

   4) Set the resultant value from #3 in the low order sixteen bits
      of the identifier and set the C-bit.

   Note that the "adjustment" (the 16-bit value added to the identifier
   in step #3) is fixed for a given SIR to locator mapping, so the
   adjustment value can be saved in an associated data structure for a
   mapping to avoid computing it for each translation.

   On reception of an ILA addressed packet, if the C-bit is set in an
   ILA address:

   1) Compute the one's complement difference between the locator in
      the address and the SIR prefix that the locator is being
      translated to. Fold this value to 16 bits (add-with-carry the
      four 16-bit words of the difference).

   2) Add-with-carry 0x1000 to the value from #1. This compensates
      the checksum for clearing the C-bit.

   3) Add-with-carry the value from #2 to the low order sixteen bits
      of the identifier.

   4) Set the resultant value from #3 in the low order sixteen bits
      of the identifier and clear the C-bit. This restores the
      original identifier sent in the packet.

4.4.2 Sending an unmodified checksum

   When sending an unmodified checksum, the checksum is incorrect as
   viewed in the packet on the wire. At the receiver, ILA translation of
   the destination ILA address back to the SIR address occurs before
   transport layer processing. This ensures that the checksum can be
   verified when processing the transport layer header containing the
   checksum. Intermediate devices are not expected to drop packets due
   to a bad transport layer checksum.

4.5 Address selection

   There may be multiple possibilities for creating either a source or
   destination address. A node may be associated with more than one
   identifier, and there may be multiple locators for a particular
   identifier. The choice of locator or identifier is implementation or
   configuration specific. The selection of an identifier occurs at flow
   creation and must be invariant for the duration of the flow. Locator
   selection must be done at least once per flow, and the locator
   associated with the destination of a flow may change during the
   lifetime of the flow (for instance, in the case of a migrating
   connection it will change). ILA address selection should follow the
   specifications in Default Address Selection for Internet Protocol
   Version 6 (IPv6) [RFC6724].

4.6 Duplicate identifier detection

   As part of implementing the identifier to locator mapping, duplicate
   identifier detection should be implemented in a centralized control
   plane. A registry of identifiers could be maintained (possibly in
   association with the identifier to locator mapping database). When a
   node creates an identifier it registers the identifier, and when the
   identifier is no longer in use (e.g. a task completes) the identifier
   is unregistered.
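
   Such a registry can be as simple as a map from identifier to the node
   that registered it. The Python sketch below is a hypothetical,
   in-memory, single-process illustration; IdentifierRegistry and its
   methods are not defined by ILA, and a real control plane would also
   need persistence and replication.

```python
import threading


class IdentifierRegistry:
    """Hypothetical in-memory identifier registry for duplicate detection."""

    def __init__(self) -> None:
        self._owners: dict[int, str] = {}  # identifier -> registering node
        self._lock = threading.Lock()

    def register(self, identifier: int, node: str) -> bool:
        """Record a new identifier; return False (deny) if it already exists."""
        with self._lock:
            if identifier in self._owners:
                return False
            self._owners[identifier] = node
            return True

    def unregister(self, identifier: int, node: str) -> None:
        """Release an identifier, e.g. when a task completes."""
        with self._lock:
            if self._owners.get(identifier) == node:
                del self._owners[identifier]


registry = IdentifierRegistry()
ident = 0x2000_0000_0000_1234
assert registry.register(ident, "host-L1")
assert not registry.register(ident, "host-L3")  # duplicate is denied
registry.unregister(ident, "host-L1")
assert registry.register(ident, "host-L3")      # free again after release
```

   In this sketch only the registering node can release an identifier,
   so a stale or misbehaving node cannot free another node's identifier.
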
   The control plane should be able to detect a registration attempt
   for an existing identifier and deny the request.

4.7 ICMP error handling

   A packet that contains an ILA address may cause ICMP errors within
   the network. In this case the ICMP data contains an IP header with an
   ILA address. ICMP messages are sent back to the source address in the
   packet. Upon receiving an ICMP error, the host will process it
   differently depending on whether it is ILA capable.

4.7.1 Handling ICMP errors by ILA capable hosts

   If a host is ILA capable, it can attempt to reverse translate the ILA
   address in the destination of a header in the ICMP data back to the
   SIR address that was originally used to transmit the packet. The
   steps are:

   1) Assume that the upper sixty-four bits of the destination
      address in the ICMP data are a locator. Try to match these bits
      back to a SIR prefix. If the host is only in one SIR domain,
      then the mapping to a SIR prefix is implicit. If the host is in
      multiple domains, then a locator to SIR prefix table can be
      maintained for this lookup.

   2) If the identifier is marked with checksum-neutral mapping, undo
      the checksum-neutral mapping using the SIR prefix found in #1.
      The resulting identifier is potentially the one originally used
      to send the packet.

   3) Look up the identifier in the identifier to locator mapping
      table. If an entry is found, compare the locator in the entry to
      the locator (upper sixty-four bits) of the destination address
      in the IP header of the ICMP data. If these match, then proceed
      to the next step.

   4) Overwrite the upper sixty-four bits of the destination address
      in the ICMP data with the found SIR prefix and overwrite the
      low order sixty-four bits with the found identifier (the result
      of undoing checksum-neutral mapping). The resulting address
      should be the original SIR address used in sending.
      The ICMP error packet can then be received by the stack for
      further processing.

4.7.2 Handling ICMP errors by non-ILA capable hosts

   A non-ILA capable host may receive an ICMP error generated by the
   network that contains an ILA address in an IP header contained in the
   ICMP data. This would happen if an ILA router performed translation
   on a packet the host sent and that packet subsequently generated an
   ICMP error. In this case, the host receiving the error message will
   attempt to find the connection state corresponding to the packet in
   the headers of the ICMP data. Since the host is unaware of ILA, the
   lookup for connection state should fail. Because the host cannot
   recover the original addresses it used to send the packet, it won't
   be able to derive any useful information about the original
   destination of the packet that it sent.

   If packets for a flow are always routed through an ILA router in both
   directions, for example if ILA routers are coincident with edge
   routers in a network, then ICMP errors could be intercepted by an
   intermediate node which could translate the destination addresses in
   the ICMP data back to the original SIR addresses. A receiving host
   would then see the destination address in the packet of the ICMP
   data to be the one it used to transmit the original packet.

4.8 Multicast

   ILA is generally not intended for use with multicast. In the case of
   multicast, routing of packets is based on the source address. Neither
   a SIR address nor an ILA address is suitable for use as a source
   address in a multicast packet. A SIR address is unroutable and hence
   would make a multicast packet unroutable if used as a source address.
   Using an ILA address as the source address makes the multicast packet
   routable, but this exposes the ILA address to applications, which is
   especially problematic on a multicast receiver that doesn't support
   ILA.

   If all multicast receivers are known to support ILA, a local locator
   address may be used in the source address of the multicast packet. In
   this case, each receiver will translate the source address from an
   ILA address to a SIR address before delivering packets to an
   application.

5 Motivation for ILA

5.1 Use cases

5.1.1 Multi-tenant virtualization

   In multi-tenant virtualization, overlay networks are established for
   tenants to provide virtual networks. Each tenant may have one or more
   virtual networks, and a tenant's nodes are assigned virtual addresses
   within virtual networks. Identifier-locator addressing may be used as
   an alternative to traditional network virtualization encapsulation
   protocols used to create overlay networks (e.g. VXLAN [RFC7348]).
   Section 5.2.4 describes the advantages of using ILA in lieu of
   encapsulation protocols.

   Tenant systems (e.g. VMs) run on physical hosts and may migrate to
   different hosts. A tenant system is identified by a virtual address
   and the virtual networking identifier of a corresponding virtual
   network. ILA can encode the virtual address and a virtual networking
   identifier in an ILA identifier. Each identifier is mapped to a
   locator that indicates the current host where the tenant system
   resides. Nodes that send to the tenant system set the locator per the
   mapping. When a tenant system migrates, its identifier to locator
   mapping is updated and communicating nodes will use the new mapping.

5.1.2 Datacenter virtualization

   Datacenter virtualization virtualizes networking resources.
   Various objects within a datacenter can be assigned addresses and
   serve as logical endpoints of communication. A large address space,
   for example that of IPv6, allows addressing to be used beyond the
   traditional concepts of host based addressing. Addressed objects can
   include tasks, virtual IP addresses (VIPs), pieces of content, disk
   blocks, etc. Each object has a location, which is given by the host
   on which the object resides. Some objects may be migratable between
   hosts such that their location changes over time.

   Objects are identified by a unique identifier within a namespace for
   the datacenter (appendix B discusses methods to create unique
   identifiers for ILA). Each identifier is mapped to a locator that
   indicates the current host where the object resides. Nodes that send
   to an object set the locator per the mapping. When an object
   migrates, its identifier to locator mapping is updated and
   communicating nodes will use the new mapping.

   A datacenter object of particular interest is the task, a unit of
   execution for applications. The goal of virtualizing tasks is to
   maximize resource efficiency and job scheduling. Tasks share many
   properties of tenant systems; however, they are finer grained
   objects, may have shorter lifetimes, and are likely created in
   greater numbers. Appendix C provides more detail and motivation for
   virtualizing tasks using ILA.

5.1.3 Device mobility

   ILA may be applied as a solution for mobile devices. These are
   devices, smart phones for instance, that physically move between
   different networks. The goal of mobility is to provide a seamless
   transition when a device moves from one network to another.

   Each mobile device is identified by a unique identifier within some
   provider domain. ILA encodes the identifier for the device in an ILA
   identifier.
   Each identifier is mapped to a locator that indicates the current
   network or point of attachment for the device. Nodes that send to
   the device set the locator per the mapping. When a mobile device
   moves between networks, its identifier to locator mapping is updated
   and communicating nodes will use the new mapping.

5.2 Alternative methods

   This section discusses the merits of alternative solutions that have
   been proposed to provide network virtualization or mobility in IPv6.

5.2.1 ILNP

   ILNP splits an address into a locator and identifier in the same
   manner as ILA. ILNP has characteristics, not present in ILA, that
   prevent it from being a practical solution:

   o  ILNP requires that transport layer protocol implementations be
      modified to work over ILNP.

   o  ILNP can only be implemented in end hosts, not within the
      network. This essentially requires that all end hosts be
      modified to participate in mobility.

   o  ILNP employs IPv6 extension headers, which are mostly considered
      non-deployable. ILA does not use these.

   o  Core support for ILA is in upstream Linux; to date there is no
      publicly available source code for ILNP.

   o  ILNP involves DNS to distribute mapping information; ILA assumes
      mapping information is not part of naming.

5.2.2 Flow label as virtual network identifier

   The IPv6 flow label could conceptually be used as a 20-bit virtual
   network identifier in order to indicate that a packet is sent on an
   overlay network. In this model the addresses may be virtual addresses
   within the specified virtual network. Presumably, the tuple of flow
   label and addresses could be used by switches to forward virtually
   addressed packets.

   This approach has some issues:

   o  Forwarding virtual packets to their physical location would
      require specialized switch support.

   o  The flow label is only twenty bits; this is too small to be a
      discriminator in forwarding a virtual packet to a specific
      destination. Conceptually, the flow label might be used in a
      type of label switching to solve that.

   o  The flow label is not considered immutable in transit;
      intermediate devices may change it.

   o  The flow label is not part of the pseudo header for transport
      checksum calculation, so it is not covered by any transport (or
      other) checksums.

5.2.3 Extension headers

   To accomplish network virtualization, an extension header, as a
   destination or routing option, could be used that contains the
   virtual destination address of a packet. The destination address in
   the IPv6 header would be the topological address for the location of
   the virtual node. Conceivably, segment routing could be used to
   implement network virtualization in this manner.

   This technique has some issues:

   o  Intermediate devices must not insert extension headers
      [RFC2460bis].

   o  Extension headers introduce additional packet overhead, which may
      impact performance.

   o  Extension headers are not covered by transport checksums (as the
      addresses would be) nor by any other checksum.

   o  Extension headers are not widely supported in network hardware
      or devices. For instance, several NIC offloads don't work in the
      presence of extension headers.

5.2.4 Encapsulation techniques

   Various encapsulation techniques have been proposed for implementing
   network virtualization and mobility. LISP is an example of an
   encapsulation that is based on locator identifier separation similar
   to ILA. The primary drawbacks of encapsulation are complexity and per
   packet overhead. For instance, when LISP is used with IPv6 the
   encapsulation overhead is fifty-six bytes and two IP headers are
   present in every packet.
   This adds considerable processing costs, requires care to handle
   path MTU correctly, and may cause certain network accelerations to
   be lost.

6 IANA Considerations

   There are no IANA considerations in this specification.

7 References

7.1 Normative References

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2460bis]
              Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", draft-ietf-6man-rfc2460bis-03
              (work in progress), January 2016.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
              Translation", RFC 6296, June 2011.

   [RFC1071]  Braden, R., Borman, D., Partridge, C., and W. Plummer,
              "Computing the Internet checksum", RFC 1071, September
              1988.

   [RFC1624]  Rijsinghani, A., "Computation of the Internet Checksum
              via Incremental Update", RFC 1624, May 1994.

   [RFC6724]  Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown,
              "Default Address Selection for Internet Protocol Version
              6 (IPv6)", RFC 6724, September 2012.

7.2 Informative References

   [RFC6740]  Atkinson, RJ. and SN. Bhatti, "Identifier-Locator Network
              Protocol (ILNP) Architectural Description", RFC 6740,
              November 2012.

   [RFC6741]  Atkinson, RJ. and SN. Bhatti, "Identifier-Locator Network
              Protocol (ILNP) Engineering Considerations", RFC 6741,
              November 2012.

   [RFC1918]  Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC3363]  Bush, R., Durand, A., Fink, B., Gudmundsson, O., and T.
              Hain, "Representing Internet Protocol version 6 (IPv6)
              Addresses in the Domain Name System (DNS)", RFC 3363,
              August 2002.

   [RFC3587]  Hinden, R., Deering, S., and E. Nordmark, "IPv6 Global
              Unicast Address Format", RFC 3587, August 2003.

   [RFC4193]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
              Addresses", RFC 4193, October 2005.

   [RFC6144]  Baker, F., Li, X., Bao, C., and K. Yin, "Framework for
              IPv4/IPv6 Translation", RFC 6144, April 2011.

   [RFC6145]  Li, X., Bao, C., and F. Baker, "IP/ICMP Translation
              Algorithm", RFC 6145, April 2011.

   [RFC7346]  Droms, R., "IPv6 Multicast Address Scopes", RFC 7346,
              August 2014.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [NVO3ARCH] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
              Narten, "An Architecture for Overlay Networks (NVO3)",
              draft-ietf-nvo3-arch-03 (work in progress).

   [GUE]      Herbert, T. and L. Yong, "Generic UDP Encapsulation",
              draft-herbert-gue-02 (work in progress).

   [GUESEC]   Yong, L. and T. Herbert, "Generic UDP Encapsulation (GUE)
              for Secure Transport", draft-hy-gue-4-secure-transport-00
              (work in progress).

8 Acknowledgments

   The author would like to thank Mark Smith, Lucy Yong, Erik Kline,
   Saleem Bhatti, Petr Lapukhov, Blake Matheny, Doug Porter, Pierre
   Pfister, and Fred Baker for their insightful comments on this draft;
   and Roy Bryant, Lorenzo Colitti, Mahesh Bandewar, and Erik Kline for
   their work on defining and applying ILA.

Appendix A: Communication scenarios

   This section describes the use of identifier-locator addressing in
   several scenarios.

A.1 Terminology for scenario descriptions

   A formal notation for identifier-locator addressing with ILNP is
   described in [RFC6740]. We extend this notation to include network
   virtualization cases.

   Basic terms are:

      A     = IP address
      I     = Identifier
      L     = Locator
      LUI   = Locally unique identifier
      VNI   = Virtual network identifier
      VA    = An IPv4 or IPv6 virtual address
      VAX   = An IPv6 networking identifier (IPv6 VA mapped to VAX)
      SIR   = Prefix for standard identifier representation
      VNET  = IPv6 prefix for a tenant (assumed to be globally routable)
      Iaddr = IPv6 address of an Internet host

   An ILA IPv6 address is denoted by

      L:I

   A SIR address with a locally unique identifier and SIR prefix is
   denoted by

      SIR:LUI

   A virtual identifier with a virtual network identifier and a virtual
   IPv4 address is denoted by

      VNI:VA

   An ILA IPv6 address with a virtual networking identifier for IPv4
   would then be denoted

      L:(VNI:VA)

   The local and remote address pair in a packet or endpoint is denoted

      A,A

   An address translation sequence from SIR addresses to ILA addresses
   for transmission on the network and back to SIR addresses at a
   receiver has the notation:

      A,A -> L:I,A -> A,A

A.2 Identifier objects

   Identifier-locator addressing is broad enough in scope to address
   many different types of networking entities. For the purposes of this
   section we classify these as "objects" and "tenant systems".

   Objects encompass uses where nodes are addressed by locally unique
   identifiers (LUI). In the scenarios below, objects are denoted by OBJ.

   Tenant systems are those associated with network virtualization that
   have virtual addresses (that is, they are addressed by VNI:VA). In the
   scenarios below, tenant systems are denoted by TS.

A.3 Reference network for scenarios

   The figure below provides an example network topology with ILA
   addressing in use. In this example, there are four hosts in the
   network with locators L1, L2, L3, and L4.
   There are three objects with
   identifiers O1, O2, and O3, as well as a common networking service
   with identifier S1. There are two virtual networks, VNI1 and VNI2,
   and four tenant systems addressed as: VA1 and VA2 in VNI1, VA3 and
   VA4 in VNI2. The network is connected to the Internet via a gateway.

                               .............
                               .           .
   +-----------------+         . Internet  .         +-----------------+
   | Host L1         |         .           .         | Host L2         |
   | +-------------+ |         .............         | +-------------+ |
   | | TS VNI1:VA1 | |               |               | | TS VNI1:VA2 | |
   | +-------------+ +---+     +-----+-----+     +---+ +-------------+ |
   | +-------------+ |   |     |  Gateway  |     |   | +-------------+ |
   | |   OBJ O1    | |   |     +-----+-----+     |   | | TS VNI2:VA3 | |
   | +-------------+ |   |           |           |   | +-------------+ |
   +-----------------+   |     .............     |   +-----------------+
                         +-----.           .-----+
   +-----------------+         .  Underlay .         +-----------------+
   | Host L3         |   +-----.  Network  .---+     | Host L4         |
   | +-------------+ |   |     .............   |     | +-------------+ |
   | |   OBJ O2    | |   |                     |     | | TS VNI2:VA4 | |
   | +-------------+ +---+                     +-----| +-------------+ |
   | +-------------+ |                               | +-------------+ |
   | |   OBJ O3    | |                               | |  Serv. S1   | |
   | +-------------+ |                               | +-------------+ |
   +-----------------+                               +-----------------+

   Several communication scenarios can be considered:

   1)  Object to object
   2)  Object to Internet
   3)  Internet to object
   4)  Tenant system to local service
   5)  Object to tenant system
   6)  Tenant system to Internet
   7)  Internet to tenant system
   8)  IPv4 tenant system to service
   9)  Tenant system to tenant system in the same virtual network using
       IPv6
   10) Tenant system to tenant system in the same virtual network using
       IPv4
   11) Tenant system to tenant system in a different virtual network
       using IPv6
   12) Tenant system to tenant system in a different virtual network
       using IPv4
   13) IPv4 tenant system to IPv6 tenant system in different virtual
       networks

A.4 Scenario 1: Object to object

   The transport endpoints for object to object communication are the
   SIR addresses for the objects. When a packet is sent on the wire, the
   locator is set in the destination address of the packet. On
   reception, the destination address is converted back to SIR
   representation for processing at the transport layer.

   If object O1 is communicating with object O2, the ILA translation
   sequence would be:

      SIR:O1,SIR:O2 ->         // Transport endpoints on O1
      SIR:O1,L3:O2 ->          // ILA used on the wire
      SIR:O1,SIR:O2            // Received at O2

A.5 Scenario 2: Object to Internet

   Communication from an object to the Internet is accomplished through
   use of a SIR address (globally routable) in the source address of
   packets. No ILA translation is needed in this path.

   If object O1 is sending to an address Iaddr on the Internet, the
   packet addresses would be:

      SIR:O1,Iaddr

A.6 Scenario 3: Internet to object

   An Internet host transmits a packet to an object using an externally
   routable SIR address. The SIR prefix routes the packet to a gateway
   for the datacenter.
   The gateway translates the destination to an ILA address.

   If a host on the Internet with address Iaddr sends a packet to object
   O3, the ILA translation sequence would be:

      Iaddr,SIR:O3 ->     // Transport endpoint at Iaddr
      Iaddr,L3:O3 ->      // On the wire in datacenter
      Iaddr,SIR:O3        // Received at O3

A.7 Scenario 4: Tenant system to service

   A tenant can communicate with a datacenter service using the SIR
   address of the service.

   If TS VA1 is communicating with service S1, the ILA translation
   sequence would be:

      VNET:VA1,Saddr ->           // Transport endpoints in TS
      SIR:(VNET:VA1),Saddr ->     // On the wire
      SIR:(VNET:VA1),Saddr        // Received at S1

   Where VNET is the address prefix for the tenant and Saddr is the IPv6
   address of the service.

   The ILA translation sequence in the reverse path, service to tenant
   system, would be:

      Saddr,SIR:(VNET:VA1) ->     // Transport endpoints in S1
      Saddr,L1:(VNET:VA1) ->      // On the wire
      Saddr,VNET:VA1              // Received at the TS

   Note that from the point of view of the service, there is no
   material difference between a peer that is a tenant system and one
   that is an object.

A.8 Scenario 5: Object to tenant system

   An object can communicate with a tenant system through its
   externally visible address.

   If object O2 is communicating with TS VA4, the ILA translation
   sequence would be:

      SIR:O2,VNET:VA4 ->          // Transport endpoints at O2
      SIR:O2,L4:(VNI2:VAX4) ->    // On the wire
      SIR:O2,VNET:VA4             // Received at TS

A.9 Scenario 6: Tenant system to Internet

   Communication from a TS to the Internet assumes that the VNET for the
   TS is globally routable, hence no ILA translation is needed.
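
   The scenarios so far differ in whether an ILA node rewrites the
   destination: packets addressed to a SIR address are translated to an
   ILA address on the wire and restored on receipt (Scenarios 1 and 3),
   while packets to Internet addresses pass through unchanged (Scenarios
   2 and 6). A minimal Python sketch of that per-packet decision,
   assuming a single hypothetical SIR prefix 2001:db8:1::/64, a
   hypothetical identifier-to-locator table, and an address split into a
   64-bit prefix and a 64-bit identifier. The names to_wire, from_wire
   and LOCATOR_FOR_ID are illustrative only:

```python
import ipaddress

# Hypothetical values for illustration; none of them come from this document.
SIR_PREFIX = ipaddress.IPv6Network("2001:db8:1::/64")
LOCATOR_FOR_ID = {0x3: 0x2001_0DB8_000A_0003}  # identifier of O3 -> locator L3 (upper 64 bits)
LOW64 = (1 << 64) - 1


def to_wire(dst: str) -> str:
    """Transmit side: rewrite a SIR destination into its ILA form."""
    addr = ipaddress.IPv6Address(dst)
    if addr not in SIR_PREFIX:
        return dst  # e.g. an Internet peer: no ILA translation
    ident = int(addr) & LOW64
    locator = LOCATOR_FOR_ID.get(ident)
    if locator is None:
        return dst  # no mapping known: sent unchanged in this sketch
    return str(ipaddress.IPv6Address((locator << 64) | ident))


def from_wire(dst: str) -> str:
    """Receive side: put the SIR prefix back in place of the locator."""
    ident = int(ipaddress.IPv6Address(dst)) & LOW64
    return str(ipaddress.IPv6Address(int(SIR_PREFIX.network_address) | ident))
```

   For example, to_wire("2001:db8:1::3") yields "2001:db8:a:3::3", and
   from_wire() applied to that address yields "2001:db8:1::3" again,
   mirroring the SIR:O3 -> L3:O3 -> SIR:O3 sequence of Scenario 3.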

   If TS VA4 sends a packet to the Internet, the addresses would be:

      VNET:VA4,Iaddr

A.10 Scenario 7: Internet to tenant system

   An Internet host transmits a packet to a tenant system using an
   externally routable tenant prefix and address. The prefix routes the
   packet to a gateway for the datacenter. The gateway translates the
   destination to an ILA address.

   If a host on the Internet with address Iaddr is sending to TS VA4,
   the ILA translation sequence would be:

      Iaddr,VNET:VA4 ->           // Endpoint at Iaddr
      Iaddr,L4:(VNI2:VAX4) ->     // On the wire in datacenter
      Iaddr,VNET:VA4              // Received at TS

A.11 Scenario 8: IPv4 tenant system to object

   A TS that is IPv4-only may communicate with an object using protocol
   translation. The object would be represented as an IPv4 address in
   the tenant's address space, and stateless NAT64 should be usable as
   described in [RFC6145].

   If TS VA2 communicates with object O3, the ILA translation sequence
   would be:

      VA2,ADDR3 ->                // IPv4 endpoints at TS
      SIR:(VNI1:VA2),L3:O3 ->     // On the wire in datacenter
      SIR:(VNI1:VA2),SIR:O3       // Received at O3

   VA2 is the IPv4 address of the TS in the tenant's virtual network,
   and ADDR3 is an address in the tenant's address space that maps to
   object O3.

   The reverse path, an object sending to a TS with an IPv4 address,
   requires a similar protocol translation.

   For object O3 to communicate with TS VA2, the ILA translation
   sequence would be:

      SIR:O3,SIR:(VNI1:VA2) ->    // Endpoints at O3
      SIR:O3,L2:(VNI1:VA2) ->     // On the wire in datacenter
      ADDR3,VA2                   // IPv4 endpoints at TS

A.12 Tenant system to tenant system in the same virtual network

   ILA may be used to allow tenants within a virtual network to
   communicate without the need for explicit encapsulation headers.
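
   The notation L2:(VNI1:VA2) places a virtual network and a virtual
   address behind a locator. As a hedged illustration only, the sketch
   below assumes a hypothetical 64-bit identifier layout of a 3-bit
   type, the C bit, a 28-bit VNI and a 32-bit IPv4 virtual address, and
   a /64 locator prefix. The type value, the layout, the function names
   and all addresses are assumptions, not definitions from this
   appendix:

```python
import ipaddress

# Hypothetical identifier layout (an assumption for illustration):
# | type (3 bits) | C (1 bit) | VNI (28 bits) | IPv4 virtual address (32 bits) |
VN_TYPE = 2  # hypothetical type value


def vn_identifier(vni: int, va: str, c_bit: int = 0) -> int:
    """Pack a VNI and an IPv4 virtual address into a 64-bit identifier."""
    if not 0 <= vni < (1 << 28):
        raise ValueError("VNI must fit in 28 bits")
    v4 = int(ipaddress.IPv4Address(va))
    return (VN_TYPE << 61) | ((c_bit & 1) << 60) | (vni << 32) | v4


def unpack_vn_identifier(ident: int):
    """Return (VNI, IPv4 virtual address) from a packed identifier."""
    vni = (ident >> 32) & ((1 << 28) - 1)
    return vni, str(ipaddress.IPv4Address(ident & 0xFFFFFFFF))


def ila_address(locator: str, ident: int) -> str:
    """Combine a /64 locator prefix with a 64-bit identifier."""
    net = ipaddress.IPv6Network(locator)
    if net.prefixlen != 64:
        raise ValueError("locator must be a /64 prefix")
    return str(ipaddress.IPv6Address(int(net.network_address) | ident))
```

   Under these assumptions, ila_address("2001:db8:a:2::/64",
   vn_identifier(1, "10.0.0.2")) gives "2001:db8:a:2:4000:1:a00:2", one
   possible on-the-wire form of L2:(VNI1:VA2), and unpack_vn_identifier
   recovers (1, "10.0.0.2") from the low 64 bits at the receiver.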

A.12.1 Scenario 9: TS to TS in the same VN using IPv6

   If TS VA1 sends a packet to TS VA2, the ILA translation sequence
   would be:

      VNET:VA1,VNET:VA2 ->        // Endpoints at VA1
      VNET:VA1,L2:(VNI1:VAX2) ->  // On the wire
      VNET:VA1,VNET:VA2           // Received at VA2

A.12.2 Scenario 10: TS to TS in the same VN using IPv4

   For two tenant systems to communicate using IPv4 and ILA, IPv4/IPv6
   protocol translation is done on both transmit and receive.

   If TS VA1 sends an IPv4 packet to TS VA2, the ILA translation
   sequence would be:

      VA1,VA2 ->                        // Endpoints at VA1
      SIR:(VNI1:VA1),L2:(VNI1:VA2) ->   // On the wire
      VA1,VA2                           // Received at VA2

   Note that the SIR prefix is chosen by an ILA node as an appropriate
   prefix in the underlay network. Tenant systems do not use SIR
   addresses for this communication; they only use virtual addresses.

A.13 Tenant system to tenant system in different virtual networks

   A tenant system may be allowed to communicate with another tenant
   system in a different virtual network. This should only be allowed
   with explicit policy configuration.

A.13.1 Scenario 11: TS to TS in different VNs using IPv6

   For TS VA4 to communicate with TS VA1 using IPv6, the translation
   sequence would be:

      VNET2:VA4,VNET1:VA1 ->          // Endpoint at VA4
      VNET2:VA4,L1:(VNI1:VAX1) ->     // On the wire
      VNET2:VA4,VNET1:VA1             // Received at VA1

   Note that this assumes that VNET1 and VNET2 are globally routable
   between the two virtual networks.

A.13.2 Scenario 12: TS to TS in different VNs using IPv4

   To allow IPv4 tenant systems in different virtual networks to
   communicate with each other, an address representing the peer would
   be mapped into each tenant's address space. IPv4/IPv6 protocol
   translation is done on transmit and receive.
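
   The per-tenant mappings described above can be pictured as a pair of
   lookup tables, one per direction. A minimal sketch, assuming
   hypothetical addresses (SADDR1 = 192.0.2.1 in VNI2, SADDR4 =
   198.51.100.4 in VNI1, VA1 = 10.1.0.1 and VA4 = 10.2.0.4) and treating
   a missing mapping as "not permitted by policy", in line with A.13.
   The table and function names are illustrative only:

```python
# Hypothetical per-tenant IPv4 mappings for Scenario 12 (addresses are examples).
# (local VNI, mapped IPv4 address) -> (peer VNI, peer virtual address)
PEER_BY_MAPPED = {
    (2, "192.0.2.1"): (1, "10.1.0.1"),     # SADDR1: VA1 as seen from VNI2
    (1, "198.51.100.4"): (2, "10.2.0.4"),  # SADDR4: VA4 as seen from VNI1
}
# Inverse view: (local VNI, peer endpoint) -> mapped IPv4 address
MAPPED_BY_PEER = {(vni, peer): mapped for (vni, mapped), peer in PEER_BY_MAPPED.items()}


def on_transmit(src_vni: int, src_va: str, dst: str):
    """IPv4 endpoints at the sender -> (source, destination) virtual endpoints."""
    peer = PEER_BY_MAPPED.get((src_vni, dst))
    if peer is None:
        raise LookupError("no mapping configured: not permitted by policy")
    return (src_vni, src_va), peer


def on_receive(src, dst):
    """Virtual endpoints from the wire -> IPv4 endpoints seen by the receiver."""
    dst_vni, dst_va = dst
    return MAPPED_BY_PEER[(dst_vni, src)], dst_va
```

   Here on_transmit(2, "10.2.0.4", "192.0.2.1") corresponds to VA4
   sending to SADDR1, and passing its result to on_receive() yields
   ("198.51.100.4", "10.1.0.1"), that is, SADDR4 as the source seen by
   VA1. Keeping both directions as explicit tables makes the policy
   visible: a peer with no entry simply cannot be addressed.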

   For TS VA4 to communicate with TS VA1 using IPv4, the translation
   sequence may be:

      VA4,SADDR1 ->                     // IPv4 endpoint at VA4
      SIR:(VNI2:VA4),L1:(VNI1:VA1) ->   // On the wire
      SADDR4,VA1                        // Received at VA1

   SADDR1 is the mapped address for VA1 in VA4's address space, and
   SADDR4 is the mapped address for VA4 in VA1's address space.

A.13.3 Scenario 13: IPv4 TS to IPv6 TS in different VNs

   Communication may also be mixed so that an IPv4 tenant system can
   communicate with an IPv6 tenant system in another virtual network.
   IPv4/IPv6 protocol translation is done on transmit.

   For TS VA4 using IPv4 to communicate with TS VA1 using IPv6, the
   translation sequence may be:

      VA4,SADDR1 ->                      // IPv4 endpoint at VA4
      SIR:(VNI2:VA4),L1:(VNI1:VAX1) ->   // On the wire
      SIR:(VNI2:VA4),VNET1:VA1           // Received at VA1

   SADDR1 is the mapped IPv4 address for VA1 in VA4's address space.

   In the reverse direction, TS VA1 using IPv6 would communicate with TS
   VA4 with the translation sequence:

      VNET1:VA1,SIR:(VNI2:VA4) ->        // Endpoint at VA1
      VNET1:VA1,L4:(VNI2:VA4) ->         // On the wire
      SADDR1,VA4                         // Received at VA4

Appendix B: Unique identifier generation

   The unique identifier type of ILA identifiers can address 2**60
   objects. This appendix describes some methods to allocate
   identifiers to objects so that duplicate identifiers are not
   allocated.

B.1 Globally unique identifiers method

   For small to moderate sized deployments, the technique for creating
   locally assigned global identifiers described in [RFC4193] could be
   used. In this technique, a SHA-1 digest is computed over the time of
   day in NTP format concatenated with an EUI-64 identifier of the local
   host. Sixty bits of the result (the identifier length) are used as
   the globally unique identifier.
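
   A minimal Python sketch of that procedure, adapted to sixty bits. It
   assumes the low-order 60 bits of the digest are kept (RFC 4193
   itself keeps the low-order 40 bits for its Global ID), that the
   caller supplies the host's EUI-64 as eight bytes, and it leaves out
   the type and C bits that would precede the result in an ILA
   identifier. The function names are illustrative only:

```python
import hashlib
import time

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01
ID_BITS = 60


def ntp_time(now: float) -> bytes:
    """64-bit NTP timestamp: 32-bit seconds since 1900, 32-bit fraction."""
    seconds = (int(now) + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    fraction = int((now % 1.0) * (1 << 32)) & 0xFFFFFFFF
    return ((seconds << 32) | fraction).to_bytes(8, "big")


def generate_identifier(eui64: bytes, now=None) -> int:
    """SHA-1 over NTP time || EUI-64, keeping the low-order 60 bits."""
    if len(eui64) != 8:
        raise ValueError("EUI-64 must be 8 bytes")
    if now is None:
        now = time.time()
    digest = hashlib.sha1(ntp_time(now) + eui64).digest()
    return int.from_bytes(digest, "big") & ((1 << ID_BITS) - 1)
```

   Passing an explicit now value makes the result reproducible for
   testing; in normal use the current time is taken, so two calls on the
   same host produce unrelated identifiers.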

   The probability that two or more of these IDs will collide can be
   approximated using the formula:

      P = 1 - exp(-N**2 / 2**(L+1))

   where P is the probability of collision, N is the number of
   identifiers, and L is the length of an identifier in bits.

   The following table shows the probability of a collision for a range
   of identifiers using a 60-bit length.

      Identifiers    Probability of Collision
             1000    4.3368*10^-13
            10000    4.3368*10^-11
           100000    4.3368*10^-09
          1000000    4.3368*10^-07

   Note that locally unique identifiers may be ephemeral; for instance,
   a task may only exist for a few seconds. This should be considered
   when determining the probability of identifier collision.

B.2 Universally Unique Identifiers method

   For larger deployments, hierarchical allocation may be desired. The
   techniques in Universally Unique Identifier (UUID) URN ([RFC4122])
   can be adapted for allocating unique object identifiers in sixty
   bits. An identifier is split into two components: a registrar prefix
   and a sub-identifier. The registrar prefix defines an identifier
   block which is managed by an agent; the sub-identifier is a unique
   value within the registrar block.

   For instance, each host in a network could be an agent so that
   unique identifiers for objects could be created autonomously by the
   host.

   The identifier might be composed of a twenty-four bit host
   identifier followed by a thirty-six bit timestamp. Assuming that a
   host can allocate up to 100 identifiers per second, this allows about
   21.8 years before wrap around.
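
   Both figures quoted in this appendix can be checked numerically: the
   collision table in B.1 follows from its approximation with L = 60,
   and the 21.8-year wrap-around follows from 2**36 timestamp values
   consumed at 100 identifiers per second. A short Python check,
   assuming 365.25-day years; math.expm1 is used only to keep precision
   when the exponent is very small:

```python
import math


def collision_probability(n: int, bits: int = 60) -> float:
    """Birthday approximation P = 1 - exp(-N**2 / 2**(L+1))."""
    return -math.expm1(-(n * n) / 2 ** (bits + 1))


def years_until_wrap(timestamp_bits: int = 36, ids_per_second: float = 100.0) -> float:
    """Time until a timestamp field of the given width wraps around."""
    seconds = 2 ** timestamp_bits / ids_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for n in (1000, 10000, 100000, 1000000):
        print(f"{n:>8}  {collision_probability(n):.4e}")
    print(f"wrap-around after {years_until_wrap():.1f} years")
```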

   /* LUI identifier with host registrar and timestamp */
   |3 bits|1|      24 bits      |              36 bits               |
   +------+-+-------------------+------------------------------------+
   | 0x1  |C|  Host identifier  |        Timestamp Identifier        |
   +------+-+-------------------+------------------------------------+

Appendix C: Datacenter task virtualization

   This section describes some details of applying ILA to virtualizing
   tasks in a datacenter.

C.1 Address per task

   Managing the port number space for services within a datacenter is a
   nontrivial problem. When a service task is created, it may run on
   arbitrary hosts. The typical scenario is that the task will be
   started on some machine and will be assigned a port number for its
   service. The port number must be chosen dynamically to not conflict
   with any other port numbers already assigned to tasks on the same
   machine (possibly even other instances of the same service). A
   canonical name for the service is entered into a database with the
   host address and assigned port. When a client wishes to connect to
   the service, it queries the database with the service name to get
   both the address of an instance and its port number. Note that DNS
   address lookups are not adequate for the service lookup since they
   do not provide port numbers.

   With ILA, each service task can be assigned its own IPv6 address and
   therefore will logically be assigned the full port space for that
   address. This is a dramatic simplification since each service can
   now use a publicly known port number that does not need to be unique
   between services or instances. A client can perform a lookup on the
   service name to get an IP address of an instance and then connect to
   that address using a well-known port number. In this case, DNS is
   sufficient for directing clients to instances of a service.
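
   A minimal sketch of the addressing side of this model: each task's
   address is formed from a SIR prefix and the task's identifier, and
   every instance of the service listens on the same port. The prefix
   2001:db8:5::/64, the port 8080, and the names task_address and
   service_endpoint are hypothetical:

```python
import ipaddress

# Hypothetical values: a SIR prefix for tasks and the one port every instance uses.
TASK_SIR_PREFIX = ipaddress.IPv6Network("2001:db8:5::/64")
SERVICE_PORT = 8080


def task_address(identifier: int) -> ipaddress.IPv6Address:
    """Per-task IPv6 address: SIR prefix followed by the task's 64-bit identifier."""
    if not 0 <= identifier < (1 << 64):
        raise ValueError("identifier must fit in 64 bits")
    return TASK_SIR_PREFIX.network_address + identifier


def service_endpoint(identifier: int):
    """(host, port) for a task; only the address differs between instances."""
    return (str(task_address(identifier)), SERVICE_PORT)
```

   Two instances on the same host, for example with identifiers 0x10 and
   0x11, get distinct addresses but share port 8080, so no per-instance
   port bookkeeping is needed; a client that resolved one of these
   addresses could pass the pair to socket.create_connection().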

C.2 Job scheduling

   In the usual datacenter model, jobs are scheduled to run as tasks on
   some number of machines. A distributed job scheduler provides the
   scheduling, which may entail considerable complexity since jobs will
   often have a variety of resource constraints. The scheduler takes
   these constraints into account while trying to maximize utility of
   the datacenter in terms of utilization, cost, latency, etc.
   Datacenter jobs do not typically run in virtual machines (VMs), but
   may run within containers. Containers are mechanisms that provide
   resource isolation between tasks running on the same host OS. These
   resources can include CPU, disk, memory, and networking.

   A fundamental problem arises in that once a task for a job is
   scheduled on a machine, it often needs to run to completion. If the
   scheduler needs to schedule a higher priority job or change resource
   allocations, there may be little recourse but to kill tasks and
   restart them on a different machine. In killing a task, progress is
   lost, which results in increased latency and wasted CPU cycles. Some
   tasks may checkpoint progress to minimize the amount of progress
   lost, but this is not a very transparent or general solution.

   An alternative approach is to allow transparent job migration. The
   scheduler may migrate running jobs from one machine to another.

C.3 Task migration

   Under the orchestration of the job scheduler, the steps to migrate a
   job may be:

      1) Stop running tasks for the job.
      2) Package the runtime state of the job. The runtime state is
         derived from the containers for the job.
      3) Send the runtime state of the job to the new machine where the
         job is to run.
      4) Instantiate the job's state on the new machine.
      5) Start the tasks for the job, continuing from the point at which
         they were stopped.
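
   The five steps above can be sketched as a single orchestration
   function. This is a minimal sketch under stated assumptions: Host and
   migrate_job are hypothetical names, runtime state is modelled as a
   plain dictionary, serialization uses pickle with zlib compression,
   and the network transfer in step 3 is replaced by an in-memory copy:

```python
import pickle
import zlib
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a machine managed by the job scheduler."""
    name: str
    tasks: dict = field(default_factory=dict)  # task id -> runtime state
    running: set = field(default_factory=set)  # ids of tasks currently running


def migrate_job(task_ids, src: Host, dst: Host) -> int:
    """Move a job's tasks from src to dst following steps 1-5; returns bytes sent."""
    for t in task_ids:  # 1) stop running tasks for the job
        src.running.discard(t)
    state = {t: src.tasks.pop(t) for t in task_ids}  # 2) package runtime state
    payload = zlib.compress(pickle.dumps(state))
    received = bytes(payload)  # 3) stand-in for sending to the new machine
    dst.tasks.update(pickle.loads(zlib.decompress(received)))  # 4) instantiate state
    for t in task_ids:  # 5) resume the tasks where they stopped
        dst.running.add(t)
    return len(payload)
```

   pickle is used only for brevity; it must never be used to load state
   from an untrusted source, so a real scheduler would need a vetted
   serialization format and an authenticated transfer between hosts.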

   This model is similar to virtual machine (VM) migration except that
   the runtime state is typically much less data: just the task state,
   as opposed to a full OS image. Task state may be compressed to reduce
   latency in migration.

C.3.1 Address migration

   ILA facilitates address (specifically SIR address) migration between
   hosts as part of task migration or for other purposes. The steps in
   migrating an address might be:

      1) Configure the address on the target host.

      2) Suspend use of the address on the old host. This includes
         handling established connections (see next section). A state
         may be established to drop packets or send ICMP destination
         unreachable when packets to the migrated address are received.

      3) Update the identifier to locator mapping database. Depending on
         the control plane implementation, this may include pushing the
         new mapping to hosts.

      4) Communicating hosts will learn of the new mapping via a control
         plane, either by participation in a protocol for mapping
         propagation or by the ILA resolution protocol.

C.3.2 Connection migration

   When a task and its addresses are migrated between machines, the
   disposition of existing TCP connections needs to be considered.

   The simplest course of action is to drop TCP connections across a
   migration. Since migrations should be relatively rare events, it is
   conceivable that TCP connections could be automatically closed in the
   network stack during a migration event. If the applications running
   are known to handle this gracefully (i.e., reopen dropped
   connections), then this may be viable.

   For seamless migration, open connections may be migrated between
   hosts. Migration of these entails pausing the connection, packaging
   connection state and sending it to the target, instantiating
   connection state in the peer stack, and restarting the connection.
   From the time the connection is paused to the time it is running
   again in the new stack, packets received for the connection should
   be silently dropped. For some period of time, the old stack will
   need to keep a record of the migrated connection. If it receives a
   packet, it should either silently drop the packet or forward it to
   the new location.

Author's Address

   Tom Herbert
   Facebook
   1 Hacker Way
   Menlo Park, CA

   EMail: tom@herbertland.com