Network Working Group                                           J. Arkko
Internet-Draft                                                  Ericsson
Intended status: Standards Track                          I. van Beijnum
Expires: December 8, 2008                                 IMDEA Networks
                                                            June 6, 2008


    Failure Detection and Locator Pair Exploration Protocol for IPv6
                              Multihoming
                 draft-ietf-shim6-failure-detection-12

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 8, 2008.

Abstract

   This document specifies how the level 3 multihoming shim protocol
   (SHIM6) detects failures between two communicating hosts.  It also
   specifies an exploration protocol for switching to another pair of
   interfaces and/or addresses between the same hosts if a failure
   occurs and an operational pair can be found.

Table of Contents

   1.  Introduction
   2.  Requirements language
   3.  Definitions
     3.1.  Available Addresses
     3.2.  Locally Operational Addresses
     3.3.  Operational Address Pairs
     3.4.  Primary Address Pair
     3.5.  Current Address Pair
   4.  Protocol Overview
     4.1.  Failure Detection
     4.2.  Full Reachability Exploration
     4.3.  Exploration Order
   5.  Protocol Definition
     5.1.  Keepalive Message
     5.2.  Probe Message
     5.3.  Keepalive Timeout Option Format
   6.  Behaviour
     6.1.  Incoming payload packet
     6.2.  Outgoing payload packet
     6.3.  Keepalive timeout
     6.4.  Send timeout
     6.5.  Retransmission
     6.6.  Reception of the Keepalive message
     6.7.  Reception of the Probe message State=Exploring
     6.8.  Reception of the Probe message State=InboundOk
     6.9.  Reception of the Probe message State=Operational
     6.10. Graphical Representation of the State Machine
   7.  Protocol Constants
   8.  Security Considerations
   9.  Operational Considerations
   10. IANA Considerations
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A.  Example Protocol Runs
   Appendix B.  Contributors
   Appendix C.  Acknowledgements
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The SHIM6 protocol [I-D.ietf-shim6-proto] extends IPv6 to support
   multihoming.  It is an IP layer mechanism that hides multihoming from
   applications.  A part of the SHIM6 solution involves detecting when a
   currently used pair of addresses (or interfaces) between two
   communicating hosts has failed, and picking another pair when this
   occurs.  We call the former failure detection, and the latter locator
We call the former failure detection, and the latter locator 95 pair exploration. 97 This document specifies the mechanisms and protocol messages to 98 achieve both failure detection and locator pair exploration. This 99 part of the SHIM6 protocol is called the REAchability Protocol 100 (REAP). 102 Failure detection is made as light weight as possible. Data traffic 103 in both direction is observed, and in the case where there is no 104 traffic because the communication is idle, failure detection is also 105 idle and doesn't generate any packets. When data traffic is flowing 106 in both directions, there is no need to send failure detection 107 packets, either. Only when there is traffic in one direction, the 108 failure detection mechanism generates keepalives in the other 109 direction. As a result, whenever there is outgoing traffic and no 110 incoming return traffic or keepalives, there must be failure, at 111 which point the locator pair exploration is performed to find a 112 working address pair for each direction. 114 The document is structured as follows: Section 3 defines a set of 115 useful terms, Section 4 gives an overview of REAP, and Section 5 116 specifies the message formats and behaviour in detail. Section 8 117 discusses the security considerations of REAP. 119 In this specification, we consider an address to be synonymous with a 120 locator. Other parts of the SHIM6 protocol ensure that the different 121 locators used by a node actually belong together. That is, REAP is 122 not responsible for ensuring that it ends up with a legitimate 123 locator. 125 REAP has been designed to be used with SHIM6, and is therefore 126 tailored to an environment where it runs on hosts, uses widely 127 varying types of paths and is unaware of application context. As a 128 result, REAP attempts to be as self-configuring and unobtrusive as 129 possible. 
   In particular, it avoids sending any packets except where
   absolutely required and employs exponential back-off to avoid
   congestion.  The downside is that it cannot offer the same
   granularity of problem detection as mechanisms that have more
   application context and the ability to negotiate or configure
   parameters.  Future versions of this specification may consider
   extensions with such capabilities, for instance by inheriting some
   mechanisms from the Bidirectional Forwarding Detection (BFD) protocol
   [I-D.ietf-bfd-base].

2.  Requirements language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Definitions

   This section defines terms useful for discussing failure detection
   and locator pair exploration.

3.1.  Available Addresses

   SHIM6 nodes need to be aware of what addresses they themselves have.
   If a node loses the address it is currently using for communications,
   another address must replace this address.  If a node loses an
   address that the node's peer knows about, the peer must be informed.
   Similarly, when a node acquires a new address, it may generally wish
   the peer to know about it.

   Definition.  Available address - an address is said to be available
   if all the following conditions are fulfilled:

   o  The address has been assigned to an interface of the node.

   o  The valid lifetime of the prefix (RFC 4861 [RFC4861] Section
      4.6.2) associated with the address has not expired.

   o  The address is not tentative in the sense of RFC 4862 [RFC4862].
      In other words, the address assignment is complete so that
      communications can be started.

      Note that this explicitly allows an address to be optimistic in
      the sense of Optimistic DAD [RFC4429], even though implementations
      may prefer using other addresses as long as there is an
      alternative.

   o  The address is a global unicast or unique local address [RFC4193].
      That is, it is not an IPv6 site-local or link-local address.

      With link-local addresses, the nodes would be unable to determine
      on which link the given address is usable.

   o  The address and interface are acceptable for use according to
      local policy.

   Available addresses are discovered and monitored through mechanisms
   outside the scope of SHIM6.  SHIM6 implementations MUST be able to
   employ information provided by IPv6 Neighbor Discovery [RFC4861],
   Address Autoconfiguration [RFC4862], and DHCP [RFC3315] (when DHCP is
   implemented).  This information includes the availability of a new
   address and status changes of existing addresses (such as when an
   address becomes invalid).

3.2.  Locally Operational Addresses

   Two different granularity levels are needed for failure detection.
   The coarser granularity is for individual addresses:

   Definition.  Locally Operational Address - an available address is
   said to be locally operational when its use is known to be possible
   locally: the interface is up, a default router (if needed) suitable
   for this address is known to be reachable, and no other local
   information points to the address being unusable.

   Locally operational addresses are discovered and monitored through
   mechanisms outside the SHIM6 protocol.  SHIM6 implementations MUST be
   able to employ information provided by Neighbor Unreachability
   Detection [RFC4861].  Implementations MAY also employ additional,
   link-layer-specific mechanisms.
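
   The two address-level definitions above amount to a pair of
   predicates.  The following Python sketch is illustrative only: the
   AddressState record and its field names are hypothetical and not
   part of SHIM6, and an implementation would fill them in from
   Neighbor Discovery, address autoconfiguration, DHCP, and Neighbor
   Unreachability Detection.  The scope test uses Python's standard
   ipaddress module and only approximates "global unicast or unique
   local"; it does not reject every special-purpose range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class AddressState:
    """Hypothetical per-address record; fields mirror Sections 3.1 and 3.2."""
    address: str
    assigned: bool               # assigned to an interface of this node
    prefix_valid_until: float    # time at which the prefix valid lifetime expires
    tentative: bool              # DAD incomplete (an optimistic address is not tentative)
    policy_ok: bool              # address and interface accepted by local policy
    interface_up: bool
    router_ok: bool              # suitable default router reachable, or none needed
    known_unusable: bool = False  # other local information marks it unusable


def has_usable_scope(addr: ipaddress.IPv6Address) -> bool:
    # Approximation of "global unicast or unique local": rejects only
    # link-local, site-local, multicast, loopback and unspecified addresses.
    return not (addr.is_link_local or addr.is_site_local or addr.is_multicast
                or addr.is_loopback or addr.is_unspecified)


def is_available(s: AddressState, now: float) -> bool:
    """Section 3.1: every listed condition must hold."""
    return (s.assigned
            and now < s.prefix_valid_until
            and not s.tentative
            and has_usable_scope(ipaddress.IPv6Address(s.address))
            and s.policy_ok)


def is_locally_operational(s: AddressState, now: float) -> bool:
    """Section 3.2: an available address whose local use is known to be possible."""
    return (is_available(s, now)
            and s.interface_up
            and s.router_ok
            and not s.known_unusable)
```

   With this split, an address whose interface goes down remains
   available but stops being locally operational, which matches the
   coarser-to-finer ordering of the two definitions.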

   Note 1: Part of the problem in ensuring that an address is
   operational is making sure that, after a change in link-layer
   connectivity, the node is still connected to the same IP subnet.
   Mechanisms such as DNA CPL [I-D.ietf-dna-cpl] or DNAv6
   [I-D.ietf-dna-protocol] can be used to ensure this.

   Note 2: In theory, it would also be possible for hosts to learn
   about routing failures for a particular selected source prefix, if
   suitable protocols for this purpose existed.  Some proposals in this
   space have been made (see, for instance,
   [I-D.bagnulo-shim6-addr-selection] and
   [I-D.huitema-multi6-addr-selection]), but none have been
   standardized to date.

3.3.  Operational Address Pairs

   The existence of locally operational addresses is not, however, a
   guarantee that communications can be established with the peer.  A
   failure in the routing infrastructure can prevent packets from
   reaching their destination.  For this reason we need a second level
   of granularity, for pairs of addresses:

   Definition.  Bidirectionally operational address pair - a pair of
   locally operational addresses is said to be a bidirectionally
   operational address pair when bidirectional connectivity can be
   shown between the addresses.  That is, a packet sent with one of the
   addresses in the source field and the other in the destination field
   reaches the destination, and vice versa.

   Unfortunately, there are scenarios where bidirectionally operational
   address pairs do not exist.  For instance, ingress filtering or
   network failures may result in one address pair being operational in
   one direction while another one is operational in the other
   direction.  The following definition captures this general situation:

   Definition.
   Unidirectionally operational address pair - a pair of locally
   operational addresses is said to be a unidirectionally operational
   address pair when packets sent with the first address as the source
   and the second address as the destination reach the destination.

   SHIM6 implementations MUST support the discovery of operational
   address pairs through the use of explicit reachability tests and
   Forced Bidirectional Detection (FBD), described later in this
   specification.  In addition, implementations MAY employ additional
   mechanisms.  Some ideas for such mechanisms are listed below, but
   they are not fully specified in this document:

   o  Positive feedback from upper layer protocols.  For instance, TCP
      can indicate to the IP layer that it is making progress.  This is
      similar to how IPv6 Neighbor Unreachability Detection can in some
      cases be avoided when upper layers provide information about
      bidirectional connectivity [RFC4861].

      In the case of unidirectional connectivity, the upper layer
      protocol responses come back using another address pair, but show
      that the messages sent using the first address pair have been
      received.

   o  Negative feedback from upper layer protocols.  It is conceivable
      that upper layer protocols give an indication of a problem to the
      multihoming layer.  For instance, TCP could indicate that there is
      either congestion or lack of connectivity in the path because it
      is not getting ACKs.

   o  ICMP error messages.  Given the ease of spoofing ICMP messages,
      one should be careful not to trust these blindly, however.  Our
      suggestion is to use ICMP error messages only as a hint to perform
      an explicit reachability test or to move an address pair to a
      lower place in the list of address pairs to be probed, but not as
      a reason to disrupt ongoing communications without other
      indications of problems.
      The situation may be different when certain verifications of the
      ICMP messages are performed, as explained by Gont in
      [I-D.ietf-tcpm-icmp-attacks].  These verifications can ensure that
      (practically) only on-path attackers can spoof the messages.

3.4.  Primary Address Pair

   The primary address pair consists of the addresses that upper layer
   protocols use in their interaction with the SHIM6 layer.  Use of the
   primary address pair means that the communication is compatible with
   regular non-SHIM6 communication and no context ID needs to be
   present.

3.5.  Current Address Pair

   SHIM6 needs to avoid sending packets that belong to the same
   transport connection concurrently over multiple paths.  This is
   because congestion control in commonly used transport protocols is
   based upon a notion of a single path.  While routing can introduce
   path changes as well, and transport protocols have means to deal with
   this, frequent changes will cause problems.  Effective congestion
   control over multiple paths is considered a research topic at the
   time of this writing.  SHIM6 does not attempt to employ multiple
   paths simultaneously.

      Note: SCTP and future multipath transport protocols are likely to
      require interaction with SHIM6, at least to ensure that they do
      not employ SHIM6 unexpectedly.

   For these reasons it is necessary to choose a particular pair of
   addresses as the current address pair, which is used until problems
   occur, at least for the same session.

      It is theoretically possible to support multiple current address
      pairs for different transport sessions or SHIM6 contexts.
      However, this is not supported in this version of the SHIM6
      protocol.

   A current address pair need not be operational at all times.  If
   there is no traffic to send, we may not know if the primary address
   pair is operational.
   Nevertheless, it makes sense to assume that the address pair that
   worked previously continues to be operational for new communications
   as well.

4.  Protocol Overview

   This section discusses the design of the reachability detection and
   full reachability exploration mechanisms, and gives an overview of
   the REAP protocol.

   Exploring the full set of communication options between two hosts
   that both have two or more addresses is an expensive operation, as
   the number of combinations to be explored increases very quickly with
   the number of addresses.  For instance, with two addresses on both
   sides, there are four possible address pairs.  Since we cannot assume
   that reachability in one direction automatically means reachability
   for the complement pair in the other direction, the total number of
   two-way combinations is eight.  (Combinations = nA * nB * 2, where nA
   and nB are the numbers of addresses on the two hosts.)

   An important observation in multihoming is that failures are
   relatively infrequent, so an operational pair that worked a few
   seconds ago is very likely to be still operational.  It therefore
   makes sense to have a lightweight protocol that confirms existing
   reachability, and to invoke heavier exploration only when there is a
   suspected failure.

4.1.  Failure Detection

   Failure detection consists of three parts: tracking local
   information, tracking remote peer status, and finally verifying
   reachability.  Tracking local information consists of using, for
   instance, reachability information about the local router as an
   input.  Nodes SHOULD employ techniques listed in Section 3.1 and
   Section 3.2 to track the local situation.  It is also necessary to
   track remote address information from the peer.  For instance, if the
   peer's currently used address is no longer in use, a mechanism to
   relay that information is needed.  The Update Request message in the
   SHIM6 protocol is used for this purpose [I-D.ietf-shim6-proto].
   Finally, when the local and remote information indicates that
   communication should be possible and there are upper layer packets to
   be sent, reachability verification is necessary to ensure that the
   peers actually have an operational address pair.

   A technique called Forced Bidirectional Detection (FBD, originally
   defined in an earlier SHIM6 document [I-D.ietf-shim6-reach-detect])
   is employed for the reachability verification.  Reachability for the
   currently used address pair in a SHIM6 context is determined by
   making sure that whenever there is data traffic in one direction,
   there is also traffic in the other direction.  This can be data
   traffic as well, but also transport layer acknowledgments or a REAP
   reachability keepalive if there is no other traffic.  This way, it is
   no longer possible to have traffic in only one direction, so whenever
   there is data traffic going out but there are no return packets,
   there must be a failure, and the full exploration mechanism is
   started.

   A more detailed description of the current pair reachability
   evaluation mechanism:

   1.  To prevent the other side from concluding that there is a
       reachability failure, a host implementing the failure detection
       mechanism needs to generate periodic keepalives when there is no
       other traffic.

       FBD works by generating REAP keepalives if the node is receiving
       packets from its peer but not sending any of its own.  The
       keepalives are sent at certain intervals so that the other side
       knows there is a reachability problem when it does not receive
       any incoming packets for its Send Timeout period.  The host
       communicates its Send Timeout value to the peer in a Keepalive
       Timeout Option (Section 5.3) in the I2, I2bis, R2, or UPDATE
       messages.  The peer then maps this value to its Keepalive Timeout
       value.

       The interval after which keepalives are sent is named Keepalive
       Interval.
       The RECOMMENDED approach is to send keepalives at one-half to
       one-third of the Keepalive Timeout interval, so that multiple
       keepalives are generated and have time to reach the correspondent
       before it times out.

   2.  Whenever outgoing data packets are generated, a timer is started
       to reflect the requirement that the peer should generate return
       traffic in response to those data packets.  The timeout value is
       set to the value of Send Timeout.

       For the purposes of this specification, "data packet" refers to
       any packet that is part of a SHIM6 context, including both upper
       layer protocol packets and SHIM6 protocol messages except those
       defined in this specification.

   3.  Whenever incoming data packets are received, the timer associated
       with the return traffic from the peer is stopped, and another
       timer is started to reflect the requirement for this node to
       generate return traffic.  This timeout value is set to the value
       of Keepalive Timeout.

       These two timers are mutually exclusive.  In other words, either
       the node is expecting to see traffic from the peer based on the
       traffic that the node sent earlier, or the node is expecting to
       respond to the peer based on the traffic that the peer sent
       earlier (or the node is in an idle state).

   4.  The reception of a REAP keepalive packet leads to stopping the
       timer associated with the return traffic from the peer.

   5.  Keepalive Interval seconds after the last data packet has been
       received for a context, and if no other packet has been sent
       within this context since the data packet has been received, a
       REAP keepalive packet is generated for the context in question
       and transmitted to the correspondent.  A host may send the
       keepalive sooner than Keepalive Interval seconds if
       implementation considerations warrant this, but should take care
       to avoid sending keepalives at an excessive rate.
       REAP keepalive packets SHOULD continue to be sent at the Keepalive
       Interval until either a data packet in the SHIM6 context has been
       received from the peer or the Keepalive Timeout expires.
       Keepalives are not sent at all if data was sent within the
       Keepalive Interval.  A recommended value range for Keepalive
       Interval is specified in Section 7.  The actual value SHOULD be
       randomized in order to prevent synchronization.

   6.  Send Timeout seconds after the transmission of a data packet with
       no return traffic on this context, a full reachability
       exploration is started.

   Section 7 provides some suggested defaults for these timeout values.
   Experience from the deployment of the SHIM6 protocol is needed in
   order to determine what values are most suitable.

4.2.  Full Reachability Exploration

   As explained in previous sections, the currently used address pair
   may become invalid either through one of the addresses becoming
   unavailable or nonoperational, or through the pair itself being
   declared nonoperational.  An exploration process attempts to find
   another operational pair so that communications can resume.

   What makes this process hard is the requirement to support
   unidirectionally operational address pairs.  It is insufficient to
   probe address pairs with a simple request-response protocol.
   Instead, the party that first detects the problem starts a process
   where it tries each of the different address pairs in turn by sending
   a message to its peer.  These messages carry information about the
   state of connectivity between the peers, such as whether the sender
   has seen any traffic from the peer recently.  When the peer receives
   a message that indicates a problem, it assists the process by
   starting its own parallel exploration in the other direction, again
   sending information about the recently received payload traffic or
   signaling messages.
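
   The following Python sketch illustrates why exploration is
   expensive and why each direction is probed separately: it
   enumerates candidate (source, destination) pairs and tries them
   strictly one at a time.  It rests on several assumptions.  Both
   address lists are taken to be already sorted by preference (for
   example, by the RFC 3484 rules discussed in Section 4.3).  The probe
   argument is a hypothetical callback standing in for sending a Probe
   message and learning from the peer's Probe that it arrived.  The
   timing and back-off rules of Section 4.3 and the peer's parallel
   exploration in the reverse direction are not modeled.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# (source address, destination address) for one direction of traffic.
AddressPair = Tuple[str, str]


def candidate_pairs(local: Sequence[str], remote: Sequence[str]) -> List[AddressPair]:
    """Outgoing pairs in preference order, assuming both inputs are pre-sorted."""
    return list(product(local, remote))


def two_way_combinations(a: Sequence[str], b: Sequence[str]) -> int:
    # Reachability from A to B says nothing about B to A, so both
    # directions are counted separately: nA * nB * 2.
    return len(candidate_pairs(a, b)) + len(candidate_pairs(b, a))


def explore(local: Sequence[str], remote: Sequence[str],
            probe: Callable[[AddressPair], bool]) -> Optional[AddressPair]:
    """Try outgoing pairs strictly in sequence; return the first that works."""
    for pair in candidate_pairs(local, remote):
        if probe(pair):  # hypothetical: True once the peer reports receiving it
            return pair
    return None  # a real implementation would back off and retry (Section 4.3)
```

   With two addresses on each host, two_way_combinations returns 8,
   matching the count given in Section 4.  Trying pairs one after
   another, rather than all at once, reflects the requirement in
   Section 4.3 that exploration be sequential.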

   Specifically, when A decides that it needs to explore for an
   alternative address pair to B, it will initiate a set of Probe
   messages, in sequence, until it gets a Probe message from B
   indicating that (a) B has received one of A's messages and,
   obviously, (b) B's Probe message gets back to A.  B uses the same
   algorithm, but starts the process from the reception of the first
   Probe message from A.

   Upon changing to a new address pair, the network path traversed has
   most likely changed, so the upper layer protocol (ULP) SHOULD be
   informed.  This can be a signal for the ULP to adapt to the change in
   path so that, for example, TCP could initiate a slow start procedure,
   although it is likely that the circumstances that led to the
   selection of a new path already caused enough packet loss to trigger
   slow start.

   REAP is designed to support failure recovery even in the case of
   having only unidirectionally operational address pairs.  However, due
   to security concerns discussed in Section 8, the exploration process
   can typically be run only for a session that has already been
   established.  Specifically, while REAP would in theory be capable of
   exploration even during connection establishment, its use within the
   SHIM6 protocol does not allow this.

4.3.  Exploration Order

   The exploration process assumes an ability to choose address pairs
   for testing, in some sequence.  This process may result in a
   combinatorial explosion when there are many addresses on both sides,
   but a back-off procedure is employed to avoid a "signaling storm".

   Nodes first consult the RFC 3484 default address selection rules
   [RFC3484] to determine what combinations of addresses are allowed
   from a local point of view, as this reduces the search space.  RFC
   3484 also provides a priority ordering among different address pairs,
   which can make the search faster.
   (Additional mechanisms may be defined in the future for arriving at
   an initial ordering of address pairs before testing starts
   [I-D.ietf-shim6-locator-pair-selection].)  Nodes may also use local
   information, such as known quality of service parameters or
   interface types, to determine what addresses are preferred over
   others, and try pairs containing such addresses first.  The SHIM6
   protocol also carries preference information in its messages.

   Out of the set of possible candidate address pairs, nodes SHOULD
   attempt to test through all of them until an operational pair is
   found, retrying the process as necessary.  However, all nodes MUST
   perform this process sequentially and with exponential back-off.
   This sequential process is necessary in order to avoid a "signaling
   storm" when an outage occurs (particularly for a complete site).
   However, it also limits the number of addresses that can in practice
   be used for multihoming, considering that transport and application
   layer protocols will fail if the switch to a new address pair takes
   too long.

   Section 7 suggests default values for the timers associated with the
   exploration process.  The value Initial Probe Timeout (0.5 seconds)
   specifies the interval between initial attempts to send probes;
   Number of Initial Probes (4) specifies how many initial probes can be
   sent before the exponential backoff procedure needs to be employed.
   The backoff procedure increases the time between successive probes
   when there is no response.  Typically, each increase doubles the
   time, but this specification does not mandate a particular increase.

      Note: The rationale for sending four packets at a fixed rate
      before the exponential backoff is employed is to avoid having to
      send these packets excessively fast.  Without this, having 0.5
Without this, having 0.5 538 seconds between the third and fourth probe means that the time 539 between the first and second probe would have to be 0.125 seconds, 540 which gives very little time for a reply to the first packet to 541 arrive. Also, this means that the first four packets are sent 542 within 0.875 seconds rather than 2 seconds, increasing the 543 potential for congestion if a large number of shim contexts need 544 to send probes at the same time after a failure. 546 Finally, Max Probe Timeout (60 seconds) specifies a limit beyond 547 which the probe interval may not grow. If the exploration process 548 reaches this interval, it will continue sending at this rate until a 549 suitable response is triggered or the SHIM6 context is garbage 550 collected, because upper layer protocols using the SHIM6 context in 551 question are no longer attempting to send packets. Reaching the Max 552 Probe Timeout may also serve as a hint to the garbage collection 553 process that the context is no longer usable. 555 5. Protocol Definition 557 5.1. Keepalive Message 559 The format of the keepalive message is as follows: 561 0 1 2 3 562 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 563 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 564 | Next Header | Hdr Ext Len |0| Type = 66 | Reserved1 |0| 565 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 566 | Checksum |R| | 567 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 568 | Receiver Context Tag | 569 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 570 | Reserved2 | 571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 572 | | 573 + Options + 574 | | 575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 577 Next Header, Hdr Ext Len, 0, 0, Checksum 579 These are as specified in Section 5.3 of the SHIM6 protocol 580 description [I-D.ietf-shim6-proto]. 
   Type

      This field identifies the Keepalive message and MUST be set to 66
      (Keepalive).

   Reserved1

      This is a 7-bit field reserved for future use.  It is set to zero
      on transmit, and MUST be ignored on receipt.

   R

      This is a 1-bit field reserved for future use.  It is set to zero
      on transmit, and MUST be ignored on receipt.

   Receiver Context Tag

      This is a 47-bit field for the Context Tag the receiver has
      allocated for the context.

   Reserved2

      This is a 32-bit field reserved for future use.  It is set to zero
      on transmit, and MUST be ignored on receipt.

   Options

      This MAY contain one or more SHIM6 options.  The inclusion of
      options is not necessary, however, as there are currently no
      defined options that are useful in a Keepalive message.  These
      options are provided only for future extensibility reasons.

   A valid message conforms to the format above, has a Receiver Context
   Tag that matches a context known by the receiver, is a valid shim
   control message as defined in Section 12.2 of the SHIM6 protocol
   description [I-D.ietf-shim6-proto], and its shim context state is
   ESTABLISHED.  The receiver processes a valid message by inspecting
   its options and executing any actions specified for such options.

   The processing rules for this message are given in more detail in
   Section 6.

5.2.  Probe Message

   This message performs REAP exploration.
   Its format is as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Next Header  |  Hdr Ext Len  |0|  Type = 67  |  Reserved   |0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Checksum            |R|                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                             |
      |                     Receiver Context Tag                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Precvd| Psent |Sta|                 Reserved2                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                       First probe sent                        +
      |                                                               |
      +                        Source address                         +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                       First probe sent                        +
      |                                                               |
      +                      Destination address                      +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       First probe nonce                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       First probe data                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      /                                                               /
      /                        Nth probe sent                         /
      |                                                               |
      +                        Source address                         +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                        Nth probe sent                         +
      |                                                               |
      +                      Destination address                      +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Nth probe nonce                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Nth probe data                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                     First probe received                      +
      |                                                               |
      +                        Source address                         +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                     First probe received                      +
      |                                                               |
      +                      Destination address                      +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       First probe nonce                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       First probe data                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Nth probe received                       +
      |                                                               |
      +                        Source address                         +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      Nth probe received                       +
      |                                                               |
      +                      Destination address                      +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Nth probe nonce                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Nth probe data                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                            Options                            +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Next Header, Hdr Ext Len, 0, 0, Checksum

      These are as specified in Section 5.3 of the SHIM6 protocol
      description [I-D.ietf-shim6-proto].

   Type

      This field identifies the Probe message and MUST be set to 67
      (Probe).

   Reserved

      This is a 7-bit field reserved for future use.  It is set to zero
      on transmit, and MUST be ignored on receipt.

   R

      This is a 1-bit field reserved for future use.  It is set to zero
      on transmit, and MUST be ignored on receipt.

   Receiver Context Tag

      This is a 47-bit field for the Context Tag the receiver has
      allocated for the context.

   Psent

      This is a 4-bit field that indicates the number of sent probes
      included in this probe message.  The first set of probe fields
      pertains to the current message and MUST be present, so the
      minimum value for this field is 1.  Additional sent probe fields
      are copies of the same fields sent in (recent) earlier probes and
      may be included or omitted as per any logic employed by the
      implementation.
   Precvd

      This is a 4-bit field that indicates the number of received probes
      included in this probe message.  Received probe fields are copies
      of the same fields in earlier received probes that arrived since
      the last transition to state Exploring.  When a sender is in state
      InboundOk, it MUST include copies of the fields of at least one of
      the inbound probes.  A sender MAY include additional sets of these
      received probe fields in any state as per any logic employed by
      the implementation.

      The fields probe source, probe destination, probe nonce, and probe
      data may be repeated, depending on the values of Psent and Precvd.

   Sta (State)

      This 2-bit State field is used to inform the peer about the state
      of the sender.  It has three legal values:

      0 (Operational) implies that the sender both (a) believes it has
      no problem communicating and (b) believes that the recipient also
      has no problem communicating.

      1 (Exploring) implies that the sender has a problem communicating
      with the recipient, e.g., it has not seen any traffic from the
      recipient even when it expected some.

      2 (InboundOk) implies that the sender believes it has no problem
      communicating, i.e., it at least sees packets from the recipient,
      but that the recipient either has a problem or has not yet
      confirmed to the sender that the problem has been solved.

   Reserved2

      MUST be set to 0 upon transmission and MUST be ignored upon
      reception.

   Probe source

      This 128-bit field contains the source IPv6 address used to send
      the probe.

   Probe destination

      This 128-bit field contains the destination IPv6 address used to
      send the probe.

   Probe nonce

      This is a 32-bit field that is initialized by the sender with a
      value that allows it to determine which sent probes a received
      probe correlates with.
      It is highly RECOMMENDED that the nonce field be at least
      moderately hard to guess, so that even on-path attackers cannot
      deduce the next nonce value that will be used.  This value SHOULD
      be generated using a random number generator that is known to have
      good randomness properties as outlined in RFC 4086 [RFC4086].

   Probe data

      This is a 32-bit field with no fixed meaning.  The probe data
      field is copied back with no changes.  Future flags may define a
      use for this field.

   Options

      For future extensions.

5.3.  Keepalive Timeout Option Format

   Either side of a SHIM6 context can notify the peer of the value that
   it would prefer the peer to use as its Keepalive Timeout value.  If
   the host is using a non-default Send Timeout value, it SHOULD
   communicate this value as a Keepalive Timeout value to the peer in
   the option below.  This option MAY be sent in the I2, I2bis, R2, or
   UPDATE messages.  The option SHOULD only need to be sent once in a
   given SHIM6 association.  If a host receives this option, it SHOULD
   update its Keepalive Timeout value for the correspondent.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Type = 10          |0|          Length = 4           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Reserved            |       Keepalive Timeout       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fields:

   Type

      This field identifies the option and MUST be set to 10 (Keepalive
      Timeout).

   Length

      This field MUST be set as specified in Section 5.14 of the SHIM6
      protocol description [I-D.ietf-shim6-proto].  That is, it is set
      to 4.

   Reserved

      16-bit field reserved for future use.  Set to zero upon transmit
      and MUST be ignored upon receipt.

   Keepalive Timeout

      Value, in seconds, of the suggested Keepalive Timeout for the
      peer.
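   As an informal illustration of the option layout above, the Python
   sketch below packs and unpacks the Keepalive Timeout option.  It
   assumes the generic SHIM6 option header of [I-D.ietf-shim6-proto]: a
   15-bit Type, a 1-bit critical flag (0 here), and a 16-bit Length that
   counts only the option contents.  The function and constant names are
   hypothetical and are not defined by any specification or library.

```python
import struct

# Values from Section 5.3.  The header layout (15-bit Type, 1-bit
# critical flag, 16-bit Length excluding the 4-octet Type/Length
# header) is an assumption taken from the generic SHIM6 option format.
KEEPALIVE_TIMEOUT_TYPE = 10
KEEPALIVE_TIMEOUT_LENGTH = 4  # Reserved (16 bits) + Keepalive Timeout (16 bits)


def encode_keepalive_timeout_option(timeout_seconds: int) -> bytes:
    """Build the 8-octet option (hypothetical helper)."""
    if not 0 <= timeout_seconds <= 0xFFFF:
        raise ValueError("Keepalive Timeout must fit in 16 bits")
    type_and_flag = (KEEPALIVE_TIMEOUT_TYPE << 1) | 0  # critical flag = 0
    reserved = 0  # set to zero on transmit
    return struct.pack("!HHHH", type_and_flag, KEEPALIVE_TIMEOUT_LENGTH,
                       reserved, timeout_seconds)


def decode_keepalive_timeout_option(data: bytes) -> int:
    """Return the suggested Keepalive Timeout in seconds."""
    if len(data) < 8:
        raise ValueError("option too short")
    type_and_flag, length, _reserved, timeout = struct.unpack("!HHHH", data[:8])
    if type_and_flag >> 1 != KEEPALIVE_TIMEOUT_TYPE:
        raise ValueError("not a Keepalive Timeout option")
    if length != KEEPALIVE_TIMEOUT_LENGTH:
        raise ValueError("unexpected option length")
    # The Reserved field is ignored on receipt, as Section 5.3 requires.
    return timeout
```

   For example, encode_keepalive_timeout_option(15).hex() yields
   001400040000000f; a receiver would then adopt the decoded value as
   its Keepalive Timeout for that correspondent.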
6.  Behaviour

   The required behaviour of REAP nodes is specified below in the form
   of a state machine.  The externally observable behaviour of an
   implementation MUST conform to this state machine, but there is no
   requirement that the implementation actually employ a state machine.
   Intermixed with the following description, we also provide a state
   machine description in tabular form.  That form is only
   informational, however.

   On a given context with a given peer, the node can be in one of three
   states: Operational, Exploring, or InboundOK.  In the Operational
   state, the underlying address pairs are assumed to be operational.
   In the Exploring state, this node has observed a problem and has
   currently not seen any traffic from the peer.  Finally, in the
   InboundOK state, this node sees traffic from the peer, but the peer
   may not yet see any traffic from this node, so the exploration
   process needs to continue.

   The node also maintains the Send timer (Send Timeout seconds) and the
   Keepalive timer (Keepalive Timeout seconds).  The Send timer reflects
   the requirement that when this node sends a payload packet, there
   should be some return traffic (either payload packets or Keepalive
   messages) within Send Timeout seconds.  The Keepalive timer reflects
   the requirement that when this node receives a payload packet, there
   should be a similar response towards the peer.  The Keepalive timer
   is only used within the Operational state, and the Send timer in the
   Operational and InboundOK states.  No timer is running in the
   Exploring state.  As explained in Section 4.1, the two timers are
   mutually exclusive.  That is, either the Keepalive timer is running
   or the Send timer is running (or no timer is running).

   Note that Appendix A gives some examples of typical protocol runs to
   illustrate the behaviour.

6.1.  Incoming payload packet

   Upon the reception of a payload packet in the Operational state, the
   node starts the Keepalive timer if it is not yet running, and stops
   the Send timer if it was running.

   If the node is in the Exploring state, it transitions to the
   InboundOK state, sends a Probe message, and starts the Send timer.
   It fills the Psent and corresponding Probe source address, Probe
   destination address, Probe nonce, and Probe data fields with
   information about recent Probe messages that have not yet been
   reported as seen by the peer.  It also fills the Precvd and
   corresponding Probe source address, Probe destination address, Probe
   nonce, and Probe data fields with information about recent Probe
   messages it has seen from the peer.  When sending a Probe message,
   the State field MUST be set to a value that matches the conceptual
   state of the sender after sending the Probe.  In this case, the node
   therefore sets the Sta field to 2 (InboundOk).  The IP source and
   destination addresses for sending the Probe message are selected as
   discussed in Section 4.3.

   In the InboundOK state, the node stops the Send timer if it was
   running, but does not do anything else.

   The reception of SHIM6 control messages other than the Keepalive and
   Probe messages is treated in the same way as the reception of payload
   packets.

   While the Keepalive timer is running, the node SHOULD send Keepalive
   messages to the peer with an interval of Keepalive Interval seconds.
   Conceptually, a separate timer is used to distinguish between the
   interval between Keepalive messages and the overall Keepalive Timeout
   interval.  However, this separate timer is not modelled in the
   tabular or graphical state machines.  When sent, the Keepalive
   message is constructed as described in Section 5.1.  It is sent using
   the current address pair.
      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      STOP Send;               SEND Probe InboundOk;    STOP Send
      START Keepalive          START Send;
                               GOTO InboundOk

6.2.  Outgoing payload packet

   Upon sending a payload packet in the Operational state, the node
   stops the Keepalive timer if it was running and starts the Send timer
   if it was not running.  In the Exploring state there is no effect,
   and in the InboundOK state the node simply starts the Send timer if
   it was not yet running.  (The sending of SHIM6 control messages is
   again treated similarly here.)

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      START Send;              -                        START Send
      STOP Keepalive

6.3.  Keepalive timeout

   Upon a timeout on the Keepalive timer, the node sends one last
   Keepalive message.  This can only happen in the Operational state.

   The Keepalive message is constructed as described in Section 5.1.  It
   is sent using the current address pair.

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      SEND Keepalive           -                        -

6.4.  Send timeout

   Upon a timeout on the Send timer, the node enters the Exploring state
   and sends a Probe message.  The Probe message is constructed as
   explained in Section 6.1, except that the Sta field is set to 1
   (Exploring).

      Operational              Exploring  InboundOk
      ------------------------------------------------------------
      SEND Probe Exploring;    -          SEND Probe Exploring;
      GOTO Exploring                      GOTO Exploring

6.5.  Retransmission

   While in the Exploring state, the node keeps retransmitting its Probe
   messages to different (or the same) addresses as defined in Section
   4.3.  A similar process is employed in the InboundOk state, except
   that upon such retransmission the Send timer is started if it was not
   running already.
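   The retransmission pacing can be illustrated with the default
   constants from Section 7.  The Python sketch below computes the wait
   after each Probe: Number of Initial Probes waits of Initial Probe
   Timeout each, then growth by a factor that the sketch assumes to be 2
   (the exploration text earlier in this document calls doubling
   typical but does not mandate it), capped at Max Probe Timeout.  The
   function names, and the exact probe after which backoff begins, are
   an interpretation rather than normative text.

```python
from typing import List

# Default constants from Section 7.
INITIAL_PROBE_TIMEOUT = 0.5     # seconds
NUMBER_OF_INITIAL_PROBES = 4
MAX_PROBE_TIMEOUT = 60.0        # seconds


def probe_intervals(count: int, factor: float = 2.0) -> List[float]:
    """Wait (in seconds) after each of the first `count` probes.

    `factor` is an assumption: the specification only requires that
    the interval grows, not by how much.
    """
    intervals = []
    interval = INITIAL_PROBE_TIMEOUT
    for i in range(count):
        if i >= NUMBER_OF_INITIAL_PROBES:
            # Exponential backoff after the fixed-rate initial probes,
            # never exceeding Max Probe Timeout.
            interval = min(interval * factor, MAX_PROBE_TIMEOUT)
        intervals.append(interval)
    return intervals


def probe_send_times(count: int, factor: float = 2.0) -> List[float]:
    """Offsets (seconds after the first probe) at which probes are sent."""
    times = []
    elapsed = 0.0
    for interval in probe_intervals(count, factor):
        times.append(elapsed)
        elapsed += interval
    return times
```

   With the defaults, probe_intervals(12) gives four 0.5-second waits
   followed by 1, 2, 4, 8, 16, 32, 60, and 60 seconds.  Once the cap is
   reached, probing continues at one probe per context per minute.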
   The Probe messages are constructed as explained in Section 6.1,
   except that the Sta field is set to 1 (Exploring) or 2 (InboundOk),
   depending on which state the sender is in.

      Operational   Exploring                InboundOk
      ------------------------------------------------------------
      -             SEND Probe Exploring     SEND Probe InboundOk
                                             START Send

6.6.  Reception of the Keepalive message

   Upon the reception of a Keepalive message in the Operational state,
   the node stops the Send timer, if it was running.  If the node is in
   the Exploring state, it transitions to the InboundOK state, sends a
   Probe message, and starts the Send timer.  The Probe message is
   constructed as explained in Section 6.1.

   In the InboundOK state, the Send timer is stopped, if it was running.

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      STOP Send                SEND Probe InboundOk;    STOP Send
                               START Send;
                               GOTO InboundOk

6.7.  Reception of the Probe message State=Exploring

   Upon receiving a Probe with State set to Exploring, the node enters
   the InboundOK state, sends a Probe as described in Section 6.1, stops
   the Keepalive timer if it was running, and restarts the Send timer.

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      SEND Probe InboundOk;    SEND Probe InboundOk;    SEND Probe
      STOP Keepalive;          START Send;              InboundOk;
      RESTART Send;            GOTO InboundOk           RESTART Send
      GOTO InboundOk

6.8.  Reception of the Probe message State=InboundOk

   Upon the reception of a Probe message with State set to InboundOk,
   the node sends a Probe message, restarts the Send timer, stops the
   Keepalive timer if it was running, and transitions to the Operational
   state.  A new current address pair is chosen for the connection,
   based on the reports of received probes in the message that was just
   received.
   If no received probes have been reported, the current address pair
   is unchanged.

   The Probe message is constructed as explained in Section 6.1, except
   that the Sta field is set to 0 (Operational).

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      SEND Probe Operational;  SEND Probe Operational;  SEND Probe
      RESTART Send;            RESTART Send;            Operational;
      STOP Keepalive           GOTO Operational         RESTART Send;
                                                        GOTO Operational

6.9.  Reception of the Probe message State=Operational

   Upon the reception of a Probe message with State set to Operational,
   the node stops the Send timer if it was running, starts the Keepalive
   timer if it was not yet running, and transitions to the Operational
   state.  The Probe message is constructed as explained in Section 6.1,
   except that the Sta field is set to 0 (Operational).

      Note: This terminates the exploration process when both parties
      are happy and know that their peer is happy as well.

      Operational              Exploring                InboundOk
      ------------------------------------------------------------
      STOP Send                STOP Send;               STOP Send;
      START Keepalive          START Keepalive          START Keepalive
                               GOTO Operational         GOTO Operational

   The reachability detection and exploration process has no effect on
   payload communications until a new operational address pair has
   actually been confirmed.  Prior to that, payload packets continue to
   be sent to the previously used addresses.

6.10.  Graphical Representation of the State Machine

   In the PDF version of this specification, an informational drawing
   illustrates the state machine.  Where the text and the drawing
   differ, the text takes precedence.

7.  Protocol Constants

   The following protocol constants are defined:

      Send Timeout                 15 seconds
      Keepalive Interval           X seconds, where X is one third to
                                   one half of the Keepalive Timeout
                                   value (see Section 4.1)
      Initial Probe Timeout        0.5 seconds
      Number of Initial Probes     4 probes
      Max Probe Timeout            60 seconds

   Alternate values of the Send Timeout may be selected by a host and
   communicated to the peer in the Keepalive Timeout Option.  A very
   small value of Send Timeout may affect the ability to exchange
   keepalives over a path that has a long round-trip delay.  Similarly,
   it may cause SHIM6 to react to temporary failures more often than
   necessary.  As a result, it is RECOMMENDED that an alternate Send
   Timeout value not be under 10 seconds.  Choosing a higher value than
   the one recommended above is also possible, but there is a
   relationship between Send Timeout and the ability of REAP to discover
   and correct errors in the communication path.  In any case, in order
   for SHIM6 to be useful, it should detect and repair communication
   problems well before upper layers give up.  For this reason, it is
   RECOMMENDED that Send Timeout be at most 100 seconds (the default TCP
   R2 timeout [RFC1122]).

      Note that it is not expected that the Send Timeout or other values
      need to be estimated based on experienced round-trip times.
      Signaling exchanges are performed based on exponential backoff.
      The keepalive processes send packets only in the relatively rare
      condition that all traffic is unidirectional.  Finally, because
      Send Timeout is far greater than usual round-trip times, it merely
      divides the traffic into periods that SHIM6 looks at to decide
      whether to act.

8.  Security Considerations

   Attackers may spoof various indications from lower layers and the
   network in an effort to confuse the peers about which addresses are
   or are not operational.
   For example, attackers may spoof ICMP error messages in an effort to
   cause the parties to move their traffic elsewhere or even to
   disconnect.  Attackers may also spoof information related to network
   attachments, router discovery, and address assignments in an effort
   to make the parties believe they have Internet connectivity when in
   reality they do not.

   This may cause use of non-preferred addresses or even denial of
   service.

   This protocol does not provide any protection of its own for
   indications from other parts of the protocol stack.  Unprotected
   indications SHOULD NOT be taken as a proof of connectivity problems.
   However, REAP has weak resistance against incorrect information even
   from unprotected indications in the sense that it performs its own
   tests prior to picking a new address pair.  Denial-of-service
   vulnerabilities remain, however, as do vulnerabilities against
   on-path attackers.

   Some aspects of these vulnerabilities can be mitigated through the
   use of techniques specific to the other parts of the stack, such as
   properly dealing with ICMP errors [I-D.ietf-tcpm-icmp-attacks], link
   layer security, or the use of SEND [RFC3971] to protect IPv6 Router
   and Neighbor Discovery.

   Other parts of the SHIM6 protocol ensure that the addresses we are
   switching between actually belong together.  REAP itself provides no
   such assurances.  REAP does, however, provide some protection against
   third-party flooding attacks [AURA02]; when REAP is run, its Probe
   nonces can be used as a return routability check that the claimed
   address is indeed willing to receive traffic.  This needs to be
   complemented with another mechanism to ensure that the claimed
   address is also the correct host.  SHIM6 does this by binding all
   operations to context tags.
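   To illustrate how Probe nonces can serve as a return routability
   check, the Python sketch below records each nonce a node sends
   together with the address pair used, and accepts a peer's report of a
   received probe only if it echoes a nonce that was actually sent on
   that pair.  The class and method names are hypothetical, and Python's
   secrets module is assumed here as a generator with the properties
   RFC 4086 calls for.  Passing this check only shows that the address
   is willing to receive traffic; binding to context tags is still
   needed.

```python
import secrets
from typing import Dict, Tuple

# (probe source, probe destination) as textual IPv6 addresses; this
# representation is an assumption made only for the sketch.
AddressPair = Tuple[str, str]


class SentProbeLog:
    """Hypothetical record of the Probe messages this node has sent."""

    def __init__(self) -> None:
        self._sent: Dict[int, AddressPair] = {}

    def new_nonce(self, src: str, dst: str) -> int:
        """Pick an unpredictable, not-yet-used 32-bit Probe nonce."""
        nonce = secrets.randbits(32)
        while nonce in self._sent:
            nonce = secrets.randbits(32)
        self._sent[nonce] = (src, dst)
        return nonce

    def confirms_reachability(self, src: str, dst: str, nonce: int) -> bool:
        """True if a reported received probe echoes one we sent on (src, dst)."""
        return self._sent.get(nonce) == (src, dst)
```

   A real implementation would also expire old entries, for example
   when the context returns to the Operational state; the sketch omits
   this.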
   The keepalive mechanism in this specification is vulnerable to
   spoofing.  On-path attackers that can see a SHIM6 context tag can
   send spoofed Keepalive messages once per Send Timeout interval to
   prevent two SHIM6 nodes from sending Keepalives themselves.  This
   vulnerability is only relevant to nodes involved in a one-way
   communication.  The result of the attack is that the nodes enter the
   exploration phase needlessly, but they should be able to confirm
   connectivity unless, of course, the attacker is able to prevent the
   exploration phase from completing.  Off-path attackers may not be
   able to generate spoofed results, given that the context tags are
   47-bit random numbers.

   To protect against spoofed keepalive packets, a host implementing
   both SHIM6 and IPsec MAY ignore incoming REAP keepalives if it has
   good reason to assume that the other side will be sending
   IPsec-protected return traffic.  For example, if a host is sending
   TCP data, it can reasonably expect to receive TCP ACKs in return.  If
   no IPsec-protected ACKs come back but unprotected keepalives do, this
   could be the result of an attacker trying to hide broken
   connectivity.

   The exploration phase is vulnerable to attackers that are on the
   path.  Off-path attackers would find it hard to guess either the
   context tag or the correct probe identifiers.
   Given that IPsec operates above the shim layer, it is not possible to
   protect the exploration phase against on-path attackers.  The same
   limitation applies to other SHIM6 control exchanges.  There are
   mechanisms in place to prevent the redirection of communications to
   wrong addresses, but on-path attackers can cause denial of service,
   move communications to less-preferred address pairs, and so on.

   Finally, the exploration itself can cause a number of packets to be
   sent.  As a result, it may be used as a tool for packet amplification
   in flooding attacks.  To prevent this, the protocol employing REAP is
   required to have built-in mechanisms against such amplification.
   For instance, in SHIM6, contexts are created only after a relatively
   large number of packets has been exchanged, a cost which reduces the
   attractiveness of using SHIM6 and REAP for amplification attacks.
   However, such protections are typically not present at connection
   establishment time.  When exploration would be needed for connection
   establishment to succeed, its usage would result in an amplification
   vulnerability.  As a result, SHIM6 does not support the use of REAP
   in the connection establishment stage.

9.  Operational Considerations

   When there are no failures, the failure detection mechanism (and
   SHIM6 in general) is lightweight: keepalives are not sent when a
   SHIM6 context is idle or when there is traffic in both directions.
   So in normal TCP or TCP-like operation, there would only be one or
   two keepalives when a session transitions from active to idle.

   Significant failure detection traffic occurs only when there are
   failures, and especially when a link that is shared by many active
   sessions and by multiple hosts goes down.  When this happens, one
   keepalive is sent and then a series of probes.
   This happens per active (traffic-generating) context, and all of
   these contexts will time out within 10 seconds after the failure.
   This makes the peak traffic that SHIM6 generates after a failure
   around one packet per second per context.  Presumably, the sessions
   that run over those contexts were sending at least that much traffic
   and most likely more, but if the backup path has significantly lower
   bandwidth than the failed path, this could lead to temporary
   congestion.

      However, note that in the case of multihoming using BGP, if the
      failover is fast enough that TCP doesn't go into slow start, the
      full data traffic that flows over the failed path is switched over
      to the backup path, and if this backup path is of a lower
      capacity, there will be even more congestion.

   Although the failure detection probing does not perform congestion
   control as such, the exponential backoff makes sure that the number
   of packets sent quickly goes down and eventually reaches one per
   context per minute, which should be sufficiently conservative even on
   the lowest bandwidth links.

   Section 7 specifies a number of protocol parameters.  Possible tuning
   of these parameters and others that are not mandated in this
   specification may affect these properties.  It is expected that
   further revisions of this specification will provide additional
   information after sufficient deployment experience has been obtained
   from different environments.

   Implementations may provide means to monitor their performance and
   send alarms about problems.  Their standardization is, however, the
   subject of future specifications.  In general, SHIM6 is most
   applicable for small sites and hosts, and it is expected that
   monitoring requirements on such deployments are relatively modest.
   In any case, where the host is associated with a management system,
   it is RECOMMENDED that detected failures and failover events be
   reported via asynchronous notifications to the management system.
   Similarly, where logging mechanisms are available on the host, these
   events should be recorded in event logs.

   SHIM6 uses the same header for both signaling and the encapsulation
   of data packets after a rehoming event.  This way, fate is shared
   between the two types of packets, so the situation where reachability
   probes or keepalives can be transmitted successfully but data packets
   cannot is largely avoided: either all SHIM6 packets make it through,
   so SHIM6 functions as intended, or none do, and no SHIM6 state is
   negotiated.  Even in the situation where some packets make it through
   and others do not, SHIM6 will generally either work as intended or
   provide a service that is no worse than in the absence of SHIM6,
   apart from the possible generation of a small amount of signaling
   traffic.

   Sometimes data packets, and possibly data packets encapsulated in the
   SHIM6 header, do not make it through, but signaling and keepalives
   do.  This situation can occur when there is a path MTU discovery
   black hole on one of the paths.  If only large packets are sent at
   some point, then reachability exploration will be turned on and REAP
   will likely select another path, which may or may not be affected by
   the PMTUD black hole.

10.  IANA Considerations

   No IANA actions are required.  The number assignments necessary for
   the messages defined in this document appear together with all the
   other IANA assignments in the main SHIM6 specification
   [I-D.ietf-shim6-proto].

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
              and M. Carney, "Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6)", RFC 3315, July 2003.

   [RFC3484]  Draves, R., "Default Address Selection for Internet
              Protocol version 6 (IPv6)", RFC 3484, February 2003.

   [RFC4086]  Eastlake, D., Schiller, J., and S. Crocker, "Randomness
              Requirements for Security", BCP 106, RFC 4086, June 2005.

   [RFC4193]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
              Addresses", RFC 4193, October 2005.

   [RFC4429]  Moore, N., "Optimistic Duplicate Address Detection (DAD)
              for IPv6", RFC 4429, April 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

11.2.  Informative References

   [AURA02]   Aura, T., Roe, M., and J. Arkko, "Security of Internet
              Location Management", In Proceedings of the 18th Annual
              Computer Security Applications Conference, Las Vegas,
              Nevada, USA, December 2002.

   [I-D.bagnulo-shim6-addr-selection]
              Bagnulo, M., "Address selection in multihomed
              environments", draft-bagnulo-shim6-addr-selection-00 (work
              in progress), October 2005.

   [I-D.huitema-multi6-addr-selection]
              Huitema, C., "Address selection in multihomed
              environments", draft-huitema-multi6-addr-selection-00
              (work in progress), October 2004.

   [I-D.ietf-bfd-base]
              Katz, D. and D. Ward, "Bidirectional Forwarding
              Detection", draft-ietf-bfd-base-06 (work in progress),
              March 2007.

   [I-D.ietf-dna-cpl]
              Nordmark, E. and J. Choi, "DNA with unmodified routers:
              Prefix list based approach", draft-ietf-dna-cpl-02 (work
              in progress), January 2006.
   [I-D.ietf-dna-protocol]
              Kempf, J., "Detecting Network Attachment in IPv6 Networks
              (DNAv6)", draft-ietf-dna-protocol-06 (work in progress),
              June 2007.

   [I-D.ietf-hip-mm]
              Henderson, T., "End-Host Mobility and Multihoming with the
              Host Identity Protocol", draft-ietf-hip-mm-05 (work in
              progress), March 2007.

   [I-D.ietf-shim6-locator-pair-selection]
              Bagnulo, M., "Default Locator-pair selection algorithm for
              the SHIM6 protocol",
              draft-ietf-shim6-locator-pair-selection-02 (work in
              progress), July 2007.

   [I-D.ietf-shim6-proto]
              Bagnulo, M. and E. Nordmark, "Shim6: Level 3 Multihoming
              Shim Protocol for IPv6", draft-ietf-shim6-proto-09 (work
              in progress), November 2007.

   [I-D.ietf-shim6-reach-detect]
              Beijnum, I., "Shim6 Reachability Detection",
              draft-ietf-shim6-reach-detect-01 (work in progress),
              October 2005.

   [I-D.ietf-tcpm-icmp-attacks]
              Gont, F., "ICMP attacks against TCP",
              draft-ietf-tcpm-icmp-attacks-02 (work in progress),
              May 2007.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
              Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

Appendix A. Example Protocol Runs

   This appendix has examples of REAP protocol runs in typical
   scenarios.  We start with the simplest scenario of two hosts, A and
   B, that have a SHIM6 connection with each other but are not currently
   sending any data.
   As neither side sends anything, they also do not expect anything
   back, so there are no messages at all:

   EXAMPLE 1: No communications

   Peer A                                        Peer B
      |                                             |
      |                                             |
      |                                             |
      |                                             |
      |                                             |
      |                                             |
      |                                             |
      |                                             |

   Our second example involves an active connection with bidirectional
   payload packet flows.  Here the reception of data from the peer is
   taken as an indication of reachability, so again there are no extra
   packets:

   EXAMPLE 2: Bidirectional communications

   Peer A                                        Peer B
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |            payload packet                   |
      |<--------------------------------------------|
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |                                             |

   The third example is the first one that involves an actual REAP
   message.  Here the hosts communicate in just one direction, so REAP
   messages are needed to indicate to the peer that sends payload
   packets that its packets are getting through:

   EXAMPLE 3: Unidirectional communications

   Peer A                                        Peer B
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |            Keepalive id=p                   |
      |<--------------------------------------------|
      |                                             |
      |            payload packet                   |
      |-------------------------------------------->|
      |                                             |
      |                                             |

   The next example involves a failure scenario.  Here A has a single
   address A, and B has addresses B1 and B2.  The currently used address
   pairs are (A, B1) and (B1, A).
   All connections via B1 become broken, which leads to an exploration
   process:

   EXAMPLE 4: Failure scenario

   Peer A                                        Peer B
      |                                             |
   State:                                           | State:
   Operational                                      | Operational
      |            (A,B1) payload packet            |
      |-------------------------------------------->|
      |                                             |
      |            (B1,A) payload packet            |
      |<--------------------------------------------| At time T1
      |                                             | path A<->B1
      |            (A,B1) payload packet            | becomes
      |----------------------------------------/    | broken
      |                                             |
      |            (B1,A) payload packet            |
      |    /----------------------------------------|
      |                                             |
      |            (A,B1) payload packet            |
      |----------------------------------------/    |
      |                                             |
      |            (B1,A) payload packet            |
      |    /----------------------------------------|
      |                                             |
      |            (A,B1) payload packet            |
      |----------------------------------------/    |
      |                                             |
      |                                             | Send Timeout
      |                                             | seconds after
      |                                             | T1, B happens to
      |                                             | see the problem
      |            (B1,A) Probe id=p,               | first and sends a
      |            state=exploring                  | complaint that it
      |    /----------------------------------------| is not receiving
      |                                             | anything
      |                                             | State:
      |                                             | Exploring
      |                                             |
      |            (B2,A) Probe id=q,               |
      |            state=exploring                  | Probe p is lost;
      |<--------------------------------------------| the retransmission
      |                                             | uses another pair
   A realizes                                       |
   that it needs                                    |
   to start the                                     |
   exploration. It                                  |
   picks B2 as the                                  |
   most likely candidate,                           |
   as it appeared in the                            |
   Probe.                                           |
   State: InboundOk                                 |
      |                                             |
      |            (A,B2) Probe id=r,               |
      |            state=inboundok,                 |
      |            received probe q                 | This one gets
      |-------------------------------------------->| through.
      |                                             | State:
      |                                             | Operational
      |                                             |
      |                                             |
      |            (B2,A) Probe id=s,               |
      |            state=operational,               | B now knows
      |            received probe r                 | that A has no
      |<--------------------------------------------| problem receiving
      |                                             | its packets
   State: Operational                               |
      |                                             |
      |            (A,B2) payload packet            |
      |-------------------------------------------->| Payload packets
      |                                             | flow again
      |            (B2,A) payload packet            |
      |<--------------------------------------------|

   The next example shows a failure of the current locator pair that
   affects only the direction from A to B.  A has addresses A1 and A2,
   and B has addresses B1 and B2.  The current communication is between
   A1 and B1, but A's packets no longer reach B using this pair.

   EXAMPLE 5: One-way failure

   Peer A                                        Peer B
      |                                             |
   State:                                           | State:
   Operational                                      | Operational
      |                                             |
      |            (A1,B1) payload packet           |
      |-------------------------------------------->|
      |                                             |
      |            (B1,A1) payload packet           |
      |<--------------------------------------------|
      |                                             |
      |            (A1,B1) payload packet           | At time T1
      |----------------------------------------/    | path A1->B1
      |                                             | becomes
      |                                             | broken
      |            (B1,A1) payload packet           |
      |<--------------------------------------------|
      |                                             |
      |            (A1,B1) payload packet           |
      |----------------------------------------/    |
      |                                             |
      |            (B1,A1) payload packet           |
      |<--------------------------------------------|
      |                                             |
      |            (A1,B1) payload packet           |
      |----------------------------------------/    |
      |                                             |
      |                                             | Send Timeout
      |                                             | seconds after
      |                                             | T1, B notices
      |                                             | the problem and
      |            (B1,A1) Probe id=p,              | sends a complaint
      |            state=exploring                  | that it is not
      |<--------------------------------------------| receiving anything
   A responds.                                      | State: Exploring
   State: InboundOk                                 |
      |                                             |
      |            (A1,B1) Probe id=q,              |
      |            state=inboundok,                 |
      |            received probe p                 |
      |----------------------------------------/    | But A's response
      |                                             | is lost
      |            (B2,A2) Probe id=r,              |
      |            state=exploring                  | Next, B tries a
      |<--------------------------------------------| different locator
      |                                             | pair
      |            (A2,B2) Probe id=s,              |
      |            state=inboundok,                 |
      |            received probes p, r             | This one gets
      |-------------------------------------------->| through
      |                                             | State: Operational
      |                                             |
      |                                             | B now knows
      |                                             | that A has no
      |            (B2,A2) Probe id=t,              | problem receiving
      |            state=operational,               | its packets, and
      |            received probe s                 | that A's probe
      |<--------------------------------------------| gets to B.  It
      |                                             | sends a
   State: Operational                               | confirmation to A.
      |                                             |
      |            (A2,B2) payload packet           |
      |-------------------------------------------->| Payload packets
      |                                             | flow again
      |            (B1,A1) payload packet           |
      |<--------------------------------------------|

Appendix B. Contributors

   This draft attempts to summarize the thoughts and unpublished
   contributions of many people, including the MULTI6 WG design team
   members Marcelo Bagnulo Braun, Erik Nordmark, Geoff Huston, Kurtis
   Lindqvist, Margaret Wasserman, and Jukka Ylitalo, the MOBIKE WG
   contributors Pasi Eronen, Tero Kivinen, Francis Dupont, Spencer
   Dawkins, and James Kempf, and HIP WG contributors such as Pekka
   Nikander.  This draft is also indebted to work done in the context
   of SCTP [RFC4960] and the HIP multihoming and mobility extension
   [I-D.ietf-hip-mm].

Appendix C. Acknowledgements

   The authors would also like to thank Christian Huitema, Pekka Savola,
   John Loughney, Sam Xia, Hannes Tschofenig, Sebastian Barre, Thomas
   Henderson, Matthijs Mekking, Deguang Le, Eric Gray, Dan Romascanu,
   Stephen Kent, Alberto Garcia, Bernard Aboba, Lars Eggert, Dave Ward,
   and Tim Polk for interesting discussions in this problem space, and
   for review of this specification.

Authors' Addresses

   Jari Arkko
   Ericsson
   Jorvas 02420
   Finland

   Email: jari.arkko@ericsson.com


   Iljitsch van Beijnum
   IMDEA Networks
   Avda. del Mar Mediterraneo, 22
   Leganes, Madrid 28918
   Spain

   Email: iljitsch@muada.com


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.