idnits 2.17.1 draft-ietf-shim6-reach-detect-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 12. -- Found old boilerplate from RFC 3978, Section 5.5 on line 399. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 376. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 383. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 389. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line 405), which is fine, but *also* found old RFC 2026, Section 10.4C, paragraph 1 text on line 34. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 6) being 421 lines == It seems as if not all pages are separated by form feeds - found 1 form feeds but 6 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. 
(See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 110: '...e, a value of 10 for ShimKeepT MUST be...' RFC 2119 keyword, line 215: '...implementations SHOULD try, within rea...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (Jul 11, 2005) is 6858 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) No issues found here. Summary: 7 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 INTERNET-DRAFT Iljitsch van Beijnum 2 Jul 11, 2005 4 Shim6 Reachability Detection 5 draft-ietf-shim6-reach-detect-01.txt 7 Status of this Memo 9 By submitting this Internet-Draft, each author represents that any 10 applicable patent or other IPR claims of which he or she is aware 11 have been or will be disclosed, and any of which he or she becomes 12 aware will be disclosed, in accordance with Section 6 of BCP 79. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 This Internet Draft expires April 24, 2006. 32 Copyright Notice 34 Copyright (C) The Internet Society (2005). All Rights Reserved. 36 Abstract 38 The shim6 working group is developing a mechanism that allows 39 multihoming by using multiple addresses. When communication between 40 the initially chosen addresses for a transport session is no longer 41 possible, a "shim" layer makes it possible to switch to a different 42 set of addresses without breaking current transport protocol 43 assumptions. This draft discusses the issues of detecting failures 44 in a currently used address pair between two hosts and picking a 45 new address pair to be used when a failure occurs. The inputs for 46 these processes are ordered lists of local and remote addresses 47 that are reasonably likely to work. 
(I.e., they do not include addresses 48 that are known to be unreachable for local reasons.) These lists 49 must be available at both ends of the communication, although the 50 ordering may differ. Building these address lists from locally 51 available information and synchronizing them with the remote end 52 are outside the scope of this document. 54 This text is for the most part based on discussions on the multi6 55 list, several multi6 design team lists and the shim6 list, with 56 notable contributions from Erik Nordmark, Marcelo Bagnulo and Jari 57 Arkko. Suggestions and additions are more than welcome. 59 1 Introduction 61 A naive implementation of an (un)reachability detection mechanism 62 could just probe all possible paths between two hosts periodically. 63 A "path" is defined as a combination of a source address for host A 64 and a destination address for host B. In hop-by-hop forwarding the 65 source address doesn't have any effect on reachability, but in the 66 presence of filters or source address based routing, it may. And 67 although links almost always work in two directions, routing 68 protocols and filters only work in one direction, so unidirectional 69 reachability can happen. Without additional mechanisms, the 70 practice of ingress filtering by ISPs makes unidirectional 71 connectivity likely. While being able to use the working leg in a 72 unidirectional path is useful, it's not an essential requirement. 73 It is essential, however, to avoid assuming bidirectional 74 connectivity when there is in fact a unidirectional failure. 76 Exploring the full set of communication options between two hosts 77 that both have two or more addresses is an expensive operation as 78 the number of combinations to be explored increases very quickly 79 with the number of addresses. For instance, with two addresses on 80 both sides, there are four possible address pairs. 
Since we can't 81 assume that reachability in one direction automatically means 82 reachability for the complement pair in the other direction, the 83 total number of two-way combinations is eight. (Combinations = nA * 84 nB * 2.) 86 An important observation in multihoming is that failures are 87 relatively infrequent, so that a path that worked a few seconds ago 88 is very likely to work now as well. So it makes sense to have a 89 light-weight protocol that confirms existing reachability, and only 90 invoke the much heavier protocol that can determine full 91 reachability when there is a suspected failure. 93 2 Determining reachability for the current pair 95 Reachability for the currently used address pair in a shim context 96 is determined by making sure that whenever there is data traffic in 97 one direction, there is also traffic in the other direction. This 98 can be data traffic as well, but also transport layer 99 acknowledgments or a shim reachability keepalive if there is no 100 other traffic. This way, it is no longer possible to have traffic 101 in only one direction. Whenever there is data traffic going out 102 but there are no return packets, there must be a failure, and the 103 full path exploration mechanism is started. 105 A more detailed description of the current pair reachability 106 evaluation mechanism: 108 1. The base timing unit for this mechanism is named ShimKeepT. 109 Until a negotiation mechanism to negotiate different values for 110 ShimKeepT becomes available, a value of 10 for ShimKeepT MUST be 111 used. 113 2. Whenever outgoing packets are generated that are part of a shim 114 context, one of two timestamps belonging to the shim context is 115 updated: the timestamp for outgoing data packets, or the timestamp 116 for outgoing non-data packets. The difference between the two is 117 that data packets are packets that should generate return traffic. 
118 The host should use the information available to it to determine 119 whether a packet is a data or a non-data packet. Examples of 120 non-data packets are TCP ACKs and shim keepalive packets. If there 121 is any doubt, a packet should be considered a data packet. 123 3. Whenever incoming packets are received that are part of a shim 124 context, one of two timestamps belonging to the shim context is 125 updated: the timestamp for incoming data packets, or the timestamp 126 for incoming non-data packets. For incoming packets, it's less 127 critical that packets are labeled as data or non-data correctly. In 128 the absence of better information, hosts may assume that any IPv6 129 packet with a payload length field with a value of 20 or lower is a 130 non-data packet. 132 4. ShimKeepT seconds after the last data packet has been received 133 for a context, and if no other packet has been sent within this 134 context since that data packet was received, a shim keepalive 135 packet is generated for the context in question and transmitted to 136 the correspondent. The shim keepalive packet consists of an IPv6 137 header and a shim header containing the context tag, but no 138 subsequent headers. Intermediate headers may be present between the 139 IPv6 and shim headers. A host may send the shim keepalive after 140 fewer than ShimKeepT seconds if implementation considerations 141 warrant this. The average time after which shim keepalives are sent 142 must be at least ShimKeepT / 2 seconds. After potentially sending a 143 single shim keepalive, no additional shim keepalives are sent until 144 a data packet is received within this shim context. If the shim 145 keepalive wasn't sent because a data or non-data packet was sent 146 since the last received data packet, no shim keepalives are sent. 148 5. 
When no packets have been received from the correspondent within 149 this context for a timeout period after the last transmission of a 150 data packet, a full reachability exploration is started. The 151 timeout period is ShimKeepT seconds plus additional time to 152 accommodate a round trip and regular variations in 153 network-related functions. In the absence of better information, a 154 timeout of at least ShimKeepT + 2 seconds but no more than 155 ShimKeepT + 5 seconds is recommended. 157 3 Address pair exploration 159 In its essence, address pair exploration is very simple: just send 160 probes using every possible address pair, wait for something to 161 come back and possibly consider the round trip time. In practice, 162 testing the full combination of all source addresses and all 163 destination addresses is very undesirable because of the large 164 number of packets involved. This can be especially harmful when a 165 lot of hosts on a link start doing this for many of their 166 correspondents at the same time when there is a failure further 167 upstream. 169 In order to arrive at a desired outcome more quickly and with fewer 170 packets, and also to accommodate traffic engineering needs, we'll 171 assume a model where each address (source or destination) has two 172 preference values: p1 and p2. Addresses within the same set (source 173 or destination) are ranked by their p1 value, where a higher p1 174 means that the address is more preferred. When there are multiple 175 addresses with the same p1 value, an address is selected at random 176 from the group with the same p1 value, where the likelihood of 177 selecting any given address is relative to its p2 value compared to 178 the sum of all p2 values. So if addresses A, B and C have the same 179 p1 value and p2 values of 10, 30 and 60 for a total of 100, the 180 chance that A is selected is 10%, the chance that B is selected is 181 30% and the chance that C is selected is 60%. 
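The p1/p2 selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the draft: the Address class, the function names select_address and selection_probabilities, and the requirement that p2 values are positive integers are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical record for one candidate address (not defined by the draft)."""
    name: str
    p1: int  # rank: a higher p1 means the address is more preferred
    p2: int  # weight used only among addresses that share the same p1


def preferred_group(addresses):
    """Return the addresses that share the highest p1 value."""
    if not addresses:
        raise ValueError("no candidate addresses")
    best_p1 = max(a.p1 for a in addresses)
    return [a for a in addresses if a.p1 == best_p1]


def selection_probabilities(addresses):
    """Chance of each address in the top p1 group: p2 / sum of p2 in the group."""
    group = preferred_group(addresses)
    total = sum(a.p2 for a in group)
    return {a.name: a.p2 / total for a in group}


def select_address(addresses, rng=random):
    """Pick one address: highest p1 first, then at random weighted by p2.

    Assumes every p2 in the top group is positive.
    """
    group = preferred_group(addresses)
    weights = [a.p2 for a in group]
    return rng.choices(group, weights=weights, k=1)[0]


if __name__ == "__main__":
    # The example from the text (A, B, C share a p1), plus a lower-ranked D.
    candidates = [
        Address("A", p1=5, p2=10),
        Address("B", p1=5, p2=30),
        Address("C", p1=5, p2=60),
        Address("D", p1=1, p2=100),
    ]
    print(selection_probabilities(candidates))
    print(select_address(candidates).name)
```

Address D is never selected here because its p1 is lower than that of A, B and C; its large p2 value only matters among addresses with the same p1.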
183 Note that preference information may be related to type of service. 184 So different contexts with different type of service requirements 185 may see different p1 and p2 values for a given address. 187 When a host suspects that there is a failure for a context, it 188 gathers the set of possible source addresses and the set of 189 possible destination addresses. Both sets are ordered such that 190 each next address has an equal or lower p1 value. Addresses with 191 the same p1 value are further ordered as per any heuristics that 192 the host may employ, such as longest prefix matches on known 193 working and/or known not working addresses along with the p2 value. 194 The p2 value is considered relatively weak, and breaking p2 195 ordering is allowed if there is a sufficient reason for this. 196 However, in the absence of other information, p2 ordering should be 197 used. P1 ordering overrules any other information except a recent 198 reachability failure for the address in question. In addition to 199 this, the most recently used address is put in front of the list. 201 From the lists of eligible source and destination addresses, the 202 host creates a list of source/destination address pairs, along with 203 a combined preference value for each address pair. The calculation 204 of the preference value is implementation specific, with the only 205 requirement being that when one address pair has a higher p1 for 206 both the source and destination address than another pair, the pair 207 with the higher p1 values also has a higher combined pair 208 preference value. 210 The lists of address pairs from different contexts are combined into 211 a host-wide list of address pairs. The preference values are 212 updated to take into consideration the number of contexts that are 213 interested in the pair. 
The specifics of calculating the resulting 214 host-wide preference value are left up to the implementation, but 215 implementations SHOULD try, within reason, to avoid using address 216 pairs with lower p1 values when pairs with higher p1 values are 217 available for a context. Context-specific address pair preferences 218 may be normalized prior to calculating host-wide address pair 219 preference values. (So when context A has pairs P and Q with p1 220 values 10 and 1, while context B has pairs R and S with p1 values 7 221 and 4, the values for P and R are changed to 2 and the values for Q 222 and S to 1.) 224 The host now starts probing address pairs, in order from the pair 225 with the highest pair preference to the pair with the lowest pair 226 preference. When all address pairs have been tested, testing 227 restarts from the pair with the highest preference. New pairs that 228 become available are put in the list before pairs that have been 229 probed already, regardless of the preference values. However, both 230 the group of address pairs that haven't been probed and the group 231 of address pairs that have may be reordered to reflect the 232 preference values, as long as reordering is done such that 233 starvation doesn't occur. 235 When a probe is answered by the correspondent, the contexts that use 236 the address pair in question are informed so they can start 237 remapping addresses in outgoing packets to the pair in question. (All 238 of this also happens when there is a working pair but an address 239 pair with at least one address with a higher preference is 240 determined to work.) At this point, each context updates its list of 241 address pairs to probe by removing all pairs where either the 242 source address has a lower p1 value than the p1 value of the now 243 working source address, or the destination address has a lower p1 244 value than the p1 value of the now working destination address. 
245 Additionally, all address pairs where the p1 values for the source 246 and destination addresses match the respective p1 values of the 247 source and destination addresses in the now working pair are 248 removed from the list. The host-wide list of address pairs to probe 249 is updated to reflect the removal of lower or equal priority 250 addresses, so probing will only continue for pairs where at least 251 one address has a higher p1 than the currently working pair. 253 The time between probes (ShimProbeT) must be chosen such that the 254 number of probes is limited to 60 per 300 second period. When no 255 probes have been sent for some time, an implementation may send the 256 initial group of probes at a fairly aggressive rate. For instance, 257 when no probes have been sent for 60 seconds, a host may send a 258 second probe 200 ms after the first one, and increase the 259 ShimProbeT by a factor of 1.25 after every probe, until ShimProbeT 260 reaches 5 seconds. This results in sending 5 probes in the first 2 261 seconds and 14 probes within the first 20 seconds after a 262 failure (not counting the initial probe). After that, there is one probe every 5 seconds. 264 When a context didn't see any outgoing data packets (see section 2) 265 for four minutes, it removes all its address pairs from the 266 host-wide list of address pairs. 268 4 Address pair exploration packet format 270 The address pair exploration packet may be encapsulated in 271 different ways. An obvious way is inside a shim header. The address 272 pair exploration packet contains the following information: 274 - A type field that is at least 8 bits long 275 - An 8 bit "number of probes sent" field 276 - An 8 bit "number of probes received" field 277 - An 8 bit "options length" field 278 - One or more sent probes (see below) 279 - Zero or more received probes (see below) 280 - Zero or more bytes of option data 282 There is currently one bit in the type field defined: the reply 283 requested bit. 
If this bit is set, the other side should send a 284 probe in reply to this probe. 286 The option data contains zero or more options in the following 287 format: 289 - An 8 bit option type 290 - An 8 bit option length 291 - Zero or more bytes of data in this option 293 Sent and received probes contain data in the following format: 295 - Source locator/address (128 bits) 296 - Destination locator/address (128 bits) 297 - Sent timestamp (32 bits in ms resolution relative to private epoch) 298 - Time between reception and retransmission (32 bits in ms resolution, 299 0 on first transmission) 300 - Nonce (32 bits) 301 - Sequence number (32 bits) 303 The first and only mandatory sent probe structure contains the 304 addresses that are present in the current IPv6 packet along with a 305 timestamp for the current time. Additional probe structures contain 306 copies of earlier probes, presumably toward different addresses, 307 with the appropriate field indicating how long ago the probe in 308 question was sent. The received probes are copies of the last seen 309 probes from the other side. 311 Note that an application must be able to infer which addresses 312 belong to the same host in order to perform this probing correctly. 314 5 NAT and firewall considerations 316 Since shim6 is chartered for IPv6 solutions only, and NAT 317 compatibility is not expected and, by most people, not desired in 318 IPv6, there is no requirement for this protocol to pass through 319 Network Address Translation devices. However, the protocol may be 320 applicable outside shim6, making NAT compatibility desirable. 322 It is absolutely essential that the shim6 negotiations and the 323 reachability detection packets are passed through filters or 324 firewalls wherever application packets are passed through. If the 325 shim6 negotiation and reachability detection packets are filtered 326 out, shim6 can't be used. 
328 A more complex situation arises when the shim6 negotiation packets 329 pass through a firewall, but the reachability detection packets are 330 blocked. To avoid this complexity, it's highly desirable to make 331 the shim6 negotiation and reachability detection part of the same 332 protocol, so either both are allowed through or both are blocked. 333 However, the same is true if this reachability detection mechanism 334 is used in other protocols. This makes it desirable to define the 335 reachability detection protocol such that it can be embedded in 336 other protocols. 338 Since firewalls are in wide use, it's important to consider whether 339 a new protocol will be able to pass through most firewalls without 340 requiring changes to the filter configuration. On the other hand, 341 it may not be possible to come up with a protocol that would be 342 allowed through a large percentage of all firewalls without 343 changes, so extra effort in this area may produce limited results. 344 Also, in the long run firewall configuration will presumably be 345 changed, so any compromises would only have short term benefits but 346 long term downsides. 348 6 Security considerations 350 To avoid exposing information (even if it's just the fact that an 351 address is reachable), hosts will probably want to limit themselves 352 to taking part in reachability detection with known correspondents. 353 This means that there must be identifying information and a nonce 354 that is at least hard to guess but easy to check in all 355 reachability detection packets. 357 7 Document and author information 359 This document expires April, 2006. The latest version will always 360 be available at http://www.muada.com/drafts/. 
Comments are welcome 361 at: 363 Iljitsch van Beijnum 365 Email: iljitsch@muada.com 367 Intellectual Property Statement 369 The IETF takes no position regarding the validity or scope of any 370 Intellectual Property Rights or other rights that might be claimed to 371 pertain to the implementation or use of the technology described in 372 this document or the extent to which any license under such rights 373 might or might not be available; nor does it represent that it has 374 made any independent effort to identify any such rights. Information 375 on the procedures with respect to rights in RFC documents can be 376 found in BCP 78 and BCP 79. 378 Copies of IPR disclosures made to the IETF Secretariat and any 379 assurances of licenses to be made available, or the result of an 380 attempt made to obtain a general license or permission for the use of 381 such proprietary rights by implementers or users of this 382 specification can be obtained from the IETF on-line IPR repository at 383 http://www.ietf.org/ipr. 385 The IETF invites any interested party to bring to its attention any 386 copyrights, patents or patent applications, or other proprietary 387 rights that may cover technology that may be required to implement 388 this standard. Please address the information to the IETF at 389 ietf-ipr@ietf.org. 391 Disclaimer of Validity 393 This document and the information contained herein are provided on an 394 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 395 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 396 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 397 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 398 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 399 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 401 Copyright Statement 403 Copyright (C) The Internet Society (2005). 
This document is subject 404 to the rights, licenses and restrictions contained in BCP 78, and 405 except as set forth therein, the authors retain all their rights. 407 Acknowledgment 409 Funding for the RFC Editor function is currently provided by the 410 Internet Society.