idnits 2.17.1 draft-worley-sip-happy-earballs-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC6555], [RFC3263]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 682 has weird spacing: '...ordered uno...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (March 2, 2017) is 2605 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC4213' is defined on line 1973, but no explicit reference was found in the text ** Obsolete normative reference: RFC 6555 (Obsoleted by RFC 8305) -- Obsolete informational reference (is this intentional?): RFC 5245 (Obsoleted by RFC 8445, RFC 8839) Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SIPCORE D. Worley 3 Internet-Draft Ariadne 4 Intended status: Standards Track March 2, 2017 5 Expires: September 3, 2017 7 Happy Earballs: Success with Dual-Stack SIP 8 draft-worley-sip-happy-earballs-01 10 Abstract 12 TBD: The Session Initiation Protocol (SIP) supports multiple 13 transports running both over IPv4 and IPv6 protocols. In more and 14 more cases, a SIP user agent (UA) is connected to network interfaces 15 with multiple address families. In these cases sending a message 16 from a dual stack client to a dual stack server may suffer from the 17 issues described in [RFC6555] ("Happy Eyeballs"): the UA attempts to 18 send the message using IPv6, but IPv6 connectivity is not working to 19 the server. This can cause significant delays in the process of 20 sending the message to the server. This negatively affects the 21 user's experience. 23 TBD: This document builds on [RFC6555] by modifying the procedures 24 specified in [RFC3263] and related specifications to require that a 25 client ensure that communication targets are accessible before 26 sending messages to them, to allow a client to contact targets out of 27 the order required by other specifications, and to require a client 28 to properly distribute the message load among targets over time. 30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on September 3, 2017. 47 Copyright Notice 49 Copyright (c) 2017 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document. Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 65 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 66 3. Structure of This Document . . . . . . . . . . . . . . . . . 8 67 3.1. Scope of Applicability . . . . . . . . . . . . . . . . . 9 68 4. Baseline Procedures . . . . . . . . . . . . . . . . . . . . . 10 69 4.1. Target Ordering . . . . . . . . . . . . . . . . . . . . . 12 70 4.2. Priority Nodes . . . . . . . . . . . . . . . . . . . . . 13 71 4.3. Unordered Nodes . . . . . . . . . . . . . . . . . . . . . 14 72 4.4. Combinations of Prioritization and Unordered Nodes . . . 14 73 4.5. Load-Balancing Nodes . . . . . . . . . . . . . . . . . . 17 74 5. Primary Procedure Modifications . . . . . . . . . . . . . . . 18 75 5.1. Permitted to Reorder Targets . . . . . . . . . . . . . . 18 76 5.2. Must Preserve Traffic Distribution . . . . . . . . . . . 18 77 6. Additional Requirements . . . . . . . . . . . . . . . . . . . 19 78 6.1. Address Family Preference . . . . . . . . . . . . . . . . 19 79 6.2. Address Selection . . . . . . . . . . . . . . . . . . . . 20 80 6.3. Vias . . . . . . . . . . . . . . . . . . . . . . . . . . 21 81 6.4. DNS Caching . . . . . . . . . . . . . . . . . . . . . . . 21 82 6.5. Unused Flows . . . . . . . . . . . . . . . . . . . . . . 21 83 6.6. Debugging and Troubleshooting . . . . . . . . . . . . . . 22 84 7. Consequences of the New Requirements . . . . . . . . . . . . 22 85 8. Generic Algorithm . . . . . . . . . . . . . . . . . . . . . . 26 86 8.1. Policies and Implementor Latitude . . . . . . . . . . . . 27 87 8.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . 28 88 8.2.1. Two Unordered Targets, Both Cached . . . . . . . . . 28 89 8.2.2. Two Unordered Targets, Both Cached, One Slow . . . . 29 90 8.2.3. Two Unordered Targets, One Cached . . . . . . . . . . 30 91 8.2.4. Two Unordered Targets, Neither Cached . . . . . . . . 31 92 8.2.5. Three Unordered Targets, One Cached . . . . . . . . . 31 93 8.2.6. Two Prioritized Targets, Both Cached . . . . . . . . 32 94 8.2.7. Two Prioritized Targets, the Second Cached . . . . . 33 95 8.2.8. Two Prioritized Targets, the First Cached . . . . . . 34 96 8.2.9. Two Prioritized Targets, Neither Cached . . . . . . . 34 97 8.2.10. Three Targets . . . . . . . . . . . . . . . . . . . . 35 98 9. Using a Simplified Data Structure . . . . . . . . . . . . . . 36 99 10. Security Considerations . . . . . . . . . . . . . . . . . . . 39 100 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 101 12. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 102 12.1. Changes from draft-worley-sip-happy-earballs-00 to 103 draft-worley-sip-happy-earballs-01 . . . . . . . . . . . 40 104 12.2. Changes from draft-worley-sip-he-connection-01 to draft- 105 worley-sip-happy-earballs-00 . . . . . . . . . . . . . . 40 106 12.3. Changes from draft-worley-sip-he-connection-00 to draft- 107 worley-sip-he-connection-01 . . . . . . . . . . . . . . 40 108 12.4. Changes from draft-johansson-sip-he-connection-01 to 109 draft-worley-sip-he-connection-00 . . . . . . . . . . . 40 110 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 41 111 13.1. Normative References . . . . . . . . . . . . . . . . . . 41 112 13.2. Informative References . . . . . . . . . . . . . . . . . 42 113 Appendix A. Implementing Load Balancing . . . . . . . . . . . . 43 114 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 43 115 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 44 117 1. Introduction 119 Earballs -- n., another word for ears. 120 Made famous by the animated American TV 121 spy comedy, "Archer". 123 "Ow, my earballs!" -- Cheryl Tunt, "Archer" 125 -- from "Urban Dictionary" 127 The Session Initiation Protocol (SIP) [RFC3261] and the documents 128 that extended it provide support for both IPv4 and IPv6. However, 129 this support has problems with environments that are characteristic 130 of the transitional migratory phase from IPv4 to IPv6 networks: 131 During this phase, many server and client implementations run on 132 dual-stack hosts. In such environments, a dual-stack host will 133 likely suffer greater connection delay, and by extension an inferior 134 user experience, than an IPv4-only host. 136 The difficulty stems from the reality that a device cannot predict 137 whether apparent IPv6 connectivity to another device is usable; both 138 devices may have IPv6 addresses and yet some transit network between 139 the two may not transport IPv6. SIP requires a device that transmits 140 a request to one destination address (e.g., the apparently useful 141 IPv6 address) to wait for a response for a substantial period 142 (usually 32 seconds) before transmitting the request to another 143 destination address (the less-preferred IPv4 address). The result is 144 that apparent IPv6 connectivity that is not functional can cause 145 substantial delays in processing SIP requests. Especially when the 146 requests are call setups (INVITE requests) this creates a very poor 147 user experience. 149 The overall procedure for transmitting a SIP message is as follows: 150 The specification of where a message is to be sent is called a 151 "goal"; in the common case, the goal is the host-port part of a SIP 152 URI. The SIP client uses specified algorithms (particularly 153 [RFC3263]) to make DNS lookups to determine a prioritized set of 154 "targets" (basically, destination IP address family, IP address, 155 transport protocol, and port). Under these specifications, the 156 client contacts the targets in order until one is contacted 157 successfully. In order to contact a target, the client establishes a 158 transport connection (if necessary), sends the message using the 159 transport (possibly resending the message several times), and then 160 (for requests) waits for a response (either provisional or final). 161 The process ends successfully if the client receives a response. The 162 process ends unsuccessfully if the client receives a permanent error 163 from the transport layer or if a SIP timer (Timer B or Timer F in 164 [RFC3261]) expires. Timeouts generally default to 32 seconds. If 165 the user has to wait for even one timeout, this will seriously 166 degrade the user experience. Thus, it is desirable to minimize the 167 number of times the client has timeouts when sending requests. 169 If the target list contains both IPv6 addresses and IPv4 addresses, 170 this procedure can degrade the user's experience in common 171 situations. Typically, this problem arises when the client has an 172 IPv6 interface, the server's preferred address is an IPv6 address, 173 but the transit networks between the client and server do not carry 174 IPv6. This can cause the client to attempt to send a SIP request to 175 the IPv6 target and then wait for a timeout before it continues with 176 an IPv4 target. This problem parallels a problem that was widely 177 seen in web browsers. The web browser problem was cured by 178 specifying that web browsers should use a "Happy Eyeballs" algorithm 179 [RFC6555] to determine the order in which to contact target 180 addresses. 182 This document builds on the concepts of [RFC6555] by amending these 183 procedures, so that targets derived from a goal may be contacted in a 184 different order than is specified by the previous specifications. By 185 analogy with the name "Happy Eyeballs" for similar algorithms in web 186 browsers, we label these algorithms "Happy Earballs" [UD]. 188 Conceptually, the Happy Earballs algorithm is composed of a number of 189 interlocking components: 191 a. For a given message, the set of targets is organized into a 192 directed acyclic graph (or partial order) specifying which 193 targets must be contacted before other targets. 195 b. Targets which are unresponsive, or which respond too slowly 196 relative to other targets in the set, are moved to the end of the 197 order. 199 c. Despite this change, the relative proportions of traffic sent to 200 responsive targets remains as specified by the baseline 201 specifications. 203 d. The client caches the known response times of targets. 205 e. The client does not send the message to a target before knowing 206 the target's response time (unless this is the last target in the 207 set). If needed, the client sends a "probe" transmission (which 208 has no semantic effect) to determine the target's response time. 210 These components are assembled into an event-driven algorithm for 211 sending the message to targets and, when necessary, sending probes to 212 targets. Implementors have flexibility in choosing among targets 213 that are not prioritized, and choosing when to send probes to 214 targets. Implementors also have flexibility defining when one target 215 is considered "too slow" compared to another target. (TBD: Is that 216 true?) Nonetheless, all variants of this algorithm have the 217 properties of distributing traffic among responsive targets as 218 specified by the baseline specifications and not sending messages to 219 non-responsive targets (unless all responsive targets have failed). 221 This document does not address the similar problem for media (RTP and 222 RTCP) packets. Problems with media connectivity can be addressed 223 using ICE [RFC5245], ALTC [RFC6947], or probing using RTP packets. 225 2. Terminology 227 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 228 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 229 document are to be interpreted as described in RFC 2119 [RFC2119]. 231 baseline: the prior specifications of the behavior of a client for 232 sending a message to a goal. The baseline specifications are 233 modified by this document. 235 cache: (verb) to store temporarily information regarding the 236 response time of a target so as to accelerate future message 237 transmission; (noun) the collection of information so stored 239 client: the device that must send a message 241 CRTT(T): for a target T, the cached round-trip response time of T 242 (if any is cached). There is a special value of "infinity" if T 243 did not respond at all. Collectively, these values are called 244 "CRTT() values". Contrasted with RTT(T), which is the actual 245 round-trip response time (at some given moment). 247 dual-stack: narrowly, a device with both IPv4 and IPv6 interfaces; 248 broadly, a device with more than one interface, including those 249 with interfaces that use two or more address families 251 flow: a group of transmissions to a target which are considered 252 related by a transport protocol. For connection-oriented 253 protocols, is the data carried by a connection. For 254 connectionless protocols, is all messages sent to a particular 255 target (5-tuple). For protocols with security associations, is 256 all messages sent within a particular security association. 258 goal: the identification of a particular server. E.g., a URI, a 259 TSAP, a via-para, or information provided by the context of the 260 message. 262 graph: a directed, acyclic graph whose nodes are a set of targets 263 and whose arcs are requirements that one target must be sent to 264 before another target is sent to. (Mathematically, the graph 265 defines a partial order on the targets.) Graph arcs are drawn 266 without arrowheads, and are to be read as the leftmost endpoint 267 must be sent to before the rightmost endpoint. A graph is always 268 relative to the sending of a particular message and may be 269 modified during the processing of the message. A graph may be 270 augmented with dummy nodes that do not represent targets in order 271 to reduce the number of arcs that must be drawn to express the 272 requirements. (Abstractly, dummy nodes may be considered targets 273 to which sends immediately fail.) 275 initial: a target which has no target prioritized before it 276 (considered relative to all targets in the target set, or some 277 subset or rearrangement of the target set, depending on the 278 context) 280 Limit(t): a function converting one time value into another. If 281 RTT(T1) > Limit(RTT(T0)), then target T1 responds "too slowly" 282 relative to the response time of target T0, and T1 is considered 283 non-responsive. Depends on two parameters, "m" and "f". 284 Limit(infinity) is considered to be infinity. 286 normal: a target which is not slow (relative to a particular goal) 287 NSAP: "Network Service Access Point", the identification of a 288 network interface, which comprises an address family and an 289 address 291 priority: the situation where the client must send the message to a 292 first target before it may send the message to a second target. 293 The prioritization of a set of targets necessarily forms a partial 294 order and can be represented by a directed acyclic graph. 296 probe: a transport operation that attempts to determine if a target 297 is responsive, without transmitting a message. Since a probe does 298 not send a message, if the transmission fails, it does not commit 299 the client to waiting a timeout period before sending the message 300 to another target. 302 quick: a target whose cached RTT() is less than Limit(0), and thus 303 is never slow for any goal 305 RTT(T): for a target T, the actual round-trip response time of T. 306 There is a special value of "infinity" if T does not respond at 307 all. Collectively, these values are called "RTT() values". 308 Contrasted with CRTT(T), which is the RTT(T) measurement recorded 309 in the cache. 311 responsive: a property of a target T0 relative to the set of targets 312 for a goal: the response time of the target T0 is sufficiently 313 short when compared with the response time of other targets; 314 RTT(T0) <= RTT(T1) for any T1 in the set of targets. See TBD 315 Section 5 for discussion. 317 send: (verb and noun) to attempt transmission of a (possibly 318 modified copy of) a message to a target. Contrasted with 319 "successfully send", which is when the message is received by the 320 server or when the client detects that the message is received by 321 the server. Does not include "probe" transport operations. 323 server: the (conceptual) device to which a message is to be sent. 324 May consist of multiple physical devices listening on multiple 325 network interfaces. 327 slow: a target T0 which appears to be "too slow" based on the cached 328 RTT() information, i.e., there is another target T1 of the goal 329 for which RTT(T0) > Limit(RTT(T1)) using the cached RTT() values. 330 This includes the case where RTT(T0) is infinity, i.e., T0 does 331 not respond at all. Opposite of "normal". 333 target: the complete specification of a transport to be used to send 334 a message from the client to the server. A target is commonly 335 conceptualized as "protocol/address/port" (which is a TSAP), but 336 the target also includes the TSAP that will be used as the source 337 of the communication, and so "5-tuple" is more accurate. In many 338 cases, the source TSAP is determined by the destination TSAP, so 339 it is not mentioned. Perforce, the transport protocol and address 340 family of the source and destination TSAPs are the same. 342 timeout: (noun) the period of time after sending a message which a 343 client must wait before it is permitted to send the message to 344 another target without receiving positive indication of the 345 failure of the first transmission. For sending requests, either 346 Timer B or Timer F [RFC3261]. Refers either to the abstract time 347 interval or a particular instance after a particular send 348 operation. (verb) the event when the timeout period has expired. 350 TSAP: "Transport Service Access Point", the identification of an 351 endpoint of a transport flow, which usually comprises a transport 352 protocol, an NSAP (or network address), and a port number 354 traffic: the messages for a particular goal that are successfully 355 sent to a particular target or set of targets; the number of such 356 messages sent over a period of time; or the fraction of such 357 messages relative to all messages for the goal 359 3. Structure of This Document 361 This document assumes that the context of the message provides a 362 "goal", which is the specification of the device or collection of 363 devices which are the server, and that there are existing "baseline" 364 specifications which translate the goal into a set of "targets", and 365 in what order(s) the client may send (possibly modified copies of) 366 the message to the targets, until one of the send operations is 367 successful. 369 Section 4 introduces a model of how the baseline specifications 370 operate: in every instance, they produce a partial order or directed 371 acyclic graph of targets specifying which targets must be sent to 372 before sending to which other targets. This section also describes 373 how "load-balanced" targets (SRV records with equal priority) are be 374 handled within this model. 376 Section 5describes the major modifications of the procedures for how 377 a client sends a message to a server. 379 This document relaxes the requirements on the client regarding the 380 order(s) in which the message may be sent to the targets, that is, it 381 permits additional orders, so that the client is less likely to have 382 to wait for a timeout. On the whole, when network connectivity is 383 imperfect, this allows clients to transmit the messages to servers 384 more quickly than they would using the unmodified baseline 385 specifications. 387 However, this document also places additional restrictions on the 388 client's sending behavior to ensure that the overall traffic 389 distribution among the targets converges over time to the 390 distribution that would have resulted from obeying the baseline 391 specifications. 393 Section 6 requires certain additional behaviors that ensure that the 394 use of IPv6 is not disadvantaged in mixed IPv4/IPv6 networks as well 395 as a number of miscellaneous requirements to optimize the behavior of 396 clients. 398 Section 7 discusses some consequences of the new requirements, 399 including what new orders of targets are permitted, what behaviors 400 minimize the time needed to successfully send a message, techniques 401 for probing a target (that is, determining if it is responsive 402 without sending the message, and thus possibly committing to waiting 403 for a timeout period), and suitable approaches for caching 404 information about targets. 406 These consequences, when assembled with the overarching goal of 407 minimizing the time to successfully send the message, produce a 408 generic algorithm, which is succinctly described in event-driven 409 terms in Section 8. 411 Section 8.2 applies the generic algorithm to a number of simple 412 cases, including how it replicates "Happy Eyeballs" behaviors in 413 analogous situations. 415 Section 9 constructs a simplified algorithm (which is a subset of the 416 generic algorithm) that replaces the directed acyclic graph with a 417 list of lists. 419 3.1. Scope of Applicability 421 This document modifies any SIP target selection processes that are 422 defined now or may be defined in the future, excepting those that 423 explicitly exempt themselves. 425 This document does not affect communications specified to be carried 426 only by a single WebSocket transport, as in those contexts there is 427 only one transport target (the WebSocket connection), and hence the 428 target selection process is trivial. 430 Similarly, this document does not affect sending a message when it is 431 specified to be carried by a particular existing connection, as when 432 a client sends a response to a request that it received via a 433 connection-oriented protocol, and that connection still exists. 435 A client MUST NOT consider the set of the target URIs of a "forking" 436 operation to be a single goal to which the processes of this document 437 apply. Instead, the modifications MUST be applied to each of those 438 URIs as separate goals. This is because the decision of whether to 439 send a request to a later forking target may be affected by the SIP 440 response to an earlier transmission ([RFC3261] section 16). A 441 forking proxy may, as part of its policy, apply some or all of these 442 procedures to a forking operation. This may be useful when sending a 443 request to a UAS that has registered multiple addresses for the same 444 AoR (as may be discerned by examining the +sip.instance media feature 445 tags in the registrations [RFC5626]). However, any such processing 446 is outside the scope of this document. 448 4. Baseline Procedures 450 The situation that this document addresses is when a SIP device is 451 required to send a message (which may be either a request or a 452 response). This document uses the term "client" for the device which 453 must send the message. The client is given a "goal", which is the 454 specification of the "server", which is the (possibly composite) 455 device to which the message is to be sent. (Both of these usages are 456 broader than the usage in [RFC3261].) The purpose of client is to 457 successfully send the message to the server. 459 (Note that in the case of a request, when the message is sent to a 460 target, a Via header field will be added to the message, and that the 461 added Via header field will be different for each target. If the 462 client is a UAC, it may effectively use the SIP signaling path as a 463 connectivity check for the media path by varying the SDP between 464 requests sent to different targets. This document considers all of 465 these versions of the message to be copies of the original message to 466 be sent.) 468 If the message is a request, the goal is usually the hostport part of 469 the URI in either the first Route header field or the request-line. 470 If the message is a response, the goal can be specified by the first 471 via-param in the first Via header field. If the message is to be 472 sent to an outbound proxy as specified by a DHCP option ([RFC3319] or 473 [RFC3361]), then the goal is the ordered list of addresses or domain 474 names provided by the DHCP option. In other situations, the goal may 475 be specified by other means. 477 Baseline specifications (e.g., [RFC3263], [RFC3319], [RFC3361], 478 [RFC6724], [RFC7984]) prescribe the construction of a set "targets" 479 which are potential transport destinations to which the message can 480 be sent. Which specifications apply is determined by the context of 481 the message. Targets are commonly conceptualized as 482 protocol/address/port combinations, but in general they are the pairs 483 of source and destination TSAPs that provide the full specification 484 of a transport flow. 486 For example, the sending of an initial REGISTER message can involve 487 six steps of expanding the goal into a list of targets: 489 Selecting one of the SIP domain names from the list provided by a 490 DHCP option [RFC3319] and [RFC3361]. 492 NAPTR translation from a SIP domain name to a server domain name 493 [RFC3263]. 495 Selecting a transport protocol (e.g., UDP or TCP), if there is no 496 transport parameter in the URI. 498 SRV translation of a server domain name to server interface names 499 [RFC2782]. 501 A/AAAA translation of a server interface name to server addresses. 503 Selecting a source address to use to communicate with a server 504 address [RFC6724]. 506 The process of deriving a set of targets from a goal can be 507 conceptualized as constructing a tree, with the root node being the 508 goal and the leaf nodes being the targets (whether or not an 509 implementation constructs such a representation). Each non-leaf node 510 is expanded into zero or more child nodes by the application of the 511 appropriate baseline specification. 513 For a particular node, the relevant baseline specification may 514 prescribe relationships between the traffic volume sent to the 515 subsets of targets that are descended from its children. E.g., a 516 standard may prescribe prioritization, such that if any target 517 descended from a higher-priority child is responsive, no traffic 518 should be sent to any target descended from a lower-priority child. 519 (SRV records and DHCP options can specify prioritization.) 520 Similarly, a standard may prescribe load balancing, such that if 521 there are responsive targets descended from two children, the ratio 522 of traffic to the two subsets targets descended from the two children 523 must be a particular (non-zero positive) number. (SRV records can 524 specify load balancing.) Alternatively, a node may place no 525 restrictions on the traffic to the subsets of targets descended from 526 its children. 528 As always, the construction of the tree and the traffic restrictions 529 incorporated into it may be modified by the local policy. In this 530 document, we assume that all modifications are made to the tree that 531 summarizes the requirements of the baseline specifications. This 532 makes it easier to determine the the interaction of local policy with 533 the procedure modifications of this document. And this assumption 534 does not limit the generality of what local policy may do, since the 535 local policy can remove any ordering restrictions from the tree, thus 536 permitting almost any behavior by the modified procedures. 538 The targets as generated by the baseline specifications MAY be 539 subsetted by deleting any targets that the client cannot access for 540 reasons such as the client does not implement the protocol, or it 541 does not have a network interface that supports the protocol, or it 542 does not have a network interface that can communicate with the 543 address. Removing these targets at an early stage of processing does 544 not affect the on-the-wire behavior of either the baseline processes 545 or the modified processes, since sends to such targets fail 546 immediately. 548 What constitutes failure of a send depends on the situation, and may 549 be a transport protocol failure, the absence of a timely 100 Trying 550 response, or a 503 response ([RFC3261] section 21.5.4 and [RFC3263] 551 section 4.3). For any particular message, either the message is 552 successfully sent to exactly one target or the overall sending 553 process fails. 555 In the worst-case situation, the process may require waiting for one 556 or more transaction timeouts (e.g., Timer B or Timer F in [RFC3261]) 557 before successfully transmitting the message to a target. As the 558 timeouts are typically 32 seconds, such a wait severely impacts the 559 user experience. 561 4.1. Target Ordering 563 The baseline specifications assume that the client will effectively 564 generate an order in which to contact the targets, then the client 565 will sequence through the list, sending the message to each target 566 until one of the sends is deemed to be successful. (Each send may 567 include retransmissions of the message.) This is because at any 568 stage, the client's next action is determined only by the goal and 569 the fact that sending to the previous targets has failed -- the first 570 target in the order is the target that the client will choose first 571 (which depends only on the goal), the second target in the order is 572 the target that the client will choose if and when the send to the 573 first target fails (which depends only on the goal and the identity 574 of the first target), etc. 576 (In mathematical terms, the target order is a total ordering of the 577 targets that is compatible with the partial ordering of the targets 578 specified by the traffic restrictions.) 580 The order must be compatible with the traffic restrictions imposed by 581 the specifications on the targets, and the traffic restrictions are 582 coded into the nodes of the target-derivation tree. The following 583 subsections discuss how the target order reflects these constraints 584 for every non-leaf node type. 586 4.2. Priority Nodes 588 If a node specifies that its children are prioritized, all of the 589 targets descended from a higher-priority child must precede all of 590 the targets descended from a lower-priority child. Suppose the tree 591 of targets has one interior node that specifies prioritization of 592 three targets. This can result from these DNS records: 594 _sip._udp.example.com. SRV 1 1 5060 sip1.example.com 595 _sip._udp.example.com. SRV 2 1 5060 sip2.example.com 596 _sip._udp.example.com. SRV 3 1 5060 sip3.example.com 597 sip1.example.com. A 192.0.2.1 598 sip2.example.com. A 192.0.2.2 599 sip3.example.com. A 192.0.2.3 601 We show the tree with the targets from left to right from highest 602 priority to lowest priority: 604 | 605 -priority-- 606 | | | 607 A B C 609 We can then represent the traffic restrictions in a (directed, 610 acyclic) graph. In our graphs, the arcs are to be read from left to 611 right, requiring sending to the left endpoint before sending to the 612 right endpoint. In this case, sending to A must be done before 613 sending to B, and sending to B before sending to C. We can add dummy 614 "start" and "finish" nodes, represented by "*", for convenience in 615 later analysis. We can imagine them to be targets to which sending 616 fails immediately. 618 *----A----B----C----* 620 There is only one allowed target order: 622 A B C 624 4.3. Unordered Nodes 626 If a node places no traffic restrictions on its children, there is no 627 collective relationship between the targets descended from its 628 children. The targets descended from different children may be 629 appear in any order, and can even be interleaved. We show the tree 630 of a simple example: 632 sip.example.com. A 192.0.2.1 633 sip.example.com. A 192.0.2.2 634 sip.example.com. A 192.0.2.3 636 | 637 -unordered- 638 | | | 639 A B C 641 We then represent the lack of traffic restrictions by a graph which 642 has no lines between targets. We add dummy start and finish nodes to 643 clarify that all three targets are part of one graph: 645 A 646 / \ 647 *-B-* 648 \ / 649 C 651 There are six allowed target orders: 653 A B C 655 A C B 657 B A C 659 B C A 661 C A B 663 C B A 665 4.4. Combinations of Prioritization and Unordered Nodes 667 A prioritized pair of hosts each with an unordered pair of targets 668 can be specified with these DNS records: 670 _sip._udp.example.com. SRV 1 1 5060 sip1.example.com 671 _sip._udp.example.com. SRV 2 1 5060 sip2.example.com 672 sip1.example.com. A 192.0.2.1 673 sip1.example.com. AAAA 2001:DB8::1 674 sip2.example.com. A 192.0.2.2 675 sip2.example.com. AAAA 2001:DB8::2 677 These records result in this tree: 679 | 680 ---priority--- 681 | | 682 unordered unordered 683 | | | | 684 A B C D 686 From this tree, we generate this graph. We can simplify the graph by 687 adding a dummy target which must follow both A and B and must precede 688 both C and D, in order to avoid writing four arcs A-C, A-D, B-C, and 689 B-D. 691 A C 692 / \ / \ 693 * * * 694 \ / \ / 695 B D 697 There are four allowed target orders: 699 A B C D 701 A B D C 703 B A C D 705 B A D C 707 Now consider the situation of a client that has two IPv4 interfaces, 708 192.0.2.129 and 192.0.2.193 (on two separate /26 subnets), with the 709 first prioritized before the second. If the client is configured to 710 attempt to use every interface for every destination address (rather 711 than being restricted by the source address selection rules 712 Section 6.2), every destination address generates two targets, one 713 with source address 192.0.2.129 and one with source address 714 192.0.2.193, with the first prioritized before the second. 716 In this situation, the following DNS records result in an unordered 717 pair of hosts, each with a prioritized pair of targets. 719 sip.example.com. A 192.0.2.1 720 sip.example.com. A 192.0.2.65 722 This situation results in this tree (where we now have to pay 723 attention to the source addresses in the targets): 725 | 726 -----------unordered----------- 727 | | 728 ----priority---- ----priority---- 729 | | | | 730 A B C D 731 192.0.2.129 192.0.2.193 192.0.2.129 192.0.2.193 732 192.0.2.1 192.0.2.1 192.0.2.65 192.0.2.65 734 and this graph: 736 A----B 737 / \ 738 * * 739 \ / 740 C----D 742 There are six allowed target orders: 744 A B C D 746 A C B D 748 C A B D 750 A C D B 752 C A D B 754 C D A B 756 When the client is allowed to select any of the available source 757 addresses for a send, each source address (combined with the 758 destination address) generates a separate target Section 6.2. Of the 759 targets, the one selected by the default source address selection 760 rules is preferred, and the remainder are unordered. If there is 761 only one destination address, it results in a tree with this form: 763 | 764 --priority-- 765 | | 766 | -unordered- 767 | | | | 768 A B C D 770 with this graph: 772 B 773 / \ 774 *----A----*-C-* 775 \ / 776 D 778 There are eight allowed target orders: 780 A B C D 782 A B D C 784 A C B D 786 A C D B 788 A D B C 790 A D C B 792 4.5. Load-Balancing Nodes 794 The practical difficulty with load-balancing nodes is to ensure that 795 the right proportion of traffic is sent to the descendants of each 796 child node without having to maintain long-term records of the amount 797 of traffic that has been sent to each child's descendants. A simple 798 way to accomplish this is to generate a new priority ordering of the 799 children for each new message to be processed, with the orderings 800 randomized to provide the correct traffic distribution among the 801 children. A randomization algorithm that achieves the correct 802 traffic distribution is described in Appendix A. 804 With this method, load-balancing nodes are processed for each message 805 instance by generating a suitably randomized ordering of the 806 children, and then processing them in the same way as the children of 807 a priority node. 809 5. Primary Procedure Modifications 811 The following modifications are specified for all baseline 812 specifications: 814 5.1. Permitted to Reorder Targets 816 A client MAY send the message to the targets in an order that is not 817 permitted by the baseline specifications. 819 5.2. Must Preserve Traffic Distribution 821 To state the next requirement, we must define what it means to say 822 that a target is "responsive". Intuitively, a target is responsive 823 if its response time to a message is not "too much" longer than the 824 response time of any other target for the goal. Note that the 825 responsiveness of a target is always defined relative to the set of 826 targets for a particular goal; hence, a target may be responsive for 827 one goal at the same time that it is not responsive for another goal. 829 We define RTT(T) to mean the round-trip response time of a target, 830 the time it takes to receive confirmation of the receipt of a message 831 sent to the target. If the target does not respond at all, we 832 consider RTT(T) to be "infinity", which is larger than any number. 833 Note that RTT(T) is a fact about reality at some instant, not a 834 measure of the client's current knowledge about reality at some 835 instant. 837 We define a function "Limit" that converts one time value into 838 another: if RTT(T1) > Limit(RTT(T0)), then target T1 responds "too 839 slowly" relative to the response time of target T0, and T1 is 840 considered non-responsive. We define Limit(t) = m*t + f, where "m" 841 and "f" are parameters defined below. 843 The parameter "m" limits the range of response times that we will 844 allow among targets we consider responsive. We set m to be 2. (TBD: 845 Is this a good choice?) 847 The parameter "f" is the length of time that we consider to be 848 insignificant when comparing the response time of targets. We set f 849 to be 2*T1, where T1 is the value of that name used in the procedures 850 of [RFC3261]. That value is commonly the round-trip time estimate of 851 the relevant network, and defaults to 500ms. (TBD: Is this a good 852 choice?) 854 The result is that Limit(t) is "twice t, plus a little more to 855 account for the inherent delays in the network". 857 A target T0 is defined to be "responsive" if 859 its response time, RTT(T0), is less than the relevant timeout 860 (often Timer B or Timer F), and 862 for any other target T1 of the goal, RTT(T0) < Limit(RTT(T1)). 864 TBD: The following wording must be set so that the client can move 865 non-responsive targets to any place in the order (or at least, any 866 later place in the order) without violating this condition. It is 867 not clear this has been accomplished yet. Later: I think we've 868 accomplished this now. 870 We are now ready to specify the major new constraint on the client's 871 behavior: The client's procedures MUST, over time, distribute the 872 traffic for any particular goal among the responsive targets in the 873 same proportions as are required by the baseline specifications. 874 Specifically, 876 If the set of targets for a particular goal does not change over a 877 period of time, 879 for any two subsets of targets, both of which contain at least one 880 target which is responsive for the entire period of time, and 882 if the baseline specifications prescribe a proportion between the 883 traffic to the two subsets, then 885 the proportion between the traffic to the responsive subsets of 886 the two subsets MUST converge to the proportion specified by the 887 baseline specifications. 889 Note that this is a "God's-eye view" requirement; the client has no 890 direct way to determine whether it is satisfying the requirement 891 because it cannot measure RTT() for every possible target at every 892 moment. Nonetheless, the client's procedures must ensure that the 893 requirement is satisfied under all circumstances. 895 6. Additional Requirements 897 6.1. Address Family Preference 899 Unless overridden by user configuration or by network configuration: 900 If the host has a policy of preferring one address family, the client 901 MUST prefer it. If the host's policy is unknown or not obtainable, 902 the client MUST prefer IPv6 over IPv4. Thus, usually clients must 903 give preference to IPv6 over IPv4. 905 Such a preference MUST be implemented in the following way: Consider 906 the "initial" targets, which are the targets which the baseline 907 specifications do not prioritize after any other targets. The client 908 must additionally prioritize the initial targets which are of the 909 preferred address family before the other initial targets. 911 TBD: Is this sufficient? We don't require address family preference 912 to affect non-initial targets. Alternatively, if the server has a 913 lot of IPv6 addresses, none of which are responsive, the only way to 914 quickly send to an IPv4 address is to send probes to all of the 915 initial IPv6 addresses and one (almost-initial) IPv4 addresses. This 916 is recommended in the probing heuristics, but might require a lot of 917 probes. This parallels the advice in RFC 6555 section 5.4 "web site 918 operators with native IPv6 connectivity SHOULD NOT offer multiple 919 AAAA records". 921 6.2. Address Selection 923 Clients SHOULD provide a mechanism by which the address selection 924 configuration [RFC6724] can be customized for the client 925 independently of any other application. 927 Clients SHOULD implement the destination address selection mechanism 928 specified in [RFC6724]. Note that this mechanism provides a priority 929 order among the set of A/AAAA records for a single server host name, 930 whereas [RFC3263] assumes that such sets of A/AAAA records are 931 unordered. 933 Clients SHOULD implement rule 5.5 of section 5 of [RFC6724], 934 preferring to use a source address with a prefix assigned by the 935 selected next-hop. This requires that the IPv6 stack remembers which 936 next-hops advertised which prefixes. 938 Clients SHOULD by default use the source address selection mechanism 939 specified in [RFC6724], which chooses one source TSAP for any 940 particular destination TSAP. 942 Clients SHOULD also be configurable to use an alternative mechanism, 943 in which for any destination TSAP, targets are generated for each 944 source TSAP that could possibly communicate with the destination 945 TSAP, with the source TSAP selected by [RFC6724] prioritized over the 946 other source TSAPs and the other source TSAPs being unordered among 947 themselves. 949 The alternative policy is useful in situations where the source 950 address selection table prioritizes an interface which does not 951 forward SIP traffic to the destination address. (For an example, 952 when the source address selection table routes almost all 953 destinations to an organizational VPN which has restricted 954 connectivity.) 956 6.3. Vias 958 A client MUST provide a via-param in the first Via header field in 959 requests that properly routes from the server to the client, 960 regardless of the presence of NATs in the transportation path. This 961 is necessary even when the request is sent via a connection-oriented 962 protocol, because the connection may be terminated before the 963 response is sent back to the client and the server may need to 964 reestablish a connection. The client SHOULD provide the "rport" 965 parameter on the via-param [RFC3581]. 967 Additionally, to assist tracing and diagnosis, a client SHOULD 968 provide the source TSAP that it used in the via-param. TBD: Is this 969 too strict? Is it actually useful? 971 6.4. DNS Caching 973 The information a client uses to determine a target set must be up- 974 to-date. In particular, the construction of a target set MUST NOT 975 utilize information from DNS whose TTL has expired. However, once a 976 target set has been constructed, it may be retained until the message 977 has either been sent or failed, without regard to the TTL of the 978 information used to construct it. 980 TBD: Should we allow a client to cache target set computations 981 somewhat longer than the TTL to minimize disruption and DNS traffic? 982 Phone calls typically take 3 minutes, so we could allow 5 minutes 983 grace and thus ensure that target sets rarely have to be recomputed 984 during a call. 986 6.5. Unused Flows 988 A flow that is created as a probe but not subsequently used (either 989 to send the message or to maintain a SIP Outbound flow) SHOULD be 990 terminated unless there is reason to believe that it will be used in 991 the near future. This includes flows that are connection-oriented 992 protocols as well as non-connection-oriented flows with security 993 associations. Minimizing the number of unused connections reduces 994 the load on the server and on stateful middleboxes. Also, if the 995 abandoned connection is IPv4, this reduces IPv4 address sharing 996 contention in any intermediate NATs. 998 6.6. Debugging and Troubleshooting 1000 Happy Earballs is aimed at ensuring a reliable user experience 1001 regardless of connectivity problems affecting any single transport. 1002 However, this naturally means that applications employing these 1003 techniques are by default less useful for diagnosing issues with a 1004 particular address family or interface. To assist in that regard, an 1005 implementation MAY provide a mechanism to disable its Happy Earballs 1006 behavior via a user setting, and SHOULD provide data useful for 1007 debugging (e.g., a log or way to review current preferences). 1009 7. Consequences of the New Requirements 1011 In this section we explore some of the consequences of these 1012 requirements and describe possible approaches for designing clients 1013 that satisfy the modified requirements and provide shorter 1014 transmission latency. 1016 A client may send the message to the targets in an order that is not 1017 permitted by the baseline specifications, but it may not omit any 1018 targets from its ordering. Thus, the client is required to send it 1019 to all the targets before it may declare failure of the send process. 1020 (TBD: Would it be better to 408 the message faster?) For example, if 1021 the client has cached information which indicates that a target is 1022 unreachable, the client may move that target to the end of the order, 1023 but if sending to all other targets is unsuccessful, the client must 1024 send to that target before declaring failure. 1026 A client may cache measured RTT() values for targets and use this 1027 information to optimize target orderings; the cached value of RTT(T) 1028 for a target T we name CRTT(T). Because a single target may appear 1029 in the target set for multiple goals, the client should cache RTT() 1030 for targets (rather than judgments of responsiveness), and then when 1031 sending to a goal use CRTT() to determine whether a target T is 1032 responsive relative to the other targets of that particular goal. 1034 A client may determine a target T1 to be "slow" (relative to a given 1035 goal) if its CRTT(T1) is greater than Limit(CRTT(T2)) for some other 1036 target T2 in the goal's target set. A target that is not slow is 1037 labeled "normal". Note that a target being slow is determined by the 1038 client via a combination of the information in the cache and the 1039 state of the network at the moments that the cached information was 1040 recorded. As it were, the client thinks a slow target is non- 1041 responsive, but the target may or may not actually be non-responsive 1042 at that moment, depending on whether the CRTT() values agree with 1043 current reality, which is the RTT() values. 1045 If there is be an upper bound on the length of time that a client 1046 retains CRTT() values, then the client may assume that any slow 1047 target is non-responsive, in that it may place the target after the 1048 normal targets in the order. For this reason, the client is required 1049 to have an upper bound on the length of time that CRTT() values 1050 remain in the cache, after which they are either deleted or replaced 1051 by values based on more recent observations of RTT(). 1053 The client may act on its CRTT() values in this way because in doing 1054 so, it will not violate the traffic distribution requirement: If a 1055 target is responsive over a long period, the eventual delete/refresh 1056 of cached values from before that period ensures that the client will 1057 eventually see the target as normal. (2) If a target is not 1058 continuously responsive over a long period, the traffic distribution 1059 requirement places no restriction on whether the client sends traffic 1060 to it or not, and repositioning it after all normal targets does not 1061 affect the traffic distribution among the normal targets. 1063 Implementations have flexibility regarding the lifetime of cache 1064 entries. The only absolute restriction is that as long as the 1065 configuration is stable, there is a maximum lifetime that CRTT() 1066 values are kept before being deleted or refreshed. The advice in 1067 [RFC6555] suggests that the upper bound on the lifetime of cache 1068 entries should on the order of 10 minutes. 1070 The length of time that different CRTT() values are cached may differ 1071 from each other. If the RTT() observation is from a periodic event 1072 (e.g., registration or NAT binding refresh), the cache lifetime 1073 should be similar to but longer than the period of the event, as the 1074 event period is likely to be configured based on knowledge of the 1075 interval over which the network is likely to remain stable. 1077 When the state of a source address is changed, or the state of the 1078 interface it is assigned to changes, or when the network it is 1079 connected to is re-initialized, cached RTT() values for targets with 1080 that source address should be deleted. Interfaces can determine 1081 network re-initialization by a variety of mechanisms (e.g., [RFC4436] 1082 and [RFC6059]). 1084 When a client processes a message, the ordering of targets that it 1085 sends to must be an ordering permitted by the baseline 1086 specifications, with the exception that slow targets may be moved 1087 after all normal targets. 1089 Since the client's goal is to deliver the message as quickly as 1090 possible, a client should always move slow targets to the end of the 1091 order, after the normal targets. Note that if a probe transmission 1092 during message processing discovers a target to be slow, the target 1093 can be moved at that time to after all normal targets. 1095 A client obtains RTT(T) for a target T whenever it sends to the 1096 target. But it can also obtain RTT(T) by a probe, which is any 1097 transmission to T which requires a response but does not involve 1098 sending the message (and hence does not commit the client to possibly 1099 waiting for a timeout before sending to another target). A client 1100 may send probes to several targets simultaneously. 1102 Probe operations include: 1104 o establishing a transport connection to the target (without sending 1105 the message as data on the connection) 1107 o sending a keep-alive, as specified by the transport protocol 1109 o sending a CR-LF-CR-LF keep-alive on a SIP Outbound flow [RFC5626] 1111 o sending a STUN keep-alive message on a SIP Outbound flow [RFC5626] 1113 o sending an OPTIONS request with "Max-Forwards: 0". 1115 (Note that a probe using an OPTIONS request can be used with any 1116 protocol. If the OPTIONS reaches the target, the target is required 1117 to respond with either a 200 or 483 response [RFC3261], without 1118 forwarding it to another entity, unless the target demands 1119 authentication, which would be indicated by an immediate 401 or 407 1120 response. Conveniently, a server can respond to such a request 1121 statelessly, so such requests are low-overhead. (Although the SIP 1122 Outbound keep-alive methods have even lower overhead.)) 1124 Similarly, if a client has a connection to a target T, and the 1125 connection has been idle for long enough, the client will not have a 1126 CRTT(T) for T, reflecting the fact that the connection may have 1127 failed without the client's knowledge. The client can refresh 1128 CRTT(T) by performing a suitable probe operation within the 1129 connection. 1131 The client should use a flow or connection that is established to a 1132 target instead of establishing a new flow or connection to that 1133 target for sending either a probe or a message. 1135 If the client initiates a probe of a target T, it may be able to 1136 decide that T is slow without waiting to determine the actual value 1137 of RTT(T), which may take as long as the timeout period: the true 1138 value of RTT(T) is always at least as large as the elapsed time since 1139 the probe was sent, and the determination of slowness depends on 1140 whether RTT(T) exceeds Limit(CRTT(Tfastest)) (where Tfastest is the 1141 target with smallest CRTT() value of any target in the target set). 1143 If the client establishes a connection to a target without 1144 simultaneously sending the message, the connection establishment is a 1145 probe of the target, and after initiating the connection but before 1146 sending the message, if that probe reveals that the target is slow, 1147 the client may move that target to the end of the ordering, and 1148 utilize another target. 1150 In order to minimize the chance that the client must wait for a 1151 timeout before sending to another target, the client may send probes 1152 to targets, and the RTT() values revealed by those probes can change 1153 what target the client will send to next. Because of this, the 1154 client's procedures do not simply convert the tree of targets into an 1155 ordering of the targets, which the client then follows -- Information 1156 discovered during the sequence of sends can affect the order targets 1157 are sent to. 1159 There is little value sending a probe to a target unless all targets 1160 of higher priority (1) have been sent the message (and failed), (2) 1161 have been sent a probe, or (3) have a CRTT() value. 1163 A target T0 is "quick" if its CRTT(T0) is less than Limit(0), that 1164 is, if T0 will be normal regardless of the CRTT() values of any other 1165 target. If an allowed next target has a cached RTT() value, and it 1166 is "quick", then it is never slow for any goal. 1168 If the client has reason to believe that it will soon be asked to 1169 send a message to a goal with a target T, and if CRTT(T) does not 1170 exist or is likely to expire before then, the client may decide to 1171 preemptively refresh the cached value by probing T. 1173 Similarly, a client may be in a situation where it has advance notice 1174 that it is likely to need to send a message to a particular target, 1175 for instance, if the user of a UA begins dialing an outgoing call 1176 which will be routed through a particular outgoing proxy. In such a 1177 situation, the client should consider preemptively probing the 1178 target. 1180 Note that the use of probes increases the non-message traffic to the 1181 targets, and thus has a cost. A client minimizes the expected 1182 transmission time by initially probing all of the targets, but that 1183 strategy maximizes the additional traffic. A client should weigh the 1184 tradeoff between improved user experience and increased traffic. In 1185 particular, the client should be aware of which messages require 1186 rapid service for good user experience (e.g., INVITE and BYE) and 1187 which do not (e.g., REGISTER and re-SUBSCRIBE). 1189 A client should avoid sending to a target which does not have a 1190 CRTT() value (unless it is the last remaining target), because the 1191 target might be non-responsive forcing the client to wait for a 1192 timeout. Instead, the client should probe the target before sending 1193 the message to it. 1195 8. Generic Algorithm 1197 Based on these observations, we can construct an event-driven 1198 algorithm that satisfies the revised requirements and avoids sending 1199 the message to slow targets. This algorithm has a number of places 1200 where the implementor has latitude. 1202 The primary state of the algorithm is a graph whose nodes are the 1203 targets which have not yet been sent to and failed. 1205 o Each target is annotated with its CRTT() value, or the absence 1206 thereof. 1208 o Each target has an indicator of whether a probe has been sent to 1209 it, and if so, when the probe was sent. 1211 o The graph is initialized with the graph of targets as constructed 1212 from the target construction tree. 1214 o The initial graph is assumed for convenience to have a unique 1215 rightmost (final) target named Fin, which may be a dummy. Targets 1216 determined to be slow will be moved to after target Fin. 1218 The algorithm uses some data derived from the graph. 1220 o S, which is Limit(CRTT(Tfastest)), where Tfastest is the target 1221 (that is or was in the graph) with the smallest CRTT(). Thus, any 1222 target T with CRTT(T) > S is slow. 1224 o The set of initial targets, targets which (at this time) have no 1225 targets prioritized before them, and thus could be chosen as the 1226 next target to send to. 1228 The algorithm can be most easily described in an event-driven manner, 1229 where particular situations trigger particular actions: 1231 1. When a target T is slow, that is, CRTT(T) > S and it is before 1232 Fin, it is removed from its position in the graph and moved to a 1233 new position, after Fin. (Note that additional arcs may need to 1234 be inserted where T used to be; if A is before T and T is before 1235 B, A remains before B.) (Note that a target can be discovered 1236 to be slow because a probe has been outstanding longer than S 1237 without responding, even if S is less than the timeout period.) 1239 2. When there is no outstanding send, the client may send to any 1240 initial target whose CRTT() is known. 1242 3. When there is no outstanding send and only one target in the 1243 graph, the client may send to it. 1245 4. When there is an initial dummy target, it may be removed from 1246 the graph. 1248 5. When a target T has no CRTT() and all targets that precede T 1249 either have a CRTT() or have a probe outstanding, the client may 1250 send a probe to T. 1252 6. When a probe or a send succeeds, fails, or times out, the client 1253 records CRTT() for that target. 1255 7. When the startup of a protocol (e.g., TCP SYN) probes the target 1256 without sending the message, the client should perform the 1257 startup as a probe, and send the message as a separate action. 1259 8. When a send to a target succeeds, sending to the goal has 1260 succeeded (and the algorithm terminates). 1262 9. When a send to a target fails, the target is removed from the 1263 graph. 1265 10. When the graph is empty (has no nodes), sending to the goal has 1266 failed (and the algorithm terminates). 1268 8.1. Policies and Implementor Latitude 1270 The generic algorithm allows the implementor a considerable degree of 1271 latitude in making the allowed choices. 1273 o If multiple initial targets have a CRTT(), the client can choose 1274 which to send to, e.g., the one with the smallest CRTT(). 1276 o The client chooses which targets to probe. The client should 1277 balance the value of information to be obtained by probing targets 1278 with the cost of doing so. The naive aggressive strategy is to 1279 probe all initial targets that do not have a CRTT(). 1281 o When a probe is sent and a response is not received for a time 1282 period, the client may consider it "lost" and send another probe. 1284 This time period will likely be longer than the typical network 1285 RTT but significantly less than the send timeout period. 1287 o The client may choose to probe targets to exercise the maximum 1288 diversity of network paths, covering the range of interfaces and 1289 address families in the target set. E.g., a non-initial target 1290 that provides diversity may be chosen to be probed (which may 1291 require first probing all of its preceding targets). 1293 o If all initial targets with CRTT() have "large" values, the client 1294 may choose not to send to any of them, and instead send probes to 1295 other targets which might respond faster. Note that in this 1296 situation, probing non-initial targets might be useful; if such a 1297 probe responds quickly, it might cause the initial targets to 1298 become slow and to be moved to the end, causing the probed target 1299 to become initial (and thus available for sending). 1301 o The client may add prioritizations between targets which are not 1302 required to have them due to the baseline specifications (as long 1303 as they are consistent with the prior prioritizations, i.e., do 1304 not specify both that A is before B and that B is before A). 1305 Doing this judiciously may allow the algorithm to use a simpler 1306 data structure for recording prioritization Section 9 1308 8.2. Examples 1310 This section shows examples of the generic algorithm (with naive 1311 choices for the various "policies") in situations involving various 1312 combinations of targets with particular properties. 1314 8.2.1. Two Unordered Targets, Both Cached 1316 Suppose there are two unordered targets, both of which have cached 1317 RTT() values less than S. 1319 A (cached) 1320 / \ 1321 * Fin 1322 \ / 1323 B (cached) 1325 First, the initial dummy node is removed. 1327 A (cached) 1328 \ 1329 Fin 1330 / 1331 B (cached) 1333 The client chooses the target with the smallest CRTT(), which we 1334 assume is A, and sends to it. 1336 If the send succeeds, the algorithm terminates successfully. 1338 In the unlikely case that sending to A fails, it is deleted from the 1339 graph. (Sending to A will likely succeed, because there is a current 1340 cached record of successful communication.) 1342 Fin 1343 / 1344 B (cached) 1346 The client chooses B to send to. 1348 If the send succeeds, the algorithm terminates successfully. 1350 If sending to B fails, it is deleted from the graph. 1352 Fin 1354 The initial dummy node Fin is deleted from the graph. The graph 1355 becomes empty, and the algorithm terminates unsuccessfully. 1357 8.2.2. Two Unordered Targets, Both Cached, One Slow 1359 Suppose there are two unordered targets, both of which have cached 1360 RTT() values, but the value for target A is greater than S. 1362 A (cached) 1363 / \ 1364 * Fin 1365 \ / 1366 B (cached) 1368 First, the initial dummy node is removed. 1370 A (cached) 1371 \ 1372 Fin 1373 / 1374 B (cached) 1376 Target A is examined and found to be slow, so it is moved after Fin. 1378 Fin----A (slow) 1379 / 1380 B (cached) 1382 The client chooses B to send to. 1384 If the send succeeds, the algorithm terminates successfully. 1386 If sending to B fails, it is deleted from the graph. 1388 Fin----A (slow) 1390 The initial dummy node is deleted from the graph. 1392 A (slow) 1394 The client sends to A. 1396 8.2.3. Two Unordered Targets, One Cached 1398 Suppose there are two unordered targets, only one of which has a 1399 CRTT(): 1401 A (cached) 1402 / \ 1403 * Fin 1404 \ / 1405 ----B----- 1407 The initial dummy node is deleted from the graph. 1409 A (cached) 1410 \ 1411 Fin 1412 / 1413 B----- 1415 Only target A is initial and has a CRTT(), so the client sends to it. 1417 Since B is initial and has no CRTT(), the client sends a probe to it. 1419 If the send to A succeeds, the algorithm terminates successfully. 1421 If the send to A fails, it is deleted from the graph. 1423 Fin 1424 / 1425 B----- 1427 Since B is the only remaining (non-dummy) target, the client sends to 1428 it. 1430 8.2.4. Two Unordered Targets, Neither Cached 1432 Suppose there are two unordered targets, neither of which have CRTT() 1433 values: 1435 A 1436 / \ 1437 * Fin 1438 \ / 1439 B 1441 The initial dummy node is deleted from the graph. 1443 A 1444 \ 1445 Fin 1446 / 1447 B 1449 The client cannot send to either target. The client sends probes to 1450 both targets. When one of the probes returns, let us say it is the 1451 probe to A, the graph indicates that one target is cached: 1453 A (cached) 1454 \ 1455 Fin 1456 / 1457 B----- 1459 Target A now is initial and has a CRTT(), so the client sends to it. 1461 In the unlikely case that sending to A fails, A is deleted from the 1462 graph, and the client immediately sends to B because B is the only 1463 remaining target. 1465 This situation parallels the standard "Happy Eyeballs" situation in 1466 HTTP, where the client has two (or more) unordered addresses for the 1467 server, one IPv4 and one IPv6. The client requests connections with 1468 both addresses simultaneously, and the first connection that succeeds 1469 is used to send the HTTP request [RFC6555]. 1471 8.2.5. Three Unordered Targets, One Cached 1473 Suppose there are three unordered targets, only one of which has a 1474 CRTT(): 1476 A (cached) 1477 / \ 1478 *-----B------Fin 1479 \ / 1480 ----C----- 1482 The initial dummy node is deleted from the graph. 1484 A (cached) 1485 \ 1486 B------Fin 1487 / 1488 C----- 1490 Only target A is initial and has a CRTT(), so the client sends to it. 1492 Since B is initial and has no CRTT(), the client sends a probe to it. 1493 Similarly for C. 1495 If the send to A succeeds, the algorithm terminates successfully. 1497 If the send to A fails, it is deleted from the graph. 1499 B------Fin 1500 / 1501 C----- 1503 The client waits until one of the probes (to B and C) either succeeds 1504 or fails. Suppose the probe to B succeeds. 1506 B (cached)------Fin 1507 / 1508 C-------------- 1510 Target B is now initial and has a CRTT(), so the client sends to it. 1512 8.2.6. Two Prioritized Targets, Both Cached 1514 Suppose there are two prioritized targets, both of which have CRTT(): 1516 *----A (cached)----B (cached)----Fin 1518 If neither A nor B is slow, processing proceeds according to the 1519 baseline specification: the client must send to A first, and then if 1520 that fails, send to B. In detail: 1522 The initial dummy node is deleted from the graph. 1524 A (cached)----B (cached)----Fin 1526 The client sends to A. 1528 If sending to A succeeds, the algorithm terminates successfully. If 1529 sending to A fails, A is removed from the graph. 1531 B (cached)----Fin 1533 Target B is now initial, and the client sends to B. 1535 If sending to B succeeds, the algorithm terminates successfully. If 1536 sending to B fails, B is removed from the graph, then Fin is removed 1537 from the graph, and the algorithm terminates unsuccessfully. 1539 However, as processing begins, it is possible that A is slow because 1540 CRTT(A) > S = Limit(CRTT(B)). In that case A is moved after Fin. 1542 B (cached)----Fin----A (slow) 1544 Target B is now initial, and the client sends to B. 1546 If sending to B succeeds, the algorithm terminates successfully. If 1547 sending to B fails, B is removed from the graph, then Fin is removed 1548 from the graph. Then A is initial, and the client sends to A. 1550 8.2.7. Two Prioritized Targets, the Second Cached 1552 Suppose there are two prioritized targets, but only the second has a 1553 CRTT(): 1555 *----A----B (cached)----Fin 1557 The initial dummy node is deleted from the graph. 1559 A----B (cached)----Fin 1561 The client sends a probe to A. If the probe response (the new 1562 CRTT(A)) is less than S = Limit(CRTT(B), the the graph becomes: 1564 A (cached)----B (cached)----Fin 1566 The client then sends to A. 1568 But if the probe remains outstanding longer than S, A becomes slow 1569 and is moved to after Fin. 1571 B (cached)----Fin----A (slow) 1573 At that point, the client sends to B. 1575 8.2.8. Two Prioritized Targets, the First Cached 1577 Suppose there are two prioritized targets, but only the first has a 1578 CRTT(): 1580 *----A (cached)----B ----Fin 1582 The initial dummy node is deleted from the graph. 1584 A (cached)----B----Fin 1586 The client can send to A immediately. If CRTT(A) <= Limit(0), the 1587 client must send to A immediately since A is quick, and cannot be 1588 revealed to be slow by probing other targets. 1590 But if RTT(A) is large enough that there is a reasonable chance that 1591 RTT(B) is smaller than Limit(RTT(A)) (which would make A slow), the 1592 client may choose to send a probe to B first. If the probe response 1593 returns quickly enough, CRTT(A) > Limit(CRTT(B)), which makes A slow. 1594 Then, A would be moved to after Fin, and the client would send to B. 1596 At any point before the probe response arrives, the client can decide 1597 to send to A. But if the probe does not return within 1598 Limit(CRTT(B)), the client should immediately send to A, as a will 1599 not be shown to be slow. 1601 8.2.9. Two Prioritized Targets, Neither Cached 1603 Suppose there are two prioritized targets, neither of which have 1604 CRTT(): 1606 *----A----B----Fin 1608 The initial dummy node is deleted from the graph. 1610 A----B----Fin 1612 The client must now send a probe to A before it can send the message 1613 to A. But if A and B use different network paths (as they likely 1614 will), it is probably advantageous to send probes to both A and B, in 1615 case B is sufficiently more responsive than A to make A slow and 1616 override the prioritization. If the client chooses to send both 1617 probes: 1619 If the response to the probe of A returns first, the client 1620 immediately sends to A. 1622 If the response to the probe of B returns, and the response to the 1623 probe of A returns before Limit(CRTT(B)), the client immediately 1624 sends to A. 1626 If the response to the probe of B returns, and the response to the 1627 probe of A does not returns before Limit(CRTT(B)), the client moves A 1628 to after Fin, and then immediately sends to B. 1630 8.2.10. Three Targets 1632 A more complex case is when the client must choose between three 1633 source addresses for a destination address. One, the address 1634 selected by the usual rules rules, is prioritized and the other two 1635 are unordered. Consider the example where the destination address is 1636 192.0.2.1 and the client has interfaces with addresses 192.0.2.65, 1637 192.0.2.129, and 192.0.2.197, with the first of those being the 1638 interface on the same subnet as the default outbound gateway. Then 1639 the graph is: 1641 B 1642 192.0.2.129/ 1643 192.0.2.1 1644 A / \ 1645 *----192.0.2.65/----* Fin 1646 192.0.2.1 \ C / 1647 192.0.2.197/ 1648 192.0.2.1 1650 Assuming that there are no CRTT() values, the client starts by 1651 probing A. Unless it is being very conservative with probes, the 1652 client also sends probes via B and C. 1654 As the probe responses arrive, CRTT() values are recorded, which can 1655 cause targets to be declared slow and relocated after Fin. 1656 Eventually, some target will be both initial and have a CRTT(), and 1657 the client sends to it. 1659 If the probe response from A arrives before Limit(RTT(B)) and 1660 Limit(RTT(C)), the graph is updated to: 1662 B 1663 / \ 1664 A (cached)----* Fin 1665 \ / 1666 C 1668 At that point, the client sends to A. 1670 On the other hand, if a probe response arrives from B but one does 1671 not arrive from A before Limit(CRTT(B)), then A becomes slow, and it 1672 is moved to after Fin. Then the dummy target after A and before B 1673 and C is deleted. 1675 B (cached) 1676 \ 1677 Fin----A (slow) 1678 / 1679 C--------- 1681 At this point, the client sends to B. 1683 9. Using a Simplified Data Structure 1685 Instead of maintaining a directed acyclic graph to control the 1686 client's operation, the client can replace the graph with a sequence 1687 of sets of targets based on their "rank". The rank of a target is 1688 defined as: 1690 The rank of an initial target is 0. 1692 The rank of a non-initial target is 1 more than the highest rank 1693 of any target that it is prioritized after. 1695 In this scheme, a target is prioritized after all targets with lower 1696 ranks. As processing progresses, all targets in the lowest non-empty 1697 rank are initial, and all targets in higher ranks are non-initial. 1699 This is compatible with the generic algorithm because it only adds to 1700 the prioritization relationships, it does not delete any of them. 1701 Thus, the behavior it causes is compatible with the Happy Earballs 1702 requirements. 1704 For example, consider a server with two addresses, IPv6 and IPv4, 1705 with IPv6 prioritized via SRV records. Both addresses accept both 1706 TCP and UDP traffic: 1708 _sip._udp.example.com. SRV 1 1 5060 sip1.example.com 1709 _sip._udp.example.com. SRV 2 1 5060 sip2.example.com 1710 _sip._tcp.example.com. SRV 1 1 5060 sip1.example.com 1711 _sip._tcp.example.com. SRV 2 1 5060 sip2.example.com 1712 sip1.example.com. AAAA 2001:DB8::1 1713 sip2.example.com. A 192.0.2.1 1715 Suppose a request can be transported using either TCP or UDP, and the 1716 situation does not specify a priority between them. The tree of 1717 targets is then: 1719 | 1720 -------------unordered----------- 1721 | | 1722 -----priority----- -----priority----- 1723 | | | | 1724 TCP 2001:DB8::1 TCP 192.0.2.1 UDP 2001:DB8::1 UDP 192.0.2.1 1726 The graph, annotating each target with its rank, is: 1728 (0) TCP 2001:DB8::1----(1) TCP 192.0.2.1 1729 / \ 1730 * * 1731 \ / 1732 (0) UDP 2001:DB8::1----(1) UDP 192.0.2.1 1734 Which can be turned into a list of lists as: 1736 rank 0: TCP 2001:DB8::1, UDP 2001:DB8::1 1738 rank 1: TCP 192.0.2.1, UDP 192.0.2.1 1740 The rank representation is functionally equivalent to the following 1741 graph, which is the original graph with additional lines, showing 1742 that the rank representation constrains the client's behavior more 1743 than the original graph does: 1745 (0) TCP 2001:DB8::1 (1) TCP 192.0.2.1 1746 / \ / \ 1747 * * * 1748 \ / \ / 1749 (0) UDP 2001:DB8::1 (1) UDP 192.0.2.1 1751 The rank lists can be built (without first constructing the graph) by 1752 walking the target tree from top to bottom and left to right (highest 1753 priority to lowest priority), with each node passing downward MRdown, 1754 the minimum rank that any descendant target is allowed, and each node 1755 passing upward MRup, the minimum rank allowed for any target 1756 prioritized after that node. 1758 o The root node's MRdown is 0. 1760 o For an unordered node: 1762 * Each child's MRdown is the node's MRdown. 1764 * The node's MRup is the maximum of the node's MRdown and all of 1765 its children's MRup's. 1767 o For a priority node (with the children ordered by priority): 1769 * The first child's MRdown is the node's MRdown. 1771 * A later child's MRdown is the preceding child's MRup. 1773 * The node's MRup is the final child's MRup, or if there are no 1774 children, the node's MRdown. 1776 o For a load-balancing node, the children are first prioritized 1777 randomly ([RFC2782] and Appendix A), then processed as for a 1778 prioritized node. 1780 o For a target node: 1782 * The target has rank MRdown. 1784 * The node's MRup is MRdown + 1. 1786 Here is the preceding example's tree, with each node annotated with 1787 its MRdown on the left of the node label and its MRup on the right of 1788 the node label: 1790 | 1791 ---------(0) unordered (2)------- 1792 | | 1793 -(0) priority (2)- -(0) priority (2)- 1794 | | | | 1795 (0) TCP 2001:DB8::1 (1) | (0) UDP 2001:DB8::1 (1) | 1796 | | 1797 (1) TCP 192.0.2.1 (2) (1) UDP 192.0.2.1 (2) 1799 Note that the target tree does not have to be explicitly constructed; 1800 it can be implicitly walked by a series of recursive function calls, 1801 with the functions passing MRdown and MRup values between themselves, 1802 and each target being inserted into the rank list-of-lists as it is 1803 generated. 1805 The address family preference rule Section 6.1 can be implemented 1806 within the rank representation by first constructing the ranks based 1807 on the baseline specifications, and then splitting rank 0 into two 1808 ordered sub-ranks, 0.0 and 0.1, with 0.0 containing all rank 0 1809 targets of the preferred address family and rank 0.1 containing all 1810 other rank 0 targets. 1812 An example of address family preference processing is the ordinary 1813 case of two prioritized servers each with an IPv6 and IPv4 address: 1815 _sip._udp.example.com. SRV 1 1 5060 sip1.example.com 1816 _sip._udp.example.com. SRV 2 1 5060 sip2.example.com 1817 sip1.example.com. AAAA 2001:DB8::1 1818 sip1.example.com. A 192.0.2.1 1819 sip2.example.com. AAAA 2001:DB8::2 1820 sip2.example.com. A 192.0.2.2 1822 | 1823 ----------(0) priority (2)-------- 1824 | | 1825 -(0) unordered (1)- -(1) unordered (2)- 1826 | | | | 1827 (0) 2001:DB8::1 (1) | (1) 2001:DB8::2 (2) | 1828 | | 1829 (0) 192.0.2.1 (1) (1) 192.0.2.2 (2) 1831 After splitting rank 0 based on the address family preference, the 1832 the ranks are: 1834 rank 0.0: 2001:DB8::1 1836 rank 0.1: 192.0.2.1 1838 rank 1: 2001:DB8::2, 192.0.2.2 1840 10. Security Considerations 1842 This document changes the order in which a client will send to 1843 targets but does not change the set of targets that it will send to. 1844 There are no known SIP systems whose security depends on the order in 1845 which a client sends to targets. Given that network connectivity is 1846 unreliable, it is unlikely that the security of any SIP system 1847 depends on the reliable ordering of targets. 1849 The specific security vulnerabilities, attacks and threat models of 1850 the various protocols mentioned in this document (SIP, DNS, SRV 1851 records, etc.) are well-documented in their respective 1852 specifications, and their effect on the security of SIP systems is 1853 unchanged. 1855 11. IANA Considerations 1857 This document does not require any actions by IANA. 1859 12. History 1861 Note to RFC Editor: Upon publication, please remove this section. 1863 12.1. Changes from draft-worley-sip-happy-earballs-00 to draft-worley- 1864 sip-happy-earballs-01 1866 Overhaul of the exposition. 1868 12.2. Changes from draft-worley-sip-he-connection-01 to draft-worley- 1869 sip-happy-earballs-00 1871 Complete overhaul. 1873 Changed "EarBalls" to "Earballs". 1875 12.3. Changes from draft-worley-sip-he-connection-00 to draft-worley- 1876 sip-he-connection-01 1878 Minor changes. 1880 Add note that WebSocket is out of scope, because there is only one 1881 possible transport in WebSocket. 1883 12.4. Changes from draft-johansson-sip-he-connection-01 to draft- 1884 worley-sip-he-connection-00 1886 This version has a different name for technical reasons. It is, in 1887 reality, the successor to draft-johansson-sip-he-connection-01. 1889 Move Acknowledgments after References, as that is the style the 1890 Editor prefers. 1892 Updated Security Considerations: This increment of the H.E. work does 1893 not make normative changes in existing SIP. 1895 Copy a lot of text from RFC 6555, as this I-D is parallel to RFC 1896 6555. 1898 Changed "hostname" to "host name", as the latter form is more common 1899 in RFCs by a moderate margin. 1901 Revised some of the introduction text to parallel the introduction of 1902 RFC 7984. 1904 Changed name of algorithm to "Happy EarBalls", added reference to 1905 Urban Dictionary. 1907 Many expansions of the discussion and revisions of the wording. 1909 13. References 1911 13.1. Normative References 1913 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1914 Requirement Levels", BCP 14, RFC 2119, 1915 DOI 10.17487/RFC2119, March 1997, 1916 . 1918 [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for 1919 specifying the location of services (DNS SRV)", RFC 2782, 1920 DOI 10.17487/RFC2782, February 2000, 1921 . 1923 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1924 A., Peterson, J., Sparks, R., Handley, M., and E. 1925 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1926 DOI 10.17487/RFC3261, June 2002, 1927 . 1929 [RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation 1930 Protocol (SIP): Locating SIP Servers", RFC 3263, 1931 DOI 10.17487/RFC3263, June 2002, 1932 . 1934 [RFC3581] Rosenberg, J. and H. Schulzrinne, "An Extension to the 1935 Session Initiation Protocol (SIP) for Symmetric Response 1936 Routing", RFC 3581, DOI 10.17487/RFC3581, August 2003, 1937 . 1939 [RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with 1940 Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April 1941 2012, . 1943 [RFC6724] Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown, 1944 "Default Address Selection for Internet Protocol Version 6 1945 (IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012, 1946 . 1948 [RFC7984] Johansson, O., Salgueiro, G., Gurbani, V., and D. Worley, 1949 Ed., "Locating Session Initiation Protocol (SIP) Servers 1950 in a Dual-Stack IP Network", RFC 7984, 1951 DOI 10.17487/RFC7984, September 2016, 1952 . 1954 13.2. Informative References 1956 [I-D.johansson-sip-he-connection] 1957 Johansson, O., Salgueiro, G., and D. Worley, "Setting up a 1958 SIP (Session Initiation Protocol) connection in a dual 1959 stack network using connection oriented transports", 1960 draft-johansson-sip-he-connection-01 (work in progress), 1961 October 2016. 1963 [RFC3319] Schulzrinne, H. and B. Volz, "Dynamic Host Configuration 1964 Protocol (DHCPv6) Options for Session Initiation Protocol 1965 (SIP) Servers", RFC 3319, DOI 10.17487/RFC3319, July 2003, 1966 . 1968 [RFC3361] Schulzrinne, H., "Dynamic Host Configuration Protocol 1969 (DHCP-for-IPv4) Option for Session Initiation Protocol 1970 (SIP) Servers", RFC 3361, DOI 10.17487/RFC3361, August 1971 2002, . 1973 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms 1974 for IPv6 Hosts and Routers", RFC 4213, 1975 DOI 10.17487/RFC4213, October 2005, 1976 . 1978 [RFC4436] Aboba, B., Carlson, J., and S. Cheshire, "Detecting 1979 Network Attachment in IPv4 (DNAv4)", RFC 4436, 1980 DOI 10.17487/RFC4436, March 2006, 1981 . 1983 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 1984 (ICE): A Protocol for Network Address Translator (NAT) 1985 Traversal for Offer/Answer Protocols", RFC 5245, 1986 DOI 10.17487/RFC5245, April 2010, 1987 . 1989 [RFC5626] Jennings, C., Ed., Mahy, R., Ed., and F. Audet, Ed., 1990 "Managing Client-Initiated Connections in the Session 1991 Initiation Protocol (SIP)", RFC 5626, 1992 DOI 10.17487/RFC5626, October 2009, 1993 . 1995 [RFC6059] Krishnan, S. and G. Daley, "Simple Procedures for 1996 Detecting Network Attachment in IPv6", RFC 6059, 1997 DOI 10.17487/RFC6059, November 2010, 1998 . 2000 [RFC6947] Boucadair, M., Kaplan, H., Gilman, R., and S. 2001 Veikkolainen, "The Session Description Protocol (SDP) 2002 Alternate Connectivity (ALTC) Attribute", RFC 6947, 2003 DOI 10.17487/RFC6947, May 2013, 2004 . 2006 [UD] "The Jews Who Stole Christmas", , "Urban Dictionary, entry 2007 'Earballs'", December 2011, 2008 . 2010 Appendix A. Implementing Load Balancing 2012 Load-balancing is specified by the "weight" field of DNS SRV records. 2013 The defining algorithm is specified in [RFC2782]. The same result 2014 can be obtained with a simpler algorithm: For each server, calculate 2015 a "score": If its weight is 0, its score is "infinity" (in practice, 2016 100 suffices). If its weight is non-zero, its score is calculated by 2017 choosing a random number between 0 and 1, taking the negative of the 2018 logarithm of that number, and dividing the result by the weight. 2019 (Thus, the score is always a positive number.) (The resulting score 2020 has an exponential distribution whose parameter is the weight.) 2021 Then, sort the servers into order of increasing scores, so that the 2022 servers with the smallest scores are used first. 2024 This alternative algorithm is analyzed and sample implementations are 2025 provided in the files in the directory 2026 sipXrouter/sipXtackLib/doc/developer/scores in the GitHub Sipfoundry 2027 project (https://github.com/sipfoundry/sipXrouter), among other 2028 repositories. 2030 Acknowledgments 2032 The authors would like to acknowledge the support and contribution of 2033 the SIP Forum IPv6 Working Group. This document is based on a lot of 2034 tests and discussions at SIPit events, organized by the SIP Forum. 2036 The foundation of this document is the work done by Olle Johansson 2037 and Gonzalo Salgueiro in earlier documents, including 2038 [I-D.johansson-sip-he-connection]. In turn, the foundation of that 2039 work is [RFC6555], whose authors are Dan Wing and Andrew Yourtchenko. 2041 Scott O. Bradner suggested that the formula for determining 2042 responsiveness should contain a constant term. 2044 Roman Shpount described the need for configuration to override the 2045 source address selection mechanism. 2047 Tolga Asveren suggested requiring "rport". 2049 Author's Address 2051 Dale R. Worley 2052 Ariadne Internet Services 2053 738 Main St. 2054 Waltham, MA 02451 2055 US 2057 Email: worley@ariadne.com