Network Working Group                                          R. Sparks
Internet-Draft                                                      Xten
Expires: July 2, 2005                                           Jan 2005


  Problems identified associated with the Session Initiation Protocol's
                         non-INVITE Transaction
                    draft-sparks-sip-nit-problems-02

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667. By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she becomes aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 2, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This draft describes several problems that have been identified with
   the Session Initiation Protocol's non-INVITE transaction.

Table of Contents

   1.  Problems under the current specifications
     1.1  NITs must complete immediately or risk losing a race
     1.2  Provisional responses can delay recovery from lost final
          responses
     1.3  Delayed responses will temporarily blacklist an element
     1.4  408 for non-INVITE is not useful
     1.5  Non-INVITE timeouts doom forking proxies
     1.6  Mismatched timer values make winning the race harder
   2.  Security Considerations
   3.  IANA Considerations
   4.  Acknowledgments
   5.  References
       Author's Address
       Intellectual Property and Copyright Statements

1. Problems under the current specifications

   There are a number of unpleasant edge conditions created by the SIP
   non-INVITE transaction (NIT) model's fixed duration. The negative
   aspects of some of these are exacerbated by the effect provisional
   responses have on the non-INVITE transaction state machines as
   currently defined.

1.1 NITs must complete immediately or risk losing a race

   The non-INVITE transaction defined in RFC 3261 [1] is designed to
   have a fixed and finite duration (dependent on T1). A consequence of
   this design is that participants must strive to complete the
   transaction as quickly as possible. Consider the race condition
   shown in Figure 1.

                       UAC           UAS
                        | request     |
                ---     |---.         |
                 ^      |    `---.    |
                 |      |         `-->|   ---
                 |      |             |    ^
                 |      |             |    |
               64*T1    |             |    |
                 |      |             |    |
                 |      |             |  64*T1
                 |      |             |    |
                 |      |             |    |
                 v      |             |    |
   timeout <=== ---     |      200 OK |    |
                        |         .---|    v
                        |    .---'    |   ---
                        |<--'         |

                      Figure 1: NI Race Condition

   The User Agent Server (UAS) in this figure believes it has responded
   to the request in time, and that the request succeeded. The User
   Agent Client (UAC), on the other hand, believes the request has
   timed out, hence failed. No longer having a matching client
   transaction, the UAC core will ignore what it believes to be a
   spurious response. As far as the UAC is concerned, it received no
   response at all to its request.
   The ultimate result is that the UAS and UAC have conflicting views
   of the outcome of the transaction.

   Therefore, a UAS cannot wait until the last possible moment to send a
   final response within a NIT. It must, instead, send its response so
   that it will arrive at the UAC before that UAC times out.
   Unfortunately, the UAS has no way to accurately measure the
   propagation time of the request or predict the propagation time of
   the response. The uncertainty it faces is compounded by each proxy
   that participates in the transaction. Thus, the UAS's only choice is
   to send its final response as soon as it possibly can and hope for
   the best.

   This result constrains the set of problems that can be solved with a
   single NIT. Any delay introduced during processing of a request
   increases the probability of losing the race. If the timing
   characteristics of that processing are not predictable and
   controllable, a single NIT is an inappropriate model for handling the
   request. One viable alternative is to accept the request with a 202
   (Accepted) response and send the ultimate results in a new request in
   the reciprocal direction.

   In specialized networks, a UAS might have some reliable knowledge of
   inter-hop latency and could use that knowledge to determine whether it
   has time to delay its final response in order to perform some
   processing, such as a database lookup, while mitigating its risk of
   losing the race in Figure 1. Establishing this knowledge across
   arbitrary networks (perhaps using resource reservation techniques and
   deterministic transports) is not currently feasible.

1.2 Provisional responses can delay recovery from lost final responses

   The non-INVITE client transaction state machine provides reliability
   for NITs over unreliable transports (UDP) through retransmission of
   the request message. Timer E is set to T1 when a request is
   initially transmitted.
   As long as the machine remains in the Trying state,
   each time Timer E fires, the request is retransmitted and Timer E is
   reset to twice its previous value (capped at T2).

   If the non-INVITE client transaction state machine sees a provisional
   response, it transitions to the Proceeding state, where
   retransmission continues, but the algorithm for resetting Timer E is
   simply to use T2 instead of doubling at each firing. (Note that
   Timer E is not altered during the transition to Proceeding.)

   Making the transition to the Proceeding state before Timer E is reset
   to T2 can cause recovery from a lost final response to take extra
   time. Figure 2 shows recovery from a lost final response with and
   without a provisional response during this window. Recovery occurs
   within 2*T1 in the case without the provisional. With the
   provisional, recovery is delayed until T2, which by default is 8*T1.

   In practical terms (with the default values T1 = 500 ms and
   T2 = 4 s), a provisional response to a NIT in currently deployed
   networks can delay transaction completion by up to 3.5 seconds.

             UAC       UAS                    UAC        UAS
              |         |                      |          |
       ---    |----.    |               ---    |----.     |
        ^     |     `-->|                ^     |     `--->|
     E = T1   |         |             E = T1   |    .-----|(provisional)
        v     |         |                v     |<--'      |
       ---    |----.    |               ---    |----.     |
        ^     |     `-->|                ^     |     `--->|
        |     |   X<----|(lost final)    |     |   X<-----|(lost final)
        |     |         |                |     |          |
    E = 2*T1  |         |                |     |          |
        |     |         |                |     |          |
        |     |         |                |     |          |
        v     |         |                |     |          |
       ---    |----.    |                |     |          |
              |     `-->|                |     |          |
              |   .-----|(final)         |     |          |
              |<-'      |                |     |          |
              |         |                |     |          |
             \/\       /\/              /\/   /\/        /\/
                                      E = T2
             \/\       /\/              /\/   /\/        /\/
              |         |                |     |          |
              |         |                v     |          |
              |         |               ---    |----.     |
              |         |                      |     `--->|
              |         |                      |    .-----|(final)
              |         |                      |<--'      |
              |         |                      |          |

                  Figure 2: Provisionals can harm recovery

   No additional delay is introduced if the first provisional response
   is received after Timer E has reached its maximum reset interval of
   T2.
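
   The Timer E rules above can be sketched as a small Python model. It
   is an illustrative assumption, not a definitive implementation. The
   function retransmission_times() and its parameters are hypothetical,
   times are in seconds, and the RFC 3261 defaults T1 = 0.5 s and
   T2 = 4 s are assumed. The model also assumes that no final response
   is ever delivered, so only Timer E and Timer F (64*T1) matter.

```python
"""Hypothetical sketch of Timer E in an RFC 3261 non-INVITE client
transaction over an unreliable transport. Times are in seconds."""

T1 = 0.5  # default RTT estimate (500 ms)
T2 = 4.0  # default maximum retransmit interval (4 s)


def retransmission_times(t1=T1, t2=T2, provisional_at=None):
    """Return the times at which Timer E fires and the request is
    retransmitted, stopping when Timer F (64*T1) would expire.

    provisional_at: arrival time of the first provisional response, or
    None if none arrives. Assumes no final response is ever received.
    """
    timer_f = 64 * t1
    interval = t1  # Timer E is first set to T1
    fire_at = interval
    times = []
    while fire_at < timer_f:
        times.append(fire_at)  # Timer E fires: retransmit the request
        # The running timer is never touched on entering Proceeding; only
        # the reset rule applied at each firing depends on the state.
        proceeding = provisional_at is not None and provisional_at <= fire_at
        interval = t2 if proceeding else min(2 * interval, t2)
        fire_at += interval
    return times


if __name__ == "__main__":
    print(retransmission_times()[:4])                    # [0.5, 1.5, 3.5, 7.5]
    print(retransmission_times(provisional_at=0.2)[:3])  # [0.5, 4.5, 8.5]
```

   When a provisional response arrives before Timer E first fires, the
   gap after the retransmission at T1 grows from 2*T1 (1 s) to T2 (4 s).
   This is the difference between the two halves of Figure 2. A
   provisional_at of 3.5 s or later, once the interval has reached T2,
   yields the same schedule as no provisional response at all.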

1.3 Delayed responses will temporarily blacklist an element

   A SIP element's use of DNS SRV Resource Records [3] is specified in
   RFC 3263 [2]. That specification discusses how SIP assures high
   availability by having upstream elements detect failure of downstream
   elements. It proceeds to define several types of failure detection
   and instructions for failover. Two of the behaviors it describes are
   important to this document:

   o  Within a transaction, transport failure is detected either through
      an explicit report from the transport layer or through timeout.
      Note specifically that timeout indicates transport failure
      regardless of the transport in use. When transport failure is
      detected, the request is retried at the next element from the
      sorted results of the SRV query.

   o  Between transactions, locations reporting temporary failure
      (through 503/Retry-After, for example) are not used until their
      requested black-out period expires.

   The specification notes the benefit of caching locations that are
   successfully contacted, but does not discuss how such a cache is
   maintained. It is unclear whether an element should stop using
   (temporarily blacklist) a location returned in the SRV query that
   results in a transport error. If it does, when should such a
   location be removed from the blacklist?

   Without such a blacklist (or equivalent mechanism), the intended
   availability mechanism fails miserably. Consider traffic between two
   domains. Proxy pA in domain A needs to forward a sequence of
   non-INVITE requests to domain B. Through DNS SRV, pA discovers pB1
   and pB2, and the ordering rules of [2] and [3] indicate it should use
   pB1 first. The first request to pB1 times out. Since pA is a proxy
   and a NIT has a fixed duration, pA has no opportunity to retry the
   request at pB2.
   If pA does not remember pB1's failure, the second
   request (and every subsequent non-INVITE request until pB1 recovers)
   is doomed to the same failure. Caching would allow the subsequent
   requests to be tried at pB2.

   Since miserable failure is not acceptable in deployed networks, we
   should anticipate that elements will, in fact, cache timeout failures
   between transactions. Then the race in Figure 1 becomes important.
   If an element fails to respond "soon enough", it has effectively not
   responded at all, and will be blacklisted at its peer for some period
   of time.

   (Note that even with caching, the first request timeout results in a
   timeout failure all the way back to the original submitter. The
   failover mechanisms in [2] work well to increase the resiliency of a
   given INVITE transaction, but do nothing for a given non-INVITE
   transaction.)

1.4 408 for non-INVITE is not useful

   Consider the race condition in Figure 1 when the final response is
   408 instead of 200. Under the current specification, the race is
   guaranteed to be lost. Most existing endpoints will emit a 408 for a
   non-INVITE request 64*T1 after receiving the request if they haven't
   emitted an earlier final response. Such a 408 is guaranteed to
   arrive at the next upstream element too late to be useful. In fact,
   in the presence of proxies, these messages are even harmful. When
   the 408 arrives, each proxy will have already terminated its
   associated client transaction due to timeout. So, each proxy must
   forward the 408 upstream statelessly. This, in turn, is guaranteed
   to arrive too late. As Figure 3 shows, this can ultimately result
   in bombarding the original requester with spurious 408s. (Note that
   the proxy's client transaction state machine never enters the
   Completed state, so Timer K does not come into play.)

               UAC        P1         P2         P3         UAS
                |          |          |          |          |
           --- ===---.     |          |          |          |
            ^   |     `-->===---.     |          |          |
            |   |          |     `-->===---.     |          |
            |   |          |          |     `-->===---.     |
          64*T1 |          |          |          |     `-->===
            |   |          |          |          |          |
            |   |          |          |          |          |
            v   |          |          |          |          |
 (timeout) --- ===         |          |          |          |
                |    .-408===         |          |          |
                |<--'      |    .-408===         |          |
                |    .-408-|<--'      |    .-408===         |
                |<--'      |    .-408-|<--'      |    .-408===
                |    .-408-|<--'      |    .-408-|<--'      |
                |<--'      |    .-408-|<--'      |          |
                |    .-408-|<--'      |          |          |
                |<--'      |          |          |          |
                |          |          |          |          |

                   Figure 3: Late 408s to non-INVITEs

   This response bombardment is not limited to the 408 response, though
   it only exists when participating client transaction state machines
   are timing out. Figure 4 generalizes Figure 1 to include multiple
   hops. Note that even though the UAS responds "in time" to P3, the
   response is too late for P2, P1, and the UAC.

               UAC        P1         P2         P3         UAS
                |          |          |          |          |
           --- ===---.     |          |          |          |
            ^   |     `-->===---.     |          |          |
            |   |          |     `-->===---.     |          |
            |   |          |          |     `-->===---.     |
          64*T1 |          |          |          |     `-->===
            |   |          |          |          |          |
            |   |          |          |          |          |
            v   |          |          |          |          |
 (timeout) --- ===         |          |          |          |
                |    .-408===         |          |    .-200-|
                |<--'      |    .-408===   .-200-|<--'      |
                |    .-408-|<--'.-200-|<--'     ===         |
                |<--'.-200-|<--'      |          |         ===
                |<--'      |          |          |          |
                |          |          |          |          |

               Figure 4: Additional timeout related error

1.5 Non-INVITE timeouts doom forking proxies

   A single branch with a delayed or missing final response will
   dominate the processing at a proxy that receives no 2xx responses to
   a forked non-INVITE request. This proxy is required to allow all of
   its client transactions to terminate before choosing a "best
   response", which forces the proxy's server transaction to lose the
   race in Figure 1. Any response it ultimately forwards (a 401, for
   example) will arrive at the upstream elements too late to be used.
   Thus, if no element among the branches would return a 2xx response,
   failure of a single element (or its transport) dooms the proxy to
   failure.
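
   The effect on a forking proxy can be sketched with a simple timing
   model. Everything in the model is assumed for illustration.
   proxy_answer() and wins_race() are hypothetical helpers. Each hop adds
   a fixed one-way delay in both directions, and every element uses the
   default T1 = 0.5 s. "Best response" selection is reduced to picking
   the lowest status code, which is cruder than the rules in RFC 3261
   [1]. The upstream client gives up 64*T1 after sending, so a response
   arriving at or after that moment has lost the race in Figure 1.

```python
"""Hypothetical timing model of a proxy forking a non-INVITE request."""

T1 = 0.5           # default T1, seconds
TIMER_F = 64 * T1  # non-INVITE client transaction timeout (32 s)


def proxy_answer(branches, hop):
    """Return (status, time the proxy's final response reaches the client).

    The upstream client sends at time 0 and the proxy forwards on receipt.
    branches: one (delay, status) pair per branch, where delay is seconds
    after the proxy forwards the request, or (None, None) for silence.
    hop: one-way delay between the client and the proxy, seconds.
    """
    received = hop
    # A response arriving after the branch's own Timer F is never seen.
    seen = [(d, s) for d, s in branches if d is not None and d < TIMER_F]
    two_xx = [(d, s) for d, s in seen if 200 <= s < 300]
    if two_xx:
        delay, status = min(two_xx)  # the first 2xx is forwarded at once
        return status, received + delay + hop
    # Otherwise all client transactions must end first; a silent branch
    # ends only when its Timer F fires.
    if not seen or len(seen) < len(branches):
        finish = TIMER_F
    else:
        finish = max(d for d, _ in seen)
    # Crude stand-in for RFC 3261 best-response selection; 408 if nothing.
    status = min(s for _, s in seen) if seen else 408
    return status, received + finish + hop


def wins_race(arrival):
    """True if the response arrives before the client's Timer F fires."""
    return arrival < TIMER_F


if __name__ == "__main__":
    for branches in ([(0.5, 200), (None, None)],
                     [(0.3, 401), (1.0, 403)],
                     [(0.3, 401), (None, None)]):
        status, arrival = proxy_answer(branches, hop=0.25)
        print(status, arrival, wins_race(arrival))
```

   Running it prints "200 1.0 True", "401 1.5 True", and
   "401 32.5 False". In the third case, one silent branch holds the 401
   until the proxy's own Timer F fires. The 401 therefore reaches the
   upstream client after that client has already given up, whatever
   positive hop delay is chosen.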

1.6 Mismatched timer values make winning the race harder

   There are many failure scenarios due to misconfiguration or
   misbehavior that the SIP specification does not discuss. One is
   placing two elements with different configured values for T1 and T2
   on the same network. Review of Figure 1 illustrates that the race
   failure is only made more likely in this misconfigured state (it may
   appear that shortening T1 at the element behaving as a UAS improves
   this particular situation, but remember that these elements may trade
   roles on the next request). Since the protocol provides no mechanism
   for discovering or negotiating a peer's timer values, exceptional care
   must be taken when deploying systems with non-default values to
   ensure they will _never_ directly communicate with elements with
   default values.

2. Security Considerations

   This document describes problems with the SIP non-INVITE transaction,
   including potential security vulnerabilities. It does not make any
   changes to the SIP protocol.

3. IANA Considerations

   This document requires no action by IANA.

4. Acknowledgments

   This document captures many conversations about non-INVITE issues.
   Significant contributors include Ben Campbell, Gonzalo Camarillo,
   Steve Donovan, Rohan Mahy, Dan Petrie, Adam Roach, Jonathan
   Rosenberg, and Dean Willis.

5. References

   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
        Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
        Session Initiation Protocol", RFC 3261, June 2002.

   [2]  Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol
        (SIP): Locating SIP Servers", RFC 3263, June 2002.

   [3]  Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for
        specifying the location of services (DNS SRV)", RFC 2782,
        February 2000.

Author's Address

   Robert J. Sparks
   Xten
   5100 Tennyson Parkway
   Suite 1000
   Plano, TX 75024

   EMail: rsparks@xten.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.