MMUSIC                                                      J. Rosenberg
Internet-Draft                                             Cisco Systems
Expires: July 20, 2007                                  January 16, 2007


      Interactive Connectivity Establishment (ICE): A Methodology for
   Network Address Translator (NAT) Traversal for Offer/Answer Protocols
                        draft-ietf-mmusic-ice-13

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 20, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2007).
Abstract

   This document describes a protocol for Network Address Translator (NAT) traversal for multimedia session signaling protocols based on the offer/answer model, such as the Session Initiation Protocol (SIP). This protocol is called Interactive Connectivity Establishment (ICE). ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol, applying its binding discovery and relay usages, in addition to defining a new usage for checking connectivity between peers.

Table of Contents

   1.  Introduction
   2.  Overview of ICE
     2.1.  Gathering Candidate Addresses
     2.2.  Connectivity Checks
     2.3.  Sorting Candidates
     2.4.  Frozen Candidates
     2.5.  Security for Checks
     2.6.  Concluding ICE
     2.7.  Lite Implementations
   3.  Terminology
   4.  Sending the Initial Offer
     4.1.  Full Implementation Requirements
       4.1.1.  Gathering Candidates
       4.1.2.  Prioritizing Candidates
       4.1.3.  Choosing In-Use Candidates
     4.2.  Lite Implementation
     4.3.  Encoding the SDP
   5.  Receiving the Initial Offer
     5.1.  Verifying ICE Support
     5.2.  Determining Role
     5.3.  Gathering Candidates
     5.4.  Prioritizing Candidates
     5.5.  Choosing In Use Candidates
     5.6.  Encoding the SDP
     5.7.  Forming the Check Lists
     5.8.  Performing Periodic Checks
   6.  Receipt of the Initial Answer
     6.1.  Verifying ICE Support
     6.2.  Determining Role
     6.3.  Forming the Check List
     6.4.  Performing Periodic Checks
   7.  Connectivity Checks
     7.1.  Client Procedures
       7.1.1.  Sending the Request
       7.1.2.  Processing the Response
     7.2.  Server Procedures
       7.2.1.  Additional Procedures for Full Implementations
       7.2.2.  Additional Procedures for Lite Implementations
   8.  Concluding ICE
   9.  Subsequent Offer/Answer Exchanges
     9.1.  Generating the Offer
       9.1.1.  Additional Procedures for Full Implementations
       9.1.2.  Additional Procedures for Lite Implementations
     9.2.  Receiving the Offer and Generating an Answer
       9.2.1.  Additional Procedures for Full Implementations
     9.3.  Updating the Check and Valid Lists
       9.3.1.  Additional Procedures for Full Implementations
   10. Keepalives
   11. Media Handling
     11.1.  Sending Media
       11.1.1.  Procedures for Full Implementations
       11.1.2.  Procedures for Lite Implementations
     11.2.  Receiving Media
   12. Usage with SIP
     12.1.  Latency Guidelines
     12.2.  SIP Option Tags and Media Feature Tags
     12.3.  Interactions with Forking
     12.4.  Interactions with Preconditions
     12.5.  Interactions with Third Party Call Control
   13. Grammar
   14. Extensibility Considerations
   15. Example
   16. Security Considerations
     16.1.  Attacks on Connectivity Checks
     16.2.  Attacks on Address Gathering
     16.3.  Attacks on the Offer/Answer Exchanges
     16.4.  Insider Attacks
       16.4.1.  The Voice Hammer Attack
       16.4.2.  STUN Amplification Attack
     16.5.  Interactions with Application Layer Gateways and SIP
   17. Definition of Connectivity Check Usage
     17.1.  Applicability
     17.2.  Client Discovery of Server
     17.3.  Server Determination of Usage
     17.4.  New Requests or Indications
     17.5.  New Attributes
     17.6.  New Error Response Codes
     17.7.  Client Procedures
     17.8.  Server Procedures
     17.9.  Security Considerations for Connectivity Check
   18. IANA Considerations
     18.1.  SDP Attributes
       18.1.1.  candidate Attribute
       18.1.2.  remote-candidates Attribute
       18.1.3.  ice-lite Attribute
       18.1.4.  ice-mismatch Attribute
       18.1.5.  ice-pwd Attribute
       18.1.6.  ice-ufrag Attribute
       18.1.7.  ice-options Attribute
     18.2.  STUN Attributes
   19. IAB Considerations
     19.1.  Problem Definition
     19.2.  Exit Strategy
     19.3.  Brittleness Introduced by ICE
     19.4.  Requirements for a Long Term Solution
     19.5.  Issues with Existing NAPT Boxes
   20. Acknowledgements
   21. References
     21.1.  Normative References
     21.2.  Informative References
   Appendix A.  Lite and Full Implementations
   Appendix B.  Design Motivations
     B.1.  Pacing of STUN Transactions
     B.2.  Candidates with Multiple Bases
     B.3.  Purpose of the Translation
     B.4.  Importance of the STUN Username
     B.5.  The Candidate Pair Sequence Number Formula
     B.6.  The Frozen State
     B.7.  The remote-candidates attribute
     B.8.  Why are Keepalives Needed?
     B.9.  Why Prefer Peer Reflexive Candidates?
     B.10. Why Send an Updated Offer?
     B.11. Why are Binding Indications Used for Keepalives?
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   RFC 3264 [4] defines a two-phase exchange of Session Description Protocol (SDP) messages [10] for the purpose of establishing multimedia sessions. This offer/answer mechanism is used by protocols such as the Session Initiation Protocol (SIP) [3].

   Protocols using offer/answer are difficult to operate through Network Address Translators (NAT). Because their purpose is to establish a flow of media packets, they tend to carry IP addresses within their messages, which is known to be problematic through NAT [15]. The protocols also seek to create a media flow directly between participants, so that there is no application layer intermediary between them. This is done to reduce media latency, decrease packet loss, and reduce the operational costs of deploying the application. However, this is difficult to accomplish through NAT. A full treatment of the reasons for this is beyond the scope of this specification.

   Numerous solutions have been proposed for allowing these protocols to operate through NAT.
   These include Application Layer Gateways (ALGs), the Middlebox Control Protocol [16], Simple Traversal Underneath NAT (STUN) [14] and its revision, retitled Session Traversal Utilities for NAT [11], the STUN Relay Usage [12], and Realm Specific IP [18] [19], along with the session description extensions needed to make them work, such as the Session Description Protocol (SDP) [10] attribute for the Real Time Control Protocol (RTCP) [2]. Unfortunately, each of these techniques has pros and cons that make it optimal in some network topologies but a poor choice in others. As a result, administrators and implementors make assumptions about the topologies of the networks in which their solutions will be deployed, which introduces complexity and brittleness into the system. What is needed is a single solution that is flexible enough to work well in all situations.

   This specification provides that solution for media streams established by signaling protocols based on the offer/answer model. It is called Interactive Connectivity Establishment, or ICE. ICE makes use of STUN and its relay extension, commonly called TURN, but uses them in a specific methodology that avoids many of the pitfalls of using any one alone.

2.  Overview of ICE

   In a typical ICE deployment, two endpoints (known as agents in RFC 3264 terminology) want to communicate. They are able to communicate indirectly via some signaling system, such as SIP, through which they can perform an offer/answer exchange of SDP [4] messages. Note that ICE is not intended for NAT traversal for SIP itself, which is assumed to be provided via some other mechanism [32]. At the beginning of the ICE process, the agents are ignorant of their own topologies. In particular, they might or might not be behind a NAT (or multiple tiers of NATs).
   ICE allows the agents to discover enough information about their topologies to find a path or paths by which they can communicate.

   Figure 1 shows a typical environment for ICE deployment. The two endpoints are labelled L and R (for left and right, which helps visualize call flows). Both L and R are behind NATs, though, as mentioned before, they don't know that. The type of NAT and its properties are also unknown. Agents L and R are capable of engaging in an offer/answer exchange by which they can exchange SDP messages, whose purpose is to set up a media session between L and R. Typically, this exchange will occur through a SIP server.

   In addition to the agents, a SIP server, and NATs, ICE is typically used in concert with STUN servers in the network. Each agent can have its own STUN server, or both can use the same one.

                         +-------+
                         | SIP   |
      +-------+          | Srvr  |          +-------+
      | STUN  |          |       |          | STUN  |
      | Srvr  |          +-------+          | Srvr  |
      |       |         /         \         |       |
      +-------+        /           \        +-------+
                      /             \
                     /               \
                    /                 \
                   /                   \
                  /  <- Signalling ->   \
                 /                       \
                /                         \
          +--------+                   +--------+
          |  NAT   |                   |  NAT   |
          +--------+                   +--------+
              /                             \
             /                               \
            /                                 \
        +-------+                           +-------+
        | Agent |                           | Agent |
        |   L   |                           |   R   |
        |       |                           |       |
        +-------+                           +-------+

                           Figure 1

   The basic idea behind ICE is as follows: each agent has a variety of candidate transport addresses it could use to communicate with the other agent. These might include:

   o  Its directly attached network interface (or interfaces, in the case of a multihomed machine)

   o  A translated address on the public side of a NAT (a "server reflexive" address)

   o  The address of a media relay the agent is using

   Potentially, any of L's candidate transport addresses can be used to communicate with any of R's candidate transport addresses. In practice, however, many combinations will not work.
   For instance, if L and R are both behind NATs, their directly attached interface addresses are unlikely to be able to communicate directly (this is why ICE is needed, after all!). The purpose of ICE is to discover which pairs of addresses will work. The way that ICE does this is to systematically try all possible pairs (in a carefully sorted order) until it finds one or more that work.

2.1.  Gathering Candidate Addresses

   In order to execute ICE, an agent has to identify all of its address candidates. Naturally, one viable candidate is one obtained directly from a local interface the agent has towards the network. Such a candidate is called a HOST CANDIDATE. The local interface could be one on a local layer 2 network technology, such as Ethernet or WiFi, or it could be one that is obtained through a tunnel mechanism, such as a Virtual Private Network (VPN) or Mobile IP (MIP). In all cases, these appear to the agent as a local interface from which ports (and thus a candidate) can be allocated.

   If an agent is multihomed, it can obtain a candidate from each interface. Depending on the location of the peer on the IP network relative to the agent, the agent may be reachable by the peer through one of those interfaces, or through another. Consider, for example, an agent which has a local interface to a private net 10 network, and also to the public Internet. A candidate from the net 10 interface will be directly reachable when communicating with a peer on the same private net 10 network, while a candidate from the public interface will be directly reachable when communicating with a peer on the public Internet. Rather than trying to guess which interface will work prior to sending an offer, the offering agent includes both candidates in its offer.

   Once the agent has obtained host candidates, it uses STUN to obtain additional candidates.
   These come in two flavors: translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES) and addresses of media relays (RELAYED CANDIDATES). The relationship of these candidates to the host candidate is shown in Figure 2. Both types of candidates are discovered using STUN.

                  To Internet

                      |
                      |
                      |  /------------  Relayed
                      | /               Candidate
                  +--------+
                  |        |
                  |  STUN  |
                  | Server |
                  |        |
                  +--------+
                      |
                      |
                      |  /------------  Server
                      | /               Reflexive
                +------------+          Candidate
                |    NAT     |
                +------------+
                      |
                      |  /------------  Host
                      | /               Candidate
                  +--------+
                  |        |
                  | Agent  |
                  |        |
                  +--------+

                           Figure 2

   To find a server reflexive candidate, the agent sends a STUN Binding Request, using the Binding Discovery Usage [11], from each host candidate to its STUN server. (It is assumed that the address of the STUN server is configured, or learned in some way.) When the agent sends the Binding Request, the NAT (assuming there is one) will allocate a binding, mapping this server reflexive candidate to the host candidate. Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate. Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent. We call the host candidate associated with a given server reflexive candidate the BASE.

      Note: "Base" refers to the address you'd send from for a particular candidate. Thus, as a degenerate case, host candidates also have a base, but it's the same as the host candidate.

   When there are multiple NATs between the agent and the STUN server, the STUN request will create a binding on each NAT, but only the outermost server reflexive candidate will be discovered by the agent.
   If the agent is not behind a NAT, then the base candidate will be the same as the server reflexive candidate, and the server reflexive candidate can be ignored.

   The final type of candidate is a RELAYED candidate. The STUN Relay Usage [12] allows a STUN server to act as a media relay, forwarding traffic between L and R. In order to send traffic to L, R sends traffic to the media relay, which forwards it to L. The same thing happens in the other direction.

   Traffic from L to R has its addresses rewritten twice: first by the NAT and second by the STUN relay server. Thus, the address that R knows about, and the one that it wants to send to, is the one on the STUN relay server. This address is the final kind of candidate, which we call a RELAYED CANDIDATE.

2.2.  Connectivity Checks

   Once L has gathered all of its candidates, it orders them from highest to lowest priority and sends them to R over the signalling channel. The candidates are carried in attributes in the SDP offer. When R receives the offer, it performs the same gathering process and responds with its own list of candidates. At the end of this process, each agent has a complete list of both its candidates and its peer's candidates and is ready to perform connectivity checks by pairing up the candidates to see which pairs work.

   The basic principle of the connectivity checks is simple:

   1.  Sort the candidate pairs in priority order.

   2.  Send checks on each candidate pair in priority order.

   3.  Acknowledge checks received from the other agent.
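   Step 1 can be sketched in Python as follows. This is a minimal, hypothetical illustration and not the procedure this specification defines. The sketch pairs every local candidate with every remote candidate and sorts the pairs, but this draft specifies its own ordering algorithm in Section 4.1.2. So, as a stated assumption, the sketch uses the pair-priority formula later published in RFC 5245: 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D?1:0), where G is the priority of the controlling agent's candidate and D that of the controlled agent's. The names `Candidate`, `pair_priority`, and `ordered_pairs` are illustrative only. Steps 2 and 3 (sending and answering STUN checks) are not modelled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """Hypothetical minimal candidate: a transport address plus the
    numeric priority its agent advertised in the offer or answer."""
    ip: str
    port: int
    priority: int


def pair_priority(g: int, d: int) -> int:
    """Pair priority as defined in RFC 5245 (an assumption for this sketch).

    g is the priority of the controlling agent's candidate in the pair,
    d the priority of the controlled agent's candidate.
    """
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local, remote, controlling):
    """Step 1: form every (local, remote) pair and sort, highest first.

    Returns a list of (pair_priority, local_candidate, remote_candidate).
    """
    pairs = []
    for lc, rc in product(local, remote):
        # Map local/remote onto controlling/controlled before computing.
        g, d = (lc.priority, rc.priority) if controlling else (rc.priority, lc.priority)
        pairs.append((pair_priority(g, d), lc, rc))
    pairs.sort(key=lambda entry: entry[0], reverse=True)
    return pairs


# Example: L is the controlling agent, R the controlled agent.
# Addresses are RFC 5737 documentation addresses; priorities are made up.
l_cands = [Candidate("192.0.2.10", 5000, 126), Candidate("203.0.113.4", 40000, 100)]
r_cands = [Candidate("198.51.100.20", 6000, 126), Candidate("203.0.113.9", 50000, 90)]

l_view = ordered_pairs(l_cands, r_cands, controlling=True)
r_view = ordered_pairs(r_cands, l_cands, controlling=False)

for prio, lc, rc in l_view:
    # Steps 2 and 3 would send and answer a STUN check on each pair in this order.
    print(prio, f"{lc.ip}:{lc.port} -> {rc.ip}:{rc.port}")
```

   The formula is keyed on role (controlling versus controlled) rather than on local versus remote. That is why L and R derive the same pair ordering in this example, which is the property Section 2.3 relies on.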
   A complete connectivity check for a single candidate pair is a simple 4-message handshake:

                 L                        R
                 -                        -
                 STUN request ->             \  L's
                          <- STUN response   /  check

                           <- STUN request   \  R's
                 STUN response ->            /  check

                           Figure 3

   As an optimization, as soon as R gets L's check message, R immediately sends its own check message to L on the same candidate pair. This accelerates the process of finding a valid candidate pair, and is called a triggered check.

   At the end of this handshake, both L and R know that they can send (and receive) messages end-to-end in both directions.

2.3.  Sorting Candidates

   Because the algorithm above searches all candidate pairs, if a working pair exists, it will eventually find it no matter what order the candidates are tried in. In order to produce faster (and better) results, the candidates are sorted in a specified order. The algorithm is described in Section 4.1.2 but follows two general principles:

   o  Each agent gives its candidates a numeric priority, which is sent along with the candidate to the peer.

   o  The local and remote priorities are combined so that each agent has the same ordering for the candidate pairs.

   The second property is important for getting ICE to work when there are NATs in front of L and R. Frequently, NATs will not allow packets in from a host until the agent behind the NAT has sent a packet towards that host. Consequently, ICE checks in each direction will not succeed until both sides have sent a check through their respective NATs.

   In general, the priority algorithm is designed so that candidates of similar type get similar priorities and so that more direct routes are preferred over indirect ones. Within those guidelines, however, agents have a fair amount of discretion about how to tune their algorithms.

2.4.  Frozen Candidates

   The previous description only addresses the case where the agents wish to establish a single media component, i.e., a single flow with a single host-port quartet. However, in many cases (in particular RTP and RTCP) the agents actually need to establish connectivity for more than one flow.

   The naive way to attack this problem would be to simply do independent ICE exchanges for each media component. This is obviously inefficient because the network properties are likely to be very similar for each component (especially because RTP and RTCP are typically run on adjacent ports). Thus, it should be possible to leverage information from one media component in order to determine the best candidates for another. ICE does this with a mechanism called "frozen candidates."

   The basic principle behind frozen candidates is that initially only the candidates for a single media component are tested. The other media components are marked "frozen". When the connectivity checks for the first component succeed, the corresponding candidates for the other components are unfrozen and checked immediately. This avoids repeated checking of components which are superficially more attractive but in fact are likely to fail.

   While we've described "frozen" here as a separate mechanism for expository purposes, in fact it is an integral part of ICE, and the ICE prioritization algorithm automatically ensures that the right candidates are unfrozen and checked in the right order.

2.5.  Security for Checks

   Because ICE is used to discover which addresses can be used to send media between two agents, it is important to ensure that the process cannot be hijacked to send media to the wrong location. Each STUN connectivity check is covered by a message authentication code (MAC) computed using a key exchanged in the signalling channel.
   This MAC provides message integrity and data origin authentication, thus stopping an attacker from forging or modifying connectivity check messages. The MAC also aids in disambiguating ICE exchanges from forked calls.

2.6.  Concluding ICE

   ICE checks are performed in a specific sequence, so that high priority pairs are checked first, followed by lower priority ones. One way to conclude ICE is to declare victory as soon as a check for each component of each media stream completes successfully. Indeed, this is a reasonable algorithm, and details for it are provided below. However, it is possible that packet losses will cause a higher priority check to take longer to complete, and allowing ICE to run a little longer might produce better results. More fundamentally, the prioritization defined by this specification may not yield "optimal" results. As an example, if the aim is to select low latency media paths, usage of a relay is a hint that latencies may be higher, but it is nothing more than a hint. An actual RTT measurement could be made, and it might demonstrate that a pair with lower priority is actually better than one with higher priority.

   Consequently, ICE assigns one agent the role of the controlling agent and the other the role of the controlled agent. The controlling agent runs a selection algorithm, through which it can decide when to conclude ICE checks and which pairs get selected. The pair that is selected is called the favored candidate pair. When a controlling agent selects a pair for a particular component of a media stream, it generates a check for that pair and includes a flag in the check indicating that the pair has been selected. If the controlled agent has already performed a check in the reverse direction that succeeded, the controlled agent considers ICE processing to be concluded for that component.
   Once there is a selected pair for each component of a media stream, the ICE checks for that media stream are considered to be completed. At this point, further checks stop for that media stream, and ICE is considered to be done. Consequently, media can flow in each direction for that stream, as shown in Figure 4. Once all of the media streams are completed, the controlling agent sends an updated offer if the currently in-use candidates don't match the ones it selected.

                 L                        R
                 -                        -
                 STUN request + flag ->      \  L's
                          <- STUN response   /  check

                 -> RTP Data
                               <- RTP Data

                           Figure 4

   Once ICE is concluded, it can be restarted at any time, for one or all of the media streams, by either agent. This is done by sending an updated offer indicating a restart.

2.7.  Lite Implementations

   In order for ICE to be used in a call, both agents need to support it. However, certain agents, such as those in gateways to the PSTN, media servers, conferencing servers, and voicemail servers, are known not to be behind a NAT or firewall. To make it easier for these devices to support ICE, ICE defines a special type of implementation called "lite" (in contrast to the normal "full" implementation). A lite implementation doesn't gather candidates; it includes only its host candidate for any media stream. When a lite implementation connects with a full implementation, the full agent takes the role of the controlling agent, and the lite agent takes on the controlled role. In addition, lite agents do not need to generate connectivity checks, run the state machines, or compute candidate pairs. For an informational summary of ICE processing as seen by a lite agent, see [33].

3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

   This specification makes use of the following terminology:

   Agent:  As defined in RFC 3264, an agent is the protocol implementation involved in the offer/answer exchange. There are two agents involved in an offer/answer exchange.

   Peer:  From the perspective of one of the agents in a session, its peer is the other agent. Specifically, from the perspective of the offerer, the peer is the answerer. From the perspective of the answerer, the peer is the offerer.

   Transport Address:  The combination of an IP address and port.

   Candidate:  A transport address that is to be tested by ICE procedures in order to determine its suitability for receiving media.

   Component:  A component is a single transport address that is used to support a media stream. For media streams based on RTP, there are two components per media stream: one for RTP, and one for RTCP.

   Host Candidate:  A candidate obtained by binding to a specific port from an interface on the host. This includes both physical interfaces and logical ones, such as ones obtained through Virtual Private Networks (VPNs) and Realm Specific IP (RSIP) [18] (which lives at the operating system level).

   Server Reflexive Candidate:  A candidate obtained by sending a STUN request from a host candidate to a STUN server, distinct from the peer, whose address is configured or learned by the client prior to an offer/answer exchange.

   Peer Reflexive Candidate:  A candidate obtained by sending a STUN request from a host candidate to the STUN server running on a peer's candidate.

   Relayed Candidate:  A candidate obtained by sending a STUN Allocate request from a host candidate to a STUN server.
The relayed candidate is resident on the STUN server, and the STUN server relays packets back towards the agent.

Translation: The translation of a relayed candidate is the transport address to which the relay forwards a packet received at the relayed candidate. For relayed candidates learned through the STUN Allocate request, the translation of the relayed candidate is the server reflexive candidate returned by the Allocate response.

Base: The base of a server reflexive candidate is the host candidate from which it was derived. A host candidate is also said to have a base, equal to that candidate itself. Similarly, the base of a relayed candidate is that candidate itself.

Foundation: Each candidate has a foundation, which is an identifier that is distinct for two candidates that have different types, different interface IP addresses for their bases, or different IP addresses for their STUN servers. Two candidates have the same foundation when they are of the same type, their bases have the same IP address, and, for server reflexive or relayed candidates, they come from the same STUN server. Foundations are used to correlate candidates, so that when one candidate is found to be valid, candidates sharing the same foundation can be tested next, as they are likely to also be valid.

Local Candidate: A candidate that an agent has obtained and included in an offer or answer it sent.

Remote Candidate: A candidate that an agent received in an offer or answer from its peer.

In-Use Candidate: A candidate is in-use when it appears in the m/c-line of an active media stream.

Candidate Pair: A pairing containing a local candidate and a remote candidate.

Check: A candidate pair where the local candidate is a transport address from which an agent can send a STUN connectivity check.

Check List: An ordered set of STUN checks that an agent is to generate towards a peer.

Periodic Check: A connectivity check generated by an agent as a consequence of a timer that fires periodically, instructing it to send a check.

Triggered Check: A connectivity check generated as a consequence of the receipt of a connectivity check from the peer.

Valid List: An ordered set of candidate pairs for a media stream that have been validated by a successful STUN transaction.

Full: An ICE implementation that performs the complete set of functionality defined by this specification.

Lite: An ICE implementation that omits certain functions, implementing only as much as is necessary for a full peer implementation to gain the benefits of ICE. Lite implementations can only act as the controlled agent in a session, and do not gather candidates.

Controlling Agent: The STUN agent that is responsible for selecting the final choice of candidate pairs and signaling them through STUN and, if needed, an updated offer. In any session, one agent is always controlling. The other is the controlled agent.

Controlled Agent: A STUN agent that waits for the controlling agent to select the final choice of candidate pairs.

4. Sending the Initial Offer

In order to send the initial offer in an offer/answer exchange, an agent must gather candidates, prioritize them, choose ones for inclusion in the m/c-line, and then formulate and send the SDP. The first three of these steps differ for full and lite implementations.

4.1. Full Implementation Requirements

4.1.1. Gathering Candidates

An agent gathers candidates when it believes that communication is imminent. An offerer can do this based on a user interface cue, or based on an explicit request to initiate a session. Every candidate is a transport address. It also has a type and a base.
Three types are defined and gathered by this specification: host candidates, server reflexive candidates, and relayed candidates. The base of a candidate is the candidate that an agent must send from when using that candidate.

The first step is to gather host candidates. Host candidates are obtained by binding to ports (typically ephemeral) on an interface (physical or virtual, including VPN interfaces) on the host. The process for gathering host candidates depends on the transport protocol. Procedures are specified here for UDP.

For each UDP media stream the agent wishes to use, the agent SHOULD obtain a candidate for each component of the media stream on each interface that the host has. It obtains each candidate by binding to a UDP port on the specific interface. A host candidate (and indeed every candidate) is always associated with a specific component for which it is a candidate. Each component has an ID assigned to it, called the component ID. For RTP-based media streams, the RTP itself has a component ID of 1, and RTCP a component ID of 2. If an agent is using RTCP, it MUST obtain a candidate for it. If an agent is using both RTP and RTCP, it would end up with 2*K host candidates for each media stream if it has K interfaces.

The base for each host candidate is set to the candidate itself.

Agents SHOULD obtain relayed candidates and MUST obtain server reflexive candidates. The requirement to obtain relayed candidates is at SHOULD strength to allow for provider variation. If relayed candidates are not used, it is RECOMMENDED that support for them be implemented and just disabled through configuration, so that it can be re-enabled through configuration if conditions change in the future.

The agent next pairs each host candidate with the STUN server with which it is configured or which it has discovered by some means. This specification only considers usage of a single STUN server.
Every Ta seconds, the agent chooses another such pair (the order is inconsequential) and sends a STUN request to the server from that host candidate. If the agent is using both relayed and server reflexive candidates, this request MUST be a STUN Allocate request from the relay usage [12]. If the agent is using only server reflexive candidates, the request MUST be a STUN Binding request using the binding discovery usage [11].

The value of Ta SHOULD be configurable, and SHOULD have a default of 20ms. Note that this pacing applies only to starting STUN transactions with source and destination transport addresses (i.e., the host candidate and STUN server, respectively) for which a STUN transaction has not previously been sent. Consequently, retransmissions of a STUN request are governed entirely by the retransmission rules defined in [11]. Similarly, retries of a request due to recoverable errors (such as an authentication challenge) happen immediately and are not paced by timer Ta. Because of this pacing, it will take a certain amount of time to obtain all of the server reflexive and relayed candidates; for example, with the default Ta of 20ms and ten host candidates, the last STUN transaction starts 180ms after the first. Implementations should be aware of the time required to do this and, if the application requires a time budget, limit the number of candidates that are gathered.

An Allocate response will provide the agent with a server reflexive candidate (obtained from the mapped address) and a relayed candidate in the RELAY-ADDRESS attribute. A Binding response will provide the agent with only a server reflexive candidate (also obtained from the mapped address). The base of the server reflexive candidate is the host candidate from which the Allocate or Binding request was sent. The base of a relayed candidate is that candidate itself.
A server reflexive candidate obtained from an Allocate response is called the "translation" of the relayed candidate obtained from the same response. The agent will need to remember the translation for the relayed candidate, since it is placed into the SDP. If a relayed candidate is identical to a host candidate (which can happen in rare cases), the relayed candidate MUST be discarded. Proper operation of ICE depends on each base being unique.

Next, the agent eliminates redundant candidates. A candidate is redundant if its transport address equals that of another candidate and its base equals the base of that other candidate. Note that two candidates can have the same transport address yet have different bases, and these would not be considered redundant.

Finally, the agent assigns each candidate a foundation. The foundation is an identifier, scoped within a session. Two candidates MUST have the same foundation ID when they are of the same type (host, server reflexive, peer reflexive, or relayed), their bases have the same IP address (the ports can be different), and, for reflexive and relayed candidates, the STUN servers used to obtain them have the same IP address. Similarly, two candidates MUST have different foundations if their types are different, their bases have different IP addresses, or the STUN servers used to obtain them have different IP addresses.

4.1.2. Prioritizing Candidates

The prioritization process results in the assignment of a priority to each candidate. Each candidate for a media stream MUST have a unique priority. An agent SHOULD compute the priority by determining a preference for each type of candidate (server reflexive, peer reflexive, relayed, and host) and, when the agent is multihomed, choosing a preference for its interfaces. These two preferences are then combined to compute the priority for a candidate.
That priority SHOULD be computed using the following formula:

      priority = (2^24)*(type preference) +
                 (2^8)*(local preference) +
                 (2^0)*(256 - component ID)

The type preference MUST be an integer from 0 to 126 inclusive, and represents the preference for the type of the candidate (where the types are host, server reflexive, peer reflexive, and relayed). A 126 is the highest preference, and a 0 is the lowest. Setting the value to 0 means that candidates of this type will only be used as a last resort. The type preference MUST be identical for all candidates of the same type and MUST be different for candidates of different types. The type preference for peer reflexive candidates MUST be higher than that of server reflexive candidates. Note that candidates gathered based on the procedures of Section 4.1.1 will never be peer reflexive candidates; candidates of this type are learned from the STUN connectivity checks performed by ICE. The component ID is the component ID for the candidate, and MUST be between 1 and 256 inclusive. The local preference MUST be an integer from 0 to 65535 inclusive. It represents a preference for the particular interface from which the candidate was obtained, in cases where an agent is multihomed. 65535 represents the highest preference, and 0 the lowest. When there is only a single interface, this value SHOULD be set to 65535. Generally speaking, if there are multiple candidates for a particular component of a particular media stream that have the same type, the local preference MUST be unique for each one. In this specification, this only happens for multihomed hosts.

These rules guarantee that there is a unique priority for each candidate. This priority will be used by ICE to determine the order of the connectivity checks and the relative preference for candidates.
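
The priority formula can be sketched in Python as follows. This is a minimal illustration assuming integer inputs. The function name, the range checks, and the TYPE_PREFERENCES table (filled with the values this section goes on to recommend when intermediaries are a concern) are hypothetical, not part of this specification. The sketch also makes explicit why uniqueness holds: the local preference and component ID terms together never exceed 2^24 - 1, so they cannot overlap the type preference term.

```python
# Hypothetical table holding the type preferences this section recommends
# when intermediary latency, loss, or cost matters.
TYPE_PREFERENCES = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(type_pref: int, local_pref: int, component_id: int) -> int:
    """priority = (2^24)*(type preference) + (2^8)*(local preference)
                  + (2^0)*(256 - component ID)"""
    if not 0 <= type_pref <= 126:
        raise ValueError("type preference must be 0..126")
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must be 0..65535")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be 1..256")
    # (2^8)*65535 + 255 == 2^24 - 1, so the last two terms always stay
    # below the bits occupied by the type preference.
    return (2**24) * type_pref + (2**8) * local_pref + (256 - component_id)


if __name__ == "__main__":
    # Single-homed host candidate for RTP (component ID 1).
    print(candidate_priority(TYPE_PREFERENCES["host"], 65535, 1))  # 2130706431
```

With a type preference of 126 and a local preference of 65535, this reproduces the value of 2130706432 minus the component ID that Section 4.2 gives for lite implementations. The recommended values also satisfy the requirement that peer reflexive candidates (110) rank above server reflexive ones (100).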

The following are some guidelines for selecting these values.

One criterion for selecting the type and local preference values is the use of an intermediary. That is, if media is sent to that candidate, will the media first transit an intermediate server before being received? Relayed candidates are clearly one type of candidate that involves an intermediary. Host candidates obtained from a VPN interface are another. When media transits an intermediary, it can increase the latency between transmission and reception. It can increase packet loss, because of the additional router hops that may be taken. It may increase the cost of providing service, since media will be routed in and right back out of an intermediary run by the provider. If these concerns are important, the type preference for relayed candidates can be set lower than the type preference for reflexive and host candidates. Indeed, it is RECOMMENDED that in this case, host candidates have a type preference of 126, server reflexive candidates have a type preference of 100, peer reflexive candidates have a type preference of 110, and relayed candidates have a type preference of 0. Furthermore, if an agent is multihomed, the local preference for host candidates from a VPN interface SHOULD be 0.

Another criterion for selecting preferences is IP address family. ICE works with both IPv4 and IPv6. It therefore provides a transition mechanism that allows dual-stack hosts to prefer connectivity over IPv6, but to fall back to IPv4 in case the v6 networks are disconnected (due, for example, to a failure in a 6to4 relay) [23]. It can also help with hosts that have both a native IPv6 address and a 6to4 address.
In such a case, higher local preferences could be assigned to the v6 interface, followed by the 6to4 interfaces, followed by the v4 interfaces. This allows a site to obtain and begin using native v6 addresses immediately, yet still fall back to 6to4 addresses when communicating with agents in other sites that do not yet have native v6 connectivity.

Another criterion for selecting preferences is security. If a user is a telecommuter, and therefore connected to both their corporate network and a local home network, they may prefer their voice traffic to be routed over the VPN in order to keep it on the corporate network when communicating within the enterprise, but use the local network when communicating with users outside of the enterprise. In such a case, a VPN interface would have a higher local preference than any other interface.

Another criterion for selecting preferences is topological awareness. This is most useful for candidates that make use of relays. In those cases, if an agent has preconfigured or dynamically discovered knowledge of the topological proximity of the relays to itself, it can use that to assign higher local preferences to candidates obtained from closer relays.

4.1.3. Choosing In-Use Candidates

A candidate is said to be "in-use" if it appears in the m/c-line of an offer or answer. When communicating with an ICE peer, being in-use implies that, should these candidates be selected by the ICE algorithm, a re-INVITE will not be required after ICE processing completes. When communicating with a peer that is not ICE-aware, the in-use candidates will be used exclusively for the exchange of media, as defined in normal offer/answer procedures.

An agent MUST choose a set of candidates, one for each component of each active media stream, to be in-use. A media stream is active if it does not contain the a=inactive SDP attribute.
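
The requirement above can be sketched in Python. This is a minimal illustration, not a procedure this specification mandates. The Candidate and MediaStream classes and the choose_in_use() helper are hypothetical. Preferring relayed candidates reflects the recommendation in the paragraphs that follow. Falling back to the highest-priority candidate when a component has no relayed candidate is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical representation of a gathered candidate.
    component_id: int
    cand_type: str  # "host", "srflx", "prflx" or "relay"
    address: str
    port: int
    priority: int


@dataclass
class MediaStream:
    # Hypothetical representation of a media stream and its SDP attributes.
    name: str
    candidates: list
    attributes: set = field(default_factory=set)  # e.g. {"inactive"}


def _rank(cand, preferred_type):
    # Candidates of the preferred type win; ties are broken by priority.
    return (cand.cand_type == preferred_type, cand.priority)


def choose_in_use(streams, preferred_type="relay"):
    """Pick one in-use candidate per component of each active media stream.

    Returns {stream name: {component ID: Candidate}}.
    """
    in_use = {}
    for stream in streams:
        if "inactive" in stream.attributes:
            continue  # a=inactive: the stream is not active, nothing to choose
        chosen = {}
        for cand in stream.candidates:
            current = chosen.get(cand.component_id)
            if current is None or _rank(cand, preferred_type) > _rank(current, preferred_type):
                chosen[cand.component_id] = cand
        in_use[stream.name] = chosen
    return in_use
```

For an RTP stream, the component 1 choice would populate the m and c lines, and the component 2 choice would populate the a=rtcp attribute described in Section 4.3.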

It is RECOMMENDED that in-use candidates be chosen based on the likelihood of those candidates to work with the peer that is being contacted. Unfortunately, it is difficult to ascertain which candidates those might be. As an example, consider a user within an enterprise. To reach non-ICE-capable agents within the enterprise, host candidates have to be used, since the enterprise policies may prevent communication between elements using a relay on the public network. However, when communicating with peers outside of the enterprise, relayed candidates from a publicly accessible STUN server are needed.

Indeed, the difficulty in picking just one transport address that will work is the whole problem that motivated the development of this specification in the first place. As such, it is RECOMMENDED that agents select relayed candidates to be in-use.

4.2. Lite Implementation

For each media stream, the agent allocates a single candidate for each component of the media stream from one of its interfaces. If an agent is multihomed, it MUST choose one of its interfaces for a particular media stream; ICE cannot be used to dynamically choose one. Each component has an ID assigned to it, called the component ID. For RTP-based media streams, the RTP itself has a component ID of 1, and RTCP a component ID of 2. If an agent is using RTCP, it MUST obtain a candidate for it.

Each candidate is assigned a foundation. The foundation MUST be different for two candidates from different interfaces (which can occur if media streams are on different interfaces), and MUST be the same otherwise. A simple integer that increments for each interface will suffice. In addition, each candidate MUST be assigned a unique priority amongst all candidates for the same media stream.
This priority SHOULD be equal to (2^24)*126 + (2^8)*65535 + (256 - component ID), i.e., the formula of Section 4.1.2 with a type preference of 126 and a local preference of 65535. This equals 2130706432 minus the component ID. Each of these candidates is also considered to be "in-use", since they will be included in the m/c-line of an offer or answer.

4.3. Encoding the SDP

The process of encoding the SDP is identical between full and lite implementations.

The agent includes a single a=candidate media-level attribute in the SDP for each candidate for that media stream. The a=candidate attribute contains the IP address, port, and transport protocol for that candidate. A Fully Qualified Domain Name (FQDN) for a host MAY be used in place of a unicast address. In that case, when receiving an offer or answer containing an FQDN in an a=candidate attribute, the FQDN is looked up in the DNS using an A or AAAA record, and the resulting IP address is used for the remainder of ICE processing. The candidate attribute also includes the component ID for that candidate. For media streams based on RTP, candidates for the actual RTP media MUST have a component ID of 1, and candidates for RTCP MUST have a component ID of 2. Other types of media streams that require multiple components MUST develop specifications that define the mapping of components to component IDs, and these component IDs MUST be between 1 and 256.

The candidate attribute also includes the priority and the foundation. The agent SHOULD include a type for each candidate by populating the candidate-types production with the appropriate value: "host" for host candidates, "srflx" for server reflexive candidates, "prflx" for peer reflexive candidates (though these never appear in an initial offer/answer exchange), and "relay" for relayed candidates. The related address MUST NOT be included if a type was not included.
If a type was included, the related address SHOULD be present for server reflexive, peer reflexive, and relayed candidates. If a candidate is server or peer reflexive, the related address is equal to the base for that server or peer reflexive candidate. If the candidate is relayed, the related address is equal to the translation of the relayed address. If the candidate is a host candidate, there is no related address, and the rel-addr production MUST be omitted.

STUN connectivity checks between agents make use of a short-term credential that is exchanged in the offer/answer process. The username part of this credential is formed by concatenating a username fragment from each agent, separated by a colon. Each agent also provides a password, used to compute the message integrity for requests it receives. As such, an SDP MUST contain the ice-ufrag and ice-pwd attributes, containing the username fragment and password, respectively. These can be either session-level or media-level attributes, and are thus common across all candidates for all media streams, or across all candidates for a particular media stream, respectively. However, if two media streams have identical ice-ufrag values, they MUST have identical ice-pwd values. The ice-ufrag and ice-pwd attributes MUST be chosen randomly at the beginning of a session. The ice-ufrag attribute MUST contain at least 24 bits of randomness, and the ice-pwd attribute MUST contain at least 128 bits of randomness. This means that the ice-ufrag attribute will be at least 4 characters long, and the ice-pwd at least 22 characters long, since the grammar for these attributes allows for 6 bits of randomness per character (24/6 = 4, and 128/6 rounded up is 22). The attributes MAY, of course, be longer than 4 and 22 characters, respectively.

If an agent is a lite implementation, it MUST include an "a=ice-lite" session-level attribute in its SDP.
If an agent is a full implementation, it MUST NOT include this attribute.

The m/c-line is populated with the candidates that are in-use. For streams based on RTP, this is done by placing the port of the RTP candidate into the m-line and its address into the c-line. If the agent is utilizing RTCP, it MUST encode the RTCP candidate into the m/c-line using the a=rtcp attribute as defined in RFC 3605 [2]. If RTCP is not in use, the agent MUST signal that using b=RS:0 and b=RR:0 as defined in RFC 3556 [5].

There MUST be a candidate attribute for each component of the media stream in the m/c-line.

Once an offer or answer is sent, an agent MUST be prepared to receive both STUN and media packets on each candidate. As discussed in Section 11.1, media packets can be sent to a candidate prior to its appearance in the m/c-line.

5. Receiving the Initial Offer

When an agent receives an initial offer, it will check whether the offerer supports ICE, determine its role, gather candidates, prioritize them, choose which ones are in-use, encode and send an answer, and, for full implementations, form the check lists and begin connectivity checks.

5.1. Verifying ICE Support

The answerer will proceed with the ICE procedures defined in this specification if the following are true:

o There is at least one a=candidate attribute for each media stream in the offer it just received.

o For each media stream, at least one of the candidates is a match for its respective in-use component in the m/c-line.

If either of these conditions is not met, the agent MUST process the SDP based on normal RFC 3264 procedures, without using any of the ICE mechanisms described in the remainder of this specification, with two exceptions. First, in all cases, the agent MUST follow the rules of Section 10, which describes keepalive procedures for all agents.
Second, if the agent is not proceeding with ICE because there were a=candidate attributes, but none that matched the m/c-line of the media stream, the agent MUST include an a=ice-mismatch attribute in its answer. This mismatch occurs in cases where intermediary elements modify the m/c-line, but don't modify candidate attributes. By including this attribute in the answer, the agent provides diagnostic information on the ICE failure to the offerer and any intermediate signaling entities.

In addition, if the offer contains the "a=ice-lite" attribute and the answerer is also lite, the agent MUST process the SDP based on normal RFC 3264 procedures, as if it didn't support ICE, with the exception of Section 10, which describes keepalive procedures.

5.2. Determining Role

For each session, each agent takes on a role. There are two roles: controlling and controlled. The controlling agent is responsible for selecting the candidate pairs to be used for each media stream, and for generating the updated offer based on that selection, when needed. The controlled agent is told which candidate pairs to use for each media stream, and does not generate an updated offer to signal this information in SIP.

If one of the agents is a lite implementation, it MUST assume the controlled role, and its peer (which will be full) MUST assume the controlling role. If the agent and its peer are both full implementations, the agent that generated the offer that started the ICE processing takes on the controlling role, and the other takes the controlled role.

Based on this definition, once roles are determined for a session, they persist unless ICE is restarted, as discussed below. A restart causes a new selection of roles.

5.3. Gathering Candidates

The process for gathering candidates at the answerer is identical to the process for the offerer as described in Section 4.1.1 for full implementations and Section 4.2 for lite implementations. It is RECOMMENDED that this process begin immediately on receipt of the offer, prior to user acceptance of a session. Such gathering MAY even be done pre-emptively when an agent starts.

5.4. Prioritizing Candidates

The process for prioritizing candidates at the answerer is identical to the process followed by the offerer, as described in Section 4.1.2 for full implementations and Section 4.2 for lite implementations.

5.5. Choosing In-Use Candidates

The process for selecting in-use candidates at the answerer is identical to the process followed by the offerer, as described in Section 4.1.3 for full implementations and Section 4.2 for lite implementations.

5.6. Encoding the SDP

The process for encoding the SDP at the answerer is identical to the process followed by the offerer, as described in Section 4.3.

5.7. Forming the Check Lists

Forming check lists is done only by full implementations. Lite implementations MUST skip the steps defined in this section.

There is one check list per in-use media stream resulting from the offer/answer exchange. A media stream is in-use as long as its port is non-zero (a zero port is used in RFC 3264 to reject a media stream). Consequently, a media stream is in-use even if it is marked as a=inactive or has a bandwidth value of zero. Each check list is a sequence of STUN connectivity checks that are performed by the agent. To form the check list for a media stream, the agent forms candidate pairs, computes a candidate pair priority, orders the pairs by priority, prunes them, and sets their states. These steps are described in this section.

First, the agent takes each of its candidates for a media stream (called local candidates) and pairs them with the candidates it received from its peer (called remote candidates) for that media stream. A local candidate is paired with a remote candidate if and only if the two candidates have the same component ID and the same IP address version. It is possible that some of the local candidates don't get paired with a remote candidate, and some of the remote candidates don't get paired with local candidates. This can happen if one agent didn't include candidates for all of the components for a media stream. In the case of RTP, for example, this would happen when one agent provided candidates for RTCP and the other did not. If this happens, the number of components for that media stream is effectively reduced, and is considered to be equal to the minimum across both agents of the maximum component ID provided by each agent across all components for the media stream.

Once the pairs are formed, a candidate pair priority is computed. Let O-P be the priority for the candidate provided by the offerer. Let A-P be the priority for the candidate provided by the answerer. The priority for a pair is computed as:

      pair priority = 2^32*MIN(O-P,A-P) + 2*MAX(O-P,A-P) + (O-P>A-P?1:0)

where O-P>A-P?1:0 is an expression whose value is 1 if O-P is greater than A-P, and 0 otherwise. This formula ensures a unique priority for each pair in most cases. Once the priority is assigned, the agent sorts the candidate pairs in decreasing order of priority. If two pairs have identical priority, the ordering amongst them is arbitrary.

This sorted list of candidate pairs is used to determine a sequence of connectivity checks that will be performed. Each check involves sending a request from a local candidate to a remote candidate.
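
The pairing, pair priority, and sorting steps can be sketched in Python as follows. The sketch assumes that candidates carry literal IP addresses (any FQDNs already resolved). The Candidate class and the form_pairs() helper are hypothetical. The pruning step (replacing server reflexive local candidates with their bases and removing redundant pairs) is deliberately left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate record: transport address, component ID, priority.
    address: str
    port: int
    component_id: int
    priority: int


def pair_priority(o_p: int, a_p: int) -> int:
    """Pair priority from the offerer's (O-P) and answerer's (A-P) priorities."""
    return 2**32 * min(o_p, a_p) + 2 * max(o_p, a_p) + (1 if o_p > a_p else 0)


def form_pairs(local, remote, is_offerer):
    """Pair local and remote candidates that share a component ID and IP
    version, and sort the pairs by decreasing pair priority."""
    pairs = []
    for lc in local:
        for rc in remote:
            if lc.component_id != rc.component_id:
                continue
            if ipaddress.ip_address(lc.address).version != ipaddress.ip_address(rc.address).version:
                continue
            # O-P always comes from the offerer's candidate, whichever side we are.
            if is_offerer:
                o_p, a_p = lc.priority, rc.priority
            else:
                o_p, a_p = rc.priority, lc.priority
            pairs.append((pair_priority(o_p, a_p), lc, rc))
    # sort() is stable, so equal priorities keep pairing order; the text
    # allows any order among ties.
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs
```

Because O-P is always taken from the offerer's candidate, both agents compute the same priority for a given pair. The agent's role only determines which of the two candidate lists is local.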

Since an agent cannot send requests directly from a reflexive candidate, but only from its base, the agent next goes through the sorted list of candidate pairs. For each pair where the local candidate is server reflexive, the server reflexive candidate MUST be replaced by its base. Once this has been done, the agent MUST remove redundant pairs. A pair is redundant if its local and remote candidates are identical to the local and remote candidates of a pair higher up on the priority list. The result is called the check list for that media stream, and each candidate pair on it is called a check.

Each check is also said to have a foundation, which is merely the combination of the foundations of the local and remote candidates in the check.

Each check in the check list is associated with a state. This state is assigned once the check list for each media stream has been computed. There are five potential values that the state can have:

Waiting: This check has not been performed, and can be performed as soon as it is the highest-priority Waiting check on the check list.

In-Progress: A request has been sent for this check, and the transaction is still in progress.

Succeeded: This check was already done and produced a successful result.

Failed: This check was already done and failed, either never producing any response or producing an unrecoverable failure response.

Frozen: This check hasn't been performed, and it can't be performed until some other check succeeds, allowing it to move into the Waiting state.

First, the agent sets all of the checks in each check list to the Frozen state. Then, it takes the first check in the check list for the first media stream (a media stream is the first media stream when it is described by the first m-line in the SDP offer and answer), and sets its state to Waiting.
It then finds all of the other checks in that check list with the same component ID, but different foundations, and sets all of their states to Waiting as well. Once this is done, one of the check lists will have some number of checks in the Waiting state, and the other check lists will have all of their checks in the Frozen state. A check list with at least one check that is not Frozen is called an active check list.

The check list itself is associated with a state, which captures the state of ICE checks for that media stream. There are two states:

Running: In this state, ICE checks are still in progress for this media stream.

Completed: In this state, the controlling agent has signaled that a candidate pair has been selected for each component. Consequently, no further ICE checks are performed.

When a check list is first constructed as the consequence of an offer/answer exchange, it is placed in the Running state.

ICE processing across all media streams also has a state associated with it. This state is equal to Running while checks are in progress. The state is Completed when all checks have been completed. Rules for transitioning between states are described below.

5.8. Performing Periodic Checks

Checks are generated only by full implementations. Lite implementations MUST skip the steps described in this section.

An agent performs two types of checks. The first type is the periodic check. These checks occur periodically for each media stream, and involve choosing the highest priority check in the Waiting state from each check list and performing it. The other type of check is called a triggered check. This is a check that is performed on receipt of a connectivity check from the peer. This section describes how periodic checks are performed.
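The selection performed each time a check list's timer fires, as specified in the paragraphs that follow, might look like the sketch below. The Check record and the string state names are assumptions made for illustration. Sending the STUN request itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Check:
    # Hypothetical check record: pair priority plus one of the five check states.
    priority: int
    state: str  # "Waiting", "In-Progress", "Succeeded", "Failed" or "Frozen"


def on_timer(check_list):
    """Choose the check to perform when this check list's timer fires.

    Returns the chosen check (now In-Progress), or None when the list has no
    Waiting or Frozen checks, meaning the timer for this list should stop.
    """
    def highest(state):
        matching = [c for c in check_list if c.state == state]
        return max(matching, key=lambda c: c.priority, default=None)

    chosen = highest("Waiting")
    if chosen is None:
        # No Waiting check: the highest priority Frozen check moves to Waiting
        # and is performed immediately.
        chosen = highest("Frozen")
    if chosen is None:
        return None
    chosen.state = "In-Progress"  # the Binding Request would be sent here
    return chosen
```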
Once the agent has computed the check lists as described in Section 5.7, it sets a timer for each active check list. The timer fires every Ta/N seconds, where N is the number of active check lists (initially, there is only one active check list). Implementations MAY set the timer to fire less frequently than this. Ta is the same value used to pace the gathering of candidates, as described in Section 4.1.1. The first timer for each active check list fires immediately, so that the agent performs a connectivity check the moment the offer/answer exchange has been done, followed by the next periodic check Ta seconds later.

When the timer fires, the agent MUST find the highest priority check in that check list that is in the Waiting state. The agent then sends a STUN check from the local candidate of that check to the remote candidate of that check. The procedures for forming the STUN request for this purpose are described in Section 7.1.1. If none of the checks in that check list are in the Waiting state, but there are checks in the Frozen state, the highest priority check in the Frozen state is moved into the Waiting state, and that check is performed. When a check is performed, its state is set to In-Progress. If there are no checks in either the Waiting or Frozen state, then the timer for that check list is stopped.

Performing the connectivity check requires the agent to know the username fragment for the local and remote candidates, and the password for the remote candidate. For periodic checks, the remote username fragment and password are learned directly from the SDP received from the peer, and the local username fragment is known by the agent.

6. Receipt of the Initial Answer

This section describes the procedures that an agent follows when it receives the answer from the peer.
It verifies that its peer supports ICE, determines its role, and, for full implementations, forms the check list and begins performing periodic checks.

6.1. Verifying ICE Support

The offerer will proceed with the ICE procedures defined in this specification if there is at least one a=candidate attribute for each media stream in the answer it just received. If this condition is not met, the agent MUST process the SDP based on normal RFC 3264 procedures, without using any of the ICE mechanisms described in the remainder of this specification, with the exception of Section 10, which describes keepalive procedures.

In some cases, the answer may omit a=candidate attributes for the media streams, and instead include an a=ice-mismatch attribute for one or more of the media streams in the SDP. This signals to the offerer that the answerer supports ICE, but that ICE processing was not used for the session because an intermediary modified the m/c-lines without modifying the candidate attributes. See Section 16 for a discussion of cases where this can happen. This specification provides no guidance on how an agent should proceed in such a failure case.

6.2. Determining Role

The offerer follows the same procedures described for the answerer in Section 5.2.

6.3. Forming the Check List

Formation of check lists is performed only by full implementations. The offerer follows the same procedures described for the answerer in Section 5.7.

6.4. Performing Periodic Checks

Periodic checks are performed only by full implementations. The offerer follows the same procedures described for the answerer in Section 5.8.

7. Connectivity Checks

This section describes how connectivity checks are performed. All ICE implementations are required to be compliant with [11], as opposed to the older [14].
However, whereas a full implementation will both generate checks (acting as a STUN client) and receive them (acting as a STUN server), a lite implementation will only ever receive checks, and thus will only act as a STUN server.

7.1. Client Procedures

These procedures define how an agent sends a connectivity check, whether it is a periodic or a triggered check. These procedures are only applicable to full implementations.

7.1.1. Sending the Request

The agent acting as the client generates a connectivity check either periodically or as a triggered check. In either case, the check is generated by sending a Binding Request from a local candidate to a remote candidate. The agent must know the username fragment for both candidates and the password for the remote candidate.

A Binding Request serving as a connectivity check MUST utilize a STUN short term credential. Rather than being learned from a Shared Secret request, the short term credential is exchanged in the offer/answer procedures. In particular, the username is formed by concatenating the username fragment provided by the peer with the username fragment of the agent sending the request, separated by a colon (":"). The password is equal to the password provided by the peer. For example, consider the case where agent A is the offerer, and agent B is the answerer. Agent A included a username fragment of AFRAG for its candidates, and a password of APASS. Agent B provided a username fragment of BFRAG and a password of BPASS. A connectivity check from A to B (and, of course, its response) uses the username BFRAG:AFRAG and the password BPASS. A connectivity check from B to A (and its response) uses the username AFRAG:BFRAG and the password APASS.

An agent MUST include the PRIORITY attribute in its Binding Request.
The attribute MUST be set equal to the priority that would be assigned, based on the algorithm in Section 4.1.2, to a peer reflexive candidate learned from this check. Such a peer reflexive candidate has a stream ID, component ID and local preference that are equal to those of the host candidate from which the check is being sent, but a type preference equal to the value associated with peer reflexive candidates.

The Binding Request sent by an agent MUST include the USERNAME and MESSAGE-INTEGRITY attributes. That is, an agent MUST NOT wait to be challenged for short term credentials. Rather, it MUST provide them in the Binding Request right away.

The controlling agent MAY include the USE-CANDIDATE attribute in the Binding Request. The controlled agent MUST NOT include it in its Binding Request. This attribute signals that the controlling agent wishes to cease checks for this component, and use the candidate pair resulting from the check for this component. Section 8 provides guidance on determining when to include it.

If the agent is using Diffserv Codepoint markings [26] in its media packets, it SHOULD apply those same markings to its connectivity checks.

7.1.2. Processing the Response

If the STUN transaction generates an unrecoverable failure response or times out, the agent sets the state of the check to Failed. The remainder of this section applies to processing of successful responses (any response from 200 to 299).

The agent MUST check that the source IP address and port of the response equal the destination IP address and port that the Binding Request was sent to, and that the destination IP address and port of the response match the source IP address and port that the Binding Request was sent from. If these do not match, the processing described in the remainder of this section MUST NOT be performed.
In addition, the agent sets the state of the check to Failed.

If the check succeeds, processing continues. The agent creates a candidate pair whose local candidate equals the mapped address of the response, and whose remote candidate equals the destination address to which the request was sent. This is called a validated pair, since it has been validated by a STUN connectivity check. It is very important to note that this validated pair will often not be identical to the check itself; in many cases, the local candidate (learned through the mapped address in the response) will be different from the local candidate the request was sent from.

Next, the agent computes the priority for the pair based on the priority of each candidate, using the algorithm in Section 5.7. The priority of the local candidate depends on its type. If it is not peer reflexive, it is equal to the priority signaled for that candidate in the SDP. If it is peer reflexive, it is equal to the PRIORITY attribute the agent placed in the Binding Request which just completed. The priority of the remote candidate is taken from the SDP of the peer. If the candidate does not appear there, then the check must have been a triggered check to a new remote candidate. In that case, the priority is taken as the value of the PRIORITY attribute in the Binding Request which triggered the check that just completed.

Once the priority of the candidate pair has been computed, the pair is added to the valid list for that media stream. If the agent was a controlling agent, and the check had included a USE-CANDIDATE attribute, the candidate pair is marked as "favored". If the agent was a controlled agent, and the check was a triggered check, and the request which caused the triggered check included the USE-CANDIDATE attribute, the candidate pair is marked as "favored".
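The following sketch covers the success-response processing described so far in this section. It assumes a hypothetical Transaction summary of the completed exchange and represents transport addresses as (ip, port) tuples. The caller chooses local_prio and remote_prio (the SDP value, or a PRIORITY attribute value for peer reflexive candidates). The pair priority reuses the Section 5.7 formula.

```python
from dataclasses import dataclass
from typing import Tuple

Addr = Tuple[str, int]  # hypothetical (ip, port) transport address


@dataclass
class Transaction:
    """Hypothetical summary of a completed Binding transaction (200-299 response)."""
    request_src: Addr
    request_dst: Addr
    response_src: Addr
    response_dst: Addr
    mapped_address: Addr
    triggered: bool = False
    sent_use_candidate: bool = False         # our request carried USE-CANDIDATE
    trigger_had_use_candidate: bool = False  # the peer request that triggered this check did


@dataclass
class ValidPair:
    local: Addr
    remote: Addr
    priority: int
    favored: bool = False


def process_success(tx, controlling, is_offerer, local_prio, remote_prio, valid_list):
    """Add the validated pair to valid_list; return False if the check must be marked Failed."""
    # The response must come from where the request went, and arrive where it was sent from.
    if tx.response_src != tx.request_dst or tx.response_dst != tx.request_src:
        return False
    if is_offerer:
        o_p, a_p = local_prio, remote_prio
    else:
        o_p, a_p = remote_prio, local_prio
    priority = 2**32 * min(o_p, a_p) + 2 * max(o_p, a_p) + (1 if o_p > a_p else 0)
    favored = (controlling and tx.sent_use_candidate) or (
        not controlling and tx.triggered and tx.trigger_had_use_candidate)
    # The validated pair's local side is the mapped address, not the request's source.
    valid_list.append(ValidPair(tx.mapped_address, tx.request_dst, priority, favored))
    return True
```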
Next, the agent updates its ICE states. The agent checks the mapped address from the STUN response. If the transport address does not match any of the local candidates that the agent knows about, the mapped address represents a new peer reflexive candidate. Its type is equal to peer reflexive. Its base is set equal to the candidate from which the STUN check was sent. Its username fragment and password are identical to those of the candidate from which the check was sent. It is assigned the priority value that was placed in the PRIORITY attribute of the request. Its foundation is selected as described in Section 4.1.1. The peer reflexive candidate is then added to the list of local candidates known by the agent (though it is not paired with other remote candidates at this time).

Next, the agent changes the state for this check to Succeeded. The agent then determines whether the success of this check allows other checks to be unfrozen. If the check had a component ID of one, the agent MUST change the states for all other Frozen checks for the same media stream and same foundation, but different component IDs, to Waiting. If the component ID for the check was equal to the number of components for the media stream (where this is the actual number of components being used, in cases where the number of components signaled in the SDP differs from offerer to answerer), the agent MUST change the state for all other Frozen checks for the first component of different media streams (and thus in different check lists) but the same foundation, to Waiting.

7.2. Server Procedures

An agent MUST be prepared to receive a Binding Request on the base of each candidate it included in its most recent offer or answer.
Receipt of a Binding Request on a transport address that the agent had included in a candidate attribute is an indication that the connectivity check usage applies to the request.

The agent MUST use a short term credential to authenticate the request and perform a message integrity check. The agent MUST accept a credential if the username consists of two values separated by a colon, where the first value is equal to the username fragment generated by the agent in an offer or answer for a session in progress, and the password is equal to the password for that username fragment. It is possible (and in fact very likely) that an offerer will receive a Binding Request prior to receiving the answer from its peer. The request can nevertheless be processed without this answer, and a response generated.

If the agent is using Diffserv Codepoint markings [26] in its media packets, it SHOULD apply those same markings to its responses to Binding Requests.

7.2.1. Additional Procedures for Full Implementations

This subsection defines the additional server procedures applicable to full implementations.

For requests being received on a relayed candidate, the source transport address used for STUN processing (namely, generation of the XOR-MAPPED-ADDRESS attribute) is the transport address as seen by the relay. That source transport address will be present in the REMOTE-ADDRESS attribute of a STUN Data Indication message, if the Binding Request was delivered through a Data Indication. If the Binding Request was not encapsulated in a Data Indication, that source address is equal to the current active destination for the STUN relay session.

If the STUN request resulted in an error response, no further processing is performed.
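The short term credential rules of Sections 7.1.1 and 7.2 can be illustrated with the AFRAG/APASS and BFRAG/BPASS example. The helper names below are hypothetical and not part of any STUN library. The sketch only selects usernames and passwords; it does not compute MESSAGE-INTEGRITY.

```python
def client_credentials(peer_ufrag: str, peer_pwd: str, own_ufrag: str):
    """USERNAME and password an agent uses when sending a connectivity check."""
    return f"{peer_ufrag}:{own_ufrag}", peer_pwd


def server_password_for(username: str, own_credentials: dict):
    """Password for verifying an incoming check, or None if it must be rejected.

    own_credentials maps each username fragment this agent generated in an
    offer or answer for a session in progress to the matching password.
    """
    first, sep, _rest = username.partition(":")
    if not sep:
        return None  # not two values separated by a colon
    return own_credentials.get(first)


def remote_ufrag_from(username: str) -> str:
    """The peer's username fragment: the part after the colon (used for triggered checks)."""
    return username.partition(":")[2]
```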
Assuming a success response, if the source transport address of the request does not match any existing remote candidates, it represents a new peer reflexive remote candidate. The full-mode agent gives the candidate a priority equal to the PRIORITY attribute from the request. The type of the candidate is equal to peer reflexive. Its foundation is set to an arbitrary value, different from the foundation for all other remote candidates. Note that any subsequent offer/answer exchanges will contain this new peer reflexive candidate in the SDP, and will signal the actual foundation for the candidate. This candidate is then added to the list of remote candidates. However, the agent does not pair this candidate with any local candidates.

Next, the agent constructs a tentative check in the reverse direction, called a triggered check. The triggered check has a local candidate equal to the candidate on which the STUN request was received, and a remote candidate equal to the source transport address where the request came from (which may be a new peer-reflexive remote candidate). Since both candidates are known to the agent, it can obtain their priorities and compute the candidate pair priority. This tentative check is then looked up in the check list. There are several possible outcomes:

o  If there is already a check on the check list with the same local and remote candidates, and the state of that check is Waiting or Frozen, its state is changed to In-Progress and the tentative check is performed.

o  If there is already a check on the check list with the same local and remote candidates, and its state was In-Progress, the agent SHOULD abandon the new tentative check and instead generate an immediate retransmit of the Binding Request for the check in progress.
   This is to facilitate rapid completion of ICE when both agents are behind NAT.

o  If there is already a check on the check list with the same local and remote candidates, and its state was Succeeded, the new tentative check is abandoned. If the Binding Request just received contained the USE-CANDIDATE attribute, it means that the pair resulting from that previous check is favored by the peer controlling agent. The agent MUST take the candidate pair in the valid list that was learned from that previous successful check, and mark it as favored.

o  If there is already a check on the check list with the same local and remote candidates, and its state was Failed, the new tentative check is abandoned.

o  If there is no matching check on the check list, the new tentative check is inserted into the check list based on its priority, and its state is set to In-Progress.

If the tentative check is to be performed, it is constructed and processed as described in Section 7.1.1. These procedures require the agent to know the username fragment and password for the peer. They are readily determined from the SDP and from the check that was just received. The username fragment for the remote candidate is equal to the second half (the part after the colon) of the USERNAME in the Binding Request that was just received. Using that username fragment, the agent can check the SDP messages received from its peer (there may be more than one in cases of forking), and find this username fragment. The corresponding password is then selected. If the agent has not yet received this SDP (a likely case for the offerer in the initial offer/answer exchange), it MUST wait for the SDP to be received, and then proceed with the triggered check.

7.2.2. Additional Procedures for Lite Implementations

If the check that was just received contained a USE-CANDIDATE attribute, the agent constructs a candidate pair whose local candidate is equal to the transport address on which the request was received, and whose remote candidate is equal to the source transport address of the request that was received. This candidate pair is assigned an arbitrary priority, and placed into a list of valid candidates for that component of that media stream, called the valid list. In addition, it is marked as favored, since the peer agent has indicated that it is to be used. ICE processing is considered complete for a media stream if the valid list contains a candidate pair for each component.

8. Concluding ICE

The processing rules in this section apply only to full implementations.

Concluding ICE involves selection of pairs by the controlling agent, updating of state machinery, and possibly the generation of an updated offer by the controlling agent.

The controlling agent can use any algorithm it likes for deciding when to select a candidate pair, called the favored pair, as the one that will be used for media. However, it MUST eventually include a USE-CANDIDATE attribute in at least one successful check for each component of each media stream.

The most obvious way to utilize the USE-CANDIDATE attribute is to run through a series of checks, each of which omits the flag. Once one or more checks complete successfully for a component of a media stream, the agent evaluates the choices based on some criteria, and picks a candidate pair. The criteria for evaluation are a matter of implementation and allow for localized optimizations. The check that yielded this pair is then repeated, this time with the USE-CANDIDATE flag.
This approach provides the most flexibility in terms of algorithms, and also improves ICE's resilience to variations in implementation (see Section 14). This approach is called "introspective selection". The drawback of introspective selection is that it always increases latency, because it requires an additional check to be done.

An alternative is called "proactive selection". In this approach, the controlling agent includes the USE-CANDIDATE attribute in every check it sends. Once the first check for a component succeeds, it is used by ICE. In this mode, the agent will end up using the candidate pair which is highest priority based on ICE's prioritization algorithm, instead of some other local optimization. It is possible with proactive selection that multiple checks might succeed with the flag set; this is why ICE still applies its prioritization algorithm to pick amongst those pairs that have been favored.

If an agent is controlling and its peer has a lite implementation, the agent MUST use an introspective selection algorithm. Of course, it MAY select a favored pair based on ICE's prioritization. The key requirement is that the agent must complete a successful check before redoing it with the USE-CANDIDATE attribute, since a lite implementation marks a pair as favored as soon as it receives a request carrying USE-CANDIDATE (Section 7.2.2), without checking that pair itself.

For both controlling and controlled agents, once a candidate pair in the Valid list is marked as favored, an agent MUST NOT generate any further periodic checks for that component of that media stream, and SHOULD cease any retransmissions in progress for checks for that component of that media stream. Once there is at least one favored candidate pair for each component of a media stream, a full-mode agent MUST change the state of processing for its check list to Completed.
Once all of the check lists for the media streams enter the Completed state, the controlling agent takes the highest priority favored candidate pair for each component of each media stream. If any of those candidate pairs differ from the in-use candidates in the m/c-lines of the most recent offer/answer exchange, the controlling agent MUST generate an updated offer as described in Section 9.

9. Subsequent Offer/Answer Exchanges

An agent MAY generate a subsequent offer at any time. However, the rules in Section 8 will cause the controlling agent to send an updated offer at the conclusion of ICE processing when ICE has selected different candidate pairs from the in-use pairs. This section defines rules for construction of subsequent offers and answers.

9.1. Generating the Offer

An agent MAY change the ice-pwd and/or ice-ufrag for a media stream in an offer. Doing so is a signal to restart ICE processing for that media stream. When an agent restarts ICE for a media stream, it MUST NOT include the a=remote-candidates attribute, since the state of the media stream would not be Completed at this point. Note that it is permissible to use a session-level attribute in one offer, but to provide the same password as a media-level attribute in a subsequent offer. This is not a change in password, just a change in its representation.

An agent MUST restart ICE processing if the offer is being generated for the purposes of changing the target of the media stream. In other words, if an agent wants to generate an updated offer which, had ICE not been in use, would result in a new value for the transport address in the m/c-line, the agent MUST restart ICE for that media stream. This implies that setting the IP address in the c-line to 0.0.0.0 will cause an ICE restart.
Consequently, ICE implementations SHOULD NOT utilize this mechanism for call hold, and instead use a=inactive as described in [4].

If an agent removes a media stream by setting its port to zero, it MUST NOT include any candidate attributes for that media stream.

An agent MUST NOT signal a change in its implementation level (full or lite) by adding or removing the a=ice-lite attribute from an updated offer, unless ICE processing is being restarted for all media streams in the offer. Of course, in normal cases the implementation level is not dynamic and there would be no need to signal a change. However, in applications like third party call control, which involve a mid-session change in remote correspondent, this can happen, and it is permitted by ICE with a restart.

Note that an agent can add a new media stream at any time, even if ICE has long finished for the existing media streams. Based on the rules described here, checks will begin for this new stream as if it were in an initial offer.

9.1.1. Additional Procedures for Full Implementations

This section describes additional procedures for full implementations.

When an agent generates an updated offer, the set of candidate attributes to include for each media stream depends on the state of ICE processing for that media stream. If the processing for that media stream is in the Completed state, a full-mode agent MUST include a candidate attribute for the local candidate of each pair that has been chosen for use by ICE for that media stream. A pair is chosen if it is the highest priority favored pair in the valid list for a component of that media stream. An agent SHOULD NOT include any other candidate attributes for that media stream.
If ICE processing for a media stream is in the Running state, the agent MUST include all current candidates (including peer reflexive candidates learned through ICE processing) for that media stream. It MAY include candidates it did not offer previously, but which it has gathered since the last offer/answer exchange. If a media stream is new or ICE checks are restarting for that stream, an agent includes the set of candidates it wishes to utilize. In the case of a restart, this MAY include some, none, or all of the previous candidates for that stream, and MAY include a totally new set of candidates gathered as described in Section 4.1.1.

If a candidate was sent in a previous offer/answer exchange, it SHOULD have the same priority. For a peer reflexive candidate, the priority SHOULD be the same as determined by the processing in Section 7.1.2. The foundation SHOULD be the same. The username fragments and passwords for a media stream SHOULD remain the same as in the previous offer or answer.

Population of the m/c-lines also depends on the state of ICE processing. If ICE processing for a media stream is in the Completed state, the m/c-line MUST use the local candidate from the highest priority favored pair in the valid list for each component of that media stream. If ICE processing is in the Running state, a full-mode agent SHOULD populate the m/c-line for that media stream based on the considerations in Section 4.1.3.

In addition, if the agent is controlling, it MUST include the a=remote-candidates attribute for each media stream that is in the Completed state. The attribute contains the remote candidates from the highest priority favored pair in the valid list for each component of that media stream.

9.1.2. Additional Procedures for Lite Implementations

A lite implementation includes its one and only candidate for each component of each media stream in an a=candidate attribute in any subsequent offer. This candidate is formed identically to the procedures for initial offers, as described in Section 4.2.

9.2. Receiving the Offer and Generating an Answer

When receiving a subsequent offer within an existing session, an agent MUST re-apply the verification procedures in Section 5.1 without regard to the results of verification from any previous offer/answer exchanges. Indeed, it is possible that a previous offer/answer exchange resulted in ICE not being used, but that it is used as a consequence of a subsequent exchange.

If the offer contained a change in the a=ice-ufrag or a=ice-pwd attributes compared to the previous SDP from the peer, it is a signal that ICE is restarting for this media stream. If all media streams are restarting, then ICE is restarting overall. Procedures for ICE restarts are discussed below. Unless ICE is restarting for that media stream, an agent MUST NOT change the a=ice-ufrag or a=ice-pwd attributes in an answer relative to the last SDP it provided. Such a change can only take place in an offer. If ICE is restarting, the a=ice-ufrag and a=ice-pwd attributes MUST be changed.

An agent MUST NOT change its implementation level from its previous SDP unless, based on the offer, ICE procedures are being restarted for all media streams in the offer. In that case, it MAY change its level.

An agent MUST NOT include the a=remote-candidates attribute in an answer.

When the answerer generates its answer, it must decide what candidates to include in the answer, how to populate the m/c-line, and how to adjust the states of ICE processing.
The rules for inclusion of candidate attributes in an answer are identical to the rules followed by the offerer as described in Section 9.1 for both full and lite implementations. For lite implementations, those rules also apply for setting the m/c-line. However, additional considerations apply to full implementations.

9.2.1. Additional Procedures for Full Implementations

The computation of the m/c-line additionally depends on the presence or absence of the a=remote-candidates attribute in a media stream. If present, it means that the offerer (acting as the controlling agent) believed that ICE processing had completed for that media stream. In this case, the remote-candidates attribute contains the candidates that the answerer is supposed to use. It is possible that the agent doesn't even know of these candidates yet; they will be discovered shortly through a response to an in-progress check. The full-mode agent MUST populate the m/c-line with the candidates from the a=remote-candidates attribute.

If the offer did not contain the a=remote-candidates attribute, the agent follows the same procedures for populating the m/c-line as described for the offerer in Section 9.1.

9.3. Updating the Check and Valid Lists

If ICE is restarting for a media stream, the agent MUST start a new Valid list for that media stream. However, it retains the old Valid list for the purposes of sending media until ICE processing completes, at which point the old Valid list is discarded and the new one is utilized to determine media and keepalive targets.

9.3.1. Additional Procedures for Full Implementations

The procedures in this section are applicable only to full implementations.

Once the subsequent offer/answer exchange has completed, each agent needs to determine the impact, if any, on the Check and Valid lists.
   Unless there is an ICE restart, an offer/answer exchange has no impact on the state of ICE processing for each media stream; that state is determined entirely by the checks themselves.

   When ICE restarts, an agent MUST flush the check list for the affected media streams, and then recompute the check list and its states as described in Section 5.7.

   The remainder of this section describes processing when ICE is not restarting.

   If the offer/answer exchange added a new media stream, the agent MUST create a new check list for it (and, of course, an empty Valid list to start), as described in Section 5.7.

   If the offer/answer exchange removed a media stream, or an answer rejected an offered media stream, an agent MUST flush the Valid list for that media stream. It MUST terminate any STUN transactions in progress for that media stream. An agent MUST remove the check list for that media stream and cancel any pending periodic checks for it.

   If a media stream existed previously, and remains after the offer/answer exchange, the agent MUST NOT modify the Valid list for that media stream. However, if an agent is in the Running state for that media stream, the check list is updated. To do that, the agent recomputes the check lists using the procedures described in Section 5.7. If a check on the new check lists was also on the previous check lists, and its state was Waiting, In-Progress, Succeeded or Failed, its state is copied over. If a check on the new check lists does not have a state (because it is a new check on an existing check list, or a check on a new check list, or the check was on an old check list but its state was not copied over), its state is set to Frozen.
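   The state carry-over rule for a check list that is recomputed without an ICE restart can be sketched in Python as follows. This is a hypothetical illustration: it assumes the new check list has already been formed and ordered per Section 5.7 and is passed in as a list of check identifiers, and it assumes a check is identified by its (local, remote) candidate pair, which the text does not spell out.

```python
from enum import Enum
from typing import Dict, Hashable, List


class CheckState(Enum):
    FROZEN = "Frozen"
    WAITING = "Waiting"
    IN_PROGRESS = "In-Progress"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# States that survive recomputation of the check list.
CARRIED_OVER = {CheckState.WAITING, CheckState.IN_PROGRESS,
                CheckState.SUCCEEDED, CheckState.FAILED}


def recompute_states(old_checks: Dict[Hashable, CheckState],
                     new_check_list: List[Hashable]) -> Dict[Hashable, CheckState]:
    """Copy eligible states from the old check list; everything else starts Frozen."""
    states: Dict[Hashable, CheckState] = {}
    for check in new_check_list:          # preserves the new list's ordering
        previous = old_checks.get(check)  # None if the check is new
        states[check] = previous if previous in CARRIED_OVER else CheckState.FROZEN
    return states
```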
   If none of the check lists are active (meaning that the checks in each check list are Frozen), the full-mode agent sets the first check in the check list for the first media stream to Waiting, and then sets the state of all other checks in that check list for the same component ID and with the same foundation to Waiting as well.

   Next, the agent goes through each check list, starting with the highest priority check. If a check has a state of Succeeded, and it has a component ID of 1, then all Frozen checks in the same check list with the same foundation whose component IDs are not 1 have their state set to Waiting. If, for a particular check list, there are checks for each component of that media stream in the Succeeded state, the agent moves the state of all Frozen checks for the first component of all other media streams (and thus in different check lists) with the same foundation to Waiting.

10.  Keepalives

   All endpoints MUST send keepalives for each media session. These keepalives serve the purpose of keeping NAT bindings active for the media session. These keepalives MUST be sent regardless of whether the media stream is currently inactive, sendonly, recvonly or sendrecv, and regardless of the presence or value of the bandwidth attribute. These keepalives MUST be sent even if ICE is not being utilized for the session at all. The keepalive SHOULD be sent using a format that is supported by its peer. ICE endpoints allow for STUN-based keepalives for UDP streams, and as such, STUN keepalives MUST be used when an agent is communicating with a peer that supports ICE. An agent can determine that its peer supports ICE by the presence of a=candidate attributes for each media session. If the peer does not support ICE, the choice of a packet format for keepalives is a matter of local implementation.
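   A minimal Python sketch of the format decision just described: it treats the peer as ICE-capable only if every m= section of its SDP carries at least one a=candidate line, and otherwise leaves the choice to local policy. The function names and the KeepaliveFormat enumeration are hypothetical, and the SDP handling is deliberately naive (it only splits the body on m= lines).

```python
from enum import Enum
from typing import List


class KeepaliveFormat(Enum):
    STUN_BINDING_INDICATION = "STUN Binding Indication"
    LOCAL_CHOICE = "implementation-specific format"


def media_sections(sdp: str) -> List[List[str]]:
    """Split an SDP body into per-m= sections (session-level lines are dropped)."""
    sections: List[List[str]] = []
    for line in sdp.splitlines():      # handles both CRLF and LF line endings
        if line.startswith("m="):
            sections.append([line])
        elif sections:
            sections[-1].append(line)
    return sections


def peer_supports_ice(sdp: str) -> bool:
    sections = media_sections(sdp)
    return bool(sections) and all(
        any(line.startswith("a=candidate:") for line in section)
        for section in sections)


def keepalive_format(peer_sdp: str) -> KeepaliveFormat:
    # STUN keepalives are mandatory toward an ICE peer; otherwise the choice is local.
    if peer_supports_ice(peer_sdp):
        return KeepaliveFormat.STUN_BINDING_INDICATION
    return KeepaliveFormat.LOCAL_CHOICE
```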
   A format that allows packets to easily be sent in the absence of actual media content is RECOMMENDED. Examples of formats that readily meet this goal are RTP No-Op [28] and RTP comfort noise [24]. If the peer does not support any formats that are particularly well suited for keepalives, an agent SHOULD send RTP packets with an incorrect version number, or some other form of error that would cause them to be discarded by the peer.

   If no packet has been sent on a candidate pair being used for media for Tr seconds (where packets include media and previous keepalives), an agent MUST generate a keepalive on that pair. Tr SHOULD be configurable and SHOULD have a default of 15 seconds.

   If STUN is being used for keepalives, a STUN Binding Indication is used [11]. The Binding Indication SHOULD NOT contain integrity checks, since the messages are simply discarded on receipt regardless of their contents. The Indication SHOULD NOT contain the PRIORITY or USE-CANDIDATE attributes defined here. The Binding Indication is sent using the same local and remote candidates that are being used for media. An agent receiving a Binding Indication MUST discard it silently. Though Binding Indications are used for keepalives, an agent MUST be prepared to receive Binding Requests as well. If a Binding Request is received, a response is generated as discussed in [11], but there is no impact on ICE processing otherwise.

   An agent MUST begin keepalive processing once ICE has selected candidates for use with media, or once media begins to flow, whichever happens first. Keepalives end once the session terminates or the media stream is removed.

11.  Media Handling

11.1.  Sending Media

   Procedures for sending media differ for full and lite implementations.

11.1.1.  Procedures for Full Implementations

   Agents always send media using a candidate pair. An agent will send media to the remote candidate in the pair (setting the destination address and port of the packet equal to that remote candidate), and will send it from the local candidate. When the local candidate is server or peer reflexive, media is originated from the base. Media sent from a relayed candidate is sent through that relay, using procedures defined in [12].

   If the state of a media stream is Running, and there is no old Valid list for that media stream (an old Valid list exists only as a result of an ICE restart), an agent MUST NOT send media.

   When an agent sends media, it MUST send it using the highest priority selected pair for each component in the old Valid list for a media stream, if one exists, or otherwise in the new Valid list for that media stream. In several cases, these will not be the same candidate pairs as those present in the m/c-line. When ICE first completes, if the selected pairs do not match the m/c-line, an updated offer/answer exchange will take place to remedy this disparity. However, until that updated offer arrives, there will not be a match. Furthermore, in very unusual cases, the m/c-lines in the updated offer/answer will not be a match.

   ICE has interactions with jitter buffer adaptation mechanisms. An RTP stream can begin using one candidate and switch to another one, though this happens rarely with ICE. The newer candidate may result in RTP packets taking a different path through the network, one with different delay characteristics. As discussed below, agents are encouraged to re-adjust jitter buffers when there are changes in source or destination address. Furthermore, many audio codecs use the marker bit to signal the beginning of a talkspurt, for the purposes of jitter buffer adaptation.
   For such codecs, it is RECOMMENDED that the sender change the marker bit when an agent switches transmission of media from one candidate pair to another.

11.1.2.  Procedures for Lite Implementations

   A lite implementation MUST NOT send media until it has a Valid list that contains a candidate pair for each component of that media stream. Once that happens, the agent MAY begin sending media packets. To do that, it sends media to the remote candidate in the pair (setting the destination address and port of the packet equal to that remote candidate), and will send it from the local candidate.

   In cases where there has been an ICE restart, there will be an old and a new Valid list. The old Valid list MUST be used by the agent for sending media until the new one is complete, at which point the new one MUST be used, and the old one discarded.

11.2.  Receiving Media

   ICE implementations MUST be prepared to receive media on any candidates provided in the most recent offer/answer exchange.

   It is RECOMMENDED that, when an agent receives an RTP packet with a new source or destination IP address for a particular media stream, the agent re-adjust its jitter buffers.

   RFC 3550 [21] describes an algorithm in Section 8.2 for detecting SSRC collisions and loops. These algorithms are based, in part, on seeing different source transport addresses with the same SSRC. However, when ICE is used, such changes will sometimes occur as the media streams switch between candidates. An agent will be able to determine that a media stream is from the same peer as a consequence of the STUN exchange that precedes media transmission. Thus, if there is a change in source transport address, but the media packets come from the same peer agent, this SHOULD NOT be treated as an SSRC collision.

12.  Usage with SIP

12.1.  Latency Guidelines

   ICE requires a series of STUN-based connectivity checks to take place between endpoints. These checks start from the answerer on generation of its answer, and start from the offerer when it receives the answer. These checks can take time to complete, and as such, the selection of messages to use with offers and answers can affect perceived user latency. Two latency figures are of particular interest: the post-pickup delay and the post-dial delay.

   The post-pickup delay refers to the time between when a user "answers the phone" and when any speech they utter can be delivered to the caller. The post-dial delay refers to the time between when a user enters the destination address and when ringback begins as a consequence of having successfully started ringing the phone of the called party.

   To reduce post-dial delays, it is RECOMMENDED that the caller begin gathering candidates prior to actually sending its initial INVITE. This can be started upon user interface cues that a call is pending, such as activity on a keypad or the phone going offhook.

   If an offer is received in an INVITE request, the callee SHOULD immediately gather its candidates and then generate an answer in a provisional response. ICE requires that a provisional response with an SDP be transmitted reliably. This can be done through the existing PRACK mechanism [9], or through an optimization that is specific to ICE. With this optimization, provisional responses containing an SDP answer that begins ICE processing for one or more media streams can be sent reliably without RFC 3262. To do this, the agent retransmits the provisional response with the exponential backoff timers described in RFC 3262.
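   For a feel of the retransmission pattern, the Python sketch below lists when a reliable provisional response would be retransmitted. It assumes the RFC 3261 default T1 of 500 ms and the RFC 3262 rule that the retransmission interval starts at T1 and doubles each time, with retransmissions spanning 64*T1 seconds. The function retransmit_times and its stop_at cutoff (modeling the moment a Binding Request arrives or a 2xx is sent) are hypothetical helpers, not part of either specification.

```python
from typing import List, Optional

T1 = 0.5  # seconds; RFC 3261 default value, assumed here


def retransmit_times(t1: float = T1, stop_at: Optional[float] = None) -> List[float]:
    """Offsets (seconds after the first transmission) of each retransmission.

    The first transmission at t=0 is not included. The interval starts at T1
    and doubles each time; retransmissions stop once 64*T1 seconds have
    elapsed or, if given, at stop_at (e.g. when a Binding Request arrives).
    """
    limit = 64 * t1 if stop_at is None else min(64 * t1, stop_at)
    times: List[float] = []
    elapsed, interval = 0.0, t1
    while elapsed + interval < limit:
        elapsed += interval
        times.append(elapsed)
        interval *= 2
    return times
```

   Under the ICE optimization, reaching the end of this schedule without a Binding Request does not by itself terminate the session.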
   Retransmits MUST cease on receipt of a STUN Binding Request for one of the media streams signaled in that SDP, or on transmission of a 2xx response. If no Binding Request is received prior to the last retransmit, the agent does not consider the session terminated. Despite the fact that the provisional response will be delivered reliably, the rules for when an agent can send an updated offer or answer do not change from those specified in RFC 3262. Specifically, if the INVITE contained an offer, the same answer appears in all of the 1xx responses and in the 2xx response to the INVITE. Only after that 2xx has been sent can an updated offer/answer exchange occur. This optimization SHOULD NOT be used if both agents support PRACK. Note that the optimization is very specific to provisional responses carrying answers that start ICE processing; it is not a general technique for 1xx reliability.

   Alternatively, an agent MAY delay sending an answer until the 200 OK; however, this results in a poor user experience and is NOT RECOMMENDED.

   Once the answer has been sent, the agent SHOULD begin its connectivity checks. Once candidate pairs for each component of a media stream enter the Valid list, the callee can begin sending media on that media stream.

   However, prior to this point, any media that needs to be sent towards the caller (such as SIP early media [25]) cannot be transmitted. For this reason, implementations SHOULD delay alerting the called party until candidates for each component of each media stream have entered the Valid list. In the case of a PSTN gateway, this would mean that the setup message into the PSTN is delayed until this point. Doing this increases the post-dial delay, but has the effect of eliminating 'ghost rings'.
   Ghost rings are cases where the called party hears the phone ring, picks up, but hears nothing and cannot be heard. This technique works without requiring support for, or usage of, preconditions [6], since it is a localized decision. It also has the benefit of guaranteeing that not a single packet of media will get clipped, so that post-pickup delay is zero. If an agent chooses to delay local alerting in this way, it SHOULD generate a 180 response once alerting begins.

   In addition to uses where the offer is in an INVITE, and the answer is in the provisional and/or 200 OK response, ICE works with cases where the offer appears in the response. In such cases, which are common in third party call control, ICE agents SHOULD generate their offers in a reliable provisional response (which MUST utilize RFC 3262). In that case, the answer will arrive in a PRACK. This allows for ICE processing to take place prior to alerting. Once ICE completes, the agent can alert the user and then generate a 200 OK. The 200 OK would contain no SDP, since the offer/answer exchange has completed. Agents MAY place the offer in a 2xx instead (in which case the answer comes in the ACK). This flow is simpler but results in a poorer user experience.

   As discussed in Section 16, offer/answer exchanges SHOULD be secured against eavesdropping and man-in-the-middle attacks. To do that, the use of SIPS [3] is RECOMMENDED when SIP is used in concert with ICE.

12.2.  SIP Option Tags and Media Feature Tags

   [13] specifies a SIP option tag and media feature tag for use with ICE. ICE implementations using SIP SHOULD support this specification, which uses a feature tag in registrations to facilitate interoperability through gateways.

12.3.  Interactions with Forking

   ICE interacts very well with forking. Indeed, ICE fixes some of the problems associated with forking.
   Without ICE, when a call forks and the caller receives multiple incoming media streams, it cannot determine which media stream corresponds to which callee.

   With ICE, this problem is resolved. The connectivity checks that occur prior to transmission of media carry username fragments, which in turn are correlated to a specific callee. Subsequent media packets that arrive on the same 5-tuple as the connectivity check will be associated with that same callee. Thus, the caller can perform this correlation as long as it has received an answer.

12.4.  Interactions with Preconditions

   Quality of Service (QoS) preconditions, which are defined in RFC 3312 [6] and RFC 4032 [7], apply only to the transport addresses listed in the m/c-lines in an offer/answer. If ICE changes the transport address where media is received, this change is reflected in the m/c-lines of a new offer/answer. As such, it appears like any other re-INVITE would, and is fully treated in RFC 3312 and RFC 4032, which apply without regard to the fact that the m/c-lines are changing due to ICE negotiations occurring "in the background".

   Indeed, an agent SHOULD NOT indicate that QoS preconditions have been met until the ICE checks have completed and selected the candidate pairs to be used for media.

   ICE also has (purposeful) interactions with connectivity preconditions [27]. Those interactions are described there. Note that the procedures described in Section 12.1 describe their own type of "preconditions", albeit with less functionality than those provided by the explicit preconditions in [27].

12.5.  Interactions with Third Party Call Control

   ICE works with Flows I, III and IV as described in [17]. Flow I works without the controller supporting or being aware of ICE. Flow IV will work as long as the controller passes along the ICE attributes without alteration.
   Flow II is fundamentally incompatible with ICE; each agent will believe itself to be the answerer and thus never generate a re-INVITE.

   The flows for continued operation, as described in Section 7 of RFC 3725, require additional behavior of ICE implementations to support. In particular, if an agent receives a mid-dialog re-INVITE that contains no offer, it MUST restart ICE for each media stream and go through the process of gathering new candidates. Furthermore, that list of candidates SHOULD include the ones currently in use.

13.  Grammar

   This specification defines seven new SDP attributes: the "candidate", "remote-candidates", "ice-lite", "ice-ufrag", "ice-pwd", "ice-options" and "ice-mismatch" attributes.

   The candidate attribute is a media-level attribute only. It contains a transport address for a candidate that can be used for connectivity checks.

   The syntax of this attribute is defined using Augmented BNF as defined in RFC 4234 [8]:

      candidate-attribute   = "candidate" ":" foundation SP component-id SP
                              transport SP
                              priority SP
                              connection-address SP     ;from RFC 4566
                              port         ;port from RFC 4566
                              [SP cand-type]
                              [SP rel-addr]
                              [SP rel-port]
                              *(SP extension-att-name SP
                                   extension-att-value)

      foundation            = 1*ice-char
      component-id          = 1*DIGIT
      transport             = "UDP" / transport-extension
      transport-extension   = token              ; from RFC 3261
      priority              = 1*DIGIT
      cand-type             = "typ" SP candidate-types
      candidate-types       = "host" / "srflx" / "prflx" / "relay" / token
      rel-addr              = "raddr" SP connection-address
      rel-port              = "rport" SP port
      extension-att-name    = byte-string    ;from RFC 4566
      extension-att-value   = byte-string
      ice-char              = ALPHA / DIGIT / "+" / "/"

   The foundation is composed of one or more ice-char.
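   To make the candidate-attribute grammar concrete, here is a lenient Python parsing sketch. It is illustrative only: parse_candidate and Candidate are hypothetical names; it expects the attribute value without the leading "a="; it does not validate connection-address or byte-string syntax; and it accepts the optional typ/raddr/rport pairs in any order rather than enforcing the grammar's sequence.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

_ICE_CHARS = re.compile(r"[A-Za-z0-9+/]+")   # 1*ice-char
_DIGITS = re.compile(r"[0-9]+")              # 1*DIGIT


@dataclass
class Candidate:
    foundation: str
    component_id: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: Optional[str] = None
    rel_addr: Optional[str] = None
    rel_port: Optional[int] = None
    extensions: Dict[str, str] = field(default_factory=dict)


def parse_candidate(value: str) -> Candidate:
    """Parse e.g. 'candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host'."""
    if not value.startswith("candidate:"):
        raise ValueError("not a candidate attribute")
    tokens = value[len("candidate:"):].split(" ")   # fields are separated by single SP
    if len(tokens) < 6 or len(tokens) % 2:
        raise ValueError("missing fields or dangling name/value pair")
    foundation, component, transport, priority, address, port = tokens[:6]
    if not _ICE_CHARS.fullmatch(foundation):
        raise ValueError("foundation must be 1*ice-char")
    for digits in (component, priority, port):
        if not _DIGITS.fullmatch(digits):
            raise ValueError(f"expected digits, got {digits!r}")
    if int(component) < 1:
        raise ValueError("component-id must be a positive integer")
    cand = Candidate(foundation, int(component), transport, int(priority),
                     address, int(port))
    rest = tokens[6:]
    for name, val in zip(rest[0::2], rest[1::2]):   # remaining name/value pairs
        if name == "typ":
            cand.cand_type = val
        elif name == "raddr":
            cand.rel_addr = val
        elif name == "rport":
            cand.rel_port = int(val)
        else:
            # Unknown extension pairs are recorded but play no further role.
            cand.extensions[name] = val
    return cand
```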
   The component-id is a positive integer, which identifies the specific component for which the transport address is a candidate. It MUST start at 1 and MUST increment by 1 for each component of a particular candidate. The connection-address production is taken from RFC 4566 [10], allowing for IPv4 addresses, IPv6 addresses and FQDNs. The port production is also taken from RFC 4566 [10]. The token production is taken from RFC 3261 [3]. The transport production indicates the transport protocol for the candidate. This specification only defines UDP. However, extensibility is provided to allow for future transport protocols to be used with ICE, such as TCP or the Datagram Congestion Control Protocol (DCCP) [29].

   The cand-type production encodes the type of candidate. This specification defines the values "host", "srflx", "prflx" and "relay" for host, server reflexive, peer reflexive and relayed candidates, respectively. The set of candidate types is extensible for the future. Inclusion of the candidate type is optional. The rel-addr and rel-port productions convey information about the related transport addresses. Rules for inclusion of these values are described in Section 4.3.

   The a=candidate attribute can itself be extended. The grammar allows for new name/value pairs to be added at the end of the attribute. An implementation MUST ignore any name/value pairs it does not understand.

   The syntax of the "remote-candidates" attribute is defined using Augmented BNF as defined in RFC 4234 [8]. The remote-candidates attribute is a media-level attribute only.

      remote-candidate-att = "remote-candidates" ":" remote-candidate
                             0*(SP remote-candidate)
      remote-candidate     = component-id SP connection-address SP port

   The attribute contains a connection-address and port for each component. The ordering of components is irrelevant.
   However, a value MUST be present for each component of a media stream.

   The syntax of the "ice-lite" and "ice-mismatch" attributes, both of which are flags, is:

      ice-lite     = "ice-lite"
      ice-mismatch = "ice-mismatch"

   "ice-lite" is a session-level attribute only, and "ice-mismatch" is a media-level attribute only. The syntax of the "ice-pwd" and "ice-ufrag" attributes is defined as:

      ice-pwd-att   = "ice-pwd" ":" password
      ice-ufrag-att = "ice-ufrag" ":" ufrag
      password      = 22*ice-char
      ufrag         = 4*ice-char

   The "ice-pwd" and "ice-ufrag" attributes can appear at either the session level or the media level. When present in both, the value at the media level takes precedence. Thus, the value at the session level is effectively a default that applies to all media streams, unless overridden by a media-level value.

   The "ice-options" attribute is a session-level attribute. It contains a series of tokens that identify the options supported by the agent. Its grammar is:

      ice-options    = "ice-options" ":" ice-option-tag
                       0*(SP ice-option-tag)
      ice-option-tag = 1*ice-char

14.  Extensibility Considerations

   This specification makes very specific choices about how both agents in a session coordinate to arrive at the set of candidate pairs that are selected for media. It is anticipated that future specifications will want to alter these algorithms, whether they are simple changes like timer tweaks, or larger changes like a revamp of the priority algorithm. When such a change is made, providing interoperability between the two agents in a session is critical.

   Firstly, ICE provides the a=ice-options SDP attribute. Each extension or change to ICE is associated with a token. When an agent supporting such an extension or change generates an offer or an answer, it MUST include the token for that extension in this attribute.
   This allows each side to know what the other side is doing. This attribute MUST NOT be present if the agent does not support any ICE extensions or changes.

   At this time, no IANA registry or registration procedures are defined for these option tags. At the time of writing, it is unclear whether ICE changes and extensions will be sufficiently common to warrant a registry.

   One of the complications in achieving interoperability is that ICE relies on a distributed algorithm running on both agents to converge on an agreed set of candidate pairs. If the two agents run different algorithms, it can be difficult to guarantee convergence on the same candidate pairs. The introspective selection procedure described in Section 8 eliminates some of the tight coordination by delegating the selection algorithm completely to the controlling agent. Consequently, when a controlling agent is communicating with a peer that supports options it does not know about, the agent MUST run an introspective selection algorithm. When introspective selection is used, ICE will converge perfectly even when both agents use different pair prioritization algorithms. Triggered checks are one of the keys to such convergence, since they ensure that the favored pair is validated by both agents. Consequently, any future ICE enhancements MUST preserve triggered checks.

15.  Example

   Two agents, L and R, are using ICE. Both are full-mode ICE implementations. Both agents have a single IPv4 interface. For agent L, it is 10.0.1.1, and for agent R, 192.0.2.1. Both are configured with a single STUN server each (indeed, the same one for each), which is listening for STUN requests at an IP address of 192.0.2.2 and port 3478. This STUN server supports only the Binding Discovery usage; relays are not used in this example.
   Agent L is behind a NAT, and agent R is on the public Internet. The NAT has an endpoint independent mapping property and an address dependent filtering property. The public side of the NAT has an IP address of 192.0.2.3.

   To facilitate understanding, transport addresses are listed using variables that have mnemonic names. The format of the name is entity-type-seqno, where entity refers to the entity whose interface the transport address is on, and is one of "L", "R", "STUN", or "NAT". The type is either "PUB" for transport addresses that are public, or "PRIV" for transport addresses that are private. Finally, seqno is a sequence number that is different for each transport address of the same type on a particular entity. Each variable has an IP address and port, denoted by varname.IP and varname.PORT, respectively, where varname is the name of the variable.

   The STUN server has advertised transport address STUN-PUB-1 (which is 192.0.2.2:3478) for the binding discovery usage.

   In the call flow itself, STUN messages are annotated with several attributes. The "S=" attribute indicates the source transport address of the message. The "D=" attribute indicates the destination transport address of the message. The "MA=" attribute is used in STUN Binding Response messages and refers to the mapped address. "USE-CAND" implies the presence of the USE-CANDIDATE attribute.

   The call flow examples omit STUN authentication operations and RTCP, and focus on RTP for a single media stream between two full implementations.

      L             NAT           STUN             R
      |RTP STUN alloc.              |              |
      |(1) STUN Req  |              |              |
      |S=$L-PRIV-1   |              |              |
      |D=$STUN-PUB-1 |              |              |
      |------------->|              |              |
      |              |(2) STUN Req  |              |
      |              |S=$NAT-PUB-1  |              |
      |              |D=$STUN-PUB-1 |              |
      |              |------------->|              |
      |              |(3) STUN Res  |              |
      |              |S=$STUN-PUB-1 |              |
      |              |D=$NAT-PUB-1  |              |
      |              |MA=$NAT-PUB-1 |              |
      |              |<-------------|              |
      |(4) STUN Res  |              |              |
      |S=$STUN-PUB-1 |              |              |
      |D=$L-PRIV-1   |              |              |
      |MA=$NAT-PUB-1 |              |              |
      |<-------------|              |              |
      |(5) Offer     |              |              |
      |------------------------------------------->|
      |              |              |              |RTP STUN alloc.
      |              |              |(6) STUN Req  |
      |              |              |S=$R-PUB-1    |
      |              |              |D=$STUN-PUB-1 |
      |              |              |<-------------|
      |              |              |(7) STUN Res  |
      |              |              |S=$STUN-PUB-1 |
      |              |              |D=$R-PUB-1    |
      |              |              |MA=$R-PUB-1   |
      |              |              |------------->|
      |(8) answer    |              |              |
      |<-------------------------------------------|
      |              |(9) Bind Req  |              |
      |              |S=$R-PUB-1    |              |
      |              |D=$L-PRIV-1   |              |
      |              |<----------------------------|
      |              |Dropped       |              |
      |(10) Bind Req |              |              |
      |S=$L-PRIV-1   |              |              |
      |D=$R-PUB-1    |              |              |
      |USE-CAND      |              |              |
      |------------->|              |              |
      |              |(11) Bind Req |              |
      |              |S=$NAT-PUB-1  |              |
      |              |D=$R-PUB-1    |              |
      |              |USE-CAND      |              |
      |              |---------------------------->|
      |              |(12) Bind Res |              |
      |              |S=$R-PUB-1    |              |
      |              |D=$NAT-PUB-1  |              |
      |              |MA=$NAT-PUB-1 |              |
      |              |<----------------------------|
      |(13) Bind Res |              |              |
      |S=$R-PUB-1    |              |              |
      |D=$L-PRIV-1   |              |              |
      |MA=$NAT-PUB-1 |              |              |
      |<-------------|              |              |
      |RTP flows     |              |              |
      |              |(14) Bind Req |              |
      |              |S=$R-PUB-1    |              |
      |              |D=$NAT-PUB-1  |              |
      |              |<----------------------------|
      |(15) Bind Req |              |              |
      |S=$R-PUB-1    |              |              |
      |D=$L-PRIV-1   |              |              |
      |<-------------|              |              |
      |(16) Bind Res |              |              |
      |S=$L-PRIV-1   |              |              |
      |D=$R-PUB-1    |              |              |
      |MA=$R-PUB-1   |              |              |
      |------------->|              |              |
      |              |(17) Bind Res |              |
      |              |S=$NAT-PUB-1  |              |
      |              |D=$R-PUB-1    |              |
      |              |MA=$R-PUB-1   |              |
      |              |---------------------------->|
      |              |              |              |RTP flows

                                 Figure 11

   First, agent L obtains a host candidate from its local interface (not shown), and from that, sends a STUN Binding Request to the STUN server to get a server reflexive candidate (messages 1-4). Recall that the NAT has the endpoint independent mapping property. Here, it creates a binding of NAT-PUB-1 for this UDP request, and this becomes the server reflexive candidate for RTP.

   Agent L sets a type preference of 126 for the host candidate and 100 for the server reflexive. The local preference is 65535. Based on this, the priority of the host candidate is 2130706178 and for the server reflexive candidate is 1694498562. The host candidate is assigned a foundation of 1, and the server reflexive, a foundation of 2. It chooses its server reflexive candidate as the in-use candidate, and encodes it into the m/c-line. The resulting offer (message 5) looks like (lines folded for clarity):

      v=0
      o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP
      s=
      c=IN IP4 $NAT-PUB-1.IP
      t=0 0
      a=ice-pwd:asd88fgpdd777uzjYhagZg
      a=ice-ufrag:8hhY
      m=audio $NAT-PUB-1.PORT RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=candidate:1 1 UDP 2130706178 $L-PRIV-1.IP $L-PRIV-1.PORT typ host
      a=candidate:2 1 UDP 1694498562 $NAT-PUB-1.IP $NAT-PUB-1.PORT typ srflx raddr
       $L-PRIV-1.IP rport $L-PRIV-1.PORT

   The offer, with the variables replaced with their values, will look like (lines folded for clarity):

      v=0
      o=jdoe 2890844526 2890842807 IN IP4 10.0.1.1
      s=
      c=IN IP4 192.0.2.3
      t=0 0
      a=ice-pwd:asd88fgpdd777uzjYhagZg
      a=ice-ufrag:8hhY
      m=audio 45664 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host
      a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr
       10.0.1.1 rport 8998

   This offer is received at agent R.
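   The variable notation is purely a documentation device, but a short Python sketch shows how the templated offer maps to the concrete one. The ADDRESSES table is taken from the values used in this example (R's port comes from its answer, shown below), and expand is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# Transport addresses used in this example: entity-type-seqno -> (IP, port).
ADDRESSES: Dict[str, Tuple[str, int]] = {
    "L-PRIV-1": ("10.0.1.1", 8998),
    "NAT-PUB-1": ("192.0.2.3", 45664),
    "STUN-PUB-1": ("192.0.2.2", 3478),
    "R-PUB-1": ("192.0.2.1", 3478),
}

# Matches $entity-type-seqno.IP or $entity-type-seqno.PORT
_VAR = re.compile(r"\$([A-Z]+-(?:PUB|PRIV)-[0-9]+)\.(IP|PORT)")


def expand(template: str, addresses: Dict[str, Tuple[str, int]] = ADDRESSES) -> str:
    """Replace each $name.IP / $name.PORT reference with its concrete value."""
    def substitute(match: "re.Match[str]") -> str:
        ip, port = addresses[match.group(1)]
        return ip if match.group(2) == "IP" else str(port)
    return _VAR.sub(substitute, template)
```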
   Agent R will obtain a host candidate, and from it, obtain a server reflexive candidate (messages 6-7). Since R is not behind a NAT, this candidate is identical to its host candidate, and they share the same base. It therefore discards this candidate and ends up with a single host candidate. With the same type and local preferences as L, the priority for this candidate is 2130706178. It chooses a foundation of 1 for its single candidate. Its resulting answer looks like:

      v=0
      o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP
      s=
      c=IN IP4 $R-PUB-1.IP
      t=0 0
      a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh
      a=ice-ufrag:9uB6
      m=audio $R-PUB-1.PORT RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=candidate:1 1 UDP 2130706178 $R-PUB-1.IP $R-PUB-1.PORT typ host

   With the variables filled in:

      v=0
      o=bob 2808844564 2808844564 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh
      a=ice-ufrag:9uB6
      m=audio 3478 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=candidate:1 1 UDP 2130706178 192.0.2.1 3478 typ host

   Since neither side indicated that it is passive-only, the agent that sent the offer that began ICE processing (agent L) becomes the controlling agent.

   Agents L and R both pair up the candidates. They both initially have two pairs. However, agent L will prune the pair containing its server reflexive candidate, resulting in just one. At agent L, this pair (the check) has a local candidate of $L-PRIV-1 and a remote candidate of $R-PUB-1, and has a candidate pair priority of 4.57566E+18 (note that an implementation would represent this as a 64-bit integer so as not to lose precision). At agent R, there are two checks.
The highest-priority check has a local candidate of $R-PUB-1 and remote candidate of $L-PRIV-1 and has a priority of 4.57566E+18, and the second has a local candidate of $R-PUB-1 and remote candidate of $NAT-PUB-1 and priority 3.63891E+18.

Agent R begins its connectivity check (message 9) for the first pair (between the two host candidates). Since R is the controlled agent for this session, the check omits the USE-CANDIDATE attribute. The host candidate from agent L is private and behind a different NAT, so the request cannot reach agent L and this check does not succeed.

When agent L gets the answer, it performs its one and only connectivity check (messages 10-13). It implements the default algorithm for candidate selection, and thus includes a USE-CANDIDATE attribute in this check. Since the check succeeds, agent L creates a new pair, whose local candidate is from the mapped address in the Binding Response (NAT-PUB-1 from message 13) and whose remote candidate is the destination of the request (R-PUB-1 from message 10). This pair is added to the Valid list. In addition, it is marked as selected since the Binding Request contained the USE-CANDIDATE attribute. Since there is a selected pair in the Valid list for the one component of this media stream, ICE processing for this stream moves into the Completed state. Agent L can now send media if it so chooses.

Upon receipt of the check from agent L (message 11), agent R will generate its triggered check. This check happens to match the next one on its check list: from its host candidate to agent L's server reflexive candidate. This check (messages 14-17) will succeed. Consequently, agent R constructs a new candidate pair using the mapped address from the response as the local candidate (R-PUB-1) and the destination of the request (NAT-PUB-1) as the remote candidate.
This pair is added to the Valid list for that media stream. Since the check was generated in the reverse direction of a check that contained the USE-CANDIDATE attribute, the candidate pair is marked as selected. Consequently, processing for this stream moves into the Completed state, and agent R can also send media.

16. Security Considerations

There are several types of attacks possible in an ICE system. This section considers these attacks and their countermeasures.

16.1. Attacks on Connectivity Checks

An attacker might attempt to disrupt the STUN connectivity checks. Ultimately, all of these attacks fool an agent into thinking something incorrect about the results of the connectivity checks. The possible false conclusions an attacker can try to cause are:

False Invalid: An attacker can fool a pair of agents into thinking a candidate pair is invalid, when it isn't. This can be used to cause an agent to prefer a different candidate (such as one injected by the attacker), or to disrupt a call by forcing all candidates to fail.

False Valid: An attacker can fool a pair of agents into thinking a candidate pair is valid, when it isn't. This can cause an agent to proceed with a session, but then not be able to receive any media.

False Peer-Reflexive Candidate: An attacker can cause an agent to discover a new peer reflexive candidate, when it shouldn't have. This can be used to redirect media streams to a DoS target or to the attacker, for eavesdropping or other purposes.

False Valid on False Candidate: An attacker has already convinced an agent that there is a candidate with an address that doesn't actually route to that agent (for example, by injecting a false peer reflexive candidate or false server reflexive candidate).
It must then launch an attack that forces the agents to believe that this candidate is valid.

Of the various techniques for creating faked STUN messages described in [11], many are not applicable for the connectivity checks. Compromises of STUN servers are not much of a concern, since the STUN servers are embedded in endpoints and distributed throughout the network. Thus, compromising the STUN server is equivalent to compromising the endpoint, and if that happens, far more problematic attacks are possible than those against ICE. Similarly, DNS attacks are usually irrelevant since STUN servers are not typically discovered via DNS; they are signaled via IP addresses embedded in SDP. Injection of fake responses and relaying of modified requests can all be handled in ICE with the countermeasures discussed below.

To force the false invalid result, the attacker has to wait for the connectivity check from one of the agents to be sent. When it is, the attacker needs to inject a fake response with an unrecoverable error response, such as a 600. However, since the candidate is, in fact, valid, the original request may reach the peer agent, and result in a success response. The attacker needs to force this packet or its response to be dropped, through a DoS attack, layer 2 network disruption, or other technique. If it doesn't do this, the success response will also reach the originator, alerting it to a possible attack. Fortunately, this attack is mitigated completely through the STUN message integrity mechanism. The attacker needs to inject a fake response, and in order for this response to be processed, the attacker needs the password. If the offer/answer signaling is secured, the attacker will not have the password.

Forcing the false valid result works in a similar way.
The attacker needs to wait for the Binding Request from each agent, and inject a fake success response. The attacker won't need to worry about disrupting the actual response since, if the candidate is not valid, it presumably wouldn't be received anyway. However, like the false invalid attack, this attack is mitigated completely through the STUN message integrity and offer/answer security techniques.

Forcing the false peer reflexive candidate result can be done either with fake requests or responses, or with replays. We consider the fake requests and responses case first. It requires the attacker to send a Binding Request to one agent with a source IP address and port for the false candidate. In addition, the attacker must wait for a Binding Request from the other agent, and generate a fake response with a XOR-MAPPED-ADDRESS attribute containing the false candidate. Like the other attacks described here, this attack is mitigated by the STUN message integrity mechanisms and secure offer/answer exchanges.

Forcing the false peer reflexive candidate result with packet replays is different. The attacker waits until one of the agents sends a check. It intercepts this request, and replays it towards the other agent with a faked source IP address. It must also prevent the original request from reaching the remote agent, either by launching a DoS attack to cause the packet to be dropped, or by forcing it to be dropped using layer 2 mechanisms. The replayed packet is received at the other agent, and accepted, since the integrity check passes (the integrity check cannot and does not cover the source IP address and port). It is then responded to. This response will contain a XOR-MAPPED-ADDRESS with the false candidate, and will be sent to that false candidate. The attacker must then intercept it and relay it towards the originator.
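The replay variant succeeds because STUN message integrity authenticates the bytes of the message, not the transport address they arrive from. The Python sketch below illustrates this under a simplified model: it assumes MESSAGE-INTEGRITY is an HMAC-SHA1 computed over the message bytes and keyed directly with the ice-pwd value exchanged in the offer/answer, whereas real STUN computes it over a defined portion of the encoded message as specified in [11]. It also assumes a check received by agent R is verified with R's password from the example answer. The message bytes are a placeholder rather than a real STUN encoding, and integrity_tag() and accept_check() are hypothetical names.

```python
import hashlib
import hmac


def integrity_tag(password: str, message: bytes) -> bytes:
    """Simplified MESSAGE-INTEGRITY: HMAC-SHA1 over the message, keyed by the ICE password."""
    return hmac.new(password.encode("utf-8"), message, hashlib.sha1).digest()


def accept_check(password: str, message: bytes, tag: bytes, source: tuple) -> bool:
    """Accept a received check if its integrity tag verifies.

    `source` (the IP address and port the packet came from) is deliberately
    not an input to the HMAC: the integrity check cannot cover it.
    """
    expected = integrity_tag(password, message)
    return hmac.compare_digest(expected, tag)


# Agent R's password from the example answer; the message is a stand-in.
ICE_PWD = "YH75Fviy6338Vbrhrlp8Yh"
request = b"placeholder Binding Request bytes"
tag = integrity_tag(ICE_PWD, request)

# Genuine check from agent L, arriving via NAT-PUB-1.
genuine = accept_check(ICE_PWD, request, tag, ("192.0.2.3", 45664))
# The same bytes replayed from a spoofed source still verify.
replayed = accept_check(ICE_PWD, request, tag, ("198.51.100.9", 40000))
# A message forged without knowing the password does not.
forged = accept_check(ICE_PWD, request,
                      integrity_tag("guessed-password", request),
                      ("198.51.100.9", 40000))
# Modifying the message also breaks the tag.
tampered = accept_check(ICE_PWD, request + b"!", tag, ("192.0.2.3", 45664))

print(genuine, replayed, forged, tampered)  # True True False False
```

The replayed check verifies because it is byte-for-byte identical to the genuine one; only its source address differs, and that is outside what the integrity check protects. This is why the replay variant depends on intercepting traffic rather than on learning the password.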
The other agent will then initiate a connectivity check towards that false candidate. This validation needs to succeed. This requires the attacker to force a false valid on a false candidate. Injection of fake requests or responses to achieve this goal is prevented using the integrity mechanisms of STUN and the offer/answer exchange. Thus, this attack can only be launched through replays. To do that, the attacker must intercept the check towards this false candidate, and replay it towards the other agent. Then, it must intercept the response and replay that back as well.

This attack is very hard to launch unless the attacker itself is identified by the fake candidate. This is because it requires the attacker to intercept and replay packets sent by two different hosts. If both agents are on different networks (for example, across the public Internet), this attack can be hard to coordinate, since it needs to occur against two different endpoints on different parts of the network at the same time.

If the attacker itself is identified by the fake candidate, the attack is easier to coordinate. However, if SRTP is used [22], the attacker will not be able to play the media packets; it will only be able to discard them, effectively disabling the media stream for the call. Moreover, this attack requires the attacker to disrupt packets in order to block the connectivity check from reaching the target. In that case, if the goal is to disrupt the media stream, it's much easier to just disrupt it with the same mechanism, rather than attack ICE.

16.2. Attacks on Address Gathering

ICE endpoints make use of STUN for gathering candidates from a STUN server in the network. This corresponds to the Binding Discovery usage of STUN described in [11].
As a consequence, the attacks against STUN itself that are described in that specification can still be used against the binding discovery usage when utilized with ICE.

However, the additional mechanisms provided by ICE actually counteract such attacks, making binding discovery with STUN more secure when combined with ICE than without ICE.

Consider an attacker that is able to provide an agent with a faked mapped address in the response to a STUN Binding Request that is used for address gathering. This is the primary attack primitive described in [11]. This address will be used as a server reflexive candidate in the ICE exchange. For this candidate to actually be used for media, the attacker must also attack the connectivity checks, and in particular, force a false valid on a false candidate. This attack is very hard to launch if the false address identifies a third party, and is prevented by SRTP if it identifies the attacker itself.

If the attacker elects not to attack the connectivity checks, the worst it can do is prevent the server reflexive candidate from being used. However, if the peer agent has at least one candidate that is reachable by the agent under attack, the STUN connectivity checks themselves will provide a peer reflexive candidate that can be used for the exchange of media. Peer reflexive candidates are generally preferred over server reflexive candidates. As such, an attack solely on the STUN address gathering will normally have no impact on a session at all.

16.3. Attacks on the Offer/Answer Exchanges

An attacker that can modify or disrupt the offer/answer exchanges themselves can readily launch a variety of attacks with ICE. It could direct media to a target of a DoS attack, insert itself into the media stream, and so on.
These are similar to the general security considerations for offer/answer exchanges, and the security considerations in RFC 3264 [4] apply. These require techniques for message integrity and encryption for offers and answers, which are satisfied by the SIPS mechanism [3] when SIP is used. As such, the usage of SIPS with ICE is RECOMMENDED.

16.4. Insider Attacks

In addition to attacks where the attacker is a third party trying to insert fake offers, answers, or STUN messages, there are several attacks possible with ICE when the attacker is an authenticated and valid participant in the ICE exchange.

16.4.1. The Voice Hammer Attack

The voice hammer attack is an amplification attack. In this attack, the attacker initiates sessions to other agents, and includes the IP address and port of a DoS target in the m/c-line of its SDP. This causes substantial amplification; a single offer/answer exchange can create a continuing flood of media packets, possibly at high rates (consider video sources). This attack is not specific to ICE, but ICE can help provide remediation.

Specifically, if ICE is used, the agent receiving the malicious SDP will first perform connectivity checks to the target of media before sending it there. If this target is a third-party host, the checks will not succeed, and media is never sent.

Unfortunately, ICE doesn't help if it's not used, in which case an attacker could simply send the offer without the ICE parameters. However, in environments where the set of clients is known, and limited to ones that support ICE, the server can reject any offers or answers that don't indicate ICE support.

16.4.2. STUN Amplification Attack

The STUN amplification attack is similar to the voice hammer. However, instead of voice packets being directed to the target, STUN connectivity checks are directed to the target.
This attack is accomplished by having the offerer send an offer with a large number of candidates, say 50. The answerer receives the offer, and starts its checks, which are directed at the target, and consequently, never generate a response. The answerer will start a new connectivity check every 20ms, and each check is a STUN transaction consisting of 7 transmissions of a message 65 bytes in length (plus 28 bytes for the IP/UDP header) that runs for 7.9 seconds. Counting only the 65-byte message, this averages about 58 bytes/second per transaction (about 82 bytes/second including the IP/UDP header). In the worst case, there can be 395 transactions in progress at once (7.9 seconds divided by 20ms), for a total of 182 kbps just for STUN requests, or about 260 kbps including IP/UDP headers.

It is impossible to eliminate the amplification, but the volume can be reduced through a variety of heuristics. Agents SHOULD limit the total number of connectivity checks they perform to 100. Additionally, agents MAY limit the number of candidates they'll accept in an offer or answer.

16.5. Interactions with Application Layer Gateways and SIP

Application Layer Gateways (ALGs) are functions present in a NAT device which inspect the contents of packets and modify them, in order to facilitate NAT traversal for application protocols. Session Border Controllers (SBCs) are close cousins of ALGs, but are less transparent since they actually exist as application layer SIP intermediaries. ICE has interactions with SBCs and ALGs.

If an ALG is SIP aware but not ICE aware, ICE will work through it as long as the ALG correctly modifies the m/c-lines of SDP. In this case, correctly means that the ALG does not modify m/c-lines with external addresses. If the m/c-line contains internal addresses, but ones for which a public binding exists, the ALG replaces the internal address in the m/c-line with the public binding.
Unfortunately, many ALGs are known to work poorly in these corner cases. ICE does not try to work around broken ALGs, as this is outside the scope of its functionality. ICE can help diagnose these conditions, which often show up as a mismatch between the set of candidates and the m/c-line. The a=ice-mismatch attribute is used for this purpose.

ICE works best through ALGs when the signaling is run over TLS. This prevents the ALG from manipulating the SDP messages and interfering with ICE operation. Implementations which are expected to be deployed behind ALGs SHOULD provide for TLS transport of the SDP.

If an SBC is SIP aware but not ICE aware, the result depends on the behavior of the SBC. If it is acting as a proper Back-to-Back User Agent (B2BUA), the SBC will remove any SDP attributes it doesn't understand, including the ICE attributes. Consequently, the call will appear to both endpoints as if the other side doesn't support ICE. This will result in ICE being disabled, and media flowing through the SBC, if the SBC has requested it. If, however, the SBC passes the ICE attributes without modification, yet modifies the m/c-lines, this will be detected as an ICE mismatch, and ICE processing is aborted for the call. It is outside of the scope of ICE for it to act as a tool for "working around" SBCs. If one is present, ICE will not be used and the SBC techniques take precedence.

17. Definition of Connectivity Check Usage

STUN [11] requires that new usages provide a specific set of information as part of their formal definition. This section meets the requirements spelled out there.

17.1. Applicability

This STUN usage provides a connectivity check between two peers participating in an offer/answer exchange. This check serves to validate a pair of candidates for use in exchanging media.
Connectivity checks also allow agents to discover reflexive candidates towards their peers, called peer reflexive candidates. Finally, connectivity checks serve to keep NAT bindings alive.

It is fundamental to this STUN usage that the addresses and ports used for media are the same ones used for the Binding Requests and responses. Consequently, it will be necessary to demultiplex STUN traffic from the media traffic, whatever its format. This demultiplexing is done using the techniques described in [11].

17.2. Client Discovery of Server

The client does not follow the DNS-based procedures defined in [11]. Rather, the remote candidate of the check to be performed is used as the transport address of the STUN server. Note that the STUN server is a logical entity, and is not a physically distinct server in this usage.

17.3. Server Determination of Usage

The server is aware of this usage because it signaled this port through the offer/answer exchange. Any STUN packets received on this port will be for the connectivity check usage.

17.4. New Requests or Indications

This usage does not define any new message types.

17.5. New Attributes

This usage defines two new attributes, PRIORITY and USE-CANDIDATE.

The PRIORITY attribute indicates the priority that is to be associated with a peer reflexive candidate, should one be discovered by this check. It is a 32-bit unsigned integer, and has an attribute type of 0x0024.

The USE-CANDIDATE attribute indicates that the candidate pair resulting from this check should be used for transmission of media. The attribute has no content (the Length field of the attribute is zero); it serves as a flag. It has an attribute type of 0x0025.

17.6. New Error Response Codes

This usage does not define any new error response codes.

17.7.
Client Procedures

Client procedures are defined in Section 7.1.

17.8. Server Procedures

Server procedures are defined in Section 7.2.

17.9. Security Considerations for Connectivity Check

Security considerations for the connectivity check are discussed in Section 16.

18. IANA Considerations

This specification registers new SDP attributes and new STUN attributes.

18.1. SDP Attributes

This specification defines seven new SDP attributes per the procedures of Section 8.2.4 of [10]. The required information for the registrations is included here.

18.1.1. candidate Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: candidate

Long Form: candidate

Type of Attribute: media level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides one of many possible candidate addresses for communication. These addresses are validated with an end-to-end connectivity check using Simple Traversal Underneath NAT (STUN).

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.2. remote-candidates Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: remote-candidates

Long Form: remote-candidates

Type of Attribute: media level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides the identity of the remote candidates that the offerer wishes the answerer to use in its answer.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.3.
ice-lite Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: ice-lite

Long Form: ice-lite

Type of Attribute: session level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and indicates that an agent has the minimum functionality required to support ICE inter-operation with a peer that has a full implementation.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.4. ice-mismatch Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: ice-mismatch

Long Form: ice-mismatch

Type of Attribute: session level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and indicates that an agent is ICE capable, but did not proceed with ICE due to a mismatch of candidates with the values in the m/c-line.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.5. ice-pwd Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: ice-pwd

Long Form: ice-pwd

Type of Attribute: session or media level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides the password used to protect STUN connectivity checks.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.6. ice-ufrag Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.
Attribute Name: ice-ufrag

Long Form: ice-ufrag

Type of Attribute: session or media level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides the fragments used to construct the username in STUN connectivity checks.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.1.7. ice-options Attribute

Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

Attribute Name: ice-options

Long Form: ice-options

Type of Attribute: session level

Charset Considerations: The attribute is not subject to the charset attribute.

Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and indicates the ICE options or extensions used by the agent.

Appropriate Values: See Section 13 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

18.2. STUN Attributes

This section registers two new STUN attributes per the procedures in [11].

   0x0024 PRIORITY
   0x0025 USE-CANDIDATE

19. IAB Considerations

The IAB has studied the problem of "Unilateral Self Address Fixing", which is the general process by which an agent attempts to determine its address in another realm on the other side of a NAT through a collaborative protocol reflection mechanism [20]. ICE is an example of a protocol that performs this type of function. Interestingly, the process for ICE is not unilateral, but bilateral, and the difference has a significant impact on the issues raised by the IAB. Indeed, ICE can be considered a B-SAF (Bilateral Self-Address Fixing) protocol, rather than an UNSAF protocol.
Regardless, the IAB has mandated that any protocols developed for this purpose document a specific set of considerations. This section meets those requirements.

19.1. Problem Definition

From RFC 3424, any UNSAF proposal must provide:

   Precise definition of a specific, limited-scope problem that is to
   be solved with the UNSAF proposal. A short term fix should not be
   generalized to solve other problems; this is why "short term fixes
   usually aren't".

The specific problems being solved by ICE are:

   Provide a means for two peers to determine the set of transport
   addresses which can be used for communication.

   Provide a means for resolving many of the limitations of other
   UNSAF mechanisms by wrapping them in an additional layer of
   processing (the ICE methodology).

   Provide a means for an agent to determine an address that is
   reachable by another peer with which it wishes to communicate.

19.2. Exit Strategy

From RFC 3424, any UNSAF proposal must provide:

   Description of an exit strategy/transition plan. The better short
   term fixes are the ones that will naturally see less and less use
   as the appropriate technology is deployed.

ICE itself doesn't easily get phased out. However, it remains useful even in a globally connected Internet, for example as a means of detecting whether a router failure has temporarily disrupted connectivity. ICE also helps prevent certain security attacks which have nothing to do with NAT. More importantly, ICE helps phase out other UNSAF mechanisms. ICE effectively selects amongst those mechanisms, prioritizing ones that are better, and deprioritizing ones that are worse. Local IPv6 addresses can be preferred.
As NATs begin to dissipate as IPv6 is introduced, server reflexive and relayed candidates (both forms of UNSAF mechanisms) simply never get used, because higher priority connectivity exists to the native host candidates. Therefore, the servers get used less and less, and can eventually be removed when their usage goes to zero.

Indeed, ICE can assist in the transition from IPv4 to IPv6. It can be used to determine whether to use IPv6 or IPv4 when two dual-stack hosts communicate with SIP (IPv6 gets used). It can also allow a network with both 6to4 and native v6 connectivity to determine which address to use when communicating with a peer.

19.3. Brittleness Introduced by ICE

From RFC 3424, any UNSAF proposal must provide:

   Discussion of specific issues that may render systems more
   "brittle". For example, approaches that involve using data at
   multiple network layers create more dependencies, increase
   debugging challenges, and make it harder to transition.

ICE actually removes brittleness from existing UNSAF mechanisms. In particular, traditional STUN (as described in RFC 3489 [14]) has several points of brittleness. One of them is the discovery process which requires an agent to try to classify the type of NAT it is behind. This process is error-prone. With ICE, that discovery process is simply not used. Rather than being assessed unilaterally, the validity of the address is determined dynamically by measuring connectivity to a peer. The process of determining connectivity is very robust.

Another point of brittleness in traditional STUN and any other unilateral mechanism is its absolute reliance on an additional server. ICE makes use of a server for allocating unilateral addresses, but allows agents to directly connect if possible.
Therefore, in some cases, the failure of a STUN server would still allow for a call to progress when ICE is used.

Another point of brittleness in traditional STUN is that it assumes that the STUN server is on the public Internet. Interestingly, with ICE, that is not necessary. There can be a multitude of STUN servers in a variety of address realms. ICE will discover the one that has provided a usable address.

The most troubling point of brittleness in traditional STUN is that it doesn't work in all network topologies. In cases where there is a shared NAT between each agent and the STUN server, traditional STUN may not work. With ICE, that restriction is removed.

Traditional STUN also introduces some security considerations. Fortunately, those security considerations are also mitigated by ICE.

Consequently, ICE serves to repair the brittleness introduced in other UNSAF mechanisms, and does not introduce any additional brittleness into the system.

19.4. Requirements for a Long Term Solution

From RFC 3424, any UNSAF proposal must provide:

   Identify requirements for longer term, sound technical solutions
   -- contribute to the process of finding the right longer term
   solution.

Our conclusions from STUN remain unchanged. However, we feel ICE actually helps because we believe it can be part of the long term solution.

19.5. Issues with Existing NAPT Boxes

From RFC 3424, any UNSAF proposal must provide:

   Discussion of the impact of the noted practical issues with
   existing, deployed NA[P]Ts and experience reports.

A number of NAT boxes are now being deployed into the market which try to provide "generic" ALG functionality. These generic ALGs hunt for IP addresses, either in text or binary form within a packet, and rewrite them if they match a binding. This interferes with traditional STUN.
However, the update to STUN [11] uses an encoding which hides these binary addresses from generic ALGs. Since [11] is required for all ICE implementations, this NAPT problem does not impact ICE.

Existing NAPT boxes have non-deterministic and typically short expiration times for UDP-based bindings. This requires implementations to send periodic keepalives to maintain those bindings. ICE uses a default of 15s, which is a very conservative estimate. Eventually, over time, as NAT boxes become compliant with the BEHAVE requirements [31], this minimum keepalive will become deterministic and well-known, and the ICE timers can be adjusted. Having a way to discover and control the minimum keepalive interval would be far better still.

20. Acknowledgements

The authors would like to thank Flemming Andreasen, Rohan Mahy, Dean Willis, Eric Cooper, Dan Wing, Douglas Otis, Tim Moore, and Francois Audet for their comments and input. A special thanks goes to Bill May, who suggested several of the concepts in this specification, Philip Matthews, who suggested many of the key performance optimizations in this specification, Eric Rescorla, who drafted the text in the introduction, and Magnus Westerlund, for doing several detailed reviews on the various revisions of this specification.

21. References

21.1. Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2] Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, October 2003.

[3] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[4] Rosenberg, J. and H.
Schulzrinne, "An Offer/Answer Model with 3161 Session Description Protocol (SDP)", RFC 3264, June 2002. 3163 [5] Casner, S., "Session Description Protocol (SDP) Bandwidth 3164 Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556, 3165 July 2003. 3167 [6] Camarillo, G., Marshall, W., and J. Rosenberg, "Integration of 3168 Resource Management and Session Initiation Protocol (SIP)", 3169 RFC 3312, October 2002. 3171 [7] Camarillo, G. and P. Kyzivat, "Update to the Session Initiation 3172 Protocol (SIP) Preconditions Framework", RFC 4032, March 2005. 3174 [8] Crocker, D. and P. Overell, "Augmented BNF for Syntax 3175 Specifications: ABNF", RFC 4234, October 2005. 3177 [9] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional 3178 Responses in Session Initiation Protocol (SIP)", RFC 3262, 3179 June 2002. 3181 [10] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 3182 Description Protocol", RFC 4566, July 2006. 3184 [11] Rosenberg, J., "Simple Traversal Underneath Network Address 3185 Translators (NAT) (STUN)", draft-ietf-behave-rfc3489bis-05 3186 (work in progress), October 2006. 3188 [12] Rosenberg, J., "Obtaining Relay Addresses from Simple Traversal 3189 Underneath NAT (STUN)", draft-ietf-behave-turn-02 (work in 3190 progress), October 2006. 3192 [13] Rosenberg, J., "Indicating Support for Interactive Connectivity 3193 Establishment (ICE) in the Session Initiation Protocol (SIP)", 3194 draft-ietf-sip-ice-option-tag-00 (work in progress), 3195 January 2007. 3197 21.2. Informative References 3199 [14] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy, "STUN 3200 - Simple Traversal of User Datagram Protocol (UDP) Through 3201 Network Address Translators (NATs)", RFC 3489, March 2003. 3203 [15] Senie, D., "Network Address Translator (NAT)-Friendly 3204 Application Design Guidelines", RFC 3235, January 2002. 3206 [16] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and A. 
3207 Rayhan, "Middlebox communication architecture and framework", 3208 RFC 3303, August 2002. 3210 [17] Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo, 3211 "Best Current Practices for Third Party Call Control (3pcc) in 3212 the Session Initiation Protocol (SIP)", BCP 85, RFC 3725, 3213 April 2004. 3215 [18] Borella, M., Lo, J., Grabelsky, D., and G. Montenegro, "Realm 3216 Specific IP: Framework", RFC 3102, October 2001. 3218 [19] Borella, M., Grabelsky, D., Lo, J., and K. Taniguchi, "Realm 3219 Specific IP: Protocol Specification", RFC 3103, October 2001. 3221 [20] Daigle, L. and IAB, "IAB Considerations for UNilateral Self- 3222 Address Fixing (UNSAF) Across Network Address Translation", 3223 RFC 3424, November 2002. 3225 [21] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, 3226 "RTP: A Transport Protocol for Real-Time Applications", 3227 RFC 3550, July 2003. 3229 [22] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 3230 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 3231 RFC 3711, March 2004. 3233 [23] Carpenter, B. and K. Moore, "Connection of IPv6 Domains via 3234 IPv4 Clouds", RFC 3056, February 2001. 3236 [24] Zopf, R., "Real-time Transport Protocol (RTP) Payload for 3237 Comfort Noise (CN)", RFC 3389, September 2002. 3239 [25] Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone 3240 Generation in the Session Initiation Protocol (SIP)", RFC 3960, 3241 December 2004. 3243 [26] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. 3244 Weiss, "An Architecture for Differentiated Services", RFC 2475, 3245 December 1998. 3247 [27] Andreasen, F., "Connectivity Preconditions for Session 3248 Description Protocol Media Streams", 3249 draft-ietf-mmusic-connectivity-precon-02 (work in progress), 3250 June 2006. 3252 [28] Andreasen, F., "A No-Op Payload Format for RTP", 3253 draft-ietf-avt-rtp-no-op-00 (work in progress), May 2005. 3255 [29] Kohler, E., Handley, M., and S. 
Floyd, "Datagram Congestion 3256 Control Protocol (DCCP)", RFC 4340, March 2006. 3258 [30] Hellstrom, G. and P. Jones, "RTP Payload for Text 3259 Conversation", RFC 4103, June 2005. 3261 [31] Audet, F. and C. Jennings, "NAT Behavioral Requirements for 3262 Unicast UDP", draft-ietf-behave-nat-udp-08 (work in progress), 3263 October 2006. 3265 [32] Jennings, C. and R. Mahy, "Managing Client Initiated 3266 Connections in the Session Initiation Protocol (SIP)", 3267 draft-ietf-sip-outbound-07 (work in progress), January 2007. 3269 [33] Rescorla, E., "Overview of the Lite Implementation of 3270 Interactive Connectivity Establishment (ICE)", 3271 draft-ietf-mmusic-ice-lite-00.txt (work in progress), 3272 January 2007. 3274 Appendix A. Lite and Full Implementations 3276 ICE allows for two types of implementations. A full implementation 3277 supports the controlling and controlled roles in a session, and can 3278 also perform address gathering. In contrast, a lite implementation 3279 is a minimalist implementation that does little but respond to STUN 3280 checks. 3282 Because ICE requires both endpoints to support it in order to bring 3283 benefits to either endpoint, incremental deployment of ICE in a 3284 network is more complicated. Many sessions involve an endpoint which 3285 is, by itself, not behind a NAT and not one that would worry about 3286 NAT traversal. Examples include gateways to the PSTN, media servers, 3287 conference bridges, and application servers. A very common case is 3288 to have one endpoint that requires NAT traversal (such as a VoIP hard 3289 phone or soft phone) make a call through one of these devices. Even 3290 if the phone supports a full ICE implementation, ICE won't be used at 3291 all if the other device doesn't support it. The lite implementation 3292 allows for a low-cost entry point for these devices. Once they 3293 support the lite implementation, full implementations can connect to 3294 them and get the full benefits of ICE. 

Consequently, a lite implementation is only appropriate for devices that will always be connected to the public Internet and have a public IP address at which they can receive packets from any correspondent. ICE will not function when a lite implementation is placed behind a NAT.

It is important to note that the lite implementation was added to this specification to provide a stepping stone to full implementation. Even for devices that are always connected to the public Internet, a full implementation is preferable if achievable. A full implementation will reduce call setup times. Full implementations also obtain the security benefits of ICE unrelated to NAT traversal; in particular, the voice hammer attack described in Section 16 is prevented only for full implementations, not lite. Finally, it is often the case that a device which finds itself with a public address today will be placed in a network tomorrow where it will be behind a NAT. It is difficult to know definitively, over the lifetime of a device or product, that it will always be used on the public Internet. Full implementation provides assurance that communications will always work.

Appendix B. Design Motivations

ICE contains a number of normative behaviors which may themselves be simple, but derive from complicated or non-obvious thinking or use cases which merit further discussion. Since understanding these design motivations is not necessary in order to implement ICE, they are discussed here in an appendix to the specification. This section is non-normative.

B.1. Pacing of STUN Transactions

STUN transactions used to gather candidates and to verify connectivity are paced out at an approximate rate of one new transaction every Ta seconds, where Ta has a default of 20ms. Why are these transactions paced, and why was 20ms chosen as the default?
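To make the pacing rule concrete, here is a minimal Python sketch, assuming a hypothetical agent that keeps the transactions it still has to start in an ordered list and measures time in whole milliseconds. The function name pace_transactions and the string labels are illustrative only and are not part of ICE or STUN. The sketch starts exactly one transaction per Ta, whereas the text only calls for an approximate rate, and it does not model retransmissions of transactions already in progress.

```python
from typing import List, Sequence, Tuple

# Default pacing interval from the text: one new STUN transaction every Ta.
TA_DEFAULT_MS = 20


def pace_transactions(
    pending: Sequence[str], ta_ms: int = TA_DEFAULT_MS
) -> List[Tuple[int, str]]:
    """Assign each pending transaction a start offset in milliseconds.

    'pending' holds hypothetical labels standing in for whatever the agent
    is about to send (gathering requests or connectivity checks). At most
    one new transaction starts per Ta interval; retransmissions of
    transactions that have already started are not modeled.
    """
    if ta_ms <= 0:
        raise ValueError("Ta must be a positive number of milliseconds")
    # The i-th transaction (counting from 0) starts i * Ta after the first.
    return [(index * ta_ms, label) for index, label in enumerate(pending)]


if __name__ == "__main__":
    for start_ms, label in pace_transactions(["gather-1", "gather-2", "check-1"]):
        print(f"t={start_ms:>3} ms  start {label}")
```

With the default Ta of 20ms, starting 50 transactions this way spreads them over 49 x 20ms = 980ms before the last one begins; the cost of pacing is a longer start-up in exchange for a bounded rate of new NAT bindings and traffic.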

Sending these STUN requests will often have the effect of creating bindings on NAT devices between the client and the STUN servers. Experience has shown that many NAT devices have upper limits on the rate at which they will create new bindings. Furthermore, transmission of these packets on the network consumes bandwidth and needs to be rate limited by the agent. As a consequence, the pacing ensures that the NAT devices do not get overloaded and that traffic is kept at a reasonable rate.

B.2. Candidates with Multiple Bases

Section 4.1.1 talks about merging together candidates that are identical but have different bases. When can an agent have two candidates that have the same IP address and port, but different bases? Consider the topology of Figure 17:

         +----------+
         | STUN Srvr|
         +----------+
              |
              |
            -----
         //       \\
        |           |
        |  B:net10  |
        |           |
         \\       //
            -----
              |
              |
         +----------+
         |   NAT    |
         +----------+
              |
              |
            -----
         //       \\
        |     A     |
        | 192.168/16|
        |           |
         \\       //
            -----
              |
              |
              |192.168.1.1      -----
         +----------+         //     \\         +----------+
         |          |        |         |        |          |
         | Offerer  |--------| C:net10 |--------| Answerer |
         |          |10.0.1.1|         |10.0.1.2|          |
         +----------+         \\     //         +----------+
                                -----

                              Figure 17

In this case, the offerer is multi-homed. It has one interface, 10.0.1.1, on network C, which is a net 10 private network. The answerer is on this same network. The offerer is also connected to network A, which is 192.168/16, through an interface with address 192.168.1.1. There is a NAT on network A, translating into network B, which is another net 10 private network, but not connected to network C. There is a STUN server on network B.

The offerer obtains a host candidate on its interface on network C (10.0.1.1:2498) and a host candidate on its interface on network A (192.168.1.1:3344). It performs a STUN query to its configured STUN server from 192.168.1.1:3344. This query passes through the NAT, which happens to assign the binding 10.0.1.1:2498. The STUN server reflects this in the STUN Binding Response. Now, the offerer has obtained a server reflexive candidate with a transport address that is identical to a host candidate (10.0.1.1:2498). However, the server reflexive candidate has a base of 192.168.1.1:3344, and the host candidate has a base of 10.0.1.1:2498.

B.3. Purpose of the Translation

When a candidate is relayed, the SDP offer or answer contains both the relayed candidate and its translation. However, the translation is never used by ICE itself. Why is it present in the message?

There are two motivations for its inclusion. The first is diagnostic. It is very useful to know the relationship between the different types of candidates. By including the translation, an agent can know which relayed candidate is associated with which reflexive candidate, which in turn is associated with a specific host candidate. When checks succeed for one candidate but not for the others, this provides useful diagnostics on what is going on in the network.

The second reason has to do with off-path Quality of Service (QoS) mechanisms. When ICE is used in environments such as PacketCable 2.0, proxies will, in addition to performing normal SIP operations, inspect the SDP in SIP messages, and extract the IP address and port for media traffic. They can then interact, through policy servers, with access routers in the network, to establish guaranteed QoS for the media flows.
This QoS is provided by classifying the RTP traffic based on its 5-tuple, and then providing it a guaranteed rate, or marking its Diffserv codepoints appropriately. When a residential NAT is present, and a relayed candidate gets selected for media, this relayed candidate will be a transport address on an actual STUN relay. That address says nothing about the actual transport address in the access router that would be used to classify packets for QoS treatment. Rather, the translation of that relayed address is needed. By carrying the translation in the SDP, the proxy can use that transport address to request QoS from the access router.

B.4. Importance of the STUN Username

ICE requires the use of STUN message integrity, based on its short term credential functionality. The actual short term credential is formed by exchanging username fragments in the SDP offer/answer exchange. The need for this mechanism goes beyond just security; it is actually required for correct operation of ICE in the first place.

Consider agents A, B, and C. A and B are within private enterprise 1, which is using 10.0.0.0/8. C is within private enterprise 2, which is also using 10.0.0.0/8. As it turns out, B and C both have IP address 10.0.1.1. A sends an offer to C. C, in its answer, provides A with its host candidates. In this case, those candidates are 10.0.1.1:8866 and 10.0.1.1:8877. As it turns out, B is in a session at that same time, and is also using 10.0.1.1:8866 and 10.0.1.1:8877 as host candidates. This means that B is prepared to accept STUN messages on those ports, just as C is. A will send a STUN request to 10.0.1.1:8866 and another to 10.0.1.1:8877. However, these do not go to C as expected. Instead, they go to B!
If B just replied to them, A would believe it has connectivity to C, when in fact it has connectivity to a completely different user, B. To fix this, the STUN short term credential mechanisms are used. The username fragments are sufficiently random that it is highly unlikely that B would be using the same values as A. Consequently, B would reject the STUN request since the credentials were invalid. In essence, the STUN username fragments provide a form of transient host identifiers, bound to a particular offer/answer session.

An unfortunate consequence of the non-uniqueness of IP addresses is that, in the above example, B might not even be an ICE agent. It could be any host, and the port to which the STUN packet is directed could be any ephemeral port on that host. If there is an application listening on this socket for packets, and it is not prepared to handle malformed packets for whatever protocol is in use, the operation of that application could be affected. Fortunately, since the ports exchanged in SDP are ephemeral and usually drawn from the dynamic or registered range, the odds are good that the port is not used to run a server on host B, but rather is the agent side of some protocol. This decreases the probability of hitting a port in use, due to the transient nature of port usage in this range. However, the possibility of a problem does exist, and network deployers should be prepared for it. Note that this is not a problem specific to ICE; stray packets can arrive at a port at any time for any type of protocol, especially ones on the public Internet. As such, this requirement is just restating a general design guideline for Internet applications: be prepared for unknown packets on any port.

B.5. The Candidate Pair Priority Formula

The priority for a candidate pair has an odd form. It is:

   pair priority = 2^32*MIN(O-P,A-P) + 2*MAX(O-P,A-P) + (O-P>A-P?1:0)

where O-P is the priority of the offerer's candidate and A-P is the priority of the answerer's candidate.

Why is this? When the candidate pairs are sorted based on this value, the resulting sorting has the MAX/MIN property. This means that the pairs are first sorted based on decreasing value of the minimum of the two priorities. For pairs that have the same value of the minimum priority, the maximum priority is used to sort amongst them. If the max and the min priorities are the same, the offerer's priority is used as the tie breaker in the last part of the expression. The factor of 2^32 is used since the priority of a single candidate is always less than 2^32, resulting in the pair priority being a "concatenation" of the two component priorities. This creates the desired sorting property.

B.6. The Frozen State

The Frozen state is used for two purposes. Firstly, it allows ICE to first perform checks for the first component of a media stream. Once a successful check has completed for the first component, checks for the other components of the same type and local preference will be performed. Secondly, when there are multiple media streams, it allows ICE to first check candidates for a single media stream, and once a set of candidates has been found, candidates of that same type for other media streams can be checked first. This effectively "caches" the results of a check for one media stream, and applies them to another. For example, if only the relayed candidates for audio (which were the last resort candidates) succeed, ICE will check the relayed candidates for video first.

B.7. The remote-candidates Attribute

The a=remote-candidates attribute exists to eliminate a race condition between the updated offer and the response to the STUN Binding Request that moved a candidate into the Valid list.
This race condition is shown in Figure 18. On receipt of message 4, agent A adds a candidate pair to the valid list. If there were only a single media stream with a single component, agent A could now send an updated offer. However, the check from agent B has not yet generated a response, and agent B receives the updated offer (message 7) before getting the response (message 10). Thus, it does not yet know that this particular pair is valid. To eliminate this condition, the actual candidates at B that were selected by the offerer (the remote candidates) are included in the offer itself. Note, however, that agent B will not send media until it has received this STUN response.

     Agent A               Network               Agent B
        |(1) Offer            |                     |
        |------------------------------------------>|
        |(2) Answer           |                     |
        |<------------------------------------------|
        |(3) STUN Req.        |                     |
        |------------------------------------------>|
        |(4) STUN Res.        |                     |
        |<------------------------------------------|
        |(5) STUN Req.        |                     |
        |<------------------------------------------|
        |(6) STUN Res.        |                     |
        |-------------------->|                     |
        |                     |Lost                 |
        |(7) Offer            |                     |
        |------------------------------------------>|
        |(8) Answer           |                     |
        |<------------------------------------------|
        |(9) STUN Req.        |                     |
        |<------------------------------------------|
        |(10) STUN Res.       |                     |
        |------------------------------------------>|

                              Figure 18

B.8. Why are Keepalives Needed?

Once media begins flowing on a candidate pair, it is still necessary to keep the bindings alive at intermediate NATs for the duration of the session. Normally, the media stream packets themselves (e.g., RTP) meet this objective. However, several cases merit further discussion. Firstly, in some applications of RTP, such as SIP-based sessions, the media streams can be "put on hold".
This is accomplished by using the SDP "sendonly" or "inactive" attributes, as defined in RFC 3264 [4]. RFC 3264 directs implementations to cease transmission of media in these cases. However, doing so may cause NAT bindings to time out, and media won't be able to come off hold.

Secondly, some RTP payload formats, such as the payload format for text conversation [30], may send packets so infrequently that the interval exceeds the NAT binding timeouts.

Thirdly, if silence suppression is in use, long periods of silence may cause media transmission to cease sufficiently long for NAT bindings to time out.

For these reasons, the media packets themselves cannot be relied upon. ICE defines a simple periodic keepalive that operates independently of media transmission. This makes its bandwidth requirements highly predictable, and thus amenable to QoS reservations.

B.9. Why Prefer Peer Reflexive Candidates?

Section 4.1.2 describes procedures for computing the priority of a candidate based on its type and local preferences. That section requires that the type preference for peer reflexive candidates always be higher than that for server reflexive candidates. Why is that? The reason has to do with the security considerations in Section 16. It is much easier for an attacker to cause an agent to use a false server reflexive candidate than it is for an attacker to cause an agent to use a false peer reflexive candidate. Consequently, ICE thwarts attacks against the STUN binding discovery usage by preferring peer reflexive candidates.

B.10. Why Send an Updated Offer?

Section 11.1 describes rules for sending media. Both agents can send media once ICE checks complete, without waiting for an updated offer.
Indeed, the only purpose of the updated offer is to "correct" the m/c-line so that it matches where media is being sent, based on ICE procedures.

This raises the question: why is the updated offer/answer exchange needed at all? Indeed, in a pure offer/answer environment, it would not be. The offerer and answerer will agree on the candidates to use through ICE, and then can begin using them. As far as the agents themselves are concerned, the updated offer/answer provides no new information. However, in practice, numerous components along the signaling path look at the SDP information. These include entities performing off-path QoS reservations, NAT traversal components such as ALGs and Session Border Controllers (SBCs), and diagnostic tools that passively monitor the network. For these tools to continue to function without change, the core property of SDP (that the m/c-lines represent the addresses used for media) must be retained. For this reason, an updated offer must be sent.

B.11. Why are Binding Indications Used for Keepalives?

Media keepalives are described in Section 10. These keepalives make use of STUN when both endpoints are ICE capable. However, rather than using a Binding Request transaction (which generates a response), the keepalives use an Indication. Why is that?

The primary reason has to do with network QoS mechanisms. Once media begins flowing, network elements will assume that the media stream has a fairly regular structure, making use of periodic packets at fixed intervals, with the possibility of jitter. If an agent is sending media packets, and then receives a Binding Request, it would need to generate a response packet along with its media packets.
This will increase the actual bandwidth requirements for the 5-tuple carrying the media packets, and introduce jitter in the delivery of those packets. Analysis has shown that this is a concern in certain layer 2 access networks that use fairly tight packet schedulers for media.

Additionally, using a Binding Indication allows message integrity to be disabled, allowing for better performance. This is useful for large scale endpoints, such as PSTN gateways.

Author's Address

   Jonathan Rosenberg
   Cisco Systems
   600 Lanidex Plaza
   Parsippany, NJ 07054
   US

   Phone: +1 973 952-5000
   Email: jdrosen@cisco.com
   URI:   http://www.jdrosen.net

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2007). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.