Internet Engineering Task Force                                MIDCOM WG
Internet Draft                                J. Rosenberg (dynamicsoft)
                                             J. Weinberger (dynamicsoft)
                                                  C. Huitema (Microsoft)
                                                         R. Mahy (Cisco)
draft-ietf-midcom-stun-01.txt
July 1, 2002
Expires: January 2003

STUN - Simple Traversal of UDP Through Network Address Translators

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

Simple Traversal of UDP Through NATs (STUN) is a lightweight protocol that allows applications to discover the presence and types of Network Address Translators (NATs) and firewalls between them and the public Internet. It also provides the ability for applications to determine the public IP addresses allocated to them by the NAT. STUN works with many existing NATs, and does not require any special behavior from them. As a result, it allows a wide variety of applications to work through existing NAT infrastructure.

Table of Contents

1 Applicability Statement
2 Introduction
3 Terminology
4 Definitions
5 NAT Variations
6 Overview of Operation
7 Message Overview
8 Server Behavior
   8.1 Binding Requests
   8.2 Shared Secret Requests
9 Client Behavior
   9.1 Discovery
   9.2 Obtaining a Shared Secret
   9.3 Formulating the Binding Request
   9.4 Processing Binding Responses
10 Use Cases
   10.1 Discovery Process
   10.2 Binding Lifetime Discovery
   10.3 Binding Acquisition
11 Protocol Details
   11.1 Message Header
   11.2 Message Attributes
      11.2.1 MAPPED-ADDRESS
      11.2.2 RESPONSE-ADDRESS
      11.2.3 CHANGED-ADDRESS
      11.2.4 FLAGS
      11.2.5 SOURCE-ADDRESS
      11.2.6 TARGET-TID
      11.2.7 TARGET-OTP
      11.2.8 MESSAGE-INTEGRITY
      11.2.9 ERROR-CODE
12 Security Considerations
   12.1 Attacks on STUN
      12.1.1 Attack I: DDOS Against a Target
      12.1.2 Attack II: Silencing a Client
      12.1.3 Attack III: Assuming the Identity of a Client
      12.1.4 Attack IV: Eavesdropping
   12.2 Launching the Attacks
      12.2.1 Approach I: Compromise a Legitimate STUN Server
      12.2.2 Approach II: DNS Attacks
      12.2.3 Approach III: Rogue router or NAT
      12.2.4 Approach IV: MITM
      12.2.5 Approach V: Response Injection plus DoS
      12.2.6 Approach VI: Duplication
   12.3 Countermeasures
13 IANA Considerations
14 IAB Considerations
   14.1 Problem Definition
   14.2 Exit Strategy
   14.3 Brittleness Introduced by STUN
   14.4 Requirements for a Long Term Solution
   14.5 Issues with Existing NAPT Boxes
   14.6 In Closing
15 Changes since draft-ietf-midcom-stun-00
16 Changes since draft-rosenberg-midcom-stun-01
17 Acknowledgements
18 Authors Addresses
19 Normative References
20 Informative References

1 Applicability Statement

This protocol is not a cure-all for the problems associated with NAT. It does not enable incoming TCP connections through NAT. It allows incoming UDP packets through NAT, but only through a subset of existing NAT types. In particular, STUN does not enable incoming UDP packets through symmetric NATs (defined below), which are common in large enterprises.
STUN's discovery procedures are based on assumptions about NAT treatment of UDP; such assumptions may prove invalid down the road as new NAT devices are deployed. For a more complete discussion of the limitations of STUN, see Section 14.

2 Introduction

Network Address Translators (NATs), while providing many benefits, also come with many drawbacks. The most troublesome of those drawbacks is the fact that they break many existing IP applications, and make it difficult to deploy new ones. Guidelines have been developed [6] that describe how to build "NAT friendly" protocols, but many protocols simply cannot be constructed according to those guidelines. Examples of such protocols include almost all peer-to-peer protocols, such as multimedia communications, file sharing and games.

To combat that problem, Application Layer Gateways (ALGs) have been embedded in NATs. ALGs perform the application layer functions required for a particular protocol to traverse a NAT. Typically, this involves rewriting messages to contain translated addresses, rather than the ones inserted by the sender of the protocol message. ALGs have serious limitations, including scalability, reliability, and speed of deploying new applications. To resolve these problems, the Middlebox Communications (MIDCOM) protocol is being developed [7]. MIDCOM allows an application entity, such as an end client or network server of some sort (like a SIP proxy [8]) to control a NAT (or firewall), in order to obtain NAT bindings and open or close pinholes. In this way, NATs and applications can be separated once more, eliminating the need for embedding ALGs in NATs, and resolving the limitations imposed by current architectures.

Unfortunately, MIDCOM requires upgrades to existing NATs and firewalls, in addition to application components. Complete upgrades of these NAT and firewall products will take a long time, potentially years.
This is due, in part, to the fact that the deployers of NATs and firewalls are not the same people who are deploying and using applications. As a result, the incentive to upgrade these devices will be low in many cases. Consider, for example, an airport Internet lounge that provides access with a NAT. A user connecting to the natted network may wish to use a peer-to-peer service, but cannot, because the NAT doesn't support it. Since the administrators of the lounge are not the ones providing the service, they are not motivated to upgrade their NAT equipment to support it, using either an ALG, or MIDCOM.

Another problem is that the MIDCOM protocol requires that the agent controlling the middleboxes know the identity of those middleboxes, and have a relationship with them which permits control. In many configurations, this will not be possible. For example, many cable access providers use NAT in front of their entire access network. This NAT could be in addition to a residential NAT purchased and operated by the end user. The end user will probably not have a control relationship with the NAT in the cable access network, and may not even know of its existence.

Many existing proprietary protocols, such as those for online games (such as the games described in RFC 3027 [9]) and Voice over IP, have developed tricks that allow them to operate through NATs without changing those NATs. This draft is an attempt to take some of those ideas, and codify them into an interoperable protocol that can meet the needs of many applications.

The protocol described here, Simple Traversal of UDP Through NAT (STUN), allows entities behind a NAT to first discover the presence of a NAT and the type of NAT, and then to learn the address bindings allocated by the NAT. STUN requires no changes to NATs, and works with an arbitrary number of NATs in tandem between the application entity and the public Internet.
3 Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 [1] and indicate requirement levels for compliant STUN implementations.

4 Definitions

STUN Client: A STUN client (also just referred to as a client) is an entity that generates STUN requests. A STUN client can execute on an end system, such as a user's PC, or can run in a network element, such as a server.

STUN Server: A STUN Server (also just referred to as a server) is an entity that receives STUN requests, and sends STUN responses. STUN servers are generally attached to the public Internet. STUN servers are stateless.

5 NAT Variations

It is assumed that the reader is familiar with NATs. It has been observed that NAT treatment of UDP varies among implementations. The four treatments observed in implementations are:

Full Cone: A full cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Furthermore, any external host can send a packet to the internal host, by sending a packet to the mapped external address.

Restricted Cone: A restricted cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike a full cone NAT, an external host (with IP address X) can send a packet to the internal host only if the internal host had previously sent a packet to IP address X.

Port Restricted Cone: A port restricted cone NAT is like a restricted cone NAT, but the restriction includes port numbers. Specifically, an external host can send a packet, with source IP address X and source port P, to the internal host only if the internal host had previously sent a packet to IP address X and port P.
Symmetric: A symmetric NAT is one where all requests from the same internal IP address and port, to a specific destination IP address and port, are mapped to the same external IP address and port. If the same host sends a packet with the same source address and port, but to a different destination, a different mapping is used. Furthermore, only the external host that receives a packet can send a UDP packet back to the internal host.

Determining the type of NAT is important in many cases. Depending on what the application wants to do, the particular behavior may need to be taken into account.

6 Overview of Operation

This section is descriptive only. Normative behavior is described in Sections 8 and 9.

The typical STUN configuration is shown in Figure 1. A STUN client is connected to private network 1. This network connects to private network 2 through NAT 1. Private network 2 connects to the public Internet through NAT 2. On the public Internet is a STUN server.

                        /-----\
                      // STUN  \\
                     |   Server  |
                      \\       //
                        \-----/

                   +--------------+             Public Internet
   ................|     NAT 2    |.......................
                   +--------------+

                   +--------------+             Private NET 2
   ................|     NAT 1    |.......................
                   +--------------+

                        /-----\
                      // STUN  \\
                     |   Client  |
                      \\       //               Private NET 1
                        \-----/

                 Figure 1: STUN Configuration

STUN is a simple client-server protocol. A client sends a request to a server, and the server returns a response. There are two types of requests - Binding Requests, sent over UDP, and Shared Secret Requests, sent over TLS. Shared Secret Requests ask the server to return a one time password that can be used as a shared secret in a subsequent Binding Request and Binding Response, for the purposes of authentication and message integrity.

Binding Requests are used to determine the bindings allocated by NATs. The client sends a Binding Request to the server, over UDP. The server examines the source IP address and port of the request, and copies them into a response that is sent back to the client. There are some parameters in the request that allow the client to ask that the response be sent elsewhere, or that the server send the response from a different address and port. There are attributes for providing message integrity and authentication. That's it. The trick is using STUN to discover the presence of NATs, and to learn and use the bindings they allocate.

The STUN client is typically embedded in an application which needs to obtain a public IP address and port that can be used to receive data. For example, it might need to obtain an IP address and port to receive RTP [10] traffic. When the application starts, the STUN client within the application sends a STUN Shared Secret Request to its server, obtains a password, and then sends it a Binding Request. STUN servers can be discovered through DNS SRV records [2], and it is generally assumed that the client is configured with the domain to use to find the STUN server. Generally, this will be the domain of the provider of the service the application is using (such a provider has an incentive to deploy STUN servers in order to allow its customers to use its application through NAT). Of course, a client can determine the address or domain name of a STUN server through other means. A STUN server can even be embedded within an end system.

The STUN Binding Request is used to discover the presence of a NAT, and to discover the public IP address and port mappings generated by the NAT. Binding Requests are sent to the STUN server using UDP. When a Binding Request arrives at the STUN server, it may have passed through one or more NATs between the STUN client and the STUN server. As a result, the source address of the request received by the server will be the mapped address created by the NAT closest to the server.
The STUN server copies that source IP address and port into a STUN Binding Response, and sends it back to the source IP address and port of the STUN request. For all of the NAT types above, this response will arrive at the STUN client.

When the STUN client receives the STUN Binding Response, it compares the IP address and port in the packet with the local IP address and port it bound to when the request was sent. If these do not match, the STUN client is behind one or more NATs. In the case of a full-cone NAT, the IP address and port in the body of the STUN response are public, and can be used by any host on the public Internet to send packets to the application that sent the STUN request. An application need only listen on the IP address and port from which the STUN request was sent, and send the IP address and port learned in the STUN response to hosts that wish to communicate with it.

Of course, the host may not be behind a full-cone NAT. Indeed, it doesn't yet know what type of NAT it is behind. To determine that, the client uses additional STUN Binding Requests. The exact procedure is flexible, but would generally work as follows. The client would send a second STUN Binding Request, this time to a different STUN server, but from the same source IP address and port. If the IP address and port in the response are different from those in the first response, the client knows it is behind a symmetric NAT. To determine if it is behind a full-cone NAT, the client can send a STUN Binding Request with flags that tell the STUN server to send a response from a different IP address and port than the request was received on. In other words, if the client sent a Binding Request to IP address/port A/B using a source IP address/port of X/Y, the STUN server would send the Binding Response to X/Y using source IP address/port C/D. If the client receives this response, it knows it is behind a full cone NAT.
STUN also allows the client to ask the server to send the Binding Response from the same IP address the request was received on, but with a different port. This can be used to detect whether the client is behind a port restricted cone NAT or just a restricted cone NAT.

It should be noted that the configuration in Figure 1 is not the only permissible configuration. The STUN server can be located anywhere, including within another client. The only requirement is that the STUN server is reachable by the client, and if the client is trying to obtain a publicly routable address, that the server reside on the public Internet.

7 Message Overview

STUN messages are TLV (type-length-value) encoded using big endian (network ordered) binary. All STUN messages start with a STUN header, followed by a STUN payload. The payload is a series of STUN attributes, the set of which depends on the message type. The STUN header contains a STUN message type, transaction ID, and length. The message type can be Binding Request, Binding Response, Binding Error Response, Shared Secret Request, Shared Secret Error Response, or Shared Secret Response. The transaction ID is used to correlate requests and responses. The length indicates the total length of the STUN payload, not including the header. This allows STUN to run over TCP. Shared Secret Requests are always sent over TCP (indeed, using TLS).

Several STUN attributes are defined for usage in Binding Requests and Binding Responses. The first is a MAPPED-ADDRESS attribute, which is an IP address and port. It is always placed in the Binding Response, and it indicates the source IP address and port the server saw in the Binding Request. There is also a RESPONSE-ADDRESS attribute, which contains an IP address and port. The RESPONSE-ADDRESS attribute can be present in the Binding Request, and indicates where the Binding Response is to be sent.
It is optional, and when not present, the Binding Response is sent to the source IP address and port of the Binding Request.

The third attribute is the FLAGS attribute, and it contains boolean flags to control behavior. Three flags are defined: "discard", "change IP" and "change port". The FLAGS attribute is allowed only in the Binding Request. The discard flag tells the server to not send a Binding Response. The change IP and change port flags are useful for determining whether the client is behind a restricted cone NAT or port restricted cone NAT. They instruct the server to send the Binding Responses from a different source IP address and port. The FLAGS attribute is optional in the Binding Request.

The fourth attribute is the CHANGED-ADDRESS attribute. It is present in Binding Responses. It informs the client of the source IP address and port that would be used if the client requested the "change IP" and "change port" behavior.

The fifth attribute is the SOURCE-ADDRESS attribute. It is only present in Binding Responses. It indicates the source IP address and port where the response was sent from. It is useful for detecting twice NAT configurations.

The Shared Secret Request message always contains one attribute - the TARGET-TID attribute. This attribute contains the transaction ID of the Binding Request which will be protected using the shared secret learned from the Shared Secret Response. The Shared Secret Response contains one attribute - the TARGET-OTP. This attribute contains the shared secret to be used.

8 Server Behavior

The server behavior depends on whether the request is a Binding Request or a Shared Secret Request.

8.1 Binding Requests

A STUN server MUST be prepared to receive Binding Requests on four address/port combinations - (A1, P1), (A2, P1), (A1, P2), and (A2, P2). (A1, P1) represent the primary address and port, and these are the ones obtained through the client procedures below. Typically, P1 will be port 3478, the default STUN port.
A2 and P2 are arbitrary. A2 and P2 are advertised by the server through the CHANGED-ADDRESS attribute, as described below.

If the Binding Request contains the FLAGS attribute, and the discard flag is true, the server MUST discard the request.

It is RECOMMENDED that the server check the Binding Request for a MESSAGE-INTEGRITY attribute. If not present, and the server requires integrity checks on the request, it MUST generate a Binding Error Response with an ERROR-CODE attribute with class 400 and number 1. If present, the server computes the HMAC over the request as described in Section 11.2.8. The key to use depends on the shared secret mechanism. If the STUN Shared Secret Request was used, the key MUST be the one placed in a Shared Secret Response to a Shared Secret Request that had a TARGET-TID equal to the transaction identifier in the Binding Request. If the server doesn't remember the shared secret, because it timed out, the server MUST generate a Binding Error Response. The Binding Error Response MUST include an ERROR-CODE with class 400 and number 2. If the server does have the shared secret, but the computed HMAC differs from the one in the request, the server MUST generate a Binding Error Response with an ERROR-CODE with class 400 and number 3. The Binding Error Response is sent to the source IP address and port the Binding Request came from, and sent from the source IP address and port the Binding Request was sent to.

Assuming the Binding Request passed the integrity check, the server MUST generate a single Binding Response. The Binding Response MUST contain the same transaction ID contained in the Binding Request. The length in the message header MUST contain the total length of the message in bytes, excluding the header. The Binding Response MUST have a message type of "Binding Response".

The server MUST add a MAPPED-ADDRESS attribute to the Binding Response.
The IP address component of this attribute MUST be set to the source IP address observed in the Binding Request. The port component of this attribute MUST be set to the source port observed in the Binding Request.

If the RESPONSE-ADDRESS attribute was absent from the Binding Request, the destination address and port of the Binding Response MUST be the same as the source address and port of the Binding Request. Otherwise, the destination address and port of the Binding Response MUST be the value of the IP address and port in the RESPONSE-ADDRESS attribute.

The source address and port of the Binding Response depend on the value of the FLAGS attribute and on the address and port the Binding Request was received on, and are summarized in Table 1.

Let Da represent the destination IP address of the Binding Request (which will be either A1 or A2), and Dp represent the destination port of the Binding Request (which will be either P1 or P2). Let Ca represent the other address, so that if Da is A1, Ca is A2. If Da is A2, Ca is A1. Similarly, let Cp represent the other port, so that if Dp is P1, Cp is P2. If Dp is P2, Cp is P1. If the "change port" flag was set in the Binding Request, and the "change IP" flag was not set, the source IP address of the Binding Response MUST be Da and the source port of the Binding Response MUST be Cp. If the "change IP" flag was set in the Binding Request, and the "change port" flag was not set, the source IP address of the Binding Response MUST be Ca and the source port of the Binding Response MUST be Dp. When both flags are set, the source IP address of the Binding Response MUST be Ca and the source port of the Binding Response MUST be Cp. If neither flag is set, the source IP address of the Binding Response MUST be Da and the source port of the Binding Response MUST be Dp.
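The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the protocol definition: the function name response_source and its parameter names are hypothetical, and the addresses used in any call are placeholders chosen by the caller.

```python
from typing import Tuple


def response_source(da: str, dp: int,
                    a1: str, p1: int, a2: str, p2: int,
                    change_ip: bool, change_port: bool) -> Tuple[str, int]:
    """Pick the (source address, source port) for a Binding Response.

    da/dp are the destination address and port the Binding Request
    arrived on; (a1, p1) and (a2, p2) are the server's two addresses
    and two ports. All names here are hypothetical.
    """
    if da not in (a1, a2) or dp not in (p1, p2):
        raise ValueError("request did not arrive on a served address/port")
    # Ca and Cp are "the other" address and port relative to Da and Dp.
    ca = a2 if da == a1 else a1
    cp = p2 if dp == p1 else p1
    # Each flag independently swaps its component.
    return (ca if change_ip else da, cp if change_port else dp)
```

Because the two flags act independently on the address and the port, the four cases of the rule collapse into two conditional expressions.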
   Flags                        Source Address   Source Port
   none                         Da               Dp
   Change IP                    Ca               Dp
   Change port                  Da               Cp
   Change IP and Change port    Ca               Cp

   Table 1: Impact of Flags on Packet Source

The server MUST add a SOURCE-ADDRESS attribute to the Binding Response, containing the address and port used to send the Binding Response.

The server MUST add a CHANGED-ADDRESS attribute to the Binding Response. This contains the source IP address and port that would be used if the client had requested the "change IP" and "change port" capabilities of the server in the Binding Request. These are Ca and Cp, respectively.

If the Binding Request contained a MESSAGE-INTEGRITY attribute, the server MUST add a MESSAGE-INTEGRITY attribute to the Binding Response. If the Binding Request did not contain a MESSAGE-INTEGRITY attribute, the server SHOULD add a MESSAGE-INTEGRITY attribute to the Binding Response. The attribute contains an HMAC [3]. The key to use depends on the shared secret mechanism. If the STUN Shared Secret Request was used, the key MUST be the one placed in a Shared Secret Response to a Shared Secret Request that had a TARGET-TID equal to the transaction identifier in the Binding Request. If the server doesn't remember the shared secret, because it timed out, the server MUST generate a Binding Error Response instead of a Binding Response. The Binding Error Response MUST include an ERROR-CODE with class 400 and number 2.

The server SHOULD NOT retransmit the response. Reliability is achieved by having the client periodically resend the request, each of which triggers a response from the server.

8.2 Shared Secret Requests

Shared Secret Requests are always received on TLS connections. When the server receives a request from the client to establish a TLS connection, it SHOULD do so, and SHOULD present a site certificate. The TLS ciphersuite TLS_RSA_WITH_AES_128_CBC_SHA SHOULD be used.
Client authentication MAY be done, but is not required since the server is not allocating any resources to clients. In fact, the signature validation needed for client authentication outweighs the amount of work STUN itself will require on the server.

If the server receives a Shared Secret Request, it MUST verify that the request arrived on a TLS connection. The server extracts the TARGET-TID from the request. It then creates a Shared Secret Response. The Shared Secret Response MUST contain the same transaction ID contained in the Shared Secret Request. The length in the message header MUST contain the total length of the message in bytes, excluding the header. The Shared Secret Response MUST have a message type of "Shared Secret Response". The Shared Secret Response MUST include a TARGET-OTP attribute containing a shared secret which MUST be valid for a Binding Request whose transaction ID equals TARGET-TID. The server MUST always return the same TARGET-OTP for two requests with the same TARGET-TID. This shared secret MUST have at least 128 bits. The server SHOULD remember the shared secret, and be able to determine whether the shared secret is valid in a Binding Request with a particular transaction ID, for a period of 30 seconds. After that time, it MAY discard the state.

There is no explicit requirement that the server provide a different shared secret for each transaction ID. The shared secrets could, alternatively, be time-based, rotating every few minutes. The shared secrets could also be a function of the transaction ID itself, derived through a suitable key derivation algorithm.

The Shared Secret Response is sent over the same TLS connection the request was received on. The server SHOULD keep the connection open, and let the client close it.

9 Client Behavior

The behavior of the client is very straightforward.
Its task is to discover the STUN server, obtain a shared secret, formulate the Binding Request, handle request reliability, and process the Binding Responses.

9.1 Discovery

Generally, the client will be configured with a domain name of the provider of the STUN servers. This domain name is resolved to an IP address and port using the SRV procedures specified in RFC 2782 [2].

Specifically, the service name is "stun". The protocol is "udp" (for sending Binding Requests) or "tcp" (for sending Shared Secret Requests). The procedures of RFC 2782 are followed to determine the server to contact. RFC 2782 spells out the details of how a set of SRV records are sorted and then tried. However, it only states that the client should "try to connect to the (protocol, address, service)" without giving any details on what happens in the event of failure. Those details are described here for STUN.

For STUN requests, failure occurs if there is a transport failure of some sort (generally, due to fatal ICMP errors in UDP use or connection failures in TCP). Failure also occurs if the request does not elicit a response after 30 seconds. If a failure occurs, the client SHOULD create a new request, which is identical to the previous, but has a different transaction ID. That request is sent to the next element in the list as specified by RFC 2782.

The default port for STUN requests is 3478, for both TCP and UDP. Administrators SHOULD use this port in their SRV records, but MAY use others.

If no SRV records were found, the client performs an A or AAAA record lookup of the domain name. The result will be a list of IP addresses, each of which can be contacted at the default port.

This would allow a firewall admin to open the STUN port, so hosts within the enterprise could access new applications. Whether they will or won't do this is a good question.

9.2 Obtaining a Shared Secret

As discussed in Section 12, there are several attacks possible on STUN systems.
Many of these are prevented through integrity of requests and responses. To provide that integrity, STUN makes use of a shared secret between client and server, used as the keying material for an HMAC used in both the Binding Request and Binding Response. STUN allows for the shared secret to be obtained in any way (for example, Kerberos [11]). However, it MUST have at least 128 bits of randomness. In order to ensure interoperability, this specification mandates a TLS-based mechanism. This mechanism, described in this section, MUST be implemented by clients and servers.

First, the client determines the IP address and port that it will open a TCP connection to. This is done using the discovery procedures in Section 9.1, but using _tcp as the transport. The client opens up the connection to that address and port, and immediately begins TLS negotiation [4]. The client MUST authorize the server. If a site certificate is offered, the client MUST verify that the certificate corresponds to a server in the domain that the client looked up in DNS. If a site certificate is not used, it is RECOMMENDED that the user be queried to verify that the server is authorized to provide STUN responses.

Once the connection is opened, the client sends a Shared Secret Request. That request has a single attribute, the TARGET-TID, which contains the 128 bit transaction ID that will be used for the Binding Request to be HMACed with the shared secret. The server generates a Shared Secret Response. The response contains a single attribute, TARGET-OTP, which contains the shared secret to be used for the Binding Request.

The client MAY generate multiple Shared Secret Requests on the connection, and it MAY do so before receiving Shared Secret Responses to previous Shared Secret Requests. The client SHOULD close the connection as soon as it has finished obtaining one time passwords.
Section 9.3 describes how these one-time passwords are used to provide integrity protection over Binding Requests, and Section 8.1 describes how they are used in Binding Responses.

9.3 Formulating the Binding Request

A Binding Request formulated by the client follows the syntax rules defined in Section 11. Any two requests that are not bit-wise identical, or not sent to the same server from the same IP address and port, MUST carry different transaction IDs. The transaction ID MUST be uniformly and randomly chosen between 0 and 2**128 - 1. The large range is needed because the transaction ID serves as a form of randomization, helping to prevent replays of previously signed responses from the server. The message type of the request MUST be "Binding Request".

The RESPONSE-ADDRESS attribute is optional in the Binding Request. It is used if the client wishes the response to be sent to a different IP address and port. This is useful for determining whether the client is behind a firewall, and for applications that have separated control and data components. See Section 10.3 for more details. The FLAGS attribute is also optional. Whether it is present depends on what the application is trying to accomplish. See Section 10 for some example uses.

The client SHOULD add a MESSAGE-INTEGRITY attribute to the Binding Request. This contains an HMAC [3]. The key to use depends on the shared secret mechanism. If the STUN Shared Secret Request was used, the key MUST be the one found in a Shared Secret Response to a Shared Secret Request that had a TARGET-TID equal to the transaction identifier in the Binding Request. If the attribute is placed in the request, the server is guaranteed to provide integrity over the response.

Once formulated, the client sends the Binding Request. Reliability is accomplished through client retransmissions.
Clients SHOULD retransmit the request starting with an interval of 100ms, doubling every retransmit until the interval reaches 1.6s. Retransmissions continue with intervals of 1.6s until a response is received, or a total of 9 requests have been sent, at which time the client SHOULD give up.

9.4 Processing Binding Responses

The response can either be a Binding Response or Binding Error Response.

If the response is a Binding Error Response, the client checks the class and number from the ERROR-CODE attribute of the response. For 400 class responses with numbers 1, 2 and 3, the client SHOULD obtain a new shared secret, and retry the Binding Request with a new transaction. For 400 class responses with unknown numbers, the client should alert the user that there was an error, and display the reason phrase of the ERROR-CODE response. For 500 class responses with unknown numbers, the client SHOULD retry the Binding Request. For 600 class responses with unknown numbers, the client SHOULD NOT retry the request, and should inform the user of the failure using the reason phrase.

If the response is a Binding Response, the client SHOULD check the response for a MESSAGE-INTEGRITY attribute. If not present, and the client placed a MESSAGE-INTEGRITY attribute into the request, it MUST discard the response. If present, the client computes the HMAC over the response as described in Section 11.2.8. The key to use depends on the shared secret mechanism. If the STUN Shared Secret Request was used, the key MUST be the one placed in a Shared Secret Response to a Shared Secret Request that had a TARGET-TID equal to the transaction identifier in the response. If the computed HMAC differs from the one in the response, the client MUST discard the response, and SHOULD alert the user about a possible attack. If the computed HMAC matches the one from the response, processing continues.
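
The integrity check on a Binding Response can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names find_attribute and verify_integrity are hypothetical, the key is assumed to be the TARGET-OTP value already obtained for this transaction, and the layout (20 byte header, then type/length/value attributes, with the HMAC covering everything that precedes MESSAGE-INTEGRITY) is taken from Section 11.

```python
import hashlib
import hmac
import struct

HEADER_LEN = 20              # fixed STUN header size (Section 11.1)
MESSAGE_INTEGRITY = 0x0008   # attribute type (Section 11.2)


def find_attribute(message: bytes, wanted_type: int):
    """Return (offset, value) of the first attribute of wanted_type, or None."""
    offset = HEADER_LEN
    while offset + 4 <= len(message):
        attr_type, attr_len = struct.unpack_from("!HH", message, offset)
        value = message[offset + 4 : offset + 4 + attr_len]
        if attr_type == wanted_type:
            return offset, value
        offset += 4 + attr_len
    return None


def verify_integrity(message: bytes, key: bytes) -> bool:
    """Check MESSAGE-INTEGRITY: HMAC-SHA1 over all bytes that precede it."""
    found = find_attribute(message, MESSAGE_INTEGRITY)
    if found is None:
        # Hypothetical policy: report failure and let the caller decide
        # whether a missing attribute means the response is discarded.
        return False
    offset, received = found
    expected = hmac.new(key, message[:offset], hashlib.sha1).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, received)
```

A client that placed MESSAGE-INTEGRITY in its request would discard any response for which verify_integrity returns False.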
Reception of a response to a STUN request will terminate retransmissions of that request. However, clients MUST continue to listen for STUN responses to a request for 10 seconds after the first response. If it receives any responses in this interval with different message types (Binding Responses and Binding Error Responses, for example) or different MAPPED-ADDRESSes, it is an indication of a possible attack. The client MUST NOT use the MAPPED-ADDRESS from any of those responses, and SHOULD alert the user. Furthermore, if a client receives more than twice as many Binding Responses as the number of Binding Requests it sent, it MUST NOT use the MAPPED-ADDRESS from any of those responses, and SHOULD alert the user about a potential attack. If the Binding Response is authenticated, and the MAPPED-ADDRESS was not discarded because of a potential attack, the client MAY use the MAPPED-ADDRESS and SOURCE-ADDRESS attributes.

10 Use Cases

The rules of Sections 8 and 9 describe exactly how a client and server interact to send requests and get responses. However, they do not dictate how the STUN protocol is used to accomplish useful tasks. That is at the discretion of the client. Here, we provide some useful scenarios for applying STUN.

10.1 Discovery Process

In this scenario, a user is running a multimedia application which needs to determine which of the following scenarios applies to it:

   o  On the open Internet

   o  Firewall that blocks UDP

   o  Firewall that allows UDP out, and responses have to come back to the source of the request (like a symmetric NAT, but with no translation; we call this a symmetric UDP firewall)

   o  Full-cone NAT

   o  Symmetric NAT

   o  Restricted cone or restricted port cone NAT

Which of the six scenarios applies can be determined through the flow chart described in Figure 2.
The chart refers only to the sequence of Binding Requests; Shared Secret Requests will, of course, be needed to authenticate each Binding Request used in the sequence.

The flow makes use of three tests. In test I, the client sends a STUN Binding Request to a server, without any flags set, and without the RESPONSE-ADDRESS attribute. This causes the server to send the response back to the address and port that the request came from. Through its CHANGED-ADDRESS attribute, this response also provides the alternate IP address and port that the server would use if asked to change its source IP and/or port. In test II, the client sends a Binding Request with both the "change IP" and "change port" flags set. In test III, the client sends a Binding Request with only the "change port" flag set.

The client begins by initiating test I. If this test yields no response, the client knows right away that it is not capable of UDP connectivity. If the test produces a response, the client examines the MAPPED-ADDRESS attribute. If this address and port is the same as the local IP address and port of the socket used to send the request, the client knows that it is not behind a NAT. It executes test II. If a response is received, the client knows that it has open access to the Internet (or, at least, it is behind a firewall that behaves like a full-cone NAT, but without the translation). If no response is received, the client knows it is behind a symmetric UDP firewall.

In the event that the IP address and port of the socket did not match the MAPPED-ADDRESS attribute in the response to test I, the client knows that it is behind a NAT. It performs test II. If a response is received, the client knows that it is behind a full-cone NAT. If no response is received, it performs test I again, but this time, does so to the address and port from the CHANGED-ADDRESS attribute. If the IP address and port returned in the MAPPED-ADDRESS attribute are not the same as the ones from the first test I, the client knows it is behind a symmetric NAT.
If the address and port are the same, the client is behind either a restricted or a port restricted NAT. To determine which one, the client initiates test III. If a response is received, it is behind a restricted NAT, and if no response is received, it is behind a port restricted NAT.

This procedure yields substantial information about the operating condition of the client application. In the event of multiple NATs between the client and the Internet, the type that is discovered will be the type of the most restrictive NAT between the client and the Internet. The types of NAT, in order of restrictiveness, from most to least, are symmetric, port restricted cone, restricted cone, and full cone.

10.2 Binding Lifetime Discovery

STUN can also be used to discover the lifetimes of the bindings created by the NAT. To do that, the client first sends a Binding Request to the server from a particular socket, X. This creates a binding in the NAT. The response from the server contains a MAPPED-ADDRESS attribute, providing the public address and port on the NAT. Call this Pa and Pp, respectively. The client then starts a timer with a value of T seconds. When this timer fires, the client sends another Binding Request to the server, using the same destination address and port, but from a different socket, Y. This request contains a RESPONSE-ADDRESS attribute, set to (Pa,Pp). This will create a new binding on the NAT, and cause the STUN server to send a Binding Response that would match the old binding, if it still exists. If the client receives the Binding Response on socket X, it knows that the binding has not expired. If the client receives the Binding Response on socket Y (which is possible if the old binding expired, and the NAT allocated the same public address and port to the new binding), or receives no response at all, it knows that the binding has expired.
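
Searching over the timer value T in this probe can be sketched in Python. This is only an illustration under stated assumptions: binding_alive is a hypothetical callback standing in for one complete probe (sockets X and Y, as described in this section), the search bounds and resolution are arbitrary example values, and each probe is assumed to start from a freshly created binding so that earlier probes do not extend the lifetime being measured.

```python
from typing import Callable


def find_binding_lifetime(
    binding_alive: Callable[[float], bool],
    low: float = 0.0,
    high: float = 600.0,
    resolution: float = 1.0,
) -> float:
    """Binary-search the largest wait T (seconds) for which the binding survives.

    binding_alive(t) is a hypothetical callback that runs one probe: create a
    binding from socket X, wait t seconds, send from socket Y with
    RESPONSE-ADDRESS set to (Pa, Pp), and return True only if the response
    arrives on socket X.
    """
    if binding_alive(high):
        return high  # lifetime is at least the upper bound we were given
    # Invariant: the binding survives a wait of `low` and expires by `high`.
    while high - low > resolution:
        mid = (low + high) / 2.0
        if binding_alive(mid):
            low = mid
        else:
            high = mid
    return low
```

With the example defaults, the search needs at most eleven probes, but because each probe waits up to T seconds the total wall-clock time can still be long.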
The client can find the value of the binding lifetime by doing a binary search through T, arriving eventually at the value where the response is not received for any timer greater than T, but is received for any timer less than T.

10.3 Binding Acquisition

Consider once more the case of a VoIP phone. It used the discovery process above when it started up, to discover its environment. Now, it wants to make a call. As part of the discovery process, it determined that it was behind a full-cone NAT.

Consider further that this phone consists of two logically separated components - a control component that handles signaling, and a media component that handles the audio, video, and RTP [10]. Both are behind the same NAT. Because of this separation of control and media, we wish to minimize the communication required between them. In fact, they may not even run on the same host.

In order to make a voice call, the phone needs to obtain an IP address and port that it can place in the call setup message as the destination for receiving audio.

                        +--------+
                        |  Test  |
                        |   I    |
                        +--------+
                             |
                             |
                             V
                            /\              /\
                         N /  \ Y          /  \ Y             +--------+
          UDP     <-------/Resp\--------->/ IP \------------->|  Test  |
          Blocked         \ ?  /          \Same/              |   II   |
                           \  /            \? /               +--------+
                            \/              \/                    |
                                             | N                  |
                                             |                    V
                                             V                    /\
                                         +--------+  Sym.      N /  \
                                         |  Test  |  UDP    <---/Resp\
                                         |   II   |  Firewall   \ ?  /
                                         +--------+              \  /
                                             |                    \/
                                             V                     |Y
                  /\                         /\                    |
   Symmetric  N  /  \       +--------+   N  /  \                   V
      NAT  <--- / IP \<-----|  Test  |<--- /Resp\               Open
                \Same/      |   I    |     \ ?  /               Internet
                 \? /       +--------+      \  /
                  \/                         \/
                  |                           |Y
                  |                           |
                  |                           V
                  |                           Full
                  |                           Cone
                  V              /\
              +--------+        /  \ Y
              |  Test  |------>/Resp\---->Restricted
              |   III  |       \ ?  /
              +--------+        \  /
                                 \/
                                  |N
                                  |       Port
                                  +------>Restricted

              Figure 2: Flow for type discovery process

To obtain an address, the control component sends a Shared Secret Request to the server, obtains a shared secret, and then sends a Binding Request to the server.
No flags are present in the Binding Request, and neither is the RESPONSE-ADDRESS attribute. The Binding Response contains a mapped address. The control component then performs another Shared Secret Request, obtains another shared secret, and uses it to formulate a second Binding Request. This request contains a RESPONSE-ADDRESS, which is set to the mapped address learned from the previous Binding Response. This Binding Request is passed to the media component, along with the IP address and port of the STUN server. The media component sends the Binding Request. The request goes to the STUN server, which sends the Binding Response back to the control component. The control component receives this, and now has learned an IP address and port that will be routed back to the media component that sent the request.

The client will be able to receive media from anywhere on this mapped address.

In the case of silence suppression, there may be periods where the client receives no media. In this case, the UDP bindings could time out (UDP bindings in NATs are typically short). To deal with this, the application can periodically retransmit the query in order to keep the binding fresh.

It is possible that both participants in the multimedia session are behind the same NAT. In that case, both will repeat the procedure above, and both will obtain public address bindings. When one sends media to the other, the media is routed to the NAT, and then turns right back around to come back into the enterprise, where it is translated to the private address of the recipient. This is not particularly efficient, and unfortunately, does not work in many commercial NATs. In such cases, the clients may need to retry using private addresses.

11 Protocol Details

This section presents the detailed encoding of a STUN message.

STUN is a request-response protocol. Clients send a request, and the server sends a response. There are two requests, Binding Request, and Shared Secret Request.
The response to a Binding Request can either be the Binding Response or Binding Error Response. The response to a Shared Secret Request can be a Shared Secret Response or Shared Secret Error Response.

11.1 Message Header

All STUN messages consist of a 20 byte header:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       STUN Message Type       |        Message Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   |                        Transaction ID                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Message Types can take on the following values:

   0x0001 : Binding Request
   0x0101 : Binding Response
   0x0111 : Binding Error Response
   0x0002 : Shared Secret Request
   0x0102 : Shared Secret Response
   0x0112 : Shared Secret Error Response

The message length is the count, in bytes, of the size of the message, not including the 20 byte header.

The transaction ID is a 128 bit identifier. It also serves as salt to randomize the request and the response. All responses carry the same identifier as the request they correspond to.

11.2 Message Attributes

After the header are 0 or more attributes. Each attribute is TLV encoded, with a 16 bit type, 16 bit length, and variable value:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Value                         ....
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
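
The header and attribute layout above can be sketched in Python with the standard struct module. This is a minimal sketch under stated assumptions rather than a normative encoder: encode_message and decode_message are hypothetical names, attributes are passed as plain (type, value) pairs, and no padding is applied to values because this section does not describe any.

```python
import os
import struct

BINDING_REQUEST = 0x0001
HEADER = struct.Struct("!HH16s")    # message type, message length, transaction ID
ATTR_HEADER = struct.Struct("!HH")  # attribute type, length of the value


def encode_message(msg_type: int, transaction_id: bytes, attributes=()) -> bytes:
    """Serialize a STUN message from (attr_type, value_bytes) pairs."""
    if len(transaction_id) != 16:
        raise ValueError("transaction ID must be 128 bits")
    body = b"".join(
        ATTR_HEADER.pack(attr_type, len(value)) + value
        for attr_type, value in attributes
    )
    # Message Length counts the attributes only, not the 20 byte header.
    return HEADER.pack(msg_type, len(body), transaction_id) + body


def decode_message(data: bytes):
    """Return (msg_type, transaction_id, [(attr_type, value), ...])."""
    msg_type, length, transaction_id = HEADER.unpack_from(data, 0)
    if len(data) != HEADER.size + length:
        raise ValueError("Message Length does not match the datagram size")
    attributes = []
    offset = HEADER.size
    while offset < len(data):
        attr_type, attr_len = ATTR_HEADER.unpack_from(data, offset)
        offset += ATTR_HEADER.size
        if offset + attr_len > len(data):
            raise ValueError("attribute overruns the message")
        attributes.append((attr_type, data[offset : offset + attr_len]))
        offset += attr_len
    return msg_type, transaction_id, attributes
```

For example, encode_message(BINDING_REQUEST, os.urandom(16)) yields a bare 20 byte Binding Request with a randomly chosen transaction ID and no attributes.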
The following types are defined:

   0x0001: MAPPED-ADDRESS
   0x0002: RESPONSE-ADDRESS
   0x0003: FLAGS
   0x0004: SOURCE-ADDRESS
   0x0005: CHANGED-ADDRESS
   0x0006: TARGET-TID
   0x0007: TARGET-OTP
   0x0008: MESSAGE-INTEGRITY
   0x0009: ERROR-CODE

Future extensions MAY define new attributes. If a STUN client or server receives a message with an unknown attribute with a type lower than or equal to 0x7fff, the message MUST be discarded. If the type is greater than 0x7fff, the attribute MUST be ignored. The ordering of attributes within a message is not important, and a client or server MUST be prepared to receive them in any order. Any attributes that are known, but are not supposed to be present in a message (MAPPED-ADDRESS in a request, for example) MUST be ignored.

Table 2 indicates which attributes are present in which messages. An M indicates that inclusion of the attribute in the message is mandatory, O means it is optional, and N/A means that the attribute is not applicable to that message type.

                                          Binding   Shared   Shared    Shared
                      Binding   Binding   Error     Secret   Secret    Secret
   Att.               Request   Response  Response  Request  Response  Error
                                                                       Response
   ____________________________________________________________________________
   MAPPED-ADDRESS     N/A       M         N/A       N/A      N/A       N/A
   RESPONSE-ADDRESS   O         N/A       N/A       N/A      N/A       N/A
   FLAGS              O         N/A       N/A       N/A      N/A       N/A
   SOURCE-ADDRESS     N/A       M         N/A       N/A      N/A       N/A
   CHANGED-ADDRESS    N/A       M         N/A       N/A      N/A       N/A
   TARGET-TID         N/A       N/A       N/A       M        N/A       N/A
   TARGET-OTP         N/A       N/A       N/A       N/A      M         N/A
   MESSAGE-INTEGRITY  O         O         O         N/A      N/A       N/A
   ERROR-CODE         N/A       N/A       M         N/A      N/A       O

   Table 2: Summary of Attributes

The length refers to the length of the value element.

11.2.1 MAPPED-ADDRESS

The MAPPED-ADDRESS attribute indicates the mapped IP address and port. It consists of an eight bit address family, and a sixteen bit port, followed by a fixed length value representing the IP address.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |x x x x x x x x|    Family     |             Port              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Address                         ..
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The port is a network byte ordered representation of the mapped port. The following families are defined:

   0x01: IPv4
   0x02: IPv6

The first 8 bits of the MAPPED-ADDRESS are ignored, for the purposes of aligning parameters on natural boundaries. For IPv4 addresses, the address is 32 bits. For IPv6, it is 128 bits.

New address families MAY be defined by extensions. A message with an unknown address family is discarded.

11.2.2 RESPONSE-ADDRESS

The RESPONSE-ADDRESS attribute indicates where the response to a Binding Request should be sent. Its syntax is identical to MAPPED-ADDRESS.

11.2.3 CHANGED-ADDRESS

The CHANGED-ADDRESS attribute indicates the IP address and port of a STUN server where responses will be sent from if the "change IP" and/or "change port" flags were set. The attribute is always present in a Binding Response, independent of the value of the flags. Its syntax is identical to MAPPED-ADDRESS.

11.2.4 FLAGS

The FLAGS attribute is a series of boolean flags. It is 32 bits long:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                         |A|B|C|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Only three flags, A, B, and C, are currently defined. The other bits MAY be used by extensions to define additional flags. Unknown flags are ignored. Each flag is a binary one if true, zero otherwise.

The meaning of the flags is:

A: This is the "change IP" flag. If true, it requests the server to send the Binding Response with a different IP address than the one the Binding Request was received on.

B: This is the "change port" flag.
If true, it requests the server to send the Binding Response with a different port than the one the Binding Request was received on.

C: This is the discard flag. If true, the Binding Request is discarded.

11.2.5 SOURCE-ADDRESS

The SOURCE-ADDRESS attribute is present in Binding Responses. It indicates the source IP address and port that the server is sending the response from. Its syntax is identical to that of MAPPED-ADDRESS.

11.2.6 TARGET-TID

The TARGET-TID attribute is used in Shared Secret Requests. It MUST appear by itself as the sole attribute in a Shared Secret Request. The value of TARGET-TID is a 128 bit transaction ID. This MUST be equal to the transaction ID that will be used in Binding Requests that contain the shared secret learned from the exchange.

11.2.7 TARGET-OTP

The TARGET-OTP attribute is used in Shared Secret Responses. It MUST appear by itself as the sole attribute in a Shared Secret Response. The value of TARGET-OTP is a variable length value that is to be used as a shared secret.

11.2.8 MESSAGE-INTEGRITY

The MESSAGE-INTEGRITY attribute contains an HMAC-SHA1 [3] of the STUN message. It can be present in Binding Requests or Binding Responses. Since it uses the SHA1 hash, the HMAC will be 20 bytes. The text used as input to HMAC is the STUN message, including the header, up to and including the attribute preceding the MESSAGE-INTEGRITY attribute. As a result, the MESSAGE-INTEGRITY attribute MUST be the last attribute in any STUN message. The key used as input to HMAC depends on the context.

11.2.9 ERROR-CODE

The ERROR-CODE attribute is present in the Binding Error Response and Shared Secret Error Response.
It consists of a 16 bit error class, a 16 bit error number, a 32 bit length, followed by a variable length textual reason phrase:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Class             |            Number             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reason Phrase (variable)                  ..
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Class is an unsigned integer between 0 and 65535. It specifies the general class of the error. The following classes are defined at this time:

   400: Malformed Request
   500: Server Error
   600: Global Failure

The Number identifies a very specific reason for failure within the class. If a client does not understand a particular number, it processes it based on the default processing for that class.

The following numbers in the 400 class are defined at this time:

   1: The Binding Request did not contain a MESSAGE-INTEGRITY attribute.

   2: The Binding Request did contain a MESSAGE-INTEGRITY attribute, but it used a shared secret that has expired. The client should obtain a new shared secret and try again.

   3: The Binding Request contained a MESSAGE-INTEGRITY attribute, but the HMAC was not verified. This could be a sign of a potential attack, or client implementation error.

12 Security Considerations

12.1 Attacks on STUN

Generally speaking, attacks on STUN can be classified into denial of service attacks and eavesdropping attacks. Denial of service attacks can be launched against a STUN server itself, or against other elements using the STUN protocol.

STUN servers create state through the Shared Secret Request mechanism. To prevent being swamped with traffic, a STUN server SHOULD limit the number of simultaneous TLS connections it will hold open, and SHOULD close connections that have been open for longer than a few minutes, in order to allow new users to connect.
Similarly, it SHOULD limit the number of shared secrets it will store.

The attacks of greater interest are those in which the STUN server and client are used to launch DoS attacks against other entities, including the client itself.

Many of the attacks require the attacker to generate a response to a legitimate STUN request, in order to provide the client with a faked MAPPED-ADDRESS. The attacks that can be launched using such a technique include:

12.1.1 Attack I: DDOS Against a Target

In this case, the attacker provides a large number of clients with the same faked MAPPED-ADDRESS that points to the intended target. This will trick all the STUN clients into thinking that their addresses are equal to that of the target. The clients then hand out that address in order to receive traffic on it (for example, in SIP or H.323 messages). However, all of that traffic becomes focused at the intended target. The attack can provide substantial amplification, especially when used with clients that are using STUN to enable multimedia applications.

12.1.2 Attack II: Silencing a Client

In this attack, the attacker seeks to deny a client access to services enabled by STUN (for example, a client using STUN to enable SIP-based multimedia traffic). To do that, the attacker provides that client with a faked MAPPED-ADDRESS. The MAPPED-ADDRESS it provides is an IP address that routes to nowhere. As a result, the client won't receive any of the packets it expects to receive when it hands out the MAPPED-ADDRESS.

This exploitation is not very interesting for the attacker. It impacts a single client, which is frequently not the desired target. Moreover, any attacker that can mount the attack could also deny service to the client by other means, such as preventing the client from receiving any response from the STUN server, or even a DHCP server.

12.1.3 Attack III: Assuming the Identity of a Client

This attack is similar to attack II. However, the faked MAPPED-ADDRESS points to the attacker themselves. This allows the attacker to receive traffic which was destined for the client.

12.1.4 Attack IV: Eavesdropping

In this attack, the attacker forces the client to use a MAPPED-ADDRESS that routes to itself. It then forwards any packets it receives to the client. This attack would allow the attacker to observe all packets sent to the client. However, in order to launch the attack, the attacker must have already been able to observe packets from the client to the STUN server. In most cases (such as when the attack is launched from an access network), this means that the attacker could already observe packets sent to the client. This attack is, as a result, only useful for observing traffic by attackers on the path from the client to the STUN server, but not on the path of media to the client.

12.2 Launching the Attacks

It is important to note that attacks of this nature (injecting responses with fake MAPPED-ADDRESSes) require that the attacker be capable of eavesdropping on requests sent from the client to the server (or of acting as a MITM for such attacks). This is because STUN requests contain a transaction identifier, selected by the client, which is random with 128 bits of entropy. The server echoes this value in the response, and the client ignores any responses that don't have a matching transaction ID. Therefore, in order for an attacker to provide a faked response that is accepted by the client, the attacker needs to know what the transaction ID in the request was. The large amount of randomness, combined with the need to know when the client sends a request, precludes attacks that involve guessing the transaction ID.

Since all of the above attacks rely on this one primitive - injecting a response with a faked MAPPED-ADDRESS - preventing the attacks is accomplished by preventing this one operation. To prevent it, we need to consider the various ways in which it can be accomplished. There are several:

12.2.1 Approach I: Compromise a Legitimate STUN Server

In this attack, the attacker compromises a legitimate STUN server through a virus or trojan horse. Presumably, this would allow the attacker to take over the STUN server, and control the types of responses it generates.

Compromise of a STUN server can also lead to discovery of open ports. Knowledge of an open port creates an opportunity for DoS attacks on those ports (or DDoS attacks if the traversed NAT is a full cone NAT). Discovering open ports is already fairly trivial using port probing, so this does not represent a major threat.

12.2.2 Approach II: DNS Attacks

STUN servers are discovered using DNS SRV records. If an attacker can compromise the DNS, it can inject fake records which map a domain name to the IP address of a STUN server run by the attacker. This will allow it to inject fake responses to launch any of the attacks above.

12.2.3 Approach III: Rogue router or NAT

Rather than compromise the STUN server, an attacker can cause a STUN server to generate responses with a faked address by compromising a router or NAT on the path from the client to the STUN server. When the STUN request passes through the rogue router or NAT, it rewrites the source address of the packet to be that of the desired MAPPED-ADDRESS. This address cannot be arbitrary. If the attacker is on the public Internet (that is, there are no NATs between it and the STUN server), and the attacker doesn't modify the STUN request, the address must have the property that packets sent from the STUN server to that address would route through the compromised router. This is because the STUN server will send the responses back to the source address of the request. With a modified source address, the only way they can reach the client is if the compromised router directs them there. If the attacker is on the public Internet, but they can modify the STUN request, they can insert a RESPONSE-ADDRESS attribute into the request, containing the actual source address of the STUN request. This will cause the server to send the response to the client, independent of the source address the STUN server sees. This gives the attacker the ability to forge an arbitrary source address when it forwards the STUN request.

If the attacker is on a private network (that is, there are NATs between it and the STUN server), the attacker will not be able to force the server to generate arbitrary MAPPED-ADDRESSes in responses. They will only be able to generate MAPPED-ADDRESSes which route to the private network. This is because the NAT between the attacker and the STUN server will rewrite the source address of the STUN request, mapping it to a public address that routes to the private network. Because of this, the attacker can only force the server to generate faked mapped addresses that route to the private network. Unfortunately, it is possible that a low quality NAT would be willing to map an allocated public address to another public address (as opposed to an internal private address), in which case the attacker could forge the source address in a STUN request to be an arbitrary public address. This kind of behavior from NATs does appear to be rare.

12.2.4 Approach IV: MITM

As an alternative to approach III, if the attacker can place an element on the path from the client to the server, the element can act as a man-in-the-middle. In that case, it can intercept a STUN request, and generate a STUN response directly with any desired value of the MAPPED-ADDRESS field.
Alternatively, it can forward the STUN request to the server (after potential modification), receive the response, and forward it to the client. When forwarding the request and response, this attack is subject to the same limitations on the MAPPED-ADDRESS described in Section 12.2.3.

12.2.5 Approach V: Response Injection plus DoS

In this approach, the attacker does not need to be a MITM (as in approaches III and IV). Rather, it only needs to be able to eavesdrop on a network segment that carries STUN requests. This is easily done in multiple access networks such as Ethernet or unprotected 802.11. To inject the fake response, the attacker listens on the network for a STUN request. When it sees one, it simultaneously launches a DoS attack on the STUN server, and generates its own STUN response with the desired MAPPED-ADDRESS value. The STUN response generated by the attacker will reach the client, and the DoS attack against the server is aimed at preventing the legitimate response from the server from reaching the client. Arguably, the attacker can do without the DoS attack on the server, so long as the faked response beats the real response back to the client, and the client uses the first response, and ignores the second (even though it is different).

12.2.6 Approach VI: Duplication

This approach is similar to approach V. The attacker listens on the network for a STUN request. When it sees it, it generates its own STUN request towards the server. This STUN request is identical to the one it saw, but with a spoofed source IP address. The spoofed address is equal to the one that the attacker desires to have placed in the MAPPED-ADDRESS of the STUN response. In fact, the attacker generates a flood of such packets. The STUN server will receive the one original request, plus a flood of duplicate fake ones. It generates responses to all of them.
If the flood is sufficiently large for the responses to congest routers or some other equipment, there is a reasonable probability that the one real response is lost (along with many of the faked ones), but the net result is that only the faked responses are received by the STUN client. These responses are all identical and all contain the MAPPED-ADDRESS that the attacker desired the client to use.

The flood of duplicate packets is not needed (that is, only one faked request is sent), so long as the faked response beats the real response back to the client, and the client uses the first response, and ignores the second (even though it is different).

Note that, in this approach, launching a DoS attack against the STUN server or the IP network, to prevent the valid response from being sent or received, is problematic. The attacker needs the STUN server to be available to handle its own request. Due to the periodic retransmissions of the request from the client, this leaves a very tiny window of opportunity. The attacker must start the DoS attack immediately after the actual request from the client, causing the correct response to be discarded, and then cease the DoS attack in order to send its own request, all before the next retransmission from the client. Due to the close spacing of the retransmits (100ms to a few seconds), this is very difficult to do.

Besides DoS attacks, there may be other ways to prevent the actual request from the client from reaching the server. Layer 2 manipulations, for example, might be able to accomplish it.

Fortunately, Approach VI is subject to the same limitations documented in Section 12.2.3, which limit the range of MAPPED-ADDRESSes the attacker can cause the STUN server to generate.

12.3 Countermeasures

STUN provides mechanisms to counter the approaches described above, and additional, non-STUN techniques can be used as well.
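Approaches V and VI both depend on the client accepting the first response it sees and ignoring a later, different one. As a rough illustration of why that matters, the Python sketch below keeps every response collected for one transaction and accepts a mapped address only when all of them agree. It is not the procedure of Section 9.4; the BindingResponse type, its field names, and the pick_mapped_address function are hypothetical names introduced for this example only, and the example addresses are documentation addresses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (IP address, port) as carried in MAPPED-ADDRESS.
Address = Tuple[str, int]


@dataclass(frozen=True)
class BindingResponse:
    """Hypothetical parsed Binding Response; field names are illustrative."""
    transaction_id: bytes
    mapped_address: Address


def pick_mapped_address(transaction_id: bytes,
                        responses: Iterable[BindingResponse]) -> Optional[Address]:
    """Return the mapped address only if all matching responses agree.

    Returns None when no response matches the transaction, or when the
    matching responses carry different mapped addresses. Disagreement is
    treated here as a sign of a possible attack (an assumption of this
    sketch, not a rule taken from the specification).
    """
    seen = {r.mapped_address for r in responses
            if r.transaction_id == transaction_id}
    if len(seen) != 1:
        return None
    return next(iter(seen))


if __name__ == "__main__":
    tid = bytes(16)  # 128-bit transaction ID
    genuine = BindingResponse(tid, ("192.0.2.10", 4000))
    forged = BindingResponse(tid, ("198.51.100.7", 4000))
    print(pick_mapped_address(tid, [genuine, genuine]))
    print(pick_mapped_address(tid, [forged, genuine]))
```

A client that simply took the first element of the list would accept the forged address in the second call; comparing all collected responses at least exposes the inconsistency, although it cannot tell which response is genuine.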
First, it is RECOMMENDED that networks with STUN clients implement ingress source filtering (RFC 2827 [5]). This is particularly important for the NATs themselves. As Section 12.2.3 explains, NATs which do not perform this check can be used as "reflectors" in DDoS attacks. Most NATs do perform this check as a default mode of operation. We strongly advise people who purchase NATs to ensure that this capability is present and enabled.

Secondly, it is RECOMMENDED that STUN servers be run on hosts dedicated to this purpose, with all UDP and TCP ports disabled except for the STUN ports. This prevents viruses and Trojan horses from infecting STUN servers and compromising them. This helps mitigate Approach I (Section 12.2.1).

Thirdly, to prevent the DNS attack of Section 12.2.2, Section 9.2 recommends that the client verify the credentials provided by the server with the name used in the DNS lookup.

Finally, all of the attacks above rely on the client using a mapped address in other protocols. If encryption and message integrity are provided within those protocols, the eavesdropping and identity assumption attacks can be prevented. As such, application protocols that make use of STUN addresses SHOULD use integrity and encryption, even if a SHOULD level strength is not specified for that protocol. For example, multimedia applications using STUN addresses to receive RTP traffic would use secure RTP [12].

The techniques above are non-STUN mechanisms. STUN itself provides several countermeasures.

Approaches IV (Section 12.2.4), when generating the response locally, and V (Section 12.2.5) require an attacker to generate a faked response. This attack is prevented using the server signature scheme provided in STUN, described in Section 8.1.

Approaches III (Section 12.2.3), IV (Section 12.2.4), when using the relaying technique, and VI (Section 12.2.6), however, are not preventable through server signatures.
These approaches are most potent when the attacker can modify the request, inserting a RESPONSE-ADDRESS that routes to the client. Fortunately, such modifications are preventable using the client signature techniques described in Section 9.3. However, these three approaches are still functional when the attacker modifies nothing but the source address of the STUN request. Sadly, this is the one thing that cannot be protected through cryptographic means, as this is the change that STUN itself is seeking to detect and report. It is therefore an inherent weakness in NAT, and not fixable in STUN. To help mitigate these attacks, Section 9.4 provides several heuristics for the client to follow. The client looks for inconsistent or extra responses, both of which are signs of the attacks described above. However, these heuristics are just that - heuristics, and cannot be guaranteed to prevent attacks. The heuristics appear to prevent the attacks as we know how to launch them today. Implementors should stay posted for information on new heuristics that might be required in the future. Such information will be distributed on the IETF MIDCOM mailing list, midcom@ietf.org.

13 IANA Considerations

There are no IANA considerations associated with this specification.

14 IAB Considerations

The IAB has studied the problem of "Unilateral Self Address Fixing", which is the general process by which a client attempts to determine its address in another realm on the other side of a NAT through a collaborative protocol reflection mechanism [13]. STUN is an example of a protocol that performs this type of function. The IAB has mandated that any protocols developed for this purpose document a specific set of considerations. This section meets those requirements.

14.1 Problem Definition

From [13], any UNSAF proposal must provide:

Precise definition of a specific, limited-scope problem that is to be solved with the UNSAF proposal.
A short term fix should not be generalized to solve other problems; this is why "short term fixes usually aren't".

The specific problems being solved by STUN are:

o Provide a means for a client to detect the presence of one or more NATs between it and a server run by a service provider on the public Internet. The purpose of such detection is to determine additional steps that might be necessary in order to receive service from that particular provider.

o Provide a means for a client to detect the presence of one or more NATs between it and another client, where the second client is reachable from the first, but it is not known whether the second client resides on the public Internet.

o Provide a means for a client to obtain an address on the public Internet from a non-symmetric NAT, for the express purpose of receiving incoming UDP traffic from another host targeted to that address.

STUN does not address TCP, either incoming or outgoing, and does not address outgoing UDP communications.

14.2 Exit Strategy

From [13], any UNSAF proposal must provide:

Description of an exit strategy/transition plan. The better short term fixes are the ones that will naturally see less and less use as the appropriate technology is deployed.

STUN comes with its own built-in exit strategy. This strategy is the detection operation that is performed as a precursor to the actual UNSAF address-fixing operation. This discovery operation, documented in Section 10.1, attempts to discover the existence of, and type of, any NATs between the client and the service provider network. Whilst the detection of the specific type of NAT may be brittle, the discovery of the existence of NAT is itself quite robust. As NATs are phased out through the deployment of IPv6, the discovery operation will return immediately with the result that there is no NAT, and no further operations are required.
Indeed, the discovery operation itself can be used to help motivate deployment of IPv6; if a user detects a NAT between themselves and the public Internet, they can call up their access provider and complain about it.

STUN can also help facilitate the introduction of midcom. As midcom-capable NATs are deployed, applications will, instead of using STUN (which also resides at the application layer), first allocate an address binding using midcom. However, it is a well-known limitation of midcom that it only works when the agent knows the middleboxes through which its traffic will flow. Once bindings have been allocated from those middleboxes, a STUN detection procedure can validate that there are no additional middleboxes on the path from the public Internet to the client. If this is the case, the application can continue operation using the address bindings allocated from midcom. If it is not the case, STUN provides a mechanism for self-address fixing through the remaining midcom-unaware middleboxes. Thus, STUN provides a way to help transition to full midcom-aware networks.

14.3 Brittleness Introduced by STUN

From [13], any UNSAF proposal must provide:

Discussion of specific issues that may render systems more "brittle". For example, approaches that involve using data at multiple network layers create more dependencies, increase debugging challenges, and make it harder to transition.

STUN introduces brittleness into the system in several ways:

o The discovery process assumes a certain classification of devices based on their treatment of UDP. There could be other types of NATs that are deployed that would not fit into one of these molds. Therefore, future NATs may not be properly detected by STUN. STUN clients (but not servers) would need to change to accommodate that.

o The binding acquisition usage of STUN does not work for all NAT types. It will work for any application for full cone NATs only.
For restricted cone and port restricted cone NATs, it will work for some applications, depending on the application. Application specific processing will generally be needed. For symmetric NATs, the binding acquisition will not yield a usable address. The tight dependency on the specific type of NAT makes the protocol brittle.

o STUN assumes that the server exists on the public Internet. If the server is located in another private address realm, the user may or may not be able to use its discovered address to communicate with other users. There is no way to detect such a condition.

o The bindings allocated from the NAT need to be continuously refreshed. Since the timeouts for these bindings are very implementation specific, the refresh interval cannot easily be determined. When the binding is not being actively used to receive traffic, but rather just to wait for it, the binding refresh will needlessly consume network bandwidth.

o The use of the STUN server as an additional network element introduces another point of potential security attack. These attacks are largely prevented by the security measures provided by STUN, but not entirely.

o The use of the STUN server as an additional network element introduces another point of failure. If the client cannot locate a STUN server, or if the server should be unavailable due to failure, the application cannot function.

o The use of STUN to discover address bindings will result in an increase in latency for applications. For example, a Voice over IP application will see an increase in call setup delay equal to at least one RTT to the STUN server.

14.4 Requirements for a Long Term Solution

From [13], any UNSAF proposal must provide:

Identify requirements for longer term, sound technical solutions -- contribute to the process of finding the right longer term solution.
Our experience with STUN has led to the following requirements for a long term solution to the NAT problem:

Requests for bindings and control of other resources in a NAT need to be explicit. Much of the brittleness in STUN derives from its guessing at the parameters of the NAT, rather than telling the NAT what parameters to use.

Control needs to be "in-band". There are far too many scenarios in which the client will not know about the location of middleboxes ahead of time. Instead, control of such boxes needs to occur in band, traveling along the same path as the data will itself travel. This guarantees that the right set of middleboxes is controlled. This is only true for first-party controls; third-party controls are best handled using the midcom framework.

Control needs to be limited. Users will need to communicate through NATs which are outside of their administrative control. In order for providers to be willing to deploy NATs which can be controlled by users in different domains, the scope of such controls needs to be extremely limited - typically, allocating a binding to reach the address where the control packets are coming from.

Simplicity is paramount. The control protocol will need to be implemented in very simple clients. The servers will need to support extremely high loads. The protocol will need to be extremely robust, being the precursor to a host of application protocols. As such, simplicity is key.

14.5 Issues with Existing NAPT Boxes

From [13], any UNSAF proposal must provide:

Discussion of the impact of the noted practical issues with existing, deployed NA[P]Ts and experience reports.

Several of the practical issues with STUN involve future proofing - breaking the protocol when new NAT types get deployed. Fortunately, this is not an issue at the current time, since most of the deployed NATs are of the types assumed by STUN.
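To make the NAT types assumed by STUN concrete, the following Python sketch restates the binding acquisition outcomes listed in Section 14.3 as a lookup table. Only the four NAT types and the three outcomes come from the text; the names NatType, BindingUsability and binding_usability are illustrative assumptions and are not defined by this specification.

```python
from enum import Enum


class NatType(Enum):
    """NAT classifications referred to in Section 14.3."""
    FULL_CONE = "full cone"
    RESTRICTED_CONE = "restricted cone"
    PORT_RESTRICTED_CONE = "port restricted cone"
    SYMMETRIC = "symmetric"


class BindingUsability(Enum):
    """Outcome of the binding acquisition usage (labels are illustrative)."""
    ANY_APPLICATION = "works for any application"
    APPLICATION_DEPENDENT = "works for some applications only"
    NOT_USABLE = "does not yield a usable address"


# Restatement of the second brittleness item in Section 14.3.
USABILITY = {
    NatType.FULL_CONE: BindingUsability.ANY_APPLICATION,
    NatType.RESTRICTED_CONE: BindingUsability.APPLICATION_DEPENDENT,
    NatType.PORT_RESTRICTED_CONE: BindingUsability.APPLICATION_DEPENDENT,
    NatType.SYMMETRIC: BindingUsability.NOT_USABLE,
}


def binding_usability(nat_type: NatType) -> BindingUsability:
    """Look up how useful a STUN-acquired binding is behind this NAT type."""
    return USABILITY[nat_type]


if __name__ == "__main__":
    for nat in NatType:
        print(f"{nat.value}: {binding_usability(nat).value}")
```

A NAT that fits none of the four entries has no row in the table, which is exactly the future-proofing concern raised above: the client, not the server, would need to change to handle it.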
The primary usage STUN has found is in the area of VoIP, to facilitate allocation of addresses for receiving RTP [10] traffic. In that application, the periodic keepalives are provided by the RTP traffic itself. However, several practical problems arise for RTP. First, RTP assumes that RTCP traffic is on a port one higher than the RTP traffic. This pairing property cannot be guaranteed through NATs that are not directly controllable. As a result, RTCP traffic may not be properly received. Protocol extensions to SDP have been proposed which mitigate this by allowing the client to signal a different port for RTCP [14]. However, there will be interoperability problems for some time.

For VoIP, silence suppression can cause a gap in the transmission of RTP packets. This could result in the loss of a binding in the middle of a call, if that silence period exceeds the binding timeout. This can be mitigated by sending occasional silence packets to keep the binding alive. However, the result is additional brittleness; proper operation depends on the silence suppression algorithm in use, the usage of a comfort noise codec, the duration of the silence period, and the binding lifetime in the NAT.

14.6 In Closing

The problems with STUN are not design flaws in STUN. The problems in STUN have to do with the lack of standardized behaviors and controls in NATs. The result of this lack of standardization has been a proliferation of devices whose behavior is highly unpredictable, extremely variable, and uncontrollable. STUN does the best it can in such a hostile environment. Ultimately, the solution is to make the environment less hostile, and to introduce controls and standardized behaviors into NAT. However, until such time as that happens, STUN provides a good short term solution given the terrible conditions under which it is forced to operate.

15 Changes since draft-ietf-midcom-stun-00

o Removed CMS.
Added mandatory-to-implement TLS-based shared secret exchange. The STUN requests and responses are integrity checked with HMAC based on that shared secret.

o Extensive reworking of security considerations, with much more detail on the types of attacks and their prevention.

o Added error processing.

o Defined two separate requests for STUN - a binding request and a shared secret allocation request.

16 Changes since draft-rosenberg-midcom-stun-01

o Added IANA port 3478.

o Removed bit about sending a request to a different server in order to implement the changing of IP address and port.

o More rigorously specified the change address and port behavior.

o Mandated that the STUN server listens on all four address/port combinations possible from change address/port.

o Extended the transaction ID to 128 bits, because it provides randomization on the response.

o Reorganized the formatting of the attributes once again, to support more convenient alignments.

o Changed the algorithm for detecting the binding lifetime timers on the NAT. The previous algorithm only worked for symmetric NAT.

o Added an applicability statement up front, summarizing some of the issues with STUN.

o Noted that STUN servers introduce another point of failure in the system.

o Mentioned that the address theft attack is only possible in certain situations.

17 Acknowledgements

The authors would like to thank Cedric Aoun, Pete Cordell, Cullen Jennings and Chris Sullivan for their comments, and Baruch Sterman and Alan Hawrylyshen for initial implementations.
18 Authors' Addresses

Jonathan Rosenberg
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com

Joel Weinberger
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jweinberger@dynamicsoft.com

Christian Huitema
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: huitema@microsoft.com

Rohan Mahy
Cisco Systems
170 West Tasman Dr, MS: SJC-21/3
Phone: +1 408 526 8570
Email: rohan@cisco.com

19 Normative References

[1] S. Bradner, "Key words for use in RFCs to indicate requirement levels," RFC 2119, Internet Engineering Task Force, Mar. 1997.

[2] A. Gulbrandsen, P. Vixie, and L. Esibov, "A DNS RR for specifying the location of services (DNS SRV)," RFC 2782, Internet Engineering Task Force, Feb. 2000.

[3] H. Krawczyk, M. Bellare, and R. Canetti, "HMAC: keyed-hashing for message authentication," RFC 2104, Internet Engineering Task Force, Feb. 1997.

[4] T. Dierks and C. Allen, "The TLS protocol version 1.0," RFC 2246, Internet Engineering Task Force, Jan. 1999.

[5] P. Ferguson and D. Senie, "Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing," RFC 2827, Internet Engineering Task Force, May 2000.

20 Informative References

[6] D. Senie, "Network address translator (NAT)-friendly application design guidelines," RFC 3235, Internet Engineering Task Force, Jan. 2002.

[7] P. Srisuresh, J. Kuthan, J. Rosenberg, A. Molitor, and A. Rayhan, "Middlebox communication architecture and framework," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[8] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session initiation protocol," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

[9] M. Holdrege and P. Srisuresh, "Protocol complications with the IP network address translator," RFC 3027, Internet Engineering Task Force, Jan. 2001.

[10] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a transport protocol for real-time applications," RFC 1889, Internet Engineering Task Force, Jan. 1996.

[11] J. Kohl and C. Neuman, "The Kerberos network authentication service (V5)," RFC 1510, Internet Engineering Task Force, Sept. 1993.

[12] M. Baugher, D. McGrew, et al., "The secure real time transport protocol," Internet Draft, Internet Engineering Task Force, May 2002. Work in progress.

[13] L. Daigle, "IAB considerations for UNilateral self-address fixing (UNSAF)," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

[14] C. Huitema, "RTCP attribute in SDP," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

Full Copyright Statement

Copyright (c) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.