Internet Engineering Task Force                                  SIP WG
Internet Draft                J. Rosenberg, J. Weinberger, H. Schulzrinne
draft-rosenberg-sip-entfw-02.txt                dynamicsoft, Columbia U.
July 20, 2001
Expires: February 2002

                            NAT Friendly SIP

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

In this draft, we discuss how SIP can traverse enterprise and residential NATs. This environment is challenging because we assume here that the end user or SIP provider has no control over the NAT, and that the NAT is completely ignorant of SIP. Our approach is to make SIP "NAT friendly", with a few minor, backwards compatible extensions. These extensions allow UDP and TCP-based SIP to traverse NATs. We also handle RTP traversal using a combination of symmetric (aka connection-oriented) RTP and a new NAT detection and binding discovery mechanism. The result of the approach is that direct UDP-based RTP is used whenever it is provably possible in any given NAT configuration.
We use a network intermediary - in our case, an off-the-shelf router - to handle the case when both caller and called party are behind symmetric NATs. Our approach for binding discovery is effectively a pre-midcom solution that allows binding allocations by talking to a server behind the NAT, rather than talking to the NAT directly.

1 Introduction

The problem of getting applications through NATs has received a lot of attention [1]. Getting SIP through NATs is particularly troublesome. In a previous draft [2] we discussed some of the general issues regarding traversal of firewalls, and discussed some solutions for it. Our solutions were based on having a proxy server control the firewall/NAT with a control protocol of some sort [3]. This protocol can open and close pinholes in the firewall, and/or obtain NAT address bindings to use in rewriting the SDP in a SIP message.

The use of a control protocol in the midcom architecture is ideal for carriers, but it does not work when the SIP service provider is not the same as the ISP and transport provider of the end user. This is frequently the case for users behind enterprise NATs who are trying to access SIP services outside of their networks. The same happens for residential NATs. These devices are often used by consumers who have cable modem and DSL connections, and wish to connect multiple computers using the single address provided by the cable company or DSL company.[1] Residential NATs are often referred to as cable/DSL routers, and are manufactured by companies like Linksys, Netopia, and Netgear.

Ultimately, it is our belief and hope that NATs will disappear with the deployment of IPv6. However, that is not likely to happen for some time. Given the existence of NATs, one way to handle SIP is to embed a SIP ALG within enterprise NATs.
However, this has not happened. The top commercial NAT products continue to be SIP-unaware. Even if SIP ALG support were added tomorrow, there is still a huge installed base of NATs that do not understand SIP. As a result, there is going to be a long period of time, probably at least two to three years, during which users will be behind NATs that are ignorant of SIP. The SIP community cannot wait for ubiquitous deployment of SIP-aware NATs. Interim solutions are needed NOW to enable SIP services to be delivered to users behind these devices.

In this draft, we propose solutions for getting SIP through enterprise and residential NATs that do not require changes to these devices or to their configurations. NATs are a reality, and SIP deployment is being hampered by the lack of support for SIP ALGs in these boxes. A solution MUST be found, and we provide one here.

_________________________
[1] The author of this draft is amongst those who have such a residential NAT, and thus feels highly motivated to solve this particular problem.

2 Some Philosophy

Our solution centers on the principle that applications, including components within network servers and end systems, need to take an active role in NAT traversal. This is counter to much of the existing work in NAT traversal, which focuses on construction of ALGs embedded within NATs to make the existence of NATs totally transparent to end systems and application layer network servers. The midcom efforts [3] have taken a step forward by recognizing that applications (either within end systems or network servers) are best suited to take a role in controlling NAT behavior. We believe that this approach needs to be taken one step further, in that applications, especially those with components in end systems, need to adapt to the existence of non-midcom enabled NATs as well.
In fact, we believe that the application of the end-to-end principle in this case argues in favor of our approach. The end-to-end principle argues that:

      The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible.

It is clear that the end-to-end principle would argue against the existence of NATs in the first place. However, their existence is a matter of reality. In order to properly engineer future protocols and applications, we are forced to take their existence as a given. We must then investigate what guidance our network design principles offer on how to deal with them. So, given that NATs exist, the end-to-end principle tells us that only the applications can know what the impact of NAT will be on the functioning of the application. Since the end system is the one invoking the application, it is often best suited to determine how to deal with it. The overall system is much simpler and more robust when the application in the end systems takes an active part in dealing with NAT.

Another way to view it is from the perspective of application adaptation. It has been a common design principle in real time applications for the end systems to adapt to the network conditions. Networks might provide best effort, some level of QoS, or be overprovisioned for real time media. Rather than force the network to always deliver a specific level of quality, the applications detect the network conditions, and adapt to whatever they find. The result is robust applications and an overall simpler architecture.
We are arguing that this principle still makes sense when extended to other IP network "characteristics", including the presence of NAT. The existence of NAT, and the type of function it provides, are another axis in the overall space of IP network service. Applications will be the most robust and will perform best when they detect what level of network service (including QoS and NAT) is being provided, and then adapt to it in an optimal fashion. Just as QoS varies, so too do the types of NATs vary. By detecting what type of NAT is present, an end system can figure out how to achieve the best level of service given the existence of that NAT. This approach means that applications can handle cases where there are ALGs (which still make sense in many scenarios), application-unaware NATs, or what have you. When NATs disappear entirely, the applications will continue to function, and their performance will in fact improve.

3 Overview of the Approach

Our approach consists of several pieces that are put together for a complete solution. The first is a set of SIP extensions that allow just SIP (but not necessarily the sessions it establishes) to traverse NATs. Our extensions are relatively minor and backwards compatible. They allow NAT traversal for UDP and TCP transports. These extensions to SIP are described in Section 4.

Providing traversal for the media streams is more complex. The first step in the process is to allow end systems to detect whether there is a NAT between them and their SIP provider. They must furthermore detect what type of treatment the NAT affords to UDP. We define a simple protocol which enables that to happen. Once the NAT type is detected, our protocol allows the end system to discover its public facing address on the other side of the NAT. We also discuss a router configuration which allows outside entities to send packets to this public address even under the strictest of NAT behaviors (which we call a symmetric NAT).
These protocol mechanisms are discussed in Section 5.

Unfortunately, the mechanism of Section 5 requires an intermediate RTP relay (which is implemented using another NAT in our proposal) when the user is behind a symmetric NAT. To fix that problem, we define symmetric RTP, which is a new RTP usage scenario. It effectively provides connection-oriented RTP over UDP. It is completely backwards compatible. It can avoid the need for an intermediary so long as one side in the call is not behind a symmetric NAT. Symmetric RTP, and the SDP extensions required to support it, are described in Section 6.

Finally, in Section 7, we put it all together, and show the various call flows that would exist for a variety of different configurations. The end result of our mechanisms is that end-to-end UDP media transport, directly between the two parties in a call leg, is always provided so long as it is provably possible. We use an intermediary in the service provider domain only in the cases where direct media connectivity is provably impossible.

The overall architecture we assume for the discussion is shown in Figure 1. The caller, Joe, is a UA in enterprise or residence A, and the called party, Bob, is a UA in enterprise or residence B. The caller uses proxy X as its local outbound proxy. Proxy X forwards the call to the proxy of the called party, Y, which is also outside of the NAT. The call is then forwarded to the called party within enterprise or residence B. The NAT boxes (labeled FW/NAT in Figure 1) are off-the-shelf boxes with no support for a SIP ALG.

4 SIP Extensions for NAT Traversal

This section discusses extensions to SIP that allow SIP itself to
traverse NATs. There are two primary extensions: Via ports and contact translation.

4.1 Via Ports

The first problem with SIP traversal through NATs is sending a request from a client behind a NAT to a server on the outside. SIP specifies that for UDP, the response is sent to the IP address the request came from, using the port number in the Via header. However, due to NAT, the port number in the Via header will be wrong. This means that the response will not be sent to the proper location. With TCP, on the other hand, responses are sent over the connection the request arrived on. This means that a response sent over the TCP connection will be received properly by a caller behind a NAT.
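The failure described above can be made concrete with a minimal Python sketch, which is not taken from the draft. It models a NAPT as a pair of binding tables. It then compares where a UDP response lands under baseline SIP, which takes the port from the Via header, and under the rport behavior that this section goes on to define. `ToyNapt`, `response_destination`, and the sequential public-port allocation starting at 9988 are hypothetical. They were chosen only so that the addresses match the example later in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


@dataclass
class ToyNapt:
    """Hypothetical NAPT: one public IP, one binding per private (ip, port)."""
    public_ip: str
    next_port: int = 9988  # arbitrary starting point for public ports
    out_map: Dict[Addr, Addr] = field(default_factory=dict)  # private -> public
    in_map: Dict[Addr, Addr] = field(default_factory=dict)   # public -> private

    def send_out(self, private_src: Addr) -> Addr:
        """Rewrite the source of an outbound packet, creating a binding if needed."""
        if private_src not in self.out_map:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.out_map[private_src] = public
            self.in_map[public] = private_src
        return self.out_map[private_src]

    def deliver_in(self, public_dst: Addr) -> Optional[Addr]:
        """Rewrite the destination of an inbound packet; None means it is dropped."""
        return self.in_map.get(public_dst)


def response_destination(via_port: int, observed_src: Addr, use_rport: bool) -> Addr:
    """Where a server sends a UDP response.

    observed_src is the (IP, port) the request actually arrived from.
    Baseline SIP uses that IP but the port from the Via sent-by field;
    a server honoring rport uses the observed port as well.
    """
    ip, port = observed_src
    return (ip, port) if use_rport else (ip, via_port)


if __name__ == "__main__":
    nat = ToyNapt("68.44.20.1")
    seen = nat.send_out(("10.1.1.1", 4540))  # ('68.44.20.1', 9988)
    print(nat.deliver_in(response_destination(4540, seen, use_rport=False)))  # None
    print(nat.deliver_in(response_destination(4540, seen, use_rport=True)))   # ('10.1.1.1', 4540)
```

The baseline response goes to port 4540 on the NAT's public address. No binding exists there, so the NAT drops it. The rport response reuses the binding that the request itself created.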
Therefore, one solution for traversal of requests from inside to outside is to use persistent TCP connections. However, many VoIP endpoints do not support TCP, so a UDP-based solution is desirable.

            +-------+                     +-------+
            |  SIP  |                     |  SIP  |
            | Proxy |                     | Proxy |
            |   X   |                     |   Y   |
            |       |                     |       |
            +-------+                     +-------+

            +-------+                     +-------+
    ........|FW/NAT |............ ........|FW/NAT |............
    .       |       |           . .       |       |           .
    .       +-------+           . .       +-------+           .
    .                           . .                           .
    .                           . .                           .
    .      +-------+            . .      +-------+            .
    .      | SIP UA|            . .      | SIP UA|            .
    .      |  Joe  |            . .      |  Bob  |            .
    .      +-------+            . .      +-------+            .
    ............................. .............................

           Enterprise or                 Enterprise or
            Residence A                   Residence B

                 Figure 1: Network Architecture

Our approach is to define a new Via header parameter, called the response port, encoded as "rport". Clients insert this parameter when they wish for the response to be sent to the IP address and port the request was sent from. These clients can be proxies or UACs. The parameter is inserted with no value to flag this feature. When received at a server, the server inserts the
port the request was received from as the value of this parameter. That port is used to forward the response.

   response-port = "rport" [ "=" 1*DIGIT ]

A client inserting the rport parameter into the Via header MUST wait for responses on the socket the request is sent on. It MUST also list, in the sent-by field, the local port of that socket. The latter is mandatory for backwards compatibility.

Consider an example. A client sends an INVITE which looks like:

   INVITE sip:user@domain SIP/2.0
   Via: SIP/2.0/UDP 10.1.1.1:4540;rport

This INVITE is sent with a source port of 4540 and source IP of 10.1.1.1. The request is natted, so that the source IP appears as 68.44.20.1 and the source port as 9988. This is received at a proxy. The proxy forwards the request. Before doing so, it appends a value to the rport parameter in the proxied request, and, as usual in SIP, adds a received parameter carrying the source IP:

   INVITE sip:user@domain2 SIP/2.0
   Via: SIP/2.0/UDP proxy.domain.com
   Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988

This request generates a response, which
arrives at the proxy:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP proxy.domain.com
   Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988

The proxy strips its top Via, and then examines the next one. It contains both a received parameter and an rport parameter. The result is that the following response is sent to 68.44.20.1, port 9988:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988

The NAT rewrites the destination address of this packet back to IP 10.1.1.1, port 4540, and the packet is received by the client.

This works fine when the server supports this extension. Now consider a server that does not understand it, and assume first that there are no NATs between the client and server. In this case, the server will ignore the rport parameter, and send the following response to IP 10.1.1.1, port 4540:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP 10.1.1.1:4540;rport

As specified by SIP, this response is sent to the source IP of the request, and the port in the Via header. Since the client is listening on 4540, the response is received correctly.

In the case where the server does not support the extension, but there is a NAT between the
client and server, the response is sent to the source IP of the request and the port in the Via header. In the example above, that is 68.44.20.1, port 4540. The NAT has no binding for this address and port, so it drops the response. This is the same behavior exhibited by SIP today. As a result, our extension is backwards compatible, in the sense that it always works at least as well as baseline SIP. When both sides support it, and there is a NAT in the middle, traversal works correctly.

For the response to always be received, the NAT binding must remain in existence for the duration of the transaction. Most UDP NAT bindings appear to have a timeout of one minute. Therefore, non-INVITE transactions will have no problem. For INVITE transactions, the client may need to retransmit its INVITE every 20 seconds or so, even after receiving a provisional response. This keeps the binding open to receive the final response.

Because of the increased network traffic generated to keep the UDP bindings active, it is RECOMMENDED that
TCP be used instead, as it generates much less data.

4.2 Contact Translation

The response port parameter will allow requests initiated from inside the NAT (and their responses) to work. However, getting requests from a proxy outside the NAT to a host inside is a different story. The problem has to do with registrations. In Figure 1, the callee, Bob, will receive requests at their UA because they had previously sent a REGISTER request to their registrar, which is co-located with proxy Y. This registration contains a Contact header which lists the address where the incoming requests should be sent. However, in the case of NAT, this address will be wrong. It will contain a domain name or IP address that is within the private space of enterprise B. Thus, the REGISTER might look like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@10.0.1.100

This address is not reachable by the proxy. To solve this problem, we need two things. First, we need a persistent "connection" to be established from Bob to Y. Second, we need a way for incoming requests destined for Bob to be routed over this connection.

To address the first problem, clients send REGISTER requests over a TCP or TLS connection, or use UDP along with the response port parameter in the Via header. If TCP is used, this connection is kept open indefinitely.
We further recommend that the proxy/registrar hold this connection in a table, where the table is indexed by the remote side of the transport connection. For UDP, the client holds on to the socket, and uses it for REGISTER refreshes and to receive incoming calls. The server also holds on to the "connection". In the case of UDP, that means that the server stores the local IP/port that the request was received on, and indexes it by the source IP and port the request was sent from. When the proxy wishes to send a packet to some server at IP address M, port N, transport O, it looks up the tuple (M,N,O) in the table to see if a connection already exists, and if so, uses it. The NAT bindings are kept fresh through REGISTER refreshes (see Section 4.2.1).

Now, a connection is available for contacting the user. However, this connection must be associated with sip:bob@Y.com. Unfortunately, it is not. Calls for sip:bob@Y.com are translated to sip:bob@10.0.1.100. That address does not correspond to the remote side of the connection used to send the REGISTER, as seen by the proxy. That is because of NAT, which will make the remote side appear to be a publicly routable address.

To handle this problem, the proxy could, in principle, record the IP address and port from the remote side of the connection used to send a REGISTER. Then, it can create a Contact entry of the form sip:bob@[ip-addr]:[port], where [ip-addr] and [port] are the IP address and port of the remote side of the connection. However, this assumes that the purpose of the registration is to connect the address in the To field with the machine the connection is coming from. That may not be the intent of the registration. The registration may be used to set up a call forwarding service, for example.
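The connection table described above amounts to a map keyed by the remote (M, N, O) tuple. The following Python sketch assumes a single proxy process. `ConnectionTable` and its method names are hypothetical. The stored value is opaque: it could be an open TCP/TLS connection, or, for UDP, the local socket the REGISTER arrived on. The addresses are borrowed from the registration example of this section.

```python
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, int, str]  # (remote IP M, remote port N, transport O)


class ConnectionTable:
    """Hypothetical proxy-side table of persistent 'connections'."""

    def __init__(self) -> None:
        self._conns: Dict[ConnKey, object] = {}

    @staticmethod
    def _key(ip: str, port: int, transport: str) -> ConnKey:
        # Normalize the transport name so "udp" and "UDP" hit the same entry.
        return (ip, port, transport.upper())

    def remember(self, src_ip: str, src_port: int, transport: str, conn: object) -> None:
        """Record a connection under the remote address as seen by the proxy,
        i.e. the NAT's public binding, not the client's private address."""
        self._conns[self._key(src_ip, src_port, transport)] = conn

    def lookup(self, ip: str, port: int, transport: str) -> Optional[object]:
        """Return an existing connection to (M, N, O), or None if there is none."""
        return self._conns.get(self._key(ip, port, transport))


if __name__ == "__main__":
    table = ConnectionTable()
    # REGISTER sent from 10.0.1.100:2234 arrived, after the NAT, from 77.2.3.88:2397.
    table.remember("77.2.3.88", 2397, "udp", "socket bound to 5060")
    print(table.lookup("77.2.3.88", 2397, "UDP"))   # socket bound to 5060
    print(table.lookup("10.0.1.100", 2234, "UDP"))  # None: private address is unknown
```

Suppose a lookup returns None. The proxy would then have to open a new connection, and a new connection does not pass through the client's NAT. Reusing the stored entry is what lets the request follow the existing binding.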
As a result, we propose that clients be allowed to explicitly ask a proxy to create a Contact entry corresponding to the machine a REGISTER is sent from. To do that, the UA inserts a Translate header into the request. This header contains the URL that is to be translated, which MUST be one of the Contact URLs. It also carries an optional parameter that indicates the type of NAT the client is behind.

   translate-header = "Translate" ":" SIP-URL [ ";" "nat" "=" nat-types ]
   nat-types        = "sym" | "cone"

Suppose a server receives a REGISTER request with a Translate header. It finds the matching Contact header. It then replaces the host value with the source IP address of the REGISTER, and the port value with the source port of the REGISTER. This is the actual Contact stored in the registration database, and returned to the client in the response. The nat parameter is optional; it tells the registrar what type of NAT the client is behind. This information is very helpful for some fault tolerance and scalability scenarios, described below. Section 5 discusses how a client can determine what type of NAT it is behind.

Consider once more the architecture of Figure 1. The callee has an IP address of 10.0.1.100. It sends a REGISTER from port 2234 to port 5060 on the proxy. This request goes through the NAT. The NAT rewrites the source address to 77.2.3.88, and the source port to 2397.
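The registrar-side handling of the Translate header can be sketched in Python using the addresses of this example. This is a minimal sketch under stated assumptions. `parse_translate` and `apply_translate` are hypothetical helpers. The Translate URL is assumed to carry no URL parameters of its own, so the first ";" starts the nat parameter. URLs are compared as exact strings. The demo adds ";nat=sym", which the example registration shown next does not include.

```python
import re
from typing import List, Optional, Tuple

# Simplified SIP URL: sip:user@host[:port]. URL parameters are not handled;
# this is an assumption of the sketch, not a restriction stated by the draft.
SIP_URL = re.compile(r"^sip:(?P<user>[^@]+)@(?P<host>[^:;]+)(?::(?P<port>\d+))?$")
NAT_TYPES = ("sym", "cone")


def parse_translate(value: str) -> Tuple[str, Optional[str]]:
    """Split a Translate header value into (SIP-URL, nat type or None)."""
    url, _, param = value.strip().partition(";")
    if not param:
        return url.strip(), None
    name, _, nat = param.partition("=")
    if name.strip() != "nat" or nat.strip() not in NAT_TYPES:
        raise ValueError(f"bad Translate parameter: {param!r}")
    return url.strip(), nat.strip()


def apply_translate(contacts: List[str], translate_value: str,
                    src_ip: str, src_port: int) -> Tuple[List[str], Optional[str]]:
    """Rewrite the Contact named by Translate to the REGISTER's source address.

    Returns the Contact list to store (and echo in the 200 OK) plus the nat type.
    """
    url, nat = parse_translate(translate_value)
    if url not in contacts:
        raise ValueError("Translate URL must be one of the Contact URLs")
    match = SIP_URL.match(url)
    if match is None:
        raise ValueError(f"unsupported SIP URL: {url!r}")
    translated = f"sip:{match['user']}@{src_ip}:{src_port}"
    return [translated if c == url else c for c in contacts], nat


if __name__ == "__main__":
    stored, nat = apply_translate(
        ["sip:bob@10.0.1.100:2234"],
        "sip:bob@10.0.1.100:2234;nat=sym",
        src_ip="77.2.3.88", src_port=2397,  # source address seen after the NAT
    )
    print(stored, nat)  # ['sip:bob@77.2.3.88:2397'] sym
```

Only the Contact named by Translate is rewritten. Any other Contact, such as a call forwarding target, is stored unchanged. This is the distinction that motivates making translation an explicit client request.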
The registration looks like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Via: SIP/2.0/UDP 10.0.1.100:2234;rport
   Translate: sip:bob@10.0.1.100:2234
   Contact: sip:bob@10.0.1.100:2234

The proxy Y then stores the socket the request was received on in its connection table, indexed by the source IP address, port and transport of the request:

   (77.2.3.88,2397,UDP) -> [reference to UDP socket]

It also translates the Contact header to sip:bob@77.2.3.88:2397, and stores that in the registration database. It then responds to the REGISTER:

   SIP/2.0 200 OK
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Via: SIP/2.0/UDP 10.0.1.100:2234;rport=2397;received=77.2.3.88
   Contact: sip:bob@77.2.3.88:2397

This response is sent to 77.2.3.88:2397 because of the rport parameter. The NAT translates this to 10.0.1.100:2234, and the response is then received by the client.

Now, when an INVITE arrives for sip:bob@Y.com, the proxy looks it up in the registration database. The contact is extracted, and the proxy tries to send the request to that address. To do so, it checks its connection table for an open connection to the IP address, port and transport where the request is destined. In this case, such a connection is available, and the request is forwarded over it. Because it is sent over a connection with an existing NAT binding, it is properly routed through the NAT. The response from the callee is also routed over the same connection. In order for this connection to be used for re-INVITEs or BYEs, the proxy needs to record route.

4.2.1 Refresh Interval

Since the connection used for the registrations is held persistently in order to receive incoming calls, the NAT binding must be maintained. To avoid timeout, data must traverse the NAT over that connection with some minimum period. When UDP is used, registrations will need to be refreshed at least once every minute.
Clients SHOULD include an Expires header or parameter with this value. For TCP, a longer interval can be used; 10 minutes is RECOMMENDED. To test whether the interval is short enough, proxy servers MAY attempt to send OPTIONS requests to the client shortly before the registration expires. If the OPTIONS request generates no response at all, the server SHOULD lower the value of the Expires header in the next registration. Servers SHOULD cache and reuse the largest successful refresh interval that they discover for a given Contact value.

4.2.2 Routing to the Ingress Proxy

A complication arises when a domain supports multiple proxy servers. Consider the scenario shown in Figure 2. A user, joe, in domain.com is behind a NAT. In DNS, domain.com contains SRV entries that point to three servers: 1.domain.com, 2.domain.com and 3.domain.com. When the user registers, they will resolve domain.com to one of these. Assume it is 1.domain.com. As a result, the connection state is stored in proxy 1. In the case of TCP, this connection state is important: calls for joe@domain.com that do not arrive at proxy 1 will not be routable to the UA. In the case of UDP, whether it is important or not depends on the type of NAT the user is behind.

One type of NAT, which we call "symmetric", treats UDP much like TCP. When A sends a request from inside to B on the outside, UDP messages back to A must come from B. Their source port must equal the destination port of messages from A to B. In the other case, which we call "cone" and which is described in [4], UDP messages back to A can have any source port and IP address.
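The difference between the two NAT types comes down to one filtering decision. The Python sketch below models that decision for a binding that already exists. The function name, the "cone"/"symmetric" strings, and the proxy addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical. Binding timeouts and port allocation are deliberately left out.

```python
from typing import Tuple

Addr = Tuple[str, int]  # (IP address, port)


def nat_passes_inbound(nat_type: str, binding_peer: Addr, packet_src: Addr) -> bool:
    """Would the NAT deliver an inbound UDP packet to inside host A?

    binding_peer: outside address B that A sent to, which created the binding.
    packet_src:   source address of the inbound packet.
    """
    if nat_type == "cone":
        # Any outside IP and port may use an existing binding.
        return True
    if nat_type == "symmetric":
        # Only B, sending from the same port A sent to, may use the binding.
        return packet_src == binding_peer
    raise ValueError(f"unknown NAT type: {nat_type!r}")


if __name__ == "__main__":
    proxy1 = ("192.0.2.1", 5060)  # hypothetical address of 1.domain.com
    proxy2 = ("192.0.2.2", 5060)  # hypothetical address of 2.domain.com
    # joe's REGISTER went to proxy 1, so the binding's peer is proxy 1.
    for kind in ("cone", "symmetric"):
        print(kind, nat_passes_inbound(kind, proxy1, proxy2))
    # prints "cone True" then "symmetric False"
```

Under this model, proxy 2 can reach joe through a cone NAT but not through a symmetric one. That is why, in the symmetric case, an incoming call has to be steered through the proxy that holds the binding.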
If the user is behind a NAT that operates in cone mode, any of the proxies in the proxy farm will be able to reach the customer through the NAT. All will send requests to the public IP address and port binding created by the NAT, but with different source IP addresses and ports. Since source addressing does not matter, things work well. In this case, the proxy need not even store connection state as described in Section 4.2. If the user is behind a NAT that operates in symmetric mode, calls to the user must come in through the proxy that the user registered to.

                              --
                            //  \\
                           /      \
                          |   DB   |
                          |        |
                           \      /
                            \\  //
                              --

                 +-----+     +-----+     +-----+
                 |     |     |     |     |     |
    domain.com   |Proxy|     |Proxy|     |Proxy|
                 |  1  |     |  2  |     |  3  |
                 +-----+     +-----+     +-----+

              +-------------------------+
              |           NAT           |
              +-------------------------+

                          +-----+
                          |     |
                          |UA   |
                          |     |
                          +-----+

              Figure 2: Multiple Proxy Configuration

To enable this, we recommend that the location server database store not only the contact, but also the proxy that the user connected to. When a call comes in for that user, the proxy receiving the INVITE looks up the user in the database. The database entry indicates the proxy the user is connected to (call this the connected proxy). The connected proxy may not be the proxy which received the INVITE. In that case, the proxy that received the INVITE uses a Route header to force the call through the connected proxy. Suppose joe registered at proxy 1, and the incoming INVITE arrived at proxy 2. The request sent by proxy 2 would look like:

   INVITE sip:proxy1.domain.com SIP/2.0
   Route: sip:joe@22.1.20.3:3038

This request will first go to proxy 1, and
from there, over the existing connection, to joe.

The differing proxy behaviors for symmetric and cone NATs explain the presence of the nat-type attribute in the Translate header. Assuming the client can determine which type of NAT it is behind (using the mechanisms described below), it can simply inform the proxy, allowing the proxy to take the proper action.

4.2.3 INVITE Usage

The 200 OK response to the REGISTER request contains the SIP URL that the registrar placed into the database. This address has the important property that it is routable to the client from the proxy on the public side of the NAT. As a result, the client needs to place this URL in the Contact header of its INVITE requests and of its 2xx responses to INVITE, so that it can be reached from the proxy on the outside.

5 RTP/RTCP NAPT Identification and Traversal

In this section, we provide a protocol and basic architecture that allows a client to detect what type of NAT it is behind (cone or symmetric), and to obtain the public address for an RTP stream. The general idea is to make use of reflectors that return to the client the source IP address and port that a request came from. The general configuration is shown in Figure 3. In this figure, the hosts that wish to make or receive a call are behind enterprise or residential NATs. They are making use of a service provider that deploys, along with its proxies, three different reflectors and a few off-the-shelf routers configured in a specific fashion to act as a media intermediary.
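At its core, a reflector only echoes back the source address it observed. The following is a minimal sketch over UDP, assuming a made-up wire format (the reply is the ASCII text `ip:port`); the draft does not define the probe encoding here, and the function names `run_reflector` and `discover_binding` are hypothetical. The probe is sent from the same socket that will later carry media, in line with the procedures below, which send probes from the RTP and RTCP listener ports.

```python
import socket
import threading
from typing import Tuple

Addr = Tuple[str, int]


def run_reflector(sock: socket.socket, count: int) -> None:
    """Answer `count` probes, each with the source address the probe came from.

    Replies go back to that same address, so they traverse the NAT binding
    the probe created. The "ip:port" payload format is an assumption.
    """
    for _ in range(count):
        _probe, (ip, port) = sock.recvfrom(2048)
        sock.sendto(f"{ip}:{port}".encode("ascii"), (ip, port))


def discover_binding(sock: socket.socket, reflector: Addr, timeout: float = 1.0) -> Addr:
    """Send one probe from `sock` (e.g. the RTP listener socket) and return the
    (ip, port) the reflector saw, i.e. the address to advertise in the SDP."""
    sock.settimeout(timeout)
    sock.sendto(b"probe", reflector)
    reply, _ = sock.recvfrom(2048)
    ip, port = reply.decode("ascii").rsplit(":", 1)
    return ip, int(port)


# Loopback demonstration: with no NAT in the path, the reflected address
# is simply the socket's own address.
reflector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
reflector.bind(("127.0.0.1", 0))
threading.Thread(target=run_reflector, args=(reflector, 1), daemon=True).start()

rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rtp.bind(("127.0.0.1", 0))            # stands in for Host A's RTP listener port
seen = discover_binding(rtp, reflector.getsockname())
print(seen == rtp.getsockname())      # True
```

Behind a NAT, `seen` would instead be the NAT's public binding, which is what the host needs to place in its SDP; comparing it with the local socket address is one simple way to notice that a NAT is present at all.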
Reflector A is responsible for letting the user know whether they are behind a symmetric NAT, and for providing the address of another reflector (type C) which can be used to obtain an address binding on a network intermediary.

Reflector B is used to let the user know whether they are behind a cone NAT (one which allows packets back to a natted host from any source port and IP address, not just the one the outbound packet was sent to). It MUST be on a different IP address and port than Reflector A. This is to deal with NATs which may allow packets back to an internal address from the same IP address the packet was sent to, but from a different port. This kind of "partial-cone" NAT is equivalent to a symmetric one for the purposes of RTP.

Reflector C is used to allow the user to determine an address binding that is created on a NAT in the service provider domain. This NAT, and the routers around it, are configured so that the user can receive UDP packets through their enterprise NAT, even if it is a symmetric NAT.

5.1 At initial power-up of Host A

When a client boots up, it first attempts to determine whether it is behind a NAT, and if so, what type. The following procedure is used:

   1. Host A sends an initial probe (probe type one) to Reflector A from its RTP and RTCP listener ports. Reflector A has the same IP address as the proxy server configured for this endpoint, but an incremented port value (e.g. 5062). Reflector A could be the same physical device as the proxy server, or a separate host reached through a static address translation.

   2. Reflector A responds to Host A with an initial acknowledgement (probe response type one). This will create a symmetrical NAPT translation if the NAPT was initially a partial cone that migrates to symmetrical based on a response.
      Host A will re-transmit the probe packet every 50ms until it receives this acknowledgement, for up to a timeout period of one minute. The acknowledgement (probe response type one) will not contain the externally visible IP address of Host A; rather, it will identify itself as the initial acknowledgement and contain a transaction timeout value. This value indicates the maximum time that Host A should wait for a message from Reflector B before concluding that it is behind a symmetrical NAPT.

                                            +-----------+
                                      +-----| Reflector |
                                      |     |     B     |
                                      |     +-----------+
                                      |     +-----------+
                                      +-----| Reflector |
                                      |     |     A     |
                                      |     +-----------+
   +--------+   +-----------+   +-----+-----+
   | Host A |---| Ent. NAPT |---|  Service  |
   +--------+   | Router A  |   |  Provider |
                +-----------+   |  Router A |
   +--------+   +-----------+   |           |
   | Host B |---| Ent. NAPT |---|           |
   +--------+   | Router B  |   +-----+-----+
                +-----------+         |
                                +-----+-----+   +-----------+
                                | Ser. Prov.|---| Reflector |
                                |   NAPT    |   |     C     |
                                |  Router   |   +-----+-----+
                                +-----------+         |
                                                +-----+-----+
                                                | Registrar |
                                                +-----------+

       Figure 3: Configuration for NAPT Identification and Traversal

      If Host A does not receive a message from Reflector B within the specified timeframe, Host A will know that it is behind a symmetrical NAPT and will send a subsequent message to Reflector A in which it asks for the address of Reflector C. By placing the request for the address of Reflector C after Host A has failed to hear from Reflector B, the provider can utilize deterministic load-balancing mechanisms for its Symmetrical Media Server. For this reason, Reflector A should be transaction stateful. If a request for the address of Reflector C arrives that does not match the transaction information (i.e.
source IP address) and falls outside the designated transaction timeout value plus one second, Reflector A should respond with an error (e.g. 481). This will help limit attacks on Reflector A in which the attacker tries to throw off any load-balancing mechanisms that the provider might be using when selecting the address of Reflector C to return to hosts.

   3. Reflector A instructs Reflector B to send a message (probe response type two) to Host A. This message will contain the externally visible address of Host A and the transaction timeout value that was sent to Host A.

   4. Reflector B sends the message (probe response type two) to Host A and informs Reflector A that it has done so. Reflector A will continue to instruct Reflector B to send the message to Host A every 20ms until it receives the acknowledgement from Reflector B that the message has been sent.

   5. If Host A receives the message (probe response type two) from Reflector B, it knows that it is behind a full-cone style NAPT. Host A sends an acknowledgement to Reflector B. Reflector B will continue to retransmit the message to Host A every 50ms, for up to the transaction timeout value specified by Reflector A or until it receives an acknowledgement from Host A.

   6. If Host A does not get a probe response type two within the timeout value specified by Reflector A after sending its initial probe packet, it assumes that it is behind a symmetrical NAPT. In this case, Host A sends a message (probe type three) to Reflector A informing it that it is behind a symmetrical NAPT. Reflector A responds to this message with an acknowledgement that includes the IP address of Reflector C.
      Reflector A will retransmit this response every 50ms for up to 30 seconds or until it receives an acknowledgement from Host A.

A call flow for the case where Host A is behind a full-cone NAPT is shown in Figure 4, and for the case where Host A is behind a symmetrical NAPT in Figure 5.

   Host A               Reflector A            Reflector B
     |                       |                       |
     |---Probe Type One----->|                       |
     |                       |                       |
     |<--Probe Response------|                       |
     |   Type One            |                       |
     |                       |------Instruct-------->|
     |                       |                       |
     |                       |<----Acknowledge-------|
     |                       |                       |
     |<-------------Probe Response Type Two----------|
     |                                               |
     |------------------Acknowledge----------------->|
     |                                               |

                     Figure 4: Full cone flow

   Host A               Reflector A            Reflector B
     |                       |                       |
     |---Probe Type One----->|                       |
     |                       |                       |
     |<--Probe Response------|                       |
     |   Type One            |                       |
     |                       |------Instruct-------->|
     |                       |                       |
     |                       |<----Acknowledge-------|
     |                       |                       |
     |  . . . . . . Probe Response Type Two ---------|
     |                       |
     |--Probe Type Three---->|
     |                       |
     |<--Probe Response------|
     |   Type Three (with    |
     |   IP address of       |
     |   Reflector C)        |
     |                       |
     |----Acknowledge------->|
     |                       |

                     Figure 5: Symmetric flow

5.2 When forming an INVITE or 18x response

At some later point, Host A either wishes to make a call, or wishes to answer an incoming call. In either case, if it is behind a NAT, it needs to place an address and port in the SDP of the offer or answer which can be used to receive media. The approach used depends on what type of NAT the client determined it was behind. If Host A determined it was behind a full-cone NAPT:

   1. Host A sends a pre-INVITE probe (probe type two) to Reflector A from its RTP and RTCP listener ports.

   2. Reflector A responds to Host A with Host A's externally visible IP addresses and ports. Host A then uses these in the SDP of the SIP message (note that this requires the SDP to carry RTCP address and port information).

   3. If Host A does not receive a response from Reflector A, it will retransmit the pre-INVITE probe every 50ms for up to 10 seconds.
      If Host A still does not receive a response from Reflector A, it will inform the user that a network error has occurred and re-run the power-on test detailed above.

The message flow for RTP is as follows:

   Step  Device                               Addressing
   ----  -----------------------------------  ---------------------------
   1     RTP listener port on Host A          DA=193.1.3.2:6060, SA=X:Y
   2     Enterprise NAPT router A             DA=193.1.3.2:6060, SA=X1:Y1
   3     Router (receives packet and passes
         it to E0)
   4     Service Provider NAPT router,        DA=193.1.3.2:6060, SA=X2:Y2
         Ethernet port 0
   5     Reflector                            Response created with
                                              SA=X2:Y2
   6     Service Provider NAPT router,        SA=193.1.3.2:6060, DA=X1:Y1
         Ethernet port 0
   7     Enterprise NAPT router               SA=193.1.3.2:6060, DA=X:Y
   8     RTP listener port on Host A          SA=193.1.3.2:6060, DA=X:Y,
                                              payload reflecting external
                                              SA of X2:Y2

The SIP endpoint now places X2:Y2 into its SDP as its RTP and RTCP listener address. In the above example, the address is 193.1.1.3 with a randomly selected port. This address and port actually exist on the RTP NAPT router and are addressable via the public Internet. Host B then receives this information in an INVITE or 180 message and sends RTP media to X2:Y2 (e.g. 193.1.1.3:32001). Since this is a public address, the packet is sent as follows:

   Step  Device                               Addressing
   ----  -----------------------------------  ---------------------------
   1     RTP sender on Host B                 DA=X2:Y2, SA=A:Z
   2     Enterprise NAPT router B             DA=X2:Y2, SA=A1:Z1
   3     Router (receives packet and passes
         it to E1)
   4     Service Provider NAPT router,        DA=X2:Y2, SA=193.1.3.2:6060
         Ethernet port 1
   5     Service Provider NAPT router,        DA=X1:Y1, SA=193.1.3.2:6060
         Ethernet port 0
   6     Enterprise NAPT router               DA=X:Y, SA=193.1.3.2:6060
   7     RTP listener on Host A (receives     DA=X:Y, SA=193.1.3.2:6060
         packet)

5.3 During the Call (Full Cone NAPT)

The media path in this situation is end-to-end via the enterprise NAPT routers. Media does not traverse the service provider's reflectors or symmetrical media servers. During the life of the call, Host A needs to send a periodic heartbeat (i.e.
every 30 seconds), either to the reflector or to Host B (the callee's endpoint), from the RTP listener port (RTCP packets should be sent regardless of media). The heartbeat ensures that the media path (i.e. the NAPT translations) is not torn down due to prolonged silence. There is no need for endpoints behind full-cone NAPTs to inform the reflectors about the termination of a call, since their media does not consume service provider resources.

5.4 If Host A determined that it was behind a symmetrical NAPT

If the host is behind a symmetric enterprise NAT, things are more complex. With normal RTP, a network intermediary needs to be used. The user receives media packets from this intermediary, and the other party in the call sends packets to the intermediary.

   1. Host A sends a pre-INVITE probe (probe type two) to Reflector C from its RTP and RTCP listener ports.

   2. Reflector C responds to Host A with an authentication challenge (i.e. 401 Unauthorized). It is suggested that digest authentication (RFC 2069) be used, and that the user information be based on the user's SIP profile stored in a registrar. The nonce created by Reflector C could be composed of an element of time (e.g. a UTC timestamp), the externally visible IP address and port from which the pre-INVITE probe appears to be sourced when it reaches Reflector C, and a private key configured on Reflector C. Since Host A has no knowledge of its externally visible address at this point, spoofing or replaying a response to this challenge becomes difficult.

   3. Host A responds to the challenge by hashing the SIP userid and password based on the nonce provided by Reflector C.

   4.
      Reflector C processes the response to this challenge and queries the registrar for the user's information. The connection between Reflector C and the registrar should be over a secure tunnel (e.g. TLS).

   5. The registrar keeps track of the number of concurrent connections requested by Host A. This count should be kept on the centralized registrar rather than on the reflector, in case multiple reflectors exist. If the registrar determines that Host A is at its pre-determined maximum number of concurrent sessions, the registrar will fail the query despite the credentials matching, and return an appropriate error to Reflector C. Reflector C will subsequently reply to Host A with a probe response challenge failure (max sessions).

   6. If Host A is within the number of allowed concurrent sessions but does not provide correct credentials, the registrar will fail the query and return the appropriate message to Reflector C. Reflector C will subsequently reply to Host A with a probe response challenge failure (invalid user).

   7. If successful, Reflector C returns a probe response type two to Host A which includes the externally visible IP address of Host A and a unique call id. There will be a separate response for each of RTP and RTCP, with distinct call ids, since the reflector may not be able to match the probe requests for RTP and RTCP. This call id is used later when informing the reflector that the call has been torn down.

   8. Reflector C informs the registrar that Host A now has an additional active connection (there will be two per call for each host: one for RTP and another for RTCP). The registrar sends an acknowledgement to Reflector C.

   9. Host A sends an acknowledgement to Reflector C.
      Reflector C will re-transmit the probe response to Host A every 20ms for up to 30 seconds, or until it receives an acknowledgement. If Host A does not acknowledge the probe response type two, Reflector C starts an independent call timer that, after a pre-determined amount of time (e.g. 180 seconds), sends a message to the registrar to remove one concurrent call for Host A. This timer ensures that endpoints cannot exploit the service provider's NAPT router by intentionally failing to acknowledge the probe response (and thereby creating more concurrent calls than they are allotted), without penalizing the subscriber for a possible network failure.

A call flow for this case is shown in Figure 6.

5.5 During the Call (Symmetrical NAPT)

This section only applies when the endpoint is using the service provider's Symmetrical Media Server.

Host A now proceeds by sending its SIP message with an SDP body that includes the information obtained from the reflector. Note that the SDP must carry RTCP information. During the life of the call, Host A needs to send a periodic heartbeat (e.g. every 30 seconds) to the reflector for both RTP and RTCP. This heartbeat includes the call id. The heartbeat serves two purposes: it ensures that the media path (i.e. the NAPT translations) is not torn down due to prolonged silence, and that the concurrent session counter is eventually decremented in the event of an endpoint failure. To decrement the counter, Reflector C keeps a delta timer for each call id based on the heartbeat. Should the delta time exceed a pre-configured value that is a multiple of the heartbeat period but greater than the independent session timer (e.g. 210 seconds), Reflector C considers the call no longer active and informs the registrar to decrement the counter.
As noted above, the reflector could optionally instruct the service provider's NAPT router to remove the translation.

   Host A               Reflector C             Registrar
     |                       |                       |
     |---Probe Type Two----->|                       |
     |                       |                       |
     |<--Challenge (401)-----|                       |
     |                       |                       |
     |------Response-------->|                       |
     |                       |--------Query--------->|
     |                       |                       |
     |                       |<-------Response-------|
     |                       |                       |
     |<---Reply (Auth)-------|                       |
     |                       |                       |
     |                       |--------Inform-------->|
     |                       |                       |
     |                       |<-----Acknowledge------|
     |                       |                       |
     |------Acknowledge----->|                       |
     |                       |                       |

                Figure 6: Symmetrical NAT call flow

5.6 Call Teardown (Symmetrical NAPT)

This section only applies when the endpoint is using the service provider's Symmetrical Media Server.

   1. At the end of the call (BYE, CANCEL, or a 4xx/5xx/6xx final response), Host A sends a post-call closure message (probe type four) to Reflector C from both its RTP and RTCP listener ports, carrying the call id from the earlier probe response type two.

   2. Reflector C responds to Host A with an authentication challenge (the same mechanism is used as when setting up the call). This authentication protects against denial-of-service attacks in which an attacker sends closure messages on behalf of other systems.

   3. Host A responds to the challenge.

   4. Reflector C compares the response to this challenge with the user's information in the registrar.

   5. If successful, Reflector C informs the registrar to remove one concurrent session from the counter. Optionally, Reflector C can instruct the service provider's NAPT router to remove the translation for this session. The registrar acknowledges the decrementing of the concurrent session counter.

   6. Reflector C sends an acknowledgement to Host A.

   7. If unsuccessful, Reflector C replies to Host A with a probe response challenge failure (invalid user).
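The counter logic of Sections 5.4 to 5.6 (a per-user limit, one counted session per call id, heartbeats, and expiry by a delta timer) can be sketched as below. This is a hypothetical in-memory illustration, not a definitive implementation: the class and method names are assumptions, time is passed in explicitly so the logic is easy to test, the 210-second expiry is the example value given above, and authentication as well as the separate 180-second timer for unacknowledged probe responses are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SessionAccounting:
    """Hypothetical per-user concurrent-session bookkeeping (registrar side).

    Each RTP or RTCP binding granted by Reflector C has its own call id and
    counts as one concurrent session.
    """
    max_sessions: int
    expiry: float = 210.0  # delta-timer limit in seconds (example value)
    # call id -> (user, time of last heartbeat)
    active: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    # user -> number of sessions in use
    counts: Dict[str, int] = field(default_factory=dict)

    def allocate(self, user: str, call_id: str, now: float) -> bool:
        """Grant a binding unless the user is at its limit ("max sessions")."""
        if self.counts.get(user, 0) >= self.max_sessions:
            return False
        self.counts[user] = self.counts.get(user, 0) + 1
        self.active[call_id] = (user, now)
        return True

    def heartbeat(self, call_id: str, now: float) -> None:
        """Record a heartbeat carrying `call_id`; unknown ids are ignored."""
        if call_id in self.active:
            user, _ = self.active[call_id]
            self.active[call_id] = (user, now)

    def release(self, call_id: str) -> None:
        """Authenticated post-call closure (probe type four): free one session."""
        entry = self.active.pop(call_id, None)
        if entry is not None:
            self.counts[entry[0]] -= 1

    def expire(self, now: float) -> List[str]:
        """Release sessions whose last heartbeat is more than `expiry` seconds old."""
        stale = [cid for cid, (_, last) in self.active.items() if now - last > self.expiry]
        for cid in stale:
            self.release(cid)
        return stale


acct = SessionAccounting(max_sessions=2)
acct.allocate("joe", "rtp-1", now=0)         # one call uses two sessions:
acct.allocate("joe", "rtcp-1", now=0)        # one for RTP, one for RTCP
print(acct.allocate("joe", "rtp-2", now=5))  # False: joe is at the limit
acct.heartbeat("rtp-1", now=200)             # only the RTP side keeps beating
print(acct.expire(now=220))                  # ['rtcp-1']
print(acct.counts["joe"])                    # 1
```

Keeping the count on the registrar rather than on each reflector, as step 5 of Section 5.4 suggests, is what lets several reflectors enforce one shared per-user limit.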
   Host A               Reflector C             Registrar
     |                       |                       |
     |---Probe Type Four---->|                       |
     |                       |                       |
     |<--Challenge (401)-----|                       |
     |                       |                       |
     |------Response-------->|                       |
     |                       |--------Query--------->|
     |                       |                       |
     |                       |<-------Response-------|
     |                       |                       |
     |                       |-------Instruct------->|
     |                       |                       |
     |                       |<-----Acknowledge------|
     |                       |                       |
     |<-----Acknowledge------|                       |
     |                       |                       |

                     Figure 7: Call Teardown

6 Symmetric RTP

The approach in Section 5 requires the use of an intermediary when either of the parties is behind a symmetric NAT. This can be avoided as long as the parties are not both behind symmetric NATs. The idea is to use symmetric RTP. Symmetric RTP is a new convention for RTP usage within SIP, and is described below.

The trick to getting RTP through a NAT is to make sure it exhibits two characteristics. First, any user behind a NAT has to send the first packet, to establish a NAT binding. Second, media sent back to that user must be sent to the source port the media came from. In other words, if Joe calls Bob, and only Joe is behind a NAT, Joe must send the first UDP packet to Bob. Let's say Joe sends from IP address and port pair A,B to Bob at public address and port C,D. The NAT will translate port pair A,B to X,Y. Bob receives the media. To talk to Joe, it is essential that Bob send his media with source port C,D to destination port X,Y. This will be received by the NAT, and have the destination translated to A,B, where it is sent to Joe.

Unfortunately, RTP does not work this way. When used with SIP, a conversation between Joe and Bob will result in two RTP sessions: one from Joe to the address Bob provided in his SDP, and one from Bob to the address provided by Joe in his SDP. This will not work with a symmetric NAT without an intermediary.

6.1 Operation

Our solution is simple: we define symmetric RTP. Symmetric RTP runs over UDP.
Like TCP, one side initiates a connection to the other side. As a result, one side is active (initiates the connection), and the other side is passive (waits for the connection). Like TCP, data in the reverse direction is sent to the port where the connection came from. Unlike TCP, a symmetric RTP connection is created when the first packet arrives; there is no explicit handshake or setup. There are no retransmissions or changes to the RTP protocol operation. The only difference is that symmetric RTP involves sending media on the same socket used to receive it.

An example flow using symmetric media is shown in Figure 8. Joe calls Bob. Assume for this flow that Joe is behind a NAT, and Bob is not. For simplicity's sake, we don't show proxies, and don't show much of the SIP detail. Joe indicates, in the SDP in his INVITE, that he is capable of symmetric RTP, and wishes to be the active side of the connection (more on this later). Bob receives the INVITE, and responds with a 200 OK. His SDP indicates that he can be the passive side, and he provides the IP address and port to connect to. When Joe receives the 200 OK, an ACK is sent. Then, Joe sends an RTP packet to the IP address and port provided by Bob. The RTP packet passes through the NAT, and has its source address rewritten. When Bob receives this packet, the connection is established. Bob now has the IP address and port to send media back to: the source address of the RTP packet Bob just received (which has been natted). Bob sends media to this address. Those packets have their destination address natted, translated back to the address Joe used to send the first packet. In traditional unidirectional RTP, Joe would have included an IP address and port in the INVITE, and Bob would have sent media to that address, rather than to the one in the RTP packet received from Joe.
This does not work through NAT, since that address is wrong, and since no NAT binding has been established. Symmetric RTP does not suffer from this problem; note that Joe does not actually need to provide an IP address in the SDP in his INVITE (although one must be provided for backwards compatibility).

The call flow when Bob is behind the NAT is very similar, and is shown in Figure 9. Instead of Joe being the active side of the connection, Bob is the active side. It is important to note that the role of active or passive for the RTP connection is not tied to who makes the call. As a result, when only one of the participants is behind a NAT, a direct UDP connection can be used between them. When both are behind NATs, a different solution is needed; this is discussed below.

6.2 SDP Extensions

SDP extensions are needed to allow the signaling discussed above to take place. Specifically, extensions are needed to indicate that a media stream is symmetric RTP, and to allow each side to indicate that it is active, passive, or can play either role. As it turns out, this is exactly the kind of signaling provided in the SDP extensions for TCP media [5]. That draft only handles TCP and TLS, but the semantics for TCP are identical to those of symmetric UDP. Therefore, the transport remains UDP, but the direction attribute and the exchange procedures defined in [5] for TCP work as described for UDP. The fact that the stream is symmetric is signaled by the presence of the direction attribute, with a value of active, passive, or both.
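The only behavioral change symmetric RTP asks of an endpoint is in how the passive side picks its destination. A minimal sketch, assuming plain UDP sockets on the loopback interface and placeholder payloads instead of real RTP packets (the function name `passive_accept` is hypothetical): the passive side sends to the source address of the first packet it receives, from the same socket it received on, rather than to an address taken from the peer's SDP.

```python
import socket
from typing import Tuple


def passive_accept(sock: socket.socket, timeout: float = 1.0) -> Tuple[str, int]:
    """Passive side: the first packet received 'establishes the connection'.

    Returns that packet's source address (as rewritten by any NAT on the
    path); media is then sent back to it from this same socket.
    """
    sock.settimeout(timeout)
    _first_packet, peer = sock.recvfrom(2048)
    return peer


joe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # active side
bob = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # passive side
joe.bind(("127.0.0.1", 0))
bob.bind(("127.0.0.1", 0))

joe.sendto(b"media from joe", bob.getsockname())  # to the address in Bob's SDP
peer = passive_accept(bob)
bob.sendto(b"media from bob", peer)               # reply on the receiving socket
joe.settimeout(1.0)
data, src = joe.recvfrom(2048)
print(data, src == bob.getsockname())              # b'media from bob' True
```

Because Bob replies from the very socket Joe sent to, the reply's source address and port are exactly the ones Joe's NAT saw Joe contacting, which is what lets the reply through a symmetric NAT as well as a cone NAT.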
Revisiting the flow in Figure 8, the SDP in the INVITE would actually appear as:

   c=IN IP4 10.0.1.1
   m=audio 9 RTP/UDP 0
   a=direction:active

and in the 200 OK as:
   c=IN IP4 4.5.11.3
   m=audio 4444 RTP/UDP 0
   a=direction:passive

   Joe                       NAT                       Bob
    |                                                   |
    |-------------INVITE sip:bob@Y.com (active)-------->|
    |                                                   |
    |<-------200 OK (passive, 4.5.11.3:4444)------------|
    |                                                   |
    |---------------------ACK-------------------------->|
    |                                                   |
    |                 RTP from Joe to Bob               |
    |------------------------->------------------------>|
    | S:10.0.1.1:12           | S:7.1.1.1:227           |
    | D:4.5.11.3:4444         | D:4.5.11.3:4444         |
    |                                                   |
    |                 RTP from Bob to Joe               |
    |<-------------------------<------------------------|
    | S:4.5.11.3:4444         | S:4.5.11.3:4444         |
    | D:10.0.1.1:12           | D:7.1.1.1:227           |

                  Figure 8: Symmetric RTP Flow

   Joe                       NAT                       Bob
    |                                                   |
    |-----INVITE sip:bob@Y.com (both, 7.1.1.1:88)------>|
    |                                                   |
    |<----------------200 OK (active)-------------------|
    |                                                   |
    |---------------------ACK-------------------------->|
    |                                                   |
    |                 RTP from Bob to Joe               |
    |<-------------------------<------------------------|
    | S:4.5.11.3:654          | S:10.0.1.1:44           |
    | D:7.1.1.1:88            | D:7.1.1.1:88            |
    |                                                   |
    |                 RTP from Joe to Bob               |
    |------------------------->------------------------>|
    | S:7.1.1.1:88            | S:7.1.1.1:88            |
    | D:4.5.11.3:654          | D:10.0.1.1:44           |

          Figure 9: Symmetric RTP Flow, NAT role reversed

For reasons of backwards compatibility, a host that indicates active only in
an INVITE must still list an IP address and port in the SDP, and be prepared to receive media on it. When the 200 OK arrives, if it contains no direction attribute at all, the client knows that the server did not support this SDP extension. In that case, the server will have ignored the direction attribute in the INVITE and will send media to the IP address and port in the INVITE. The result is a very smooth backwards compatibility path from symmetric to traditional RTP usage.

6.2.1 RTCP Address and Port

Unfortunately, the NAT may not
allocate consecutive port bindings to the RTP and RTCP packets. This means that a client will need to signal, in the SDP, the IP address and port for RTP and for RTCP separately. An approach for doing this is documented by Huitema.

7 Using Symmetric RTP and NAT ID Together

In this section, we show how a host would make use of both symmetric RTP and the NAT ID and binding protocol. There are many cases to consider. The caller and callee can each be behind a symmetric NAT, a cone NAT, or no NAT. The caller and callee can each either support or not support the symmetric RTP extension. The caller and callee can each either support or not support the NAT ID proposal. While this may seem like a large number of cases (144 of them), the actual behavior at a host to handle all the cases is quite simple.

Why would a host ever support symmetric RTP, but not NAT ID? This is the case when the host is some kind of service provider media-enabled device, such as a gateway or conferencing server. These networks are ideally deployed without NAT at all, or with a midcom-based firewall solution.
As a result, NAT ID is not needed, since the host knows it has a public address. Symmetric RTP is still helpful, since it gives hosts behind a NAT optimized access to the service. For the purposes of this analysis, though, this case is identical to the one where the host does support NAT ID, since NAT ID will always report that the host has a public address. A host with a public address therefore behaves the same during call setup whether or not it implements NAT ID. This case aside, symmetric RTP does require the use of NAT ID to detect whether or not the host is behind a NAT.

We start with the caller. If the caller is an existing client that is unaware of symmetric RTP or the NAT ID protocol, it sends a regular INVITE. Of course, this will only work if the caller is not behind a NAT. If the caller supports NAT ID, it can detect whether it is behind a NAT.
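The detection step can be sketched as follows, as an illustration only. The NAT ID protocol's messages are not modeled: the observed public endpoints are hypothetical inputs. The two-server comparison used to tell a symmetric NAT from a cone NAT is borrowed from the classic STUN procedure (RFC 3489) rather than taken from this draft. The function names are also hypothetical.

```python
import ipaddress
from typing import Iterable, Tuple

Endpoint = Tuple[str, int]  # public (address, port) observed by a server


def behind_nat(local_addresses: Iterable[str], observed: Endpoint) -> bool:
    """True if the address a public server saw is not one of our own."""
    seen = ipaddress.ip_address(observed[0])
    return all(ipaddress.ip_address(a) != seen for a in local_addresses)


def classify_nat(local_addresses: Iterable[str], seen_by_a: Endpoint, seen_by_b: Endpoint) -> str:
    """Return "none", "cone" or "symmetric" for one local UDP socket.

    seen_by_a and seen_by_b are the endpoints reported by two servers at
    different addresses for packets sent from the same local socket
    (assumption: STUN-style test). A symmetric NAT creates a new binding
    per destination, so the two differ; a cone NAT reuses one binding.
    """
    local = list(local_addresses)
    if not behind_nat(local, seen_by_a):
        return "none"
    return "cone" if seen_by_a == seen_by_b else "symmetric"
```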
A caller that finds it is behind a NAT determines, before the call, a public address using the NAT ID protocol, and uses this address in the SDP. If it also supports symmetric RTP and is behind a symmetric NAT, it indicates a direction of active for its media streams. If it is behind a cone NAT, it indicates that it supports both active and passive. It then sends the INVITE.

The INVITE then arrives at the called party. If the called party supports symmetric RTP, it checks whether the caller supported it, which it knows from the presence of the direction attribute in the SDP. If the caller supported it, and the
called party is not behind a NAT, the called party inserts its public address into the SDP in the response and offers to be the passive side. Otherwise, if the called party is behind a NAT, it obtains an address using the NAT ID protocol and inserts that address into the SDP in the response. In that case, it indicates passive if the caller indicated active, and active otherwise. If the called party doesn't support symmetric RTP, it allocates an address binding (if it supports the NAT ID protocol), and
places that in the SDP in the response. Since symmetric RTP is not supported, no direction attributes are indicated in the response. If the called party is ignorant of NAT ID, it simply places whatever it thinks is its address in the response.

The result of this fairly simple processing is that media flows directly whenever at all possible, using symmetric RTP where both sides support it. The service provider NAT is used only in the most extreme case, where both caller and callee are behind symmetric NATs.
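The caller and called party rules above can be summarized as two small functions. This is a sketch under stated assumptions, not a normative algorithm:

o NAT types and supported extensions are passed in as plain values.

o The SDP address is reported only by its source: the host's own interface, or a NAT ID binding.

o The text does not specify two behaviors, so both are assumptions here: the "both" direction for a caller with a public address, and the answer sent to a caller without symmetric RTP.

o All names are hypothetical.

all_cases() enumerates the 144 combinations counted earlier.

```python
from itertools import product
from typing import Optional, Tuple

NAT_TYPES = ("none", "cone", "symmetric")


def perceived_nat(supports_nat_id: bool, nat: str) -> str:
    """Without NAT ID a host cannot detect a NAT, so it assumes it has none."""
    return nat if supports_nat_id else "none"


def caller_offer(supports_nat_id: bool, supports_sym_rtp: bool,
                 nat: str) -> Tuple[str, Optional[str]]:
    """Return (address source used in the SDP, direction attribute or None)."""
    seen = perceived_nat(supports_nat_id, nat)
    address = "nat-id binding" if seen != "none" else "local address"
    if not supports_sym_rtp:
        return address, None
    if seen == "symmetric":
        return address, "active"
    # "both" for a cone NAT is from the text; for a public host it is an assumption.
    return address, "both"


def callee_answer(offer_direction: Optional[str], supports_nat_id: bool,
                  supports_sym_rtp: bool, nat: str) -> Tuple[str, Optional[str]]:
    """Return (address source used in the SDP answer, direction or None)."""
    seen = perceived_nat(supports_nat_id, nat)
    address = "nat-id binding" if seen != "none" else "local address"
    if not supports_sym_rtp or offer_direction is None:
        # Callee without symmetric RTP: no direction attribute (from the text).
        # Caller without symmetric RTP: treated the same way (assumption).
        return address, None
    if seen == "none":
        return address, "passive"
    return address, "passive" if offer_direction == "active" else "active"


def all_cases():
    """Every (NAT type, symmetric RTP?, NAT ID?) pair of caller/callee settings."""
    return list(product(NAT_TYPES, (False, True), (False, True), repeat=2))
```

For example, a caller behind a symmetric NAT offers active. A callee behind a cone NAT that supports both extensions then answers passive, using its NAT ID binding.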
We also get smooth backwards compatibility, so that calls work as well as they can if one side is ignorant of these extensions.

8 Security Considerations

The allocation of addresses on the service provider NAT consumes resources. Requests for those resources therefore need to be authenticated and coupled with the application layer service offered by the provider. This is why we specify the use of
SIP authentication mechanisms for the reflector protocol.

Sample Router Configurations

The following are sample configuration files that can be used on a Cisco router in order to provide the NAT functions needed in Figure 3.
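The following sketch shows how Service Provider Router A, configured next, chooses an outgoing interface: a longest-prefix match over its static routes. It is a simplification. It ignores administrative distance and omits connected routes, which in this configuration point at the same interfaces as the static and default routes. It reads the mask on the 193.1.2.0 route as 255.255.255.0, and the function name is hypothetical. The sketch shows that the host route for 193.1.3.2 sends that traffic out e0, toward the NAPT router's inside interface. Other 193.1.3.x destinations follow the default route out s0.

```python
import ipaddress

# Static routes from the Service Provider Router A sample configuration,
# as (destination prefix, outgoing interface).
ROUTER_A_ROUTES = [
    ("193.1.2.0/24", "e0"),
    ("193.1.1.0/24", "e1"),
    ("193.1.3.2/32", "e0"),
    ("0.0.0.0/0", "s0"),
]


def egress_interface(destination: str, routes=ROUTER_A_ROUTES) -> str:
    """Choose the outgoing interface by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so matches is never empty here.
    best_net, best_iface = max(matches, key=lambda m: m[0].prefixlen)
    return best_iface
```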
Service Provider Router A sample configuration:

      int s0
       ip address 63.1.1.1 255.255.255.252
      int e0
       ip address 193.1.2.2 255.255.255.0
      int e1
       ip address 193.1.1.2 255.255.255.0
      ip route 193.1.2.0 255.255.255.0 e0
      ip route 193.1.1.0 255.255.255.0 e1
      ip route 193.1.3.2 255.255.255.255 e0
      ip route 0.0.0.0 0.0.0.0 s0

Service Provider NAPT router sample configuration:

      int e0
       ip nat inside
       ip address 193.1.2.1 255.255.255.0
      int e1
       ip nat outside
       ip address 193.1.1.1 255.255.255.0
      int e3
       ip address 193.1.3.1 255.255.255.0
      ip nat pool rtp 193.1.1.3 193.1.1.3 prefix 24
      ip nat inside source list 9 pool rtp overload
      ip nat outside source static udp list 9 193.1.3.2 6060
      access-list 9 permit any
      ip route 0.0.0.0 0.0.0.0 e0

Acknowledgements

We would like to thank Jeffrey Citron and John Butz from Vonage for their efforts at verifying UDP NAT capabilities in existing commercial products.

A Author's Addresses

   Jonathan Rosenberg
   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jdrosen@dynamicsoft.com

   Joel Weinberger
   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jweinberger@dynamicsoft.com

   Henning Schulzrinne
   Columbia University
   M/S 0401
   1214 Amsterdam Ave.
   New York, NY 10027-7003
   email: schulzrinne@cs.columbia.edu

B Bibliography

[1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP network address translator (NAT)," Internet Draft, Internet Engineering Task Force, Oct. 2000. Work in progress.

[2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through firewalls and NATs," Internet Draft, Internet Engineering Task Force, Feb. 2000. Work in progress.

[3] P. Srisuresh, J. Kuthan, and J. Rosenberg, "Middlebox communication architecture and framework," Internet Draft, Internet Engineering Task Force, Feb. 2001. Work in progress.
[4] C. Huitema, "Short term NAT requirements for UDP based peer-to-peer applications," Internet Draft, Internet Engineering Task Force, Feb. 2001. Work in progress.

[5] D. Yon, "TCP-Based media transport in SDP," Internet Draft, Internet Engineering Task Force, Nov. 2000. Work in progress.