< draft-rosenberg-sip-entfw-01.txt   draft-rosenberg-sip-entfw-02.txt >
Internet Engineering Task Force SIP WG Internet Engineering Task Force SIP WG
Internet Draft J.Rosenberg,H.Schulzrinne Internet Draft J.Rosenberg,J.Weinberger,H.Schulzrinne
draft-rosenberg-sip-entfw-01.txt dynamicsoft,Columbia U. draft-rosenberg-sip-entfw-02.txt dynamicsoft,Columbia U.
March 2, 2001 July 20, 2001
Expires: September, 2001 Expires: February 2002
SIP Traversal through Residential and Enterprise NATs and Firewalls NAT Friendly SIP
STATUS OF THIS MEMO STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet- Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as work in progress. material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
In this draft, we discuss how SIP can traverse enterprise and In this draft, we discuss how SIP can traverse enterprise and
residential firewalls and NATs. This environment is challenging residential NATs. This environment is challenging because we assume
because we assume here that the end user has little or no control here that the end user or SIP provider has no control over the NAT,
over the firewall or NAT, and that the firewall or NAT is completely and that the NAT is completely ignorant of SIP. Our approach is to
ignorant of SIP. Despite this, our solutions for the NAT case are make SIP "NAT friendly", with a few minor, backwards compatible
very workable and suffer few disadvantages. extensions. These extensions allow UDP and TCP-based SIP to traverse
NATs. We also handle RTP traversal using a combination of symmetric
(aka connection-oriented) RTP and a new NAT detection and binding
discovery mechanism. The result of the approach is that direct
UDP-based RTP is used whenever provably possible in any given NAT
configuration. We use a network intermediary - in our case, an
off-the-shelf router - to handle the case when both caller and called
party are behind symmetric NATs. Our approach for binding discovery
is effectively a pre-midcom solution that allows binding allocations
by talking to a server behind the NAT, rather than talking to the NAT
directly.
1 Introduction 1 Introduction
The problem of getting applications through firewalls and NATs has The problem of getting applications through NATs has received a lot
received a lot of attention [1]. Getting SIP through firewalls and of attention [1]. Getting SIP through NATs is particularly trouble-
NATs is particularly troublesome. In a previous draft [2] we some. In a previous draft [2] we discussed some of the general issues
discussed some of the general issues regarding traversal of regarding traversal of firewalls, and discussed some solutions for
firewalls, and discussed some solutions for it. Our solutions were it. Our solutions were based on having a proxy server control the
based on having a proxy server control the firewall/NAT with a firewall/NAT with a control protocol of some sort [3]. This protocol
control protocol of some sort [3]. This protocol can open and close can open and close pinholes in the firewall, and/or obtain NAT
pinholes in the firewall, and/or obtain NAT address bindings to use address bindings to use in rewriting the SDP in a SIP message.
in rewriting the SDP in a SIP message.
The use of a control protocol in the midcom architecture is ideal for The use of a control protocol in the midcom architecture is ideal for
carriers, but it does not work when the SIP service provider is not carriers, but it does not work when the SIP service provider is not
the same as the ISP and transport provider of the end user. This is the same as the ISP and transport provider of the end user. This is
frequently the case for users behind enterprise firewalls and NATs frequently the case for users behind enterprise NATs who are
who are trying to access SIP services outside of their networks. The trying to access SIP services outside of their networks. The same
same happens for residential NATs and firewalls. These devices are happens for residential NATs. These devices are often used by consumers
often used by consumers who have cable modem and DSL connections, and who have cable modem and DSL connections, and wish to connect
wish to connect multiple computers using the single address provided multiple computers using the single address provided by the cable company
by the cable company or DSL company. [1] Residential firewalls and or DSL company. [1] Residential NATs are often referred to as cable/DSL routers, and are
NATs are often referred to as cable/DSL routers, and are manufactured manufactured by companies like Linksys, Netopia, and Netgear.
by companies like Linksys, Netopia, and Netgear.
Ultimately, it is our belief and hope that NATs will disappear with Ultimately, it is our belief and hope that NATs will disappear with
the deployment of IPv6. However, that is not likely to happen for the deployment of IPv6. However, that is not likely to happen for
some time. some time.
Given the existence of NATs, one way to handle SIP is to embed a SIP Given the existence of NATs, one way to handle SIP is to embed a SIP
ALG within enterprise NATs and firewalls. However, this has not ALG within enterprise NATs. However, this has not happened. The top
happened. The top commercial firewall and NAT products continue to be commercial NAT products continue to be SIP-unaware. Even if SIP ALG
SIP-unaware. Even if SIP ALG support were added tomorrow, there is support were added tomorrow, there is still a huge installed base of
still a huge installed base of firewalls and NATs that do not NATs that do not understand SIP. As a result, there is going to be a
understand SIP. As a result, there is going to be a long period of long period of time during which users will be behind NATs that are
time during which users will be behind firewalls or NATs that are ignorant of SIP, probably at least two to three years. The SIP
ignorant of SIP, probably at least two to three years. The SIP community cannot wait for ubiquitous deployment of SIP aware NATs.
community cannot wait for ubiquitous deployment of SIP aware Interim solutions are needed NOW to enable SIP services to be
firewalls and NATs. Interim solutions are needed NOW to enable SIP delivered to users behind these devices.
services to be delivered to users behind these devices.
In this draft, we propose solutions for getting SIP through
enterprise and residential NATs and firewalls that do not require
changes to these devices or to their configurations. NATs and
firewalls are a reality, and SIP deployment is being hampered by the
lack of support for SIP ALGs in these boxes. A solution MUST be
found, and we provide one here.
2 Architecture
In this draft, we propose solutions for getting SIP through
enterprise and residential NATs that do not require changes to these
devices or to their configurations. NATs are a reality, and SIP
_________________________ _________________________
[1] The author of this draft is amongst those who [1] The author of this draft is amongst those who
have such a residential NAT, and thus feels highly have such a residential NAT, and thus feels highly
motivated to solve this particular problem motivated to solve this particular problem
We assume that the network architecture we are dealing with looks deployment is being hampered by the lack of support for SIP ALGs in
like Figure 1. The caller is a UA in enterprise or residence A, and these boxes. A solution MUST be found, and we provide one here.
the called party is a UA in enterprise or residence B. The caller
uses proxy X as its local outbound proxy, which forwards the call to
the proxy of the called party, Y, also outside of the firewall or
NAT. The call is then forwarded to the called party within enterprise
or residence B.
The firewall and/or NAT (FW/NAT) boxes are off-the-shelf boxes with 2 Some Philosophy
no support for SIP ALG. We consider NAT and firewall separately. For
NATs, we consider specifically a class of devices referred to as
residential NATs.
Residential NATs are typically placed in the home, and allow multiple Our solution centers on the principle that applications, including
devices to make use of a single IP address provided by a cable or DSL components within network servers and end systems, need to take an
provider. The devices generally disallow incoming traffic, but allow active role in nat traversal.
outbound TCP and UDP connections. Based on the terminology defined in
RFC 2663 [4], residential NATs are Network Address Port Translators
(NAPT). Once a connection is established outwards, data on the same
connection is allowed inwards from the remote peer. This is true for
UDP as well. Specifically, if a user sends UDP packets from local IP
address and port pair A,B to remote IP address and port pair C,D,
they are natted to have a source address of X,Y. Packets sent from
C,D to X,Y have their destination address natted to A,B, and are
delivered back to the host behind the NAT. The ability to NAT UDP
packets in this way is critical to our solutions. We have verified
this feature on the leading residential NAT products.
Many small offices and home offices (SOHO) also use these devices to This is counter to much of the existing work in nat traversal, which
allow their business to connect to the Internet over cable or DSL. focuses on construction of ALGs embedded within NATs to make the
Because the device is configured identically in this case, we lump it existence of nats totally transparent to end systems and application
with the residential NAT. layer network servers. The midcom efforts [3] have taken a step for-
ward by recognizing that applications (either within end systems or
network servers) are best suited to take a role in controlling NAT
behavior. We believe that this approach needs to be taken one step
farther, in that applications, especially those with components in
end systems, need to adapt to the existence of non-midcom enabled
NATs as well. In fact, we believe that the application of the end-
to-end principle in this case argues in favor of our approach.
Enterprise firewalls are used in larger enterprises. They are The end-to-end principle argues that:
typically configured with much tighter security. We assume the worst
case scenario, which is that these boxes will allow users inside
their enterprises to browse the web, and specifically, to browse
secure web sites. UDP, both inbound and outbound, is disallowed. TCP
inbound is disallowed. Outbound TCP from any host within the
enterprise is allowed out only to port 80 and 443. Our assumption is
that these devices are not running NAT.
Handling enterprise devices that are both firewalls and NAPT involves The function in question can completely and correctly be
combing the solutions for both cases. Wherever appropriate, we implemented only with the knowledge and help of the appli-
discuss any issues specific to combining the two. cation standing at the end points of the communication sys-
tem. Therefore, providing that questioned function as a
feature of the communication system itself is not possible.
In general, getting SIP services to function behind these devices It is clear that the end-to-end principle would argue against the
existence of NATs in the first place. However, their existence is a
matter of reality. In order to properly engineer future protocols and
applications, we are forced to take their existence as a given, and
then investigate how our network design principles provide guidance
on how to deal with them.
So, given that NATs exist, the end-to-end principle would tell us
that only the applications can know what the impact of NAT will be on
the functioning of the application. Since the end system is the one
invoking the application, it is often best suited to determine how to
deal with it. The overall system is much simpler and more robust when the
application in the end systems takes active participation in dealing
with NAT.
Another way to view it is from the perspective of application adapta-
tion. It has been a common design principle in real time applications
for the end systems to adapt to the network conditions. Networks
might provide best effort, some level of QoS, or be overprovisioned
for real time media. Rather than force the network to always deliver
a specific level of quality, the applications detect the network
conditions, and adapt to whatever they find. The result is robust
applications and an overall simpler architecture.
We are arguing that this principle still makes sense when extended to
other IP network "characteristics", including the presence of NAT.
The existence of NAT, and the type of function it provides, are
another axis in the overall space of IP network service. Applications
will be the most robust and will perform best when they detect what
level of network service (including QoS and NAT) is being provided,
and then adapt to it in an optimal fashion. Just as QoS varies, so
too do the types of NATs vary. By detecting what type of NAT is
present, an end system can figure out how to achieve the best level
of service given the existence of that NAT.
This approach means that applications can handle cases where there
are ALGs (which still makes sense in many scenarios), application-
unaware NATs, or what have you. When NATs disappear entirely, the
applications will continue to function, and their performance will
improve, in fact.
3 Overview of the Approach
Our approach consists of several pieces that are put together for a
complete solution. The first is a set of SIP extensions that allow
just SIP (but not necessarily the sessions it establishes) to
traverse NATs. Our extensions are relatively minor, backwards compa-
tible, and allow NAT traversal for UDP and TCP transports. These
extensions to SIP are described in Section 4.
Providing traversal for the media streams is more complex. The first
step in the process is to allow end systems to detect whether there
is a NAT between them and their SIP provider, and furthermore, to
detect what type of treatment the NAT affords to UDP. We define a
simple protocol which enables that to happen. Once the NAT type is
detected, our protocol allows the end system to detect what its pub-
lic facing address is on the other side of the NAT. We also discuss a
router configuration which allows outside entities to send packets to
this public address even under the strictest of NAT behaviors (which
we call a symmetric NAT). These protocol mechanisms are discussed in
Section 5.
Unfortunately, the mechanism of Section 5 requires an intermediate
RTP relay (which is implemented using another NAT in our proposal)
when the user is behind a symmetric NAT. To fix that problem, we
define symmetric RTP, which is a new RTP usage scenario. It
effectively provides connection-oriented RTP over UDP. It is com-
pletely backwards compatible, and can avoid the need for an intermed-
iary so long as one side in the call is not behind a symmetric NAT.
Symmetric RTP, and the SDP extensions required to support it, are
described in Section 6.
Finally, in Section 7, we put it all together, and show the various
call flows that would exist for a variety of different
configurations. The end result of our mechanisms is that end-to-end UDP media
transport, directly between the two parties in a call leg, is always
provided so long as it is provably possible. Only in the cases where
it is provably impossible for direct media connectivity do we use an
intermediary in the service provider domain.
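As an illustration only, the rule stated above (an intermediary is needed only when both parties are behind symmetric NATs) can be written as a one-line predicate. The function name and the use of None for "no NAT" are assumptions made for this sketch; "sym" and "cone" are the NAT type labels used for the nat parameter later in this draft.

```python
from typing import Optional


def needs_relay(caller_nat: Optional[str], callee_nat: Optional[str]) -> bool:
    """Return True only when both sides are behind a symmetric NAT.

    Each argument is None (no NAT), "cone", or "sym". This is a
    hypothetical helper, not something defined by the draft.
    """
    return caller_nat == "sym" and callee_nat == "sym"
```

In every other combination, at least one side is not behind a symmetric NAT, so direct UDP media is assumed to be achievable.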
The overall architecture we assume for the discussion is shown in
Figure 1.
The caller is a UA in enterprise or residence A, and the called party
is a UA in enterprise or residence B. The caller uses proxy X as its
local outbound proxy, which forwards the call to the proxy of the
called party, Y, also outside of the firewall or NAT. The call is
then forwarded to the called party within enterprise or residence B.
4 SIP Extensions for NAT Traversal
This section discusses extensions to SIP that allow SIP itself to
traverse NATs. There are two primary extensions - via ports and the
contact cookie.
4.1 Via Ports
The first problem with SIP traversal through NATs is sending a
request from a client behind a NAT to a server on the outside.
SIP specifies that for UDP, the response is sent to the port number
in the Via header and the IP address the request came from. However,
due to NAT, the port number in the Via header will be wrong. This
means that the response will not be sent to the proper location. How-
ever, with TCP, responses are sent over the connection the INVITE
arrived on. This means that a response sent over the TCP connection
will be received properly by a caller behind a NAT. Therefore, one
solution for traversal of requests from inside to outside is to use
persistent TCP connections. However, many VoIP endpoints do not sup-
port TCP, so a UDP based solution is desirable.
Our approach is to define a new Via header parameter, called the
response port, encoded as "rport". This parameter is inserted by
+-------+ +-------+ +-------+ +-------+
| SIP | | SIP | | SIP | | SIP |
| Proxy | | Proxy | | Proxy | | Proxy |
| X | | Y | | X | | Y |
| | | | | | | |
+-------+ +-------+ +-------+ +-------+
+-------+ +-------+ +-------+ +-------+
........|FW/NAT |............ ........|FW/NAT |............ ........|FW/NAT |............ ........|FW/NAT |............
. | | . . | | . . | | . . | | .
. +-------+ . . +-------+ . . +-------+ . . +-------+ .
. | SIP UA| . . | SIP UA| . . | SIP UA| . . | SIP UA| .
. | Joe | . . | Bob | . . | Joe | . . | Bob | .
. +-------+ . . +-------+ . . +-------+ . . +-------+ .
............................. ............................. ............................. .............................
Enterprise or Enterprise or Enterprise or Enterprise or
Residence A Residence B Residence A Residence B
Figure 1: Network Architecture Figure 1: Network Architecture
requires resolution of several problems: clients (which can be proxies or UACs) when they wish for the
response to be sent to the IP address and port the request was sent
Originating Requests: Getting SIP requests from the caller, Joe, from. The parameter is inserted with no value to flag this feature.
to proxy X, and responses from proxy X back to the Joe. When received at a server, the server inserts the port the request
was received from as the value of this parameter. That port is used
to forward the response.
Receiving Requests: Getting SIP requests from proxy Y to the response-port = "rport" [ "=" 1*DIGIT ]
called party, Bob, and responses from Bob back to proxy Y.
Handling RTP: Getting media to go from Joe to Bob and Bob to A client inserting the rport into the Via header MUST wait for
Joe. responses on the socket the request is sent on, and MUST also list,
in the sent-by field, the local port of that socket the request was
sent from. The latter is mandatory for backwards compatibility.
We discuss solutions for each in turn. Consider an example. A client sends an INVITE which looks like:
3 Originating requests INVITE sip:user@domain SIP/2.0
Via: SIP/2.0/UDP 10.1.1.1:4540;rport
The first problem is originating requests from the caller through a This INVITE is sent with a source port of 4540 and source IP of
firewall/NAT, out to a proxy, and getting the responses from this 10.1.1.1. The request is natted, so that the source IP appears as
proxy back to the caller. 68.44.20.1 and the source port as 9988. This is received at a proxy.
The proxy forwards the request, but not before appending a value to
the rport parameter in the proxied request:
3.1 NAT INVITE sip:user@domain2 SIP/2.0
Via: SIP/2.0/UDP proxy.domain.com
Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988
The residential NAT will allow both outgoing UDP and TCP traffic to This request generates a response, which arrives at the proxy:
port 5060. This means that there are no problems in generating an
outbound INVITE. However, there are issues with the response.
SIP specifies that for UDP, the response is sent to the port number SIP/2.0 200 OK
in the Via header and the IP address the request came from. However, Via: SIP/2.0/UDP proxy.domain.com
due to NAT, the port number in the Via header will be wrong. This Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988
means that the response will not be sent to the proper location.
However, with TCP, responses are sent over the connection the INVITE
arrived on. This means that a response sent over the TCP connection
will be received properly by a caller behind a NAT.
The simplest solution, therefore, is for the caller to use a TCP The proxy strips its top Via, and then examines the next one. It
connection to send the INVITE, and receive the response. We recommend contains both a received param, and an rport. The result is that the
that this connection be kept open permanently, to avoid the need to following response is sent to IP address 68.44.20.1, port 9988:
establish it for new calls. A persistent connection is also needed
for incoming calls in any case (see Section 4). For devices which do
not support TCP, UDP may be used. However, the proxy needs to be able
to send the UDP response to the address *and* port the request
arrived on. This is not standardized behavior, but could potentially
be configured for requests from users that are known to be behind
residential NATs.
In order for this connection to be used for re-INVITEs or BYEs, the SIP/2.0 200 OK
proxy needs to record route. Via: SIP/2.0/UDP 10.1.1.1:4540;received=68.44.20.1;rport=9988
3.2 Firewall The NAT rewrites the destination address of this packet back to IP
We assume the firewall (FW) blocks all outgoing UDP, but will allow 10.1.1.1, port 4540, and the packet is received by the client.
some outgoing TCP. In the worst case, it will only allow outgoing
HTTP traffic on 80, and HTTPS on 443. HTTPS is nothing more than HTTP
over TLS/SSL [5]. What's interesting about https is that the
connection starts out with TLS, negotiates a secure channel, and then
runs HTTP over this channel. All HTTP messages are encrypted. The FW
never sees any HTTP messages in the clear, only TLS/SSL messages. The
important implication is that there is no way for a FW to have
application layer intelligence that depends on the existence of HTTP
on port 443. In fact, any protocol can be run over TLS on port 443,
and it will look the same to the FW. Since we assume that the FW lets
HTTPS through, it should allow SIP over TLS through, running on port
443.
Thus, our proposal is to have the caller, Joe, initiate a TLS This works fine when the server supports this extension, so long as
connection on port 443 to the proxy server X. Once the TLS connection there are no nats between the client and server. Consider a server
is secured, the client can send SIP messages over this connection. that does not understand it. In this case, it will ignore the rport
Handling of SIP over TLS/SSL is identical to TCP. Responses from the parameter, and send the following response to IP 10.1.1.1, port 4540:
proxy are sent over this connection as well [6]. We recommend that
the client maintain the TLS connection to be open (more on this in
Section 4). This avoids the need to re-initiate the TLS connection
for every outgoing call.
Fooling the FW into believing the traffic is HTTPS by running it over SIP/2.0 200 OK
port 443 is not nice. We would strongly recommend that clients first Via: SIP/2.0/UDP 10.1.1.1:4540;rport
try the IANA registered port for SIP over TLS, port 5061. If no
response is received over this connection, the client should then try
443.
Note that outgoing requests may work with just vanilla TCP. However, As specified by SIP, this response is sent to the source IP of the
we have observed that some firewalls examine TCP connections to look request, and the port in the Via header. Since the client is listen-
for specific protocols. Thus, SIP over TCP on 5060 may not work. SIP ing on 4540, the response is received correctly.
over TCP on port 80 may also not work, as some firewalls check for
HTTP messages. This is why we prefer TLS; we believe that it is most
likely to work.
In order for this connection to be used for re-INVITEs or BYEs, the In the case where the server does not support the extension, but
proxy needs to record route. there is a nat between the client and the server, the response is
sent to the source IP and port in the Via, which will be dropped by
the nat. This is the same behavior exhibited by SIP today. As a
result, our extension is backwards compatible, in the sense that it
always works at least as well as baseline SIP. When both sides sup-
port it, and there is a nat in the middle, traversal works correctly.
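The rport handling walked through above can be sketched in a few lines of Python. This is a minimal illustration, not a normative implementation: the function names stamp_via and response_target are hypothetical, the parsing handles only the simple Via forms shown in the examples, and adding the received parameter when the sent-by host differs from the source IP is assumed from the example rather than from any rule stated in this section.

```python
def stamp_via(via_value: str, src_ip: str, src_port: int) -> str:
    """Server side: fill in the empty rport flag with the observed source port."""
    transport, rest = via_value.split(None, 1)
    parts = rest.split(";")
    sent_by, params = parts[0], parts[1:]
    if "rport" not in params:
        # Client did not ask for the extension: leave the Via untouched.
        return via_value
    stamped = []
    if sent_by.split(":")[0] != src_ip:
        # Assumption: record the real source IP, as in the example above.
        stamped.append(f"received={src_ip}")
    for p in params:
        stamped.append(f"rport={src_port}" if p == "rport" else p)
    return transport + " " + ";".join([sent_by] + stamped)


def response_target(via_value: str) -> tuple:
    """Pick the (ip, port) a response is sent to, preferring received/rport."""
    _, rest = via_value.split(None, 1)
    parts = rest.split(";")
    params = dict(p.split("=", 1) if "=" in p else (p, "") for p in parts[1:])
    host, _, port = parts[0].partition(":")
    ip = params.get("received", host)
    if params.get("rport"):
        port = params["rport"]
    return ip, int(port or 5060)
```

With the values from the example, stamping "SIP/2.0/UDP 10.1.1.1:4540;rport" for a packet from 68.44.20.1 port 9988 gives the Via shown in the proxied INVITE, and response_target on that Via selects 68.44.20.1 port 9988. An unstamped Via falls back to the sent-by host and port, which corresponds to the case with no NAT in the path.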
4 Receiving requests For the response to always be received, the NAT binding must remain
in existence for the duration of the transaction. Most UDP NAT bind-
ings appear to have a timeout of one minute. Therefore, non-INVITE
transactions will have no problem. For INVITE transactions, the
client may need to retransmit its INVITE every 20 seconds or so, even
after receiving a provisional response, in order to keep the binding
open to receive the final response.
Unfortunately, receiving requests is not as simple as sending them. Because of the increased network traffic generated to keep the UDP
We consider first the NAT case, and then the firewall case. bindings active, it is RECOMMENDED that TCP be used instead, as it
generates much less data.
4.1 NAT 4.2 Contact Translation
The received port parameter will allow requests initiated from inside
the NAT (and their responses), to work. However, getting requests
from a proxy outside the NAT, to a host inside, is a different story.
The problem has to do with registrations. In Figure 1, the callee, The problem has to do with registrations. In Figure 1, the callee,
Bob, will receive requests at their UA because they had previously Bob, will receive requests at their UA because they had previously
sent a REGISTER request to their registrar, which is co-located with sent a REGISTER request to their registrar, which is co-located with
proxy Y. This registration contains a Contact header which lists the proxy Y. This registration contains a Contact header which lists the
address where the incoming requests should be sent to. However, in address where the incoming requests should be sent to. However, in
the case of NAT, this address will be wrong. It will contain a domain the case of NAT, this address will be wrong. It will contain a domain
name or IP address that is within the private space of enterprise B. name or IP address that is within the private space of enterprise B.
Thus, the REGISTER might look like: Thus, the REGISTER might look like:
REGISTER sip:Y.com SIP/2.0 REGISTER sip:Y.com SIP/2.0
From: sip:bob@Y.com From: sip:bob@Y.com
To: sip:bob@Y.com To: sip:bob@Y.com
Contact: sip:bob@10.0.1.100 Contact: sip:bob@10.0.1.100
This address is not reachable by the proxy. This address is not reachable by the proxy.
To solve this problem, we need two things. First, we need a To solve this problem, we need two things. First, we need a per-
persistent connection to be established from Bob to Y. Secondly, we sistent "connection" to be established from Bob to Y. Secondly, we
need a way for incoming requests destined for B to be routed over need a way for incoming requests destined for B to be routed over
this connection. this connection.
To address this first problem, we recommend that clients that send To address this first problem, clients have to send REGISTER requests
REGISTER requests do so over a TCP or TLS connection, as described in over a TCP or TLS connection, or use UDP along with the response port
Section 3. Furthermore, they keep this connection open permanently. parameter in the Via header. If TCP is used, this connection is kept
REGISTER refreshes are sent over this connection. We further open indefinitely. We further recommend that the proxy/registrar hold
recommend that the proxy/registrar hold this connection in a table, this connection in a table, where the table is indexed by the remote
where the table is indexed by the remote side of the transport side of the transport connection. For UDP, the client holds on to the
connection. When the proxy wishes to send a packet to some server at socket, and uses it for REGISTER refreshes and to receive incoming
IP address M, port N, transport O, it looks up the tuple (M,N,O) in calls. The server also holds on to the "connection". In the case of
the table to see if a connection already exists, and then uses it. UDP, that means that server stores the local IP/port that the request
was received on, and indexes it by the source IP and port the request
was sent from. When the proxy wishes to send a packet to some server
at IP address M, port N, transport O, it looks up the tuple (M,N,O)
in the table to see if a connection already exists, and then uses it.
The NAT bindings are kept fresh through REGISTER refreshes (see Sec-
tion 4.2.1).
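The table described above can be pictured as a dictionary keyed by the remote (M,N,O) tuple. The sketch below is illustrative only: the class name, method names, and the shape of the stored handle are assumptions. For TCP or TLS the handle would be the open connection; for UDP it would be the local IP and port the request was received on.

```python
from typing import Dict, Optional, Tuple

RemoteKey = Tuple[str, int, str]  # (remote IP M, remote port N, transport O)


class ConnectionTable:
    """Hypothetical proxy/registrar table of reusable 'connections'."""

    def __init__(self) -> None:
        self._table: Dict[RemoteKey, object] = {}

    def remember(self, ip: str, port: int, transport: str, handle: object) -> None:
        # Index by the remote side as seen by the server (after NAT).
        self._table[(ip, port, transport.upper())] = handle

    def lookup(self, ip: str, port: int, transport: str) -> Optional[object]:
        # Return the stored handle if one exists, else None.
        return self._table.get((ip, port, transport.upper()))
```

A server that receives a REGISTER over UDP from 68.44.20.1 port 9988 on its local address could call remember("68.44.20.1", 9988, "udp", local_addr) and later use lookup with the same tuple to send from the same local socket, so that the packet matches the NAT binding.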
Now, a connection is available for contacting the user. However, this Now, a connection is available for contacting the user. However, this
connection must be associated with sip:bob@Y.com. Unfortunately, it connection must be associated with sip:bob@Y.com. Unfortunately, it
is not. Calls for sip:bob@Y.com are translated to sip:bob@10.0.1.100, is not. Calls for sip:bob@Y.com are translated to sip:bob@10.0.1.100,
which does not correspond to the remote side connection used to send which does not correspond to the remote side connection used to send
the register, as seen by the proxy. That's because of NAT, which will the register, as seen by the proxy. That's because of NAT, which will
make the remote side appear to be a publicly routable address. make the remote side appear to be a publicly routable address.
To handle this problem, the proxy could, in principle, record the IP To handle this problem, the proxy could, in principle, record the IP
address and port from the remote side of the connection used to send address and port from the remote side of the connection used to send
a REGISTER. Then, it can create a Contact entry of the form a REGISTER. Then, it can create a Contact entry of the form
sip:bob@[ip-addr]:[port], where [ip-addr] and [port] are the IP sip:bob@[ip-addr]:[port], where [ip-addr] and [port] are the IP
address and port of the remote side of the connection. However, this address and port of the remote side of the connection. However, this
is assuming that the registration is for the purposes of connecting is assuming that the registration is for the purposes of connecting
the address in the To field with the machine the connection is coming the address in the To field with the machine the connection is coming
from. That may not be the intent of the registration. The from. That may not be the intent of the registration. The registra-
registration may be used to set up a call forwarding service, for tion may be used to set up a call forwarding service, for example.
example.
As a result, it is our proposal that clients be allowed to explicitly As a result, it is our proposal that clients be allowed to explicitly
ask a proxy to create a Contact entry corresponding to the machine a ask a proxy to create a Contact entry corresponding to the machine a
REGISTER is sent from. We propose that a specific contact hostname REGISTER is sent from. To do that, the UA inserts a Translate header
value be reserved to have the meaning "I don't know what my address into the request. This header contains the URL (which MUST be one of
is, please use the IP address, port and transport from the connection the Contact URLs) that is to be translated, along with a parameter
over which this REGISTER was delivered". We propose that this host that indicates the type of NAT the client is behind.
name be "jibufobutbmpu". This name is "I hate NATS a lot" with each
letter incremented by one. This name is unlikely to be used in real translate-header = ``Translate'' ``:'' SIP-URL [``;'' ``nat'' ``=''
systems (as opposed to something like "default", which could be real nat-types]
host name). nat-types = ``sym'' | ``cone''
If a server receives a REGISTER request with a translate header, it
finds the matching Contact header, and replaces the host value with
the source IP address of the REGISTER, and the port value with the
source port of the REGISTER. This is the actual Contact stored in the
registration database, and returned to the client in the response.
The nat-type parameter is an optional parameter that tells the regis-
trar what type of NAT the client is behind. This information is very
helpful for some fault tolerance and scalability scenarios, described
below. Section 5 discusses how a client can determine what type of
NAT it is behind.
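
The Contact rewriting performed by the registrar can be sketched as below. This is only an illustration under simplifying assumptions: the function name translate_contacts is hypothetical, and the regular expression handles only plain sip: URLs of the form used in this document, not the full SIP-URL grammar.

```python
import re
from typing import List

# Simplified SIP URL pattern (an assumption; not the full grammar).
SIP_URL = re.compile(
    r"^sip:(?:(?P<user>[^@;]+)@)?(?P<host>[^:;]+)(?::(?P<port>\d+))?(?P<params>;.*)?$"
)


def translate_contacts(contacts: List[str], translate_url: str,
                       src_ip: str, src_port: int) -> List[str]:
    """Replace host and port of the Contact named in Translate with the
    source address and port the REGISTER was received from."""
    if translate_url not in contacts:
        # The Translate URL MUST be one of the Contact URLs.
        raise ValueError("Translate URL does not match any Contact")
    match = SIP_URL.match(translate_url)
    if match is None:
        raise ValueError("unsupported URL form")
    user = match.group("user")
    user_part = f"{user}@" if user else ""
    params = match.group("params") or ""
    rewritten = f"sip:{user_part}{src_ip}:{src_port}{params}"
    return [rewritten if c == translate_url else c for c in contacts]
```

The returned list is what would be stored in the registration database and echoed in the response.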
Consider once more the architecture of Figure 1. The callee has an IP Consider once more the architecture of Figure 1. The callee has an IP
address of 10.0.1.100. It initiates a TCP connection to port 5060 on address of 10.0.1.100. It sends a REGISTER from port 2234 to port
the proxy. This connection goes through the NAT, and the source 5060 on the proxy. This connection goes through the NAT, and the
address is rewritten to 77.2.3.88, and the port to 2937. The source address is rewritten to 77.2.3.88, and the source port to
registration looks like: 2937. The registration looks like:
REGISTER sip:Y.com SIP/2.0 REGISTER sip:Y.com SIP/2.0
From: sip:bob@Y.com From: sip:bob@Y.com
To: sip:bob@Y.com To: sip:bob@Y.com
Contact: sip:bob@jibufobutbmpu Via: SIP/2.0/UDP 10.0.1.100;rport
Translate: sip:bob@10.0.1.100:2234
Contact: sip:bob@10.0.1.100:2234
The proxy Y then stores the incoming TCP connection into a table: The proxy Y then stores the socket the request was received on into a
table, indexed by the source port:
(77.2.3.88,2397,TCP) -> [reference to TCP connection] (77.2.3.88,2397,UDP) -> [reference to UDP socket]
It also updates the contact list for sip:bob@Y.com to include the URL It also translates the Contact header to sip:bob@77.2.3.88:2397, and
sip:bob@77.2.3.88:2937;transport=tcp. stores that in the registration database. It then responds to the
REGISTER:
SIP/2.0 200 OK
From: sip:bob@Y.com
To: sip:bob@Y.com
Via: SIP/2.0/UDP 10.0.1.100;rport=2397;received=77.2.3.88
Contact: sip:bob@77.2.3.88:2397
This response is sent to 77.2.3.88:2397 because of the rport. The NAT
translates this to 10.0.1.100:2234, which is then received by the
client.
Now, when an INVITE arrives for sip:b@Y.com, it is looked up in the Now, when an INVITE arrives for sip:b@Y.com, it is looked up in the
registration database. The contact is extracted, and the proxy tries registration database. The contact is extracted, and the proxy tries
to send the request to that address. To do so, it checks its to send the request to that address. To do so, it checks its connec-
connection table to an open connection to the IP address, port and tion table to an open connection to the IP address, port and tran-
transport where the request is destined. In this case, such a sport where the request is destined. In this case, such a connection
connection is available, and the request is forwarded over it. The is available, and the request is forwarded over it. Because it is
response from the callee is also routed over the same connection. over a connection with an existing NAT binding, it is properly routed
through the NAT. The response from the callee is also routed over the
same connection.
In order for this connection to be used for re-INVITEs or BYEs, the In order for this connection to be used for re-INVITEs or BYEs, the
proxy needs to record route. proxy needs to record route.
4.2 Firewalls 4.2.1 Refresh Interval
Since the connection used for the registrations is held persistently
in order to receive incoming calls, the NAT binding must be main-
tained. To avoid timeout, data must traverse the NAT over that con-
nection with some minimum period. When UDP is used, registrations
will need to be refreshed at least once every minute. The clients
SHOULD include an Expires header or parameter with this value. For
TCP, a longer interval can be used. 10 minutes is RECOMMENDED.
The situation is somewhat simpler for the case of firewalls. We still To test whether the interval is short enough, proxy servers MAY
need to have a persistent connection established from Bob out to the attempt to send OPTIONS requests to the client shortly before the
proxy, possibly using TLS over port 443. A registration is then sent registration expires. If the OPTIONS request generates no response
over this address, which will look like: at all, the server SHOULD lower the value of the Expires header in
the next registration. Servers SHOULD cache and reuse the largest
successful refresh interval that they discover for a given Contact
value.
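
One way a server might track refresh intervals is sketched below. The 60 second and 600 second values come from the text above; the class RefreshIntervalCache, its method names, and the choice to halve the interval after a failed probe are assumptions made for illustration, since the text does not say by how much the value should be lowered.

```python
from typing import Dict

UDP_REFRESH_SECONDS = 60    # "at least once every minute"
TCP_REFRESH_SECONDS = 600   # 10 minutes is RECOMMENDED


def initial_expires(transport: str) -> int:
    """Expires value a client would start with for a given transport."""
    return UDP_REFRESH_SECONDS if transport.upper() == "UDP" else TCP_REFRESH_SECONDS


class RefreshIntervalCache:
    """Hypothetical per-Contact cache of the largest interval that worked."""

    def __init__(self) -> None:
        self._best: Dict[str, int] = {}

    def record_probe(self, contact: str, interval: int, options_answered: bool) -> int:
        """Record an OPTIONS probe result; return the interval to use next."""
        if options_answered:
            self._best[contact] = max(interval, self._best.get(contact, 0))
            return interval
        best = self._best.get(contact)
        if best is not None and best < interval:
            return best                 # fall back to a known-good value
        return max(1, interval // 2)    # assumption: halve when nothing is known
```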
REGISTER sip:Y.com SIP/2.0 4.2.2 Routing to the Ingress Proxy
From: sip:bob@Y.com
To: sip:bob@Y.com
Contact: sip:bob@44.2.4.1;transport=tcp
For this to work, incoming calls for sip:bob@Y.com must be routed A complication arises when a domain supports multiple proxy servers.
over the connection established by Bob to proxy Y. We assume the Consider the scenario shown in Figure 2
proxy maintains persistent connections in a table, indexed by remote
address, port, and transport (as described above for NAT). In order
for this connection to be used when contacting Bob, Bob's contact
address must be the same as the connection address. This means that
the remote connection address, as seen by Y, has to be 44.2.4.1:5060.
However, there are several cases where it might not be.
In what cases would it not be? First off, the client might be multi- A user joe in domain.com is behind a NAT. In DNS, domain.com contains
homed. Multi-homed hosts are increasingly common as VPNs become more an SRV entry that points to three servers, 1.domain.com, 2.domain.com
pervasive. VPNs show up as virtual interfaces, making hosts and 3.domain.com. When the user registers, they will resolve
multihomed. The client may not be able to correctly guess which domain.com to one of these. Assume it is 1.domain.com. As a result of
interface the REGISTER will be sent on. If the client guesses this, the connection state is stored at proxy 1.
incorrectly, the IP address in the Contact header may be on a
different interface than the one used to send the registration. The
second case when the connection address and contact address don't
match is when the client incorrectly discovers its own IP address,
even when singly homed. We have observed this to frequently be the
case. In fact, we have seen some systems report back 127.0.0.1 (the
loopback address), in fact, as their IP address.
Thus, even without NAT, the Contact address may not match the source In the case of TCP, this connection state is important. Unless calls
address of the TLS or TCP connection used to register. In fact, this for joe@domain.com arrive to proxy 1, they won't be routable to the
problem has nothing to do with NATs or firewalls. We have observed it UA. In the case of UDP, whether it is important or not depends on the
happening in many real world scenarios. type of NAT the user is behind. One type of NAT, which we call "sym-
metric", treats UDP much like TCP. When A sends a request from inside
to B on the outside, UDP messages back to A must come from B, with a
source port equal to the destination port of messages from A to B. In
the other case, which we call "cone", which is described in [4], UDP
messages back to A can have any source port and IP address.
As a result, it is our recommendation that, as a general rule, If the user is behind a NAT that operates in cone mode, any of the
clients use the "Contact cookie" and a persistent connection in order proxies in the proxy farm will be able to reach the customer through
to ensure that they are reachable. This solution works for firewalls, the NAT. All will send requests to the public IP address and port
NATs, multi-homed hosts, singly homed hosts, and a variety of other binding created by the NAT, but with different source IP addresses
cases. and ports. Since source addressing doesn't matter, things work well.
In this case, the proxy need not even store connection state as
described in Section 4.
Storing incoming connections in a table for later reuse is useful If the user is behind a NAT that operates in symmetric mode, calls to
even between proxies. If TCP or TLS is used between proxies X and Y, the user must come in through the proxy that the user registered to.
that connection can be stored by both X and Y, and thus reused for
messaging in either direction. It is for this reason that we separate
the connection table management from the registration processing.
Such table management is needed if one of the proxies was on the
inside of the firewall, for example. In that case, responses and
requests in the reverse direction would need to be forwarded over the
connection initiated by the proxy.
5 Handling RTP --
// \\
/ \
| DB |
| |
\ /
\\ //
--
Dealing with SIP was the easy part. Getting the media through a NAT +-----+ +-----+ +-----+
or firewall is more complex. RTP is on dynamic ports, peer-to-peer, | | | | | | domain.com
and UDP, all of which are problematic for NATs, firewalls, or both. |Proxy| |Proxy| |Proxy|
| 1 | | 2 | | 3 |
+-----+ +-----+ +-----+
Our solution is to use connection oriented media, either UDP, TCP, or +-------------------------+
TLS, with the entities behind NATs or firewalls initiating the | NAT |
connection. This is discussed in more detail below. +-------------------------+
5.1 NATs +-----+
| |
|UA |
| |
+-----+
Figure 2: Multiple Proxy Configuration
In order to enable this, we recommend that the location server data-
base store not only the contact, but the proxy that the user con-
nected to. When a call comes in for that user, the proxy receiving
the INVITE looks up the user in the database. The database entry
indicates the proxy the user is connected to (call this the connected
proxy). If the connected proxy is not the proxy which received the
INVITE, the proxy that received the INVITE uses a route header to
force the call through the connected proxy. In the case where joe
registered at proxy1, and the incoming INVITE arrived at proxy 2, the
request sent by proxy 2 would look like:
INVITE sip:proxy1.domain.com SIP/2.0
Route: sip:joe@22.1.20.3:3038
This request will first go to proxy1, and from there, over the exist-
ing connection to joe.
The differing proxy behaviors for symmetric and cone NATs explain
the presence of the nat-type attribute in the Translate header.
Assuming the client can determine which type it is behind (using the
mechanisms described below), it can simply inform the proxy, allowing
it to take the proper action.
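
The forwarding decision for UDP registrations can be sketched as follows. The Registration record and the forward_invite function are hypothetical names; treating an absent nat-type as symmetric is a conservative assumption, not something the text mandates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Registration:
    contact: str               # translated Contact stored by the registrar
    connected_proxy: str       # proxy the user registered through
    nat_type: Optional[str]    # "sym", "cone", or None if not supplied


def forward_invite(reg: Registration, this_proxy: str) -> List[str]:
    """Return the request line (and Route header, if any) this proxy sends."""
    direct = reg.nat_type == "cone" or reg.connected_proxy == this_proxy
    if direct:
        # Cone NAT, or we hold the binding ourselves: send straight to the UA.
        return [f"INVITE {reg.contact} SIP/2.0"]
    # Symmetric (or unknown) NAT: force the call through the connected proxy.
    return [
        f"INVITE sip:{reg.connected_proxy} SIP/2.0",
        f"Route: {reg.contact}",
    ]
```

With joe registered at proxy1.domain.com behind a symmetric NAT, a call arriving at proxy 2 yields the two-line request shown earlier.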
4.2.3 INVITE Usage
The 200 OK response to the REGISTER request contains the SIP URL that
the registrar placed into the database. This address has the impor-
tant property that it is routable to the client from the proxy on the
public side of the NAT. As a result, the client needs to place this
URL as the Contact header in its INVITE requests and 2xx responses to
INVITE, so that it can be reached from the proxy on the outside.
5 RTP/RTCP NAPT Identification and Traversal
In this section, we provide a protocol and basic architecture that
allows a client to detect what type of NAT it is behind (cone or sym-
metric), and obtain the public address for an RTP stream.
The general idea is to make use of reflectors that return back to the
client the source IP address and port that a request came from. The
general configuration is shown in Figure 3. In this figure, the hosts
that wish to make or receive a call are behind enterprise or residen-
tial NATs. They are making use of a service provider that deploys,
along with its proxies, three different reflectors, along with a few
off-the-shelf routers configured in a specific fashion to act as a
media intermediary.
Reflector A is responsible for letting the user know whether they are
behind a symmetric NAT, and for providing the address of another
reflector (type C) which can be used to obtain an address binding on
a network intermediary.
Reflector B is used to let the user know whether they are behind a
cone NAT (one which allows packets back to a natted host from any
source port and IP, not just the one the outbound packet was sent
to). It MUST be on a different IP address and port than reflector A.
This is to deal with NATs which may allow packets back to an internal
address from the same IP the packet was sent to, but different port.
This kind of "partial-cone" NAT would be equivalent to a symmetrical
one for the purposes of RTP.
Reflector C is used to allow the user to determine an address binding
that is created on a NAT in the service provider domain. This NAT,
and the routers around it, are configured so that the user can
receive UDP packets through their enterprise NAT, even if it is a
symmetric NAT.
5.1 At initial power-up of Host A
When a client boots up, it first attempts to determine whether it is
behind a NAT, and if so, what type. The following procedure is used:
1. Host A sends initial probe (probe type one) to Reflector A
from its RTP and RTCP listener ports. Reflector A is the
same IP address as the proxy server configured for this
endpoint but an incremented port value (e.g. 5062). Reflector
A could be the same physical device as the proxy server
or on a separate host reached by a static address translation.
2. Reflector A responds to Host A with an initial acknowledgement
(probe response type one). This will create a symmetrical
NAPT translation if the NAPT was initially a partial
cone that migrates to symmetrical based on a response. Host
A will re-transmit the probe packet every 50ms (until a
timeout period of one minute) or until it receives this
acknowledgement. The acknowledgement (probe response type
one) will not contain the externally visible IP address of
Host A; rather it will identify itself as the initial ack-
nowledgement and contain a transaction timeout value. This
value indicates the maximum time that Host A should wait
for a message from Reflector B before determining it is
behind a symmetrical NAPT. If Host A does not receive a
+---------+
| |
/Reflector|
/| B |
/+---------+
/
/
/ +---------+
/ | |
/ /Reflector|
/ /| A |
/ / +---------+
+---------+ +---------+ / /
| | |Ent. NAPT| / /
| Host A -----Router A \ / /
| | | |\ / /
+---------+ +---------+ \ // +---------+
\// |Service |
/------Provider |
/ |Router A |
/ +----|----+
+---------+ +---------+ / |
| | |Ent. NAPT|/ |
| Host B -----Router B / |
| | | | |
+---------+ +---------+ |
+----|----+ +---------+
|Ser. Prov| |Reflector|
| NAPT ---------- C |
| Router | | |
+---------+ +----|----+
|
|
|
|
+----|----+
| |
|Registrar|
| |
+---------+
Figure 3: Configuration for NAPT Identification and Traversal
message from Reflector B within the specified timeframe,
Host A will know that it is behind a symmetrical NAPT and
send a subsequent message to Reflector A in which it asks
for the address of Reflector C. By placing the request for
the address of Reflector C after Host A has failed to hear
from Reflector B, the provider can utilize deterministic
load-balancing mechanisms for its Symmetrical Media Server.
For this reason, Reflector A should be transaction state-
ful. If a request for the address of Reflector C comes that
does not match transaction information (i.e. source IP
address) and is outside of the designated transaction
timeout value plus one second, then Reflector A should
respond with an error (i.e. 481). This will help limit
attacks on Reflector A in which the attacker tries to throw
off any load balancing mechanisms that the provider might
be using when selecting the address for Reflector C to be
used in responding to hosts.
3. Reflector A instructs Reflector B to send a message (probe
response type two) to Host A. This message will contain the
externally visible address of Host A and the transaction
timeout value that was sent to Host A.
4. Reflector B will send the message (probe response type two)
to Host A and inform Reflector A that it has sent the mes-
sage to Host A. Reflector A will continue to instruct
Reflector B to send the message to Host A every 20ms
until it receives the acknowledgement from Reflector B that
the message has been sent.
5. If Host A receives the message (probe response type two)
from Reflector B it will know that it is behind a full-cone
style NAPT. Host A will send an acknowledgement to Reflec-
tor B. Reflector B will continue to retransmit the message
to Host A every 50ms for up to the transaction timeout
value specified by Reflector A or until it receives an ack-
nowledgement from Host A.
6. If Host A does not get a probe response type two within the
timeout value specified by Reflector A of sending its ini-
tial probe packet, it will assume that it is behind a sym-
metrical NAPT. If this occurs, Host A sends a message to
Reflector A (Probe Type Three) informing it that it is
behind a symmetrical NAPT. Reflector A will respond to this
message with an acknowledgement that includes the IP
address of Reflector C. Reflector A will retransmit this
response every 50ms for up to 30 seconds or until it
receives an acknowledgement from Host A.
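
The decision Host A reaches at the end of this procedure can be summarised in a few lines. This is a sketch only; the function classify_napt and the "unreachable" outcome are hypothetical, and the returned strings reuse the nat-types values of the Translate header for convenience.

```python
from typing import Optional


def classify_napt(got_type_one: bool,
                  seconds_until_type_two: Optional[float],
                  transaction_timeout: float) -> str:
    """Classify the NAPT from what Host A observed.

    got_type_one: whether Reflector A acknowledged the initial probe.
    seconds_until_type_two: delay before Reflector B's message arrived,
        or None if it never arrived.
    transaction_timeout: value carried in probe response type one.
    """
    if not got_type_one:
        return "unreachable"   # assumption: no ack means no usable reflector
    if seconds_until_type_two is not None and seconds_until_type_two <= transaction_timeout:
        return "cone"          # Reflector B got through: full-cone NAPT
    return "sym"               # nothing from Reflector B in time: symmetrical
```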
A call flow for the case where Host A is behind a full-cone NAPT is
shown in Figure 4; the flow for a symmetrical NAPT is shown in Figure
5.
Host A Reflector A Reflector B
| | |
---Probe Type One--->| |
| | |
|<-Probe Response----- |
| Type One | |
| -----Instruct----->|
| | |
| |<----Acknowledge---
| | |
| |
|<--------Probe Response Type Two--------
| |
---------------Acknowledge------------->|
| |
Figure 4: Full cone flow
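
The transaction check that step 2 asks of Reflector A might look like the sketch below. The wording of that step is open to interpretation; this sketch assumes a request for Reflector C's address is rejected when no probe type one was seen from the same source, or when it arrives more than the transaction timeout plus one second after that probe. The class and method names are hypothetical, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Tuple

Source = Tuple[str, int]   # externally visible (IP, port) of the host


class ReflectorATransactions:
    """Hypothetical transaction state kept by Reflector A."""

    def __init__(self, transaction_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.transaction_timeout = transaction_timeout
        self._clock = clock
        self._started: Dict[Source, float] = {}

    def on_probe_type_one(self, src: Source) -> None:
        # Remember when this source began its probe transaction.
        self._started[src] = self._clock()

    def on_probe_type_three(self, src: Source) -> int:
        """Return 200 to answer with Reflector C's address, or 481 to refuse."""
        started = self._started.get(src)
        if started is None:
            return 481   # no matching transaction
        if self._clock() - started > self.transaction_timeout + 1.0:
            return 481   # outside the allowed window
        return 200
```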
5.2 When forming an Invite or 18n response
At some point later, host A either wishes to make a call, or wishes
to answer an incoming call. In either case, if it is behind a NAT, it
needs to place an address and port in the SDP in the offer or answer
which can be used to receive media. The approach that is used depends
on what type of NAT the client determined it was behind.
If Host A determined it was behind a full-cone NAPT:
1. Host A sends a pre-Invite probe (probe type two) to Reflec-
tor A from its RTP and RTCP listener ports.
2. Reflector A responds to Host A with Host A's externally
visible IP addresses. Host A then uses this address and
port in the SDP header of the SIP message (note that this
requires the SDP to carry RTCP address and port informa-
tion).
3. If Host A does not receive a response from Reflector A, it
will retransmit the pre-Invite probe every 50ms for up to
Host A Reflector A Reflector B
| | |
+--Probe Type One--->| |
| | |
|<-Probe Response----+ |
| Type One | |
| +----Instruct----->|
| | |
| |<----Acknowledge--+
| | |
| |
| ....--Probe Response Type Two-------+
| | |
+-Probe Type Three-->|
| |
|<-Probe Response----+
| Type Three (with |
| IP address of |
| Reflector C) |
| |
+---Acknowledge----->|
| |
Figure 5: Symmetric flow
10 seconds. If Host A does not receive a response from
Reflector A, it will inform the user that a network error
has occurred and re-run the power-on test detailed above.
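
The retransmission rule in step 3 (every 50ms for up to 10 seconds) can be sketched as a simple loop. The function send_with_retransmit and its two callables are hypothetical; a real client would pass functions that write to and read from its RTP or RTCP socket.

```python
from typing import Callable, Optional


def send_with_retransmit(send: Callable[[], None],
                         wait_for_reply: Callable[[float], Optional[str]],
                         interval: float = 0.05,
                         deadline: float = 10.0) -> Optional[str]:
    """Send a probe repeatedly until a reply arrives or the deadline passes.

    send(): transmits one probe packet.
    wait_for_reply(timeout): blocks up to `timeout` seconds and returns the
        reply, or None if nothing arrived.
    """
    attempts = int(round(deadline / interval))
    for _ in range(attempts):
        send()
        reply = wait_for_reply(interval)
        if reply is not None:
            return reply
    return None   # caller reports a network error and re-runs the power-on test
```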
The message flow for RTP is as follows:
Step  Device                          Addressing
1     RTP listener port on Host A     DA=192.1.3.2:6060, SA=X:Y
2     Enterprise NAPT router A        DA=193.1.3.2:6060, SA=X1:Y1
3     Router (receives packet and
      passes it to E0)
4     Service Provider NAPT router,
      Ethernet port 0                 DA=193.1.3.2:6060, SA=X2:Y2
5     Reflector                       Response created with SA=X2:Y2
6     Service Provider NAPT router,
      Ethernet port 0                 SA=193.1.3.2:6060, DA=X1:Y1
7     Enterprise NAPT router          SA=193.1.3.2:6060, DA=X:Y
8     RTP listener port on Host A     SA=193.1.3.2:6060, DA=X:Y
                                      with payload reflecting
                                      external SA of X2:Y2
The SIP endpoint now places X2:Y2 into its SDP header as its RTP and
RTCP listener. In the above example, the address is 193.1.1.3 with a
randomly selected port. This address and port actually exist on the RTP
NAPT router and is addressable via the public internet.
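
As an illustration of how the reflected address might appear in the SDP, the sketch below builds a minimal session description. It is an assumption-laden example: the function build_sdp is hypothetical, and the "a=rtcp" attribute used to carry the RTCP address and port is the one later standardized in RFC 3605, which this document does not itself specify.

```python
def build_sdp(user: str, session_id: int,
              rtp_addr: str, rtp_port: int,
              rtcp_addr: str, rtcp_port: int) -> str:
    """Build a minimal audio SDP advertising reflected RTP/RTCP addresses."""
    lines = [
        "v=0",
        f"o={user} {session_id} {session_id} IN IP4 {rtp_addr}",
        "s=-",
        f"c=IN IP4 {rtp_addr}",            # public address X2
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",   # public port Y2, PCMU payload
        # Assumption: RTCP carried with the RFC 3605 attribute.
        f"a=rtcp:{rtcp_port} IN IP4 {rtcp_addr}",
    ]
    return "\r\n".join(lines) + "\r\n"


sdp = build_sdp("joe", 1, "193.1.1.3", 32001, "193.1.1.3", 32002)
```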
Now Host B receives this information in an Invite or 180 message and
sends RTP media to X2:Y2 (e.g. 193.1.1.3:32001). Since this is a pub-
lic address, the packet is sent as follows:
Step  Device                          Addressing
1     RTP sender on Host B            DA=X2:Y2, SA=A:Z
2     Enterprise NAPT router B        DA=X2:Y2, SA=A1:Z1
3     Router (receives packet and
      passes it to E1)
4     Service Provider NAPT router,
      Ethernet port 1                 DA=X2:Y2, SA=193.1.3.2:6060
5     Service Provider NAPT router,
      Ethernet port 0                 DA=X1:Y1, SA=193.1.3.2:6060
6     Enterprise NAPT router          DA=X:Y, SA=193.1.3.2:6060
7     RTP listener on Host A
      (receives packet)               DA=X:Y, SA=193.1.3.2:6060
5.3 During the Call (Full Cone NAPT)
The media path, in this situation, is end-to-end via the enterprise
NAPT routers. Media does not traverse the service provider's reflec-
tors or symmetrical media servers. During the life of the call, Host
A would need to send a periodic heartbeat (i.e. every 30 seconds)
either to the reflector or Host B (the callee's endpoint) from the
RTP listener port (RTCP packets should be sent regardless of media).
The heartbeat ensures that a media path (i.e. NAPT translations) is
not torn down due to prolonged silence.
There is no need for endpoints behind Full Cone NAPTs to inform the
reflectors about the termination of a call since the media does not
affect the consumption of service provider resources.
5.4 If Host A determined that it was behind a symmetrical NAPT:
If the host is behind a symmetric enterprise NAT, things are more
complex. With normal RTP, a network intermediary needs to be used.
The user receives media packets from this intermediary, and the other
party in the call sends packets to the intermediary.
1. Host A sends a pre-Invite probe (probe type two) to Reflec-
tor C from its RTP and RTCP listener ports.
2. Reflector C responds to Host A with an authentication challenge
(e.g. 401 Unauthorized). It is suggested that digest
authentication (RFC 2069) be used and that the user
information be based on their SIP profiles stored in a
registrar. The nonce created by Reflector C could be
comprised of an element of time (i.e. UMT), the externally
visible IP address and port on which the pre-Invite probe
appears to be sourced from when it reaches Reflector C, and
a private key configured on Reflector C. Since Host A will
have no knowledge of its externally visible address at this
point, spoofing/replaying a response to this challenge
becomes difficult.
3. Host A responds to the challenge by hashing the SIP userid
and password based on the nonce provided by Reflector C.
4. Reflector C digests the results of this challenge and for-
wards a query of the user's information in the registrar.
The connection between Reflector C and the registrar should
be over a secure tunnel (i.e. TLS).
5. The registrar will keep track of the number of concurrent
connections requested by Host A. This should be on the cen-
tralized registrar rather than the reflector in the event
that multiple reflectors exist. If the registrar determines
that Host A is at its pre-determined maximum number of con-
current sessions, the registrar will fail the query despite
credentials matching and return an appropriate error to
Reflector C. Reflector C will subsequently reply to Host A
with a probe response challenge failure (max sessions).
6. If Host A is within the number of allowed concurrent ses-
sions but does not provide correct credentials, the regis-
trar will fail the query and return the appropriate message
to Reflector C. Reflector C will subsequently reply to Host
A with a probe response challenge failure (invalid user).
7. If successful, Reflector C returns a probe response type
two to Host A which includes the externally visible IP
address of Host A and a unique call id. There will be a
separate response for both RTP and RTCP and they will have
unique call ids since the Reflector may not be able to
match probe requests for RTP and RTCP. This call id is used
later when informing the reflector that this call has been
torn down.
8. Reflector C will inform the registrar that Host A now has
an additional active connection (there will be two per call
for each host: one for RTP and another for RTCP). The
registrar will send an acknowledgement to Reflector C.
9. Host A sends an acknowledgement to Reflector C. Reflector C
will re-transmit the probe response to Host A every 20ms
until it receives an acknowledgement for up to 30 seconds.
If Host A does not acknowledge the probe response type two,
Reflector C will begin an independent call timer that sends
a message to the registrar to remove one concurrent call
for Host A after a pre-determined amount of time (i.e. 180
seconds). This timer is to ensure that endpoints cannot
exploit the service provider's NAPT router by intentionally
failing to acknowledge the probe response (and therefore
creating more concurrent calls than they are allotted)
without penalizing the subscriber for a possible network
failure.
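
The nonce construction suggested in step 2 can be sketched as follows. The text only names the ingredients (a time element, the externally visible address and port, and a private key); combining them with HMAC-SHA256, the function names, and the 30 second freshness window are all assumptions made for this illustration.

```python
import hashlib
import hmac


def make_nonce(private_key: bytes, src_ip: str, src_port: int, timestamp: int) -> str:
    """Bind a nonce to the probe's externally visible source and a time."""
    message = f"{timestamp}:{src_ip}:{src_port}".encode()
    mac = hmac.new(private_key, message, hashlib.sha256).hexdigest()
    return f"{timestamp}:{mac}"


def nonce_is_valid(nonce: str, private_key: bytes, src_ip: str, src_port: int,
                   now: int, max_age: int = 30) -> bool:
    """Check that a returned nonce is fresh and matches the observed source."""
    try:
        ts_text, _mac = nonce.split(":", 1)
        timestamp = int(ts_text)
    except ValueError:
        return False
    if not 0 <= now - timestamp <= max_age:
        return False   # too old, or claims to come from the future
    expected = make_nonce(private_key, src_ip, src_port, timestamp)
    return hmac.compare_digest(nonce, expected)
```

Because the source address is mixed into the nonce, a response replayed from a different external address fails the check.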
A call flow for this case is shown in Figure 6.
5.5 During the Call (Symmetrical NAPT)
This section only applies when the endpoint is using the service
provider's Symmetrical Media Server.
Host A now proceeds by sending its SIP message with an SDP header
that includes the information obtained from the reflector. Note that
the SDP must carry RTCP information. During the life of the call,
Host A would need to send a periodic heartbeat (i.e. every 30
seconds) to the reflector for both RTP and RTCP. This heartbeat would
include the call id. The heartbeat serves two purposes: it ensures
that a media path (i.e. NAPT translations) is not torn down due to
prolonged silence and that the concurrent session counter is eventually
decremented in the event of an endpoint failure. With regard to
decrementing the counter, Reflector C will keep a delta timer for
each call id based on heartbeat. Should the delta time exceed a pre-
configured value that is a multiplier of the heartbeat frequency but
greater than the independent session timer (i.e. 210 seconds),
Reflector C will believe that the call is no longer active and inform
the registrar to decrement the counter. As noted above, optionally
Host A Reflector C Registrar
| | |
---Probe Type Two->| |
| | |
|<-Challenge (401)-- |
| | |
-----Response----->| |
| | |
| -------Query-------->|
| | |
| |<-----Response-------
| | |
|<-Reply (Auth)----- |
| | |
| -----Inform--------->|
| | |
| |<---Acknowledge------
------Acknowledge->| |
| |
Figure 6: Symmetrical NAT call flow
the reflector could instruct the service provider's NAPT router to
remove the translation.
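
The delta timer kept by Reflector C can be sketched as below. The 210 second limit is the example value from the text; the class HeartbeatMonitor and its methods are hypothetical, and times are passed in explicitly so that the behavior is easy to follow.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical per-call-id tracker of the last heartbeat time."""

    def __init__(self, max_silence: float = 210.0) -> None:
        self.max_silence = max_silence
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, call_id: str, now: float) -> None:
        # Reset the delta timer for this call id.
        self._last_seen[call_id] = now

    def expire(self, now: float) -> List[str]:
        """Return call ids considered inactive, and forget them.

        For each returned id, the reflector would tell the registrar to
        decrement the concurrent session counter.
        """
        dead = [cid for cid, seen in self._last_seen.items()
                if now - seen > self.max_silence]
        for cid in dead:
            del self._last_seen[cid]
        return dead
```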
5.6 Call Teardown (Symmetrical NAPT)
This section only applies when the endpoint is using the service
provider's Symmetrical Media Server.
1. At the end of the call (Bye, Cancel, or in response to a
400/500/600 SIP message), Host A sends a post-call closure
message (probe type four) to Reflector C with a matching
call id from the earlier probe type two response from both
its RTP and RTCP listener ports.
2. Reflector C responds to Host A with an authentication chal-
lenge (same mechanism is used as when setting up the call).
This authentication is done in order to protect against
denial-of-service attacks (an attacker sending closure messages
for other systems).
3. Host A responds to the challenge.
4. Reflector C compares the results of this challenge to the
user's information in the registrar.
5. If successful, Reflector C informs the registrar to remove
one concurrent session from the counter. Optionally,
Reflector C can instruct the service provider's NAPT router
to remove the translation for this session. The registrar
will acknowledge the decrementing of the concurrent session
counter.
6. Reflector C sends an acknowledgement to Host A.
7. If unsuccessful, Reflector C replies to Host A with a probe
response challenge failure (invalid user).
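
The concurrent session counter that the registrar maintains across call setup and teardown can be sketched as follows. The class SessionCounter and its methods are hypothetical names; the only behavior taken from the text is that a request is refused once the user is at the pre-determined maximum, and that teardown removes one session.

```python
from typing import Dict


class SessionCounter:
    """Hypothetical registrar-side count of active sessions per user."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions
        self._active: Dict[str, int] = {}

    def try_add(self, user: str) -> bool:
        """Add a session for `user`; return False if at the maximum."""
        current = self._active.get(user, 0)
        if current >= self.max_sessions:
            return False   # reflector replies with "max sessions" failure
        self._active[user] = current + 1
        return True

    def remove(self, user: str) -> None:
        """Remove one session for `user`, never going below zero."""
        current = self._active.get(user, 0)
        if current > 0:
            self._active[user] = current - 1

    def count(self, user: str) -> int:
        return self._active.get(user, 0)
```

Since each call uses one connection for RTP and one for RTCP, a limit of two corresponds to a single concurrent call.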
Host A Reflector C Registrar
| | |
+-Probe Type Four->| |
| | |
|<-Challenge (401)-+ |
| | |
+----Response----->| |
| | |
| +------Query-------->|
| | |
| |<-----Response------+
| | |
| +----Instruct------->|
| | |
| |<---Acknowledge-----+
| | |
|<----Acknowledge--+ |
| |
Figure 7: Call Teardown
6 Symmetric RTP
The approach in section 5 requires the use of an intermediary when
either of the parties is behind a symmetric NAT. This can be avoided
so long as the parties are not both behind symmetric NATs. The idea
is to use symmetric RTP. Symmetric RTP is a new convention for RTP
usage within SIP, and is described below.
The trick to getting RTP through a NAT is to make sure it exhibits The trick to getting RTP through a NAT is to make sure it exhibits
two characteristics. First, any users behind a NAT have to send the two characteristics. First, any users behind a NAT have to send the
first packet to establish a NAT binding. Secondly, media sent back to first packet to establish a NAT binding. Secondly, media sent back to
that user must be to the source port where the media came from. In that user must be to the source port where the media came from. In
other words, if Joe calls Bob, and only Joe is behind a NAT, Joe must other words, if Joe calls Bob, and only Joe is behind a NAT, Joe must
send the first UDP packet to Bob. Let's say Joe sends from IP address send the first UDP packet to Bob. Let's say Joe sends from IP address
and port pair A,B to Bob at public address and port C,D. The NAT will and port pair A,B to Bob at public address and port C,D. The NAT will
translate port pair A,B to X,Y. Bob receives the media. To talk to translate port pair A,B to X,Y. Bob receives the media. To talk to
Joe, it is essential that Joe send his media with source port C,D to Joe, it is essential that Joe send his media with source port C,D to
destination port X,Y. This will be received by the NAT, and have the destination port X,Y. This will be received by the NAT, and have the
destination translated to A,B, where it is sent to Joe. destination translated to A,B, where it is sent to Joe.
Unfortunately, RTP does not work this way. When used with SIP, a Unfortunately, RTP does not work this way. When used with SIP, a
conversation between Joe and Bob will result in two RTP sessions, one conversation between Joe and Bob will result in two RTP sessions, one
from Joe to the address Bob provided in his SDP, and one from Bob to from Joe to the address Bob provided in his SDP, and one from Bob to
the address provided by Joe in his SDP. This will not work with NAT. the address provided by Joe in his SDP. This will not work with
symmetric NAT without an intermediary.
5.1.1 Bi-Directional RTP 6.1 Operation
Our solution is simple: we define bi-directional RTP. Bi-directional Our solution is simple: we define symmetric RTP. Symmetric RTP runs
RTP runs over UDP. Like TCP, one side initiates a connection to the over UDP. Like TCP, one side initiates a connection to the other
other side. As a result, one side is active (initiates the side. As a result, one side is active (initiates the connection), and
connection), and the other side is passive (waits for the the other side is passive (waits for the connection). Like TCP, data
connection). Like TCP, data in the reverse direction is sent to the in the reverse direction is sent to the port where the connection
port where the connection came from. Unlike TCP, a bi-directional RTP came from. Unlike TCP, a symmetric RTP connection is created when the
connection is created when the first packet arrives; there is no first packet arrives; there is no explicit handshake or setup. There
explicit handshake or setup. There are no retransmissions or changes are no retransmissions or changes to the RTP protocol operation. The
to the RTP protocol operation. The only difference is that only difference is that symmetric RTP involves sending media on the
bidirectional RTP involves sending media on the same socket used to same socket used to receive it.
receive it.
An example flow using bidirectional media is shown in Figure 2. Joe An example flow using symmetric media is shown in Figure 8. Joe calls
calls Bob. Assume for this flow that Joe is behind a NAT, and Bob is Bob. Assume for this flow that Joe is behind a NAT, and Bob is not.
not. For simplicity's sake, we don't show proxies, and don't show For simplicity's sake, we don't show proxies, and don't show much of
much of the SIP detail. Joe indicates, in his SDP in the INVITE, that the SIP detail. Joe indicates, in his SDP in the INVITE, that he is
he is capable of bi-directional RTP, and wishes to be the active side capable of symmetric RTP, and wishes to be the active side of the
of the connection (more on this later). Bob receives the INVITE, and connection (more on this later). Bob receives the INVITE, and
responds with a 200 OK. His SDP indicates that he can be the passive responds with a 200 OK. His SDP indicates that he can be the passive
side, and he provides the IP address and port to connect to. When Joe side, and he provides the IP address and port to connect to. When Joe
receives the 200 OK, an ACK is sent. Then, Joe sends an RTP packet to receives the 200 OK, an ACK is sent. Then, Joe sends an RTP packet to
the IP address and port provided by Bob. The RTP packet passes the IP address and port provided by Bob. The RTP packet passes
through the NAT, and has its source address rewritten. When Bob through the NAT, and has its source address rewritten. When Bob
receives this packet, the connection is established. Bob now has the receives this packet, the connection is established. Bob now has the
IP address and port to send media back to. This address/port is the IP address and port to send media back to. This address/port is the
one from the source address of the RTP packet Bob just received one from the source address of the RTP packet Bob just received
(which has been natted). Bob sends media to this address. Those (which has been natted). Bob sends media to this address. Those pack-
packets have their destination address natted, translated back to the ets have their destination address natted, translated back to the
address Joe used to send the first packet. address Joe used to send the first packet.
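The flow above can be sketched as a small simulation. This is only an illustration under stated assumptions: the SymmetricNat class, its method names, the public address 63.1.1.1, and the port numbers are hypothetical and do not come from the draft. The toy NAT allocates one public port per (private source, remote destination) pair and admits return traffic only from the exact remote address that the binding was created for.

```python
class SymmetricNat:
    """Toy symmetric NAT (hypothetical model, not from the draft)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound_map = {}  # (private src, remote dst) -> public src
        self.inbound_map = {}   # (public src, remote dst) -> private src

    def send_out(self, src, dst):
        """Translate an outbound packet; create the binding on first use."""
        key = (src, dst)
        if key not in self.outbound_map:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.outbound_map[key] = public
            self.inbound_map[(public, dst)] = src
        return self.outbound_map[key]

    def send_in(self, src, dst):
        """Translate an inbound packet; None means the NAT drops it."""
        return self.inbound_map.get((dst, src))


nat = SymmetricNat("63.1.1.1")
joe = ("10.0.1.1", 5004)  # private address and port (A,B)
bob = ("4.5.11.3", 4444)  # public address and port (C,D)

# Before Joe sends anything, no binding exists and Bob's media is dropped.
blocked = nat.send_in(bob, ("63.1.1.1", 40000))

# Joe sends the first RTP packet; Bob sees the translated source (X,Y).
latched = nat.send_out(joe, bob)

# Bob replies from C,D to the source he observed: the NAT maps it back to Joe.
delivered = nat.send_in(bob, latched)

# A reply from any other source port does not match the binding.
wrong_port = nat.send_in(("4.5.11.3", 4446), latched)
```

The sketch shows both requirements at once: the party behind the NAT must send first, and the reply must come from the same address and port that received the first packet.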
In traditional unidirectional RTP, Joe would have included an IP In traditional unidirectional RTP, Joe would have included an IP
address and port in the INVITE, and Bob would have sent media to this address and port in the INVITE, and Bob would have sent media to this
address, rather than the one in the RTP packet received from Joe. address, rather than the one in the RTP packet received from Joe.
This does not work through NAT, since this address is wrong, and This does not work through NAT, since this address is wrong, and
since no NAT binding has been established. Bidirectional RTP does not since no NAT binding has been established. Symmetric RTP does not
suffer this problem; note how Joe does not actually need to provide suffer this problem; note how Joe does not actually need to provide
an IP address in the SDP in his INVITE. an IP address in the SDP in his INVITE (although one must be provided for
backwards compatibility).
The call flow when Bob is behind the NAT is very similar, and is The call flow when Bob is behind the NAT is very similar, and is
shown in Figure 3. Instead of Joe being the active side of the shown in Figure 9. Instead of Joe being the active side of the con-
connection, Bob is the active side. It is important to note that the nection, Bob is the active side. It is important to note that the
role of active or passive for the RTP connection is not tied to who role of active or passive for the RTP connection is not tied to who
makes the call. makes the call.
As a result, when only one of the participants is behind a NAT, a direct As a result, when only one of the participants is behind a NAT, a direct
UDP connection can be used between them. When both are behind NATs, UDP connection can be used between them. When both are behind NATs, a
an RTP translator is needed. This is described in Section 5.1.3. different solution is needed, and this is discussed below.
5.1.2 Signaling Support 6.2 SDP Extensions
SDP extensions are needed to allow the signaling discussed above to SDP extensions are needed to allow the signaling discussed above to
take place. Specifically, extensions are needed to indicate that a take place. Specifically, extensions are needed to indicate that a
media stream is bidirectional RTP, and to allow each side to indicate media stream is symmetric RTP, and to allow each side to indicate
that they are active, passive, or can play either role. that they are active, passive, or can play either role.
As it turns out, this is exactly the kind of signaling provided in As it turns out, this is exactly the kind of signaling provided in
the SDP extensions for TCP media [7]. That draft only handles TCP and the SDP extensions for TCP media [5]. That draft only handles TCP and
TLS, but the semantics for TCP are identical to bidirectional UDP. TLS, but the semantics for TCP are identical to symmetric UDP. There-
Therefore, we propose that a new keyword, BAVP, be used to signal fore, the transport remains UDP, but the direction attribute and the
that the RTP is bidirectional. The direction attribute and the exchange procedures defined in [5] for TCP work as described for
exchange procedures defined in [7] work as described for BAVP. UDP. The fact that the stream is symmetric is signaled by the
presence of the active, passive, or both attributes.
Revisiting the flow in Figure 2, the SDP in the INVITE would actually Revisiting the flow in Figure 8, the SDP in the INVITE would actually
appear as: appear as:
c=IN IP4 10.0.1.1/127 c=IN IP4 10.0.1.1
m=audio 9 RTP/BAVP 0 m=audio 9 RTP/UDP 0
a=direction:active a=direction:active
and in the 200 OK as: and in the 200 OK as:
c=IN IP4 4.5.11.3/127
m=audio 4444 RTP/BAVP 0
a=direction:passive
5.1.3 Both parties behind NAT
The approach described above works if (1) only one of the two parties
is behind a NAT, and (2) the party behind a NAT knows they are
behind a NAT. To handle these problems, we introduce functionality
into the proxies. The proxies can detect, by inspecting components of
the messages, which parties are behind NATs. They can rewrite SDP in
order to ensure that those parties behind NATs are active.
Furthermore, when both are behind a NAT, the proxies can bring an RTP
translator into the call. RTP translators can be thought of as RTP
routers; they receive RTP packets on a particular incoming port, and
send them out on a different port/address. When both parties are
behind a NAT, the proxies will rewrite the SDP so that both sides
initiate outward connections to the RTP translator. The RTP
translator then hands packets back and forth between the connections.
We show these boxes incorporated into the architecture in Figure 4.
Only one translator is needed per call. Our architecture will only
result in usage of the box when both parties are behind NATs, which
| | | | | |
| | | | | |
|---------------------------------------------> | |---------------------------------------------> |
| | INV sip:bob@Y.com | | | INV sip:bob@Y.com |
| | active | | | active |
| | | | | |
| | | | | |
| | | | | |
|<--------------------------------------------- | |<--------------------------------------------- |
| | 200 OK | | | 200 OK |
skipping to change at page 13, line 47 skipping to change at page 27, line 48
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
Joe NAT Bob Joe NAT Bob
Figure 2: Bi-directional RTP Flow Figure 8: Symmetric RTP Flow
| | | | | |
| | | | | |
|---------------------------------------------> | |---------------------------------------------> |
| | INV sip:bob@Y.com | | | INV sip:bob@Y.com |
| | either | | | either |
| | 7.1.1.1:88 | | | 7.1.1.1:88 |
| | | | | |
| | | | | |
|<--------------------------------------------- | |<--------------------------------------------- |
| | 200 OK | | | 200 OK |
skipping to change at page 15, line 4 skipping to change at page 29, line 4
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
Joe NAT Bob Joe NAT Bob
Figure 3: Bi-directional RTP Flow, NAT role reversed Figure 9: Symmetric RTP Flow, NAT role reversed
is the only case when one is needed. Our solution will result in the
invocation of RTP forwarding services by the domain of the called
party.
The basic idea behind the solution is this. User agents must be able
to initiate or terminate bidirectional RTP connections. The calling
side always indicates support for both. When a proxy for a user in
some domain receives a call (either to or from that user), that proxy
accepts the responsibility for setting the direction attribute in the
SDP in such a way that the client will be able to successfully handle
media.
Consider first proxy X, representing Joe. When Joe makes an outgoing
call, Joe's UA will set the direction attribute in the SDP to "both"
and include the IP address and port Joe is prepared to receive media
on. This INVITE is sent to proxy X. Proxy X determines if Joe is
behind a NAT. This can be done either through configuration (when the
user signs up, they indicate whether they are behind a NAT or not),
or through packet inspection. If the source address of the INVITE
does not match the address and port in the Via header (especially if
the ports don't match), Joe is behind a NAT.
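The packet-inspection check described above can be sketched as follows. This is a minimal illustration, not the draft's algorithm: the function names are hypothetical, the Via parsing handles only a simple IPv4 "host:port" sent-by value, and the default port of 5060 is assumed when none is given.

```python
def parse_via_sent_by(via_value):
    """Extract (host, port) from a Via value such as
    'SIP/2.0/UDP 10.0.1.1:5060;branch=abc'. IPv4 or hostname only."""
    sent_by = via_value.split()[1].split(";")[0]
    host, sep, port = sent_by.partition(":")
    return host, int(port) if sep else 5060


def appears_behind_nat(packet_source, via_value):
    """True when the address the proxy saw the request come from
    differs from what the client wrote into its Via header."""
    return parse_via_sent_by(via_value) != packet_source


via = "SIP/2.0/UDP 10.0.1.1:5060;branch=abc"

# Request arrived from a translated public address: client is behind a NAT.
natted = appears_behind_nat(("63.1.1.1", 40000), via)

# Request arrived from exactly the address in the Via: no NAT detected.
direct = appears_behind_nat(("10.0.1.1", 5060), via)
```

A port mismatch alone is enough to flag the client, which matches the emphasis the text places on comparing ports as well as addresses.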
If Joe is behind a NAT, proxy X knows that Joe can not accept
incoming connections. Thus, Joe cannot actually be either active or
passive; he must be active. Proxy X therefore rewrites the SDP to
indicate a direction of active. If, for some reason, Joe's UA had set
the SDP to indicate either active or passive, this can be taken as an
indicator that Joe knows he is (active) or is not (passive) behind a
NAT, in which case no action is needed by the proxy.
When the call arrives at proxy Y, proxy Y first determines the call
routing. If it discovers that the call is to be routed to the called
party's machine (which it knows based on whether the user registered
with the Contact cookie), and it determines that the called party is
behind a NAT (based on the source address of the REGISTER compared to
the address in the top Via header of the REGISTER), the proxy may
need to modify the SDP. If the SDP in the incoming INVITE indicates a
direction of both, it is changed to passive (this way, the called
party initiates the connection). If the direction is passive, nothing
is done. If the SDP in the incoming INVITE indicates a direction of
active, there is a problem. Both parties are only capable of
initiating active connections. To handle this, proxy Y needs to
involve an RTP translator. It allocates a pair of address/port pairs,
A and B, from the translator. It rewrites the SDP in the INVITE to
indicate a direction of passive, and sets the IP address and port pair
+-------+ +-------+
| SIP | | SIP |
| Proxy | | Proxy |
| X | | Y |
| | | |
+-------+ +-------+
----
/RTP \
| Forw.|
\ /
----
+-------+ +-------+
........|FW/NAT |............ ........|FW/NAT |............
. | | . . | | .
. +-------+ . . +-------+ .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. +-------+ . . +-------+ .
. | Joe | . . | Bob | .
. | SIP UA| . . | SIP UA| .
. +-------+ . . +-------+ .
............................. .............................
Enterprise A Enterprise B
Figure 4: RTP Translators
to A. This will ensure that the called party initiates an RTP
connection out to the translator. Similarly, in the SDP in the
response, the direction (which will be active) is rewritten to
passive, and the IP address is set to B. This will ensure that the
calling party initiates an RTP connection out to the translator. The
proxy then tells the translator that packets received on A should be
relayed to the connection on B, and vice versa.
The actions at the proxies for incoming and outgoing calls are
summarized in Table 1.
Call Direction   SDP direction   Rewrite to   Note
Incoming         both            passive
Incoming         active          passive      introduce RTP translator
Incoming         passive         -
Outgoing         both            active
Outgoing         active          -
Outgoing         passive         -
Table 1: Rules for SDP Rewriting
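The rules in Table 1 can be expressed as a small lookup. This is a sketch under assumptions: the names REWRITE_RULES, rewrite_direction, and route_invite are hypothetical, and the rules are taken to apply only when the proxy has determined that its own user is behind a NAT, as the surrounding text describes.

```python
# (call direction, SDP direction) -> (rewritten direction, needs RTP translator)
REWRITE_RULES = {
    ("incoming", "both"): ("passive", False),
    ("incoming", "active"): ("passive", True),
    ("outgoing", "both"): ("active", False),
}


def rewrite_direction(call_direction, sdp_direction, user_behind_nat):
    """Apply Table 1 at one proxy. Entries marked '-' in the table, and
    all cases where the user is not behind a NAT, leave the SDP alone."""
    if not user_behind_nat:
        return sdp_direction, False
    return REWRITE_RULES.get((call_direction, sdp_direction),
                             (sdp_direction, False))


def route_invite(caller_behind_nat, callee_behind_nat):
    """Pass an INVITE offering 'both' through the caller's proxy and
    then the callee's proxy; return what the callee finally sees."""
    direction, _ = rewrite_direction("outgoing", "both", caller_behind_nat)
    return rewrite_direction("incoming", direction, callee_behind_nat)


case_no_nat = route_invite(False, False)
case_caller_nat = route_invite(True, False)
case_callee_nat = route_invite(False, True)
case_both_nat = route_invite(True, True)
```

Chaining the two proxies this way shows that the translator is requested only in the combination where both parties are behind NATs.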
Based on these rules, we can analyze the four cases.
In case one, neither party is behind a NAT. The caller indicates a
direction of "both" in the SDP. The local outbound proxy does not
change that, since it detects that the caller is not behind a NAT.
The call is forwarded to the proxy for the called party. It doesn't
modify the SDP either, and forwards the call to the called party. In
its response, the called party indicates that it can support a
direction of "both". When the response is delivered to the calling
party, both sides initiate bidirectional RTP connections to each
other. One of them is chosen, and is used for media.
In the second case, the caller is behind a NAT, but the called party
is not. The caller indicates a direction of "both" in the SDP. The
local outbound proxy detects that the caller is behind a NAT. It
therefore modifies the SDP to indicate a direction of "active". The
call is forwarded to the proxy for the called party. It determines
that the called party is not behind a NAT. So, it leaves the SDP
alone. The called party sees that the caller requested the active
side of the connection. So, in the 200 OK response, the called party
indicates passive. This 200 OK is forwarded back to the caller. The
caller initiates a bidirectional RTP connection to the called party,
which succeeds. The media is sent over that connection.
In the third case, the caller is not behind a NAT, but the called
party is. The caller indicates a direction of "both" in the SDP. The
local outbound proxy does not change that, since it detects that the
caller is not behind a NAT. The call is forwarded to the proxy for
the called party. This proxy determines that the called party is
behind a NAT. It rewrites the direction tag in the SDP in the INVITE
from "both" to "passive". This is received at the called party. It
has no choice but to respond with a direction of "active" in its 200
OK. This is forwarded to the calling party. The called party then
initiates a bidirectional RTP connection to the caller, which
succeeds. The media is sent over that connection.
In the fourth, and worst-case, scenario, both are behind NATs. The
caller indicates a direction of "both" in the SDP. The local outbound
proxy detects that the caller is behind a NAT. It therefore modifies
the SDP to indicate a direction of "active". The call is forwarded to
the proxy for the called party. This proxy also detects that the
called party is behind a NAT. However, the SDP indicates a direction
of "active", which is bad. The proxy then brings in an RTP
translator, and rewrites the direction to be passive. It also sets
the c line and m line to contain address/port pair A of the
translator. This INVITE is received at the called party. It has no
choice but to respond with a direction of "active" in its 200 OK. The
200 OK is received at the proxy, where it rewrites the direction tag
from "active" to "passive". It also sets the c line and m line to
contain address/port pair B of the translator. This 200 OK is
received at the calling party. Both sides then initiate outbound
connections. The caller sends RTP to address/port B, and the callee
sends RTP to address/port A. The translator exchanges media between
these two connections.
Either the proxy or the RTP translator can manage the lifecycle of
the connection binding. If the proxy does it, the proxy must
record-route. When the call is over (known through the BYE), the proxy
destroys the connections and connection bindings from the translator.
If the RTP translator manages the lifecycles, the proxy need not ever
record route or maintain call state. When the call is over, the
caller and callee both disconnect their RTP connections to the
translator (this is done with an RTCP BYE). When both connections
disconnect, the translator can destroy the bindings.
In cases where there is no RTP translator available, and both parties c=IN IP4 4.5.11.3
are behind a NAT, media cannot flow. In some cases, this will be m=audio 4444 RTP/UDP 0
detectable by the called party or their proxy (if the incoming SDP a=direction:passive
has bidirectional media with a direction of active, and the called
party is behind a NAT, and no translator is available). In this case,
the called party or proxy responds with a 488 Not Acceptable Here,
and includes a Warning header indicating a code 308 - NAT Traversal
Failure.
5.2 Firewalls For reasons of backwards compatibility, a host that indicates active
only in an INVITE must still list an IP address and port in the SDP,
and be prepared to receive media on it. When the 200 OK comes, if it
contains no direction attribute at all, the client knows that the
server did not support this SDP extension. As a result, the server
will ignore the direction attribute in the INVITE, and proceed to
send media to the IP address and port in the INVITE.
Because firewalls restrict connections to outbound only, the same The result is a very nice, smooth backwards compatibility from sym-
problem that plagues NATs also plagues firewalls. The same solution metric to traditional RTP usage.
as described above can also solve it, with a few minor tweaks. The
solution in Section 5.1 is defined for UDP. UDP will not work through
firewalls. Therefore, RTP over TCP or TLS is used instead. In the
worst case, the RTP would need to be carried over a TLS connection on
port 443. Besides this difference, the solution for firewall is the
same as described for NAT. Note that since SIP may be over TLS to
port 443 as well, the proxy and the RTP translator should not be on
the same IP address.
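The backwards-compatibility rule for the direction attribute (a 200 OK with no direction attribute means the far end does not support the extension) can be sketched as below. The function names are hypothetical, and the parsing is deliberately minimal: it looks only for a session-wide or first "a=direction:" line and ignores per-media scoping.

```python
def parse_direction(sdp):
    """Return the value of the first a=direction attribute, or None."""
    prefix = "a=direction:"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def negotiated_mode(answer_sdp):
    """'symmetric' if the answer carries a direction attribute; otherwise
    'traditional', meaning the peer will send media to the address and
    port listed in the INVITE."""
    if parse_direction(answer_sdp) is not None:
        return "symmetric"
    return "traditional"


aware_answer = "c=IN IP4 4.5.11.3\nm=audio 4444 RTP/UDP 0\na=direction:passive"
legacy_answer = "c=IN IP4 4.5.11.3\nm=audio 4444 RTP/UDP 0"

mode_aware = negotiated_mode(aware_answer)
mode_legacy = negotiated_mode(legacy_answer)
```

This is why an active-only caller still has to list a usable address and port in its INVITE: in the "traditional" branch, that address is where media will arrive.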
6 Caveats 6.2.1 RTCP Address and Port
There are many caveats with our proposed solutions, especially for Unfortunately, the NAT may not allocate consecutive port bindings to
firewall. the RTP and RTCP packets. This means that a client will need to
signal in the SDP the IP address and port for both RTP and RTCP,
separately. An approach for doing this is documented by Huitema
6.1 NAT Solutions 7 Using Symmetric RTP and NAT ID together
o RTP translators are horrible. The author spent much time In this section, we show how a host would make use of both symmetric
arguing against such devices, on the grounds that the RTP and the NAT ID and binding protocol. There are many cases to
underlying IP network already provides routing capabilities, consider. The caller and callee can either be behind a symmetric NAT,
and that these do not need to be replicated at the voice cone NAT, or no NAT. The caller and callee can either support or not
transport layer. They will increase overall voice latency, support the symmetric RTP extension. The caller or callee can either
introduce another point of failure, and incur additional costs support or not support the NAT ID proposal. While this may seem like
to providers. However, they are unavoidable given that the a large number of cases (144 of them), the actual behavior at a host
fundamental semantic of the IP address, that it is a globally to handle all the cases is quite simple.
reachable point for communications, has been violated by NATs.
Perhaps this argument can be rephrased as, "unreliable and
delayed communication beats no communication."
o If the RTP translator is not co-resident with the proxy, some Why would a host ever support symmetric RTP, but not NAT ID? This is
kind of control protocol is needed to allocate addresses and in cases where the host is some kind of service provider media-
to establish bindings. No such protocol exists right now. The enabled device, such as a gateway or conferencing server. These net-
midcom protocol [3] or MGCP [8] might be used for this works are ideally deployed without NAT at all, or with a midcom-based
purpose. We expect these translators to be bundled with firewall solution. As a result, NAT-ID is not needed, since the host
proxies, and thus make use of proprietary protocols initially. knows it has a public address. Symmetric RTP is still helpful, to
allow optimized access to the service from hosts behind a NAT. In
considering the cases, though, this case is identical to the one
where the host does support NAT ID, since NAT ID will always indicate
that the host has a public address. The behavior of the host during
call setup is therefore identical to the case where NAT-ID wasn't
there. This case aside, symmetric RTP does require the use of NAT ID
to detect whether the host is behind a NAT or not.
o It is possible that both caller and called party are behind a We start with the caller. If the caller is an existing client that is
NAT, but are behind *the same* NAT. In this case, no RTP unaware of symmetric RTP or the NAT ID protocol, it sends a regular
translator is needed. In theory, this case can be hard to INVITE. Of course, this will only work if the caller is not behind a
detect, but in practice, can frequently be determined NAT. If the caller supports NAT ID, it can detect if it is behind a
administratively. As an example, a SIP provider might be NAT. If so, before a call, it determines a public address using the
providing centrex types of services to users in a network NAT ID protocol, and uses this in the SDP. If it also supports sym-
behind a NAT. The proxy providing these services will know metric RTP, and is behind a symmetric NAT, it indicates a direction
which users belong to the same enterprise, and it can modify of active for its media streams. If it is behind a cone NAT, it
its behavior accordingly. Even if the proxy is wrong, the indicates that it supports both active and passive.
worst case is that an RTP translator is involved, increasing
voice latency.
o If the calling party is behind a NAT, an RTP connection cannot It then sends the INVITE. It arrives at the called party. If the
be established until the 200 OK is returned to the caller. called party supports symmetric RTP, it checks whether the caller
This means that the post-pickup delay increases by an RTT, supported it (known based on the presence of the direction attribute
which introduces additional clipping. This can be solved in the SDP). If the caller supported it, and the called party is not
through early media. The SDP is returned in a 183, allowing behind a NAT, they insert their public address into the SDP in the
the media connection to be established before the 200 OK. response, and offer to be the passive side. Otherwise, if the called
party is behind a NAT, they obtain an address using the NAT ID proto-
col, and insert that into the SDP in the response. The called party
indicates passive if the caller indicated active, or they indicate
active otherwise.
o The use of persistent TCP or TLS connections for SIP between If the called party doesn't support symmetric RTP, it allocates an
the user agents and their proxies makes clustering more address binding (if it supports the NAT ID protocol), and places that
complex. With traditional UDP, a call for some user could in the SDP in the response. Since symmetric RTP is not supported, no
arrive at any proxy that has access to the location service direction attributes are indicated in the response. If the called
which can route the call to Bob. Not so any longer. With party is ignorant of NAT ID, it simply places whatever it thinks is
persistent connections, the users are partitioned across the its address in the response.
proxies in a cluster.
6.2 Firewall Solutions The result of this fairly simple processing is that media flows
directly whenever at all possible, using symmetric RTP whenever pos-
sible. Only in the most extreme case, where both caller and callee
are behind symmetric NATs, does the service provider NAT get used. We
also get smooth backwards compatibility, so that calls work as best
they can if one side is ignorant of these extensions.
o Riding on top of port 443 for SIP over TLS goes against the 8 Security Considerations
principles of the guidelines established by the IESG [9].
o TLS or TCP will result in very bad voice delays as soon as the The allocation of addresses on the service provider NAT consumes
packet loss is nonzero. Interestingly, with zero packet loss, resources. Therefore, requests for those resources need to be authen-
the delays for voice over TCP will be equal to those of voice ticated, and coupled with the application layer service provided by
over UDP. Clients will need adaptive voice buffer algorithms the provider. This is why we specify the use of SIP authentication
that can tolerate wide swings in latencies. mechanisms for the reflector protocol.
o Current SIP client implementations do not require a TCP stack. Sample Router Configurations
The firewall solution will require TCP and/or TLS. The following are sample configuration files that can be used on a
Cisco router in order to provide the NAT functions needed in Figure
3.
o For firewalls, our approach requires a TLS server process (to Service Provider Router A sample configuration:
receive RTP) embedded within a SIP enabled communications int s0
client. This will require a public/private key and its ip address 63.1.1.1 255.255.255.252
associated certificate, available to the client, issued from a
Certification Authority (CA) that is known to the other party.
Similarly, use of a TLS client will require that the client be
configured with the keys of a set of well known CAs.
Support for TCP and/or TLS in the softphones can be mitigated by int e0
deploying UDP to TCP/TLS translation proxies inside of the firewall. ip address 193.1.2.2 255.255.255.0
7 Security Considerations int e1
ip address 193.1.1.2 255.255.255.0
RTP translators are effectively man-in-the-middle systems. As a ip route 193.1.2.0 255.255.255.0 e0
result, a rogue proxy and RTP translator can listen in on the media ip route 193.1.1.0 255.255.255.0 e1
of all users initiating calls through it. To prevent this, clients ip route 193.1.3.2 255.255.255.255 e0
initiating TLS connections to a server should verify that the server ip route 0.0.0.0 0.0.0.0 s0
name in the SDP is a subdomain of the name presented in the
certificate. Furthermore, the client should only connect to servers
whose domains are subdomains of their service provider, or the
provider of the other party in the call.
8 Conclusion Service Provider NAPT router sample configuration:
int e0
ip nat inside
ip address 193.1.2.1 255.255.255.0
In this draft, we have proposed some modifications to SIP operation int e1
which allow it to successfully pass through NATs and firewalls. We ip nat outside
believe our NAT solution is very workable. It has minimal impact on ip address 193.1.1.1 255.255.255.0
clients, allows voice to run over UDP, and uses direct UDP transport
in all but the worst case. Our solutions for firewalls are less
palatable. The ideal solution is for firewall administrators to allow
SIP (over TCP on 5060 or TLS on 5061) out through the firewall, and
to eventually deploy ALGs, preferably using the midcom architecture.
We believe that solving the firewall and NAT problems is critical int e3
for deployment of SIP. ip address 193.1.3.1 255.255.255.0
9 Acknowledgements ip nat pool rtp 193.1.1.3 193.1.1.3 prefix 24
ip nat inside source list 9 pool rtp overload
ip nat outside source static udp list 9 193.1.3.2 6060
access-list 9 permit any any
We would like to thank Jeffrey Citron and John Butz from Vonage for ip route 0.0.0.0 0.0.0.0 e0
their efforts at verifying UDP NAT capabilities in existing
commercial products.
10 Author's Addresses A Author's Addresses
Jonathan Rosenberg Jonathan Rosenberg
dynamicsoft dynamicsoft
72 Eagle Rock Avenue 72 Eagle Rock Avenue
First Floor First Floor
East Hanover, NJ 07936 East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com email: jdrosen@dynamicsoft.com
Joel Weinberger
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jweinberger@dynamicsoft.com
Henning Schulzrinne Henning Schulzrinne
Columbia University Columbia University
M/S 0401 M/S 0401
1214 Amsterdam Ave. 1214 Amsterdam Ave.
New York, NY 10027-7003 New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu email: schulzrinne@cs.columbia.edu
11 Bibliography B Bibliography
[1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP [1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP
network address translator (NAT)," Internet Draft, Internet network address translator (NAT)," Internet Draft, Internet Engineer-
Engineering Task Force, Oct. 2000. Work in progress. ing Task Force, Oct. 2000. Work in progress.
[2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through [2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through
firewalls and NATs," Internet Draft, Internet Engineering Task Force, firewalls and NATs," Internet Draft, Internet Engineering Task Force,
Feb. 2000. Work in progress. Feb. 2000. Work in progress.
[3] P. Srisuresh, J. Kuthan, and J. Rosenberg, "Middlebox [3] P. Srisuresh, J. Kuthan, and J. Rosenberg, "Middlebox communica-
communication architecture and framework," Internet Draft, Internet tion architecture and framework," Internet Draft, Internet Engineer-
Engineering Task Force, Feb. 2001. Work in progress. ing Task Force, Feb. 2001. Work in progress.
[4] P. Srisuresh and M. Holdrege, "IP network address translator
(NAT) terminology and considerations," Request for Comments 2663,
Internet Engineering Task Force, Aug. 1999.
[5] E. Rescorla, "HTTP over TLS," Request for Comments 2818, Internet
Engineering Task Force, May 2000.
[6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: [4] C. Huitema, "Short term NAT requirements for UDP based peer-to-
session initiation protocol," Request for Comments 2543, Internet peer applications," Internet Draft, Internet Engineering Task Force,
Engineering Task Force, Mar. 1999. Feb. 2001. Work in progress.
[7] D. Yon, "TCP-Based media transport in SDP," Internet Draft, [5] D. Yon, "TCP-Based media transport in SDP," Internet Draft,
Internet Engineering Task Force, Nov. 2000. Work in progress. Internet Engineering Task Force, Nov. 2000. Work in progress.
[8] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett,
"Media gateway control protocol (MGCP) version 1.0," Request for
Comments 2705, Internet Engineering Task Force, Oct. 1999.
[9] K. Moore, "On the use of HTTP as a substrate for other
protocols," Internet Draft, Internet Engineering Task Force, Oct.
2000. Work in progress.
 End of changes. 107 change blocks. 
608 lines changed or deleted 1042 lines changed or added
