RE: [Sip] comments on draft-ietf-sip-outbound



Many thanks for the comments - I agree with you on this stuff. Will
bring up the major items as slides for tomorrow.


> -----Original Message-----
> From: sip-bounces at ietf.org [mailto:sip-bounces at ietf.org] On
> Behalf Of Jonathan Rosenberg
> Sent: Sunday, July 31, 2005 6:43 AM
> To: IETF SIP List
> Subject: [Sip] comments on draft-ietf-sip-outbound
>
> This draft is much improved from the previous version. It's an essential
> part of the SIP NAT traversal story, so I am glad to see that. Comments
> below, with big issues first.
>
>
> Bigger issues
>
> * I still like STUN even over TCP. It has the advantage of generating a
> response, and allowing the client to correlate the response to the
> request. See more below.

I had a long discussion with several TCP experts today and now fully
understand what is going on here. I agree and will recommend to the group
that we switch to STUN over TCP.

>
> * the idea of a firewall between EP and registrar is something I'd
> rather treat in Rohan's connect-reuse draft. Let this draft focus on the
> client to network connections.

The case Dean wanted addressed was actually a NAT between the two proxies.
Dean, are you ok with just having Rohan's draft address this?

>
> * I find the overview of operations confusing; it describes many
> different scenarios, and gives the impression that the draft is a
> hodgepodge of techniques. I would prefer to cast the overview and spec
> overall in terms of the most general case - multiple edge proxies and a
> separate registrar - and then treat other scenarios as a simplified case.

I'm trying to balance this against the fact that most implementers, and I
suspect a large percentage of the WG, are not familiar with the term edge
proxy. But I agree with the general comment and will try to refine this.

>
> * There are definitely some terminology problems and inconsistencies
> around the terms connection, flow and binding.

Agreed - need to fix. May ask you for a detailed nits review some time after
IETF.

>
> > With connection
> >    oriented transports such as TCP or TLS, the keep alive
> will detect
> >    failure after a NAT reboot.  Connection oriented
> transport failures
> >    are detected by having the UA periodically sends a CRLF over the
> >    connection; if the connection has failed, a connection
> level error
> >    will be reported to the UA.
>
> This is not always the case. I've run into deployments where the NATs
> simply dropped bindings on the floor, and then dropped any subsequent
> data packets without sending a reset or anything else. As such, these
> took a very long time to get detected as connection failures. Using a
> stun keepalive transaction over tcp allows these to be detected rapidly.
>
> > A CRLF can be considered the beginning
> >    of the next message that will be sent, and therefore
> this approach is
> >    backwards compatible with the core SIP specification.
>
> This is a non-issue. The client needs to know that the server is
> supporting this mechanism in the first place before doing anything. It's
> just like sigcomp; you need to know the server does it before you start.
> The draft proposes the stun flag for that. What you really want is
> something like "connection-management=stun", where you can specify
> different connection management techniques. You'd use that even if this
> was going to be over tcp.

Yes, will switch to STUN over TCP. Sorry it took me so long to understand
why this was critical. I will do my best to explain to the rest of the WG
why this change is needed.


>
> Another really important point - the presence of
> "connection-management=stun" or whatever in the route header field of
> the register also tells the proxy that the client is using the
> mechanisms in this spec. As such, the proxy knows that it can skip "bad
> things" that servers sometimes do (like be an SBC) to deal with clients
> behind nat. That's another reason why you want this connection management
> flag there even in the tcp case.
>
> * it's not clear to me that the wait time in section 4.3 provides
> sufficient avalanche restart protection, especially in a major case like
> a metropolitan power outage. You're still going to see a huge flood of
> initial registrations, since the spec tells clients not to delay their
> initial registrations when they power up.

Clearly deployments may want to change these. I will try to add some text to
motivate these numbers. They came very roughly from the following estimate:
if the load caused by a registration is the same as that of an INVITE (a
huge approximation), and the average non-presence client makes, say, 2
busy-hour call attempts (BHCA), then the proxy farm has the CPU to support 2
calls per hour per client. That implies it also has the CPU to handle 2
registrations per hour per client, so if the clients attempt to register at
a steady rate of 2 per hour (one every 30 minutes), they would all get
registered.
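
A rough sketch of the arithmetic above, for anyone who wants to plug in
their own numbers. The names are illustrative only and do not come from the
draft; the cost ratio of 1.0 is the same huge approximation (a REGISTER
costs about as much as an INVITE) already noted.

```python
# Back-of-envelope capacity estimate. All names here are hypothetical.
BHCA_PER_CLIENT = 2.0   # assumed busy-hour call attempts per client
COST_RATIO = 1.0        # assumed cost of one REGISTER relative to one INVITE


def sustainable_registration_interval(bhca_per_client: float,
                                      cost_ratio: float) -> float:
    """Seconds between registrations per client that a farm sized for
    bhca_per_client calls per hour could absorb."""
    registrations_per_hour = bhca_per_client / cost_ratio
    return 3600.0 / registrations_per_hour


interval = sustainable_registration_interval(BHCA_PER_CLIENT, COST_RATIO)
print(interval)  # 1800.0 seconds, i.e. one registration every 30 minutes
```

With these inputs the result is the 1800 second ceiling that shows up in the
wait-time formula; a deployment with different server capacity would get a
different number.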

>
> > Any time the registrar checks if a new contact matches an
> >    existing contact in the location database, it MUST also
> check and see
> >    if both the instance-id and flow-id match.  If they do not match,
> >    then the they are not the same contact.
>
> also, need to discuss the case where the contact URIs do match, but the
> instance-id and flow-id don't. In this case, they are NOT the same
> contact.

Hmm, meant to have that - will fix
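
One possible reading of the matching rule, as a minimal sketch. The Contact
type and same_contact function are hypothetical names, and plain string
equality stands in for the real SIP URI comparison rules of RFC 3261, which
are more involved.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    uri: str          # contact URI, compared here by simple string equality
    instance_id: str  # instance-id of the registering UA
    flow_id: int      # flow-id carried in the Contact header field


def same_contact(new: Contact, existing: Contact) -> bool:
    """Two contacts match only if the URI, instance-id and flow-id all match.

    In particular, matching URIs with a different instance-id or flow-id
    are NOT the same contact.
    """
    return (new.uri == existing.uri
            and new.instance_id == existing.instance_id
            and new.flow_id == existing.flow_id)


a = Contact("sip:alice@192.0.2.1", "urn:uuid:0001", 1)
b = Contact("sip:alice@192.0.2.1", "urn:uuid:0001", 2)
print(same_contact(a, a))  # True
print(same_contact(a, b))  # False: same URI, different flow-id
```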

>
> >> The two algorithm for edge proxies are nearly identical with the
> >>    exception that one integrity protects the identifier so
> it can not be
> >>    tampered with.  It is not clear if this integrity protection is
> >>    needed.  The WG should determine if this integrity is
> need or not
> >>    then refine this specification.
>
> gruu has the same issue, and I am becoming convinced the encryption
> doesn't help much.
>
>

A hash may be more appropriate than encryption, but unless all the edge
proxy to registrar traffic stays inside a walled garden or goes over sips,
it seems like you need the integrity protection; otherwise an attacker can
hijack calls.
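
To illustrate the hash option, here is a minimal sketch of a keyed hash over
the flow identifier, so the edge proxy can tell whether the identifier was
tampered with on its way back. This is not the algorithm from the draft:
the token layout, the function names and the choice of HMAC-SHA256 are all
assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(flow_token: bytes, key: bytes) -> bytes:
    """Append an HMAC tag so later tampering with flow_token is detectable."""
    tag = hmac.new(key, flow_token, hashlib.sha256).digest()
    return flow_token + tag


def verify(blob: bytes, key: bytes) -> Optional[bytes]:
    """Return the original flow_token if the tag checks out, else None."""
    if len(blob) < TAG_LEN:
        return None
    flow_token, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, flow_token, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched
    return flow_token if hmac.compare_digest(tag, expected) else None


key = b"secret known only to the edge proxy"
blob = protect(b"flow:192.0.2.1:5060", key)
print(verify(blob, key))             # b'flow:192.0.2.1:5060'
print(verify(b"X" + blob[1:], key))  # None, the identifier was modified
```

Note that this only gives integrity, not confidentiality; the identifier
itself stays readable, which matches the point that encryption may not buy
much here.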

>
>
> Minor

Did not comment but agree with most of this -

>
> > capable of allowing new connections from the private address side to
> >    the public side.  It is worth noting that most UAs in
> the world are
> >    deployed behind firewalls or NATs.
>
> The last sentence is true today, but who knows what the future brings.
> Generally, since RFC text is burned in stone, it's better to avoid
> statements about current network deployment scenarios.

will remove

>
> > However, these systems can still form TLS
> >    connections to a proxy or registrar such that the UA
> authenticates
> >    the server certificate, and the server authenticates the
> UA using a
> >    shared secret in a digest challenge.
>
> .. digest challenge over that TLS connection.
>
> > The key idea of this specification is that when a UA sends
> a REGISTER
> >    request, the proxy can later use this same connection to
> forward any
> >    requests that need to go to this UA.
>
> Since this spec applies to UDP also, I think you need to say here that
> "connection" doesn't just mean a tcp connection.

Will try to fix - I struggled in many places with the words to describe
what is basically connected UDP.

>
>
> Generally, the introduction seems to dance around the main
> point, which
> should be stated bluntly I think: This specification allows SIP
> registrations to be used to register clients behind a NAT or firewall.
>
> > flow: A Flow is a network protocol layer connection between
> two hosts
> >       that is represented by the network address of both
> ends and the
> >       protocol.  For TCP and UDP this would include the IP
> addresses and
> >       ports of both ends and the protocol (TCP or UDP).  With TCP, a
> >       flow would often have to one to one correspondence
> with a single
> >       file descriptor in the operating system.
> >    flow-id: This refers to the value of a new header
> parameter value for
> >       the contact header.  When UA register multiple times, each
> >       registration gets a unique flow-id value.
>
> One would think that a flow-id identifies a flow, but based
> on the above
> definitions it doesn't. The first definition implies that a flow is
> identified by its 5-tuple.
>
> I think that there are really three things here. One is a "flow", which
> represents a logical association between a UA and a specific EP host it
> wishes to correspond with. A flow is identified by a flow ID. A flow
> manifests itself by a "connection", which is a TCP connection for TCP,
> and a 5-tuple for UDP. The main point is that, as client connections
> come and go, the connection IDs (the 5-tuples) change, but they are all
> the same flow and thus have the same flow ID.

will try to fix up
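
The flow versus connection distinction suggested above could be modelled
roughly as follows. This is only a sketch of the proposed terminology; the
Flow class, the FiveTuple alias and the reconnect method are hypothetical
names, not anything defined in the draft.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (protocol, local_ip, local_port, remote_ip, remote_port)
FiveTuple = Tuple[str, str, int, str, int]


@dataclass
class Flow:
    """Logical association between one UA instance and one edge proxy."""
    instance_id: str
    flow_id: int                            # stable for the life of the flow
    connection: Optional[FiveTuple] = None  # changes as connections come and go

    def reconnect(self, new_connection: FiveTuple) -> None:
        """Replace a failed connection; the flow and its flow_id are unchanged."""
        self.connection = new_connection


flow = Flow("urn:uuid:0001", flow_id=1,
            connection=("TCP", "10.0.0.5", 49152, "192.0.2.10", 5060))
# The NAT reboots, the UA opens a new TCP connection from a new source port.
flow.reconnect(("TCP", "10.0.0.5", 49977, "192.0.2.10", 5060))
print(flow.flow_id)  # 1, same flow, new 5-tuple
```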

>
> > The overall approach is fairly simple.  Each UA has a unique
> >    instance-id that stays the same for this UA even if the
> UA reboots or
> >    is power cycled.
>
> reference GRUU
>
> > The overall approach is fairly simple.  Each UA has a unique
> >    instance-id that stays the same for this UA even if the
> UA reboots or
> >    is power cycled.  Each UA can register multiple times.
>
> Sure. But here, I think you mean it can do so multiple times
> against the
> same AOR, for the purposes of HA.
>
> > Each
> >    registration includes the instance-id for the UA and a
> flow-id label
> >    that is different for each connection.
>
> So, if I re-register on a new tcp connection (because my old one
> failed), this would seem to imply a new flow-id, but thats
> not what you
> want.
>
> > UAs use a keep alive mechanism to keep their flow to the proxy or
> >    registrar alive.  For TCP, TLS, and other connection oriented
> >    protocols this is a burst containing a single CRLF.  For
> UDP it is a
> >    STUN request sent over the flow.
>
> I think you mean connection instead of flow as the last word here.
>
> >  When a proxy goes to route a message to a UA for which it has a
> >    binding,
>
> you have not yet defined or used the term "binding". Also in this
> sentence I think you are talking specifically about the
> registrar proxy.
>
> > The flow-id parameter is used to allow the
> >    registrar to detect and avoid using invalid contacts when a UA
> >    reboots, as described later in this section.
>
> reboots or reconnects since its old connection failed for some reason
> (like a nat reboot).
>
> > Implementors often ask why the value of the sip.instance is inside
> >       angle brackets.  This is a requirement of RFC 3840 [8] which
> >       defines that media feature tags in SIP.  Feature tags
> which are
> >       strings are compared by case sensitive string comparison.  To
> >       differentiate these tags from  tokens (which are not case
> >       sensitive), case sensitive parameters such as the sip.instance
> >       media feature tag are placed inside angle brackets.
>
> this is discussed more thoroughly in gruu now, not sure it is
> needed to
> replicate that text here
>
> > If the proxy had
> >    multiple flows that all went to this UA, it could choose
> any one of
> >    registration binding that it had for this AOR and had the same
> >    instance-id as the selected UA.
>
> this isn't just something a proxy "could" do - it's almost fundamental to
> the operation of this spec. The set of registrations with differing
> flows to the same instance ID represent a logical bundle, of which the
> proxy only picks one when actually forwarding a request.
>
>
> >  Note:  The STUN mechanism is very robust and allows the detection
> >       of a changed IP address.  It may also be possible to
> do this with
> >       OPTIONS messages and rport; although this approach has the
> >       advantage of being backwards compatible, it also increases the
> >       load on the proxy or registrar server.
>
> backwards compatibility is a red herring; see above.
>
> OPTIONS *significantly* increases the load.
>
> > The UA MUST also add a distinct flow-id
> >    parameter to the contact header.
>
> header field
>
> > described in RFC 3261 and RFC 3263.  In particular implementers
> >    should note that a 503 with a Retry-After is not
> considered a failure
> >    to form the connection.  The UA should wait the
> indicated amount of
> >    time and retry the connection.
>
> these two sentences seem contradictory - it's not a failure to form the
> connection, but you're supposed to retry the connection.
>
> > 4.1.1  Instance-ID Selection
>
> this all belongs in gruu if needed.
>
> > User Agents that form flows with stream oriented protocols such as
> >    TCP, TLS, or SCTP SHOULD periodically send a CRLF over
> the connection
> >    to detect liveness of the flow.  If when sending the CRLF, the
> >    transport reports an error, then the connection is
> considered to have
> >    failed.  It is RECOMMENDED that a CRLF be sent if the
> flow has not
> >    had any data sent or received in the previous 500 to 600 seconds.
>
> this time is way too long. Think about existing IM clients like AIM. I
> find that AIM detects failed connections within about 30s, indicating a
> keepalive of around that amount of time. Remember, for the duration that
> a bad connection is not detected, there is a gap in availability for
> incoming calls. Rapid detection is essential.
>
>
> > User Agents that form flows with datagram oriented protocols such as
> >    UDP SHOULD check if the URI has the "stun" tag (defined in
> >    Section 10) and, if the tag is present,
>
> you need to specify what URI you are talking about here.
>
> > Any time a SIP message is sent and the proxy does not respond, this
> >    is also considered a failure, the flow is closed
>
> how does one close a UDP flow?
>
> > Any time a SIP message is sent and the proxy does not respond, this
> >    is also considered a failure, the flow is closed and the
> procedures
> >    in Section 4.1 are followed to form a new flow.
>
> no - it doesn't follow the procedures in 4.1 yet. It has to figure out if
> it needs to wait. Generally, whenever section 4.2 concludes that a
> connection failed, it should refer to 4.3 for determining when to
> re-establish it.
>
> > however, if there
> >    is a failure in forming this flow, the UA needs to wait a certain
> >    amount of time before retrying to form a flow to this
> particular URI
> >    in the proxy set.
>
> important to mention that this is to avoid avalanche restart
>
> > wait-time = min( 1800, (30 * (2 ^ consecutive-failures)))

will fix


>
> this formula doesn't have the base-time in it; 30s is hard-coded.
>
> >   These three times SHOULD be configurable in the UA.
>
> not clear which three you mean. You should explicitly define three
> variables and use them in the equation above. Also, you should talk
> about why they would ever be changed (its based on server capacity).
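
A minimal sketch of the quoted formula with the hard-coded numbers pulled
out as named parameters. Only the two values that actually appear in the
formula (30 and 1800) are modelled; the names base_time and max_time are
assumptions, and the third configurable time cannot be identified from the
quoted text, so it is left out. Note that "^" in the formula is
exponentiation, not XOR.

```python
def wait_time(consecutive_failures: int,
              base_time: int = 30,
              max_time: int = 1800) -> int:
    """Seconds to wait before retrying a flow to one URI in the proxy set.

    Doubles with every consecutive failure and is capped at max_time.
    Both defaults come from the quoted formula; a deployment would tune
    them to its server capacity.
    """
    if consecutive_failures < 0:
        raise ValueError("consecutive_failures must be non-negative")
    return min(max_time, base_time * (2 ** consecutive_failures))


for failures in range(8):
    print(failures, wait_time(failures))
# 0 30
# 1 60
# 2 120
# 3 240
# 4 480
# 5 960
# 6 1800
# 7 1800
```
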
>
> >  o  The proxy MUST NOT populate the target set with more than one
> >       contact with the same AOR and instance-id at a time.
> If a request
> >       for a particular AOR and instance-id fails with a 410
> response,
> >       the proxy SHOULD replace the failed branch with another target
> >       with the same AOR and instance-id, but a different flow-id.
>
>
> you should discuss cases like sequential forking, which normally pick
> contacts in q-value order; it won't do that now, and would rather
> initially treat the ones with the same instance ID as a group
> and pick
> one of those.
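
One way the grouping described above might look, as a sketch only. The
Binding type and the three helper functions are hypothetical, and picking
the highest q-value within each instance-id group is an assumption made for
the example, not something the draft or the comment specifies.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Binding(NamedTuple):
    instance_id: str
    flow_id: int
    q: float


def group_by_instance(bindings: List[Binding]) -> Dict[str, List[Binding]]:
    """Bundle the bindings for one AOR by instance-id, best q first."""
    groups: Dict[str, List[Binding]] = {}
    for b in bindings:
        groups.setdefault(b.instance_id, []).append(b)
    for members in groups.values():
        members.sort(key=lambda b: b.q, reverse=True)
    return groups


def initial_targets(groups: Dict[str, List[Binding]]) -> List[Binding]:
    """At most one target per instance-id goes into the target set."""
    return [members[0] for members in groups.values()]


def replacement_after_410(groups: Dict[str, List[Binding]],
                          failed: Binding,
                          tried_flow_ids: Set[int]) -> Optional[Binding]:
    """After a 410, pick another flow for the same instance-id.

    tried_flow_ids holds the flow-ids already attempted for this instance.
    """
    for b in groups.get(failed.instance_id, []):
        if b.flow_id != failed.flow_id and b.flow_id not in tried_flow_ids:
            return b
    return None


bindings = [Binding("A", 1, 0.5), Binding("A", 2, 0.9), Binding("B", 1, 0.7)]
groups = group_by_instance(bindings)
targets = initial_targets(groups)
print(targets)  # one binding for A (flow 2) and one for B (flow 1)
print(replacement_after_410(groups, targets[0], {2}))  # A's other flow, flow 1
```
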
>
>
>  > Proxies MUST Record-Route so that mid dialog requests are
> routed over
>  >    the correct flow.
>
> just edge proxies
>
>  > If a proxy or registrar receives a network error when sending a SIP
>  >    message over a particular flow, it MUST remove all the
> bindings that
>  >    use that flow (regardless of AOR).
>
> I thought a flow was specific to a single AOR?
>
> > Edge Proxies MUST Record-Route with the same URI that was
> used in the
> >    path so that mid dialog requests still are routed over
> the correct
> >    flow.
>
> The thing that's MUST strength is record-routing so that mid-dialog
> requests go over the same flow. You can do that without using the same
> URI as in the Path.
>
> > When a URI is created that refers to a SIP device that supports STUN
> >    as described in this section, the URI parameter "stun",
> as defined in
> >    Section 10 SHOULD be added to the URI.
>
> MUST I think
>
> > 9.  Grammar
> >
> >    This specification defines a new Contact header field parameter,
> >    flow-id.  The grammar for DIGIT and EQUAL is obtained
> from RFC 3261
> >    [3].
>
> what about the stun parameter (prefer connection-management
> or something
> similar, per above)
>
>
>
>
>
> --
> Jonathan D. Rosenberg, Ph.D.                   600 Lanidex Plaza
> Director, Service Provider VoIP Architecture   Parsippany, NJ
> 07054-2711
> Cisco Systems
> jdrosen at cisco.com                              FAX:   (973) 952-5050
> http://www.jdrosen.net                         PHONE: (973) 952-5000
> http://www.cisco.com
>
>
> _______________________________________________
> Sip mailing list  https://www1.ietf.org/mailman/listinfo/sip
> This list is for NEW development of the core SIP Protocol
> Use sip-implementors at cs.columbia.edu for questions on current sip
> Use sipping at ietf.org for new developments on the application of sip
>

