Re: [MMUSIC] comedia-fix-00 comments
At 09:58 PM 1/14/2003, Jonathan Rosenberg wrote:
David,
Ben, Paul and I discussed your comments extensively over the last week or
so. Our conclusion was that we will not use comedia for simple messaging
sessions. At a high level, the conclusion was that we were looking at
"message sessions over TCP" and that comedia was perhaps optimized for
"streaming media sessions over TCP". As such, many of our requirements
that were critical, or assumptions we could make (such as demux within a
single TCP connection) don't necessarily apply to streaming media over TCP.
Now, I still think that much of what we proposed is valid. However, it is
no longer critical to get it changed. So, keep that in mind as you read my
responses below.
The meta-question is whether comedia-fix will continue to block
comedia's progress. On one hand you imply above that it won't, but then
again we have the rest of this email that argues that comedia is severely
flawed. So, where are we going from here?
lem at all, IMO), and the draft should be updated to reflect this.
I conceded this point. You are correct that it's not 1:1.
I still believe that this will fundamentally make tcp-based streaming
media complicated for firewalls. It would be REALLY nice to allow an admin
to open the "streaming FOO port" and that would allow for all media
traffic of that type through. You are still going to need dynamic ports in
your proposal, which perpetuates the firewall problems. The use of dynamic
ports was (arguably) needed for UDP because there is no other
protocol-independent demux. However, that is NOT true for TCP, which has
the notion of multiple connections on a port. Thus, to me, any usage of
TCP should make use of this natural and well-understood property, rather
than perpetuating a (IMHO ultimately wrong) design choice for UDP [ranting
for a moment, given the huge problems we have encountered in deploying
voip, a large number of which are based on the usage of dynamic ports for
session demux, I think ultimately we should have chosen a well-known port
for RTP, and then used something like SSRC or CNAME to demux. But that is
another conversation.]
I agree that fixed ports are desirable. Changing your endpoint ID from its
current form to be something keyed from within the media protocol itself
would go a long way towards this while alleviating my other concerns.
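
For readers following the fixed-port argument, here is a minimal sketch of how a listener on a single well-known port might demultiplex incoming connections by an identifier sent at the start of each connection, as comedia-fix proposes. The 32-byte length is taken from the "32 byte ID" mentioned later in this thread; the class and method names are hypothetical and do not come from either draft.

```python
import io

# Assumed fixed length, per the "32 byte ID" mentioned in this thread.
ENDPOINT_ID_LEN = 32


class SessionDemux:
    """Maps endpoint IDs advertised in signalling to pending sessions."""

    def __init__(self):
        self._pending = {}

    def expect(self, endpoint_id, session):
        """Register an ID that was advertised in an offer."""
        if len(endpoint_id) != ENDPOINT_ID_LEN:
            raise ValueError("endpoint ID must be exactly 32 bytes")
        self._pending[endpoint_id] = session

    def accept(self, stream):
        """Read the ID from a newly accepted connection.

        Returns the matching session, or None if the ID is truncated
        or was never advertised. Each ID is usable once.
        """
        endpoint_id = stream.read(ENDPOINT_ID_LEN)
        if len(endpoint_id) != ENDPOINT_ID_LEN:
            return None
        return self._pending.pop(endpoint_id, None)


# Usage: io.BytesIO stands in for an accepted TCP connection.
demux = SessionDemux()
demux.expect(b"A" * ENDPOINT_ID_LEN, "session-1")
session = demux.accept(io.BytesIO(b"A" * ENDPOINT_ID_LEN + b"media bytes"))
```

Because the lookup key travels inside the connection, the port number no longer has to identify the session. That is the property both sides of this exchange want; the disagreement is over where the key should live.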
3.2 No Connection Re-Use
This references 3.1 and restates the incorrect scaling assumption.
This scenario seems to be the only problem for which connection/session
lifetime decoupling is presented as a solution. I cover this in more
detail below, but the short answer is that any economy of design derived
from this solution is offset by burdening other, more common scenarios
with undesirable restrictions.
Can you explain what you mean?
Well, like I said, I explain what I mean below. :-)
3.3 Security
The Man-in-the-Middle attack can be mitigated by simply disallowing
multiple connections to the advertised IP/port during the offer period.
If more than one arrives, then the endpoint assumes an attack and does a
session teardown. (yes, this requires updating the language in comedia
to reflect this new scenario)
That helps, but doesn't eliminate the problem. I can DoS the other party,
preventing them from connecting to you. This came up during the
development of stun, which documents a similar attack.
I admit this is not a crucial point, since if it truly bothers you, using
real media level security will fix it.
A DoS is a very different animal than a security breach, and arguably an
improvement in terms of vulnerability introduced by the protocol, since
protecting against all forms of DoS is nearly impossible. Besides,
comedia-fix didn't fix the security breach either, so I'm not sure what it
is you're after.
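
The mitigation discussed above (tear the session down if more than one connection arrives at the advertised IP/port during the offer period) can be sketched roughly as follows. The class name and return convention are hypothetical. As noted in the exchange, this only detects a second connection; it does nothing about an attacker who stops the legitimate peer from connecting at all.

```python
class OfferGuard:
    """Tracks connections arriving at one advertised IP/port during an offer."""

    def __init__(self):
        self.connections = 0
        self.torn_down = False

    def on_connection(self):
        """Return True if the connection is accepted.

        A second arrival is treated as a possible attack: the session is
        torn down and this and all later connections are refused.
        """
        if self.torn_down:
            return False
        self.connections += 1
        if self.connections > 1:
            self.torn_down = True
            return False
        return True
```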
The firewall trust issue can be addressed by a chain of trust, can it
not? I.e., presumably the firewall authenticates/trusts a SIP proxy
which in turn authenticates/trusts each of the UA's.
That would be the midcom approach, yes.
Ok, so you agree that the original assertion in comedia-fix is not valid.
The instruction to open/close pinholes is made by the SIP proxy to the
firewall as a result of the message traffic the proxy is relaying, not by
any sort of stateful packet inspection by the firewall.
As for the "there's no ALG yet" argument for residential gateways: I
submit yet again that without including the source address/port, comedia
will preclude an ALG from ever being written.
I think that's a good thing.
ALGs are bad. That's why midcom got started. We need to keep application
layer intelligence out of them, and either get protocols to work through
them without adding more to the mess (stun/turn), or even better, specify
well-defined behaviors for them that can be controlled through something
like nsis. But, all of that is far off.
I think we understand each other's opposing points of view. Not clear to
me that we need to discuss it further, and I think the topic at hand is
best addressed via other points (like the other email thread we're having).
The suggestion in comedia-fix (to use a header at the start of the
connection) jumbles the protocol stack, and as such may cause some
unintended consequences.
Arguably, yes. Not the first time though. It is common to do this in many
protocols that are strongly recommended in IETF.
At the heart of the issue is "what is it that we are signaling
here?". If we are signaling RTP/AVP over TCP, then we violate the spec
with the initial ID transmission.
Think of it like a windowing operation in an operating system. During the
times at which the RTP/AVP stack has the connection, it's compliant. It's
just that it doesn't get the connection right away.
This doesn't convince me. I still haven't seen precedent in SDP where the
protocol, as described in SDP, was augmented by some other protocol not
listed within the SDP.
If we are signaling RTP/AVP over TLS, where does the ID transmission go,
as part of the TLS stream or prior to the TLS handshake? In either case
we end up with a stream that is neither (a) 100% RTP/AVP and/or (b) 100% TLS.
Funny you should say this, since TLS is a good example of where common
practice is to mix protocol layers.
The recommended practice these days is for protocols to NOT run their apps
over TLS on a separate port. The protocol itself needs to be able to demux
the two, and to transition from "raw protocol" to TLS. An example of this
is the STARTTLS mechanism used in HTTP and other protocols. First it's
HTTP, and then it's TLS, all on the same TCP connection.
But what you are really talking about is SMTP, not TLS, so this doesn't
hold either. The difference here is that the STARTTLS command IS IN THE
SMTP SPEC. The session from start-to-finish is 100% legal SMTP, and any
security middlebox is going to recognize it as such. Not so with the
Endpoint ID approach in comedia-fix.
Also, this is the only way for SASL to work. SASL, by design, inserts a
security layer in the middle of the protocol stream. SASL-compliant
protocols need to specify how they will demarcate the point at which you
"hand the connection" over to the SASL security layer, so it can decrypt
the messages and hand them over to the application.
So, while I agree with your concerns, there is a long-established and
recommended practice of such a thing in a good number of widely deployed
IETF protocols.
To repeat: these are embedded in the protocol specs and designs themselves,
so they don't go out of scope the way that Endpoint ID does.
This problem can manifest itself in implementation complications as well
as middlebox problems.
I don't see the middlebox issue. Please elaborate.
Stateful packet inspection. See below.
Take TLS for example: let's say we put the endpoint ID ahead of the TLS
handshake. The OpenSSL folks were kind enough to provide C developers a
simple API where you prepend "ssl_" in front of the normal BSD socket
calls, and for the most part you can write code for SSL/TLS in the same
manner as raw TCP sockets. But with an endpoint ID, suddenly I the
developer have to do a more complicated scheme where I (a) connect a TCP
socket, (b) send/receive the endpoint ID, then (c) construct an SSL
session based on the connected socket. This of course presumes that the
language-specific API to SSL is flexible enough for the developer to
juggle the order of the protocol stack in this manner. Other API's
(Java, Perl, VB, etc.) may not be so forgiving.
Indeed. This is a weakness. Not a new one. In fact, the need to support
STARTTLS makes me think this is a very common feature of TLS stacks. But,
I have not looked into it.
Implementation difficulties are rarely met with sympathy at the IETF it
seems. :-) I'll agree to differ on this and leave it for now.
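
To make the ordering in steps (a) through (c) concrete, here is a minimal sketch in Python rather than C, assuming the endpoint ID is a fixed 32-byte value written on the raw TCP socket before any TLS handshake. The function names are hypothetical; `ssl.create_default_context()` and `SSLContext.wrap_socket()` are standard-library calls. Whether other TLS APIs allow wrapping an already-connected, already-used socket is exactly the open question raised above.

```python
import socket
import ssl

ENDPOINT_ID_LEN = 32  # assumed length, per the "32 byte ID" in this thread


def send_id_then_wrap(sock, endpoint_id, wrap=None):
    """(b) Send the endpoint ID in the clear, then (c) hand the socket to `wrap`.

    `wrap` is any callable that takes the connected socket and returns the
    object used for the rest of the session (for example, a TLS wrapper).
    If `wrap` is None the raw socket is returned unchanged.
    """
    if len(endpoint_id) != ENDPOINT_ID_LEN:
        raise ValueError("endpoint ID must be exactly 32 bytes")
    sock.sendall(endpoint_id)
    return sock if wrap is None else wrap(sock)


def connect_with_endpoint_id(host, port, endpoint_id):
    """(a) Connect over TCP, then perform steps (b) and (c) with TLS."""
    sock = socket.create_connection((host, port))
    context = ssl.create_default_context()
    return send_id_then_wrap(
        sock,
        endpoint_id,
        lambda s: context.wrap_socket(s, server_hostname=host),
    )
```

The ID has to be written before `wrap_socket()` is called, because once the socket is wrapped every byte goes through the TLS layer.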
Then there are the considerations for a decomposed gateway topology. By
multiplexing signalling information into the media stream, suddenly you
force a tight coupling between the signalling and media transport
infrastructures by intermixing signalling tokens with the media stream.
How is this different than having the signaling thing ask the media thing
for an IP address, and then putting that IP address into the SDP? Then,
when the media thing receives media on that IP address, it knows the
"control" connection associated with it. Here, we do the same, but instead
of IP address, it's IP+eid. Same difference.
The difference is that the media thing needs to know about windowing the
protocol, which means that it's now tightly bound to SDP-based signalling
rather than being a generic media thing.
For middlebox problems, consider the stateful packet inspection
firewall. The idea of course is to enforce a policy by protocol rather
than by connectivity. So SMTP, HTTP, FTP, and DNS are allowed, but
nothing else, and it doesn't matter what ports or addresses are used.
In theory the sysadmin should also be able to specify SIP, TLS, RTP/AVP,
and IM protocols in this list and allow those forward-looking end-users
to utilize this fancy new SIP stuff. With comedia-fix, every protocol
that is signaled will not, in fact, *be* the protocol described in the
SDP. So when the firewall watches the media stream, it doesn't see a
stream that fits any of the protocol structures it recognizes, and
therefore disallows the stream.
Now I am confused. The way that firewall admins enforce policy by protocol
rather than by connectivity is through ports. In the current comedia spec,
there will almost never be just a single port for the newfangled media
stream. So, you can't enforce policy. The mechanism in comedia-fix allows
there to be one, and only one, port allocated for any specific
media-over-tcp type. This will enable firewall admins to do things, not
disable it.
You're talking about port-number based enforcement. I'm talking about
stateful packet inspection, where the validation (yes, often combined with
port filtering) is done by verifying that the content of network traffic
itself---not just the endpoint information---conforms to the protocols that
are allowed by the policy. By adding the initial data, the traffic no
longer conforms to the protocols as understood by the inspection engine.
I do not understand your point about the protocol not "being" the one that
is advertised. It most certainly is, with the addition of a 32 byte ID up
front, but who cares?
Who cares? The firewall does. See above.
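
The inspection argument can be illustrated with a toy classifier. This is a deliberately simplified sketch and not how any particular firewall works: it looks only at the first bytes the initiator sends, and recognizes only a TLS handshake record (content type 0x16, major version 0x03) and a few HTTP request methods. The function names and the policy set are hypothetical.

```python
def classify(first_bytes):
    """Guess the protocol from the first bytes the connection initiator sends.

    Returns a protocol name, or None if nothing is recognized.
    """
    # TLS handshake record: content type 0x16, protocol major version 0x03.
    if len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "TLS"
    # An HTTP request line begins with a method token followed by a space.
    for method in (b"GET ", b"POST ", b"HEAD "):
        if first_bytes.startswith(method):
            return "HTTP"
    return None


def allowed(first_bytes, policy):
    """Apply a policy expressed as a set of permitted protocol names."""
    return classify(first_bytes) in policy


policy = {"TLS", "HTTP"}
client_hello = b"\x16\x03\x01\x00\x05hello"   # placeholder bytes, not a real ClientHello
plain = allowed(client_hello, policy)                     # recognized as TLS
with_id = allowed(b"I" * 32 + client_hello, policy)       # ID first: not recognized
```

With the 32-byte ID in front, the same TLS bytes no longer appear at the start of the stream, so an engine of this kind would reject the connection unless it were taught about the ID.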
Another case is where TLS is provided by an external accelerator box to
offload the compute load of the server---same issue.
4.2 Decoupled lifetimes
This has a number of side-effects. First, it essentially forces a
single-port architecture on both of the endpoints, as the only other
option is the "this-doesn't-scale" 1:1 client-to-port-number mapping.
With comedia as it stands today, the lifetime of the listener port is
decoupled from the lifetime of the session that is begat from that
listener. No longer the case with comedia-fix.
You can use as many ports as you like, ranging from 1 to one for each
session. Because I don't need the port to correlate, I am free to use any
algorithm I like. Comedia has to use an algorithm which has at least
one-per-pending-offer, because you have no way to correlate. Therefore, I
do not believe that comedia-fix introduces a binary "1 port" or "1 port
per session" restriction.
Speaking of decoupling... :-) I was decoupling the lifetime issue from
endpoint-ID. So if you assume that the endpoint-ID idea in comedia-fix
does not become spec, but decoupled lifetimes *does* become spec, then the
issue I stated above becomes a factor. So we are both correct, but are
working from different assumptions.
At any rate, at the moment this is eclipsed by more important points,
discussed below.
Second, it forces endpoints to use "both" where one could have opted for
just "active" or "passive". The problem here is that if one endpoint
specified "passive", and the media connection drops, it has no way to
legally reinstate the connection if it needs to send data to the other
endpoint. In essence, comedia-fix "raises the stakes" as to what is an
appropriate connection mode to negotiate. In comedia, the only thing at
stake was the ability to bring up the initial connection. In comedia-fix,
we add the issue of recovering from the scenario that the remote endpoint
will drop the connection.
You need to handle connection failures. In the current comedia, if the
passive side loses the connection, it can bring it up by doing a re-invite
and then include a reconnect parameter. What you CANNOT do is have the
active side reconnect without doing a re-INVITE. In comedia-fix, you can
have the active side reconnect without doing a re-INVITE. There is no
reason we could not also allow the passive side to do a re-INVITE to force
a reconnect from the active side. I did propose to remove that, but
perhaps it is useful after all.
See below for my rationale on the re-INVITE.
Third, as described in comedia-fix, NAT's can cause a similar outage to
#2 even if each endpoint specifies "both".
The decoupled lifetime, to my knowledge, is unprecedented.
Really? See RFC3261. SIP can run over TCP. When it does, the lifetime of
the connection is not strongly coupled to the lifetime of the transaction
or dialog. It used to be, in rfc2543. After years of trying to make it
work, we found it was a nightmare to maintain. Thus, it got discarded.
The result was a nice clean separation of the SIP layer from the TCP
connectivity layer. This change bought us many properties, including
improvements in robustness (the ability to recover a SIP transaction even
if the tcp connection the request went over had closed).
It's critical in any protocol where the intermediary elements provide a
subset of the endpoint functionality.
"Unprecedented" in terms of the rules governing media endpoint behavior
when using SDP, not in the entire IETF protocol suite. Sheesh... :-)
The rationale for requiring a re-INVITE was that it was safest and simplest
to treat the loss of a connection in the same way as a failure of a media
stream. So if the connection goes away, but you didn't get a BYE, and you
didn't think you were done yet, then the way to attempt to recover the
connection was via a re-INVITE. It was assumed that this mapped closest to
how connectionless media endpoints would behave.
So now that the context is clearer: is there any precedent for
connectionless media endpoints to just start sending media again even after
the stream has been deemed to have failed for some reason? See next comment for
an example.
In developing comedia I tried to apply the litmus test of "how is this
scenario handled in connectionless media, and how do connections
differ?". I don't believe that decoupled lifetimes fares very well
here. Remember that while "traditional" SDP media transports may be
connectionless, they aren't stateless. So the corollary to decoupling
connection lifetime from session lifetime is akin to allowing RTP/AVP
streams to be stopped, reset, then restarted, all without any additional
activity on the signalling channel. Again, I'm not an expert here, but I
imagine that this would be considered outside the spec today.
There is no notion of starting/stopping RTP streams, so it's hard to say
what this would mean.
Well, consider the failure case, where the data sent to that port either
(a) drops below some performance threshold (90% packet loss, 25,000ms
latency, pick your favorite poison), (b) doesn't relate at all to what came
before it, or (c) is just undecipherable garbage. In whatever case, the
endpoint is able to discern that something is sufficiently amiss that the
stream can be considered failed and therefore ended.
I also don't understand the rationale behind the statement: "If one side
wishes to force the other to reconnect, it merely drops the connection.
When the other side has data to send, it will establish a new
connection." The "reconnect" attribute was never intended to "force"
reconnects. Rather, it was intended as a way to recover from an
*unintended* disconnection in such a way as to preserve the existing
session state. Is there ever a reason that an endpoint would tear down
the connection solely to inconvenience the other endpoint with the task
of re-establishing it? (insert half-smiley here, since that's the
impression one gets from the language in comedia-fix)
No, of course not.
I must admit I am not sure what I meant here. Per above, I do see how the
reconnect parameter would help.
Ok, sounds like that is resolved, reconnect is useful therefore stays.
I guess I'm having trouble understanding why it is so important that the
lifetime be decoupled. The only argument put forth is the
multiplexing-bridge scenario. If we really are using a multiplexed
tunnel to bridge two networks, what's the harm in making the tunneling
connection "sticky"?
The intermediaries in this case would need to "reference count". That is,
they would need to keep track of each session that gets set up, and then
torn down, which had media over that connection. This is impossible to do
when the media intermediary is decoupled from the SIP proxy. Doing it
would require what we call a B2BUA in SIP, along with a colocated media
intermediary or a remote one under tight control.
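
The "reference count" bookkeeping described above might look something like the following sketch. The class and callback names are hypothetical. Note that `session_started` and `session_ended` have to be driven by signalling events, so the media relay must see every session setup and teardown; this is the coupling to the SIP element that the paragraph above objects to.

```python
class SharedConnection:
    """A relay-to-relay connection shared by several sessions."""

    def __init__(self, close):
        self._close = close        # callable that closes the underlying TCP connection
        self._sessions = set()
        self.closed = False

    def session_started(self, session_id):
        """Record that a session is now carrying media over this connection."""
        if self.closed:
            raise RuntimeError("connection already closed")
        self._sessions.add(session_id)

    def session_ended(self, session_id):
        """Drop a session; close the connection once no sessions remain."""
        self._sessions.discard(session_id)
        if not self._sessions and not self.closed:
            self.closed = True
            self._close()
```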
Another problem is disconnects due to some kind of network outage. Let's
say users are sending their TCP-media through two relays. The connection
between the relays gets lost, but the connections from each user to their
relay remains. Neither user will try to reconnect, since as far as they
know, there is no TCP connection problem. The relays know there is a
problem, and they could reconnect, but comedia prohibits them from doing
so. They would have to be a b2bua once more, and issue a new INVITE on
their own, just to re-establish it. In other words, the relays need to be
the same (in terms of functionality) as the endpoints. I want to drive
such things out of intermediaries, not INTO them.
So, to sum up, with comedia today, building these relays requires a very
tight coupling between the SIP proxy (which has to be a b2bua) and the tcp
media relay. With the comedia-fix, they are decoupled. The SIP
intermediary can remain a proxy, and the media-relay can worry about its
own connectivity.
On the flip side, since there is a multiplexing function occurring in an
intermediary, shouldn't there be a mechanism for determining the lifetime
of the feeds into the mux that just makes this problem go away? How
today, do the endpoints in this scenario know when to do an initial
INVITE, then do the final BYE?
Sure, it can be computed. It just introduces a coupling which is not
otherwise there.
Sounds like you aren't going to use comedia for this anyway, so I won't
pursue the topic.
-Jonathan R.
--
Jonathan D. Rosenberg, Ph.D. 72 Eagle Rock Ave.
Chief Scientist First Floor
dynamicsoft East Hanover, NJ 07936
jdrosen@dynamicsoft.com FAX: (973) 952-5050
http://www.jdrosen.net PHONE: (973) 952-5000
http://www.dynamicsoft.com
_______________________________________________
mmusic mailing list
mmusic@ietf.org
https://www1.ietf.org/mailman/listinfo/mmusic