Re: [MMUSIC] comedia-fix-00 comments
David,
Ben, Paul and I discussed your comments extensively over the last week
or so. Our conclusion was that we will not use comedia for simple
messaging sessions. At a high level, the conclusion was that we were
looking at "message sessions over TCP" and that comedia was perhaps
optimized for "streaming media sessions over TCP". As such, many of our
requirements that were critical, or assumptions we could make (such as
demux within a single TCP connection) don't necessarily apply to
streaming media over TCP.
Now, I still think that much of what we proposed is valid. However, it
is no longer critical to get it changed. So, keep that in mind as you
read my responses below.
David Yon wrote:
This is a rather long and rambling commentary, so let me summarize the
major points up front:
- The scaling issue is misstated.
- The security issues raised are not actually solved by
comedia-fix, but can be mitigated more simply.
- The proposed mechanism for session correlation has
serious side-effects that are not addressed.
- The connection/session lifetime decoupling is a
major paradigm shift, and also has some serious
side-effects.
Details below, arranged by comedia-fix sections:
3.1 Port Multiplicity on Servers
This section is incorrect. Even without a means to correlate incoming
connections to sessions, the mapping of allocated port numbers to
clients is not 1:1. I'm surprised to see this statement in the draft
since I thought I had explained that in detail at earlier meetings.
The size of the port space need only be as large as the NUMBER OF
PENDING OFFERS made to potential clients. This is not the scaling
problem described in comedia-fix (and not really a scaling problem at
all, IMO), and the draft should be updated to reflect this.
I conceded this point. You are correct that it's not 1:1.
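To make the pending-offers point concrete, here is a minimal sketch of a port pool in which a port is held only while an offer is outstanding. The class name, the timeout, and the injectable clock are all hypothetical; neither draft specifies an allocator like this.

```python
import time


class PendingOfferPorts:
    """Hypothetical pool: a listener port is reserved only while its offer is pending."""

    def __init__(self, ports, offer_ttl=30.0, clock=time.monotonic):
        self._free = list(ports)
        self._pending = {}  # port -> (session_id, expiry time)
        self._ttl = offer_ttl  # assumed offer lifetime in seconds
        self._clock = clock

    def offer(self, session_id):
        """Reserve a port to advertise in the SDP for this session."""
        self._expire()
        if not self._free:
            raise RuntimeError("no free ports: too many pending offers")
        port = self._free.pop()
        self._pending[port] = (session_id, self._clock() + self._ttl)
        return port

    def accept(self, port):
        """A connection arrived on `port`: correlate it, then release the port.

        Raises KeyError if no offer is pending on that port.
        """
        self._expire()
        session_id, _ = self._pending.pop(port)
        self._free.append(port)
        return session_id

    def _expire(self):
        # Return ports whose offers were never answered.
        now = self._clock()
        for port in [p for p, (_, exp) in self._pending.items() if exp <= now]:
            del self._pending[port]
            self._free.append(port)
```

With two ports in the pool, any number of sessions can be served over time, as long as no more than two offers are outstanding at once.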
I still believe that this will fundamentally make tcp-based streaming
media complicated for firewalls. It would be REALLY nice to allow an
admin to open the "streaming FOO port" and that would allow for all
media traffic of that type through. You are still going to need dynamic
ports in your proposal, which perpetuates the firewall problems. The use
of dynamic ports was (arguably) needed for UDP because there is no other
protocol-independent demux. However, that is NOT true for TCP, which has
the notion of multiple connections on a port. Thus, to me, any usage of
TCP should make use of this natural and well-understood property, rather
than perpetuating a (IMHO ultimately wrong) design choice for UDP.
[ranting for a moment, given the huge problems we have encountered in
deploying VoIP, a large number of which are based on the usage of
dynamic ports for session demux, I think ultimately we should have
chosen a well-known port for RTP, and then used something like SSRC or
CNAME to demux. But that is another conversation.]
3.2 No Connection Re-Use
This references 3.1 and restates the incorrect scaling assumption.
This scenario seems to be the only problem for which connection/session
lifetime decoupling is presented as a solution. I cover this in more
detail below, but the short answer is that any economy of design derived
from this solution is offset by burdening other, more common scenarios
with undesirable restrictions.
Can you explain what you mean?
3.3 Security
The Man-in-the-Middle attack can be mitigated by simply disallowing
multiple connections to the advertised IP/port during the offer period.
If more than one arrives, then the endpoint assumes an attack and does a
session teardown. (yes, this requires updating the language in comedia
to reflect this new scenario)
That helps, but doesn't eliminate the problem. I can DoS the other party,
preventing them from connecting to you. This came up during the
development of STUN, which documents a similar attack.
I admit this is not a crucial point, since if it truly bothers you,
using real media level security will fix it.
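The mitigation discussed above (accept exactly one connection per advertised address during the offer period, and tear the session down if a second arrives) can be sketched as a small state table. The class and the return values are hypothetical names chosen for illustration, not anything defined by comedia.

```python
class OfferGuard:
    """Hypothetical guard: one connection per advertised (ip, port) while an offer is open."""

    def __init__(self):
        self._state = {}  # (ip, port) -> "pending" | "connected" | "torn_down"

    def open_offer(self, addr):
        """Start the offer period for an advertised address."""
        self._state[addr] = "pending"

    def close_offer(self, addr):
        """End the offer period; later connections to this address are rejected."""
        self._state.pop(addr, None)

    def on_connect(self, addr):
        """Decide what to do with an incoming connection to `addr`."""
        state = self._state.get(addr)
        if state == "pending":
            self._state[addr] = "connected"
            return "accept"
        if state == "connected":
            # A second connection during the offer period: assume an attack.
            self._state[addr] = "torn_down"
            return "teardown"
        return "reject"
```

Note that this sketch has the weakness raised in the reply: an attacker who keeps the legitimate peer from connecting is then the only party to connect, and is accepted.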
The firewall trust issue can be addressed by a chain of trust, can it
not? I.e., presumably the firewall authenticates/trusts a SIP proxy
which in turn authenticates/trusts each of the UA's.
That would be the midcom approach, yes.
The instruction to
open/close pinholes is made by the SIP proxy to the firewall as a result
of the message traffic the proxy is relaying, not by any sort of
stateful packet inspection by the firewall.
As for the "there's no ALG yet" argument for residential gateways: I
submit yet again that without including the source address/port, comedia
will preclude an ALG from ever being written.
I think that's a good thing.
ALGs are bad. That's why midcom got started. We need to keep application
layer intelligence out of them, and either get protocols to work through
them without adding more to the mess (STUN/TURN), or even better,
specify well-defined behaviors for them that can be controlled through
something like NSIS. But, all of that is far off.
4.1 Endpoint-ID
This concerns me greatly. During the extended period when comedia was
in development, I longed for just a "hook" that would allow for reliable
connection correlation. But alas, since we are signaling media streams
that use an open-ended set of protocols, such a hook did not exist. My
working assumption was that shoe-horning additional information into an
arbitrary protocol was outside the bounds of what SDP could impose.
Guess I was wrong. :-)
We wrangled on this one too. In the end I felt the benefits (it would
work, as opposed to it wouldn't) outweighed the costs.
The suggestion in comedia-fix (to use a header at the start of the
connection) jumbles the protocol stack, and as such may cause some
unintended consequences.
Arguably, yes. Not the first time though. It is common to do this in
many protocols that are strongly recommended in IETF.
At the heart of the issue is "what is it that
we are signaling here?". If we are signaling RTP/AVP over TCP, then we
violate the spec with the initial ID transmission.
Think of it like a windowing operation in an operating system. During
the times at which the RTP/AVP stack has the connection, it's compliant.
It's just that it doesn't get the connection right away.
If we are signaling
RTP/AVP over TLS, where does the ID transmission go, as part of the TLS
stream or prior to the TLS handshake? In either case we end up with a
stream that is neither (a) 100% RTP/AVP and/or (b) 100% TLS.
Funny you should say this, since TLS is a good example of where common
practice is to mix protocol layers.
The recommended practice these days is for protocols to NOT run their
apps over TLS on a separate port. The protocol itself needs to be able
to demux the two, and to transition from "raw protocol" to TLS. An
example of this is the STARTTLS mechanism used in HTTP and other
protocols. First it's HTTP, and then it's TLS, all on the same TCP connection.
Also, this is the only way for SASL to work. SASL, by design, inserts a
security layer in the middle of the protocol stream. SASL-compliant
protocols need to specify how they will demarcate the point at which you
"hand the connection" over to the SASL security layer, so it can decrypt
the messages and hand them over to the application.
So, while I agree with your concerns, there is a long-established and
recommended practice of such a thing in a good number of widely deployed
IETF protocols.
This problem can manifest itself in implementation complications as well
as middlebox problems.
I don't see the middlebox issue. Please elaborate.
Take TLS for example: let's say we put the endpoint ID ahead of the TLS
handshake. The OpenSSL folks were kind enough to provide C developers a
simple API where you prepend "ssl_" in front of the normal BSD socket
calls, and for the most part you can write code for SSL/TLS in the same
manner as raw TCP sockets. But with an endpoint ID, suddenly I the
developer have to do a more complicated scheme where I (a) connect a TCP
socket, (b) send/receive the endpoint ID, then (c) construct an SSL
session based on the connected socket. This of course presumes that the
language-specific API to SSL is flexible enough for the developer to
juggle the order of the protocol stack in this manner. Other API's
(Java, Perl, VB, etc.) may not be so forgiving.
Indeed. This is a weakness. Not a new one. In fact, the need to support
STARTTLS makes me think this is a very common feature of TLS stacks.
But, I have not looked into it.
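For a stack that can start TLS on an already connected socket, the (a)/(b)/(c) sequence described above might look like the following sketch. It uses Python's standard `ssl` module; the function name is hypothetical, and treating the endpoint ID as 32 raw bytes sent before the handshake is an assumption based on the "32 byte ID" mentioned later in this thread.

```python
import socket
import ssl

EID_LEN = 32  # assumption: a fixed-length, 32-byte raw endpoint ID


def open_with_endpoint_id(sock, eid, tls_context=None, server_hostname=None):
    """Send the endpoint ID on a connected socket, then optionally start TLS.

    `sock` must already be a connected TCP socket (step a).
    """
    if len(eid) != EID_LEN:
        raise ValueError("endpoint ID must be exactly %d bytes" % EID_LEN)
    sock.sendall(eid)  # step (b): the ID travels ahead of, and outside, TLS
    if tls_context is None:
        return sock  # a plain TCP media stream follows
    # step (c): build the TLS session on top of the same connected socket
    return tls_context.wrap_socket(sock, server_hostname=server_hostname)


# Possible usage (not executed here; the host name is a placeholder):
#   ctx = ssl.create_default_context()
#   raw = socket.create_connection(("media.example.com", 443))
#   tls = open_with_endpoint_id(raw, eid, ctx, "media.example.com")
```

Whether other TLS APIs allow the same ordering is exactly the portability question raised above; this only shows that one widely used API does.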
Then there are the considerations for a decomposed gateway topology. By
multiplexing signalling information into the media stream, suddenly you
force a tight coupling between the signalling and media transport
infrastructures by intermixing signalling tokens with the media stream.
How is this different than having the signaling thing ask the media
thing for an IP address, and then putting that IP address into the SDP?
Then, when the media thing receives media on that IP address, it knows
the "control" connection associated with it. Here, we do the same, but
instead of IP address, it's IP+eid. Same difference.
For middlebox problems, consider the stateful packet inspection
firewall. The idea of course is to enforce a policy by protocol rather
than by connectivity. So SMTP, HTTP, FTP, and DNS are allowed, but
nothing else, and it doesn't matter what ports or addresses are used.
In theory the sysadmin should also be able to specify SIP, TLS, RTP/AVP,
and IM protocols in this list and allow those forward-looking end-users
to utilize this fancy new SIP stuff. With comedia-fix, every protocol
that is signaled will not, in fact, *be* the protocol described in the
SDP. So when the firewall watches the media stream, it doesn't see a
stream that fits any of the protocol structures it recognizes, and
therefore disallows the stream.
Now I am confused. The way that firewall admins enforce policy by
protocol rather than by connectivity is through ports. In the current
comedia spec, there will almost never be just a single port for the
newfangled media stream. So, you can't enforce policy. The mechanism in
comedia-fix allows there to be one, and only one, port allocated for any
specific media-over-tcp type. This will enable firewall admins to do
things, not disable it.
I do not understand your point about the protocol not "being" the one
that is advertised. It most certainly is, with the addition of a 32 byte
ID up front, but who cares?
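On the listening side, the single-port idea amounts to reading the ID off each accepted connection and looking up the session before handing the socket to the media stack. A minimal sketch, assuming a fixed 32-byte raw ID and a plain dictionary of sessions (both are illustrative choices, not taken from either draft):

```python
import socket

EID_LEN = 32  # assumption: fixed-length, 32-byte raw endpoint ID


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending a full endpoint ID")
        buf.extend(chunk)
    return bytes(buf)


def correlate(conn, sessions):
    """Consume the leading endpoint ID and return the matching session, or None.

    After this returns, the next bytes on `conn` belong to the advertised
    protocol (for example RTP/AVP framing), untouched.
    """
    eid = recv_exact(conn, EID_LEN)
    return sessions.get(eid)
```

Because exactly 32 bytes are consumed, the protocol stack that takes over the connection afterwards sees the stream it expects.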
Another case is where TLS is provided by an external accelerator box to
offload the compute load of the server---same issue.
4.2 Decoupled lifetimes
This has a number of side-effects. First, it essentially forces a
single-port architecture on both of the endpoints, as the only other
option is the "this-doesn't-scale" 1:1 client-to-port-number mapping.
With comedia as it stands today, the lifetime of the listener port is
decoupled from the lifetime of the session that is begat from that
listener. No longer the case with comedia-fix.
You can use as many ports as you like, ranging from 1 to one for each
session. Because I don't need the port to correlate, I am free to use
any algorithm I like. Comedia has to use an algorithm which has at least
one-per-pending-offer, because you have no way to correlate. Therefore,
I do not believe that comedia-fix introduces a binary "1 port" or "1
port per session" restriction.
Second, it forces endpoints to use "both" where one could have opted for
just "active" or "passive". The problem here is that if one endpoint
specified "passive", and the media connection drops, it has no way to
legally reinstate the connection if it needs to send data to the other
endpoint. In essence, comedia-fix "raises the stakes" as to what is an
appropriate connection mode to negotiate. In comedia, the only thing at
stake was the ability to bring up the initial connection. In
comedia-fix, we add the issue of recovering from the scenario that the
remote endpoint will drop the connection.
You need to handle connection failures. In the current comedia, if the
passive side loses the connection, it can bring it up by doing a
re-invite and then include a reconnect parameter. What you CANNOT do is
have the active side reconnect without doing a re-inVITE. In
comedia-fix, you can have the active side reconnect without doing a
re-INVITE. There is no reason we could not also allow the passive side
to do a re-INVITE to force a reconnect from the active side. I did
propose to remove that, but perhaps it is useful after all.
Third, as described in comedia-fix, NAT's can cause a similar outage to
#2 even if each endpoint specifies "both".
The decoupled lifetime, to my knowledge, is unprecedented.
Really? See RFC 3261. SIP can run over TCP. When it does, the lifetime of
the connection is not strongly coupled to the lifetime of the
transaction or dialog. It used to be, in RFC 2543. After years of trying
to make it work, we found it was a nightmare to maintain. Thus, it got
discarded. The result was a nice clean separation of the SIP layer from
the TCP connectivity layer. This change bought us many properties,
including improvements in robustness (the ability to recover a SIP
transaction even if the TCP connection the request went over had closed).
It's critical in any protocol where the intermediary elements provide a
subset of the endpoint functionality.
In
developing comedia I tried to apply the litmus test of "how is this
scenario handled in connectionless media, and how do connections
differ?". I don't believe that decoupled lifetimes fares very well
here. Remember that while "traditional" SDP media transports may be
connectionless, they aren't stateless. So the corollary to decoupling
connection lifetime from session lifetime is akin to allowing RTP/AVP
streams to be stopped, reset, then restarted, all without any additional
activity on the signalling channel. Again, I'm not an expert here, but
I imagine that this would be considered outside the spec today.
There is no notion of starting/stopping RTP streams, so it's hard to say
what this would mean.
I also don't understand the rationale behind the statement: "If one side
wishes to force the other to reconnect, it merely drops the connection.
When the other side has data to send, it will establish a new
connection." The "reconnect" attribute was never intended to "force"
reconnects. Rather, it was intended as a way to recover from an
*unintended* disconnection in such a way as to preserve the existing
session state. Is there ever a reason that an endpoint would tear down
the connection solely to inconvenience the other endpoint with the task
of re-establishing it? (insert half-smiley here, since that's the
impression one gets from the language in comedia-fix)
No, of course not.
I must admit I am not sure what I meant here. Per above, I do see how
the reconnect parameter would help.
I guess I'm having trouble understanding why it is so important that the
lifetime be decoupled. The only argument put forth is the
multiplexing-bridge scenario. If we really are using a multiplexed
tunnel to bridge two networks, what's the harm in making the tunneling
connection "sticky"?
The intermediaries in this case would need to "reference count". That
is, they would need to keep track of each session that gets set up, and
then torn down, which had media over that connection. This is impossible
to do when the media intermediary is decoupled from the SIP proxy. Doing
it would require what we call a B2BUA in SIP, along with a colocated
media intermediary or a remote one under tight control.
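The reference counting described above can be sketched as follows. The class and method names are hypothetical, and the `connected` flag merely stands in for opening and closing the relay-to-relay TCP connection.

```python
class StickyTunnel:
    """Hypothetical relay-side bookkeeping for a "sticky" tunnel connection."""

    def __init__(self):
        self._sessions = set()
        self.connected = False  # stands in for the relay-to-relay TCP connection

    def session_started(self, session_id):
        """Must be called for every session set up over this tunnel."""
        if not self._sessions:
            self.connected = True  # first session: bring the tunnel up
        self._sessions.add(session_id)

    def session_ended(self, session_id):
        """Must be called for every session torn down."""
        self._sessions.discard(session_id)
        if not self._sessions:
            self.connected = False  # last session gone: tunnel can close

    @property
    def refcount(self):
        return len(self._sessions)
```

The bookkeeping itself is trivial. The cost is in the two "must be called" requirements: the media relay has to learn about every session setup and teardown, which is the coupling to the signaling path that the paragraph above objects to.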
Another problem is disconnects due to some kind of network outage. Let's
say users are sending their TCP media through two relays. The connection
between the relays gets lost, but the connections from each user to
their relay remain. Neither user will try to reconnect, since as far as
they know, there is no TCP connection problem. The relays know there is
a problem, and they could reconnect, but comedia prohibits them from
doing so. They would have to be a B2BUA once more, and issue a new
INVITE on their own, just to re-establish it. In other words, the relays
need to be the same (in terms of functionality) as the endpoints. I want
to drive such things out of intermediaries, not INTO them.
So, to sum up, with comedia today, building these relays requires a very
tight coupling between the SIP proxy (which has to be a B2BUA) and the
TCP media relay. With comedia-fix, they are decoupled. The SIP
intermediary can remain a proxy, and the media relay can worry about its
own connectivity.
On the flip side, since there is a multiplexing
function occurring in an intermediary, shouldn't there be a mechanism
for determining the lifetime of the feeds into the mux that just makes
this problem go away? How today, do the endpoints in this scenario know
when to do an initial INVITE, then do the final BYE?
Sure, it can be computed. It just introduces a coupling which is not
otherwise there.
-Jonathan R.
--
Jonathan D. Rosenberg, Ph.D. 72 Eagle Rock Ave.
Chief Scientist First Floor
dynamicsoft East Hanover, NJ 07936
jdrosen@dynamicsoft.com FAX: (973) 952-5050
http://www.jdrosen.net PHONE: (973) 952-5000
http://www.dynamicsoft.com