
[MMUSIC] comedia-fix-00 comments



This is a rather long and rambling commentary, so let me summarize the major points up front:

- The scaling issue is misstated.

- The security issues raised are not actually solved by
comedia-fix, but can be mitigated more simply.

- The proposed mechanism for session correlation has
serious side-effects that are not addressed.

- The connection/session lifetime decoupling is a
major paradigm shift, and also has some serious
side-effects.

Details below, arranged by comedia-fix sections:

3.1 Port Multiplicity on Servers

This section is incorrect. Even without a means to correlate incoming connections to sessions, the mapping of allocated port numbers to clients is not 1:1. I'm surprised to see this statement in the draft since I thought I had explained that in detail at earlier meetings.

The size of the port space need only be as large as the NUMBER OF PENDING OFFERS made to potential clients. This is not the scaling problem described in comedia-fix (and not really a scaling problem at all, IMO), and the draft should be updated to reflect this.
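To make the pending-offers point concrete, here is a minimal sketch in Python. It is my illustration, not anything specified in comedia or comedia-fix: the `OfferPortPool` class and its method names are hypothetical, and it assumes a listener port can be handed to the next offer as soon as the expected connection has arrived (the accepted TCP connection is then identified by its own address/port 4-tuple).

```python
class OfferPortPool:
    """Hypothetical listener-port pool: a port is held only while its offer is pending."""

    def __init__(self, ports):
        self._free = list(ports)
        self._pending = {}  # port -> session id still waiting for its inbound connection

    def make_offer(self, session_id):
        """Reserve a port to advertise in an offer; fails only if too many offers are pending."""
        if not self._free:
            raise RuntimeError("no free port: too many pending offers")
        port = self._free.pop()
        self._pending[port] = session_id
        return port

    def connection_arrived(self, port):
        """The connection is now tied to its session, so the port returns to the pool."""
        session_id = self._pending.pop(port)
        self._free.append(port)
        return session_id

    def pending_count(self):
        return len(self._pending)


# Two ports serve 100 clients, because no more than one offer is pending at a time.
pool = OfferPortPool(range(5000, 5002))
established = []
for client in range(100):
    port = pool.make_offer(f"session-{client}")
    established.append(pool.connection_arrived(port))
```

The pool is exhausted only when the number of simultaneously pending offers exceeds the number of ports, regardless of how many sessions are already established.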


3.2 No Connection Re-Use

This references 3.1 and restates the incorrect scaling assumption.

This scenario seems to be the only problem for which connection/session lifetime decoupling is presented as a solution. I cover this in more detail below, but the short answer is that any economy of design derived from this solution is offset by burdening other, more common scenarios with undesirable restrictions.


3.3 Security

The Man-in-the-Middle attack can be mitigated by simply disallowing multiple connections to the advertised IP/port during the offer period. If more than one arrives, then the endpoint assumes an attack and does a session teardown. (Yes, this requires updating the language in comedia to cover this new scenario.)
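The rule above can be sketched as a tiny state machine. This is only an illustration in Python under my own assumptions: the `OfferedListener` class and `OfferState` names are hypothetical, and the sketch models only what happens while the offer period is open.

```python
from enum import Enum


class OfferState(Enum):
    PENDING = "pending"      # offer sent, no connection yet
    CONNECTED = "connected"  # exactly one connection has arrived
    TORN_DOWN = "torn down"  # a second connection arrived: assume an attack


class OfferedListener:
    """Hypothetical handler for connections to one advertised IP/port during the offer period."""

    def __init__(self):
        self.state = OfferState.PENDING
        self.peer = None

    def on_connection(self, peer_addr):
        if self.state is OfferState.PENDING:
            # First arrival: accept it as the media connection for this session.
            self.state = OfferState.CONNECTED
            self.peer = peer_addr
        else:
            # More than one arrival: we cannot tell which one is legitimate,
            # so tear down the whole session rather than guess.
            self.state = OfferState.TORN_DOWN
            self.peer = None
        return self.state


listener = OfferedListener()
listener.on_connection(("192.0.2.10", 40000))    # state becomes CONNECTED
listener.on_connection(("198.51.100.7", 40001))  # state becomes TORN_DOWN
```

Note that the endpoint does not try to decide which of the two connections is the attacker's; it simply treats the duplicate as evidence of an attack and ends the session.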

The firewall trust issue can be addressed by a chain of trust, can it not? I.e., presumably the firewall authenticates/trusts a SIP proxy which in turn authenticates/trusts each of the UAs. The instruction to open/close pinholes is made by the SIP proxy to the firewall as a result of the message traffic the proxy is relaying, not by any sort of stateful packet inspection by the firewall.

As for the "there's no ALG yet" argument for residential gateways: I submit yet again that without including the source address/port, comedia will preclude an ALG from ever being written.


4.1 Endpoint-ID

This concerns me greatly. During the extended period when comedia was in development, I longed for just a "hook" that would allow for reliable connection correlation. But alas, since we are signaling media streams that use an open-ended set of protocols, such a hook did not exist. My working assumption was that shoe-horning additional information into an arbitrary protocol was outside the bounds of what SDP could impose. Guess I was wrong. :-)

The suggestion in comedia-fix (to use a header at the start of the connection) jumbles the protocol stack, and as such may cause some unintended consequences. At the heart of the issue is the question "what is it that we are signaling here?" If we are signaling RTP/AVP over TCP, then we violate the spec with the initial ID transmission. If we are signaling RTP/AVP over TLS, where does the ID transmission go: inside the TLS stream, or ahead of the TLS handshake? In either case we end up with a stream that is no longer (a) 100% RTP/AVP or (b) 100% TLS.

This problem can manifest itself in implementation complications as well as middlebox problems.

Take TLS for example: let's say we put the endpoint ID ahead of the TLS handshake. The OpenSSL folks were kind enough to provide C developers a simple API where you prepend "SSL_" to the normal BSD socket calls, and for the most part you can write code for SSL/TLS in the same manner as raw TCP sockets. But with an endpoint ID, suddenly I, the developer, have to follow a more complicated scheme where I (a) connect a TCP socket, (b) send/receive the endpoint ID, then (c) construct an SSL session based on the connected socket. This of course presumes that the language-specific API to SSL is flexible enough for the developer to juggle the order of the protocol stack in this manner. Other APIs (Java, Perl, VB, etc.) may not be so forgiving.
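For illustration, here is what steps (a) through (c) might look like in Python, whose standard `ssl` module does allow wrapping an already-connected socket. This is a sketch under assumptions: comedia-fix's actual wire format for the ID is not reproduced here, so the 2-byte length prefix and both function names are hypothetical.

```python
import socket
import ssl
import struct


def send_endpoint_id(sock, endpoint_id):
    """Write a hypothetical endpoint-ID header: 2-byte big-endian length, then the ID bytes."""
    sock.sendall(struct.pack("!H", len(endpoint_id)) + endpoint_id)


def connect_with_endpoint_id(host, port, endpoint_id):
    """Open a TLS connection whose first bytes on the wire are a cleartext endpoint ID."""
    # (a) connect a plain TCP socket
    sock = socket.create_connection((host, port))
    try:
        # (b) send the endpoint ID in the clear, before any TLS bytes
        send_endpoint_id(sock, endpoint_id)
        # (c) only now layer TLS over the already-connected socket
        context = ssl.create_default_context()
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

The point is not that this is hard in any one language, but that it only works where the TLS API accepts a socket that has already carried non-TLS traffic. A library that insists on performing the TCP connect itself offers no place for step (b).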

While I'm not a streaming media expert, I imagine that the situation for streaming media libraries may be worse.

Then there are the considerations for a decomposed gateway topology. By intermixing signaling tokens with the media stream, you suddenly force a tight coupling between the signaling and media transport infrastructures.

For middlebox problems, consider the stateful packet inspection firewall. The idea of course is to enforce a policy by protocol rather than by connectivity. So SMTP, HTTP, FTP, and DNS are allowed, but nothing else, and it doesn't matter what ports or addresses are used. In theory the sysadmin should also be able to specify SIP, TLS, RTP/AVP, and IM protocols in this list and allow those forward-looking end-users to utilize this fancy new SIP stuff. With comedia-fix, none of the signaled protocols will, in fact, *be* the protocol described in the SDP. So when the firewall watches the media stream, it doesn't see a stream that fits any of the protocol structures it recognizes, and therefore disallows the stream.
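A toy version of such an inspection rule shows the failure. This is a Python sketch with assumed details: `looks_like_tls` is a hypothetical, deliberately simplistic classifier, and the endpoint-ID framing (2-byte length plus ID) is made up for the example. The only real protocol fact used is that a TLS stream opens with a handshake record, whose header starts with content type 0x16 followed by major version 0x03.

```python
def looks_like_tls(first_bytes):
    """Hypothetical firewall rule: accept a stream only if it opens with a TLS handshake record."""
    return len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03


# Record header of a ClientHello: type 0x16, version 0x0301, illustrative length 0x002F.
client_hello_start = bytes([0x16, 0x03, 0x01, 0x00, 0x2F])

# Hypothetical endpoint-ID framing sent ahead of the handshake.
endpoint_id_header = b"\x00\x05ep-42"

pure_stream = client_hello_start
prefixed_stream = endpoint_id_header + client_hello_start

print(looks_like_tls(pure_stream))      # True
print(looks_like_tls(prefixed_stream))  # False
```

A real firewall's classifier is far more elaborate, but the outcome is the same: the bytes it sees first belong to no protocol it has been told to allow.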

Another case is where TLS is provided by an external accelerator box to offload the compute load from the server. Same issue.


4.2 Decoupled lifetimes

This has a number of side-effects. First, it essentially forces a single-port architecture on both of the endpoints, as the only other option is the "this-doesn't-scale" 1:1 client-to-port-number mapping. With comedia as it stands today, the lifetime of the listener port is decoupled from the lifetime of the session that is begotten from that listener. That is no longer the case with comedia-fix.

Second, it forces endpoints to use "both" where one could have opted for just "active" or "passive". The problem here is that if one endpoint specified "passive", and the media connection drops, it has no way to legally reinstate the connection if it needs to send data to the other endpoint. In essence, comedia-fix "raises the stakes" as to what is an appropriate connection mode to negotiate. In comedia, the only thing at stake was the ability to bring up the initial connection. In comedia-fix, we add the issue of recovering when the remote endpoint drops the connection.

Third, as described in comedia-fix, NATs can cause an outage similar to the one in the second point even if each endpoint specifies "both".

The decoupled lifetime, to my knowledge, is unprecedented. In developing comedia I tried to apply the litmus test of "how is this scenario handled in connectionless media, and how do connections differ?". I don't believe that decoupled lifetimes fare very well here. Remember that while "traditional" SDP media transports may be connectionless, they aren't stateless. So decoupling connection lifetime from session lifetime is akin to allowing RTP/AVP streams to be stopped, reset, then restarted, all without any additional activity on the signaling channel. Again, I'm not an expert here, but I imagine that this would be considered outside the spec today.

I also don't understand the rationale behind the statement: "If one side wishes to force the other to reconnect, it merely drops the connection. When the other side has data to send, it will establish a new connection." The "reconnect" attribute was never intended to "force" reconnects. Rather, it was intended as a way to recover from an *unintended* disconnection in such a way as to preserve the existing session state. Is there ever a reason that an endpoint would tear down the connection solely to inconvenience the other endpoint with the task of re-establishing it? (insert half-smiley here, since that's the impression one gets from the language in comedia-fix)

I guess I'm having trouble understanding why it is so important that the lifetime be decoupled. The only argument put forth is the multiplexing-bridge scenario. If we really are using a multiplexed tunnel to bridge two networks, what's the harm in making the tunneling connection "sticky"? On the flip side, since there is a multiplexing function occurring in an intermediary, shouldn't there be a mechanism for determining the lifetime of the feeds into the mux that just makes this problem go away? How, today, do the endpoints in this scenario know when to send the initial INVITE, and then the final BYE?

Or, a more blunt way of saying this: "Why is this comedia's problem?" This strikes me as a lot of bending-over-backwards to accommodate a network transport architecture that should be transparent to the signaling protocol used by the endpoints.
