
Re: [AVT] Review of store and forward drafts



Hi Eric,

 

Thanks very much for these comments; we much appreciate this analysis of the problems and solutions in the draft.

 

> These documents describe a set of use cases and a protocol for a
> modification to SRTP to do "store and forward". I have several
> concerns:

 

> - Are these use cases actually compelling?

 

The use cases are motivated by applications that the intended primary users use and need, and the major concern of those users is data confidentiality. Given that, the use cases are real and compelling. In the media distribution use cases, the confidentiality requirement is asymmetric; it comes (mainly) from the content owner. In the other use cases the requirement is symmetric and comes from both sender and receiver. Typical users in the answering machine and conferencing use cases are, as stated in the draft, organizations such as public safety and enterprises, or for that matter anyone who uses transport protection and wants a similar level of protection in the store-and-forward case.

 

> - Do these use cases all need the same set of mechanisms?

 

Maybe not, but the draft proposes a common solution. And if all use cases can be easily solved with a single set of mechanisms, there seems to be no need for several solutions.

 

> - How is key management really going to be done for these.
> - How realistic is the threat model?

 

These questions are answered below.

 

> The use cases here fall into two major categories:

 

> - media distribution
> - recording media on answering machines/voicemail servers
> - untrusted conference bridges

 

> It's not clear to me that these are really that similar problems.  In
> particular, the media distribution cases mostly seem to me to be cases
> in which what's needed is to temporally decouple message integrity
> from confidentiality so that you can avoid storing bogus data which is
> only determined to be bogus when you go to replay it. This is
> particularly noticeable in the "Recording Encrypted Media at Home"
> example (S 4.2.4) where there's no real need to treat the DVR as
> untrusted--you just want to be sure that the media is correct at the
> time you record it even though you don't have a license for the media
> yet.

 

You are right that the problems are a bit different. In this use case it might be sufficient to separate the keys. However, the solution then becomes severely limited in that the media cannot be pre-encrypted and the stored media cannot be forwarded. To get away from such limitations, a more important requirement in the media distribution use cases is to decouple the encryption from the RTP header to make the protection transport independent (see Section 5 in draft-mattsson). When it comes to trusting the DVR in the use case you mention, we agree that the owner of the DVR would, as you write, probably trust it. However, the major security requirement in all use cases is to protect the confidentiality of the data, and a content owner would have (very) limited trust in an arbitrary DVR.

 

While we initially say that the trust model does not need to be symmetric, that is, sender and receiver do not need to have the same level of trust in the middlebox, we have probably been sloppy about pointing out this difference in a few places (here, for instance). Thanks for identifying this issue.

 

> I agree that the voice mail case requires something more substantial,
> however it's also much less compelling. First, it's not clear to me
> that people really don't trust their voice mail servers. Second, voice
> mail servers do a fair amount of mixing (for instance, prompts) with
> the media. As I said on the call, I'm skeptical that existing phones
> will handle this well. Third, it requires a key management scheme we
> don't have: in order for this use case to work, you need a way for the
> sender of the voice mail to do offline key exchange with the ultimate
> receiver. If this worked well, we wouldn't have had any need for
> RTPSEC to solve the special case of online key exchange!

 

Well, in the general case of two random peers without any prior relation, key management has no general solution today, and it is probably such peers that you have in mind when you write that the use case is not compelling. The typical users we have in mind, public-safety and enterprise users, would normally have key management systems already. Note that the decoupling of the key management from SRTP SaF was a deliberate choice to allow use of different existing and/or future key management solutions. The fact that a general key management scheme does not exist is to me not an argument to stop looking at the use case; it is rather an argument to start looking at the key management problem as well. In Appendix B in draft-mattsson we illustrate how the key management can be achieved with small changes to DTLS-SRTP or MIKEY, but some work surely needs to be done. Finally, you can of course not mix encrypted data, but the functionality needed for prompting etc. can be achieved by switching security contexts within one stream or simply by sending the prompts in a separate stream.

 

> The untrusted conference bridge case seems even less compelling,
> because real bridges do a lot of fancy audio stuff.

 

One example where no mixing is done is Push-to-talk, which is a major use case for the expected users such as public safety and enterprises. Ordinary conference bridges would need some type of floor control.

 

> Another concern I have is that the threat model seems fairly
> unrealistic. Why is it safe to assume that the middlebox which is not
> allowed to see the media is nevertheless allowed to retime it and
> mount generic cut-and-paste attacks? I appreciate you have a
> countermeasure for that, but that countermeasure relies on a separate
> integrity check and basically precludes much of the point of this
> work, it seems to me.

 

The trust in the middlebox may sometimes be very high (as you yourself point out above, e.g. the end user likely has strong trust in his own PVR). In other cases, it may be weak(er). We looked for a single trust model that is reasonable (neither too strong nor too weak) to apply to all our use cases. The main concern (from the end user's point of view) in all our use cases is the confidentiality of sensitive/valuable data. We claim that from this (confidentiality) point of view it _is_ safe to use the proposed solution in the defined trust model. The middlebox is trusted to deliver media as requested, nothing else, and to this end we used the well-known "honest-but-curious" model. But this does not imply that the middlebox is "allowed" to do the copy-paste attacks and similar things. The deliberately malicious middlebox threat is in fact not realistic in the intended operational environment. Nevertheless, we always leave control of data confidentiality to the endpoints.

 

In summary:

*	In the media distribution use cases, the main concern of the content creator is probably not what a DVR does with the media as long as it cannot access cleartext data. In the same case, the owner of the DVR would trust it.
*	In the answering machine and conferencing use cases, there is no realistic case where the owner of the service (e.g. an operator) would have something to gain by launching such copy-paste attacks in general. (Note that such attacks would in any case be much harder to carry out in a controlled way when the data is e2e encrypted.) The privacy of the data is the fundamental requirement for public safety and enterprise users.
*	The trust model has been chosen to be simple but at the same time it also "allows" some attacks which are not realistic (but they are still detectable).

 

When we update the draft, we will include a section about copy-and-paste attacks in the security considerations.

 

> At the end of the day, I wonder whether there is a simpler scheme in
> which we simply partition the authentication and encryption transforms
> so that disconnected keys can be used for each. ISTM that this would
> allow the most interesting set of use cases (those concerned with
> media distribution) with significantly less effort. Yes, I realize
> that it wouldn't provide quite the same level of integrity protection,
> but it's not clear that that's really necessary. To the extent to
> which you *do* want to double-wrap the data, it's not clear to me that
> doing it again in SRTP is really that useful: a lot of the point of
> SRTP is that it leverages existing headers for maximum data
> compactness. Once you're doublewrapping and adding new headers like
> e2e PUV, maybe you could just get away with an outer DTLS wrapper
> without any SRTP extensions

 

Key separation is a very nice and simple alternative that would work in some of the use cases. Unfortunately, it has several shortcomings. It does not work well in the media distribution use cases, as rewind and random seek functionality would be impossible. For that functionality to be possible, the protection has to be transport independent. To solve this, we introduced the PUV. To identify the SA we needed the CCI, and to protect several streams with a single SA, we added the SSS. We did consider alternatives, but the requirements in the identified use cases cannot be met with any "simpler" changes to SRTP.

 

/John