Network Working Group                                            R. Blom
Internet-Draft                                                  Y. Cheng
Intended status: Standards Track                             F. Lindholm
Expires: May 7, 2009                                         J. Mattsson
                                                              M. Naslund
                                                              K. Norrman
                                                       Ericsson Research
                                                        November 3, 2008


                         SRTP Store and Forward
                draft-mattsson-srtp-store-and-forward-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 7, 2009.

Abstract

   The Secure Real-time Transport Protocol (SRTP) was designed to allow
   simple and efficient protection of RTP.  To provide this, encryption
   and authentication of media and control signaling are tightly coupled
   to the RTP session, and to the information in the RTP header.  Hence,
   in general it is not possible to perform store and forward of
   protected media.

   This document gives, based on a use case analysis, requirements that


Blom, et al.               Expires May 7, 2009                  [Page 1]

Internet-Draft           SRTP Store and Forward            November 2008


   SRTP and new SRTP transforms need to satisfy in order to allow secure
   store-and-forward operation.  A first proposal on how to introduce
   the needed new functionality and transforms in SRTP is also
   presented.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Selected SRTP background facts . . . . . . . . . . . . . . . .  4
   4.  Use Cases  . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     4.1.  Trust Model and Assumptions  . . . . . . . . . . . . . . .  6
     4.2.  Media Distribution Use Cases . . . . . . . . . . . . . . .  6
       4.2.1.  Streaming Pre-encrypted Media  . . . . . . . . . . . .  6
       4.2.2.  Video on Demand  . . . . . . . . . . . . . . . . . . .  6
       4.2.3.  Caching Protected Media in the Network . . . . . . . .  7
       4.2.4.  Recording Encrypted Media at Home  . . . . . . . . . .  7
     4.3.  Answering Machine Use Case . . . . . . . . . . . . . . . .  7
       4.3.1.  Storing/Caching Encrypted Media  . . . . . . . . . . .  7
       4.3.2.  Transport Protection . . . . . . . . . . . . . . . . .  8
       4.3.3.  Playback of Media Stream . . . . . . . . . . . . . . .  8
       4.3.4.  Multiple Callers . . . . . . . . . . . . . . . . . . .  9
     4.4.  Use Case: Centralized Conferencing . . . . . . . . . . . .  9
   5.  Requirements . . . . . . . . . . . . . . . . . . . . . . . . .  9
   6.  Solution Proposal  . . . . . . . . . . . . . . . . . . . . . . 11
     6.1.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . 11
     6.2.  SRTP Cryptographic Contexts  . . . . . . . . . . . . . . . 12
     6.3.  New Transforms . . . . . . . . . . . . . . . . . . . . . . 13
       6.3.1.  Media Protection Transform . . . . . . . . . . . . . . 14
       6.3.2.  Replay Protection  . . . . . . . . . . . . . . . . . . 16
       6.3.3.  Key Derivation . . . . . . . . . . . . . . . . . . . . 16
   7.  Commented Example Usage  . . . . . . . . . . . . . . . . . . . 16
   8.  Implications on SRTP . . . . . . . . . . . . . . . . . . . . . 18
   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 18
     9.1.  Media protection Transform . . . . . . . . . . . . . . . . 18
     9.2.  Replay Protection  . . . . . . . . . . . . . . . . . . . . 18
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
   11. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 19
   12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
     12.1. Normative References . . . . . . . . . . . . . . . . . . . 19
     12.2. Informative References . . . . . . . . . . . . . . . . . . 19
   Appendix A.  Draft Compound Transform Details  . . . . . . . . . . 19
     A.1.  Processing . . . . . . . . . . . . . . . . . . . . . . . . 20
       A.1.1.  Sender . . . . . . . . . . . . . . . . . . . . . . . . 21
       A.1.2.  Middlebox  . . . . . . . . . . . . . . . . . . . . . . 21
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
   Intellectual Property and Copyright Statements . . . . . . . . . . 24


1.  Introduction

   The Secure Real-time Transport Protocol (SRTP) [RFC3711] is a profile
   of the Real-time Transport Protocol (RTP) [RFC3550], and it provides
   confidentiality, message authentication, and replay protection to
   both RTP and RTCP (Real-time Transport Control Protocol).

   SRTP was designed to protect real-time point-to-point communications
   and is, as presently defined, not aimed at communication solutions
   that include non-trusted store-and-forward middleboxes, i.e.
   middleboxes that should not have access to cleartext media, but
   should still have access to other data in order to retransmit media
   according to RTP standard procedures.

   Media in need of end-to-end (e2e) protection could, for example, be
   real-time voice and video information/media clips for internal use
   by personnel in enterprises or authorities.  There are also
   multimedia telephony applications utilizing mailboxes and other
   store-and-forward functions that need e2e protection.  E2e
   protection could also be needed to protect subscribed media like
   commercial-free radio and television that is distributed over the
   Internet.

   A typical use case is store-and-forward media distribution systems.
   Many of those systems require that media is confidentiality protected
   end-to-end (e2e) between the media source and the media rendering
   device, in order to prevent illegitimate media interception or
   sharing.  At the same time the communication should be hop-by-hop
   (hbh) protected to prevent malicious users from performing denial of
   service attacks by sending bogus data to store-and-forward
   middleboxes.  Methods like the Packet-switched Streaming Service
   (PSS) [3GPP.26.234] exhibit the properties needed for secure
   store-and-forward operation, but they are part of larger frameworks
   tailored for very specific use cases.  Thus it would be desirable to
   be able to offer use of SRTP as a general lightweight mechanism to
   achieve this type of protection.

   Trying to use SRTP with store-and-forward middleboxes reveals two
   main problems.  The first problem is due to the fact that the
   incoming and outgoing RTP streams in general are independent;
   received RTP packets cannot just be stored and later retransmitted.
   This in particular implies that SRTP with currently defined
   transforms cannot be applied.  For details see Section 3.

   It should be noted that store-and-forward of media in most cases
   requires that side information is available when retransmitting
   received media.  An example of such side information is RTP
   timestamp information.  The needed side information may come from
   the RTP header, RTCP messages, and session definition data.

   The second problem is that to provide both e2e and hbh protection,
   two independent security contexts with associated protection
   mechanisms have to coexist; a feature unavailable in SRTP as
   currently specified.  To resolve these problems, SRTP needs
   enhancements that in an efficient and coherent way support store-and-
   forward use cases.

   The objective of this document is to explore use cases for an SRTP
   store-and-forward solution, derive associated requirements, and
   present and discuss an approach for a solution.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Definitions of terms and notation will, unless otherwise indicated,
   be as defined in [RFC3711].

   The term authentication will be used to denote message authentication
   and message integrity protection.

   By RTP transport protection or simply transport protection, we mean
   protection (confidentiality, authentication, etc.) of streamed RTP
   packets.  This is provided by SRTP according to [RFC3711].

   By media protection, we similarly mean protection of the application
   payloads carried in RTP.  SRTP provides media protection, but only
   during transport (see above).


3.  Selected SRTP background facts

   SRTP as currently specified has the features described below, which
   explain why it cannot be directly used in store-and-forward
   applications.  They also indicate how an SRTP store-and-forward
   solution could be designed.

   o  All current SRTP transforms use the RTP header as input.  AES-CTR
      uses the SSRC and the packet index to calculate the IV
      (Initialization Vector), AES-f8 uses even more header parameters,
      and HMAC-SHA1 authenticates the full RTP header.  The SSRC is
      typically determined by the key management protocol, and the
      packet index includes the RTP sequence number, whose initial value
      should be randomly chosen according to RTP [RFC3550].  All this
      means that there is no standards-compliant way to receive SRTP
      protected packets in one stream and later just retransmit the
      packets as they were received.

   o  Even if the SRTP relevant RTP parameters like SSRC and the SRTP
      index could be determined beforehand for the retransmission
      stream, it would not allow a client to randomly seek in a stream
      without renegotiating the session, as it would lead to
      misalignment between the packet index used for streaming and the
      packet index used by SRTP at the originator.  If the user jumps to
      a different part of the stream, it is impossible to continue
      increasing the RTP sequence number stepwise while at the same time
      keeping it equal to the sequence number needed for decryption.
      Jumping backward (e.g. media rewind) would cause even more
      problems as the retransmitted packets would be discarded by the
      SRTP replay protection.

   o  The encryption key and the authentication key are both derived
      from the same master key in SRTP, see Figure 1.  This means that a
      client which is able to derive e.g. the authentication key will
      always also have access to the encryption key, making it
      impossible to use, say, the session encr_key for e2e protection
      and the session auth_key for hbh protection.


                       Packet index -------+
                                           |
                                           v
      +------------+                 +------------+ Session encr_key
      |            |   Master key    |            +------------------>
      |  External  +---------------->|     Key    | Session auth_key
      |    Key     |                 | Derivation +------------------>
      | Management +---------------->|            | Session salt_key
      |            |   Master salt   |            +------------------>
      +------------+                 +------------+

                       Figure 1: SRTP key derivation
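   The header dependence described in the first bullet above can be
   illustrated with the IV construction that [RFC3711] (Section 4.1.1)
   defines for AES in counter mode: IV = (k_s * 2^16) XOR (SSRC * 2^64)
   XOR (i * 2^16), with packet index i = 2^16 * ROC + SEQ.  The
   following is a minimal sketch only; the salt, SSRC, and sequence
   number values are arbitrary assumptions chosen for illustration.

```python
def packet_index(roc: int, seq: int) -> int:
    """48-bit SRTP packet index: i = 2^16 * ROC + SEQ."""
    return (roc << 16) | (seq & 0xFFFF)


def aes_cm_iv(session_salt: int, ssrc: int, index: int) -> int:
    """128-bit counter-mode IV: (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)."""
    return (session_salt << 16) ^ (ssrc << 64) ^ (index << 16)


# Arbitrary 112-bit session salt, used here only for illustration.
salt = 0xF0F1F2F3F4F5F6F7F8F9FAFBFCFD

# IV used by the originator when the packet was protected.
iv_in = aes_cm_iv(salt, ssrc=0x11111111, index=packet_index(0, 1000))

# A middlebox that resends the payload in its own RTP stream has a
# different SSRC and sequence number, so the receiver computes another IV.
iv_out = aes_cm_iv(salt, ssrc=0x22222222, index=packet_index(0, 37))

print(f"{iv_in:032x}")
print(f"{iv_out:032x}")
print("IVs differ:", iv_in != iv_out)
```

   Because the receiver derives its IV from the header of the packet it
   actually receives, a stored ciphertext that is forwarded under a new
   header would be decrypted with the wrong keystream.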


4.  Use Cases

   The use cases below were chosen to illustrate media streaming
   scenarios where the current SRTP specification [RFC3711] does not
   provide sufficient functionality.  These use cases provide context
   and general rationale for the requirements presented in Section 5.

   Note that the necessary key distribution and media session set-up
   are out of scope for this document and will thus not be discussed in
   any detail in the use cases below.

4.1.  Trust Model and Assumptions

   The trust model assumed in this document includes two parties who
   wish to communicate securely via one or more honest but curious
   middleboxes.  This means that the communicating parties trust the
   middlebox to deliver the media as expected, but they do not trust it
   with cleartext data.  In the use cases below there is no example of
   multiple (sequential) middleboxes, but it is a natural generalization
   and it seems warranted to cover this case as well.

4.2.  Media Distribution Use Cases

4.2.1.  Streaming Pre-encrypted Media

   A content creator wants to distribute high value content to clients.
   The content provider distributes the media via a streaming server
   which should not have access to cleartext media, typically because it
   is not trusted by the content creator.  In one scenario the content
   creator streams the media to the streaming server where the media is
   stored in a protected format.  In another scenario the protected
   media may be delivered to the streaming server via e.g. file
   transfer.  These use cases correspond to use of pre-encryption in
   media distribution.  In both cases protected media is available in
   the streaming server for later transmission to different clients.

   Even in cases when the streaming server could be trusted with
   cleartext data there are reasons why one would like to avoid
   performing encryption in the streaming server itself.  One reason to
   use pre-encryption is to relieve the streaming server of the task of
   encrypting the media, especially if the same media is used several
   times, e.g. in video on demand.  If the media is pre-encrypted the
   streaming server only needs to add integrity protection to the
   encrypted media before streaming it to the clients.  Clients are
   trusted by the content creator and have access to the encryption key.
   When a client receives a packet, the authenticity is checked using a
   security context shared with the streaming server and the decryption
   is performed using a security context shared with the content
   creator.

4.2.2.  Video on Demand

   Some protected content is offered as video on demand where users can
   watch selected video clips at any time.  The media is unicast and
   the clients are offered random seek functionality which allows them
   to quickly jump to any part of the video.  Other features offered may
   include rendering with speed translation, as in fast forward and slow
   motion.  These features can be used to skip parts of the video or
   jump backward to see interesting parts again.  The problem here is
   how to support jumping back and forth and performing rendering speed
   translations in an e2e protected media stream.

4.2.3.  Caching Protected Media in the Network

   High value encrypted media (e.g. Internet Protocol Television
   (IPTV) and radio) is broadcast in a network.  Only clients trusted
   by the content creator have access to the encryption key.  A network
   node is caching the media, but is not trusted by the content creator
   and has therefore no access to the encryption keys.  A client that
   missed the beginning of a program might stream the media from the
   network cache instead of listening to the broadcast.  Due to the
   trust model where the content creator only trusts the clients, the
   media needs to be e2e protected.  But the media also needs to be hbh
   integrity protected to protect against DoS attacks.

4.2.4.  Recording Encrypted Media at Home

   High value encrypted media (e.g. IPTV and radio) is broadcast in a
   network.  Only clients trusted by the content creator have access
   to the encryption key.  A user is recording the media on an HDD
   (Hard Disk Drive), but does not yet have a license or has a license
   that does not allow cleartext copying.  The media is therefore stored
   in protected format on the HDD.  There is however a strong need for
   the HDD to be able to check the integrity of the media before it is
   stored.  Otherwise a DoS attack may fill the HDD with garbage.

4.3.  Answering Machine Use Case

4.3.1.  Storing/Caching Encrypted Media

   Operators commonly provide an answering machine service to their
   customers.  In this case the communicating parties (the caller and
   the callee) may not wish to disclose the media to any other party,
   and hence want to apply encryption between each other.  This requires
   that they are able to establish a shared key; how that is
   accomplished is out of scope for this document.  The answering
   machine acts as a store and forward middlebox, which has to store
   encrypted data and re-transmit it to the callee.  The answering
   machine may act as a streaming server when sending the data to the
   callee, and will then not use the exact same RTP headers on the
   outgoing SRTP traffic as were used on the incoming SRTP traffic.
   SRTP as specified in [RFC3711] will not work in this case, since
   parts of the RTP header are input to the encryption/authentication
   transforms.

   An alternative way of forwarding the recorded media from the
   answering machine to the callee could be file transfer, sending the
   recorded media in e.g. the same format as was used to store it.  Such
   forwarding would not be according to SRTP, but would still yield
   end-to-end protection of the media.  Note however, that decryption
   and rendering would be similar to part of an enhanced SRTP solution.

4.3.2.  Transport Protection

   To prevent the answering machine from being filled up with bogus
   data, it is necessary for the answering machine to authenticate the
   sender of the traffic, and further, to verify the authenticity of the
   incoming traffic.  This poses a problem for SRTP as of [RFC3711] in
   that the message authentication requires a session key shared with
   the answering machine, but the encryption key shall, as discussed
   above, not be available to it.  This implies that there is a need for
   two independent security contexts, one end-to-end and one hop-by-hop.

   When the callee retrieves the media from the answering machine,
   message authentication is also beneficial.  There are two
   possibilities.  Since the answering machine is trusted not to
   actively behave maliciously, it may be sufficient to provide message
   authentication between the answering machine and the callee.  Also
   here it would be necessary to have a separation between the e2e
   protection and the hbh protection.  A second option is that
   authentication is applied from the caller to the callee.  But if the
   authentication is applied in that way, the answering machine will not
   be able to verify the integrity of the incoming traffic from the
   caller.  It is of course also possible that message authentication is
   desired for any combination of endpoints, i.e. between the caller and
   the callee, between the caller and the answering machine, and between
   the answering machine and the callee.

4.3.3.  Playback of Media Stream

   When a user listens to the messages stored on the answering machine,
   it is useful to be able to rewind and/or fast forward in the media
   stream.  For SRTP as of [RFC3711] this is not possible.  The reason
   for that is that even if the same payloads can be re-inserted in the
   stream by the answering machine, the RTP sequence number is steadily
   increasing on a per packet basis.  Since the synchronization of the
   encryption transforms is based on the RTP sequence number, the
   decryption will fail.  In addition, message authentication will fail
   since the authentication according to [RFC3711] shall cover the
   header of the RTP packet.  This implies that the payload and the
   media have to be protected by a mechanism which is independent of
   parameters used in the transport protocol.
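   The synchronization problem described above can be made concrete
   with a small simulation.  This is an illustrative sketch only: the
   starting sequence number and the playback order are assumptions, and
   sequence numbers stand in for the full SRTP packet index.

```python
from itertools import count

# Packet indices assigned by the caller when the message was recorded.
stored = list(range(1000, 1100))

# Positions requested by the callee: play, fast forward, then rewind.
playback_order = [0, 1, 2, 50, 51, 1, 2]

# The answering machine's outgoing sequence number increases by one per
# packet.  Assume (optimistically) that it starts aligned with the original.
out_seq = count(1000)

aligned = []
for pos in playback_order:
    outgoing = next(out_seq)
    original = stored[pos]
    aligned.append(outgoing == original)
    print(f"outgoing seq {outgoing}  original seq {original}  "
          f"{'ok' if outgoing == original else 'MISMATCH'}")
```

   Even with aligned starting values, the first seek breaks the
   correspondence between the outgoing sequence number and the index
   the payload was encrypted under.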


4.3.4.  Multiple Callers

   Several messages may be left on the answering machine, received in
   different sessions and possibly from different callers.  The result
   of this is that different keys were used to encrypt the media.
   Depending on how the callee retrieves the messages from the answering
   machine, different options are possible.  One option is to retrieve
   each message as a separate stream, and in this case a separate
   session is required per message.  Another option is to somehow switch
   security contexts midstream when the next message starts.

4.4.  Use Case: Centralized Conferencing

   Another use case is a conference bridge that is not to be trusted
   with the cleartext media.  In this case the conference bridge cannot
   act as a mixer, but in some cases this may be an acceptable
   limitation.  An example is Push-To-Talk solutions, where only one
   user at a time is allowed to talk.  In this setting, the media may be
   re-packaged by the conferencing server into RTP packets with
   different headers compared to the incoming traffic.  As described in
   Section 3, this causes authentication and decryption to fail in SRTP.


5.  Requirements

   The use cases above show that, to enable store and forward, an
   enhanced SRTP has to efficiently support the following requirements:

   o  Transport independent media protection

      It SHALL be possible to have media protection which is independent
      of RTP parameters.

      To allow retransmission of received protected media, a transform
      for protecting the RTP payload that is independent of RTP
      transport parameters is needed.

      The media protection MUST cover both message authentication and
      confidentiality protection.

   o  Media source authentication

      It SHALL be possible to provide source authentication of the media
      stream.

      In a group setting, source authentication is here meant to ensure
      that the message originated from a member of the group.  This
      requirement is fulfilled if media has authentication protection in
      a transport independent manner.

   o  Support of playback of protected media streams

      A client SHALL be able to do random seek in a protected media
      stream.

      Note that as playback functions like retransmission and random
      seek capability are features in the described use cases, replay
      protection cannot be required for transport independent media
      protection.

   o  Transport protection

      It SHALL be possible to provide transport protection which is
      independent of the media protection.

      The transport protection MUST be able to provide confidentiality,
      authentication, and replay protection for RTP and at least
      authentication and replay protection for RTCP.

      This requirement maps well against SRTP as of [RFC3711].
      Transport protection is also a means to provide replay protection
      of the media on a hop-by-hop basis.

   o  Separation of security contexts

      It MUST be possible to have independent security contexts for the
      transport independent media protection and the transport
      protection.

      This means in particular that there have to be two distinct master
      keys, one for e2e media protection and one for hbh transport
      protection.

   o  Change of transport independent media protection security context

      It MUST be possible to signal to the receiver the current media
      protection security context to use.  It MUST be possible to change
      this security context midstream.

      This is needed to allow single stream multiplexing of e.g.
      protected media "clips" which were generated using different
      transport independent media protection security contexts.

      The requirements imply that the media protection format has to
      include a Crypto Context Indicator (CCI) field for robust
      operation.  The CCI can be thought of as a generalized MKI and may
      be defined to also include all the MKI based functionality defined
      in [RFC3711].
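   The split between the playback and transport protection requirements
   above (no replay protection for the media itself, replay protection
   on each hop) can be sketched with a sliding-window replay check over
   packet indices.  The class below is a simplified illustration, not
   the replay list of [RFC3711]; the window size of 64 and all names
   are assumptions, and a real implementation would only update the
   window after successful authentication.

```python
class ReplayWindow:
    """Sliding-window replay check over packet indices (sketch)."""

    def __init__(self, size=64):
        self.size = size
        self.top = -1      # highest packet index accepted so far
        self.bitmap = 0    # bit k set => index (top - k) was accepted

    def accept(self, index):
        """Return True and record the index if it is fresh, else False."""
        if index > self.top:
            shift = index - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = index
            return True
        offset = self.top - index
        if offset >= self.size or self.bitmap & (1 << offset):
            return False   # too old, or already seen
        self.bitmap |= 1 << offset
        return True


hop = ReplayWindow()

# Normal playback: packet indices 0..9 arrive in order.
in_order = [hop.accept(i) for i in range(10)]

# Resending the packet that carried index 3 is rejected on this hop.
replayed = hop.accept(3)

# A middlebox that rewinds instead sends the stored payload in a new
# packet with the next fresh index, which the window accepts.
rewound = hop.accept(10)

print(all(in_order), replayed, rewound)
```

   The same stored payload can thus be delivered again after a rewind
   without weakening hop-by-hop replay protection, provided the media
   protection itself does not depend on the packet index.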


6.  Solution Proposal

6.1.  Overview

   The stated requirements above seem possible to meet by implementing a
   few minor additions to SRTP.  These additions mainly address new SRTP
   transforms, introduction of media and transport protection crypto
   context definitions together with key handling and derivation.

   A high level description of the proposed new SRTP functionality is as
   follows: The first step is to perform a transport independent media
   protection operation.  The coverage of this transform is the RTP
   payload only.  This operation could preferably be an Authenticated
   Encryption with Associated Data (AEAD) transform, which allows part
   of the payload to be sent in plain.  The media protection will rely
   on an explicit IV sequence number (IVSN) which is forwarded in the
   payload.

   After the steps making up the transport independent media protection
   have been performed, the protection processing proceeds as currently
   defined by [RFC3711], which results in the addition of the required
   transport protection.

   Keying for transport protection, i.e. the SRTP internal key
   derivation, is the same as described in [RFC3711].  The key
   derivation function operates on a master key and a master salt, where
   the master key is denoted hbh-key.

   The keying for the media protection is defined in an equivalent way,
   producing keying material for the AEAD transform.  The e2e keying
   material is based on another master key, the e2e-key, which is
   independent of the hbh-key.  Also for the e2e context a master salt
   is defined.  The key derivations used to derive the e2e keying
   material will also use the key derivation function defined in
   [RFC3711].
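   The keying described above can be sketched as two independent runs
   of one key derivation, each with its own master key and master salt.
   Note the assumptions: the pseudo-random function below is
   HMAC-SHA256, used as a stand-in so that the example needs only a
   standard library; [RFC3711] specifies an AES counter mode based
   function, which also takes the packet index and key derivation rate
   into account.  The labels 0x00, 0x01, and 0x02 and the key lengths
   follow the SRTP defaults in [RFC3711]; the function names are
   hypothetical.

```python
import hashlib
import hmac
import os

LABELS = {"encr": 0x00, "auth": 0x01, "salt": 0x02}   # SRTP key labels
LENGTHS = {"encr": 16, "auth": 20, "salt": 14}        # default sizes in bytes


def derive(master_key: bytes, master_salt: bytes, name: str) -> bytes:
    """Derive one session key.  Stand-in PRF (HMAC-SHA256), NOT AES-CM."""
    data = bytes([LABELS[name]]) + master_salt
    return hmac.new(master_key, data, hashlib.sha256).digest()[:LENGTHS[name]]


def session_keys(master_key: bytes, master_salt: bytes) -> dict:
    """All session keys follow from a single master key and master salt."""
    return {name: derive(master_key, master_salt, name) for name in LABELS}


# Independent master keys and salts for the two contexts.
e2e_key, e2e_salt = os.urandom(16), os.urandom(14)
hbh_key, hbh_salt = os.urandom(16), os.urandom(14)

e2e = session_keys(e2e_key, e2e_salt)   # known to sender and end-receiver
hbh = session_keys(hbh_key, hbh_salt)   # known to both ends of one hop

for name in LABELS:
    print(name, "e2e:", e2e[name].hex(), "hbh:", hbh[name].hex())
```

   A middlebox that holds only the hbh-key can derive every hbh session
   key but none of the e2e session keys, which is the separation that a
   single master key cannot provide.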

   Note that with the approach taken only the end-points for the media
   protection will have to implement the new SRTP functionality with a
   combined media and transport transform including handling of two
   security contexts.  In the following we will denote such a combined
   transform a Compound Transform.  The store and forward middlebox can
   rely solely on [RFC3711] using already existing functionality for
   store-and-forward operation, given that the transport transform in
   the compound transform is equivalent to a transform defined for
   [RFC3711].  However, there are some practical reasons why also the
   middlebox needs to have some "knowledge" of the e2e part of the
   protection, see below.

   To summarize: By a compound transform, we mean the combination of
   media protection transform according to the suggested AEAD of
   Section 6.3 (using the e2e key) and one of the defined transforms of
   [RFC3711] for the hbh part (using the hbh key).  The compound
   transform should be defined in this way to allow an intermediary to
   re-use an [RFC3711]-compliant implementation of SRTP to first receive
   and then resend the media.

   For RTCP the solution principles described for RTP apply.  However,
   the main application for RTCP is to control the traffic over one hop
   which means that e2e encryption cannot be applied in general.  But
   note that there are RTCP application messages which might benefit
   from having e2e integrity protection.
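   The processing order of the compound transform for RTP can be
   sketched end to end as follows.  This is an illustration under
   stated assumptions, not the proposed transform: the e2e part is a
   stand-in encrypt-then-MAC construction built from HMAC-SHA256 rather
   than a real AEAD algorithm, the payload layout (IVSN, length of the
   clear part, clear part, ciphertext, tag) is hypothetical, and the
   hbh part only applies an 80-bit HMAC-SHA1 tag over header and
   payload, omitting the ROC and hbh encryption.

```python
import hashlib
import hmac
import struct


def _keystream(key, ivsn, n):
    """Stand-in keystream that depends only on the key and the IVSN."""
    out, block = b"", 0
    while len(out) < n:
        out += hmac.new(key, struct.pack(">QI", ivsn, block), hashlib.sha256).digest()
        block += 1
    return out[:n]


def e2e_protect(enc_key, auth_key, ivsn, media, clear=b""):
    """Transport independent media protection of the RTP payload."""
    ct = bytes(a ^ b for a, b in zip(media, _keystream(enc_key, ivsn, len(media))))
    body = struct.pack(">QH", ivsn, len(clear)) + clear + ct
    return body + hmac.new(auth_key, body, hashlib.sha256).digest()[:16]


def e2e_unprotect(enc_key, auth_key, blob):
    body, tag = blob[:-16], blob[-16:]
    if not hmac.compare_digest(tag, hmac.new(auth_key, body, hashlib.sha256).digest()[:16]):
        raise ValueError("e2e authentication failed")
    ivsn, clear_len = struct.unpack(">QH", body[:10])
    clear, ct = body[10:10 + clear_len], body[10 + clear_len:]
    media = bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, ivsn, len(ct))))
    return clear, media


def hbh_protect(auth_key, header, payload):
    """Simplified transport protection: tag over RTP header and payload."""
    return header + payload + hmac.new(auth_key, header + payload, hashlib.sha1).digest()[:10]


def hbh_unprotect(auth_key, packet, header_len=12):
    data, tag = packet[:-10], packet[-10:]
    if not hmac.compare_digest(tag, hmac.new(auth_key, data, hashlib.sha1).digest()[:10]):
        raise ValueError("hbh authentication failed")
    return data[:header_len], data[header_len:]


def rtp_header(seq, ts, ssrc):
    return struct.pack(">BBHII", 0x80, 96, seq, ts, ssrc)


e2e_enc, e2e_auth = b"E" * 16, b"e" * 20        # e2e context (S and R only)
hbh1_auth, hbh2_auth = b"1" * 20, b"2" * 20     # one hbh context per hop

# Sender: media protection first, then transport protection.
blob = e2e_protect(e2e_enc, e2e_auth, ivsn=7, media=b"hello callee", clear=b"hdr")
pkt_in = hbh_protect(hbh1_auth, rtp_header(1000, 160, 0x1111), blob)

# Middlebox: checks hop 1, stores the opaque payload, resends on hop 2
# under its own RTP header.  It never holds the e2e keys.
_, stored = hbh_unprotect(hbh1_auth, pkt_in)
pkt_out = hbh_protect(hbh2_auth, rtp_header(42, 9999, 0x2222), stored)

# Receiver: transport protection is removed first, then media protection.
_, received = hbh_unprotect(hbh2_auth, pkt_out)
clear, media = e2e_unprotect(e2e_enc, e2e_auth, received)
print(clear, media)
```

   Since the e2e processing takes its only per-packet input from the
   IVSN carried inside the payload, the change of SSRC, sequence number,
   and timestamp at the middlebox does not affect decryption or e2e
   authentication at the receiver.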

6.2.  SRTP Cryptographic Contexts

   SRTP maintains a cryptographic context, containing master key(s),
   cryptographic transforms, etc., for the associated SRTP session.
   Exactly how the parameters in the cryptographic context are agreed is
   out of scope of SRTP and is a session set-up issue.  SRTP assumes
   that a cryptographic context or rather the master key therein, is
   shared only between mutually trusted parties.

   The SRTP cryptographic context concept is reusable for the proposed
   solution.  Conceptually, the originator and the intended end-receiver
   share an "e2e context" while a "hbh context" is shared by an endpoint
   and an intermediary or by two intermediaries.  To comply with the
   trust model of the use cases above the master key(s) in the e2e
   context MUST be cryptographically independent of, and MUST NOT be
   deducible from the master key of any hbh context.  The key management
   protocol(s) used MUST therefore be able to negotiate keys satisfying
   these requirements.

   The identification of the hbh security context should be as defined
   in [RFC3711] while the used e2e media security context either is
   implicitly identified in the session set-up or its identification
   relies on the proposed crypto context indicator (CCI).

   A sender will use two cryptographic contexts: an e2e context used for
   payload protection to the end-receiver, and a hbh context used to
   secure the SRTP transport to the (first) intermediary.  Similarly,
   the end-receiver will use two contexts.  An intermediary node,
   however, will only use one standard SRTP context per hop.  In other
   words, an e2e context is used to achieve transport independent media
   protection as required in Section 5, and a hbh context similarly is
   used to achieve transport protection.

   For both e2e and hbh contexts, it is assumed that SRTP cryptographic
   context parameters, such as master key and salt (if needed) are
   included.  From these, SRTP session keys/salts are derived similarly
   to [RFC3711] (see Figure 2).

                      e2e_context (payload protection)
                   <----------------------------------->
                 +---+             +---+              +---+
                 | S |             | M |              | R |
                 +---+             +---+              +---+
                    <---------------> <----------------->
                       hbh_context1       hbh_context2

                          ^                        ^
                          |                        |
                          +- transport protection -+

          Figure 2: Context sharing (Sender, Middlebox, Receiver)
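
   As a non-normative illustration, the context sharing of Figure 2 can
   be sketched as below.  The class name, field names and dictionary
   keys are hypothetical, and the key and salt lengths (128 and 112
   bits) are assumed from the [RFC3711] defaults.  Drawing each master
   key independently at random is one simple way to meet the
   independence requirement stated above.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CryptoContext:
    """Hypothetical container for the parameters of one crypto context."""
    name: str
    master_key: bytes = field(default_factory=lambda: secrets.token_bytes(16))
    master_salt: bytes = field(default_factory=lambda: secrets.token_bytes(14))


# Each master key is drawn independently at random, so no hbh key can be
# computed from the e2e key or vice versa.
e2e = CryptoContext("e2e_context")
hbh1 = CryptoContext("hbh_context1")
hbh2 = CryptoContext("hbh_context2")

# Contexts held by each node in Figure 2.
contexts = {
    "S": {"e2e": e2e, "hbh": hbh1},          # payload + transport protection
    "M": {"hbh_in": hbh1, "hbh_out": hbh2},  # transport protection only
    "R": {"e2e": e2e, "hbh": hbh2},          # payload + transport protection
}
```

   Note that the middlebox entry holds no e2e context at all, which is
   what keeps the payload opaque to it.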

   If several senders' payloads are multiplexed within the same SRTP
   stream from a server to a receiver (as discussed in Section 4.3.4),
   the receiver may need to switch between e2e contexts in midstream.
   This can be implemented using a mechanism similar to the SRTP MKI
   field in the e2e context (what is referred to as CCI above).  The hbh
   context would, however, not need any change but could rely on an MKI
   field according to the current definition in [RFC3711].
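
   A minimal sketch of such midstream switching is given below.  It
   assumes that the CCI is a small integer carried with each payload
   and that the receiver keeps a table of the e2e contexts it has been
   given; the table contents and the function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical table of e2e contexts known to the receiver, keyed by CCI.
# The string values stand in for full contexts (master key, salt, nonce).
e2e_contexts: Dict[int, str] = {
    0x01: "e2e context of sender S1",
    0x02: "e2e context of sender S2",
}


def select_e2e_context(cci: int) -> Optional[str]:
    """Return the e2e context for a payload's CCI, or None if unknown."""
    return e2e_contexts.get(cci)


# Payloads from two senders interleaved in one SRTP stream: the hbh
# context stays the same while the e2e context changes per payload.
stream_ccis = [0x01, 0x01, 0x02, 0x01]
selected = [select_e2e_context(cci) for cci in stream_ccis]
```

   A payload carrying an unknown CCI cannot be processed and would
   typically be discarded.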

6.3.  New Transforms

   As indicated above, the new transform will be a media protection
   transform combined with a transport protection transform, where the
   transport protection transform equals a transform as specified in
   [RFC3711].  Thus, here we will only describe and discuss the media
   protection part.

   We propose that the media protection part of the new transform is
   defined as an AEAD transform.  That is, both confidentiality and
   authenticity are provided by the same transform, which also allows
   part of the payload to be forwarded in plaintext while still being
   integrity protected.  The possibility to forward part of the media
   payload in plaintext can be essential to enable simplifications in
   rendering functionality.


   If for some reason only encryption or integrity protection is needed,
   it is in many cases easy to see how to separate the encryption part
   from the authentication part and handle them as separate transforms.

6.3.1.  Media Protection Transform

   The new media protection transform is proposed to be AES-GCM, i.e.
   GCM as defined in [GCM] with AES as the block cipher, which allows
   reuse of most of the functionality in existing SRTP implementations.

   AES-GCM uses a 128-bit IV and here it is proposed that the IV is
   constructed from a session salt, a nonce (signaled out of band before
   the streaming begins), and an IV sequence number, the IVSN, included
   in each SRTP packet.  The exact IV forming function f to use is for
   further study (ffs).  The nonce replaces the SSRC, which guaranteed
   stream-uniqueness of the IV, and the IVSN replaces the packet index,
   which guaranteed packet-uniqueness.
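
   Since f is ffs, the following is only one possible shape of it, given
   as a sketch.  It is modelled on the AES-CM IV of [RFC3711], with the
   nonce taking the place of the SSRC and the IVSN taking the place of
   the packet index.  The field widths (32-bit nonce, 48-bit IVSN,
   112-bit salt) and the function name are assumptions made for this
   illustration.

```python
NONCE_BITS = 32   # assumed; takes the place of the 32-bit SSRC
IVSN_BITS = 48    # assumed; takes the place of the 48-bit packet index


def form_iv(session_salt: bytes, nonce: int, ivsn: int) -> bytes:
    """Hypothetical IV forming function f.

    IV = (salt * 2^16) XOR (nonce * 2^64) XOR (ivsn * 2^16), returned as
    a 128-bit big-endian value.
    """
    if len(session_salt) != 14:
        raise ValueError("expected a 112-bit session salt")
    if not 0 <= nonce < 2 ** NONCE_BITS:
        raise ValueError("nonce out of range")
    if not 0 <= ivsn < 2 ** IVSN_BITS:
        raise ValueError("IVSN out of range")
    salt = int.from_bytes(session_salt, "big")
    # The nonce occupies bits 64..95 and the IVSN bits 16..63, so two
    # different (nonce, IVSN) pairs never give the same IV for one salt.
    iv = (salt << 16) ^ (nonce << 64) ^ (ivsn << 16)
    return iv.to_bytes(16, "big")
```

   Because the nonce and the IVSN are placed in disjoint bit ranges,
   stream-uniqueness and packet-uniqueness carry over to the IV as long
   as the nonce is unique per stream and the IVSN does not wrap.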


                             Session                            Crypto
                           (encr.) Key                         ctxt ind.
                                |                                  |
                                V                                  |
                            +-------+                              |
                         A  |       |                              |
     RTP Payload      +---->|       |-----------------------+      |
   +---+-----------+  |     |  GCM  |--------------+        |      |
   | A |     P     |--+     |       |-----+        |        |      |
   |   |           |  |     |       |     |        |        |      |
   +---+-----------+  |     |       |     V        V        V      V
                      |  P  |       |   +---+-------------+---+--+---+
                      +---->|       |   | A |   Enc( P )  |TAG|IV|CCI|
                            |       |   |   |             |   |SN|   |
                            +-------+   +---+-------------+---+--+---+
                                ^        Protected RTP payload  ^
                                | IV                            |
                                |                               |
                             +-----+                            |
                   Nonce --->|  f  |<- Session Salt             |
                             |     |                            |
                             +-----+                            |
                                ^                               |
                                |                               |
                                +-------------------------------+
                                |
                             +-----+
                             | IV  |
                             | SN  |
                             +-----+

                   Figure 3: Media protection transform

   The media payload processing is illustrated in Figure 3.  Note that
   the processing incurs message expansion.  The original payload is
   divided into two parts: part A, which will only be authenticated,
   and part P, which will be both authenticated and encrypted.  The IVSN
   is a counter which increases by one for each handled payload and it
   is concatenated to the protected payload.  The authentication tag,
   TAG, is calculated over the plaintext part A and the encrypted part
   Enc( P ).  The session key used is here assumed to be the Session
   encr_key derived from the e2e-key (see Section 6.3.3).  The transform
   may also include a Crypto Context Indicator, CCI, used to identify
   the e2e crypto context in use.  The details of this field are ffs but
   it may be defined to also include MKI functionality.

   On the receiver side, the CCI and the IVSN are extracted from the
   received payload and are used to check the integrity tag and to
   recover the original RTP payload by inverting the sender-side
   processing.
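
   The layout of the protected payload in Figure 3 can be sketched as
   follows.  Only the framing is shown; the AES-GCM operation itself is
   omitted.  The field lengths (16-byte TAG, 6-byte IVSN, 1-byte CCI)
   are assumptions, as the draft does not fix them, and the length of
   part A is assumed to be known to the receiver from signaling.

```python
from typing import NamedTuple

TAG_LEN = 16   # assumed length of the GCM authentication tag, in bytes
IVSN_LEN = 6   # assumed length of the IVSN field, in bytes
CCI_LEN = 1    # assumed length of the CCI field, in bytes


class ProtectedPayload(NamedTuple):
    a: bytes        # part A: authenticated only, sent as plaintext
    enc_p: bytes    # Enc(P): authenticated and encrypted
    tag: bytes      # TAG computed over A and Enc(P)
    ivsn: int
    cci: int


def pack(p: ProtectedPayload) -> bytes:
    """Serialize as A || Enc(P) || TAG || IVSN || CCI."""
    return (p.a + p.enc_p + p.tag
            + p.ivsn.to_bytes(IVSN_LEN, "big")
            + p.cci.to_bytes(CCI_LEN, "big"))


def unpack(data: bytes, a_len: int) -> ProtectedPayload:
    """Split a received payload; a_len is the signaled length of part A."""
    trailer = TAG_LEN + IVSN_LEN + CCI_LEN
    if len(data) < a_len + trailer:
        raise ValueError("protected payload too short")
    body, rest = data[:-trailer], data[-trailer:]
    tag = rest[:TAG_LEN]
    ivsn = int.from_bytes(rest[TAG_LEN:TAG_LEN + IVSN_LEN], "big")
    cci = int.from_bytes(rest[TAG_LEN + IVSN_LEN:], "big")
    return ProtectedPayload(body[:a_len], body[a_len:], tag, ivsn, cci)
```

   With these assumed lengths the message expansion is 23 bytes per
   payload.  The receiver would use the extracted CCI to select the e2e
   context and the IVSN to form the IV before verifying TAG.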

   In Appendix A a draft description of the processing steps of a
   compound transform is given.

6.3.2.  Replay Protection

   When the RTP data is hbh transport protected between server and
   receiver, replay protection on the transport level is provided, as
   the hbh protection offers the same security features as [RFC3711].
   As mentioned, the server is assumed to be trusted not to attempt
   replay of data on the media level unless the user requests it;
   relying on transport-level replay protection alone is thus in line
   with the trust model.

   The IVSN used in the media protection functionality offers the
   possibility to implement replay protection at the application level
   if an application requires it.
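
   One way an application could do this is a sliding window over the
   IVSN, similar in spirit to the replay list of [RFC3711].  The sketch
   below is an assumption about how such a check might look, not a
   mechanism defined by this document; the class name and window size
   are hypothetical.  It is only meaningful for applications that do
   not need random seek or repeated playback.

```python
class ReplayWindow:
    """Sliding-window replay check keyed on the IVSN (window size assumed)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1   # highest IVSN accepted so far
        self.bitmap = 0     # bit i set => IVSN (highest - i) was seen

    def check_and_update(self, ivsn: int) -> bool:
        """Return True and record ivsn if fresh, False if replayed or too old.

        Call this only after the e2e authentication tag has been verified.
        """
        if ivsn > self.highest:
            shift = ivsn - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = ivsn
            return True
        offset = self.highest - ivsn
        if offset >= self.size or (self.bitmap >> offset) & 1:
            return False
        self.bitmap |= 1 << offset
        return True
```

   Payloads that arrive out of order but within the window are
   accepted once; anything older than the window is rejected.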

6.3.3.  Key Derivation

   Session key derivation (and optional key refresh) for the hbh context
   is performed as in [RFC3711] and is based on the 48-bit SRTP packet
   index.

   Session key derivation (and optional key refresh) for the e2e context
   is also performed as in [RFC3711], but it is still open whether the
   functionality for autonomous rekeying needs to be included.  If it is
   included, it would be based on the IVSN instead of the packet index.
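
   For reference, the input to the [RFC3711] key derivation PRF is
   formed as x = (label || r) XOR master_salt, where r is the index
   divided by the key derivation rate.  The sketch below computes only
   this input; the AES-CM based PRF that turns x into a session key is
   omitted.  Passing the IVSN as the index for an e2e context is an
   assumption based on the text above, and the function name is
   hypothetical.

```python
LABEL_ENCRYPTION = 0x00
LABEL_AUTHENTICATION = 0x01
LABEL_SALT = 0x02


def kdf_prf_input(master_salt: bytes, label: int, index: int,
                  key_derivation_rate: int = 0) -> int:
    """Compute x = (label || r) XOR master_salt as in RFC 3711, Sec. 4.3.1.

    `index` is the 48-bit SRTP packet index for a hbh context; for an
    e2e context this sketch assumes the IVSN is passed instead.
    """
    if len(master_salt) != 14:
        raise ValueError("expected a 112-bit master salt")
    if not 0 <= index < 2 ** 48:
        raise ValueError("index out of range")
    # "index DIV rate" is defined as 0 when the key derivation rate is 0.
    r = index // key_derivation_rate if key_derivation_rate else 0
    key_id = (label << 48) | r          # 8-bit label || 48-bit r
    return key_id ^ int.from_bytes(master_salt, "big")
```

   With a key derivation rate of 0 the session keys are derived once
   and never refreshed, so the choice between packet index and IVSN only
   matters when autonomous rekeying is enabled.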


7.  Commented Example Usage

   In this example use case it is assumed that a sender S wants to send
   e2e protected media to a receiver R via an intermediary M.  For this,
   M will use SRTP with a compound transform as defined above.

   1.  S defines an e2e crypto context and forwards it to R.  S also
       agrees on a hbh crypto context with M.  Each crypto context
       defines a master key, i.e. k_e2e and k_hbh respectively.  Note
       that for store and forward operation, the e2e crypto context has
       to be decided unilaterally by the sender.

       The compound transform defines standard HMAC-SHA1 for transport
       authentication and NULL encryption, which corresponds to a
       transform defined in [RFC3711].

       The e2e protection is configured to use AES-GCM as defined above,
       giving both integrity and confidentiality protection.  For the
       e2e protection, S also indicates whether some part of the payload
       is sent in plaintext by specifying its length.

       How these crypto contexts are set up (which key management
       protocol to use etc.) is out of scope.  Still, it can be noted
       that in principle it could be done by having e.g. two MIKEY
       [RFC3830] exchanges, one between S and M and one between S and R.

   2.  S sets up an SRTP session with M to have data forwarded to R. S
       offers the compound transform to M. M, knowing that it will act
       as an intermediary, accepts the offer (even though it doesn't
       have access to the e2e crypto context).  M records that the media
       received is e2e protected.  M also records the identity of the
       compound transform used.

   3.  To receive the media stream, M initiates SRTP as in [RFC3711]
       using a transform equivalent to the hbh transform in the compound
       transform offered by S.

   4.  S starts to transmit SRTP towards M, in effect using GCM and
       k_e2e for e2e media protection and HMAC-SHA1 with k_hbh for
       transport authentication.

   5.  M receives the packets and verifies the hbh authenticity of each
       SRTP packet and stores the (protected) payloads together with
       relevant side information to be used when the media is forwarded.
       Note that M would perform exactly the same operations when
       storing unprotected media for later forwarding.

   6.  At some later time, R sets up a session with M to render the
       stored media.  As R contacts a middlebox, R offers use of a
       compound transform, preferably having the same e2e transform as
       was used by S (the e2e transform may be part of the e2e crypto
       context).  If R offered a compound transform which doesn't use
       the same e2e transform or if R offered use of standard SRTP, M
       would decline the offer and propose a compatible compound
       transform.  A hbh crypto context, which is independent of the
       first one, is agreed between R and M.

   7.  M, knowing that the stored payloads are e2e protected, initiates
       use of SRTP as in [RFC3711] specifying the transform to be used
       to equal the hbh transform in the compound transform agreed
       between R and M. M then transmits the authenticated media stream
       to R.

   8.  When receiving the SRTP packets from M, R first verifies the
       transport authentication and then checks e2e media authentication
       and decrypts the payloads to retrieve the plaintext media.
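
   The hbh part of steps 4 to 8 can be sketched with HMAC-SHA1 as
   follows.  The e2e protected payload is treated as opaque bytes, as it
   is by M.  The tag is computed over the RTP header, the payload and
   the rollover counter (ROC) and truncated to 80 bits, following the
   [RFC3711] default.  The key values, the 12-byte dummy header and the
   function names are placeholders chosen for this illustration; real
   session authentication keys would be derived from the respective
   k_hbh.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit tag, the RFC 3711 default for HMAC-SHA1


def hbh_tag(auth_key: bytes, header: bytes, payload: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over header || payload || ROC, truncated to TAG_LEN bytes."""
    msg = header + payload + roc.to_bytes(4, "big")
    return hmac.new(auth_key, msg, hashlib.sha1).digest()[:TAG_LEN]


def hbh_protect(auth_key: bytes, header: bytes, payload: bytes,
                roc: int = 0) -> bytes:
    """Append the hbh authentication tag to header || payload."""
    return header + payload + hbh_tag(auth_key, header, payload, roc)


def hbh_verify(auth_key: bytes, packet: bytes, header_len: int,
               roc: int = 0) -> bytes:
    """Verify the hbh tag and return the (still e2e protected) payload."""
    header = packet[:header_len]
    payload = packet[header_len:-TAG_LEN]
    tag = packet[-TAG_LEN:]
    if not hmac.compare_digest(tag, hbh_tag(auth_key, header, payload, roc)):
        raise ValueError("hbh authentication failed")
    return payload


# Placeholder session authentication keys for the two independent hops.
k_auth_sm = b"S-M session auth key"
k_auth_mr = b"M-R session auth key"
header = bytes(12)                      # dummy 12-byte RTP header
e2e_payload = b"opaque AES-GCM output"  # produced by S, never opened by M

# Step 4-5: S protects, M verifies and stores the payload as is.
stored = hbh_verify(k_auth_sm, hbh_protect(k_auth_sm, header, e2e_payload), 12)
# Step 7-8: M re-protects with the second hbh key, R verifies.
delivered = hbh_verify(k_auth_mr, hbh_protect(k_auth_mr, header, stored), 12)
```

   The payload that R receives is byte for byte the one S produced, so R
   can go on to check the e2e authentication and decrypt using k_e2e.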


8.  Implications on SRTP

   As the SRTP specification allows new transforms, the new transforms
   can be added with only minor implications.

   The handling of dual security contexts (in the end-points) is,
   however, a new feature which will have to be introduced in SRTP.

   The Key Derivation Function defined in [RFC3711] can be reused for
   both the e2e and the hbh security contexts.


9.  Security Considerations

9.1.  Media Protection Transform

   Any fixed key-stream output, generated from the same inputs (i.e. key
   and IV) MUST only be used to encrypt once.  Reusing such a key-stream
   (commonly called a "two-time pad") would almost certainly compromise
   security.

   The new AES-GCM based transform accomplishes packet-uniqueness by
   including the IVSN and stream-uniqueness by inclusion of a nonce in
   the IV formation.  Thus, the nonce MUST be unique between all the RTP
   streams within the same RTP session that share the same e2e master
   key.  Master keys MAY be shared between streams belonging to the same
   RTP session, but it is RECOMMENDED that each stream have its own
   master key.
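
   A sender or key management function could check the nonce
   requirement before streaming starts.  The following is a minimal
   sketch of such a check; the function name and the representation of
   a stream as a (master key, nonce) pair are assumptions made for
   illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def reused_nonces(
    streams: Iterable[Tuple[bytes, bytes]]
) -> List[Tuple[bytes, bytes]]:
    """Return the (e2e master key, nonce) pairs that occur more than once.

    Each element of `streams` describes one RTP stream.  A non-empty
    result means that two streams could produce the same IV under the
    same key, which must not happen.
    """
    counts = Counter(streams)
    return [pair for pair, n in counts.items() if n > 1]
```

   Streams that use different master keys may carry the same nonce
   without conflict, which is why the check is made on the pair rather
   than on the nonce alone.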

   With the above conditions fulfilled, the security level of the
   AES-GCM based transform will equal the level offered by [RFC3711].
   Thus, the compound transform as a whole will also have the same
   security level as [RFC3711].

9.2.  Replay Protection

   Replay protection is only provided on a hbh basis.  Note that the
   requirements on random seek in the media stream rule out any general
   replay protection mechanism applied on an e2e basis and that this
   threat falls outside the assumed trust model.  Still, the IVSN used
   offers the possibility to implement application specific replay
   protection mechanisms.


10.  Acknowledgements

   The authors would like to thank Daniel Catrein, Frank Hartung, and
   Magnus Westerlund for their support and valuable comments.


11.  IANA Considerations

   To signal that the new transforms are used, each relevant key
   management protocol needs to register the new transforms including
   numbering scheme and syntax with IANA.


12.  References

12.1.  Normative References

   [GCM]      NIST, "Recommendation for Block Cipher Modes of Operation:
              Galois/Counter Mode (GCM) and GMAC", NIST SP 800-38D,
              November 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

12.2.  Informative References

   [3GPP.26.234]
              3GPP, "Transparent end-to-end Packet-switched Streaming
              Service (PSS); Protocols and codecs", 3GPP TS 26.234
              7.5.0, March 2008.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              August 2004.


Appendix A.  Draft Compound Transform Details

   This informative appendix proposes a way to define the compound
   transform such that it fits well in the SRTP framework.  We assume
   the transform is defined to provide

   o  Integrity and confidentiality e2e (the media part)

   o  Integrity hbh (the transport part)


   Clearly, other combinations are also possible in the form of any or
   all of the 15 possible (non-trivial) combinations of the security
   services confidentiality and integrity for the hbh as well as the e2e
   part.  However, we feel that integrity and confidentiality on an e2e
   basis combined with hbh integrity will be sufficient in most cases.
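
   The count of 15 follows from four independent services (e2e
   confidentiality, e2e integrity, hbh confidentiality, hbh integrity),
   each either on or off, minus the trivial case where all are off:
   2^4 - 1 = 15.  The short enumeration below illustrates this; the
   service labels are just descriptive strings.

```python
from itertools import product

services = ("e2e confidentiality", "e2e integrity",
            "hbh confidentiality", "hbh integrity")

# Every on/off assignment of the four services, minus the all-off case.
combinations = [
    tuple(s for s, on in zip(services, flags) if on)
    for flags in product((False, True), repeat=len(services))
    if any(flags)
]

# The combination assumed in this appendix.
chosen = ("e2e confidentiality", "e2e integrity", "hbh integrity")
```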

   As discussed above, we introduce a compound transform, CT.  The CT
   has two parts:

   o  AES-GCM is used to process the RTP payload, providing
      confidentiality and integrity and is intended for the e2e
      protection.  Conceptually, we view AES-GCM as an encryption
      transform within the SRTP framework.

   o  HMAC-SHA1 is used to provide integrity protection of the entire
      RTP packet (including the AES-GCM encrypted payload and the
      metadata added by AES-GCM) and is intended for the hbh part.


            +--------+---+--------------+---+--+---+----------+
            | RTP    | A |    Enc( P )  |TAG|IV|CCI|   Auth   |
            | header |   |              |   |SN|   | Tag (hbh)|
            +--------+---+--------------+---+--+---+----------+
            ^        ^_____________________________^
            |         RTP e2e Protected Payload    ^
            | _______  Hbh protected RTP packet ___|


                      Figure 4: SRTP protection scope

   Below, we make the natural (and necessary) assumption that the sender
   is made aware (e.g. by session set-up signaling) that the media will
   be delivered/stored in a middlebox.  Similarly, we assume the
   middlebox is aware that it is acting as a middlebox.

A.1.  Processing

   Recall that standard SRTP processing has the following principal
   form.

   1.  The sender determines keys, transforms, and other parameters from
       the cryptographic context.

   2.  The sender encrypts the payload (optional).

   3.  The sender integrity protects the RTP packet (optional).


   On the receiver side, the order is reversed: integrity verification
   is performed before decryption.

   In the following we describe the processing taking place in sender,
   middlebox, and ultimate receiver as triggered by the use of the CT
   transform indicated by the cryptographic contexts involved.

A.1.1.  Sender

   1.  The sender determines keys and other parameters in the same way
       as standard SRTP does.  The crypto context states that the CT
       transform shall be used.

   2.  The sender applies the AES-GCM part of CT to the payload.
       Conceptually treating AES-GCM as an encryption transform, this
       agrees with the normal SRTP processing.

   3.  The sender next applies the HMAC part of CT.  Again, this agrees
       with adding standard SRTP integrity protection.

A.1.2.  Middlebox

A.1.2.1.  Message Storage

   1.  The middlebox determines keys and other parameters in the same
       way as standard SRTP does.  The crypto context states that the CT
       transform shall be used.  Since the middlebox is aware of its
       role as a (receiving) middlebox, the middlebox configures itself
       to verify integrity but not to decrypt the payload.  To fit with
       the normal SRTP processing, the middlebox may therefore
       conceptually configure itself to perform HMAC integrity
       verification but use NULL decryption as supported by SRTP.

   2.  The middlebox next applies the HMAC part of CT according to
       standard SRTP integrity verification and replay protection.

   3.  The middlebox extracts the payload (which is the AES-GCM output
       as generated by the sender) and stores it for later retrieval by
       the receiver.

A.1.2.2.  Message Delivery

   1.  The middlebox determines keys and other parameters in the same
       way as standard SRTP does.  The crypto context states that the CT
       transform shall be used.  Since the middlebox is aware of its
       role as a (sending) middlebox, the middlebox configures itself to
       not encrypt the payload but only to add integrity protection.


   2.  The middlebox applies NULL encryption to the payload.

   3.  The middlebox applies HMAC integrity.

A.1.3.  Receiver

   The crypto context tells the receiver to use the CT transform and the
   receiver can process accordingly.
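
   The processing described in this appendix can be summarized per role
   as in the sketch below.  The table is only a restatement of the steps
   above in data form; the role names and step labels are hypothetical
   and the receiver's step order is taken from Section 7.

```python
# Hypothetical summary of the CT processing steps per role.
CT_STEPS = {
    "sender": (
        "AES-GCM protect (e2e)",
        "HMAC-SHA1 add tag (hbh)",
    ),
    "middlebox_storage": (
        "HMAC-SHA1 verify (hbh)",
        "NULL decrypt",
        "store payload",
    ),
    "middlebox_delivery": (
        "NULL encrypt",
        "HMAC-SHA1 add tag (hbh)",
    ),
    "receiver": (
        "HMAC-SHA1 verify (hbh)",
        "AES-GCM verify and decrypt (e2e)",
    ),
}


def needs_e2e_key(role: str) -> bool:
    """True if any step of the role uses the e2e (AES-GCM) session keys."""
    return any("AES-GCM" in step for step in CT_STEPS[role])


roles_with_e2e_key = sorted(r for r in CT_STEPS if needs_e2e_key(r))
```

   Only the sender and the receiver ever need the e2e keys, which is
   the property the trust model relies on.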


Authors' Addresses

   Rolf Blom
   Ericsson Research
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 585 317 07
   Email: rolf.j.blom@ericsson.com


   Yi Cheng
   Ericsson Research
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 568 674 22
   Email: yi.cheng@ericsson.com


   F. Lindholm
   Ericsson AB
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 585 317 05
   Email: fredrik.lindholm@ericsson.com


   John Mattsson
   Ericsson Research
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 404 35 01
   Email: john.mattsson@ericsson.com


   Mats Naslund
   Ericsson Research
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 585 337 39
   Email: mats.naslund@ericsson.com


   Karl Norrman
   Ericsson Research
   SE-164 80 Stockholm
   Sweden

   Phone: +46 8 404 45 02
   Email: karl.norrman@ericsson.com


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

